Applied Psychological Measurement
https://read.qxmd.com/read/38585305/how-scoring-approaches-impact-estimates-of-growth-in-the-presence-of-survey-item-ceiling-effects
#1
JOURNAL ARTICLE
Kelly D Edwards, James Soland
Survey scores are often the basis for understanding how individuals grow psychologically and socio-emotionally. A known problem with many surveys is that the items are all "easy"; that is, individuals tend to use only the top one or two response categories on the Likert scale. Such an issue could be especially problematic, and lead to ceiling effects, when the same survey is administered repeatedly over time. In this study, we conduct simulation and empirical studies to (a) quantify the impact of these ceiling effects on growth estimates when using typical scoring approaches like sum scores and unidimensional item response theory (IRT) models and (b) examine whether approaches to survey design and scoring, including employing various longitudinal multidimensional IRT (MIRT) models, can mitigate any bias in growth estimates...
May 2024: Applied Psychological Measurement
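The ceiling mechanism described in this abstract is easy to reproduce in a few lines. The following Python sketch is hypothetical (not the authors' code) and assumes graded Likert responses generated from noisy logistic latent values: because the item thresholds sit well below most of the ability distribution, equal true growth yields visibly smaller sum-score gains for respondents who start near the ceiling.

    import numpy as np

    rng = np.random.default_rng(1)
    n, n_items = 2000, 6
    thresholds = np.array([-2.5, -1.5, -0.5, 0.5])   # low thresholds: items are "easy"

    def likert_scores(theta):
        # graded-response-style generation: each item's category (0-4) is the
        # number of thresholds the noisy latent value exceeds
        latent = theta[:, None] + rng.logistic(0, 1, (len(theta), n_items))
        return (latent[:, :, None] > thresholds).sum(axis=2)

    theta_t1 = rng.normal(0.5, 1.0, n)   # time-1 ability
    theta_t2 = theta_t1 + 0.5            # everyone truly grows by 0.5 SD
    gain = likert_scores(theta_t2).sum(1) - likert_scores(theta_t1).sum(1)

    top = theta_t1 > np.quantile(theta_t1, 0.8)
    print("mean sum-score gain, top 20% at time 1:", gain[top].mean().round(2))
    print("mean sum-score gain, remaining 80%:", gain[~top].mean().round(2))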
https://read.qxmd.com/read/38585304/detecting-differential-item-functioning-in-multidimensional-graded-response-models-with-recursive-partitioning
#2
JOURNAL ARTICLE
Franz Classe, Christoph Kern
Differential item functioning (DIF) is a common challenge when examining latent traits in large-scale surveys. In recent work, methods from the field of machine learning such as model-based recursive partitioning have been proposed to identify subgroups with DIF when little theoretical guidance and many potential subgroups are available. On this basis, we propose and compare recursive partitioning techniques for detecting DIF with a focus on measurement models with multiple latent variables and ordinal response data...
May 2024: Applied Psychological Measurement
https://read.qxmd.com/read/38585303/linking-methods-for-multidimensional-forced-choice-tests-using-the-multi-unidimensional-pairwise-preference-model
#3
JOURNAL ARTICLE
Naidan Tu, Lavanya S Kumar, Sean Joo, Stephen Stark
Applications of multidimensional forced choice (MFC) testing have increased considerably over the last 20 years. Yet there has been little, if any, research on methods for linking the parameter estimates from different samples. This research addressed that important need by extending four widely used methods for unidimensional linking and comparing the efficacy of new estimation algorithms for MFC linking coefficients based on the Multi-Unidimensional Pairwise Preference model (MUPP). More specifically, we compared the efficacy of multidimensional test characteristic curve (TCC), item characteristic curve (ICC; Haebara, 1980), mean/mean (M/M), and mean/sigma (M/S) methods in a Monte Carlo study that also manipulated test length, test dimensionality, sample size, percentage of anchor items, and linking scenarios...
May 2024: Applied Psychological Measurement
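Of the four linking methods compared in this abstract, mean/mean (M/M) and mean/sigma (M/S) are simple moment-matching formulas. The Python sketch below shows their classic unidimensional versions with made-up anchor-item parameters; the article's multidimensional, MUPP-specific estimation is more involved and is not reproduced here.

    import numpy as np

    def mean_sigma(b_ref, b_new):
        # mean/sigma: choose A, B so that b_ref ~= A * b_new + B for anchor items
        A = b_ref.std(ddof=1) / b_new.std(ddof=1)
        return A, b_ref.mean() - A * b_new.mean()

    def mean_mean(a_ref, a_new, b_ref, b_new):
        # mean/mean: slope from mean discriminations, intercept from mean difficulties
        A = a_new.mean() / a_ref.mean()
        return A, b_ref.mean() - A * b_new.mean()

    # toy anchor-item parameters on two forms (hypothetical values)
    b_ref = np.array([-1.2, -0.3, 0.4, 1.1])
    b_new = (b_ref - 0.5) / 1.2          # new scale: theta_ref = 1.2 * theta_new + 0.5
    a_ref = np.array([1.0, 1.4, 0.8, 1.1])
    a_new = a_ref * 1.2                  # discriminations transform inversely

    print(mean_sigma(b_ref, b_new))      # recovers A ~= 1.2, B ~= 0.5
    print(mean_mean(a_ref, a_new, b_ref, b_new))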
https://read.qxmd.com/read/38585302/evaluating-the-douglas-cohen-irt-goodness-of-fit-measure-with-bib-sampling-of-items
#4
JOURNAL ARTICLE
John R Donoghue, Adrienne Sgammato
Methods to detect item response theory (IRT) item-level misfit are typically derived assuming fixed test forms. However, IRT is also employed with more complicated test designs, such as the balanced incomplete block (BIB) design used in large-scale educational assessments. This study investigates two modifications of Douglas and Cohen's (2001) nonparametric method of assessing item misfit for analyzing BIB data, based on (a) using block total scores and (b) pooling booklet-level scores. Block-level scores showed extreme inflation of Type I error for short blocks containing 5 or 10 items...
May 2024: Applied Psychological Measurement
https://read.qxmd.com/read/38327610/location-matching-adaptive-testing-for-polytomous-technology-enhanced-items
#5
JOURNAL ARTICLE
Hyeon-Ah Kang, Gregory Arbet, Joe Betts, William Muntean
The article presents adaptive testing strategies for polytomously scored technology-enhanced innovative items. We investigate item selection methods that match examinees' ability levels in location and explore ways to leverage test-taking speeds during item selection. Existing approaches to selecting polytomous items are mostly based on information measures and tend to suffer from an item pool usage problem. In this study, we introduce location indices for polytomous items and show that location-matched item selection significantly alleviates the usage problem and achieves more diverse item sampling...
March 2024: Applied Psychological Measurement
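A minimal sketch of location-matched selection follows, assuming (hypothetically) that an item's location index is the mean of its generalized partial credit model step difficulties: the next item is the unused one whose location is closest to the interim ability estimate, rather than the one maximizing information.

    import numpy as np

    rng = np.random.default_rng(7)
    # hypothetical GPCM pool: each item has a discrimination and 3 step difficulties
    pool = [{"a": a, "steps": sorted(rng.normal(loc, 0.6, 3))}
            for a, loc in zip(rng.uniform(0.7, 2.0, 200), rng.normal(0, 1, 200))]

    def location(item):
        # one simple location index: the average step difficulty
        return np.mean(item["steps"])

    def select_location_matched(theta_hat, administered):
        # pick the unused item whose location is closest to the current theta estimate
        candidates = [i for i in range(len(pool)) if i not in administered]
        return min(candidates, key=lambda i: abs(location(pool[i]) - theta_hat))

    print(select_location_matched(theta_hat=0.3, administered={5, 12}))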
https://read.qxmd.com/read/38327609/benefits-of-the-curious-behavior-of-bayesian-hierarchical-item-response-theory-models-an-in-depth-investigation-and-bias-correction
#6
JOURNAL ARTICLE
Christoph König, Rainer W Alexandrowicz
When using Bayesian hierarchical modeling, a popular approach for Item Response Theory (IRT) models, researchers typically face a tradeoff between the precision and accuracy of the item parameter estimates. Given the pooling principle and variance-dependent shrinkage, the expected behavior of Bayesian hierarchical IRT models is to deliver more precise but biased item parameter estimates, compared to those obtained in nonhierarchical models. Previous research, however, points out the possibility that, in the context of the two-parameter logistic IRT model, the aforementioned tradeoff need not be made...
March 2024: Applied Psychological Measurement
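The "pooling principle and variance-dependent shrinkage" mentioned above can be illustrated outside IRT with a normal-normal empirical Bayes sketch in Python: noisier item estimates are pulled harder toward the pool mean, which typically lowers overall error even though individual estimates become biased. This is an illustrative analogue, not the hierarchical models studied in the article.

    import numpy as np

    rng = np.random.default_rng(3)
    true_b = rng.normal(0, 1, 20)           # true item difficulties
    se = rng.uniform(0.1, 0.6, 20)          # sampling SDs (small samples -> large SEs)
    b_hat = true_b + rng.normal(0, se)      # nonhierarchical (per-item) estimates

    # empirical-Bayes normal-normal shrinkage toward the pool mean:
    # weight w_i = tau^2 / (tau^2 + se_i^2), so noisier items shrink more
    mu = b_hat.mean()
    tau2 = max(b_hat.var(ddof=1) - np.mean(se**2), 1e-6)  # crude between-item variance
    w = tau2 / (tau2 + se**2)
    b_shrunk = w * b_hat + (1 - w) * mu

    for label, est in [("nonhierarchical", b_hat), ("hierarchical", b_shrunk)]:
        print(label, "RMSE:", np.sqrt(np.mean((est - true_b) ** 2)).round(3))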
https://read.qxmd.com/read/38327608/detecting-uniform-differential-item-functioning-for-continuous-response-computerized-adaptive-testing
#7
JOURNAL ARTICLE
Chun Wang, Ruoyi Zhu
Evaluating items for potential differential item functioning (DIF) is an essential step to ensuring measurement fairness. In this article, we focus on a specific scenario, namely, continuous-response, severely sparse computerized adaptive testing (CAT). Continuous response items are increasingly used in performance-based tasks because they tend to generate more information than traditional dichotomous items. Severe sparsity arises when many items are automatically generated via machine learning algorithms...
March 2024: Applied Psychological Measurement
https://read.qxmd.com/read/38327607/comparing-test-taking-effort-between-paper-based-and-computer-based-tests
#8
JOURNAL ARTICLE
Sebastian Weirich, Karoline A Sachse, Sofie Henschel, Carola Schnitzler
The article compares the trajectories of students' self-reported test-taking effort during a 120-minute low-stakes large-scale assessment of English comprehension between a paper-and-pencil assessment (PPA) and a computer-based assessment (CBA). Test-taking effort was measured four times during the test. Using a within-subject design, each of the N = 2,676 German ninth-grade students completed half of the test in PPA and half in CBA mode, where the sequence of modes was balanced between students. Overall, students' test-taking effort decreased considerably during the course of the test...
March 2024: Applied Psychological Measurement
https://read.qxmd.com/read/38327606/corrigendum-to-irtplay-an-r-package-for-online-item-calibration-scoring-evaluation-of-model-fit-and-useful-functions-for-unidimensional-irt
#9
(no author information available yet)
[This corrects the article DOI: 10.1177/0146621620921247.]
March 2024: Applied Psychological Measurement
https://read.qxmd.com/read/38027462/efficiency-analysis-of-item-response-theory-kernel-equating-for-mixed-format-tests
#10
JOURNAL ARTICLE
Joakim Wallmark, Maria Josefsson, Marie Wiberg
This study aims to evaluate the performance of Item Response Theory (IRT) kernel equating in the context of mixed-format tests by comparing it to IRT observed score equating and kernel equating with log-linear presmoothing. Comparisons were made through both simulations and real data applications, under both equivalent groups (EG) and non-equivalent groups with anchor test (NEAT) sampling designs. To prevent bias towards IRT methods, data were simulated with and without the use of IRT models. The results suggest that the difference between IRT kernel equating and IRT observed score equating is minimal, both in terms of the equated scores and their standard errors...
November 2023: Applied Psychological Measurement
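A rough picture of kernel equating under an equivalent groups (EG) design, as a hedged Python sketch: the discrete score distributions are continuized with a Gaussian kernel and scores are mapped equipercentile-style. Full kernel equating also presmooths the distributions and rescales the kernel to preserve each distribution's mean and variance; both refinements are omitted here, and the score distributions are invented.

    import numpy as np
    from scipy.stats import norm
    from scipy.optimize import brentq

    def kernel_cdf(scores, probs, h):
        # Gaussian-kernel continuization of a discrete score distribution
        return lambda x: np.sum(probs * norm.cdf((x - scores) / h))

    scores = np.arange(0, 21)
    p_x = np.exp(-0.5 * ((scores - 9) / 4) ** 2); p_x /= p_x.sum()
    p_y = np.exp(-0.5 * ((scores - 11) / 4) ** 2); p_y /= p_y.sum()

    F_x = kernel_cdf(scores, p_x, h=0.6)
    F_y = kernel_cdf(scores, p_y, h=0.6)
    F_y_inv = lambda p: brentq(lambda y: F_y(y) - p, -5, 25)

    for x in (5, 10, 15):
        # equipercentile mapping: e(x) = F_y^{-1}(F_x(x)), shifted up ~2 points
        print(x, "->", round(F_y_inv(F_x(x)), 2))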
https://read.qxmd.com/read/38027461/using-auxiliary-item-information-in-the-item-parameter-estimation-of-a-graded-response-model-for-a-small-to-medium-sample-size-empirical-versus-hierarchical-bayes-estimation
#11
JOURNAL ARTICLE
Matthew Naveiras, Sun-Joo Cho
Marginal maximum likelihood estimation (MMLE) is commonly used for item response theory item parameter estimation. However, sufficiently large sample sizes are not always possible when studying rare populations. In this paper, empirical Bayes and hierarchical Bayes are presented as alternatives to MMLE in small sample sizes, using auxiliary item information to estimate the item parameters of a graded response model with higher accuracy. Empirical Bayes and hierarchical Bayes methods are compared with MMLE to determine under what conditions these Bayes methods can outperform MMLE, and to determine if hierarchical Bayes can act as an acceptable alternative to MMLE in conditions where MMLE is unable to converge...
November 2023: Applied Psychological Measurement
https://read.qxmd.com/read/37997580/a-bayesian-random-weights-linear-logistic-test-model-for-within-test-practice-effects
#12
JOURNAL ARTICLE
José H Lozano, Javier Revuelta
The present paper introduces a random weights linear logistic test model for the measurement of individual differences in operation-specific practice effects within a single administration of a test. The proposed model is an extension of the linear logistic test model of learning developed by Spada (1977) in which the practice effects are considered random effects varying across examinees. A Bayesian framework was used for model estimation and evaluation. A simulation study was conducted to examine the behavior of the model in combination with the Bayesian procedures...
November 2023: Applied Psychological Measurement
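In broad strokes, this model family decomposes item difficulty via a Q-matrix and adds a person-specific practice term. The LaTeX sketch below is an assumed Spada-style form for orientation, not the authors' exact specification:

    \operatorname{logit} P(X_{pi} = 1) = \theta_p - \sum_{k} q_{ik}\,\eta_k + \sum_{k} q_{ik}\,\gamma_{pk}\,t_{pik},
    \qquad \gamma_{pk} \sim N(\mu_k, \sigma_k^2)

where q_ik indicates whether item i requires operation k, eta_k is the difficulty of operation k, t_pik counts person p's prior opportunities to practice operation k before item i, and the random weight gamma_pk captures individual differences in operation-specific practice effects.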
https://read.qxmd.com/read/37997579/controlling-the-minimum-item-exposure-rate-in-computerized-adaptive-testing-a-two-stage-sympson-hetter-procedure
#13
JOURNAL ARTICLE
Hsiu-Yi Chao, Jyun-Hong Chen
Computerized adaptive testing (CAT) can improve test efficiency, but it also causes the problem of unbalanced item usage within a pool. Uneven item exposure rates can not only induce a test security problem due to overexposed items but also raise economic concerns about item pool development due to underexposed items. Therefore, this study proposes a two-stage Sympson-Hetter (TSH) method to enhance balanced item pool utilization by simultaneously controlling the minimum and maximum item exposure rates...
November 2023: Applied Psychological Measurement
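For context, the core of the original single-stage Sympson-Hetter procedure is an administer-after-selection probability K_i per item, calibrated so that no item's administration rate exceeds a ceiling r_max. The Python sketch below caricatures that calibration with fixed, made-up selection probabilities; the article's two-stage extension, which also props up minimum exposure rates, is not reproduced.

    import numpy as np

    rng = np.random.default_rng(11)
    n_items, r_max = 100, 0.30
    # toy stand-in for each item's chance of being selected in a fixed-length test
    p_select = np.clip(rng.exponential(1.0, n_items), 0, None)
    p_select = np.clip(p_select / p_select.sum() * 10, 0, 1)   # ~10 items per test

    K = np.ones(n_items)          # administer-after-selection probabilities
    for _ in range(5):
        # P(administered) = P(selected) * K; cap items exceeding the ceiling.
        # Real SH re-simulates the CAT each cycle because exposure control
        # itself changes which items get selected; here p_select stays fixed.
        p_admin = p_select * K
        K = np.where(p_admin > r_max, r_max / np.maximum(p_select, 1e-12), K)

    print("max administration rate:", (p_select * K).max().round(3))   # <= 0.30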
https://read.qxmd.com/read/37997578/two-statistics-for-measuring-the-score-comparability-of-computerized-adaptive-tests
#14
JOURNAL ARTICLE
Adam E Wyse
This study introduces two new statistics for measuring the score comparability of computerized adaptive tests (CATs) based on comparing conditional standard errors of measurement (CSEMs) for examinees that achieved the same scale scores. One statistic is designed to evaluate score comparability of alternate CAT forms for individual scale scores, while the other statistic is designed to evaluate the overall score comparability of alternate CAT forms. The effectiveness of the new statistics is illustrated using data from grades 3 through 8 reading and math CATs...
November 2023: Applied Psychological Measurement
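The article's two statistics are not reproduced here, but the general idea of comparing CSEMs at the same scale score can be sketched. The Python fragment below uses a hypothetical form, the coefficient of variation of CSEMs within each scale score, aggregated with frequency weights; treat the specific formulas and data as placeholders.

    import numpy as np
    import pandas as pd

    # hypothetical records: each examinee's scale score and CSEM from an adaptive form
    df = pd.DataFrame({
        "scale_score": [500, 500, 500, 510, 510, 520, 520, 520],
        "csem":        [12.1, 14.0, 11.5, 10.2, 10.9, 9.8, 13.5, 10.1],
    })

    # per-score comparability: coefficient of variation of CSEMs at each scale score
    per_score = df.groupby("scale_score")["csem"].agg(lambda s: s.std(ddof=1) / s.mean())

    # overall comparability: frequency-weighted average of the per-score values
    weights = df["scale_score"].value_counts(normalize=True).sort_index()
    overall = (per_score * weights).sum()
    print(per_score.round(3), "\noverall:", round(overall, 3))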
https://read.qxmd.com/read/37810544/does-sparseness-matter-examining-the-use-of-generalizability-theory-and-many-facet-rasch-measurement-in-sparse-rating-designs
#15
JOURNAL ARTICLE
Stefanie A Wind, Eli Jones, Sara Grajeda
Sparse rating designs, where each examinee's performance is scored by a small proportion of raters, are prevalent in practical performance assessments. However, relatively little research has focused on the degree to which different analytic techniques alert researchers to rater effects in such designs. We used a simulation study to compare the information provided by two popular approaches: Generalizability theory (G theory) and Many-Facet Rasch (MFR) measurement. In previous comparisons, researchers used complete data that were not simulated, thus limiting their ability to manipulate characteristics such as rater effects, and to understand the impact of incomplete data on the results...
September 2023: Applied Psychological Measurement
https://read.qxmd.com/read/37810543/sequential-bayesian-ability-estimation-applied-to-mixed-format-item-tests
#16
JOURNAL ARTICLE
Jiawei Xiong, Allan S Cohen, Xinhui Maggie Xiong
Large-scale tests often contain mixed-format items, such as when multiple-choice (MC) items and constructed-response (CR) items are both contained in the same test. Although previous research has analyzed both types of items simultaneously, this may not always provide the best estimate of ability. In this paper, a two-step sequential Bayesian (SB) analytic method under the concept of empirical Bayes is explored for mixed item response models. This method integrates ability estimates from different item formats...
September 2023: Applied Psychological Measurement
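A grid-based caricature of the two-step sequential idea in Python: the posterior for ability after the multiple-choice items becomes the prior when the constructed-response items are scored. The item models and parameters below are invented, and the article's SB method is more general than this sketch.

    import numpy as np

    theta = np.linspace(-4, 4, 401)
    post = np.exp(-0.5 * theta**2)          # step 0: N(0, 1) prior on a grid

    def mc_like(resp, a, b):
        # 2PL likelihood for a dichotomous multiple-choice item
        p = 1 / (1 + np.exp(-a * (theta - b)))
        return p if resp == 1 else 1 - p

    # step 1: update with the multiple-choice responses
    for resp, a, b in [(1, 1.2, -0.5), (1, 0.9, 0.3), (0, 1.5, 1.0)]:
        post = post * mc_like(resp, a, b)

    def cr_like(resp, a, b1, b2):
        # toy adjacent-category (GPCM-style) model for a 0-2 constructed response
        z = np.stack([np.zeros_like(theta),
                      a * (theta - b1),
                      a * (theta - b1) + a * (theta - b2)])
        return (np.exp(z) / np.exp(z).sum(axis=0))[resp]

    # step 2: the MC posterior acts as the prior for the CR responses
    for resp, a, b1, b2 in [(2, 1.0, -0.2, 0.8), (1, 1.3, 0.0, 1.2)]:
        post = post * cr_like(resp, a, b1, b2)

    post /= post.sum()
    print("sequential EAP estimate:", round(float((theta * post).sum()), 3))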
https://read.qxmd.com/read/37810542/comparing-person-fit-and-traditional-indices-across-careless-response-patterns-in-surveys
#17
JOURNAL ARTICLE
Eli A Jones, Stefanie A Wind, Chia-Lin Tsai, Yuan Ge
Methods to identify carelessness in survey research can be valuable tools in reducing bias during survey development, validation, and use. Because carelessness may take multiple forms, researchers typically use multiple indices when identifying carelessness. In the current study, we extend the literature on careless response identification by examining the usefulness of three item response theory-based person-fit indices for both random and overconsistent careless response identification: infit MSE, outfit MSE, and the polytomous l_z statistic...
September 2023: Applied Psychological Measurement
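For readers unfamiliar with these indices, the dichotomous Rasch versions of infit and outfit MSE are short to compute; the article uses polytomous variants, and the l_z statistic is omitted from this sketch. Values near 1 reflect expected residual variability, while random careless responding typically inflates both.

    import numpy as np

    def infit_outfit(x, p):
        # x: 0/1 responses for one person; p: model probabilities of a correct response
        q = 1 - p
        z2 = (x - p) ** 2 / (p * q)                    # squared standardized residuals
        outfit = z2.mean()                             # unweighted mean square
        infit = ((x - p) ** 2).sum() / (p * q).sum()   # information-weighted mean square
        return infit, outfit

    theta, b = 0.0, np.linspace(-2, 2, 20)
    p = 1 / (1 + np.exp(-(theta - b)))                 # Rasch model probabilities
    rng = np.random.default_rng(5)
    x_careless = (rng.random(20) < 0.5).astype(float)  # random responding
    print(infit_outfit(x_careless, p))                 # values well above 1 flag misfit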
https://read.qxmd.com/read/37810541/using-item-scores-and-distractors-to-detect-test-speededness
#18
JOURNAL ARTICLE
Kylie Gorney, James A Wollack, Daniel M Bolt
Test speededness refers to a situation in which examinee performance is inadvertently affected by the time limit of the test. Because speededness has the potential to severely bias both person and item parameter estimates, it is crucial that speeded examinees are detected. In this article, we develop a change-point analysis (CPA) procedure for detecting test speededness. Our procedure distinguishes itself from existing CPA procedures by using information from both item scores and distractors. Using detailed simulations, we show that under most conditions, the new CPA procedure improves the detection of speeded examinees and produces more accurate change-point estimates...
September 2023: Applied Psychological Measurement
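A minimal change-point analysis on item scores alone (the article's contribution is to add distractor information) can be sketched as follows: for each candidate change point, compare early versus late proportion correct with a standardized difference and report the maximizing point. This hypothetical Python version ignores item difficulty and does not use the article's specific test statistic.

    import numpy as np

    def change_point_stat(x):
        # x: an examinee's 0/1 item scores in administration order
        n = len(x)
        best_c, best_d = None, -np.inf
        for c in range(3, n - 2):        # require a few items on each side
            p1, p2 = x[:c].mean(), x[c:].mean()
            se = np.sqrt(p1*(1-p1)/c + p2*(1-p2)/(n-c) + 1e-12)
            d = (p1 - p2) / se           # drop in performance after the change point
            if d > best_d:
                best_c, best_d = c, d
        return best_c, best_d

    rng = np.random.default_rng(9)
    # 30 unspeeded items at 75% correct, then 10 speeded items at 25% correct
    x = np.concatenate([rng.random(30) < 0.75, rng.random(10) < 0.25]).astype(float)
    print(change_point_stat(x))          # change point near item 30, large statistic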
https://read.qxmd.com/read/37810540/the-effects-of-aberrant-responding-on-model-fit-assuming-different-underlying-response-processes
#19
JOURNAL ARTICLE
Jennifer Reimers, Ronna C Turner, Jorge N Tendeiro, Wen-Juo Lo, Elizabeth Keiffer
Aberrant responding on tests and surveys has been shown to affect the psychometric properties of scales and the statistical analyses from the use of those scales in cumulative model contexts. This study extends prior research by comparing the effects of four types of aberrant responding on model fit in both cumulative and ideal point model contexts using graded partial credit (GPCM) and generalized graded unfolding (GGUM) models. When fitting models to data, model misfit can be a function of both misspecification and aberrant responding...
September 2023: Applied Psychological Measurement
https://read.qxmd.com/read/37283593/online-parameter-estimation-for-student-evaluation-of-teaching
#20
JOURNAL ARTICLE
Chia-Wen Chen, Chen-Wei Liu
Student evaluation of teaching (SET) assesses students' experiences in a class to evaluate teachers' performance in class. SET essentially comprises three facets: teaching proficiency, student rating harshness, and item properties. The computerized adaptive testing form of SET with an established item pool has been used in educational environments. However, conventional scoring methods ignore the harshness of students toward teachers and, therefore, are unable to provide a valid assessment. In addition, simultaneously estimating teachers' teaching proficiency and students' harshness remains an unaddressed issue in the context of online SET...
June 2023: Applied Psychological Measurement