Applied Psychological Measurement
https://read.qxmd.com/read/38585305/how-scoring-approaches-impact-estimates-of-growth-in-the-presence-of-survey-item-ceiling-effects
#1
JOURNAL ARTICLE
Kelly D Edwards, James Soland
Survey scores are often the basis for understanding how individuals grow psychologically and socio-emotionally. A known problem with many surveys is that the items are all "easy"; that is, individuals tend to use only the top one or two response categories on the Likert scale. Such an issue could be especially problematic, and lead to ceiling effects, when the same survey is administered repeatedly over time. In this study, we conduct simulation and empirical studies to (a) quantify the impact of these ceiling effects on growth estimates when using typical scoring approaches like sum scores and unidimensional item response theory (IRT) models and (b) examine whether approaches to survey design and scoring, including employing various longitudinal multidimensional IRT (MIRT) models, can mitigate any bias in growth estimates...
May 2024: Applied Psychological Measurement
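The ceiling mechanism described in this abstract is easy to reproduce in a few lines. The following Python sketch is hypothetical (not the authors' code) and assumes graded Likert responses generated from noisy logistic latent values: because the item thresholds sit well below most of the ability distribution, equal true growth yields visibly smaller sum-score gains for respondents who start near the ceiling.

    import numpy as np

    rng = np.random.default_rng(1)
    n, n_items = 2000, 6
    thresholds = np.array([-2.5, -1.5, -0.5, 0.5])   # low thresholds: items are "easy"

    def likert_scores(theta):
        # graded-response-style generation: each item's category (0-4) is the
        # number of thresholds the noisy latent value exceeds
        latent = theta[:, None] + rng.logistic(0, 1, (len(theta), n_items))
        return (latent[:, :, None] > thresholds).sum(axis=2)

    theta_t1 = rng.normal(0.5, 1.0, n)   # time-1 ability
    theta_t2 = theta_t1 + 0.5            # everyone truly grows by 0.5 SD
    gain = likert_scores(theta_t2).sum(1) - likert_scores(theta_t1).sum(1)

    top = theta_t1 > np.quantile(theta_t1, 0.8)
    print("mean sum-score gain, top 20% at time 1:", gain[top].mean().round(2))
    print("mean sum-score gain, remaining 80%:", gain[~top].mean().round(2))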
https://read.qxmd.com/read/38585304/detecting-differential-item-functioning-in-multidimensional-graded-response-models-with-recursive-partitioning
#2
JOURNAL ARTICLE
Franz Classe, Christoph Kern
Differential item functioning (DIF) is a common challenge when examining latent traits in large-scale surveys. In recent work, methods from the field of machine learning such as model-based recursive partitioning have been proposed to identify subgroups with DIF when little theoretical guidance and many potential subgroups are available. On this basis, we propose and compare recursive partitioning techniques for detecting DIF with a focus on measurement models with multiple latent variables and ordinal response data...
May 2024: Applied Psychological Measurement
https://read.qxmd.com/read/38585303/linking-methods-for-multidimensional-forced-choice-tests-using-the-multi-unidimensional-pairwise-preference-model
#3
JOURNAL ARTICLE
Naidan Tu, Lavanya S Kumar, Sean Joo, Stephen Stark
Applications of multidimensional forced choice (MFC) testing have increased considerably over the last 20 years. Yet there has been little, if any, research on methods for linking the parameter estimates from different samples. This research addressed that important need by extending four widely used methods for unidimensional linking and comparing the efficacy of new estimation algorithms for MFC linking coefficients based on the Multi-Unidimensional Pairwise Preference model (MUPP). More specifically, we compared the efficacy of multidimensional test characteristic curve (TCC), item characteristic curve (ICC; Haebara, 1980), mean/mean (M/M), and mean/sigma (M/S) methods in a Monte Carlo study that also manipulated test length, test dimensionality, sample size, percentage of anchor items, and linking scenarios...
May 2024: Applied Psychological Measurement
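Of the four linking methods compared in this abstract, mean/mean (M/M) and mean/sigma (M/S) are simple moment-matching formulas. The Python sketch below shows their classic unidimensional versions with made-up anchor-item parameters; the article's multidimensional, MUPP-specific estimation is more involved and is not reproduced here.

    import numpy as np

    def mean_sigma(b_ref, b_new):
        # mean/sigma: choose A, B so that b_ref ~= A * b_new + B for anchor items
        A = b_ref.std(ddof=1) / b_new.std(ddof=1)
        return A, b_ref.mean() - A * b_new.mean()

    def mean_mean(a_ref, a_new, b_ref, b_new):
        # mean/mean: slope from mean discriminations, intercept from mean difficulties
        A = a_new.mean() / a_ref.mean()
        return A, b_ref.mean() - A * b_new.mean()

    # toy anchor-item parameters on two forms (hypothetical values)
    b_ref = np.array([-1.2, -0.3, 0.4, 1.1])
    b_new = (b_ref - 0.5) / 1.2          # new scale: theta_ref = 1.2 * theta_new + 0.5
    a_ref = np.array([1.0, 1.4, 0.8, 1.1])
    a_new = a_ref * 1.2                  # discriminations transform inversely

    print(mean_sigma(b_ref, b_new))      # recovers A ~= 1.2, B ~= 0.5
    print(mean_mean(a_ref, a_new, b_ref, b_new))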
https://read.qxmd.com/read/38585302/evaluating-the-douglas-cohen-irt-goodness-of-fit-measure-with-bib-sampling-of-items
#4
JOURNAL ARTICLE
John R Donoghue, Adrienne Sgammato
Methods to detect item response theory (IRT) item-level misfit are typically derived assuming fixed test forms. However, IRT is also employed with more complicated test designs, such as the balanced incomplete block (BIB) design used in large-scale educational assessments. This study investigates two modifications of Douglas and Cohen's (2001) nonparametric method of assessing item misfit for analyzing BIB data, based on (a) using block total scores and (b) pooling booklet-level scores. Block-level scores showed extreme inflation of Type I error for short blocks containing 5 or 10 items...
May 2024: Applied Psychological Measurement
https://read.qxmd.com/read/38327610/location-matching-adaptive-testing-for-polytomous-technology-enhanced-items
#5
JOURNAL ARTICLE
Hyeon-Ah Kang, Gregory Arbet, Joe Betts, William Muntean
The article presents adaptive testing strategies for polytomously scored technology-enhanced innovative items. We investigate item selection methods that match examinees' ability levels in location and explore ways to leverage test-taking speeds during item selection. Existing approaches to selecting polytomous items are mostly based on information measures and tend to suffer from an item pool usage problem. In this study, we introduce location indices for polytomous items and show that location-matched item selection significantly alleviates the usage problem and achieves more diverse item sampling...
March 2024: Applied Psychological Measurement
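A minimal sketch of location-matched selection follows, assuming (hypothetically) that an item's location index is the mean of its generalized partial credit model step difficulties: the next item is the unused one whose location is closest to the interim ability estimate, rather than the one maximizing information.

    import numpy as np

    rng = np.random.default_rng(7)
    # hypothetical GPCM pool: each item has a discrimination and 3 step difficulties
    pool = [{"a": a, "steps": sorted(rng.normal(loc, 0.6, 3))}
            for a, loc in zip(rng.uniform(0.7, 2.0, 200), rng.normal(0, 1, 200))]

    def location(item):
        # one simple location index: the average step difficulty
        return np.mean(item["steps"])

    def select_location_matched(theta_hat, administered):
        # pick the unused item whose location is closest to the current theta estimate
        candidates = [i for i in range(len(pool)) if i not in administered]
        return min(candidates, key=lambda i: abs(location(pool[i]) - theta_hat))

    print(select_location_matched(theta_hat=0.3, administered={5, 12}))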
https://read.qxmd.com/read/38327609/benefits-of-the-curious-behavior-of-bayesian-hierarchical-item-response-theory-models-an-in-depth-investigation-and-bias-correction
#6
JOURNAL ARTICLE
Christoph König, Rainer W Alexandrowicz
When using Bayesian hierarchical modeling, a popular approach for Item Response Theory (IRT) models, researchers typically face a tradeoff between the precision and accuracy of the item parameter estimates. Given the pooling principle and variance-dependent shrinkage, the expected behavior of Bayesian hierarchical IRT models is to deliver more precise but biased item parameter estimates, compared to those obtained in nonhierarchical models. Previous research, however, points out the possibility that, in the context of the two-parameter logistic IRT model, the aforementioned tradeoff need not be made...
March 2024: Applied Psychological Measurement
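The "pooling principle and variance-dependent shrinkage" mentioned above can be illustrated outside IRT with a normal-normal empirical Bayes sketch in Python: noisier item estimates are pulled harder toward the pool mean, which typically lowers overall error even though individual estimates become biased. This is an illustrative analogue, not the hierarchical models studied in the article.

    import numpy as np

    rng = np.random.default_rng(3)
    true_b = rng.normal(0, 1, 20)           # true item difficulties
    se = rng.uniform(0.1, 0.6, 20)          # sampling SDs (small samples -> large SEs)
    b_hat = true_b + rng.normal(0, se)      # nonhierarchical (per-item) estimates

    # empirical-Bayes normal-normal shrinkage toward the pool mean:
    # weight w_i = tau^2 / (tau^2 + se_i^2), so noisier items shrink more
    mu = b_hat.mean()
    tau2 = max(b_hat.var(ddof=1) - np.mean(se**2), 1e-6)  # crude between-item variance
    w = tau2 / (tau2 + se**2)
    b_shrunk = w * b_hat + (1 - w) * mu

    for label, est in [("nonhierarchical", b_hat), ("hierarchical", b_shrunk)]:
        print(label, "RMSE:", np.sqrt(np.mean((est - true_b) ** 2)).round(3))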
https://read.qxmd.com/read/38327608/detecting-uniform-differential-item-functioning-for-continuous-response-computerized-adaptive-testing
#7
JOURNAL ARTICLE
Chun Wang, Ruoyi Zhu
Evaluating items for potential differential item functioning (DIF) is an essential step to ensuring measurement fairness. In this article, we focus on a specific scenario, namely, continuous-response, severely sparse computerized adaptive testing (CAT). Continuous response items are increasingly used in performance-based tasks because they tend to generate more information than traditional dichotomous items. Severe sparsity arises when many items are automatically generated via machine learning algorithms...
March 2024: Applied Psychological Measurement
https://read.qxmd.com/read/38327607/comparing-test-taking-effort-between-paper-based-and-computer-based-tests
#8
JOURNAL ARTICLE
Sebastian Weirich, Karoline A Sachse, Sofie Henschel, Carola Schnitzler
The article compares the trajectories of students' self-reported test-taking effort during a 120-minute low-stakes large-scale assessment of English comprehension between a paper-and-pencil assessment (PPA) and a computer-based assessment (CBA). Test-taking effort was measured four times during the test. Using a within-subject design, each of the N = 2,676 German ninth-grade students completed half of the test in PPA and half in CBA mode, where the sequence of modes was balanced between students. Overall, students' test-taking effort decreased considerably during the course of the test...
March 2024: Applied Psychological Measurement
https://read.qxmd.com/read/38327606/corrigendum-to-irtplay-an-r-package-for-online-item-calibration-scoring-evaluation-of-model-fit-and-useful-functions-for-unidimensional-irt
#9
(no author information available yet)
[This corrects the article DOI: 10.1177/0146621620921247.]
March 2024: Applied Psychological Measurement
https://read.qxmd.com/read/38027462/efficiency-analysis-of-item-response-theory-kernel-equating-for-mixed-format-tests
#10
JOURNAL ARTICLE
Joakim Wallmark, Maria Josefsson, Marie Wiberg
This study aims to evaluate the performance of Item Response Theory (IRT) kernel equating in the context of mixed-format tests by comparing it to IRT observed score equating and kernel equating with log-linear presmoothing. Comparisons were made through both simulations and real data applications, under both equivalent groups (EG) and non-equivalent groups with anchor test (NEAT) sampling designs. To prevent bias towards IRT methods, data were simulated with and without the use of IRT models. The results suggest that the difference between IRT kernel equating and IRT observed score equating is minimal, both in terms of the equated scores and their standard errors...
November 2023: Applied Psychological Measurement
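A rough picture of kernel equating under an equivalent groups (EG) design, as a hedged Python sketch: the discrete score distributions are continuized with a Gaussian kernel and scores are mapped equipercentile-style. Full kernel equating also presmooths the distributions and rescales the kernel to preserve each distribution's mean and variance; both refinements are omitted here, and the score distributions are invented.

    import numpy as np
    from scipy.stats import norm
    from scipy.optimize import brentq

    def kernel_cdf(scores, probs, h):
        # Gaussian-kernel continuization of a discrete score distribution
        return lambda x: np.sum(probs * norm.cdf((x - scores) / h))

    scores = np.arange(0, 21)
    p_x = np.exp(-0.5 * ((scores - 9) / 4) ** 2); p_x /= p_x.sum()
    p_y = np.exp(-0.5 * ((scores - 11) / 4) ** 2); p_y /= p_y.sum()

    F_x = kernel_cdf(scores, p_x, h=0.6)
    F_y = kernel_cdf(scores, p_y, h=0.6)
    F_y_inv = lambda p: brentq(lambda y: F_y(y) - p, -5, 25)

    for x in (5, 10, 15):
        # equipercentile mapping: e(x) = F_y^{-1}(F_x(x)), shifted up ~2 points
        print(x, "->", round(F_y_inv(F_x(x)), 2))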
https://read.qxmd.com/read/38027461/using-auxiliary-item-information-in-the-item-parameter-estimation-of-a-graded-response-model-for-a-small-to-medium-sample-size-empirical-versus-hierarchical-bayes-estimation
#11
JOURNAL ARTICLE
Matthew Naveiras, Sun-Joo Cho
Marginal maximum likelihood estimation (MMLE) is commonly used for item response theory item parameter estimation. However, sufficiently large sample sizes are not always possible when studying rare populations. In this paper, empirical Bayes and hierarchical Bayes are presented as alternatives to MMLE in small sample sizes, using auxiliary item information to estimate the item parameters of a graded response model with higher accuracy. Empirical Bayes and hierarchical Bayes methods are compared with MMLE to determine under what conditions these Bayes methods can outperform MMLE, and to determine if hierarchical Bayes can act as an acceptable alternative to MMLE in conditions where MMLE is unable to converge...
November 2023: Applied Psychological Measurement
https://read.qxmd.com/read/37997580/a-bayesian-random-weights-linear-logistic-test-model-for-within-test-practice-effects
#12
JOURNAL ARTICLE
José H Lozano, Javier Revuelta
The present paper introduces a random weights linear logistic test model for the measurement of individual differences in operation-specific practice effects within a single administration of a test. The proposed model is an extension of the linear logistic test model of learning developed by Spada (1977) in which the practice effects are considered random effects varying across examinees. A Bayesian framework was used for model estimation and evaluation. A simulation study was conducted to examine the behavior of the model in combination with the Bayesian procedures...
November 2023: Applied Psychological Measurement
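In broad strokes, this model family decomposes item difficulty via a Q-matrix and adds a person-specific practice term. The LaTeX sketch below is an assumed Spada-style form for orientation, not the authors' exact specification:

    \operatorname{logit} P(X_{pi} = 1) = \theta_p - \sum_{k} q_{ik}\,\eta_k + \sum_{k} q_{ik}\,\gamma_{pk}\,t_{pik},
    \qquad \gamma_{pk} \sim N(\mu_k, \sigma_k^2)

where q_ik indicates whether item i requires operation k, eta_k is the difficulty of operation k, t_pik counts person p's prior opportunities to practice operation k before item i, and the random weight gamma_pk captures individual differences in operation-specific practice effects.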
https://read.qxmd.com/read/37997579/controlling-the-minimum-item-exposure-rate-in-computerized-adaptive-testing-a-two-stage-sympson-hetter-procedure
#13
JOURNAL ARTICLE
Hsiu-Yi Chao, Jyun-Hong Chen
Computerized adaptive testing (CAT) can improve test efficiency, but it also causes the problem of unbalanced item usage within a pool. Uneven item exposure rates can not only induce a test security problem due to overexposed items but also raise economic concerns about item pool development due to underexposed items. Therefore, this study proposes a two-stage Sympson-Hetter (TSH) method to enhance balanced item pool utilization by simultaneously controlling the minimum and maximum item exposure rates...
November 2023: Applied Psychological Measurement
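For context, the core of the original single-stage Sympson-Hetter procedure is an administer-after-selection probability K_i per item, calibrated so that no item's administration rate exceeds a ceiling r_max. The Python sketch below caricatures that calibration with fixed, made-up selection probabilities; the article's two-stage extension, which also props up minimum exposure rates, is not reproduced.

    import numpy as np

    rng = np.random.default_rng(11)
    n_items, r_max = 100, 0.30
    # toy stand-in for each item's chance of being selected in a fixed-length test
    p_select = np.clip(rng.exponential(1.0, n_items), 0, None)
    p_select = np.clip(p_select / p_select.sum() * 10, 0, 1)   # ~10 items per test

    K = np.ones(n_items)          # administer-after-selection probabilities
    for _ in range(5):
        # P(administered) = P(selected) * K; cap items exceeding the ceiling.
        # Real SH re-simulates the CAT each cycle because exposure control
        # itself changes which items get selected; here p_select stays fixed.
        p_admin = p_select * K
        K = np.where(p_admin > r_max, r_max / np.maximum(p_select, 1e-12), K)

    print("max administration rate:", (p_select * K).max().round(3))   # <= 0.30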
https://read.qxmd.com/read/37997578/two-statistics-for-measuring-the-score-comparability-of-computerized-adaptive-tests
#14
JOURNAL ARTICLE
Adam E Wyse
This study introduces two new statistics for measuring the score comparability of computerized adaptive tests (CATs) based on comparing conditional standard errors of measurement (CSEMs) for examinees that achieved the same scale scores. One statistic is designed to evaluate score comparability of alternate CAT forms for individual scale scores, while the other statistic is designed to evaluate the overall score comparability of alternate CAT forms. The effectiveness of the new statistics is illustrated using data from grades 3 through 8 reading and math CATs...
November 2023: Applied Psychological Measurement
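The article's two statistics are not reproduced here, but the general idea of comparing CSEMs at the same scale score can be sketched. The Python fragment below uses a hypothetical form, the coefficient of variation of CSEMs within each scale score, aggregated with frequency weights; treat the specific formulas and data as placeholders.

    import numpy as np
    import pandas as pd

    # hypothetical records: each examinee's scale score and CSEM from an adaptive form
    df = pd.DataFrame({
        "scale_score": [500, 500, 500, 510, 510, 520, 520, 520],
        "csem":        [12.1, 14.0, 11.5, 10.2, 10.9, 9.8, 13.5, 10.1],
    })

    # per-score comparability: coefficient of variation of CSEMs at each scale score
    per_score = df.groupby("scale_score")["csem"].agg(lambda s: s.std(ddof=1) / s.mean())

    # overall comparability: frequency-weighted average of the per-score values
    weights = df["scale_score"].value_counts(normalize=True).sort_index()
    overall = (per_score * weights).sum()
    print(per_score.round(3), "\noverall:", round(overall, 3))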
https://read.qxmd.com/read/37810544/does-sparseness-matter-examining-the-use-of-generalizability-theory-and-many-facet-rasch-measurement-in-sparse-rating-designs
#15
JOURNAL ARTICLE
Stefanie A Wind, Eli Jones, Sara Grajeda
Sparse rating designs, where each examinee's performance is scored by a small proportion of raters, are prevalent in practical performance assessments. However, relatively little research has focused on the degree to which different analytic techniques alert researchers to rater effects in such designs. We used a simulation study to compare the information provided by two popular approaches: Generalizability theory (G theory) and Many-Facet Rasch (MFR) measurement. In previous comparisons, researchers used complete data that were not simulated, thus limiting their ability to manipulate characteristics such as rater effects, and to understand the impact of incomplete data on the results...
September 2023: Applied Psychological Measurement
https://read.qxmd.com/read/37810543/sequential-bayesian-ability-estimation-applied-to-mixed-format-item-tests
#16
JOURNAL ARTICLE
Jiawei Xiong, Allan S Cohen, Xinhui Maggie Xiong
Large-scale tests often contain mixed-format items, such as when multiple-choice (MC) items and constructed-response (CR) items are both contained in the same test. Although previous research has analyzed both types of items simultaneously, this may not always provide the best estimate of ability. In this paper, a two-step sequential Bayesian (SB) analytic method under the concept of empirical Bayes is explored for mixed item response models. This method integrates ability estimates from different item formats...
September 2023: Applied Psychological Measurement
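A grid-based caricature of the two-step sequential idea in Python: the posterior for ability after the multiple-choice items becomes the prior when the constructed-response items are scored. The item models and parameters below are invented, and the article's SB method is more general than this sketch.

    import numpy as np

    theta = np.linspace(-4, 4, 401)
    post = np.exp(-0.5 * theta**2)          # step 0: N(0, 1) prior on a grid

    def mc_like(resp, a, b):
        # 2PL likelihood for a dichotomous multiple-choice item
        p = 1 / (1 + np.exp(-a * (theta - b)))
        return p if resp == 1 else 1 - p

    # step 1: update with the multiple-choice responses
    for resp, a, b in [(1, 1.2, -0.5), (1, 0.9, 0.3), (0, 1.5, 1.0)]:
        post = post * mc_like(resp, a, b)

    def cr_like(resp, a, b1, b2):
        # toy adjacent-category (GPCM-style) model for a 0-2 constructed response
        z = np.stack([np.zeros_like(theta),
                      a * (theta - b1),
                      a * (theta - b1) + a * (theta - b2)])
        return (np.exp(z) / np.exp(z).sum(axis=0))[resp]

    # step 2: the MC posterior acts as the prior for the CR responses
    for resp, a, b1, b2 in [(2, 1.0, -0.2, 0.8), (1, 1.3, 0.0, 1.2)]:
        post = post * cr_like(resp, a, b1, b2)

    post /= post.sum()
    print("sequential EAP estimate:", round(float((theta * post).sum()), 3))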
https://read.qxmd.com/read/37810542/comparing-person-fit-and-traditional-indices-across-careless-response-patterns-in-surveys
#17
JOURNAL ARTICLE
Eli A Jones, Stefanie A Wind, Chia-Lin Tsai, Yuan Ge
Methods to identify carelessness in survey research can be valuable tools in reducing bias during survey development, validation, and use. Because carelessness may take multiple forms, researchers typically use multiple indices when identifying carelessness. In the current study, we extend the literature on careless response identification by examining the usefulness of three item response theory-based person-fit indices for both random and overconsistent careless response identification: infit MSE, outfit MSE, and the polytomous l_z statistic...
September 2023: Applied Psychological Measurement
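For readers unfamiliar with these indices, the dichotomous Rasch versions of infit and outfit MSE are short to compute; the article uses polytomous variants, and the l_z statistic is omitted from this sketch. Values near 1 reflect expected residual variability, while random careless responding typically inflates both.

    import numpy as np

    def infit_outfit(x, p):
        # x: 0/1 responses for one person; p: model probabilities of a correct response
        q = 1 - p
        z2 = (x - p) ** 2 / (p * q)                    # squared standardized residuals
        outfit = z2.mean()                             # unweighted mean square
        infit = ((x - p) ** 2).sum() / (p * q).sum()   # information-weighted mean square
        return infit, outfit

    theta, b = 0.0, np.linspace(-2, 2, 20)
    p = 1 / (1 + np.exp(-(theta - b)))                 # Rasch model probabilities
    rng = np.random.default_rng(5)
    x_careless = (rng.random(20) < 0.5).astype(float)  # random responding
    print(infit_outfit(x_careless, p))                 # values well above 1 flag misfit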
https://read.qxmd.com/read/37810541/using-item-scores-and-distractors-to-detect-test-speededness
#18
JOURNAL ARTICLE
Kylie Gorney, James A Wollack, Daniel M Bolt
Test speededness refers to a situation in which examinee performance is inadvertently affected by the time limit of the test. Because speededness has the potential to severely bias both person and item parameter estimates, it is crucial that speeded examinees are detected. In this article, we develop a change-point analysis (CPA) procedure for detecting test speededness. Our procedure distinguishes itself from existing CPA procedures by using information from both item scores and distractors. Using detailed simulations, we show that under most conditions, the new CPA procedure improves the detection of speeded examinees and produces more accurate change-point estimates...
September 2023: Applied Psychological Measurement
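A minimal change-point analysis on item scores alone (the article's contribution is to add distractor information) can be sketched as follows: for each candidate change point, compare early versus late proportion correct with a standardized difference and report the maximizing point. This hypothetical Python version ignores item difficulty and does not use the article's specific test statistic.

    import numpy as np

    def change_point_stat(x):
        # x: an examinee's 0/1 item scores in administration order
        n = len(x)
        best_c, best_d = None, -np.inf
        for c in range(3, n - 2):        # require a few items on each side
            p1, p2 = x[:c].mean(), x[c:].mean()
            se = np.sqrt(p1*(1-p1)/c + p2*(1-p2)/(n-c) + 1e-12)
            d = (p1 - p2) / se           # drop in performance after the change point
            if d > best_d:
                best_c, best_d = c, d
        return best_c, best_d

    rng = np.random.default_rng(9)
    # 30 unspeeded items at 75% correct, then 10 speeded items at 25% correct
    x = np.concatenate([rng.random(30) < 0.75, rng.random(10) < 0.25]).astype(float)
    print(change_point_stat(x))          # change point near item 30, large statistic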
https://read.qxmd.com/read/37810540/the-effects-of-aberrant-responding-on-model-fit-assuming-different-underlying-response-processes
#19
JOURNAL ARTICLE
Jennifer Reimers, Ronna C Turner, Jorge N Tendeiro, Wen-Juo Lo, Elizabeth Keiffer
Aberrant responding on tests and surveys has been shown to affect the psychometric properties of scales and the statistical analyses from the use of those scales in cumulative model contexts. This study extends prior research by comparing the effects of four types of aberrant responding on model fit in both cumulative and ideal point model contexts using graded partial credit (GPCM) and generalized graded unfolding (GGUM) models. When fitting models to data, model misfit can be a function of both misspecification and aberrant responding...
September 2023: Applied Psychological Measurement
https://read.qxmd.com/read/37283593/online-parameter-estimation-for-student-evaluation-of-teaching
#20
JOURNAL ARTICLE
Chia-Wen Chen, Chen-Wei Liu
Student evaluation of teaching (SET) assesses students' experiences in a class to evaluate teachers' performance in class. SET essentially comprises three facets: teaching proficiency, student rating harshness, and item properties. The computerized adaptive testing form of SET with an established item pool has been used in educational environments. However, conventional scoring methods ignore the harshness of students toward teachers and, therefore, are unable to provide a valid assessment. In addition, simultaneously estimating teachers' teaching proficiency and students' harshness remains an unaddressed issue in the context of online SET...
June 2023: Applied Psychological Measurement