Proficiency Classification and Violated Local Independence: An Examination of Pass/Fail Decision Accuracy under Competing Rasch Models
Keywords: Classification, Local Item Dependence, Rasch, Testlet

Abstract
The purpose of this study was to examine the impact of a misspecified calibration model on proficiency classification. Monte Carlo simulation methods were employed to compare competing models when the true structure of the data is known (i.e., testlet conditions). The conditions used in the design (e.g., number of items, testlet-to-item ratio, testlet variance, proportion of items that are testlet-based, and sample size) reflect those found in the applied educational literature. Decision consistency (DC) between the models was high, ranging from 91.5% to 100%. Testlet variance had the greatest effect on DC. An empirical example using PISA data with nine testlets is also provided to examine the consistency of pass/fail decisions between the competing models.
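The kind of comparison described in the abstract can be sketched in a few lines of code. The following is an illustrative example only, not the study's actual simulation code: responses are generated under a Rasch testlet model, abilities are then estimated with a standard (misspecified) Rasch model that ignores the testlet effects, and the resulting pass/fail classifications are compared. All settings (sample size, numbers of items and testlets, testlet variance, cut score) and function names are hypothetical, and agreement is computed against the generating abilities rather than between two fitted models as in the study.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical design settings (for illustration, not the study's conditions)
n_persons, n_items, n_testlets = 500, 20, 4
testlet_var = 0.5       # variance of the person-by-testlet effect gamma
cut = 0.0               # pass/fail cut score on the theta scale

theta = rng.normal(0.0, 1.0, n_persons)   # true abilities
b = rng.normal(0.0, 1.0, n_items)         # item difficulties
testlet_of = np.repeat(np.arange(n_testlets), n_items // n_testlets)
gamma = rng.normal(0.0, np.sqrt(testlet_var), (n_persons, n_testlets))

# Generate responses under the Rasch testlet model:
#   P(X_pi = 1) = logistic(theta_p + gamma_{p,d(i)} - b_i)
logit = theta[:, None] + gamma[:, testlet_of] - b[None, :]
X = (rng.random((n_persons, n_items)) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

def rasch_theta_mle(x, b, iters=30):
    """Newton-Raphson ML ability estimate under the standard Rasch model,
    i.e., the misspecified model that ignores local item dependence."""
    t = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(t - b)))
        t += np.sum(x - p) / np.sum(p * (1.0 - p))
        t = float(np.clip(t, -6.0, 6.0))   # keep zero/perfect scores finite
    return t

est = np.array([rasch_theta_mle(X[p], b) for p in range(n_persons)])

# Agreement between pass/fail classifications from the misspecified model
# and the generating abilities
dc = np.mean((theta >= cut) == (est >= cut))
print(f"pass/fail agreement: {dc:.3f}")
```

Raising `testlet_var` inflates the unmodeled dependence and tends to lower the agreement rate, which mirrors the abstract's finding that testlet variance had the greatest effect on DC.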
References
Almond, R. G., Mulder, J., Hemat, L. A., & Yan, D. (2009). Bayesian network models for local dependence among observable outcome variables. Journal of Educational and Behavioral Statistics, 34(4), 491–521.
Bao, H., Dayton, C. M., & Hendrickson, A. B. (2009). Differential item functioning amplification and cancellation in a reading test. Practical Assessment, Research & Evaluation, 14(19). Available online: http://pareonline.net/getvn.asp?v=14&n=19
Bandalos, D. L., & Leite, W. (2013). Use of Monte Carlo studies in structural equation modeling research. In G. R. Hancock & R. O. Mueller (Eds.), Structural equation modeling: A second course (2nd ed., pp. 625–666). Charlotte, NC: Information Age Publishing.
Boomsma, A. (2013). Reporting Monte Carlo studies in structural equation modeling. Structural Equation Modeling: A Multidisciplinary Journal, 20(3), 518–540. doi: 10.1080/10705511.2013.797839
Bradlow, E. T., Wainer, H., & Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64(2), 153–168. doi: 10.1007/BF02294533
Chen, J. (2014). Model selection for IRT equating of testlet-based tests in the random groups design (dissertation). ProQuest, UMI Dissertations Publishing (3680050).
Dickenson, T. S. (2005). Comparison of various ability estimates to the composite ability best measured by the total test score (Order No. 3181941, University of South Carolina). ProQuest Dissertations and Theses. (305414375).
Eckes, T. (2013). Examining testlet effects in the TestDaF listening section: A testlet response theory modeling approach. Language Testing, 31(1), 39–61. doi: 10.1177/0265532213492969
Eckes, T., & Baghaei, P. (2015). Using testlet response theory to examine local dependence in C-tests. Applied Measurement in Education, 28(2), 85–98. doi: 10.1080/08957347.2014.1002919
Fan, X. (2012). Designing simulation studies. In H. Cooper, P. M. Camic, D. L. Long, A. T. Panter, D. Rindskopf, & K. J. Sher (Eds.), APA handbook of research methods in psychology (Vol. 2): Data analysis and research publication (pp. 427–444). Washington, DC: American Psychological Association.
Forero, C. G., Maydeu-Olivares, A., & Gallardo-Pujol, D. (2009). Factor analysis with ordinal indicators: A Monte Carlo study comparing DWLS and ULS estimation. Structural Equation Modeling, 16(4), 625–641.
Fountas, I. C., & Pinnell, G. S. (2012). Fountas and Pinnell benchmark assessment system. Heinemann. Retrieved from http://www.heinemann.com/-fountasandpinnell/reading-assessment.aspx
Glas, C. A. W. (2012). Estimating and testing the extended testlet model: LSAC Research Report Series. Law School Admission Council. Retrieved from http://www.lsac.org/docs/default-source/research-(lsac-resources)/rr-12-03.pdf
Glas, C. A., Wainer, H., & Bradlow, E. T. (2000). MML and EAP estimation in testlet-based adaptive testing. In Computerized adaptive testing: Theory and practice (pp. 271–287). Dordrecht: Springer.
Ha, D. T. (2017). The implementation of testlet models into evaluating a reading comprehension test. International Journal of Scientific & Engineering Research, 8(1).
Harwell, M., Stone, C. A., Hsu, T.-C., & Kirisci, L. (1996). Monte Carlo studies in item response theory. Applied Psychological Measurement, 20(2), 101–125. doi: 10.1177/014662169602000201
Hembry, I. F. (2014). Operational characteristics of mixed-format multistage tests using the 3PL testlet response theory model (dissertation). ProQuest, UMI Dissertations Publishing (3691396).
Hooker, G., & Finkelman, M. D. (2010). Paradoxical results of item bundles. Psychometrika, 75(2), 249–271. doi: 10.1007/s11336-009-9143-y
Jiao, H., Wang, S., & He, W. (2013). Estimation methods for one-parameter testlet models. Journal of Educational Measurement, 50(2), 186–203. doi: 10.1111/jedm.12010
Jiao, H., Kamata, A., Wang, S., & Jin, Y. (2012). A multilevel testlet model for dual local dependence. Journal of Educational Measurement, 49(1), 82–100.
Kim, J. S., & Bolt, D. M. (2007). Estimating item response theory models using Markov chain Monte Carlo methods. Educational Measurement: Issues and Practice, 26(4), 38–51. doi: 10.1111/j.1745-3992.2007.00107.x
Lee, W., Hanson, B., & Brennan, R. (2002). Estimating consistency and accuracy indices for multiple classifications. Applied Psychological Measurement, 26(4), 412–432. doi: 10.1177/014662102237797
Linacre, J. M. (2004). Estimation methods for Rasch measures. In E. V. Smith & R. M. Smith (Eds.), Introduction to Rasch measurement (pp. 25–47). Maple Grove, MN: JAM Press.
Lu, R. (2010). Impacts of local item dependence of testlet items with the multistage tests for pass-fail decisions (academic dissertation). ProQuest, UMI Dissertations Publishing (3443478).
National Center for Education Statistics (2014). An introduction to National Assessment of Educational Progress (NAEP). Retrieved from http://nces.ed.gov/nationsreportcard/pdf/parents/2010468.pdf
OECD (2013). PISA 2012 assessment and analytical framework: Mathematics, reading, science, problem solving and financial literacy. PISA, OECD Publishing. doi: 10.1787/9789264190511-en
Paek, I., Yon, H., Wilson, M., & Kang, T. (2008). Random parameter structure and the testlet model: Extension of the Rasch testlet model. Journal of Applied Measurement, 10(4), 394–407.
Paxton, P., Curran, P. J., Bollen, K. A., Kirby, J., & Chen, F. (2001). Monte Carlo experiments: Design and implementation. Structural Equation Modeling: A Multidisciplinary Journal, 8(2), 287–312. doi: 10.1207/S15328007SEM0802_7
R Development Core Team (2014). R: A language and environment for statistical computing (Version 3.2.1). R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. URL http://www.R-project.org/
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen, Denmark: Danish Institute for Educational Research.
Robitzsch, A. (2014). Supplementary Item Response Theory Models: SIRT V1.1 User’s Manual. http://cran.r-project.org/web/packages/sirt/
Smarter Balanced Assessment Consortium (2014). Smarter Balanced Assessments. Retrieved from http://www. smarterbalanced.org/smarter-balanced-assessments/
Wainer, H., Bradlow, E. T., & Wang, X. (2007). Testlet response theory and its applications. New York, NY: Cambridge University Press.
Wang, X., Bradlow, E. T., & Wainer, H. (2002). A general Bayesian model for testlets: Theory and applications. Applied Psychological Measurement, 26(1), 109–128. doi: 10.1177/0146621602026001007
Wang, W. C., & Wilson, M. (2005a). The Rasch testlet model. Applied Psychological Measurement, 29(2), 126–149. doi: 10.1177/0146621604271053
Wang, W. C., & Wilson, M. (2005b). Assessment of differential item functioning in testlet-based items using the Rasch testlet model. Educational and Psychological Measurement, 65(4), 549–576. doi: 10.1177/0013164404268677
Yen, W. M., & Fitzpatrick, A. R. (2006). Item response theory. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 111–153). Westport, CT: Praeger.
Zhang, B. (2010). Assessing the accuracy and consistency of language proficiency classification under competing measurement models. Language Testing, 27(1), 119–140. doi: 10.1177/0265532209347363
Zumbo, B. D., & Rupp, A. A. (2004). Responsible modeling of measurement data for appropriate inferences. In D. Kaplan (Ed.), The SAGE handbook of quantitative methodology for the social sciences (pp. 73–92). Thousand Oaks, CA: Sage.