Proficiency Classification and Violated Local Independence: An Examination of Pass/Fail Decision Accuracy under Competing Rasch Models
Keywords: Classification, Local Item Dependence, Rasch, Testlet

Abstract
The purpose of this study was to examine the impact of a misspecified calibration model on proficiency classification. Monte Carlo simulation methods were employed to compare competing models when the true structure of the data is known (i.e., testlet conditions). The conditions used in the design (e.g., number of items, testlet-to-item ratio, testlet variance, proportion of items that are testlet-based, and sample size) reflect those found in the applied educational literature. Decision consistency (DC) between the models was high, ranging from 91.5% to 100%. Testlet variance had the greatest effect on DC. An empirical example using PISA data with nine testlets is also provided to examine the consistency of pass/fail decisions between the competing models.
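The kind of comparison described in the abstract can be sketched in a few lines of code. The following is an illustrative example only, not the study's actual simulation code: responses are generated under a Rasch testlet model, abilities are then estimated with a standard (misspecified) Rasch model that ignores the testlet effects, and the resulting pass/fail classifications are compared. All settings (sample size, numbers of items and testlets, testlet variance, cut score) and function names are hypothetical, and agreement is computed against the generating abilities rather than between two fitted models as in the study.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical design settings (for illustration, not the study's conditions)
n_persons, n_items, n_testlets = 500, 20, 4
testlet_var = 0.5       # variance of the person-by-testlet effect gamma
cut = 0.0               # pass/fail cut score on the theta scale

theta = rng.normal(0.0, 1.0, n_persons)   # true abilities
b = rng.normal(0.0, 1.0, n_items)         # item difficulties
testlet_of = np.repeat(np.arange(n_testlets), n_items // n_testlets)
gamma = rng.normal(0.0, np.sqrt(testlet_var), (n_persons, n_testlets))

# Generate responses under the Rasch testlet model:
#   P(X_pi = 1) = logistic(theta_p + gamma_{p,d(i)} - b_i)
logit = theta[:, None] + gamma[:, testlet_of] - b[None, :]
X = (rng.random((n_persons, n_items)) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

def rasch_theta_mle(x, b, iters=30):
    """Newton-Raphson ML ability estimate under the standard Rasch model,
    i.e., the misspecified model that ignores local item dependence."""
    t = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(t - b)))
        t += np.sum(x - p) / np.sum(p * (1.0 - p))
        t = float(np.clip(t, -6.0, 6.0))   # keep zero/perfect scores finite
    return t

est = np.array([rasch_theta_mle(X[p], b) for p in range(n_persons)])

# Agreement between pass/fail classifications from the misspecified model
# and the generating abilities
dc = np.mean((theta >= cut) == (est >= cut))
print(f"pass/fail agreement: {dc:.3f}")
```

Raising `testlet_var` inflates the unmodeled dependence and tends to lower the agreement rate, which mirrors the abstract's finding that testlet variance had the greatest effect on DC.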
References
Almond, R. G., Mulder, J., Hemat, L. A., & Yan, D. (2009). Bayesian network models for local dependence among observable outcome variables. Journal of Educational and Behavioral Statistics, 34(4), 491–521.
Bao, H., Dayton, C. M., & Hendrickson, A. B. (2009). Differential item functioning amplification and cancellation in a reading test. Practical Assessment, Research & Evaluation, 14(19). Available online: http://pareonline.net/getvn.asp?v=14&n=19
Bandalos, D. L., & Leite, W. (2013). Use of Monte Carlo studies in structural equation modeling research. In G. R. Hancock & R. O. Mueller (Eds.), Structural equation modeling: A second course (2nd ed., pp. 625–666). Charlotte, NC: Information Age Publishing.
Boomsma, A. (2013). Reporting Monte Carlo studies in structural equation modeling. Structural Equation Modeling: A Multidisciplinary Journal, 20(3), 518–540. doi: 10.1080/10705511.2013.797839
Bradlow, E. T., Wainer, H., & Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64(2), 153–168. doi: 10.1007/BF02294533
Chen, J. (2014). Model selection for IRT equating of testlet-based tests in the random groups design (dissertation). ProQuest, UMI Dissertations Publishing (3680050).
Dickenson, T. S. (2005). Comparison of various ability estimates to the composite ability best measured by the total test score (Order No. 3181941, University of South Carolina). ProQuest Dissertations and Theses. (305414375).
Eckes, T. (2013). Examining testlet effects in the TestDaF listening section: A testlet response theory modeling approach. Language Testing, 31(1), 39–61. doi: 10.1177/0265532213492969
Eckes, T., & Baghaei, P. (2015). Using testlet response theory to examine local dependence in C-tests. Applied Measurement in Education, 28(2), 85–98. doi: 10.1080/08957347.2014.1002919
Fan, X. (2012). Designing simulation studies. In H. Cooper, P. M. Camic, D. L. Long, A. T. Panter, D. Rindskopf, & K. J. Sher (Eds.), APA handbook of research methods in psychology (Vol. 2): Data analysis and research publication (pp. 427–444). Washington, DC: American Psychological Association.
Forero, C. G., Maydeu-Olivares, A., & Gallardo-Pujol, D. (2009). Factor analysis with ordinal indicators: A Monte Carlo study comparing DWLS and ULS estimation. Structural Equation Modeling, 16(4), 625–641.
Fountas, I. C., & Pinnell, G. S. (2012). Fountas and Pinnell benchmark assessment system. Heinemann. Retrieved from http://www.heinemann.com/-fountasandpinnell/reading-assessment.aspx
Glas, C. A. W. (2012). Estimating and testing the extended testlet model: LSAC Research Report Series. Law School Admission Council. Retrieved from http://www.lsac.org/docs/default-source/research-(lsac-resources)/rr-12-03.pdf
Glas, C. A., Wainer, H., & Bradlow, E. T. (2000). MML and EAP estimation in testlet-based adaptive testing. In Computerized adaptive testing: Theory and practice (pp. 271–287). Dordrecht: Springer.
Ha, D. T. (2017). The implementation of testlet models into evaluating a reading comprehension test. International Journal of Scientific & Engineering Research, 8(1).
Harwell, M., Stone, C. A., Hsu, T.-C., & Kirisci, L. (1996). Monte Carlo studies in item response theory. Applied Psychological Measurement, 20(2), 101–125. doi: 10.1177/014662169602000201
Hembry, I. F. (2014). Operational characteristics of mixed-format multistage tests using the 3PL testlet response theory model (dissertation). ProQuest, UMI Dissertations Publishing (3691396).
Hooker, G., & Finkelman, M. D. (2010). Paradoxical results of item bundles. Psychometrika, 75(2), 249–271. doi: 10.1007/s11336-009-9143-y
Jiao, H., Wang, S., & He, W. (2013). Estimation methods for one-parameter testlet models. Journal of Educational Measurement, 50(2), 186–203. doi: 10.1111/jedm.12010
Jiao, H., Kamata, A., Wang, S., & Jin, Y. (2012). A multilevel testlet model for dual local dependence. Journal of Educational Measurement, 49(1), 82–100.
Kim, J. S., & Bolt, D. M. (2007). Estimating item response theory models using Markov chain Monte Carlo methods. Educational Measurement: Issues and Practice, 26(4), 38–51. doi: 10.1111/j.1745-3992.2007.00107.x
Lee, W., Hanson, B., & Brennan, R. (2002). Estimating consistency and accuracy indices for multiple classifications. Applied Psychological Measurement, 26(4), 412–432. doi: 10.1177/014662102237797
Linacre, J. M. (2004). Estimation methods for Rasch measures. In E. V. Smith & R. M. Smith (Eds.), Introduction to Rasch measurement (pp. 25–47). Maple Grove, MN: JAM Press.
Lu, R. (2010). Impacts of local item dependence of testlet items with the multistage tests for pass-fail decisions (academic dissertation). ProQuest, UMI Dissertations Publishing (3443478).
National Center for Education Statistics (2014). An introduction to National Assessment of Educational Progress (NAEP). Retrieved from http://nces.ed.gov/nationsreportcard/pdf/parents/2010468.pdf
OECD (2013). PISA 2012 assessment and analytical framework: Mathematics, reading, science, problem solving and financial literacy. PISA, OECD Publishing. doi: 10.1787/9789264190511-en
Paek, I., Yon, H., Wilson, M., & Kang, T. (2008). Random parameter structure and the testlet model: Extension of the Rasch testlet model. Journal of Applied Measurement, 10(4), 394–407.
Paxton, P., Curran, P. J., Bollen, K. A., Kirby, J., & Chen, F. (2001). Monte Carlo experiments: Design and implementation. Structural Equation Modeling: A Multidisciplinary Journal, 8(2), 287–312. doi: 10.1207/S15328007SEM0802_7
R Development Core Team (2014). R: A language and environment for statistical computing (Version 3.2.1). R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. URL http://www.R-project.org/
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen, Denmark: Danish Institute for Educational Research.
Robitzsch, A. (2014). Supplementary Item Response Theory Models: SIRT V1.1 User’s Manual. http://cran.r-project.org/web/packages/sirt/
Smarter Balanced Assessment Consortium (2014). Smarter Balanced Assessments. Retrieved from http://www. smarterbalanced.org/smarter-balanced-assessments/
Wainer, H., Bradlow, E. T., & Wang, X. (2007). Testlet response theory and its applications. New York, NY: Cambridge University Press.
Wang, X., Bradlow, E. T., & Wainer, H. (2002). A general Bayesian model for testlets: Theory and applications. Applied Psychological Measurement, 26(1), 109–128. doi: 10.1177/0146621602026001007
Wang, W. C., & Wilson, M. (2005a). The Rasch testlet model. Applied Psychological Measurement, 29(2), 126–149. doi: 10.1177/0146621604271053
Wang, W. C., & Wilson, M. (2005b). Assessment of differential item functioning in testlet-based items using the Rasch testlet model. Educational and Psychological Measurement, 65(4), 549–576. doi: 10.1177/0013164404268677
Yen, W. M., & Fitzpatrick, A. R. (2006). Item response theory. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 111–153). Westport, CT: Praeger.
Zhang, B. (2010). Assessing the accuracy and consistency of language proficiency classification under competing measurement models. Language Testing, 27(1), 119–140. doi: 10.1177/0265532209347363
Zumbo, B. D., & Rupp, A. A. (2004). Responsible modeling of measurement data for appropriate inferences. In D. Kaplan (Ed.), The SAGE handbook of quantitative methodology for the social sciences (pp. 73–92). Thousand Oaks, CA: Sage.