The Use of Data Imputation when Investigating Dimensionality in Sparse Data from Computerized Adaptive Tests

Authors

  • Centre for Research in Applied Measurement and Evaluation, University of Alberta, 6-110 Education Centre North, 11210 87 Ave NW, Edmonton, AB T6G 2G5
  • National Council of State Boards of Nursing, Chicago, Illinois

Keywords:

Computerized Adaptive Testing, CART, Imputation, MICE, Sparseness

Abstract

The development of a Computerized Adaptive Test (CAT) for operational use begins with several important steps, such as creating a large item bank, piloting the items on a sizable and representative sample of examinees, assessing the dimensionality of the item bank, and estimating item parameters. Among these steps, testing the dimensionality of the item bank is particularly important because the subsequent analyses depend on the confirmation of the hypothesized factor structure (e.g., unidimensionality). After the CAT becomes operational, it remains important to periodically reassess the dimensionality of the item bank because both the examinee population and the item bank may change over time. However, the extreme sparseness of the response data returned from a CAT makes the test of dimensionality very difficult. This study investigated whether data imputation can be a feasible solution to the sparseness problem when examining test dimensionality in sparse data returned from CATs. Sparse data with unidimensional, multidimensional, and bi-factor test structures were simulated based on real data from a large-scale, operational CAT. Two-way imputation and Multivariate Imputation by Chained Equations (MICE) methods were used to replace the missing responses in the data. The imputed datasets were then analyzed with confirmatory factor analysis to examine whether the true test structure was retained after imputation. Results indicated that MICE with classification and regression trees (MICE-CART) was highly accurate in retaining the true structure, whereas the other imputation methods performed quite poorly. Data imputation with MICE-CART therefore appears to be a promising solution to data sparsity when examining test dimensionality for CATs.
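The two-way imputation approach named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a dichotomous response matrix in which `None` marks items not administered by the CAT, fills each missing cell with person mean + item mean − overall mean (all means computed from observed responses only), and dichotomizes the result at 0.5. Every row and column is assumed to contain at least one observed response.

```python
def two_way_impute(responses):
    """Fill each missing cell with person mean + item mean - overall mean,
    then round to 0/1 for dichotomous item responses."""
    n_persons, n_items = len(responses), len(responses[0])
    observed = [(i, j, v)
                for i, row in enumerate(responses)
                for j, v in enumerate(row) if v is not None]
    overall = sum(v for _, _, v in observed) / len(observed)

    # Row (person) and column (item) means over observed responses only.
    person_mean = [sum(v for v in row if v is not None) /
                   sum(1 for v in row if v is not None) for row in responses]
    item_mean = [sum(row[j] for row in responses if row[j] is not None) /
                 sum(1 for row in responses if row[j] is not None)
                 for j in range(n_items)]

    imputed = [row[:] for row in responses]
    for i in range(n_persons):
        for j in range(n_items):
            if imputed[i][j] is None:
                score = person_mean[i] + item_mean[j] - overall
                imputed[i][j] = 1 if score >= 0.5 else 0
    return imputed


# A tiny sparse response matrix: None = item not administered.
sparse = [[1, 1, None],
          [0, None, 1],
          [1, 0, 1]]
complete = two_way_impute(sparse)
```

The MICE-CART procedure favored by the study instead imputes each item conditionally on the others with regression trees (e.g., via the mice package in R with method "cart"), which is what allows it to preserve a multidimensional structure that simple mean-based fills cannot.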

Published

2021-04-01

How to Cite

Bulut, O., & Kim, D. (2021). The Use of Data Imputation when Investigating Dimensionality in Sparse Data from Computerized Adaptive Tests. Journal of Applied Testing Technology. Retrieved from http://www.jattjournal.net/index.php/atp/article/view/158509

References

Adams, R. J., Wilson, M., & Wang, W. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21, 1-23. https://doi.org/10.1177/0146621697211001

Allison, P. D. (2003). Missing data techniques for structural equation modeling. Journal of Abnormal Psychology, 112, 545-557. https://doi.org/10.1037/0021-843X.112.4.545 PMid:14674868

Azur, M. J., Stuart, E., Frangakis, C., & Leaf, P. (2011). Multiple imputation by chained equations: What is it and how does it work? International Journal of Methods in Psychiatric Research, 20(1), 40-49. https://doi.org/10.1002/mpr.329 PMid:21499542 PMCid:PMC3074241

Ban, J., Hanson, B.A., Yi, Q., & Harris, D. (2001). Data sparseness and online pretest calibration/scaling methods in CAT. Paper presented at the annual meeting of the American Educational Research Association, Seattle, WA.

Bernaards, C. A., & Sijtsma, K. (2000). Influence of simple imputation and EM methods on factor analysis when item nonresponse in questionnaire data is nonignorable. Multivariate Behavioral Research, 35(3), 321-364. https://doi.org/10.1207/S15327906MBR3503_03 PMid:26745335

Birnbaum, A. (1968). Some latent trait models. In F.M. Lord & M.R. Novick, (Eds.), Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

Bock, R. D., Gibbons, R., & Muraki, E. (1988). Full-information item factor analysis. Applied Psychological Measurement, 12, 261-280. https://doi.org/10.1177/014662168801200305

Brown, T. A. (2006). Confirmatory factor analysis for applied research. New York: Guilford.

Bulut, O., & Kan, A. (2012). Application of computerized adaptive testing to Entrance Examination for Graduate Studies in Turkey. Eurasian Journal of Educational Research, 49, 61-80.

Burgette, L. F., & Reiter, J. P. (2010). Multiple imputation for missing data via sequential regression trees. American Journal of Epidemiology, 172(9), 1070-1076. https://doi.org/10.1093/aje/kwq260 PMid:20841346

Cappaert, K. J., Wen, Y., & Chang, Y. F. (2018). Evaluating CAT-adjusted approaches for suspected item parameter drift detection. Measurement: Interdisciplinary Research and Perspectives, 16(4), 226-238. https://doi.org/10.1080/15366367.2018.1511199

Enders, C. K., & Bandalos, D. L. (2001). The relative performance of full information maximum likelihood estimation for missing data in structural equation models. Structural Equation Modeling, 8(3), 430-457. https://doi.org/10.1207/S15328007SEM0803_5

Finch, H. (2011). The use of multiple imputation for missing data in uniform DIF analysis: Power and type I error rates. Applied Measurement in Education, 24(4), 281-301. https://doi.org/10.1080/08957347.2011.607054

Glas, C. A. W. (2006). Violations of ignorability in computerized adaptive testing. (LSAC research report series; No. 04-04). Newton, PA, USA: Law School Admission Council.

Graham, J. W. (2009). Missing data analysis: Making it work in the real world. Annual Review of Psychology, 60, 549-576. https://doi.org/10.1146/annurev.psych.58.110405.085530 PMid:18652544

Hallquist, M. N., & Wiley, J. F. (2018). MplusAutomation: An R package for facilitating large-scale latent variable analyses in Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 25(4), 621-638. https://doi.org/10.1080/10705511.2017.1402334 PMid:30083048 PMCid:PMC6075832

Han, K. T., & Guo, F. (2014). Impact of violation of the missing-at-random assumption on full-information maximum likelihood method in multidimensional adaptive testing. Practical Assessment, Research & Evaluation, 19(2).

Harmes, J. C., Kromrey, J. D., & Parshall, C. G. (2001). Online item parameter recalibration: Application of missing data treatments to overcome the effects of sparse data conditions in a computerized adaptive version of the MCAT. Report submitted to the Association of American Medical Colleges, Section for the MCAT. Retrieved from http://iacat.org/sites/default/files/biblio/ha01-01.pdf

Harrison, D. A. (1986). Robustness of IRT parameter estimation to violations of the unidimensionality assumption. Journal of Educational Statistics, 11(2), 91-115. https://doi.org/10.3102/10769986011002091

Hu, L. T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1-55. https://doi.org/10.1080/10705519909540118

Ito, K., & Sykes, R.C. (1994). The effect of restricting ability distributions in the estimation of item difficulties: Implications for a CAT implementation. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans.

Kadengye, D. T., Cools, W., Ceulemans, E., & Van den Noortgate, W. (2012). Simple imputation methods versus direct likelihood analysis for missing item scores in multilevel educational data. Behavior Research Methods, 44(2), 516-531. https://doi.org/10.3758/s13428-011-0157-x PMid:22002637

Kingsbury, G. G. (2009). Adaptive item calibration: A process for estimating item parameters within a computerized adaptive test. In D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. Retrieved from http://iacat.org/sites/default/files/biblio/cat09kingsbury.pdf

Leite, W. L., & Beretvas, S. N. (2004). The performance of multiple imputation for Likert-type items with missing data. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.

Linacre, J. M. (2011). Rasch measures and unidimensionality. Rasch Measurement Transactions, 24(4), 1310.

Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). Hoboken, NJ: Wiley. https://doi.org/10.1002/9781119013563

Liu, C., Han, K. T., & Li, J. (2019). Compromised item detection for computerized adaptive testing. Frontiers in Psychology, 10, 829. https://doi.org/10.3389/fpsyg.2019.00829 PMid:31105612 PMCid:PMC6499181

Lorenzo-Seva, U., & Van Ginkel, J. R. (2016). Multiple imputation of missing values in exploratory factor analysis of multidimensional scales: estimating latent trait scores. Annals of Psychology, 32(2), 596-608. https://doi.org/10.6018/analesps.32.2.215161

Makransky, G., & Glas, C. A. (2014). An automatic online calibration design in adaptive testing. Journal of Applied Testing Technology, 11(1), 1-20.

McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Lawrence Erlbaum.

Mislevy, R. J., & Wu, P.-K. (1996). Missing responses and IRT ability estimation: Omits, choice, time limits, and adaptive testing. ETS Research Report Series, 2, i-36. https://doi.org/10.1002/j.2333-8504.1996.tb01708.x

Muthén, L. K., & Muthén, B. O. (1998-2015). Mplus user’s guide (7th ed.). Los Angeles, CA: Muthén & Muthén.

Nydick, S. W., & Weiss, D. J. (2009). A hybrid simulation procedure for the development of CATs. In D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. Retrieved from http://www.iacat.org/sites/default/files/biblio/cat09nydick.pdf

O’Neill, T., & Reynolds, M. (2006). Assessing the unidimensionality of the NCLEX-RN. Retrieved from https://www.ncsbn.org/2005.04_ONeill_-_AERA_-_Assessing_the_Unidimensionality_of_the_NCLEX-RN.pdf

Peugh, J. L., & Enders, C. K. (2004). Missing data in educational research: A review of reporting practices and suggestions for improvement. Review of Educational Research, 74, 525-556. https://doi.org/10.3102/00346543074004525

R Core Team (2019). R: A language and environment for statistical computing [Computer software]. Vienna, Austria: R Foundation for Statistical Computing.

Rasch, G. (1960/1980). Probabilistic models for some intelligence and attainment tests. Chicago: The University of Chicago Press.

Rässler, S., Rubin, D. B., & Zell, E. R. (2013). Imputation. WIREs Computational Statistics, 5(1), 20-29. https://doi.org/10.1002/wics.1240

Ren, H., van der Linden, W. J., & Diao, Q. (2017). Continuous online item calibration: Parameter recovery and item utilization. Psychometrika, 82(2), 498-522. https://doi.org/10.1007/s11336-017-9553-1 PMid:28290109

Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley. https://doi.org/10.1002/9780470316696

Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147-177. https://doi.org/10.1037/1082-989X.7.2.147 PMid:12090408

Segall, D. O. (2005). Computerized adaptive testing. In K. Kempf-Leonard (Ed.), Encyclopedia of social measurement (pp. 429-438). Boston: Elsevier Academic. https://doi.org/10.1016/B0-12-369398-5/00444-8

Smith, R. M. (1996). A comparison of methods for determining dimensionality in Rasch measurement. Structural Equation Modeling, 3, 25-40. https://doi.org/10.1080/10705519609540027

Thompson, N. A., & Weiss, D. J. (2011). A framework for the development of computerized adaptive tests. Practical Assessment, Research, and Evaluation, 16(1). https://doi.org/10.7275/wqzt-9427

Trendafilov, N., Kleinsteuber, M., & Zou, H. (2014). Sparse matrices in data analysis. Computational Statistics, 29(3), 403-405. https://doi.org/10.1007/s00180-013-0468-8

Van Buuren, S. (2018). Flexible imputation of missing data (2nd ed.). Boca Raton, FL: Chapman & Hall/CRC. https://doi.org/10.1201/9780429492259

van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45(3), 1-67. https://doi.org/10.18637/jss.v045.i03

Wainer H., & Mislevy R. J. (2000). Item response theory, item calibration, and proficiency estimation. In H. Wainer (Ed.), Computer adaptive testing: A primer (pp. 65-102). Hillsdale, NJ: Lawrence Erlbaum.

Wang, S., Jiao, H., & Xiang, Y. (2013, April). The effect of nonignorable missing data in computerized adaptive test on item fit statistics for polytomous item response models. Paper presented at the annual meeting of the National Council on Measurement in Education, San Francisco, CA.

Weiss, D. J. (2004). Computerized adaptive testing for effective and efficient measurement in counseling and education. Measurement and Evaluation in Counseling and Development, 37(2), 70-84. https://doi.org/10.1080/07481756.2004.11909751

Wright, B. D. (1997). Rasch factor analysis. In M. Wilson, G. Engelhard, & K. Draney (Eds.), Objective measurement: Theory into practice (Vol. 4) (pp. 113-137). Norwood, NJ: Ablex.

Yu, C. H., Popp, S. O., DiGangi, S., & Jannasch-Pennell, A. (2007). Assessing unidimensionality: A comparison of Rasch modeling, parallel analysis, and TETRAD. Practical Assessment, Research & Evaluation, 12(14).
