Designing Predictive Models for Early Prediction of Students’ Test-taking Engagement in Computerized Formative Assessments

Authors

Seyma N. Yildirim-Erbasli and Okan Bulut

  • Department of Educational Psychology, University of Alberta, 6-110 Education Centre North, 11210 87 Ave NW, Edmonton, AB T6G 2G5
  • Centre for Research in Applied Measurement and Evaluation, University of Alberta, 6-110 Education Centre North, 11210 87 Ave NW, Edmonton, AB T6G 2G5

Keywords:

Item Response Time, Learning Analytics, Machine Learning, Predictive Models, Test-taking Engagement

Abstract

The purpose of this study was to develop predictive models of students’ test-taking engagement in computerized formative assessments. Built with different machine learning algorithms, the models use students’ item responses and item response times to detect aberrant test-taking behaviors such as rapid guessing. The dataset consisted of 7,602 students (grades 1 to 4) who responded to 90 multiple-choice questions in a computerized reading assessment twice (i.e., in the fall and spring) during the 2017-2018 school year. The data analysis proceeded in four phases: (1) a response time method was used to label student engagement in both semesters; (2) training data from the fall semester were used to train the machine learning models; (3) testing data from the fall semester were used to evaluate the models; and (4) data from the spring semester were used to evaluate how well the models predicted future engagement. Among the algorithms compared, naive Bayes and support vector machine models built on response time data from the fall semester outperformed the other algorithms in predicting student engagement in the spring semester in terms of accuracy, sensitivity, specificity, area under the curve, kappa, and absolute residual values. The results are promising for the early prediction of students’ test-taking engagement, making it possible to intervene during test administration and thereby protect the validity of test scores and of the inferences made from them.
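To make the four-phase workflow concrete, the sketch below shows how such an analysis could be assembled in R with the caret and pROC packages. It is a minimal illustration rather than the authors’ implementation: the data frame resp and its columns (student_id, semester, item_id, correct, rt_seconds), the 10%-of-mean-item-time rule for flagging rapid guesses, the 10% rapid-guess cutoff for labeling a test taker as disengaged, and the two aggregated predictors are all assumptions made for the example.

```r
## Minimal sketch of the four-phase workflow described in the abstract (not the
## authors' code). Assumes a hypothetical long-format data frame `resp` with one
## row per student-item: student_id, semester ("fall"/"spring"), item_id,
## correct (0/1), and rt_seconds (item response time in seconds).

library(caret)   # model training, cross-validation, confusion-matrix metrics
library(pROC)    # area under the ROC curve

set.seed(2022)

## Phase 1: label engagement from response times. The 10%-of-mean-item-time
## threshold and the 10% rapid-guess cutoff below are illustrative assumptions.
thresholds <- aggregate(rt_seconds ~ item_id, data = resp,
                        FUN = function(x) 0.10 * mean(x))
names(thresholds)[2] <- "rt_threshold"
resp <- merge(resp, thresholds, by = "item_id")
resp$rapid <- as.integer(resp$rt_seconds < resp$rt_threshold)

## One row per student per semester: proportion correct, mean response time,
## proportion of rapid guesses, and the engagement label.
feat <- aggregate(cbind(correct, rt_seconds, rapid) ~ student_id + semester,
                  data = resp, FUN = mean)
feat$engaged <- factor(ifelse(feat$rapid < 0.10, "engaged", "disengaged"),
                       levels = c("disengaged", "engaged"))

fall   <- subset(feat, semester == "fall")
spring <- subset(feat, semester == "spring")

## Phases 2-3: train on part of the fall data, evaluate on the held-out part.
idx      <- createDataPartition(fall$engaged, p = 0.7, list = FALSE)
train_df <- fall[idx, ]
test_df  <- fall[-idx, ]

ctrl <- trainControl(method = "cv", number = 5,
                     classProbs = TRUE, summaryFunction = twoClassSummary)

fit_nb  <- train(engaged ~ correct + rt_seconds, data = train_df,
                 method = "nb", trControl = ctrl, metric = "ROC")
fit_svm <- train(engaged ~ correct + rt_seconds, data = train_df,
                 method = "svmRadial", trControl = ctrl, metric = "ROC")

## Phase 4 reuses the same evaluation on the spring data.
evaluate <- function(fit, newdata) {
  pred_class <- predict(fit, newdata)
  pred_prob  <- predict(fit, newdata, type = "prob")[, "engaged"]
  cm <- confusionMatrix(pred_class, newdata$engaged, positive = "engaged")
  c(accuracy    = unname(cm$overall["Accuracy"]),
    kappa       = unname(cm$overall["Kappa"]),
    sensitivity = unname(cm$byClass["Sensitivity"]),
    specificity = unname(cm$byClass["Specificity"]),
    auc         = as.numeric(auc(roc(newdata$engaged, pred_prob, quiet = TRUE))))
}

evaluate(fit_nb,  test_df)   # Phase 3: held-out fall data
evaluate(fit_nb,  spring)    # Phase 4: spring data, fall-trained model
evaluate(fit_svm, spring)
```

In this layout, phase 3 corresponds to evaluating the fall-trained models on the held-out fall data, and phase 4 to scoring the spring data with the same models; accuracy, kappa, sensitivity, and specificity come from caret’s confusionMatrix() and the area under the curve from pROC.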

Published

2022-04-11

How to Cite

Yildirim-Erbasli, S. N., & Bulut, O. (2022). Designing Predictive Models for Early Prediction of Students’ Test-taking Engagement in Computerized Formative Assessments. Journal of Applied Testing Technology. Retrieved from http://www.jattjournal.net/index.php/atp/article/view/167548

Section

Articles
