A Historical Analysis of Technological Advances to Educational Testing: A Drive For Efficiency and the Interplay with Validity

Authors

  • S. Moncaleano, Boston College, Chestnut Hill, MA 02467
  • M. Russell, Boston College, Chestnut Hill, MA 02467

Keywords:

Automatic Scoring, Educational Measurement, History

Abstract

The year 2017 marked a century since the development and administration of the first large-scale, group-administered standardized test. Since that time, both the importance of testing and the technology of testing have advanced significantly. This paper traces the technological advances that have led to the large-scale administration of educational tests in a digital format. This historical review reveals a persistent drive to develop and apply new technologies to increase efficiency. It also reveals a pattern in which each new advance exposes a new drag on efficiency that becomes the focus of future innovation. The interplay between the drive for efficiency and interest in improved validity is also explored. Upon reaching the recent introduction of technology-enhanced items, the paper suggests that it may be advantageous to relax the drive for efficiency in the hope of realizing gains in validity.

Published

2018-09-28

How to Cite

Moncaleano, S., & Russell, M. (2018). A Historical Analysis of Technological Advances to Educational Testing: A Drive For Efficiency and the Interplay with Validity. Journal of Applied Testing Technology, 19(1), 1–19. Retrieved from http://www.jattjournal.net/index.php/atp/article/view/131017

Section

Articles

