Vol. 20. No. 2. 2014. Pages 79-87

An introduction to the use of evidence-centered design in test development

[Introducción al diseño centrado en la evidencia en la construcción de tests]

Michael J. Zieky, Educational Testing Service, Princeton, New Jersey, USA


The purpose of this article is to describe what Evidence-Centered Design (ECD) is and to explain why and how ECD is used in the design and development of tests. The article will be most useful for readers who have some knowledge of traditional test development practices, but who are unfamiliar with ECD. The article begins with descriptions of the major characteristics of ECD, adds a brief note on the origins of ECD, and discusses the relationship of ECD to traditional test development. Next, the article lists the important advantages of using ECD with an emphasis on the validity of the inferences made about test takers on the basis of their scores. The article explains the nature and purpose of the "layers" or stages of the ECD test design and development process: 1) domain analysis; 2) domain modeling; 3) conceptual assessment framework; 4) assessment implementation; and 5) assessment delivery. Some observations about my experience with the early application of ECD for those who plan to begin using ECD, a brief conclusion, and some recommendations for further reading end the article.


El objetivo de este trabajo es describir qué es y explicar por qué y cómo se utiliza el Diseño Centrado en la Evidencia (DCE) para diseñar y construir tests. Este trabajo está pensado especialmente para personas que ya estén algo familiarizadas con las prácticas tradicionales de construcción de tests pero que desconozcan el DCE. Comienza con una descripción de las características fundamentales del DCE, continúa con un breve apunte acerca de su origen y analiza su relación con la práctica tradicional en la construcción de tests. A continuación, se indican las ventajas que conlleva la utilización del DCE, resaltando su impacto en la validez de las inferencias realizadas sobre los sujetos en base a sus puntuaciones en los tests. En el artículo se explica la naturaleza y el objetivo de las 'capas' o etapas en el proceso de diseño y construcción de tests con el DCE: 1) análisis del dominio, 2) modelado del dominio, 3) marco conceptual de la evaluación, 4) implementación de la evaluación y 5) administración de la evaluación. Para terminar, se ofrecen algunos comentarios acerca de la experiencia del autor en la aplicación del DCE para aquellos que estén pensando en empezar a utilizarlo, junto a una breve conclusión y alguna recomendación acerca de lecturas adicionales sobre el tema.

Copyright © 2018. Colegio Oficial de Psicólogos de Madrid
