September 2, 2005

Using IRT DIF Methods to Evaluate the Validity of Score Gains

Authors:
Daniel M. Koretz and Daniel F. McCaffrey
Given current high-stakes uses of tests, one of the most pressing and difficult problems confronting the field of measurement is to develop better methods for distinguishing between meaningful gains in performance and score inflation. This study explores the potential usefulness of adapting differential item functioning (DIF) techniques for this purpose. We distinguish between reactive and nonreactive changes in DIF over time and relate these to the framework for validating scores under high-stakes conditions offered by Koretz, McCaffrey, and Hamilton (2001). We contrast score-anchored and item-anchored approaches to DIF in terms of their potential for this purpose. We then explore changes in the distribution of DIF in the NAEP eighth-grade mathematics assessment between 1990 and 2000 in five low-gain and five high-gain states, in each case treating all other participating states as the reference group.
Koretz, D. M., & McCaffrey, D. F. (2005). Using IRT DIF methods to evaluate the validity of score gains (CSE Report 660). Los Angeles: University of California, Los Angeles, National Center for Research on Evaluation, Standards, and Student Testing (CRESST).