November 1, 1993

Cross-scorer and Cross-method Comparability and Distribution of Judgments of Student Math, Reading, and Writing Performance: Results From the New Standards Project Big Sky Scoring Conference

Authors:
Lauren Resnick, Daniel Resnick, and Lizanne DeStefano
Partially funded by CRESST, the New Standards Project is an effort to create a state- and district-based assessment and professional development system that will serve as a catalyst for major educational reform. In 1992, as part of a professional development strategy tied to assessment, 114 teachers, curriculum supervisors, and assessment directors met to score student responses from a field test of mathematics and English language arts assessments. The results of that meeting, the Big Sky Scoring Conference, were used to analyze comparability across scorers and comparability across holistic and anaholistic scoring methods.

"Interscorer reliability estimates," wrote the researchers, "for reading and writing were in the moderate range, below levels achieved with the use of large-scale writing assessment or standardized tasks. Low reliability limits the use of [the] 1992 reading and writing scores for making judgments about student performance or educational programs," concluded the researchers. However, interscorer reliability estimates for math tasks were somewhat higher than for literacy. For six out of seven math tasks, reliability coefficients approached or exceeded acceptable levels. Use of anaholistic and holistic scoring methods resulted in different scores for the same student response.

The findings suggest that the large number and varied nature of participants may have jeopardized the production of valid and reliable data. "Scorers reported feeling overwhelmed and overworked after four days of training and scoring," wrote the researchers. Despite these difficulties, the conference provided evidence that scoring of large-scale performance assessments can be achieved when ample time is provided for training, evaluation, feedback, and discussion; clear definitions are given of performance levels and the distinctions between them; and well-chosen exemplars are used.
Resnick, L., Resnick, D., & DeStefano, L. (1993). Cross-scorer and cross-method comparability and distribution of judgments of student math, reading, and writing performance: Results from the New Standards Project Big Sky Scoring Conference (CSE Report 368). Los Angeles: University of California, Los Angeles, National Center for Research on Evaluation, Standards, and Student Testing (CRESST).