April 14, 2025

Embedding Embedded Standard Setting: An Application of Cross-Classified Item Response Theory Modeling

Authors:
Yun-Kyung Kim & Li Cai

English language proficiency (ELP) assessments play a critical role in identifying English learners eligible for services and in monitoring their progress annually. To this end, these assessments use standard setting procedures to establish cutscores that map the policy definition of English learners onto specific locations on a reporting score scale. Among various methods, the recently emerging embedded standard setting (ESS; Lewis & Cook, 2020) aims to closely integrate expert judgments with empirical data by developing items to target specific performance levels (target PLs). Under the ESS procedure, once items are calibrated, cutscores emerge organically from optimizing the coherence between item difficulties and target PLs.

The alignment between item difficulties and target PLs is fundamental to the validity of ESS-derived cutscores. It has been suggested that this alignment be assessed using correlation coefficients (Lewis & Cook, 2020; Schneider et al., 2022), but correlation coefficients have limitations as sole indicators because they are sensitive to irrelevant factors, such as the number of target PLs and the distribution of items across the target PLs (Chen et al., 2021, 2024). In this context, this report proposes an alternative approach: regressing item difficulties onto target PLs and integrating this regression into the psychometric model used for item calibration.
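
As a rough illustration of the regression idea, the sketch below simulates item difficulties with an assumed linear trend over hypothetical target PLs and contrasts the single correlation coefficient with the directly interpretable intercept and slope of a simple regression. This is a minimal sketch under simulated data, not the report's analysis, which embeds the regression in the calibration model itself.

    # Illustrative sketch only: simulated difficulties and hypothetical
    # target PL assignments; all numbers below are assumptions.
    import numpy as np

    rng = np.random.default_rng(1)

    n_items = 60
    target_pl = rng.integers(1, 6, size=n_items)  # hypothetical target PLs 1-5
    # Assumed data-generating trend: difficulty rises linearly with target PL.
    difficulty = -2.0 + 0.8 * target_pl + rng.normal(0.0, 0.4, size=n_items)

    # A correlation coefficient compresses alignment into one number that
    # also reflects the number of PLs and how items spread across them.
    r = np.corrcoef(target_pl, difficulty)[0, 1]

    # Regressing difficulty on target PL yields directly interpretable terms:
    # the slope is the average difficulty increase per adjacent PL step.
    slope, intercept = np.polyfit(target_pl, difficulty, deg=1)

    print(f"correlation r = {r:.3f}")
    print(f"intercept = {intercept:.3f}, slope per PL = {slope:.3f}")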

We integrated the two tasks (calibration of item parameters and evaluation of the alignment between item difficulties and target PLs) using a cross-classified item response theory (IRT) model (Huang & Cai, 2023). In our application of the cross-classified IRT model, person effects and item effects were both assumed to be random, which is why the model is often referred to as a random item effects IRT model. Unlike standard IRT models, in which item effects are fixed, the cross-classified IRT model allows item difficulties to be regressed onto their target PLs. The resulting regression coefficients are consistently estimated and directly interpretable. Through a simulation study, we demonstrated the model's ability to recover item parameters and regression coefficients effectively. We then applied the model to an empirical dataset from ELPA21's Alternate English Language Proficiency Assessment (Alt ELPA), designed for English learners with the most significant cognitive disabilities.
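
For concreteness, here is a minimal Rasch-type sketch of the model structure just described; the actual specification in Huang & Cai (2023) may differ (e.g., discrimination parameters, polytomous responses), and treating the target PL as a numeric predictor is our simplifying assumption:

    \begin{aligned}
    \Pr(y_{pi} = 1 \mid \theta_p, b_i) &= \frac{1}{1 + \exp\{-(\theta_p - b_i)\}}, \\
    \theta_p &\sim N(0, \sigma_\theta^2) && \text{(random person effect)}, \\
    b_i &= \gamma_0 + \gamma_1\,\mathrm{PL}_i + \varepsilon_i,
    \quad \varepsilon_i \sim N(0, \sigma_\varepsilon^2) && \text{(random item effect)}.
    \end{aligned}

Under a structure of this kind, the slope directly quantifies how strongly target PLs predict item difficulties, and the residual item variance captures the remaining misalignment.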

Applying the random item effects IRT model, a special instance of the cross-classified IRT model, to the Alt ELPA dataset showed that the model produced parameter estimates nearly identical to those from a standard IRT model. Furthermore, it confirmed that the target PLs assigned by item writers predicted the item difficulties well, supporting the validity and technical adequacy of the ESS procedure. Additionally, the model substantially reduced the number of freely estimated parameters, enhancing estimation stability and considerably reducing standard errors, which was particularly valuable for the Alt ELPA dataset, with its small sample size and sparsity.
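
As a hypothetical illustration of this parameter reduction (the counts here are ours, not from the report): a standard Rasch-type calibration of 60 items estimates 60 free difficulty parameters, whereas the regression structure sketched above replaces them with just three free parameters (the intercept, the slope, and the residual item variance), regardless of test length.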

Kim, Y., & Cai, L. (2025). Embedding embedded standard setting: An application of cross-classified item response theory modeling (CRESST Report 876). UCLA/CRESST.