September 1, 1990

Benchmarking Text Understanding Systems to Human Performance: An Exploration

Authors:
Frances A. Butler, Eva L. Baker, Tine Falk, Howard Herl, Younghee Jang, and Patricia Mutch
Benchmarking, in the context of this report, means comparing the performance of intelligent computer systems to the performance of humans on the same task. Computer responses to questions based on specific reading texts are referenced back to human responses to the same questions about the same texts. The results of this report support the claim that system performance can be compared to human performance in a meaningful way using performance-based measures. This study provides direction for researchers interested in a methodology for assessing intelligent computer systems.
Butler, F. A., Baker, E. L., Falk, T., Herl, H., Jang, Y., & Mutch, P. (1990). Benchmarking text understanding systems to human performance: An exploration (CSE Report 347). Los Angeles: University of California, Los Angeles, National Center for Research on Evaluation, Standards, and Student Testing (CRESST).