Monday, July 26, 2010

Important new study about huge error rates in value-added teacher evaluation


A new Mathematica study for the US DOE's Institute of Education Sciences shows that when value-added measures based on student test scores are used to evaluate teachers, the error rate is about 35% with one year of test score data and still as high as 25% with three years of data:

“This paper addresses likely error rates for measuring teacher and school performance in the upper elementary grades using value-added models applied to student test score gain data. Using realistic performance measurement system schemes based on hypothesis testing, we develop error rate formulas based on OLS and Empirical Bayes estimators. Simulation results suggest that value-added estimates are likely to be noisy using the amount of data that are typically used in practice.

Type I and II error rates for comparing a teacher’s performance to the average are likely to be about 25 percent with three years of data and 35 percent with one year of data. Corresponding error rates for overall false positive and negative errors are 10 and 20 percent, respectively. Lower error rates can be achieved if schools are the performance unit. The results suggest that policymakers must carefully consider likely system error rates when using value-added estimates to make high-stakes decisions regarding educators….

Using rigorous statistical methods and realistic performance measurement schemes, this report presents evidence that value-added estimates are likely to be quite noisy using the amount of data that are typically used in practice for estimation. ….

If only three years of data are used for estimation (the amount of data typically used in practice), Type I and II errors for teacher-level analyses will be about 26 percent each. This means that in a typical performance measurement system, 1 in 4 teachers who are truly average in performance will be erroneously identified for special treatment, and 1 in 4 teachers who differ from average performance by 3 to 4 months of student learning will be overlooked. Corresponding error rates will be lower if the focus is on overall false positive and negative error rates for the full population of affected teachers. With three years of data, these misclassification rates will be about 10 percent.

These results strongly support the notion that policymakers must carefully consider system error rates in designing and implementing teacher performance measurement systems based on value-added models, especially when using these estimates to make high-stakes decisions regarding teachers (such as tenure and firing decisions)….

Studies have found only moderate year-to-year correlations—ranging from 0.2 to 0.6—in the value-added estimates of individual teachers (McCaffrey et al. 2009; Goldhaber and Hansen 2008) or small to medium-sized school grade-level teams (Kane and Staiger 2002b). As a result, there are significant annual changes in teacher rankings based on value-added estimates. Studies from a wide set of districts and states have found that one-half to two-thirds of teachers in the top quintile or quartile of performance from a particular year drop below that category in the subsequent year (Ballou 2005; Aaronson et al. 2008; Koedel and Betts 2007; Goldhaber and Hansen 2008; McCaffrey et al. 2009).
While previous work has documented instability in value-added estimates post hoc using several years of available data, the specific ways in which performance measurement systems should be designed ex ante to account for instability of the estimates have not been examined. This paper is the first to systematically examine this precision issue from a design perspective focused on the following question: “What are likely error rates in classifying teachers and schools in the upper elementary grades into performance categories using student test score gain data that are likely to be available in practice?” These error rates are critical for assessing appropriate sample sizes for a performance measurement system that aims to reliably identify low- and high-performing teachers and schools.”
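
To get an intuition for how noise of this kind produces error rates of this size, here is a rough Monte Carlo sketch of my own, not the report's model: each teacher's estimated value-added is her true effect plus normal sampling error, the system flags teachers whose estimates differ significantly from the district average, and we count how often truly average teachers get flagged (Type I) and how often teachers who really differ by roughly 3 to 4 months of learning are missed (Type II). The standard error, effect size, and test cutoff below are illustrative assumptions, so the exact percentages will not reproduce the report's figures.

```python
# Illustrative Monte Carlo sketch -- NOT the report's actual model or parameters.
# Assumptions (all hypothetical): a one-year value-added estimate has a standard
# error of about 0.20 student-level SDs; a "truly different" teacher differs from
# average by 0.20 SDs (roughly the 3-4 months of learning the report mentions);
# teachers are flagged with a two-sided test. The report calibrates its variance
# components and cutoffs differently, so its published figures differ.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(seed=0)
n_teachers = 200_000        # simulated teachers per scenario
se_one_year = 0.20          # assumed standard error of a one-year estimate
true_gap = 0.20             # assumed true effect of a "different" teacher
alpha = 0.26                # test level chosen for illustration only
z_crit = norm.ppf(1 - alpha / 2)

for years in (1, 3):
    se = se_one_year / np.sqrt(years)   # averaging more years shrinks the standard error

    # Type I: truly average teachers (true effect = 0) who are still flagged
    est_average = rng.normal(0.0, se, n_teachers)
    type1 = np.mean(np.abs(est_average) > z_crit * se)

    # Type II: teachers who truly differ by `true_gap` but are not flagged
    est_different = rng.normal(true_gap, se, n_teachers)
    type2 = np.mean(np.abs(est_different) <= z_crit * se)

    print(f"{years} year(s) of data: Type I ~ {type1:.0%}, Type II ~ {type2:.0%}")
```

The point of the sketch is just the mechanism: averaging more years of data shrinks the sampling error around a teacher's true effect, which is why the report's error rates fall from roughly 35 percent with one year of data to roughly 25 percent with three.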
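
The instability findings cited at the end of the excerpt, year-to-year correlations of 0.2 to 0.6 and half to two-thirds of top-quintile teachers falling out of the top quintile the next year, follow from the same noise. A second sketch, again with assumed numbers: if a teacher's true effect accounts for only about a third of the variance in a single year's estimate, then estimates from two consecutive years correlate at roughly 0.35, and ranking teachers on one year of data produces about that much churn at the top.

```python
# Illustrative sketch of year-to-year churn in value-added rankings.
# Assumption (hypothetical): true teacher effects explain about 35% of the
# variance in a single year's estimate, so two years' estimates correlate at
# roughly 0.35 -- inside the 0.2-0.6 range the report cites.
import numpy as np

rng = np.random.default_rng(seed=1)
n_teachers = 100_000
reliability = 0.35                       # share of estimate variance that is true signal

true_effect = rng.normal(0.0, np.sqrt(reliability), n_teachers)
noise_sd = np.sqrt(1.0 - reliability)
year1 = true_effect + rng.normal(0.0, noise_sd, n_teachers)   # year-1 estimates
year2 = true_effect + rng.normal(0.0, noise_sd, n_teachers)   # year-2 estimates

print("year-to-year correlation:", round(np.corrcoef(year1, year2)[0, 1], 2))

# Of the teachers in the top quintile of year-1 estimates, how many drop out in year 2?
top1 = year1 >= np.quantile(year1, 0.8)
top2 = year2 >= np.quantile(year2, 0.8)
drop_rate = 1.0 - np.mean(top2[top1])
print(f"top-quintile teachers in year 1 who fall below it in year 2: {drop_rate:.0%}")
```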

Leonie Haimson
Executive Director
Class Size Matters
