By Matthew Di Carlo
In the world of education policy, the following assertion has become ubiquitous: If we just fire the bottom 5-10 percent of teachers, our test scores will be at the level of the highest-performing nations, such as Finland. Michelle Rhee likes to make this claim. So does Bill Gates.
The source and sole support for this claim is a calculation by economist Eric Hanushek, which he sketches in rough form in a chapter of the edited volume Creating a New Teaching Profession (published by the Urban Institute). The chapter is called “Teacher Deselection” (“deselection” is a polite way of saying “firing”). Hanushek is a respected economist who has been researching education for over 30 years. He is willing to say openly some of the things that many other market-based reformers believe and say privately, but won’t always admit to in public.
So, would systematically firing large proportions of teachers every year based solely on their students’ test scores improve overall scores over time? Of course it would, at least to some degree. When you repeatedly select (or, in this case, deselect) on a measurable variable, you can usually shift the overall level of that variable, even when it is measured imperfectly.
But anyone who says that firing the bottom 5-10 percent of teachers is all we have to do to boost our scores to Finland-like levels is selling magic beans—and not only because of cross-national poverty differences or the inherent limitations of most tests as valid measures of student learning (we’ll put these very real concerns aside for this post).
Before addressing the argument directly, it bears noting that this policy, even if it were carried out perfectly, would not be a quick fix. The simulation does not entail a one-time layoff. We would have to fire the “bottom” 5-10 percent of teachers every year, indefinitely. Then, according to the calculation, and only if everything went as planned, it would take around 10 years for U.S. test scores to rise to the level of the world’s higher-performing nations.
It also seems improbable that we could ever legislate, design, and carry out such a policy on a large, nationwide scale, even if it had widespread support (which it doesn’t). Yet that’s what would be needed to produce the promised benefits (again, assuming everything went perfectly).
But what if we could do it? Would it work? As I said, there would almost certainly be some increase in overall test scores, at least in the short term (whether that increase would reflect a proportional improvement in actual learning is a different matter entirely).
But would the gains be large and sustained? It’s always difficult to project the impact of an untried, drastic intervention like this, but I would argue probably not. In fact, there is a risk that this type of policy would end up hurting overall education performance in the long run, especially in higher-poverty, hard-to-staff schools and districts.
The presumed benefits of this proposal rely on several shaky assumptions, some of which would, if violated, carry negative consequences. One assumption, which I have discussed before, is that the replacement teachers will be of sufficient quality (on the whole) to produce at least average student test score gains. Hanushek’s calculation assumes that the replacements will do so (though, among other things, it’s unclear whether he uses the average gains for a first-year teacher, which are lower).
Currently, around 8-9 percent of teachers leave the profession every year, and this will probably increase as baby boomers retire. Adding annual dismissals on top of that attrition might place substantial strain on the labor pool (of course, there would be some overlap: some teachers who would be fired under the proposal would have left anyway).
In particular, high-poverty and other hard-to-staff schools, which already have problems finding good new teachers, would have to replace even more teachers every year, while choosing from an ever-narrowing applicant pool (much of California, for example, seems to be in this position right now). The assumption that the quality of replacements would remain stable is rather unsafe, and the calculation hinges on it.
Moreover, you can bet that many teachers, faced with the annual possibility of being fired based on test scores alone, would be even more likely to switch to higher-performing, lower-poverty schools (and/or schools that didn’t have the layoff policy). This would create additional, disruptive churn, as well as exacerbate the shortage of highly-qualified teachers in poorer schools and districts.
All things considered, it’s conceivable that, taking the firings, attrition, and switching into account, the total annual mobility rate for all teachers could approach 25 percent, and it would be much higher in poorer school districts (making these students bear a disproportionate burden for this unintended consequence). It’s hard to imagine a public education system that could function effectively under those circumstances, let alone thrive.
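To make that arithmetic concrete, here is a minimal sketch of how the pieces might add up. The function name `annual_turnover` and every input value are hypothetical, chosen only for illustration: the firing rate is the top of the 5-10 percent range, attrition is the midpoint of the 8-9 percent figure, and the overlap and between-school switching rates are pure assumptions, not estimates.

```python
def annual_turnover(fired, attrition, overlap, switching):
    """Rough share of teachers leaving their school in a given year.

    All arguments are fractions of the teaching workforce (0.10 == 10 percent).
    `overlap` is the share of teachers who would be fired AND would have left
    anyway; it is subtracted once so those teachers are not counted twice.
    `switching` is the share moving to other schools (a hypothetical input).
    """
    if overlap > min(fired, attrition):
        raise ValueError("overlap cannot exceed the firing or attrition rate")
    return fired + attrition - overlap + switching


# Hypothetical inputs, for illustration only (not estimates):
#   fired=0.10      top of the 5-10 percent dismissal range
#   attrition=0.085 midpoint of the 8-9 percent attrition figure
#   overlap=0.02    assumed share fired who would have left anyway
#   switching=0.08  assumed share moving to other schools each year
rate = annual_turnover(fired=0.10, attrition=0.085, overlap=0.02, switching=0.08)
print(f"{rate:.1%}")  # prints 24.5%
```

Because nobody knows the real switching rate under such a policy, the point is only that plausible-looking inputs can push the total into the mid-20s; different guesses move the result up or down accordingly.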
Remember also that a widespread test-based firing policy would almost certainly change the “type” of person who chooses to pursue teaching (or, for that matter, chooses to remain). I find it hard to believe that any top-notch applicant would be attracted to a low-paying profession because of a systematic layoff policy (see here for an alternative view). There’s no way to know, but my guess is that such a policy would drive many strong candidates away. If so, the policy’s projected benefits would be diminished further.
The simulation also assumes that all the dismissed teachers would leave the profession permanently. Again, this seems highly unlikely, especially if replacements are in short supply. Rather, I would speculate that a significant proportion of dismissed teachers would get jobs in other districts. In doing so, they would seriously dilute the policy’s effects, while also creating needless turnover for schools.
Then there is the issue of error. Due to the well-known imprecision of value-added models, and the year-to-year fluctuation of teacher effects, many replacement teachers would be no better than the fired teachers would have been, and some would be worse (error will be particularly high among newer teachers, due to small samples).
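The effect of measurement error on a fixed dismissal quota can be illustrated with a small simulation. This is a hypothetical sketch, not Hanushek’s model or any actual value-added method: it assumes that true teacher effects and measurement error are independent normal draws, with the noise level (`noise_sd`) chosen arbitrarily, and it asks what share of teachers fired for landing in the measured bottom 10 percent are not actually in the true bottom 10 percent.

```python
import random


def misclassification_rate(n_teachers, fire_share, noise_sd, seed=0):
    """Share of fired teachers who are NOT truly in the bottom `fire_share`.

    Hypothetical assumptions: true effects ~ Normal(0, 1), and each measured
    score is the true effect plus independent Normal(0, noise_sd) error.
    """
    n_fired = int(n_teachers * fire_share)
    if n_fired == 0:
        raise ValueError("quota rounds down to zero teachers")

    rng = random.Random(seed)
    true_effects = [rng.gauss(0, 1) for _ in range(n_teachers)]
    measured = [t + rng.gauss(0, noise_sd) for t in true_effects]

    # Teachers dismissed because their noisy score falls in the bottom group
    fired = set(sorted(range(n_teachers), key=lambda i: measured[i])[:n_fired])
    # Teachers who actually belong in the bottom group by true effect
    truly_bottom = set(sorted(range(n_teachers), key=lambda i: true_effects[i])[:n_fired])

    # Fired teachers who were not really among the weakest
    return len(fired - truly_bottom) / n_fired


for sd in (0.0, 0.5, 1.0):
    print(sd, round(misclassification_rate(10_000, 0.10, sd), 2))
```

With no noise, nobody is misclassified; as `noise_sd` grows, a larger share of the dismissed teachers tends to come from outside the true bottom group. The specific percentages depend entirely on the assumed noise level, so they illustrate the mechanism rather than estimate any real-world rate.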
There is something unethical about firing people based solely on measures that may be wrong due to nothing more than random statistical error, yet these mistakes would have to be tolerated, as collateral damage, in the name of productivity. And if the replacement pool runs dry, there would also be practical consequences: we would have fired many solid teachers, whom we might have identified as such with more nuanced measures.
Finally, on a similar note, the quality of teachers who constitute the “bottom” 5-10 percent varies by location and by poverty level (though not drastically). Because the cutoff is relative, a teacher in the bottom 10 percent of one district might rank comfortably above it in another. Imposing a widespread dismissal system would therefore result in the deselection of many teachers who would have done quite well in a different school or district. Firing these teachers solely to meet a quota is a harmful practice (again, especially if there are shortages).
In short, this proposal would be slow, risky, and unfair, and it would require us to deliberately engineer test score gains for their own sake, in the most brutal manner possible. It would also, I would argue, be unlikely to work, at least not to anywhere near the advertised degree.
Is this really our best option?
Hanushek doesn’t think so. Talking about the systematic firings, he notes, “In the long run, it would probably be superior…to develop systems that upgrade the overall effectiveness of teachers.” He points out, however, that these efforts have not been successful in the past. But have we really tried?
Instead of trying to fire our way to the high performance of Finland or anywhere else, why not try to emulate the policies that these nations actually employ? It seems very strange to shoot for the achievement levels of these nations by doing the exact opposite of what they do.
In any case, Gates, Rhee, et al. constantly repeat the “fire 5-10 percent” talking point, along with the promise of miracle results, because of its potent political message: all we have to do is fire bad teachers, and everything will be fixed. They use Hanushek’s calculation to provide an empirical basis for this message. They do not, however, seem at all attuned to the fact that the proposal is less an actual policy recommendation than a stylized illustration of the wide variation in teacher effects.
Let’s stick with meaningful conversations about how to identify, improve, and, failing that, remove ineffective teachers. Test-based measures may have a role in the evaluation of both teachers and overall school performance, but not a dominant one, and certainly not an exclusive one.
Systematically firing large numbers of teachers based solely on test scores is an incredibly crude, blunt instrument, fraught with risk. We’re better than that.
Follow my blog every day by bookmarking washingtonpost.com/answersheet.