Norm's Notes: Historical View of Release of Teacher Data

http://www.cjr.org/cover_story/tested.php?page=all

Tested

Covering schools in the age of micro-measurement

Eleven New York City education reporters were huddling on e-mail last October 20, musing over ways to collectively pry a schedule of school closings out of a stubborn press office, when the chatter stopped cold. Word had filtered into their message bins that the city was about to release a set of spreadsheets showing performance scores for 12,000 of the city’s 80,000 teachers—names included. Few understood better than the beat reporters that this wonky-sounding database was a game changer.
The Los Angeles Times already had jolted newsrooms across the country back in August, when it published 6,000 public school teachers’ names next to its own performance calculations. New York education reporters, though, were considerably more reluctant to leap on this bandwagon. They found themselves with twenty-four hours to explain a complex and controversial statistical analysis, first to their editors and then to the public, while attempting to fend off the inevitable political and competitive pressure to print the names next to the numbers, something nearly every one of them opposed. “I stayed up all night kind of panicked,” said Lindsey Christ, the education reporter for the local NY1 television station, “writing a memo to everyone in the newsroom explaining what was coming and what was at stake.”
It may seem odd that a geeky algorithm has become such a hot topic in education, but it is another indication of how a group of well-connected newcomers to the contentious world of education policy has influenced the national conversation on the subject. As a group—mostly Wall Street financiers, political lobbyists, and venture philanthropists—they are drawn to the tools and terms of business economics. In this case, that means something called “value-added metrics,” which estimate the worth of a teacher by analyzing her students’ test scores over time.
Supporters of this technique argue that teacher evaluations require objective rigor, calculated with statistics. Weak teachers, they argue, should not hide behind a subjective, protective system that undermines children’s futures. Critics counter that the calculations are incomplete, misleading, and often wrong. Teachers wonder how a number built on test questions can capture what it takes to help a student wrestle with ideas, say, or learn to write with voice. Wouldn’t it make more sense, they ask, to use student work, peer mentoring, and rigorous classroom observations for a more meaningful evaluation? Economists on all sides of the debate agree that these stats cannot paint a whole picture of effective teaching. So, the critics say, why print them indelibly next to teachers’ names?
But numbers have an allure. Governors and mayors facing huge budget cuts are demanding easier ways (read: rankings) to fire the worst teachers and reward the best. Washington likes numbers too. In the past year, eleven states including New York, Florida, and North Carolina have agreed to use student scores to evaluate teachers in exchange for federal Race to the Top grants.
So perhaps it was inevitable that elements of the free-market reform movement would land in the laptops of New York’s education reporters, with enough force to diminish the quality of the conversation about the city’s public schools.
The battle over the numbers is in part a battle over control. For decades, neither of the two national teacher unions has done enough to shed its more arcane rules, a failure that has made innovation difficult. Still, the assault surprised the unions at first, mostly because it came from unexpected places, including the press. Steven Brill’s lopsided 2009 piece, “The Rubber Room,” in The New Yorker, was among the first to frame the current reform climate. He portrayed a war between good-guy marketplace reformers and villainous unions, blistering the UFT, the United Federation of Teachers, for its part in negotiating rules that led to the city’s practice of warehousing tenured teachers facing discipline cases in “rubber rooms,” with little to do but punch the clock. The powerful union deserved the ridicule. Still, Brill allowed Schools Chancellor Joel Klein, the story’s white-hat protagonist, to dance around the irony that he had been in charge of the system, and thus the so-called “rubber rooms,” for seven years.
When eliminating teacher tenure became the movement’s silver bullet, a March 2010 Newsweek cover story jumped on the cause: “The Key to Saving American Education: We must fire bad teachers.” That followed an iconic 2008 Time cover that established then-DC school chancellor Michelle Rhee as the movement’s celebrity executive. Rhee was pictured broom in hand, poised to sweep bad teachers away.
The next salvo came with the fall 2010 opening of Waiting for Superman, the emotionally charged and popular documentary that identifies poor classroom teachers as the primary cause of urban school failure. Promoted with the help of foundation support (Gates, Walton, Broad) and star power (Oprah Winfrey), the film sounded a now-familiar drumbeat: fire bad teachers, close failing schools, open privately run charter schools, incentivize teaching. NBC dedicated a week in September 2010, coinciding with the film’s release, to stories on public education, mining the film as its main source of ideas. Critics later exposed factual errors in the documentary and took issue with its agenda, which may have had something to do with the film’s failure to win an Oscar nomination. But in the fall, broadcast anchors and national columnists were using the documentary as a crash course.
The Los Angeles Times could not have chosen a better climate in which to launch its investigative project, “Grading the Teachers.” On August 14, 2010, the paper ran its first stories, and announced it would soon publish its own value-added ratings for 6,000 local elementary school teachers. Names included.
The idea had taken root in 2009, when education reporter Jason Song wrapped up a series called “Failure Gets a Pass,” about teacher discipline. Investigative reporter Jason Felch then joined Song to look further into evaluations. Frustrated by the district’s lack of hard numbers, the reporters and editors decided to calculate their own. The paper hired a respected economist from the Rand Corporation, Richard Buddin, who created a teacher-performance analysis using students’ third- through fifth-grade state math and reading exams. Bold and gutsy, no question. No major news outlet had ever attempted to develop its own job performance system for individual public employees, let alone for something as nuanced as what teachers do. Times reporters and editors had thought through some of the ramifications. Before publication, the paper set up a website open only to the 6,000 teachers, so they could log in early and post comments if they wished. (About a third of the teachers did so.) Sidebars included an airing of the data’s shortcomings and the newspaper’s methodology.
Caveats aside, the sum of the series is a strong endorsement of the value-added model—inevitable, perhaps, because it used the paper’s own model. “By the time we were done with the reporting,” said Felch, “we found this was a very, very valuable statistic.” It was certainly popular. The Times’s teacher-rating site has attracted 1.8 million hits since it was launched; each of its page-one stories ranks among the most read of the year.
Here’s a simplified look at how value-added models work: analysts estimate how well a child is expected to score on reading and math tests this year by looking at her past results. The difference between the estimate and this year’s actual score is attributed to the current teacher, for better or for worse. Each teacher’s effectiveness with multiple students over several years is then boiled down to one statistic—a percentile ranking that places him or her somewhere between most and least effective compared to his or her peers.
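To make that description concrete, here is a deliberately stripped-down sketch in Python. It is not the Los Angeles Times model built by Richard Buddin, nor New York City’s; it assumes a single prior-year score as the only predictor, fits one straight line across all students by ordinary least squares, averages each teacher’s gaps between actual and expected scores, and then converts those averages into a percentile rank, defined here (an assumption) as the share of other teachers with a lower average. The function names and the sample data are hypothetical.

```python
from statistics import mean

# Hypothetical toy data: (teacher, last year's score, this year's score).
SAMPLE = [
    ("A", 50, 60), ("A", 70, 80),
    ("B", 50, 50), ("B", 70, 70),
    ("C", 50, 55), ("C", 70, 75),
]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def value_added(records):
    """Average gap between actual and expected score, per teacher.

    Expected scores come from one line fit to every student's prior score,
    a simplifying assumption; real models use more years and more controls.
    """
    a, b = fit_line([r[1] for r in records], [r[2] for r in records])
    gaps = {}
    for teacher, prior, current in records:
        expected = a + b * prior
        # The whole difference is credited (or charged) to the current teacher.
        gaps.setdefault(teacher, []).append(current - expected)
    return {teacher: mean(g) for teacher, g in gaps.items()}


def percentile_ranks(scores):
    """Percent of the other teachers whose value-added score is lower."""
    n = len(scores)
    if n < 2:
        return {teacher: 50 for teacher in scores}
    return {
        teacher: round(100 * sum(other < s for other in scores.values()) / (n - 1))
        for teacher, s in scores.items()
    }


if __name__ == "__main__":
    va = value_added(SAMPLE)
    print(va)                    # {'A': 5.0, 'B': -5.0, 'C': 0.0}
    print(percentile_ranks(va))  # {'A': 100, 'B': 0, 'C': 50}
```

On this toy data, teacher A’s students beat their predicted scores by 5 points and A lands at the top of the ranking, while B’s students fall 5 points short. With thousands of teachers, most averages cluster near zero, which is why small shifts in assumptions can move a teacher many percentile places.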
Most teachers fall into the vast middle of the bell curve, where one score is virtually indistinguishable from another. Experts agree: the numbers are much more useful for the very top and bottom teachers. The father of value-added education stats himself, economist William Sanders, told NPR’s Morning Edition that he worries that parents using the data might jump to false conclusions about the teachers in the middle. Proponents note that value-added numbers factor out “outside influences” like poverty and parents’ education levels, because students are compared to themselves, not to one standard. The measures, then, are less likely to give teachers low ratings just because they teach disadvantaged children.
Still, their limitations are legion. First, only a fraction of a district’s teachers are included—only those who teach reading and math; there are no standardized tests for other subjects. Second, no allowance is made for many “inside school” factors, such as the effect of team teaching, after-school tutors, substitute teachers, a child or a teacher who is absent for long periods of time, or an unstable school environment—a new principal, a violent incident, a district overhaul. And finally, critics ask: Since the number is based on manipulating one-day snapshot tests—the value of which is a matter of debate—what does it really measure?
Research experts weighed in from all directions in response to the Los Angeles Times project: some argued that value-added rankings may be flawed but are better than nothing; others said the numbers are reliable enough to evaluate schools, but not teachers. In February, two researchers at the University of Colorado at Boulder caused a dustup when they called the Times’s data “demonstrably inadequate.” After running the same data through their own methodology, controlling for added factors such as school demographics, the researchers found about half the reading teachers’ scores changed. On the extreme ends, about 8 percent were bumped from ineffective to effective, and 12 percent were bumped the other way. To the researchers, the added factors were reasonable, and the fact that they changed the results so dramatically demonstrated the fragility of the value-added method. Times editor Russ Stanton disputed the claim that the Colorado study discredited his paper’s effort, saying it “shows only that their analysis, using somewhat different data and assumptions than we used, produced results somewhat different than our own.” Thrusts and parries like these were not likely to help parents and readers make sense of it all.
But here is perhaps the most telling observation: nearly every economist who weighed in agreed that districts should not use these indicators to make high-stakes decisions, like whether to fire teachers or add bonuses to paychecks. The numbers, they said, can’t carry that kind of weight. By last summer, it should be noted, Michelle Rhee had already fired twenty-six DC teachers based in large part on low value-added scores. And New York City wants principals to use them immediately for tenure decisions.
The first installment of the Los Angeles Times’s series featured a photo of a fifth-grade math teacher named John Smith, chalk in hand, standing before his class of mostly Latino students. One child is raising his hand, expectantly. The caption told the story: “Over seven years, John Smith’s fifth graders have started out slightly ahead of those just down the hall but by year’s end have been far behind.”
Smith’s silver hair and stern expression became the poster image for the series. The sixty-three-year-old was meant to illustrate the potential of these numbers to expose some teachers as failures (and others as triumphs), offering parents knowledge they never before had. As for the schools, some questions remained unanswered. Did these calculations add to what was already known? Hard to tell; the principal had refused to talk. In the end, readers only really knew what the numbers revealed—Mr. Smith’s students slid an average of 14 percentage points in math during the school year, compared to their peers across the city.
President Obama’s education secretary, Arne Duncan, weighed in the following day, endorsing the newspaper’s move, and later called on all school districts to consider making such data public. Surprisingly, the Los Angeles Times’s own editorial page took a less hawkish tone, criticizing federal policies for pushing these numbers too far. Policy aside, has the series helped LA’s parents? Principals worried that hundreds of parents might demand to transfer their children into the top-rated teachers’ classrooms. That didn’t happen. So far, said Felch, it has been mostly non-minority, middle-class parents, already engaged in the schools, who have contacted him with comments—some grateful, some skeptical. It seems that the stories have yet to reach many low-income parents. One reason for that may have to do with language. It took about a month for the Times to translate the series into Spanish, the language used by many parents of the district’s majority Hispanic population.
In New York, Schools Chancellor Joel Klein had anchored his eight years of public school overhaul on marketplace solutions that relied heavily on test scores and school report cards to drive curriculum and policy. His administration had spent a total of $3.6 million to collect teacher value-added data over three years. Last year his department had released the teacher ratings to The New York Times—with school names and teachers’ names blacked out. This year, before he left to join Rupert Murdoch’s News Corporation as a $4.5-million executive, Klein was prepared to release everything. The national mood had shifted.
On October 20, reporters from the Times, the New York Post, the New York Daily News, The Wall Street Journal, GothamSchools.org, NY1 television, and WNYC public radio found themselves in an awkward spot. Some were so angry at what looked like a blatant attempt by the city to use reporters in its fight with the UFT that they quietly threatened to quit if their editors insisted on publishing names. Others were torn between the power of the data to inform—who are we to second-guess readers’ ability to process all this complexity, they asked—and their power to distort. On top of all the other distortions, the skeptics pointed out, the tests used to calculate these evaluations had been found to be flawed. The state had been forced to recalibrate the results because the tests had become too easy to pass. The next day, reporters took a collective breath: the union filed suit in New York State Supreme Court, claiming the rankings were riddled with errors that would unfairly harm teachers. “Just because it’s a number,” the union’s lawyer, Charles Moerdler, argued later, “doesn’t mean it’s suddenly objective.” Nothing would be released until the case was settled.
The delay allowed time for news organizations to compare notes. On Thursday evening, October 21, many of the reporters found themselves at a midtown Manhattan bar, sharing drinks with the same teachers union and Department of Education staff they had encountered in court earlier. The occasion was a farewell party for New York Times education reporter Jennifer Medina, who was moving to the paper’s Los Angeles bureau. A guest from the union parked his oversized protest poster—displaying the city’s confounding-looking mathematical formula for value-added numbers—against the bar. The debate from the courtroom spilled over into the festivities. School reps shrugged off complaints, reminding reporters it was they who had filed Freedom of Information Law (FOIL) requests for the data. Weren’t they in the business of printing information?
But the Department of Education had privately dropped hints to some reporters that their competitors had already submitted FOIL requests, some journalists countered. Suspicions had been raised when the department responded to the requests with uncharacteristic speed. Normally, such requests took months, with layers of negotiations, said Maura Walz, a reporter for GothamSchools.org, an independent online news service. This time, it was service with a smile. “The Department of Education wants this out,” said Ian Trontz, a New York Times metro editor. “They have a lot of faith in these reports. They believe they are trustworthy enough to educate and empower parents.”
Still, empowering parents had not seemed to be a top goal in the past for this administration. To the most skeptical reporters, it appeared as if the city was using them.
Public schools may be revered as engines of our democracy, but Americans have never agreed on who should govern them or what they should teach. In the 1950s, the Cold War stirred America’s anxiety that its schools were too soft to compete. In the 1960s, the civil rights movement elevated equal opportunity as the decade’s school reform banner. President Lyndon Johnson signed the groundbreaking Elementary and Secondary Education Act in 1965, which directed federal dollars into public schools where children lived in high concentrations of poverty. The federal Coleman Report issued the following year found that a child’s family economic status was the most telling predictor of school achievement. That stubborn fact remains discomfiting—but undisputed—among education researchers today.
In the 1980s, President Ronald Reagan, concerned about the growing economic power of Japan, commissioned the landmark report, “A Nation at Risk,” which warned of a “rising tide of mediocrity that threatens our very future.” Reagan ushered in marketplace ideas intended to rattle government bureaucracy, such as choice, merit pay, and more testing.
By the 1990s, the era of standards and accountability was at full throttle. With it came a subtle change in language. Instead of “teaching children,” pundits talked about “raising test scores” and “closing the achievement gap.” James Crawford, president of the non-profit Institute for Language and Education Policy, traced the injection of the “achievement gap” into the national policy debate back to the presidential platform of George W. Bush, who seized an issue traditionally owned by Democrats. The idea of bringing the test scores of poor minority children on a par with those of white children became the centerpiece of the No Child Left Behind Act in 2001. “Achievement gap is all about measurable ‘outputs’—standardized test scores—and not about equalizing resources, addressing poverty, combating segregation, or guaranteeing children an opportunity to learn,” Crawford wrote. “It shifts the entire burden of reform from legislators and policymakers to teachers and kids and schools.”
That brings us to the current decade, in which the very term “education reform” has taken on a new meaning. Elements of choice, competition, test-driven curricula, and incentive-based pay had been in the hopper for decades. What is new is the vast wealth and power of the dominant voices pushing those things, and their sharp focus on publicly funded and privately run charter schools, which currently educate about 3 percent of the nation’s students.
At a recent Robin Hood Foundation gala, the Wall Street charity pulled in $88 million in one evening from its hedge fund participants, much of the money targeted for New York City charter schools. Democrats for Education Reform, a pro-charter political action committee with hedge-fund ties, has expanded its reach to ten states, including Rhode Island, Michigan, and Colorado. By far the most influential of all are the Big Three venture philanthropies, the Bill & Melinda Gates Foundation, the Walton Family Foundation, and Eli Broad’s Broad Foundation, which often work in concert on issues like school choice and teacher effectiveness.
Stephanie Banchero, a longtime education reporter now at The Wall Street Journal, calls this group of people and organizations the “non-traditionals,” and welcomes their relatively combative presence on the beat. “It keeps us on our toes,” she says. Caroline Hendrie, executive director of the Education Writers Association, calls them an “alternative establishment,” noting their influence in raising the level of urgency and attention to public education. Diane Ravitch, the education historian, a former assistant secretary of education under George H. W. Bush, and a past advocate of school choice and accountability, calls them “bastions of unaccountable power.”
An important story in the Winter 2011 edition of Dissent magazine by Joanne Barkan detailed their influence—amplified by the media—over urban school policy. In it, she quotes conservative education expert Frederick Hess, the nation’s most vocal critic of the media’s “gentle treatment” of the foundations. In the 2005 book, With the Best of Intentions: How Philanthropy Is Reshaping K-12 Education, he describes a credulous press that treats philanthropies like royalty.
What draws these venture philanthropists and Wall Street financiers to urban school reform, and to top-flight charter schools like Uncommon Schools and the Knowledge is Power Program (KIPP) network? One attraction is the businesslike way the schools in those systems are run. They value standardized curricula and measures, incentives, and a young, flexible, nonunion teaching force. As a group, these reformers tend to believe that educators use America’s growing child-poverty rate and shrinking social services as excuses. Results in schools like those in the KIPP network, they say, prove that poverty does not have to be an obstacle. They see themselves as warriors against the status quo, with leverage. “It’s the most important cause in the nation, obviously,” the manager of hedge fund T2 Partners, Whitney Tilson, told The New York Times in December 2009. “And with the state providing so much of the money, outside contributions are insanely well leveraged.”
Money managers and economists share a common philosophy. Douglas Harris, an economist from the University of Wisconsin, described it best in a January 2011 Education Week column:

Economists tend to think like well-meaning business people. They focus more on bottom-line results than processes and pedagogy, care more about preparing students for the workplace than the ballot box or art museum, and worry more about U.S. economic competitiveness. Economists also focus on the role financial incentives play in organizations, more so than the other myriad factors affecting human behavior. From this perspective, if we can get rid of ineffective teachers and provide financial incentives for the remainder to improve, then the students will have higher test scores yielding more productive workers and a more competitive U.S. economy.

The critics see this as a soulless vision of American education, in which children are filled up with facts, tested “until they beg for mercy,” as educator Theodore Sizer used to say, and moved into college ill-prepared to analyze problems and think creatively. They say the new reformers value statistics but ignore research: for example, a recent Vanderbilt University study that concluded merit pay did not raise student test scores, or a 2009 Stanford University study that found 83 percent of current charter schools were either worse than or equal to traditional public schools.
Seasoned educators with long track records of alternative ways to help school leaders get the most out of their schools and teachers tend to be off the media grid; the late Theodore Sizer’s Coalition of Essential Schools is one example. New York’s Performance Standards Consortium is another; the consortium includes dozens of urban public schools with high graduation rates that use sophisticated classroom-based teacher assessments and a curriculum that mirrors those in our best colleges.
“Somebody explain this to me,” wrote principal George Wood last summer on his blog for the Forum on Education and Democracy, a national coalition of educators and reformers. Wood has served as principal of Federal Hocking High School in Ohio’s Appalachian foothills for eighteen years.

In that time we have increased graduation and college going rates, engaged our students in more internships and college courses, created an advisory system that keeps tabs on all of our students, and developed the highest graduation standards in the state (including a Senior Project and Graduation Portfolio). But reading the popular press, and listening to the chatter from Washington, I have just found out that we are not part of the movement to ‘reform’ schools.

You see we did not do all the stuff that the new ‘reformers’ think is vital to improve our schools. We did not fire the staff, eliminate tenure, or pay based on test scores. We did not become a charter school. We did not take away control from a locally elected school board and give it to a mayor. We did not bring in a bunch of two-year short-term teachers.

Nope, we did not do any of these things. Because we knew they would not work.

By December, frustration was mounting among the New York reporters as they waited for the State Supreme Court judge to decide whether the teacher data should be released. Reporters described “a spirited debate” that erupted during an off-the-record pizza-and-wine farewell party for outgoing Chancellor Klein before Christmas. Several in attendance said reporters bombarded him with pointed questions about the data, and Klein defended their release, for the sake of parents’ right to know.
Meanwhile, some reporters produced stories that attempted to add context to the controversy over the data. WNYC ran a story that examined what school districts in Denver, the District of Columbia, and Tennessee were doing with their value-added reports. Meredith Kolodner at the Daily News found a Manhattan middle school teacher who received a “zero” rating for her performance as an English teacher. The problem? Pamela Flanagan had never taught English, only math and science. Sharon Otterman of the Times wrote a thoughtful piece that dug into some of the research. She reported on a 2010 study by Mathematica Policy Research that warned the city’s error rate was probably very large. That’s because the Department of Education was using only four years’ worth of students’ tests to analyze each teacher (Los Angeles used seven years’ worth). The study found that with only three years of data, the results were wrong 25 percent of the time.
Parents and community members remained off the radar, however. In New York, 5,000 parents sent protest letters to the Department of Education in December opposing the release of the teacher-data reports. “We believe there must be meaningful teacher evaluations in our children’s schools,” said Martha Foote, a Brooklyn PS 321 parent, “but humiliating teachers with unreliable information will only hurt them.” Their letters did not make the news.
In November, reporters got another surprise. Mayor Michael Bloomberg announced he would replace Klein with Cathie Black, a Hearst magazine executive who had neither government service nor education experience. The New York Times went on a rare offensive against the mayor’s choice. Reporters continued to wait for the teachers union’s case to be resolved, but by this time, the Times, the Daily News, the New York Post, and WNYC had all decided to print the data when it arrived—names included. (The Wall Street Journal refused to disclose its plans.)
On January 10, State Supreme Court Judge Cynthia Kern ruled in favor of the city and the news organizations, saying that “there is no requirement that data be reliable for it to be disclosed.” The union quickly filed an appeal. And at press time the data was stalled in court, again.
But as they waited, the news outlets were constructing databases to collect and report the numbers, which will be searchable by teacher’s name, by school, and by district. WNYC, for example, is building an interactive tool that will try to provide context for individual teachers and caveats about the wide margins of statistical error.
It was probably inevitable. Journalists by instinct and trade are usually in the role of arguing for full disclosure of public information, fending off cautious government arguments for moderation and restraint, not the other way around. That instinct, and the pressure of competition, eventually won out. After all, the information is public, some reporters noted, and the city is using it for tenure decisions and evaluations. “It’s in the public interest,” said Trontz of The New York Times. “If we find the data is so completely botched, or riddled with errors that it would be unfair to release it, then we would have to think very long and hard about releasing it.”
The only holdout so far appears to be GothamSchools.org. “We plan to run a message saying why we are opposed to using the names,” said Elizabeth Green, editor of the site and author of a forthcoming book, Building a Better Teacher. “I want to treat schools with as much dignity as we treat restaurants. We don’t just splash grades A through F about restaurants in the paper without explanation. We do individual stories. To be fair.”
Perfection of the data is not the point, argues Arthur Browne, editorial page editor of the Daily News. The numbers, he said, will be “a net positive in terms of adding to the conversation about quality of teachers.” But what about the quality of that conversation?
In New York City, schools coverage has been largely tethered to the corporate reformers’ agenda—mostly to a measuring tool for firing incompetents. Inadequate classroom teachers are without question a serious problem, as are the rules and systems that protect them. But it’s unwise to think that weeding out the weak will address other pressing challenges facing teachers and schools and students across the city—the huge dropout rate among a rapidly growing Hispanic population, for one example, or the absence of good preschools for the rising number of poor children, or state budget cuts that are gutting core services to schools, and on and on.
I don’t happen to know any education reporters who were drawn to this complex beat in order to pore over spreadsheets, or score an interview with Bill Gates as an education expert. Most pine for more time to spend in classrooms, in science projects with preschoolers, in rapt discussions with teachers or principals or parents. Most are inspired by education’s expansive connections to culture, science, politics, and the world of ideas. The best education reporters are skilled at the invaluable art of connecting the dots for readers between policy from on high and reality in the classroom. Yet education reporters have increasingly found themselves herded toward a narrow agenda that reflects the corporate-style views of the new reformers, pulling them farther and farther away from the rich and messy heart and soul of education.
In February came a new website called the “Media Bullpen,” which, unfortunately, has the potential to help ensure that the conversation about school improvement will continue to revolve around a predictable script. This new watchdog newsroom plans to rate dozens of education stories daily using baseball metaphors—from strikeouts to home runs.
The site is run by the Center for Education Reform, a DC-based advocacy group dedicated for the last eighteen years to promoting charter schools, and funded by the Walton family and the Bradley family, among others, including a $275,000 grant from the ubiquitous Gates Foundation. Time will tell whether the Bullpen will use its influence to expand the democratic conversation about schools, or merely bully the press by trying to call all the pitches.
Early signs are discouraging. A job posting for managing editor said its ideal candidate would be a “passionate advocate for education reform.” And we know what that means.

Tuesday, February 14, 2012
