http://www.cjr.org/behind_the_news/the_press_and_standardized_tes.php?page=all
The press and standardized testing numbers: a cautionary tale
Disks of never-before-released data from the Department of
Education landed with a befuddling thud in New York City’s newsrooms at
the end of February. The swarm of spreadsheets had promised to provide a
single ranking of 18,000 teachers (by name!) from zero to 99 based on
students’ standardized test scores.
A bonanza for education reporters, right? Time to celebrate? Well, not exactly; not for me, anyway.
My intrepid journalism students wondered why I didn’t seem to share
their enthusiasm for the data. Wasn’t I the same teacher who became
semi-deranged when they turned in stories without any quantitative
evidence? Think of the stories to be done, the fun graphics to design.
Here were not only reams of data, but hot data—from the center of a
national controversy over how teachers should be evaluated. Adding to
the buildup, the reports had been locked away for more than a year while
a city judge refereed a high-octane legal fracas between the teachers
union and the city over whether to release them. Nearly a dozen news
organizations had become either witting or unwitting pawns in this
dispute when they filed Freedom of Information requests for the data’s
release.
“Isn’t it our job to bring information into the light, and let the
public judge for themselves?” one student asked me. She had learned her
lessons well.
Last year, I was certain what my killjoy answer
would be: Just because you have data doesn’t mean it is always right to
publish it—especially if you know the numbers are no good. And these
numbers do have huge problems. Everyone from economists to educators
to knowledgeable city education reporters knows that the arcane
algorithms that generated the teacher-rating numbers are as
statistically flawed as they are politically fraught.
The complex formulas are meant to measure how much value a teacher
contributes to a student’s learning growth (or lack of growth) over
time. It would be useful if they actually did. But the data are riddled
with mistakes, useless sample sizes, flawed measuring tools, and
cavernous margins of error. The Department of Education says that a math
teacher’s ranking could be off by 35 percent; an English teacher’s by
53 percent. That means a reading teacher with a ho-hum 35 could either
be as horrid as a 1 or as awesome as an 86—take your pick. What election
survey with these kinds of gaping margins would be published in the
papers?
Most damning—and most often ignored in the coverage—is that the sole
basis for these ratings is old student tests that have since been
discredited by the New York State Board of Regents. The 2007-2010 scores
used for these teacher rankings were inflated, the Regents determined.
The Department of Education had lowered the pass score so far that the
tests had become far too easy. So not only were the algorithms suspect,
but the numbers fed into them were flawed. News organizations that
publish them next to teachers’ names run the risk not only of knowingly
misleading the public, but also of becoming entangled in the political
web surrounding teacher evaluations, which extends from the mayor’s
office, to the state house, to unions, philanthropy board rooms, and to
the White House.
And yet, nearly every city news organization went ahead and printed them anyway.
To my mind, all reasons not to publish still exist. They are still
true. But in the last month, I’ve come around to an opposite, perhaps
more cynical, conclusion about the virtues of making them public.
Publishing them, it seems to me, has had an odd, clarifying effect.
Releasing the data to public scrutiny, alongside context and caveats,
has exposed just how flawed they really are.
Apparently, the public has received that message. A Quinnipiac poll
released in mid-March showed that 58 percent of the respondents
approved of releasing the teacher data reports, while at the same time
46 percent believed they were flawed. The more the public sees, the less
enamored they are (Go, Journalism!).
Perhaps that was what philanthropist Bill Gates feared in February
when he lectured the media days before the data were released. In a
February 23 op-ed in The New York Times, “Shame Is Not the
Solution,” Gates warned news organizations not to humiliate teachers by
publishing their names next to their value-added rankings.
What was up? Gates is usually bullish on the use of test scores to
evaluate teachers. Their feelings had rarely been a top
concern. Yet he argued, correctly, that test scores were not “a sensitive
enough measure to gauge effective teaching” all by themselves.
Some chose to believe Gates had finally come to appreciate the
complexities and nuances of good teaching, something that cannot be
boiled down to a number based on students’ tests. But if that were the
case, he would have backed away from endorsing the value-added formulas.
Instead, he advocated only that they be kept from the public. It’s more
plausible to me that he is worried the public will turn against the
test-driven accountability agenda promoted by his foundation—which has
fueled policy for the last several decades—as these
not-ready-for-primetime rankings are scrutinized to death. Whatever the
case, it was an unexpected signal from one of the nation’s most
influential data-driven reformers.
Even more surprising was the detour taken by Arne Duncan, the US
Secretary of Education. Last year, Duncan applauded the Los Angeles
Times for being the first newspaper to print rankings—its own—next to
teachers’ names. He encouraged other news organizations to do the same.
This year, he posed and answered his own question to Education Week’s Stephen Sawchuck: “Do you need to publish every single teacher’s rating in the paper? I don’t think you do.”
All of the above
Obviously, the mood in recent months had been dialed back from fever
pitch to tepid, possibly in recognition that teacher-bashing as a reform
strategy had seen its best days.
Last year, New York Mayor Michael Bloomberg and Joel Klein, then the
city’s Chancellor of Education, gave these numbers their high-five
support. Even after Klein left to head the education division of Rupert
Murdoch’s News Corp., his city department went so far as to encourage
local news organizations to make sure they FOILED for the teacher data
in a timely fashion—with names. And back then the department responded
to their requests with uncharacteristic speed.
At the New York Daily News, Deputy Editor Arthur Browne said
he discussed releasing the reports with department officials several
times last year when he was editorial-page editor, in response to the
bold Los Angeles Times move. “There wasn’t any resistance on
Tweed’s part to us getting them,” Browne said in a recent interview,
referring to Tweed Hall, city schools headquarters. Pressure mounted
among the city’s education reporters—who were nearly all opposed to
publishing the data. Some even threatened to quit over it. Just as the
drama began to boil over, the United Federation of Teachers filed a
lawsuit to stop the release.
That left Klein’s laid-back successor, Dennis Walcott, holding the
data bag more than a year later, after the union’s court appeals had run
their course. On February 24, Walcott finally released the reports, all
wrapped in caveats and finger wagging. “It would be irresponsible for
anyone to use this information to render judgments about individual
teachers,” he wrote in a Daily News op-ed that same day. He
repeatedly reminded readers that the numbers
were two years old, and should never be used in isolation. “I’m deeply
concerned that some of our hardworking teachers might be denigrated in
the media based on this information. That would be inexcusable.”
Clearly, his heart wasn’t in it.
Walcott’s sheepish tone mystified news editors as their staff
scrambled to build apps and technical platforms to house the numbers.
“It was disarming,” said Mary Ann Giordano, editor of the New York Times’s SchoolBook.org. “Walcott was stepping back from these numbers, blaming news organizations if they published names.”
Editors had to work fast to decide what to do. Publish the raw
spreadsheets? Take the numbers at face value and march out the best and
worst teachers, one by one? Write thoughtful critiques of all the
downsides of the data and publish them anyway, next to teachers’ names?
Or refuse to publish at all, for fear of misleading the public with
faulty figures and maligning teachers with bogus data?
Maybe it was the robust news climate, or all the tangled messages
from on high, but for New York’s media, the answer was: all of the
above.
Some outlets rose above the fray and refused to go near the reports
on principle. The local Riverdale Press and two citywide online news
services—InsideSchools.org and GothamSchools.org—all took the high road.
“No amount of context could justify attaching teachers’ names to the
statistics,” wrote Elizabeth Green, GothamSchools.org’s editor,
in a column she had prepared a full year
earlier. By contrast, NY1, the local 24-hour cable news television
station, downloaded the entire Department of Education spreadsheet
collection, which included three years of scores and more than 100 data
points per teacher.
The New York Post surprised no one by taking the most reckless
road of all, galloping through the numbers as if they represented
reality, scooping up names for its gallery of the “best” and the “worst”
teachers. Its editors and reporters did not bother to dwell on the
caveats and nuances, or even to include, at least at first, each score’s
margin of error. (It added the intervals later.)
The low point was on day two, when the Post ran a photo and
story about Pascal Mauclair, the city’s so-called
“worst teacher,” thus handing the union its first real teacher data
report martyr.
The teachers union reported that Mauclair’s father opened his Queens
apartment door the first day of the public release to find Post
reporters telling him his daughter was the worst teacher in the city.
Next, reporters found their way to his daughter’s apartment. She called
police. Reporters turned to neighbors for comment. The Post
story the next day identified Mauclair at the “bottom of the heap,”
amongst those who do “zero, zilch, zippo” for students.
The backstory of her score does more to undermine the validity of the
stats than the Post had in mind. Its reporters might have spent their
time digging into
the calculations behind her zero rating, by interviewing Mauclair’s
principal and colleagues at PS 11 in Queens, where she taught small
sixth grade classes of recent immigrants. Her students do not speak
English. It’s not uncommon for some to take the state exams after being
in her class for only a few months. The union says her score was based
on 11 students, only 7 of whom had enough data to compute a real
report—a meaningless sample size by any measure. Her fellow teachers,
parents of students, and her principal were nonplussed. “I would put my
own children in her class,” Principal Anna Efkarpides told Leo Casey of
the UFT. The Queens school is consistently one of the highest performing
schools among similar schools, and Mauclair is one of its top teachers.
“The truth is the truth.”
By contrast, the Daily News managed to steer clear of its
rival’s instinct to tick off the 10 worst and 10 best. In many ways it
exhibited the most caution of all the city newspapers, by weeding out
all those teachers whose rankings were based on only one year’s worth of
classes.
Last year, the News’s Arthur Browne told me that the data
was obviously not perfect, but that was no reason not to publish. This
year, he had apparently done more homework. “We were leery of naming
names if we couldn’t be invested in the accuracy of them,” said Browne,
who also edits the op-ed page. “We screened out the biggest problems in
the database. We got the margins of error down into the zero range. We
are committed to publication of the data, with all the caveats. We
believe the public can make sound judgments.”
Still, there were some head-scratchers. The News’s first-day
headline was a case in point:
“More Than a Dozen Teachers Earned Lowest Scores.” This was a
“Bridges-Help-People-Cross-Rivers” kind of headline. The rankings are
calculated on a curve, meaning there will always be dozens at the
bottom, dozens at the top, wide swaths of fair-to-middlin’ in between.
That’s the nature of a bell curve, another controversial aspect of this
calculation, which means the city will never be able to announce that
all its teachers are high-performing.
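The point about grading on a curve can be shown with a few lines of Python. This is only a minimal sketch under stated assumptions: the `percentile_ranks` function and the two lists of scores are hypothetical, invented for illustration, and a plain rank-order percentile stands in for the city's far more complex formula, which is not described here.

```python
def percentile_ranks(scores):
    """Rank each score from 0 to 99 by its position among all the scores.

    Hypothetical stand-in for a curve-based ranking; ties are not handled.
    """
    n = len(scores)
    # Indices of the scores, ordered from lowest score to highest.
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0] * n
    for position, i in enumerate(order):
        ranks[i] = position * 100 // n
    return ranks


# Invented numbers: one imaginary city where every teacher scores low,
# and another where every teacher scores high.
weak_city = [1.0, 2.0, 3.0, 4.0, 5.0]
strong_city = [91.0, 92.0, 93.0, 94.0, 95.0]

weak_ranks = percentile_ranks(weak_city)
strong_ranks = percentile_ranks(strong_city)

print(weak_ranks)
print(strong_ranks)
```

Both imaginary cities get the same set of rankings, including a teacher at zero, because a ranking of this kind records only each teacher's position relative to the others, not how well anyone actually taught.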
The Times goes both ways
This brings us to the puzzling experience of reading about the test
data in The New York Times. In partnership with WNYC public radio, the
Times produced the city’s most sophisticated stories and, next to
The Wall Street Journal, the most polished graphics. Reporters took
care to detail the data’s myriad errors and political nuances, and to
put them into context. A careful reader could not help but come away
believing the numbers were radioactive.
Then the Times published every one of them anyway, with names.
Anna Phillips hammered out incisive blog after blog for
SchoolBook.org about mistakes teachers found in their reports, about the
DOE’s conflicting messages, about parents’ reactions. National
education columnist Michael Winerip found a top-ranked school with
bottom-ranked teachers to illustrate the numerical idiosyncrasies. In a
second column he was the first to argue that publishing the bad numbers
was the best thing that could happen to ultimately discredit them.
SchoolBook.org editors created a helpful 14-point FAQ column covering
nearly every base (except the inflated state tests). Teachers were
invited to contribute blogs for the site. One 20-year veteran teacher
wrote about being slapped with a 6th percentile ranking one year and
exonerated by a 96th the next, underscoring how pointless, and
demoralizing, they were.
And yet, there the rankings were, on display on its SchoolBook.org homepage, raising the question: Why publish them at all?
Two reasons, explained Jodi Rudoren, the Times’s education
editor at the time (she now heads the paper’s Jerusalem bureau): First,
“We’re in the business of disclosing information that’s in the public
interest,” she said. And second, “We do not operate in a vacuum,”
meaning that if the Times didn’t publish them, another news organization would.
Some attention was paid to minimizing harm. The Times
invited teachers to add comments next to their scores on a Google Doc,
for example. By last count, only 60 out of a possible 18,000 had
participated, most of them correcting their reports. Editors briefly
flirted with the idea of somehow fiddling with the search function so
that teachers’ rankings would not be the first thing that popped up when
anyone Googled their name. “We considered it, but it’s a weird business
to get involved in—suppressing searches,” said Rudoren. “It would be a
gesture blocking the tabloidization of this data. But in the end, we
felt it was not our role.”
The bottom line for Times editors was the fact that the
Department of Education used this data to evaluate teachers, most
recently stalling tenure decisions for those trapped in the bottom.
Currently, the State Department of Education is generating new
value-added reports that will be used in the city and beyond to make
high-stakes decisions about teachers; and the legislature is embroiled
in tortured debates over whether to make the results partially or fully
public. “We thought it was important to provide parents with the same
information that the DOE was using to evaluate teachers, shedding light
on its decisions,” said Rudoren.
Maybe so, but are parents taking the data seriously? Principals in
both New York and Los Angeles worried that parents would arm themselves
with these numbers and storm their offices, causing chaos by demanding
to switch their children from low-scoring teachers to higher scoring
ones. If they are, the union representing New York City principals
hasn’t heard about it yet. And if Los Angeles is a bellwether, in the
two years since parents have been able to read their teachers’ scores in
the paper, there has been little organizing around them.
Clear as mud
Perhaps this data dump was the best thing to happen to those who have
been trying to steer the national school conversation away from testing
and more testing, to thinking and learning how to learn. What should be
central in all these stories is the fundamental problem: What’s the
best way to evaluate teachers? How can authentic learning be measured?
Should standardized tests be used at all to do it? They are one-day
snapshots of how well one student answers a handful of basic, low-level
questions. At best they are crude instruments; at worst, they are
vulnerable to manipulation.
Sarah Wysocki’s tale from the nation’s capital shines a cautionary
headlight on the real-life dangers that may lie ahead for districts that
put a lot of stock in these numbers. In DC, value-added numbers count
for a full half of a teacher’s evaluation, even more than in New York.
Bill Turque reported in The Washington Post that the highly
regarded fifth-grade teacher received a stellar review by her principal
and peers, and was then fired because so many of her students didn’t
show any progress on their math and reading tests last year.
Wysocki offered a new twist on the data troubles. She pointed out to
Turque that about half her students arrived in her fifth grade class
with what she believed were inflated scores from their previous school—a
school that is now under investigation for test tampering. No honest
teacher could ever hope to improve on fraudulent scores. She appealed
her dismissal, lost, and moved away.
Back in my own journalism class, I decided to walk through some of
the teacher data online to see what we could learn. World Journalism
Preparatory High School, in Flushing, Queens, popped up on the screen, a
small, energetic 6th through 12th grade school that my students have
become very familiar with since it opened in 2006. The DOE gave it a “B”
grade this year on its controversial School Report Cards, and past
years’ numbers have shown steady improvement. Its graduation rate is
better than the average high school’s; its Regents English scores are
above average. But the teachers? According to the clumsy measures, they
ranked among the very worst in the city. Of course, it was hardly
possible that all the children were teaching themselves.
Principal Cynthia Schneider has learned to ignore these spreadsheets
over the last three years. “It’s not good data. It’s bad data, and we
know it,” she said. “We know what we’re doing here.” The school had to
send only eight kids to summer school last year to catch up, she said.
The high school is ranked number 41 in the city. Still, her students and
teachers suffered the indignity of a front-page article in the
hyperlocal Whitestone Times—a photo of the school, plus the poor teacher
rankings. Reporters did not call her for comment.
“I’m all about trying to get a handle on matching student achievement
to teacher effectiveness. That’s a good thing,” said Schneider. “But
that’s not what this does. At all.”
The data? Clear as mud. And now just about everybody in New York and beyond knows it.