The Atlanta Journal-Constitution
Suspicious test scores in roughly 200 school districts resemble those that
entangled Atlanta in the biggest cheating scandal in American history, an
investigation by The Atlanta Journal-Constitution shows.
St. Louis: Patrick Henry Downtown Academy’s principal was placed on
leave last year for falsifying attendance records. Because attendance
rates are used to calculate state funding, it’s possible the alleged
fraud attracted state aid to the school that it didn’t deserve. Even
though the state has not found cheating at Henry, an AJC analysis
uncovered unusual scores dating back to 2007.
The newspaper analyzed test results for 69,000 public schools and found high
concentrations of suspect math or reading scores in school systems from
coast to coast. The findings represent an unprecedented examination of the
integrity of school testing.
The analysis doesn’t prove cheating. But it reveals that test scores in
hundreds of cities followed a pattern that, in Atlanta, indicated cheating
in multiple schools.
A tainted and largely unpoliced universe of untrustworthy test results
underlies bold changes in education policy, the findings show. The tougher
teacher evaluations many states are rolling out, for instance, place more
weight than ever on tests.
Perhaps more important, the analysis suggests a broad betrayal of
schoolchildren across the nation. As Atlanta learned after cheating was
uncovered in half its elementary and middle schools last year, falsified
test results deny struggling students access to extra help to which they are
entitled, and erode confidence in a vital public institution.
“These findings are concerning,” U.S. Secretary of Education Arne Duncan said
in an emailed statement after being briefed on the AJC’s analysis.
He added: “States, districts, schools and testing companies should have
sensible safeguards in place to ensure tests accurately reflect student
learning.”
In nine districts, scores careened so unpredictably that the odds of such
dramatic shifts occurring without an intervention such as tampering were
worse than one in 10 billion.
In Houston, for instance, test results for entire grades of students jumped
two, three or more times the amount expected in one year, the analysis
shows. When children moved to a new grade the next year, their scores
plummeted — a finding that suggests the gains were not due to learning.
Overall, 196 of the nation’s 3,125 largest school districts had enough suspect
tests that the odds of the results occurring by chance alone were worse than
one in 1,000.
For 33 of those districts, the odds were worse than one in a million.
A few of the districts already face accusations of cheating. But in most, no
one has challenged the scores in a broad, public way.
The newspaper’s analysis suggests that tens of thousands of children may have
been harmed by inflated scores that could have precluded tutoring or more
drastic administrative actions.
The analysis shows that in 2010 alone, the grade-wide reading scores of 24,618
children nationwide — enough to populate a midsized school district — swung
so improbably that the odds of it happening by chance were less than one in
10,000.
Cheating is one of the few plausible explanations for why scores would change so
dramatically for so many students in a district, said James Wollack, a
University of Wisconsin-Madison expert in testing and cheating who reviewed
the newspaper’s analysis.
“I can say with some confidence,” he said, “cheating is something you should
be looking at.”
Statistical checks for extreme changes in scores are like medical tests, said
Gary Phillips, a vice president and chief scientist for the large nonprofit
American Institutes for Research, who advised the AJC on its methodology.
“This is a broad screening,” he said. “If you find something, you’re supposed
to go to the doctor and follow up with a more detailed diagnostic process.”
The findings come as government officials, reeling from recent scandals, are
beginning to acknowledge that a troubling amount of score manipulation
occurs. Though the federal government requires the tests, it has not
mandated screening scores for anomalies or investigating those that turn up.
Daria Hall, director of K-12 policy with the nonprofit The Education Trust,
said education officials should take steps to ensure the validity of test
results because of the critical role they play in policy and practice.
“If we are going to make important decisions based on test results — and we
ought to be doing that — we have to make important decisions about how we
are going to ensure their trustworthiness,” she said. “That means districts
and states taking ownership of the test security issue in a way that they
haven’t to date.”
‘Way too much pressure’
Both critics and supporters of testing said the newspaper’s findings are
further evidence that in the frenzy to raise scores, the nation failed to
pay enough attention to what was driving the gains.
“We are putting way too much pressure on people to raise scores at a very
large clip without holding them accountable for how they are doing it,” said
Daniel Koretz, a Harvard Graduate School of Education testing expert.
Test-score pressure is palpable in schools grappling with urban blight and
poverty.
These are the schools that the 2001 No Child Left Behind Act was supposed to
fix.
But at Patrick Henry Downtown Academy in St. Louis, airy red brick towers
rising above the school belie a grimmer reality on the ground. Children
leaving one recent afternoon passed piles of trash and a .45 caliber bullet
tucked into the curb. Inside, their classrooms are beset by mold, rats,
discipline problems and scandal.
Last year, the former principal — once hailed as among the district’s
strongest — was accused by Missouri officials of falsifying attendance rolls
to get more state money.
State investigators didn’t publicly question Henry’s test scores.
But the AJC’s analysis found suspicious scores in the school dating back to
2007. In 2010, for instance, about 42 percent of fourth-graders passed the
state math test. When the class took the tests as fifth-graders the next
year — with state investigators looking into cheating and other fraud
allegations — just 4 percent passed math.
Experts say student learning doesn’t typically jump backwards.
Henry’s scores were consistently among the lowest in the state — except for
the occasional sudden leap.
After school one recent afternoon, Deborah Dodson, who sends two children to
the school, said she saw a teacher provide inappropriate one-on-one
assistance during a state test. And she’s heard from other parents that
teachers will give students answers.
Some students who aren’t likely to test well don’t receive tests at all, she
said. “They don’t do anything by the book,” Dodson said. “That school and
how they do things is not right.”
Rural, city schools flagged
The AJC used freedom of information laws to collect test scores from 50 states
to look for the sort of patterns that signaled cheating in Atlanta. A
Georgia investigation last year found at least 178 Atlanta educators —
principals, teachers and other staff — took part in widespread
test-tampering.
In each state, the newspaper used statistics to identify unusual score jumps
and drops on state math and reading tests by grade and school. Declines can
signal cheating the previous year. The calculations also sought to rule out
other factors that can lead to big score shifts, such as small classes and
dramatic changes in class size.
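
The article does not spell out the calculation, so what follows is only a minimal sketch of the general idea, not the newspaper's actual method. It assumes hypothetical per-class records (field names such as `prev_mean` and `n_curr` are invented for illustration), sets aside small classes and classes whose enrollment shifted sharply, and flags year-over-year changes that sit far from the typical change, using a simple z-score as a stand-in for whatever model the analysts used.

```python
from statistics import mean, pstdev

def flag_unusual_changes(records, z_cut=3.0, min_students=10, max_size_shift=0.25):
    """Return (school, grade, z) for classes whose score change is an outlier.

    All thresholds are illustrative assumptions, not values from the article.
    """
    eligible = []
    for r in records:
        smallest = min(r["n_prev"], r["n_curr"])
        shift = abs(r["n_curr"] - r["n_prev"]) / max(r["n_prev"], 1)
        # Small classes and big enrollment swings can move averages on their own.
        if smallest >= min_students and shift <= max_size_shift:
            eligible.append(r)

    changes = [r["curr_mean"] - r["prev_mean"] for r in eligible]
    if len(changes) < 2:
        return []
    center, spread = mean(changes), pstdev(changes)
    if spread == 0:
        return []

    flagged = []
    for r, change in zip(eligible, changes):
        z = (change - center) / spread
        if abs(z) >= z_cut:  # both jumps and drops count
            flagged.append((r["school"], r["grade"], round(z, 2)))
    return flagged
```

A high cutoff such as `z_cut=3.0` highlights only extreme swings, so a screen like this would tend to miss subtler manipulation rather than accuse schools wrongly.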
Some school leaders accused of cheating have attributed steep gains to
exemplary teaching. But experts said instruction isn’t likely to move scores
to the degree seen in the AJC’s analysis.
Through teaching alone, Wollack said, “it’s going to be pretty tough to have
that sort of an impact.”
The AJC developed a statistical method to identify school systems with far
more unusual tests than expected, which could signal endemic cheating such
as that which occurred in Atlanta. The newspaper’s score analysis used
conservative measures that highlighted extremes and were likely to miss many
instances of cheating.
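
One common way to express "far more unusual tests than expected" is a binomial tail probability: if each class had some baseline chance of being flagged, how likely is it that a district would have at least as many flags as it did? The sketch below illustrates that idea only. The baseline rate of 5 percent and the assumption that classes are flagged independently are both hypothetical, and the article does not say the AJC computed its odds this way; the one-in-1,000 cutoff mirrors the threshold the story reports.

```python
from math import comb

def binomial_tail(flagged, total, base_rate):
    """P(X >= flagged) when each of `total` classes is flagged
    independently with probability `base_rate` (an assumption)."""
    if flagged <= 0:
        return 1.0
    return sum(
        comb(total, k) * base_rate**k * (1 - base_rate) ** (total - k)
        for k in range(flagged, total + 1)
    )

def district_is_suspect(flagged, total, base_rate=0.05, odds_cut=1e-3):
    """True when chance alone would produce this many flags
    less often than `odds_cut` (one in 1,000 by default)."""
    return binomial_tail(flagged, total, base_rate) < odds_cut
```

Under these assumptions, a district with 20 flagged classes out of 100 would clear the one-in-1,000 bar, while one with 5 flagged out of 100 would not. Real classes within a district are not independent, so a figure like this is a screening signal rather than proof.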
Big-to-medium-sized cities and rural districts harbored the highest
concentrations of suspect tests. No Child Left Behind may help explain why.
The law forced districts to contend with the scores of poor and minority
students in an unprecedented way, judging schools by the performance of such
“subgroups” as well as by overall achievement.
Hence, high-poverty schools faced some of the most relentless pressure of the
kind critics say increases cheating.
Improbable scores were twice as likely to appear in charter schools as in regular
schools. Charters, which receive public money, can face intense pressure as
supposed laboratories of innovation that, in theory, live or die by their
academic performance.
Common problems unite the big-city districts with the most prevalent
suspicious scores: Many faced state takeovers if scores didn’t improve
quickly. Teachers’ pay or even their continued employment sometimes depended
on test performance. And their students — mostly poor, mostly minority —
were among those needing the most help.
The analysis, for instance, flagged more than one in six tests in St. Louis
some years. In Detroit, it was one in seven.
Dozens of school systems in mid-sized cities — such as Gary, Ind., East St.
Louis, Ill., and Mobile, Ala. — exhibited high concentrations of suspicious
tests, too.
Though high-poverty city schools were more likely to have suspicious tests,
improbable scores also showed up in an exclusive public school for the
gifted on the Upper West Side of Manhattan. And they appeared in a rural
district roughly 70 miles south of Chicago with one school, dirt roads and a
women’s prison.
The findings call into question the approach that dominated federal education
policy over the past decade: Set a continuously rising bar and leave schools
and districts essentially alone to figure out how to surmount it — or face
penalties.
“If you want to keep your job, keep your school out of the news, keep winning
awards and advance in your career, you need to make your school look
better,” said Joseph Hawkins, a former testing official with the Montgomery
County, Md., school system.
Koretz, the Harvard expert, said cheating is one extreme on a continuum that,
at its other end, includes gaming the test in legal ways — such as through
test-prep drills — that don’t significantly increase students’ overall
knowledge or skills.
Even as state test scores have soared, students’ performance on national and
international exams has been more mediocre. Cheating and gaming may help
explain why.
“The big picture is: Are we seeing apparent gains in student achievement that
are bogus?” Koretz asked.
Decade of tumult
Test scores show that instead of progressing steadily in their academics,
districts have endured a decade of tumult.
In some of the nation’s biggest cities, dynamic district leaders preached
“data-driven” decision-making and even linked test scores to bonuses or
principal hiring and firing decisions. Many boasted of taking a corporate
approach to education, focusing on student test achievement as the single
most important measure of success.
Some of the most persistently suspicious test scores nationwide, however,
occurred in districts renowned for cutting-edge reforms.
In Atlanta, for instance, former Superintendent Beverly Hall won national
recognition as Superintendent of the Year in 2009. State investigators later
confirmed scores that year were widely manipulated by educators who assisted
students improperly and outright changed tens of thousands of their answers
on state tests.
In some Atlanta schools, cheating was an open secret for years. After students
turned in their tests, teachers and administrators erased and corrected
their mistakes — even holding a “changing party” at a teacher’s home. In
another school, staff opened plastic wrap securing test booklets with a
razor, then melted the wrap shut again after making forbidden copies.
State investigators accused a total of 38 principals of participating in
test-tampering. One allegedly wore gloves while erasing to avoid leaving
fingerprints.
Ultimately, the cheating supported a massive effort to bolster the Atlanta
superintendent’s image as a tough reformer who had turned around a
struggling system.
In 2002, Houston was the first winner of the Broad Prize, which has become the
most coveted award in urban education. The Eli and Edythe Broad Foundation
praised Houston’s intense focus on test results. More recently, Houston has
been among the leaders in tying teacher pay to student test scores.
But twice in the past seven years, the AJC found, Houston exhibited
fluctuations with virtually no chance of occurring except through tampering.
In 2005, scores fell precipitously in five dozen classes in 38 schools after a
statistical analysis by the Dallas Morning News suggested test-tampering in
Houston. The district fired teachers and principals and improved test
security.
In 2011, however, as three-fourths of Houston teachers earned
performance-based bonuses, scores rose improbably in a similar number of
classes in the same number of schools. In the same year, Houston confirmed
nine cheating allegations and fired or took other action against 21
employees.
Through Jason Spencer, a spokesman for the district, Houston officials
questioned whether cheating caused all of the unusual score changes the AJC
found. He said the district doesn’t think its pay-for-performance plan has
made cheating more likely.
“We feel like we put a lot of safeguards in place,” he said, but added: “We
know it happens. We would never pretend it’s not an issue.”
Teachers and other school staff in Atlanta were eligible for mostly small
bonuses if scores hit district targets. Perhaps more worrisome for
principals were the penalties: Former Superintendent Hall boasted of
replacing about 90 percent of principals and told new hires they had three
years to deliver high scores. Her mantra: “no exceptions, no excuses.”
Three studies of merit-pay programs did not show they consistently produce
higher test scores, either legitimately or through cheating, said Matthew
Springer, director of the National Center on Performance Incentives at
Vanderbilt University.
Yet, he added that “it’s incredibly important that we systematically monitor
these programs for opportunistic gaming of the system.”
Pushback from officials
Some school districts and states have taken an apathetic, if not defiant,
stance in the face of cheating accusations in recent years.
The AJC sent detailed findings to districts with some of the most suspicious
clusters of scores. For those not already publicly looking at cheating, the
responses were similar: Officials said they were unaware of most anomalies,
but protested characterizing the score changes as cheating.
Several local and state school officials objected to conducting the analysis
at all, saying it doesn’t consider enough variables.
Some districts simply denied any problems exist. Detroit, for instance,
claimed its scores were not “unusual or out of line in any way” and that
Michigan officials had not identified irregularities “with respect to an
erasure analysis, suspected cheating, or any other issue.”
In fact, Michigan’s education agency identified six Detroit schools as having
statistically unlikely gains on a state test in 2009. At one school, the
state determined, sixth-graders averaged 7.4 wrong-to-right erasures. Their
peers statewide averaged fewer than one such change.
Analyzing Detroit’s scores from 2008 and 2009, the AJC found suspicious swings
in 14 percent of classes. The statistical probability of that happening by chance: virtually zero.
Regardless, Detroit officials offered an explanation that experts have said is
among the least likely: better teaching.
Steven Wasko, an assistant superintendent in Detroit, said the district has
offered before- and after-school programs, expanded summer school, and added
extra reading and math instruction. “Increases in student performance,”
Wasko said in an email, “could be attributed in part to these factors.”
In a statement, St. Louis school district officials acknowledged the
strangeness of score changes, but disagreed that cheating was to blame. They
said neither the district nor state education officials have any “credible
evidence that testing improprieties have occurred at the schools in
question.”
Officials acknowledged, however, that the district has a cheating
investigation open at one school. The state said that since 2010 it has
received allegations of cheating at two other St. Louis schools identified
as suspicious by the AJC analysis. Accusations of cheating persist.
State officials say they do not screen test scores for possible cheating and
do not consider unusually high gains to be a sign of test-tampering — if
schools provide an explanation.
“We hope to see great gains in our proficiency levels,” said Michele Clark, a
spokeswoman for the Missouri education department.
Dallas officials said that when irregularities surfaced several years ago,
they instituted new test security measures and started screening for
anomalies.
Few big-city districts have attacked cheating as aggressively as Baltimore.
After he became the district’s chief executive in 2007, Andrés Alonso heard a
whistle-blower complain at a PTA meeting about the district’s lax
investigation into cheating allegations at her school.
With accused educators sitting nearby, Alonso recalled recently, the room
became “a deafening vacuum.”
Alonso ordered a new investigation, which spread into 15 other schools. The
district posted independent monitors in each school during tests. In the
suspected schools, scores fell dramatically. In other schools, scores
continued to rise.
Alonso asked state officials to check test papers for illicit erasures and
changes. Their analysis confirmed his suspicions.
At Fort Worthington Elementary, for instance, as many as 20 mistakes were
corrected on some students’ tests, often in a lighter shade of pencil.
All of Fort Worthington’s classes posted improbable gains in 2008, the AJC’s
analysis shows. The performance level held for two more years, when the
school faced the threat of state takeover. After the cheating was detected,
statistically unlikely score drops multiplied, occurring in three-quarters
of the school’s classes. Similar patterns show up across the district.
Sitting outside the school in her aging station wagon one late winter day,
Vernetta Jones-Marshall said Fort Worthington is doing the best it can.
“I don’t even know if it was really a true statement,” Jones-Marshall, 57,
said of the cheating allegations as she waited to pick up her son, a
fifth-grader. “We didn’t make a big deal about it.”
Cheating is a big deal to Alonso, however.
Most educators act with integrity, he said, but others “feel a sense of
impunity” because school officials haven’t always held cheaters accountable.
“I was doing this before the Atlanta story broke,” he said. “This was me
feeling that nothing mattered more than the integrity of the school system.”
Call for vigilance
Leaders need to maintain that tough stance even after cheating disappears from
the headlines, experts say.
In Dallas, for instance, the score analysis shows the number of suspicious
gains dropped after cheating allegations surfaced in late 2004 — but then
began inching up again a few years later.
For years, Los Angeles’ scores were among the least suspicious for big-city
districts. But when California stopped conducting routine erasure analysis
in 2008 for budget reasons, the number of improbable score changes in L.A.
climbed steeply.
States and districts find little advice when they do decide to conduct erasure
or statistical screenings of test scores.
Federal education officials and testing experts have begun working on new
recommendations for detecting and investigating test-score anomalies.
Wollack, the Wisconsin testing expert, said there is room to improve. “Some of
the investigations that have taken place in the past have been less than
thorough, have been less exhaustive than they should have been,” he said.
“Cheating went undetected as a result.”
Districts don’t have a big incentive to unearth ugly truths about their own
testing programs. What’s more, most screening methods miss instances of
cheating by setting high thresholds in an effort not to falsely identify
innocent schools.
“It’s clear there are schools, there are districts, that are under that
threshold that are still engaged in some level of misconduct,” Wollack said.
Critics of testing have complained for years that increased pressure brought
on by accountability measures leads to more testing abuses.
Education historian and New York University Professor Diane Ravitch said the
incessant focus on testing has eroded the quality of instruction.
“All of this is predictable,” said Ravitch, a former top U.S. Department of
Education official who in recent years reversed her support for testing and
tough accountability measures. “We’re warping the education system in order
to meet artificial targets.”
Through programs such as Race to the Top, federal education officials have
pushed states to adopt more aggressive teacher evaluation systems that,
typically, consider test scores.
“Whatever the stakes were under No Child Left Behind,” Ravitch said, “they are
going to be much higher, now that teachers are being told your scores are
going to be public and you’re going to be fired if they don’t go up X number
of years in a row.”
But Daria Hall, of the Education Trust, said most educators don’t cheat, and
testing data is essential for determining if students have basic skills —
such as the ability to read.
“What parent doesn’t want to know how their child is doing in reading and in
math? What teacher doesn’t want to know how their student is doing?” she
said. “You can’t take away the source of the information. We have to make
the information better.”
Crisis of confidence
For parents, questions of academic integrity can lead to a crisis of
confidence.
The chronically low-performing Nashville district illustrates the conundrum.
Test scores in some of the district’s schools have alternately soared and
swooped to improbable degrees.
Nashville school officials said the data raises concerns about their
effectiveness as educators, but not cheating. They echoed other districts’
objections to the analysis, including their relatively high percentage of
students learning English and the number of students changing schools from
one year to the next.
In Hermitage, a working-class section east of downtown Nashville, Megan
McGowan said she was torn about whether to send her son to Dupont Tyler
Middle School.
Tests carry too much weight, she said, and teachers face tremendous pressure
to produce results. Still, she said, cheating is inexcusable. If it happened
at Dupont Tyler, she said, she’d think twice about sending her son there.
“I expect teachers to be ethical,” she said.