Teacher Evaluations Are Biased, Report Says


Schools have been placing more emphasis on teacher evaluations over the last decade, but a new report shows that the way those evaluations are done may need some serious revision.


The report, from the Brookings Institution, looks at in-classroom observations, a tool that teacher unions have generally seen as fairer than the alternative: judging teachers by how much their students improve on standardized tests. But according to the new report, the observations, usually done by a principal, have a built-in bias: Teachers with high-performing students are rated four times higher than instructors with low-performing students, regardless of how good or bad the teacher actually is.

Why does this matter? If teachers get better evaluations for teaching higher-performing students, they are incentivized to avoid teaching students who need more help, like English-language learners, since those students could bring a teacher’s evaluation down. What was meant as a tool to judge a teacher’s effectiveness instead ends up being a measure of how their students perform.


A bit of background

Teachers have traditionally been evaluated based on their credentials (whether they have a master’s degree, for instance) and how many years they’ve been in a classroom. In the last decade, there’s been a push for more stringent evaluations. Over the last six years, as the Obama administration has begun rewarding states for implementing teacher evaluation systems through Race to the Top initiatives and No Child Left Behind waivers, more than 40 states have developed plans to hold teachers accountable through teacher observations and other measures.

The “rapid transition” to meaningful teacher evaluations has been “breathtaking,” Russ Whitehurst, the report’s lead author, said on a video call with reporters. But it hasn’t been easy. Critics of the evaluations say they are error-prone and there’s a flurry of litigation over whether they are fair.

Even advocates acknowledge that they are a work in progress. Which, the report’s authors say, makes this the perfect time to figure out what are and are not the best ways to grade teachers. Just a handful of states have fully implemented teacher evaluations, so there is room to make changes and explore what works.


“There aren’t cookbook solutions for any of these design issues,” Whitehurst said.

Right now, there are a couple of key ways schools essentially grade teachers. One is to observe a teacher in the classroom. The other is to compare a child’s test scores from the start of the year in a teacher’s classroom with the same child’s scores at the end, and rate the improvement.


The latter measure is known as “value-added,” and it’s drawn some criticism. One reason is that some value-added scores are calculated based on the school as a whole and not each individual teacher, meaning bad teachers in good schools can be rewarded while good teachers in bad schools can be hurt. There’s been a movement in the last several years toward individual value-added scores, but they are by no means the norm. Even when the scores are individualized, the results can be harder to act on than feedback from a principal or other teacher.
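To make the individual value-added idea concrete, here is a minimal sketch in Python. It assumes hypothetical teachers and made-up test scores, and it reduces "value-added" to the average start-to-finish gain per teacher. Real value-added models are statistical and control for many other factors, so this shows only the basic arithmetic, not any state's actual formula.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (teacher, score at start of year, score at end of year).
# All names and numbers are invented for illustration.
records = [
    ("Teacher A", 60, 72),
    ("Teacher A", 55, 65),
    ("Teacher B", 85, 88),
    ("Teacher B", 90, 94),
]

def average_gain_by_teacher(rows):
    """Return each teacher's mean student gain (end score minus start score)."""
    gains = defaultdict(list)
    for teacher, start, end in rows:
        gains[teacher].append(end - start)
    return {teacher: mean(values) for teacher, values in gains.items()}

gains = average_gain_by_teacher(records)
for teacher, gain in sorted(gains.items()):
    print(teacher, gain)
```

In this invented data, Teacher B's students score higher overall, but Teacher A's students improve more, which is the distinction a gain-based measure is meant to capture.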

With those drawbacks in mind, much emphasis has been placed on classroom observations. Teachers see them as a way to get individual feedback and make real changes, and they’re regarded as fair.


But according to the new report, classroom observations are anything but fair. They’re biased. If a teacher is assigned a poorly performing class one year and gets poor ratings, then is moved to a high-performing class the next year, her ratings go up.

The finding could have powerful implications for schools that have held classroom observations as the paragon of teacher evaluations.


The main concern, Matthew Chingos, one of the report’s authors, said during the video call, is that instructors are “not going to want to teach certain groups of kids.”

How to counter bias

The report offers a solution: account for the fact that some kids are more challenging to educate than others.


Students who are English-language learners, who receive free or reduced-price lunches, and who come from certain racial or ethnic backgrounds test lower. If teacher scores account for those challenges, more instructors might be willing to tackle the challenge of teaching those students without the fear that their own ratings will slide.

It’s a controversial idea but one that the report’s authors think makes a lot of sense.
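One way to picture the kind of adjustment described above is the sketch below. It assumes hypothetical teachers, an invented 1-to-4 observation scale, and a simple two-way grouping of classes by student need. It is not the report's actual method. Each raw observation score is compared with the average for teachers of similar classes, then re-centered on the overall average.

```python
from statistics import mean

# Hypothetical data: (teacher, class group, raw observation score on a 1-4 scale).
# Groups and scores are invented for illustration.
observations = [
    ("Teacher A", "higher-need", 2.5),
    ("Teacher B", "higher-need", 2.0),
    ("Teacher C", "lower-need", 3.5),
    ("Teacher D", "lower-need", 3.0),
]

def adjust_scores(rows):
    """Shift each score by the gap between its group's mean and the overall mean."""
    overall = mean(score for _, _, score in rows)
    group_means = {
        group: mean(s for _, g, s in rows if g == group)
        for group in {g for _, g, _ in rows}
    }
    return {t: s - group_means[g] + overall for t, g, s in rows}

adjusted = adjust_scores(observations)
for teacher, score in sorted(adjusted.items()):
    print(teacher, score)
```

With these made-up numbers, Teacher A and Teacher C end up with the same adjusted score, since each is the stronger teacher within a comparable group of classes.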


Sandi Jacobs, a vice president with the National Council on Teacher Quality who taught elementary school for nearly a decade in Brooklyn, called the report’s findings “terribly important,” but said she’s “not really sure” what the reaction from the education world will be.

There’s been a significant push at the federal level for teacher evaluations over the last decade or so, but there are some valid questions about how implementation might work, she said. Districts aren’t likely to have the resources to make the adjustments, but there’s a reluctance about handing more data than is necessary to states.


There is also always discomfort when someone proposes what can sound like different expectations for different students. To get support, the message would have to be clear that these adjustments should not be lower standards for lower-performing students, she said.

Dan Goldhaber, director of the Center for Education Data & Research and a professor who also participated in the video call, was more blunt.


“I think there will be pushback” to the idea of a demographics adjustment, he said, adding that while people want systems to be fair, the ultimate goal is to encourage teachers to improve. If you don’t have teacher support of an evaluation method, teachers are less likely to take its results seriously.

“The way teachers react,” he said, “is important.”

Randi Weingarten, president of the American Federation of Teachers, a national teachers union, said in an emailed statement to Fusion that:

“The Brookings study joins a growing list of credible research that demonstrate that [value-added modeling] is neither a reliable measure for teacher evaluation, nor is one test score an accurate indicator of student progress… If we follow the report’s sound guidance and eliminate school value-added scores or diminish their importance, then we can get to the real purposes of public education—helping kids build critical thinking and problem solving skills, build relationships, have perseverance—and create joy in learning.”


Emily DeRuy is a Washington, D.C.-based associate editor, covering education, reproductive rights, and inequality. A San Francisco native, she enjoys Giants baseball and misses Philz terribly.
