Learning First Alliance

Strengthening public schools for every child

Getting Teacher Evaluation Right


This week, the American Educational Research Association and the National Academy of Education hosted Getting Teacher Evaluation Right: A Challenge for Policy Makers, which highlighted education researchers' concerns with using value-added modeling (VAM, a model that measures a teacher's contribution to student test scores) in teacher evaluations.

The consensus of the research community: most believe VAM is not appropriate as a primary measure for evaluating individual teachers. The standardized test score data used in these models are simply not reliable, given the small sample size of individual classrooms, the nonrandom assignment of students to classrooms, and the fact that while a student might, for example, work on reading skills with a teacher, a parent, a tutor and a paraprofessional, the only one who gets credit (or blame) is the teacher.

Two studies cited at the briefing struck me as particularly disturbing: one found that 27% of teachers who get an “A” rating one year on a VAM-based system get a “D” or “F” rating the next – and that 30% of “F” teachers get an “A” or “B” the next. Another found that these models “predict” the influence of a 5th grade teacher on their students’ 4th grade test scores – scores received before the teacher had even met the students.*
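To see why this kind of year-to-year churn can happen even when teachers themselves don't change, here is a small illustrative sketch (my own, not from the studies cited): it assumes each teacher has a fixed “true” effect and that each year's VAM estimate adds classroom-level sampling noise, with made-up grading cutoffs and noise level.

```python
# Hypothetical illustration of VAM rating instability.
# Assumptions (not from the article): true effect = 0 for all teachers,
# noise_sd = 1.0, and the A-F cutoffs below are invented for the sketch.
import random

random.seed(0)

def observed_score(true_effect, noise_sd=1.0):
    """One year's VAM estimate: the true effect plus sampling noise."""
    return true_effect + random.gauss(0, noise_sd)

def letter(score):
    """Map a score to a coarse A-F rating via fixed (invented) cutoffs."""
    for cutoff, grade in [(1.0, "A"), (0.5, "B"), (0.0, "C"), (-0.5, "D")]:
        if score >= cutoff:
            return grade
    return "F"

# Rate 1,000 identical average teachers in two consecutive years
# and count how many receive a different letter grade each year.
n = 1000
flips = sum(
    letter(observed_score(0.0)) != letter(observed_score(0.0))
    for _ in range(n)
)

print(f"{flips / n:.0%} of identical teachers changed rating year to year")
```

Under these assumptions, well over half of the (identical) teachers land in a different rating band the second year – the instability comes entirely from measurement noise, not from any change in teaching.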

Despite the concerns of the research community, districts all over the country are including VAM in teacher evaluations – and federal officials are encouraging it. As Howard Wainer points out in his new book Uneducated Guesses (not a part of this briefing):

Ironically, there is a marked contrast between the enthusiasm of those who accept the claims of VAM developers and would like to use it, and reservations expressed by those who have studied its technical merits. It appears, at least at the moment, that the more you know about VAM the less faith you have in the validity of inferences drawn from it.

Of course, those questioning the usefulness of VAM still recognize the inadequacy of many current teacher evaluation systems. Ideas for improving these systems include taking a standards-based approach to evaluation, using a peer review process, and including evidence of student learning as measured by indicators other than test scores.

New Haven's New Model for Evaluating Teachers

One evaluation model worth a closer look is New Haven’s, which was developed with a landmark teachers’ contract back in 2009. (At that time, LFA interviewed both New Haven Federation of Teachers President David Cicarella and district officials Garth Harries and William Clark about the contract and the process by which it was adopted).

The New Haven Independent recently celebrated this model for its effectiveness over its first full year. It rates teachers from 1 (needs improvement) to 5 (exemplary) based on classroom observations and on goals that teachers set for their students in meetings with supervisors. (The previous system rated teachers simply as either satisfactory or unsatisfactory – a relatively common feature of evaluation systems.) For teachers in subjects and grade levels covered by state tests, at least one goal is based on those tests.

Teachers are observed multiple times and alerted to potential problems. Those on their way to a “needs improvement” rating are given improvement plans and extra professional development. They hold a status conference with their supervisor to discuss progress and goals, and a series of visits from an outside validator confirms that the initial rating holds up (it did in 87% of cases last year). In other words, teachers receive both concrete feedback on their performance and the support they need to improve.

This past year, around 8% of the workforce was rated “exemplary,” 38% was considered “strong,” 28% “effective,” 9% “developing” and 3% “needs improvement.” Those with the highest ratings were offered the chance to lead “professional learning communities,” receiving a stipend paid for by a private grant. 

Thanks at least in part to these evaluations, 34 teachers (16 tenured and 18 non-tenured; 1.3% and 2.8% of those workforces, respectively) left the district.** These departures were uncontested – as Harries pointed out, all had the right to request a termination hearing if they felt wronged, and none did.

All stakeholders seem pleased with these evaluations. Superintendent Reggie Mayo calls the system “a model for the nation.” Mayor John DeStefano touts it as a successful component of the city's school reform strategy. Union President David Cicarella was quoted as saying, “Teachers are much happier because everyone knows what’s expected of them.” And a survey of principals last May found they were pleased with the new system as well, especially because they had the chance to help teachers improve in the classroom by giving specific feedback.

To be sure, there are still a number of challenges in New Haven. There are concerns about whether administrators have the capacity required to complete the paperwork, conferences and observations required under this system. And the Wall Street Journal recently pointed out that some argue that poorly performing teachers leave “turnaround” schools to go to other jobs in the system. In addition, test scores haven’t risen as rapidly as one would hope (though the reform efforts in the district are only a couple years old).

Still, it is a teacher evaluation system that is transparent, accepted by stakeholders and contributing to the development of the workforce. Perhaps more districts should consider similar models.


*These findings were cited at the briefing. I have not carefully reviewed the original studies.

**The district retained some teachers who rated “needs improvement” because they didn’t receive the support they needed to improve.