Everything in the world is measured with error. This isn't the sexiest topic - so I'll try to keep it short - but it's important. I want to draw your attention to two different problems:
1) Repeated measurements have random variation:
Repeated measurements of anything necessarily have some random variation. Step onto the scale daily for a month (assuming you're not preparing to hibernate for the winter, in which case your weight changes are not "random variation") and this will quickly become apparent. In terms of measuring teacher effects, this random variation is less of a problem if we are willing to assess teacher effectiveness across multiple years of data (e.g. my students' test scores in 2005, 2006, and 2007), because the noise in any single year tends to average out.
But I've heard a lot of plans kicked around that propose to reward some proportion of teachers based on one year of data. Certainly, from an incentives perspective, this makes sense: we want to give you an incentive this year to push hard. But when Mrs. Scott is awarded merit pay in 2005, passed over in 2006, and then found merit pay-worthy again in 2007, the system doesn't have a lot of face validity for educators or the public.
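To make the one-year versus multi-year point concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration, not anything from an actual value-added system: the number of teachers, the idea that a teacher's yearly estimate is a fixed "true" effect plus independent noise, and the choice to make the noise as large as the real differences between teachers.

```python
import random
import statistics

def simulate(n_teachers=2000, n_years=3, true_sd=1.0, noise_sd=1.0, seed=0):
    """Return hypothetical true teacher effects and noisy yearly estimates of them."""
    rng = random.Random(seed)
    true_effects = [rng.gauss(0.0, true_sd) for _ in range(n_teachers)]
    # Each yearly estimate = the teacher's true effect + that year's random error.
    yearly = [[t + rng.gauss(0.0, noise_sd) for _ in range(n_years)]
              for t in true_effects]
    return true_effects, yearly

def top_quarter(scores):
    """Indices of the teachers whose scores land in the top 25%."""
    k = len(scores) // 4
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])

def agreement(scores, true_effects):
    """Share of rewarded teachers (top 25% by score) who are truly in the top 25%."""
    rewarded = top_quarter(scores)
    truly_top = top_quarter(true_effects)
    return len(rewarded & truly_top) / len(rewarded)

true_effects, yearly = simulate()
one_year = [row[0] for row in yearly]                  # reward on a single year
three_year = [statistics.mean(row) for row in yearly]  # reward on a 3-year average

agreement_one = agreement(one_year, true_effects)
agreement_three = agreement(three_year, true_effects)
print(f"rewarded and truly top 25%, one year:    {agreement_one:.2f}")
print(f"rewarded and truly top 25%, three years: {agreement_three:.2f}")
```

Under these made-up settings, the three-year average should pick out the truly strong teachers more reliably than any single year does, which is the Mrs. Scott problem in miniature. How big the gap is depends entirely on the assumed noise level.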
2) A student's baseline is not necessarily a good control:
To be sure, value-added models are a tremendous improvement upon NCLB's proficiency system. (Note: value-added models with unrealistic proficiency targets aren't really an improvement - more on this next week.) Value-added approaches give us a more accurate portrayal of how a school or teacher is really doing. Here's a description of Tennessee-style value-added from the Center for Greater Philadelphia at Penn:
Because individual students rather than cohorts are traced over time, each student serves as his or her own "baseline" or control, which removes virtually all of the influence of the unvarying characteristics of the student, such as race or socioeconomic factors.
Sounds about right. Right? The clearest example of why a student's baseline test score does not remove "virtually all of the influence of the unvarying characteristics of the student" is the following: Mrs. Jones' class enrolls only wealthy children, while Mrs. Scott's class enrolls only kids who qualify for free lunch. Tests are given in the spring of each school year, so we're going to measure May to May, which means each student's measured gain includes the summer before they ever walk into the teacher's classroom. We know that poor students have lower rates of learning in the summer compared to their more advantaged peers. So if we only take into account students' initial scores, and not their socioeconomic status, we're going to come up with a biased estimate of teacher effectiveness. In this case, teachers who teach low-income kids look like poorer teachers simply because of summer learning loss.
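The arithmetic behind that bias can be sketched in a few lines. The point values below are hypothetical numbers chosen only to illustrate the mechanism, and the `Classroom` type and its field names are made up for this example. The two teachers are assumed to produce identical gains during the school year; the only difference is what happens over the summer.

```python
from dataclasses import dataclass

@dataclass
class Classroom:
    teacher: str
    summer_change: float     # hypothetical points gained or lost over the summer
    school_year_gain: float  # hypothetical points gained from fall to spring

def spring_to_spring_gain(classroom: Classroom) -> float:
    """What a May-to-May comparison sees: summer and school year lumped together."""
    return classroom.summer_change + classroom.school_year_gain

# Assumed numbers: both teachers add 10 points during the school year.
jones = Classroom("Mrs. Jones", summer_change=2.0, school_year_gain=10.0)
scott = Classroom("Mrs. Scott", summer_change=-1.0, school_year_gain=10.0)

# Gap a May-to-May model would attribute to the teachers.
naive_gap = spring_to_spring_gain(jones) - spring_to_spring_gain(scott)
# Gap in what the teachers actually contributed during the school year.
true_gap = jones.school_year_gain - scott.school_year_gain

print(f"May-to-May gap between teachers:  {naive_gap}")
print(f"School-year gap between teachers: {true_gap}")
```

With these assumed numbers the entire May-to-May gap comes from the summer, not from anything either teacher did. Controlling for last spring's score doesn't help, because the summer difference happens after that score is recorded.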
Check out the PowerPoint at the Center for Greater Philadelphia - link above. It's a great overview of value-added.