September 6, 2011 / compassioninpolitics

Additional Criticism of Value Added Testing for Teacher Evaluation

In their letter to Secretary Arne Duncan (published in the Washington Post and at the National Education Policy Center), Kevin G. Welner and Carol C. Burris cite research-based justifications and point out:

Yet the DC IMPACT program showed a relationship of only 0.34 between teacher value-added scores and the scores from evaluations (primarily observational) linked to the district’s Teaching and Learning Framework observation scores. This “modest correlation” concern was raised in an evaluative report of IMPACT published by the Aspen Institute.

… it raises red flags about the reliability and validity of one or both. Indeed, this is not the first time a lack of a strong relationship was found. A prominent peer-reviewed article published a few months ago found that teachers with ineffective teaching skills nevertheless might have strong VAM scores, especially if they taught high-achieving students. 8

As a practical matter, this means that some teachers will receive bonuses when they should not, others will not receive bonuses when they should, and still others might be unfairly dismissed to the detriment of students as well as the teachers themselves. Further, because higher growth scores are correlated to students who enter the class with higher achievement, this system creates a disincentive to teach those with greater disadvantages.
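To put the 0.34 figure quoted above in perspective, here is a minimal sketch of the usual "shared variance" reading of a correlation. Squaring the correlation is a standard rule of thumb for linear relationships, not a calculation taken from the letter, and the function name `shared_variance` is just an illustrative label.

```python
def shared_variance(r):
    """Fraction of variance two measures have in common under a linear reading of r."""
    return r ** 2

r = 0.34  # correlation between value-added scores and observation-based scores (from the letter)
common = shared_variance(r)
print(f"r = {r:.2f}, r^2 = {common:.4f}")
print(f"variance not shared between the two measures: {1 - common:.1%}")
```

On this reading, the two ways of rating the same teachers overlap on only about 12 percent of their variation, which is why the letter treats the figure as a red flag for one or both measures.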

As for alternative evaluations, they propose:

one of the most long-standing and promising teacher evaluation approaches relies on peer assistance and review (PAR) programs, such as those in Toledo, Ohio and Montgomery County Public Schools in Maryland. We note with alarm the likelihood that current policies are not just failing to promote such programs with apparently successful track records; the new wave of evaluation policies are actually having the effect of discouraging and terminating these successes.

Hill, H. C., Kapitula, L., and Umland, K. (2011). A Validity Argument Approach to Evaluating Teacher Value-Added Scores.
American Educational Research Journal, 48(3), 794-831.


Value-added models have become popular in research and pay-for-performance plans. While scholars have focused attention on some aspects of their validity (e.g., scoring procedures), others have received less scrutiny. This article focuses on the extent to which value-added scores correspond to other indicators of teacher and teaching quality. The authors compared 24 middle school mathematics teachers’ value-added scores, derived from a large (N = 222) district data set, to survey- and observation-based indicators of teacher quality, instruction, and student characteristics. This analysis found teachers’ value-added scores correlated not only with their mathematical knowledge and quality of instruction but also with the population of students they teach. Case studies illustrate problems that might arise in using value-added scores in pay-for-performance plans.

For instance, in the above study:

Perhaps most importantly the researchers found that a large percentage of math teachers studied had very low ‘quality of teaching’ ratings, but very high VAM estimates. In this case, low quality meant operationally that examination of the teachers’ instruction revealed “very high rates of mathematical errors and/or disorganized presentations of mathematical content.”

Case studies of these teachers explained a lot of these “false positive” VAM results — results that could make such teachers eligible for significant performance bonuses in most merit pay plans (and, not insignificantly, send the message that their teaching practice was exemplary).



Leave a Comment
  1. compassioninpolitics / Sep 7 2011 12:10 am

    The Carnegie report suggests a rosier outlook, but it does note some major potential drawbacks to the system:

    These ardent advocates are overstating the advantages, critics assert. The critics, including the Rand report’s authors, say that the percentage of error in value-added computation can be rather high, especially when compiled over three years or less and when the pool of scores is not large enough. (These errors apparently lessen when bigger pools of scores and longer periods are used in computations.) Therefore, say skeptics, it would be harmful to base teachers’ livelihoods on what may be unreliable data. Doing so could result in low morale and in the end, be not much better than the old ways. (In fact, even the most enthusiastic advocates say that value-added assessment should be only one of several factors affecting teachers’ raises and promotions.)
    Sanders admits there is danger that necessarily complicated statistical methods will be over-simplified. Less expensive techniques may be substituted that will sacrifice accuracy.
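The point above about error shrinking with bigger pools of scores can be illustrated with a small simulation. This is a sketch under loud assumptions, not the Rand or Sanders methodology: each student's gain is modeled as the teacher's true effect plus independent random noise, and the names `estimate_spread`, `true_effect`, and `noise_sd`, along with the class sizes, are made up for illustration.

```python
import random
import statistics

def estimate_spread(n_students, n_trials=2000, true_effect=0.0, noise_sd=1.0, seed=0):
    """Spread (standard deviation) of a teacher's estimated effect across repeated simulated draws."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_trials):
        # Each student's gain = the teacher's true effect + noise unrelated to teaching.
        gains = [true_effect + rng.gauss(0.0, noise_sd) for _ in range(n_students)]
        estimates.append(statistics.fmean(gains))
    return statistics.stdev(estimates)

one_year = estimate_spread(n_students=25)     # assumed: one class of 25
three_years = estimate_spread(n_students=75)  # assumed: three such classes pooled
print(f"spread of estimate with 25 scores: {one_year:.3f}")
print(f"spread of estimate with 75 scores: {three_years:.3f}")
```

Under these assumptions the spread falls roughly in proportion to one over the square root of the number of scores, so tripling the pool narrows the error but does not remove it.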

    They also point to the lack of transparency in the model, which is ironic given that it demands more transparency from administrators and teachers.

    They continue:

    It will not provide the same objective data on the quality of teachers, schools and districts. As Weil notes, “That doesn’t reflect how individuals do. It’s measuring this year’s apples with last year’s oranges.”


    Dale Ballou of Vanderbilt concurs. He thinks that value-added modeling in general—Sanders’ methods in particular, which he has studied—may inaccurately credit or blame teachers. As he explains, the value-added model dictates that if a particular student “does any better than you’d expect him to do based on his averages, you attribute that to the teacher. If he did worse, you also attribute it to the teacher, as well.” But other factors could cause a drop in a student’s achievement scores and that uncertainty, says Ballou, is value-added’s “Achilles heel.” He says, “This is the kind of thing teachers are worried about. What if they just get a class that’s going to be really tough to teach? There’s a lot of ‘luck of the draw’ in what kind of class a teacher is given.”
    The likelihood of value-added assessment errors greatly increases with more complex curricula in higher grades, adds Daniel Koretz of Harvard. Furthermore, he points out, teachers who are effective in one classroom situation may not be in a different one.
    While a number of noted experts agree that value-added modeling is most effective in pinpointing the most and least effective teachers, others suggest that “there’s a lot of noise in the middle.” Nevertheless, the Rand report agreed with the main conclusions of value-added research: that teachers do have an effect on student performance and that several good—or bad—teachers are likely to make a significant difference. The report concluded, however, that the difference cannot be accurately quantified using the data and methods currently available.
    Koretz, one of the report’s co-authors, believes that value-added modeling can provide “valuable clues” and “vivid descriptions of what kids are learning.” But on the whole, he feels, it should be taken with the proverbial grain of salt.
    According to Stephen W. Raudenbush, professor of sociology and chair of the Committee on Education at the University of Chicago, the best use of value-added modeling at present is in evaluating schools and districts, not teachers. At the teacher level, the statistical errors will be more pronounced, he believes, and there are simply too many variables.
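Ballou's description above (any gap between a student's expected and actual score is credited to or blamed on the teacher) can be sketched in a few lines. This is a deliberately simplified stand-in, not Sanders' actual model, which is far more elaborate. The `value_added` function and all of the scores are hypothetical.

```python
from statistics import fmean

def value_added(students):
    """Average of (actual - expected) over a class; the entire gap is attributed to the teacher."""
    return fmean(actual - expected for expected, actual in students)

# Each pair is (expected score from the student's prior averages, actual score this year).
# All numbers are invented for illustration.
class_a = [(70, 74), (80, 83), (65, 70)]
class_b = [(70, 66), (80, 79), (65, 61)]  # a class that had a tough year for outside reasons

print(f"teacher A: {value_added(class_a):+.1f}")
print(f"teacher B: {value_added(class_b):+.1f}")
```

Nothing in this calculation can tell whether teacher B taught worse or simply drew a harder class, which is the "luck of the draw" concern Ballou raises.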

    Pamela Carter's research points to teachers' compassion as key (something that may be diminished or ignored by a value-added model that focuses on simple tests):

    Pamala Carter, a former Hamilton County, Tenn, teacher, now a doctoral candidate at the University of North Carolina in Chapel Hill, is doing research into teaching techniques. Using value-added data provided by Sanders, she is videotaping and studying the methods of teachers in Chattanooga/Hamilton County who have been identified as especially effective. However, it is easier to find these teachers than to figure out what they are doing. “Often they don’t know themselves,” Carter remarks. Her findings so far indicate that the best teachers are adaptable, very well organized and have high expectations for their students. They know their material so thoroughly that they can easily teach different levels of students. They are flexible enough to use different teaching strategies—to do what is needed to get the material across. And, says Carter, “They’re caring, compassionate, love children and love working with children.”

  2. compassioninpolitics / Sep 7 2011 12:23 am

    The recent proliferation of value-added modeling approaches provides an appealing alternative, but often at the cost of great statistical complexity and misguided causal inferences (Braun, 2005; Briggs & Wiley, 2008; Raudenbush, 2004; Rubin, Stuart & Zanutto, 2004).
