My fellow blogger, John Thompson, just posted a piece that looks at the new Mathematica study on teacher transfer incentives, and he begins with this paragraph, to which I would like to respond:
Maybe I’m naïve, but I found it hopeful when economist Dan Goldhaber told the conservative American Enterprise Institute that value-added models work at the elementary level, at least in comparison with other ways of evaluating teachers. But he cited evidence that value-added might not work quite so well at the high school level. He concluded that less emphasis would have been placed on the value-added of individual teachers if research had focused on high schools rather than elementary schools. I am far from convinced that value-added evaluations make sense even in the early years, but I have to believe that most reformers will see the folly of test-driven evaluations in middle and high school, and that they will back off from the single silliest policy idea during this age of reform.
First off, I don't think AEI would have invited Dan, who, in his spare time, is an adjunct scholar at AEI (no pool access?), to present his paper if he did not have some meaty bones to throw to the CorpEd-hungry hounds over there. His paper is full of them, from speculation on potential efficiencies from more virtual learning (interesting term) to speculation on what different results could have occurred if researchers on meritless merit pay schemes had designed their research differently. If, if, if.
More troubling is Goldhaber's contention that VAM works well (whatever that means) in elementary schools, "at least in comparison to other ways of evaluating teachers." Which other ways is Dan alluding to? Where are the citations, or is he pulling this claim out of his, um, hat? You will find a whole universe of scholars out there who have a different claim about VAM at the elementary level, and unlike Dan at AEI, they all provide evidence to back up their claims.
One of the most egregious shortcomings of VAM for educational purposes is that 1 in 4 teachers is mislabeled when using three years of data. It would take ten years of data, in fact, to get the error rate down to 12 percent (Schochet & Chiang, 2010). In my estimation, with 25 percent of teachers mislabeled each year, that error rate falls well short of "working well."
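To see why piling up more years of data buys accuracy so slowly, here is a rough back-of-the-envelope simulation of my own (a toy sketch with made-up signal and noise values, not Schochet and Chiang's actual model), in which each teacher's yearly value-added estimate is her true effect plus noise, and the bottom quartile of estimates gets flagged:

```python
# Toy Monte Carlo: how often do we flag the "wrong" teachers as low-performing?
# The signal/noise values below are invented for illustration; they are not
# taken from Schochet & Chiang (2010).
import numpy as np

rng = np.random.default_rng(0)
n_teachers = 10_000
true_sd = 1.0      # spread of teachers' true effects (assumed)
noise_sd = 2.0     # year-to-year noise in each yearly estimate (assumed)

true_effect = rng.normal(0.0, true_sd, n_teachers)
truly_low = true_effect <= np.quantile(true_effect, 0.25)    # genuinely bottom quartile

for years in (1, 3, 10):
    yearly = true_effect[:, None] + rng.normal(0.0, noise_sd, (n_teachers, years))
    estimate = yearly.mean(axis=1)                            # average over available years
    flagged = estimate <= np.quantile(estimate, 0.25)         # bottom quartile of estimates
    mislabeled = np.mean(~truly_low[flagged])                 # flagged but not truly low
    print(f"{years:>2} year(s) of data: {mislabeled:.0%} of flagged teachers mislabeled")
```

The exact percentages the sketch prints depend entirely on the signal-to-noise ratio you assume, so don't read them as a replication of the published error rates; the point is only that averaging over more years shrinks the noise roughly like 1/√years, which is why the error rate creeps down so grudgingly.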
The National Academy of Sciences (2009), along with the vast majority of other assessment experts, knows that VAM is not ready for prime time, and that using VAM for high-stakes purposes in education settings should be avoided entirely, notwithstanding what the gasbags may conclude in the sauna room over at AEI. And in case Dan hasn't noticed, teacher evaluation is pretty high stakes--outside the think tanks, that is.
As for Dan's claim that VAM does "well" in comparison to "other ways of evaluating teachers," I refer him to the initial research done by ag school statistician and VAM guru, Bill Sanders, in 1983. If Dan hasn't read that study, it is understandable, since it was never published. It was used, nonetheless, as the rationale for writing the Sanders VAM methodology into state statute a few years later.
Anyway, McLean and Sanders (1983) were happy to note back then that the teachers identified by their algorithm as high performing in that initial study were the same ones that school administrators had identified as high performing in their own evaluations: "The supervisors were in 100% agreement with the top ten teachers selected from each grade level and 90% in agreement with the ten that ranked at the bottom" (Horn & Wilburn, 2013, p. 74).
Given this fact, it remains as good a question today as it was 30 years ago: "why such an extensive and expensive system may be necessary if local administrators could arrive at the same conclusions regarding teacher quality with equal veracity at much less pain and expense" (Horn & Wilburn, 2013, p. 74).
Lastly, I do not see any reason to believe, as my friend John does, that "reformers will see the folly of test-driven evaluations." I think all the evidence points to the contrary, in fact.
I remember a conversation I had at Monmouth University in 2006 with Nel Noddings, who recoiled at my contention that corporate education reform was sacrificing students and teachers to achieve an ideological and economic agenda. "I just cannot believe that," she said.
Oh, well--read it and weep.
References
Horn, J., & Wilburn, D. (2013). The mismeasure of education. Charlotte, NC: Information Age Publishing.
McLean, R. A., & Sanders, W. L. (1983). Objective component of teacher evaluation: A feasibility study. Unpublished manuscript.
National Academy of Sciences. (2009). Letter report to the U.S. Department of Education on the Race to the Top fund. Washington, DC: National Academies of Sciences. Retrieved from http://www.nap.edu/catalog.php?record_id=12780
Schochet, P. Z., & Chiang, H. S. (2010). Error rates in measuring teacher and school performance based on student test score gains (NCEE 2010-4004). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education.
OK, maybe I'm naive.
Seriously, the question is whether value-added evals end with a bang or a whimper. Were I to place a bet, I'd predict that most systems will do what they do best - manufacture numbers so they don't have teacherless classrooms. They settle enough scores and move on. The mendacity will worsen school cultures.
And, non-educators will blame us even more.
Once corporate charters and TFA have replaced public schools and teachers, we will no longer need tests.
The key is when some behavior has statistical validity (thus "works") and we fail to see that by working that behavior does human damage. Thus the trap of trying to do VAM "right" whilst ignoring the damage it will/does do to children, teachers, learning, and teaching...
Sanders, et al, set out from the beginning with the belief that corn, peas, and children are equivalent in terms of assessing growth and that statistical machinations could smooth the jags in the data. Which it can, but it cannot smooth the error rate in terms of farmer, er, teacher effect.
Why won't someone call this what it is? Brainwashing!