Friday, December 06, 2013

Grasping at Straws and the Impending Value-Added Trainwreck

We are just beginning to implement value-added evaluations, but it is already becoming clear that a trainwreck is coming. Already, we are reading stories about good teachers being fired for failing to meet their test score growth targets. We are also reading about surrealistic cases, such as the New York Teacher of the Year who earned 60 of 60 points on the observation component of her evaluation, but only 6 of 20 points on its made-up quantitative component. Tom Kane’s “Presumed Averageness” makes it seem like he is already grasping at straws in defending his contribution to the fiasco.

Kane posits a heart attack victim on the way to the hospital. Should he go to Hospital A, with a 75% mortality rate for heart attack patients, or Hospital B, where the mortality rate is 20%? Of course, he is correct that the patient would not ask a social scientist friend to help decide in that particular case. Social scientists make evaluations based on the rules of classical hypothesis testing, which require tests of statistical significance, and they are slowed down by the burden of proof. The circumstances Kane cites in this metaphor do not apply in regard to school reform. And, as Kane knows, different medical institutions serve diverse populations, making it difficult to determine how much of their outcomes are determined by the variety of preexisting conditions they encounter.

An intellectually honest analogy would provide a better illustration of the value of social science methodology versus Kane’s utilitarianism. Of course, society would demand that health care reformers accept the burden of proof before closing Hospital B and firing its doctors. Respected medical professionals, as opposed to disposable teachers, would make sure that data would be used in a more sophisticated manner before laws were passed requiring the mass firings of cardiologists. Civil rights activists would demand proof that data-driven hospital closures would not have a disparate impact in poor communities, leaving them without access to essential services. It shouldn’t be that hard to explain to reformers that this data-driven accountability would result in services being denied to unhealthier populations, because treating them would drag down a hospital’s measured outcomes. This would likely produce an overall decline in well-being, with most harm being inflicted on poor people of color.

Kane then attacks the “hypothesis testing paradigm” (the scientific method?) as it applies to education. He then seems to deny that economists like him have the burden of proving that their preferred solutions would do more good than harm. He presents two ways of looking at teacher tenure. The first approach would be “only deny tenure when that presumption was beyond a reasonable doubt.” He calls it the “average until proven below average” formulation. Under this scenario, Kane concludes, only 1% of teachers would be denied tenure.

Kane then proposes the “better than an average novice” paradigm where “we would turn down any teacher with predicted effectiveness less than the average novice teacher.” By “turn down,” I presume, he means fire.

His characterization of the first scenario might not be absurd if we ignore the definition of tenure. Tenure, however, is due process. It is the teachers’ protection against arbitrary misuse of power. It is our buffer against the “Fire, Aim, Ready” nature of ill-conceived hypotheses that are repeatedly imposed on our powerless profession.

Moreover, he proposes a radical, technocratic solution when there are plenty of alternatives that make far more sense. Rather than create a grand and dangerous experiment, we could merely create a capacity where schools could fire bad teachers for being bad teachers. There is nothing hard about citing an employee’s bad behavior as proof he should be terminated. There is no need to socially engineer a system for mass firings that is guaranteed to pollute educational values and incentivize teach-to-the-test malpractice (and create huge amounts of busy work making up bogus metrics for the majority of teachers of untested subjects).

What would happen to inner city education if we rejected the rules of evidence, jumped to the conclusion, and adopted Kane’s opinions about school improvement? He would “turn down” (dismiss) the bottom 25% of teachers and replace them with novices. He gives no evidence that we could find enough rookies to replace so many terminated teachers. Does Kane believe that urban education would be in such a sorry state if there were a surplus of talent willing to put up with the conditions that we face in the inner city?

Kane seems to assume that the bottom quarter of teachers, identified by his metrics, would actually be the least effective 25%. Actually, Kane’s value-added is systematically biased against high poverty schools, especially secondary schools with high percentages of English Language Learners, students on special education IEPs and low-income students. This means that a much higher percentage of inner city teachers would have to be replaced. Kane gives no evidence that this would be possible. On the contrary, his approach makes sense only if we assume that the imposition of collective punishment, i.e., firing teachers because they chose to teach in schools where it is harder to raise test scores, would make the teaching profession more – not less – attractive.

Similarly, Kane would fire inner city teachers who are less effective than the average of all teachers. If he is serious, I would think Kane would propose an apples to apples approach and fire teachers who were less effective than novices in comparable schools. If he were to do so, however, Kane would have to learn far more about schools than he seems willing to consider. Would he expect neighborhood school teachers to increase test scores as much as teachers in No Excuses schools that drive out low-performing students who don’t meet their behavioral standards?

Kane argues that the hypothesis testing paradigm, or the retrograde idea that systems should have proof before taking punitive actions against teachers, would save the jobs of the 14% who “began” as below average performers. (Of course, Kane has no idea of how many of those teachers were always above average, but his metrics began by erroneously categorizing them.) He claims that it damages the students of the 86% “who were below average.” Again, he should have been more careful with his wording. He should have acknowledged that he was just estimating the effectiveness of those teachers and he was doing so with a statistical model that cannot determine whether those individuals were ineffective, or whether they were merely guilty of teaching in ineffective schools. He doesn’t consider the harm that would be done to many more students if (when?) his radical experiment fails.

Whether he understands it or not, the corollary effects of his proposal would be far more damaging in the inner city. By now, it should be clear even to the most ideological of non-educators that test-driven accountability has disproportionately damaged poor schools. Fear is the natural result of demanding the impossible of inner city administrators. When they are required to raise student performance as much as their competitors in schools where it is easier to increase test scores, under-the-gun administrators are more likely to impose primitive teach-to-the-test basic skills instruction. This would make it even harder for good teachers in high-poverty schools to produce above-average outcomes and avoid termination.

Moreover, Kane is proposing the firing of teachers who are below average. How could any policy advocate propose something as grandiose as that? Even the disgusting “stacking” corporate approach, where the bottom line is profits and not improving the lives of children, doesn’t contemplate something that extreme. Even Eric Hanushek only proposes the firing of the bottom 5 to 10%. Would Kane propose such a standard for firing the bottom 25% of doctors? Of course not. Who would expend the time and money to become a physician if he had an x% chance of his career being unfairly destroyed?

Kane should ask himself the key policy question. Would he make a commitment to teaching in the inner city if he had a y% chance, per year, of having his career unfairly ended by the model he proposes? Since we have no idea of how many careers would be unjustifiably destroyed by his hypothesis, perhaps we should borrow Kane’s 14% for a rough estimate. Would he start a career, buy a house, start a family and commit to a school where his value-added will always be lower, knowing he would have about a one-in-seven chance of being unfairly terminated? If he objects to using that figure (after all, everyone is just guesstimating “effectiveness”) would he accept a 5% chance, per year, or a 15% chance, or a 10% chance (depending upon the conditions of his school) of being falsely indicted as ineffective?

Above all, Kane seems oblivious to what it actually takes to improve schools serving neighborhoods with intense concentrations of generational poverty and trauma. The key to school improvement is trusting relationships. What sort of collegiality could survive in schools where everyone would have a colleague, widely known to be a good teacher, who was fighting for his career due to being inaccurately identified as a poor performer? Now, think of the resulting culture where everyone has to compete to be identified as being above-average. The most likely scenario would be that a dog-eat-dog culture would result.

I would not be surprised if many corporate reformers would prefer to be equally abrupt in closing hospitals and firing doctors, or whomever they choose to micromanage. But people value their health care services too much to leap ahead so brazenly. It is only in education where the big boys' opinions immediately become law, without even a brief discussion of the human costs. Reformers should consider the idea that a ruthless shoot-from-the-hip approach might or might not be an effective corporate policy for maximizing profits. We need a far higher burden of proof for the hypothesis that it is a good tactic for educating children.

Comments:

  1. Top notch work here. Thanks, John. When Kane signed on to add another floor to this house of cards, he knew there is no scientific foundation underneath VAM--he just believed everyone would be as stupid, craven, and/or as self-serving as Tennessee politicians since 1992. Kane's only defense now is to attack science for not validating the duplicity that he and other corporate ed policy wonks continue to engage in. The thing that worries me most: what crisis will they now create to draw attention away from their crimes?

  2. www.tulsakids.com 3:08 PM

    Excellent post. Too many policy-makers (and the public) make the erroneous cause/effect assumption that bad test scores = bad teaching & bad schools; good test scores = good teaching & good schools without looking at the myriad influences behind the scores. Those influences range from poverty to drug use to lack of language skills to low cognitive ability, all of which are out of the control of the teacher or the school administration.

  3. Thanks for this. Thomas J. Kane, the lead economist hired by the Gates Foundation to research "Measures of Effective Teaching" for about $64 million, has learned nothing. The resounding critique of this work by peers should be widely circulated. See Rothstein, J. & Mathis, W. J. (2013). Have we identified effective teachers? Culminating findings from the Measures of Effective Teaching Project. (Review). Boulder, CO: National Education Policy Center.
    Kane's faith in "stacked ranking" and firing based on a truncated definition of "effectiveness"--producing above average gains in scores on standardized statewide tests, annually--ignores the fact that about 70% of teachers have job assignments for which there are not statewide test scores. Such tests are hyped as if they are some sort of gold standard for judging the work of teachers. This is nonsense, unless you cannot tell the difference between education and teaching this generation to fill in the bubble on tests where the questions are asked by anonymous others, the answers are known to a computer, and the test is designed from the get-go to make a big profit in an unregulated industry.
    Claims that test scores are "objective" measures of learning are lies. Kane should find out why Microsoft and other big companies are abandoning "stacked rating" systems. Kane is as clueless about education as Bill Gates.
