That’s the buzz, anyway. You’ve probably read about the latest jeremiad on higher education: Academically Adrift. Among its findings is the claim that 45 percent of students show no academic improvement during the first two years of their undergraduate careers. People are taking exception to the report’s findings and methodology. At the Chronicle of Higher Education, Alexander Astin has a particularly rigorous critique:
[The] method used to determine whether a student’s sophomore score was “significantly” better than his or her freshman score is ill suited to the researchers’ conclusion. The authors compared the difference between the two scores—how much improvement a student showed—with something called the “standard error of the difference” between his or her two scores. If the improvement was at least roughly twice as large as the standard error (specifically, at least 1.96 times larger, which corresponds to the “.05 level of confidence”), they concluded that the student “improved.” By that standard, 55 percent of the students showed “significant” improvement—which led, erroneously, to the assertion that 45 percent of the students showed no improvement.
The first thing to realize is that, for the purposes of determining how many students failed to learn, the yardstick of “significance” used here—the .05 level of confidence—is utterly arbitrary. Such tests are supposed to control for what statisticians call “Type I errors,” the type you commit when you conclude that there is a real difference, when in fact there is not. But they cannot be used to prove that a student’s score did not improve.
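The classification rule Astin describes can be sketched in a few lines. This is a hypothetical illustration, not the authors’ actual code, and the scores and standard error below are made-up numbers:

```python
# Sketch of the rule described in the quote above: a student counts as
# "improved" only when the gain is at least 1.96 times the standard error
# of the difference, i.e. when a .05-level test rejects "no change".
# All numbers here are hypothetical.

def labeled_improved(freshman_score: float,
                     sophomore_score: float,
                     se_difference: float) -> bool:
    """Return True if the gain clears the 1.96 * SE threshold."""
    gain = sophomore_score - freshman_score
    return gain >= 1.96 * se_difference

# A 30-point gain against a standard error of 20 fails the test
# (30 < 39.2) -- but failing the test does not show the student
# learned nothing; it only fails to rule out chance.
print(labeled_improved(1100, 1130, 20))  # False: gain below threshold
print(labeled_improved(1100, 1180, 20))  # True: gain clears threshold
```

The asymmetry is visible in the first case: a genuine but modest gain is lumped in with “no improvement” simply because it cannot be distinguished from measurement noise at the .05 level.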
I haven’t read the study yet, but if this is how the authors used the “standard error” test, it does raise some questions. I don’t have a Ph.D. in statistics, but as I recall, the test used here is best suited to determining whether a measure of central tendency (e.g., a mean score) differs from one sample to another.
As Astin points out in the quote above, the test is supposed to indicate whether the difference in sample means is just random fluctuation in the measurement of those samples or reflects a meaningful difference between the two samples. Applying it to two individual test scores from the same student is just asking for trouble. You can read Astin’s full Chronicle piece here.
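For contrast, here is the kind of comparison the test is actually built for: two samples, a standard error of the difference between their means, and a z statistic checked against 1.96. The scores below are invented for illustration:

```python
# The test is meant for comparing sample means, not single scores.
# Hypothetical sketch: compare the mean scores of two small samples
# using the standard error of the difference between means.

import math
import statistics

def z_for_mean_difference(sample_a, sample_b):
    """z statistic for the difference between two sample means."""
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    var_a = statistics.variance(sample_a)   # sample variance (n - 1)
    var_b = statistics.variance(sample_b)
    se_diff = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_b - mean_a) / se_diff

freshman_scores = [1050, 1100, 1120, 1080, 1150, 1090]   # made-up data
sophomore_scores = [1110, 1160, 1170, 1130, 1200, 1140]  # made-up data

z = z_for_mean_difference(freshman_scores, sophomore_scores)
print(f"z = {z:.2f}")  # |z| > 1.96 suggests a real difference in means
```

With samples, the standard error shrinks as n grows, so the test gains power to detect real differences. With a single student’s two scores there is no sample size to lean on, which is exactly why the approach breaks down.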