27 December 2006

The Myth of Student Evaluations

Student evaluations of college faculty are a standard tool for rating professors in the promotion & tenure process. Most professors realize the limitations of this tool, but of course some administrators see it as a simple and objective measure of "effective teaching." Many administrators like quantitative measures -- i.e., numerical scores -- because they make it easy to compare one professor with another. We forgive them for this, for after all they are administrators.

Unfortunately, student evaluations -- especially numerical evaluations -- are highly flawed as a measure of effective teaching. These flaws contribute to the decline of American higher education.

William Rundell's study, "On the Use of Numerically Scored Student Evaluations of Faculty," is instructive in this regard. Rundell examined student evaluations of calculus professors at Texas A&M. Students used a 5-point scale to agree or disagree with statements about their professors, such as "The instructor seemed to be well-prepared for class" and "The instructor genuinely tried to help the students learn the material and showed concern."

This was a thorough study. Rundell looked at evaluations from a 4-course sequence. He looked at evaluations from tens of thousands of students in hundreds of sections.

Rundell found two amazing correlations in the data:

(1) The higher the numerical rating for a particular professor, the higher the grades awarded to the students. The conclusion here seems to be that students reward professors who "give" them high grades. I.e., it's quid pro quo: good grades = good evaluations.

(2) The higher the numerical rating for a particular professor (and thus the higher the grades), the lower the grades those same students earned in subsequent courses in the sequence. The conclusion here seems to be that highly rated professors do a worse job of preparing their students for future courses than lower-rated professors do. I.e., good evaluations (and the good grades that go with them) = poor preparation for future courses (i.e., lower subsequent grades).
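Any department could check for Rundell's first correlation on its own data with a few lines of code. The sketch below is not Rundell's actual method, and the per-section numbers are hypothetical illustrative values, but it shows the basic computation: a Pearson correlation between each section's mean evaluation score and its mean grade.

```python
from statistics import mean, stdev

def pearson(xs, ys):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (stdev(xs) * stdev(ys))

# Hypothetical per-section data: mean evaluation score (1-5 scale)
# and mean grade awarded (4.0 scale) -- NOT Rundell's numbers.
eval_scores = [3.1, 3.5, 3.8, 4.2, 4.6]
mean_grades = [2.4, 2.6, 2.9, 3.3, 3.6]

print(round(pearson(eval_scores, mean_grades), 2))  # strongly positive here
```

A value near +1 would suggest the grades-for-ratings pattern Rundell reported; testing the second correlation would mean correlating a section's rating against its students' grades one course later in the sequence.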

Wow! So college administrators who rely heavily on student evaluations for the promotion & tenure of faculty encourage: (1) grade inflation; and (2) poor teaching.

No wonder many (if not most) college professors are suspicious of numerical student evaluations. We all want to be liked, and even the tenured full professors among us are likely to inflate grades and "dumb it down" if we are rewarded with higher evaluations. For the untenured among us, well, what would you do if your job depended on it?

After reading Rundell's study several years ago, I wondered how easily student evaluations could be manipulated. I could not repeat Rundell's study, but I could test another idea I had heard about. I performed a little test in two sections of a summer course. In section A, I repeated several phrases from the student evaluation each day. I worked through these phrases in order, so that by the end of the semester students had heard me repeat each of the phrases several times. For example, on a given day I would say "I always try to be well-prepared for class" and "I am concerned about your performance in this class and I genuinely want to help you." In section B, I taught everything the same way except I did not repeat the phrases from the evaluation form.

The result? At the end of the semester, students in section A gave me an average score (on a 1-to-5 scale) that was considerably higher than section B (4.5 vs. 3.8). Wow! Just by repeating some set phrases, I could let my administrators know that I had become a better teacher.
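Whether a 4.5-vs-3.8 gap on a five-point scale means anything depends on how many students responded and how spread out their scores were. As a minimal sketch of how one might check, here is Welch's t-statistic computed on two hypothetical score lists (invented for illustration, not my actual data) whose means match the two sections:

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic for two independent samples with unequal variances."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Hypothetical evaluation scores (1-5 scale) -- invented, means 4.5 and 3.8.
section_a = [5, 5, 4, 5, 4, 5, 4, 4, 5, 4]  # phrases repeated daily
section_b = [4, 4, 3, 4, 4, 4, 4, 4, 4, 3]  # control section

print(round(welch_t(section_a, section_b), 2))
```

For these toy numbers the statistic comes out around 3.3, which with roughly 18 degrees of freedom would be significant at conventional levels; with only a handful of respondents per section, the same mean gap could easily be noise.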

This said, I like student evaluations and will continue to use them. In part, I will do so merely because administrators require it. More importantly, however, in addition to the mandatory quantitative forms I also use qualitative questions (e.g. "What can the instructor do to improve your learning in this course?"). Thoughtful students -- whether they like or dislike me and my style of teaching -- answer such questions with genuinely useful information, such as asking for review sessions, or complaining about class discussions where students drift off-topic.

It is sad that some administrators use numerical student evaluations as a major measure of teaching effectiveness. I know some professors who have been denied promotion and/or tenure because their numbers "did not look good."

In one case, an administrator did not like a "dumbbell distribution" (i.e., a bimodal distribution). The professor's average evaluation score was fine, but it was split between students who rated the professor very highly and those who rated the professor quite low. The professor's explanation that they were simply trying to prepare students for a subsequent course went unheeded, never mind the implicit conclusion that perhaps many of the students had been ill-prepared for the current course.

In another case, an administrator did not like it that a large proportion of students had dropped the professor's course. In that case, it seems the professor was being rated on the basis of negative evidence. Without knowing why students dropped the course (maybe it was scheduled at 8 am!), it was a big leap to say they dropped because of ineffective teaching. Never mind that this same administrator had tried their hand at teaching, and there had been a mass exodus of students from their section to another professor's section! Well, I suppose we are most critical of others when we see in them faults that we cannot recognize in ourselves. Maybe that's why some professors become administrators in the first place?

See Rundell http://www.math.tamu.edu/~william.rundell/teaching_evaluations/article.html and (for a brief summary) see http://lohman.tamu.edu/summarypapersrewards/rundell1.htm.

2 comments:

Anonymous said...

Pat, your research and knowledge of this topic are pretty interesting. I suppose you have seen ratemyprofessors.com ? If you haven't it's worth a look at least once. Only those profs who have been rated are listed.

Pat Munday said...

Thank you, Anonymouse. I just tried to log in but could not--maybe too many of my students are logged in and rating me! Seriously, an online site such as RateMyProfessor.com would be even LESS reliable than classroom student evaluations: only a tiny proportion of students are represented; it is possible for individual students (enchanted or disenchanted) to make multiple entries; and the questions asked are not necessarily relevant to what students learn or the quality of teaching.

While perhaps an amusing diversion, there's even less to be learned through RMP than from quantitative classroom evals.