Student evaluations of teaching are biased and unreliable

Universities should rethink how they use student evaluations of teaching because of their bias towards male instructors, argue Anne Boring, Kellie Ottoboni and Philip B. Stark

October 3, 2018

Many universities rely heavily or exclusively on student evaluations of teaching (SET) for hiring, promoting and firing instructors. After all, who experiences teaching more directly than students? But to what extent do SET measure what universities expect them to measure: teaching effectiveness?

To answer this question, we analyse data from a natural experiment at a French university (based on work by Anne Boring) and a randomised, controlled, blind experiment in the US (based on work by Lillian MacNell, Adam Driscoll and Andrea N. Hunt). We confirm and extend the studies' main conclusion: student evaluations of teaching are strongly associated with the gender of the instructor. Female instructors receive lower scores than male instructors. SET are also significantly correlated with students' grade expectations: students who expect to get higher grades give higher SET, on average. But SET are not strongly associated with learning outcomes.

Other studies have found little difference between average SET for male and female instructors, but the design of those studies has serious flaws. Not only are they observational studies rather than experiments, they also ask the wrong question, namely, "do male and female instructors get similar SET?". A better question is, "would female instructors get higher SET but for the mere fact that they are women?". These unique data sets let us answer that question: yes.

The French data

Since effective teaching should promote student learning, students of more effective instructors should have better learning outcomes on average. Students in different sections of each course, taught by different instructors, take the same final exam, allowing us to compare learning outcomes. We find that SET are, at best, weakly associated with student performance.

Figure 1. Average correlation between SET and final exam score, by subject

Note: p-values are one-sided, since, if SET measured teaching effectiveness, mean SET should be positively associated with mean final exam scores. Correlations are computed for course-level averages of SET and final exam score within years, then averaged across years. *** p<0.01, * p<0.1
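The averaging described in the note can be sketched in a few lines of Python. This is only an illustration of the idea, not the analysis code behind Figure 1: the record layout `(year, section, set_score, exam_score)`, the function names and the choice of the section as the unit of averaging are all assumptions made for the example.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def average_yearly_correlation(records):
    """Average, across years, of the within-year correlation between
    mean SET and mean final exam score.

    records: iterable of (year, section, set_score, exam_score) tuples,
    one per student (a hypothetical layout chosen for this sketch).
    """
    # Group the student-level rows by year, then by section.
    by_year = defaultdict(lambda: defaultdict(list))
    for year, section, set_score, exam_score in records:
        by_year[year][section].append((set_score, exam_score))

    yearly = []
    for sections in by_year.values():
        # One (mean SET, mean exam score) pair per section in this year.
        mean_set = [sum(s for s, _ in rows) / len(rows) for rows in sections.values()]
        mean_exam = [sum(e for _, e in rows) / len(rows) for rows in sections.values()]
        yearly.append(pearson(mean_set, mean_exam))

    # Average the within-year correlations across years.
    return sum(yearly) / len(yearly)
```

Working with group means rather than individual students matters here: the question is whether instructors with higher average SET have students who do better on average, not whether happier students individually score higher.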

On the other hand, SET correlate significantly with instructor gender (male students gave higher SET to male instructors, Figure 2) and with students' expected grades. This adds evidence to the hypothesis that instead of promoting better teaching, SET . We find no evidence that male teachers are more effective than female teachers. If anything, students of male instructors perform worse on the final exam.

Figure 2. Average correlation between SET and gender concordance

Note: p-values are two-sided. *** p<0.01, ** p<0.05, * p<0.1

The US data

Lillian MacNell, Adam Driscoll and Andrea Hunt collected data from four online sections of a course, two taught by a male instructor and two by a female instructor. Students were assigned randomly to the four sections. The male instructor taught one section using his own identity and switched identities with the female instructor for the other section, and vice versa.

This lets us see how believing that an instructor is male or female affects SET for the very same instructor. We confirm the original authors' main finding that students generally rate perceived female instructors lower in several dimensions of teaching.

Even on measures one would expect to be objective, ratings were lower for perceived female instructors. For instance, graded assignments were returned simultaneously in all four sections, but students reported that the perceived female instructor was less prompt in returning assignments.

Figure 3. Difference in mean ratings and reported instructor gender (male minus female)

Note: The scale is 1-5 points, so a difference of 0.8 is 20% of the full range. p-values are two-sided. *** p<0.01, * p<0.1
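One common way to attach a two-sided p-value to a difference in mean ratings from a randomised experiment is a permutation test, sketched below. This is an assumption made for illustration, not necessarily the procedure behind Figure 3, and the function names and any ratings passed in are hypothetical. The idea is that, if perceived gender had no effect, relabelling which ratings went to the "perceived male" and "perceived female" sections should produce differences as large as the observed one fairly often.

```python
import random


def mean(values):
    return sum(values) / len(values)


def share_of_range(difference, low=1, high=5):
    """Express a difference in ratings as a fraction of the scale's full range."""
    return difference / (high - low)


def permutation_p_value(perceived_male, perceived_female, n_resamples=10_000, seed=0):
    """Return (observed difference in means, two-sided permutation p-value).

    The difference is male minus female. The p-value is estimated by
    reshuffling the pooled ratings n_resamples times.
    """
    rng = random.Random(seed)
    observed = mean(perceived_male) - mean(perceived_female)
    pooled = list(perceived_male) + list(perceived_female)
    n_male = len(perceived_male)

    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_male]) - mean(pooled[n_male:])
        # Two-sided: count relabellings at least as extreme in either direction.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1

    # Adding 1 to numerator and denominator keeps the estimate above zero.
    return observed, (extreme + 1) / (n_resamples + 1)
```

On a 1-5 scale, `share_of_range(0.8)` gives 0.2, which is the 20 per cent of the full range mentioned in the note.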

In both the French and US data, male instructors got higher SET, but in the US data, female students tended to give higher scores to perceived male instructors, whereas in the French data, male students tended to give higher scores to male instructors.

Figure 4. Difference in mean SET by student gender, for perceived and actual instructor gender (male minus female)

In another study, researchers are finding that female instructors receive lower scores because male students give lower scores to female instructors.

Differences among these studies could be cultural or related to topic, class size, mode of instruction (online versus face-to-face), ethnicity, race, physical attractiveness, or other confounding variables that have been found to affect SET. Clearly, there can be no simple adjustment for the bias.

The French data show that bias varies by course subject, further complicating any attempt to correct for these biases. The only field in which male students do not rate male instructors significantly higher is sociology. This is especially interesting because sociology is the only field in which there was near gender balance among instructors (46.4 per cent female instructors). This could suggest that gender balance in a field affects gender stereotypes and might reduce bias against female instructors.

Why don't universities use better methods? SET are the familiar devil. Habits are hard to change. Alternatives (such as reviewing teaching materials, peer observation and surveying past students) are more expensive and time-consuming, and this cost falls on faculty and administrators rather than on students.

The mere fact that SET are numerical gives them an unearned air of scientific precision and reliability. And reducing the complexity of teaching to a single (albeit meaningless) number makes it possible to compare teachers. This might seem useful to administrators, but it is a gross oversimplification of teaching quality.

Evidence of any connection between SET and teaching effectiveness is murky, whereas the associations between SET and grade expectations and between SET and instructor gender are clear and significant. Because SET are evidently biased against women (and likely against other underrepresented and protected groups) and, worse, do not reliably measure teaching effectiveness, the onus should be on universities either to abandon SET for employment decisions or to prove that their reliance on SET does not have disparate impact.

This blog post is based on a preprint.

Anne Boring is a research fellow at Sciences Po and a research affiliate at Paris Dauphine University. Kellie Ottoboni is a PhD student in the statistics department at the University of California, Berkeley and a fellow at the Berkeley Institute for Data Science. Philip B. Stark is professor of statistics and associate dean of mathematical and physical sciences at the University of California, Berkeley.
