Stars and numbers: the question of evaluation

The question of marking schemes is a recurring topic in academic debates, where the key word is evaluation. In this exam period, a brief reflection on the art of grading.

By: Daniel Jutras

Date: December 20, 2022

There was a little dustup in the cultural pages of La Presse+ a couple of weeks ago. Theatre director René Richard Cyr railed against the rating scale for art critics adopted by the major daily and denounced a “paternalistic,” “laughable,” and “infantilizing” system. And that’s because La Presse+ abandoned its five-star rating grid and instead chose to use a decimal scale. Films, novels, plays, and albums are now rated from 1 to 10. Cyr, who was already not particularly fond of the five-star system, reminds us that his job is to “invent worlds where we want to believe that everything is still possible and where sixteen divided by three equals a thousand suns.” Instead, he invites the critics to describe works “using analysis, intelligence and sensitivity, with words, impressions and ideas.” To hell with rating on a scale of 10!

You’d swear you were at a departmental assembly. The question of grading schemes is indeed a recurring topic in academic debates, where the key word is evaluation: evaluation of exams, assignments, manuscripts for publication, promotion dossiers, programs, teaching and so on.

Let’s look at the evaluation of exams and assignments. The rest would detract from my point.

René Richard Cyr is quite right: all grading grids are reductive. But they still have a meaning, which may vary depending on the recipient. So here’s a first question: What’s the meaning conveyed by the grade? In the academic world, the awarding of a grade, whether it’s a percentage, a letter, or an honour, serves two distinct but related purposes.

A grade is primarily feedback—definitive, monolithic, crude—on the achievement of certain learning objectives. Apart from some specific contexts, the mark evaluates the result rather than the effort (a nuance that is not always well understood). It can be a more or less precise range, from a binary statement (success/failure) to a percentage scale and everything in between (more or fewer letters and more or fewer pluses and minuses).

The other purpose of grading is comparison. The mark given places each “performance” on a scale that situates it in relation to others, with a desirable but variable degree of accuracy and objectivity. In an ideal world, the grading grid would allow each individual to situate themselves in relation to the group—useful information in a learning journey—without having their position on the scale shared with others. Nobody likes bad grades: not artists, not restaurants, not students. But bad grades hurt even more when they are used as the basis for decisions made by people other than the one being evaluated: a potential client or viewer, a graduate school admissions committee or a potential employer.

There’s a clear tension between these two aims of evaluation. Grades and their distribution on a curve provide clear, simple, and immediately usable information for third parties. As a professor, am I accountable to these third parties? Do I have to worry about how they receive and use this information? Conversely, when it is intended for the person being assessed, the grade alone does not provide sufficiently informative feedback. So, in an environment where multiple-choice exams were essentially unheard of and essay questions were consequently commonplace, as a young professor I spent a lot of time constructing grading grids that made it possible to distinguish a B from a B-. It was a waste of time. Students filed into my office to find out more. The conclusion, not surprisingly, was that feedback in “words, impressions and ideas,” to use René Richard Cyr’s words, is more telling when it explains what’s wrong with an essay. But this requires time and resources that aren’t always available, especially if the group consists of dozens of people, all wanting to know exactly why they didn’t do well.

Over the years, I have come to the conclusion that feedback “in words” is more important than marks, though my students didn’t always agree with me. I’m aware that grades have consequences and should not be given carelessly or cavalierly. But I chose to pay greater attention to the close relationship between evaluation and learning: by making sure that I assessed the skills and knowledge actually and explicitly used in my course, by disclosing my evaluation grid and the relative weight given to each element ahead of time, and by giving each person a paper version of this grid, annotated with the contents of their own examination booklet. My group sizes varied, from about 15 people in a seminar to large groups of close to 200 in a required course. Doing this took many hours and days around Christmas, as well as beautiful days in May. I didn’t please everyone, but it was worth the effort. Through trial and error, I usually, though not always, managed to fulfil my fundamental responsibility of explaining their successes and failures to each person I taught.

In responding to René Richard Cyr, some argued that awarding a rating out of 10 was not intended for the artists but rather for all those people looking for a way to choose from among the many shows on offer. This, in my opinion, is where the jobs of professor and theatre critic part ways. Teachers should avoid worrying too much about the other “users” of the information conveyed by transcripts. To those of you who devote many hours to reading, evaluating, and commenting on papers, theses, and other exams, I raise my hat. This will always be the most difficult aspect of your chosen career.

P.S. The play directed by René Richard Cyr received a rating of 8.5 out of 10. If it’s restaged, don’t miss it!

Daniel Jutras

If you’d like to continue the conversation, please drop me a line.
