How the UK's algorithm-based grading fell apart

Direct Centre-level Performance was flawed from the start.

Ben Birchall - PA Images via Getty Images

The coronavirus pandemic has made it impossible for UK students to take any traditional exams. In England, these include GCSEs, the qualifications that mark the end of compulsory school education, as well as AS and A-levels, the latter of which are required for most university degree courses. With countless futures in limbo, the UK government asked Ofqual, the regulator for qualifications and exams in England, to develop an algorithm that could determine people's marks. Thousands of A-level students were given a grade that was lower than their teacher predicted, though, sparking a nationwide backlash and protests on the streets of London. Now, the government has buckled and announced that it's abandoning the formula and giving everyone their predicted grades instead.

Roger Taylor, chair of Ofqual, said the regulator was “extremely sorry” for the “real anguish” it had caused students and the inevitable damage to public trust in the education system. “There was no easy solution to the problem of awarding exam results when no exams have taken place,” he said in a statement.

The backlash to Ofqual’s algorithm was only matched by its complexity. The non-ministerial government department started with a “historical grade distribution” — the percentage of students who achieved each grade — for every institution, broken down by subject. Then, Ofqual looked at how results shift between the qualification in question and students’ previous achievements. (For A-levels, this ‘prior attainment’ would mean GCSE grades.) It used the “relationship” between the two to predict a general range of grades for a “historical” cohort and the current year group at each institution.
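To make that first step concrete, here is a minimal Python sketch of a historical grade distribution for one subject at one school. The function name and the sample results are invented for illustration; this is not Ofqual's code or data.

```python
from collections import Counter

# A-level grades from highest to lowest.
GRADES = ["A*", "A", "B", "C", "D", "E", "U"]

def grade_distribution(past_grades):
    """Return the share of students who achieved each grade."""
    counts = Counter(past_grades)
    total = len(past_grades)
    return {grade: counts[grade] / total for grade in GRADES}

# Hypothetical past results for one subject at one school.
history = ["A", "A", "B", "B", "B", "C", "C", "D", "A*", "B"]
dist = grade_distribution(history)
print(dist)
```

In this made-up history, four of the ten past students achieved a B, so the distribution assigns B a share of 0.4.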

This wasn’t enough, though, because many students didn’t have old grades to reference. Some didn’t sit the relevant exams, while for others it wasn’t possible “to reliably link the student back to their prior-attainment measure,” Ofqual explained in a technical document. Ofqual therefore weighted prior attainment according to the number of students with available data. So if only a few learners had GCSE grades on file, the influence on their year group would be lower, just in case they weren’t representative of everyone else.
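The weighting idea can be sketched as a simple blend. The linear form below is an assumption chosen for illustration, not the formula from Ofqual's technical document, and the names `blend`, `historical` and `adjusted` are hypothetical.

```python
def blend(historical, adjusted, matched, cohort_size):
    """Mix two grade distributions by the share of students with prior data.

    historical: grade shares from the school's past results.
    adjusted:   grade shares after accounting for prior attainment.
    matched:    number of students whose prior grades could be linked.
    """
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    weight = matched / cohort_size  # assumed linear weighting
    return {
        grade: (1 - weight) * historical[grade] + weight * adjusted[grade]
        for grade in historical
    }

# Only 2 of 10 students have GCSE grades on file, so the
# prior-attainment adjustment moves the result only slightly.
result = blend({"A": 0.2, "B": 0.8}, {"A": 0.4, "B": 0.6}, matched=2, cohort_size=10)
print(result)
```

With no matched students the weight is zero and the output is simply the historical distribution, which mirrors the article's point that sparse prior data should carry little influence.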

The algorithm then combined everything — the historical grade distribution, the “relationship” between GCSEs and A-levels, Ofqual’s initial predictions, and the ‘prior attainment’ for this year’s crop of students — to create a rough set of grades for every school and college, broken down by individual subjects.

The students in each class or year group were ranked and roughly matched up with the grade range provided by Ofqual. They were then given marks, even though there were no exams this year, based on whether they were ranked higher or lower than other people who achieved the same grade in their year group. Finally, Ofqual adjusted each subject’s grade boundaries so that the spread of marks was roughly similar to previous academic years.
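The ranking step is where an individual's grade becomes tied to the school's expected distribution. A minimal sketch, assuming a simple cumulative-share rule (the function `assign_by_rank` and the rule itself are illustrative, not Ofqual's implementation):

```python
def assign_by_rank(ranked_students, distribution, grade_order):
    """Give each student a grade based only on rank and the school's shares.

    ranked_students: names ordered from strongest to weakest.
    distribution:    share of the cohort expected at each grade.
    grade_order:     grades from highest to lowest.
    """
    n = len(ranked_students)
    result = {}
    for index, student in enumerate(ranked_students):
        position = (index + 0.5) / n  # midpoint of the student's rank slot
        cumulative = 0.0
        assigned = grade_order[-1]  # fall back to the lowest grade
        for grade in grade_order:
            cumulative += distribution.get(grade, 0.0)
            if position <= cumulative:
                assigned = grade
                break
        result[student] = assigned
    return result

grades = assign_by_rank(
    ["s1", "s2", "s3", "s4"],
    {"A": 0.25, "B": 0.5, "C": 0.25},
    ["A", "B", "C"],
)
print(grades)
```

Note that nothing in this sketch looks at what a teacher predicted for a particular student. If the school's distribution has room for only one A, only the top-ranked student receives it.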

If an institution had fewer than 15 people taking an A-level or GCSE, greater emphasis was placed on the teacher’s predicted grades. As Jeni Tennison, Vice President and Chief Strategy Adviser at the Open Data Institute, explained in a blog post: “As teachers overall tend to overestimate grades, this means overall scores will tend to be higher for small classes.”
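One way to picture the small-class rule is as a weight on teacher predictions that grows as the class shrinks. Only the threshold of 15 comes from the reporting above; the linear taper, the zero weight for larger classes, and the function name are assumptions made for this sketch.

```python
SMALL_COHORT_THRESHOLD = 15  # the article's figure: fewer than 15 entrants

def teacher_weight(cohort_size, threshold=SMALL_COHORT_THRESHOLD):
    """Hypothetical weight in [0, 1] given to teacher-predicted grades."""
    if cohort_size >= threshold:
        # Assumed: larger classes rely on the statistical model alone.
        return 0.0
    # Assumed linear taper: the smaller the class, the larger the weight.
    return (threshold - cohort_size) / threshold

for size in (5, 10, 30):
    print(size, teacher_weight(size))
```

Under these assumptions a class of five leans mostly on the teacher, while a class of 30 does not use the teacher's prediction at all, which is the asymmetry Tennison describes.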


Ofqual explained in a technical document that this approach, dubbed Direct Centre-level Performance (DCP), assumed “that a center will perform the same in a subject this year as they have across recent years.” It was also designed to take into account “any changes in underlying ability of students.” The hope was that the system would respect the talent of the learners who were unable to take their exams, while simultaneously delivering results that didn’t seem out of the ordinary.

Once the guidance was published, though, concerns were raised. Huy Duong, a parent with a PhD in physics, worked with his sister, a statistician at the Medical Research Council, to calculate how the algorithm would affect his son’s school. They shared these findings with the Guardian, which reported that nearly 40% of A-level grades would be downgraded in England. Ofqual was sure, however, that they would be “broadly in line with previous years” and, if anything, slightly higher than those recorded last year. “We will make sure there isn’t any significant change in year on year results which would undermine the value of the qualifications,” Ofqual explained in its guidance document.

Students protest the algorithm-generated exam results outside the Northern Ireland Education Authority's main building in Belfast. (Liam McBurney - PA Images via Getty Images)

The number of downgrades wasn’t the only problem, though. The reliance on historical data meant that students were partly shackled by the grades awarded to previous year groups. They were also at a disadvantage if they went to a larger school, because their teacher’s predicted grade carried less weight. At a time when society is examining how technology is reinforcing its race and class issues, many realized that the system, regardless of Ofqual’s intentions, had a systemic bias that would reward learners who went to private institutions and penalize poorer students who attended larger schools and colleges across the UK.

The government could sense the growing hostility toward the algorithm. On August 12th, the day before the A-level results were published, education secretary Gavin Williamson said students could choose their mock result or sit the relevant exam in the fall if they weren’t happy with their algorithm-decided grades. “This triple lock system will help provide reassurance to students and ensure they are able to progress with the next stage of their lives,” he promised.

It wasn’t enough, though. As the Guardian reports, 35.6 percent of results in England were downgraded by one grade from the mark issued by teachers. A further 3.3 percent dropped by two grades and 0.2 percent fell by three grades. Analysis published by the paper showed that pupils from lower socioeconomic backgrounds were more likely to be downgraded than those in wealthier areas. “At low-performing schools, high-performers have been shifted down,” Richard Wilkinson, professor of statistics at Nottingham University, told New Scientist. “They have all been shifted towards the average of the school performance over the previous three years.”
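Adding up the three downgrade figures the Guardian reported gives the overall share of lowered results. The short check below uses only those three percentages; the variable names are illustrative.

```python
# Percentage of results in England downgraded, keyed by number of grades lost.
downgraded = {1: 35.6, 2: 3.3, 3: 0.2}

# Round to one decimal place to avoid floating-point noise in the sum.
total = round(sum(downgraded.values()), 1)
print(f"{total} percent of results were downgraded")
```

The total, 39.1 percent, is in line with the "nearly 40%" estimate that circulated before results day.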

The public outrage was large and swift. Students, parents and educators alike expressed their anger and disappointment over the algorithm’s results. As BBC News reports, Nina Bunting-Mitcham had been predicted an A and two Bs by her teacher, but was awarded three Ds by the algorithm. “You have ruined my life,” she said during the BBC’s Any Questions programme. The pandemic had already created uncertainty about how schools, colleges and universities might reopen; now a man-made algorithm was clouding students’ futures even further. “Something has obviously gone horribly wrong with this year’s exam results,” Sir Keir Starmer, the leader of the opposition Labour Party, tweeted.

Universities were ready to defy the algorithm, though. The day before the A-level results were published, Birmingham City University (BCU) announced that it would accept students based on their teacher-predicted or Ofqual-awarded grades. The day after the A-level results were published, the University of Leicester said it would be offering places based on mock exam results and published results. “Whichever is higher,” the institution promised on Twitter. Pembroke College, part of the University of Cambridge, said it would accept “all candidates who missed their A-Level offer by one grade.” Worcester College, part of the University of Oxford, announced that it would take everyone who was offered a provisional place, regardless of their A-level results.

Several other institutions followed suit on Saturday. Keen to dispel people’s concerns, Ofqual published documentation that explained how students would be able to appeal their algorithmically-generated grades. As BBC News reports, it confirmed that teacher assessments would be considered if the student didn’t take a written mock. But if the teacher’s predicted grade, known as a CAG, was lower than the mock result, the student would have to accept the former. A spokesperson for the UK’s Department for Education told Schools Week: “In the rare circumstances where the CAG is lower than the mock, it would be more appropriate for the student to instead receive the CAG.”

The rules were immediately criticized. “It is attempting to remedy the grading fiasco through an appeals process so surreal and bureaucratic that it would be better off at this point doing that U-turn and allowing original teacher-assessed grades, where they are higher, to replace moderated grades,” Geoff Barton, general secretary of the Association of School and College Leaders, said.

Shockingly, Ofqual issued another statement later that day: “Earlier today we published information about mock exam results in appeals. This policy is being reviewed by the Ofqual board and further information will be published in due course.”


It was a nightmare that Scotland knew all too well. The country had issued its exam results on August 4th — nine days before England — and experienced a similar backlash. John Swinney, education secretary for Scotland, had been forced to reinstate the grades originally recommended by teachers. On August 16th, Labour MPs and backbench Conservatives were urging the UK’s Prime Minister Boris Johnson to take swift action.

Wales and Northern Ireland moved before England. Peter Weir, education minister for Northern Ireland, said yesterday morning that all GCSE results would be solely based on grades provided by teachers. Kirsty Williams, the education minister for Wales, announced later that all awards would be “on the basis of teacher assessment.” Ofqual's walk-back, therefore, wasn't a surprise. It felt inevitable, even. If the regulator had pressed on with its algorithm-based grades, it would have been met with countless appeal requests and at least one legal bid.

An A-level student calls for education secretary Gavin Williamson's resignation during a protest at his constituency offices in South Staffordshire, Britain, August 17, 2020. (Jason Cairnduff / Reuters)

The work isn’t over, though. Students now have to reconsider their grades and what, if any, new options are available to them. Some will now be eligible for their first-choice university, but there’s a good chance those institutions will have already allocated their places. The question, therefore, is whether these educational businesses can increase their intake without threatening staff and student safety throughout the pandemic. Otherwise, applicants will have to go through clearing, a system in the UK that matches learners to unfilled university places, or consider deferring a year.

“It is vital that information is provided speedily on how this decision will impact higher education institutions, students wishing to apply through clearing and those who may have been rejected on their original grades,” David Hughes, CEO of the Association of Colleges, said.

Questions are being raised, too, about how the system will be adjusted for 2021. Everyone hopes that the coronavirus pandemic will be over soon and restrictions on everyday life will ease. But there’s a chance that schools, colleges and universities will stay shut or experience temporary closures that make it impossible for traditional exams to be held next summer.

If the government should call on an algorithm again, it will need to be better than the one used this year.