
UK government coronavirus data flawed and misleading

Government Covid-19 coronavirus data has been a miasma of inexactitude, often fundamentally flawed and misleading

This article can also be found in the Premium Editorial Download: Computer Weekly: The UK government’s ‘flawed and misleading’ Covid-19 data

On 30 April 2020, Boris Johnson told the country: “171,253 people have tested positive – that’s an increase of 6,032 cases since yesterday.” Having given a figure for those in hospital, from where he had recently returned following his own fight with Covid-19, the Prime Minister added: “And sadly, of those who tested positive for coronavirus, across all settings, 26,711 have now died. That’s an increase of 674 fatalities since yesterday across all settings.”

The government’s data on cases and deaths has been used by ministers and journalists as single sources of truth on the pandemic in the UK. But both are flawed and in some cases misleading, potentially distorting both public understanding and government decision-making.

The most recent government health data faux pas was a technical glitch that led to 15,841 positive results between 25 September and 2 October 2020 not being included in reported coronavirus cases. Public Health England says the fault was caused by a data file exceeding its maximum file transfer size.

Deaths might seem easy to measure, but deaths from a specific cause are not. The day before Johnson spoke in April, Public Health England had changed its figures to include those who died outside hospitals, such as in care homes, hence his repeated use of “across all settings”.

This would not be the last recalculation: the data covered anyone who had tested positive for Covid-19 and later died, regardless of cause, inflating the figures. When, in August, the organisation restricted the data to those who had died within 28 days of a positive test, in line with Scotland, Wales and Northern Ireland, more than 5,000 deaths were wiped from the overall death toll and Johnson’s 30 April figure fell to 634.

But that didn’t mean 634 died from Covid-19 on 30 April, rather that 634 deaths were reported that day. The figure tends to spike on Tuesdays, as NHS administration catches up from the weekend. Using the previous seven days’ data can smooth out weekly fluctuations but, more importantly, the data is out of focus, with one day of reporting including deaths scattered over previous days, weeks and even months.
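The seven-day smoothing mentioned above can be sketched in a few lines of Python. The daily counts below are invented purely for illustration and are not government data; the function name `rolling_mean` is likewise an assumption of this sketch.

```python
from collections import deque

def rolling_mean(values, window=7):
    """Trailing mean of the last `window` values; None until the window is full."""
    buf = deque(maxlen=window)
    out = []
    for v in values:
        buf.append(v)
        out.append(sum(buf) / window if len(buf) == window else None)
    return out

# Invented "date of report" counts for two weeks (Monday to Sunday),
# with a weekend dip and a Tuesday catch-up spike.
reported = [300, 700, 500, 480, 450, 250, 200,
            280, 650, 470, 450, 420, 230, 190]

smoothed = rolling_mean(reported)
for day, (raw, avg) in enumerate(zip(reported, smoothed), start=1):
    label = "n/a" if avg is None else f"{avg:.0f}"
    print(f"day {day:2d}: reported {raw:3d}, 7-day mean {label}")
```

Averaging over a full week removes the day-of-week pattern, but it does nothing about the second problem described above: each day's report still mixes deaths that occurred on many different dates.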

More ways of measuring Covid cases

David Paton, professor of industrial economics at Nottingham University Business School, has regularly published figures based on the actual date of death in the English health service. He reckons the figures are “reasonably complete” only five or six days afterwards. By 6 May, NHS England had reported 281 deaths taking place on 30 April, 90% of Paton’s current total of 313 – which includes one report made as late as 20 September. The government now publishes its own UK-wide “date of death” data, reporting 548 deaths on 30 April.
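The 90% figure follows directly from the two counts quoted above. As a minimal check, with `completeness` being a helper name chosen only for this sketch:

```python
def completeness(reported_so_far, final_total):
    """Share of the eventual total that had been reported by a given date."""
    return reported_so_far / final_total

# 281 deaths for 30 April reported by 6 May, against Paton's later total of 313.
share = completeness(281, 313)
print(f"{share:.0%} of the eventual total was known within six days")
```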

Looking at when deaths actually occurred provides a different view of the pandemic’s spring peak. According to both the original and revised versions of the government’s data, coronavirus death reports peaked on 21 April, a Tuesday. Both Paton’s and the government’s numbers based on date of death peaked almost a fortnight earlier, on Wednesday 8 April. However, the “date of report” daily figure remains the one most used, probably because it is immediately available and doesn’t change as more reports are made.

There are other credible ways to measure coronavirus deaths. By 18 September, the government’s all-UK death toll had reached 41,801. But the Office for National Statistics (ONS) reckons approximately 57,677 people had died by that date across the UK where death certificates mentioned Covid-19 – including 769 on 30 April – and that 53,663 more deaths had occurred in England and Wales alone in 2020 compared with the average of the previous five years, a measure known as excess deaths.

The government’s other main measure – positive tests for cases of coronavirus – has bigger problems. On 21 September, the official number of new daily cases exceeded April’s record and has continued to climb. But while the rise is a cause for concern, it should be seen in the context of daily test numbers having quadrupled since April. “The supply side has changed,” says Paton.

Case data on positive tests is a side-product of the testing system. It took time for the government to set up large-scale public testing; testing centres are likely to be established in areas where cases are growing; and in recent weeks demand has outstripped supply. The ONS is carrying out weekly research by sending tests to a random sample of the population, but this is smaller-scale, slower and lacks local detail.

Furthermore, cases differ in significance: 200 cases in households across a city are more threatening and harder to control than the same number in one university hall of residence. This does not mean positive test case data is useless, just that it needs treating with caution. Paton says it has the advantages of being easy to understand, quick to react and highly localised. “But when you’re making very serious and significant policy interventions that affect everybody’s everyday life, my view is it’s not quite good enough,” he says.

More data means greater accuracy

Paton advocates using a holistic range of data, including the different measures of deaths and case numbers, NHS data on hospital admissions and triage assessments, and data from a symptom-reporting app run by health science company Zoe. All have their flaws, and the more reliable ones – hospitalisations and deaths – are slower to react. They work best together.

“When all those things are moving in the same direction, that probably tells you an increase in cases is genuine, not just due to testing,” he says. “If you rely on any one of those, you may make false policy decisions.”

Nigel Marriott, an independent statistician who provides consultancy and training, says using a range of sources will be vital in tackling increasingly localised outbreaks, with urban areas outside southern England worst affected: “The effort needs to be focused on those areas,” he says. But most of the data published by the government is not localised, so Marriott looks for rising case numbers in neighbouring areas as corroborating evidence.


Marriott believes the pandemic has demonstrated a basic tension in gathering and using data. “There’s a classic trade-off between quality and speed,” he says. “Often, the best data that helps you make a decision will not be available for a week or so.” Using fast data can be justified, but he thinks the government should explain the limitations of such measures.

Openness is vital, Marriott adds, noting that in June the government imposed extra local regulations in Leicester, partly on the basis of unpublished results from “pillar 2” coronavirus tests carried out by commercial labs for the general population. Public Health England – which the government announced in August would be replaced by a new National Institute for Health Protection – has not consistently published its raw data, something Marriott describes as “a giveaway” that an organisation is not confident of data quality.

“It’s a very 2010 thing to say, but [we need] more open data,” says Gavin Freeguard, programme director at the Institute for Government think tank, calling for a return to a policy of last decade’s coalition government. He says that some of the best uses of coronavirus data have been made by third parties, such as Matthew Somerville’s local lockdown lookup service.

Freeguard adds that the government’s recent publication of a national data strategy and work to recruit a chief data officer are positive signs that people at the centre of government recognise the value of good data. But he warns there is a “danger of technology and data solutionism”. The backlash caused by qualification bodies using algorithms to decide grades for pupils after examinations were cancelled may have damaged public trust in data use, he thinks.

Using data visualisation for impact

Freeguard, who writes an email newsletter on data visualisation, says the government has made effective use of graphs to communicate data on coronavirus, but that there has been “a lack of uncertainty”.

People may prefer a simpler message – “I think there is something about we the public trying to reach for certainty in uncertain times” – but this can be misleading. Uncertainty can be shown: the ONS, which Freeguard praises for its work on Covid-19, estimated that 103,600 people in England had the virus in the week of 13-19 September and published alongside it the range of 85,600 to 123,400 within which it is 95% confident the true figure lies.

Andy Cotgreave, technical evangelism director at visualisation software provider Tableau, says graphs showing how cases would grow and peak given different transmission rates helped make the case for the UK government to enact strict controls on people. He describes “flattening the curve” as “possibly one of the most influential visualisations ever” because it communicates how a virus can spread with different rates of transmission.

But Cotgreave criticises a graph used by the UK’s chief scientific adviser, Patrick Vallance, and chief medical officer for England, Chris Whitty, at a press conference on 21 September. This projected cases doubling every seven days from 3,105 on 15 September to nearly 50,000 by 13 October. Vallance and Whitty stressed it was a scenario, not a prediction, but the single set of bars on the graph suggested otherwise.
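The arithmetic behind the scenario is simple to reproduce. The sketch below assumes nothing more than steady exponential growth from the stated starting figure; the function name `project_cases` is an illustrative choice, not anything used by the government's advisers.

```python
from datetime import date

def project_cases(start_cases, start_day, end_day, doubling_days):
    """Cases on end_day if they double every `doubling_days` days from start_day."""
    elapsed = (end_day - start_day).days
    return start_cases * 2 ** (elapsed / doubling_days)

# 15 September to 13 October is 28 days, or four seven-day doublings.
projected = project_cases(3105, date(2020, 9, 15), date(2020, 10, 13), 7)
print(f"{projected:,.0f} daily cases")
```

Four doublings multiply the starting figure by 16, giving 49,680, which matches the “nearly 50,000” shown on the graph.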

“Was it the best case, the worst case, the mid-case?” says Cotgreave. This is part of a bigger problem, he adds: “The risk is that people are fatigued with data, trust is waning and these things could backfire.”

Nottingham University’s Paton argues that the scenario of doubling every seven days was far faster than anything Spain and France – which the two scientists had just discussed – were experiencing, with cases in those two countries doubling approximately every 20 days.
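To get a sense of how far apart those two doubling times are, the sketch below converts each into an implied daily growth rate and a 28-day growth factor. Applying the 20-day rate to the UK's starting figure of 3,105 is a hypothetical comparison made here for illustration, not a calculation attributed to Paton.

```python
def daily_growth(doubling_days):
    """Daily percentage growth implied by a given doubling time."""
    return 2 ** (1 / doubling_days) - 1

def growth_factor(days, doubling_days):
    """How many times larger the count becomes after `days` days."""
    return 2 ** (days / doubling_days)

fast = daily_growth(7)    # scenario shown at the press conference
slow = daily_growth(20)   # approximate pace Paton cites for Spain and France

print(f"7-day doubling:  {fast:.1%} a day, x{growth_factor(28, 7):.1f} in four weeks")
print(f"20-day doubling: {slow:.1%} a day, x{growth_factor(28, 20):.1f} in four weeks")

# Hypothetical: the same start point growing at the slower pace for 28 days.
slow_projection = 3105 * growth_factor(28, 20)
print(f"3,105 cases at 20-day doubling: about {slow_projection:,.0f}")
```

On these assumptions, the slower pace would take the starting figure to roughly 8,200 over four weeks rather than nearly 50,000, which is the scale of the gap Paton is objecting to.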

“I called it sleight of hand and I don’t think that’s unfair,” he says. The graph seemed to be a way to generate headlines that would justify new regulations announced by the Prime Minister the next day. “That’s not what I think independent scientists should be doing.”

Jennifer Rogers, the Royal Statistical Society’s vice-president for external affairs and head of statistical research and consultancy for UK-based biometrics researcher Phastar, says the government needs to maintain trust in how it deals with data on coronavirus. She says that while it makes sense to change the way measures work when limitations become apparent, this should not be undertaken lightly and needs to be explained openly and honestly.

Rogers believes the government should consider presenting less data. During the period of national lockdown, the same graphs were used every day, often with minimal changes: “As a statistician who likes data, I felt it was too much data,” she says.

Passing on numbers and showing graphs may seem to boost the credibility of someone presenting, but it can also overwhelm those watching. “There’s a big lesson: just present what’s important, what’s relevant and what people really need to know in order to make decisions about their day-to-day lives,” she says.
