A key component of NHS Digital’s data processing struggled to cope with demand caused by Covid-19 testing, according to one of its senior technologists.
The Master Person Service (MPS), which is designed to link records of treatment to individuals through demographic data, received up to one million requests an hour at some periods last year when testing was at its peak.
“The service struggled to meet the Covid testing volumes,” James Zwiers, head of software engineering and IT operations for the organisation’s data services directorate, told a session at the Big Data LDN conference in London on 22 September. “That’s where the challenge was more than anywhere.”
NHS Digital increased the capacity of MPS to cope with the demand.
NHS services are provided by a wide range of organisations which collect data in different ways. The NHS number provides a unique identifier for each individual in England, with equivalents in Scotland, Wales and Northern Ireland, but these are not widely known by individuals and do not need to be collected as a condition of treatment.
MPS aims to fill the gap when people do not know their NHS number, or provide an invalid one, by applying a four-stage algorithm to demographic data such as age, gender and postcode. It aims to return the correct NHS number in 99% of cases, along with a confidence score.
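As a rough illustration of this kind of record linkage, the Python sketch below first validates a supplied NHS number using the published Modulus 11 check digit, then falls back to scoring demographic fields. It is a simplified two-stage sketch, not the four-stage MPS algorithm, whose details are not described here. The `Person` class, the `trace` function, the 0.6 threshold and the use of date of birth rather than age are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


def is_valid_nhs_number(number: str) -> bool:
    """Check a 10-digit NHS number using its Modulus 11 check digit."""
    digits = number.replace(" ", "")
    if len(digits) != 10 or not (digits.isascii() and digits.isdigit()):
        return False
    # Weight the first nine digits from 10 down to 2 and sum the products.
    total = sum(int(d) * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    # A computed check digit of 10 means the number can never be valid.
    return check != 10 and check == int(digits[9])


@dataclass(frozen=True)
class Person:
    """Hypothetical register entry; field names are assumptions."""
    nhs_number: str
    date_of_birth: str  # ISO date, e.g. "1980-01-02"
    gender: str
    postcode: str


def _norm(value: str) -> str:
    """Ignore spacing and case when comparing field values."""
    return value.replace(" ", "").upper()


def trace(
    record: dict, register: Sequence[Person], threshold: float = 0.6
) -> Tuple[Optional[str], float]:
    """Return (nhs_number or None, confidence between 0 and 1)."""
    # Stage 1 (assumed): trust a valid supplied number if the birth date agrees.
    supplied = _norm(record.get("nhs_number") or "")
    if is_valid_nhs_number(supplied):
        for person in register:
            if (person.nhs_number == supplied
                    and person.date_of_birth == record.get("date_of_birth")):
                return person.nhs_number, 1.0

    # Stage 2 (assumed): score each candidate by the share of matching fields.
    best_number, best_score = None, 0.0
    for person in register:
        hits = sum(
            _norm(record.get(field) or "") == _norm(getattr(person, field))
            for field in ("date_of_birth", "gender", "postcode")
        )
        score = hits / 3
        if score > best_score:
            best_number, best_score = person.nhs_number, score

    # Below the threshold, report no match rather than guess.
    if best_score >= threshold:
        return best_number, best_score
    return None, best_score
```

A production service would also need to handle ties between candidates, fuzzy matching of names and addresses, and historical records, none of which this sketch attempts.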
Outlining NHS Digital’s overall data processing service architecture, Zwiers said it uses Amazon Web Services, Splunk, GitLab, Terraform and Kubernetes to manage more than 200 health and social care datasets that cover the population of England, and in some cases all of the UK.
The datasets include those used to manage the response to the pandemic, such as for NHS Test and Trace processes and Covid passports. Data is not processed in real time: some information on ambulance services is processed within 15-60 minutes, but other data from hospitals is handled only monthly.
Although NHS Digital’s data architecture runs in a public cloud, Zwiers said it was protected through a range of measures. Data can flow in only one direction, with processes either read-only or writing to a single location; all of the compute services have a single purpose; and NHS Digital checks the identity of all users. “If you don’t need access to the data, you don’t get access to it,” he said.
The National Cyber Security Centre (NCSC), GCHQ’s information security unit, is involved in protecting the systems, with Zwiers adding that “nation-state actors” are known to be interested in their contents.
He said NHS Digital accepted data from healthcare providers in a range of formats, including XML, JSON and those generated by Microsoft Excel. Public Health England’s use of Excel’s old .xls file format, which limits the number of rows to 65,536 per worksheet, was blamed for loss of data on 15,841 Covid-19 test results in September and October 2020.
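The row limit is the kind of failure a simple guard can catch before data is lost. The Python sketch below assumes the legacy .xls limit of 65,536 rows per worksheet and shows a check that raises an error instead of silently dropping rows. The function names are hypothetical, and nothing here reflects how Public Health England or NHS Digital actually handle their files.

```python
import csv
import io

# Maximum rows per worksheet in the legacy .xls format.
XLS_MAX_ROWS = 65_536


def count_csv_rows(text_stream) -> int:
    """Count rows (including any header) in CSV text."""
    return sum(1 for _ in csv.reader(text_stream))


def rows_that_would_not_fit(total_rows: int, limit: int = XLS_MAX_ROWS) -> int:
    """Return how many rows would overflow a single worksheet."""
    return max(0, total_rows - limit)


def ensure_fits_xls(total_rows: int, limit: int = XLS_MAX_ROWS) -> None:
    """Fail loudly rather than let rows be dropped on import."""
    overflow = rows_that_would_not_fit(total_rows, limit)
    if overflow:
        raise ValueError(
            f"{overflow} of {total_rows} rows exceed the {limit}-row limit"
        )


if __name__ == "__main__":
    sample = io.StringIO("id,result\n1,negative\n2,positive\n")
    ensure_fits_xls(count_csv_rows(sample))  # passes: 3 rows
```

Running a check like this before import turns an invisible truncation into an explicit error that someone has to resolve.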
“I’d like us to get away from Excel, but for the government Excel is here to stay,” said Zwiers, adding that NHS Digital accepted data in a range of formats to reduce the reporting burdens on provider organisations.