How data shielded our most vulnerable during the pandemic

August 16, 2021

Mark Reynolds, Chief Technology Officer at NHS Digital, talks about the unique challenges brought about by the pandemic, and how using data innovatively with the Shielded Patient List and Population Risk Assessment helped to identify and protect some of the most vulnerable in society

Normally, when global crises occur, few people would think to look to the statisticians or data scientists for answers. However, every day we have seen the importance of having hard data to make evidence-based decisions in order to save lives.

This has been especially true during this pandemic. Whether it has been creating vaccine cohorts prioritised by need, superpowering research into the virus or finding suitable candidates to accelerate crucial clinical trials, access to data has been critical in powering our nation’s response in so many different ways.

This can be seen in the way that we used data from our NHS to help protect our most vulnerable in society.

While this might sound easy, working out which health conditions make people most susceptible to a specific virus, and then identifying those people in datasets, is an extremely complex job, especially in a world of fragmented health records and disparate clinical systems.

As the terrible consequences of COVID-19 were becoming clear in early 2020, we were tasked with just that. Armed with a list of conditions that the Chief Medical Officer and his peers had determined worsened the outcomes of contracting this coronavirus, we set about finding the people affected so that they could be given advice on how to protect themselves over the coming months.

However, we were at a distinct disadvantage. While the GP and secondary care data existed to make this possible, they were held in different datasets and there was no system at the time in which we could link the data and create that cohort list.

There were also challenges in deciding exactly how to define, and then find, each of the medical conditions on the list within the different systems, which work with different codesets that don’t align.

The richest source of data for identifying the health conditions was secondary care records, but this came with certain issues: digitisation isn’t complete across the system, the records use different formats and the coding isn’t as granular as the SNOMED coding used in GP systems.
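To make the codeset problem concrete, here is a minimal sketch of the general idea: translate each source's codes into one shared condition label, then link the records on a patient identifier. It is an illustration only, not the method used to build the list. The lookup tables, the codes, the condition labels, the patient identifiers and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical lookup tables: every code and label below is a made-up
# placeholder, not a real clinical code or condition from any codeset.
GP_CODE_TO_CONDITION = {
    "GP-0001": "condition_a",
    "GP-0002": "condition_b",
}
HOSPITAL_CODE_TO_CONDITION = {
    "HOSP-A": "condition_a",
    "HOSP-B": "condition_b",
}


def conditions_by_patient(records, code_map):
    """Group recognised conditions by patient ID, ignoring unmapped codes."""
    found = defaultdict(set)
    for patient_id, code in records:
        condition = code_map.get(code)
        if condition is not None:
            found[patient_id].add(condition)
    return found


def build_cohort(gp_records, hospital_records):
    """Link both sources on patient ID and merge their condition sets."""
    cohort = defaultdict(set)
    sources = (
        (gp_records, GP_CODE_TO_CONDITION),
        (hospital_records, HOSPITAL_CODE_TO_CONDITION),
    )
    for records, code_map in sources:
        for patient_id, conditions in conditions_by_patient(records, code_map).items():
            cohort[patient_id] |= conditions
    return dict(cohort)


# Illustrative records as (patient ID, source-specific code) pairs.
gp_records = [("P1", "GP-0001"), ("P2", "GP-9999")]
hospital_records = [("P1", "HOSP-B"), ("P3", "HOSP-A")]
cohort = build_cohort(gp_records, hospital_records)
```

In this toy example, patient P2 drops out because their only code has no entry in the lookup table, which is exactly the kind of gap that makes misaligned codesets hard to work with at national scale.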

Meanwhile, the GP systems were not designed to have that volume of data extracted. The current method of extracting data from them, the GP Extraction Service (GPES), is over a decade old and had been earmarked for replacement, a piece of work that was still underway when the pandemic struck.

These were all challenges that needed overcoming, and quickly, as any delay meant more lives could be lost. That knowledge was never far from any of our minds.

Integrating GP data to pinpoint individuals

By bringing together hospital and community prescribing data, we swiftly established an initial cohort of around 900,000 people, while further teams worked day and night to address the challenge of integrating GP data to pinpoint, with much greater precision, the individuals who could be affected. The initial Shielded Patient List was born within a week of first being commissioned.

With everything moving so fast and knowledge of the virus growing each day, there were continuous iterations of the list, each one enhancing the dataflows as we understood more, alongside front-line clinicians providing their crucial local knowledge.

This then fed into a wider support effort across the NHS, Government and local authorities, all working together and sharing the data to give those shielding the vital services they needed to continue protecting themselves and their loved ones, including food and prescription medication deliveries, check-ins from volunteers and priority supermarket shops.

While shielding paused periodically throughout England, the Shielded Patient List carried on being updated every week to guard against future need, so that advice and guidance could be provided again when subsequent waves of the virus rose in various parts of the country and shielding had to resume. Once vaccinations began to be rolled out, the list helped to prioritise people for the vaccines they needed to keep them safe.

A secure and transparent approach to data

We did all this in a way that was secure and transparent, publishing exactly who the data had been shared with, as well as known issues as they came to light. While the work was done on the legal basis of the Government’s pandemic COPI notice, we still minimised the amount of data processed and let the Caldicott principles guide us in sharing the data ethically. A huge amount of work was done in parallel to the technical build of the list to ensure that the information governance and the data sharing were absolutely in line with the regulations, so patients could rest assured that their data was safe.

By Christmas, there were nearly 2.5 million of the most vulnerable people shielding in England. We’d refined the process, finding people centrally through a range of disparate datasets and then backing this up with local clinical expertise, with GPs and hospital consultants able to check those on the list to ensure they were appropriate and to add those that the central records couldn’t find.
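The combination of central matching and local review can be pictured as simple set operations. The sketch below is a simplified illustration under stated assumptions, not the actual process: the function name and patient identifiers are hypothetical, and it assumes that a clinician's check can lead to someone being taken off the list as well as added to it.

```python
def apply_local_review(central_list, clinician_additions, clinician_removals):
    """Return the reviewed list: central matches plus additions, minus removals.

    Assumption: where the same patient appears in both the additions and the
    removals, the removal takes precedence.
    """
    return (set(central_list) | set(clinician_additions)) - set(clinician_removals)


# Hypothetical patient identifiers, for illustration only.
central_list = {"P1", "P2", "P3"}   # found centrally in the datasets
clinician_additions = {"P4"}        # known locally, missed by central records
clinician_removals = {"P2"}         # judged not appropriate on review

reviewed_list = apply_local_review(central_list, clinician_additions, clinician_removals)
```

Here the reviewed list keeps P1 and P3, gains P4 and drops P2, so the central data does the heavy lifting while local clinical judgement has the final say.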

However, we had also been working in collaboration with clinicians and epidemiologists across government and academia to take this work a step further. By using our GP, secondary care and mortality data from the first wave of the pandemic and linking it to COVID-19 outcomes, leading academics were able to expand on the list provided by the CMO and create a risk prediction model that they called QCovid®.

It took into account a whole range of factors from age to other medical conditions, ethnicity, BMI, deprivation and sex, amongst others, and calculated how each of these worked together to create a unique risk score for each person.

In collaboration with clinical experts in the Chief Medical Officer’s office, we then had to build a strategic data platform to apply this model to patient records and so add people to the Shielded Patient List. This used no fewer than seven different existing datasets with their variety of coding.

This identified a further 1.5 million people whom we were able to advise to shield, thanks to academics who used vital data to learn the lessons of the first wave and worked with data experts to translate that knowledge into saving lives. These are people that no one could have identified without access to the detailed data that showed how various factors had affected those who had already contracted the virus.

This work has been recognised by the Royal Statistical Society, who awarded us one of their highest honours for excellence in data analytics. They spoke of how the project showcased quality analysis at a nationwide scale and called the collaborative spirit, concerted effort and careful consideration outstanding. I am immensely proud of our team for their work.

Data matters

Through all of this has been a dedicated team of database experts, clinicians, architects, information governance experts, system analysts, data scientists and others, all working together within the NHS to help protect those most at risk during the pandemic, underpinning public confidence in the work and proving that what they do matters. That data matters. And that they, too, can help save lives.