Data analysts deploy Monte Carlo modelling gambit to fight West African Ebola virus epidemic

Mathematicians are using data analytics to predict how the Ebola epidemic will spread so aid can be deployed effectively

With the Ebola epidemic marching on in West Africa, the World Health Organisation (WHO) expected up to 10,000 new cases a week in December 2014.

What Guinea, Liberia and Sierra Leone need more than anything else is targeted medical aid to combat the virulent disease – but the outbreak is now so severe that providing aid has become a logistical nightmare. The difficulty of getting the right help to the right place is compounded by poor transport infrastructure and the unwieldy size of the area of contagion.

In a bid to help deploy aid, IBM has provided an Ebola tracking system. Citizens can directly report new cases of Ebola – via phone or text messages – with their location to a central hub. 

This information is processed using data analytics and cloud computing, to assist governments and relief agencies in tracking the spread of the disease. With this data, aid can be distributed to the areas where it is most needed.

This technique provides information as to where Ebola is or, more accurately, where the people were when they became ill. But, because of incubation and diagnosis times, the location data can be weeks out of date.

Simulating solutions

Mathematicians are employing various techniques to predict how Ebola will spread. This allows governments and aid agencies to make risk-informed decisions for medical intervention. One such technique is Monte Carlo analysis.

Unlike deterministic modelling – which aims to provide a single definite solution – Monte Carlo analysis generates a range of possible outcomes. Each run samples a set of variables at random from pre-defined statistical distributions that represent the observed data, then works out the result from the relationships between those variables.

Once the code running the model has completed several thousand independent simulations of the cycle, the resulting data provides the most likely statistical outcomes.

“Monte Carlo analysis does not give an exact value for the number of infected persons after a certain time, from the beginning of an epidemic,” says Enrico Scalas, professor of statistics and probability for the department of mathematics at the University of Sussex. “Instead, one gets a range of values and, attached to each of these values, there is a probability – namely a measure of our degree of belief in that value.”
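
To make the idea of "a range of values" concrete, here is a minimal sketch in Python. It assumes a textbook Reed-Frost chain-binomial model, which is a stand-in chosen for brevity and not the model used by any of the researchers quoted. The population size, number of initial cases and transmission probability are illustrative assumptions, not Ebola estimates.

```python
import random
import statistics

def simulate_outbreak(population, initial_cases, p_transmit, rng, max_generations=50):
    """Run one Reed-Frost chain-binomial outbreak and return the total ever infected."""
    susceptible = population - initial_cases
    infectious = initial_cases
    total = initial_cases
    for _ in range(max_generations):
        if infectious == 0:
            break
        # Chance that a susceptible person is infected by at least one infectious person.
        risk = 1.0 - (1.0 - p_transmit) ** infectious
        new_cases = sum(1 for _ in range(susceptible) if rng.random() < risk)
        susceptible -= new_cases
        infectious = new_cases
        total += new_cases
    return total

def monte_carlo(runs, rng, population=200, initial_cases=2, p_transmit=0.01):
    """Repeat the outbreak many times; the default parameters are illustrative assumptions."""
    return sorted(simulate_outbreak(population, initial_cases, p_transmit, rng)
                  for _ in range(runs))

def summarise(sizes):
    """Return the 5th percentile, median and 95th percentile of sorted outcomes."""
    last = len(sizes) - 1
    return sizes[int(0.05 * last)], statistics.median(sizes), sizes[int(0.95 * last)]

sizes = monte_carlo(runs=1000, rng=random.Random(42))
low, median, high = summarise(sizes)
print(f"About 90% of simulated outbreaks infected between {low} and {high} people "
      f"(median {median})")
```

The output is not one number but a spread: some simulated outbreaks fizzle out after a handful of cases, while others reach most of the population.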

The mechanics of Monte Carlo analysis

Monte Carlo models of disease spread draw on population and demographic data to simulate viral outbreaks.

Chris Jewell, a lecturer in biostatistics at Massey University in New Zealand, researches risk-forecasting for outbreaks of infectious disease. He gives an example of modelling a livestock epidemic: “This would be the locations of farms, numbers and types of animals, and the presence of any business contact relationships between farms – such as animal movements and the sharing of equipment.” 

The above screen grab is from a disease management system that Chris Jewell developed with Judith Brown at Warwick University. It shows the output from a Monte Carlo method that uses a dynamical epidemic model to detect undiscovered foot-and-mouth disease (FMD) infections on farms during an outbreak. The analysis is based on the 2001 UK FMD outbreak and assigns each presumed-uninfected farm a probability of being infected, given what was known about the outbreak on the 33rd day after the first case was detected.

Although the factors differ, the technique for modelling a viral outbreak among humans is similar.

“When we run a Monte Carlo simulation for an epidemic,” Scalas explains, “we are repeating many copies of that very epidemic, occurring in unique conditions. Every time we run the simulations, due to chance, something slightly different happens each time – as in a parallel universe or the movie Sliding Doors.

“Then we get different numbers of infected people after, say, 20 days from inception. If we run the simulation 1,000 times and we get 1,234 infected people in 100 of these times, we can approximate the probability of having 1,234 infected with the relative frequency 100/1,000.”
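
Scalas's arithmetic can be written out in a few lines of Python. In this sketch the function name `relative_frequencies` is hypothetical, and the 900 other results are placeholders invented only to fill the list; only the figures 1,000, 100 and 1,234 come from his example.

```python
from collections import Counter

def relative_frequencies(outcomes):
    """Map each outcome to the fraction of simulation runs that produced it."""
    runs = len(outcomes)
    return {value: count / runs for value, count in Counter(outcomes).items()}

# 1,000 runs, 100 of which end with exactly 1,234 infected people.
# The remaining 900 values are arbitrary placeholders (none equals 1,234).
outcomes = [1234] * 100 + [2000 + i for i in range(900)]

probability = relative_frequencies(outcomes)[1234]
print(probability)  # 100 / 1,000 = 0.1
```

The more runs there are, the closer these relative frequencies are expected to sit to the probabilities implied by the model.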

Creating a Monte Carlo analysis simulating a viral outbreak is a two-stage process. “You take what has happened already, and then you run these simulations to try to estimate what sort of transmission rate and reporting rates match the data and seem most reasonable,” says Adam Kucharski, a research fellow in infectious disease epidemiology at the London School of Hygiene and Tropical Medicine.

“Once you have estimated these values, you can then simulate forward in time – based on the values we now have, and about what we can expect to see in the coming weeks,” he adds.
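
The two stages Kucharski describes can be sketched as follows. This is a simplified illustration, not his group's method: it swaps in a basic rejection scheme (keep the candidate values whose simulations land closest to the data) and a Reed-Frost model, it estimates only a transmission probability, and it assumes every case is reported. The weekly counts in `OBSERVED`, the population size and the range of candidate values are all invented for the example.

```python
import random
import statistics

def simulate_weeks(susceptible, infectious, p_transmit, weeks, rng):
    """Reed-Frost chain-binomial model; returns weekly new cases and remaining susceptibles."""
    weekly_cases = []
    for _ in range(weeks):
        risk = 1.0 - (1.0 - p_transmit) ** infectious
        infectious = sum(1 for _ in range(susceptible) if rng.random() < risk)
        susceptible -= infectious
        weekly_cases.append(infectious)
    return weekly_cases, susceptible

def fit_transmission(observed, population, initial_cases, rng, draws=1000, keep=50):
    """Stage 1: keep the candidate values whose simulations best match the data."""
    scored = []
    for _ in range(draws):
        p = rng.uniform(0.001, 0.02)  # assumed range of candidate values
        simulated = simulate_weeks(population - initial_cases, initial_cases,
                                   p, len(observed), rng)[0]
        distance = sum(abs(s - o) for s, o in zip(simulated, observed))
        scored.append((distance, p))
    scored.sort()
    return [p for _, p in scored[:keep]]

def forecast(accepted, susceptible, infectious, weeks, rng):
    """Stage 2: simulate forward from the current state with each accepted value."""
    return sorted(sum(simulate_weeks(susceptible, infectious, p, weeks, rng)[0])
                  for p in accepted)

# Illustrative weekly case counts, invented for this sketch (not real Ebola data).
OBSERVED = [3, 5, 9, 14, 20, 26]
POPULATION, INITIAL_CASES = 300, 2

rng = random.Random(7)
accepted = fit_transmission(OBSERVED, POPULATION, INITIAL_CASES, rng)
still_susceptible = POPULATION - INITIAL_CASES - sum(OBSERVED)
totals = forecast(accepted, still_susceptible, OBSERVED[-1], weeks=4, rng=rng)
print(f"Plausible transmission value: about {statistics.median(accepted):.4f}")
print(f"New cases over the next 4 weeks: {totals[0]} to {totals[-1]} "
      f"(median {statistics.median(totals)})")
```

Because the forecast is run once for each accepted value, uncertainty about the transmission rate carries through into the range of predicted cases.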

Monte Carlo analysis provides insight into the statistically most likely set of outcomes, based on the quality and accuracy of the data provided. It does not make decisions but supports them, drawing on all of the available data, however complex.

Recovery models

Two elements that limit Monte Carlo analyses are computing power and programming expertise. 

The greater the amount of detail in a model, the longer the results will take to generate. Sixteen hours is the ideal limit, allowing a programmer to set the model running overnight and return the following morning to review the results. 

GPUs and many-core architectures are used to provide the necessary computing power. Creating a more elegant model – which streamlines the simulation without undue complexity – will reduce simulation time.

Sometimes a simplification will be required, such as modelling groups rather than individuals. However, modelling at an individual level provides more detailed results, such as the different distributions for the statistical likelihood of patient prognosis – death before treatment, death after treatment, recovery at hospital and recovery at home – and the statistical likelihoods for disease transmission in each case.

Error limitation

The key to Monte Carlo analysis lies in ensuring all the available data is accurate and correctly modelled. Investing additional time and resources to ensure completeness and accuracy will minimise uncertainties, which could later become dominant areas of concern.

Monte Carlo analysis is bound by the computing adage of Gigo: garbage in, garbage out. Before running a Monte Carlo analysis, the modeller must ensure optimum data quality.

Acting on incorrect simulations can have unfortunate consequences. “Simulations will give reasonable results only if our model can encapsulate the relevant conditions. Otherwise, we can do useless or wrong things based on simulations,” says Scalas.

As with all simulations, there are limits to how accurately Monte Carlo analysis can model reality. This is particularly true for elements such as people and population movements. “Human behaviour is one of the things that is very difficult to incorporate into a model, especially during an outbreak,” Kucharski observes.

“In these cases, it may be better to employ a more abstract method, pared down to only the number of cases per week in the population, as well as the population size, but ignoring the geography and details about interactions between individual people.”

However, Jewell highlights how technology could provide a solution to mapping population movements. “Researchers use mobile phone tracking records, anonymised of course, tracked from cell tower to cell tower,” he says, “to present an accurate model of how people move around, or between, cities on a daily basis.”

It is the inability to accurately model reality that allows errors to creep into the simulation. The further the simulation looks into the future, the greater these errors grow until the model is no longer reliable.
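
One way to see why errors grow with the forecast horizon is to assume, purely for illustration, that cases multiply by a fixed reproduction number each generation. A small mistake in that number is then compounded at every step. The values 1.5 and 1.4 below are hypothetical, and real outbreaks do not grow this simply.

```python
def relative_error(r_estimated, r_true, generations):
    """Fractional error in projected cases when r_estimated is used instead of r_true.

    Assumes simple exponential growth: cases multiply by the reproduction
    number once per generation, with no interventions or depletion.
    """
    return (r_estimated / r_true) ** generations - 1.0

# Hypothetical values: the model assumes 1.5 when the true figure is 1.4.
for generations in (1, 4, 8, 12):
    error = relative_error(1.5, 1.4, generations)
    print(f"{generations:2d} generations ahead: forecast off by {error:.0%}")
```

The same small mismatch that barely matters one generation ahead dominates the forecast a dozen generations out.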

For this reason, Monte Carlo analysis is kept as simple as possible – especially when there is limited data available – to avoid errors skewing the predictions. As more data becomes available, this can be added to the model to provide greater accuracy. However, a complex model has greater potential for problems to occur. A robust data verification strategy must be used to limit error creep.

Real applications of simulated results

“If we are able to capture the relevant conditions and we know very well how the disease spreads, Monte Carlo simulations can give reliable statistical estimates to control an epidemic,” says Scalas.

In cases such as the Ebola outbreak, researchers can gain an understanding of the scale of the epidemic, and how it will most likely spread. With this information, aid agencies will know how many doctors and nurses they will need, and can plan the deployment in advance.

The act of modelling a viral outbreak provides useful insights, especially when considering the implications of the extent of the outbreak. Jewell explains that creating a Monte Carlo analysis can act “as a thought framework for identifying aspects that perhaps we had not thought about before”.

Likewise, researchers can gain a greater understanding of the situation – and how the outcome can be influenced – by manipulating the variables during different iterations of the model, and assessing the most likely results from each simulation. These variations can include simulating curfews, school closures or quarantines.
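
Comparing scenarios in this way might look like the sketch below. It again assumes a simple Reed-Frost model, and it represents a curfew, school closure or quarantine crudely as a percentage cut in the transmission probability from a chosen generation onwards. The parameter names `intervention_start` and `reduction`, and all of the numbers, are illustrative assumptions.

```python
import random
import statistics

def outbreak_size(population, initial_cases, p_transmit, rng,
                  intervention_start=None, reduction=0.0):
    """Total infected in one Reed-Frost outbreak, optionally cutting transmission later on."""
    susceptible = population - initial_cases
    infectious = total = initial_cases
    generation = 0
    while infectious > 0:
        p = p_transmit
        if intervention_start is not None and generation >= intervention_start:
            p = p_transmit * (1.0 - reduction)  # the measure reduces infectious contact
        risk = 1.0 - (1.0 - p) ** infectious
        infectious = sum(1 for _ in range(susceptible) if rng.random() < risk)
        susceptible -= infectious
        total += infectious
        generation += 1
    return total

def median_size(runs, seed, **scenario):
    """Median outbreak size over many runs of one scenario (illustrative parameters)."""
    rng = random.Random(seed)
    return statistics.median([outbreak_size(200, 2, 0.012, rng, **scenario)
                              for _ in range(runs)])

scenarios = {
    "no intervention": {},
    "quarantine from generation 3": {"intervention_start": 3, "reduction": 0.5},
    "quarantine from generation 1": {"intervention_start": 1, "reduction": 0.5},
}
for name, scenario in scenarios.items():
    print(f"{name}: median outbreak size {median_size(300, seed=1, **scenario)}")
```

Running each scenario many times, and not just once, matters here: a single run could make an intervention look better or worse than it is purely by chance.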

Monte Carlo analysis has become a necessary component in crisis management, especially involving viral outbreaks such as the current Ebola crisis. However, it is only through the expertise of Monte Carlo programmers that simulations can be streamlined to provide increasingly accurate results. These results can be used to strategically target medical interventions, and curb the spread of Ebola or other viral outbreaks.

Ebola vaccines are currently being developed but, until pharmaceutical companies can mass-produce them, they will be in short supply. With this in mind, Monte Carlo analysis could help in the strategic deployment of vaccines, reducing the spread of Ebola and minimising further suffering.

These sample images (above and below) are taken from the modelling analysis of the 1976 Ebola outbreak, which formed part of the paper “Potential for large outbreaks of Ebola virus disease”, published here under a Creative Commons licence.
