Lancaster University begins datacentre research to establish cloud latency causes

Researchers, in collaboration with Microsoft and others, want to stamp out data processing delays in datacentres to improve user experience and energy efficiency

Lancaster University has secured £120,000 in funding for a two-year datacentre research project to find the root causes of latency in hyperscale facilities.

The project will aim to determine why data processing delays sometimes occur within datacentres, which, in turn, affect the performance of the cloud and internet-based services underpinned by these facilities.

Delays are known to occur when, for example, a cloud computing task is broken down into smaller jobs so that the related data can be processed across thousands of server nodes.

For unknown reasons, some of these smaller jobs (referred to as stragglers) can take longer to complete than others, causing knock-on delays, ranging from seconds to minutes, when it comes to completing the overall task.
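The effect can be illustrated with a minimal Python sketch. It is not taken from the Lancaster project: the job durations are invented, and the rule that a straggler is any job running more than 1.5 times the median duration is an assumption made purely for illustration. The sketch shows that a task split into jobs finishes only when its slowest job does, so a single straggler sets the completion time for the whole task.

```python
from statistics import median


def task_completion_time(job_durations):
    """A task split into parallel jobs finishes only when its slowest job does."""
    return max(job_durations)


def find_stragglers(job_durations, threshold=1.5):
    """Return indices of jobs slower than `threshold` times the median duration.

    The 1.5x-median cutoff is an illustrative assumption, not a definition
    given by the researchers.
    """
    cutoff = threshold * median(job_durations)
    return [i for i, d in enumerate(job_durations) if d > cutoff]


# Hypothetical durations, in seconds, for the eight jobs that make up one task.
durations = [10.0, 11.0, 9.5, 10.5, 10.0, 42.0, 9.0, 10.5]

stragglers = find_stragglers(durations)
print(f"Typical job duration: {median(durations)} s")
print(f"Straggler job indices: {stragglers}")
print(f"Overall task completion time: {task_completion_time(durations)} s")
```

In this made-up example, seven of the eight jobs finish in roughly 10 seconds, yet the task as a whole takes 42 seconds because of the one slow job.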

The resulting performance lag, known as the “long-tail problem”, can be frustrating for end-users, while the operational inefficiencies created by these delays can also prove costly for datacentre operators, the researchers claim.  

“As datacentres grow larger, so does the problem of stragglers,” the research team said in a statement. “The problem cannot be fixed by simply adding more server nodes, nor can it be diagnosed in a straightforward manner. It stems from a multitude of possible causes, including failures, data size, system usage, and even temperature.”

The researchers also claim that, without addressing the root cause of stragglers, emerging technologies that rely heavily on internet connectivity could fail to reach their full adoption potential.

Dr Peter Garraghan, a lecturer in distributed systems at Lancaster University’s School of Computing and Communications, said the problems caused by stragglers were an area of growing concern for the US tech giants and hyperscale cloud firms.

“We know that stragglers can cause significant delays to distributed systems, such as the internet of things and cloud datacentres, causing myriad problems, such as increasing operating costs and the energy consumption of computing systems,” he said.

“We don’t exactly know what conditions are likely to cause stragglers and so system managers are unsure of how to avoid their occurrence. They simply ‘live with it’, which going forward is becoming increasingly unfeasible.”

Establishing what causes stragglers should make it easier for datacentre operators to work out how best to prevent long-tail problems occurring, said Garraghan.

“Our work, in collaboration with leading industrial partners with massive-scale distributed systems, represents a significant step towards solving the long-tail problem and will provide direct benefits to the user experience, the operational costs of service providers, and will enhance the competitiveness of the UK digital economy,” he added.
