Networking problems are on course to overtake power supply issues as the most common source of datacentre outages, as enterprises look to move more of their workloads to the cloud, according to the Uptime Institute.
The datacentre resiliency thinktank’s third annual outage analysis examines the frequency and causes of downtime incidents affecting server farms over the past 12 months.
The 2021 report suggests that outages became markedly less frequent over that period, and it cites the onset of the Covid-19 coronavirus pandemic as a factor.
“According to our public outage tracking, 2019 was a particularly bad year for server outages, while 2020 was the best year yet recorded. Not only were there fewer outages reported by publicly available sources, but a lower proportion were serious or severe,” the report stated.
“This is probably because the level of business-critical activity was significantly disrupted and/or depressed due to Covid-19.”
One direct consequence of the government-imposed lockdowns and stay-at-home orders that the pandemic brought about last year was that many companies temporarily ceased or scaled back their operations, which may have reduced the number of outages.
Furthermore, in keeping with the Uptime Institute’s own advice to datacentre operators at the start of the pandemic in March 2020, many firms also delayed datacentre maintenance and upgrade projects, which are a common source of outages, the report added.
“Looking at global, enterprise-class IT more generally (spanning private datacentres, colocation and public cloud), Uptime Institute’s annual survey data provides a consistent picture over several years, with power problems invariably the biggest single cause of outages,” the report stated.
Citing data from the Uptime Institute’s 2020 global survey, the report said that on-site power failures remain the biggest cause of “significant outages”, followed by software and IT issues, and networking trouble.
“Over time, Uptime Institute expects that more outages will be caused by networking and software/IT, and fewer by power issues,” said the report.
This is partly because the rate of power-related outages is in steady decline, as operators have taken action to improve the design of their facilities and have trained their staff to prevent such incidents.
In the meantime, networking-related outages are becoming increasingly prevalent due to the “broad shift in recent years from siloed IT services running in dedicated, specialised equipment” to a model where IT systems are distributed and replicated across multiple sites linked together by network connections.
“Networking issues are now emerging as one of the more common – if not the most common – causes of downtime. The reasons are clear enough: modern applications and data are spread across and between datacentres, with networking ever-more critical,” the report stated.
“To add to the mix, software-defined networks have added great flexibility and programmability, which can introduce failure-prone complexity.”
At the same time, enterprise datacentres are typically served by “one or two” telecommunications providers. As companies increasingly shutter such facilities in favour of running their workloads in colocation or public cloud datacentres, where network connectivity is more complex, the risk of networking issues blighting their operations rises.
“Multi-carrier colocation hubs can be served by many [telcos]. Some of these links may, further down the line, share cables or facilities – adding possible overlapping points of failure or capacity pinch points,” stated the report.
“Configuration errors, firmware errors, and corrupted routing tables all play a big role in networking-related failures… Congestion and capacity issues also cause failures, but these are often the result of programming/configuration issues.”
Andy Lawrence, executive director of research at Uptime Institute, said the report reinforces the fact that resiliency remains a top-of-mind concern for business leaders, while also highlighting emerging threats to their ability to keep their IT systems up and running.
“Overall, the causes of outages are changing: software and IT configuration issues are becoming more common, while power issues are now less likely to cause a major IT service outage,” he said.
“The fact is outages remain common and justify the increased concern and investment in preventing them. Because of the disruption and high costs that result from disrupted IT services, identifying and analysing the root causes of failures is a critical step in avoiding more expensive problems.”