
O2 outage highlights importance of software certificate audits

A major outage on the O2 4G mobile network was caused by an expired certificate and could easily have been avoided, it has emerged

O2’s 4G network is now back up and running after a day-long outage on Thursday 6 December caused by an expired software certificate – which could have been avoided very easily.

The outage began between 4am and 5am. Besides millions of O2 customers, it affected numerous enterprises and organisations that run connectivity services on the O2 network, including Transport for London and a number of NHS trusts.

The operator’s 3G mobile data service began to return early in the evening of 6 December and was fully restored by 9.30pm.

“I want to let our customers know how sorry I am for the impact our network data issue has had on them, and reassure them that our teams, together with Ericsson, are doing everything we can,” said Mark Evans, CEO at O2 parent Telefonica UK.

“We will continue to work with Ericsson … which has assured us that a full service will be restored for customers by the morning. We fully appreciate it has been a poor experience and we are really sorry.”

Behind the scenes, O2 technical teams and Ericsson engineers worked through the night to restore the 4G network, a job completed at about 3.30am on Friday 7 December, almost 24 hours after the network first collapsed.

“We can now report that our 4G network has been restored,” said an O2 spokesperson. “Our technical teams will continue to monitor service performance closely over the next few days to ensure we remain stable. A review will be carried out with Ericsson to understand fully what happened.

“We would like to thank our customers for their patience during the loss of service on Thursday 6 December and we are sorry for any impact the issue may have caused.”

As of 11am on 7 December, O2 told Computer Weekly that both 3G and 4G services were running as normal, although technical teams were continuing to monitor performance closely. “We will be updating our customers later today on how we will make yesterday’s data service issue up to them,” said the spokesperson.

An initial analysis by O2’s supplier, Ericsson, found that the issue was caused by an expired certificate in the software installed at customers using a specific version of its Serving GPRS Support Node – Mobility Management Entity (SGSN-MME) in their core networks. Ericsson also supplies back-end equipment to many other mobile network operators around the world, some of which, including Japan’s SoftBank, were also affected.

“This episode illustrates the essential role certificates play in keeping IT infrastructure safe and running, and also the risk that enterprises face if they don’t have a firm handle on the certificates installed in business-critical systems,” said Tim Callan, senior fellow at certificate issuer Sectigo and a member of the industry body that came up with the Extended Validation (EV) SSL certificate.

“The proliferation of certificates and ever-increasing complexity of IT infrastructure has made it more and more challenging for IT professionals to stay on top of this component of their networks.”
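To illustrate the kind of audit Callan describes, the following is a minimal sketch in Python, not a description of how O2 or Ericsson manage their certificates. It assumes an organisation already holds an inventory of certificate names and their `notAfter` expiry timestamps; the inventory entries, the dates and the 30-day warning window are all invented for the example.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days left before a certificate's notAfter time (negative if expired)."""
    # ssl.cert_time_to_seconds parses the "Mon DD HH:MM:SS YYYY GMT" format
    # used in certificate validity fields and returns seconds since the epoch.
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def audit(inventory: dict[str, str], now: datetime, warn_days: int = 30):
    """Return (name, status) for certificates that are expired or expiring soon."""
    findings = []
    for name, not_after in sorted(inventory.items()):
        remaining = days_until_expiry(not_after, now)
        if remaining < 0:
            findings.append((name, "EXPIRED"))
        elif remaining <= warn_days:
            findings.append((name, "EXPIRING_SOON"))
    return findings


# Hypothetical inventory: names and dates are made up for illustration only.
inventory = {
    "example-node": "Dec  6 04:00:00 2018 GMT",
    "billing-api": "Dec 20 00:00:00 2018 GMT",
    "intranet": "Jun  1 00:00:00 2019 GMT",
}
now = datetime(2018, 12, 7, 11, 0, tzinfo=timezone.utc)

for name, status in audit(inventory, now):
    print(name, status)
```

Run on a schedule, a check of this sort would flag a certificate some weeks before it lapses rather than after services fail. Assembling a complete inventory, including certificates embedded in supplier software, is the harder part, and it is not addressed here.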

Kevin Bocek, vice-president of security strategy and threat intelligence at Venafi, added: “The identity of machines makes the internet run. Machine identities allow our mobile devices, networks and computers to trust each other. But they expire and networks, applications and businesses fail.

“Today’s O2 outage is just one more example of how important machine identities are to the economy and when they fail, everything from buses, mobile devices and more, fail. O2’s experience is the same that banks, airlines and the high street have all faced. It’s painful for millions and these problems are only getting worse as we depend more on clouds, mobile devices, AI, and the coming arrival of 5G networks.”

Writing on LinkedIn, Weightless SIG CEO William Webb said Ericsson was now likely to take steps to keep its software certificates fully up to date. However, the complexity of modern mobile networks, and the inability of humans to write error-free code, meant network failures were inevitable, he said, and their severity would increase as 5G came on stream and more facets of day-to-day life became enabled for mobile.

“If operators pick up more IoT [internet of things] traffic and 5G emerges and becomes a more pervasive part of our lives, then such outages could become very serious indeed,” wrote Webb, who has long been sceptical of the business case for 5G and published a book to this effect in 2016.

The simplicity of Ericsson’s error makes it highly likely that such outages will occur again, so it is imperative that enterprise users consider their mobile strategies very carefully and take steps to minimise their dependence on a single external point of failure – that is to say, their operator partner.

“If I were a factory owner, I would be wary of moving all my factory communications to a mobile operator,” wrote Webb. “If I were running smart metering or similar, I might look for multiple methods of communications, allowing a fall-back to a different solution if one fails. If I were government, I might start to consider mandatory national roaming when a network is considered to have failed.”
