A Microsoft Azure outage left users across the globe unable to access key services for a couple of hours on 15 September because of a DNS (domain name system) malfunction.
The problems began just before midday UTC, with Microsoft warning users on its Azure status page that they might encounter availability issues when trying to use a range of the company’s cloud services, including SQL Database and Azure Backup.
A brief note on the company status page cited a spike in networking traffic as the root cause of the problems, which were reportedly resolved at around 1pm UTC.
“This caused service-level drops for the DNS service,” the status page post read. “This resulted in connectivity issues for services reliant on DNS. Once network mitigation was implemented, most services fully recovered. Microsoft SQL Azure had secondary impact due to a misconfiguration.”
In a follow-up statement to Computer Weekly, Microsoft confirmed normal service for Azure users had resumed.
“Some customers may have experienced a service interruption,” it said. “Services are now fully restored. Customers can receive updates at Microsoft Azure Service Health Dashboard.”
A further statement on the Azure status page said the company intended to publish a detailed analysis of what had gone wrong within the next two days.
Microsoft was not the only cloud provider to run into technical difficulties, with Google Apps for Work users across the US and UK unable to use the service for 90 minutes on 14 September.
“We will conduct an internal investigation of this issue and make appropriate improvements to our systems to prevent or minimise future recurrences,” Google said in a statement posted on its status page. “We will provide a more detailed analysis of this incident to the affected customers once we have completed our internal investigation.”
David Hood, cyber resiliency expert at email security firm Mimecast, said the service problems highlight why companies cannot afford to overlook the importance of business continuity when shifting workloads to the cloud.
“In a cloud-first world, organisations need a cyber resilience strategy for when Azure, Office 365 or another critical cloud service goes offline. Organisations need a continuity plan to keep operating when their primary provider becomes unavailable,” he said.