Case study: Putting disaster recovery into action when revellers strike

How an out-of-hand office party in its building brought Icelandic IT services company Midverk to the brink of collapse

Infrastructure managers are no strangers to worrying about disaster. Preparing for eventualities such as fires, floods and power cuts goes with the territory. 

Most, however, don’t spend too much time thinking about protecting their servers from drunken revellers. So when Jon Helgason, IT manager for Icelandic software development and IT management specialist Midverk, left work one Friday evening last winter just as a party was livening up in one of the other offices in the building, he never suspected that a few hours later this Nordic knees-up would bring his business perilously close to collapse.

Helgason was rudely awakened by a call from building security at 5am the following morning. They informed him the company’s servers had been without power for almost two hours. Helgason was more than a little confused. He’d planned for power failures, of course, and his uninterruptible power supplies (UPSs) should have kicked in after 10 minutes. 

“When I arrived at the building, the first thing I saw was beer and vomit outside our datacentre,” he says. 

So Helgason followed his nose and soon worked out what had happened: “Someone from the party had gone into a restroom on our floor and ripped out the light, which short-circuited the whole floor. When the electricity came on, the UPSs were down.” 

So far so bad, but things were about to get a whole lot worse. “One of the SAN arrays that held all of our important software development data was corrupted,” he says. It contained four years of code and was worth in the region of half a million US dollars. 

With no working, up-to-date backup and no redundancy in the system, the situation looked dire. “My boss told me that if we didn’t get our data back, we would have to close the company,” says Helgason. 

Recovering from the data disaster

Fortunately, after a lot of sweat and worry, Helgason’s IT service provider managed to bring the faulty array back online, but Midverk was not out of the woods yet. 

“Because the system was in such a fragile state, we needed a backup immediately – it was a do-or-die situation,” says Helgason. “I didn’t want to use our legacy backup appliance because the latest version was too complicated to use, especially during a crisis that left the development data unprotected.”

In a bid to avert potential disaster, Helgason downloaded a trial version of Veeam’s backup and replication software for virtualised environments. “I installed it quickly and three hours later I had everything in our VMware vSphere environment backed up and running,” he says.  

Not only did the move save the day, it also led to Midverk subsequently winning awards for "best disaster recovery project" and "best in show" at the VMworld Europe 2013 User Awards – a fact Helgason puts down to the simplicity of the software. 

“The interface is so straightforward that I didn’t have to read the user guide, which saved crucial time during the crisis,” he says.

Midverk certainly isn’t alone in neglecting to ensure backup procedures were always dependable, but there is nothing like the threat of total disaster to shock a business into action. This was the company’s wake-up call, and now Helgason is confident its backup process is exemplary. 

Backup lessons learned

The day after the power outage, the business purchased the full version of the Veeam software, and Helgason also installed it at home to ensure Midverk would always have a second set of virtual machine backups in future. 

At home, he runs a virtualised environment based on Microsoft Hyper-V, but he says this was not a problem since the software supports multiple hypervisors. 

“Now we back up our software development virtual machines every four hours rather than once a day like we did before, which greatly reduces the chance of us losing code,” he says.

The software also has built-in deduplication and compression, which has allowed the company to shrink its backups to a quarter of the size they were before, slashing Midverk’s storage costs. Who knows, the company could even divert some of those savings to a huge office party. On second thoughts…
