What to do when back-ups break down

Everyone knows how important back-ups are, but few understand the problems of restoring data or how important it is to test your...

Businesses are so reliant on their data that only the very naive do not make regular back-ups. And if lack of business sense prevents some companies from running sensible housekeeping routines, new and stringent legislation now requires businesses to keep data available for the purpose of audit trails and data protection compliance.

Traditionally, back-up and data archives were the preserve of the finance director, who needed a record of data to complete year-end figures for the Inland Revenue. However, post-Enron and WorldCom, corporate governance has extended to keeping an audit trail of all transactions for regulated industries, and to maintaining records of business-critical documents for others.

Unfortunately, even with good policies and technology in place, things can still go wrong. "Everyone understands that people need to do back-up, but lots of people do not realise how long it takes to do the restore, or that they need to test the restore," says Adrian Palmer, managing director of data recovery specialist Ontrack.
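Part of the restore test Palmer describes can be automated. The following is a minimal sketch, not any supplier's method: it assumes the original data and a test restore both sit in ordinary directories, and the function names (file_digest, manifest, compare_restore) are hypothetical. It hashes every file on both sides and reports what the restore lost, altered or added.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_restore(original: Path, restored: Path) -> dict[str, list[str]]:
    """Compare a test restore against the original tree (hypothetical helper)."""
    before = manifest(original)
    after = manifest(restored)
    return {
        # In the original but absent from the restore.
        "missing": sorted(set(before) - set(after)),
        # Present on both sides but with different content.
        "corrupt": sorted(k for k in before.keys() & after.keys() if before[k] != after[k]),
        # In the restore but never in the original.
        "unexpected": sorted(set(after) - set(before)),
    }
```

Calling compare_restore(Path("live_data"), Path("test_restore")), where both directory names are placeholders, returns three lists. A clean rehearsal leaves all of them empty. It says nothing about how long the restore took, which still has to be timed separately.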

"One false click and a deleted file can cost an IT person half a day's work to restore it," says Ed Jones, director of Thinking Safe, a back-up and recovery services provider. This is multiplied many times over when an entire server or network goes down and companies have not performed a dress rehearsal or even planned where a restore might take place.

Howard Scoot, managing consultant at Avaya Global Services, says a lack of forward thinking is evident across all types of companies. "It is not given a high enough priority in any organisation. There is often no idea of how the restores should be done, which data should be prioritised, or who is in charge of conducting it. The IT manager might not have enough clout to say to the call centre boss 'This is how we come back online'."

However time-consuming a restore is, it is a minor inconvenience compared with the worst-case scenario of data loss. Physical media problems that may scupper data include using discs from the same batch in a Raid array: a manufacturing fault could then compromise failover as well as an individual disc.

A disc head crash could also be terminal. Palmer likens it to a needle flying erratically over a 12-inch vinyl record and ripping into it. "Because data is not stored contiguously, it is like losing the index of the book and so the computer does not know how to rebuild the file."

Practising restore procedures should flush out errors in hardware and human practice, but unfortunately these are rarely discovered until a real incident. Recovering data from the commonest form of back-up - tapes - is littered with pitfalls. Mishaps include a company being given the tape from the wrong day, or a tape belonging to another company. As the data stored on tapes is not usually encrypted, the latter is a huge corporate governance exposure that is often overlooked.

Reliance on a third party that is some distance from the business may also cause unnecessary delays to data recovery.

One reason that solicitors J Keith Park chose to manage its Raid array in-house was to improve resilience. Among smaller companies, data back-up is frequently neglected or left to inexperienced people, often the finance department. "Although it is blindingly obvious to an IT-literate person that 10Gbytes of data cannot be backed up in 10 seconds, this is not apparent to the finance department," says Palmer.
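Palmer's point comes down to simple arithmetic. As a rough sketch, the time a back-up or restore takes is the data volume divided by sustained throughput. The 20Mbytes per second figure below is a hypothetical rate chosen for illustration, not one quoted by Ontrack. The function names are likewise assumptions, and the sum uses decimal units (1Gbyte = 1,000Mbytes).

```python
def transfer_seconds(size_gbytes: float, throughput_mbytes_per_s: float) -> float:
    """Estimate how long moving size_gbytes takes at a sustained rate."""
    if throughput_mbytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return size_gbytes * 1000 / throughput_mbytes_per_s


def required_throughput(size_gbytes: float, seconds: float) -> float:
    """Return the sustained Mbytes/s needed to move size_gbytes in the given time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return size_gbytes * 1000 / seconds


# Hypothetical device sustaining 20 Mbytes/s: 10 Gbytes takes 500 seconds.
print(transfer_seconds(10, 20))      # 500.0
# Finishing 10 Gbytes in 10 seconds would need 1,000 Mbytes/s sustained.
print(required_throughput(10, 10))   # 1000.0
```

At the assumed rate, 10Gbytes takes more than eight minutes. That is before any time spent locating media, verifying the copy or rebuilding a server.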

Even when procedures exist for restoring data, the realisation of how long it can take and the financial loss to the business of downtime can instigate panic and a race to restore. "Over-zealous IT technicians have been known to gouge hard drives with a screwdriver or accidentally delete a partition from a server," says Palmer.

When physical media is damaged and the data it contains is precious, companies may resort to the services of a data recovery specialist. Ontrack provides "clean rooms" where engineers rebuild data files from damaged discs. Alternatively, its engineers may download tools to a customer site and do remote repairs.

The cost ranges from a few thousand pounds, which is what clockmaker Ben Bray paid to recover his clock-making tool blueprints, to tens of thousands, where engineers work around the clock to restore mission-critical data.

New legislative and business pressures are encouraging companies to have data not only backed up but accessible. IT directors are reviewing traditional methods of storing banks of tapes in dusty rooms and then resorting to experts in a crisis. Jones says, "Although most people focus their time on back-up, the problem is recovery. Back-ups are only an enabler to the recovery process. A solution is required that ensures automation of the back-up and the recovery replication process."

Storing data locally on on-site back-up appliances, so that companies can restore even if the internet connection is down, is one route to speedier recovery, says Thinking Safe. Where offsite recovery is the key strategy, co-locating standby servers next to the offsite back-up appliances enables a server to be recovered to the state it was in an hour ago. Configuring a firewall then means that business applications are available remotely.

However, with the focus shifting to the recovery of voice and data communications to provide audit trails of contested transactions, IT directors are having to raise the data recovery bar again. In the words of one IT director, "If the business needs to see an e-mail and it cannot be found in its original format, that is our problem."

The potential of network monitoring technology to provide selective content recovery is being explored by some UK financial services companies. Chronicle Solutions developed Netreplay to be the ultimate "Big Brother" traffic monitoring product, but discovered that its ability to identify, store and recover content objects could help with legal compliance. The Netreplay card sniffs all traffic, recognises content rather than packet information, and stores each item as a unique blueprint in a content library.

Thus, deals negotiated during internet chatting or messaging - an increasingly common medium for business transactions - can be recovered. "I was recently chatting with a US sales engineer and a French software developer over MSN. We transferred files, but once the session was over, it is normally impossible to recover data from MSN," says Melville Carrie, vice-president of research and development for Chronicle Solutions.

Using the fingerprinting capabilities of Netreplay, Carrie was able to recover the files associated with the conversation. Traditionally, the inbound and outbound parts of a conversation are stored separately. Carrie says, "Using the object approach enables a chain of evidence to be built up including where the file came from, whether it was zipped or e-mail, and who interacted with it."
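The article does not describe how Netreplay works internally, so the following is only a generic illustration of the object approach, not Chronicle Solutions' implementation. It assumes a SHA-256 hash as the fingerprint. The ContentLibrary class and its methods are hypothetical names. Each distinct piece of content is stored once, and every sighting of it is logged against the same fingerprint.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ContentRecord:
    """One stored content object plus every place it was seen."""
    data: bytes
    sightings: list[tuple[str, str, str]] = field(default_factory=list)


class ContentLibrary:
    """Hypothetical content store keyed by a hash of the content itself."""

    def __init__(self) -> None:
        self._records: dict[str, ContentRecord] = {}

    def observe(self, data: bytes, sender: str, recipient: str, channel: str) -> str:
        """Record a sighting of data and return its fingerprint."""
        fingerprint = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same record, so it is stored only once.
        record = self._records.setdefault(fingerprint, ContentRecord(data))
        record.sightings.append((sender, recipient, channel))
        return fingerprint

    def recover(self, fingerprint: str) -> bytes:
        """Return the stored content for a fingerprint."""
        return self._records[fingerprint].data

    def chain_of_evidence(self, fingerprint: str) -> list[tuple[str, str, str]]:
        """Return every (sender, recipient, channel) sighting, in order seen."""
        return list(self._records[fingerprint].sightings)
```

Because the key is derived from the content, a file sent over instant messaging and later forwarded by e-mail lands in the same record. The list of sightings then gives a simple trail of who handled it and by which route.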

Although users are getting clued up about the importance of restoration and suppliers are adapting technologies to assist compliance, one community remains vulnerable. Mobile users who carry details of new business deals on their laptops and can only back up when connected to the network are highly susceptible to physical knocks or the latest virus, either of which can wipe out valuable data.

"Alongside continuous data protection, how an IT department integrates remote users in its data recovery and back-up plans is a major issue," says Hamish Macarthur, director of consultancy Macarthur Stroud International. Ontrack confirms that it receives calls from FTSE-100 executives who store .pst (Outlook) files locally and then lose them. "This can be a calamity and typically we find they bypass any business continuity plan," he says.

Data recovery may be the last thing on people's minds during daily operations, but the need to protect the business and keep it on the right side of the law is forcing it up the agenda.
