Research published last week by analyst IDC has shown that 21% of the UK's big businesses do not have detailed business continuity plans. The survey, which was commissioned by BT Global Services, has revealed that despite recent threats from flooding and terrorism many large organisations are still not fully prepared for disaster.
Looking at businesses of all sizes the survey found that 29% lacked business continuity plans. It's a remarkable percentage. If fire or flood hits your business you'll be losing money every hour that your systems are down, and the long-term effect on the business can be fatal. According to Government figures nine out of 10 businesses that lose data in a major incident don't survive another two years.
So, what should be the aim of business continuity and disaster recovery planning and how should the business and the IT department draw up contingency plans for when catastrophe strikes?
First of all it's crucial to distinguish between business continuity and disaster recovery. Business continuity planning is the specification of key objectives needed to get the business operating after a disaster, and will encompass the wider aspects of the organisation's processes – such as being able to service key accounts – with the aim of getting them online again as soon as possible.
Disaster recovery, by contrast, is defined as the process of restoring the infrastructure and systems necessary for the business continuity plan to succeed, so it is far more the concern of the IT department – and of those responsible for data and its storage – than business continuity is.
"It all comes down to risk management," says Clive Longbottom, service director with analyst organisation Quocirca. "You need to determine what disasters are within the realms of possibility – and prioritise them according to whether they are more or less likely to happen. Having determined the main scenarios you might face you can categorise them further in terms of likely impact on the business."
At this point you will be in a position to specify what should be the recovery time for the parts of the business affected by different disaster scenarios and work out a concrete response in terms of people, processes, hardware and timescales, says Longbottom.
He says, "When disaster strikes, what happens? All humans need to know what they are meant to do. They should know their role and have practised it. The plan should have been tested, or at least elements of it, on a regular basis and everyone should have their particular courses of action to follow without deviation."
Part and parcel of this, says Longbottom, is ensuring the physical necessities for recovery are available – and this will vary from, say, spare individual hardware components to whole estates of server and storage hardware in a location that is safe from the point of disaster.
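The risk-ranking step Longbottom describes, where scenarios are ordered by likelihood and impact and each is given a recovery time, could be sketched roughly as follows. This is a minimal illustration only: the 1-to-5 scales, the likelihood-times-impact score, the scenario names and the recovery targets are all assumptions made for the example, not figures from Quocirca or from any standard.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """One disaster scenario in a hypothetical risk register."""
    name: str
    likelihood: int             # assumed scale: 1 (rare) to 5 (expected)
    impact: int                 # assumed scale: 1 (minor) to 5 (business-threatening)
    recovery_target_hours: int  # assumed target recovery time for affected systems

    @property
    def score(self) -> int:
        # A simple likelihood x impact product; other weightings are possible.
        return self.likelihood * self.impact


def prioritise(scenarios):
    """Return scenarios ordered from highest to lowest risk score."""
    return sorted(scenarios, key=lambda s: s.score, reverse=True)


# Illustrative entries only; a real register would come from a risk assessment.
scenarios = [
    Scenario("Flood at main office", likelihood=2, impact=5, recovery_target_hours=24),
    Scenario("Single server failure", likelihood=4, impact=2, recovery_target_hours=4),
    Scenario("Loss of network link", likelihood=3, impact=3, recovery_target_hours=8),
]

ranked = prioritise(scenarios)
for s in ranked:
    print(f"{s.name}: score {s.score}, recover within {s.recovery_target_hours}h")
```

A ranked list of this kind is only a starting point; the people, processes and hardware needed to meet each recovery target still have to be worked out scenario by scenario.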
Those are the general outlines of a disaster recovery strategy for IT departments. When it comes to planning in detail there are best practice standards that can be referred to, such as BS25999 part two, which was released late last year.
BS25999 part one outlines the general processes, principles and terminology of business continuity management and has been available since 2006. Part two – which is the most downloaded British Standard to date – specifies best practice in terms of the people, infrastructure and information flows needed to get a business up and running with minimum disruption if disaster strikes. It also makes it possible for organisations to have their business continuity management arrangements independently certified by external auditors. BS25999 replaces the BSI's Publicly Available Specification PAS 56, which has been available for four years.
Jon Collins, service director with analyst Freeform Dynamics, says, "The first port of call for UK organisations should be to consider the recently completed BS25999 standard for business continuity management. The good news is that those defining that standard have quite clearly taken a pragmatic approach, with the result that the standard should be applicable to most sizes of organisation."
Collins points out that a key question asked by BS25999 is, "Do you understand your business?"
Bob Greenwood, IT manager at London-based law firm Trowers & Hamlins, clearly recognised the signs of vulnerability when his organisation moved from three premises in the City to a single one in 1999. That move massively heightened the firm's exposure by creating a single point of failure for the business's servers and storage, so he acted to implement a hardware configuration that could withstand external shock.
Greenwood says, "When we moved our operations to one office at Tower Hill all our London staff were in one place so we faced a significantly greater risk. From that point on we spent a lot of effort on planning DR, not least because it reduces insurance premiums if you can demonstrate you have clear plans in place."
Initially Greenwood's team replicated key data to a south London site used as a document store where there were extra servers and enough room for about 12 people to work.
That plan, however, soon foundered. "We used Double Take replication software to mirror filestores and SQL databases to the site but we found it inadequate – either the complexity of the data was too much for it or the network link was insufficient," says Greenwood.
So, Trowers & Hamlins decided to go for a more robust hardware solution and opted to deploy a pair of IBM SANs while re-purposing the now-spare servers to form part of the DR facility.
The solution entails – at the City site – IBM System X servers connected to an IBM DS4700 Fibre Channel array with dual IBM SAN Volume Controllers which relay data to the backup site where integrator Tectrade implemented further SVC nodes, an IBM SATA DS4200 Express and virtualised servers under VMware.
The SVCs' mirroring of the production site to the secondary site plus VMware's ability to virtualise servers at the remote location were key advantages for Greenwood.
He says, "When we found out what the IBM SVCs could do, that gave us a lot of confidence, especially being able to replicate SQL and Exchange filestores with great accuracy. This meant we could think about how we could use similar hardware in the two locations and use VMware to have three or four servers at the second site replicating the 16 servers at the live one."
On the basis of that hardware configuration Trowers & Hamlins' disaster recovery plan entails recovery within 24 hours with office accommodation for key workers on the most important accounts at the south London site. The firm recognises it is not able to have all 500 staff working straight away but believes it can last for a few weeks until new premises and hardware are secured.
Crucially also, the plan has been tested, says Greenwood. "We test the plan every 12 months. During the last test we managed to get the business-critical applications, namely our e-mail systems, document management and practice management system, back on-line within one business day. At all times during normal working our DR system captures all business transactions within five minutes of commitment to the live system."
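The two figures in the firm's plan, recovery within 24 hours and transactions captured within five minutes, correspond to what are commonly called a recovery time objective (RTO) and a recovery point objective (RPO). Those terms are not used by Greenwood, and the sketch below is only an assumed way of recording a DR test against such targets. The function name and the measured values passed to it are hypothetical, not results from Trowers & Hamlins.

```python
from datetime import timedelta

# Targets taken from the plan described above.
RTO = timedelta(hours=24)   # maximum acceptable time to restore critical systems
RPO = timedelta(minutes=5)  # maximum acceptable window of lost transactions


def meets_objectives(recovery_time, replication_lag, rto=RTO, rpo=RPO):
    """Compare measured DR test results with the recovery objectives.

    recovery_time: how long the test took to bring critical systems back.
    replication_lag: how far the DR copy trailed the live system.
    """
    return {
        "rto_met": recovery_time <= rto,
        "rpo_met": replication_lag <= rpo,
    }


# Hypothetical measurements from an annual test, for illustration only.
result = meets_objectives(
    recovery_time=timedelta(hours=9),
    replication_lag=timedelta(minutes=4),
)
print(result)
```

Recording each test in this way makes it easier to spot when a growing data volume or a slower network link starts to push either measurement towards its limit.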