How to plan for a disaster-free storage Christmas

How to ensure preparedness in storage, backup and disaster recovery for the holidays, when many organisations operate only skeleton storage staff.

Christmas is the longest continuous holiday period during the year for many organisations. But that doesn't mean that storage professionals can take their eye off the ball.

Over the holiday break, backups still need to be performed and disk space on SAN and NAS arrays must still be provisioned for any applications that will continue to run. It's also a troublesome fact that storage hardware failures don't respect holiday periods.

And though for some companies the Christmas period is a time of shutdown, for others – notably retailers and organisations with offices around the world – it can be a time of heightened activity, often against a backdrop of non-standard staffing patterns. So whether Christmas means shutdown or a flurry of activity, for most companies it will mean a different rhythm of business, and planning for the holiday period will be crucial to ensure that everything in the storage environment runs smoothly.


Before going on holiday, it is vital to ensure that SAN and NAS systems have sufficient headroom to meet the capacity requirements of all applications, especially if they support major undertakings such as closing the company books. Hamish Macarthur, chief executive of analyst firm Macarthur Stroud International, says, "If you end up with higher levels of activity than expected, you want to ensure you can capture that and not lose business."
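As a rough illustration of the headroom check described above, the sketch below estimates how many days of free capacity remain at a steady growth rate and compares that with the length of the break. The function names, the 20 GiB daily growth figure, the 14-day break and the safety factor of two are all assumptions made for the example, not figures from any of the organisations quoted; real arrays would normally be queried through the vendor's own management tools rather than a local mount point.

```python
import shutil


def days_of_headroom(free_bytes: int, daily_growth_bytes: int) -> float:
    """Days until the volume fills at a steady growth rate (inf if not growing)."""
    if daily_growth_bytes <= 0:
        return float("inf")
    return free_bytes / daily_growth_bytes


def covers_holiday(free_bytes: int, daily_growth_bytes: int,
                   holiday_days: int, safety_factor: float = 2.0) -> bool:
    """True if free space lasts for the break with a safety margin on top."""
    needed_days = holiday_days * safety_factor
    return days_of_headroom(free_bytes, daily_growth_bytes) >= needed_days


if __name__ == "__main__":
    GIB = 1024 ** 3
    usage = shutil.disk_usage("/")   # stand-in path for a mounted SAN/NAS volume
    growth = 20 * GIB                # assumed daily growth, taken from your own trend data
    days = days_of_headroom(usage.free, growth)
    print(f"{usage.free / GIB:.0f} GiB free, about {days:.1f} days at current growth")
    if covers_holiday(usage.free, growth, holiday_days=14):
        print("Enough headroom for a 14-day break")
    else:
        print("Provision more capacity before the break")
```

The safety factor is there because, as Macarthur notes, activity can turn out higher than expected; how large a margin to allow is a judgement call for each organisation.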

It is also recommended that organisations undertake a full system review before the holiday period commences. "Storage managers will know the trends around disk usage and what the common errors are, so they can prepare for that," says Martin Taylor, converged network manager at the Royal Horticultural Society. "But if you undertake a full system review, it'll give you peace of mind when you're not in the office."

Such reviews should include:

  • Checking and resolving any outstanding faults.

  • Testing that alert systems are working and that event notification settings are valid so that staff will be notified instantly by email or text message should something go wrong.

  • Establishing whether monitoring and reporting tools are configured to the norm so they only highlight potential problems rather than churning out information on regular ongoing systems activity – a factor that is particularly important in an environment where a number of point systems management products have been deployed.

  • Printing out detailed configuration reports for each storage system and subsystem, then saving software configurations to disk and file so that in an emergency, staff can reload the configuration file to recover all the settings.

  • Ordering spare parts in advance and putting them in a place that is accessible and known to staff.

  • Ensuring that equipment is secured physically in a lockable room and that keys or passwords are stored in a location that is only accessible by authorised storage personnel.

A similar rationale applies to backups. Systems should be fully backed up before the holidays begin and the backup media stored in a secure location. Enough additional media should also be made available to support scheduled backups over the holiday period without the need for manual intervention, and checks should be made to ensure systems are configured correctly so that those scheduled backups can take place.

According to Sagar Vadher, head of IT at online retailer I Want One Of Those.Com, "Ensure that you are backing up the right elements and that they can be restored. You need to do a full test, although ideally you should be doing that at least monthly anyway."
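One simple way to confirm that a test restore really matches the original is to compare checksums file by file. The sketch below is a minimal example of that idea, assuming the restore has been written to a separate directory; the function names and directory layout are hypothetical, and it is not the procedure used by any organisation quoted here. Most backup products have their own verification features, which should be preferred where available.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir: Path, restored_dir: Path) -> list[str]:
    """Return a list of files that are missing or differ after a restore."""
    problems = []
    for source_file in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        relative = source_file.relative_to(source_dir)
        restored_file = restored_dir / relative
        if not restored_file.is_file():
            problems.append(f"missing: {relative}")
        elif sha256_of(source_file) != sha256_of(restored_file):
            problems.append(f"differs: {relative}")
    return problems


if __name__ == "__main__":
    # Self-contained demonstration using throwaway directories.
    with tempfile.TemporaryDirectory() as tmp:
        source, restored = Path(tmp) / "source", Path(tmp) / "restored"
        source.mkdir()
        restored.mkdir()
        (source / "orders.db").write_bytes(b"order data")
        (source / "ledger.db").write_bytes(b"ledger data")
        (restored / "orders.db").write_bytes(b"order data")  # ledger.db not restored
        print(verify_restore(source, restored))
```

An empty list means every source file was found in the restored copy with identical contents. The check only covers files present in the source, so it will not flag extra files in the restore.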

Documentation for storage personnel

Because storage personnel need to be adequately equipped to deal with potential incidents, they should be briefed in advance about what problems could occur and how to resolve them. Such information should be supported by documentation that lays down processes and procedures in an unambiguous step-by-step format. Staff also need to be told where these documents are kept.

"We have now written down all procedures for testing and what needs to happen if something goes wrong," says Alex Gomes, senior systems engineer with global property adviser CB Richard Ellis. "For example, for a failover event, which is very important because it's not something you can test all the time, we have very extensive documentation and provide clear steps on what to do if an incident occurs. It's really important for best practice purposes, particularly as someone doing it this year might not be around next year."

Also useful is a checklist of daily administration tasks for covering staff, with information on how to undertake them presented in simple and unequivocal fashion.

Another useful form of documentation is an emergency action plan chart. This should include the location of spare parts and instruction documents, and the contact details of relevant staff, third-party suppliers and maintenance providers. Vadher says, "One of the most likely things to go wrong is a hardware failure so it's important to ensure that good maintenance contracts are in place so that you can get the right people in quickly."

Expect the unexpected in storage

The secret to dealing with unexpected circumstances is good planning. This involves assessing potential risks, ensuring that mitigation processes are in place and making certain that staffing levels are adequate, whether personnel are on-site or have access via remote management capabilities. A rota system is useful, but it is also advisable to ensure a senior staff member is on call to provide leadership should an incident occur.

"It's about applying common sense and thinking through what might happen," says Jim Spooner, strategy services team leader at IT services provider Glasshouse Technologies UK. "So look at what your key ongoing issues are and how they can be resolved."

Part of this process involves taking the time to establish which departments or countries will still be working over the holiday period and to discuss their requirements with business managers.

For instance, CB Richard Ellis has offices in 57 countries and not all of them stop for Christmas. "We have to ensure adequate backup and disaster recovery facilities are in place for them," says Gomes. "It's a process thing – you need to ensure the business is happy and prepare for worst-case scenarios."

Ideally you should give yourself plenty of time to complete tasks rather than rushing to do everything at the last minute. Plans should therefore be in place by the week before Christmas at the latest.

Early planning is what I Want One Of Those.Com does. The online retailer generates 60% of its revenues over the Christmas period and an outage of only five minutes can cost as much as £10,000.
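To put that figure in perspective, £10,000 for five minutes works out at up to £2,000 a minute. The sketch below scales that rate to longer outages. It assumes the cost grows linearly with duration, which is a simplification introduced here rather than anything the retailer has stated, and because the quoted figure is an upper bound ("as much as"), the results are upper bounds too.

```python
QUOTED_COST_GBP = 10_000      # figure quoted in the article
QUOTED_OUTAGE_MINUTES = 5     # outage length quoted in the article


def estimated_outage_cost(minutes: float) -> float:
    """Upper-bound cost in GBP, assuming cost scales linearly with duration."""
    cost_per_minute = QUOTED_COST_GBP / QUOTED_OUTAGE_MINUTES  # 2,000 per minute
    return cost_per_minute * minutes


if __name__ == "__main__":
    for minutes in (5, 30, 60):
        cost = estimated_outage_cost(minutes)
        print(f"{minutes:>2} min outage: up to GBP {cost:,.0f}")
```

On that assumption an hour-long outage at peak would cost up to £120,000, which helps explain why the retailer starts its preparations in the summer.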

"We aim to get the majority of work, which includes development activity, done by the end of July because, by the middle of August, we're starting to gear up for Christmas," says Vadher. "So, in July for example, we do a full system test, which involves breaking things apart and ensuring we can put them back together."

Good storage planning is for life, not just Christmas

Having effective policies and procedures in place is the key to a peaceful holiday. But these things should be "for life and not just for Christmas": processes for dealing with the ups and downs of the business year should be in place all the time. According to Taylor of the RHS, "If you have sound processes and procedures and a good solid maintenance regime in place, you really shouldn't need to make any special provision for different times of year, including Christmas."
