Guy Hains, President, Europe Group, Computer Sciences Corporation, has spoken of the causes of, and lessons learned from, a crash at the company’s Maidstone data centre.
The crash caused a loss of systems for NHS trusts on an unprecedented scale. About 80 NHS trusts lost the use of some of their main IT systems for several days.
On 10 May 2007 Guy Hains told a Health Committee inquiry into aspects of the NHS’s National Programme for IT [NPfIT]:
“To go back to Maidstone, I believe that the biggest risk in the computer industry generally at the moment is unreliable power supply. Generally across the world power has become more spiky which is ruinous to any sort of IT system.
“Last year we experienced a power issue at Maidstone which caused a short in our configuration there. It set up a position within a storage device that required experts to come from Japan effectively to reset that system.
“We transferred the operation between our Maidstone centre and the reserve centre which was effected without data loss, as was the pass back to the primary data site some weeks later. We learnt several things from that.
“First, we learnt that as we scale up the system it is better to have four centres than two, which is what we have invested in, so that data is now not only mirrored but effectively held simultaneously in two places. We are holding that across four centres, not just two.
“Second, out of that experience with the authority we have tightened our targets and expectations of how quickly systems need to be brought up. There was a category of systems which were deemed not to be critical – they were non-acute and departmental-type systems – and there was a fairly leisurely take-up…
“Some systems can come up much later. The view was that 72 hours would have been acceptable for that. We have now set a new timetable which says that in any failure of these systems the expectation is one of 24 hours and, within critical environments, clearly a much shorter time is expected. We have learnt from that. We have increased our investment in multiple centres and have worked with the authority to re-evaluate those levels of acceptability in bringing up systems. Just to confirm it, our job is not done unless we have also defined contingency and manual procedures.”
The chairman of the Health Committee, Kevin Barron, then said:
“I believe that it affected about 72 primary care trusts and eight hospitals which suffered a loss of data for that period of time.”
Guy Hains replied: “There was no loss of data; it was loss of availability.”
Barron: “They were unable to get hold of the data for a time. Were there any clinical implications of which you are aware?”
Hains: “No. Clearly, there was an administrative implication. With our staff who have security clearance to help with administration we went out and paid for additional administration support. Therefore, the pain was administrative. We have no evidence of clinical risk or data loss, and it was inconvenience for which we are really sorry. Our remedy has been the further investment we have made regarding setting up a much more resilient environment.”