Emergency procedures fail the stock exchange

System failure at the London Stock Exchange was caused by a software bug in a non-critical overnight trading systems program, compounded by weaknesses in emergency escalation procedures

The failure of systems at the London Stock Exchange last week was caused initially by a software bug, as yet unidentified, and was compounded by weaknesses in emergency escalation procedures, Computer Weekly has learned.

The problem began with a bug in a non-critical overnight trading systems program that purges old message logs and the previous day's market data. The batch program usually takes about an hour to run. In the early hours of Wednesday it took four hours.

This was not a disastrous problem in itself, but all of the exchange's 300 overnight batch programs must run one after another, not in parallel. This time, while the first batch program was still running, a second, unrelated batch program started, for reasons that are not yet clear.

This caused a set of problems that had not been predicted, Chris Broad, the exchange's head of service development, said.

With the two programs running in parallel, rather than sequentially, data was corrupted. Information from the previous day's trading became mixed with that being prepared for the coming day.

One of the main lessons to be learned from the incident appears to relate to the escalation procedures that involve the system operators and developers Andersen Consulting. Escalation procedures define the actions that computer operators must take to cope with a potential emergency.

"The procedures were complied with," said one executive. "But it was the escalation procedures themselves that were found wanting."

The exchange has now introduced manual and software procedures to prevent the batch programs overlapping. But some key questions remain unanswered:

  • What was the software bug and can it be replicated and therefore identified?
  • Why was the potential seriousness of the problem not realised sooner?
  • With programs that must run sequentially, rather than overlap, why had risk analyses not spotted the potentially disastrous consequences of a problem that caused two programs to run into each other?

Andersen Consulting declined to comment on the failure.
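
The exchange has not described the software procedures it introduced to stop batch programs overlapping. As an illustration only, the general idea of such a guard can be sketched in Python as follows. The class, the exception and the job names are all hypothetical and are not taken from the exchange's systems; the sketch assumes the jobs are started from a single process.

```python
import threading


class BatchAlreadyRunning(RuntimeError):
    """Raised when a job is asked to start while another is still running."""


class SequentialBatchRunner:
    """Hypothetical guard that allows only one batch job to run at a time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.current_job = None

    def run(self, name, job):
        # Non-blocking acquire: refuse to start rather than overlap
        # with a job that has overrun its usual window.
        if not self._lock.acquire(blocking=False):
            raise BatchAlreadyRunning(
                f"{name} refused: {self.current_job} is still running"
            )
        try:
            self.current_job = name
            return job()
        finally:
            # Always release, even if the job fails, so the next
            # job in the sequence is not blocked forever.
            self.current_job = None
            self._lock.release()
```

In a sketch like this, a second job that starts while the first is overrunning raises an error that operators can escalate, instead of silently mixing the previous day's data with the coming day's.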
