Tony Collins
The failure of systems at the London Stock Exchange last week
was caused initially by a software bug - as yet unidentified -
and compounded by weaknesses in emergency escalation procedures,
Computer Weekly has learned.
The problem began with a bug in a non-critical overnight trading
systems programme that purges old message logs and the previous
day's market data. The batch programme usually takes about an hour
to run. In the early hours of Wednesday it took four hours.
This was not a disastrous problem in itself, but all of the
exchange's 300 overnight batch programmes must run one after
another, not in parallel. This time, while the first batch
programme was still running, a second, unrelated batch programme
started, for reasons that are not yet clear.
This caused a set of problems that had not been predicted, Chris
Broad, the exchange's head of service development, said.
With the two programmes running in parallel, rather than
sequentially, data was corrupted. Information from the previous
day's trading became mixed with that being prepared for the coming
day.
One of the main lessons to be learned from the incident appears
to relate to the escalation procedures that involve the system
operators and developers Andersen Consulting. Escalation procedures
define the actions that computer operators must take to cope with a
potential emergency.
"The procedures were complied with," said one executive. "But
it was the escalation procedures themselves that were found
wanting."
The exchange has now introduced manual and software procedures
to prevent the batch programmes overlapping. But some key questions
remain unanswered:
- What was the software bug and can it be replicated and
therefore identified?
- Why was the potential seriousness of the problem not realised
sooner?
- With programmes that must run sequentially, rather than
overlap, why had risk analyses not spotted the potentially
disastrous consequences of a problem that caused two programmes to
run into each other?
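The exchange has not described how its new software procedures
work. Purely as an illustration of the general idea, the sketch
below shows one common way to stop batch programmes overlapping: a
guard that refuses to start a job while another is still running.
The names BatchGate, BatchOverlapError and run_sequentially are
hypothetical and are not taken from the exchange's systems.

```python
import threading


class BatchOverlapError(RuntimeError):
    """Raised when a job tries to start while another is still running."""


class BatchGate:
    """Hypothetical guard that lets only one batch job run at a time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.running = None  # name of the job currently holding the gate

    def run(self, name, job):
        # Non-blocking acquire: refuse to start rather than overlap.
        if not self._lock.acquire(blocking=False):
            raise BatchOverlapError(
                f"{name} refused: {self.running} is still running"
            )
        self.running = name
        try:
            return job()
        finally:
            # Always free the gate, even if the job fails.
            self.running = None
            self._lock.release()


def run_sequentially(gate, jobs):
    """Run (name, callable) pairs strictly one after another."""
    return [gate.run(name, job) for name, job in jobs]
```

In this sketch a late-running job simply holds the gate for longer,
and any attempt to start a second job in the meantime fails loudly
instead of silently mixing the two jobs' data. A real scheduler
would also need to decide whether the refused job should wait, retry
or alert an operator.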
Andersen Consulting declined to comment on the failure.
Computer Weekly has also learned that a top audit partner
at Andersen has flown in from the US to help with the
inquiries.