Accountable to whom?

The blaming of the pilots for the fatal crash of their Chinook helicopter on the Mull of Kintyre on 2 June 1994 highlights the difficulty in holding manufacturers to account for any major failure of their systems. Tony Collins reports on the lessons for the IT industry

The notorious crash of a Mk2 Chinook helicopter 10 years ago this week was, to some people, a human drama and tragedy - nothing more.

But to others, blaming two dead pilots for a crash illustrates the most profound congenital weakness in the evolution of the computer industry: the more complex software becomes, the less its users are able to hold to account those responsible for poorly designed or faulty code.

Defective software-controlled equipment may contribute to fatal accidents involving train signals, aircraft, x-ray equipment or fairground rides; and a poorly designed patch can cause millions of computers to crash.

The more impenetrable the inner workings of software become to users - transparent only to a small number of people working for the manufacturer - the more difficult it is to hold companies, directors or code designers to account if a critical system fails disastrously.

Experts say the lack of accountability may become a greater problem as software takes over decisions made by humans - particularly in healthcare and transportation - where failure could involve fatal accidents.

"It's always easier to blame dead pilots than the technology," says Peter Amey, a software engineer and principal consultant at Praxis Critical Systems in Bath.

This shortfall in accountability could send the wrong signals to manufacturers, which are perennially under pressure to cut costs and development time.

If they know they cannot, in practice, be held to account for a major failure of their systems, there is little incentive for them to exercise extreme caution and spend more time improving the writing and design of code and the preparation of documentation.

Taken to an extreme, there could be manufacturers that would have little to fear from disabling thousands of business systems, or even killing people, if they ensure their software is too complex, or their documentation too poorly maintained, to be evaluated by independent assessors.

Chinook ZD576 crashed on the Mull of Kintyre in June 1994 killing all 29 people on board. Two air marshals found the pilots, flight lieutenants Rick Cook and Jonathan Tapper, grossly negligent.

Nobody doubts that the air marshals made their decision in good faith. They did so partly because no clear evidence emerged of any serious malfunction of equipment. But some experts believe that a new software-controlled Full Authority Digital Engine Control system could have been a factor in the crash of ZD576.

The Fadec system replaced mechanical equipment which regulated fuel to the helicopter's jet engines.

An RAF Board of Inquiry into the crash was told that the Fadec system had been known to cause "flight critical" incidents: on occasions an engine would unexpectedly surge or run down to idle speed.

Amey says, "In the case of the Chinook that crashed, it was equipped with a system in which there is overwhelming evidence the software was not developed to the standards it should have been. It was not obvious from the code whether it was intended to control engines, let alone whether it actually did so."

Nobody has been able to prove that badly designed or faulty software played any part in the crash. But various investigations, including a Fatal Accident Inquiry in Scotland and a House of Lords select committee, have found there is no evidence that the pilots were definitely to blame.

The chief technical investigator of the crash, Tony Cable of the Air Accidents Investigation Branch, told a Lords committee, "Throughout this investigation, the evidence was remarkably thin."

An engineering report published this week on the crash and the state of systems on the Chinook Mk2 questions whether this type of helicopter was airworthy.

The report is the work of retired air commodore John Blakeley, who was a senior engineering officer in the RAF and who chaired official RAF inquiries into accidents.

He was asked by Mike Tapper, the father of one of the pilots, to review the airworthiness, engineering standards and maintenance issues revealed by the 1995 RAF Board of Inquiry into the crash of ZD576.

Blakeley's report says the Fadec system was not designed or documented to flight-critical software standards. The RAF had become so used to incidents in flight, triggered by flaws in the equipment, that it was "no longer questioning the underlying airworthiness of the aircraft".

After the crash, he says, the RAF "seems to have spent its energies on establishing that the crash was caused by aircrew error". He adds, "No other finding would be appreciated or, indeed, accepted."

Blakeley raises many engineering questions about the crash that went unanswered in the RAF Board of Inquiry's report. He also questions the airworthiness of the Chinook Mk2 and the serviceability of ZD576 for its final flight.

"Both were essentially assumed or evidence to the contrary was either not sought or, even where it was available to the board, it was largely ignored," he says.

A technical failure such as an unexpected engine surge that could not be identified and corrected in the time available "is just as likely, if not more likely, a scenario as gross negligence", Blakeley says.

He adds that he finds it "incredible" that the RAF had all the facts before it but made "no serious attempt to see if a technical problem could have fitted the same accident scenario".

But why not look to blame the manufacturer after such a high-profile crash? Blakeley says doubts about the airworthiness of the Chinook Mk2 might have raised questions about the RAF's chain of command, as well as the introduction into service of the helicopter.

Experts say that one of the key lessons from the crash is that operators or users of critical systems are more likely to be blamed than manufacturers if software-controlled equipment is the probable cause of a failure.

In the 1980s and 1990s, the phenomenon of "phantom withdrawals" from cash machines was blamed on end-users: banks did not admit that their software-controlled equipment was capable of error.

Faced with claims by customers trying to recover lost money, banks accused account-holders of making a mistake, lying, or being negligent in not keeping safe their Pin numbers.

Two years ago, a near-miss between a Virgin Atlantic 747 and a Delta Air Lines 767 over Wales led to a passenger being injured when the Virgin pilot took sudden action to avoid a possible collision.

The official report blamed an air traffic controller. This was despite a conclusion in the report that the controller's mistake was made possible by a design flaw in an air traffic control system.

There have been several fatal aircraft accidents in which software was implicated but never proved to be at fault, and the pilots were blamed - such as the crash of an Airbus A320 at Warsaw airport in September 1993.

The lack of accountability over software failures could become a greater problem when computer equipment assumes responsibility for safety in cars from drivers. "The most dangerous thing about critical systems is that they work," Amey says.

"Failures are so unusual and the systems largely work so it is hard to recognise when they have failed, especially when the failure destroys evidence. Large airliners rarely fall out of the sky."

There have been safety-critical incidents in which failures of systems in cars have caused injury and even death, he says. In these cases manufacturers have tended to blame the drivers. "This is an indicator of where we are going," says Amey.

"Systems fail infrequently but none the less fail. We are already talking to manufacturers about steer by wire. Suppose a software error means that once in every million miles a car goes straight on instead of round a bend. It will take a long time to recognise a software fault because so many cars go off the road without any help from software."

He says manufacturers must be "whiter than white" in the development of software, making it easy to validate independently and mathematically. They can then demonstrate after any major accident that, beyond reasonable doubt, their system was developed to the highest standards.

In the case of Chinook Mk2, the manufacturer certified its systems as safe, and this was accepted by the Ministry of Defence. But before the crash on the Mull of Kintyre, the MoD's software assessors at Boscombe Down had refused to recommend the software as safe. Even so the ministry decided to give the helicopter an airworthiness approval.

As an aeronautical engineer who served in the RAF at Boscombe Down working on the certification of aircraft armament systems, Amey says the RAF hierarchy had a tendency to perceive any refusal by software evaluators to give equipment the all-clear as "obstructionism".

Another key lesson from the Chinook crash, he says, is that the contracts between customer and supplier need to specify that software is written with certification and independent verification in mind.

The Chinook Mk2's software was said in a report by Boscombe Down to be "unacceptable" and "not fit for purpose". But the MoD accepted the manufacturer's assurances that it was safe.

The MoD has rarely, if ever, taken action against manufacturers after a major fatal crash. But pilots have on many occasions been found negligent after a fatal accident. Indeed the MoD, when questioned about the software on the Chinook Mk2, has defended the equipment using arguments advanced by the manufacturer.

The fathers of the pilots of Chinook ZD576, in their fight to clear their sons' names, have won support from some of the most senior figures in the establishment and Parliament.

The pilots have been defended in numerous debates in Parliament. The Commons Public Accounts Committee highlighted a series of flaws in the procurement of the software for the Chinook Mk2 and concluded that the verdict against the pilots of ZD576 should be overturned.

Yet the verdict of gross negligence stands, endorsed by Tony Blair, the prime minister, and Geoff Hoon, the defence secretary.

If such a powerful and meritorious campaign fails to achieve its aim after 10 years, it may be said to bode ill for users and operators, and for justice, should software be the suspected cause of fatal accidents in future.

For the verdict of negligence is an acceptance of the principle that it is reasonable to blame dead operators - pilots in the case of the Chinook crash - even if the cause of a major incident cannot be definitively established.

The implications for IT directors   

The Chinook crash shows that, as software grows more complex and impenetrable to external assessment, the only organisation that is likely to be able to prove any coding faults after a major incident is the manufacturer.

But can manufacturers be expected to find fault in their own software and, therefore, admit legal liability after a major failure which may have led to fatalities? 

Software and engineering specialist Peter Amey, of Praxis Critical Systems, says a potential weakness in manufacturers' accountability can be pre-empted if the contract between a customer and supplier ensures that software is written with independent validation in mind.  

The Chinook's Full Authority Digital Engine Control (Fadec) system was commissioned by the RAF and the Ministry of Defence without open competition. The RAF was kept at a distance from the development process. Studies of IT disasters show the need for the work of software developers to be scrutinised almost constantly by the system's customer.  

The crash raises the question of whether blame after a major incident in which software is a suspected factor is likely to fall on those least able to defend themselves. In the case of Chinook ZD576 the pilots were found grossly negligent although there is no evidence of whether the helicopter was, or was not, under their control in the last moments of flight.

Looking beyond the Chinook crash, corporate boards may, after a major incident, seek to protect an IT supplier from criticism rather than allow the opprobrium to settle in the lap of directors. This may leave IT managers, operators and project staff vulnerable.

This is particularly so because the manufacturer, in proving its equipment was not at fault, may have millions of pounds at its disposal. It may also have the goodwill of the user, who relies on the manufacturer's support for equipment maintenance. 

In contrast, individuals - for example, system operators - may have minimal resources: little or no access to the manufacturer's commercially sensitive information, little or none of the manufacturer's knowledge of how the systems work, and not much money for expert reports. Therefore, the weakest link after a major incident will always be those who do not have the money - or are no longer present - to argue their case.

The Chinook disaster: 1985 to date   

1985: delivery of the Chinook's Full Authority Digital Engine Control (Fadec) computer system was promised within 23 months 

1989: Fadec has its first series of tests, fitted to an MoD Chinook Mk1. But the Chinook is nearly destroyed by a Fadec-related engine surge. The incident is described in a confidential MoD report as "potentially catastrophic". Fadec is modified 

1993: an assessment of the modified Fadec by contractor EDS-Scicon is abandoned because 485 anomalies are found after less than 18% of the code is analysed. EDS-Scicon says a potential flaw in the Fadec's main computer "may cause incorrect operation of the Fadec" 

1993: Boscombe Down, the MoD's airworthiness assessor, refuses to give the Fadec an unqualified approval unless the Fadec software is rewritten. The MoD and RAF put the Chinook Mk2 into operational service without a software rewrite 

January-May 1994: the MoD's procurement executive raises "safety case issues" over the Fadec. Chinook pilots experience "flight critical" problems including unexpected engine surges, engine run-downs and cockpit warning lights.  

1 June 1994: for the second time in five months, Boscombe Down suspends trials flights over Fadec concerns 

2 June 1994: Chinook ZD576 crashes on the Mull of Kintyre. There is no evidence of whether the helicopter was, or was not, under the control of the pilots in the last moments of flight 

3 June 1994: Boscombe Down says in a memo that the Fadec has been shown to be "unacceptable" and is "unsuitable for its intended purpose" 

Late 1994: major improvements to Fadec are made by its manufacturer Textron 

1995: two air marshals over-rule the inconclusive report of an RAF Board of Inquiry and find the pilots of ZD576 grossly negligent 

1996: a Scottish Fatal Accident Inquiry says there is not enough evidence to blame the pilots 

1999: Computer Weekly publishes a 140-page report on Fadec problems. Nearly 90 MPs ask for an investigation into the findings 

2000: senior ministers, including Tony Blair, reject calls for a new inquiry 

February 2002: a House of Lords select committee report finds there is doubt about the cause of the crash 

March 2002: defence secretary Geoff Hoon rejects calls for a new inquiry 

October 2002: the government makes it impossible for the families of the pilots to seek a judicial review of the way ministers and officials have handled the matter 

June 2004: the 10th anniversary of the crash. The finding of negligence against the pilots of ZD576 still stands.
