One of the worst software project failures in memory?


Last month BBC R4's Today programme and Computer Weekly quoted from an MoD memo that said there was a "positively dangerous" flaw in the Chinook Mk2's safety-critical "Fadec" software.

Software code containing that dangerous flaw was fitted on the type of Chinook that crashed on the Mull of Kintyre in June 1994. The crash of Chinook ZD576 was one of the worst RAF accidents in peacetime.

All on board were killed including 25 VIPs.

There's active discussion today of a crash 16 years ago because two dead pilots were found to have caused the crash of an aircraft that some inside the RAF and the MoD considered not safe to fly. The development of the Chinook Mk2 fuel-control software has been one of the most improvised projects we have investigated in decades.

It's likely that two RAF air marshals were unaware of the potential seriousness of the faults in the Chinook Mk2 when they found the dead pilots of Chinook ZD576, Flight Lieutenants Richard Cook and Jonathan Tapper, grossly negligent.

Only after an RAF Board of Inquiry into the Mull crash did it become clear that a series of internal documents had tried to alert the MoD hierarchy to the danger posed by the Chinook Mk2's safety-critical Fadec fuel control system.

That those internal MoD memos were not shown to the RAF Board of Inquiry into the Mull crash, or to the Air Accidents Investigation Branch which wrote a technical report on what it found in the wreckage, has never been explained.

The number of those who are now convinced the Mk2 Chinook helicopter was not airworthy has much increased since the disclosure of these documents. We have published several of the documents.

Now we're publishing (below) in technical detail another of the leaked documents: one written by EDS - which is now owned by HP. The EDS report explained in detail what was wrong with the Chinook Mk2's software.

EDS had been commissioned by the MoD to examine the Chinook Fadec's 16,254 lines of software code.

The analysis was carried out in July 1993, nearly a year before the crash on the Mull.

EDS found such a density of "category one" anomalies - the most potentially serious flaws - that I find it hard to believe that the RAF put the Mk2 Chinook into service without a software rewrite.

Indeed, the MoD's IT and engineering specialists at Boscombe Down in Hampshire refused to endorse the airworthiness of the Chinook Mk2 unless the Fadec software was rewritten.

But the RAF and the MoD put the Chinook Mk2 into operational service about five months after the EDS analysis, knowing that the Chinook Mk2 had a system for controlling the fuel to the engines that was defective.

Nobody was able to ascertain with any certainty the cause of the crash of Chinook ZD576. But an RAF Board of Inquiry did not rule out a problem with the "Fadec" software as a factor.

A Squadron Leader at the main UK Chinook depot, RAF Odiham, had told the RAF Board of Inquiry:

"The unforeseen malfunctions on the Chinook HC2 of a flight critical nature have mainly been associated with the engines control system FADEC. They have resulted in undemanded [not controlled by the pilots] engine shutdown, engine run-up [engine acceleration], spurious engine failure captions, warnings in the cockpit and misleading and confusing cockpit engine indications".

A later inquiry by the Public Accounts Committee into the botched procurement of the Chinook Mk2's Fadec system, and the crash on the Mull, said that the poorly-designed software could have been a factor in the accident.

The Committee concluded:

"It is unacceptable that the late discovery of problems with the FADEC software, the consequent operational restrictions and lack of independent assurance all stem largely from the inadequate specification and communication of quality standards and testing approaches.

"The finding of the RAF Board of Inquiry into the crash of Chinook ZD576 does not satisfy the burden of proof required. The Board found that the pilots had been grossly negligent, a finding which requires that there be 'no doubt whatsoever'.

"There are, however, clear grounds for doubt in a number of areas. These relate primarily to the FADEC, although there were also problems with the mechanical system. At the time of the crash the Chinook Mark 2 was experiencing repeated and unexplained technical difficulties caused by the FADEC software. The technical data recovered from the wreckage was incomplete and does not, we believe, conclusively rule out a technical malfunction as a potential cause of the crash."

A subsequent investigation into the crash by a House of Lords committee expressed surprise that the RAF hierarchy had so easily dismissed the possibility of a technical malfunction. 

The House of Lords committee said:

"In view of the considerable number of problems which had beset Chinook Mk 2s since their entry into service - problems of which the investigating board appear to have been aware - it is perhaps surprising that they were able to dismiss so readily any such problems as having a significant effect in the accident."

EDS had warned in its report to the MoD that, in certain circumstances, the Fadec could malfunction. But the RAF and MoD hierarchy didn't regard the software as safety-critical. They argued that even if it malfunctioned it would fail in a safe way.

The MoD told the Public Accounts Committee in 2000:

"The FADEC system fitted to each of the Chinook's two engines was designed under a fail safe philosophy with independent Primary and Reversionary channels, each with dissimilar software, hardware, and control algorithms.

"FADEC continually monitors its performance and is designed to accommodate the loss of all sensor signals. If a fault cannot be managed by the levels of redundancy within the Primary system, it automatically changes to Reversionary at which point the aircrew receives an audible and visual warning.

"Should the Reversionary system also fail, the affected engine will still be supplied with a constant fuel supply, set at the pre-failure rate.

"Furthermore, in the highly unlikely event of a total FADEC system failure, the aircraft can still fly safely within certain operational parameters on one engine.

"At the time of the accident the Mull of Kintyre aircraft was within those parameters. The probability of both Primary and Reversionary channels on both engines failing simultaneously is infinitesimal; and even if that were to happen the Chinook is designed to descend in a controlled manner."

"...All incidents record "Fault Codes" within the Digital Electronic Control Unit (DECU) part of the FADEC system (the Fault Codes are hexadecimal codes that indicate system faults to support further diagnostics). There have been no instances of a total FADEC system failure in flight. Minor sub-system faults have occurred and FADEC has safely accommodated these--exactly as it was designed to.

"... MoD has never denied that there were a number of faults associated with FADEC on its introduction into service. None was a serious software fault."

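The fallback sequence the MoD described to the Committee can be sketched as a small state model. This is only an illustration of the MoD's description, not the Fadec's actual logic: the names `Channel` and `next_channel` are invented here, and the model takes for granted that the software behaves as specified, which is the very point in dispute.

```python
from enum import Enum


class Channel(Enum):
    PRIMARY = "primary"
    REVERSIONARY = "reversionary"
    FIXED_FUEL_FLOW = "fuel held at pre-failure rate"


def next_channel(current, fault_unmanageable):
    """Return the control state of one engine's FADEC after a fault check.

    Hypothetical model of the MoD statement: a fault Primary cannot manage
    hands control to Reversionary (the statement says the aircrew is warned
    at this point); if Reversionary also fails, fuel is held at the
    pre-failure rate.
    """
    if not fault_unmanageable:
        return current
    if current is Channel.PRIMARY:
        return Channel.REVERSIONARY
    # Reversionary has failed too, or fuel flow is already frozen.
    return Channel.FIXED_FUEL_FLOW


state = Channel.PRIMARY
for fault in (False, True, True):
    state = next_channel(state, fault)
    print(state.value)
```
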
But the incidents in the weeks before the crash on the Mull of Kintyre were not as trivial as the MoD claimed.

Knowing this, the Public Accounts Committee asked for evidence from Malcolm Perks, one of the world's most experienced Fadec specialists, who was an MoD expert witness in an MoD legal action against the Chinook's Fadec suppliers.

Perks concluded that the Fadec might have been a factor in the Mull crash.

What Perks told the Committee made it all the more surprising that such defective Fadec software was installed, unmodified, in the Chinook Mk2 helicopter, of the type that crashed on the Mull. Perks said:

"In 1984, a team of control manufacturers made a direct approach to MoD to update the control systems of the Chinook's engines. They offered to pay for the control development; the new system would pay for itself by reduced maintenance costs; and the new FADEC could be in production within three years.

"All MoD had to do was pay the engine manufacturer to put it on the engine, and the aircraft manufacturer to put it in the aircraft. MoD agreed to the proposal in 1985, with the aim of making FADEC part of the longer-term plan for a mid-life update. Their prime contractors were Lycoming and Boeing Helicopters.

FADEC was essentially a software project 

"FADEC replaces complex mechanical controls with simpler mechanical units plus a computer with new sensors to feed information to the computer.

"It is a very different control technology; complexity is still there but in the software. FADEC is a smart system, designed to operate with minimal pilot attention at all times... FADEC software is designated Flight Safety Critical--lives depend on its continued, predictable, fault-free operation.

"The Chinook FADEC was a unique design, with two separate computer systems and two sets of software to be developed.

The FADEC contractors  

Hawker Siddeley Dynamics Engineering, which is now part of BAe, wrote the Chinook Mk2 Fadec software. But the MoD's contracts would be with Lycoming and Boeing, as suppliers of the engines and airframe respectively.


The flawed project management approach

"It is often the practice to subcontract management of complex projects involving software to 'systems houses'. This was not the case here. FADEC was managed by Lycoming. MoD had no direct control over the software: its developer was too far down the supply chain.

"CECO [Chandler Evans, the hydromechanical part of the Fadec]  and HSDE were funding themselves and as software products were delivered it became clear their money had been spent on their idea of the right job, not MoD's."

The early signs of trouble

"Three years after launch, the Chinook FADEC project was in trouble. Far from being in production, development FADECs were only just starting their test programme in a modified RAF Chinook, loaned for the purpose. In early 1989, during initial tests on the ground, software conditions that had not been considered caused an uncontrolled runaway of an engine. The aircraft was severely damaged, but luckily no one was hurt.

"The RAF Board of Inquiry recommended a review of the software design. It was 1990 before testing in the Chinook restarted and it was 1993 before the first Mk 2 Chinook with a production FADEC was ready.

The MoD sued their own contractors

"In the aftermath of the [1989] incident, MoD wanted Lycoming and Boeing to accept some liability for the damage. To repair the aircraft $7m worth of parts were needed from the RAF. Boeing came to an agreement with MoD before legal actions started. Lycoming's insurers fought the case but were eventually obliged to pay $3m in 1995, after protracted hearings under American Arbitration Association rules.

Software quality was a real issue

"In 1994, as Expert Witness to MoD, my investigations showed that the 1988 version of the software had not been designed and developed in accordance with industry accepted practices... [the] production software with its documentation had finally been delivered in 1993, for the approval of MoD's experts, A&AEE at Boscombe Down.

"They pronounced the software unverifiable. So, five years after the first version, the software was still not acceptable. The writing had been on the wall for years that this FADEC's software was of doubtful quality, but the necessary actions had not been taken. In the end, as described in the NAO [National Audit Office] report, MoD overruled their experts and went ahead anyway with its release [the Chinook Mk2 and the Fadec] into service.

Why was it accepted into service?

"...To make the FADEC acceptable to Boscombe Down a payload limit was imposed. The theory was, if FADEC caused an engine to be lost, the remaining engine could cope under most circumstances.


Was Fadec, in fact, acceptable?

"The Mk 2 Chinook weight restriction was only a palliative. Until software can be verified, it has to be considered unpredictable, as the NAO report says.

"That means it could ... cause an engine to be lost, cause an engine (maybe both) to produce too much power, light lamps in the cockpit for no real reason ...

"The evidence produced during and after the inquiries into the June 1994 Mull of Kintyre Chinook crash showed that FADEC in its early days in service was doing many of these things. Although not provable, it remains a possibility that FADEC unpredictability was a factor in the Mull accident.

Has anyone been blamed?

"MoD have never admitted in public that any of the contractors did a less than perfect job. In fact, MoD Projects have continued to defend the contractors against all critics, including those from within MoD itself.

"It is not clear why, given the 1995 legal victory against Lycoming for what amounted to negligence.

"Neither has it been accepted that FADEC may have played a part in the Mull of Kintyre crash, despite the evidence that FADEC was put into service before it was fully developed.

"The only fault ever clearly assigned by MoD was to the pilots who crashed into the Mull while flying one of the early Mk2 Chinooks with its unpredictable FADEC software."

**

Special Forces pilots Rick Cook and Jonathan Tapper had not wanted to fly the Chinook Mk2 because of the high number of engine-related incidents.

One of them had specifically requested a non-Fadec Mk1 for a journey from Northern Ireland to Scotland. He was told a Mk1 was not available.

Ironically, the Mk2's Fadec software was modified - and a different processor installed - after the crash of Chinook ZD576 on 2 June 1994.

** 


Below are EDS's findings.


EDS put the anomalies it found in the fuel control software into four categories:


Cat 1 - EDS has a high level of confidence that an anomaly relates to a real error in the code or a discrepancy between the code and the documentation.

Cat 2 - The anomaly relates to poor code or poor correspondence between code and documentation but it is likely that it performs the intended function.

Cat 3 - These anomalies relate purely to obvious documentation errors such as typographical errors and incorrect commenting of code. They have no direct implication for the correct operation of the code.

Cat 4 - These anomalies arise from the 'style' of coding, or the way in which the code has been modelled for analysis. They do not relate to any source language use that will cause incorrect operation.

EDS examined 102 software modules (45.2%) representing 17.8% of the total lines of code (approx 16,000), before it stopped its analysis because of the density of discrepancies. These were its findings:

 

                              Cat 1   Cat 2   Cat 3   Cat 4   Total

Primary software lane            13     111      42      81

Backup [reversionary] lane        8      43      15      20

Documentation Traceability       35      39      75       3

Total                            56     193     132     104     485
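
The figures above can be cross-checked with a few lines of arithmetic. The sketch below simply re-adds EDS's counts; the one assumption is that the 17.8% of code examined is taken against the 16,254-line figure quoted earlier rather than the rounded "approx 16,000".

```python
# Anomaly counts as reported by EDS: (cat 1, cat 2, cat 3, cat 4).
anomalies = {
    "Primary software lane": (13, 111, 42, 81),
    "Backup [reversionary] lane": (8, 43, 15, 20),
    "Documentation traceability": (35, 39, 75, 3),
}

# Add up each category column, then the grand total.
category_totals = [sum(column) for column in zip(*anomalies.values())]
grand_total = sum(category_totals)
cat_1_and_2 = category_totals[0] + category_totals[1]

# Assumption: 17.8% applied to the 16,254-line figure quoted earlier.
TOTAL_LINES = 16_254
lines_examined = round(TOTAL_LINES * 0.178)

print(category_totals)  # [56, 193, 132, 104]
print(grand_total)      # 485
print(cat_1_and_2)      # 249
print(lines_examined)   # 2893
```

On those numbers, EDS logged 485 anomalies while examining roughly 2,900 lines, bearing in mind that some of the anomalies concern documentation rather than code.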

 

EDS's findings in detail:


Failure to trace code to documentation:


a) Algorithms that could not be traced to the documentation (cat 2)

 

b) Individual actions on paths that could not be traced to the documentation (cat 1 or 2)

 

c) Paths that could not be traced to the documentation (cat 1 or 2)

 

d) Interfaces to hardware that do not appear to be documented (cat 2)

 

e) Undocumented values for booleans, constants (cat 2 or 3)

 

f) Overflow of variables (cat 1 or 2)

 

g) Undocumented use of persistent local variables (cat 2)

 

h) Parameters which could not be traced to the documentation (cat 2)

 

i) Extra code not required in the documentation (cat 2)

 

j) No documentation at all (cat 2)

 

k) The variables in the code do not trace (without considerable analysis) to the variables mentioned in the documentation (cat 2)

 

l) A requirement in the documentation which does not appear in the code (cat 1)
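
The idea behind a traceability check can be shown with two sets. This is a toy illustration, not EDS's method: the identifiers are invented, and real traceability analysis involves far more than matching names.

```python
# Hypothetical identifiers, invented purely for illustration.
documented = {"REQ-01", "REQ-02", "REQ-03"}
implemented = {"REQ-01", "REQ-03", "EXTRA-07"}

# In the documentation but not in the code, compare finding (l).
missing_from_code = documented - implemented

# In the code but not in the documentation, compare finding (i).
undocumented_code = implemented - documented

print(sorted(missing_from_code))  # ['REQ-02']
print(sorted(undocumented_code))  # ['EXTRA-07']
```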

 

Incorrect Code Comments

 

a) Incorrect code comments (cat 3)

 

b) Incorrect specification of a parameter (cat 2)

 

Redundant Code

 

a) Unreferenced labels (cat 4)

 

b) Statements which do not perform any documented/required function (cat 2)

 

c) Addition of an opcode to test a flag that was set correctly by the previous instruction (cat 2)

 

d) Entire subprograms that are not used in the FADEC (cat 1)

 

e) Redundant data declarations (cat 2)

 

f) Arrays with unusual elements (cat 2)

 

g) Literals declared and never used (cat 2)

 

Aliasing

 

a) Addressing the same location using two different literal names (cat 2)

 

b) Jumping to an address which may or may not have an associated label (cat 2)

 

c) More than one label associated with an instruction address (cat 1)

 

d) Multiple names for a single variable or constant (cat 2)

 

e) Aliasing of a word variable name to the element of an array (cat 2)

 

f) Aliasing of data variables by allowing the index of one array to be incremented beyond the upper bound of that array into subsequent variables in memory (cat 2)


g) Using mnemonics and literals to represent the same number (cat 2)


 h) Locating differently named segments to the same physical location and using these names arbitrarily in the code to address data (cat 2)
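
Finding (f) is easier to picture with a toy memory model. In the sketch below, the layout, names and sizes are all hypothetical; it shows only the general hazard of an unchecked index running past an array into whatever is stored next, whether that is done deliberately or not.

```python
# Flat memory model: a 4-byte table followed immediately by a 1-byte flag.
# Layout, names and sizes are hypothetical.
memory = bytearray(5)
TABLE_BASE, TABLE_LEN = 0, 4
FLAG_ADDR = 4


def store_unchecked(index, value):
    """Indexed store with no bounds check, as low-level code might do."""
    memory[TABLE_BASE + index] = value


def store_checked(index, value):
    """Same store, but refusing indexes outside the table."""
    if not 0 <= index < TABLE_LEN:
        raise IndexError(f"index {index} is outside the table")
    memory[TABLE_BASE + index] = value


memory[FLAG_ADDR] = 1
store_unchecked(4, 0)      # one element past the end of the table
print(memory[FLAG_ADDR])   # 0: the neighbouring flag has been overwritten
```

The checked version raises `IndexError` for the same call, which is the protection the unchecked store lacks.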

 

Unstructured code

 

a) Unstructured code due to the handling of RAM failure conditions (cat 4)

 

b) Unstructured code due to a common block of code being used on both paths of an IF_THEN_ELSE statement (cat 2 or 4)

 

Mismatch of data types

 

a) Indexing simple word variables (cat 2)

 

b) Applying a byte operation to a word variable and so leaving the upper byte undefined, rather than clearing it (cat 1)

 

c) Array used as a single word (cat 2)

 

d) Using the second element of an un-initialised array giving a rounding error (cat 1)
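
Finding (b) can be illustrated in the same spirit. The sketch assumes a 16-bit word and treats "undefined" as "whatever was left there before", which is one plausible reading; the function names and values are invented.

```python
def write_low_byte_only(word, value):
    """Byte store into a 16-bit word: the upper byte keeps its old contents."""
    return (word & 0xFF00) | (value & 0xFF)


def write_byte_and_clear(word, value):
    """Byte store that also clears the upper byte of the word."""
    return value & 0xFF


stale = 0xAB12  # hypothetical leftover contents of the word
print(hex(write_low_byte_only(stale, 0x05)))   # 0xab05
print(hex(write_byte_and_clear(stale, 0x05)))  # 0x5
```

Any later code that reads the full word sees 0xAB05 rather than 5 in the first case.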

 

Incorrect code

 

a) Overflow not handled as documented (cat 1)
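
The extract does not say what overflow handling the documentation called for. As a general illustration only, here is the difference between a 16-bit value silently wrapping round and one that is clamped at its maximum; the word size and both function names are assumptions.

```python
WORD_MAX = 0xFFFF  # assuming 16-bit unsigned words


def add_wrapping(a, b):
    """Addition that silently wraps round on overflow."""
    return (a + b) & WORD_MAX


def add_saturating(a, b):
    """Addition that clamps at the largest representable value."""
    return min(a + b, WORD_MAX)


print(add_wrapping(65_000, 1_000))    # 464
print(add_saturating(65_000, 1_000))  # 65535
```

If documentation promises one behaviour and the code delivers the other, a value that should sit at its ceiling can instead drop to near zero.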

 

Documentation Anomalies

 

a) Incorrect documentation

 

b) Ambiguous documentation

 

c) Typographical errors

 

**

 

EDS found so many anomalies in the Chinook Mk2's fuel-control software that it became concerned that the MoD and RAF would ignore the potential importance of each flaw because of the high volume of errors.

EDS said the effect of a "large number of category 1 and 2 anomalies" - which totalled 249 - could deflect the RAF and MoD from fully grasping their impact.

EDS said in its report: "On well written and documented code with few significant anomalies, each category one or two anomaly is viewed seriously and causes some concern.

"When there are hundreds of such anomalies the threshold of what is acceptable tends to be diminished by the presence of so many anomalies due to poor programming style and inadequate documentation".

When EDS raised queries about category one and two anomalies with Hawker Siddeley, which later became part of BAe, the supplier gave "adequate" answers which implied no need for any modifications to the code.

And when EDS had concerns that the safety of the Fadec might have been compromised by the anomalies, Hawker Siddeley had "provided more thorough answers", said EDS.

But EDS also disclosed in its report that it was dissatisfied with some of Hawker Siddeley's responses. This was because Hawker Siddeley's replies always implied there was no need to modify the code or documentation.

Links:

Was software to blame for Chinook crash? - BBC News online, Jan 2010

BBC "Today" reports again on "bitter debate" over danger Chinook Fadec - IT Projects Blog, Jan 2010

Campaign for Justice - campaign website

Macdonald report on Chinook crash - report of 3 fellows of Royal Aeronautical Society

Call for pilots' names to be cleared - Kathryn report 

I will keep fighting to clear my brother's name - Chris Cook in the Basingstoke Gazette

Computer Weekly publishes full texts of leaked MoD documents damning Chinook Mk2 - Argyll News

Senior airmen question safety of Chinook software - ComputerWeekly.com

Flawed Chinook software modified after notorious crash - IT Projects Blog

Chinook computer was positively dangerous say newly-disclosed documents - ComputerWeekly.com

Critical internal memo of software flaws - Military-quotes.com forum

2 Comments

  • Well done chaps. I love that you will not give up on this tragic case.

    While this is a truly shameful and for the families, a depressing outcome, it serves to highlight and hopefully prevent future such incidents.

    The general issues of software quality and its consequences are deemed as trivial by the UK business media. Great journalistic campaigning and hope it wakes people up to the fact that good or bad software saves/enhances/takes lives away in today's world.

  • Thank you for your kind comments. In a few words you have summed up the importance of the campaign to highlight the significance of the software flaws and the crash.

    What is it about the institution that is the MoD that stops it admitting that it could have been wrong? 16 years after the crash, does it really have so much tied up in the decision to blame the pilots?
