Flawed software can cost lives

If safety-critical software fails, it can cost lives. This is true whether it governs control systems in airliners or security systems in nuclear reactors.

Bev Littlewood, professor of software engineering at the Centre for Software Reliability, City University, London, has worked for many years on the modelling and evaluation of safety-critical software dependability. He is also a member of the UK government's Advisory Committee on the Safety of Nuclear Installations.

Safety-critical software is often built into embedded systems, where the development process follows strict guidelines designed to eliminate programming errors. But software dependability is also critical in many other areas that use commercial business IT systems. "You can include some communication systems, such as those used by the emergency services," says Littlewood. "If an ambulance or police network failed, then people's lives could be in danger."

Although the development process clearly has an impact on safety-critical software, Littlewood says it is difficult to tell whether knowledge of that process alone gives a good indication of how safe a system is. "It's an area where there are real problems. There are certainly lots of safety-critical systems graded by people who look at the process by which they are built." But, he adds: "The trouble is that high-quality processes can still leave you with a system that fails. You need to check the reliability and safety of the system."

Littlewood says the key component of a high-quality software development process is: "Understand the informal engineering requirements and express them clearly in a formal manner, such as by setting an acceptable rate for probability of failure."

But there is a real concern that software quality varies considerably between industry sectors. Littlewood believes this is because sectors such as the airline industry set unreachable targets.

"The airline industry aims for a one in 10 to the power of nine probability of failure per hour," he explains. "That says that as a society we are prepared for aircraft to fail at a rate of one in 10 to the power of seven per hour, so with 100 critical systems on board an aircraft that is the figure they arrive at."
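The arithmetic behind that target can be sketched in a few lines; the equal split of the whole-aircraft failure budget across 100 systems is the assumption Littlewood describes:

```python
# Sketch of the per-system reliability target described above.
# Assumption: the whole-aircraft failure budget is divided equally
# across the critical systems on board.

aircraft_failure_rate = 1e-7   # accepted catastrophic failures per flight hour
critical_systems = 100

per_system_target = aircraft_failure_rate / critical_systems
print(per_system_target)  # on the order of 1e-9, i.e. one in 10^9 per hour
```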

But, adds Littlewood: "10 to the power of nine is totally unrealistic. You can really only reach that figure after a system has been in operation for years and shown to be incredibly reliable."

As for the nuclear sector, Littlewood says it sets quite modest demands. "They think about what is achievable," he says. For instance, one of Sizewell B's software safety protection systems has a one in 1,000 rate of failure on demand. The programmers looked at their overall requirements and decided on a one in 10 to the power of seven per year failure rate; with a secondary system that has a one in 10 to the power of four failure rate, the two systems together meet that requirement.
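The Sizewell B figures can be checked with the same back-of-the-envelope arithmetic, assuming, as such arguments typically do, that the two protection systems fail independently:

```python
# Combining two independent protection systems, per the Sizewell B
# example above. Independence of the two systems is an assumption.

software_pfd = 1e-3    # primary software system: one in 1,000 failures on demand
secondary_pfd = 1e-4   # secondary system: one in 10^4 failures

combined_pfd = software_pfd * secondary_pfd
print(combined_pfd)  # around 1e-7, matching the one in 10^7 overall requirement
```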

Quality depends on thorough testing and, says Littlewood, the only way to achieve this is to test systems in an environment closely resembling the one in which they will operate. Some industries can do this well because they understand their physical environments, such as aircraft manufacturers with accurate flight simulators. But most testing claims are rather suspect, as they are based on claims about the production process or fault densities in the code. "The relationship between these factors and the actual operation of the software is not understood," says Littlewood.

How much testing needs to be done? As an example, Littlewood points to the high-speed Paris metro system, RER. When it introduced a new signalling and control system about 12 years ago, it had to check 20,000 lines of code. "The formal verification process took 100 person years for the assurance alone, before the system was even built." Another example is the Sizewell nuclear power plant. In 1993, for Sizewell B to claim 99% confidence in the failure rate of its software protection system, the programmers had to make 4,500 demands of it without the system failing.
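The Sizewell B demand count can be reproduced with the standard zero-failure demonstration calculation: to claim with confidence C that the probability of failure on demand is below p, the system must survive n failure-free demands, where (1 - p)^n is at most 1 - C. A minimal sketch, assuming independent demands:

```python
import math

# Zero-failure reliability demonstration, assuming independent demands:
# find the smallest n with (1 - p)**n <= 1 - C.

p = 1e-3   # claimed probability of failure on demand
C = 0.99   # required confidence

n = math.ceil(math.log(1 - C) / math.log(1 - p))
print(n)  # about 4,600 failure-free demands, in the region of the 4,500 quoted
```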

The cost of such levels of testing can be extraordinarily high. "Some 15 years ago NASA spent $1,000 per line of code for production and assurance of software for the Space Shuttle, and it is widely known the programs still contain lots of faults," Littlewood says.

Clearly such detailed testing is rarely undertaken in general business software. "It is unfeasible to apply these levels of testing to commercial software containing hundreds of thousands or even millions of lines of code. The effort required increases exponentially," he says.

Littlewood says he would expect robust processes for business-critical systems, such as the big infrastructures of financial institutions. But the problem is that programmers in the commercial world lack the discipline to achieve the simplicity that safety-critical systems require.

Commercial programmers, he says, "either feel they need the complexity, or are driven by commercial pressures to constantly add extra functionality". This is the opposite of programming safety-critical systems, which he says are relatively simple and do not use fancy frills.
