The software legacy that haunts businesses

Imagine this: The new century is barely a year or two old. You’re a programmer, working on one of the most critical software projects in the UK, to build the new national flight control system for air traffic controllers.

The new system goes live in 2002 and, generally, it’s all gone well. Then, 12 years later, a bug you unwittingly left in the code brings down the entire UK air traffic control system, causing chaos across the country. You probably wouldn’t even have known it was your code that caused the problem.

Think even further back. Imagine you were working for a high-street bank in the 1970s. You were programming the most critical application in the company – the central transaction engine that underpins the millions of current accounts held by the bank. You were probably programming in assembler code – the lowest level of programming language, working right down at the guts of the IBM mainframe that ran the bank. Perhaps, if you were lucky, you were using a more modern language like Cobol.

Nearly 40 years later, that system you helped to build has crashed – the bank’s customers can’t access their accounts, make payments or withdraw cash. The ATMs, online and mobile banking apps of the bank – technologies you could barely have imagined when you cut that assembler code – are not working. You wouldn’t have been able to envisage such a problem way back when.

Yet this is the reality for many long-established major firms today – as Nats, the air traffic control service, found out in December 2014 when its systems crashed, and as Royal Bank of Scotland discovered when its IT systems failed in 2012 and froze 12 million customer accounts. They were just the unlucky ones who got caught out.

IT leaders talk a lot about legacy technology, and often they mean old hardware or out-of-date networking equipment. But perhaps the biggest single technological challenge facing many large companies – and especially the big retail banks – is their ageing legacy software. It has had so many new bits bolted onto it to help it move into the internet, mobile and digital ages; there are barely any programmers left with the skills to keep the original applications running; and there is very little documentation to help anyone work it all out.

As long as these creaking applications kept going, nobody dared wonder what might happen if they stopped. But the world has changed – we’re all going digital and agile, and for the first time those big old back-office systems are becoming a hindrance, not a help. Where once they were the source of competitive advantage and a barrier to entry for potential rivals, they are now the very reason that startups and new entrants are on the verge of disrupting your business.

Which CEO is going to be brave enough to say, let’s replace the lot? Those banks that have tried found it cost billions more than they anticipated, and took a lot longer than planned. If the average tenure of a FTSE 100 CEO is barely three or four years, who wants to be the one that spends all that money for so little return before their time at the company is up? The last thing any CEO wants is for their legacy to be as the person who screwed up the firm with that costly, painful IT overhaul.

Nobody can put a price on the risk of keeping that legacy software, so nobody can make the case for replacing it. At what point does a CEO accept that not moving off the old system becomes a bigger risk or a greater cost than moving off it?

These are questions that many boardrooms will – perhaps to their surprise – find themselves facing in the next five to 10 years. Nobody, even now, develops software thinking of what it might be expected to do in 20 or even 10 years’ time. That’s someone else’s problem. Yet we are building a digital economy on software. Software is eating the world, as Marc Andreessen, co-founder of Netscape, once said. Software might just spit it back out again too.

Look at the Post Office: It insists that the accounting software it provides to subpostmasters is free of problems. But in over 100 cases, subpostmasters have blamed the system for errors that led to many receiving heavy fines and even jail terms for alleged false accounting – blame the Post Office resolutely rejects. The Post Office points out that the number of affected subpostmasters is tiny – not even 1% of the total. Surely, they imply, the fact that the system works for 99%-plus of its users shows that the problem cannot be the software?

What they – and many other organisations – don’t seem to get is that over the lifetime of a software application, even a 0.1% error rate is at some point going to cause a problem for someone. The only question is, how big will that problem be? If only a tiny proportion of users are affected, that perhaps makes it more – not less – likely that some small problem in the software caused the issue. A failure rate that small is well within the margin of error for software quality and testing.
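To put rough numbers on that point, here is a minimal sketch of the arithmetic. It assumes, purely for illustration, that every operation fails independently with the same 0.1% probability; the operation counts and the function name are hypothetical and are not drawn from the Post Office system or any other real one.

```python
def chance_of_at_least_one_failure(error_rate: float, uses: int) -> float:
    """Probability that at least one of `uses` operations fails.

    Assumes each operation fails independently with the same probability,
    which is a simplification made for illustration only.
    """
    return 1.0 - (1.0 - error_rate) ** uses


if __name__ == "__main__":
    error_rate = 0.001  # the 0.1% figure used in the text
    for uses in (100, 1_000, 10_000):  # hypothetical volumes of operations
        p = chance_of_at_least_one_failure(error_rate, uses)
        print(f"{uses:>6} operations: {p:.1%} chance of at least one error")
```

Under those assumptions, 1,000 operations already give roughly a 63% chance that at least one goes wrong, and over the lifetime of a busy system the chance approaches certainty.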

For all the great advances made through software, it is not and probably never will be perfect. As Nats pointed out, testing every possible scenario the air traffic control software could have encountered would have taken over 100 years.
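A back-of-the-envelope sketch shows how quickly exhaustive testing gets out of reach. This is not Nats's own calculation: the model (a system whose behaviour depends on a number of independent yes/no conditions), the test rate and the function name are all assumptions chosen for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_to_test_everything(binary_conditions: int, tests_per_second: float) -> float:
    """Years needed to run one test for every combination of yes/no conditions.

    Hypothetical model: `binary_conditions` independent on/off inputs give
    2 ** binary_conditions distinct scenarios, each needing one test.
    """
    scenarios = 2 ** binary_conditions
    return scenarios / tests_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    # Assumed rate of 1,000 automated tests per second, running non-stop.
    for conditions in (30, 42, 50):
        years = years_to_test_everything(conditions, tests_per_second=1_000)
        print(f"{conditions} conditions: about {years:,.2f} years")
```

In this toy model, a mere 42 yes/no conditions already take more than a century to cover at 1,000 tests per second, and every extra condition doubles the total.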

Of course, software development is becoming smarter and better, and with iterative, agile techniques and componentised microservices it’s possible to rapidly isolate faults and make sure any problems they cause are more manageable.

But it’s almost impossible to predict whether some unusual or unexpected combination of activities will one day cause a catastrophic crash of the complex software your business relies on. It is hugely to the credit of the software industry that such crashes are so infrequent. Even so, in a digital world, the risk of such a crash is increasingly going to be a fact of business life. If you don’t have software bugs on your corporate risk register yet, you really should.

Somewhere in your organisation right now, a talented software developer might just have unwittingly written the line of code that brings down your company in 10 years’ time – and there’s almost nothing you can do about it.