National Air Traffic Services: bug fix is flying

The media has portrayed the 200 remaining bugs in systems at the New En Route Centre at Swanwick, Hampshire, in a negative light. Actually it may be the best news to emerge about the Swanwick project for six years, says Tony Collins

There seems to be an end in sight to the ongoing systems problems at the new air traffic control centre being built at Swanwick near Southampton. For senior management at National Air Traffic Services, the question seems now to be one of when, rather than if, it can manage to dispatch all of the bugs in the system.

National Air Traffic Services now aims to finish the main part of the IT project by 20 December when its IT staff are due to hand over the systems to operational management. December is also the deadline for a fully working system to gain interim approval from the Safety Regulation Group of the Civil Aviation Authority. In effect this approval will confirm the system's viability.

That 200 bugs remain in the system only a few months from the December target date has indicated to some parts of the media that the project is doomed. But the number has halved since May this year.

For the first time, the figure is down to a manageable size. Only a short time ago, it appeared to be too high to tackle in the time available. But the board of National Air Traffic Services has decided that the organisation has sufficient staff and skills to eliminate most, if not all, of the 200 bugs in time.

More importantly, the discussions are no longer focused on whether the system goes live: only when. Two years ago, it appeared to Computer Weekly, some MPs and IT specialists that the project was displaying the hallmarks of a major public sector IT disaster.

Now National Air Traffic Services management is no longer secretive and defensive about the risks and problems on the project. This openness could signal that senior management is genuinely confident of success.

In his first in-depth interview on the state of the project, Colin Chisholm, deputy chief executive of the service, explains that National Air Traffic Services is embarking on a series of roadshows at air traffic control centres around the UK to answer questions raised by staff.

One of the aims is to talk to staff and answer questions about the Swanwick project and other matters such as the public/private partnership under which 51% of National Air Traffic Services is due to be sold to the private sector and to employees.

"In the last two years we have worked very hard at being open with our staff about the programme, the risks in the project, and setting realistic goals, which we weren't doing before quite frankly. I really don't have anything to hide," Chisholm explains.

"We are saying: here is our programme and we stand behind it. We are not trying to oversell it or come across with bluff and bluster. We have had two sizeable meetings with staff at the London Air Traffic Control Centre and we're following it up with roadshows," he adds.

"Most of the questioning [so far] has been about Swanwick. So what did we say?

"We have been making pretty good progress. It has not been without the odd difficulty. But we have not had anything yet that has been a showstopper. There have been problems that have been pretty tricky but we have managed our way through those. The programme is running pretty well to date.

"We slipped a bit on only one of the key sub-milestones. But we have managed to get through the air traffic control simulations to prove that controllers can work with this system and that the procedures work.

"We have reached the point now that we're sufficiently confident to say yes we are convinced we have a good programme to technical handover," Chisholm adds.

Technical handover involves the IT staff transferring the systems to operational management. After technical handover the main IT effort will focus on the operational target date, known as O date, of 27 January 2002.

"The programme to O date is mostly concerned with a very elaborate programme of 11 months of training of air traffic controllers." The training begins at the end of January 2001.

Frank Agnew, IT director of National Air Traffic Services, says the scheduling of the controllers' training presents a managerial challenge since it will take place on the systems at Swanwick while they are being tested and enhanced - and during the "busy summer months" next year.

"It is a big and complex [training] programme which has got to be planned to the nth degree," Agnew says. "Any controller, whether they are sick, unavailable or whatever must go through the training programme and get the validation [on the systems]."

Chisholm adds, "We are going through tremendously elaborate testing of all the system components. A lot of it, for example, has been testing of links between London Air Traffic Control Centre and Swanwick."

Next year there will be two major software upgrades. "We are mapping this out," he says. "We are moving from three to four [major upgrades] a year to two a year. The two we have got to do during 2001 are a mix of further change that we have identified as necessary - essential change, some of which is related to the needs of air traffic controllers - and some of it is engineering related."

One of the key changes involves improving the warning messages that flash on the air traffic controllers' screens when an aircraft moves out of their area of control and responsibility. The aim is for a warning to flash on the display to remind the controllers to ensure that they have completed all their co-ordination checks before an aircraft leaves their screen and becomes the responsibility of others.

"There is a quirk over whether it flashes or not," says Chisholm. "We want it to work in 100% of cases".

It is important to fix this problem because the Swanwick system, unlike the current manual process, supports the automated transfer of aircraft from one air space sector to another.

Currently at the London Air Traffic Control Centre, when controllers relinquish responsibility for an aircraft, they confirm this by phoning the appropriate new controller. This will not happen under the new automated procedures at Swanwick.

"We are open with people about the risks," says Chisholm. "This project has a high degree of visibility with all our managers. We have been briefing the board. We track all the risks. I think the combined effect of the risks [of not achieving the January 2002 target date] is in the medium-low category.

"It has been higher than that certainly on the way through here. It's not without risk. It's not no risk, or absolutely minimal. It's medium low. There are some difficult things still to achieve, both to get to technical handover and to O date and we don't hide that from anybody".

On the remaining bugs in the system, called programme trouble reports (PTRs), Chisholm says that in the past "the number of PTRs we had to fix looked very formidable. It really was a problem to us".

PTRs are divided into five categories:

  • Categories 1-3 are regarded as essential, for which a fix must be made at some time in the future;

  • Category 4 is non-essential; and

  • Category 5 relates to documentation.

    National Air Traffic Services has been categorising the grade 1-3 faults by whether a fix has been identified and whether it is crucial to fix it by technical handover, or by O date.
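The categorisation described above can be pictured as a small data model. What follows is a minimal sketch, not National Air Traffic Services' actual tracking system: the class, field and function names (`Ptr`, `Deadline`, `fix_identified`, `handover_backlog`) are hypothetical, and only the five categories and the two deadlines come from the description above.

```python
from dataclasses import dataclass
from enum import Enum


class Deadline(Enum):
    """The two dates by which a fix may be required, per the article."""
    TECHNICAL_HANDOVER = "technical handover"
    O_DATE = "O date"


@dataclass(frozen=True)
class Ptr:
    """One programme trouble report (illustrative fields only)."""
    ptr_id: str                        # hypothetical identifier
    category: int                      # 1-5, as listed in the article
    fix_identified: bool = False       # has a fix been identified yet?
    fix_by: Deadline = Deadline.O_DATE

    def __post_init__(self):
        # Reject categories outside the five described in the article.
        if not 1 <= self.category <= 5:
            raise ValueError(f"category must be 1-5, got {self.category}")

    @property
    def essential(self) -> bool:
        # Categories 1-3 are essential; 4 is non-essential; 5 is documentation.
        return self.category <= 3


def handover_backlog(ptrs):
    """Return the essential PTRs that must be fixed by technical handover."""
    return [
        p for p in ptrs
        if p.essential and p.fix_by is Deadline.TECHNICAL_HANDOVER
    ]
```

Under this model, the backlog that management watches before 20 December would be the length of the list returned by `handover_backlog`; non-essential and documentation PTRs never enter it, whatever deadline they carry.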

    "At one point we were looking at over 1,000 PTRs that had to be fixed by technical handover and we were running with a backlog of category 1-3s."

    During May and June, Chisholm explains, "we really focused on this and the [upgrades to] systems have gone better. Build 1.35 [an earlier major software upgrade] was an extremely good build with a low number of faults in it. Prior to that we were getting a fair number of faults in each build as well as trying to fix the backlog."

    Breaking down the backlog

    "In the past we weren't catching up as much as we wanted to," he adds. "Now we are making real inroads into the backlog. Build 1.36 has continued in the same vein: a really good delivery of software and a relatively low number of faults in it.

    "We have just taken delivery of Build 1.37 and it is looking good as well. That tells me that the programmers are getting to grips with the system. They are understanding it and their ability to fix the faults is getting better."

    Chisholm says, "What we have seen now for the first time is that the number we have got to fix is within bounds of being fixed [before technical handover]. Our outlook to technical handover is that we can pull these PTRs down to zero or close to zero.

    "We could have taken a judgement that we would run with a number of PTRs that we knew would not bring the system down and we would tell our controllers or our engineers: you will see this fault from time to time, and this is the procedure to get around that. You really do not want too much of that."

    He says no system is bug free, even the Flight Data Processing System at London Air Traffic Control Centre, the software for which has been largely rewritten over the past 20 years.

    "At one time we had a target that we would not go to technical handover with more than 400 category 1-3 or severity 1-3. Now we are going to be way below that. We are going to be quite close to zero now, in the tens and twenties."

    Chisholm says he is "feeling fairly good about it" but is not complacent. Fixing any one of the bugs could prove a "sticky job".

    Agnew says the focus now is on continuing to bring down the number of PTRs and the changes to the systems. This is echoed by Chisholm: "There has been a good and vigorous debate between the customer, the project team and Lockheed Martin to agree [what changes are essential]. They have reduced the amount of change that they require partly by smart analysis: for example, do we really need to change all those things; maybe we could do without that and that.

    "Also the programmers see a cleverer way to do that now, so that the software doesn't need 5,000 lines of code to be written. We could do it in 500. There has been quite a lot of that.

    "The change exercise is complete. We just have to deliver it. But we are confident we can do that."

    If the worst came to the worst, and the systems in January 2002 were not wholly satisfactory could National Air Traffic Services go live in stages?

    Chisholm says, "Potentially yes but I think in practice we would not. You could go live with certain sectors. We have looked at it in the past and it is a fairly complicated thing to do but theoretically you could do it".

    And how will National Air Traffic Services cope with the problem of new systems causing a productivity dip when they go live?

    "There will be no compromise on safety," says Chisholm. "That cannot be compromised and will not be compromised. What does happen though is that, in terms of the service, it could affect our ability to move all the airlines without delay. We are already advising the airlines that around the time of the transition we will put on quite restricted flow rates [of aircraft coming in and out of UK air space].

    "We will not attempt to run at our normal level while we are doing this transition and the airlines expect that. We choose the dead of winter to go live. We would not attempt it in the middle of the summer because the service hit would be too severe. We hope that in the middle of winter we will limit the service but we hope it will not be too damaging to the airlines," he adds.

    There may also be delays next summer as controllers are taken off their normal duties at London Air Traffic Control Centre to train on the Swanwick system.

    Chisholm says, "We have enough controllers but once you take sizeable numbers out of the operation you reduce its resilience. So if you get a little bit of sickness or very pronounced flows of traffic through certain sectors, typically at a weekend, and you cannot deploy extra controllers onto that area, you might take some service hit.

    "Controllers have a feeling we are tight on numbers. We certainly have enough to get us through transition, and there are enough to get into operation. Success rates on new controllers are not as good as I was hoping, so I have tightened up assumptions. We are not in a fool's paradise. It's tight, depending on what sort of service you want delivered. The number we will have in the first summer is comparable to the service that we have delivered in the previous two summers."

    Timetable for next 18 months

  • August 2000: work continues on eliminating 200 bugs in systems

  • December 2000: technical handover. A fully working system is due to be handed over by IT staff to operations executives

  • December 2000: Civil Aviation Authority's Safety Regulation Group due to give interim approval to a fully working system

  • January 2001: final bids due on 46% sale of National Air Traffic Services (Nats)

  • February 2001: controllers begin phased 11-month training on the new system

  • March 2001: partial sale of Nats to be completed

  • January 2002: system to become operational

    Remaining project challenges

  • Greatly reduce or eradicate the 200 existing bugs before 20 December 2000

  • Also by 20 December, buy and install into the complex infrastructure an uninterruptible power supply system to help support 200 controller workstations

  • Train air traffic controllers on the new operational systems next year - at a time when two new major releases of software are being installed and tested

  • Update communications protocols to make systems compatible with those at control centres in Europe

  • Train as many controllers as possible on the new systems next year, while taking out a minimum number of controllers from operational duties

  • Maintain top-level commitment and attention to the project at a time when directors are involved in negotiations to sell 46% of the organisation to the private sector

  • Maintain an internal and external openness and lack of defensiveness even if problems begin to mount

  • Ensure that the new scheduled releases of software have a minimal number of bugs. Part of the job of the new releases is to cure earlier bugs

  • Maintain staff morale at a time of uncertainty over the public/private partnership sell-off

    Positive steps being taken

  • Problem resolution by up to 300 experienced IT staff and managers based at Swanwick

  • Financial commitment ensures that National Air Traffic Services can afford any major purchases that are necessary to resolve any serious problems that may arise

  • Tough decisions are taken to minimise changes and defer any modifications that can safely be left until later

  • Roadshows at traffic control centres in the UK to answer questions raised by staff
