Report: Knowledge management failures central to Shuttle disaster

Knowledge management systems failures played a major role in February's Columbia space shuttle disaster, the final report of an official investigation has claimed.

NASA's reliance on informal communications to manage space shuttle operations - coupled with the agency's insular culture - turned risk and danger into disaster, according to the Columbia Accident Investigation Board.

The board, established shortly after the Columbia disintegrated during re-entry on 1 February and chaired by Hal Gehman, a retired Navy admiral, concluded that "deficiencies in communication ... were a foundation for the Columbia accident".

The report paints a picture of a massive bureaucracy that relied on informal e-mail communications to manage the in-flight analysis of damage to Columbia during takeoff.

This led to a series of discussions that took place in a vacuum, with little or no cross-organisational communication and often no feedback from the senior managers contacted by low-level engineers with concerns about the shuttle's safety, according to the report.

A major element in NASA's management and decision-making failures was its inability to integrate critical safety information and analysis, the report said.

"The agency's lack of a centralised clearing house for integration and safety further hindered safe operations. In the board's opinion, the Shuttle Integration and Shuttle Safety, Reliability, and Quality Assurance Offices do not fully integrate information on behalf of the Shuttle Program."

NASA does have an automated system in place to track safety critical issues, but it is "extremely cumbersome and difficult to use at any level", the report said. As a result, the system, which contains a list of more than 5,000 "critical items" and more than 3,200 safety "waivers", often goes unused.

"The Lessons Learned Information System database is a much simpler system to use, and it can assist with hazard identification and risk assessment," the board concluded. "However, personnel familiar with the Lessons Learned Information System indicate that design engineers and mission assurance personnel use it only on an ad hoc basis, thereby limiting its utility."

The board also made clear that it isn't the first commission to find such deficiencies. Numerous reports, including a General Accounting Office report published in 2001, highlighted "fundamental weaknesses in the collection and sharing of lessons learned" by program and project managers.

That GAO report also found that "the existing workforce was stretched thin to the point where many areas critical to shuttle safety, such as mechanical engineering, computer systems and software assurance engineering, were not sufficiently staffed by qualified workers".

The report also questioned whether a more efficient and interactive form of communications and information sharing would have made a difference, given NASA's dysfunctional corporate culture.

Between 27 January and 31 January, "phone and e-mail exchanges, primarily between NASA engineers ... illustrate another symptom of the cultural fence that impairs open communications between mission managers and working engineers", according to the report.

"These exchanges and the reaction to them indicated that during the evaluation of a mission contingency, the Mission Management Team failed to disseminate information to all system and technology experts who could be consulted. These engineers - who understood their systems and related technology - saw the potential for a problem on landing and ran it down in case the unthinkable occurred. But their concerns never reached the managers on the Mission Management Team that had operational control over Columbia."

Dan Verton writes for Computerworld
