Failures in knowledge management systems played a major role
in February's Columbia space shuttle disaster, the final report of
an official investigation has claimed.
NASA's reliance on informal communications to
manage space shuttle operations - coupled with the agency's insular
culture - turned risk and danger into disaster, according to the
Columbia Accident Investigation Board.
The board, established shortly after the
Columbia disintegrated during re-entry on 1 February and chaired by
Hal Gehman, a retired Navy admiral, concluded that "deficiencies in
communication ... were a foundation for the Columbia accident".
The report paints a picture of a massive
bureaucracy that relied on informal e-mail communications to manage
the in-flight analysis of damage that Columbia sustained during takeoff.
This led to a series of discussions that took
place in a vacuum, with little or no cross-organisational
communication and often no feedback from senior managers contacted
by low-level engineers with concerns about the shuttle's safety,
according to the report.
A major element in NASA's management and
decision-making failures was its inability to integrate critical
safety information and analysis, the report said.
"The agency's lack of a centralised clearing
house for integration and safety further hindered safe operations.
In the board's opinion, the Shuttle Integration and Shuttle Safety,
Reliability, and Quality Assurance Offices do not fully integrate
information on behalf of the Shuttle Program."
NASA does have an automated system in place to
track safety critical issues, but it is "extremely cumbersome and
difficult to use at any level", the report said. As a result, the
system, which contains a list of more than 5,000 "critical items"
and more than 3,200 safety "waivers", often goes unused.
"The Lessons Learned Information System
database is a much simpler system to use, and it can assist with
hazard identification and risk assessment," the board concluded.
"However, personnel familiar with the Lessons Learned Information
System indicate that design engineers and mission assurance
personnel use it only on an ad hoc basis, thereby limiting its
utility."
The board also made clear that it isn't the
first commission to find such deficiencies. Numerous reports,
including a General Accounting Office report published in 2001,
highlighted "fundamental weaknesses in the collection and sharing
of lessons learned" by program and project managers.
That GAO report also found that "the existing
workforce was stretched thin to the point where many areas critical
to shuttle safety, such as mechanical engineering, computer systems
and software assurance engineering, were not sufficiently staffed
by qualified workers".
The report also questioned whether a more
efficient and interactive form of communications and information
sharing would have made a difference, given NASA's dysfunctional
corporate culture.
Between 27 January and 31 January, "phone and
e-mail exchanges, primarily between NASA engineers ... illustrate
another symptom of the cultural fence that impairs open
communications between mission managers and working engineers",
according to the report.
"These exchanges and the reaction to them
indicated that during the evaluation of a mission contingency, the
Mission Management Team failed to disseminate information to all
system and technology experts who could be consulted. These
engineers - who understood their systems and related technology -
saw the potential for a problem on landing and ran it down in case
the unthinkable occurred. But their concerns never reached the
managers on the Mission Management Team that had operational
control over Columbia."
Dan Verton writes for Computerworld