"As we know, there are known knowns. There are things we
know we know. We also know that there are known unknowns. That is
to say, we know there are some things we do not know. But there are
also unknown unknowns, ones we do not know we do not
know."
When former US secretary of defense Donald Rumsfeld gave his
famous "known knowns and unknown unknowns" speech, he left out one
configuration: unknown knowns.
Yet that, in a nutshell, is the unstructured data problem that
many companies are facing these days: the things we do not know we
know. With an estimated 80% to 90% of corporate data held in the
form of e-mails and Word documents, and increasingly voice and video
files, much vital information is beyond the reach of traditional
methods of data analysis and retrieval.
"In that 80% or 90% is some really important stuff," says Ovum
senior analyst Mike Davies. "There is a growing realisation in
organisations that the information they hold is either an asset or,
more importantly, a liability."
A number of issues have converged to bring unstructured data to
the top of the agenda. A key one is the loss of people who have
traditionally held the corporate memory, as the baby-boomer
generation reaches retirement age.
"In the States they are now having to interview the people who
built the nuclear reactors," Davies says. "Much of that information
could have been written down, sitting in that unstructured
data."
The proliferation of compliance and regulatory regimes such as
the Markets in Financial Instruments Directive in Europe and the
Federal Rules of Civil Procedure in the US has also forced
organisations to come to terms with the threat that not having a
handle on unstructured data poses.
"Not being able to find an e-mail is not a get-out-of-jail-free
card any more," says Costi Perricos, a director in Deloitte's
consulting practice.
"You need to be able to categorically prove it was not sent.
Lots of people file their inboxes, but how many people file their
outbox? Yet in a client-service environment, your outbox is more
important: if we give advice to a client, that is the advice we
need to find."
These problems multiply in organisations where even the
structured data is scattered among different systems. Take BAE
Systems, Europe's largest aerospace company and the product of
years of mergers and acquisitions.
"We have inherited a whole bunch of information systems and network
drives that will not talk to each other and never will," says
Richard West, head of organisational and e-learning at BAE.
"There are no common tagging or data-retention or versioning
policies. How do we find key information from a
knowledge-management and compliance perspective among hundreds of
terabytes of unstructured information?"
BAE has adopted enterprise search software from
Autonomy. The software not only sits on top of multiple
information systems but is also able to burrow beneath the
incompatible metadata attached to documents to uncover their
meaning. For BAE this was vital, as it seeks to share best
practices across the organisation without limiting people to their
personal networks in their search for advice and expertise.
"People finding people is the key to me," says West. "I do not
care so much about the documents themselves, but a document tells
you who its owner is." Because the search is enterprise-wide, it
can also pull in competency information from human-resources
systems and search across discussion forums and wikis to find
people who are the "hotspots" of thinking on particular topics.
Another feature of the system borrows the product recommendation
and profiling techniques that online retailers use to create
"learner profiles", which can link individuals performing similar
searches.
The increasing familiarity of users with sites such as Amazon.com,
Facebook and Match.com has helped West sell the system.
"When you talk about knowledge management, it goes over people's
heads," he says. "A few years ago, if you talked to a bunch of
engineers about this, they might have been worried it would be used
against them. Now they are all doing this stuff outside on the
internet and we can explain that we can do something similar
internally."
Sharing best practices
Benchmarking the implementation of enterprise search suggests it
has achieved a 90% improvement in access to information. But the
prize for West is the ability to re-use and share best practice
across projects.
"There have been a lot of false dawns of this sort of stuff," he
says. "But the organisations that have got a handle on it are
achieving huge competitive advantage."
That intuition is backed up by formal research, such as
consultancy Accenture's "High Performance" project.
"The message from clients is clear," says Stephen Gallagher,
global director of analytics in Accenture's Information Management
Services practice. "High-performing companies stand out from other
companies in their use of analytics, taking it to a higher level to
be competitive."
Unfortunately, matching this is not simply a question of
acquiring the right technology. "There are two problems," says
Gallagher. "One is that the software is quite immature and
relatively difficult to use. The other is finding enough people who
understand the analytics."
On the first point, Gallagher points out that smaller suppliers
of unstructured data analysis tools are rapidly being bought up and
integrated by big players such as IBM and
Oracle.
The people issue is more thorny, but Gallagher notes Gartner's
predictions that companies will increasingly form business
intelligence competency centres to bring together scarce analytic
skills scattered across their organisations.
"Clients are recognising a market need to consolidate their
skills," Gallagher says. "The guy in the accounts department can
use the same analytics skills he uses to detect fraud to do
customer analysis."
Customer analysis using structured data has been a staple of
business intelligence for years. Unstructured data contains its own
gems, but it is when the two are combined that real benefits
flow.
"It is not only about extracting keywords, or even concepts from
unstructured data, it is about extracting sentiments, either
positive or negative," says Olivier Jouve, vice-president of
product marketing at data- and text-mining specialist SPSS.
"You always call a call centre to complain about something, but
it may not mean you are about to churn. You could call every week
to complain, so what really makes sense is to combine structured
and unstructured data so we know who you are."
This approach allows companies to tackle unhappy customers on
two levels.
For example, Swiss cable operator Cablecom used data and text
mining to analyse the free-text survey responses of customers who
said they were unlikely to recommend the company to others.
The output not only allowed those customers to be tackled and
turned round on a case-by-case basis, but also provided the basis
for more wide-ranging changes that would affect every customer.
But increasingly companies are looking outside their own stores
of data for customer insight.
"More and more people use data and text mining to look at
external sources of data, such as forums, blogs and wikis, where
people talk very freely," Jouve says. "They use it to better
understand how they are seen in the market."
The key difference between this approach and traditional
business intelligence is the predictive nature of the information,
as Mike Lynch, Autonomy's CEO, explains.
"It is about trying to pick up what is 'over the horizon', the
things that are about to become very important," he says. "For
example, trying to work out that the US sub-prime mortgage market
was about to become important before everybody else realised."