Businesses in every industry are gathering and storing more data every day. But when assessing the benefits and challenges of big data, it is easy to overlook one key point: most of the business information in use today does not reside in a standard relational database.
An often-cited statistic is that 80% of business data is unstructured, whether it sits in word processor documents, spreadsheets and PowerPoint files, audio, video, sensor and log data, or external sources such as social media feeds.
Exploiting unstructured data
This creates difficulties for organisations that need to process and exploit unstructured data. Some big data tools, primarily those based on Hadoop, are designed from the ground up to manage and analyse unstructured information. More conventional business intelligence (BI) and data warehousing technologies may not be.
BI and data warehousing suppliers have been adding support for unstructured data management to their tool sets, and some IT organisations have built their own platforms for converting unstructured data into structured records, for example through knowledge management systems. But that can be a time-consuming and expensive process.
Moreover, the large-scale corporate knowledge management systems that were popular a decade or more ago are not usually flexible enough to accommodate new types of information or to support new analytics tools.
Businesses also want to analyse unstructured and structured data together, and quickly. Loading large volumes of documents into a knowledge management system, and then hand-coding metadata so they can be processed and searched, is not practical for organisations that want real-time or near-real-time insight into their business operations.
"Unstructured data is really coming to the fore of people's minds," said Nick Millman, a senior director at Accenture Information Management Services.
"Historically, people have been talking about data within the firewall, document management or collaboration information that is not structured, such as video, photos, documents and diagrams. But we are at a tipping point: There is as much value in unstructured data in terms of what customers are thinking on the web and what businesses can derive from other organisations' data."
Risks and pitfalls of unstructured data management
It would be a mistake to assume that analysing unstructured data always yields a quick win. Businesses face a number of challenges, including data quality, data categorisation, combining structured and unstructured data, and handling the potentially large volumes of information involved.
Meeting these challenges might mean technology upgrades, such as new databases and BI or analytics tools. It might mean moving to entirely separate systems to process unstructured data, perhaps via the cloud, or relying on in-house development and customising Hadoop, MapReduce and other open-source tools. Most importantly, it requires an understanding of the type of information the business is looking for and the kinds of insights business managers hope to draw from the data.
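To give a flavour of the MapReduce programming model that Hadoop-based tools use to process unstructured text, here is a minimal single-process sketch in Python. It is an illustration under stated assumptions, not Hadoop code: the function names are hypothetical, everything runs in memory on one machine, and the "shuffle" step that a real framework distributes across a cluster is simulated with a dictionary. The job counts words across free-text documents such as customer comments.

```python
import re
from collections import defaultdict
from itertools import chain


def map_phase(document: str) -> list[tuple[str, int]]:
    """Hypothetical mapper: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in re.findall(r"[a-z']+", document.lower())]


def shuffle_phase(pairs) -> dict[str, list[int]]:
    """Group intermediate values by key, simulating the framework's shuffle/sort step."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Hypothetical reducer: sum the counts collected for each word."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents: list[str]) -> dict[str, int]:
    """Run map, shuffle and reduce over a list of unstructured text documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    comments = [
        "Great service, fast delivery",
        "Delivery was slow but service was great",
    ]
    counts = word_count(comments)
    # Print the most frequent words, ties broken alphabetically.
    print(sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:3])
```

The value of the model is that the mapper and reducer are independent of one another, so a framework such as Hadoop can run many copies of each in parallel over very large document collections.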
"Where business is conducted online, there's a lot that can be done. Google has done it for its entire business," said Millman.
"You can carry out experiments that were not possible before. For example, you can find out if a drug works better or not."
Often, though, the more considered the query and the more focused the search, the better the results. This rule applies to both structured and unstructured data.
The ability to process information over the internet, using cloud computing resources, and to draw on online data sources has opened up a powerful new set of options for analysing unstructured data.
Services such as Twitter's Firehose of raw data, for example, are already being used by companies for everything from customer service and market research to adjusting their supply chain and logistics strategies. And much of that can be done in close to real time.
But, above all, the decision to analyse unstructured data cannot be driven by the IT department or technology availability alone.
"The right way to do it is not to start with the technology, but understand what the business is about," said Stephen Black, a data management expert at PA Consulting Group. "You need to have a clear vision. You need to work back from that to the source of the information. But it is also true that questions that you couldn't answer ten years ago are answerable now."