Panning for gold - the data we know and use and the data we don't

This is a guest blogpost by Matt Davies, technical evangelist, Splunk

They say you don’t know what you don’t know. It’s the same with your data. Most organisations have data they know about, collect and use. This data is typically structured, neat and tidy and probably in some form of database or data warehouse. However, there is also a wealth of data that they don’t know they have and aren’t using. This is most likely machine data. It comes from every technology interaction, be it machine-to-machine, person-to-machine or person-to-person (via technology). According to IDC, the digital universe is growing at 40% a year and most of the data generated is machine data. It is coming from core IT, customer-facing applications, cloud computing, mobile devices, social media and the Internet of Things.

This machine data has certain characteristics: it is in motion (often very fast motion), there’s a lot of it and it is time series data. It is also messy data (it is unstructured) and it is lazy (every company generates it but it typically gets left unused). But, to steal from a Wild West cliché, “there is gold in them thar data”. The challenge has always been how do I get to it, how do I make it useable and how do I find value from it?

Increasingly, the ability to use the same information for multiple purposes is one of the secrets to making the most of any kind of data. Think of your real-time machine data as a stream of light: you need some form of prism to be able to look at this data through a different “lens” or colour. The same data has value for security, IT, customer service, and so on. The term data silo isn’t new, and barriers that prevent people from using data are often what hinder data-centric initiatives. A lot of time is typically spent collecting and preparing data before you ever start to ask questions and get value from it.

So if there’s data we’re not making use of, data that has benefit for multiple audiences, and it takes a lot of time to get value from it, how do you start? Technology and modern data platforms can help, but this must go hand in hand with building a culture of exploration around your data and using analytics as a way of democratising it for everyone.

I have seen a great example from Deutsche Bahn, which ran a 24-hour hackathon where they provided a data set from their rail infrastructure and challenged all-comers to “show us what’s in the data”. By exploring this previously untapped source of data, Deutsche Bahn identified potential train delays, how journey times differed between wooden and concrete sleepers, and where outages were more likely to occur. I thought it was an interesting example of largely unused data, a culture of exploration and valuable new insight and analytics.

To further illustrate, I was fortunate enough to be in a presentation with UniCredit, a European bank, who are managing multiple terabytes of this machine data every day. They are taking data from over 180 different sources, amounting to 8 billion events per day and 400,000 events per second at peak. They use this data to improve their banking operations, create real-time alerts, search to find patterns of behaviour and deliver real-time data visualisations. This capability is delivered to various parts of the bank for business analytics, security intelligence, IT operations analytics (ITOA), internet banking service monitoring, mobile banking insight and improved accounting. The value from the data includes improved SLAs, faster issue resolution, a real-time, data-centric approach to decision making and the chance to improve customer experience.
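
To put those volumes in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the two figures quoted above (8 billion events per day and 400,000 events per second at peak) and assumes, purely for illustration, that the daily total can be averaged evenly over 24 hours. The variable names are illustrative and not taken from the UniCredit presentation.

```python
# Rough arithmetic on the event volumes quoted above.
# Assumption: the daily total is averaged evenly over a 24-hour day.
SECONDS_PER_DAY = 24 * 60 * 60

events_per_day = 8_000_000_000      # figure quoted in the text
peak_events_per_second = 400_000    # figure quoted in the text

# Average rate implied by the daily total.
average_events_per_second = events_per_day / SECONDS_PER_DAY

# How much busier the peak is than the average.
peak_to_average = peak_events_per_second / average_events_per_second

print(round(average_events_per_second))  # 92593
print(round(peak_to_average, 2))         # 4.32
```

In other words, the quoted figures imply an average of roughly 93,000 events per second, with peaks a little over four times that, which is one way to picture what "in motion" and "a lot of it" mean in practice.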

Think about the data you know and use, then think about the data you don’t know you have and don’t use. Try it, explore it, share what you find and see if “there is gold in them thar data”.