Round table question: What is the significance and potential value of big data?
Big data is a term that has caught the imagination of analysts and database and data warehousing vendors. But what does it mean for corporate IT users? How is it to be most effectively managed through data warehouse and database technologies? How can business intelligence software and systems be harnessed to get the most value out of big data?
Our round table of experts offers some answers.
Yvonne Genovese, vice president and distinguished analyst, Gartner
Big data is a popular term generally used to acknowledge the exponential growth, availability and use of information in the data-rich landscape of the emerging information economy era. It is a disruptive force and an immediate problem that is already affecting traditional understanding and business models. However, big data is just the start. In the future, the full range of extreme information management issues — of which volume is just one aspect — will pose even greater challenges, but it will also enable the emergence of even more significant business opportunities.
Business leaders and technologists have traditionally focused information-seeking efforts on answering two questions — what happened and why did it happen? For the most part, investments in business intelligence delivered reports that helped in this area. But business leaders are asking three new questions: What is happening right now? What is likely to happen? What events could affect the future?
There is potential value in evaluating types of data that currently exist in the business and some new types of data. Many organisations have stored data for years and have never attempted to analyse it or look for patterns simply because the business appetite for doing so didn't exist. For example, I know of an insurance company that now plans to analyse petabytes of claims information (text, video and image), looking for patterns that may dramatically change the way claims are managed and paid.
In addition to the untapped opportunity of data currently stored in organisations, there is a new world of data emerging from sources like social media and mobile devices. This new connection point provides access to information that can improve relationships, reach new prospects and markets, tell us how to improve customer retention and push specific information back to individuals or groups. Gartner calls the strategies to “seek, model and adapt” to patterns of change a pattern-based strategy.
By sweeping away current limitations derived from data constraints and exploiting a growing universe of existing enterprise data and publicly available data from external sources, a whole new era of digitally accelerated business models will emerge that has the potential for substantive new revenue and competitive advantage.
Andy Hayler, CEO, Information Difference
In 2001, the largest data warehouses in the world were around 100 TB in size, but in just a decade, data warehouses 10 times this size have emerged. This tremendous growth has stretched database capabilities to the limit and heralded the emergence of different approaches to the analysis of big data. A number of innovative products such as database appliances have appeared, applying either clever hardware or massively parallel processing approaches (or both) to tackle this vast increase in numerical data. For less well-structured data, as is often the case with website traffic or social media, radically different approaches have evolved, pioneered by companies like Yahoo and Google, and they are now becoming more widespread in the form of Hadoop, an open source form of these highly parallel approaches to intensive computation and file handling.
The challenge for companies is how to cope with such increased scale, but the opportunities are as jumbo-sized as the data. Companies that can successfully analyse their burgeoning data volumes and put the results to good use will be in a position to react better to their competitors. How they do so will depend on the industry, but hedge funds able to test trading strategies more rapidly, marketing companies able to better predict customer behaviour, or gaming companies or banks able to better predict fraud than their competitors will be at an advantage. Governments that want to detect criminal behaviour or potential terrorists also benefit from the new approaches being taken to analysing big data. With the amount of data being generated increasing all the time, especially machine-generated (e.g., by sensors or radio frequency identification [RFID] tags) or Web-generated, companies that rapidly adapt to these challenges will be able to lead the pack.
Mark Whitehorn, co-founder, PenguinSoft
The term big data is somewhat of a misnomer – we have been collecting data for years, and the big prefix implies more of the same, albeit in much greater quantities; but that is not the case.
The data we have collected in the past has generally been well-structured. For each employee we collect the same pieces of data (date of birth, salary, name, etc.). This fits very well into the nice, neat regular tables that are found in relational databases and are easy to query. It is true that the volumes of structured data are increasing, but not excessively. No, the data explosion that leads to the term big data is of what we call semi-structured data – data in Facebook, Twitter, images, audio, email and so on.
So, why is the type of data that we are storing changing so dramatically?
Well, think about it this way. Ten years ago, storing 10,000 patient records, at 100 KB each, in a hospital took 1 GB. If we double the number of words about each patient, that would double the storage to about 2 GB. But in reality, current storage for 10,000 patients can easily top 1 TB. It isn’t the words that make the difference; it is the ECG scans and X-ray images. Are these of potential value? Well, as a patient I really, really want the best diagnosis possible, so I am delighted my doctor has ready access to the images and, if I move hospital, I want medical staff to be able to follow me electronically rather than physically.
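The arithmetic behind those figures can be checked with a short sketch. It assumes decimal units (1 GB = 10^9 bytes), which is how the round numbers above appear to be used; the variable names are illustrative only.

```python
# Decimal storage units (an assumption; the text uses round figures).
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12

patients = 10_000
text_record_bytes = 100 * KB            # one text-only patient record

# Ten years ago: text records only.
text_total = patients * text_record_bytes

# Doubling the words per patient simply doubles the storage.
doubled_text_total = 2 * text_total

# Today: scans and images push the total to about 1 TB.
current_total = 1 * TB
per_patient_now = current_total // patients
growth_factor = current_total // text_total

print(f"Text-only total:     {text_total / GB:.0f} GB")
print(f"Doubled text total:  {doubled_text_total / GB:.0f} GB")
print(f"Per patient today:   {per_patient_now / MB:.0f} MB")
print(f"Growth over text:    {growth_factor}x")
```

On these assumptions, each patient now accounts for roughly 100 MB rather than 100 KB, a thousandfold increase that doubling the text could never explain.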
The same is true in the commercial world – massive increases in data volumes, and little of it is text. EPOS (electronic point of sale) has proved wildly useful in supermarkets. We are now seeing the use of Global Positioning System, or GPS, and RFID tags to track components, and temperature, humidity and pressure sensors to monitor the environment where they are stored. This can generate huge volumes of data (some sensors pump out data hundreds of times a second), but we can track stock much more efficiently, move to just-in-time supply and so on. Now imagine that you want to monitor the reaction of the public to some price changes that your company is making. You need to track it in near real time so that you can readjust the price if necessary.
Monitoring sites like Twitter will tell you the public’s reaction (more accurately, the reaction of the Twitterverse), but you will need to monitor millions of tweets. And, as before, the data is not the same as the data we have traditionally captured and processed. After all, people on Twitter don’t rate their reaction on a scale of 1 to 5, so you have to capture tweets like this:
“Just noticed price hike OatyWheatyFlakes what company think is doing”
And you have to turn it into structured data that can be processed easily.
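As a rough illustration of what that conversion might look like, here is a minimal sketch that uses simple keyword rules. The product list, the cue words and the `TweetRecord` fields are all hypothetical, invented for this example; a real system would more likely rely on proper natural language processing than on word lists.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup tables, made up for this illustration.
PRODUCTS = {"oatywheatyflakes": "OatyWheatyFlakes"}
PRICE_WORDS = {"price", "prices", "cost"}
NEGATIVE_CUES = {"hike", "expensive", "overpriced", "ripoff"}
POSITIVE_CUES = {"cheaper", "bargain", "discount", "deal"}


@dataclass
class TweetRecord:
    """A structured row that could be loaded into a regular table."""
    product: Optional[str]   # recognised product name, if any
    mentions_price: bool     # does the tweet talk about price?
    sentiment: int           # -1 negative, 0 neutral, +1 positive


def structure_tweet(text: str) -> TweetRecord:
    # Lowercase and split into alphanumeric tokens.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    product = next((PRODUCTS[t] for t in tokens if t in PRODUCTS), None)
    mentions_price = any(t in PRICE_WORDS for t in tokens)
    # Crude score: positive cue words minus negative cue words.
    score = sum(t in POSITIVE_CUES for t in tokens) - sum(
        t in NEGATIVE_CUES for t in tokens
    )
    sentiment = (score > 0) - (score < 0)  # collapse to -1, 0 or +1
    return TweetRecord(product, mentions_price, sentiment)


record = structure_tweet(
    "Just noticed price hike OatyWheatyFlakes what company think is doing"
)
print(record)
```

For the tweet above, the sketch yields a record naming OatyWheatyFlakes, flagged as price-related, with a negative sentiment of -1, which is the kind of row a conventional database can aggregate across millions of tweets.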
So, the term big data doesn’t just describe more of the same, it describes a sea change in the type of data we collect and the mechanisms we use to extract information from it. However, the value of the information we can extract from big data is such that we ignore it at our peril.
Kathy Hunter, Principal Information Management Consultant, Kynetika
Buzz phrases come and buzz phrases go. This year, one of the buzz phrases of note has got to be big data. It’s an expression heard everywhere. However, for those of us who have been around, it’s not entirely new. We’ve had big data since computers began. The only difference between then and now is the notion of what is “big.”
In today’s world, big data reflects the exponential growth of information available globally. And that’s information that can be of tremendous use to organizations that are ready and able to effectively utilise it. However, managing today’s big data with its various formats and massive sizes, often streaming around the Internet, can be a demanding and expensive proposition.
With my information quality hat on, I think that smart organizations might want to step back before taking on something this large. A few questions that might be asked before proceeding: Do our business goals require the use of this data? Can we really make use of it all? Who will manage and control it? Can we get the gist of this somewhere else? How can we tell whether it’s reliable? Is it worth the time and money?
If, after consideration, the decision is made to implement one of the many big data solutions that are out there, then proceed judiciously. Carefully pick your solution, ensuring it works for your requirements, not those of the salesperson who is selling to you, and keep your business goals in mind during implementation. Keep quality in the mix so you don’t find that all the time, money and effort invested in big data gets wasted.
Remember, your analysis of this data will allow you to draw conclusions based on information that, in some respects, is totally uncontrolled. It’s important to put checks and balances in place to ensure your company doesn’t make decisions based on information that is unreliable. Don’t allow big data to become an even bigger mistake.
Keith Gordon, Data Management Specialist Group of the BCS, The Chartered Institute for IT
Big data is a recently coined term that is used to describe large data sets whose sizes make them difficult or impossible to handle using the typical database capture, storage, management or analytical tools.
As a subscriber to SearchDataManagement.co.UK you know that information is a valuable asset. Some authors have suggested that up to 80% of the value of a financial services company rests in its information about its products, customers and staff. And even for a simple manufacturing company the equivalent figure is up to 40%.
For many, the “big data problem” is one for the technologists to solve: sorting out how large data sets can be stored and accessed and so on.
But the bigger the data set, the greater its value and the greater the need to ensure that when it is accessed and analyzed the interpretation of the data is correct. This means that for big data we still need a high-quality definition of the data and we also need the quality of the data itself to be as high as possible.
The softer data management skills of analysis, communication, negotiation and arbitration are still required to produce the metadata for all of this data. For structured data, the business entities and characteristics represented by the data need to be defined, as they would for any data set. But for most organizations, the majority of this big data will be unstructured – or multimedia – data, such as emails, letters, reports, recordings and so on, and it is this category of data that really needs extra attention so that it is correctly analyzed and understood.
To be of value, the quality of this data must also be maintained: It needs to be sufficiently accurate for its future use, it needs to be up to date and it needs to be complete. Procedures must be in place to ensure that when data is created it is of good quality.
In the era of big data, whilst we grapple with the technical problems of capture, storage, management and analysis, we must remember that data is of no value unless it is of high quality and its meaning is unambiguous. The fact that there is more data means that all the challenges of data administration are still there, only magnified. Be prepared to grasp the challenge.