Data is the lifeblood of an organisation, but its flow needs to be managed effectively. The challenge for IT chiefs is that data volumes are growing with the explosion in unstructured data, while social media sites such as Facebook and Twitter hold valuable data that organisations want to exploit.
The desire to sweat information assets for possible competitive advantage is being fuelled by the tough economic climate, but many IT leaders are struggling over how to extract value and empower business users in the face of a big data skills deficit and concerns over the governance, privacy and security of data. IT leaders met at a recent Computer Weekly roundtable event, in association with Oracle, to discuss the possibilities and challenges of big data.
Where to use big data
Many of the IT leaders at the event were at the exploratory stage of using big data technologies, but could see their potential.
Peter Elliott, head of architecture and design at Vocalink, was an example. The payment processing firm was formed from two companies with lots of legacy systems.
"We want to look at how we exploit transactional data and create a centre of excellence around the business," he said.
Stephen Holder, executive director at UBS, said his challenge is knowing what data to keep: “We get rid of a lot of data because we can’t hold it, but we are interested in holding and querying the data and how this may develop or evolve.”
Simon Griffiths, head of business intelligence (BI) technology at Vodafone, said it's important to understand the relative merits of different technologies.
“The differentiation is about types of analysis – what you do with each technology rather than throwing one away. Relational databases are good at some things and bad at others. We can look at new types of analysis for existing data sets,” he said.
Another delegate, from a large police force, said a lot of important data is not yet in the same place.
“We have large amounts of unstructured data and it is siloed in various systems so we are looking at changing the paradigm. We support processes, but we are an information organisation and want to deliver the right information at the right time,” he said.
Griffiths said Vodafone has been dabbling in customer sentiment analysis on a small scale.
“The initial indicators are reasonably positive,” he said.
Patrice Gourlet, global BI architect at BAT, said big data analytics could help the interaction with customers via social networking sites.
“There is scepticism surrounding business intelligence over issues such as whether the data is true and if the right rules are in place. The tobacco industry is heavily regulated in its merchandising and marketing, but another area for data is within social networks and how we may be able to interact directly with customers, not through the authorities, but in a way which is allowed,” he said.
But he said there is an issue of veracity.
“This is about data quality. You need to walk before you can run and understand how you do it as you move up the maturity curve. It will be difficult, but only then can I unleash data to big users and get value,” he said.
Privacy and compliance
Big Data: Hype versus reality
Doug Cutting is a big name in big data. He helped create the open-source Hadoop framework that allows applications based on MapReduce to be run on large clusters of commodity hardware. He questioned the hype surrounding big data.
“I don’t see big data suddenly crossing the chasm between Silicon Valley technology companies to mainstream companies. It feels like a path of pretty steady growth,” he said.
Firms collect lots of data and find valuable uses for it, but the majority of organisations have more potential applications to exploit big data than they have actually deployed.
“Not all these applications will pan out, but a large number will. We are at the beginning and there is meat here. There’s a new way of dealing with data in town,” he said.
The combination of the affordability of commodity hardware and software that allows effective processing of big data is propelling the trend, but Cutting said there is a skills issue.
“A lot of smart folk are not being leveraged to the degree they could be because of the way things are structured and how information is siloed,” he said.
Cutting said big data technology removes silos by using fewer but bigger clusters which are more economical and create data sets for better analysis.
“This allows people to experiment and explore data with their ideas and test them, but we are in a new cycle and it takes time for people to come up to speed. There’s a skills deficit; demand is great,” he said.
Andy Mendelsohn, senior vice president, Oracle server technologies, believes there is hype regarding big data, but data growth has been part of the territory for years.
“In 2000 we had the first terabyte relational database. Twelve years later, relational database analytic systems have experienced a thousand-fold increase into the petabyte range. There’s no reason we won’t move to the exabyte range over the next 10 years. We started going down this path 15 years ago, but we are now calling it big data in public. Whereas it used to be for a few geeks, we are all data scientists now,” he said.
Mendelsohn thinks big data and relational databases will co-exist.
“There is a myth that relational databases will go away and everything will be done on Hadoop. MapReduce is an interesting technology but the technologies are complementary and customers will use both; they will put Hadoop in front of the Exadata relational database. Let’s see how this evolves, but the recognition is that we’re all information companies now and data is the lifeblood of companies,” he said.
The challenges are how technology engines, such as MapReduce, Hadoop and relational databases, will co-exist and integrate, and what’s the right use for each.
“We ask customers about their business case and uses, but we are at the early stages and there is a long way to go, and we are learning too,” he said.
Griffiths said problems with data volumes are manageable, but the real difficulty concerns privacy.
“We manage 86 to 100 billion global [mobile] calls a day which is big data. There is information we collect and throw away within a few hours. There is location information, such as where the phone is at any point in time. We could use this information, but there is the whole privacy and provision piece for users,” he said.
Andrew Rowland, global head of database engineering at UBS, said there is a privacy and a relevancy element surrounding data: “You might have bought ladies’ underwear online once, but do you want to be presented with it every day? Do people want every bit of data assessed for them?”
Elliott said the key question to ask is what can be done with data from a compliance viewpoint.
“Data is not always in our control. Whatever our aspirations, organisations are hit by compliance and industry regulations. The business knows it can do things with data but how do we develop the business case? How do we get competitive advantage and make better use of it? Who do we need to convince?” he asked.
Michael Bollen, head of infrastructure and application services at Bank of Ireland, said the question is how you communicate and structure information for customers.
“For us it is natural territory, but how you explain this to the external or internal customer base is the challenge,” he said.
He said it is important to look ahead and ask where it makes sense and where the money is, but the fear is that “only the size of the mess will be different.”
Mark Atwell, head of software engineering – risk & finance technology at Royal Bank of Scotland, believes the business will drive many of the uses of big data.
“We have an employee-led initiative and request for comment and ideas for how to use big data as an enabling technology. What data have we got out there and how can we use it? This won’t be a classically technology-led thing,” he said.
Users who have ideas to improve the business may find big data a powerful enabler. Elliott said a “big learning curve” is ahead, but money is still an issue.
“What does the business need and what does it want? Some marketing information might be throwaway data and the business might not know what it wants. Some information needs to be 100% accurate; other data may be for ‘fuzzy logic’, but the tipping point is cost,” he said.
The future for SQL
Bob Harris, CTO at Channel 4, said big data may mean using new technologies.
“Some data problems are not appropriate to [conventional database] technologies because they are larger than what we can ask them to comfortably process. MapReduce was put into the hands of our users in 2011, and these are tools with sharp ends. However, I think the technologies are complementary,” he said.
Andy Mendelsohn, senior vice president, Oracle server technologies, said skills availability is an issue, but SQL, the query language designed for relational databases, is still a powerful way for the casual user to retrieve data.
“How will SQL develop? It won’t be the only way, but it will be important in business for marketing and finance people to ask questions. SQL may evolve entirely differently from how it has been used in the traditional database space,” he said.
But some delegates disagreed that SQL is automatically the language to use.
“SQL is a good language in a particular time and place. With the changes in technology and speed of technology we are not looking at the same levels of extraction; the paradigm has changed. We recognise someone from a video image in the same way as we used to identify someone in the past from a social security number. No-one thinks of doing modern stuff in SQL,” said one.
Other query languages, such as Hive's SQL-like HiveQL, may be the answer.
“It looks like SQL, and is a great way to take people forward,” said Bollen.
But Griffiths said people will continue to put SQL on top of MapReduce.
“This won’t change in the short-term, and it is important to support all different types of analysis. Business users want to point and click,” he said.
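To illustrate what putting SQL on top of MapReduce can mean in practice, the sketch below computes the same per-customer total twice: once as a SQL GROUP BY query and once as explicit map, shuffle and reduce steps. It is a simplified, single-machine illustration and not how Hive or Hadoop are implemented; the sample records, the table name `calls` and all function names are hypothetical, and Python's built-in sqlite3 module stands in for a relational database.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (customer, call_minutes) records.
CALLS = [
    ("alice", 5),
    ("bob", 3),
    ("alice", 7),
    ("carol", 2),
    ("bob", 4),
]


def sql_totals(rows):
    """Total minutes per customer, expressed as a SQL GROUP BY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE calls (customer TEXT, minutes INTEGER)")
    conn.executemany("INSERT INTO calls VALUES (?, ?)", rows)
    cursor = conn.execute(
        "SELECT customer, SUM(minutes) FROM calls GROUP BY customer"
    )
    result = dict(cursor.fetchall())
    conn.close()
    return result


def map_phase(rows):
    """Map: emit one (key, value) pair per input record."""
    for customer, minutes in rows:
        yield customer, minutes


def shuffle_phase(pairs):
    """Shuffle: collect all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def mapreduce_totals(rows):
    """The same aggregation, written as map -> shuffle -> reduce."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(sql_totals(CALLS))
    print(mapreduce_totals(CALLS))
```

Both functions return the same totals. The SQL version states what is wanted in one line, while the map and reduce version spells out how the work is divided, which is the part a SQL-like layer would generate on the user's behalf.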
Holder said UBS deals with financial data, and the infrastructure has been built on SQL.
“SQL is not suitable for everything, but it is a big component today so there is a need to support it,” he said.
Doug Cutting, creator of the open-source Hadoop framework, said there are different kinds of tasks and solutions and there are options available.
“Any general purpose language is not appropriate for every task. For example, SQL is good for dealing with the insurance industry’s policies and claims, but not necessarily for open questions and actuary tasks which are fuzzy. MapReduce is a language more complicated than most people can deal with. It can take some time to figure out the higher level tools to give you the most power,” he said.
But users must understand the issues, said Bollen: “Business users need to properly understand and make sense of the data their business produces. The big data paradigm must be made useful to business, rather than being just a fluffy conversation.”
This was first published in May 2012