Companies struggle to get past open source ‘big data’ experimentation

Open source big data technologies are attracting enterprise interest. Experts say: experiment, but keep business value in mind.

Open source "big data" technologies seem stuck in the sand pit of experimentation in UK corporations.

This has been a consistent theme in interviews with people familiar with early big data implementations.


Speaking at a Computer Weekly roundtable on the topic, Bob Harris, chief technology officer at Channel 4, said big data initiatives will likely require organisations to adopt new technologies.

"Some data problems are not appropriate [for conventional database] technologies because they are larger than what we can ask them to comfortably process," he said. "MapReduce was put into the hands of our users in 2011, and these are tools with sharp ends. However, I think the technologies are complementary."
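MapReduce, the programming model Harris refers to, splits a job into a map step that emits key-value pairs and a reduce step that aggregates them by key; on a Hadoop cluster the framework distributes both steps across many machines. A minimal local sketch of the model in plain Python (simulating the two phases without a cluster; the word-count example and function names are illustrative, not Hadoop's API):

```python
from collections import defaultdict

def map_phase(records):
    # "Map" step: emit a (word, 1) pair for every word in every record.
    for line in records:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # "Reduce" step: group the emitted pairs by key and sum the counts.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["to be or not to be"]
result = reduce_phase(map_phase(lines))
# result == {"to": 2, "be": 2, "or": 1, "not": 1}
```

On a real cluster the framework also shuffles and sorts the mapped pairs between the two phases so that each reducer sees all values for its keys; that distribution machinery is what makes the model suitable for data sets too large for a conventional database.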

Eddie Short, a partner with KPMG advisory services, who heads the firm's business intelligence practice, said that while "a lot of mainstream businesses are dabbling in big data, its appeal is more to out-and-out techies. And they won't be getting their way ahead of the CIOs [who are] protecting the crown jewels [of IT infrastructure]."

But Short's firm is encouraging clients to play around with big data. "They need to experiment and prepare for managed failure", he said, adding that there is "a big chasm building" as companies wait for big data to become mainstream.

Organisations using Hadoop -- the open source big data processing framework developed under the Apache Software Foundation -- are also very much in the experimental phase.

"You will have to look hard to find anything beyond small pilots. It is a fundamental of IT that you need to 'show me the money,'" he said. "In theory, if you can link information to outcomes, a powerful case should be there, leveraging big data, being more real time, more responsive to the market."

But all too often, this is not the case, he added.

Steve Shelton, head of data at BAE Systems Detica, said that "if implemented incorrectly, big data can be an expensive mistake."

Without naming names, Shelton confirmed that his firm is working with clients that recognise the potential of big data, but hit a cost crisis when experimenting with tools that "were not doing much for them."

The skill sets around open source software like Hadoop are so specific that companies are struggling, he said.

In any case, he continued, "many of these systems are immature compared to more established enterprise models," especially with respect to documentation. This is less so with Hadoop, but more so with Storm -- the distributed real-time stream processing framework -- he said.

And then there is, inevitably, the question of security. "Some of these tools don't have security built in," Shelton explained. "Hadoop is secured to a degree, but it does assume a trusted network. At Detica we run it on a physically separate network."

Shelton said his clients are still in the very early stages of adopting Hadoop and complementary technologies.

"It is a long-term thing," he said. "It won't be over in a year's time."

Bob Tennant, CEO of Recommind, an enterprise search company serving the financial and legal services sectors, commented that while good for "data ingestion," Hadoop is "too batchy" for real-time data management and not yet ready for the level of security financial institutions require. "But data volumes are increasing, so [organisations] will have to get there," he said.

Martin Willcox, director of product and solutions marketing, Europe Middle East and Africa, at enterprise data warehousing company Teradata, acknowledged the new analytics opportunities opened up by Hadoop and other big data technologies.

"It's less that with big data you can now boil the ocean than that you can now cook certain parts of it. Hadoop can capture everything, but then business users can say what data they are interested in," he said. "That inverts the traditional relationship between IT and the business. It's not so much about business dreaming up the questions and IT developing the supporting infrastructure. Now, business professionals can go through an iterative process to surface the questions."
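The pattern Willcox describes -- capture everything raw, then let business users impose structure when a question arises -- is often called schema-on-read. A small illustrative sketch (the event format and `query` helper are hypothetical, chosen only to show the idea):

```python
import json

# Raw events captured as-is ("capture everything"), with no upfront schema.
raw_events = [
    '{"type": "pageview", "url": "/home", "ms": 120}',
    '{"type": "purchase", "sku": "A1", "value": 25.0}',
    '{"type": "pageview", "url": "/offers", "ms": 340}',
]

def query(events, wanted_type, field):
    # Structure is imposed at read time, when the business question arrives,
    # rather than when the data was loaded.
    for line in events:
        event = json.loads(line)
        if event.get("type") == wanted_type:
            yield event[field]

pageview_urls = list(query(raw_events, "pageview", "url"))
# pageview_urls == ["/home", "/offers"]
```

Because nothing is discarded at load time, a different question next week -- say, purchase values by SKU -- can be answered from the same raw store without re-ingesting the data.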

But he also sounded a cautionary note.

"The problem is that all big data technologies were designed by programmers for programmers. That is a major barrier to adoption by analysts in organizations," he said. "People who are expert statisticians, programmers, and have a deep knowledge of business are few in number."

Willcox said his company's Aster Data technology is a bridge between SQL and Hadoop.

"Big data BI is a cottage industry, with three guys in [each organisation's] basement. To industrialise it, as with traditional BI, requires such a connection," he contended.
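The bridge Willcox describes -- a SQL layer over data held in a Hadoop-style store, so analysts can query it without writing MapReduce jobs -- can be sketched with an embedded SQL engine. This is a simplified stand-in, not Aster Data's actual mechanism; the sample rows are invented:

```python
import sqlite3

# Hypothetical rows already extracted from a Hadoop-style raw store.
raw_rows = [("A1", 25.0), ("B2", 9.5), ("A1", 25.0)]

# Expose them through a SQL engine so analysts can use familiar
# declarative queries instead of hand-written map and reduce code.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sku TEXT, value REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", raw_rows)

totals = conn.execute(
    "SELECT sku, SUM(value) FROM sales GROUP BY sku ORDER BY sku"
).fetchall()
# totals == [("A1", 50.0), ("B2", 9.5)]
```

The design point is the same one the quote makes: a standard query interface turns a specialist "three guys in the basement" skill into something a mainstream BI team can use.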

Willcox confirmed that Teradata's customer base for its big data offering is still largely on the US West Coast. In EMEA, the company has "proof of concepts running with retailers, banks and high-tech manufacturing. There is a lot of interest in Europe, but it is hindered by hype."

He reported a lack of business focus in Europe. "There are people doing big data projects by installing a Hadoop cluster and capturing everything without thinking through what the value could be."

At the same Computer Weekly roundtable, Doug Cutting, who helped create the open source Hadoop framework, said: "I don't see big data suddenly crossing the chasm from Silicon Valley technology companies to mainstream companies. It feels like a path of pretty steady growth."
