Big data storage – what went wrong?

Big data storage seemed a prime opportunity for storage suppliers, but most users run DAS, shunning dedicated systems. Is that set to change?

On paper, the emergence of big data seemed like a match made in heaven for the storage industry, writes Simon Robinson. But the reality has been underwhelming, to say the least. What went wrong, and is there any reconciliation on the horizon?

When the big data phenomenon first emerged around three years ago, it seemed like manna from heaven for storage companies; a golden goose had seemingly landed in their collective lap, and many vendors wasted no time turning their marketing loudspeakers up to eleven on big data storage.

Fast-forward to today, and the reality for the storage industry is rather more sobering.

On the one hand, the big data bandwagon continues to clatter ahead at full speed. Adoption remains low and mostly experimental, but interest in all things concerning the volume, variety and velocity of enterprise data remains sky-high. Meanwhile, some big data startups continue to attract substantial funding, and the belief that these big data gambles will eventually pay off remains intact.

Big data – No big deal for storage

On the other hand, storage companies, despite being apparently ideally positioned, haven't been invited to the big data party, even though many of them have been banging hard on the door. Over the past couple of years we have seen many storage companies invest – in technology as well as in pure marketing – to align themselves with the big data movement.

On the technology side, most of this has focused on the belief that the things that many storage companies are good at – effectively and efficiently storing and protecting large data volumes – would be a shoo-in for companies kicking the Hadoop tyres.

Why rely on the rather limited data storage model within Hadoop, when you likely have millions of dollars of purpose-built storage equipment – and expertise – already in your datacentre? Accordingly, storage giants such as EMC, Symantec and IBM, as well as smaller suppliers such as Cleversafe, have worked to integrate their software technologies with Hadoop.
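To make this concrete, integrations of this kind typically plug into Hadoop's pluggable FileSystem layer, so MapReduce jobs read from the external array instead of HDFS. The sketch below uses real Hadoop configuration properties (`fs.defaultFS` and the `fs.<scheme>.impl` pattern), but the `vendorfs` scheme and class name are hypothetical placeholders for a supplier's shim, not any specific product:

```xml
<!-- core-site.xml: point Hadoop at an external store instead of HDFS.
     fs.defaultFS and fs.<scheme>.impl are standard Hadoop properties;
     the "vendorfs" scheme and class below are illustrative only. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>vendorfs://storage-array/</value>
  </property>
  <property>
    <name>fs.vendorfs.impl</name>
    <value>com.example.hadoop.VendorFileSystem</value>
  </property>
</configuration>
```

With a shim like this on the classpath, existing Hadoop jobs run unchanged while the data stays on the enterprise array – which is precisely the pitch EMC, Symantec, IBM and Cleversafe have been making.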

The net result of these efforts in terms of adoption appears to be little more than a collective “meh”; at this point the industry has simply shrugged its shoulders. Sure, you can find pockets of adoption here and there, but the storage industry has so far failed to find a way to tap into big data in a significant way. It’s telling that in our long-running and extensive conversations with the Hadoop community – both distributors and end-users – the topic of storage has never once been raised. It just isn’t viewed as an issue.

This apparent ambivalence to all things big data-related is also apparent in our own end-user research.

When we asked enterprise storage professionals at medium and large enterprises about their plans to invest in big data solutions, a quarter said they had already made those investments, while a whopping 40% said they had no plans to do so. Of the remainder, 14% said they planned to invest in big data in 2013, with 13% saying these investments would come sometime beyond 2013.

Tellingly, many enterprises told us that the way they deal with big data storage is by leveraging their existing SAN. And enterprises have told us for two years running that big data makes up just 3% of their total storage footprint.

Big data is out there – just not in the datacentre

Clearly, there is some confusion here. That’s not surprising with a term that has been used and abused to an extraordinary degree. Ask a storage pro about big data, and in their mind’s eye they may be thinking about something very different to what a data scientist may imagine.

But we think something else is happening here. At this point most real big data initiatives (most typically Hadoop-based projects) are not actually running in the core datacentre. They are being run on an ad hoc, experimental basis by individual departments, such as engineering, product development or marketing.

The core IT department may not even be aware that such projects are underway. In such instances it’s easy to see why storage isn’t on the radar; what matters here is that the storage is cheap and simple to use. Expensive, hard-to-manage external systems such as SAN and NAS are viewed as overkill, while DAS rules.

While we don’t expect a huge amount of change here in the near future, one question over the longer term is whether Hadoop projects will reach a level of scale, maturity and importance at which it becomes necessary to throw them over the wall to the IT department.

Prospects for big data storage

Are there signs that this is starting to happen? Some, but it’s still early days. When we asked storage professionals earlier this year what factors were driving data growth, 14% said ‘big data (advanced analytics)’, though we note this is just one of many data types/applications experiencing rapid growth. It was also well down the priority list behind more pressing issues such as server virtualisation and meeting the needs of new and existing business applications.

Meanwhile, some suppliers are starting to think differently about how they can add value. Seagate’s recently announced Kinetic initiative – whereby a key-value store is implemented in Ethernet-enabled hard drives – opens up the potential of radically simpler large-scale storage systems that could serve as a cost-effective back-end for a range of big data and object-based applications.
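The simplification comes from the access model: applications address each drive directly over Ethernet with put/get/delete operations on keys, cutting the block-storage controller and file server out of the data path. The sketch below illustrates that shape only – a real Kinetic drive speaks a protobuf-over-TCP protocol, and all class, method and address names here are hypothetical stand-ins, not the actual Kinetic client API:

```python
# Illustrative sketch of a Kinetic-style key-value drive interface.
# A dict stands in for the drive's on-disk store so the put/get/delete
# shape is clear; nothing here is the real Kinetic protocol.

class KineticStyleDrive:
    """Each drive is addressed directly over Ethernet and stores
    opaque values under byte-string keys: no filesystem, no LUNs."""

    def __init__(self, address):
        self.address = address          # e.g. "10.0.0.21:8123" (made up)
        self._store = {}                # stand-in for on-disk storage

    def put(self, key: bytes, value: bytes) -> None:
        self._store[key] = value

    def get(self, key: bytes):
        return self._store.get(key)     # None if the key is absent

    def delete(self, key: bytes) -> None:
        self._store.pop(key, None)

# An object store or application can then shard keys across a pool of
# drives, with no storage controller in the path:
drives = [KineticStyleDrive(f"10.0.0.{i}:8123") for i in (21, 22, 23)]

def drive_for(key: bytes) -> KineticStyleDrive:
    return drives[hash(key) % len(drives)]

drive_for(b"object-42").put(b"object-42", b"payload bytes")
```

The design point is that the drive, not an array controller, owns the mapping from keys to media, which is why proponents argue it suits large-scale object and big data back-ends.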

On a slightly different tack, EMC is positioning its ViPR product as a platform that can manage multiple applications and storage environments from a single pane of glass. While the initial focus is on traditional storage protocols such as file and block, it now also supports object storage, and will shortly be able to manage Hadoop environments as well.

Other storage companies are working in a similar vein; Scality and Inktank, for example, are bringing together file and object storage capabilities in a single platform, in part because on paper it makes sense to have large unstructured data reside in a common repository, regardless of the actual data access method.

Indeed, these players believe centralised management will eventually emerge as the real carrot for IT managers. Big data – be that Hadoop or any one of its many and growing variants – is, after all, just another data type, and it should be treated as such.

A big challenge in many IT departments is that storage is already fragmented into too many silos, and the risk with big data is that it just becomes another island of data, separate from everything else and further compounding overall management costs.

In this context we give storage vendors some credit for actually anticipating this issue and coming up with some creative technical workarounds. Unfortunately for them, the reality so far is that these workarounds are ahead of the market, and in large part they have become a solution looking for a problem.

Nonetheless, it is still very early days, and we remain optimistic that storage will eventually play a more prominent role in the overall big data story, though the exact nature and timescales involved are still to be determined.

Simon Robinson is vice-president of storage at 451 Research
