Rethinking storage in the age of big data

The volume, variety and velocity of data are challenging organisations such as insurance giant IAG to reassess the way storage architectures are designed

A year ago, insurance giant IAG had a data store of just over 80TB (terabytes), but today it manages a collection of 2PB (petabytes) that is growing by 14TB a month.

Comprising written records, recordings of phone conversations with customers, data from a sprawling internet of things (IoT) network and driverless vehicle trials, IAG’s data collection is a classic example of what the industry terms “volume, variety and velocity” of data.

Recent figures from Cisco suggest that by 2021, global internet traffic will reach 3.3ZB (zettabytes) a year, up from 1.2ZB in 2016. Monthly internet traffic will reach 35GB per capita by 2021, up from 13GB per head in 2016.
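
To put those figures on a common footing, the implied compound annual growth rate can be worked out from the start and end values. The sketch below is illustrative only: the helper name `cagr` is hypothetical, and it assumes steady year-on-year growth over the five years from 2016 to 2021, which the Cisco figures themselves do not state.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by a start and end value.

    Assumes the same growth rate every year (a simplification).
    """
    return (end / start) ** (1 / years) - 1


# Figures quoted above; the five-year span is 2016 to 2021.
YEARS = 5
traffic_growth = cagr(1.2, 3.3, YEARS)    # global traffic, ZB a year
per_capita_growth = cagr(13, 35, YEARS)   # monthly traffic per capita, GB

print(f"Global traffic: {traffic_growth:.1%} a year")
print(f"Per capita:     {per_capita_growth:.1%} a year")
```

Under that assumption, both measures work out to roughly 22% growth a year.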

A good deal of that data is held on consumer devices or in cloud services, but the figures still give a clear signal of the pace of growth.

Storage rethink

Eddie Satterly, leader of data and production engineering and data operations for IAG, says the nature of data, how it is used and where it is collected are also changing, which has led to dramatic shifts in his approach to storage over the 16 months he has been with the company.

“We went from a great deal of legacy storage [area] network (SAN) to open source or a commercial version of open source storage,” says Satterly, adding that commodity and converged storage in software-defined networks have also enabled IAG to spin up data pools far more rapidly.

The speed and flexibility are essential not just for innovation, but also to cope with sudden spikes in demand. Satterly explains that a major weather event such as a storm can demand four to five times more storage capacity than a typical day. Being able to spin that up cost-effectively and instantaneously is essential.

There is also a need to strike a balance between performance, growth and cost. Satterly has deployed a shared OpenStack cluster with multiple hosting providers to wrangle that balance. In terms of storage media, the company uses different tiers, with flash storage as its top tier. Satterly also has 8TB of archival storage.

“We have a 25% overhead built into the [storage] environment that gives us the ability to burst, and we review that every quarter,” says Satterly. In addition, the revamped storage architecture means the company has “four times the performance for a fifth of the cost, and with complete flexibility”.
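
As a rough illustration of what that overhead means in practice, the sketch below estimates how long the spare capacity would last at the growth rate quoted earlier. It is a back-of-envelope calculation, not IAG's method: it assumes the 25% is measured against the 2PB currently stored, that growth is a linear 14TB a month with no bursts, and that 1PB equals 1,000TB. The function names are hypothetical.

```python
TB_PER_PB = 1000  # assumption: decimal units


def headroom_tb(used_tb: float, overhead: float) -> float:
    """Spare capacity, assuming provisioned = used * (1 + overhead)."""
    return used_tb * overhead


def months_of_headroom(used_tb: float, overhead: float,
                       growth_tb_per_month: float) -> float:
    """Months of steady linear growth the spare capacity can absorb."""
    return headroom_tb(used_tb, overhead) / growth_tb_per_month


# Figures quoted in the article; how they combine is an assumption.
used = 2 * TB_PER_PB   # 2PB stored today
overhead = 0.25        # 25% overhead
growth = 14            # TB added per month

spare = headroom_tb(used, overhead)
months = months_of_headroom(used, overhead, growth)

print(f"Spare capacity: {spare:.0f}TB")
print(f"Months of steady growth absorbed: {months:.1f}")
```

On these assumptions, steady growth alone would take about three years to use up the spare capacity, which suggests the overhead is sized for bursts rather than for routine growth.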

Sophisticated technology

Julia Palmer, Gartner research director for storage infrastructure and converged technology, says storage has become a more sophisticated issue for enterprises in recent times – not least because of the level of innovation among storage suppliers.

But this also injects a level of complexity, with different technologies being used for primary and secondary storage, and for different types of data. At the same time, CIOs are looking out for technology that can support their growth without compromising performance, she says.

When it comes to the underlying storage media, Palmer notes that although the focus on performance has led many organisations to explore solid-state drives (SSDs), they should question whether the technology is stable enough for business-critical applications, despite supplier claims.

Although all-flash and SSD revenues are expected to grow by 31% each year through 2020, the fact that legacy applications are generally not designed to take advantage of faster storage – apart from as a caching solution – has been holding back flash adoption.

“It is still early for flash rollouts,” says Palmer, noting that it remains a “huge undertaking to have the software stack designed for all-flash”. Startups, however, are increasingly developing applications to take advantage of all-flash storage, she says.

Arron Patterson, chief technology officer of Dell EMC, says containerising legacy applications will help to address the legacy issue. Coupled with hyper-converged storage architecture, which is fully software-defined, containerisation offers maximum flexibility and longevity, he says.

And when it comes to secondary storage, which Palmer says is growing 40% year-on-year, cost becomes a major factor.

Patterson acknowledges the eternal tension between capacity, performance and cost of storage. While storage suppliers have boosted the performance of the current crop of SSDs, they have also created challenges in cost and capacity that could only be overcome with compression and deduplication.

“The next generation of technology will potentially address all three at once,” he says, citing the example of carbon nanotube-based storage, which is at least five to six years away from being commercialised.

Cloud is no panacea

Cloud-based storage has been spruiked as one possibility to address storage challenges arising from big data, but Gartner’s Palmer says the major use case for cloud storage is still backup and recovery.

However, she notes the impact of rising demand for software as a service (SaaS): “Ideally, storage follows compute, so if you have workloads in the public cloud, then storage will be in the public cloud.”

For example, enterprises that are turning to SaaS-based analytics to make sense of their data would also need to have storage in the cloud. This can be done through a gateway that provides access to a range of storage solutions delivered by a managed service provider.

Dell EMC’s Patterson also sees a rising role for cloud storage in managing data collected at the edge of a corporate network through IoT devices. That data could be analysed in the cloud, with actionable insights sent to core in-house systems.

As for mission-critical workloads, Patterson warns that the latency of cloud-based storage would work against applications that demand high performance, so enterprises will still need a small pool of high-speed storage.
