
Scattered data, cloud transfers creating challenges in enterprise AI

The cost, speed and governance of moving petabytes of data across multicloud and hybrid cloud environments are becoming a challenge for enterprises looking to harness the benefits of AI

The physical limits of on-premises storage and compute capacity originally drove the enterprise shift to the cloud, where users have access to “practically infinite” resources, according to Chalan Aras, senior vice-president and general manager of acceleration at Riverbed.

Consequently, petabytes of data are now stored in the cloud, and organisations are eager to apply artificial intelligence (AI) to put this information to work.

However, this data may not be in the ideal location for AI processing. Under a multi-cloud strategy, enterprise data is inherently spread across multiple providers. Even if all the data required for an AI project resides within a single cloud, it may be stored in a region where power is expensive and therefore unlikely to be equipped with the graphics processing units (GPUs) needed for AI workloads. Either way, enterprises face the daunting task of moving massive quantities of data.

Such data movements are expensive, costing up to $80,000 per petabyte in egress fees alone, Aras warned – even when transferring within a single cloud provider. Furthermore, transfers must be strictly governed to ensure the right data reaches the right destination entirely intact. Speed is another major bottleneck – transferring just 1PB over a 10Gbps connection takes around nine days.
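The figures quoted above follow from simple arithmetic. The sketch below checks them, assuming 1PB = 10^15 bytes, a fully utilised link, and an illustrative egress list rate of about $0.08/GB (back-computed from the $80,000-per-petabyte figure; actual provider rates vary):

```python
# Back-of-the-envelope check of the transfer-time and egress-fee figures.
# Assumes 1 PB = 10^15 bytes and a fully utilised link; real-world
# throughput is usually lower due to protocol and TCP overhead.

def transfer_days(petabytes: float, link_gbps: float) -> float:
    """Days needed to move `petabytes` of data over a `link_gbps` link."""
    bits = petabytes * 1e15 * 8          # total bits to move
    seconds = bits / (link_gbps * 1e9)   # link speed in bits per second
    return seconds / 86_400              # 86,400 seconds per day

def egress_cost(petabytes: float, usd_per_gb: float = 0.08) -> float:
    """Egress fees at an illustrative ~$0.08/GB rate (an assumption)."""
    return petabytes * 1e6 * usd_per_gb  # 1 PB = 10^6 GB

print(f"{transfer_days(1, 10):.1f} days")   # ~9.3 days for 1PB at 10Gbps
print(f"${egress_cost(1):,.0f} per PB")     # ~$80,000 per PB
```

At 10Gbps the link moves roughly 108TB a day, which is why even a single petabyte ties up a connection for over a week.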

That’s just the historical data. Organisations typically keep feeding the AI model with the most recent data, perhaps once or twice a day. These incremental volumes are much smaller, but the transfers must still be fast and efficient because the model runs round the clock, and governance remains essential.

Riverbed is now taking its 25 years of data movement experience and applying it to customers’ cloud environments, Aras explained. The process involves extracting data from storage and optimising it for network transfer. “We’re serving it on a plate,” he said.

In one instance, an organisation needed to transfer 1PB of data to a new location for AI training but found its existing processes would take 12 days. This was only the first parcel of data, with a further 20PB still to move. The organisation had already booked highly sought-after GPU time and risked missing its deadline. Riverbed completed the entire task in three to four weeks, rather than the projected eight to nine months, ensuring data transfer was no longer the project’s limiting factor.
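The eight-to-nine-month projection in this example follows from linear scaling of the quoted 12-days-per-petabyte rate across the full 21PB (a rough check, not a statement of the organisation's actual planning method):

```python
# Rough scaling check on the case-study figures above.
days_per_pb = 12                 # quoted time for the first 1 PB
total_pb = 21                    # the first 1 PB plus a further 20 PB
projected_days = days_per_pb * total_pb
projected_months = projected_days / 30.44   # average days per month
print(f"{projected_months:.1f} months")     # ~8.3 months, in the quoted 8-9 range
```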

Similarly, following a merger in the financial services sector, a company needed to transfer roughly 30PB of data from one cloud to another. Riverbed completed the migration in just over a month while meeting the required governance standards.

More broadly, IT teams at established enterprises have historically made data storage and processing decisions based on the circumstances of the time. Layered on top are various decisions made at a departmental or line-of-business level. As a result, businesses today typically operate a mix of on-premises datacentres, multiple cloud environments, and numerous software-as-a-service (SaaS) applications.

While consolidating to a single cloud provider is possible, organisations must decide if one provider can truly meet all their needs without forcing compromises. Not even the largest hyperscalers have a presence in every geography, Aras pointed out. For this and other reasons, a second provider is often necessary, even if it means sacrificing the simplicity of a single contract and a single set of skills.

Distributing systems across multiple locations is rarely an issue until all that data needs to be aggregated in one place. As AI adoption spreads, this requirement is becoming increasingly common. To extract full value from their data, businesses face a very real need to move large volumes of it – not just as a one-off, but on an ongoing basis.

This is especially true for agentic AI, which may need to pull information from a myriad of sources to effectively respond to a prompt, Aras noted. This is great for users, he said, as they can get very quick answers, but it does require the frequent movement of data.

Until recently, much of Riverbed’s business revolved around supporting one-time data transfers, such as migrating systems from on-premises locations to the cloud. However, Aras noted that customers increasingly need to move substantial amounts of data continuously to feed their AI strategies. Riverbed’s approach makes its products suitable for both situations, he said.
