Public cloud computing enables users and IT departments to deploy applications without having to make capital investments in computer hardware.
And with storage forming an increasing part of on-premise budgets, public cloud storage provides a way to convert storage costs to an operational expense, rather than a capital expense.
This provides the opportunity to create a hybrid cloud storage architecture where data is stored in multiple locations, on and off-site, at the most cost-effective price.
Public cloud providers offer a range of storage options that can form part of a hybrid cloud storage architecture.
These include unstructured “raw” storage capacity in the form of object stores, file and block services. Typically, object storage platforms can be accessed locally in the cloud or remotely from the on-premise datacentre. Block and file offerings are usually restricted to access in the public cloud using public compute instances.
There is also a range of structured products from cloud suppliers that expose database interfaces to the user. These are either proprietary or compatible with traditional relational databases that use structured query language (SQL), or with non-relational (NoSQL) database platforms. We will not cover the structured products in this article, but focus on how the unstructured storage offerings can be used.
So, how should IT departments go about developing a hybrid cloud strategy?
Hybrid cloud access methods
Public cloud storage is usually accessed in one of two ways: remotely, across the internet or a private wide area network (WAN) link to the cloud provider; or locally, from compute instances running within the public cloud itself.
Accessing data remotely between on-premise applications and public cloud storage incurs increased latency, which makes direct connectivity to file and block storage impractical due to the effect on application performance. We will look at how this can be solved later.
By comparison, object storage is less affected by latency and more dependent on bandwidth, making access across a WAN practical.
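Some simple arithmetic shows why latency matters more for block access than for object access. The following Python sketch models transfer time as one round trip per request plus the time needed to push the bytes through the link. The 30 ms round trip, 1 Gbit/s link and 4 KiB block size are illustrative assumptions, and the model assumes only one request is outstanding at a time, which exaggerates the gap compared with real workloads that issue I/Os in parallel.

```python
# Hypothetical figures throughout: 30 ms round trip, 1 Gbit/s link.
import math


def transfer_seconds(total_bytes: int, request_size: int,
                     rtt_s: float, bandwidth_bps: float) -> float:
    """Estimate time to move total_bytes as sequential requests.

    Assumes one request in flight at a time, so every request pays a full
    round trip before the next one starts (no parallelism or pipelining).
    """
    requests = math.ceil(total_bytes / request_size)
    latency_cost = requests * rtt_s                   # grows with request count
    bandwidth_cost = total_bytes * 8 / bandwidth_bps  # grows with data volume
    return latency_cost + bandwidth_cost


GIB = 1024 ** 3
RTT_S = 0.030    # assumed WAN round-trip time
LINK_BPS = 1e9   # assumed link speed

# The same 1 GiB moved as 4 KiB block I/Os versus one large object request.
block_style = transfer_seconds(GIB, 4 * 1024, RTT_S, LINK_BPS)
object_style = transfer_seconds(GIB, GIB, RTT_S, LINK_BPS)
print(f"4 KiB synchronous block I/O: {block_style:,.0f} s")
print(f"single object request:       {object_style:,.1f} s")
```

Under these assumptions the block-style transfer takes more than two hours, almost all of it spent waiting on round trips, while the single object request finishes in under ten seconds and is limited mainly by bandwidth.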
Public cloud object stores have become a natural target for backup and archive data, with many suppliers building “cloud connectors” into their existing products. Object stores offer practically unlimited capacity scalability, with high durability and reasonable access times.
The management of data remains with the backup and archive system, which is responsible for expiring data when a backup is no longer needed, for example.
Data tiering can be implemented by the backup product or on the cloud platform. Amazon Web Services' (AWS) S3 object platform, for example, allows ageing data to be tiered automatically from the Standard service level to Infrequent Access, and eventually to long-term retention on Glacier. Each step reduces the cost of storing the data at the expense of longer access times.
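On S3, this kind of tiering is expressed as a lifecycle configuration. The sketch below builds one as a Python dictionary in the JSON shape that S3 lifecycle rules use. The helper name `tiering_policy`, the `backups/` prefix and the 30-, 90- and 2,555-day thresholds are hypothetical choices for illustration, not recommendations.

```python
import json
from typing import Optional


def tiering_policy(prefix: str, ia_days: int = 30, glacier_days: int = 90,
                   expire_days: Optional[int] = None) -> dict:
    """Build an S3 lifecycle configuration that tiers ageing objects.

    Objects under `prefix` move to STANDARD_IA after ia_days, to GLACIER
    after glacier_days and, optionally, are deleted after expire_days.
    The day counts are example values, not recommendations.
    """
    if ia_days < 30:
        # S3 will not transition objects younger than 30 days to STANDARD_IA.
        raise ValueError("ia_days must be at least 30")
    if glacier_days <= ia_days:
        raise ValueError("glacier_days must be later than ia_days")
    rule = {
        "ID": "tier-" + (prefix.rstrip("/") or "all"),
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_days, "StorageClass": "GLACIER"},
        ],
    }
    if expire_days is not None:
        if expire_days <= glacier_days:
            raise ValueError("expire_days must be later than glacier_days")
        rule["Expiration"] = {"Days": expire_days}
    return {"Rules": [rule]}


# Example: tier everything under backups/ and delete it after about seven years.
print(json.dumps(tiering_policy("backups/", expire_days=2555), indent=2))
```

The resulting document could be passed as the `LifecycleConfiguration` argument of boto3's `put_bucket_lifecycle_configuration` call, or saved to a file for `aws s3api put-bucket-lifecycle-configuration`. Check the current S3 documentation for storage class names and minimum-age rules before applying it.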
Object stores don’t have to be used just for backup and archive. They can hold primary data too, including any unstructured data such as files, images and media. For example, a customer-facing website could be hosted on-premise, with its media served directly from the cloud. This saves the IT organisation on storage hardware, but may incur access charges (which can be reduced by delivering the data through a content delivery network).
One point to be aware of with object stores is the way in which stored data is charged. Cloud suppliers typically charge for storage per terabyte per month, plus an additional charge for network access. Putting data into the object store is generally free of a networking charge, but accessing it from outside the cloud provider will attract a charge.
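A rough monthly estimate combines the two charges. In this Python sketch the prices ($23 per TB-month for storage, $0.09 per GB for data leaving the cloud, nothing for uploads) are placeholder assumptions rather than any supplier's published rates. Per-request fees, which providers also levy, are left out.

```python
from dataclasses import dataclass


@dataclass
class PriceSheet:
    """Hypothetical prices; substitute your provider's current rates."""
    storage_per_tb_month: float   # charge per TB stored per month
    egress_per_gb: float          # charge per GB read out of the cloud
    ingress_per_gb: float = 0.0   # uploads are generally not charged


def monthly_bill(stored_tb: float, egress_gb: float, ingress_gb: float,
                 prices: PriceSheet) -> float:
    """Storage plus network charges for one month (request fees ignored)."""
    storage = stored_tb * prices.storage_per_tb_month
    network = egress_gb * prices.egress_per_gb + ingress_gb * prices.ingress_per_gb
    return round(storage + network, 2)


prices = PriceSheet(storage_per_tb_month=23.0, egress_per_gb=0.09)
# 100 TB held in the cloud, 500 GB uploaded and 2,000 GB read back on-premise.
print(monthly_bill(stored_tb=100, egress_gb=2000, ingress_gb=500, prices=prices))
```

With these placeholder figures the bill is $2,480, of which $180 is for reading data back. A workload that reads heavily from outside the cloud can see network charges become a significant share of the total.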
In addition, suppliers charge for every TB of data written, even if that data is highly compressible or contains many duplicate copies. One option is to use a software product like StorReduce, which deduplicates data before storing it in an object store, significantly reducing the amount of data stored, especially for highly redundant data such as backups.
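To see why deduplication pays off for backup data, consider a minimal, generic sketch of fixed-size chunk deduplication in Python. It is not how StorReduce or any other specific product works (production systems commonly use variable-size, content-defined chunks so that inserted bytes do not shift every later chunk boundary). The 4 KiB chunk size and the simulated backup set are assumptions for illustration.

```python
import hashlib
import os
from typing import Dict, List, Tuple

CHUNK = 4096  # fixed chunk size, an illustrative choice


def deduplicate(data: bytes, chunk_size: int = CHUNK) -> Tuple[Dict[str, bytes], List[str]]:
    """Store each distinct fixed-size chunk once, keyed by its SHA-256 digest.

    Returns (store, recipe): the unique chunks, and the ordered list of
    digests needed to rebuild the original byte stream.
    """
    store: Dict[str, bytes] = {}
    recipe: List[str] = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)   # keep only the first copy
        recipe.append(digest)
    return store, recipe


def rebuild(store: Dict[str, bytes], recipe: List[str]) -> bytes:
    return b"".join(store[d] for d in recipe)


# Five nightly full backups of a 1 MiB dataset; one chunk changes each night.
dataset = bytearray(os.urandom(256 * CHUNK))
backups = []
for night in range(5):
    dataset[night * CHUNK] ^= 0xFF        # modify the first byte of one chunk
    backups.append(bytes(dataset))
stream = b"".join(backups)

store, recipe = deduplicate(stream)
logical = len(stream)
physical = sum(len(c) for c in store.values())
print(f"logical {logical:,} bytes, stored {physical:,} bytes, "
      f"ratio {logical / physical:.1f}:1")
```

The five full backups amount to 5 MiB of logical data, but only 260 distinct chunks (a little over 1 MiB) need to be stored, a ratio of roughly 5:1. Because the object store bills on what is actually written, that reduction feeds directly into the monthly charge.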
Cloud gateways and appliances
Making use of cloud storage means getting application data onto the platform in one form or another. A number of products, including cloud storage gateways and appliances, can consume cloud storage natively.
Nasuni offers a network-attached storage (NAS) appliance, available in hardware and virtual form, that uses public cloud storage. The appliance caches active data locally and offloads inactive data to the public cloud.
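The cache-and-offload pattern can be illustrated with a toy model. The Python class below is hypothetical and not based on Nasuni's implementation: it keeps recently used files in a bounded local cache with least-recently-used (LRU) eviction, and writes every file straight through to a dictionary standing in for the object store. Write-through is a simplifying assumption; a real appliance may buffer writes locally and upload them asynchronously.

```python
from collections import OrderedDict
from typing import Dict


class CachingGateway:
    """Toy model of a cloud NAS gateway (hypothetical, for illustration).

    Every file is persisted to a simulated object store; a bounded local
    cache keeps the most recently used files, evicting the least recently
    used ones when it runs out of space.
    """

    def __init__(self, cache_capacity_bytes: int):
        self.capacity = cache_capacity_bytes
        self.local: "OrderedDict[str, bytes]" = OrderedDict()  # oldest first
        self.cloud: Dict[str, bytes] = {}  # stands in for the object store
        self.used = 0

    def write(self, path: str, data: bytes) -> None:
        self.cloud[path] = data  # simplification: write-through to the cloud
        self._cache(path, data)

    def read(self, path: str) -> bytes:
        if path in self.local:            # cache hit: served locally
            self.local.move_to_end(path)  # mark as most recently used
            return self.local[path]
        data = self.cloud[path]           # cache miss: fetch from the cloud
        self._cache(path, data)
        return data

    def _cache(self, path: str, data: bytes) -> None:
        if path in self.local:
            self.used -= len(self.local.pop(path))
        if len(data) > self.capacity:
            return                        # too big to cache; cloud copy only
        self.local[path] = data
        self.used += len(data)
        while self.used > self.capacity:  # offload the coldest data
            _, evicted = self.local.popitem(last=False)
            self.used -= len(evicted)


gw = CachingGateway(cache_capacity_bytes=10)
gw.write("a.txt", b"12345")
gw.write("b.txt", b"12345")
gw.read("a.txt")                 # a.txt becomes most recently used
gw.write("c.txt", b"12345")      # cache full: b.txt is evicted locally
print(list(gw.local))            # ['a.txt', 'c.txt']
```

The evicted file is still held in the cloud tier, so a later read of `b.txt` succeeds; it is simply slower, because it has to be fetched back across the WAN.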
Panzura also offers a global “cloud NAS” platform that uses public cloud storage. Active data is cached locally on each appliance, with global file locking to ensure data integrity over distance.
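The idea behind global file locking can be sketched with a single lock table that every site consults before writing. This Python model is hypothetical and far simpler than any real distributed product, including Panzura's, whose mechanism the article does not describe. It ignores site failures, lock leases and WAN partitions, all of which a production design must handle. The site names and file path are made up.

```python
import threading
from typing import Dict, Optional


class GlobalLockManager:
    """Hypothetical central lock table shared by every site.

    Only one site at a time may hold the write lock on a given path, so two
    offices cannot modify the same file concurrently.
    """

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}  # path -> site holding the lock
        self._mutex = threading.Lock()     # serialise lock-table updates

    def acquire(self, path: str, site: str) -> bool:
        with self._mutex:
            owner = self._owners.get(path)
            if owner is None or owner == site:
                self._owners[path] = site
                return True
            return False                   # another site holds the lock

    def release(self, path: str, site: str) -> None:
        with self._mutex:
            if self._owners.get(path) == site:
                del self._owners[path]

    def owner(self, path: str) -> Optional[str]:
        with self._mutex:
            return self._owners.get(path)


locks = GlobalLockManager()
print(locks.acquire("/projects/plan.dwg", "london"))  # True
print(locks.acquire("/projects/plan.dwg", "sydney"))  # False: London holds it
locks.release("/projects/plan.dwg", "london")
print(locks.acquire("/projects/plan.dwg", "sydney"))  # True
```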
Microsoft acquired StorSimple in 2012, and uses that technology as a gateway to migrate data onto the Azure public cloud. Data is exposed locally as iSCSI volumes and stored on Azure Blob storage.
AWS offers its own storage gateway. The gateway software allows data to be stored on S3 through file, block or virtual tape interfaces.
Dell EMC acquired TwinStrata in 2014 and has used the technology to develop CloudArray, an appliance-based product that allows block and file data to be stored on the public cloud. CloudArray integrates with Dell EMC's VMAX, VPLEX and VxRail products.
Running applications in the cloud
All the discussion so far has focused on using public cloud as back-end storage for on-premise applications. But what if you want to run compute in the public cloud and access on-premise storage?
Moving data to or from the public cloud is not a trivial task, and can require application downtime to ensure data integrity is maintained. There are products on the market that can overcome these problems and extend on-premise data and applications into the public cloud.
Avere Systems has a range of NAS caching products that include vFXT, a virtual filer that runs on AWS, Google Cloud Platform or Microsoft Azure. vFXT can be used to cache on-premise data in the cloud, allowing cloud-based application instances to access on-premise data.
Velostrata offers a product that extends access to on-premise data through a caching appliance that runs in the public cloud. This allows virtual machines running on-site to be migrated to the public cloud, for disaster recovery or capacity planning purposes.
VMware recently introduced Cross Cloud Architecture, a new platform that allows applications running on-premise to be migrated to the public cloud. This is achieved by extending the network virtualisation layer between multiple locations. Although this isn’t directly a hybrid cloud storage product, it does provide the ability to move applications and data between physical locations.
Cloud storage appliances
One final area we should mention is the use of virtual storage appliances that run in the public cloud.
Here, array suppliers have taken their existing hardware or software offerings and packaged them for deployment with public providers like AWS. The benefit of these services is that they provide a standard interface to move data between on-premise and public clouds through the use of existing replication technologies.
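After an initial full copy, replication between an on-premise array and its cloud-hosted counterpart generally ships only what has changed. The Python sketch below illustrates that idea under stated assumptions: it finds changed blocks by comparing per-block SHA-256 digests of two snapshots, whereas storage arrays generally track changed blocks in their snapshot metadata rather than rehashing whole volumes. The block size, helper names and data are hypothetical.

```python
import hashlib
from typing import Dict, List

BLOCK = 4096  # illustrative block size


def block_digests(volume: bytes, block_size: int = BLOCK) -> List[str]:
    return [hashlib.sha256(volume[i:i + block_size]).hexdigest()
            for i in range(0, len(volume), block_size)]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK) -> Dict[int, bytes]:
    """Return {block index: new contents} for blocks that differ from old."""
    old_d = block_digests(old, block_size)
    delta: Dict[int, bytes] = {}
    for idx, digest in enumerate(block_digests(new, block_size)):
        if idx >= len(old_d) or old_d[idx] != digest:
            delta[idx] = new[idx * block_size:(idx + 1) * block_size]
    return delta


def apply_delta(replica: bytes, delta: Dict[int, bytes], new_len: int,
                block_size: int = BLOCK) -> bytes:
    """Bring a replica up to date by overwriting only the changed blocks."""
    buf = bytearray(replica[:new_len].ljust(new_len, b"\0"))
    for idx, data in delta.items():
        start = idx * block_size
        buf[start:start + len(data)] = data
    return bytes(buf)


# On-premise snapshot on day 1, then one block rewritten on day 2.
day1 = bytes(8 * BLOCK)
day2 = bytearray(day1)
day2[3 * BLOCK:3 * BLOCK + 5] = b"hello"
day2 = bytes(day2)

delta = changed_blocks(day1, day2)
cloud_copy = apply_delta(day1, delta, len(day2))  # replica held in the cloud
print(sorted(delta), cloud_copy == day2)          # [3] True
```

Only one 4 KiB block crosses the WAN instead of the whole 32 KiB volume, which is what makes keeping a cloud copy in step with on-premise data practical.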
These products could, for example, be used to seed test and development environments running in the public cloud, based on copies of data kept on-site.
NetApp, for example, currently offers Data ONTAP for AWS and Azure. The company is also developing the Data Fabric, a strategy and suite of tools to allow data to be accessed from multiple locations and platforms.
There is now a wide range of options for accessing public cloud storage. At present, most of these are supplier-specific, based around storing data in proprietary formats.
We’ve yet to see a universal data standard that would allow data to move freely between on-premise and public cloud products.
As a result, hybrid cloud storage may remain a more tactical than strategic decision for most IT organisations. No doubt products will evolve to bring a more consistent view of storage, regardless of the physical location.