Hybrid cloud storage

Hybrid cloud storage products provide the best of both worlds -- local storage that's tightly integrated with off-site cloud storage services.

[This story was updated February 2013] Following the widespread adoption of server virtualization, cloud computing is the next evolutionary step toward utility computing, where computing resources are consumed like electricity and paid for based on usage. Cloud storage got off to a running start with Amazon's Simple Storage Service (S3), which was quickly followed by other offerings. However, security concerns and slow performance have often overshadowed the benefits of cloud storage and hampered its adoption in the enterprise. Early adopters included startups, development teams and consumer-facing data services, but cloud storage has struggled to become a viable complement to data center storage.

Conservative by nature, corporate IT has viewed public cloud storage as risky. But that's changing -- not because of a change in the perception of public cloud storage, but because of the emergence of internal cloud storage offerings as well as solutions that safely allow extending on-premise data storage with external cloud storage services. From a deluge of cloud computing-related offerings and heightened enterprise customer interest, to analyst predictions and extensive press coverage, all indications are that cloud computing has reached an inflection point and we'll soon see accelerated adoption of cloud storage in the enterprise.

Cloud storage defined

When a technology gets as hot as cloud computing is right now, vendors are tempted to simply take existing products and rebrand them as "cloud." But, generally, storage-area network (SAN) storage and network-attached storage (NAS) can't be considered cloud storage simply because they offer shared storage. "SANs really don't meet the cloud storage paradigm of dynamic, flexible and elastic storage that's allocated when and where needed; from zoning, provisioning to worldwide names, they're pretty static in nature," said Terri McClure, a senior analyst at Enterprise Strategy Group (ESG) in Milford, Mass. This is especially true for traditional, vertically scaled SAN and NAS offerings. Scale-out block-based storage systems, like Hewlett-Packard (HP) Co.'s 3PAR StoreServ array with its self-tuning and load-balancing capabilities, are able to dynamically spread loads across the SAN; scale-out NAS products are further along, but even those aren't appropriate for large public storage clouds.

For an offering to be considered cloud storage, it needs to be:

  • Network accessible
  • Shared
  • Service based and paid for by usage
  • Elastic, so it can dynamically shrink and grow as needed
  • Able to scale up and down on demand

The primary use of cloud storage today is for unstructured data, which is the fastest growing and most voluminous content, causing the most administrative pains. Cloud storage is less suitable for structured data, which continues to live on traditional enterprise storage.

The benefits of cloud storage

The benefits of using cloud storage for unstructured data are compelling, starting with lower overall storage costs. Being service based, there's no storage hardware to buy, manage and maintain, and depending on the service, it can greatly reduce, if not eliminate, data center and storage administrator costs. Cloud storage eliminates expensive technology refreshes that usually kick in three years to five years after the initial purchase, needed to either get state-of-the-art technology or simply to get around purchasing expensive support contracts for older arrays.

Cloud storage can provide close to 100% storage utilization by eliminating the massive amounts of unused storage that are needed with traditional data storage for anticipated growth and peak loads. Besides the overall cost savings, scalability of cloud storage and its ability to transparently support base and peak loads are its most appealing characteristics.

PUBLIC vs. PRIVATE vs. HYBRID CLOUD STORAGE

Public storage clouds

Public cloud storage services are offered by a fast-growing list of service providers: AT&T, Amazon, Iron Mountain Inc., Microsoft Corp., Nirvanix Inc., Rackspace Hosting Inc. and many others. Their storage infrastructure usually consists of low-cost storage nodes with directly attached commodity drives and an object-based storage stack that manages the distribution of content across nodes. Data in the cloud is typically accessed via Internet protocols, mostly Representational State Transfer (REST) and to a lesser degree Simple Object Access Protocol (SOAP). Resilience and redundancy are achieved by storing each object on at least two nodes. Usage is charged on a dollar-per-gigabyte-per-month basis and, depending on the service provider, there may be additional fees for the amount of data transferred and access charges.
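To make the pricing model concrete, here is a minimal Python sketch of a monthly bill built from the three charge types mentioned above: capacity stored, data transferred and access requests. The RateCard class, its field names and all of the prices are hypothetical and chosen only for illustration; actual rate cards differ by provider and service tier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateCard:
    """Hypothetical provider prices; real rate cards vary by provider and tier."""
    per_gb_month: float     # storage price, dollars per GB per month
    per_gb_transfer: float  # data transfer price, dollars per GB
    per_1k_requests: float  # access charge, dollars per 1,000 requests


def monthly_cost(rates: RateCard, stored_gb: float,
                 transferred_gb: float = 0.0, requests: int = 0) -> float:
    """Estimate one month's bill in dollars, rounded to cents."""
    storage = stored_gb * rates.per_gb_month
    transfer = transferred_gb * rates.per_gb_transfer
    access = (requests / 1000) * rates.per_1k_requests
    return round(storage + transfer + access, 2)


# Made-up prices and volumes, for illustration only.
example_rates = RateCard(per_gb_month=0.10, per_gb_transfer=0.12,
                         per_1k_requests=0.01)
bill = monthly_cost(example_rates, stored_gb=5000,
                    transferred_gb=200, requests=1_000_000)
print(f"Estimated monthly bill: ${bill:,.2f}")
```

With these assumed numbers, capacity dominates the bill, but transfer and access fees grow with how often the data is read, which is one reason rarely touched content is the cheapest to keep in a public cloud.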

Public storage clouds are designed for massive multi-tenancy that enables isolation of data, access and security for each client. The type of content stored on public clouds ranges from static non-core application data and archived content that needs to be available, to backup and disaster recovery data. Public cloud storage isn't suited for active content that changes all the time. The primary concern of using public cloud storage in the enterprise is security and, to some extent, performance.

Internal storage clouds

Internal cloud storage runs on dedicated infrastructure in the data center and, as a result, addresses the two main concerns of security and performance, but otherwise offers the same benefits as public cloud storage. Internal storage clouds are usually for a single tenant, even though larger enterprises may use multi-tenancy features to segregate access by departments or office locations. Their scalability requirements are more modest than those of public storage clouds, so internal cloud storage offerings are more likely to have traditional storage hardware under the hood. A case in point is HP's CloudStart, which combines HP servers, storage and orchestration software into an internal cloud storage infrastructure. HP CloudStart by itself isn't a private storage cloud offering because it lacks the key element of being service based; instead, it's the enabling infrastructure that could be used by HP, one of its partners or even enterprises to offer it as a fully managed, pay-as-you-go cloud storage offering.

An example of a private cloud storage offering is the Hitachi Data Systems Cloud Service for Private File Tiering. Based on the Hitachi Content Platform (HCP), it resides in the customer's data center but is owned and managed by Hitachi. Besides an initial setup fee, the customer pays for it by usage. Similarly, the Nirvanix Inc. Hybrid Node (hNode) provides a fully managed, pay-as-you-go, internal cloud offering within the data center, based on the same technology that powers the Nirvanix Cloud Storage Network.

The hybrid cloud storage model

While internal cloud storage addresses the concerns associated with public cloud storage, it's certainly not the Holy Grail for unstructured data. To start with, these systems aren't designed to leverage existing internal storage infrastructure. The fact that they're on-premise means they require data center real estate, electricity, rack space and cooling. Since internal cloud storage runs on dedicated hardware, it won't be able to scale to the degree public storage clouds can. And because most unstructured data is static and little used, it doesn't have to reside on-premise in the first place.

This is where hybrid cloud storage comes into play, when traditional storage systems or internal cloud storage are supplemented with public cloud storage. To make it work, however, certain key requirements must be met. First and foremost, the hybrid storage cloud must behave like homogeneous storage. Except for maybe a small delay when accessing data on the public cloud, it should otherwise be transparent. Mechanisms have to be in place that keep active and frequently accessed data on-premise and push inactive data into the cloud. Hybrid clouds usually depend on nimble policy engines to define the circumstances when data gets moved into or pulled back from the cloud.
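The kind of rule such a policy engine evaluates can be sketched in a few lines of Python. This is an illustration under simple assumptions, not any vendor's implementation: the FileRecord and TieringPolicy classes are hypothetical, and the only criterion is how long a file has gone unaccessed. Real policy engines typically weigh more attributes, such as file type, size or owner.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    path: str
    last_access: datetime
    tier: str  # "local" (on-premise) or "cloud"


@dataclass(frozen=True)
class TieringPolicy:
    """Hypothetical policy with a single inactivity threshold."""
    max_idle: timedelta


def plan_moves(files, policy, now):
    """Return (path, target_tier) pairs for files that sit on the wrong tier."""
    moves = []
    for f in files:
        idle = now - f.last_access
        if f.tier == "local" and idle > policy.max_idle:
            moves.append((f.path, "cloud"))   # inactive: push out to the cloud
        elif f.tier == "cloud" and idle <= policy.max_idle:
            moves.append((f.path, "local"))   # active again: pull back on-premise
    return moves


now = datetime(2013, 2, 1)
catalog = [
    FileRecord("/projects/plan.doc", now - timedelta(days=2), "local"),
    FileRecord("/archive/2009/audit.pdf", now - timedelta(days=400), "local"),
    FileRecord("/archive/2011/specs.pdf", now - timedelta(days=5), "cloud"),
]
policy = TieringPolicy(max_idle=timedelta(days=90))
for path, target in plan_moves(catalog, policy, now):
    print(f"move {path} -> {target}")
```

The sketch only produces a plan. Keeping the decision separate from the actual data movement makes it easy to review or throttle migrations before any bytes cross the WAN.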

Today, there are three routes to implementing a hybrid storage cloud:

  • Via cloud storage software that straddles on-premise and public cloud storage
  • Via cloud storage gateways
  • Through application integration

Software for hybrid storage clouds

Combining internal and public cloud storage into a single homogeneous storage cloud without custom integration or gateways is only possible today if the internal and external storage clouds run the same cloud storage software. While there are standardization initiatives in progress, such as the Storage Networking Industry Association (SNIA) Cloud Data Management Interface (CDMI), a lack of standards has prevented out-of-the-box integration between heterogeneous storage clouds. So what we're seeing is cloud software vendors selling their offerings to corporations and service providers to create the prerequisite for hybrid clouds. And some cloud storage providers are offering their storage stacks as internal storage clouds that provide easy integration with their public storage cloud services.

An example of the latter is Nirvanix. Nirvanix began as a public cloud service, but now allows users to run its cloud storage internally in a private cloud.

Rackspace has been offering its Cloud Files as a public cloud storage service, but it has now open-sourced Cloud Files and formed OpenStack.org to drive standardization. The intent is to enable hybrid clouds between service providers and corporate customers, as well as Rackspace's public cloud storage service.

Until recently, cloud storage service providers had to either use one of the open source cloud storage products, such as Lustre and MogileFS, with their idiosyncrasies and limitations, or develop their own offerings. In the past couple of years, however, cloud storage software has become available as a commercial product from several vendors who sell it to both enterprises and service providers.

Among the commercially available products, EMC Corp.'s Atmos is the most prominent. It's a software-based, hardware-agnostic, object-based storage stack that consists of three loosely coupled services: a presentation layer that handles interfacing to clients via REST, SOAP and traditional file-system protocols; a metadata management layer that manages where data objects are stored and how they're protected and distributed on storage nodes; and a storage target layer that interfaces with storage nodes. It can run on dedicated hardware or on VMware virtual machines. Architected as a scale-out system, it's able to scale to petabytes of storage by simply adding nodes. EMC sells Atmos to enterprises and providers, so on-premise Atmos deployments can federate with Atmos services in the cloud.

EMC's most prominent customer is AT&T. The AT&T Synaptic Storage virtual private cloud, however, is a hybrid storage cloud offering that's quite different from others. It runs in AT&T data centers, but is accessed by customers through AT&T's MPLS network. As a result, it combines security and performance of private clouds with the economics and scalability of public cloud offerings.

Besides EMC Atmos, there are several other cloud storage software products. Caringo Inc. brought CAStor Content Storage Software into this market by repositioning its content addressable storage (CAS) product as a cloud storage solution. Cleversafe Inc. offers a cloud storage platform that leverages information dispersal algorithms (IDAs) that slice data across nodes in the cloud, eliminating the need for replication; Cleversafe claims it has achieved substantially higher storage utilization than products that have to store multiple copies of data on storage nodes for redundancy.
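The utilization argument for dispersal can be checked with simple arithmetic. The Python sketch below assumes the standard model of an information dispersal scheme: data is cut into n slices, any k of which are enough to rebuild it, so k/n of the raw capacity holds usable data. Replication with r full copies yields 1/r. The 10-of-16 configuration and the 100 TB figure are hypothetical examples, not Cleversafe's published parameters.

```python
from fractions import Fraction


def replication_utilization(copies: int) -> Fraction:
    """Usable fraction of raw capacity when every object is stored `copies` times."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return Fraction(1, copies)


def dispersal_utilization(k: int, n: int) -> Fraction:
    """Usable fraction when data is cut into n slices and any k can rebuild it."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    return Fraction(k, n)


def raw_capacity_needed(usable_tb: float, utilization: Fraction) -> float:
    """Raw terabytes that must be provisioned to hold `usable_tb` of data."""
    return float(usable_tb / utilization)


usable_tb = 100  # hypothetical amount of data to protect
three_copies = replication_utilization(3)   # survives the loss of 2 copies
ten_of_sixteen = dispersal_utilization(10, 16)  # survives the loss of 6 slices

print(f"3 copies: {float(three_copies):.0%} utilization, "
      f"{raw_capacity_needed(usable_tb, three_copies):.0f} TB raw")
print(f"10-of-16: {float(ten_of_sixteen):.1%} utilization, "
      f"{raw_capacity_needed(usable_tb, ten_of_sixteen):.0f} TB raw")
```

Under these assumptions, the dispersed layout needs roughly half the raw capacity of triple replication while tolerating more simultaneous failures. The trade-off is the extra computation and network traffic needed to reassemble slices on every read.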

Hybrid cloud storage gateways

Cloud storage gateways sit between on-premise storage and public cloud storage. They translate between traditional storage protocols and the more esoteric cloud storage protocols and APIs; historically, public cloud storage could only be accessed via custom integration. Cloud gateways also migrate data from on-premise storage into public cloud storage and back, usually under the control of policy engines.

Cloud storage gateways differ in several key areas. They're either block or file based; and they present themselves within the data center as block-based storage or NAS devices. Data deduplication and compression are critical cloud gateway features, as both features significantly impact cloud storage cost. Encryption of data in-transit and while stored in the storage cloud is a must. Some gateways are designed and optimized for backup and archival, some are closely integrated with applications like Microsoft Exchange and SharePoint, and others are targeted as a transactional cloud storage tier to supplement internal storage tiers.
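Why deduplication and compression matter for cost is easy to see in a small Python sketch: only chunks the cloud hasn't seen before are uploaded, and those are compressed first. This is a simplified illustration, not how any particular gateway works. The fixed 4 KB chunk size and the function names are assumptions, and commercial products often use variable-size chunking and would encrypt each chunk before it leaves the data center.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size in bytes


def prepare_upload(data: bytes, known_hashes: set, chunk_size: int = CHUNK_SIZE):
    """Split data into chunks and return (recipe, new_chunks).

    recipe:     ordered list of chunk hashes needed to rebuild the data
    new_chunks: hash -> compressed bytes, only for chunks not already stored
    """
    recipe = []
    new_chunks = {}
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        recipe.append(digest)
        if digest not in known_hashes and digest not in new_chunks:
            new_chunks[digest] = zlib.compress(chunk)
    return recipe, new_chunks


def restore(recipe, chunk_store) -> bytes:
    """Rebuild the original data from a recipe and a hash -> chunk mapping."""
    return b"".join(zlib.decompress(chunk_store[d]) for d in recipe)


# Three identical chunks plus one different chunk: only two get uploaded.
data = b"A" * (3 * CHUNK_SIZE) + b"B" * CHUNK_SIZE
recipe, new_chunks = prepare_upload(data, known_hashes=set())
uploaded = sum(len(c) for c in new_chunks.values())
print(f"{len(recipe)} chunks referenced, {len(new_chunks)} uploaded, "
      f"{uploaded} bytes sent instead of {len(data)}")
assert restore(recipe, new_chunks) == data
```

Because public cloud fees scale with both capacity stored and data transferred, every chunk that is skipped or shrunk reduces the bill twice.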

PRODUCT SAMPLER: CLOUD STORAGE GATEWAYS

Application integration for hybrid clouds

All public cloud storage services offer APIs to interact with internal cloud storage software and cloud gateways, but these APIs can also be used to directly integrate applications with public cloud storage. Cloud storage APIs enable custom in-house and commercial applications to tap into public cloud storage via REST interfaces.
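As a rough idea of what direct integration looks like, the Python sketch below builds a REST request that stores an object with an HTTP PUT, using only the standard library. The endpoint URL, the container-and-object URL layout and the X-Auth-Token header are hypothetical placeholders; each provider defines its own URL scheme and authentication, so its API documentation is the authority. The request is only constructed here, not sent.

```python
import urllib.parse
import urllib.request


def build_put_request(endpoint: str, container: str, object_name: str,
                      payload: bytes, token: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP PUT that would store one object."""
    # Percent-encode the object name so spaces and similar characters are URL-safe.
    safe_name = urllib.parse.quote(object_name)
    url = f"{endpoint.rstrip('/')}/{container}/{safe_name}"
    return urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        headers={
            "Content-Type": "application/octet-stream",
            "X-Auth-Token": token,  # hypothetical auth header
        },
    )


request = build_put_request(
    endpoint="https://storage.example.com/v1",  # placeholder endpoint
    container="backups",
    object_name="reports/q1 backup.tar",
    payload=b"example bytes",
    token="not-a-real-token",
)
print(request.get_method(), request.full_url)
# Against a real service, urllib.request.urlopen(request) would send it.
```

Reading the object back is the mirror image: a GET against the same URL. That symmetry is what makes REST interfaces straightforward to wire into existing applications.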

For instance, backup application vendors have started to add public cloud storage support to their backup suites. Symantec Corp. offers cloud storage support for NetBackup and Backup Exec. Similarly, CommVault's Simpana backup software integrates with public storage clouds.

Enterprise-friendly storage clouds

Enterprises have stayed away from cloud storage for the most part, but the emergence of internal cloud storage and of secure options for integrating on-premise storage with public cloud storage (hybrids) has lowered the bar for enterprises to safely extend existing enterprise storage with cloud storage.

The recent hype is mostly consumer-driven: mobile adoption and public cloud services like those from Google, Dropbox and many others go hand in hand. Despite that hype, Gartner Inc. doesn't expect full-scale adoption of cloud storage by major enterprises for another five years. In the meantime, enterprises are likely to add hybrid cloud storage strategically to complement their existing storage infrastructure.

BIO: Jacob Gsoedl is a freelance writer and a corporate director for business systems. He can be reached at [email protected].