The year has been a very cloudy one. That is not a comment on the UK’s record wet weather in 2012, but on cloud computing, cloud data storage and their near ubiquity in IT marketing.
But now the hype is beginning to clear, we can assess the readiness of cloud data storage in its various guises and its limitations. So, let’s look at the various types of data – primary, nearline, backup and archive – and assess the ability of cloud data storage services to handle them.
Primary cloud storage
This is still cloud data storage’s weakest area, but that is not to say the cloud cannot handle primary data at all; it just has its limits. Cloud’s key technical constraints are latency and bandwidth, both consequences of the data being held remotely. Vendors have addressed this by devising hybrid cloud products that store hot data on local drives, which can include flash storage for fast response times.
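The hybrid approach can be pictured as a small local cache sitting in front of the remote store. The sketch below is illustrative only and does not describe any vendor's product: the `HybridCache` class, its least-recently-used eviction policy and the dict-like `cloud_store` are all assumptions made for the example.

```python
from collections import OrderedDict


class HybridCache:
    """Keeps recently used objects locally; every object also lives in the cloud."""

    def __init__(self, cloud_store, capacity):
        self.cloud_store = cloud_store   # hypothetical: any dict-like remote store
        self.capacity = capacity         # maximum number of objects held locally
        self.local = OrderedDict()       # least recently used entry comes first
        self.cloud_reads = 0             # counts slow, remote fetches

    def read(self, key):
        if key in self.local:
            self.local.move_to_end(key)  # mark as most recently used
            return self.local[key]
        self.cloud_reads += 1            # cache miss: pay the WAN latency
        value = self.cloud_store[key]
        self._keep_local(key, value)
        return value

    def write(self, key, value):
        self.cloud_store[key] = value    # write through to the cloud copy
        self._keep_local(key, value)

    def _keep_local(self, key, value):
        self.local[key] = value
        self.local.move_to_end(key)
        while len(self.local) > self.capacity:
            self.local.popitem(last=False)  # evict the coldest entry
```

Repeated reads of the same object are served locally, so only the first read of cold data incurs the round trip to the provider. Real appliances are likely to use more sophisticated policies than simple least-recently-used eviction.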
iSCSI block access is available on some hybrid cloud products, so it is possible, in theory, to use cloud data storage for transactional and database uses. But realistically, cloud for primary data is probably an SMB play. There is no Fibre Channel connectivity, local drive types and capacities are limited, and many products offer only file access. So, if you’re an enterprise user, it is best to hang onto that Symmetrix for now, unless you’re looking for something for branch office use.
Vendors include Nasuni, which offers global file access via a local appliance, the Nasuni Filer, accessible via CIFS, NFS and iSCSI, aimed at remote and branch offices. It includes Active Directory integration, snapshotting, compression and deduplication.
Nirvanix Hybrid Cloud Storage is a hybrid cloud data storage device that replicates to Nirvanix's seven global nodes with a single namespace. The storage appliances, remotely managed by Nirvanix, provide policy-based replication, encryption and customisation for different applications, users or workgroups.
Panzura's Cloud Storage Controller provides access to a global file system with global file-locking and deduplication. It contains a tiered storage system that caches recently used files on HDDs and SSDs, with a total capacity of up to 36TB.
Nearline data cloud storage
Nearline data is a much better fit for cloud storage. The types of hybrid cloud data storage products described above are well-suited to nearline use as, by definition, you do not demand the fastest response times.
One key argument against using cloud for nearline storage, however, is cost. If you are likely to want to access large files you may face bandwidth constraints. If a wait of maybe a few minutes is not a problem, then fine. If it is, you may be compelled to store some data locally for fast access, in which case it is no longer really in the cloud and maybe you should store it all locally. The answer to that question will depend on the workload profile of your nearline data.
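A rough calculation shows how quickly that wait adds up. The function below is a back-of-envelope sketch: the file size, the link speed and the 80% efficiency allowance for protocol overhead are illustrative assumptions, not measured figures.

```python
def transfer_seconds(size_bytes, link_mbps, efficiency=0.8):
    """Estimate the seconds needed to move size_bytes over a link rated at link_mbps."""
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be positive and efficiency in (0, 1]")
    # Convert megabits per second to usable bits per second.
    usable_bits_per_second = link_mbps * 1_000_000 * efficiency
    return size_bytes * 8 / usable_bits_per_second


# Illustrative figures only: a 2GB file over a 100Mbps link.
seconds = transfer_seconds(2 * 10**9, 100)
print(f"{seconds / 60:.1f} minutes")
```

Running the same sum against your own file sizes and link speed gives a quick sense of whether a given nearline workload can tolerate retrieval from the cloud.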
Ctera's Cloud Attached Storage appliance offers shared storage, folder synchronisation and remote access. The device supports NAS and iSCSI, and caches using Sata drives.
TwinStrata's CloudArray provides an iSCSI storage gateway as either a virtual or a physical appliance, with access to a maximum of 50PB using Sata drives, plus deduplication, compression and encryption.
Cloud backup
As we travel away from the requirements of working data, the cloud becomes a more realistic option. Most backups are not accessed again, but the option to access recent copies is desirable if, for example, users accidentally delete data. For that reason, pure cloud backup – that is, with no local storage – is probably best suited to small businesses that can handle a wait to get data back from their cloud provider, should they need it.
Hybrid cloud backup provides some insurance, should a user require access to a file on a recent backup, as it incorporates local storage to which data can be staged before sending to the cloud.
Cloud backup is an option also via mainstream backup products – including backup appliances – that offer the cloud as a backup target or as a vaulting option, after a period of local retention.
Bandwidth will always be a restriction, however, so large or complex backup sets will be more efficient using a store-and-forward technique. This involves backing up at LAN speeds to local disk, which then synchronises with the cloud back-end, a technique that helps shorten the backup window and means requests to restore from a recent backup can be serviced locally.
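The store-and-forward pattern can be sketched in a few lines. This is a simplified illustration, not any product's implementation: the `StoreAndForwardBackup` class, its in-memory local store and the dict-like `cloud_target` are hypothetical stand-ins for local disk and a provider's back-end.

```python
from collections import deque


class StoreAndForwardBackup:
    def __init__(self, cloud_target):
        self.cloud_target = cloud_target  # hypothetical dict-like cloud back-end
        self.local_store = {}             # stands in for local backup disk
        self.pending = deque()            # backup IDs not yet uploaded

    def backup(self, backup_id, data):
        """Fast step: land the backup locally and queue it for upload."""
        self.local_store[backup_id] = data
        self.pending.append(backup_id)

    def sync(self, max_items=None):
        """Slow step: drain the queue to the cloud, oldest backup first."""
        sent = 0
        while self.pending and (max_items is None or sent < max_items):
            backup_id = self.pending.popleft()
            self.cloud_target[backup_id] = self.local_store[backup_id]
            sent += 1
        return sent

    def restore(self, backup_id):
        """Serve recent backups locally; fall back to the cloud copy."""
        if backup_id in self.local_store:
            return self.local_store[backup_id], "local"
        return self.cloud_target[backup_id], "cloud"

    def expire_local(self, backup_id):
        """Drop a local copy, but only once it has been uploaded."""
        if backup_id in self.pending:
            raise RuntimeError("backup has not been uploaded yet")
        self.local_store.pop(backup_id, None)
```

The backup window is governed by `backup`, which never touches the WAN, while `sync` can run afterwards at whatever pace the link allows. Restores only go to the cloud once the local copy has been expired.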
- Commvault Simpana allows you to back up to cloud services – public or private – that are compliant with Rest APIs. Its policy engine includes: multi-tenancy controls for use with multiple providers (including Amazon, Microsoft and Nirvanix); customisable data management and protection levels; and encryption and deduplication.
- EMC Networker 8.0 provides centralised backup across a wide range of platforms including virtual environments. The latest version includes multi-tenancy features that allow data, devices and users to be logically zoned. Features include encryption and bandwidth throttling.
- HP Data Protector 7.0 includes deduplication and snapshotting, with backup to Autonomy's 14 datacentres, adding encryption and mirroring across multiple locations. It supports multiple hypervisors and software platforms, and restores can be managed using a browser.
- SecurStore Backup & Recovery is agentless, uses SecurStore's own datacentres and offers encryption, snapshotting, compression and deduplication. It is application-aware, and can back up and restore most major databases, as well as Microsoft Exchange, Lotus Notes and Groupwise, and virtual infrastructures deployed on VMware, Citrix, Microsoft, Parallels and Virtual Iron. It also includes a client to back up laptops.
- Symantec NetBackup 7.5 uses an optional plug-in to provide backup to cloud data storage providers Amazon, Rackspace, AT&T and Nirvanix. In addition to NetBackup's standard features, the plug-in enables encryption, and bandwidth metering and throttling.
- Tivoli Storage Manager offers centralised, automated data protection by storing backup, archive, space management and bare-metal restore data to the cloud, retaining multiple copies and versions of every file. Managed using a browser, its features include incremental backup, deduplication and compression, and reports on where data is stored and how much it is costing. Optional features include continuous data protection, and application-aware backup and restore.
- Zmanda Cloud Backup backs up Windows servers, desktops and live applications such as Microsoft Exchange and SQL Server to either Amazon S3 or Google Cloud Storage, adding a management layer to both cloud providers. Features include multi-threaded uploads and downloads, incremental or differential backups, compression and bandwidth throttling.
Cloud archiving
Archiving is perhaps cloud storage’s best-fitting play right now. Archive data, by its nature, is rarely accessed and sending data to the provider doesn’t need to be a speedy business. Archiving providers can offer the full range of archiving features, such as search and data integrity guarantees. Costs can compare well with the key current archive medium, tape.
While high latency and bandwidth limitations are less critical for infrequently accessed data, data portability and the long-term survival of the service provider can be issues.
Vendors offering archive services include Archivum, which provides a local gateway appliance that encrypts all data before dispatching it to Archivum's datacentre. That data is not deleted from your premises until the provider has created three copies, one of which is locked away offline in a third-party escrow service. The company claims retrieval takes minutes.
Amazon Glacier is a tape library replacement service with retrieval times of three to five hours that costs as little as $0.01 per GB per month. Retrieval costs extra and you can only retrieve a limited amount of the total data volume in storage, although there is no charge to move data to Amazon’s EC2 service.
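To put the headline price in context, the sketch below works out the storage charge alone. The 50TB archive size is a hypothetical example, and retrieval and any other fees are left out because they vary with usage.

```python
from decimal import Decimal

PRICE_PER_GB_MONTH = Decimal("0.01")  # the "as little as" figure quoted above


def monthly_storage_cost(stored_gb, price_per_gb=PRICE_PER_GB_MONTH):
    """Storage charge only; retrieval and other fees are deliberately left out."""
    if stored_gb < 0:
        raise ValueError("stored_gb cannot be negative")
    return Decimal(stored_gb) * price_per_gb


archive_gb = 50_000  # hypothetical 50TB archive, taking 1TB as 1,000GB
monthly = monthly_storage_cost(archive_gb)
print(f"${monthly:,.2f} per month, ${monthly * 12:,.2f} per year")
```

At that rate a 50TB archive comes to $500 a month before any retrieval charges, which is the figure to set against the cost of running a tape library.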
EMC Cloud Tiering Appliance allows you to move inactive data to a lower storage tier. Specifically, it allows you to use policies to save infrequently accessed data locally, archiving older data to the company's Atmos cloud service, while allowing it to be retrieved directly by users.
Quantum Q-Cloud offers similar pricing to Amazon Glacier, but uses a deduplicating appliance to send encrypted, compressed data to the company's datacentres. The use of a local appliance allows local retrieves, and the service also provides role-based access control.
Cloud data storage limitations
So, in summary, cloud storage’s key technical limitations are latency and bandwidth. These impinge most upon data for which rapid access is required, although this can be ameliorated by placing hot data on a local disk. But if you start to spend too much on that, you risk destroying the key advantage of the cloud: that someone else is buying the infrastructure and taking the strain of capacity. There is also the fact that, for primary data, protocols are limited to NAS plus iSCSI for block access.
But if the data volumes you are dealing with are not huge or particularly hurried in access requirements, cloud offers key advantages over existing methods, especially in backup and archive use cases. For primary data, it can bring benefits too, but within limits of performance and capacity.
This means cloud storage is largely an SME or branch office play for the moment. At best, for an enterprise there may be limited use cases for some types of primary data, or fuller opportunities in backup and especially archiving.
Somewhere down the line this will change and the cloud is likely to provide storage-as-a-service to all sizes of organisation. But that will depend on a step change in available bandwidth.