Tape has been the medium of choice for backup and archive in enterprise organisations for more than half a century. This isn’t surprising, as magnetic tape has been at the forefront of data storage since IBM released the first tape drive 60 years ago in 1953.
But the cloud now offers increasingly compelling alternatives, giving organisations the opportunity to use cloud archiving services and dispense with traditional in-house technology.
Are we near the tipping point where tape could be supplanted by cloud archiving in the enterprise?
Backup provides recovery from unexpected data loss (user error, data corruption, hardware failure) whereas an archive is a long-term store of inactive or low-activity data retained for future use, whether for compliance purposes or historical data mining.
A simple way to view the difference is that backups are copies and archives are moves.
Archives must provide additional features, such as security controls and locking of data for compliance; search functionality, to query the archive; and options to expire and remove data, based on policy or time controls.
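The locking and expiry controls described above can be pictured with a small sketch. The Python below is illustrative only: the `ArchiveItem` record, its field names and the `is_expired` helper are hypothetical and are not taken from any particular archive product.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ArchiveItem:
    """Hypothetical archive record; the field names are assumptions."""
    name: str
    archived_on: date
    retention_days: int       # how long policy says the item must be kept
    locked: bool = False      # compliance lock: never expire while set


def is_expired(item: ArchiveItem, today: date) -> bool:
    """Return True when the item may be removed under its retention policy."""
    if item.locked:
        return False
    return today >= item.archived_on + timedelta(days=item.retention_days)


# Made-up sample data, for illustration only.
items = [
    ArchiveItem("q1-invoices", date(2005, 1, 1), retention_days=7 * 365),
    ArchiveItem("board-minutes", date(2005, 1, 1), retention_days=7 * 365, locked=True),
    ArchiveItem("project-files", date(2012, 6, 1), retention_days=7 * 365),
]

today = date(2013, 6, 1)
removable = [item.name for item in items if is_expired(item, today)]
print(removable)
```

Only the unlocked item whose retention period has elapsed is reported as removable; the locked item is held regardless of age.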
Tape media is well suited to archive, as it provides the following benefits:
- Low cost – tape media is relatively cheap. It is certainly less expensive than disk to buy and is cheap to store: unlike disk, tapes don’t have power and cooling requirements and can be kept in an office environment. As more data is written to an archive and more media is used, the overall cost per GB drops, as the cost of tape drives, libraries and archive software is spread over a larger volume of data;
- Longevity – as long as tapes are kept in stable conditions (constant temperature and humidity), the media last up to 30 years;
- Scalability – as data volumes have increased in most organisations, tape (especially LTO) has continued to scale to meet those demands. Unlike disk, tape scales in both capacity and throughput/performance for sequential read and write I/O;
- Portability – tape media is easily portable, making it simple to ship between locations for physical security or when archives and data are being relocated, without the need to re-write data, as would be required for disk-based systems;
- Security – features such as WORM (write once, read many) and encryption mean data on tape can be securely retained, mitigating the data loss risks that can be an issue with portable media.
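The cost point in the list above, that cost per GB falls as fixed costs are spread over more data, can be shown with a little arithmetic. This is a minimal sketch: the `cost_per_gb` function and every figure in it are hypothetical, chosen only to show the shape of the curve, not to represent real prices.

```python
def cost_per_gb(fixed_cost: float, media_cost_per_gb: float, archived_gb: float) -> float:
    """Fixed costs spread across the archive, plus the per-GB media cost."""
    if archived_gb <= 0:
        raise ValueError("archived_gb must be positive")
    return fixed_cost / archived_gb + media_cost_per_gb


# Hypothetical figures, for illustration only.
FIXED_COST = 50_000.0       # tape drives, library and archive software
MEDIA_COST_PER_GB = 0.02    # tape cartridges

for size_gb in (10_000, 100_000, 1_000_000):
    per_gb = cost_per_gb(FIXED_COST, MEDIA_COST_PER_GB, size_gb)
    print(f"{size_gb:>9,} GB archived: {per_gb:.2f} per GB")
```

With these assumed numbers the cost per GB falls from 5.02 to 0.52 to 0.07 as the archive grows a hundredfold, approaching the media cost alone.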
Cloud archiving offerings provide an additional layer above the pure technology of the storage medium. They abstract the underlying storage platform by delivering archive as a service, with the implementation of that service the responsibility of a service provider.
Customers should expect cloud archiving to be competitive on cost compared to using tape and in-house processes and to provide value-add functionality, such as:
- Geographical redundancy – cloud archive services should provide for multiple copies of data in diverse locations, mitigating the risk of data loss from technology failures (it should be remembered that an archive copy may be the only retained copy of a piece of data);
- Advanced search – data is typically indexed at ingestion into cloud archive services, providing rich search capabilities. Data should be re-indexable if required;
- Media/content management – although tapes have a 30-year lifespan, the drives themselves are usually replaced on a much shorter cycle, so content must be refreshed regularly onto new media of the latest technology. Cloud services handle this process as part of service delivery, removing this sometimes burdensome task from the customer.
Both in-house tape and cloud archive have benefits and disadvantages. We’ve already touched on one of them: media refresh. This task is a major headache for large organisations with significant existing tape infrastructure, as media needs replacement on a regular basis. Other points to consider with in-house tape archiving include:
- Technology refresh cost – it is not simply the cost of media that must be considered, but also that of tape drives, tape libraries and archive software. Where technical decisions change the hardware platform, data may have to be re-indexed or hardware may have to be maintained beyond end-of-life supplier support, which can add significantly to the total cost of ownership (TCO) of an archive solution;
- Application dependency – tape might be integrated with proprietary solutions, such as direct writing of archive data from an application, that don’t scale or that create future dependencies for data access.
Points to consider for cloud archiving include:
- Compliance – providers need to meet customer needs in terms of data location (some local laws don’t permit data to be moved out of the country), security controls, encryption (both at rest and in-flight) and data immutability (for example, data deduplication may mean data has changed in compliance terms);
- Standard API access – although this can be seen as a benefit, where the provider offers a standard interface for data, API access may require additional work by the customer during implementation to integrate with existing data streams;
- Portability – consider what happens if an archive provider goes out of business or a new provider is chosen. How easy would it be to move content back into the datacentre or to another provider?
- Ongoing re-indexing – archive providers should offer a data re-indexing service to exploit the value of a retained archive. This should include the ability to do full content scans. Be aware there may be performance and/or capacity restrictions on performing this task with some providers.
Amazon Web Services – AWS is probably the best-known cloud computing provider. It has recently entered the market with Glacier, its long-term archive offering. Costs start at $0.01/GB per month, with additional charges for bandwidth and for data retrieval of more than 5% of the archive. Data can be ingested using portable media where data quantities prohibit the use of network transmission.
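As a rough illustration of the Glacier figures quoted above, the sketch below estimates a monthly storage bill and how much of a retrieval falls outside the 5% allowance. The function names are hypothetical, and treating the allowance as a simple 5% of the whole archive is an assumption; the retrieval and bandwidth rates are not given here, so they are not calculated.

```python
STORAGE_PRICE_PER_GB_MONTH = 0.01   # starting price quoted above
FREE_RETRIEVAL_FRACTION = 0.05      # the 5% threshold quoted above


def monthly_storage_cost(archived_gb: float) -> float:
    """Storage charge only; bandwidth and retrieval fees are excluded."""
    return archived_gb * STORAGE_PRICE_PER_GB_MONTH


def chargeable_retrieval_gb(archived_gb: float, retrieved_gb: float) -> float:
    """GB retrieved beyond the allowance (assumed to be 5% of the archive)."""
    allowance_gb = archived_gb * FREE_RETRIEVAL_FRACTION
    return max(0.0, retrieved_gb - allowance_gb)


archive_gb = 100_000  # a hypothetical 100TB archive
print(f"Storage: ${monthly_storage_cost(archive_gb):,.2f} per month")
print(f"Chargeable retrieval: {chargeable_retrieval_gb(archive_gb, 8_000):,.0f} GB")
```

On these assumptions a 100TB archive costs about $1,000 per month to store, and an 8,000GB retrieval exceeds the allowance by roughly 3,000GB.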
Arkivum is a UK-based company that provides a virtual or physical appliance as a gateway to its archive service. Data is archived and retained on tape using the LTFS format, which provides for data portability, should a customer choose to move its data to another supplier.
Autonomy, now part of HP, offers a number of archiving solutions, including its Consolidated Archive product. Data ingest rate can be as high as 3 million files per hour. HP also offers access to its new cloud object store through partners including TwinStrata and Panzura, with a raw data storage cost of $0.09/GB per month. This cost can be reduced with data deduplication and compression offered by the partner solutions.
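The effect of deduplication and compression on that raw price is simple to estimate. In the sketch below the `effective_price_per_gb` helper and the reduction ratios are hypothetical; actual ratios depend heavily on the data and on the partner product used.

```python
RAW_PRICE_PER_GB_MONTH = 0.09  # raw storage price quoted above


def effective_price_per_gb(raw_price: float, reduction_ratio: float) -> float:
    """Price per GB of original data, given a data reduction ratio.

    A ratio of 4.0 means four GB of source data occupy one GB of storage.
    """
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return raw_price / reduction_ratio


# Assumed ratios, for illustration only.
for ratio in (1.0, 2.0, 4.0):
    price = effective_price_per_gb(RAW_PRICE_PER_GB_MONTH, ratio)
    print(f"{ratio:.0f}:1 reduction: ${price:.4f} per GB per month")
```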
Symantec provides email archiving using its Enterprise Vault cloud offering. This is an extension of its existing Enterprise Vault technology but does not require any onsite hardware or software to implement. Symantec uses a scalable grid architecture to provide advanced searching (including within content) that can return results in seconds.
It is likely that most cloud archive providers are already using tape to deliver their services today. What these services add is a framework in which data can be stored and retrieved with all the additional features of security and compliance.
So cloud archiving will not replace tape. Instead, for many organisations the shift will be to use cloud archiving services and take advantage of the service wrapper, even though their data continues to be stored on tape media.