Cloud archiving services are an area of cloud computing that enables organisations to store data long term in service provider data centres and access it over the Internet.
In this article we discuss what cloud archiving means, what cloud archive services exist and what to look out for in a cloud archive provider.
Data archiving is the movement of data to a repository for long-term storage. The data is still required by the organisation, but it doesn’t form part of the active “working set” and so can be moved to a cheaper medium to reduce storage costs. Many organisations rely on backup as a crude form of archiving, but this is not a true archiving solution. Archives require the following characteristics:
- Search. Data stored in an archive must be easily searchable, which means storing additional metadata about the archive content. Searching the content itself is time-consuming and impractical compared with using a full-content index. In situations where e-discovery (searching of electronic records, often for legal reasons) is a regular activity, for instance with email, efficient search is essential.
- Retention management. For compliance and other regulatory reasons, data in archives needs to be retained for long periods based on retention policies. This may mean ensuring data is not deleted before the retention period ends but also may require that data is actively deleted once a certain time period has elapsed.
- Auditing. For compliance reasons, organisations may be required to be able to demonstrate audit trails for archived data. This can also form part of e-discovery requirements.
- Integration. Data moved to an archive needs to be accessible. Archive solutions should provide the ability for applications or end users to access data. Using the example of email, end users should still have the ability to access email from both their desktop and any mobile devices they use.
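The retention management requirement above can be sketched as a small decision function. This is only an illustration under stated assumptions: the `RetentionPolicy` class, its `min_days` and `max_days` fields and the three action names are hypothetical and not taken from any particular archive product.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RetentionPolicy:
    # Hypothetical policy: an item must be kept for at least min_days.
    # If max_days is set, the item must be deleted once it reaches that age.
    min_days: int
    max_days: Optional[int] = None


def retention_action(archived_on: date, today: date, policy: RetentionPolicy) -> str:
    """Return 'hold', 'may_delete' or 'must_delete' for one archived item."""
    age_in_days = (today - archived_on).days
    if policy.max_days is not None and age_in_days >= policy.max_days:
        return "must_delete"   # retention ceiling reached: active deletion required
    if age_in_days < policy.min_days:
        return "hold"          # still inside the mandatory retention period
    return "may_delete"        # minimum period served, no ceiling reached yet


# Example: keep email for roughly 7 years, delete it after roughly 10.
email_policy = RetentionPolicy(min_days=7 * 365, max_days=10 * 365)
print(retention_action(date(2010, 1, 1), date(2018, 1, 1), email_policy))
```

A real service would also need to record each decision in the audit trail, which this sketch leaves out.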
Cloud archive services are a great fit for retention of archived data for a number of reasons:
- Cost. Cloud archive services are based on a per-use model, where charges are directly related to consumption. As data is moved to an archive, the costs can be tightly controlled and predicted.
- Scale. Cloud-based archiving services provide effectively “unlimited” capacity, removing the need to plan and design for archive growth. The service provider is responsible for ensuring sufficient capacity is available.
- Latency. Cloud storage has a higher latency than locally connected storage as data is stored and retrieved across the Internet. This means response times can’t be guaranteed and are subject to fluctuation. As cloud archive data is typically accessed infrequently, latency is not a major issue.
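To make the per-use cost model concrete, the following is a minimal sketch of how charges might be estimated as an archive grows. All prices and volumes are invented for illustration, and the separate retrieval charge is an assumption; real providers publish their own tariffs, which may be structured differently.

```python
from typing import List


def monthly_archive_cost(stored_gb: float, retrieved_gb: float,
                         storage_price_per_gb: float,
                         retrieval_price_per_gb: float) -> float:
    """Cost for one month: a storage charge plus an (assumed) retrieval charge."""
    return stored_gb * storage_price_per_gb + retrieved_gb * retrieval_price_per_gb


def project_storage_costs(start_gb: float, monthly_growth_gb: float,
                          months: int, storage_price_per_gb: float) -> List[float]:
    """Storage-only charge for each month, assuming linear archive growth."""
    return [(start_gb + month * monthly_growth_gb) * storage_price_per_gb
            for month in range(months)]


# Hypothetical figures: 1,000 GB archived, 10 GB retrieved in the month.
print(monthly_archive_cost(1000, 10, storage_price_per_gb=0.01,
                           retrieval_price_per_gb=0.05))
# Projected storage charge over three months with 100 GB of growth per month.
print(project_storage_costs(1000, 100, months=3, storage_price_per_gb=0.01))
```

Because the charge is a simple function of consumption, the projection changes only when the growth rate or the tariff changes, which is what makes the costs predictable.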
Of course, there are some drawbacks to using a cloud archive service, revolving around long-term data access. Firstly, the cloud archive provider might store data in its own proprietary format. This may make it difficult or impossible to move to another provider without a lot of work that may negate the benefits of using the service in the first place.
Lack of portability can present issues if a provider goes out of business or changes ownership. Customers need to understand the contractual ramifications of such events, as became apparent when Iron Mountain’s cloud backup and archive service was acquired by Autonomy in 2011. At the time, it wasn’t clear whether Autonomy would continue with the same terms of service offered by Iron Mountain. In other instances, customers have had issues recovering data from the administrator after companies have gone into liquidation.
There’s also the issue of data format currency. Over time, data formats change as new product versions arrive or completely new technology generations emerge. Ensuring archive data continues to be accessible as formats change may turn out to be expensive if the only solution is to retrieve and restore all of the content.
Cloud archive services on offer
Now let's talk about the cloud archive offerings in the market today and how they are implemented. Some services enable data to be archived to the cloud service without any additional hardware or software. These typically rely on Web-based email servers and Web-based uploads over HTTPS/SSL. In other instances, either physical or virtual appliances are deployed. The specific resource requirements for these depend on the volume and rate of data being uploaded.
Arkivum enables customers to archive their file data by providing a virtual or physical appliance on-site. Data is securely transmitted to and replicated between three of Arkivum’s data centres. Files are stored on tape using an open source format and the Linear Tape File System (LTFS), enabling data portability beyond the archive. In addition, tape media is regularly refreshed and data integrity validated. Arkivum also highlights the "green" credentials of its service, claiming tape is 238 times more power-efficient than disk. However, there is a trade-off in response time, as retrieval from tape is slower than from disk.
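Data integrity validation of the kind mentioned above is commonly done by recording a checksum for each file at ingest and recomputing it later. The sketch below shows that general technique using SHA-256; it is not a description of how Arkivum implements its checks, and the `verify_manifest` function and manifest layout are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest: Dict[str, str], root: Path) -> List[str]:
    """Return the relative paths that are missing or whose checksum has changed.

    manifest maps a relative file path to the checksum recorded at ingest.
    """
    failed = []
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file() or sha256_of(target) != expected:
            failed.append(relative_path)
    return failed
```

Storing the manifest separately from the archived copies means a corrupted file cannot silently take its recorded checksum with it.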
Sonian offers file and email cloud archive services. Its file archiving solution provides customers with a Web-based portal that enables searching of archive content based on metadata created at ingest time. The portal allows search results to be viewed and downloaded to the user’s computer. The file and email archiving solutions require no additional customer hardware or software on-site (file uploads are achieved using Web-based HTTPS/SSL). For large volumes of data ingest, Sonian can accept data shipped on physical media.
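Searching on metadata created at ingest time can be illustrated with a small inverted index. This is a toy sketch of the general idea, not Sonian's implementation; the `MetadataIndex` class and the field names used in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class MetadataIndex:
    """Toy inverted index over metadata fields captured at ingest time."""

    def __init__(self) -> None:
        # Maps (field, lower-cased value) to the ids of matching items.
        self._index: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def ingest(self, item_id: str, metadata: Dict[str, str]) -> None:
        """Record an item's metadata; the content itself is never scanned."""
        for field, value in metadata.items():
            self._index[(field, value.lower())].add(item_id)

    def search(self, **criteria: str) -> List[str]:
        """Return sorted ids of items matching every field=value criterion."""
        if not criteria:
            return []
        matches = [self._index.get((field, value.lower()), set())
                   for field, value in criteria.items()]
        return sorted(set.intersection(*matches))


index = MetadataIndex()
index.ingest("doc-1", {"owner": "alice", "type": "pdf"})
index.ingest("doc-2", {"owner": "alice", "type": "docx"})
print(index.search(owner="Alice", type="pdf"))
```

Because lookups touch only the index, search time does not depend on the size of the archived files, which is the reason metadata is captured up front rather than derived at query time.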
EMC’s Cloud Tiering Appliance enables file data to be archived to EMC’s Atmos storage platform, typically through a third-party cloud storage vendor. The solution is supplied as either dedicated hardware or a virtual machine that can be installed as a VMware virtual appliance. EMC’s solution has the benefit of allowing data to be archived locally and remotely in a hybrid solution. In the UK, Redstor offers the Cloud Tiering Appliance using cloud storage hosted from Redstor data centres.
Autonomy, a UK company now owned by Hewlett-Packard, offers a number of archiving solutions, including technology acquired from Iron Mountain in 2011. Autonomy’s Consolidated Archive provides the ability to archive email content either on-site, as a hybrid solution or in the cloud. Autonomy’s solutions are highly scalable, supporting an ingest rate of 3 million files per hour. The company’s solutions are used widely across financial and legal organisations.
InTechnology offers cloud archiving solutions that cover both file and email content, focusing on the Microsoft Exchange and Windows file server solutions. Data is stored across two geographically diverse locations, with additional indexing information created for high-speed search. The InTechnology solution is also capable of detecting and archiving local PST files.
Chris Evans is an independent consultant with Langton Blue, a London-based consultancy.