Archiving information for future use is an essential business practice, whether for regulatory or commercial purposes, and a cloud archive is one potential solution that can meet those needs. However, there are a number of issues to be considered when using cloud services to archive data.
Cloud archive benefits
Archiving refers to the process of moving inactive data from primary storage systems to systems and media more suited to long-term retention. Typically, archived data is relatively inactive, which means it can benefit from placement on a cheaper tier of storage.
Removing inactive data from primary storage not only reduces storage costs but also benefits application performance and shortens the backup window. Applications usually run faster with a smaller working set of data (as is the case with traditional databases such as Oracle), and a smaller database translates directly into smaller backups and faster restores.
The rise of big data and the promise of future gains that can be achieved by storing and analysing large volumes of historical data have put pressure on IT teams to retain as much data as possible.
There is potential commercial value to be gained from exploiting the information assets held by many organisations. Historical or multiple data sets can be combined to create future value and with storage perceived as being cheaper than ever, there’s a focus on retaining as much data as possible for future use.
For some organisations, including the financial, legal and medical industries, regulatory compliance means data has to be retained for certain periods and be available for future use. This may be anything from a few years to many decades in the case of medical records.
There is also much talk about the so-called internet of things, a term used to describe the data produced by millions of devices in our homes, businesses and daily lives. Many organisations will increasingly use this data and exploit the information derived from it, either as a selling opportunity or for product support. Much, if not the majority, of this data will need to be archived for most of its lifetime.
Many organisations already operate in-house or on-premises archives of their own. In some instances these archives are ad hoc, using backup systems to retrieve historical data. This isn’t a true archiving solution, as archives should meet a set of requirements that are different from the features backup systems offer. Most importantly, backup solutions aren’t guaranteed to see all data updates (for example, an email could be received and deleted before a backup is taken) and rarely integrate fully with the application.
In-house archives are difficult to maintain for a number of reasons, which are typically operations focused. They include:
- Media refresh – Storage media needs periodic refresh and replacement. Storage arrays must be replaced and tape media has to be recycled and replaced. This process takes time, effort and money.
- Software refresh – Archive software also needs upgrading and updating over time. Due to the longevity of some archive data, a software refresh could be a substantial task, especially if a change of supplier is needed.
- Cost – Large-scale archives incur cost in terms of infrastructure and facilities. Most archives continue to grow and so the cost profile is almost always upwards.
Outsourcing the storage of archive data brings some obvious advantages. The most obvious is one of cost. When looked at in terms of total cost of ownership, cloud archiving solutions can save money and make costs more predictable, as they are based on metrics such as quantity of data stored and transferred.
However, the bigger savings are in removing complexity from the customer. The supplier takes over hardware maintenance, management, refresh, capacity growth and upgrades.
Cloud archive providers: Asking the right questions
Picking a cloud provider means asking some important questions. These include:
Data access – How exactly is data ingested into the archive and subsequently accessed? Ingestion of data raises an important feature all archives require and that is metadata. Metadata is information describing the content stored and is used for searching (in the case of e-discovery) and data management (for example moving classes of data between tiers).
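To make the two uses of metadata concrete, here is a minimal Python sketch. The record fields, function names and the one-year idle threshold are all hypothetical, invented for illustration; a real archive product will define its own schema and policies.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ArchiveRecord:
    """Hypothetical metadata stored alongside one archived object."""
    object_id: str
    owner: str
    content_type: str
    archived_on: date
    last_accessed: date
    tier: str = "standard"
    keywords: set = field(default_factory=set)


def search(records, keyword):
    """E-discovery style lookup: return records tagged with the keyword."""
    wanted = keyword.lower()
    return [r for r in records if wanted in r.keywords]


def due_for_colder_tier(records, today, idle_days=365):
    """Data management: return standard-tier records idle for idle_days or more."""
    return [
        r for r in records
        if r.tier == "standard" and (today - r.last_accessed).days >= idle_days
    ]


# Illustrative records only; none of this data comes from a real system.
records = [
    ArchiveRecord("msg-001", "alice", "email", date(2012, 3, 1),
                  date(2012, 3, 5), keywords={"contract", "acme"}),
    ArchiveRecord("doc-002", "bob", "pdf", date(2013, 9, 9),
                  date(2014, 1, 20), keywords={"invoice"}),
]

hits = search(records, "Contract")
cold = due_for_colder_tier(records, today=date(2014, 6, 1))
print([r.object_id for r in hits])
print([r.object_id for r in cold])
```

The point of the sketch is that both operations work on the metadata alone, without reading the archived content itself, which is why a provider's metadata support is worth asking about at ingestion time.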
Pricing – What is the pricing model for archive data? Most pricing is based on the volume of data stored over time, for example GB/month or TB/month. There are also usually additional charges for the transfer of data over the network. Most providers don’t charge to get data into the archive but do charge to take data out. There may be some free minimum allowances for data restore, after which the customer is charged a per GB/TB amount for any subsequent access within a given time period. Data replicas that are created for redundancy are usually charged at a multiple of the volume of data stored.
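As a rough illustration of how these charges combine, the following Python sketch estimates a monthly bill. The function name, its parameters and every rate used are hypothetical, chosen only to make the arithmetic easy to follow; real providers publish their own tariffs and may treat replicas and free allowances differently.

```python
def monthly_archive_cost(stored_gb, retrieved_gb, price_per_gb_month,
                         retrieval_price_per_gb, free_retrieval_gb=0.0,
                         copies=1):
    """Estimate one month's archive bill under a simple, assumed pricing model.

    Storage is charged per GB per month and multiplied by the number of
    copies held. Ingest is assumed free. Retrieval is charged per GB only
    above the free allowance.
    """
    storage = stored_gb * copies * price_per_gb_month
    billable_retrieval_gb = max(0.0, retrieved_gb - free_retrieval_gb)
    retrieval = billable_retrieval_gb * retrieval_price_per_gb
    return {"storage": storage, "retrieval": retrieval,
            "total": storage + retrieval}


# Made-up figures: 10 TB stored as two copies, 500 GB restored this month,
# of which the first 100 GB is free.
bill = monthly_archive_cost(
    stored_gb=10_000,
    retrieved_gb=500,
    price_per_gb_month=0.01,
    retrieval_price_per_gb=0.10,
    free_retrieval_gb=100,
    copies=2,
)
print(bill)
```

With these assumed rates the storage charge dominates, but a single large restore can reverse that, so it is worth modelling a worst-case retrieval month as well as a typical one.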
Resiliency – How many copies of data are kept and what protection methods are used? Archives need to have very high resiliency specifications, as they will retain data for many years, usually many more than primary storage. Loss of archived data can be a significant problem as it is generally the only copy of data left and there are many reported cases of regulatory bodies imposing large fines where data couldn’t be produced on request.
Geographic diversity of locations is also an important service feature for increasing system resilience and, although it will incur additional cost, it is useful for insuring against datacentre or localised hardware failures.
Security and compliance – What security standards and practices does the cloud archiving supplier adhere to? Many existing standards such as ISO 27001 apply to all cloud service providers as well as those offering archive services. However, there may be industry-specific standards such as PCI DSS that suppliers must also comply with. Customers should also question suppliers on basics such as data encryption in flight and at rest, physical security of datacentres and processes involving the secure destruction of data (eg, failed hard disk drives).
Application support – Does the cloud archiving provider natively support any applications? Email is a good example of an application and data type that is widely supported by archiving vendors. Email is a well-understood data format and so is easy to encapsulate and manage. Other archives may provide direct application support or only work with file-based data using CIFS and NFS protocols.
Migration between providers – What facilities exist for moving away from the provider? Moving to another provider can be one of the most difficult issues with cloud archiving services. Each provider stores data in its own format, making access to data limited to its advertised interfaces. Migration to another vendor can therefore be time consuming and expensive.
The time taken to achieve migration can be a big concern in the light of a number of high-profile failures in the cloud storage market in recent years. Upon the closure of Nirvanix, for example, customers were initially given just weeks to retrieve their data, which in some cases was actually impossible due to the volume of data stored with the provider.
Some cloud providers offer the ability to obtain copies of data on removable media when a customer wants to move to another provider, although it has to be said this is not a widely adopted feature. Where this is offered, the vendor should provide details of the format used to export the data. For tape, this could be the LTFS format, which is readable cross-platform.
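A back-of-envelope calculation shows why a deadline of a few weeks can be impossible to meet over the network. The Python sketch below is an assumption-laden estimate: the function name is hypothetical, the 500 TB archive and 1 Gbit/s link are invented figures rather than details of any real case, and it ignores provider-side throttling and protocol overhead beyond a single utilisation factor.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(data_tb, link_mbps, utilisation=0.8):
    """Days needed to move data_tb terabytes over a link of link_mbps.

    Uses decimal units (1 TB = 10**12 bytes, 1 Mbit/s = 10**6 bits/s).
    utilisation is the assumed fraction of the link actually achieved.
    """
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in the range (0, 1]")
    bits_to_move = data_tb * 8 * 10**12
    effective_bits_per_second = link_mbps * 10**6 * utilisation
    return bits_to_move / effective_bits_per_second / SECONDS_PER_DAY


# Illustrative only: 500 TB over a 1 Gbit/s link running at 80% efficiency.
days = transfer_days(data_tb=500, link_mbps=1000, utilisation=0.8)
print(f"{days:.1f} days")
```

Under these assumptions the transfer takes roughly 58 days, which is why export to removable media, and a documented export format, can matter more than raw bandwidth when planning an exit.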
Cloud archive supplier summary
Looking across the marketplace at the vendors offering cloud archive services, we see a range of solutions.
Amazon Web Services and HP Cloud, for example, both offer archiving solutions based on their existing object storage platforms. These require gateway hardware and/or software and so these suppliers partner with other companies such as CommVault, Riverbed and Panzura to deliver the gateway functionality while they deliver the raw storage capacity.
Cloud gateways are offered by a number of cloud archive providers as a way to archive data into the cloud, including products from Avere Systems, StorSimple (now Microsoft) and Nasuni. Although these are not traditional archive products, they do allow data to be moved out of primary systems while maintaining the ability to search metadata in the file system.
There are also application-specific archive vendors, including Sonian and Rackspace. They offer solutions tailored to application data, in this case email archiving. Obviously these solutions require some setup work to integrate with the customer’s email software. Rackspace uses its existing infrastructure-as-a-service platform as the repository for data storage.
Arkivum is an example of a dedicated cloud archive company. It offers archive-as-a-service using an onsite appliance that replicates data to two datacentres. The service also offers data escrow – the ability to extract all of the data archived in the service to tape for migration to another provider or simply as an additional backup of the data.