Of all the cloud storage services, archiving is the one most suited to widespread adoption. The cloud is an ideal location for little-used data that must be retained, and questions of latency and rapid access are not an issue as they are with primary data. But what are the issues when considering in-house vs cloud-based archiving?
In this interview, SearchStorage.co.UK bureau chief Antony Adshead speaks with Chris Evans, an independent consultant with Langton Blue, about the viability of cloud archiving, what a cloud archiving service will provide, and what it might not provide but you still need to have covered.
SearchStorage.co.UK: How viable an option is cloud archiving vs an in-house solution?
Evans: We should start by setting out a definition of what we mean by archiving. Clearly, in large enterprises today we have lots and lots of data; some of that data is in use, but quite a lot is not. It is infrequently accessed but still needs to be retained. So, that’s the sort of stuff we could classify as suitable for an archive.
It means we want to retain it for a long time potentially; sometimes people want to retain data forever, but it typically has a low access profile. You might need to access it occasionally, and we’ll discuss why in a moment, but typically it’s low access retained for a long time.
Now, people could store that on-site, and obviously there’s a cost associated with that, but we already have cloud storage available to us, and cloud storage is a great fit for these requirements. We already know that cloud can provide scalable storage and therefore as data is moved to the cloud we don’t have to think about how much infrastructure we’re keeping on-site to manage that data. And because this data doesn’t need high access levels, the latency … associated with cloud storage [is not an issue].
So, cloud can be a great fit for archiving and we’re already seeing vendors coming out with solutions for that. We’re seeing two main areas. First of all, we’re seeing file archiving and that’s quite an obvious one. You maybe have NAS data you don’t want to access frequently but you still want to store it, and that’s a great opportunity to push it into the cloud.
We’re also seeing email archiving, and again, [that] seems like a great solution too. Only a very small proportion of our emails is ever actually read again, but clearly archiving them into the cloud and still having access to them is a great solution. It takes a load off local services, it allows us not to have to scale expensive solutions like Exchange internally … and ultimately that results in cost reductions.
SearchStorage.co.UK: What are the key things you should expect of a cloud archiving provider?
Evans: Let’s just talk about some of the standard features we would expect of a cloud provider in general. Clearly we expect to have things like encryption [at rest and in flight]. If we’re retaining this data for a long time, encryption becomes more important because the data will be sitting around for years and we want to be sure it’s secure.
We want to make sure we have resilience wherever the data is stored, and we have to remember this is an archive, not a backup copy of the data. This could be our primary copy, and therefore we need to be sure the provider is giving us high resilience. That might mean replicating copies geographically to achieve it.
Clearly, we still want those cloud features of multi-tenancy and secure access. However, what we are talking about here is archive, and archive data tends to have a different access profile. Imagine, for example, that I want to be able to do something like e-discovery; for that I need good metadata. I need to be able to ensure that as I index that data into the cloud, the metadata is available to me and I can search it nice and easily. I don’t want to trawl through the actual data, so I need to be able to search on the metadata alone.
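As an illustration of the metadata point above, here is a minimal Python sketch of an index that answers e-discovery style queries from metadata alone, without reading the archived content. The names `ArchiveRecord`, `MetadataIndex`, `custodian` and `keywords` are hypothetical and chosen for this example; they are not taken from any particular provider’s service.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class ArchiveRecord:
    """Metadata captured when an object is written to the archive (hypothetical schema)."""
    object_id: str
    custodian: str            # owner of the item, e.g. a mailbox user
    created: date
    keywords: FrozenSet[str]


class MetadataIndex:
    """Keyword index over metadata only; the archived payload is never touched."""

    def __init__(self) -> None:
        self._records: Dict[str, ArchiveRecord] = {}
        self._by_keyword: Dict[str, Set[str]] = {}

    def add(self, record: ArchiveRecord) -> None:
        self._records[record.object_id] = record
        for keyword in record.keywords:
            # Keywords are stored lower-cased so that searches are case-insensitive.
            self._by_keyword.setdefault(keyword.lower(), set()).add(record.object_id)

    def search(
        self,
        keyword: str,
        custodian: Optional[str] = None,
        created_after: Optional[date] = None,
    ) -> List[str]:
        """Return sorted object IDs matching the keyword and any optional filters."""
        hits = []
        for object_id in sorted(self._by_keyword.get(keyword.lower(), set())):
            record = self._records[object_id]
            if custodian is not None and record.custodian != custodian:
                continue
            if created_after is not None and record.created <= created_after:
                continue
            hits.append(object_id)
        return hits
```

A search such as `index.search("contract", custodian="alice")` returns only object IDs, so the archived items themselves are fetched from the cloud only once the relevant ones have been identified.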
As this is an archive we want to make sure we’ve got retention policies. That applies in two areas. First, for the data itself, so we can decide how long we want to retain it and when we delete it. Second, we want to make sure we have protection from overwriting that data, because we may have compliance rules that say we have to retain that data for a certain length of time before we do anything with it.
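The two retention requirements above can be sketched in a few lines of Python. This is an in-memory illustration under stated assumptions, not how any given provider implements it: `RetentionPolicy`, `ArchiveStore`, `retain_days` and `delete_after_days` are hypothetical names invented for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple


class RetentionError(Exception):
    """Raised when an operation would violate the retention policy."""


@dataclass(frozen=True)
class RetentionPolicy:
    retain_days: int                         # minimum period the object must be kept
    delete_after_days: Optional[int] = None  # None means keep indefinitely


class ArchiveStore:
    """Toy write-once store: objects cannot be overwritten or deleted early."""

    def __init__(self) -> None:
        self._objects: Dict[str, Tuple[bytes, date, RetentionPolicy]] = {}

    def put(self, object_id: str, data: bytes, stored_on: date,
            policy: RetentionPolicy) -> None:
        # Overwrite protection: an existing object ID can never be rewritten.
        if object_id in self._objects:
            raise RetentionError(f"{object_id} already exists and cannot be overwritten")
        self._objects[object_id] = (data, stored_on, policy)

    def locked_until(self, object_id: str) -> date:
        _, stored_on, policy = self._objects[object_id]
        return stored_on + timedelta(days=policy.retain_days)

    def delete(self, object_id: str, today: date) -> None:
        # Deletion is refused until the minimum retention period has passed.
        if today < self.locked_until(object_id):
            raise RetentionError(f"{object_id} is under retention")
        del self._objects[object_id]

    def due_for_deletion(self, today: date) -> List[str]:
        """Object IDs whose scheduled deletion date has been reached."""
        due = []
        for object_id, (_, stored_on, policy) in sorted(self._objects.items()):
            if policy.delete_after_days is None:
                continue
            if stored_on + timedelta(days=policy.delete_after_days) <= today:
                due.append(object_id)
        return due
```

Keeping the minimum retention period separate from the scheduled deletion date mirrors the distinction in the text: one rule says how long data must be protected, the other says when it may finally be removed.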
Finally, we need to think about access. If we’re going to use something like email as an example, we want to make sure our end users still have access to archived emails so we need integration with the platforms they’re using. And that might mean integration with their desktop and possibly mobile devices so they have access to that archive data even if it’s not sitting on their primary email system.
There are two more considerations. First of all there is portability. If we want to retain data for a long time with any provider, whether it is file, object or just general NAS data, we still need portability. You have to start thinking [about] what happens if that vendor goes out of business: how would I get my data back out of that archive and move it elsewhere? That’s something the vendor might not directly provide but that you need to question the vendor about.
Secondly, there’s the issue of data refresh. If you write data to an archive like this, you still have to think about how you refresh it in future. You might want to change the format because it’s gone out of date. You might have new systems that need upgrades and therefore change the format of data, and you need to consider that; how would you refresh that data in the archive without … changing it?
Those are things that might not be presented by a cloud archiving provider, but they’re worth having a discussion about.