Backup alternatives like replication, CDP and snapshots can banish backup headaches

Backup alternatives such as replication, CDP and snapshots can banish the pain of the traditional backup window. Find out how they work in this podcast interview with Ian Lock of GlassHouse Technologies (UK).

Backup alternatives such as replication, continuous data protection (CDP), snapshots and cloud-based backup allow organisations to free themselves from the onerous daily grind of traditional backup. Often, a traditional backup regime brings major headaches, with backup windows that run into working hours and difficult data restoration processes.

In this interview, Bureau Chief Antony Adshead speaks with Ian Lock, service director for storage and backup with GlassHouse Technologies UK, about how backup alternatives can free users from the pain of traditional backup and ease file and volume restoration.

You can read the transcript below or listen to the podcast.

What alternatives are there to a traditional backup regime?

Lock: We'll focus here on the broader topics of replication, CDP, the use of snapshots and cloud-based backup.

Replication means copying your data volumes in full to a second system, usually at a separate disaster recovery (DR) location. That gives you a complete second copy of your data to fall back on if the primary copy fails. The replicated copy can also be used for other purposes, such as off-site backup, DR testing or load testing.

Replication can happen at several different levels. At the storage array level, replication is configured and managed by the storage array controllers themselves, and it generally works at the LUN level.

It can also work at the appliance level, where replication is configured in an appliance that sits between your host and storage system and virtualises your storage. The benefit is that the make and model of storage array can differ at either end, so you're not confined to using the same model at both sites.

The last place it can happen is at the host server level, which gives you greater flexibility still, allowing replication between different types of storage at each end. In that model, replication software or the operating system itself controls replication between two servers at two different sites, over IP links.
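To make host-level replication more concrete, here is a minimal Python sketch of one-way, file-level replication in the spirit of rsync. It is an illustrative assumption, not how any particular product works: it copies between two local directory paths rather than over an IP link, decides what to copy only by comparing file size and modification time (similar to rsync's default "quick check"), and never deletes files at the target. The function name `replicate` is hypothetical.

```python
import shutil
from pathlib import Path


def replicate(source: Path, target: Path) -> list[str]:
    """Hypothetical one-way replication of a directory tree.

    A file is copied when it is missing at the target or when its size or
    modification time differs, similar in spirit to rsync's default quick
    check. Files are never deleted at the target. Returns the relative
    paths that were copied on this pass.
    """
    copied = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue  # directories are created on demand below
        rel = src_file.relative_to(source)
        dst_file = target / rel
        src_stat = src_file.stat()
        if dst_file.exists():
            dst_stat = dst_file.stat()
            if (dst_stat.st_size == src_stat.st_size
                    and int(dst_stat.st_mtime) == int(src_stat.st_mtime)):
                continue  # looks unchanged, so skip it
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        # copy2 also copies the modification time, so the next pass skips it
        shutil.copy2(src_file, dst_file)
        copied.append(rel.as_posix())
    return copied


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        primary, replica = Path(a), Path(b)
        (primary / "data.txt").write_text("hello")
        print(replicate(primary, replica))  # ['data.txt']
        print(replicate(primary, replica))  # [] because nothing changed
```

Running it repeatedly only transfers what has changed since the last pass, which is what makes host-based replication cheap to schedule frequently.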

Array-based replication gives good performance because it removes processing and networking load from the servers, and it is probably the simplest to set up because you configure replication in one place, on your storage arrays.

Appliance-based replication probably gives greater flexibility because it allows for different makes and models of storage at each end.

Host-based replication probably gives you the best chance of integration with your application and is likely to be the lowest-cost option. There are free tools available, such as the rsync utility, which has been available in Unix for nearly 15 years now. Rsync lets you replicate files and directories between two servers at two different sites, controlled at the individual server level.

We've talked about replication, but what about some of the other backup alternatives?

Lock: Some of the other [backup] alternatives are CDP, snapshots and cloud-based backup.

CDP is a fundamentally different model for backup [that captures] each I/O from a host and stores it in a secondary repository. This usually means using a filter driver on the host, which captures each write to your primary storage and replicates it to a secondary area. Because each and every write is copied, recovery can be very granular, even down to the level of an individual write. That's clearly a much better solution than the traditional once-a-day backup model, where your best recovery point [is] last night's backup.

The only drawback to note is that CDP can consume large amounts of secondary storage capacity because it captures each and every write, and if performance is not to be compromised, the secondary storage system must perform as well as the primary.
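The CDP model described above can be sketched as a toy Python class. Everything here is a hypothetical illustration: the `write()` method stands in for the host filter driver, an in-memory list stands in for the secondary repository, and the volume is modelled as a list of blocks. It shows both the benefit (recovery to the state after any individual write) and the drawback (the journal grows with every write).

```python
class CDPVolume:
    """Toy model of continuous data protection (CDP) for a block volume.

    write() plays the role of a hypothetical host filter driver: every write
    is applied to the primary volume and also appended to a journal that
    stands in for the secondary repository.
    """

    def __init__(self, num_blocks: int):
        self.baseline = [b""] * num_blocks  # volume state when CDP started
        self.primary = list(self.baseline)  # live primary volume
        self.journal = []                   # secondary copy: (seq, block, data)

    def write(self, block: int, data: bytes) -> int:
        """Intercept a write: apply it to primary storage and journal it."""
        seq = len(self.journal) + 1
        self.primary[block] = data
        self.journal.append((seq, block, data))
        return seq

    def recover(self, seq: int) -> list:
        """Rebuild the volume exactly as it was right after write `seq`."""
        volume = list(self.baseline)
        for s, block, data in self.journal:
            if s > seq:
                break
            volume[block] = data
        return volume

    def journal_bytes(self) -> int:
        """Secondary capacity used so far: it grows with every write."""
        return sum(len(data) for _, _, data in self.journal)


if __name__ == "__main__":
    vol = CDPVolume(num_blocks=2)
    vol.write(0, b"good")
    bad = vol.write(0, b"corrupt")
    print(vol.recover(bad - 1))  # [b'good', b''], the state just before corruption
```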

Next are snapshots, which use the functionality of the storage system, the appliance or in some cases the host operating system itself to take regular point-in-time copies of your data that you can roll back to in the event of data corruption or deletion.

When we talk about snapshots, we generally mean capacity-free copies of data volumes, which are collections of pointers that track the relationship between newly written data and the original storage volume. So they are only truly capacity-free as long as no data changes on the original volume.

In practice, the amount of storage capacity consumed depends on the rate of data change on the original volume. In the worst case, where every single block of data on the original volume changes, the snapshot copy ends up the same size as the original. That's unlikely, though: in most normal cases data change rates are relatively low, and snapshots provide a great way of quickly capturing a picture of how your data volumes look right now that you can recover back to.
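Here is a minimal Python sketch of how a pointer-based snapshot starts out capacity-free and grows with the change rate, assuming a copy-on-write design (one common approach; some systems use redirect-on-write instead). The `Volume` class and its methods are hypothetical. For simplicity each snapshot keeps its own copy of every preserved block, whereas real systems typically share preserved blocks between snapshots.

```python
class Volume:
    """Block volume supporting copy-on-write (CoW) snapshots (toy model)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []  # each snapshot: {block_index: original_data}

    def snapshot(self) -> int:
        """Take a point-in-time snapshot; it consumes no capacity yet."""
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index, data):
        # Before overwriting, preserve the current block in every snapshot
        # that has not already saved its own copy of this block.
        for saved in self.snapshots:
            saved.setdefault(index, self.blocks[index])
        self.blocks[index] = data

    def read_snapshot(self, snap_id):
        """Reconstruct the volume as it was when the snapshot was taken.

        Unchanged blocks are read from the live volume (the "pointers");
        changed blocks come from the snapshot's preserved copies.
        """
        saved = self.snapshots[snap_id]
        return [saved.get(i, b) for i, b in enumerate(self.blocks)]

    def snapshot_blocks(self, snap_id) -> int:
        """Capacity used by a snapshot = number of blocks changed since."""
        return len(self.snapshots[snap_id])


if __name__ == "__main__":
    vol = Volume([b"a", b"b", b"c", b"d"])
    snap = vol.snapshot()
    print(vol.snapshot_blocks(snap))  # 0: capacity-free while nothing changes
    vol.write(2, b"C")
    print(vol.snapshot_blocks(snap))  # 1: grows with the change rate
    print(vol.read_snapshot(snap))    # [b'a', b'b', b'c', b'd']
```

If every block is overwritten, `snapshot_blocks` reaches the size of the volume, which is the worst case described above.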

Snapshots are usually scheduled at regular intervals, perhaps every four or six hours, and are normally kept for a number of days. This can be [a] great benefit in the event of data corruption that isn't spotted immediately, because you can choose exactly which point you want to recover back to.
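The benefit of being able to choose your recovery point can be illustrated with a short Python sketch. It assumes a six-hourly schedule with three days' retention (example values, not a recommendation from the interview), and the function names `snapshot_times` and `recovery_point` are hypothetical. Given the time the corruption actually began, it picks the latest retained snapshot taken before that moment.

```python
from datetime import datetime, timedelta
from typing import Optional


def snapshot_times(start: datetime, now: datetime, interval_hours: int,
                   retention_days: int) -> list[datetime]:
    """Snapshots taken every `interval_hours` from `start` up to `now`,
    keeping only those still inside the retention window."""
    oldest_kept = now - timedelta(days=retention_days)
    times = []
    t = start
    while t <= now:
        if t >= oldest_kept:
            times.append(t)
        t += timedelta(hours=interval_hours)
    return times


def recovery_point(snapshots: list[datetime],
                   corrupted_at: datetime) -> Optional[datetime]:
    """Latest snapshot taken before the corruption began, or None if the
    corruption predates every retained snapshot."""
    candidates = [s for s in snapshots if s < corrupted_at]
    return max(candidates) if candidates else None


if __name__ == "__main__":
    now = datetime(2024, 1, 8)
    snaps = snapshot_times(datetime(2024, 1, 1), now,
                           interval_hours=6, retention_days=3)
    # Corruption noticed today, but it actually started on 6 January at 14:30
    print(recovery_point(snaps, datetime(2024, 1, 6, 14, 30)))  # 2024-01-06 12:00:00
```

The `None` case is the practical limit of the approach: corruption older than the retention window cannot be rolled back from snapshots alone.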

The last one to mention is cloud-based backup, where data is sent from servers and workstations over the Internet to a cloud provider. A key benefit here is not having to worry about owning and managing your own backup servers, disk or tape libraries, or hundreds of backup tapes. All the effort that normally goes into managing a backup solution goes away.

The downside can sometimes be restore times. A single file can usually be recovered quickly and easily with a cloud-based solution, but recovering very large data volumes over the Internet can take a long time.
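The restore-time problem is largely simple arithmetic: data volume divided by usable bandwidth. The Python helper below is a rough sketch; the function name, the use of decimal units and the default assumption that only 80% of the nominal link speed is usable are all illustrative assumptions rather than figures from the interview.

```python
def restore_hours(data_gb: float, link_mbps: float,
                  efficiency: float = 0.8) -> float:
    """Estimate hours to pull `data_gb` gigabytes over a `link_mbps`
    megabit-per-second link when only `efficiency` of the nominal
    bandwidth is usable (an illustrative assumption)."""
    bits = data_gb * 8 * 1000**3               # decimal gigabytes to bits
    usable_bps = link_mbps * 1000**2 * efficiency
    return bits / usable_bps / 3600            # seconds to hours


if __name__ == "__main__":
    # A 10 MB file versus a 2 TB volume over the same 100 Mbit/s link
    print(f"{restore_hours(0.01, 100) * 3600:.1f} seconds")  # 1.0 seconds
    print(f"{restore_hours(2000, 100):.1f} hours")           # 55.6 hours
```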
