As a storage medium, tape is familiar to most storage managers and is cost-effective. But tape can be a pain, especially for backup, where growing data volumes strain backup windows to a breaking point and restore processes become onerous.
That's why many businesses are turning to disk as a target for backups. Disk-based backup offers more efficiency on write, slashes restore times and eliminates worries about tapes being lost or stolen, says Simon Robinson, research director, storage with The 451 Group. "Disk-based backup is an enabler of industry-wide migration from tape as a backup medium," Robinson says. "Users often want to get off tape, not necessarily for reasons of reliability but for manageability."
According to Robinson, disk's benefits are especially compelling when coupled with data deduplication, which, by reducing data volumes often many tens of times over, allows users to store weeks' or months' worth of backups on disk – all almost instantly available for restore. "Current growing volumes of data put the focus on why we do backup, and tape is good at it as it's optimised to receive streams of data very fast," Robinson says. "But recovery with tape is often an unreliable process. With disk-based backup, recovery is made far simpler and a lot quicker."
Disk-based targets for backup can be standard disk, such as dedicated volumes on a SAN or standalone array, or 'intelligent disk' products using NAS or virtual tape libraries (VTLs) as targets.
The primary reason for adding disk to the backup and recovery process is speed. For example, it has enabled Shepway District Council to back up more servers and data each night and restore them faster when needed.
"We've got 80 servers to back up nightly. They back up first to VTL and then from there to tape," says Tracey Boyle, network support officer for Shepway. Tape also adds flexibility. For most servers, only the most recent full and incremental backups are kept on the VTL, but for high-priority servers, the VTL versions are kept longer. The goal is to restore from disk whenever possible.
The ability to migrate backup volumes from disk-based virtual tapes to real tapes is part of BakBone's NetVault: Backup, Shepway's backup application. Boyle says that 300 GB of disk storage was enough to define four VTLs, each capable of running two concurrent backup jobs. NetVault then clones virtual tapes to real tapes in the background as the VTL fills up.
One of the most important things to do, says Boyle, is define your servers and the level of protection each server needs. For example, some systems and users need to retrieve files less frequently than others, so they need less VTL space. Similarly, if you already have an email archiving system, there should be little need for file restores on that front.
But there's more to using disks for backup than just speed. Another advantage of disk is that it is random access, whereas tape can only be read sequentially. That makes it feasible to reprocess the data on disk once it has been backed up, which has enabled another innovation in backup: deduplication.
Deduplication is a data reduction technique that takes a whole data set or stream, looks for repeated elements, and then stores or sends only the unique data. Some data sets contain more duplication than others, but it is not unusual for users to report compression ratios in the neighbourhood of 30:1.
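The idea can be illustrated with a minimal Python sketch. It assumes fixed-size chunks and SHA-256 fingerprints, choices made here purely for illustration; the article does not say how any of the products mentioned implement deduplication, and commercial systems typically use more sophisticated chunking. The function names and the 4 KB chunk size are hypothetical.

```python
import hashlib


def deduplicate(stream: bytes, chunk_size: int = 4096):
    """Split a byte stream into fixed-size chunks and keep each unique chunk once.

    Returns (store, recipe): store maps a chunk fingerprint to the chunk bytes,
    recipe is the ordered list of fingerprints needed to rebuild the stream.
    Fixed-size chunking is an illustrative assumption, not any vendor's method.
    """
    store = {}
    recipe = []
    for offset in range(0, len(stream), chunk_size):
        chunk = stream[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # only the first copy of a chunk is stored
        recipe.append(digest)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original stream by following the recipe."""
    return b"".join(store[digest] for digest in recipe)


def dedup_ratio(stream: bytes, store) -> float:
    """Logical size divided by the bytes actually stored."""
    stored = sum(len(chunk) for chunk in store.values())
    return len(stream) / stored if stored else 0.0


if __name__ == "__main__":
    # Thirty identical blocks plus one different block: highly redundant data.
    data = (b"A" * 4096) * 30 + b"B" * 4096
    store, recipe = deduplicate(data)
    assert restore(store, recipe) == data
    print(f"{len(recipe)} chunks, {len(store)} unique, ratio {dedup_ratio(data, store):.1f}:1")
```

In this toy example the redundancy is artificial; real ratios depend entirely on how much repeated data a backup set contains.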
For Associated Newspapers, the publisher of the Daily Mail and London Evening Standard, data deduplication was a big win. Tape backup had been a long-standing headache, and when the company started adopting server virtualisation, that pain became intolerable. Virtualisation meant more servers to back up at 10 GB each and, perhaps more importantly, more servers to restore.
Associated Newspapers was advised to use a combination of VizionCore's vRanger software to back up its VMware virtual machines, and Data Domain deduplicating disk arrays to store the resulting files.
"Our virtualisation project was tentative at the start, and at that point we didn't know if our data contained much duplication," says Steve Bruck, infrastructure architect for Associated Newspapers. "Data Domain offered us a trial, so we threw snapshots at it, at 10 GB a time."
According to Bruck, because Associated Newspapers is creating servers from templates, the servers are virtually identical. "So the Data Domain box was lapping it up," he says. "We're getting an incredible amount of compression, averaging 50:1."
The publisher also has Apple Macs to back up and tried deduplication on those, mounted on NFS. "We got less compression but it was still 10:1 or 15:1," Bruck says. "So then we thought, 'Let's try Oracle'. It was taking a huge amount on our NAS for only two days' worth of online backups, and Data Domain straightaway gave us five days."
Of course it is possible to deduplicate backups to tape by processing the data on disk first. The technology is also used for backups between data centres, to reduce the amount that must be sent over the WAN.
Legacy tape storage infrastructure crashes
Arup, the design and engineering firm behind projects such as the Beijing Olympic Stadium, Heathrow Terminal 5 and the Sydney Opera House, found that its legacy tape storage infrastructure could no longer cope after it consolidated more than 20 dispersed email systems into just two giant central repositories.
For Arup, disk-based backup is as much about simplicity as speed. According to Steven Capper, the company's associate director and Europe region IT leader, Arup had bad experiences with tape in the past, and didn't believe it could scale to cope with the data volumes involved in its new consolidated European email repositories.
However, says Capper, Arup needed deduplication as well, because its requirement for four months' retention meant 900 TB, and Arup's data centres simply didn't have the space and power for that much extra disk storage.
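To see why deduplication changes the space equation, the short Python sketch below divides the 900 TB figure by the ratios quoted elsewhere in this article (10:1, 30:1 and 50:1). These ratios are other users' reported results, not Arup's; the article does not state what ratio Arup achieved, so the output is illustrative arithmetic only. The function name is hypothetical.

```python
def deduplicated_capacity_tb(logical_tb: float, ratio: float) -> float:
    """Physical capacity needed for logical_tb of backups at a given dedup ratio."""
    if ratio <= 0:
        raise ValueError("deduplication ratio must be positive")
    return logical_tb / ratio


if __name__ == "__main__":
    # 900 TB is the retention requirement cited by Capper; the ratios are
    # illustrative values taken from other examples in the article.
    for ratio in (10, 30, 50):
        needed = deduplicated_capacity_tb(900, ratio)
        print(f"{ratio}:1 -> {needed:.0f} TB of physical disk")
```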
When it comes to power usage, tape retains an advantage over disk, since, unlike a spinning disk drive, a stored tape consumes no energy. Arup dealt with that by adopting a VTL with inbuilt deduplication from Copan Systems, whose MAID (Massive Array of Idle Disks) technology powers down inactive disks.
"Copan came to us with a unique proposition: the capacity, density and power metrics of a traditional tape silo with the performance, data integrity and ease of access of a disk array," says Capper. "This gives us a higher quality of service than we previously had. So when it came to the decision, it was no contest."
According to Capper, choosing disk-based backup over tape for its email system allowed Arup to introduce a new service without hiring any extra staff, and without seeing an 80% jump in its electricity bills.
Tape still has its place
Disk-based backup isn't always going to be the best choice. Steve Bruck says that updating its tape infrastructure with the latest version of EMC's Legato software and LTO-4 tape drives has given Associated Newspapers "incredible performance gains."
Shepway Council's Tracey Boyle cautions that you shouldn't forget the limitations inherent in emulating tape on disk. "Keep an eye on media re-use," she says. "If you don't mark your media for re-use, the database becomes huge."
Furthermore, unless you can back up to disk on a different site, you probably need to keep tape in the picture for disaster recovery, as well as for archiving.
Plus, while disk lets you back up more data and back it up faster, is that really what you should be doing? Perhaps it is time to stop backing up simply out of habit, and ask what data really needs to be protected -- and what kind of protection it really needs.
Graeme Gordon, operations director at Scottish ISP Internet For Business, says the lesson he learnt from implementing Quantum DX systems for disk-based backup and deduplication was the importance of getting assistance from a specialist early in the process to help accurately profile data and backups. This way, he says, you can avoid overcomplicating the solution.
Bruck agrees. With his backup domains partitioned so that some go to disk and others straight to tape, "tape backup isn't the pain it used to be," he says. "Now we have a breathing space to decide where we want to go next. We need to find out what really needs to be backed up and when."