Tape technology used to be the main way to store all data, but over the years it has been gradually relegated to use as tape backup and tape archive. Now, as disk-based storage gets cheaper, the role of tape as an archive medium is expanding.
"Tape is becoming the last port of call," said Hamish MacArthur, storage expert at storage analyst company MacArthur Stroud. "Organisations back up to disk first, and then move to tape offload last."
Compliance is one of the main factors pushing toward tape as archive. Organisations are now forced to preserve data for years -- sometimes, as in the case of health organisations in certain jurisdictions, for a patient's entire lifetime. The dramatic growth of email data (including bulky attachments) has also created a pain point for IT departments that are forced to keep that data and make it easily retrievable in the event of an investigation. Sarbanes-Oxley (which affects UK branches of US-owned companies), for example, requires email to be preserved for up to five years.
Another factor: disk has been creeping into the backup world, thanks to falling hardware costs, cheap SATA arrays and data deduplication products. The figures bear this out. Enterprise Strategy Group's 2010 Data Protection Survey found a sea change in tape's use for backup. Just one respondent in five used tape exclusively for backup, down from one in three in 2008. Meanwhile, 62% used a combination of disk and tape for backup, compared with 53% two years earlier. And almost a fifth backed up exclusively to disk, vs 14% in 2008.
The shift to archiving
What changed? After all, tape holds many benefits, said Peri Grover, director of product marketing for tape solutions vendor Overland. "Tape is very energy efficient. Disks spin. They generate heat and use power. Tape does none of that unless you actually use it," she said. Tape's storage density is also superb. Drives based on the LTO-5 tape specification, launched in the second quarter this year, store up to 3 TB per tape (compressed), compared with LTO-4's 1.6 TB.
In addition, the LTO-5 format allows a tape to be divided into partitions. This has led IBM to create an index partition in its LTO-5-based Linear Tape File System (LTFS). The index turns LTFS-based tapes into self-describing media, which means they do not need a specific vendor's backup software to read the data on them. This is an asset to any tape archive operation, which may find itself using another vendor's hardware to read tapes years down the line. However, it is important to note that this capability is not backward-compatible with previous generations of LTO.
Native encryption introduced in LTO-4 also makes tape more portable and lessens the risks of moving it around. Moving unencrypted tape offsite has been a security risk, but encrypted data will satisfy regulators for compliance purposes.
Conversely, disk is better at supporting multi-threaded processes. As a non-linear, random-access mechanism it is easier for operating systems to write to and read from quickly.
Also, disk won't suffer performance problems from "shoeshining." Tape drives run at high speed and can exhaust the data arriving from the LAN. This causes the tape to reposition itself repeatedly while it waits for more data, which hurts performance. Although later generations of LTO drives can step down their speed, it can still be an issue.
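As a rough illustration of why speed stepping helps but does not eliminate shoeshining, the sketch below picks the fastest drive speed that the incoming data feed can sustain. The function name and the speed figures are hypothetical and chosen only for illustration; they are not taken from any LTO specification.

```python
def matched_speed(feed_mb_s, drive_speeds_mb_s):
    """Return the fastest drive speed the data feed can sustain.

    Returns None when the feed is slower than the drive's lowest speed,
    which is the condition under which the drive would shoeshine.
    """
    usable = [speed for speed in drive_speeds_mb_s if speed <= feed_mb_s]
    return max(usable) if usable else None


# Illustrative speed steps in MB/s (assumed values, not real drive figures).
ILLUSTRATIVE_SPEEDS = [40, 80, 120]
```

With these assumed figures, a feed of 70 MB/s lets the drive step down and keep streaming, while a feed of 30 MB/s falls below the lowest step and the drive has to stop and reposition.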
Repeatedly running a storage medium over a read-write head with which it has physical contact can lead to reliability issues. Ian Lock, service director for storage and backup at Glasshouse, points out that disk is generally more reliable than tape. "Tape goes wrong inevitably. There are more moving parts," he said.
Strategic use of tape
As disk's technical qualities, combined with its decreasing costs, push tape into the archiving niche, smart organisations are building it into an overall storage strategy.
Much of the data an organisation stores is never accessed again, but the organisation must still keep it. Disk satisfies the need for rapid restore access, but users still need a cost-effective, high-capacity location from which to pull infrequently accessed data.
Guy Chapman, senior engineer for storage and virtual infrastructure at London-based SunGard Financial Systems, said his company is phasing out the use of tape for disaster recovery. The company has a large number of virtual machines (more than 1,000 at one site alone). Each of them is of low criticality individually, but together they represent a significant risk, he said.
"Tape is being worked out of the system due to handling and media costs, speed, reliability, and personnel costs," said Chapman, explaining that the company opted for EMC Data Domain data deduplication storage systems instead. "These have allowed us to increase the speed, scope, reliability and flexibility of our internal business continuity offering. Increasingly, tape is being used only as a backstop for 'just in case' final archiving of aged data."
Data deduplication, as used by SunGard, is a way to reduce the cost of disk-based backup systems. Meanwhile, tape has increasingly become a way of shortening long backup cycles: aged data is offloaded to it, while data that is likely to be needed again soon is kept on deduplicated disk.
Setting tape archive policies
So, how do you decide what should reside on disk and what should be sent to tape?
The simplest way is to set policy according to data age, with data not modified for a given period of time moved off disk systems to tape.
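A minimal sketch of such an age-based rule might look like the following. It assumes the data lives in an ordinary file system and that the last-modified timestamp is a fair proxy for data age; the function name and parameters are hypothetical, and the sketch only selects candidates rather than moving anything to tape.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86_400


def files_due_for_tape(root, max_age_days, now=None):
    """List files under root not modified within max_age_days, oldest first."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    due = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            mtime = path.stat().st_mtime
            # Anything last modified before the cutoff is an archive candidate.
            if mtime < cutoff:
                due.append((mtime, path))
    due.sort(key=lambda item: item[0])
    return [path for _, path in due]
```

Calling files_due_for_tape("/data/projects", 365), for instance, would return every file under that (hypothetical) directory left unmodified for a year, ready to be handed to whatever archive job writes to tape.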
The most complex approach involves categorising your data and incorporating it into an information lifecycle management scheme in which the path of the data from primary disk to deduplicated backup disk to tape is carefully managed through the use of predefined policies.
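To make the idea of predefined policies concrete, here is a hedged sketch of how a per-category rule set could map a piece of data to a storage tier. The category names, retention periods and tier labels are all invented for illustration; a real information lifecycle management product would define its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    days_on_primary: int      # time spent on primary disk
    days_on_dedup_disk: int   # further time spent on deduplicated backup disk


# Hypothetical categories and retention periods, for illustration only.
POLICIES = {
    "email": Policy(days_on_primary=30, days_on_dedup_disk=365),
    "project_files": Policy(days_on_primary=90, days_on_dedup_disk=180),
}
DEFAULT_POLICY = Policy(days_on_primary=60, days_on_dedup_disk=120)


def tier_for(category, age_days):
    """Return the tier where data of this category and age should reside."""
    policy = POLICIES.get(category, DEFAULT_POLICY)
    if age_days < policy.days_on_primary:
        return "primary disk"
    if age_days < policy.days_on_primary + policy.days_on_dedup_disk:
        return "deduplicated backup disk"
    return "tape archive"
```

Under these assumed rules, a 100-day-old email would sit on deduplicated backup disk, and the same email at 400 days would have moved to the tape archive.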
Categorisation can help particularly in the compliance and discovery area, where companies may be forced to find specific information and link it together as part of a paper trail. Email is a good example of this: finding specific emails sent as part of a conversation can be daunting unless metadata has been set to help categorise that data during the archiving process.
Using journalled data from email as metadata is relatively straightforward these days, but other kinds of data categorisation are more difficult and still rare, say experts. A human element is needed to categorise some of a company's less structured data and define policies, and whenever humans get involved, things have a habit of getting messy and political.
"The value of information isn't down to the IT manager," said Rob Emsley, senior director of product marketing at EMC's Backup Recovery Systems Division. "You can end up with tension between business and infrastructure owners as they decide which bits of data they need to get back in four hours with no data loss."
"Some people say they want you to put policies on their data, but they don't like giving up control," added Grover.
For now, though, it seems that simple age-based baselining is the most attractive method for defining the point at which some less compliance-sensitive data is transferred from disk to tape-based archive.
Tape-based storage may be giving way to disk, but like the mainframe, tape is unlikely to die out any time soon. Its cost-effectiveness makes it hard to beat for archiving aged data, even if some of its technical disadvantages are seeing it displaced by disk storage elsewhere.