In the war of words over disk archive versus tape archive, "tape sucks" was the loud-and-clear message from EMC at its recent EMC World bash in the US. Meanwhile, the tape library vendors say it’s the ultimate archival backstop. Who is right?
EMC has the volume button turned up so high on its "tape sucks" message that it drowns out the fact that EMC bought $25 million worth of Quantum tape automation products in 2010, and that Quantum has been an EMC Select partner for at least nine years. Nevertheless, EMC people say the future for backup is disk, with Avamar and Data Domain data deduplication being front and centre of that message.
EMC has its Data Domain archive technology and is bent on selling this where it can, rather than Scalar libraries from Quantum. At one level this is surprising because EMC’s enterprise storage competitors -- HP, IBM and Oracle -- each have tape library archive products with cost-per-terabyte ratings that are better than disk.
EMC would respond that the tape-versus-disk comparison isn't clear-cut, that a value can be placed on a shorter restore time, and that if a customer values a shorter restore period highly enough this can tip the cost balance in favour of disk. No doubt that’s true, but this involves a qualitative rating rather than a pure quantitative one.
The tape library vendors, particularly the ones with huge, high-capacity libraries -- IBM, Oracle and Spectra Logic -- would point out that when recovering lots of data and streaming it, their I/O rates knock a disk library into a cocked hat.
Plus, with modern tape media validation technology, their tapes are as reliable as disk, if not more so, and they last longer. Add to that the fact that tapes spend most of their time offline and so don’t consume power in that state. Should disk library vendors respond with spun-down disks, it can be pointed out that disks which take time to spin up erode their restore speed advantage.
The bottom line is the biggest, fattest, data tub of all is a high-end tape library capable of holding many petabytes of data, exabytes even in compressed form with the latest IBM and Oracle tape formats.
This view of tape is, however, under attack from disk archive vendors cutting prices. ExaGrid, for example, has a promotion that offers to match the price of some (unspecified) tape libraries with its own deduplicating disk archive.
What the tape archive vendors need -- to put their products effectively out of reach of the disk archives in pricing terms forever -- is a deduplicated tape archive.
While backup product vendor CommVault has had the ability to dedupe data to tape since early 2009, no tape hardware vendor has so far achieved it from its end of the process. What is needed is an embedded server placed inline, in front of the tape drives, that would maintain deduplication mapping data and other metadata, and rehydrate or expand the deduplicated file fragments streaming off tape. Give this server enough flash memory, and its inline rehydration could be fast enough not to hinder tape restore streaming speed.
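To make the idea concrete, here is a minimal sketch of what such an inline server would have to do. It is not any vendor's design: the fixed 4-byte chunk size, the SHA-256 fingerprints and the names `write_deduped`, `rehydrate`, `tape`, `index` and `recipe` are all assumptions made for illustration. Unique chunks are written to the "tape" once, the server keeps the map of where each chunk sits, and a restore walks a file's recipe to expand it again.

```python
import hashlib

CHUNK_SIZE = 4  # assumption: tiny fixed-size chunks, purely for illustration


def write_deduped(data: bytes, tape: list, index: dict) -> list:
    """Write data to the tape, storing each unique chunk only once.

    tape   -- list of unique chunks, standing in for the cartridge contents
    index  -- fingerprint -> tape position, held by the inline server
    Returns the file's recipe: the tape positions needed to rebuild it.
    """
    recipe = []
    for start in range(0, len(data), CHUNK_SIZE):
        chunk = data[start:start + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in index:
            # First time this chunk is seen: append it to the tape.
            index[digest] = len(tape)
            tape.append(chunk)
        recipe.append(index[digest])
    return recipe


def rehydrate(recipe: list, tape: list) -> bytes:
    """Expand a deduplicated file by following its recipe."""
    return b"".join(tape[position] for position in recipe)


tape, index = [], {}
original = b"ABCDABCDABCDWXYZ"
recipe = write_deduped(original, tape, index)
restored = rehydrate(recipe, tape)
```

In this toy run, 16 bytes of input occupy only two chunks on the tape, and the restore is only possible because the recipe and index are to hand, which is exactly the metadata the embedded server would have to keep in fast storage.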
A mere 10-to-1 dedupe ratio would be enough to see off disk archive competition on a price-per-terabyte stored basis effectively for ever. A 20-to-1 ratio would drive rivets into the disk archive cost comparison coffin lid, let alone nails.
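The arithmetic behind that claim is simple enough to sketch. The figures below are hypothetical, not real prices: assume a deduplicating disk archive has been discounted until it merely matches raw tape at 100 currency units per terabyte stored, and see what a dedupe ratio then does to the tape side. The function name `effective_cost_per_tb` is likewise just an illustration.

```python
def effective_cost_per_tb(raw_cost_per_tb: float, dedupe_ratio: float) -> float:
    """Cost per terabyte of data stored, after deduplication.

    A ratio of 10 means 10 TB of data fits in 1 TB of raw media.
    """
    if dedupe_ratio <= 0:
        raise ValueError("dedupe_ratio must be positive")
    return raw_cost_per_tb / dedupe_ratio


# Hypothetical starting point: disk has been priced down to parity with raw tape.
PARITY_COST_PER_TB = 100.0

tape_at_10_to_1 = effective_cost_per_tb(PARITY_COST_PER_TB, 10)
tape_at_20_to_1 = effective_cost_per_tb(PARITY_COST_PER_TB, 20)

print(f"Disk archive at parity: {PARITY_COST_PER_TB:.2f} per TB stored")
print(f"Tape at 10-to-1:        {tape_at_10_to_1:.2f} per TB stored")
print(f"Tape at 20-to-1:        {tape_at_20_to_1:.2f} per TB stored")
```

Under these assumed numbers, the disk archive would need a further tenfold or twentyfold price cut just to get back to level terms.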
One issue would be that a tape from such a library could not be read in another library or drive of the same tape format, because the dedupe map would not be present. Such maps could be sent to other same-vendor libraries to partially get over the issue, but the tapes would still not be readable in other vendors' libraries, even if those employed the same base tape format. Customers might be willing to bear this in exchange for a 10x to 20x increase in library capacity, especially if they could write in their existing tape cartridge contents and deduplicate them.
Getting back to the disk-versus-tape-archive discussion, it is interesting, if not bizarre, to see how the same event can produce diametrically opposite interpretations. The tape library vendors point to the Google email outage in February in which data was eventually recovered from tape, and say, "We told you so. Tapes are the lifeboat you need when your disks are sinking."
EMC points to the exact same Google event and says it proves Google needs a modern disk-based data protection scheme, ridiculing the length of time it took Google to recover files from its tape archive. It's almost a religious argument with no possibility of agreement between the parties.
One aspect of EMC's position is that its tape library source, Quantum, doesn't have a high-end library that matches those from IBM, Oracle and Spectra Logic, the Scalar i6000 having significantly lower capacity. It's understood that Quantum has a high-end library under development. Would that change EMC's position? I don't think it would as disk is embedded deeply in EMC's DNA. For it to embrace tape now would be such a dramatic denial of its "tape sucks" message that customers would no longer take EMC marketing messages seriously.
EMC is locked in to its anti-tape message for the foreseeable future, and the archive tape library vendors potentially have a single big marketing stick to beat it with, if they get their act together. Right now, though, EMC's anti-tape megaphone blast threatens to drown their individual messages out, and more tape library customers, like Sainsbury's in the UK, could respond to it by walking away from tape.
Given the revenues at stake here, I'm surprised there is no concerted rhetoric coming from the tape vendors about the unreliability and extra cost of disk archives. They are in danger of losing the disk-archive-versus-tape-archive marketing war by default, and also through complacency, which in competitive markets tends to be a losing strategy. So, come on guys, energetically check out the idea of deduplicating tape libraries, get out there and sell, really sell, your tape archival products.
Chris Mellor is storage editor at The Register.