Nearly halfway through 2007, storage managers have made up their
minds on the merits of
data deduplication technology.
"I wouldn't buy a secondary storage device today that doesn't
have it," said Michael Thomas, storage architect with the Federal
Reserve Bank, at a recent Storage Decisions conference.
It's easy to see why. The latest virtual tape libraries (VTL),
which include data deduplication as a feature, claim to offer users
as much as a 50-to-1 reduction in storage footprint by
deduplicating redundant backup data. The savings in cost per
gigabyte stored can be huge.
"With deduplication turned on, the economics of today's
VTLs are comparable to tape," according to Robert Amatruda, analyst
with IDC. Curtis Preston, vice president of data protection
services at GlassHouse Technologies Inc., estimates the cost of a
midrange tape library to be roughly $4 to $11 per gigabyte, with
disk prices hovering around $3 to $11 per gigabyte without
compression or deduplication.
VTL providers estimate that with a retention period of one year
for weekly full backups and 10 days for daily incremental backups,
a single terabyte of data requires 53 terabytes (TB) of capacity
for data protection over its life. With storage capacity growing at
this rate, users are clamoring for any way to contain these
costs.
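The vendors' breakdown of the 53 TB figure isn't given, but the arithmetic can be sketched. The Python below is one plausible reading, not the vendors' own model: it assumes 52 retained weekly fulls, each the size of the primary data, plus 10 retained daily incrementals at a hypothetical 10% of the primary data apiece. The function names and the 10% change rate are illustrative assumptions, and the 50-to-1 ratio is the vendors' best-case claim.

```python
def protected_capacity_tb(primary_tb, weekly_fulls=52,
                          daily_incrementals=10, daily_change_rate=0.10):
    """Capacity needed to hold all retained backups, before deduplication.

    daily_change_rate is an assumed figure, not one from the vendors.
    """
    fulls = weekly_fulls * primary_tb
    incrementals = daily_incrementals * daily_change_rate * primary_tb
    return fulls + incrementals


def after_dedup_tb(logical_tb, ratio):
    """Physical capacity left after an N-to-1 reduction."""
    return logical_tb / ratio


logical = protected_capacity_tb(1.0)      # 52 TB of fulls + 1 TB of incrementals
physical = after_dedup_tb(logical, 50.0)  # vendor best-case 50-to-1 claim

print(f"Before deduplication: {logical:.2f} TB")
print(f"After 50-to-1 deduplication: {physical:.2f} TB")
```

Under those assumptions, the 52 weekly fulls account for nearly all of the capacity, which is why redundant full backups are the obvious target for deduplication.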
Deduplication products have stepped in to help users curb this
growth. The key suppliers include: Data Domain Inc., Diligent
Technologies, ExaGrid, FalconStor Software Inc., Network Appliance
Inc. (NetApp), NEC, Quantum Corp., Sepaton and Symantec Corp. EMC
Corp. acquired Avamar Technologies and plans to incorporate its
dedupe technology across its backup portfolio later this year.
Hitachi Data Systems (HDS) has partnered with Diligent Technologies
Corp. and IBM with NetApp.
"The merits of data deduplication are abundantly clear," said
Arun Taneja, founder and consulting analyst with Taneja Group.
However, he says the different methods of deduping data and the
resulting reduction ratios are very fuzzy. Users should test the
products thoroughly and with their own data sets, he warns, as
vendors have found skillful ways to spin the numbers, none of which
should be taken at face value.
Guna Shankar Selvaraj, IT infrastructure architect at Motorola
Inc., says his company is evaluating Data Domain, but that it is
in the "very early stages."
Similarly, the Federal Reserve Bank's Thomas says that he will
test all the data deduplication products for six to eight months
before committing to buying anything. "I want to know how many
copies of the index [the product] will hold, and what happens if it
gets corrupted … the integrity of that is very important," he
said.
Another user concerned with recovering data after deduping is
Richard Dearmon, enterprise storage architect with UIC Medical
Center. "I want it, but it's not clear to me what happens to
secondary and tertiary copies," he said. Across the board, users
are eager to evaluate the technology, but still have lots of
questions.
A few have already taken the plunge. CitiStreet, which keeps 50
TB of backup data on Sepaton's VTL, has seen a 56-to-1 reduction in
its backup set using that product's deduplication technology. The
firm has had the product in test for a couple of months and plans
to move it into production by the end of July. There were some
initial challenges with performance that CitiStreet was able to
iron out with the help of Sepaton. "Their deduplication product is
like a black box to the user -- they came in and flipped some
switches, compressed some small files," and now it's working as
advertised, according to Jeff Machols, vice president of global
infrastructure at CitiStreet. With the reduction in data,
CitiStreet is able to get more long-term retention online instead
of worrying about tape storage. "We can keep at least a year's
worth of data online now for backup and recovery," Machols said.
"We don't have to worry about rotating to other storage."
Smoking guns
There are a couple of smoking guns that could slow down the
adoption of deduplication products. Users are concerned about how
deduplication, encryption and compression can all work together in
a coordinated manner.
"Sometimes these features can be at cross purposes … it's
important to figure out the profile of your data, as not all of it
will deduplicate well," Motorola's Selvaraj said.
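Selvaraj's point about profiling data can be illustrated with a rough sketch. The Python below is not how any of the products named here work; their methods are proprietary and not described by the vendors quoted. It assumes a simple fixed-size chunking scheme with SHA-256 fingerprints, and the function name and 4 KB chunk size are hypothetical choices. Random bytes stand in for encrypted or already-compressed data, which typically offers little redundancy to remove.

```python
import hashlib
import os


def dedup_ratio(data: bytes, chunk_size: int = 4096) -> float:
    """Logical bytes divided by the bytes held in unique chunks."""
    if not data:
        return 1.0
    unique = {}  # chunk fingerprint -> chunk length
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        unique[hashlib.sha256(chunk).digest()] = len(chunk)
    return len(data) / sum(unique.values())


# Highly redundant data: the same 4 KB block repeated 100 times.
repetitive = b"A" * 4096 * 100

# Random bytes as a stand-in for encrypted or compressed data.
random_like = os.urandom(4096 * 100)

print(f"Repetitive data: {dedup_ratio(repetitive):.1f}-to-1")
print(f"Random-looking data: {dedup_ratio(random_like):.1f}-to-1")
```

Running a profile like this against a sample of real backup data, rather than a vendor's test set, gives a first indication of which data sets are worth deduplicating at all.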
Another outlying concern is power consumption as more and more
storage goes online. We talked with one user who was recently
forced to turn off several Data Domain boxes because of power
consumption issues. He requested anonymity because of the
sensitivity of the topic.
"The product was working great … and then our facilities guy
came in and said either you figure out what to turn off, or I'll
have to start pulling plugs … we're out of power," the user said.
The Data Domain gear was the last product in and first out of the
data center. "We're back to tape for energy efficiency."
It's unclear at this stage how severely storage managers will be
impacted by the recent energy crunch, but the problem appears to be
filtering through to all departments in IT. According to a recent
Gartner report, "By 2008, 50% of current data centers will have
insufficient power and cooling capacity to meet the demands of
high-density equipment." Through 2009, Gartner says energy costs
will emerge as the second highest operating cost in 70% of
worldwide data center facilities.