NEW YORK, NY --
Data deduplication is a hot topic at this year's Storage
Decisions conference, with users saying they're gung-ho about
deploying the technology. However, those with large storage
environments say they've had trouble finding a product that fits
their requirements.
Brian Greenberg, director of data protection services for a large
financial company based in Chicago, called data deduplication the
"Holy Grail" of disk-based backup Wednesday during a presentation
on the topic.
Still, Greenberg's company, which he declined to name, is
sticking to tape for backup for now while waiting for deduplication
to become more useful for disaster recovery.
A cost-analysis model Greenberg built using systems-analysis
software called iThink from Isee Systems showed that with a
three-year retention scheme, the cost of media for about 68,000
tapes over the next five years would amount to $3.4 million. The
cost of disk capacity for the same amount of data, not including
power and cooling, comes out to $103 million -- and twice that
amount for replication. However, he said, data deduplication at a
ratio of 30:1 brought the disk costs down to about $3.2 million.
"Data deduplication is the key to being able to do disk-based
backup in our environment," he said.
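Greenberg's cost figures can be roughly sanity-checked. The sketch
below is not his iThink model, whose inputs the article does not
describe. It assumes disk cost scales linearly with stored
capacity, and the function name deduped_disk_cost is made up for
illustration.

```python
# Figures quoted in the article, in US dollars.
TAPE_MEDIA_COST = 3_400_000   # about 68,000 tapes over five years
TAPE_COUNT = 68_000
RAW_DISK_COST = 103_000_000   # disk capacity, excluding power and cooling
DEDUP_RATIO = 30              # 30:1 data reduction


def deduped_disk_cost(raw_cost: float, ratio: float) -> float:
    """Naive estimate (assumption): cost falls in proportion to the ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return raw_cost / ratio


cost_per_tape = TAPE_MEDIA_COST / TAPE_COUNT
replicated_raw_cost = 2 * RAW_DISK_COST  # "twice that amount for replication"
naive_dedup_cost = deduped_disk_cost(RAW_DISK_COST, DEDUP_RATIO)

print(f"Implied media cost per tape:   ${cost_per_tape:,.2f}")
print(f"Raw disk with replication:     ${replicated_raw_cost:,.0f}")
print(f"Disk at 30:1 (naive division): ${naive_dedup_cost:,.0f}")
```

Straight division gives about $3.43 million, in the same range as
the roughly $3.2 million Greenberg reported. The gap suggests his
model includes factors this sketch does not capture.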
So why isn't he using it? Greenberg said he will not deploy a
data deduplication appliance until he finds one that can copy its
deduped data store and its index to tape for disaster recovery
purposes. He could copy data from most data deduplication systems
to tape by "rehydrating" the data and backing up the same data
separately, but Greenberg said he wants to save space on tape, too.
"Being able to back up the catalog is a standard feature of a tape
backup environment," he said. "Many of the vendors have asked me
why I'd want to do tape backup when I can replicate between
systems, but what if there's a rolling disaster that corrupts
both?"
Pete Fischer, storage administrator for a large paper and
packaging manufacturing company based in the South, said his
company is desperate to find a product that can reduce the 400 TB
of data it must protect every 24 hours. The company uses IBM's
Tivoli Storage Manager (TSM) to send data from EMC Clariion CX500,
600 and 700 systems with a total of 27 TB usable capacity to
Clariion Disk Library (CDL) virtual tape library (VTL) systems.
"We have barely enough room to keep our incremental backup data
in the disk pool," Fischer said. Any overflow gets sent directly to
the CDLs, which are also trying to back up data from the disk pool,
causing bottlenecks. Fischer also said he's running out of capacity
in his tape libraries, estimating that a fully populated Sun
StorageTek SL8500 has about 30 percent of the drives he needs.
Fischer's company has brought in a Data Domain box for testing.
He's also evaluating Diligent Technologies, but favors Data Domain
because Diligent is strictly a VTL. "We're leery of VTL and tape in
general at this point," Fischer said. His firm is putting Data
Domain DD560 systems through rigorous performance testing, but
Fischer said he's not satisfied with the product's scalability. The
DD560s hold just over 1 TB of disk apiece, so he will need to
deploy at least eight boxes and silo his data according to
application. "What I want is to have the boxes be aware of each
other, and to be able to get even more data reduction across
applications," he said.
Mark Glazerman, storage and backup admin for a plastics
manufacturing company, is happily running Data Domain DD560 and
DD430 boxes to back up 25 TB. Glazerman said his most recent
monitoring reports from his Data Domain systems show an average
throughput of 10 MBps over 24 hours. That satisfies Glazerman, but
won't work for everybody. [Update: Following publication of
this article, Glazerman contacted SearchStorage.com to clarify that
the 10 MBps throughput rate reported by the system is per drive,
rather than for his entire system. At 15 drives, the entire system
is getting an average throughput of 130 MBps, Glazerman said.]
Jannes Kleveberg, solution area manager for ATEA, a consulting
firm that manages storage at a large automobile manufacturer's
facilities in Europe, has considered deduplication for his client's
600 TB shop. He heard Glazerman's per-drive performance numbers
with Data Domain and said "that kind of performance won't do in a
large environment."
Kleveberg said he's concerned about post-process systems causing
contention with the servers they draw data from after the backup
window is over. "For us it always comes back to the performance
issue," Kleveberg said.
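To put the throughput numbers above in context, the sketch below
converts a sustained rate into data moved per day, and a daily
volume into the rate it requires. It assumes decimal units (1 TB =
1,000,000 MB) and a steady rate over the full 24 hours. The
function names are illustrative, and the 400 TB figure is the daily
volume Fischer cited, not Kleveberg's.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MB_PER_TB = 1_000_000  # assumption: decimal units


def tb_per_day(rate_mbps: float) -> float:
    """Data moved in 24 hours at a sustained rate in MB per second."""
    return rate_mbps * SECONDS_PER_DAY / MB_PER_TB


def required_mbps(volume_tb: float, window_hours: float = 24) -> float:
    """Sustained rate in MB per second needed to move volume_tb in the window."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    return volume_tb * MB_PER_TB / (window_hours * 3600)


per_drive_daily = tb_per_day(10)     # Glazerman's per-drive rate
system_daily = tb_per_day(130)       # Glazerman's whole-system rate
fischer_rate = required_mbps(400)    # Fischer's 400 TB every 24 hours

print(f"10 MBps moves about {per_drive_daily:.2f} TB per day")
print(f"130 MBps moves about {system_daily:.2f} TB per day")
print(f"400 TB per day needs about {fischer_rate:,.0f} MBps")
```

At 130 MBps a system moves roughly 11.2 TB per day, while 400 TB
in 24 hours works out to about 4,630 MBps before any data
reduction, which gives a sense of the gap between the two
environments.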
Data Domain's director of product management, Ed Reidenbach, said
users who see poor performance may blame deduplication because
it's an unfamiliar technology. "We spend a lot
of time debugging customer networks to resolve the issue, but since
we're the new player in the environment [users] think we're the
problem," he said. According to vice president of marketing Beth
White, Data Domain is working on letting individual boxes connect
through a global namespace to scale better. "We're still pushing
the upper limits of our product," she said. "All of us [vendors] in
this market are still working our way up the food chain to those
megascale data center environments."