Users evaluating data deduplication products in an already crowded
marketplace now have yet another basis for comparing them: not just
how different vendors perform data deduplication, but also how they
incorporate data compression as a means of data reduction.
Virtual tape library (VTL) vendor Sepaton joined competitor
FalconStor Software in
announcing a partnership with Hifn, maker of hardware-based
compression chips. (FalconStor announced the partnership last
fall.) And the numbers being thrown around by Sepaton -- 50-to-1
data reduction without a significant performance hit when both data
deduplication and Hifn's data compression are turned on in its
S2100-ES2 500 Series VTLs -- show the potential impact of a
relatively small system component.
Depending on where data deduplication is done, either inline or
post-process, compression can also speed data throughput rates,
similar to the way it works in tape drives.
The terms "deduplication" and "compression" can get confusing,
as both processes perform data reduction, and both do so by
eliminating redundant bits. However, data deduplication does
comparisons to previously stored data; compression eliminates
patterns within one file.
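The distinction can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's method: the fixed 4,096-byte block size, the SHA-256 block index and the names `dedupe`, `rebuild` and `store` are all assumptions made for the example, and zlib stands in for whatever compression a product actually uses.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # illustrative block size, not a vendor setting


def dedupe(data: bytes, store: dict) -> list:
    """Store only blocks not seen before; return the digests needed to rebuild data."""
    recipe = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # compare against previously stored data
        recipe.append(digest)
    return recipe


def rebuild(recipe: list, store: dict) -> bytes:
    """Reassemble the original data from its block digests."""
    return b"".join(store[digest] for digest in recipe)


# Two hypothetical backups that share most of their content.
block_a = bytes(range(256)) * 16             # 4,096 bytes
block_b = bytes(reversed(range(256))) * 16   # 4,096 bytes
block_c = b"\x07" * BLOCK_SIZE               # highly repetitive block
backup_1 = block_a + block_b
backup_2 = block_a + block_b + block_c

store = {}
recipe_1 = dedupe(backup_1, store)
recipe_2 = dedupe(backup_2, store)  # only block_c is new to the store

logical_bytes = len(backup_1) + len(backup_2)
stored_bytes = sum(len(block) for block in store.values())

# Compression works inside a single piece of data, with no reference to the store.
compressed_c = zlib.compress(block_c)

print(f"logical {logical_bytes} bytes, stored after dedupe {stored_bytes} bytes")
print(f"block_c: {len(block_c)} bytes raw, {len(compressed_c)} bytes compressed")
```

In this sketch the second backup adds only one new block to the store, while the compression step shrinks that block on its own by exploiting the repetition inside it.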
Sepaton claims that once data is deduplicated at the block level,
at what it says is a typical reduction ratio of 25 to 1, that data
can then be compressed using the Hifn chip to cut its storage
capacity requirement in half again, boosting the typical data
reduction ratio to 50 to 1.
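The arithmetic behind that claim is simple multiplication, assuming compression is applied to the already deduplicated data and the two ratios are independent. The sketch below uses hypothetical function names, and the 100 TB backup set is an invented figure for illustration only.

```python
def combined_ratio(dedupe_ratio: float, compression_ratio: float) -> float:
    """Overall reduction when compression runs on already deduplicated data."""
    return dedupe_ratio * compression_ratio


def stored_capacity(logical_tb: float, dedupe_ratio: float,
                    compression_ratio: float) -> float:
    """Disk capacity (TB) needed for logical_tb of backup data."""
    return logical_tb / combined_ratio(dedupe_ratio, compression_ratio)


# Sepaton's quoted figures: 25-to-1 deduplication, 2-to-1 compression.
overall = combined_ratio(25, 2)

# Hypothetical 100 TB of backup data, chosen only to make the numbers concrete.
disk_needed_tb = stored_capacity(100, 25, 2)

print(f"overall reduction {overall:.0f} to 1, disk needed {disk_needed_tb:.1f} TB")
```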
Sepaton doesn't have many publicly announced customers to back
up these claims yet, but Linda Mentzer, vice president of marketing
for Sepaton, said the 25-to-1 ratio is an average taken from
testing at between 15 and 20 customer sites using "typical" data
sets -- not repeated full backups in a lab, which can inflate
dedupe numbers.
"In some environments using NetBackup with files, Exchange data
and SQL data, we've seen as much as a 56-to-1 deduplication ratio,
which would make the ratio with compression 100 to 1." But, she
said, the company chose to go with the "most typical" number.
It's a number analysts said is reasonable given the data at
hand, which unfortunately isn't much. According to Arun Taneja,
founder and consulting analyst with the Taneja Group, it's because
of the way Sepaton does data deduplication: it removes individual
files from tape archive "wrappers" made by backup applications and
then deduplicates them according to preset "awareness" of how much
deduplication is possible within each file type, a process it calls
"content-aware deduplication."
"Technically, they could squeeze out duplication there at the
byte level," Taneja said, and a 25-to-1 ratio is in line with what
most data deduplication vendors on the market claim. A 2-to-1
compression ratio for LZ compression, which has been around for
years, is a generally accepted figure in the industry.
However, Taneja pointed out, for all the squabbling and
infighting this space has seen so far this year, there has yet to
be a definitive bake-off between products performed by a third
party.
"They're all claiming around 20-to-1 or 25-to-1 ratios, but it's
very muddy how they arrive at the numbers from one company to
another," Taneja said. "Nobody has hard data on this."
According to W. Curtis Preston, vice president of data
protection services for GlassHouse Technologies Inc., the bottom
line is that users should test products carefully using an accurate
sampling of the data in their environment before buying. "Each of
these systems, because they have different approaches to
deduplication, will consistently do better with different types of
data," he said. "Don't believe any of these numbers on their face
-- it's not that vendors are lying, but it's like miles-per-gallon
figures for cars. Your mileage will vary based on driving
conditions."
Sepaton is now shipping the Hifn chip with all 500 Series VTLs,
regardless of whether they have the company's DeltaStor data
deduplication option. Users can choose to turn on the data
compression feature with a software license that costs $16,000. It
seems a hefty price for a feature that's standard on most tape
drives, but Mentzer pointed out, "Replacement disk trays can cost
between $35,000 and $40,000 -- avoiding having to buy it by turning
on data compression is the more attractive option for most
users."
While FalconStor and Sepaton are partnering for hardware-based
compression with Hifn, Diligent Technologies Corp. is sticking to a
proprietary software module to perform compression. According to
chief technology officer Neville Yates, this is because Diligent's
deduplication is an inline process that writes only changes to
disk, meaning the performance benefit it would realize from
compressing data for throughput would be negligible.
"If I operate at 400 megabytes per second (MBps) and only 40
MBps of that is data I'm going to compress, hardware-assisted
compression in that instance [would be] overkill," Yates said.
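Yates' figures can be turned into a quick calculation of how much of the incoming stream the compressor actually sees. This is a rough sketch using only the two numbers he quotes; the function name is hypothetical, and it says nothing about how Diligent's software measures this.

```python
def compression_load(ingest_mbps: float, unique_mbps: float) -> float:
    """Fraction of the ingest stream that still needs compressing after inline dedupe."""
    if ingest_mbps <= 0:
        raise ValueError("ingest rate must be positive")
    return unique_mbps / ingest_mbps


# Yates' example: 400 MBps ingested, of which only 40 MBps is new data.
load = compression_load(400, 40)

print(f"compressor handles {load:.0%} of the ingest stream")
```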
Another competitor, Data Domain, is also sticking with software.
"If you do fast dedupe inline and compress as a final step for only
unique sequences, the CPU impact of local compression is nominal,"
wrote Beth White, vice president of marketing, in an email to
SearchStorage.com. "Most vendors don't have this capability.
Throwing hardware at it might be all they can do."