According to a white paper released this week (but dated 26
February), Network Appliance (NetApp) has deduplication, but it's
still limited in the products it supports and the size of the
data stores it can deduplicate, and it has the potential for high
overhead.
The company has ported its single-instancing algorithm, which it
calls Advanced Single Instance Storage (A-SIS), from its SnapLock
content-addressed storage (CAS) product into
FlexVols to deduplicate data at the block level.
According to the white paper, "A-SIS only stores unique data
blocks in the flexible volume and creates a small amount of
additional metadata in the process. Each block of data has a
digital signature, which is compared to all other signatures in the
flexible volume. If an exact byte-for-byte block match exists on
the flexible volume, the duplicate block is discarded and its disk
space is reclaimed."
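The matching step the white paper describes can be sketched in a few lines of Python. This is an illustration of signature-based block deduplication in general, not NetApp's implementation: the 4 KB block size, the use of SHA-256 as the "digital signature" and the function names are all assumptions made for the example. A signature match is treated only as a hint, and a block is dropped only after a byte-for-byte comparison confirms it.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def deduplicate(blocks: list[bytes]) -> tuple[list[bytes], list[int]]:
    """Return (unique_blocks, refs), where refs[i] indexes into unique_blocks."""
    unique_blocks: list[bytes] = []
    by_signature: dict[bytes, list[int]] = {}  # signature -> candidate indexes
    refs: list[int] = []
    for block in blocks:
        # SHA-256 is a stand-in; the white paper does not name the signature.
        signature = hashlib.sha256(block).digest()
        candidates = by_signature.setdefault(signature, [])
        # A matching signature is only a hint; confirm byte for byte.
        match = next((i for i in candidates if unique_blocks[i] == block), None)
        if match is None:
            unique_blocks.append(block)
            match = len(unique_blocks) - 1
            candidates.append(match)
        refs.append(match)  # duplicates become references to the stored copy
    return unique_blocks, refs


data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096
unique, refs = deduplicate(split_blocks(data))
print(len(unique), refs)  # prints: 2 [0, 0, 1, 0]
```

Four logical blocks are stored as two physical ones here, and the list of references plays the role of the "small amount of additional metadata" the paper mentions.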
The white paper also claims that the post-process deduplication has
a 1% write performance hit. The background process, which is
activated through a command line interface, can also be scheduled
or run manually. A-SIS operates on the active file system (AFS) of
a flexible volume.
The product is currently in beta tests and has not yet been
released to the public. The white paper contains instructions
for deploying A-SIS with NearStore, but doing so requires two
licenses, called "nearstore_asis2" and "nearstore_option," to be
activated on the filer.
A-SIS won't work with snapshots or LUNs and is limited in scale
There are also a few catches at this phase of the product, as
detailed by the white paper. Any block referenced by a snapshot
copy cannot be deduplicated. A-SIS will only work on data sent via
CIFS or NFS; it will not work on LUNs, and it is compatible as yet
only with the NearStore R200, FAS3020c and FAS3050c. A-SIS also
cannot deduplicate across FlexVols, which currently have a size
limit of 4 terabytes (TB) on the R200, 2 TB on the FAS3020c and
1 TB on the FAS3050c.
The white paper also warns, "The total storage used by A-SIS is
…1% to 3% of the actual stored data due to fingerprints in the
fingerprint file and change log file(s). So for 1 TB of data there
would be 10 GB to 30 GB of overhead." That's without snapshots --
if snapshots are turned on for the flexible volume, the paper
states, "the overhead becomes additive each time A-SIS is run and
is therefore substantial."
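To put the quoted range in context, the arithmetic can be applied to the per-platform volume limits cited above. The sketch below is a back-of-the-envelope calculation only: it assumes the 1% to 3% figure holds at the maximum volume size, uses decimal units (1 TB = 1,000 GB) as the white paper's own example does, and does not attempt to model the additive overhead with snapshots, which the paper does not quantify. The function name is hypothetical.

```python
def asis_overhead_gb(stored_gb: float, low_pct: int = 1, high_pct: int = 3) -> tuple[float, float]:
    """Return the (low, high) metadata overhead in GB for a given amount of stored data."""
    return stored_gb * low_pct / 100, stored_gb * high_pct / 100


# Volume size limits in TB, as reported in the article.
volume_limits_tb = {"R200": 4, "FAS3020c": 2, "FAS3050c": 1}

for platform, limit_tb in volume_limits_tb.items():
    low, high = asis_overhead_gb(limit_tb * 1000)
    print(f"{platform}: {low:.0f} GB to {high:.0f} GB of overhead on a full {limit_tb} TB volume")
```

On a full 4 TB volume, the same percentages would work out to between 40 GB and 120 GB of fingerprint and change log data.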
Finally, under best practices, the white paper suggests that
users "run A-SIS infrequently … do not run eight A-SIS processes
concurrently if possible because there will be a negative
performance impact on other applications."
It continues, "given the above two items, the best bet is to
disable any A-SIS schedules on the flexible volume and run A-SIS
manually, [and] turn off scheduled Snapshot copies or keep Snapshot
copies to a minimum ... if Snapshot copies are required, run A-SIS
before creating the Snapshot copy as this will minimize the amount
of data that gets locked in Snapshot copies."
"This pretty much looks like an SMB [small and midsized
business] type play where the backup window finishes, and you have
fairly large amounts of time to dedupe data already backed up,"
said Jerome Wendt, lead analyst and president with DCIG Inc.
"You might free up space in the course of a day with it, but you
still need all that space for your backups before the dedupe
happens."
"Based on what I'm seeing in this white paper, to me this looks
like a poor man's solution to single-instance storage," Wendt said.
"I wouldn't view this as a robust way to manage it. It appears it
will work, but it's really not suited for the enterprise. There are
so many qualifiers here, and even NetApp recommends just doing it
under certain circumstances and infrequently."
Read the entire white paper on A-SIS.