The disk-based backup deduplication market is lucrative and competitive. As a result, the major players in this space are working hard to improve their products and grow their market share. In this article we will look at some of the big players in backup deduplication hardware and review some of the major improvements to their products over the past couple of years.
Today's backup deduplication appliances are a far cry from those on the market two or three years ago. Data deduplication technology is now rock-solid and mature. Vendors are shifting their efforts towards improving performance and scalability and adding functionality, including things such as replication and support for third-party software. This trend is common across all the major vendors in this space.
As a result, products in this space are starting to provide a far more rounded feature set than in years past. Perhaps we're seeing the advent of Backup Deduplication 2.0?
Dell
In early January this year Dell launched its first data deduplication backup target, the DR4000. This is the fruit of Dell's acquisition of inline deduplication and compression technology from Ocarina. The devices come in a choice of three capacities up to 12 TB raw, which, with a projected reduction ratio of 15-to-1, gives about 130 TB deduplicated. Dell claims it can ingest 1 TB per hour for NFS and up to 4 TB per hour using Symantec OST.
The DR4000 is an addition to the existing PowerVault DL backup appliances, which consist of Dell hardware with either CommVault Simpana 9 or Symantec Backup Exec 2010 backup software, both of which include data deduplication capabilities.
The Symantec Backup Exec option is aimed at small and medium-sized customers, whereas the CommVault offering is aimed at higher-end use cases.
CommVault deduplication is inline, content-aware and supports both source and target deduplication. Backup Exec deduplication can be at the source or target and is also inline.
EMC Data Domain
The EMC Data Domain product portfolio consists of inline hardware data deduplication appliances that take a somewhat distributed approach to the backup deduplication process. In the Data Domain case, data deduplication can occur at the client side, via DD Boost, as well as on the Data Domain appliance. This approach has allowed EMC Data Domain to increase deduplication efficiencies as well as performance.
DD Boost, formerly called Data Domain Boost Software, enables part of the deduplication effort to be offloaded to the backup client server. DD Boost is now integrated with EMC NetWorker as well as with Symantec NetBackup and Symantec Backup Exec.
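To make the idea of offloading concrete, here is a minimal Python sketch of source-side deduplication in general. It is not EMC's DD Boost protocol, whose internals are not described in this article. The fixed 4 KB chunk size, the SHA-256 fingerprints and the names `Appliance`, `client_backup` and `restore` are all assumptions made for illustration. The client fingerprints its data, asks the target which fingerprints are new, and sends only those chunks.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; real products often use variable-size chunks


class Appliance:
    """Hypothetical backup target that stores each unique chunk once."""

    def __init__(self):
        self.chunks = {}  # fingerprint -> chunk bytes

    def missing(self, fingerprints):
        # Tell the client which fingerprints the target has not seen yet.
        return {fp for fp in fingerprints if fp not in self.chunks}

    def store(self, new_chunks):
        self.chunks.update(new_chunks)


def client_backup(data, appliance):
    """Fingerprint on the client, then send only the chunks the target lacks.

    Returns the recipe (ordered fingerprints) and the number of payload bytes sent.
    """
    pieces = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    recipe = [hashlib.sha256(p).hexdigest() for p in pieces]
    by_fingerprint = dict(zip(recipe, pieces))
    to_send = {fp: by_fingerprint[fp] for fp in appliance.missing(recipe)}
    appliance.store(to_send)
    return recipe, sum(len(chunk) for chunk in to_send.values())


def restore(recipe, appliance):
    """Rebuild the original data from the ordered list of fingerprints."""
    return b"".join(appliance.chunks[fp] for fp in recipe)
```

In this sketch, a second backup of unchanged data sends no chunk payload at all, only fingerprints. That is the general reason client-side deduplication can cut both network traffic and the load on the appliance.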
Over the past year or two, EMC has increased performance and scalability across most of its product line, in some cases doubling them. The vendor's latest figures claim throughput of up to 26 TB per hour for its highest-performing product, the Global Deduplication Array (GDA).
The GDA, which consists of two DD890 controllers pooled as a single system, can now be made to operate as a virtual tape library (VTL) by using it in conjunction with EMC DD VTL software.
In October last year EMC boosted the lower end of its backup deduplication device family with the addition of its smallest member, the DD160. This is aimed at SMBs and is available as a 2U device with either 1.6 TB or 4 TB of capacity and throughput of 1.1 TB per hour with DD Boost software.
ExaGrid
The launch of ExaGrid's flagship EX13000E has increased the scalability of the company's grid-type architecture. Specifically, up to 10 EX13000E systems can now be pooled to create a single logical pool of 130 TB with backup throughput of 24 TB per hour (10 devices, each capable of 2.4 TB per hour).
With ExaGrid's approach to scaling, each time you add capacity, you also add performance since each new system brings more compute, memory, disk capacity and network connectivity. This allows performance to scale in line with capacity.
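The scale-out arithmetic is simple enough to check in a few lines of Python. This is a back-of-the-envelope sketch that assumes perfectly linear scaling, which real deployments may not achieve. The 13 TB per system is derived from the 130 TB pool figure divided by 10, and the function names and the sizing scenario (100 TB in an 8-hour window) are hypothetical.

```python
import math


def pool_totals(systems, tb_per_system, tb_per_hour_per_system):
    """Capacity and throughput of a pool, assuming perfectly linear scaling."""
    return systems * tb_per_system, systems * tb_per_hour_per_system


def systems_needed(backup_tb, window_hours, tb_per_hour_per_system):
    """Smallest number of systems that fits a backup into the window."""
    return math.ceil(backup_tb / (window_hours * tb_per_hour_per_system))


# Figures quoted for a full pool of 10 systems.
capacity_tb, tb_per_hour = pool_totals(10, 13, 2.4)
print(capacity_tb, round(tb_per_hour, 1))  # 130 24.0

# Hypothetical sizing question: a 100 TB backup in an 8-hour window.
print(systems_needed(100, 8, 2.4))  # 6
```

Because each added system brings its own throughput, the same calculation that sizes capacity also sizes the backup window.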
In the third-party software space, ExaGrid has added integration with Veeam Backup and Replication Version 5 to enable backup and recovery of VMware virtual machines. One integration point is Veeam's Instant VM Recovery option, which allows failed virtual machines to be booted directly from the ExaGrid product.
In addition to integration with Veeam, ExaGrid also supports Symantec OST for NetBackup and Backup Exec.
There's also a final subtle, but potentially significant, improvement in what ExaGrid offers: its zone-level deduplication. Zone-level dedupe delivers both content-aware deduplication (less efficient but easier to deploy) as well as generic deduplication (more efficient) in a single product, meaning customers don't need to choose between the two approaches.
In October ExaGrid announced that its deduplication products had gained support for the IBM Tivoli Storage Manager backup application.
FalconStor
FalconStor has a VTL solution called FalconStor Virtual Tape Library, as well as a LAN-based file system approach called the File Interface Deduplication System (FDS) aimed more towards the lower end of the market.
FalconStor takes a policy-based, and therefore post-process, approach to data deduplication. However, the company also offers something it refers to as pre-hashing, or concurrent deduplication. This looks in some respects like inline deduplication and has led to speculation that FalconStor is working on an inline deduplication offering.
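The difference between post-process and inline deduplication comes down to when duplicates are removed. The Python sketch below is a generic illustration, not FalconStor's implementation. The function names, the SHA-256 fingerprints and the simplified "peak disk" accounting (which ignores indexes and metadata) are assumptions.

```python
import hashlib


def fingerprint(chunk):
    return hashlib.sha256(chunk).hexdigest()


def inline_ingest(chunks):
    """Deduplicate before writing, so duplicate chunks never reach disk."""
    store = {}
    for chunk in chunks:
        store.setdefault(fingerprint(chunk), chunk)
    peak_disk = sum(len(c) for c in store.values())
    return store, peak_disk


def post_process_ingest(chunks):
    """Land the whole backup first, then deduplicate in a later pass."""
    landing_zone = list(chunks)                    # raw backup written at full size
    peak_disk = sum(len(c) for c in landing_zone)  # space needed before the dedupe pass
    store = {}
    for chunk in landing_zone:                     # the deferred, policy-driven pass
        store.setdefault(fingerprint(chunk), chunk)
    return store, peak_disk


backup = [b"x" * 100, b"x" * 100, b"y" * 100]
print(inline_ingest(backup)[1])        # 200
print(post_process_ingest(backup)[1])  # 300
```

Both paths end with the same deduplicated store. The trade-off in this simplified model is that post-process needs temporary space for the full backup but keeps hashing out of the ingest path, while inline saves the space at the cost of doing the work as data arrives.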
While FalconStor supports Symantec OST, it does not play heavily in the distributed deduplication space and does not have a product like EMC Data Domain's DD Boost. FalconStor is squarely focused on improving the efficiency and performance of its products so that they perform well without the need for deduplication top-ups on the client side. As a result, the company's highest-performing clustered gateway appliance has seen huge performance improvements, sporting up to 9.8 TB per hour in sustained backup throughput without the need for deduplication at the client side.
FalconStor also offers industry-leading business continuity with VTL systems being deployed in high-availability (HA) pairs that allow for greater system uptime. There is also industry speculation that HA functionality will be extended to the FDS range. One thing is for sure: If FalconStor is able to bring its LAN-mounted file offerings up to par with its VTL offerings, it will have a great solution.
FalconStor has also made improvements to replication, with throttling, encryption over the wire and dedupe awareness.
HP
On the performance side, HP has increased the backup throughput of its D2D4324 model to as much as 4 TB per hour.
Bidirectional replication has been added to the feature set, with appliances now able to act as both replication source and replication target.
HP has improved integration with third-party applications by adding support for Symantec OST, bringing tighter integration with Symantec backup and recovery products.
HP also added support for 10 Gbps Ethernet as well as the common file protocols: CIFS (Windows file sharing) and NFS (Unix and Linux file sharing). This allows D2D products to act as backup targets for Windows- and Unix-based clients.
HP has increased capacities across the line, starting with 1.5 TB usable on the entry-level model, D2D2502i -- yielding up to 30 TB based on a data deduplication ratio of 20-to-1 -- all the way up to 72 TB usable on the high-end D2D4324, yielding about 1.4 PB based on a deduplication ratio of 20-to-1.
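These headline capacities are simply usable capacity multiplied by an assumed deduplication ratio, so they are only as good as the ratio. The short Python sketch below reproduces the quoted figures and shows how sensitive they are. The function name and the 10-to-1 comparison case are assumptions added for illustration.

```python
def logical_capacity_tb(usable_tb, dedupe_ratio):
    """Logical (pre-deduplication) data that fits in the given usable space."""
    return usable_tb * dedupe_ratio


# Figures quoted at a 20-to-1 ratio.
print(logical_capacity_tb(1.5, 20))        # 30.0 TB for the D2D2502i
print(logical_capacity_tb(72, 20) / 1000)  # 1.44 PB for the D2D4324 (decimal units)

# Hypothetical comparison: the same 72 TB at a 10-to-1 ratio.
print(logical_capacity_tb(72, 10))         # 720 TB
```

Halving the ratio halves the effective capacity, which is why it is worth measuring the ratio on your own data before relying on a vendor's projection.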
HP also offers a VLS9x00 Virtual Library System (VLS); the latest generation of that product, the 9200, includes several hardware improvements.
The addition of SAS drives throughout the portfolio has helped increase the density of the system, and with a compression ratio of 2-to-1, it can scale to 2.6 PB. Performance has been increased with new firmware, as well as with optional 2U performance nodes. Each performance node can boost performance to 4.4 TB per hour, with a four-node system capable of more than 17 TB per hour.
HP OEMs the software for its VLS from Sepaton.
IBM
IBM has been hard at work making feature and functionality improvements to its ProtecTier inline deduplication appliances.
The company has been making noise about improving not just the backup performance of the ProtecTier line but also its restore performance. At the top end, a dual-node cluster is now capable of up to 2 GB per second (GBps), or 7.2 TB per hour, of backup throughput, but restore can be even faster, at up to 2.8 GBps.
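Vendors quote throughput in a mix of units, so it helps to convert them to a common one before comparing. The Python sketch below assumes decimal units (1 TB = 1,000 GB). The function name is hypothetical, and the TB-per-hour restore figure is derived here rather than quoted by IBM.

```python
SECONDS_PER_HOUR = 3600


def gbps_to_tb_per_hour(gb_per_second):
    """Convert gigabytes per second to terabytes per hour (decimal units)."""
    return gb_per_second * SECONDS_PER_HOUR / 1000


print(round(gbps_to_tb_per_hour(2.0), 2))  # 7.2, the quoted backup figure
print(round(gbps_to_tb_per_hour(2.8), 2))  # 10.08, the restore rate in the same unit
```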
Replication is now native to ProtecTier; it no longer requires an externally attached storage array to do the replication. It is IP-based, bidirectional and can support up to four hub sites. And, of course, since native ProtecTier replication is dedupe-aware, it does not expand data back to its original, undeduplicated size before sending it over the wire, meaning businesses can significantly reduce the network bandwidth they need for replication.
IBM has also seen value in the Symantec OpenStorage (OST) API and has improved integration via the ProtecTier OST plug-in module. OST enables ProtecTier to act as storage presented over the LAN instead of its otherwise more traditional VTL presentation. If OST is used, replication is handled at the client side by Symantec.
IBM has made all feature and functionality improvements available to all members of the ProtecTier family, with the major differentiating factors for the higher-end systems being scale and performance. As you might expect, since ProtecTier is from IBM, it supports the System i and System z mainframes in addition to the usual open systems hosts.
Quantum
Quantum also takes the inline deduplication approach with its DXi range of hardware disk backup appliances.
On the hardware side, Quantum has introduced faster CPUs and internal storage tiering, along with new platforms, including the DXi4500 series for smaller businesses and the DXi6x00 series for medium-sized businesses.
On the software side, DXi 2.0 software has been released. This is a no-cost upgrade and a major overhaul of the previous code. According to Quantum, over and above the usual performance improvements that come with new versions of code, DXi 2.0 software should enable the company to continue improving the product well into the future.
Some of the architectural improvements in DXi 2.0 include simplified data flow and use of the improved proprietary Quantum StorNext file system. Quantum's DXi Accent is an upgrade that offloads dedupe work to the client, much as EMC's DD Boost does. Quantum has also added support for Symantec OST.
The DXi8500 sports performance numbers of up to 6.4 TB per hour for backup throughput, approximately three times the throughput of Quantum's previous top-end product. This was achieved by blending new hardware with an improved indexing architecture.
This combination of new hardware and software has brought, according to Quantum, a performance uplift of four times for the DXi4500 model and a performance uplift of two times for the DXi6500 series.
In autumn 2011, Quantum also added an SMB-targeted dedupe product to its range. The NDX-8d is a disk backup system with data deduplication based on its NDX-8 NAS device. It comes installed with Quantum DataStor Shield backup software.
Sepaton
In January 2011, Sepaton launched its S2100-ES2 enterprise virtual tape library with data deduplication. It can scale to eight 2U nodes over which processing is distributed, with each node supporting up to 1,500 megabytes per second (MBps). When it comes to capacity, support for larger SATA drives in conjunction with hardware compression allows a system to scale to about 1.6 PB before deduplication.
Sepaton devices come with its DeltaStor backup software on board. DeltaStor uses a content-aware, post-process, byte-level, non-hash-based data deduplication approach. It also uses forward referencing, which keeps the most recent backup as a full copy so that the latest backups can be recovered quickly.
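Forward referencing is easier to picture with a toy model. The Python sketch below is a generic illustration of the idea, not Sepaton's DeltaStor, whose internals are not described in this article. The class and function names are hypothetical, and the byte-level delta uses the standard library's `difflib.SequenceMatcher` purely for convenience. The newest backup is always kept whole, and each older backup is stored as a delta against the one that followed it.

```python
from difflib import SequenceMatcher


def make_delta(base, target):
    """Describe `target` as copies from `base` plus literal bytes."""
    delta = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            delta.append(("data", target[j1:j2]))
        # "delete" covers base bytes absent from target: nothing to record
    return delta


def apply_delta(base, delta):
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            out += base[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


class ForwardReferencingStore:
    def __init__(self):
        self.latest = None  # newest backup, always kept whole
        self.deltas = []    # deltas[k] rebuilds version k from version k + 1

    def add_backup(self, data):
        if self.latest is not None:
            # The previous newest backup is demoted to a delta against the new one.
            self.deltas.append(make_delta(data, self.latest))
        self.latest = data

    def restore(self, version):
        """Version 0 is the oldest; the newest needs no reconstruction."""
        data = self.latest
        for k in range(len(self.deltas) - 1, version - 1, -1):
            data = apply_delta(data, self.deltas[k])
        return data
```

In this model, restoring the latest backup is a plain read, while older versions cost one delta application per step back in time. That matches the trade-off described above: the fastest recoveries are reserved for the most recent backups, which are usually the ones people need.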
Sepaton devices support Symantec's OST and have improved integration (including data deduplication) with IBM Tivoli Storage Manager.
In March 2011, Sepaton launched the S2100-MS2. This device can scale from 30 TB to 160 TB, has an ingest rate of 1,200 MBps, and takes aim at Data Domain's DD800.
In May 2011, Sepaton launched the S2100-DS3 devices, which work in a hub-and-spoke arrangement -- from branch offices, for example -- with the ES2. There are three sizes of DS3, with ingest rates from 600 MBps to 1.5 GBps and maximum capacities ranging from 40 TB to 80 TB.
Symantec
Symantec got into the deduplicating appliance market in 2010 and since then has released four devices that work with NetBackup and one, aimed at SMB customers, that works with Backup Exec.
The 5000 and 5020 models carry out data deduplication but require a separate media server running NetBackup for backup. Meanwhile, the 5200 and 5220 models combine backup target and media server. The 5000 and 5020 have capacities of 16 TB and 32 TB, respectively (expandable with add-on nodes), and throughput of 4.3 TB per hour. The 5200 and 5220 have capacities of 32 TB and 4 TB, respectively (both expandable), and Symantec claims throughput of 10.5 TB per hour for these devices.
The Symantec 3600 has capacity of 6 TB raw and runs Backup Exec.
This is a hot market segment, with vendors aggressively investing (and publishing benchmark figures). Changes and improvements to products are coming rapidly, and you should consult with each vendor for the latest improvements to its products. You should be aware that, as with most things in IT, your mileage will vary. This can be especially true of data deduplication technologies, where the type of data being backed up has a significant impact on deduplication effectiveness. External factors such as client and network performance can also affect backup throughput.
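To see how much the type of data matters, you can measure a deduplication ratio yourself. The Python sketch below uses simple fixed-size chunking with SHA-256 fingerprints, which is an assumption made for illustration and not any vendor's algorithm. The function name and the two synthetic data sets are hypothetical.

```python
import hashlib
import os


def dedupe_ratio(data, chunk_size=4096):
    """Logical size divided by the size of the unique chunks."""
    if not data:
        return 1.0
    unique = {}
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        unique[hashlib.sha256(chunk).digest()] = len(chunk)
    return len(data) / sum(unique.values())


# 64 chunks built from only 4 distinct patterns: highly redundant data.
repetitive = b"".join(bytes([i % 4]) * 4096 for i in range(64))

# 64 chunks of random bytes, standing in for encrypted or compressed data.
random_like = os.urandom(64 * 4096)

print(dedupe_ratio(repetitive))   # 16.0
print(dedupe_ratio(random_like))  # 1.0, since no repeated chunks are expected
```

Running a measurement like this against a sample of your own backups gives a far better capacity estimate than a vendor's headline ratio.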