Data deduplication firm pushes performance

Data Domain boosts performance of its data deduplication product, but users are looking for a change in design for better single-stream throughput.

Data Domain has announced that it's shipping a new model of its data deduplication array, dubbed the DD580. The new model features dual-core Intel processors and support for more capacity, slotting it into Data Domain's product line between the DD560 and the data center-sized DDX array.

The announcement comes as debate continues to rage in the hot data deduplication market, where one of the many battles is over performance. Data Domain's new performance claim of 800 GB per hour (roughly 220 megabytes per second [MBps]) in lab tests matches what competitor Diligent Technologies claimed for the previous version of its ProtecTier VTL product; Diligent now claims transfer rates of up to 400 MBps.
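For context, the unit conversion behind those claims is simple arithmetic. The sketch below (an illustrative helper, not a vendor tool) uses decimal units, which is how 800 GB per hour rounds to about 220 MBps:

```python
def gb_per_hour_to_mbps(gb_per_hour: float) -> float:
    """Convert a GB-per-hour throughput claim to MB per second (decimal units)."""
    return gb_per_hour * 1000 / 3600  # 1 GB = 1000 MB; 1 hour = 3600 s

# Data Domain's lab-test claim for the DD580
print(gb_per_hour_to_mbps(800))  # ~222 MBps, rounded to ~220 in vendor materials
```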

However, the new aggregate throughput figure is for a fully stocked box, which supports more concurrent data streams thanks to the new processors. The single-stream rate, according to Data Domain co-founder and vice president of product management Brian Biles, remains the same as in previous products, around 100 MBps.

"There are many variables affecting deployed speed in a given data center, including client throughput, network load and media server capacity," Biles added.

Currently, according to Biles, Data Domain sees load balancing for performance as the purview of the backup application, such as Tivoli Storage Manager (TSM), which allows users to designate separate storage pools and choose target devices manually to optimize performance.

"Backup software has been very good at load balancing and targeting different loads to different devices, and users have had to do it that way for years with tape," he said. "Our product slides into that environment nondisruptively."

However, Biles also said that the ability to cluster the network-attached storage (NAS) heads on the arrays for automated load balancing is on the Data Domain roadmap slated for release sometime next year.

"Right now the market space Data Domain is targeting is the SMB, particularly the midsize business," according to Curtis Preston, vice president of data protection services at Glasshouse Technologies Inc. "Two hundred megabytes per second would be more than many data centers in that category would need."

That said, Preston added, "anyone looking to purchase a Data Domain box and needing bigger [performance] numbers isn't going to get it with a single head. Data Domain is going to have to find a way to go to multiple heads to reach new customers."

DD580 beta tester Kirk Schoeffel, technology specialist with the city of Vancouver, British Columbia, had another idea: the ability to "trunk" Ethernet ports on the box in order to aggregate their throughput when things are working properly and for high availability if one port fails.

Rumor has it that capability could be coming by the end of the year, but Data Domain was mum on its roadmap on that front. "Data Domain is very aware of customer requirements, the core considerations of our product planning and delivery," Data Domain officials wrote in an email, adding only that "Data Domain does not preannounce products."

Meanwhile, Schoeffel said the increased performance of the DD580 is a potential perk, but not his chief reason for upgrading -- that was the higher capacity. The DD580 supports between 550 terabytes (TB) and 1.25 petabytes (PB) of logical capacity, as opposed to 400 TB to 900 TB for the DD560. For Schoeffel, the upgrade amounts to an extra tray of disks, or another 5.5 TB of raw physical capacity -- which, at the 22-to-1 data deduplication ratio Vancouver has seen with its data, can be expected to hold roughly another 123 logical terabytes.
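The capacity arithmetic behind that estimate is easy to sketch; the figures come from the article, and the helper name is hypothetical:

```python
def logical_capacity_tb(raw_tb: float, dedup_ratio: float) -> float:
    """Estimate logical (post-deduplication) capacity from raw disk capacity."""
    return raw_tb * dedup_ratio

# One extra tray of disks (5.5 TB raw) at Vancouver's observed 22-to-1 ratio
print(logical_capacity_tb(5.5, 22))  # 121.0 TB, in the ballpark of the ~123 TB cited
```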

To speed backups, the city of Vancouver stages some 2 TB of TSM incremental backups to its IBM DS4000 storage area network (SAN) before backing them up overnight to the Data Domain box during a 14-hour backup window. Currently, a typical backup from the SAN to the Data Domain system takes about five hours, roughly equivalent to what Schoeffel calls the "best-case scenario" for TSM backups fed by the same SAN. In practice, he said, TSM backups to tape often run longer because restore requests hold up the backup process.
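As a rough check, moving 2 TB in about five hours implies a sustained rate close to the ~100 MBps single-stream figure quoted earlier. The back-of-the-envelope calculation (illustrative helper; binary units assumed) looks like this:

```python
def sustained_mbps(data_tb: float, hours: float) -> float:
    """Average throughput in MBps for moving a given data volume in a given time."""
    megabytes = data_tb * 1024 * 1024  # TB -> MB, binary units
    return megabytes / (hours * 3600)

# Vancouver's nightly staged TSM incrementals: ~2 TB in ~5 hours
print(round(sustained_mbps(2, 5)))  # ~117 MBps
```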

More importantly, the Data Domain box is dramatically faster than tape when it comes time for a restore. "We didn't get the Data Domain array for backup performance -- we got it for restore performance," Schoeffel said.

There, according to Schoeffel, Data Domain lets him specify flexible file sizes, restore multiple data streams at once and throttle the transfer rate on restores according to priority through TSM -- none of which is possible with his tape libraries.

"That kind of efficiency is the huge advantage to the Data Domain [array]," he said.
