Deduplication helps firm expand data centre into two hot sites

CMC Markets achieves data reduction averaging 15:1 as part of live replication project using Data Domain deduplication hardware.

London-based financial services company CMC Markets has put data deduplication at the heart of a project to replace a single data centre and implement live replication between two hot sites. The firm has achieved compression ratios averaging 15:1, and as high as 67:1, with its Data Domain DD530 devices. The project also called for an upgrade of the firm's backup hardware and software to accommodate a threefold increase in jobs.

CMC Markets is a financial services provider that offers real-time Internet-based trading of more than 3,500 products from 20 offices worldwide. Its core applications are its homegrown trading platform, Oracle databases and Lotus Notes, all running on some 300 servers.

Growth in business volumes and increasingly onerous retention requirements as a result of legal guidelines meant that by the beginning of this year the business had made plans to expand storage capacity as well as to make more efficient use of physical space and power.

At that point, the IT department was set on a simple expansion of nearline storage capacity for backups. But this expansion evolved into a plan to create two live sites with replication between them, says Greg Gawthorpe, technical operations team leader with CMC Markets.

"We had grown out of nearline storage very rapidly because we had seen our databases double in size in a year," Gawthorpe said. "When we first set up Bakbone Netvault: Backup 18 months ago, it had 80 jobs; now it has 250."

The choice for Gawthorpe was between Data Domain deduplication hardware and software bolt-ons to the firm's existing Nexsan SATAbeast storage arrays. His team tested the Data Domain devices and after one week achieved compression ratios of up to 100:1 on Lotus Notes files. "We looked at a number of software data dedupe products but they didn't seem to be robust enough for the enterprise," he said.

Because of throughput issues, CMC decided to opt for a pair of mid-range devices at each data centre rather than just one larger capacity device. "We needed two devices at each site, as one couldn't cut the mustard in terms of I/O," Gawthorpe said. "There were issues of throughput because of the speed of the dedupe engine, which can produce a bottleneck if you allow it to."

CMC Markets eventually settled on two Data Domain DD530s of 5.2 TB capacity at each site, at a cost of approximately £50,000 each including licences. The devices have been set up as virtual tape libraries emulating StorageTek L180 tape libraries.

At the same time, in establishing the two live sites, the CMC team upgraded its Netvault configuration by implementing a Netvault Smart Client and offload servers, and boosted processing power by moving to quad-core CPUs, giving 64 cores in total.

Gawthorpe continues to achieve what he calls impressive data reduction ratios, as the dedupe engine has been able to strip out increasing amounts of duplicate data over time. He said, "In terms of compression, we have been getting just what they said we would. We're about 75% full now and on the first backups we got little in the way of dedupe, but that's the way it works. This week we have achieved 23x and 67x compression – it really depends what you throw at it."

Such increasing compression ratios illustrate an essential characteristic of data deduplication technology: the more you deduplicate, the better the compression ratios you will achieve, as the engine recognises files and blocks that it has already tagged and need not store twice. For that reason, deduplication achieves far better compression ratios on data types that contain many common features, for example, database information. By contrast, businesses that store many unique files, such as images, tend to achieve far less from data deduplication.
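The effect can be illustrated with a toy model. The following Python sketch is not how the Data Domain engine works (the article gives no detail of its internals); it assumes fixed 4 KB blocks identified by SHA-256 digest, and the names `DedupeStore` and `block_for` are invented for the example. It backs up the same 100-block data set on five successive nights with one block changing each night, and records the ratio of data sent to data actually stored.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, for illustration only


class DedupeStore:
    """Toy store that keeps a single copy of each unique block."""

    def __init__(self):
        self.blocks = {}        # digest -> block contents
        self.logical_bytes = 0  # total bytes sent for backup

    def backup(self, data: bytes) -> None:
        # Split the stream into fixed-size blocks and store only unseen ones.
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).digest()
            self.blocks.setdefault(digest, block)
            self.logical_bytes += len(block)

    def ratio(self) -> float:
        # Bytes sent for backup divided by bytes actually held.
        physical_bytes = sum(len(b) for b in self.blocks.values())
        return self.logical_bytes / physical_bytes


def block_for(version: int, index: int) -> bytes:
    """Build a deterministic 4 KB block that is unique per (version, index)."""
    seed = hashlib.sha256(f"{version}-{index}".encode()).digest()  # 32 bytes
    return seed * (BLOCK_SIZE // len(seed))


store = DedupeStore()
dataset = [block_for(0, i) for i in range(100)]
ratios = []
for night in range(1, 6):
    dataset[night] = block_for(night, night)  # one block changes each night
    store.backup(b"".join(dataset))
    ratios.append(store.ratio())

print([round(r, 2) for r in ratios])
```

The first backup yields a ratio of 1.0, because every block is new to the store, and the ratio climbs with each later run as more of the stream matches blocks already held. That mirrors the pattern Gawthorpe describes, though real ratios depend heavily on the data and on the engine's segmenting method.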

Gawthorpe calculates that by squeezing around 160 TB of data into 10.4 TB of real capacity he has been able to avoid buying the equivalent of 10 Nexsan SATAbeast arrays and is using half the space that simply expanding existing nearline storage would have entailed. He is also able to provide much quicker service to users, which boosts the standing of the storage and backup teams.
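Those figures are consistent with the 15:1 average quoted for the project, as a quick calculation shows. The sketch below uses only numbers from the article; reading the 10.4 TB as one pair of 5.2 TB DD530s is an assumption, and the variable names are illustrative.

```python
logical_tb = 160.0     # data backed up, as quoted by Gawthorpe
physical_tb = 2 * 5.2  # assumed: one pair of 5.2 TB DD530s, i.e. 10.4 TB

ratio = logical_tb / physical_tb        # overall reduction ratio
saved = 1 - physical_tb / logical_tb    # fraction of capacity avoided

print(f"{ratio:.1f}:1 reduction, {saved:.1%} less capacity needed")
```

The overall ratio works out at a little over 15:1, in line with the average the firm reports.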

"We are able to keep five weeks of backups on nearline storage," Gawthorpe said. "Users are used to getting yesterday's files pretty quickly but they don't if they're three weeks old. So when you can give it to them before you're off the phone it makes a really good impression."
