Data deduplication in data backup software products
Data deduplication offers a potential step change in the way backup is done. By removing duplicated data blocks and replacing them with a tag, data deduplication can reduce the amount of data in a backup by ratios of 10:1, 20:1 or more. The ratio depends on the nature of your data; structured data with lots of commonality achieves the highest ratios.
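To make the idea of replacing duplicate blocks with a tag concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular backup product works: the fixed 4 KB block size, the use of SHA-256 digests as tags, and the function names `deduplicate`, `restore` and `dedup_ratio` are all assumptions made for the example.

```python
import hashlib


def deduplicate(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks and store each unique block once.

    Returns (store, recipe): store maps a SHA-256 digest (the "tag") to the
    block's bytes; recipe is the ordered list of tags needed to rebuild data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        tag = hashlib.sha256(block).hexdigest()
        store.setdefault(tag, block)  # keep only the first copy of a block
        recipe.append(tag)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original data by following the tags in order."""
    return b"".join(store[tag] for tag in recipe)


def dedup_ratio(data: bytes, store) -> float:
    """Logical size divided by the bytes actually stored (tags not counted)."""
    stored = sum(len(block) for block in store.values())
    return len(data) / stored if stored else 1.0


# Hypothetical sample: 20 blocks, of which only 2 are distinct.
sample = (b"A" * 4096) * 15 + (b"B" * 4096) * 5
store, recipe = deduplicate(sample)
assert restore(store, recipe) == sample
ratio = dedup_ratio(sample, store)  # 20 blocks stored as 2 blocks, i.e. 10:1
```

The sample is deliberately repetitive, which is why it reaches 10:1. Data with little commonality would produce a store almost as large as the input and a ratio close to 1:1.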
Continuous data protection
Continuous data protection (CDP), like many products, started as a point solution sold by niche vendors. It has been around for a few years but failed to get off the ground because many storage and backup managers did not want to trust protection of their most vital assets to a non-traditional data backup software product. Standalone CDP products also required separate administration and maintenance. But there's no denying the technology's appeal when looked at objectively: it makes possible data copies with near-zero recovery point objectives (RPOs) and rapid recovery time objectives (RTOs), all without the need for a backup window. The main data backup software vendors are now on board and offer continuous data protection products and features. Symantec has NetBackup RealTime, IBM has Tivoli Continuous Data Protection for Files and TSM FastBack, EMC has RecoverPoint and CommVault has built CDP into Simpana.
Data protection management
When you run backups you want to know how they have performed, so it's natural that data backup software products should include advanced reporting features. Well, it should be natural. Only now are the main backup product vendors getting a kick up the backside from specialist data protection management vendors that offer advanced reporting features such as trend reporting, capacity planning and cross-product monitoring. Key vendors and products include Aptare's StorageConsole, Bocada's Prism, Rocket Software's Servergraph, SolarWinds/Tek-Tools' Backup Profiler and TSMworks' Smart. Enterprise-level backup product vendors have entered the fray via acquisitions that have led to products such as Symantec's OpsCenter Analytics (formerly Backup Reporter) and EMC's Data Protection Advisor (from its 2008 acquisition of WysDM Software).
Synthetic backups
Synthetic backups -- also called synthetic fulls -- are built on the recognition that once you've done a full backup there's no need to copy files that have already been copied, and that to do so is a waste of time and network resources. IBM TSM long ago dispensed with backing up data that had already been backed up and called the approach "progressive incremental." What's essentially happening is that intelligence built into the backup product doesn't copy afresh data that already exists in backups, but creates a "new" full from existing copies plus any new data created since the most recent full. In short, only delta changes are saved. CommVault and Symantec call this "synthetic backup," while EMC calls it "saveset consolidation."
Backups for server virtualisation
Server virtualisation has been all the rage over the past couple of years. It has helped businesses cut down on the number of servers and speed up the creation of new servers and the rollout of new applications and services. But virtualised servers need backup, and on that front things are only now beginning to emerge from a difficult period. The key issue is that with many virtual servers packed into few machines, I/O loads during backup increase significantly. Despite this, many users back up virtualised servers as if they were physical: they install agents on them, back up as normal, and live with the increased I/O load and its consequences. To address the issue VMware introduced VMware Consolidated Backup (VCB), which aimed to arbitrate I/O issues on ESX servers; however, it was an inelegant solution to the problem. VCB used a proxy server and required a two-step backup process and two-step restores. Many VMware users opted not to use VCB on its own and plugged in point products designed specifically for virtualised environments -- such as PHD Virtual Technologies' esXpress, Veeam Backup & Replication and VizionCore's vRanger Pro -- to make virtual server backups easier. Microsoft Hyper-V users have also tended to treat their virtual servers as if they were physical. But the VMware backup scene is now changing. The company discontinued VCB and, with its vSphere platform, introduced the VMware vStorage APIs for Data Protection (VADP). VADP doesn't handle the backups itself, but allows block-level incremental backups and integration with backup products. EMC Avamar and Symantec NetBackup support VADP, and CommVault, EMC NetWorker and IBM TSM are working on integrating with the VMware APIs.
Open source data backup
The last couple of years have seen the rise of a number of commercially available open source data backup applications. These products are almost certainly not suitable for large and complex enterprise environments, but small- and medium-sized businesses (SMBs) can take advantage of them, and they come at a considerable cost saving over other products. The key is to check whether products support the features you require and to do your homework on usability and support. Open source data backup software products include BackupPC, which includes file-level data deduplication based on a hashing algorithm that checks for repeat files and replaces duplicates with a link. Growing in installed base is Zmanda, which is the commercial manifestation of Amanda, the Advanced Maryland Automatic Network Disk Archiver, a Unix-based backup application developed at the University of Maryland. Finally, there is Bacula, which can provide backup for nearly all operating systems, including Unix, Linux and Mac OSs.
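The file-level approach described for BackupPC (hash each file, spot repeats, replace duplicates with a link) can be sketched in a few lines of Python. This is an illustration of the general technique under stated assumptions, not BackupPC's actual implementation: the SHA-256 hash, the use of hard links, and the function names `file_digest` and `link_duplicates` are choices made for the example.

```python
import hashlib
import os
import tempfile


def file_digest(path: str) -> str:
    """Return the SHA-256 digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def link_duplicates(root: str) -> int:
    """Replace files under root that repeat earlier content with hard links.

    Returns the number of files replaced.
    """
    first_seen = {}  # digest -> path of the first file with that content
    replaced = 0
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit directories in a predictable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # leave symbolic links alone
            original = first_seen.setdefault(file_digest(path), path)
            if original == path or os.path.samefile(original, path):
                continue  # first copy, or already linked to it
            tmp = path + ".dedup-tmp"  # hypothetical temporary name
            os.link(original, tmp)     # create the link first ...
            os.replace(tmp, path)      # ... then swap it in atomically
            replaced += 1
    return replaced


# Hypothetical demo: two identical files and one different file.
with tempfile.TemporaryDirectory() as root:
    for name, content in [("a.txt", b"same"), ("b.txt", b"same"), ("c.txt", b"other")]:
        with open(os.path.join(root, name), "wb") as f:
            f.write(content)
    replaced = link_duplicates(root)
    shared = os.path.samefile(os.path.join(root, "a.txt"),
                              os.path.join(root, "b.txt"))
```

After the run, `b.txt` occupies no extra space because it points at the same data as `a.txt`. A more cautious version would compare file contents byte for byte before linking, rather than trusting the hash alone, and would take care that linked files are never modified in place.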
This was first published in June 2010
