By Antony Adshead, Bureau Chief, SearchStorage.co.UK
Data backup software doesn't just do backup anymore. Key developments in backup products over the past year include data deduplication, continuous data protection (CDP) and data protection management (DPM), a tighter fit with virtualised servers, and the rise of synthetic backups and open source data backup applications.
Data deduplication in data backup software products
Data deduplication offers a potential step change in the way backup is done. By removing duplicate data blocks and replacing each one with a tag (a pointer to the single stored copy), data deduplication can shrink the amount of data in a backup by ratios of 10:1, 20:1 or more. The ratio depends on the nature of your data; structured data with a lot of commonality achieves the highest ratios.
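The following minimal Python sketch illustrates the block-and-tag mechanism. It assumes fixed 4KB blocks and uses SHA-256 digests as the tags; commercial products use their own chunking and indexing schemes, and the function names here are hypothetical. The simulated data (ten nightly backups, one block changing per night) is illustrative only and says nothing about ratios you would see in practice.

```python
import hashlib
import random

BLOCK_SIZE = 4096  # assumption: fixed 4 KB blocks; many products use variable-size chunks


def deduplicate(data: bytes, store: dict) -> list:
    """Split data into blocks, keep one copy of each unique block in `store`
    (keyed by its SHA-256 digest) and return the list of tags describing the stream."""
    tags = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        tag = hashlib.sha256(block).hexdigest()
        store.setdefault(tag, block)  # a repeated block costs only its tag
        tags.append(tag)
    return tags


def restore(tags: list, store: dict) -> bytes:
    """Rebuild a backup by following each tag back to its stored block."""
    return b"".join(store[tag] for tag in tags)


# Simulate ten nightly backups of 1 MiB of data in which one block changes each night.
rng = random.Random(0)
store = {}
backups = []
data = bytearray(rng.randbytes(1024 * 1024))
for night in range(10):
    data[night * BLOCK_SIZE] ^= 0xFF  # flip one byte in a different block each night
    backups.append(deduplicate(bytes(data), store))

logical = sum(len(tags) * BLOCK_SIZE for tags in backups)
physical = sum(len(block) for block in store.values())
print(f"{logical / physical:.1f}:1")  # prints 9.7:1
```

Ten logical copies of 1 MiB collapse to 265 unique blocks, because each night contributes only the one block that changed.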
Because deduplication lets you store much more data on disk, it makes disk a practical target for backups, and restoring from disk is much faster than restoring from tape. That capability has led some businesses to adopt a disk-to-disk-to-tape (D2D2T) strategy, in which disk holds deduplicated data for a month or more before it is migrated to tape. Disk may be more expensive than tape, but that cost can be offset by the speed at which files can be recovered from disk-held backups after user error, often by the users themselves.
In addition, source data deduplication can give a boost to backups from remote offices. It puts an agent on the source server that communicates with the backup server, so that only new data is transmitted across the network. Cutting the volume of data sent from the remote site places far less strain on the bandwidth available between that site and your main data centre.
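As a rough illustration of the agent/server exchange, here is a minimal Python sketch of source-side deduplication. The `BackupServer` class and `agent_backup` function are hypothetical, not any vendor's protocol: the agent hashes fixed-size chunks locally, asks the server which digests it lacks, and ships only those chunks.

```python
import hashlib
import random

CHUNK = 4096  # assumption: fixed-size chunks


class BackupServer:
    """Hypothetical central backup server holding one copy of each chunk."""

    def __init__(self) -> None:
        self.chunks = {}  # SHA-256 digest -> chunk bytes

    def missing(self, digests):
        """Answer the agent's query: which of these chunks have I never stored?"""
        return {d for d in digests if d not in self.chunks}

    def receive(self, chunks) -> None:
        self.chunks.update(chunks)


def agent_backup(data: bytes, server: BackupServer) -> int:
    """Source-side deduplication run by a hypothetical agent at the remote office.
    Hashes are computed locally; only chunks the server lacks cross the WAN.
    Returns chunk bytes sent (the digest list itself is small and not counted)."""
    local = {}
    for i in range(0, len(data), CHUNK):
        piece = data[i:i + CHUNK]
        local[hashlib.sha256(piece).hexdigest()] = piece
    needed = server.missing(local.keys())
    server.receive({d: local[d] for d in needed})
    return sum(len(local[d]) for d in needed)


rng = random.Random(1)
server = BackupServer()
office = bytearray(rng.randbytes(2 * 1024 * 1024))  # 2 MiB of remote-office data
first = agent_backup(bytes(office), server)          # initial backup sends everything
office[:CHUNK] = rng.randbytes(CHUNK)                # one chunk changes overnight
second = agent_backup(bytes(office), server)
print(first, second)  # prints 2097152 4096
```

With target deduplication, by contrast, the full 2 MiB would cross the network every night and be deduplicated only on arrival.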
Deduplication is now included in all major data backup software products. For example, EMC incorporates source-based data deduplication into its Avamar product, as does Symantec in its NetBackup PureDisk product. Avamar and PureDisk are also available as standalone products, so you don't need to use EMC's NetWorker or Symantec's NetBackup to take advantage of them.
Symantec's NetBackup also offers target data deduplication, in which backups are received at the target in full and only then deduplicated. IBM includes source data deduplication in its Tivoli Storage Manager (TSM) 6.2 product.
CommVault's Simpana also incorporates data deduplication, but it divides the deduplication tasks between source and target. This hybrid approach saves bandwidth, although not as much as true source deduplication. Simpana can also write deduplicated data to tape.
CA's ARCserve Backup now incorporates target data deduplication for disk and tape.
Continuous data protection
Like many products, continuous data protection started out as a point solution sold by niche vendors. It has been around for a few years but failed to take off, because many storage and backup managers did not want to trust the protection of their most vital data to a non-traditional backup product. Standalone CDP products also required separate administration and maintenance.
But there's no denying the technology's appeal when viewed objectively. Because CDP captures changes continuously as they are written, it makes possible data copies with near-zero recovery point objectives (RPOs) and short recovery time objectives (RTOs), all without the need for a backup window.
It's all very appealing, and the main data backup software vendors are now on board and offer continuous data protection products and features. Symantec has NetBackup RealTime, IBM has Tivoli Continuous Data Protection for Files and TSM FastBack, EMC has RecoverPoint and CommVault has built CDP into Simpana.
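The reason CDP can offer a near-zero RPO is that every write is journalled with a timestamp, so data can be rolled back to any instant. The minimal Python sketch below (a hypothetical `CDPJournal` class, working at block level and keeping everything in memory) shows that idea; real CDP products add write interception, storage management and application consistency on top.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class CDPJournal:
    """Toy CDP journal: every write is captured with a timestamp, so any
    earlier point in time can be reconstructed."""
    baseline: dict                                     # block number -> contents at start
    times: list = field(default_factory=list)          # write timestamps, ascending
    writes: list = field(default_factory=list)         # (block number, new contents)

    def record(self, t: float, block: int, data: bytes) -> None:
        if self.times and t < self.times[-1]:
            raise ValueError("writes must be recorded in time order")
        self.times.append(t)
        self.writes.append((block, data))

    def restore(self, t: float) -> dict:
        """Replay every write made at or before time t onto the baseline."""
        image = dict(self.baseline)
        upto = bisect.bisect_right(self.times, t)
        for block, data in self.writes[:upto]:
            image[block] = data
        return image


journal = CDPJournal(baseline={0: b"invoice v1", 1: b"ledger v1"})
journal.record(10.0, 0, b"invoice v2")
journal.record(20.0, 1, b"ledger CORRUPTED")
print(journal.restore(19.9))  # prints {0: b'invoice v2', 1: b'ledger v1'}
```

Restoring to 19.9 keeps the good write made at 10.0 and drops the corruption at 20.0, which is the near-zero RPO in miniature.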
Data protection management
When you run backups you want to know how they have performed, so you would expect data backup software products to include advanced reporting features as a matter of course. Yet it's only now that the main backup product vendors are getting a kick up the backside from specialist data protection management vendors that offer advanced reporting features such as trend reporting, capacity planning and cross-product monitoring.
Key vendors and products include Aptare's StorageConsole, Bocada's Prism, Rocket Software's Servergraph, SolarWinds/Tek-Tools' Backup Profiler and TSMworks' Smart. Enterprise-level backup product vendors have entered the fray via acquisitions that have led to products such as Symantec's OpsCenter Analytics (formerly Backup Reporter) and EMC's Data Protection Advisor (from its 2008 acquisition of WysDM Software).
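The analytics inside these DPM products are proprietary, but the basic idea behind capacity-planning trend reports can be sketched generically. The Python function below (hypothetical, assuming one storage-usage reading per day) fits a least-squares straight line to daily usage and estimates how many days remain before a backup pool fills up.

```python
from typing import List, Optional


def days_until_full(daily_used_tb: List[float], capacity_tb: float) -> Optional[float]:
    """Fit a least-squares straight line to one usage reading per day and
    estimate how many days after the last reading the pool will be full.
    Returns None when usage is flat or shrinking."""
    n = len(daily_used_tb)
    if n < 2:
        raise ValueError("need at least two daily readings")
    mean_x = (n - 1) / 2                 # days are numbered 0 .. n-1
    mean_y = sum(daily_used_tb) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_used_tb))
    slope = sxy / sxx                    # growth in TB per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    full_on_day = (capacity_tb - intercept) / slope
    return full_on_day - (n - 1)


print(days_until_full([40.0, 41.0, 42.0, 43.0, 44.0], capacity_tb=60.0))  # prints 16.0
```

A straight-line fit is the simplest possible trend model; growth that accelerates (a new application coming online, say) would call for something more sophisticated.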
Synthetic backups
Synthetic backups -- also called synthetic fulls -- are built on the recognition that once you've done a full backup there's no need to copy files that have already been copied and that to do so is a waste of time and network resources.
IBM TSM long ago stopped backing up data that had already been backed up, calling its approach "progressive incremental." In essence, intelligence built into the backup product avoids copying afresh any data that already exists in backups, and instead creates a "new" full backup from existing copies plus any data created or changed since the most recent full. In short, only changes are copied.
CommVault and Symantec call this "synthetic backup," while EMC calls it "saveset consolidation."
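To show how a "new" full can be assembled without re-reading the client, here is a minimal Python sketch that works purely on backup catalogues. It is a generic illustration of the synthetic-full idea, not how TSM, CommVault, Symantec or EMC implement it; the catalogue format (path mapped to a stored-copy ID, with None marking a deletion) is an assumption.

```python
from typing import Dict, List, Optional


def synthesize_full(last_full: Dict[str, str],
                    incrementals: List[Dict[str, Optional[str]]]) -> Dict[str, str]:
    """Build a new full-backup catalogue from the last full plus later
    incrementals, without copying anything from the client again.
    Each catalogue maps a file path to the ID of a copy already in backup
    storage; in an incremental, None marks a file deleted since the last run."""
    full = dict(last_full)
    for inc in incrementals:            # apply oldest first so newer versions win
        for path, copy_id in inc.items():
            if copy_id is None:
                full.pop(path, None)    # file was deleted on the client
            else:
                full[path] = copy_id    # file is new or changed
    return full


sunday = {"/etc/hosts": "c1", "/data/a.xls": "c2", "/data/b.doc": "c3"}
monday = {"/data/a.xls": "c4"}
tuesday = {"/data/b.doc": None, "/data/c.ppt": "c5"}
print(synthesize_full(sunday, [monday, tuesday]))
# prints {'/etc/hosts': 'c1', '/data/a.xls': 'c4', '/data/c.ppt': 'c5'}
```

Only the incrementals (c4 and c5) ever crossed the network after Sunday; the synthetic full simply points at copies that are already in backup storage.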
Backups for server virtualisation
Server virtualisation has been all the rage over the past couple of years. It has helped businesses cut down on the number of physical servers and speed up the creation of new servers and the rollout of new applications and services. But virtualised servers need backing up, and on that front things are only now beginning to emerge from a difficult period.
The key issue is that, with many virtual servers packed onto a few physical machines, the I/O load on each host increases significantly during backups. Despite this, many users back up virtualised servers as if they were physical: they install agents on them, back up as normal and live with the increased I/O load and its consequences.
To address the issue, VMware introduced VMware Consolidated Backup (VCB), which aimed to take the backup I/O burden off ESX servers by moving it to a proxy server. It was, however, an inelegant solution: VCB required a two-step backup process and two-step restores. Many VMware users opted not to use VCB on its own and plugged in point products designed specifically for virtualised environments, such as PHD Virtual Technologies' esXpress, Veeam Backup & Replication and Vizioncore's vRanger Pro, to make virtual server backups easier.
Microsoft Hyper-V users have also tended to treat their virtual servers as if they were physical.
But the VMware backup scene is now changing. The company has discontinued VCB, and with its vSphere platform it introduced the VMware vStorage APIs for Data Protection (VADP). VADP doesn't perform backups itself, but it enables block-level incremental backups and integration with backup products. EMC Avamar and Symantec NetBackup support VADP, and CommVault, EMC NetWorker and IBM TSM are working on integrating with the VMware APIs.
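Block-level incremental backup in vSphere relies on changed block tracking, in which the hypervisor records which disk blocks have been written since the last backup so the backup product can copy only those. The Python sketch below models that idea with a hypothetical `TrackedDisk` class; it is not the VADP API, whose calls and data structures are defined by VMware's own SDK.

```python
class TrackedDisk:
    """Toy virtual disk that records which blocks change between backups,
    in the spirit of changed block tracking (hypothetical, not VADP)."""

    def __init__(self, num_blocks: int, block_size: int = 512) -> None:
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        self.changed = set(range(num_blocks))  # first backup must copy every block

    def write(self, block: int, data: bytes) -> None:
        if len(data) != self.block_size:
            raise ValueError("writes must be exactly one block")
        self.blocks[block] = data
        self.changed.add(block)                # remember the block is dirty

    def incremental_backup(self) -> dict:
        """Copy only blocks written since the last backup, then reset tracking."""
        delta = {b: self.blocks[b] for b in sorted(self.changed)}
        self.changed.clear()
        return delta


disk = TrackedDisk(num_blocks=1000)
print(len(disk.incremental_backup()))      # prints 1000 (first pass is effectively a full)
disk.write(7, b"x" * 512)
disk.write(42, b"y" * 512)
print(sorted(disk.incremental_backup()))   # prints [7, 42]
```

Because the dirty-block list is kept as writes happen, the backup never has to scan or hash the whole virtual disk, which is what keeps the I/O load on a crowded host down.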
Open source data backup
The last couple of years have seen the rise of a number of commercially available open source data backup applications. These products are almost certainly not suitable for large and complex enterprise environments, but small and medium-sized businesses (SMBs) can take advantage of them at a considerable cost saving over other products. The key is to check whether a product supports the features you require and to do your homework on usability and support.
Open source data backup software products include BackupPC, which offers file-level data deduplication: a hashing algorithm checks for repeated files and replaces duplicates with a link. Zmanda, whose installed base is growing, is the commercial version of Amanda (the Advanced Maryland Automatic Network Disk Archiver), a Unix-based backup application developed at the University of Maryland. Finally, there is Bacula, which can back up nearly all operating systems, including Unix, Linux and Mac OS.
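The hash-and-link approach to file-level deduplication can be sketched in a few lines of Python. The `pool_files` function below is hypothetical and much simpler than BackupPC's real pool, which has its own hashing scheme and directory layout. It assumes a POSIX-style filesystem with hard-link support and a pool directory on the same filesystem as the backups.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file's contents, read in 64 KB pieces."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for piece in iter(lambda: f.read(65536), b""):
            h.update(piece)
    return h.hexdigest()


def pool_files(backup_dir: Path, pool_dir: Path) -> int:
    """Hypothetical file-level deduplication: the first copy of any content seeds
    a pool, and later identical files are replaced by hard links to it.
    pool_dir must be on the same filesystem as backup_dir but outside it.
    Returns the number of bytes of duplicate files replaced by links."""
    pool_dir.mkdir(parents=True, exist_ok=True)
    replaced = 0
    for path in sorted(p for p in backup_dir.rglob("*") if p.is_file()):
        pooled = pool_dir / file_digest(path)
        if not pooled.exists():
            os.link(path, pooled)  # first time this content is seen
        elif not os.path.samefile(path, pooled):
            replaced += path.stat().st_size
            tmp = path.with_name(path.name + ".pooltmp")
            os.link(pooled, tmp)
            os.replace(tmp, path)  # atomically swap the duplicate for a link
    return replaced


with tempfile.TemporaryDirectory() as tmp_dir:
    root = Path(tmp_dir)
    for pc in ("pc1", "pc2"):
        (root / "backup" / pc).mkdir(parents=True)
        (root / "backup" / pc / "report.doc").write_bytes(b"A" * 10_000)
    saved = pool_files(root / "backup", root / "pool")
    linked = os.path.samefile(root / "backup" / "pc1" / "report.doc",
                              root / "backup" / "pc2" / "report.doc")
    print(saved, linked)  # prints 10000 True
```

Linking into a temporary name and then calling `os.replace` means the duplicate is never deleted before its replacement link exists.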