Host-based replication
While the lines of distinction among data protection technologies such as backup, continuous data protection and replication have blurred, host-based replication can play a key role in your overall data protection strategy.
The lines between data replication and data protection software products are becoming blurred as storage managers wrestle with the familiar trinity of data protection: centralized backup, disaster recovery (DR) and business continuity. With some industry observers predicting a convergence of backup, continuous data protection (CDP) and data replication software in the next few years, choosing a host-level (server-based) data replication product has become much more complicated.
The similarities that host-level replication software products have with backup and CDP products contribute to the difficulty in understanding where one ends and the other begins. For instance, CA XOsoft's WANSync replication software suite includes options that could allow it to fit into all three of these categories. Its WANSync high-availability option (WANSyncHA) provides for automatic failover and DR for Exchange, SQL Server, Oracle and file-serving environments. Its Enterprise Rewinder option includes CDP functionality that provides point-in-time recoveries and lets users consolidate remote-office backup, while its WANSync Assured Recovery feature lets users perform DR testing on standby servers without disrupting their production servers.
Check out this chart that compares CDP, data protection, and replication software.
Yet there are caveats related to using host-level replication software instead of backup or CDP software. Not all host-level replication software products catalog or track the copied versions of replicated data. This necessitates the continued use of backup software at the central site to maintain older copies of the replicated data. Vendors' definitions and deployments of CDP also differ. Steve Duplessie, founder and senior analyst at Milford, MA-based Enterprise Strategy Group, refers to the features of products like EMC Corp.'s RepliStor as "kCDP" or "kinda CDP" because while they let users create multiple point-in-time data snapshots, they differ from true CDP functionality where each write I/O is journaled and users can recover data from any specified point in time.
Host-level replication software is available in three distinct architectures: Windows-only file system, multi-OS file system and multi-OS block-level products. Each alternative offers specific features that make it a better fit for some types of data protection. Key features like real-time failover, bidirectional replication, and tools for sizing bandwidth and data compression make some host-level replication software products better equipped for enterprise deployments.
Replication for DR
A key way users deploy host-level replication software is for DR. Double-Take Software Inc. estimates that approximately 75% of its customers use its replication software for DR or remote-application availability. However, different host-level replication software products, such as Symantec Corp.'s Veritas Replication Exec and Veritas Volume Replicator (VVR), satisfy different data protection requirements. Veritas Replication Exec is a Windows-only product that operates at the file-system level and allows administrators to configure one central DR server to receive replicated data from multiple servers at different sites. Like most Windows-only replication products, Veritas Replication Exec supports only asynchronous replication, so some data may be lost if you need to switch over to the DR server.
VVR is a block-based product that works with a variety of OSes and permits real-time synchronization of data between source and target servers, but it has its own set of restrictions. First, admins must have Veritas Storage Foundation installed on both the source and target servers, and must convert the volumes to Veritas-managed volumes. Second, all changes to data on the volumes selected for replication must be sent. Finally, the servers must be within 80km of one another for the app on the source server to proceed without waiting on write confirmations from the target server.
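A rough propagation-delay estimate shows why distance matters for synchronous replication. The sketch below is illustrative only: it assumes signals travel through optical fiber at roughly 200km per millisecond and ignores switching, protocol and disk latency, so real write-confirmation delays will be higher. The function name and constant are hypothetical and aren't part of VVR.

```python
# Assumption: signal speed in optical fiber is about 200 km per millisecond.
FIBER_KM_PER_MS = 200.0


def round_trip_ms(distance_km: float) -> float:
    """Minimum time for a write to reach the target and for its confirmation to return."""
    return 2 * distance_km / FIBER_KM_PER_MS


if __name__ == "__main__":
    for km in (80, 800, 4000):
        delay = round_trip_ms(km)
        print(f"{km:>5} km: at least {delay:.1f} ms added to every synchronous write")
```

Under these assumptions, the floor stays below a millisecond at 80km but grows to about 40 ms at 4,000km, which is why longer links usually fall back to asynchronous replication.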
Bunker Replication, a feature that uses a combination of synchronous and asynchronous replication within Symantec's VVR 4.0, lets you exceed the 80km limitation. It synchronously replicates data to a bunker site that's within 80km of the primary site while asynchronously replicating data to the server at the more distant DR site. The trick to this approach is that the bunker site houses only the data differential between the primary and DR sites. In this way, if the primary site fails, the bunker site automatically updates the DR site with the data that would otherwise be lost when using only asynchronous replication.
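The bunker idea can be sketched as a toy model. This isn't how VVR is implemented; it assumes a single in-memory log stands in for the bunker site, that the log holds only writes the DR site hasn't yet applied, and that every class and method name here is hypothetical.

```python
from collections import deque


class BunkerReplicationSketch:
    """Toy model: synchronous log at a nearby bunker, asynchronous copy at a distant DR site."""

    def __init__(self):
        self.bunker_log = deque()  # writes not yet applied at DR (the differential)
        self.dr_data = {}          # state of the distant DR site

    def write(self, key, value):
        # Synchronous leg: the write is logged at the bunker before the app continues.
        self.bunker_log.append((key, value))

    def async_drain(self, max_writes):
        # Asynchronous leg: apply up to max_writes at DR, oldest first,
        # and drop each one from the bunker log once it has been applied.
        for _ in range(min(max_writes, len(self.bunker_log))):
            key, value = self.bunker_log.popleft()
            self.dr_data[key] = value

    def primary_failed(self):
        # The bunker replays whatever the DR site has not yet received.
        replayed = len(self.bunker_log)
        self.async_drain(replayed)
        return replayed


if __name__ == "__main__":
    sites = BunkerReplicationSketch()
    for i in range(5):
        sites.write(f"block{i}", i)
    sites.async_drain(3)  # the DR site lags two writes behind
    print("writes replayed after failure:", sites.primary_failed())
```

With asynchronous replication alone, the two lagging writes in this example would be lost when the primary site fails; here the bunker log supplies them.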
The hybrid real-time data replication feature of BakBone Software Inc.'s NetVault:Replicator offers near real-time availability for apps. But it commits writes to the local server without a guarantee that the write was committed to the remote server, asynchronously transmitting the data after the write is complete. Though this technique minimizes app interruption, it's not guaranteed that data on the remote server is consistent with data on the host server and in a recoverable form.
Check out the pros and cons of different replication architectures.
Two-way replication
Bidirectional replication--when servers at both sites send and receive data--is another key consideration when selecting a host-level replication software product. With bidirectional replication, a company can run different apps on the servers at each site, but also use those same servers to protect data from the other site.
File-system and block-based products give admins different options for handling replication between the target and source server. For instance, BakBone's file-system-based NetVault:Replicator includes a global bidirectional parameter, which lets admins select specific directories on the source server they want to include or exclude for replication to the target server and vice versa. This prevents any changes to files on the target server from being replicated to the source server. NetVault:Replicator also includes an option that lets an admin configure all files in all selected directories on both the source and target servers to stay in sync regardless of which server a file change occurred on.
One of the key benefits that file-system products offer over block-based products for bidirectional replication is that they require less storage. Block-based products, such as Softek Storage Solutions Corp.'s Replicator, require dedicated volumes of at least the same size on both the source and target servers. For example, if the source server has a 100GB volume and the target server has a 500GB volume, bidirectional replication requires the source server to add a 500GB volume to receive the target server's data and the target server to add a 100GB volume to receive the source server's data. These sizes apply regardless of how much of each replicated volume is actually in use.
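The volume arithmetic for block-based bidirectional replication can be written out as a short sketch. It assumes each replica volume is exactly the size of the volume it mirrors (the minimum the article describes); the function and key names are hypothetical.

```python
def bidirectional_volume_plan(source_gb: int, target_gb: int) -> dict:
    """Return the volumes each server needs when both replicate to each other at block level."""
    return {
        "source": {
            "production_gb": source_gb,
            "replica_gb": target_gb,  # dedicated volume receiving the target's data
            "total_gb": source_gb + target_gb,
        },
        "target": {
            "production_gb": target_gb,
            "replica_gb": source_gb,  # dedicated volume receiving the source's data
            "total_gb": target_gb + source_gb,
        },
    }


if __name__ == "__main__":
    plan = bidirectional_volume_plan(100, 500)
    for server, sizes in plan.items():
        print(server, sizes)
```

For the 100GB and 500GB volumes in the example above, each server ends up provisioning 600GB, however little data the volumes actually hold.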
Replication sizing
Companies that need to replicate data between sites must determine the right amount of storage to purchase for the remote site and how much bandwidth is required between sites. Replication software products include a variety of tools that tell how much and how often data changes, and when it changes; based on those statistics, the software recommends ideal levels of storage and bandwidth.
Double-Take Software's Throughput Diagnostics Utility simulates the replication set and makes recommendations on how to configure the replication connection. Operating in this mode, Double-Take doesn't replicate data to the target server, but instead tracks all I/O operations on the source server and estimates the amount of bandwidth needed.
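The kind of estimate a sizing tool produces can be approximated by hand. The sketch below isn't Double-Take's algorithm; it assumes you've already sampled how many bytes changed on the source server during each fixed interval, and it ignores compression and protocol overhead. The function name and sample figures are hypothetical.

```python
def required_mbps(changed_bytes_per_interval, interval_seconds):
    """Convert sampled change volumes into average and peak bandwidth in megabits per second."""
    if not changed_bytes_per_interval:
        raise ValueError("at least one sample is required")
    rates = [
        changed * 8 / interval_seconds / 1_000_000
        for changed in changed_bytes_per_interval
    ]
    return {"average_mbps": sum(rates) / len(rates), "peak_mbps": max(rates)}


if __name__ == "__main__":
    # Hypothetical samples: bytes changed in three consecutive 60-second windows.
    samples = [75_000_000, 150_000_000, 37_500_000]
    print(required_mbps(samples, interval_seconds=60))
```

A link sized to the average keeps up over time but lets the target fall behind during bursts, while a link sized to the peak keeps the target current at a higher cost.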
Double-Take also offers low, medium or high data compression levels. The appropriate level of compression for each server is determined by establishing what additional amount of overhead the server can support. Because each higher level of compression adds about 5% more overhead to the server, a server running replication and the highest level of compression could experience CPU overhead rates as high as 15% to 20%.
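The overhead figures above can be expressed as simple arithmetic. The sketch assumes overhead grows linearly at about 5% per compression level, and it treats the overhead of replication itself as a separate input; that baseline parameter is an assumption introduced here to account for the 15% to 20% range, not a vendor figure. All names are hypothetical.

```python
OVERHEAD_PER_LEVEL_PCT = 5.0  # from the article: about 5% per compression level
COMPRESSION_LEVELS = {"none": 0, "low": 1, "medium": 2, "high": 3}


def estimated_cpu_overhead_pct(level: str, replication_base_pct: float = 0.0) -> float:
    """Estimate CPU overhead as an assumed replication baseline plus 5% per compression level."""
    return replication_base_pct + COMPRESSION_LEVELS[level] * OVERHEAD_PER_LEVEL_PCT


if __name__ == "__main__":
    for base in (0.0, 5.0):
        pct = estimated_cpu_overhead_pct("high", replication_base_pct=base)
        print(f"high compression, {base:.0f}% baseline: about {pct:.0f}% CPU overhead")
```

Comparing the estimate for each level with the CPU headroom a server has is one way to decide which level it can support.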
Softek's Replicator can be set to run in tracking or pass-through mode. In this mode, Replicator is installed on the source server, but only simulates data replication while gathering the statistics needed to determine how long replication will take. One of the parameters that Replicator monitors is the packet chunk size, which is the size of the buffer that gets sent by the source server to the target server. Softek finds that smaller packets work best when replicating data over long distances. Tuning the source server to send smaller packets reduces the time it takes the source server to resend a packet that is dropped.
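The effect of chunk size on retransmission can be estimated with a simple model. This is a sketch rather than Softek's method: it assumes the cost of resending a dropped chunk is the time to put it back on the wire plus one round trip, and the function name, link speed and round-trip time are hypothetical.

```python
def resend_seconds(chunk_bytes: int, link_mbps: float, rtt_ms: float) -> float:
    """Approximate time to retransmit one dropped chunk: serialization time plus one round trip."""
    serialization = chunk_bytes * 8 / (link_mbps * 1_000_000)
    return serialization + rtt_ms / 1000


if __name__ == "__main__":
    # Hypothetical long-distance link: 10 Mbps with an 80 ms round trip.
    for label, size in (("64KB", 64 * 1024), ("1MB", 1024 * 1024)):
        print(f"{label} chunk: about {resend_seconds(size, 10, 80):.3f} s to resend")
```

On a slow, long-distance link the serialization term dominates for large chunks, so each dropped chunk costs far more time to resend than a small one.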
Symantec's Veritas Volume Replicator Advisor (VRAdvisor) uses one of the following methods to collect data for analysis and then makes recommendations:
- VRAdvisor can be installed on the source server.
- If Veritas Volume Manager is installed on the server, administrators may use Veritas Volume Manager's vxstat command to collect data.
- Administrators may also collect the data using native OS command scripts provided with VRAdvisor. In AIX environments, VRAdvisor uses the lvmstat command; on HP-UX and Linux servers it uses sar; and on Sun Solaris OSes it uses iostat.
Lowering replication costs
Optimizing the target site's storage and bandwidth requirements isn't the only way replication software allows organizations to lower operational costs. Replication software lets organizations use virtual OSes such as VMware or Microsoft Corp.'s Virtual Server in conjunction with their replication software. One large server running a virtual OS can host multiple guest OSes, with each guest OS acting as a target for each of the remote sites. Though each guest OS still requires a licensed copy of the replication software, most replication software vendors discount their replication software if it's deployed on a server supporting a virtual OS with multiple guest OSes.
Some organizations choose to cut costs by using replication software in lieu of backup software at their remote sites. With replication software copying all remote data to a central location, companies may then use their central backup software to back up the replicated data rather than the data on the source server. Using replication software in this manner also allows for faster recoveries, but most replication and backup software products aren't yet integrated with one another. This may become problematic when users at remote sites need to recover older copies. If the remote users no longer have access to the backup software, they'll need to rely on someone else to locate and restore their data at the central site.
Another way replication software may allow users to lower costs is by facilitating data migrations during server or storage technology refreshes. Data migrations are usually viewed as one-time data movements, while data replication is considered an ongoing process. All data replication products include a number of options to help users create the initial replica of data at the remote site (see "Comparing backup, continuous data protection and replication software"). This is important when users need to migrate large amounts of data to a remote site, but lack server/network resources to complete the migration in a timely manner. The integration of replication, CDP and backup software lets organizations achieve new levels of near real-time local and remote data recovery. Host-level replication software (see "Pros and cons of different replication architectures") is a good option for organizations to lower data protection costs and better protect the company's data.
About the author: Jerome M. Wendt is lead analyst and president of DCIG Inc.