While the lines of distinction among data protection
technologies such as backup, continuous data protection and
replication have blurred, host-based replication can play a key
role in your overall data protection strategy.
The lines between data replication and data protection software
products are becoming blurred as storage managers wrestle with the
familiar trinity of data protection: centralized backup, disaster
recovery (DR) and
business continuity. With some industry
observers predicting a convergence of backup,
continuous data protection (CDP) and data
replication software in the next few years, choosing a
host-level (server-based) data replication product has become
much more complicated.
The similarities that host-level replication software products
have with backup and CDP products contribute to the difficulty in
understanding where one ends and the other begins. For instance, CA
XOsoft's WANSync replication software suite includes options that
could allow it to fit into all three of these categories. Its
WANSync high-availability option (WANSyncHA) provides for automatic
failover and DR for Exchange, SQL Server, Oracle and file-serving
environments. Its Enterprise Rewinder option includes CDP
functionality that provides point-in-time recoveries and lets users
consolidate remote-office backup, while its WANSync Assured
Recovery feature lets users perform DR testing on standby servers
without disrupting their production servers.
Check out this chart that compares
CDP, data protection, and
replication software.
Yet there are caveats related to using host-level replication
software instead of backup or CDP software. Not all host-level
replication software products catalog or track the copied versions
of replicated data. This necessitates the continued use of backup
software at the central site to maintain older copies of the
replicated data. Vendors' definitions and deployments of CDP also
differ. Steve Duplessie, founder and senior analyst at Milford,
MA-based Enterprise Strategy Group, refers to the features of
products like EMC Corp.'s RepliStor as "kCDP" or "kinda CDP"
because while they let users create multiple point-in-time data
snapshots, they differ from true CDP functionality where each write
I/O is journaled and users can recover data from any specified
point in time.
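The difference between journaled CDP and periodic snapshots can be illustrated with a short Python sketch. It is a toy model, not any vendor's implementation: the WriteJournal class, the snapshot_recover function and the sample timestamps are all hypothetical names and values chosen for illustration.

```python
from bisect import bisect_right


class WriteJournal:
    """Toy CDP-style journal: every write is kept with its timestamp."""

    def __init__(self):
        self._times = []   # write timestamps, ascending
        self._writes = []  # (block, data) pairs, parallel to _times

    def record(self, timestamp, block, data):
        # Assumption: writes arrive in timestamp order.
        self._times.append(timestamp)
        self._writes.append((block, data))

    def recover(self, point_in_time):
        """Rebuild the volume image as of any requested time."""
        upto = bisect_right(self._times, point_in_time)
        image = {}
        for block, data in self._writes[:upto]:
            image[block] = data  # later writes overwrite earlier ones
        return image


def snapshot_recover(snapshots, point_in_time):
    """Snapshot-style recovery: the newest snapshot taken at or before the time."""
    eligible = [t for t in snapshots if t <= point_in_time]
    return dict(snapshots[max(eligible)]) if eligible else {}


journal = WriteJournal()
journal.record(1, "A", "v1")
journal.record(5, "A", "v2")
journal.record(7, "B", "v1")
journal.record(9, "A", "v3")

# Hypothetical snapshot schedule: one at t=4 and one at t=8.
snapshots = {4: journal.recover(4), 8: journal.recover(8)}

from_journal = journal.recover(6)                  # includes the write at t=5
from_snapshot = snapshot_recover(snapshots, 6)     # falls back to the t=4 image
print(from_journal, from_snapshot)
```

In this example a recovery to time 6 from the journal includes the write made at time 5, while the snapshot-based recovery can only return the image captured at time 4.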
Host-level replication software is available in three distinct
architectures: Windows-only file system, multi-OS file system and
multi-OS block-level products. Each alternative offers specific
features that make it a better fit for some types of data
protection. Key features like real-time failover, bidirectional
replication, and tools for sizing bandwidth and data compression
make some host-level replication software products better equipped
for enterprise deployments.
Replication for DR
A key way users deploy host-level replication software is for DR.
Double-Take Software Inc. estimates that approximately 75% of its
customers use its replication software for DR or remote-application
availability. However, different host-level replication software
products, such as Symantec Corp.'s Veritas Replication Exec and
Veritas Volume Replicator (VVR), satisfy different data protection
requirements. Veritas Replication Exec is a Windows-only product
that operates at the file-system level and allows administrators to
configure one central DR server to receive replicated data from
multiple servers at different sites. Like most Windows products,
Veritas Replication Exec supports only asynchronous replication, so
some data may be lost if you need to switch over to the DR
server.
VVR is a block-based product that works with a variety of OSes
and permits real-time synchronization of data between source and
target servers. It has its own set of restrictions. First,
admins must have Veritas Storage Foundation installed on both the
source and target servers, and convert the volumes to
Veritas-managed volumes. Second, all changes to data on the volumes
selected for replication must be sent. Finally, the servers must be
within 80km of one another for the app on the source server to
proceed without waiting on write confirmations from the target
server.
Bunker Replication, a feature that uses a combination of
synchronous and asynchronous replication within Symantec's VVR 4.0,
lets you exceed the 80km limitation. It synchronously replicates
data to the bunker site that's within 80km of the primary site
while asynchronously replicating data to the server at the more
distant DR site. The trick to this approach is that the bunker site
houses only the data differential between the primary and DR sites.
In this way, if the primary site fails, the bunker site
automatically updates the DR site with the data that's normally
lost using only asynchronous replication.
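The bunker idea can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Symantec's implementation: the BunkerReplication class and its method names are hypothetical, and a single queue stands in for both the bunker's log and the asynchronous link to the DR site.

```python
from collections import deque


class BunkerReplication:
    """Toy model: the bunker holds only writes the DR site has not yet received."""

    def __init__(self):
        self.bunker_log = deque()  # (sequence, block, data) pending at DR
        self.dr_data = {}          # state of the distant DR site
        self.next_seq = 0

    def write(self, block, data):
        # Synchronous step: the write is logged at the nearby bunker
        # before the application on the primary continues.
        self.bunker_log.append((self.next_seq, block, data))
        self.next_seq += 1

    def async_ship(self, count):
        # Asynchronous step: the oldest pending writes reach the DR site,
        # after which the bunker no longer needs to keep them.
        for _ in range(min(count, len(self.bunker_log))):
            _, block, data = self.bunker_log.popleft()
            self.dr_data[block] = data

    def primary_failed(self):
        # On primary failure the bunker replays whatever DR is missing.
        replayed = len(self.bunker_log)
        self.async_ship(replayed)
        return replayed


repl = BunkerReplication()
for i in range(5):
    repl.write(f"blk{i}", f"data{i}")

repl.async_ship(2)               # DR lags: only 2 of 5 writes have arrived
lagging = len(repl.bunker_log)   # writes that async-only replication would lose
replayed = repl.primary_failed()
print(lagging, replayed, len(repl.dr_data))
```

In this run three writes are still in flight when the primary fails, and the bunker replays them so the DR site ends up with all five.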
The hybrid real-time data replication feature of BakBone
Software Inc.'s NetVault:Replicator offers near real-time
availability for apps. But it commits writes to the local server
without a guarantee that the write was committed to the remote
server, asynchronously transmitting the data after the write is
complete. Though this technique minimizes app interruption, it's
not guaranteed that data on the remote server is consistent with
data on the host server and in a recoverable form.
Check out the pros and cons of
different replication
architectures.
Two-way replication
Bidirectional replication--when servers at both sites send and
receive data--is another key consideration when selecting a
host-level replication software product. With bidirectional
replication, a company can run different apps on the servers at
each site, but also use those same servers to protect data from the
other site.
File-system and block-based products give admins different
options for handling replication between the target and source
server. For instance, BakBone's file-system-based
NetVault:Replicator includes a global bidirectional parameter,
which lets admins select specific directories on the source server
they want to include or exclude for replication to the target
server and vice versa. This prevents any changes to files on the
target server from being replicated to the source server.
NetVault:Replicator also includes an option that lets an admin
configure all files in all selected directories on both the source
and target servers to stay in sync regardless of which server a
file change occurred on.
One of the key benefits that file-system products offer over
block-based products for bidirectional replication is that they
require less storage. Block-based products, such as Softek Storage
Solutions Corp.'s Replicator, require dedicated volumes of at least
the same size on both the source and target servers. If the source
server has a 100GB volume and the target server has a 500GB volume,
the source server then needs a 500GB volume to receive replicated
data from the target server; the target server would need a 100GB
volume for the source server's data for bidirectional replication
to work, regardless of what the actual utilization is on the
volumes being replicated.
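The storage arithmetic in that example can be written out as a short Python sketch. The function names are hypothetical, and the file-level calculation rests on an assumption the article does not quantify: that a file-system replica consumes roughly the space the peer actually uses rather than the peer's full volume size. The 120GB utilization figure is invented for illustration.

```python
def block_level_capacity_gb(own_volume_gb, peer_volume_gb):
    """Capacity one server needs: its own volume plus a dedicated
    replica volume at least as large as the peer's volume."""
    return own_volume_gb + peer_volume_gb


def file_level_capacity_gb(own_volume_gb, peer_used_gb):
    """Assumption: a file-level replica needs about the peer's used space."""
    return own_volume_gb + peer_used_gb


# Source has a 100GB volume, target has a 500GB volume (from the text).
source_block = block_level_capacity_gb(100, 500)
target_block = block_level_capacity_gb(500, 100)

# Hypothetical utilization: only 120GB of the target's 500GB is in use.
source_file = file_level_capacity_gb(100, 120)

print(source_block, target_block, source_file)
```

Under these assumptions each server needs 600GB with the block-based approach, while the source server would need about 220GB with the file-level approach.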
Replication sizing
Companies that need to replicate data between sites must determine
the right amount of storage to purchase for the remote site and how
much bandwidth is required between sites. Replication software
products include a variety of tools that tell how much and how
often data changes, and when it changes; based on those statistics,
the software recommends ideal levels of storage and bandwidth.
Double-Take Software's Throughput Diagnostics Utility simulates
the replication set and makes recommendations on how to configure
the replication connection. Operating in this mode, Double-Take
doesn't replicate data to the target server, but instead tracks all
I/O operations on the source server and estimates the amount of
bandwidth needed.
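The kind of estimate such a tool produces can be approximated by hand. The Python sketch below is not Double-Take's algorithm; it simply converts measured change volumes into average and peak link speeds. The function name, the five-minute sampling interval and the sample figures are all assumptions made for illustration.

```python
def required_mbps(changed_bytes_per_interval, interval_seconds, compression_ratio=1.0):
    """Estimate average and peak bandwidth from observed data change.

    changed_bytes_per_interval: bytes written during each sampling interval
    compression_ratio: e.g. 2.0 if data is expected to shrink by half
    """
    if not changed_bytes_per_interval:
        raise ValueError("need at least one sample")
    if interval_seconds <= 0 or compression_ratio <= 0:
        raise ValueError("interval and compression ratio must be positive")

    def to_mbps(byte_count):
        bits_per_second = byte_count * 8 / interval_seconds
        return bits_per_second / 1_000_000 / compression_ratio

    average = sum(changed_bytes_per_interval) / len(changed_bytes_per_interval)
    return {
        "average_mbps": to_mbps(average),
        "peak_mbps": to_mbps(max(changed_bytes_per_interval)),
    }


# Hypothetical samples: bytes changed in four consecutive 5-minute intervals.
samples = [150_000_000, 600_000_000, 75_000_000, 375_000_000]
estimate = required_mbps(samples, interval_seconds=300)
print(estimate)
```

A link sized only for the average will fall behind during the busiest interval, so the gap between the two figures indicates how much buffering or extra bandwidth the replication connection needs.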
Double-Take also offers low, medium or high data compression
levels. The appropriate level of compression for each server is
determined by establishing what additional amount of overhead the
server can support. Because each higher level of compression adds
about 5% more overhead to the server, a server running replication
and the highest level of compression could experience CPU overhead
rates as high as 15% to 20%.
Softek's Replicator can be set to run in tracking or
pass-through mode. In this mode, Replicator is installed on the
source server, but simulates only data replication while gathering
the necessary statistics to make a determination of how long
replication will take. One of the parameters that Replicator
monitors is the packet chunk size, which is the size of the buffer
that gets sent by the source server to the target server. Softek
finds that smaller packets work best when replicating data over
long distances. By tuning the source server to send smaller
packets, it takes less time for the source server to resend a
packet in the event the packet is dropped.
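The effect of chunk size on resend time can be shown with a rough calculation. This Python sketch uses a deliberately simple model (time to put the chunk on the wire plus one round trip) and is not a description of how Softek's Replicator measures or tunes anything. The function name, link speed and latency are hypothetical.

```python
def resend_seconds(chunk_bytes, link_mbps, round_trip_ms):
    """Rough time to resend one dropped chunk over a WAN link."""
    serialization = chunk_bytes * 8 / (link_mbps * 1_000_000)
    return serialization + round_trip_ms / 1000


# Hypothetical 10Mbps link with an 80ms round trip.
large_chunk = resend_seconds(256 * 1024, link_mbps=10, round_trip_ms=80)
small_chunk = resend_seconds(32 * 1024, link_mbps=10, round_trip_ms=80)
print(round(large_chunk, 3), round(small_chunk, 3))
```

In this model the smaller chunk costs far less to resend, though very small chunks add per-packet overhead that the sketch does not account for.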
Symantec's Veritas Volume Replicator Advisor (VRAdvisor) uses
one of the following methods to collect data for analysis and then
makes recommendations:
- VRAdvisor can be installed on the source server.
- If Veritas Volume Manager is installed on the server,
administrators may use Veritas Volume Manager's vxstat
command to collect data.
- Administrators may also collect the data using native OS
command scripts provided with VRAdvisor. In AIX environments,
VRAdvisor uses the lvmstat command; on HP-UX and Linux
servers it uses sar; and on Sun Solaris OSes it uses
iostat.
VVR uses the Storage Replicator Log (SRL) to buffer writes before
replicating them. The size of the SRL will vary according to
whether VVR replicates data synchronously or asynchronously between
the servers. VRAdvisor also lets users create what-if scenarios so
they can simulate app growth by varying the parameters and then
recalculating the results.
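A what-if calculation of this kind can be sketched in Python. The model below is an assumption, not VRAdvisor's actual method: it treats the log as a buffer that fills whenever the write rate exceeds what the link can drain, and takes the largest backlog as the minimum log size for asynchronous replication. The function name, the growth parameter and the sample workload are hypothetical.

```python
def peak_backlog_mb(write_mb_per_interval, link_mb_per_interval, growth=1.0):
    """Largest backlog of unreplicated writes over the sampled period.

    growth scales the write workload for what-if scenarios (1.5 = 50% more).
    """
    backlog = 0.0
    peak = 0.0
    for written in write_mb_per_interval:
        # The backlog grows by what was written and shrinks by what the
        # link could ship in the same interval; it cannot go below zero.
        backlog = max(0.0, backlog + written * growth - link_mb_per_interval)
        peak = max(peak, backlog)
    return peak


# Hypothetical workload: MB written in five consecutive intervals,
# with a link able to ship 100MB per interval.
writes = [50, 200, 300, 40, 20]
today = peak_backlog_mb(writes, link_mb_per_interval=100)
after_growth = peak_backlog_mb(writes, link_mb_per_interval=100, growth=1.5)
print(today, after_growth)
```

With this sample workload the buffer must hold at least 300MB today and 550MB if the application's write volume grows by half, which shows why the size has to be recalculated as the workload changes.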
Lowering replication costs
Optimizing the target site's storage and bandwidth requirements
isn't the only way replication software allows organizations to
lower operational costs. Organizations can also run their
replication software on virtualization platforms such as VMware or
Microsoft Corp.'s Virtual Server. One large server
running virtualization software can host multiple guest OSes, with
each guest OS acting as a target for one of the remote sites. Though
each guest OS still requires a licensed copy of the replication
software, most replication software vendors discount their
replication software if it's deployed on a virtualized server
with multiple guest OSes.
Some organizations choose to cut costs by using replication
software in lieu of backup software at their remote sites. With
replication software copying all remote data to a central location,
companies may then use their central backup software to back up the
replicated data rather than the data on the source server. Using
replication software in this manner also allows for faster
recoveries, but most replication and backup software products
aren't yet integrated with one another. This may become problematic
when users at remote sites need to recover older copies. If the
remote users no longer have access to the backup software, they'll
need to rely on someone else to locate and restore their data at
the central site.
Another way replication software may allow users to lower costs
is by facilitating data migrations during server or storage
technology refreshes. Data migrations are usually viewed as
one-time data movements, while data replication is considered an
ongoing process. All data replication products include a number of
options to help users create the initial replica of data at the
remote site (see "Comparing backup, continuous data protection and
replication software"). This is important when users need to
migrate large amounts of data to a remote site, but lack
server/network resources to complete the migration in a timely
manner. The integration of replication, CDP and backup software
lets organizations achieve new levels of near real-time local and
remote data recovery. Host-level replication software (see "Pros
and cons of different replication architectures") is a good option
for organizations to lower data protection costs and better protect
the company's data.
About the author: Jerome M. Wendt
(Jerome.wendt@att.net) is lead analyst and president of DCIG
Inc.