Key backup and recovery considerations for big data storage

A streamlined data backup and recovery approach is important for Indian organizations planning to address big data storage. Here are a few aspects to consider.

With recent developments in data analytics, data is becoming more and more mission-critical to the Indian enterprise. The growing importance of information analysis means that Indian organizations clearly need a better approach to data backup, security and recovery.

For big data storage, the need of the hour is a simple, integrated approach with fast backup and recovery that spans structured and unstructured data. Storage administrators, in collaboration with database administrators (DBAs) and network administrator groups, should draw up new backup, restoration, archiving and disaster recovery strategies. Here are some key considerations:

1 Analyze database drift and segregate hot or cold data.

Track database drift, which many Indian businesses lose sight of. Classify data in big data storage as hot (regularly used) or cold (infrequently used). The key is to differentiate between active and passive records and to place data intelligently at appropriate locations. One way to do this is to segment large tables using partitioning technology, based on how frequently the data is accessed, which also allows parallel backup and recovery operations.
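As a rough illustration of the hot/cold split, the sketch below classifies table partitions by how often they were accessed in an observation window. The partition names, the access counts and the threshold of 100 are all made-up assumptions; a real deployment would take these figures from the database's own usage statistics.

```python
# Hypothetical threshold: partitions accessed at least this many times
# in the observation window are treated as "hot".
HOT_ACCESS_THRESHOLD = 100


def classify_partitions(access_counts, threshold=HOT_ACCESS_THRESHOLD):
    """Split partition names into (hot, cold) lists by access count."""
    hot = sorted(p for p, n in access_counts.items() if n >= threshold)
    cold = sorted(p for p, n in access_counts.items() if n < threshold)
    return hot, cold


# Illustrative access counts per table partition (invented numbers).
access_counts = {
    "orders_2024_q4": 5200,
    "orders_2024_q3": 340,
    "orders_2022_q1": 12,
    "orders_2021_q4": 3,
}

hot, cold = classify_partitions(access_counts)
print("hot:", hot)
print("cold:", cold)
```

Because each partition is classified independently, the hot and cold sets could then be backed up in parallel and on different schedules.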

2 Keep snapshot copies of data to offload backup operations from production resources, and explore NDMP for server-less backups.

Application-aware and storage-aware snapshots reduce backup windows significantly, while offloading server and network resources. A fast-resynchronization snapshot “breaks off” a tertiary mirror and mounts it on a backup server or storage media server. This yields a significant performance gain, because the original owner-host is bypassed and is not affected by backup operations. A “dirty region” bitmap records the intent to update the mirror with data. It can be used to track which blocks need to be resynchronized in the event of mirror failure.
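The dirty-region idea can be sketched in a few lines. This is a simplified in-memory model, not any vendor's implementation: the class name `MirrorTracker`, the region size and the use of Python lists as stand-ins for volumes are all assumptions made for illustration.

```python
class MirrorTracker:
    """Tracks which regions changed while a mirror is out of sync."""

    def __init__(self, num_blocks, region_size):
        self.region_size = region_size
        num_regions = (num_blocks + region_size - 1) // region_size
        self.dirty = [False] * num_regions  # one flag per region

    def record_write(self, block):
        # Mark the region containing this block as needing resync.
        self.dirty[block // self.region_size] = True

    def regions_to_resync(self):
        return [i for i, is_dirty in enumerate(self.dirty) if is_dirty]

    def resync(self, primary, mirror):
        # Copy only the dirty regions, then clear their flags.
        for region in self.regions_to_resync():
            start = region * self.region_size
            end = start + self.region_size
            mirror[start:end] = primary[start:end]
            self.dirty[region] = False


primary = list(range(16))
mirror = list(primary)              # mirror starts in sync
tracker = MirrorTracker(num_blocks=16, region_size=4)

for block, value in [(5, 99), (14, 77)]:
    primary[block] = value          # write lands on the primary only
    tracker.record_write(block)

pending = tracker.regions_to_resync()
tracker.resync(primary, mirror)
```

Only two of the four regions are copied during resynchronization, which is the saving the bitmap is meant to provide over a full mirror rebuild.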

Other snapshot techniques include ‘copy-on-write’, ‘cache’ and ‘apply-deferred’. These preserve the point-in-time image when a snapshot is taken for backup operations, so that data written afterwards in big data storage does not alter what the snapshot presents.
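To make the copy-on-write variant concrete, here is a minimal sketch under simplifying assumptions: the volume is a Python list of blocks, only one snapshot exists at a time, and the class name `CowVolume` is hypothetical. Before a block is overwritten for the first time after the snapshot, its original content is saved aside, so the snapshot view stays frozen while production writes continue.

```python
class CowVolume:
    """Toy volume with a single copy-on-write snapshot."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshot = None  # maps block index -> original content

    def take_snapshot(self):
        # Taking the snapshot copies nothing; it only starts tracking.
        self.snapshot = {}

    def write(self, index, data):
        # Preserve the original block the first time it is overwritten.
        if self.snapshot is not None and index not in self.snapshot:
            self.snapshot[index] = self.blocks[index]
        self.blocks[index] = data

    def read_snapshot(self, index):
        if self.snapshot is None:
            raise RuntimeError("no snapshot has been taken")
        # Unchanged blocks are read straight from the live volume.
        return self.snapshot.get(index, self.blocks[index])


vol = CowVolume(["a", "b", "c"])
vol.take_snapshot()
vol.write(1, "B")
vol.write(1, "B2")  # second write does not disturb the saved original

snapshot_view = [vol.read_snapshot(i) for i in range(3)]
```

A backup job can read `snapshot_view` at its own pace while the live volume keeps changing.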

3 Implement a data reduction strategy using embedded global deduplication and compression technologies.

Deduplication technologies eliminate redundant data, increasing network efficiency and reducing the time and capacity needed for backup, recovery and storage provisioning. This could cut the volume of data to be backed up and moved across enterprise networks by 40%. Deduplication breaks files into segments and stores a single copy of each unique segment, replacing redundant copies with references to it. For big data storage, integrate compression technologies into backup and recovery strategies to encode data in fewer bits using specific encoding schemes.
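The segment-and-store idea can be sketched with the Python standard library. This is an illustration only: fixed-size 4 KB segments, SHA-256 as the fingerprint and zlib for compression are assumptions chosen for brevity, and commercial products typically use their own segmenting and encoding schemes.

```python
import hashlib
import zlib

SEGMENT_SIZE = 4096  # assumed fixed segment size, in bytes


def dedup_store(data, store, segment_size=SEGMENT_SIZE):
    """Store each unique segment once (compressed); return the recipe."""
    recipe = []
    for offset in range(0, len(data), segment_size):
        segment = data[offset:offset + segment_size]
        digest = hashlib.sha256(segment).hexdigest()
        if digest not in store:
            store[digest] = zlib.compress(segment)
        recipe.append(digest)  # ordered references to stored segments
    return recipe


def restore(recipe, store):
    """Rebuild the original bytes from the recipe and the segment store."""
    return b"".join(zlib.decompress(store[d]) for d in recipe)


# Four segments, but only two distinct ones.
data = b"A" * (3 * SEGMENT_SIZE) + b"B" * SEGMENT_SIZE
store = {}
recipe = dedup_store(data, store)
restored = restore(recipe, store)
```

The recipe holds four references while the store holds two compressed segments, and restoring from the recipe returns the original bytes.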

4 Structure storage tiers to meet different retention and recovery needs.

Store critical snapshots near the original data for quick granular restore, even as you move older backups to less costly storage tiers. Data with business value, data required for compliance, and the small subset of data needed for legal discovery should each be positioned on an appropriate storage tier.


Tier 1:
  • Highly available, onsite.
  • High revolutions per minute (RPM), low-capacity, disk-based.
  • Fibre Channel.
  • Encrypted.

Tier 2:
  • Highly available.
  • Lower RPM, high capacity.
  • SATA.
  • Snapshots, integrated virtual tape libraries.

Tier 3:
  • Tape-based, offsite.
  • High capacity, slower performing.
  • Long-term historical preservation, archiving.
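A placement policy across the three groups of characteristics above (fast onsite disk, high-capacity disk, offsite tape) might be sketched as follows. The tier names and the 7-day and 90-day thresholds are purely illustrative assumptions; actual cut-offs depend on each organization's retention and recovery requirements.

```python
def choose_tier(age_days, critical=False):
    """Pick a storage tier for a backup copy (hypothetical policy)."""
    # Critical snapshots and very recent backups stay near the source data.
    if critical or age_days <= 7:
        return "fast_onsite_disk"
    # Older backups move to cheaper, higher-capacity disk.
    if age_days <= 90:
        return "high_capacity_disk"
    # Everything else goes to offsite tape for long-term retention.
    return "offsite_tape"


for age in (1, 30, 400):
    print(age, "days ->", choose_tier(age))
```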

5 Rely more on automation, role-based security and data governance.

Secure data, both ‘in-flight’ and ‘at rest’, with good data protection strategies and appropriate use of encryption. Intuitive reporting and smart alerts give a deeper understanding of backup and recovery environments. Minimize manual activities through policy-based approaches and centralized administration.

Many big data storage backup systems take an “all-or-nothing” approach to administrative authorization: a user can either do everything within the backup system or nothing at all. Role-based access, by contrast, grants each user only the set of privileges appropriate to the role.
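A role-to-privilege mapping of this kind can be sketched very simply. The role names and actions below are hypothetical examples, not taken from any particular backup product.

```python
# Hypothetical roles and the actions each one may perform.
ROLE_PRIVILEGES = {
    "backup_operator": {"run_backup", "view_reports"},
    "restore_operator": {"run_restore", "view_reports"},
    "auditor": {"view_reports"},
    "backup_admin": {"run_backup", "run_restore", "view_reports", "edit_policy"},
}


def is_allowed(role, action):
    """Return True only if the role explicitly includes the action."""
    return action in ROLE_PRIVILEGES.get(role, set())


print(is_allowed("backup_operator", "run_backup"))
print(is_allowed("backup_operator", "edit_policy"))
```

Unknown roles map to an empty privilege set, so the check denies by default.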

6 Implement recovery management tools that enable granular recovery of files from any storage tier, improving recovery time objective (RTO) and recovery point objective (RPO) SLAs.

The recovery strategy for big data storage should be practical to implement and integrate with operations, so that the required service recovery objectives (RPO as well as RTO) are met. For instance, if an RPO SLA specifies that no more than two hours of data may be lost in an outage, but backups take six hours to complete, then backups alone clearly cannot meet this SLA.
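The arithmetic behind this example can be sketched under a simplified model. Assume each protection copy captures the data as it stood when the copy started, and that a copy is only usable once it has completed. The worst case is then a failure just before a copy finishes, leaving one full interval plus one copy duration unprotected. The function names and the sample figures for snapshots are assumptions for illustration.

```python
def worst_case_loss_hours(interval_hours, duration_hours):
    """Worst-case data loss under the simplified model described above."""
    return interval_hours + duration_hours


def meets_rpo(interval_hours, duration_hours, rpo_hours):
    """True if the worst-case loss stays within the RPO."""
    return worst_case_loss_hours(interval_hours, duration_hours) <= rpo_hours


RPO_HOURS = 2

# Back-to-back six-hour backups: a new one starts as the last one ends.
backups_ok = meets_rpo(interval_hours=6, duration_hours=6, rpo_hours=RPO_HOURS)

# Hourly snapshots that complete in about three minutes (assumed figure).
snapshots_ok = meets_rpo(interval_hours=1, duration_hours=0.05, rpo_hours=RPO_HOURS)

print("six-hour backups meet RPO:", backups_ok)
print("hourly snapshots meet RPO:", snapshots_ok)
```

Under these assumptions the six-hour backups miss a two-hour RPO by a wide margin, while frequent snapshots fall within it.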

For an environment whose RPO requirements allow only minimal data loss, the styles of protection that may need to be added include:

  • Snapshots taken periodically through the day.
  • Asynchronous replication between sites.
  • Synchronous replication between sites.


About the author: Vijay Veerapaneni is principal consultant with the infrastructure managed services group at Syntel. He has over 14 years of experience in infrastructure management across various technologies. He holds a bachelor’s degree in engineering and has earned multiple industry accreditations including ITIL and Oracle Certified Professional.
