To manage your backup load, identify your data

In an ever more pressurised IT world, where data grows exponentially year on year, do we really know what data we are securing every day, week or month? Even more pertinent, do we know what data we are retaining?

Not knowing what data is being backed up is a problem for a vast number of the businesses I have worked with over the past year. The storage team is trying to back up more data while under additional pressure to perform. Add compliance and unclear data retention policies to the mix, and the world of the storage team becomes very fraught.

What can be done to alleviate the load in terms of backup? We could all throw more technology at the problem, to try and solve the backup window issue. Some examples:

  • Point-in-time copies/snapshots: Disk vendors provide point-in-time copy capability, which can be used to secure that data before streaming it down to tape from a mount server.
  • De-duplication technology: This effectively reduces the amount of data you back up, with RPO benefits as well as potential cost and footprint savings.

These two examples involve investment costs in terms of hardware/licensing purchases and training, as well as deployment costs for the storage team and potentially the application/server platform support teams. The more pragmatic course of action is to identify what you are backing up on a daily, weekly and monthly basis, to understand if you can remove unnecessary data from the backup schedule.

I visited a customer recently to investigate their poorly performing backup infrastructure, in which a number of servers were failing to meet backup windows and data was not being secured in a timely manner. One of the causes was the content of the backups. Whilst there were defined file types that should not have been backed up, these policies were not being enforced or policed. Other issues included DBAs dumping full copies of one or many databases onto disk, which were then also captured as part of the backup process.
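
One way to check whether an exclusion policy like this is actually being followed is to total the data on a server by file type before it reaches the backup. The following is a minimal Python sketch, not taken from any backup product: the extension list and the function names are illustrative assumptions, and the real list should come from your own policy.

```python
import os
from collections import defaultdict

# Hypothetical exclusion list; substitute the file types your own policy names.
EXCLUDED_EXTENSIONS = {".mp3", ".avi", ".iso", ".tmp", ".dmp"}


def summarise_by_extension(root):
    """Return {extension: total_bytes} for every file under root."""
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            extension = os.path.splitext(name)[1].lower()
            totals[extension] += size
    return dict(totals)


def policy_violations(totals, excluded=EXCLUDED_EXTENSIONS):
    """Return (extension, bytes) pairs on the exclusion list, largest first."""
    hits = [(ext, size) for ext, size in totals.items() if ext in excluded]
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Running summarise_by_extension against a file system that is in the backup schedule and passing the result to policy_violations gives a ranked list of the excluded file types still present. Database dump files can be caught the same way if they use a recognisable extension or sit in a known directory.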

Data retention periods are another area of concern. We work in a time where compliance is king, and we must now retain more data for longer periods. It is key to keep track of how much data is being retained, especially where the likes of RMAN backups are concerned, as many of the major backup products do not manage RMAN backups within their own backup retention policies. In the case of RMAN backups, it is the DBA's responsibility (along with the backup administrator) to ensure the data is expired in a timely manner.
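
To keep track of how much data is being held beyond its retention period, a simple age check over an inventory of backups is often enough to start the conversation with the DBA. The Python sketch below assumes a hypothetical list of records with "name", "completed" and "size_gb" fields, exported from whatever catalogue you use. It does not call RMAN or any backup product, and it deletes nothing.

```python
from datetime import date, timedelta


def expired_backups(backups, retention_days, today):
    """Return the backups completed before the retention cutoff date."""
    cutoff = today - timedelta(days=retention_days)
    return [b for b in backups if b["completed"] < cutoff]


def retained_gb(backups):
    """Total size in GB of the given backups."""
    return sum(b["size_gb"] for b in backups)


# Illustrative records only; real ones would come from your backup catalogue.
inventory = [
    {"name": "full_jan", "completed": date(2024, 1, 1), "size_gb": 500},
    {"name": "full_feb", "completed": date(2024, 2, 15), "size_gb": 500},
]
overdue = expired_backups(inventory, retention_days=30, today=date(2024, 3, 1))
print([b["name"] for b in overdue], retained_gb(overdue), "GB past retention")
```

This is a pure age-based check. A real retention policy may need to keep an older backup that later ones depend on, so treat the output as a list of candidates to review rather than a list of data to delete.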

It is not uncommon to see database nodes with extremely high occupancy of data in comparison to the actual database size. I have witnessed cases where a 500GB database has an occupancy of 50TB. Situations like this need to be addressed, for if left unmanaged, such a case will require the purchase of additional media on a regular basis and then expansion of the tape library to cope with the inorganic growth. All of this comes at a cost and has to be justified.
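
The scale of that example is easier to see as a ratio of occupancy to database size. A small Python sketch follows, assuming decimal units (1TB = 1,000GB); the function name is illustrative.

```python
GB_PER_TB = 1000  # assumption: decimal units


def occupancy_ratio(occupancy_gb, database_gb):
    """Return how many times larger the retained backup data is than the database."""
    if database_gb <= 0:
        raise ValueError("database size must be positive")
    return occupancy_gb / database_gb


ratio = occupancy_ratio(50 * GB_PER_TB, 500)
print(f"{ratio:.0f}x")  # prints "100x"
```

A ratio of 100 means the backup environment is holding roughly the equivalent of 100 full copies of the database, ignoring compression and incremental backups. Tracking this figure per database node over time is one way to spot retention that is not being expired.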

Remember the adverse effect this will have on server backup windows, and the consumption of precious capacity and bandwidth within the backup infrastructure.

There are other scenarios similar to those mentioned above, but the key is to avoid these issues. How do we achieve that?

This is not only the responsibility of backup administrators -- it is a job for everyone who has a stake in the environment. This includes service management, system administrators and application/database administrators.

The best way to achieve these goals is to clearly define backup policies with the business (SLAs). The SLA is then managed by both the storage team (at a technology level) and by service management, through strong processes and relationships with application owners.

These are the cornerstones of any efficient backup environment. With strong policies in place, your backup environment should become easier to manage, and savings on media, tape library and staff overtime costs can then be realised.

About the author: Spencer Huckstepp is a technical consultant at Glasshouse Technologies (UK), a global provider of IT infrastructure services. He has 11 years' IT industry experience, eight of which have been in the enterprise storage arena. His role includes involvement with various virtualisation, storage and backup strategy engagements.
