Partial backups are a success... aren't they?

We've been led to believe that a backup that completes with a partial success is a success. And in terms of the backup application it is. But in real terms it is not. A partial success is a backup which is in fact not 100% successful, meaning that not all of the files have been backed up.

Backups run. Some fail, some complete and some complete with a partial success.

But a partial success is a success, right?

Yes, confirmed. If we open the restore window for one of the clients in question, we can see all of the files available to restore when the hot line rings and the anxious person on the other end has the IT equivalent of the "I've just realised I've lost my wallet" feeling, because they have just deleted a crucial file. The restore is performed and the customer is happy: we've performed the IT equivalent of finding that user's wallet, complete with all of their cards and bank notes.

Of an average 3,000 daily backup jobs, 720 complete with a partial success, or 24% of the total. The working day is action-packed with tasks such as user restores, as well as scrambling under office desks and around data centres in search of scratch tapes.

If the tape library had a low-fuel light (like those found in cars), it would have been glowing for days: the library is running on empty and unable to last the night without a stop at the tape service station. Naturally, the tape hunt takes priority over investigating partial backups. But then partial backups don't seem to need investigation: viewed through the log files, they mention .tmp, .ini, .dat and .log files that have not been backed up.

Previous conversations with the server administrator gurus established that these files would not be recovered in the event of a full restore, so in effect a partial success is as good as a complete success. The .mdf and .ldf files are not required either: they are live database files, which the DBAs have confirmed would not be restored. Instead, the DBAs would request restores of the database dump-to-disk files.

With 720 partial successes per day, working through them at a rate of one per minute would take 12 hours. That's a luxury we don't have, so once the tape hunt has been fruitful, it's time to go home.

Another day begins with an extra shot in the coffee. Then the hot line phone bursts into life: a user needs a database restore from a couple of days ago. The database in question is dumped to disk each night; the DBAs have scripted the dump to run at a defined time (a standard practice in the backup world), in good time for the file system backup. The dump is then deleted during the following day, ready for the next dump and backup.

We open the restore window. But for some reason, the file requested by the DBA doesn't exist. Now we're the ones suffering from the "I've lost my wallet" feeling, as the dump files have not been backed up. A check of the settings indicates that the backups are configured to run, but then the bombshell drops: the backups of the dumps to disk have only been partially completing. With so many partial backups per night, the genuine partial failures have been missed; they were never investigated and the backups were never re-run. This is probably the tip of the iceberg; other key files are probably not being backed up either... for whatever reason.
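One way to stop the genuine failures from hiding among the routine ones is to triage partial successes by what was actually skipped. The following is a minimal sketch, not a feature of any particular backup product: it assumes the skipped file paths have already been pulled out of the job logs into a dictionary, and the client names, paths and the `.bak` dump extension are all hypothetical. Only the list of "expected" extensions comes from the scenario described above.

```python
from pathlib import PurePosixPath

# Extensions the administrators and DBAs said they would never restore
# (taken from the scenario above; adjust for a real environment).
EXPECTED_SKIPS = {".tmp", ".ini", ".dat", ".log", ".mdf", ".ldf"}


def unexpected_skips(skipped_files):
    """Return the skipped files whose extension is NOT on the expected list."""
    return [
        path
        for path in skipped_files
        if PurePosixPath(path).suffix.lower() not in EXPECTED_SKIPS
    ]


def triage(partial_jobs):
    """Map client name -> unexpected skipped files, for jobs that need review."""
    flagged = {}
    for client, skipped in partial_jobs.items():
        suspicious = unexpected_skips(skipped)
        if suspicious:
            flagged[client] = suspicious
    return flagged


# Hypothetical sample: two partial successes from one night's run.
partial_jobs = {
    "fileserver01": ["/var/app/cache.tmp", "/var/app/app.log"],
    "dbserver01": ["/data/sales.mdf", "/data/sales.ldf", "/dumps/sales_dump.bak"],
}

print(triage(partial_jobs))
```

With this sample data, only `dbserver01` is flagged, because its dump file is not on the expected list. The point of the design is to shrink 720 partial successes down to the handful worth a human's minute, rather than to trust or distrust all of them equally.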

Further investigation reveals the cause. The dump to disk runs during the backup window, so the backup attempts to back up the usual file system plus the dump file while the dump file is still in use. That causes the partial success, alongside the usual .mdf, .ldf, .dat and .log files.
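The scheduling clash itself is easy to check for once the two windows are written down. The sketch below is illustrative only: the date and all start and end times are made up, and in practice they would come from the DBAs' dump script and the backup schedule.

```python
from datetime import datetime, timedelta


def windows_overlap(start_a, end_a, start_b, end_b):
    """True if the two time windows share any moment in time."""
    return start_a < end_b and start_b < end_a


# Hypothetical schedule for one night (arbitrary date).
night = datetime(2024, 1, 1)

dump_start = night.replace(hour=22)               # dump begins at 22:00
dump_end = dump_start + timedelta(hours=2)        # and finishes at 00:00

backup_start = night.replace(hour=23)             # backup begins at 23:00
backup_end = backup_start + timedelta(hours=6)    # and finishes at 05:00

# The dump is still being written when the backup starts, so the dump
# file is in use and gets skipped.
print(windows_overlap(dump_start, dump_end, backup_start, backup_end))

# Moving the dump earlier (19:00 to 21:00) clears the backup window.
early_dump_start = night.replace(hour=19)
early_dump_end = early_dump_start + timedelta(hours=2)
print(windows_overlap(early_dump_start, early_dump_end, backup_start, backup_end))
```

The first check reports an overlap and the second does not. A check like this only helps if the dump's real duration is known, since a dump that overruns its slot reintroduces the clash.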

One more messy encounter with the DBA manager and the CTO is in the offing...

About the author: Hywel Matthews is a senior consultant at Glasshouse Technologies (UK), a global provider of IT infrastructure services. He has over 12 years' experience in the IT industry and 9 years' experience in backup, recovery, disaster recovery (DR), systems and storage.
