Some people believe that data archival and backup are one and the same thing, but they are not. A backup is a secondary copy of the primary data, whereas archiving refers to methods that move the primary data itself to cheaper storage media. Whether it is file data or email data, data volumes keep growing rapidly and will continue to do so. Compliance requirements such as the Sarbanes-Oxley Act require an enterprise to retain historical data for a certain period of time. The requirements under this Act cover not just data retention, but also quick data retrieval.
Under such regulatory requirements, you can't afford to wait a week for your IT team to get you the right records from the historical data. Such scenarios are common when your company faces legal action and urgently needs the data to defend itself. Any delay in producing the right data can lead to penalties, which is what makes data archival necessary. Hence, you should consider the following points while evaluating and deploying a file and email data archiving solution.
- The first step when devising a data archival strategy is to sit down with your senior management, representatives of various departments, and legal representatives. This step is critical for deciding which data you need to retain and for how long, and for defining your future legal, compliance and discovery requirements.
- Unstructured data (such as user email) that has not been modified for years occupies a significant part of primary storage. For the data archival implementation to succeed, you need to decide on retention policies for this data.
- After the categorization of data is complete, choose a data archival solution from the wide range available. The most important thing to look for is single-instance storage, that is, a file should be stored only once across the enterprise. You should understand how the solution identifies candidates for single-instance storage. Does it look at a file by its name or by its contents? For example, three users in an enterprise may all have the same file, but each has saved it on the file server under a different name. Does the solution store that file once and keep pointers to it, or does it store every copy separately? If a solution looks at a file by its contents, and 1,000 users each have the same 5 MB file, that file will be stored only once and occupy only 5 MB on the server rather than 5,000 MB.
- You can opt for a hardware data archiving solution or a software archiving solution based on your requirements.
- Several hardware-based data archiving products use a node-based architecture to cope with hardware failures. Such solutions usually avoid single points of failure, and are useful for enterprises that need the write-once-read-many option. These products take care of the information lifecycle management of data from its creation to its deletion, moving data among the storage tiers according to the defined policies. However, these solutions are much costlier than software-based solutions.
- You can also look for software solutions that sit on your application servers and archive data. These can be email archiving solutions or file server archiving solutions that migrate data from primary storage to secondary storage according to the policies. Software-based data archiving solutions use the host's CPU cycles for the archival process. As a result, they may affect the application processes.
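The single-instance storage question raised in the list above (name versus contents) can be illustrated with a small sketch. It assumes a content hash (SHA-256 here) is used to recognize identical files. Actual products may identify duplicates differently, and the class name, method names and file paths below are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SingleInstanceStore:
    """Toy archive that keeps identical file contents only once."""

    blobs: dict = field(default_factory=dict)     # content digest -> file contents
    pointers: dict = field(default_factory=dict)  # user-visible path -> content digest

    def archive(self, path: str, contents: bytes) -> str:
        # Identify the file by its contents, not by its name.
        digest = hashlib.sha256(contents).hexdigest()
        # Store the contents only if this digest has not been seen before.
        self.blobs.setdefault(digest, contents)
        # Every path keeps its own pointer to the single stored copy.
        self.pointers[path] = digest
        return digest

    def retrieve(self, path: str) -> bytes:
        return self.blobs[self.pointers[path]]

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self.blobs.values())


# Three users save the same report under different (hypothetical) names.
store = SingleInstanceStore()
report = b"quarterly figures" * 1000
for name in ("alice/Q3-final.doc", "bob/report_v2.doc", "carol/figures.doc"):
    store.archive(name, report)
```

In this sketch the archive holds three pointers but only one copy of the contents, which mirrors the 1,000-users example: the space used depends on the number of distinct files, not the number of names. A solution that compared file names instead would store all three copies.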
To sum up, if you plan to deploy a data archival solution, first define your archiving policies: decide what you need to archive, and for how long. You can then choose a solution after evaluating the technologies that various vendors employ for data archival.
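As a rough illustration of what "define archiving policies" might look like once written down, the sketch below models a policy as a data category, an age threshold for moving data off primary storage, and a retention period. All names and numbers are hypothetical placeholders and not a statement of what any regulation requires; real values come out of the discussion with management and legal representatives.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RetentionPolicy:
    category: str            # e.g. "finance" or "email" (hypothetical categories)
    archive_after_days: int  # archive data unmodified for at least this long
    retain_days: int         # keep archived data for this long


@dataclass(frozen=True)
class FileRecord:
    path: str
    category: str
    last_modified: datetime


def select_for_archive(files, policies, now):
    """Return paths whose category has a policy and whose age meets its threshold."""
    by_category = {p.category: p for p in policies}
    selected = []
    for f in files:
        policy = by_category.get(f.category)
        if policy is None:
            continue  # no policy defined yet: leave the file on primary storage
        if now - f.last_modified >= timedelta(days=policy.archive_after_days):
            selected.append(f.path)
    return selected


def expiry_date(policy, archived_on):
    """Date after which archived data under this policy may be deleted."""
    return archived_on + timedelta(days=policy.retain_days)


# Placeholder values for illustration only.
policies = [
    RetentionPolicy("finance", archive_after_days=365, retain_days=2555),
    RetentionPolicy("email", archive_after_days=180, retain_days=1095),
]
files = [
    FileRecord("finance/ledger-2019.xls", "finance", datetime(2019, 6, 1)),
    FileRecord("mail/inbox-recent.pst", "email", datetime(2023, 12, 15)),
    FileRecord("tmp/scratch.txt", "uncategorized", datetime(2015, 1, 1)),
]
now = datetime(2024, 1, 1)
to_archive = select_for_archive(files, policies, now)
```

Here only the old finance file is selected: the recent mailbox is below its age threshold, and the uncategorized file is skipped because no policy covers it, which is why the categorization step has to come first.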
About the author: Anuj Sharma is an EMC-certified and NetApp-accredited professional. Sharma has experience in handling implementation projects related to SAN, NAS and BURA. He also has to his credit several research papers published globally on SAN and BURA technologies.