The data holdings of organisations regularly extend to many terabytes of individual files that have accumulated over years of often unplanned or semi-planned activity. Companies frequently don't have data classification methods and/or policies in place that allow them to know what data they hold and where it's located. This can have a number of deleterious effects, such as:
- Finding files becomes difficult, which limits an organisation's ability to gain the full benefit of the knowledge accumulated in its data.
- Legal ramifications can result from not being able to quickly find or produce documents for a court hearing.
- Security might be compromised if data isn't correctly matched to staff access profiles.
- You could be spending more money than required if data isn't matched to storage media of the appropriate cost per GB. Data classification allows low-priority data to be moved from high-performance storage systems to secondary media, for example.
The benefits of data classification
Data classification projects and products help businesses discover what data they hold, where it's located, who has access to it and how long it must be retained, as well as what level of protection it should receive and on what type of media it should be retained to meet legal and regulatory requirements. Classification can also enable a business to gain intelligence from data it would otherwise be unaware of.
Data classification isn't necessarily a complex process, but it's one that should be well planned. Potentially large amounts of data have to be discovered and matched to a wide range of business, legal and compliance needs.
Know why you're classifying data
Business need and risk tolerance should be the drivers behind data classification methods and projects. Classification shouldn't be an IT-only project. The first question to ask should be "What do we want to achieve?" Reasons can include legal compliance, the ability to respond quickly to legal discovery requests, making data more accessible to users to aid business activity, restricting data from users for whom access is inappropriate, moving data to the storage media most appropriate for its frequency of use and ease of access, and freeing up high-performance storage systems by offloading secondary data.
The best way to answer this question is to initiate a discussion that involves representatives from those parts of the business that have a stake in corporate data – namely the board, IT, finance, the various business departments and company lawyers.
Once you understand why you're undertaking the project, you can create a data classification policy and begin the process. This can be a long and involved project, in large part because it's manual in nature. There are software products that can help with data classification and lifecycle management, but at the key early stages of the process no tool can tell you what's important to your business. You'll get those answers from discussions with staff from key areas of the organisation.
Experts recommend coming up with a set of classifications of data type and then tackling one or two of them first. By dealing with one type of data and knowing what you want from it – retention time, storage media, access rights, etc. – you can score some early project wins and get some data classification policies in place.
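To make the idea of "knowing what you want" from one type of data concrete, the decisions can be written down as a small policy record. The following Python sketch is illustrative only: the class name, field names, data types and retention figures are all assumptions made up for this example, not recommendations or legal guidance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassificationPolicy:
    """What the business has decided it wants from one type of data."""
    name: str
    retention_years: int      # how long the data must be kept
    storage_tier: str         # e.g. "primary" or "secondary" media
    allowed_roles: frozenset  # staff roles permitted to read the data


# Hypothetical policies for two data types; every value here is invented.
POLICIES = {
    "contracts": ClassificationPolicy(
        name="contracts",
        retention_years=7,
        storage_tier="secondary",
        allowed_roles=frozenset({"legal", "finance"}),
    ),
    "project_docs": ClassificationPolicy(
        name="project_docs",
        retention_years=3,
        storage_tier="primary",
        allowed_roles=frozenset({"engineering", "legal"}),
    ),
}


def can_access(policy: ClassificationPolicy, role: str) -> bool:
    """Return True if the given staff role may read data under this policy."""
    return role in policy.allowed_roles
```

Starting with one or two entries like these keeps the scope small; further data types can be added to the table as stakeholders agree on them.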
Once an organisation has decided what it wants from the data classification exercise, tools can help locate and organise data on the basis of user-defined rules.
Many tools allow organisations to discover the data they hold and its location, and to then apply classification rules to this information. Metadata can be applied to files, making them more manageable and searchable, and allowing data to be migrated to the appropriate tier.
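The discover-then-classify flow described above can be sketched in a few lines of Python. This is a minimal illustration and not how any particular product works: the filename patterns, labels and the 365-day staleness threshold are assumptions chosen for the example, and a real tool would typically inspect file content as well as names.

```python
import os
import time
from fnmatch import fnmatch

# Hypothetical user-defined rules: (lowercase filename pattern, label).
# Rules are checked in order and the first match wins.
RULES = [
    ("*.xls*", "finance"),
    ("*contract*", "legal"),
    ("*.log", "operational"),
]

STALE_AFTER_DAYS = 365  # assumed threshold for moving data off primary storage


def classify(filename, rules=RULES):
    """Return the label of the first rule matching the filename."""
    lower = filename.lower()
    for pattern, label in rules:
        if fnmatch(lower, pattern):
            return label
    return "unclassified"


def choose_tier(age_days, stale_after=STALE_AFTER_DAYS):
    """Suggest a storage tier from the number of days since last modification."""
    return "secondary" if age_days > stale_after else "primary"


def discover(root, now=None):
    """Walk a directory tree and return one metadata record per readable file."""
    now = time.time() if now is None else now
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # skip files that vanish or cannot be read
            age_days = (now - info.st_mtime) / 86400
            records.append({
                "path": path,
                "size_bytes": info.st_size,
                "label": classify(name),
                "tier": choose_tier(age_days),
            })
    return records
```

Calling `discover("/some/share")` on a hypothetical path would return a list of records that could then be stored as metadata or used to drive migration to the suggested tier.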
Metadata created in the classification process can enable searches based on values that go much further than standard attributes such as filename or creation date. Such information can aid in compliance or business intelligence searches.
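As a rough illustration of searching on classification metadata instead of filename or creation date, the sketch below filters a small in-memory catalogue. The records, field names and dates are invented for the example; a real deployment would query whatever index or database the chosen tool maintains.

```python
from datetime import date

# Hypothetical metadata records, as a classification pass might produce them.
CATALOGUE = [
    {"path": "/shares/legal/nda_example.pdf", "label": "legal",
     "owner": "legal", "retain_until": date(2016, 4, 1)},
    {"path": "/shares/finance/budget_2008.xls", "label": "finance",
     "owner": "finance", "retain_until": date(2015, 4, 1)},
    {"path": "/shares/ops/backup.log", "label": "operational",
     "owner": "it", "retain_until": date(2010, 4, 1)},
]


def search(catalogue, **criteria):
    """Return records whose metadata equals every given criterion."""
    return [
        record for record in catalogue
        if all(record.get(key) == value for key, value in criteria.items())
    ]


def due_for_disposal(catalogue, today):
    """Return records whose retention period ended before the given date."""
    return [record for record in catalogue if record["retain_until"] < today]
```

For instance, `search(CATALOGUE, label="legal")` would support a discovery request, while `due_for_disposal(CATALOGUE, date(2012, 1, 1))` lists the files whose retention period has lapsed by that date.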
Data classification products are often limited in their abilities. Today's tools can work well in their own niche, but no one product does everything. As mentioned earlier, no product can determine the value of specific data types to your business, so there's always a good deal of manual work involved.
Decide what you need from a tool and then look for the ones that fit those needs. You should be prepared to go through in-house testing to determine whether the data classification tool can do what you need it to do using the data in your environment.
This was first published in April 2009