Toxic data: What it is and how to find it and deal with it

In this podcast, Mathieu Gorge, CEO of Vigitrust, looks at the management of legacy data and how to find and deal with toxic data that could expose your organisation to compliance breaches

There are all kinds of data we have to keep to comply with legal and regulatory frameworks. But sometimes that data can become legacy data that must also be disposed of within specified timeframes to ensure compliance. Failure to do so can lead to it becoming “toxic data”.

In this podcast, storage editor Antony Adshead talks with Vigitrust CEO Mathieu Gorge about managing legacy data and dealing with toxic data.

Why is it important to manage legacy data, and in particular to look at toxic data?

Mathieu Gorge: Let’s try to define legacy data and toxic data to start with. Legacy data is data that you have, whether structured or unstructured, on your systems, but that you don’t necessarily use on a day-to-day basis.

Sometimes you need to keep data for regulation purposes, in particular financial data, data pertaining to employee records, and so on. You really need to keep it, but it very quickly becomes legacy data because it is not actually live data.

You may also be unaware of where that data resides. It could be on your servers in-house. It could be on decommissioned systems that are still potentially within your ecosystem and that you might want to plug in again at some stage, or on USB sticks, CDs or floppy disks, depending on how old your organisation is.

The concept of toxic data is any data on your systems, whether live or legacy systems, that you don’t really need to conduct your business and that is potentially increasing your risk surface.

For instance, it could be credit card-holder data that you kept after the transaction went through, which would obviously be in breach of PCI and data protection regulations.

It could be protected health information that you needed at the time but don’t need any more. It could be contracts that have expired. It could be any type of document, such as legal documents, that has reached the end of its life.

So, you might find toxic data within your legacy systems, within your live systems, but also potentially within cloud services that you either no longer use or think you don’t use.

The challenge here is that, whether it is legacy data or toxic data, it is still data that you’re managing, and so it is still covered under current data protection regulation and also under the upcoming General Data Protection Regulation. So you need to ensure it is protected appropriately at all times.

How can we manage the process of dealing with toxic data from a storage perspective?

Gorge: Once you have identified the toxic data and any other type of legacy data, you need to map where it is within your ecosystem.

The best way to do that is work with your IT and also the legal team and the compliance folks, because they will know what they’re trying to comply with and therefore the type of data to look for.

You may also wish to use data discovery tools to look for specific strings of data, whether financial data, credit card data, protected health information or any other type of structured or unstructured data.
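To illustrate what looking for "specific strings of data" can mean in practice, here is a minimal sketch in Python of one common approach to spotting candidate card numbers in text: match runs of 13 to 19 digits, then keep only those that pass the Luhn checksum. The function names and the pattern are illustrative assumptions, not taken from any tool mentioned in the podcast, and commercial data discovery tools do considerably more than this.

```python
import re

# Assumption: candidate card numbers are 13 to 19 digits,
# optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit strings in text that look like card numbers and pass Luhn."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


if __name__ == "__main__":
    # 4111 1111 1111 1111 is a widely used test number, not a real card.
    print(find_card_numbers("card 4111 1111 1111 1111 on file"))
```

A sketch like this will miss some numbers and flag some harmless ones, so any hits are a starting point for review with the legal and compliance teams rather than a final answer.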

Once you have done all that, the first thing is to dispose of any toxic data you don’t need and to do that securely, according to best practices, such as those published by NIST in the US. And only keep the data you need.

One very quick point of reference to understand the concept is to look at requirement 3.4 of PCI DSS on the storage of credit card-holder data. It is actually a simple model that tells you when you can keep credit card-holder data, what type of credit card-holder data you can keep and how you need to keep it: encryption, access and so on.

The idea would be for you to reproduce that for your own ecosystem.

What type of data do I really need? What type of data will I keep and, if so, how will I keep it?

The key rule is: if you don’t need it, don’t store it. Don’t let it be on your system because it immediately becomes what I was talking about, which is toxic data.
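As a rough illustration of reproducing that kind of model for your own ecosystem, the Python sketch below records, per data category, how long the data may be kept, and flags anything that has no documented reason to be stored or that has outlived its period. The categories, the retention periods and the names `Record`, `RETENTION_DAYS` and `should_dispose` are all hypothetical placeholders, not legal guidance; the real values have to come from your legal and compliance teams.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Record:
    name: str
    category: str
    created: date


# Hypothetical retention periods in days. Placeholder values only:
# real periods depend on the regulations that apply to you.
RETENTION_DAYS = {
    "financial": 730,
    "contract": 365,
}


def should_dispose(record: Record, today: date) -> bool:
    """Flag a record that has no documented need or has passed its retention period."""
    limit = RETENTION_DAYS.get(record.category)
    if limit is None:
        # "If you don't need it, don't store it": no rule means no reason to keep it.
        return True
    return today - record.created > timedelta(days=limit)


if __name__ == "__main__":
    today = date(2024, 1, 1)
    records = [
        Record("invoice-2023", "financial", date(2023, 6, 1)),
        Record("old-invoice", "financial", date(2020, 1, 1)),
        Record("card-dump", "card_data", date(2023, 12, 31)),
    ]
    for r in records:
        print(r.name, should_dispose(r, today))
```

Treating an unlisted category as disposable is a deliberate choice in this sketch: it forces every kind of data you keep to have a stated reason and a stated lifetime.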

The key to successfully getting rid of, or at least managing, toxic data within your systems is to make this a continuous process.

So, continually map the ecosystem, continually run data discovery tools, continually dispose of data that you no longer need and make sure your staff are aware of what you are doing, so they don’t end up creating new data that might become toxic for the organisation.
