How organisations can right-size their data footprint

More data is not always better, says Gartner, which is calling for organisations to focus on metadata and synthetic data to reduce their data liability and address privacy challenges

Aaron Tan, Informa TechTarget

Published: 08 Nov 2022 11:16

Rather than collect as much data as they can, organisations should focus on metadata and synthetic data to reduce their data liability and address privacy issues, according to data and analytics experts from Gartner.

During the keynote address at the Gartner Data and Analytics Summit in Sydney, Australia, Peter Krensky, director and analyst in the research firm’s business analytics and data science team, said that although organisations have been integrating large volumes of data, not all of that data is useful.

Krensky said that such “big data thinking” – that more data is always better – is outdated and counterproductive. He noted that the issue is being exacerbated by the cloud, which “gives us unlimited space to store everything we thought we wanted”.

Sally Parker, research director in Gartner’s chief data officer leadership team, said that being “data plumbers” who lay data pipelines from source to storage is not a good strategy either. She called for data concierges to provide guidance on data sources that provide the right insights.

And while data has been described as an asset, it can also be a liability because it costs money to mine and keep data, said Parker, who advised organisations to focus on data that drives business results.

This can be done in a few ways, said Krensky, starting with generating metadata, which organisations do not have enough of today. “Without metadata, we do not have meaningful data and we don’t know what we have, what it means and where it came from,” he added.

Krensky said that traditionally, it required a lot of effort to maintain metadata, but by applying machine learning techniques, organisations can transform it into active metadata.

“Active metadata continually detects and adjusts to the patterns in our data, and this is going to enable self-organising and self-optimising concepts such as the data fabric,” he said. “This is not only a more efficient data management environment, but it also drives usage.”

A data fabric powered by active metadata, Krensky said, could for example help data scientists identify data drift caused by changing customer behaviour and nudge them when it is time to refresh their predictive models.
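
To make the idea of a drift "nudge" concrete, here is a minimal sketch in Python. It is not Gartner's method or any data fabric product's API. It assumes drift is measured with the population stability index (PSI), a common rule-of-thumb statistic, and the function names, the bin count and the 0.2 threshold are all illustrative assumptions.

```python
import math
from typing import Sequence


def population_stability_index(
    baseline: Sequence[float], current: Sequence[float], bins: int = 10
) -> float:
    """Compare two samples of one numeric field; higher means more drift.

    Bins are equal-width and derived from the baseline sample (an assumption).
    """
    lo, hi = min(baseline), max(baseline)
    if hi == lo:
        raise ValueError("baseline has no spread, cannot build bins")
    width = (hi - lo) / bins

    def shares(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the baseline range are clamped into the edge bins.
            idx = max(0, min(bins - 1, idx))
            counts[idx] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    base_shares = shares(baseline)
    curr_shares = shares(current)
    return sum(
        (c - b) * math.log(c / b) for b, c in zip(base_shares, curr_shares)
    )


def needs_refresh(
    baseline: Sequence[float], current: Sequence[float], threshold: float = 0.2
) -> bool:
    """Hypothetical nudge: True when drift exceeds the assumed threshold."""
    return population_stability_index(baseline, current) > threshold


# Illustrative data only: basket sizes at training time vs. a shifted sample.
training_sample = [float(v) for v in range(100)]
recent_sample = [v + 50.0 for v in training_sample]
print(needs_refresh(training_sample, training_sample))
print(needs_refresh(training_sample, recent_sample))
```

In practice, a check like this would run on a schedule against fields a model depends on, with the threshold tuned to how sensitive that model is to change.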

It can also notify data engineers that certain use cases are generating new categories of data, he said. “And if the same data is being used by multiple people across the organisation, active metadata can tell us that they are probably making interrelated decisions.”

Organisations could also look for “small data” that could be more accurate, safer, cheaper and more accessible. Krensky said such data could be more insightful than the big data organisations tend to collect by habit.
