Data democratization in the age of big data: why data lakes won't work

This is a guest blog post by Ravi Shankar, chief marketing officer at Denodo


I predict that 2016 is going to be the year that data is finally liberated. Data, at long last, is going to be democratized, just as CRM applications were democratized by Salesforce in the 2000s.

Before Salesforce, CRM applications were installed on-premise by IT, and business users were dependent on IT to make any changes. This created a natural bottleneck, not through any fault of IT, but because business users had no direct control over their own data.

Salesforce relieved this bottleneck by running everything in the cloud. Now, on-premise CRM is on the decline, whereas Salesforce CRM is dominant. Salesforce truly democratized CRM for direct consumption by business users.

This year, it will be data’s turn to be democratized and made available to anyone throughout the enterprise (with the proper access privileges, of course). This means that data must be freed from its silos. Today, it resides in a variety of independent business functions, such as HR, manufacturing, supply-chain logistics, sales order management, and marketing. To get a unified view of this data, businesses currently resort to a variety of ad-hoc, highly labor-intensive integration processes.

Data lakes won’t solve the problem

Through distributed processing, Big Data implementations can crunch extremely large volumes of data quickly and inexpensively. Because of this power, many Big Data vendors advocate dumping all of an organization’s data into a single giant repository called a “data lake.” Unlike a data warehouse, a data lake holds data in raw form; users impose structure only at the time of access, an approach known as schema-on-read.
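To make the schema-on-read idea concrete, here is a minimal sketch in Python. The record names and fields are hypothetical, invented purely for illustration: raw records land in the “lake” with no schema imposed at write time, and each consumer applies its own structure only when it reads.

```python
import json

# A toy "data lake": raw records, no schema imposed at write time.
# The field names (event, sku, qty, ts) are hypothetical examples.
raw_lake = [
    '{"event": "order", "sku": "A-100", "qty": 2, "ts": "2016-01-05"}',
    '{"event": "order", "sku": "B-200", "qty": 1, "ts": "2016-01-06"}',
]

# Schema-on-read: each consumer imposes its own structure at access time.
def sales_view(lake):
    """The sales team cares only about SKUs and quantities."""
    return [(r["sku"], r["qty"]) for r in map(json.loads, lake)]

def ops_view(lake):
    """Operations cares only about event timestamps."""
    return [r["ts"] for r in map(json.loads, lake)]

print(sales_view(raw_lake))  # [('A-100', 2), ('B-200', 1)]
print(ops_view(raw_lake))    # ['2016-01-05', '2016-01-06']
```

Note that each view re-parses and reshapes the raw data for its own purposes; this per-consumer pathway is exactly the ad-hoc work the next paragraph describes.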

Therein lies the rub. Data is always structured by the applications that access it, so businesses would still have to develop ad-hoc pathways to the data for each application. It would be like moving all of your household items into the garage in the hope that they’ll be easier to find. This rarely works! Each time you try to find something, chances are you’ll need to carve a new path through that garage.

Unless you replace all of your separate applications with a single one that handles every function, the data can’t reside in one place. Data lakes alone won’t democratize the data.

The only way to liberate the data is to leave it exactly where it is, in its existing, separate applications, and to provide a layer of intelligence above those disparate sources that can integrate the data without replicating it. To set the data free, this intelligent layer would need to provide business users with an up-to-date, unified view of all the data in an organization, no matter the source.

Data virtualization

Data virtualization can provide such a layer. This technology provides real-time data integration across virtually any source, from the tightly structured to the completely unstructured, and from the latest state-of-the-art database to the oldest legacy system.
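A rough sketch of the idea, in Python, may help. The two source systems and their fields below are hypothetical stand-ins (in practice they would be live databases, APIs, or files); the point is that the virtual layer fetches and joins from each source on demand, at query time, without copying anything into a central store.

```python
# Hypothetical stand-ins for two live source systems: an HR application
# and a sales application. The data stays in each source.
hr_source = {
    101: {"name": "Ana", "dept": "Sales"},
    102: {"name": "Bo", "dept": "Marketing"},
}
sales_source = {101: [1200, 900], 102: [400]}

class VirtualView:
    """A unified, read-time view over disparate sources (no replication)."""

    def employee_revenue(self, emp_id):
        # Each lookup goes to the live source at query time.
        person = hr_source[emp_id]                  # fetched from the HR system
        total = sum(sales_source.get(emp_id, []))   # fetched from the sales system
        # The join happens here, in the virtualization layer.
        return {"name": person["name"], "dept": person["dept"], "revenue": total}

view = VirtualView()
print(view.employee_revenue(101))  # {'name': 'Ana', 'dept': 'Sales', 'revenue': 2100}
```

Because the view resolves against the sources at access time, a change in either system is visible on the next query, which is what gives the layer its real-time character.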

By providing this level of access to data, a data virtualization layer enables companies to quickly solve business problems, such as improving compliance or customer service, without the cost and headache of rebuilding the infrastructure. Autodesk, a multinational software corporation that develops software for the architecture, engineering, construction, manufacturing, media, and entertainment industries, leveraged a data virtualization platform to quickly change the way it does business (moving from traditional license-based pricing to subscription pricing) without its finance department even feeling the impact.

For Autodesk and many other companies, data virtualization has truly democratized the data, making it available to anyone with the permission to access it, enabling greater agility. Data virtualization has evolved, and 2016 will be remembered as the year it set data free.