This is a guest blogpost by James Corcoran, SVP Customer Value, KX
Generating tangible business value from data in as short a time frame as possible is becoming a strategic priority for businesses in every industry sector.
Real-time data is the driving force behind this new imperative, with numerous studies showing how business decisions based on real-time insights can increase revenues, improve customer satisfaction, drive operational efficiency and enhance profitability.
As more and more companies recognize the strategic importance of real-time data, the suitability of existing data architectures to support real-time analytics is being put under the microscope.
In most organisations, data resides in a multitude of formats, applications, and locations. Each dataset can hold critical insights that can be unlocked by bringing that data together in a database to run queries. To achieve this, organisations typically employ a ‘big data’ approach to data management and analytics. This trend started with data warehousing more than 20 years ago and more recently has evolved to data lakes, where structured and unstructured data can be stored at any scale in a central repository.
However, many companies are now finding that this centralised approach, rather than making data analysis faster, actually adds complexity, significant latency, and cost to the process of querying enterprise data and returning the results.
Historically, data management has fallen under the ownership of the IT department, and employees with the technical skills needed to run queries and analyses were rarely found outside of the IT team. That world no longer exists. Today, a flexible and intelligent data infrastructure is needed to facilitate self-serve analytics, allowing business users right across an enterprise to access the insights found within the data and act on them in real time.
To achieve this, most of the leading analyst firms recommend implementing a tiered data management fabric (what Gartner calls a Unified Data Delivery platform).
By implementing a data management fabric, organisations can make their data – both streaming and historical – available for analysis in real time, for maximum business value. It also reduces the need for costly integration projects, as data remains in the location where it is most efficient to house it (e.g. Oracle, S3).
While there are a number of key components that should be present in a data management fabric – such as supporting both batch and streaming data delivery latencies, and being able to run anywhere, whether on-prem, hybrid, or public cloud – there are two fundamental requirements.
Firstly, storage and compute must be separated – i.e. processing is taken to where the data resides. Secondly, any infrastructure design must be able to access both relational and non-relational data (from Hadoop, cloud object stores, file stores, NoSQL stores, etc.) via push-down processing, and interface with these externalised platforms via standard languages and tooling.
This drive for standard languages and tooling results, in effect, in a distributed SQL engine sitting on top of a virtualised data layer: queries are sent to the underlying databases, and the results are then aggregated and returned to the user. By pushing workloads down to where the data resides at source, the heavy lifting is done at the optimum location. Without distributed SQL, too much complexity lands on the shoulders of end-users, who would need full knowledge of the underlying structure and location of the different datasets. In this data fabric model, data is more easily accessed and queried on a self-serve basis by business users across the enterprise.
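To make the push-down idea concrete, here is a minimal, purely illustrative sketch (not KX's implementation) in which two in-memory SQLite databases stand in for separate enterprise stores. The aggregation is pushed down to each store, and the federation layer only combines the partial results; the store names and table schema are hypothetical.

```python
import sqlite3

def make_store(rows):
    """Create an in-memory store with a simple, hypothetical orders table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    db.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return db

# Two independent stores, as in a data fabric where data stays where it lives.
store_emea = make_store([("EMEA", 100.0), ("EMEA", 250.0)])
store_amer = make_store([("AMER", 400.0)])

def federated_total(stores):
    """Push the aggregation down to each store, then combine partial results."""
    pushdown = "SELECT SUM(amount) FROM orders"  # heavy lifting done at source
    partials = [db.execute(pushdown).fetchone()[0] for db in stores]
    return sum(p for p in partials if p is not None)

print(federated_total([store_emea, store_amer]))  # prints 750.0
```

The end-user issues one logical query and never needs to know that the data lives in two places; only small aggregates, not raw rows, cross the network boundary.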
The fast and the curious
Data volumes are only going to increase, and data landscapes are only going to get more complex. The need for organisations to extract value from their data in the moment will move from a nice-to-have to a critical need.
It’s important to say that at KX, we don’t believe a data management fabric requires existing big data architectures to be replaced. We advocate adding a fast-data layer using an analytics and management engine that can federate queries across existing storage repositories, whatever and wherever they reside.
However, without an approach that incorporates distributed SQL and some of the other key elements described above, businesses risk falling behind competitors who can respond to operational challenges and business opportunities faster, using insights and intelligence gained from in-the-moment analysis of real-time and historic data wherever that data resides.