Data engineering - Cloudera: Counting the cost of data integration tax
This is a guest post for the Computer Weekly Developer Network written by Wim Stoop, senior director of product marketing at Cloudera.
Styled as the ‘hybrid data company’, Cloudera is known for its data analytics, machine learning and data engineering platform built on open source technologies, including Apache Hadoop, which helps users analyse, store and secure large amounts of data.
The company says its data management expertise spans the entire data lifecycle, from ingestion to visualisation.
Stoop writes as follows…
While the concept of data engineering isn’t new, its importance has grown significantly in the past 18-24 months due to the AI-driven reliance on high-quality, well-governed data. The need to contextualise large language models with business data highlights the central role that governance and trust must play.
However, a gap persists, with only a minority of organisations truly governing their data effectively. From a data engineering perspective, this is a twofold challenge spanning both technology and people.
The data integration tax
There is no escaping the fact that data volumes are growing relentlessly and organisations face an increasingly complex regulatory landscape. What’s more, data is residing in multiple cloud and on-premises environments, with hybrid cloud infrastructure becoming the norm. This complexity makes it even harder to derive insights from data and manage it across the entire lifecycle – from creation and storage to analysis and beyond.
Traditionally, organisations have turned to point solutions to address challenges of scale and compliance. However, these tools often introduce additional costs and complexities. While they may appear to accelerate specific use cases and data engineering processes, offering a perception of quicker time to value, they frequently come with hidden costs tied to integration efforts.
Beyond the technical costs of integrating point solutions, organisations must also consider the additional cost of specialised training. Each tool typically requires unique skill sets, increasing operational overhead and diluting return on investment (ROI) in the medium to long term. This effectively imposes a ‘data integration tax’ on organisations.
Creating a modern data architecture
Traditional data integration approaches are increasingly unsustainable. Building a data fabric, which automates the discovery, integration and governance of data across systems, can help address many of these challenges. This hands-off approach allows organisations to get to grips with more data as it emerges, while continuing to secure and govern it autonomously.

Cloudera’s Stoop: organisations use only a small fraction of their data; they need a modern data architecture and internal resources for long-term success.
A data fabric enables self-service access to data – while it does not ensure automatic compliance with data privacy regulations, it does make achieving and demonstrating compliance a lot easier.
When integrated with a data lakehouse, this architecture can promote democratised data access by eliminating the need for redundant data copies, which reduces both complexity and cost.
Openness is also becoming a cornerstone of data strategies, with the likes of the Apache Iceberg table format gaining traction as the de facto standard for scalable, efficient and flexible data management. These open formats can ensure interoperability, prevent vendor lock-in and foster community-driven innovation.
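As a brief sketch of what this openness looks like in practice (the table name, columns and catalogue are illustrative, not from Cloudera), an Iceberg table can be declared once in Spark SQL and then read or written by any Iceberg-compatible engine, such as Spark, Trino, Flink or Impala:

```sql
-- Illustrative Spark SQL DDL: the sales.orders table is hypothetical.
CREATE TABLE sales.orders (
    order_id   BIGINT,
    customer   STRING,
    amount     DECIMAL(10, 2),
    order_ts   TIMESTAMP
)
USING iceberg
-- Hidden partitioning: queries filter on order_ts directly,
-- without needing to know how the data is physically laid out.
PARTITIONED BY (days(order_ts));
```

Because the table metadata lives in the open Iceberg format rather than in any one engine, switching or mixing query engines does not require rewriting the data, which is the interoperability and lock-in point made above.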
From data engineering to data stewardship
Alongside the technological change, there is also a cultural one. Organisations must embrace decentralised data management models, with the goal of empowering business users to innovate independently while maintaining data security and compliance.
Implementing robust governance with clear standards for data formats and structures across the organisation is highly recommended. This approach helps minimise the time and effort needed to map and transform data while enhancing the consistency and quality of integrated data. By proactively planning governance, organisations can gain a comprehensive understanding of their data and ensure it is accessible in a secure and compliant manner to those who need it.
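To make the idea of “clear standards for data formats” concrete, here is a minimal, hedged sketch of a conformance check applied at ingestion time. The schema and field names are hypothetical; a real deployment would more likely use a schema registry or a validation library, but the principle of rejecting non-conforming records before they propagate is the same:

```python
# Minimal sketch of enforcing an agreed record standard at ingestion.
# The schema below is hypothetical, purely for illustration.
from datetime import datetime

# The organisation-wide standard: field name -> (expected type, required)
ORDER_SCHEMA = {
    "order_id": (int, True),
    "customer": (str, True),
    "amount": (float, True),
    "order_ts": (str, True),  # ISO 8601 timestamp, checked below
}

def validate(record: dict) -> list[str]:
    """Return a list of violations; an empty list means the record conforms."""
    errors = []
    for field, (ftype, required) in ORDER_SCHEMA.items():
        if field not in record:
            if required:
                errors.append(f"missing required field: {field}")
            continue
        if not isinstance(record[field], ftype):
            errors.append(f"{field}: expected {ftype.__name__}")
    # Structural rule from the standard: timestamps must be ISO 8601.
    if isinstance(record.get("order_ts"), str):
        try:
            datetime.fromisoformat(record["order_ts"])
        except ValueError:
            errors.append("order_ts: not an ISO 8601 timestamp")
    return errors

good = {"order_id": 1, "customer": "Acme", "amount": 9.99,
        "order_ts": "2024-05-01T12:00:00"}
bad = {"order_id": "1", "customer": "Acme", "amount": 9.99}

print(validate(good))  # []
print(validate(bad))   # type violation plus missing order_ts
```

The point of a shared standard like this is that every team maps to one agreed shape once, rather than each integration negotiating formats pairwise, which is where much of the mapping and transformation effort described above goes.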
As a result, the role of the data engineer is increasingly evolving into that of a data steward – a professional who combines technical expertise with business acumen to contextualise data and extract its value. This shift also opens new opportunities for developers and data practitioners to assume more strategic roles in the future.
At a time when most organisations are only effectively utilising a small fraction of their data, it’s clear the focus should be on building a modern data architecture and developing internal resources that position them for long-term success.