This is a guest blogpost by John Wills, Field CTO, Alation
Arsenal vs Spurs. Federer vs Nadal. Senna vs Prost. Frost vs Nixon. There have been a lot of great rivalries over the years, and now, arguably the greatest the world has ever witnessed: Data Fabric vs Data Mesh. I’ll save you the pay-per-view fee and give you a front-row seat.
In one corner we have Data Fabric, something Gartner calls the Future of Data Management. In the other, Thoughtworks contends that Data Mesh is key to moving beyond a monolithic data lake. Which one is right? Who wins? Spoiler: they are independent concepts that are, in fact, entirely complementary.
To date, data fabric has taken most of the limelight, focusing on the technologies needed to support metadata-driven use cases across hybrid and multi-cloud environments.
Data mesh takes a more people- and process-centric view, forgoing technology edicts and arguing for “decentralised data ownership” and the need to treat “data as a product”. This approach overcomes the bottlenecks and disconnects that are typical of data lake and data warehouse environments, disconnects that arise when data engineers play middlemen between data producers and consumers.
But what do these two terms actually mean, and why do we need them?
What is data fabric? Well, it depends on who you ask…
There are vendors out there that will have you believe their product is an example of a data fabric – some even have ‘Data Fabric’ in their product name.
All of this is no doubt well-intentioned, but it does confuse the market. Gartner’s view is that there is no single vendor that addresses the complete set of needs required to build a data fabric – at least not today.
They define data fabric as “a design concept that serves as an integrated layer (fabric) of data and connecting processes. A data fabric utilises continuous analytics over existing, discoverable and inferenced metadata assets to support the design, deployment and utilisation of integrated and reusable data across all environments, including hybrid and multi-cloud platforms.”
Okay, time to dig in: Gartner says a data fabric is a design concept. In other words, a data fabric is not a single thing or product. It is instead composable, made up of a set of integrated technologies that accelerate value from enterprise metadata. Gartner also acknowledges that data is sitting everywhere today in hybrid and multi-cloud environments (which, at this point, should go without saying).
Design concept. Metadata. Hybrid and multi-cloud. These are the important terms in Gartner’s definition, but why do we need data fabric in the first place?
Metadata is the key to fuelling data intelligence use cases across the board, including data search & discovery and data governance. But accessing and making sense of metadata is extremely challenging in today’s environment. A big reason is that metadata is everywhere. It’s in all types of data management systems, from databases to ERP tools to data integration software. And that metadata could be sitting in many different locations, including on-premises, in the cloud, and everywhere in between.
Humans are hard-pressed to find relevant metadata, let alone make sense of it, and data fabric is the answer to this problem. By using technologies to automate the discovery and continuous analysis and reuse of metadata, organisations can overcome the challenges associated with its proliferation and reduce the error-prone manual efforts that go with making sense of it.
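To make the idea of automated metadata discovery a little more concrete, here is a minimal Python sketch. It is purely illustrative and not how any particular vendor or Gartner describes an implementation: the source systems, the `MetadataRecord` type and the `harvest` and `find_assets` helpers are all hypothetical names invented for this example. It simply gathers metadata from systems in different locations into one searchable catalogue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataRecord:
    """One discovered data asset (hypothetical shape, for illustration only)."""
    source: str      # name of the system the asset lives in
    location: str    # e.g. "on-premises" or "cloud"
    asset: str       # table or dataset name
    columns: tuple   # column names found in the asset


def harvest(sources):
    """Collect metadata from every source into one flat catalogue."""
    catalogue = []
    for source in sources:
        for asset, columns in source["assets"].items():
            catalogue.append(
                MetadataRecord(source["name"], source["location"], asset, tuple(columns))
            )
    return catalogue


def find_assets(catalogue, column_name):
    """Return every asset, in any system, that contains the given column."""
    return [record for record in catalogue if column_name in record.columns]


# Made-up systems standing in for an ERP tool and a cloud warehouse.
SOURCES = [
    {
        "name": "erp",
        "location": "on-premises",
        "assets": {"invoices": ["invoice_id", "customer_id", "amount"]},
    },
    {
        "name": "warehouse",
        "location": "cloud",
        "assets": {
            "customers": ["customer_id", "region"],
            "web_events": ["event_id", "timestamp"],
        },
    },
]

catalogue = harvest(SOURCES)
matches = find_assets(catalogue, "customer_id")
for record in matches:
    print(f"{record.source}.{record.asset} ({record.location})")
```

A real data fabric would run this kind of discovery continuously and across far more systems, but the principle is the same: one question ("where does customer_id live?") answered across every environment without anyone hunting by hand.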
So, what about data mesh?
Zhamak Dehghani of Thoughtworks is credited with having conceived of data mesh in a blog post back in May 2019. Subsequent posts have clarified the architectural aspects of data mesh, but all remain true to the founding vision and approach first introduced in 2019. Vendors have now started putting their own spin on data mesh, which will no doubt introduce some confusion. Yet these vendors universally cite the work of Dehghani as the basis for their “take” on data mesh.
Its origin is clear, but a clear definition is harder to come by. Fortunately, we are given exactly what we need in this blog post from Arif Wider, also at Thoughtworks:
“The data mesh paradigm is a strong candidate to supersede the data lake as the dominant architectural pattern in data and analytics. Importantly, the data mesh mainly introduces a new organisational perspective and is independent of specific technologies. Its key idea is to apply domain-driven design and product thinking to the challenges in the data and analytics space. Comparable to the introduction of a DevOps culture, establishing a data mesh culture is about connecting people, creating empathy, and about creating a structure of federated responsibilities.”
Here, Wider calls for a new architectural approach, one that will supersede the data lake. But why? Much has been written about how data lakes have failed us all. How they’ve turned into data swamps due to lack of organisation, governance, and accessibility. For Wider, the underlying issue with data lakes is straightforward and can be captured in one word: centralisation.
A central team is responsible for maintaining the central infrastructure (aka data lake). This team is usually disconnected from the needs of data consumers and lacks the domain expertise of data producers. Yet here they are, forced to play middlemen between consumers and producers because the prevailing data lake architecture forces the teams to be organised this way. The end result is a team that doesn’t scale, and data being served up to consumers which may or may not meet their quality needs.
Data mesh inverts this model with domain-driven design and product thinking. Responsibilities are distributed to the people who are closest to the data. These product owners are responsible for delivering data as a product and, as such, they are accountable for objective measures. In other words, data mesh is all about people, calling for a shift in responsibilities to ensure high-quality data is put in the hands of data consumers faster and more efficiently.
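What might “accountable for objective measures” look like in practice? The Python sketch below is one possible reading, not something Dehghani or Wider prescribe. The `DataProduct` class, the completeness measure and the 95% target are all assumptions made up for illustration. The point is only that a data product has a named owner in a domain and a measure that can be checked.

```python
from dataclasses import dataclass


@dataclass
class DataProduct:
    """A dataset published by a domain team (hypothetical model)."""
    name: str
    domain: str
    owner: str                      # the accountable product owner
    rows: list                      # the data itself, as a list of dicts
    required_fields: tuple          # fields consumers rely on
    min_completeness: float = 0.95  # assumed target, not a standard

    def completeness(self):
        """Share of rows in which every required field is populated."""
        if not self.rows:
            return 0.0
        complete = sum(
            1
            for row in self.rows
            if all(row.get(name) is not None for name in self.required_fields)
        )
        return complete / len(self.rows)

    def meets_target(self):
        """True when the product satisfies its published quality target."""
        return self.completeness() >= self.min_completeness


# Example data invented for this sketch: one of four rows lacks a customer.
orders = DataProduct(
    name="orders",
    domain="sales",
    owner="sales-data-team",
    rows=[
        {"order_id": 1, "customer_id": 10},
        {"order_id": 2, "customer_id": 11},
        {"order_id": 3, "customer_id": None},
        {"order_id": 4, "customer_id": 12},
    ],
    required_fields=("order_id", "customer_id"),
)

print(orders.owner, orders.completeness(), orders.meets_target())
```

Here the sales domain owns both the data and the number that describes its quality, so a consumer who finds a gap knows exactly who to talk to, with no central team in between.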
So, there you have it. Despite the hype, data mesh and data fabric are complementary rather than rivals. What is indisputable is that both are having their “moment” and will more than likely continue to do so into 2022 and beyond.