What on earth is data lineage?
This is a guest blogpost by Steve Neat, general manager, EMEA, Solidatus
Perhaps you’ve heard someone mention “data lineage” in a meeting. You might have nodded along while secretly wondering what on earth they were talking about.
The reality is that the concept of data lineage has been around for decades, but it really started to gain traction in the late 1990s and early 2000s when businesses started drowning in data. As companies moved from paper records to databases, spreadsheets, and enterprise software, understanding where data came from, how it changed as it travelled through an organisation, and where it was going was becoming a major challenge.
The term “data lineage” itself started popping up in industry discussions as organizations sought better ways to track and validate their data. Regulations like Sarbanes-Oxley (2002) and later the Basel Committee on Banking Supervision’s BCBS 239 (2013) put even more focus on data transparency.
Why data lineage matters
Whilst data lineage can provide organisations with the ability to plan and manage business transformation, understand impact analysis and track data movement throughout their business, one of the biggest early drivers of adoption for data lineage has been regulation.
This is a tool for highly regulated technically complex industries – mainly adopted by banks because they have tech stacks that move between the cloud, mainframes and so on, so need to monitor how data moves through these systems and changes to be compliant with regulations like the European Union’s DORA (the Data Operational Resilience Act) and BCBS 239. However, other regulated industries such as healthcare must also prove that their data is accurate and traceable.
Regulators will examine an organisation’s data claims and assumptions and demand proof that the data they’re being shown is accurate. Organisations need tools to do this easily. Some banks have a hundred or more people on staff just managing various spreadsheets, but it is slow and error-prone.
In addition to regulation, data lineage can also help with business intelligence and decision-making. If a company is basing million-dollar decisions on data, it better be correct. Risk management is another discipline that benefits from accurate data lineage. Poor data quality can lead to financial losses, legal troubles, and reputational damage. Data lineage minimizes these risks.
The challenges of implementing data lineage
Implementing data lineage comes with significant challenges. One of the biggest obstacles is dealing with legacy systems. Many organisations continue to rely on outdated technologies that were never designed with data lineage in mind, making it difficult to track data movement and transformations effectively.
Another challenge is the presence of data silos. Different departments often store data in separate systems, creating barriers to organisation-wide visibility making it harder to establish a clear lineage. Without integration, tracking data flow across the enterprise becomes a complex and time-consuming task.
Additionally, some businesses still rely on manual processes, such as spreadsheets and documentation, to track data lineage. This approach is not only slow and inefficient but also highly prone to errors, making it an unsustainable long-term solution.
Finally, a lack of expertise further complicates implementation. Understanding and managing data lineage requires specialized knowledge, yet many companies lack in-house professionals with the necessary skills. Without the right expertise, even the best data lineage strategies can struggle to get off the ground.
Key Industries that depend on data lineage
Data lineage is especially critical in financial services, where institutions must track every piece of data for risk reporting, fraud detection, and regulatory compliance. Every bank is wrestling with these regulatory requirements, some have even appointed a head of BCBS 239.
Without a clear understanding of data flow, errors can lead to costly regulatory penalties or security vulnerabilities. For example, HSBC leveraged data lineage to overhaul its Wholesale Credit Lending system, mapping its entire data ecosystem to streamline loan approvals. As a result, approval times dropped from months to minutes, significantly improving efficiency and reducing costs. In an industry where trust and precision are paramount, robust data lineage is essential for maintaining compliance and operational integrity.
AI and machine learning also depend heavily on data lineage, as the quality of AI-driven insights hinges on the accuracy of the underlying data. AI is a black box so monitoring how data interacts with these systems is critical. Without clear lineage, organizations risk training models on flawed or biased datasets, leading to unreliable predictions and poor decision-making.
As AI adoption continues to grow across industries, including healthcare, retail, and e-commerce, the need for robust data tracking becomes even more pressing. Whether ensuring the accuracy of medical diagnoses, optimizing supply chains, or enhancing customer experiences, strong data lineage practices help businesses build trust and improve outcomes.
As businesses become more data-driven and regulations become stricter, understanding data lineage isn’t just important, it’s essential. As businesses become more data-driven and regulations become stricter, understanding data lineage isn’t just important, it’s essential. If an organisation in a highly regulated industry isn’t paying attention to data lineage yet, it’s time to start. Because in today’s data-driven world, knowing where your data comes from and how it’s used could be the key to success, or failure.