
Why AI is forcing enterprises to rethink observability

With only 5% of companies generating value from artificial intelligence, traditional observability tools may be falling short, creating a need for smarter monitoring of AI systems

Only 5% of companies are successfully generating value from artificial intelligence (AI), according to Boston Consulting Group, despite IT spending on the technology rising sharply. The remaining 95% are struggling to turn that investment into cost savings or revenue growth. It’s the kind of statistic we are getting used to seeing from consultants and analysts, but what does it mean practically?

As so many companies embark on AI projects, a problem they are encountering is understanding how systems behave once they are live, and whether they are delivering the expected results. This raises familiar questions around complexity, legacy systems and project planning. But it also raises a question about observability: are the tools organisations rely on today enough for an AI age?

Observability is meant to give organisations visibility into how their systems are running. By bringing together metrics, logs and traces, it allows teams to monitor performance, diagnose issues and understand how services behave once they are live. Like everything though, it is also subject to the intricacies and variances of underlying data infrastructures.
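The core correlation idea behind this can be sketched in a few lines. The record shapes below are hypothetical, invented for illustration, but the principle is what observability platforms do at scale: a shared trace ID ties a request’s logs and spans together so a slow or failing request can be seen end to end.

```python
from collections import defaultdict

# Hypothetical telemetry records; real platforms ingest these from
# agents and SDKs, but the correlation idea is the same: a shared
# trace ID links one request's logs and spans together.
logs = [
    {"trace_id": "abc123", "level": "ERROR", "msg": "timeout calling payments"},
    {"trace_id": "def456", "level": "INFO", "msg": "request served"},
]
traces = [
    {"trace_id": "abc123", "service": "checkout", "duration_ms": 5400},
    {"trace_id": "def456", "service": "checkout", "duration_ms": 120},
]

def correlate(logs, traces):
    """Group log lines and spans under the trace ID they share."""
    by_trace = defaultdict(lambda: {"logs": [], "spans": []})
    for line in logs:
        by_trace[line["trace_id"]]["logs"].append(line)
    for span in traces:
        by_trace[span["trace_id"]]["spans"].append(span)
    return dict(by_trace)

view = correlate(logs, traces)
slow = [t for t, v in view.items()
        if any(s["duration_ms"] > 1000 for s in v["spans"])]
print(slow)  # the slow request, with its error log attached
```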

For Pejman Tabassomi, EMEA field CTO at Datadog, organisations often struggle to correlate operational data across multiple systems and environments, limiting their ability to understand how services behave end to end or how performance links to business outcomes. This, he says, becomes more pronounced with AI projects, where systems span more data sources, services and models, making behaviour harder to trace and explain.

Jarrod Vawdrey, field chief data scientist at Domino Data Lab, takes this further. “Traditional observability tools were built to answer a simple question: is the system up and running? When an AI system is making decisions or interacting with customers, ‘up and running’ doesn’t tell you much.”

And therein lies a problem. Systems can be technically healthy, yet still produce the wrong outputs or behave in ways that are difficult to detect through traditional monitoring tools. Organisations may be able to see that systems are running, but not whether they are working as intended.

Chicken and egg

So, what is it that businesses hope to achieve? According to McKinsey, business leaders are now moving on from “short-term resilience to sustained productivity and long-term impact”, but 86% say their organisations are not prepared to adopt AI in day-to-day operations. Why is that? Is this a visibility thing? Is it to do with upfront costs? Or perhaps something else?

Virgin Atlantic is already dealing with this in practice. The airline has deployed an AI concierge to support customers, but monitoring the system involves far more than tracking infrastructure performance. Engineers are evaluating how the system behaves, assessing responses for accuracy, tone and appropriateness, and feeding that data back into development, effectively reviewing each customer “turn” as part of an ongoing feedback loop. The challenge also extends beyond performance into areas such as security.

“You move away from maybe more traditional attack vectors, where you’re looking at things like injection attacks or exploiting vulnerabilities in systems, to more human, persuasive types of attack, where users are trying to manipulate the model through language,” says Mark O’Neill, senior manager for applied AI engineering at Virgin Atlantic.

That requires a different approach to testing and monitoring, where systems are continuously evaluated in production rather than simply checked for availability or performance. The challenge is not just conceptual, but one of scale. As AI systems generate increasing volumes of data, traditional monitoring approaches are struggling to keep up.
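A minimal sketch of that kind of continuous, per-turn evaluation might look like the following. Everything here is hypothetical, not Virgin Atlantic’s actual pipeline: the rubric dimensions mirror those mentioned above, and the judge is a stub standing in for what would in practice be an LLM-as-judge call or human review.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of per-turn evaluation: each customer
# interaction is scored against a rubric, and weak turns are
# flagged for review as part of a feedback loop.
@dataclass
class Turn:
    question: str
    answer: str
    scores: dict = field(default_factory=dict)

RUBRIC = ("accuracy", "tone", "appropriateness")

def score_turn(turn, judge):
    """Score one turn on each rubric dimension using a judge callable."""
    for dimension in RUBRIC:
        turn.scores[dimension] = judge(turn, dimension)
    return turn

def needs_review(turn, threshold=0.7):
    """Flag turns where any dimension falls below the threshold."""
    return any(s < threshold for s in turn.scores.values())

# Stub judge for illustration only: flags answers mentioning
# refunds as needing an accuracy check.
def stub_judge(turn, dimension):
    if dimension == "accuracy" and "refund" in turn.answer.lower():
        return 0.4
    return 0.9

turn = score_turn(Turn("Can I change my flight?",
                       "Yes, and you get a full refund."), stub_judge)
print(needs_review(turn))  # True - queued for human review
```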

Jeff Champagne, field CTO at Cribl, describes the shift as a “telemetry tsunami” of metrics, logs and traces, driven by agentic systems operating at speeds far beyond human interaction. The focus, he says, is moving away from infrastructure health towards “logical integrity”: whether systems are using the right data, producing accurate outputs and acting safely.

In many cases, the root cause of a problem is not the model itself, but the data pipelines and downstream systems it depends on, making it harder to diagnose issues without visibility across the full stack. For observability platforms, this raises a question about what is actually being measured and whether current approaches can keep pace with the scale and complexity of AI systems.

As Domino Data Lab’s Vawdrey puts it, tools built to check whether a system is up and running are, in an AI context, no longer enough.

Analysts say this is not simply a tooling issue, but a reflection of how enterprise systems themselves are changing. Gartner identifies multi-agent systems and AI-native development platforms as key trends shaping enterprise IT, where applications are no longer static but made up of interacting components operating across distributed environments.

In this model, systems are continuously evolving, with decisions and actions taken across multiple layers of infrastructure, data and models. That, Gartner argues, increases both the complexity and the operational risk of enterprise IT, making it harder to establish clear lines of cause and effect when something goes wrong.

Intelligent observability emerging

That is already having an impact on how observability itself is evolving. According to IBM, platforms are becoming more intelligent to keep pace with AI systems, with organisations increasingly using machine learning to analyse telemetry, detect anomalies and automate responses. In effect, it is becoming a case of using AI to observe AI.
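The simplest form of “AI observing AI” can be illustrated with a statistical anomaly check over a telemetry series. This is a minimal sketch with invented data; real AIOps platforms use far richer models, but a rolling z-score captures the basic idea of machines flagging what humans would miss in a wall of dashboards.

```python
import statistics

def anomalies(series, window=10, z_threshold=3.0):
    """Return indices whose value deviates strongly from the
    preceding window's mean, measured in standard deviations."""
    flagged = []
    for i in range(window, len(series)):
        recent = series[i - window:i]
        mean = statistics.fmean(recent)
        stdev = statistics.pstdev(recent)
        if stdev and abs(series[i] - mean) / stdev > z_threshold:
            flagged.append(i)
    return flagged

# Steady latency with one spike - the kind of pattern easily lost
# among thousands of metrics on a dashboard.
latency_ms = [100, 102, 99, 101, 100, 98, 103, 100, 101, 99, 450, 100]
print(anomalies(latency_ms))  # [10]
```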

“The intelligence and speed required to keep these AI systems healthy also grows in parallel, demanding that more innovative and powerful types of intelligence are implemented,” says Arthur de Magalhaes, senior technical staff member for AIOps at IBM.

At the same time, Forrester argues that observability should be “woven into the fabric” of the software development lifecycle, using real-time telemetry to inform design, testing and deployment rather than reacting to failures in production.

These changes are already feeding into the concerns organisations are dealing with in practice. Tabassomi says CIOs are increasingly focused on understanding how systems are being used, distinguishing between human users, automated agents and external services, and identifying unusual patterns of behaviour.

That has implications beyond performance. As AI systems expand the number of interactions across environments, they also increase the potential attack surface and the risk of unexpected resource consumption.

“Observability is about understanding what is at risk, as well as how systems are performing,” says Tabassomi.

In that context, observability is being used not just to monitor infrastructure, but to manage exposure, cost and operational impact across increasingly complex systems. It’s an evolution of the technology with a broader remit: helping organisations manage the frustrations of fragmentation.

Tabassomi says many CIOs are looking for greater consolidation across their technology environments, not just at a systems level, but across teams and workflows. Data, infrastructure and responsibility are often spread across different functions, making it harder to build a coherent picture of how services behave or where problems originate. As environments scale, that lack of alignment can lead to inefficiencies, slower response times and higher operational costs. Putting AI into this mix just adds more headaches.

Perhaps this is why there is a growing expectation that observability should go beyond visibility alone. As AI systems become more autonomous, teams are less interested in dashboards that describe system behaviour and more focused on what actions to take in response.

That places new demands on observability platforms, which are increasingly expected to identify root causes, prioritise issues and, in some cases, trigger automated responses. In that sense, observability is moving closer to decision support, rather than simply reporting on system performance.
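That shift from reporting to decision support can be sketched as a simple triage step. The playbook below is entirely hypothetical, meant only to show the shape of the idea: detected issues are prioritised and mapped to candidate responses rather than just displayed.

```python
# Hypothetical playbook mapping detected issue types to a priority
# and a candidate response; real platforms derive these from
# runbooks, policies or learned remediations.
PLAYBOOK = {
    "model_drift":   {"priority": 1, "action": "roll back to previous model"},
    "data_pipeline": {"priority": 2, "action": "pause downstream jobs"},
    "latency_spike": {"priority": 3, "action": "scale out inference pods"},
}

def triage(issues):
    """Order known issues by priority and attach the playbook action."""
    known = [i for i in issues if i in PLAYBOOK]
    return [(i, PLAYBOOK[i]["action"])
            for i in sorted(known, key=lambda i: PLAYBOOK[i]["priority"])]

print(triage(["latency_spike", "model_drift"]))
# [('model_drift', 'roll back to previous model'),
#  ('latency_spike', 'scale out inference pods')]
```

Whether the action fires automatically or lands in a queue for a human is a policy choice; the observability layer’s job is to surface the prioritised option.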

This leads to a rethink of what observability is for. Observability is certainly not disappearing, but it is being stretched somewhat. The core idea, bringing together data to understand how systems behave, still works. But in an AI context, behaviour is no longer defined by performance alone. It includes outputs, decisions, interactions and their impact on users and the business.

There are already signs that organisations are responding. Gartner predicts that by 2027, 70% of enterprises implementing distributed data architectures will adopt data observability tools, up from 50% in 2025, as they look to improve visibility across increasingly complex data environments.

The same research also notes that traditional reactive monitoring approaches are no longer sufficient in these environments, particularly as AI initiatives place greater demands on data quality, governance and real-time insight.

What organisations need is a more complete view, one that combines traditional telemetry with insight into behaviour, context and outcomes. The challenge is how to adapt observability to systems that are less predictable, more autonomous and harder to interpret. Of course, technology has a habit of solving problems, only to then create new ones. Observability is part of that cycle, trying to keep up with systems that are becoming even harder to pin down.

As Champagne at Cribl says: “True observability in this era requires visibility across the entire stack, not just the model.”
