
Why AI agent projects are stalling in production

A Confluent technology strategist explains why multi-agent workflows become brittle monoliths and how access to real-time, streaming data is key to the success of agentic AI deployments

A number of Confluent customers have begun moving artificial intelligence (AI) proofs of concept (PoCs) into production over the past six months, but they often encounter stumbling blocks when the projects are put to real-world use, according to Andrew Sellers, the company’s vice-president of technology strategy and enablement.

Organisations need to be realistic about the state of the art in AI, but even where a model is fit for its intended purpose, there are other considerations for success.

“Existing AI frameworks have made it easy to get started with AI agents, but the real challenge has been getting them production-ready,” Sellers said. “Too often, prototypes stall because they struggle to access real-time context or integrate reliably with the tools and data they need. The result is a tangle of systems and databases that turn multi-agent workflows into brittle monoliths that simply can’t scale.”

Accessible, good-quality, domain-specific data is essential, as are traceability and reproducibility. This presents problems when, for instance, Model Context Protocol (MCP) calls are “forgotten” once a response is returned. Sellers noted that lessons learned from using microservices to call large language models (LLMs) can be applied here, with an event-driven architecture improving visibility of the events agents react to.
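The event-driven pattern Sellers describes can be sketched as follows: rather than letting a tool call vanish once its response is returned, every call and result is appended to an ordered, append-only event log that other systems can inspect. This is illustrative Python, not Confluent's API; an in-memory list stands in for a durable stream such as a Kafka topic, and all names are hypothetical.

```python
import json
import time
from dataclasses import dataclass, field

@dataclass(frozen=True)
class AgentEvent:
    agent: str
    event_type: str   # e.g. "tool_call" or "tool_result"
    payload: dict
    timestamp: float = field(default_factory=time.time)

class EventLog:
    """Append-only record of everything an agent saw and did."""
    def __init__(self):
        self._events: list[AgentEvent] = []

    def append(self, event: AgentEvent) -> None:
        self._events.append(event)  # never mutated or deleted

    def replay(self, agent=None):
        """Yield events in order, optionally filtered by agent name."""
        for e in self._events:
            if agent is None or e.agent == agent:
                yield e

log = EventLog()
log.append(AgentEvent("triage-agent", "tool_call",
                      {"tool": "lookup_customer", "args": {"id": 42}}))
log.append(AgentEvent("triage-agent", "tool_result",
                      {"tool": "lookup_customer", "status": "ok"}))

for e in log.replay("triage-agent"):
    print(e.event_type, json.dumps(e.payload))
```

Because the log, not the agent, is the source of truth, a forgotten MCP call leaves a trace that can later be audited or replayed.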

While LLMs are known to hallucinate, Sellers said this can be largely prevented by grounding them with specific questions and domain-specific data – for example, by using retrieval-augmented generation (RAG). “It’s more of a data problem and a trust problem” than a model problem, he suggested.
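The grounding Sellers refers to can be sketched in miniature: retrieve domain-specific passages relevant to the question and place them in the prompt, so the model answers from supplied facts rather than its own recall. Production RAG systems use vector embeddings and a real retriever; simple keyword overlap and toy documents stand in here so the example is self-contained, and the policy text is invented for illustration.

```python
import re

DOCUMENTS = [
    "Refund requests over 500 euros require manager approval.",
    "Customers on the premium tier get free expedited shipping.",
    "Warehouse B handles all returns for the northern region.",
]

def tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))

def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the question; keep the top k."""
    q = tokens(question)
    return sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)[:k]

def grounded_prompt(question: str) -> str:
    """Build a prompt that instructs the model to answer only from context."""
    context = "\n".join(f"- {d}" for d in retrieve(question, DOCUMENTS))
    return (f"Answer using only the context below.\n"
            f"Context:\n{context}\n"
            f"Question: {question}")

print(grounded_prompt("Does a 600 euro refund need approval?"))
```

The hallucination risk shrinks because the model is asked a specific question against specific, trusted data, which is Sellers' point that this is "more of a data problem" than a model problem.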

This challenge is compounded by the fact that modern applications increasingly need to incorporate real-time data. Previously, transactional systems operated in real time, while analytics were performed on an ad hoc basis or for end-of-day reports. Today, business leaders are shortening the decision-making cycle, and Sellers warned “there’s a risk to doing nothing” to support that change.

As a result, transaction processing and analytics must be merged to allow rapid analysis, with the results fed back into operational systems. One example is online retail, where hyper-personalisation is now expected by customers and cannot be delivered without real-time access to the right data.

This is the problem Confluent aims to solve with its recently introduced Streaming Agents. Sellers explained that they allow teams to build, deploy and orchestrate event-driven agents natively on Apache Flink. The agents can monitor and act on business events instantaneously, powering intelligent automation with state management and replayability.

That replayability comes from every input an agent sees being part of an immutable event log. “Teams can rewind the stream, whether to recover from failure, test new logic, or audit agent decisions after the fact,” he said.
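Replayability of this kind can be sketched quite simply: when every input is an entry in an immutable, ordered log, state can be rebuilt deterministically from any offset, and candidate logic can be tested against the exact same history. The event shapes and handler names below are illustrative, not a Confluent interface.

```python
LOG = [  # append-only event log; offsets are the list indices
    {"item": "widget", "delta": +10},
    {"item": "widget", "delta": -3},
    {"item": "gadget", "delta": +5},
    {"item": "widget", "delta": -2},
]

def replay(log, handler, from_offset=0):
    """Rebuild state by applying handler to each event, in order."""
    state: dict[str, int] = {}
    for event in log[from_offset:]:
        handler(state, event)
    return state

def current_logic(state, event):
    state[event["item"]] = state.get(event["item"], 0) + event["delta"]

def new_logic(state, event):
    # candidate logic under test: ignore negative adjustments
    if event["delta"] > 0:
        current_logic(state, event)

print(replay(LOG, current_logic))   # {'widget': 5, 'gadget': 5}
print(replay(LOG, new_logic))       # {'widget': 10, 'gadget': 5}
# recovering from a failure part-way through: rewind to offset 2
print(replay(LOG, current_logic, from_offset=2))
```

Because the log never changes, the same replay yields the same state every time, which is what makes after-the-fact audits of agent decisions possible.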

In security, for example, Streaming Agents can perform real-time anomaly detection using built-in machine learning on high-velocity data streams, such as system metrics, network traffic and sensor data. Adding metadata from incident records and threat feeds can identify clusters of related events, which can then be fed into LLM-powered workflows to identify root causes and route alerts to the relevant teams, reducing mean time to resolution.
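A minimal sketch of real-time anomaly detection on a metric stream, in the spirit of the security example above, is to flag values that deviate sharply from a rolling window. The window size and z-score threshold here are arbitrary choices for illustration, not Confluent's built-in algorithm.

```python
import math
from collections import deque

class RollingAnomalyDetector:
    """Flag values more than `threshold` standard deviations from the
    mean of the most recent `window` observations."""

    def __init__(self, window: int = 20, threshold: float = 3.0):
        self.window = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, value: float) -> bool:
        """Return True if value is anomalous relative to recent history."""
        anomalous = False
        if len(self.window) >= 5:  # wait for some history first
            mean = sum(self.window) / len(self.window)
            var = sum((x - mean) ** 2 for x in self.window) / len(self.window)
            std = math.sqrt(var) or 1e-9  # guard against division by zero
            anomalous = abs(value - mean) / std > self.threshold
        self.window.append(value)
        return anomalous

detector = RollingAnomalyDetector()
stream = [10, 11, 9, 10, 12, 10, 11, 95, 10]  # 95 is a spike
flags = [detector.observe(v) for v in stream]
print(flags)  # only the spike at 95 is flagged
```

In the workflow Sellers describes, flagged events would then be enriched with incident and threat-feed metadata before being handed to the LLM-powered triage step.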

Regarding trust, Sellers advocates for processing and governance to occur as close to the data source as possible. “The people who know data the best are usually the ones that make it,” he said, adding that “governance is primarily a metadata creation problem.” This approach allows AI agents and analytics to be applied to the data wherever is most suitable.

This is in contrast with using data lakes, which he said tend to become “black holes”, adding that reverse ETL (extract, transform, load) “usually doesn’t work well”. Data streaming, he argued, allows for more efficient integration and lets users ask arbitrary questions, unconstrained by the primary purpose for which the data was collected.

Another barrier to agentic AI development is that building applications around LLMs is a departure from traditional software development. It requires different skills, tools and processes – especially around quality assurance.

He added that some sectors, including financial services and telecommunications, are more accustomed to dealing with “black box” systems and are therefore making better progress with agentic AI. “The rest of us have to get used to it,” he said.

Testing becomes less critical, however, when agentic AI is used to augment humans rather than replace them. Sellers gave the example of banks using agents to generate anti-money-laundering reports. The output is not used as a finished product, but as a starting point for a human-written report. This approach means 70% more reports can be prepared in the same time, and the best reports are used to retrain the model. While this work may never be fully automated in regulated industries, the human role could be reduced to auditing AI-generated reports.

This leaves organisations with three basic choices for testing. One is to conduct far more testing than has previously been considered necessary. Another is to keep a human in the loop. The third is to accept that a model is good enough even if it is only right 90% of the time.

That third option is not viable in high-stakes situations but could be acceptable when recommending a next action to a salesperson. Indeed, most current agentic AI projects are at the level of generating sales leads rather than closing deals.

More sophisticated projects are underway, however. Sellers pointed to insurers using agentic AI to underwrite certain perils and process claims, telcos implementing predictive call routing, and medical technology companies enhancing electronic health records to make them understandable to a wider range of specialists.

Ultimately, Sellers said, organisations that have put their data estate in order are best placed to take advantage of agentic AI. He stressed that data quality and accessibility must come first, before any model is chosen, as this allows an organisation to take advantage of any suitable model that comes along.

Finally, he urged organisations to pay more attention to ethical AI, including the proper use of personally identifiable information and ensuring that models do not omit factors that humans would rightfully take into account when making decisions.
