
Interview: Using AI agents as judges in GenAI workflows
We speak to Ranil Boteju, chief data and analytics officer at Lloyds Banking Group, about how the bank sees agentic AI in customer-facing chatbots
Around 40 years ago, a bank branch manager probably knew the name of every customer and was able to offer personalised advice and guidance. But as Ranil Boteju, chief data and analytics officer at Lloyds Banking Group, points out, in today’s world, that model cannot scale.
“In the world of financial planning, most people in the UK cannot afford to see a financial planner,” he says.
Nor are there enough trained financial advisers to help everyone seeking advice, which is why financial institutions are looking at how they can deploy generative artificial intelligence (GenAI) to support customers directly.
But the large language models (LLMs) and GenAI services from hyperscalers are rather like black boxes, and they can deliver incorrect responses, known in AI terms as hallucinations. Neither trait is acceptable in a sector regulated by the Financial Conduct Authority (FCA).
What excites Boteju is the ability to scale that 40-year-old bank manager model to meet current demand, using artificial intelligence in a way that gives the bank confidence the AI understands what people need and offers them the right guidance, in a form that can be assessed against FCA guidelines.
• Listen to the full interview with Lloyds Banking Group’s chief data and analytics officer, Ranil Boteju, in this podcast.
“It would be a great ‘unlock’ for the UK in terms of giving access to high-quality financial guidance to a much broader and larger set of the population,” he says.
As Boteju notes, banks have been using AI for many years. “We’ve been using all sorts of machine learning algorithms for things like credit risk assessments and fraud screening for more than 15 years,” he says. “We’ve also been using chatbots for at least 10 years.”
As such, AI is a very well-used capability in financial services. What’s new, however, is generative AI and agentic AI. “Generative AI burst on the scene in late 2022 with ChatGPT. It’s been about for almost two-and-a-half years now,” says Boteju.
While banks have experience with AI, they have needed to figure out how to use generative AI and large language models. Speaking of his own experience, Boteju says: “We think about things like model performance and whether we are using the right algorithm.”
There are also questions of transparency, ethics, guardrails and how the AI models are deployed. Boteju says: “These are common both to large language models and traditional AI. But generative AI has specific challenges in financial services because we are a regulated industry.”
Since generative AI is prone to hallucinations, he says banks have to be very cautious about how they expose large language models directly to customers. “We put a lot of effort into ensuring that the outputs of the large language models are correct, accurate and transparent, and there’s no bias.”
In a regulated industry, it is vital to ensure the AI models are not hallucinating. “That’s probably one of the key things we need to be really cognisant of,” he says.
The need for specialist AI models
As Boteju notes, a model like Google Gemini is trained on everything. “If you ask it a question, the output will be based on its knowledge of everything. It’s been trained on lots and lots of data.”
Not all of this data is relevant to financial services, however. Restricting a model to domain-specific data should, in theory, make it hallucinate less.
“We felt quite strongly that we wanted to use a language model or a group of models that were specifically trained on financial services data relevant to the UK,” says Boteju.
This led to Lloyds Banking Group approaching Scottish startup Aveni to support the development of FinLLM, a financial services-specific large language model. In 2024, the company secured £11m of investment from Puma Private Equity, with participation from Lloyds and Nationwide.
Discussing the work with Aveni, Boteju says Lloyds Banking Group did not want to be tied to one specific model, so it decided to take an open approach to foundation models. From an AI sovereignty perspective, he says: “We don’t want to be limited to the large hyperscale models. There’s a fantastic ecosystem of open source models that we want to encourage, and the fact that we could create a FinLLM that is UK-centric in the UK is something we found very appealing.”
The bank has been testing FinLLM in its audit team, where a chatbot developed by Group Audit & Conduct Investigations (GA&CI) at Lloyds Banking Group is transforming how auditors access and interact with audit intelligence. The chatbot integrates generative AI with the group’s internal documentation system, Atlas, making information retrieval faster, smarter and more intuitive.
Boteju says the bank effectively trained the chatbot using FinLLM and its knowledge of audits, based on all the audit data it has collected.
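The article does not detail how the chatbot is wired to Atlas, but the setup it describes resembles retrieval-augmented generation: fetch relevant internal documents, then have the domain model answer only from that context. Below is a minimal Python sketch of that pattern; retrieve_from_atlas, finllm_generate and the toy corpus are hypothetical stand-ins, not Lloyds’ or Aveni’s actual APIs.

```python
# Hypothetical sketch of a retrieval-grounded chatbot. Function names and the
# tiny corpus are illustrative stand-ins, not the bank's real systems.

def retrieve_from_atlas(query: str, top_k: int = 3) -> list[str]:
    """Stand-in for a search over internal documentation (Atlas)."""
    corpus = [
        "Audits of retail lending cover affordability and fraud controls.",
        "Each audit follows the group methodology handbook and is peer reviewed.",
    ]
    # A production system would use semantic search; naive keyword match here.
    words = query.lower().split()
    return [doc for doc in corpus if any(w in doc.lower() for w in words)][:top_k]

def finllm_generate(prompt: str) -> str:
    """Stand-in for a call to a domain-tuned model such as FinLLM."""
    return "[model answer constrained to the supplied context]"

def answer_audit_question(question: str) -> str:
    context = "\n".join(retrieve_from_atlas(question))
    prompt = (
        "Answer using ONLY the context below. If it is insufficient, say so.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return finllm_generate(prompt)

print(answer_audit_question("What does the audit methodology cover?"))
```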
He describes the approach Lloyds Banking Group has taken to reduce errors as “agent as a judge”. “You may have a specific model or agent that comes up with a specific outcome,” he says. “Then we’ll develop different models and different agents that review those outcomes and effectively score them.”
The bank has been working closely with Aveni to develop the approach of using AI agents as judges to assess the output of other AI models.
Each outcome is independently assessed by a set of different models. The review of the outputs from the AI models enables Lloyds to ensure they are aligned with FCA guidelines as well as the bank’s internal regulations.
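As a concrete illustration of the pattern Boteju describes, the sketch below has one model draft an answer while a panel of judge agents independently scores it, with low-scoring answers held back for human review. It is a minimal sketch under assumed names; call_model and the model identifiers are placeholders, not the bank’s actual stack.

```python
# Minimal sketch of the "agent as a judge" pattern: a generator drafts an
# answer, several judge agents score it independently, and low-scoring
# answers are held back for human review. All names are illustrative.

JUDGE_PROMPT = (
    "You are a compliance reviewer. Score the ANSWER from 0 to 10 for factual "
    "accuracy and alignment with FCA guidance. Reply with a single integer.\n"
    "QUESTION: {question}\nANSWER: {answer}"
)

def call_model(model: str, prompt: str) -> str:
    """Dummy stand-in so the sketch runs; swap in a real LLM API client."""
    return "8" if "Score the ANSWER" in prompt else "Sample guidance answer."

def judged_answer(question: str, threshold: float = 7.0) -> str | None:
    answer = call_model("generator-model", f"Customer query: {question}")

    # Each outcome is independently assessed by a set of different models.
    judges = ["judge-model-a", "judge-model-b", "judge-model-c"]
    scores = [float(call_model(j, JUDGE_PROMPT.format(question=question,
                                                      answer=answer)))
              for j in judges]

    # Release only if the panel's average clears the bar; otherwise
    # escalate to a human reviewer (human in the loop).
    return answer if sum(scores) / len(scores) >= threshold else None

print(judged_answer("Can I pay into two cash ISAs in one tax year?"))
```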
Checking the outputs of AI models is a really good way to double-check that the customer is not being given bad advice, according to Boteju, who adds: “We’re in the process of refining these guardrails, and it’s imperative that we have [this process] in place.”
Boteju points out that having a human in the loop will remain important regardless of the “agent as a judge” approach. “There is still very much a place for humans in the loop in the future,” he says.
The power of different AI models in agentic AI
While an AI model like FinLLM has been tuned to understand the ins and outs of banking, Boteju says other models are much better at understanding human behaviour. This means the bank could, for instance, use one of the AI models from a hyperscaler, such as OpenAI’s GPT-5 or Google Gemini, to understand what the customer is actually saying.
“We would then use different models to break down what they’re saying into component parts,” he says. Different models are then tasked with tackling each distinct part of the customer query. “The way we think about this is that there are different models with different strengths, and what we want to do is to use the best model for each task.”
This approach is how the bank sees agentic AI being deployed. With agentic AI, says Boteju, problems are broken down into smaller and smaller parts, where different agents respond to each part. Here, having an agent as a judge is almost like a second-line colleague acting as an observer.
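That decomposition might look something like the following sketch, in which a general model splits the query, a domain-tuned model answers each part, and a judge agent observes the assembled reply before it is released. The routing table, model names and stub responses are assumptions made for illustration, not the bank’s configuration.

```python
# Illustrative sketch of agentic decomposition and routing. Model names,
# routes and stub responses are assumptions, not the bank's configuration.

ROUTES = {
    "intent": "general-purpose-model",  # strong at parsing what customers say
    "finance": "domain-tuned-model",    # FinLLM-style banking knowledge
    "judge": "judge-model",             # second-line observer of the output
}

def call_model(model: str, prompt: str) -> str:
    """Dummy stand-in so the sketch runs; swap in a real LLM API client."""
    if prompt.startswith("Split"):
        return "What is an ISA?\nHow much can I contribute this year?"
    if prompt.startswith("Approve"):
        return "approve"
    return f"[answer from {model}]"

def handle_query(query: str) -> str:
    # 1. A general model breaks the query into component parts.
    parts = call_model(ROUTES["intent"],
                       f"Split into sub-questions, one per line:\n{query}")

    # 2. Each part goes to the model best suited to the task.
    answers = [call_model(ROUTES["finance"], p)
               for p in parts.splitlines() if p.strip()]
    draft = "\n".join(answers)

    # 3. A judge agent reviews the assembled reply before release.
    verdict = call_model(ROUTES["judge"], f"Approve or reject:\n{draft}")
    return draft if verdict.lower().startswith("approve") else "ESCALATE"

print(handle_query("Tell me about ISAs and contribution limits."))
```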
Read more about agentic AI
- How not to go off the rails with agentic AI: When enterprises multiply AI, to avoid errors or even chaos, strict rules and guardrails need to be put in place from the start.
- Agentic AI – storage and ‘the biggest tech refresh in IT history’: We talk to Jeff Denworth of Vast Data about a future where employees are outnumbered by artificial intelligence agents and even smaller enterprises may need supercomputing levels of resources.