Boomi CEO shares vision of AI cost management
Steve Lucas, CEO of Boomi, believes the answer is prompt routing, which sends queries to the LLM with the lowest token cost and caches responses
Boomi hopes to give IT leaders greater visibility into their token spend, something that is lacking across the industry. The company is developing a tool called Boomi Prompt, which acts as middleware between enterprise applications and large language models on one side, and the artificial intelligence (AI) agents that need to access these systems to perform tasks on behalf of human users on the other.
As the use of AI and AI agents ramps up, providers of large language models (LLMs) and AI tools are moving away from subscription or software-as-a-service (SaaS)-style licensing towards pricing based on the cost of AI inference, which is measured in tokens.
A token is the smallest unit of information an AI engine or LLM takes as input, such as a word or part of a word in a sentence. The more tokens submitted to an LLM, the more computational resources the provider needs to process them. That cost is passed on as the token cost the organisation pays to submit its query to the AI tool.
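To make the arithmetic concrete, the sketch below assumes a hypothetical flat price per million tokens. Real providers publish their own rates, often with separate prices for input and output tokens, so the figures here are purely illustrative. The point is that the bill scales with both the size of a prompt and the number of times it is sent.

```python
def token_cost(tokens: int, price_per_million: float, times_sent: int = 1) -> float:
    """Cost of submitting a prompt of `tokens` tokens `times_sent` times.

    `price_per_million` is a hypothetical flat rate per one million tokens,
    not any specific provider's price.
    """
    return tokens * times_sent * price_per_million / 1_000_000


# A 2,000-token prompt at a hypothetical 5.00 per million tokens
single = token_cost(2_000, 5.00)
repeated = token_cost(2_000, 5.00, times_sent=10_000)
print(f"one query: {single:.2f}, sent 10,000 times: {repeated:.2f}")
```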
If the same query is submitted repeatedly, the token cost is paid each time, even if the organisation already has the answer. Boomi aims to cache such responses so that organisations do not spend unnecessarily on tokens for answers they already have.
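Boomi has not published how its cache works. As a minimal sketch of the general idea, the class below stores each LLM answer under a hash of the prompt and only pays for tokens on a cache miss. It assumes, for illustration, that two prompts are "the same" if they match after lowercasing and collapsing whitespace; `fake_llm` is a hypothetical stand-in for a paid model call.

```python
import hashlib


class PromptCache:
    """In-memory cache of LLM answers keyed on a normalised prompt."""

    def __init__(self, call_llm):
        self._call_llm = call_llm           # function: prompt -> (answer, tokens_used)
        self._store: dict[str, str] = {}
        self.tokens_spent = 0
        self.hits = 0

    @staticmethod
    def _key(prompt: str) -> str:
        # Assumption: case and extra whitespace do not change the question
        normalised = " ".join(prompt.lower().split())
        return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

    def ask(self, prompt: str) -> str:
        key = self._key(prompt)
        if key in self._store:
            self.hits += 1                  # answer already known, no tokens spent
            return self._store[key]
        answer, tokens = self._call_llm(prompt)
        self.tokens_spent += tokens
        self._store[key] = answer
        return answer


def fake_llm(prompt):
    # Hypothetical stand-in for a paid LLM call, charging one token per word
    return f"answer to: {prompt}", len(prompt.split())


cache = PromptCache(fake_llm)
for _ in range(3):
    cache.ask("What were Q3 travel expenses?")
print(cache.tokens_spent, cache.hits)  # 5 2
```

Only the first of the three identical questions reaches the model; the other two are served from the cache at no token cost.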
According to Boomi, the Prompt tool can also work out which LLM is the least expensive option for answering a user's or an AI agent's question.
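One simple way to express "least expensive capable model" is sketched below. The model names, prices and capability ratings are all hypothetical, and the sketch assumes a single flat per-token price per model, so ranking by price is the same as ranking by the cost of a given prompt. In practice different models may split the same text into different numbers of tokens, which this ignores.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str                 # hypothetical label, not a real product
    price_per_million: float  # hypothetical price per million tokens
    max_complexity: int       # hypothetical capability rating, higher = more capable


def cheapest_capable(models: list[Model], complexity: int) -> Model:
    """Return the lowest-priced model rated to handle a prompt of this complexity."""
    capable = [m for m in models if m.max_complexity >= complexity]
    if not capable:
        raise ValueError(f"no model is rated for complexity {complexity}")
    return min(capable, key=lambda m: m.price_per_million)


catalogue = [
    Model("self-hosted-model", 0.0, 2),
    Model("mid-tier-model", 1.0, 2),
    Model("frontier-model", 10.0, 3),
]
print(cheapest_capable(catalogue, 2).name)  # self-hosted-model
print(cheapest_capable(catalogue, 3).name)  # frontier-model
```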
Speaking at the Boomi World Tour in London, the company’s CEO, Steve Lucas, said the company will release a tool called Prompt later this year that “provides a layer” between an AI engine and any backend system.
An agent may need information held in an SAP or Oracle system, which it retrieves through an application programming interface (API), or it may call an LLM. Describing what happens when an agent is given a task, he said: “If it seeks data from an SAP system and Oracle system, and the answer to the prompt is cached in our prompt layer, we will provide that cached response.”
This avoids the cost of repeatedly calling APIs to access commercial off-the-shelf enterprise software, where using the API may incur an indirect access charge.
Lucas said the new Boomi tool can also recognise when a prompt submitted by a user or an agent can be handled by a conventional query, such as a standard SQL query or a Google search, rather than by “burning tokens”.
However, he said: “If that prompt is of value, we will route it to an AI model, and the model we select will depend on the rated complexity of that response.”
One example, he said, is a forecasting question, such as one covering expenses held in two systems. “We have Nemotron from Nvidia, which, in this hypothetical scenario, is effectively free for my business to run,” said Lucas. “We will route the prompt there.”
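Boomi has not said how its router rates complexity. The sketch below only illustrates the shape of the decision Lucas describes: first check whether a prompt needs a model at all, then pick a tier from a complexity score. The keyword rules and tier names are hypothetical; a production router would more likely use a trained classifier.

```python
import re

# Hypothetical rules for illustration only
LOOKUP_PATTERN = re.compile(r"^\s*(what is|who is|when is|define)\b")
COMPLEXITY_HINTS = ("forecast", "compare", "across", "trend")


def rate_complexity(prompt: str) -> int:
    """Crude complexity score: one point per analytical keyword found."""
    text = prompt.lower()
    return sum(hint in text for hint in COMPLEXITY_HINTS)


def route(prompt: str) -> str:
    """Decide whether a prompt needs a model at all and, if so, which tier."""
    if LOOKUP_PATTERN.search(prompt.lower()):
        return "lookup"        # conventional query or search, no tokens burned
    score = rate_complexity(prompt)
    if score == 0:
        return "small-model"
    if score <= 2:
        return "mid-model"
    return "large-model"


for p in ["What is our VAT number?",
          "Forecast Q4 expenses across SAP and Oracle"]:
    print(p, "->", route(p))
```

The tier returned here could then be mapped to a concrete model, for instance by choosing the cheapest model rated for that tier.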
According to Lucas, prompt routing is a complex but highly necessary capability for the enterprise, and one he said is completely unserved today. “There is no sophisticated prompt routing standard for the enterprise,” said Lucas.
According to the Boomi CEO, although Perplexity does offer prompt routing, it is not aimed at the enterprise. He said Boomi’s approach aims to go further. “The work that we’re doing has many layers and token reduction, and optimisation is one of those layers,” said Lucas. “Prompt routing will allow companies to reduce their token spend massively. Our design objective is to achieve greater than 50% reduction in token spend in the enterprise.”
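As a rough way to see how caching and routing could combine towards a target like 50%, the sketch below assumes that a share of prompts is answered from cache at no token cost, and that the rest are routed to models whose average cost per prompt is some fraction of a single baseline model. Both inputs are hypothetical, and the calculation assumes the mix of prompts and their token counts stays the same.

```python
def spend_reduction(cache_hit_rate: float, routed_price_ratio: float) -> float:
    """Fractional reduction in token spend versus sending every prompt to one baseline model.

    cache_hit_rate: share of prompts answered from cache (assumed to cost no tokens).
    routed_price_ratio: average cost per prompt of the routed models, relative to the baseline.
    """
    remaining_spend = (1 - cache_hit_rate) * routed_price_ratio
    return 1 - remaining_spend


print(f"{spend_reduction(0.2, 0.5):.0%}")  # 60%
```

Under these assumptions, routing alone (no cache hits) only passes the 50% mark if the routed models cost, on average, less than half as much per prompt as the baseline; every cache hit lowers that bar further.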
