Using your AI investment wisely: the case for smaller models with bigger impact

This is a guest blogpost by Kris Kang, Head of Product, AI, JetBrains

Bigger models, more parameters, higher benchmarks. There is often a fixation on scale in the discourse around AI, making it easy to assume that the bigger a Large Language Model (LLM) is, the better and more useful it must be. But as agentic AI becomes more commonplace, it’s time to challenge that assumption.

CTOs working at AI companies are likely well aware of this, but from my conversations, this isn’t always the case for counterparts in other sectors, like pharmaceuticals or manufacturing. They often have a less clear picture of the compounding costs and environmental impact that come with frontier models, or of the comparable results they could achieve by adding smaller, purpose-built alternatives to their model mix at a fraction of the financial and environmental cost.

Hyper-specialised models – otherwise known as focal models – should become tech leaders’ go-to for workhorse tasks. If the biggest foundational models are the “luxury” items in the AI supermarket, focal models are the cost-effective, practical alternatives suited to specific uses. You could exclusively buy premium items, but other brands will do the job the vast majority of the time (and you might not even notice a difference).

Ballooning costs: the hidden burden of scale

Let’s begin with the most immediate concern for any enterprise: costs. Frontier models are expensive to train, and access to their proprietary APIs is costly for users. If a software developer is set a task, for example, they might need to make hundreds of individual requests to an LLM by the time they are finished. Scale that across a team of developers, and that could become thousands of requests for a project – this quickly adds up.

Now picture how much more that could balloon through the use of an agentic coding tool – because it’s operating autonomously by nature, there’s no telling how many requests it might make, and therefore how much completing a task will cost. The same principle applies in any agentic application: the bigger the underlying model, the more exposure you have to unexpected charges.

Many smaller LLMs, in contrast, are open-source, so there is no reliance on a single provider’s pricing or usage policy. This can cut costs dramatically, with the added benefit of avoiding vendor lock-in. Businesses can control the stack and avoid billing surprises. A “pick and mix” approach becomes viable: businesses can still choose a frontier model where it’s warranted, but be sure they are using the right tool for the job and getting value for money. This is especially important when assessing AI outcomes through the productivity/cost ratio: to maximise it, productivity should rise while costs fall. Focal models are optimised to do exactly that.
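As a minimal sketch of what such a “pick and mix” policy could look like in practice (the model names and task categories below are illustrative assumptions, not a real API or product):

```python
# Illustrative "pick and mix" routing policy: send routine, well-understood
# tasks to a small focal model and reserve the frontier model for everything
# else. Model names and task categories here are hypothetical examples.

FOCAL_MODEL = "focal-code-small"        # hypothetical specialised model
FRONTIER_MODEL = "frontier-general-xl"  # hypothetical large general model

# Task types a team has verified the focal model handles well.
FOCAL_TASKS = {"code_completion", "refactoring", "unit_tests"}

def route(task_type: str) -> str:
    """Pick the cheapest model known to be good enough for this task."""
    return FOCAL_MODEL if task_type in FOCAL_TASKS else FRONTIER_MODEL

print(route("refactoring"))        # routine work goes to the focal model
print(route("open_ended_design"))  # the frontier model is kept for hard cases
```

The design choice worth noting is that the routing rule is owned by the business, not the provider, which is what makes the cost exposure predictable.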

The performance drop-off is insignificant too: because a focal model is tuned to a hyper-specific use-case, it can still excel at that goal at a fraction of the price. Some of the largest models cost around £0.15 per request, for instance, yet our coding-specific model matches 98% of their performance for coding at less than £0.01 per query. Providers like OpenAI, Google and Anthropic are also working on lower-cost, more sustainable alternatives to their foundational models, and more choice and competition can only be a good thing for customers.
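To see how quickly those per-request figures compound, here is a rough back-of-envelope calculation using the prices above; the request and task volumes are assumptions for illustration only:

```python
# Back-of-envelope monthly cost comparison using the figures above:
# roughly £0.15 per frontier request vs under £0.01 per focal request.
FRONTIER_COST = 0.15   # £ per request (approximate, from the article)
FOCAL_COST = 0.01      # £ per request (upper bound, from the article)

requests_per_task = 200   # assumption: an agentic task can fire hundreds of requests
tasks_per_month = 500     # assumption: workload across a development team

monthly_requests = requests_per_task * tasks_per_month
frontier_bill = round(monthly_requests * FRONTIER_COST, 2)
focal_bill = round(monthly_requests * FOCAL_COST, 2)

print(f"frontier: £{frontier_bill:,.0f}/month")
print(f"focal:    £{focal_bill:,.0f}/month")
```

Under these assumptions the frontier-only bill is fifteen times the focal one for the same request volume, before any performance difference is even considered.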

Purpose built to really perform

Focal models that are designed from scratch or fine-tuned to do one hyper-specialised thing well are faster, leaner and easier for users to consume, which inevitably accelerates the pace of adoption. Time-to-value is critical, so efficiency translates directly into competitive advantage.

For example, while many frontier models might code well in Python because it is overrepresented on the web, they are often not optimised in the same way for Java and thus wouldn’t be as useful for powering a coding agent.

Few people walk into a supermarket and pick only the most expensive version of every item on their list, and the same goes for enterprises. Whether your teams are using AI to write code or to track inventory in the supply chain, there will be tailored models tuned to that specific use-case that beat paying a premium for a ‘do everything’ model.

Less compute, more environmentally responsible

The environmental cost of AI is becoming harder to ignore and larger models naturally leave a bigger footprint. They require vast amounts of compute power, which translates into higher electricity usage and increased cooling demands in data centres. According to recent estimates, processing one billion AI requests consumes the same amount of energy as powering 30,000+ UK homes for a year.

Focal models dramatically reduce this environmental impact. Because they have fewer parameters, they require less compute for training and inference, so they consume less electricity and generate less heat. Sustainability is already a rising priority and, particularly in regions with strict regulations like the EU, non-compliance risks costly penalties.

Some tools will need to be underpinned by a bigger model, and ensuring they’re used sustainably is part of the trade-off businesses need to make to take advantage of their power and breadth. But that job is made far easier when you’ve got a balance of focal models with smaller footprints in your mix.

The future is focal

CTOs are under pressure to deliver AI results at speed, but often lack a clear understanding of the technology’s long-term costs, especially in industries that are earlier in their digital transformation journeys. Many enterprises risk making short-sighted decisions driven by fear of missing out, which could drive immediate productivity gains that are soon negated by the cost of achieving them through large, foundational LLMs.

Not every problem needs the largest possible model to solve it. The future is about using the right model, and making trade-offs that reflect real-world needs. Precision can beat size without sacrificing performance, and can help drive purpose-built innovation for organisations.