Rethinking AI’s place in the software stack
This is a guest blogpost by Bobby Blumofe, Executive Vice President and Chief Technology Officer at Akamai.
DeepSeek’s recent developments have ignited significant discussion in the AI community. DeepSeek’s work is very impressive, a tour de force of engineering optimisation. They’re building on the same transformer architecture as the other leading models, but they have optimised that architecture and trained it in very creative ways.
Many people were surprised at how much less expensive it was to train their models compared to previous models, but in truth, it’s not all that surprising. In recent years, there’s been a lot of research, not just in the private sector but also in academia and non-profits, advancing the core transformer architecture on which these models are trained. This isn’t a discontinuity or a black swan event; it’s the product of continued global R&D.
What’s important here is that it shows there are significant optimisations still to be made in the AI space. There was previously a belief that an LLM had to keep getting bigger and more expensive, with hundreds of billions or even trillions of weights, but it turns out you can get very effective models at much lower cost and much smaller sizes. DeepSeek is a step in that direction, and I think more will follow in rapid succession.
The shift away from multi-purpose LLMs
There will be a handful of companies in the world that produce the general-purpose models and do the pre-training, because even with these advanced optimisations, and more to come, pre-training still takes a lot of computation and specialised expertise. However, inference, or using the models once they’ve been built, is where the long-term value will be. For most enterprise use cases, you don’t need these big ‘ask me anything’ models to get the results that you need. Running a small model that is focused on the problem you’re trying to solve will require far less computation, emit less carbon, present a smaller attack surface, and, for all these reasons, be much less expensive to run and maintain.
As the industry seeks to demonstrate ROI for investors, this year will see more much-needed optimisation in the AI sector. If you’re an accountancy firm, you don’t need your AI to be able to write sonnets or generate trivia about 60s surf rock. You need it to be really good at your specific business problems, and nothing else.
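To make that concrete, here is a minimal sketch of what task-focused inference can look like, assuming the Hugging Face transformers library and a hypothetical domain-tuned checkpoint; the model name below is a placeholder rather than a published model.

```python
# A minimal sketch of task-focused inference with a small model, using the
# Hugging Face transformers pipeline API. The checkpoint name is a
# hypothetical, domain-tuned model used purely for illustration.
from transformers import pipeline

# A compact classifier fine-tuned on one business problem, such as
# categorising invoices, runs comfortably on ordinary hardware.
classify = pipeline(
    "text-classification",
    model="acme/invoice-category-classifier",  # hypothetical fine-tuned checkpoint
    device=-1,                                 # CPU inference, no GPUs required
)

print(classify("Invoice 4421: catering services for the Q3 client offsite"))
```

The point is not the specific library: it is that a model a small fraction of the size of a frontier LLM can serve a narrow business task well.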
Where AI fits in the software stack
With smaller, more optimised LLMs, businesses will have the ability to run inference on any infrastructure they want, in any cloud. You no longer need to go to a hyperscaler with the latest GPUs. In many cases, businesses will want this inference to happen at the edge, closer to end users, for reasons of cost, speed and data sovereignty.
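As one illustration of what edge inference could look like, here is a minimal sketch using the llama-cpp-python bindings to run a small quantised model on CPU-only hardware; both the tool choice and the model file path are assumptions made for the example, not a recommendation of any particular stack.

```python
# A minimal sketch of running a small, quantised open-weights model on
# CPU-only edge hardware with the llama-cpp-python bindings. The model file
# path is a placeholder; any small quantised GGUF checkpoint would do.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/small-domain-model.Q4_K_M.gguf",  # hypothetical local file
    n_ctx=2048,    # a modest context window keeps memory use low
    n_threads=4,   # commodity CPU cores on an edge node
)

out = llm(
    "Summarise the key obligations in the following service agreement: ...",
    max_tokens=256,
)
print(out["choices"][0]["text"])
```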
A lot of the conversation around AI today envisages AI as a horizontal layer in the software stack, reaching across everything as a universal ‘ask me anything’ platform. But the shift towards smaller, more optimised LLMs and cheaper inference will prompt a rethink. In many business cases, AI tools are better suited as one component of a vertically integrated software stack.
There’s a lot of value in building applications for specific use cases that incorporate small, specialised LLMs and inference engines, with other infrastructure built around them to make the application work for that particular domain.
For example, think of the legal sector. AI can be a valuable assistant to a lawyer. But rather than a universal ‘ask me anything’ assistant, where you hand it some case law and give it some prompts, think of it instead as a purpose-built lawyer’s assistant application with many different components. Some of these will be AI-based, but others will not: a document management system, a case management system, or a legal research database, for example.
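To sketch the shape of such an application, the outline below wires hypothetical placeholder components, a document store, a legal research search and a small LLM, into one workflow; none of the function names refer to a real product or API.

```python
# A minimal sketch of the "vertically integrated" idea: a purpose-built
# lawyer's assistant that combines non-AI components (document management,
# legal research) with a small, specialised LLM. Every function below is a
# hypothetical placeholder used for illustration.

def fetch_case_documents(case_id: str) -> list[str]:
    # Placeholder for the document-management component.
    return [f"Filing A for case {case_id}", f"Filing B for case {case_id}"]

def search_legal_database(query: str, top_k: int = 3) -> list[str]:
    # Placeholder for the legal-research component (search, not generation).
    return [f"Precedent {i} relevant to: {query}" for i in range(1, top_k + 1)]

def run_small_llm(prompt: str) -> str:
    # Placeholder for a small, legal-domain inference engine.
    return f"[draft summary based on {len(prompt)} characters of context]"

def draft_case_summary(case_id: str, question: str) -> str:
    # The model is one component among several: retrieval and document
    # management ground the answer in real sources before the LLM drafts text.
    filings = "\n".join(fetch_case_documents(case_id))
    precedents = "\n".join(search_legal_database(question))
    prompt = (
        f"Question: {question}\n\n"
        f"Case filings:\n{filings}\n\n"
        f"Relevant precedents:\n{precedents}\n\n"
        "Draft a concise summary for the supervising lawyer."
    )
    return run_small_llm(prompt)

print(draft_case_summary("2024-HC-0117", "Is the non-compete clause enforceable?"))
```

In a design like this, much of the interesting engineering sits in the non-AI components and in how they ground the model’s output, not in the model itself.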
We’ll see much more value from AI when we stop treating it as a stand-alone horizontal layer and start considering how it can work in conjunction with other components in a vertically integrated software stack. We can expect to see a wave of startups developing new applications for specific use cases along these lines. It’s an exciting time: we’re witnessing a step-change in our ability to cost-effectively embed AI functionality into practically useful, real-world applications.