
DeepSeek shows enterprises model distillation opportunity

DeepSeek showed how it is possible to run an AI model using far less compute than existing models require. AI model distillation is now becoming mainstream.

Model distillation is one of the technology trends that have reached the level of maturity Gartner’s 2025 Hype Cycle for artificial intelligence (AI) identifies as the “slope of enlightenment”.

The technique was thrust into the spotlight at the start of the year, when China’s DeepSeek demonstrated how model distillation can be used to train a large language model (LLM) that rivals models from OpenAI. It is not a new development, however. “I was actually researching model distillation in 2017,” said Haritha Khandabattu, senior director analyst at Gartner.

In fact, the technique dates back to the 2006 Cornell University Model compression paper by Cristian Bucilă, Rich Caruana and Alexandru Niculescu-Mizil. Nine years later, in 2015, the Distilling the knowledge in a neural network paper by Geoffrey Hinton, Oriol Vinyals and Jeff Dean, then at Google, used the term “distillation” to describe a technique for transferring the knowledge of a large model, or ensemble of models, into a smaller one.
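To make the idea concrete, here is a minimal, illustrative sketch in Python of a distillation loss along the lines of the 2015 paper, assuming the PyTorch framework: a small “student” model is trained to match the temperature-softened output distribution of a large “teacher” model, blended with the ordinary cross-entropy loss on the true labels. The function name and the temperature and weighting defaults are illustrative assumptions, not a reference implementation.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soften both output distributions with temperature T (Hinton et al., 2015).
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_soft_student = F.log_softmax(student_logits / T, dim=-1)

    # KL divergence between the student's and teacher's soft targets,
    # scaled by T^2 so its gradients stay comparable to the hard-label term.
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T * T)

    # Ordinary cross-entropy against the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)

    # Blend the two objectives; alpha = 0.5 is an arbitrary illustrative choice.
    return alpha * kd + (1.0 - alpha) * ce

# In a training loop, the teacher's logits would be computed without gradients:
#   with torch.no_grad():
#       teacher_logits = teacher(batch)
#   loss = distillation_loss(student(batch), teacher_logits, labels)

Because only the compact student model runs in production, it is this transfer step that delivers the lower inference costs discussed in the rest of this article.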

Although Gartner does not consider it a new technological development, Khandabattu said: “Model distillation has been re-emphasised. The foundation models are compute hungry and extremely expensive to run, and enterprises have started asking how they can get 80% of the performance at 10% of the cost.”

She said DeepSeek has driven pricing downwards over the past six to 12 months. But rather than simply adapt to these price changes, Khandabattu recommended that CIOs “plan their use cases and prioritise with the expectation that training and inference costs will continue to decline”.

Khandabattu said that even the large AI technology providers recognise the usefulness of model distillation to enable more deployable, more tunable and more governable AI, adding: “Model distillation is finally gaining commercial traction.”

She described model distillation as a bridge between innovation and scalability: “Model distillation unlocks both technical merit and access. It offers lower inference cost and IT infrastructure expenses are also a bit lower, which makes model distillation cost-effective for certain AI deployments.”

But Khandabattu also noted that there are other costs IT leaders need to consider beyond the IT infrastructure needed to run inference workloads. “CIOs need to be extremely careful and recognise that the total cost of deploying GenAI [generative AI] applications is not limited to the cost of the models.”

There are engineering costs and costs associated with integrating the AI system with enterprise IT, she said, adding: “Fine-tuning an AI model costs a lot of money. If the model provider decides to change the model, then you have to change all of the things that you’ve built on the older model to the newer one, which is very expensive.”

Beyond model distillation, she said: “With AI investment remaining strong this year, a sharper emphasis is being placed on using AI for operational scalability and real-time intelligence.”

According to Gartner, this has led to a gradual pivot away from generative AI as a central focus towards the foundational enablers that support sustainable AI delivery, such as AI-ready data and AI agents.

“Despite the enormous potential business value of AI, it isn’t going to materialise spontaneously,” said Khandabattu. “Success will depend on tightly business-aligned pilots, proactive infrastructure benchmarking, and coordination between AI and business teams to create tangible business value.”

Among the AI innovations Gartner has forecast will reach mainstream adoption in the next five years are multimodal AI and AI trust, risk and security management (TRiSM).

Multimodal AI models are trained with multiple types of data simultaneously, such as images, video, audio and text. TRiSM is focused on layers of technical capabilities that support enterprise policies for all AI use cases and help assure AI governance, trustworthiness, fairness, safety, reliability, security, privacy and data protection. Gartner has predicted that, in combination, these developments will enable more robust, innovative and responsible AI applications, transforming how businesses and organisations operate.

Gartner also expects AI agents to be at least two to five years away from mainstream adoption.

“To reap the benefits of AI agents, organisations need to determine the most relevant business contexts and use cases, which is challenging given no AI agent is the same and every situation is different,” said Khandabattu. “Although AI agents will continue to become more powerful, they can’t be used in every case, so use will largely depend on the requirements of the situation at hand.”

Read more about model distillation

  • Distillation brings LLM power to SLMs: Knowledge distillation enables effective transfer from LLMs to SLMs, helping “high school student” AIs perform beyond their capabilities.
  • Distillation refines LLMs into smaller (richer) droplets of data science: Why data science teams will turn their focus back to smaller, faster models.
