Sovereign and edge AI drive return to on-premise Kubernetes
While public cloud services remain popular, the need to control sensitive data and maximise GPU performance is pushing enterprises to deploy Kubernetes in their own datacentres
The rush to deploy sovereign artificial intelligence (AI) and run inference at the edge is driving more enterprises to deploy Kubernetes on-premise, according to Dan Ciruli, vice-president and general manager of cloud native at Nutanix.
Speaking to Computer Weekly on a recent visit to Singapore, Ciruli noted that while Kubernetes has been around for over a decade, the requirements of modern AI workloads are changing how and where the container orchestration platform is deployed.
While many organisations previously relied on managed Kubernetes services, such as Amazon Elastic Kubernetes Service or Google Kubernetes Engine, the need to keep sensitive data in-house is pushing Kubernetes workloads back to the datacentre.
“Kubernetes is a necessary ingredient in building an AI stack,” Ciruli said. “If you’re consuming AI as a service, or using Gemini or OpenAI, you don’t need it so much, but we’re seeing a lot of desire for sovereign AI.”
He noted that rising cloud costs had already pushed some workloads back on-premise, and that the desire to apply AI to internal data has accelerated the move.
“If you want to do AI on-premise and your data is there, you’re going to have to solve that Kubernetes puzzle,” Ciruli added. “It’s the same thing at the edge; the desire to run inference at the edge means you have got to get Kubernetes out there.”
On the challenges enterprises face when running Kubernetes on-premise to support AI workloads, Ciruli said the main hurdle isn’t a lack of talent, but the complexity of building a production-ready stack from scratch.
“Every survey I’ve seen in the last 12 years on what’s holding up Kubernetes adoption talks about the skills gap, but what that really means is that the complexity is too high,” Ciruli said. “I don’t like blaming it on the people.”
He explained that Kubernetes is rarely a standalone installation, often requiring integration with up to 30 open-source projects, graphics processing unit (GPU) certifications, and security governance.
“Someone has to figure out how to do upgrades and what’s going to be automated. That stuff is taken care of if you’re using the cloud, but if you want to do it on-premise, you have to figure out what’s the right Kubernetes.”
To address this, Ciruli said Nutanix provides a turnkey, AI-ready Kubernetes platform that offers a choice of operating systems and hypervisors, works with hardware vendors like Cisco on so-called AI factories, and ensures certification with Nvidia.
As AI workloads reshape infrastructure, the debate between running containers on bare metal versus virtual machines (VMs) has resurfaced. Ciruli believes that while VMs will remain the standard for general applications, AI is carving out a niche for bare metal deployments.
“If it’s running in a VM today, it will probably run in a VM forever,” Ciruli said, noting that most legacy applications will never be rewritten to run in containers. However, AI clusters are being treated differently.
“The place where we see companies increasingly running bare metal Kubernetes is AI,” he explained. “They are often buying clusters which are physically separate GPU-enabled nodes, and the GPU drivers also work better when running on bare metal.”
Ciruli added that while the performance overhead of virtualisation is in the low single digits, organisations investing heavily in GPU infrastructure want to squeeze every ounce of performance out of their hardware, leading them to choose bare metal servers for AI workloads.
“We even have customers that run their control plane nodes in VMs, and their worker nodes are on Kubernetes clusters running on bare metal GPU boxes,” he said.
The move towards on-premise infrastructure is happening amid market disruption following Broadcom’s acquisition of VMware. Ciruli noted that this has prompted enterprises to re-evaluate their supplier relationships, with lock-in being a primary concern. “We still hear customers talking about how that is affecting their plans,” he said.
While some companies have moved off VMware – and Nutanix offers a free migration tool – Ciruli warned that moving platforms is rarely just a technical challenge.
“The biggest concern is their people,” he said. “You might talk to a large enterprise customer who has hundreds of people in operations with over a decade of experience working in VMware – helping them plan the way they transition their people is the hard part.”
Ciruli said Nutanix’s strategy for addressing lock-in fears relies on using upstream Kubernetes rather than a forked version, which ensures portability across different Kubernetes distributions and certification under the Cloud Native Computing Foundation’s Kubernetes conformance programme.
“We don’t want to fork it and produce something that’s non-conformant,” he said. “A company using it today could deploy to someone else’s Kubernetes tomorrow with no change.”
Read more about Kubernetes in APAC
- APAC organisations need to move away from expensive managed services and DIY Kubernetes deployments. The way forward is a centralised team that can standardise Kubernetes and unlock its true potential for cost savings, agility and AI-driven growth.
- Kubernetes co-creator Joe Beda talks up the evolution of the container orchestration platform and efforts by VMware to help enterprises get the most out of the technology.
- The Korea Coast Guard has modernised its maritime operations using lightweight Kubernetes clusters on its vessels and SUSE’s Rancher Prime Kubernetes platform at its headquarters.
- The CNCF is anticipating the emergence of an open source alternative to Nvidia’s Cuda platform as the industry seeks to avoid supplier lock-in for AI workloads.
