Kubernetes at 10: When K8s ‘won’ and life now as a ‘surly teenager’

We talk to DataStax’s Patrick McFadin about Kubernetes’ success against competitors, how StatefulSet and operators solved key challenges, and why there are still obstacles to clear

Antony Adshead, Computer Weekly

Published: 11 Jul 2024

Kubernetes is 10! Mid-2024 sees the 10th birthday of the market-leading container orchestration platform. But, according to DataStax vice-president for developer relations Patrick McFadin, it is in a “surly teenager” phase, with some challenges in management efficiency still to be solved.

He also recalls how Kubernetes, also known as K8s, overcame its early challenges and believes it is here for some time yet.

Kubernetes’ early years started as containers emerged as a new way to virtualise applications, but with storage and data protection functionality that was practically non-existent. Now, Kubernetes offers a mature container platform for cloud-native applications with all that’s required for the storage of stateful data.

We mark the first decade of Kubernetes with a series of interviews with engineers who helped develop Kubernetes and tackle challenges in storage and data protection – including the use of Kubernetes Operators – as we look to a future characterised by artificial intelligence (AI) workloads.

Here, Patrick McFadin, vice-president for developer relations at NoSQL database specialist DataStax, talks about how Kubernetes was one among many container orchestrators, the time he realised “Kubernetes just won”, the obstacles to efficient storage, and the role of StatefulSet and operators that provide programmable automation for storage and other services.

What was the market like when Kubernetes first launched?

When Kubernetes first arrived, there were multiple players around container orchestration. You had Docker Swarm, and Apache Mesos, which was designed to run large analytic workloads. Kubernetes came in with one claim to fame – it came from Google. The issue of managing large infrastructure was a clear problem people were trying to solve, so the market was ready for a good solution.

How did you get involved in work on the data infrastructure around Kubernetes?

Working on a large distributed database put me squarely in the middle of the operational challenges users face. You can manage a handful of servers without a lot of tooling, but that goes out the door when you find yourself scaling up past 10, 100 or 1,000 nodes. Mesos was a good fit for Apache Cassandra, which was why I started working with that project. I was also working on Apache Spark, which fitted well with Mesos. At the time, Kubernetes was becoming well known as a container manager for front ends, so in the back-end infrastructure community it wasn’t making as big an impact.

How did you realise that Kubernetes was in the leading position in the market?

During a conference held in 2017 by Mesosphere, a company that provided an enterprise version of Mesos, CEO and co-founder Ben Hindman announced Mesos support for Kubernetes. It was a lightbulb moment. I turned to the person next to me and said: “Kubernetes just won.” It took a while to come through, but this was one of the turning points that I saw.

When you looked at Kubernetes, how did you approach data and storage?

That was the main issue between Mesos and Kubernetes. Mesos was much better at managing storage resources. Kubernetes treated storage as an afterthought initially, and when your database relied on high-quality storage, it was a horrible fit. Avoiding Kubernetes for data workloads was the best approach for a long time.

What issues first came up around data and storage with Kubernetes for you?

Provisioning and quality. Kubernetes initially treated storage as ephemeral, meaning that as soon as the container was removed, all your storage disappeared. Not good for databases. There were some tricks to maintaining data from restart to restart, but none were built into Kubernetes. And because these were container volumes, the performance left a lot to be desired. Overall, there were a lot of problems that kept serious users from jumping in.

What had to change?

Changing how storage was addressed was the first order of business. Kubernetes introduced the StatefulSet, which completely changed how storage was provisioned. It modelled what was needed for a stateful server like a database. The next big change was the addition of operators. Because servers were being added to a system that controlled provisioning, there needed to be a way to translate commands like “start” and “stop”. Operators created the translation layer between Kubernetes and the established services being brought in.
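To make the “translation layer” idea concrete, here is a minimal Python sketch of the control-loop pattern operators follow: compare a desired state with the actual state and issue “start” and “stop” commands until they match. It is a toy model, not the Kubernetes API or any real operator. The `ToyDatabaseCluster` class, the `reconcile` function and the cluster name are hypothetical names chosen for illustration. The node naming (`name-0`, `name-1`, and so on) and the highest-ordinal-first removal mirror how StatefulSets give replicas stable identities.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDatabaseCluster:
    """Hypothetical stand-in for an established service such as a database."""
    name: str
    running: list[str] = field(default_factory=list)  # nodes currently started
    log: list[str] = field(default_factory=list)      # commands issued so far

    def start(self, node: str) -> None:
        self.running.append(node)
        self.log.append(f"start {node}")

    def stop(self, node: str) -> None:
        self.running.remove(node)
        self.log.append(f"stop {node}")


def reconcile(cluster: ToyDatabaseCluster, desired_replicas: int) -> None:
    """One pass of a control loop: move actual state toward desired state."""
    # Scale up: start nodes in ordinal order with stable names (name-0, name-1, ...).
    while len(cluster.running) < desired_replicas:
        cluster.start(f"{cluster.name}-{len(cluster.running)}")
    # Scale down: stop the highest ordinal first.
    while len(cluster.running) > desired_replicas:
        cluster.stop(cluster.running[-1])


cluster = ToyDatabaseCluster("cassandra")
reconcile(cluster, 3)  # desired state: three nodes
reconcile(cluster, 2)  # desired state changes: two nodes
print(cluster.running)
print(cluster.log)
```

Calling `reconcile` again with the same desired count issues no further commands, which is the property that lets a real operator run the loop continuously. A production operator would also handle the service-specific steps the toy skips, such as draining data from a node before stopping it.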

How did you get involved with Kubernetes Operators?

The Data on Kubernetes Community did a lot of work to make this concept one that people were willing to buy into. When we started, we got feedback that said, “Put my data on Kubernetes? Are you mad?” But we also saw a lot of devs say, “I want everything in one place, so make it work.” Over time, that community effort succeeded. I think 2022 was the tipping point where we saw more people willing to run their data on Kubernetes than not. That was based on the efforts to create quality operators.

What happened around Operators that made them a success for data and storage?

Operators solved a huge problem and were simple to put together. Many people put together their own operators – it felt like another operator for one project or another came out every other weekend. That was great, but it meant you had a lot of fracturing, with abandoned projects that did the basics and no collaboration between them. Everybody wanted to be the one to put the operator together, so you ended up with lots that did the same thing in very slightly different ways.

We saw this in the Apache Cassandra community. I think there were about 12 different operators concurrently at one point and they were all good at specific things. But we didn’t benefit from collaborating with each other and improving things. It took a little while for the community to come together and agree what we wanted, what we wanted to work on, and how to do it together. But when that started, I think it made a huge difference.

How did this support more cloud-native approaches? What were the consequences?

I think it helped the overall approach to cloud-native applications because you could run your applications and your infrastructure in the same place and consistently manage everything. When you have microservices and containers, you want to be able to scale and you want to be able to move things around when you need to rather than being tied to what you have. When you could do that for your database as well, it just made things simpler. And when you could start with testing, it made it easier to demonstrate that this worked and you could move into production. It’s the whole argument around data gravity. We saw data move to storage around virtualisation, and we saw more data move to the cloud alongside applications, so it’s only natural that you saw that take place with cloud-native and containers too.

Kubernetes is now 10. How do you think about it today?

I think many people think Kubernetes is finished and that things will continue to grow. I don’t see that. I think we are in the surly teenager phase, and there are still many things that need to be worked out. But it’s a stable system you can rely on. DataStax has built its Cassandra as a service, Astra DB, on Kubernetes. I think that is about the strongest endorsement I can give it.

What problems still exist around Kubernetes when it comes to data and storage?

The maturity of the project is going to be in efficiency. It still takes a lot of manual effort to run a Kubernetes deployment smoothly. Scaling up is easy, but scaling back is hard, to the point that it’s rarely done. GenAI [generative artificial intelligence] will no doubt make an appearance here as the idea of AIOps [artificial intelligence for IT operations] gains ground. Explaining what you want from an application standpoint and seeing the infrastructure emerge will seem like magic, but really that’s just progress.