Dynatrace at KubeCon + CloudNativeCon: SRE is ABC 123 for Kubernetes

Kubernetes has emerged as the de facto orchestration layer for modern cloud-native environments.

Very few people (if any) would argue with that statement.

We know today that Kubernetes enables organisations to build and deploy new digital (increasingly cloud-native) services at speed.

However, the agility that’s inherent to Kubernetes is only valuable if it can be harnessed throughout the entire software development lifecycle to ensure the reliability, scalability and performance of containerised applications.

Alois Reitbauer says this level of real-world Kubernetes control is more difficult than many might anticipate.

In his position as the chief technical strategist at software intelligence company Dynatrace, Reitbauer explains the issue.

He says that much of the problem stems from the fact that many software delivery methodologies have not been updated from the traditional approaches used for monolithic applications, which have a lower release velocity.

Reitbauer writes as follows to discuss the playing field ahead.

Let’s rethink SRE

To prevent bad Kubernetes deployments from impacting user experience, SRE practices need to ‘shift left’. SRE can no longer be seen solely as a responsibility for operations teams; it must be applied across the entire software development lifecycle (SDLC).

Of course, this is easier said than done given the IT skills shortage. Most organisations won’t have the luxury of being able to simply hire more SRE practitioners to extend these principles further across the SDLC. Instead, they will need to look towards capabilities such as automated analysis to augment their existing teams.

With more services to manage, and a higher velocity of deployment, most previously manual decisions need to be automated.

One of the most important steps in embracing a more mature SRE practice is to establish Service Level Objectives (SLOs). These objectives define the criteria that any service needs to fulfil in production. By establishing SLOs for key metrics such as availability, throughput and response time, organisations can create a baseline for any new deployment, so that only good builds reach production and releases are of better quality.
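To make the idea of an SLO baseline concrete, here is a minimal Python sketch of how a build's measurements might be checked against objectives for availability, throughput and response time. The metric names, the thresholds and the `evaluate_build` helper are all hypothetical and chosen purely for illustration; they are not taken from Dynatrace or any specific tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Objective:
    """One SLO: a metric name plus the condition it must satisfy."""
    metric: str
    description: str
    is_met: Callable[[float], bool]


# Hypothetical thresholds, chosen only for illustration.
OBJECTIVES: List[Objective] = [
    Objective("availability_pct", "availability >= 99.9%", lambda v: v >= 99.9),
    Objective("throughput_rps", "throughput >= 500 req/s", lambda v: v >= 500.0),
    Objective("response_time_p95_ms", "p95 response time <= 300 ms", lambda v: v <= 300.0),
]


def evaluate_build(measurements: Dict[str, float]) -> List[str]:
    """Return the descriptions of every objective the build violates.

    A metric missing from the measurements counts as a violation, so a
    build cannot pass simply because data was not collected.
    """
    violations = []
    for objective in OBJECTIVES:
        value = measurements.get(objective.metric)
        if value is None or not objective.is_met(value):
            violations.append(objective.description)
    return violations
```

A build would be allowed to progress only when `evaluate_build` returns an empty list; any returned description tells the team which objective was missed.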

Keptn donated to CNCF

However, it’s only possible to evaluate new builds against these SLOs at the speed of cloud-native delivery if the process is fully automated.

That’s why we created the Keptn project and donated it to the CNCF, to enable organisations to automate release decisions such as pushing a build through the pipeline, validating a canary deployment and implementing SLO-based performance analysis as part of their continuous deployment process.

It also enables them to automate operations functions such as scaling up to support increased demand.

Augmenting SRE via observability & AIOps

Keptn uses data from testing, monitoring and observability solutions to decide what steps should be taken to progress a deployment. As a result, organisations can automatically detect architectural and deployment changes that have a negative impact on performance and scalability, to prevent those builds from progressing along the pipeline.
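The kind of decision described here, holding back a build whose changes hurt performance, can be sketched in a few lines of Python. This is not Keptn's API or its actual evaluation logic; the function names, the metric names and the 10% tolerance are assumptions made for the example, which simply compares a candidate build's metrics with those of the previous build.

```python
from enum import Enum
from typing import Dict, Set


class Decision(Enum):
    PROMOTE = "promote"
    HOLD = "hold"


def find_regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    lower_is_better: Set[str],
    tolerance: float = 0.10,  # hypothetical: allow up to 10% degradation
) -> Dict[str, float]:
    """Return metrics that degraded by more than the tolerance.

    The value for each metric is the fractional degradation relative to
    the baseline. Metrics absent from the candidate, or with a zero
    baseline, are skipped in this simplified sketch.
    """
    regressions = {}
    for metric, base in baseline.items():
        if metric not in candidate or base == 0:
            continue
        change = (candidate[metric] - base) / base
        # For latency-style metrics an increase is bad; otherwise a decrease is.
        degradation = change if metric in lower_is_better else -change
        if degradation > tolerance:
            regressions[metric] = degradation
    return regressions


def decide(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    lower_is_better: Set[str],
) -> Decision:
    """Hold the build if any metric regressed, otherwise promote it."""
    if find_regressions(baseline, candidate, lower_is_better):
        return Decision.HOLD
    return Decision.PROMOTE
```

For example, with a baseline of `{"response_time_ms": 200.0, "throughput_rps": 500.0}`, a candidate whose response time rises to 260 ms would be held, while one at 210 ms and 495 requests per second would be promoted.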

DevOps and SRE teams can improve this process even further by applying AIOps capabilities to analyse the technical root cause of a problem, identify the precise reasons why a new build failed to pass a particular SLO, and determine the steps they need to take to resolve the issue.

If they can implement these practices effectively, organisations will be on a much firmer footing to ensure Kubernetes provides the speed and agility they’re seeking to drive shorter development cycles, produce better-quality software and deliver faster innovation to the business.