Platform engineering - Mirantis: A route to resilience & compliance as-a-Service

This is a guest post for the Computer Weekly Developer Network written by William Rizzo in his capacity as strategy lead for Mirantis.

Mirantis is known for an approach designed to simplify cloud native infrastructure, from onboarding to operations, offering one consistent experience from the datacentre to the edge.

Rizzo writes as follows…

The challenge is clear: organisations need to increase developer velocity while simplifying operations and managing compliance risks. They need to do it at scale for increasing numbers of clusters in far-flung locations that are running on diverse infrastructures in public and private clouds, on bare metal and at the edge. Platform teams are under pressure to boost productivity and speed without letting reliability or regulatory obligations slip.

Modern platform engineering, done right, can meet this challenge by delivering “resilience and compliance as-a-Platform Service”. Instead of each development team building its own failover logic or wrestling with policy checklists, the platform engineering team provides resilience and compliance out-of-the-box as a menu of composable self-service capabilities. Developers just consume these services – quickly, easily, in well-documented ways, with high reliability and low cognitive load.

Resilience without complexity

Embedding resilience and compliance into platforms isn’t simple, but today’s CNCF and broader open source ecosystems increasingly provide all the moving parts platform engineers need.

As one example, Linkerd, the ultra-lightweight CNCF service mesh, can be used to federate multi-cluster services – to treat services deployed in multiple clusters as one logical service. Linkerd automatically balances traffic across clusters and handles failover. So Linkerd-based resilience-as-a-service delivers (one flavor of) high availability by default.
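To make the idea of "one logical service across clusters" concrete, here is a minimal Python sketch of round-robin balancing with failover. It is an illustration only and does not reproduce how Linkerd works internally; the Endpoint and FederatedService names, the cluster names and the addresses are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    cluster: str          # hypothetical cluster name
    address: str          # hypothetical address of the replica
    healthy: bool = True  # in a real mesh this would come from health checks


class FederatedService:
    """One logical service backed by replicas in several clusters."""

    def __init__(self, endpoints):
        self.endpoints = list(endpoints)
        self._next = 0  # round-robin cursor

    def pick(self):
        """Return the next healthy endpoint, skipping unhealthy ones."""
        n = len(self.endpoints)
        for offset in range(n):
            candidate = self.endpoints[(self._next + offset) % n]
            if candidate.healthy:
                self._next = (self._next + offset + 1) % n
                return candidate
        raise RuntimeError("no healthy endpoint in any cluster")


svc = FederatedService([
    Endpoint("east", "10.0.0.1"),
    Endpoint("west", "10.1.0.1"),
])
balanced = [svc.pick().cluster for _ in range(4)]

# Simulate an outage in one cluster: traffic fails over to the other.
svc.endpoints[0].healthy = False
after_outage = [svc.pick().cluster for _ in range(2)]
```

The point for developers is that the caller only ever asks for the logical service; which cluster answers is the platform's concern.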

In a real setup, platform engineers will combine components like Linkerd with other components (and Kubernetes’ native resilience features) to provide a spectrum of composable options that satisfy the resilience requirements of different workload types and business needs – applying architectural patterns and automation to make services composable and self-service simple.

Developers gain confidence that their apps can withstand outages, without needing specialized skills in distributed systems or bespoke scripting for failover. It’s the epitome of what an internal developer platform (IDP) should do: provide useful infrastructure capabilities as simple, self-service abstractions.

Compliance baked in

The other side of the coin is compliance: the security, policy and regulatory requirements that software must adhere to.

Platform engineering bakes compliance into the platform. The first step here, again, is to choose components (for example, Open Policy Agent or Kyverno) that let policies be written in human-readable form (for example, YAML), then applied, enforced, monitored and audited. With a platform that automatically enforces policies and security controls globally, developers are relieved of worrying about compliance.
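As a rough sketch of what an automatically enforced guardrail looks like, the Python below checks a pod manifest (held as a plain dictionary) against two example rules. It assumes nothing about the APIs of Open Policy Agent or Kyverno, which express and enforce policy in their own formats; the check_pod function and the rules themselves (a required "owner" label, no privileged containers) are hypothetical.

```python
def check_pod(pod, required_labels=("owner",)):
    """Return a list of policy violations for a pod manifest given as a dict."""
    violations = []

    # Rule 1 (hypothetical): every workload must carry certain labels.
    labels = pod.get("metadata", {}).get("labels", {})
    for label in required_labels:
        if label not in labels:
            violations.append("missing required label: " + label)

    # Rule 2 (hypothetical): containers must not run in privileged mode.
    for container in pod.get("spec", {}).get("containers", []):
        name = container.get("name", "?")
        security = container.get("securityContext", {})
        if security.get("privileged", False):
            violations.append("container " + name + " must not be privileged")

    return violations


compliant = {
    "metadata": {"labels": {"owner": "team-a"}},
    "spec": {"containers": [{"name": "web"}]},
}
non_compliant = {
    "metadata": {"labels": {}},
    "spec": {"containers": [
        {"name": "web", "securityContext": {"privileged": True}},
    ]},
}

ok = check_pod(compliant)
problems = check_pod(non_compliant)
```

In a platform setting a check like this would run at admission time for every workload, so a violation is rejected with a clear message rather than discovered in an audit.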

In a compliance-as-a-service setup, developers get clusters (or namespaces) where the necessary guardrails are already in place. Developers can focus on building applications – largely without the significant overhead of configuring them to work for different security and compliance requirements.

Framework required

Achieving resilience and compliance at scale requires robust automation, especially in a world of many dynamically-changing clusters. What is needed is a framework for IDP creation, along with an ecosystem, that solves the scale and complexity challenges and delivers the following.

  • Composable components – The ability to select from a broad menu of pre-validated component services that are easily combined into working IDP architectures that ‘just work’.
  • Declarative configuration – GitOps-friendly, versionable, human-readable declarative configuration within a composable system that provides a clear hierarchy and separation of concerns is the only way to manage such high-complexity systems.
  • Multi-cloud and multi-cluster capability – The framework must abstract infrastructures in a way that can keep up with hyperscaler, private cloud and hardware roadmaps – ideally all the way out to the edge.
  • Continuous reconciliation and self-healing – Essential for resilience and compliance: the framework should leverage Kubernetes’ own continuous-reconciliation logic to prevent IDPs from drifting away from their canonical declared states and must work hard to heal around failures without compromising resilience, compliance and other technical and business requirements. 
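The continuous-reconciliation idea in the last point can be sketched in a few lines of Python: compare the declared state with the observed state, then create, update or delete until they match. This is a simplified illustration, not the code of Kubernetes or k0rdent; the plan and reconcile functions and the component names are hypothetical.

```python
def plan(desired, observed):
    """Compute the actions that move the observed state to the desired state."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in observed:
            actions.append(("create", name))
        elif observed[name] != spec:
            actions.append(("update", name))  # drift from the declared spec
    for name in sorted(observed):
        if name not in desired:
            actions.append(("delete", name))  # not declared, so remove it
    return actions


def reconcile(desired, observed):
    """Apply one reconciliation pass in place and return the actions taken."""
    actions = plan(desired, observed)
    for verb, name in actions:
        if verb == "delete":
            del observed[name]
        else:
            observed[name] = desired[name]
    return actions


# Declared state, as it might be stored in Git (hypothetical components).
desired = {"ingress": {"replicas": 2}, "policy": {"mode": "enforce"}}
# Observed state that has drifted: wrong replica count, a stray component.
observed = {"ingress": {"replicas": 1}, "debug": {}}

first_pass = reconcile(desired, observed)
second_pass = reconcile(desired, observed)  # nothing left to do
```

A real controller runs this comparison continuously, so drift is corrected whenever it appears rather than only at deployment time.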

All of this must be done from a single point of control – with extra credit given if the single point of control isn’t also a potential single point of failure. Open source k0rdent ticks all these boxes.

It provides a fast-growing library of composable, pre-validated components; enables GitOps and CI-compatible declarative configuration and management of full stacks with separation of concerns; leverages Kubernetes Cluster API for universal, community-accelerated management of virtually any cloud or infrastructure; and provides automatic drift correction and self-healing out of the box. k0rdent scales to manage many IDP configurations across many infrastructures – thousands of child clusters.

The k0rdent management cluster – that single point of control – can be backed up with standard tools (for example, Velero).

A new model for engineering

By delivering resilience and compliance as integrated platform services, platform engineering teams shift the burden away from developers without compromising on quality or control.

By using open source technologies (like Linkerd and k0rdent) and industry standards (like Cluster API), the solution remains flexible and community-powered. There is no vendor lock-in and the platform can evolve with the ecosystem – whether that’s adopting new cloud providers, new policy agents, or scaling to thousands of clusters.

Platform engineering isn’t about saying “no” or adding bureaucracy; it’s about enabling innovation with a safety net. By using tools like Linkerd and k0rdent to weave that safety net into the platform, we empower developers to climb higher and faster than ever, without fear of falling. And that is a recipe for both happy engineers and successful businesses.

William Rizzo is a CNCF Ambassador and Linkerd Ambassador currently helping Mirantis customers design, build and run initiatives across edge, AI and platform engineering domains. Rizzo has served in many different roles in his career – from engineering and pre-sales to product ownership and consulting – spanning high-performance computing, storage and distributed systems. William lives and works in Amsterdam, the Netherlands.