
Feature

Why a service mesh may not be for everyone

Choose carefully when considering best practices and approaches to software-defined network optimisation, to avoid adding complexity

Fleur Doidge

Published: 04 Jul 2023

A service mesh is a layer of IT infrastructure that controls service-to-service communication over a network to enable separate parts of an application to communicate with each other. It is often used as an elegant approach to deliver communications between containers in a cloud-native architecture.

But while optimising data communications between applications in containerised, hybrid environments can help control complexity, and even cost, that doesn’t necessarily require a service mesh architecture.

Steve Judd, solutions architect at machine identity management supplier and Kubernetes consultant Venafi, warns that using a service mesh to separate application logic from the networking side won’t necessarily reduce complex IT management headaches for every organisation. But it does provide some security benefits.

Kubernetes containers need to talk to each other over some kind of network. But having a flat communications architecture, where any container can “talk” to any other container in a cluster, is problematic, he tells Computer Weekly.

“For security reasons, you probably want to isolate some of those workloads, so they’re not free to talk to whatever they like,” adds Judd. “You might want different containers to be able to effectively authenticate with each other through mutual TLS [transport layer security].”
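To illustrate the isolation Judd describes, the Python sketch below models a default-deny policy: a workload may call another only if that pair has been explicitly allowed. It is a minimal sketch under stated assumptions, not any mesh's API. The `MeshPolicy` class and the workload names are hypothetical, and the string identities stand in for the identities that mutual TLS certificates would establish. Real meshes express such rules as declarative configuration rather than application code.

```python
from dataclasses import dataclass, field


@dataclass
class MeshPolicy:
    """Hypothetical default-deny policy between workload identities.

    The string identities stand in for the identities that mutual TLS
    certificates would establish; this is not any real mesh's API.
    """

    allowed: set[tuple[str, str]] = field(default_factory=set)

    def allow(self, caller: str, callee: str) -> None:
        # Explicitly permit one direction of traffic: caller -> callee.
        self.allowed.add((caller, callee))

    def is_permitted(self, caller: str, callee: str) -> bool:
        # Anything not explicitly allowed is denied, unlike a flat network
        # where any container can talk to any other.
        return (caller, callee) in self.allowed


# Example: the frontend may call orders, and orders may call payments.
policy = MeshPolicy()
policy.allow("frontend", "orders")
policy.allow("orders", "payments")
```

With this policy, `frontend` cannot reach `payments` directly, and no call is permitted in the reverse direction unless it is added explicitly.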

Malicious actors may target cloud-native architectures, so security and compliance considerations remain key when thinking about network optimisation and management. Chief information security officers (CISOs) may require mutual TLS, and a service mesh can help manage the security certificates this involves in a self-contained way.

Major service mesh options, such as Linkerd and Istio, provide mutual TLS support out of the box, with Istio taking more of a “kitchen sink” approach than Linkerd, says Judd.

Complexity plus – or minus for some

For now, highly regulated industries and sectors such as banks and other financial services providers are likely to be the biggest adopters of service mesh technologies. This is because a service mesh offers a way to meet certain regulatory compliance requirements in a Kubernetes environment.

Judd explains: “You don’t have to touch the applications running in the microservices applications. They don’t need to care that you’ve got mutual TLS and encrypted traffic, because that’s done externally to those microservices. All they know is that they can talk to this kind of port and the thing at the other end responds.”

By capturing every single request that goes between two containers or workloads in a cluster, such a network architecture also delivers strong observability.

“You can do things like traceability, you can get much better metrics for performance in terms of latency and volume and bandwidth and so on, in addition to kinds of network policies for traffic management and control,” says Judd.
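Because every request passes through the mesh's proxies, per-route figures such as request counts, bytes transferred and latency percentiles can be collected without touching application code. The Python sketch below shows the kind of bookkeeping involved. `RequestMetrics` is a hypothetical class, the nearest-rank percentile is only one of several common definitions, and real proxies export such figures to monitoring systems rather than holding them in memory.

```python
import math
from collections import defaultdict


class RequestMetrics:
    """Hypothetical per-route metrics store, of the kind a sidecar proxy
    could populate because every request passes through it."""

    def __init__(self) -> None:
        self.latencies_ms = defaultdict(list)  # (source, dest) -> latencies
        self.bytes_sent = defaultdict(int)     # (source, dest) -> total bytes

    def record(self, source: str, dest: str, latency_ms: float, nbytes: int) -> None:
        # Called once per request observed between two workloads.
        self.latencies_ms[(source, dest)].append(latency_ms)
        self.bytes_sent[(source, dest)] += nbytes

    def request_count(self, source: str, dest: str) -> int:
        return len(self.latencies_ms[(source, dest)])

    def percentile(self, source: str, dest: str, pct: float) -> float:
        """Nearest-rank percentile of recorded latencies, pct in (0, 100]."""
        samples = sorted(self.latencies_ms[(source, dest)])
        if not samples:
            raise ValueError("no samples recorded for this route")
        rank = math.ceil(pct * len(samples) / 100)
        return samples[max(rank, 1) - 1]
```

A p99 latency per route, for example, is the figure an operator would watch when judging whether a service is meeting its performance targets.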

A service mesh's traffic controls also let IT departments implement “circuit breaker” patterns for greater resiliency, for example shifting requests to a different workload if the first one fails to respond within a specified number of seconds.
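The circuit breaker pattern can be sketched in a few lines of Python. This is a generic illustration, not how Istio or Linkerd implement it. The `CircuitBreaker` class, its thresholds and the `primary`/`fallback` callables are hypothetical, and a workload that fails to respond in time is assumed to raise an exception such as `TimeoutError`. After a set number of consecutive failures, requests go straight to the fallback workload until a cool-down period has passed; then one trial request is let through to the primary.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Minimal circuit breaker sketch (hypothetical, not a mesh API)."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold  # failures before opening
        self.reset_after = reset_after              # cool-down in seconds
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, primary: Callable[[], object], fallback: Callable[[], object]):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()  # open: skip the failing workload entirely
            # Half-open: allow one trial; a single further failure reopens.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = primary()
        except Exception:  # e.g. TimeoutError when no response arrives in time
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0  # a success closes the circuit again
        return result
```

Injecting the clock makes the behaviour easy to test; otherwise `time.monotonic` is used, which is unaffected by wall-clock changes.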

According to Judd, a service mesh also offers more options when it comes to upgrading microservices.

Suppose an application is made up of five “version one” microservices, with all the network traffic happily flowing between them. If you want to upgrade one of those microservices to version two without committing all the traffic to it straight away, you can send it only a tranche of traffic, defined by a specified metric such as a percentage.
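Weighted traffic splitting is the mechanism behind such partial roll-outs. The Python sketch below picks a destination version in proportion to integer weights, so `{"v1": 95, "v2": 5}` sends roughly 5% of requests to version two. `pick_version` is a hypothetical helper; a mesh performs this selection inside its proxies, driven by configuration rather than application code.

```python
import random
from typing import Mapping


def pick_version(weights: Mapping[str, int], rng: random.Random) -> str:
    """Choose a destination version in proportion to its integer weight.

    Hypothetical helper for illustration, not a mesh API.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    point = rng.randrange(total)  # uniform integer in [0, total)
    for version, weight in weights.items():
        if point < weight:
            return version
        point -= weight
    raise AssertionError("unreachable: point always falls in some bucket")
```

Passing the random generator in explicitly keeps the sketch deterministic under test while behaving randomly in use.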

Because a service mesh “understands” network traffic, these gradual roll-outs can be defined in the mesh configuration and adjusted while testing.

“That’s called a ‘canary release’, like the canary in a coalmine to see if the canary dies,” Judd says. “You just say, ‘Send 5% of all my traffic to this new service’, and monitor it. If it looks like it’s performing, you can say, ‘Now send 50%’.”
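The promotion loop Judd describes can be outlined in Python. It assumes a hypothetical `error_rate_at` function that stands in for monitoring the canary at each traffic share, and an illustrative 1% error-rate threshold. Tools that automate canary releases use richer health checks, so treat this only as an outline of the logic: step up while the new version looks healthy, and roll back to 0% as soon as it does not.

```python
from typing import Callable, Sequence


def run_canary(steps: Sequence[int],
               error_rate_at: Callable[[int], float],
               max_error_rate: float = 0.01) -> int:
    """Step the share of traffic sent to a new version through `steps`
    (percentages such as [5, 50, 100]) and return the final share.

    `error_rate_at(pct)` is a hypothetical stand-in for monitoring the
    canary while it receives `pct` percent of traffic.
    """
    for pct in steps:
        if error_rate_at(pct) > max_error_rate:
            return 0  # roll back: all traffic returns to the old version
    return steps[-1] if steps else 0
```

For example, `run_canary([5, 50, 100], monitor)` promotes the new version to full traffic only if every step stays under the threshold.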

But, as Judd points out, Istio’s more “kitchen sink” offering can be quite complicated, requiring a huge amount of configuration and related knowledge to get right. “Then you have to spend ages troubleshooting, maybe. So that’s kind of a downside,” he says.

Linkerd offers a much slimmer mesh out of the box, but integrates with various other technologies. For canary deployments, for example, you would integrate a separate tool to provide that capability as part of the mesh, says Judd.

However, if you don’t specifically need a service mesh, these requirements can be met in other ways. Judd estimates that only about 15% of all the Kubernetes clusters in the world today run a service mesh. Of that 15%, the majority run Istio or an Istio-based mesh, with the remainder largely Linkerd, he says.

Outsourcing remains an option

When it comes to getting started with a service mesh, Kai Waehner, field chief technology officer (CTO) at Confluent, notes that IT departments have a number of choices: work with Istio and its Envoy proxy, work with Linkerd and its sidecar proxy, or consume service mesh capabilities that a software-as-a-service (SaaS) provider delivers “under the hood”.

“Our customers have told us, ‘We just want to use your service and you should optimise that internally’. The software providers can do that for you, so you don’t have to care,” he says.

A core engine can manage networking and routing optimisations, such as handling terminations internally, rate-limiting connections, and linking physical clusters or routing their traffic internally, for a lower total cost. Providers such as Confluent implement service mesh patterns internally, Waehner points out.

With the right configuration, optimising network connections and transfers in line with network performance patterns can reduce cost. That’s a top use case for cellular networking, beyond “big contract cloud”, he says.
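Rate-limiting connections, one of the optimisations Waehner lists, is commonly implemented with a token bucket. The Python sketch below shows that technique in general terms. It makes no claim about how Confluent or any other provider implements rate limiting, and the `TokenBucket` class and its parameters are hypothetical. The bucket admits short bursts up to its capacity and then admits requests at a steady refill rate.

```python
import time
from typing import Callable


class TokenBucket:
    """Generic token-bucket rate limiter (hypothetical sketch).

    Up to `capacity` requests may pass in a burst; tokens are refilled
    at `rate` per second.
    """

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full, so an initial burst is allowed
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1  # spend one token for this request
            return True
        return False  # over the limit: reject or queue the request
```

Requests that arrive when the bucket is empty are refused until enough time has passed for a token to be refilled.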

Manish Mangal, global head of 5G and network services business at Tech Mahindra, says the company uses a service mesh when migrating telco networks’ cloud-native platforms towards 5G for optimised communication and multicloud deployments.

“We recommend service mesh architectures like Istio to our customers to secure, connect and monitor services at scale and across geographies with few or no service code changes,” says Mangal.

He says a service mesh can assist network performance across distributed systems with efficient traffic routing, load balancing, traffic encryption and security, observability and monitoring, service resilience, and service discovery, helping reduce downtime, and potentially cost as well.

“Containers and microservices applications – and their developers – may require logical isolation from the complexity of network routing and security requirements. The abstraction provided by a service mesh enables rapid and flexible deployment of microservices independent of the physical network.”
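Two of the functions Mangal lists, service discovery and load balancing, can be illustrated together. In this minimal Python sketch, a hypothetical in-memory `ServiceRegistry` maps service names to endpoints and hands them out in round-robin order, so calling code needs only a service name, not a network address. The endpoint addresses are made up, and round-robin is just one of several balancing strategies a mesh may use.

```python
class ServiceRegistry:
    """Hypothetical in-memory service discovery with round-robin balancing."""

    def __init__(self) -> None:
        self._endpoints: dict[str, list[str]] = {}
        self._next: dict[str, int] = {}

    def register(self, service: str, endpoint: str) -> None:
        # A new instance of the service becomes discoverable.
        self._endpoints.setdefault(service, []).append(endpoint)

    def deregister(self, service: str, endpoint: str) -> None:
        # Remove a failed or retired instance, if it is registered.
        endpoints = self._endpoints.get(service, [])
        if endpoint in endpoints:
            endpoints.remove(endpoint)

    def resolve(self, service: str) -> str:
        """Return the next endpoint for `service` in round-robin order."""
        endpoints = self._endpoints.get(service)
        if not endpoints:
            raise LookupError(f"no endpoints registered for {service!r}")
        i = self._next.get(service, 0) % len(endpoints)
        self._next[service] = i + 1
        return endpoints[i]
```

Because callers resolve by name on every request, instances can be added or removed without changing the application, which is the kind of independence from the physical network Mangal describes.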

Mangal says service mesh endpoints can be run in any container-based architecture and between clouds, tracking latency and performance metrics for cross-cloud service delivery. But, as he notes, service mesh technology is still considered a relatively new approach to working with distributed container-based applications in a software-centric way.

When an organisation decides it needs a service mesh, there is a range of providers that could appear on a shortlist. There are about 12 service mesh products available for managing increasingly complicated and heterogeneous network topologies in a more software-defined way, each with different strengths and feature sets. For instance, an organisation may have multiple security and compliance requirements that Kubernetes often does not support natively, and those requirements will dictate the choice of service mesh platform.

Then there is the question of architectural fit. According to Roman Spitzbart, vice-president EMEA for solutions engineering at Dynatrace, a service mesh is a good fit for an organisation that uses a highly distributed, microservices-based enterprise architecture. Running one requires a clear understanding of that architecture, which means additional observability tools, and additional tooling creates more opportunities for things to go wrong.

This is perhaps among the barriers smaller organisations face when considering a service mesh. “Only large organisations are willing to go for this level of complexity; the benefits for them will be large enough. For small organisations, I don’t want to say service mesh is going to be a headache, but it’s going to be a lot of complexity for not a lot of benefit,” says Spitzbart.
