The Enterprise Path to Service Mesh Architectures
Second Edition
Decoupling at Layer 5
Lee Calcote
REPORT
The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. The Enterprise Path to Service Mesh Architectures, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.

The views expressed in this work are those of the author, and do not represent the publisher's views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

This work is part of a collaboration between O'Reilly and NGINX. See our statement of editorial independence.
978-1-492-08933-9
Table of Contents

Preface

2. Contrasting Technologies
   Client Libraries
   API Gateways
   Container Orchestrators
   Service Meshes
   Conclusion

   The Performance of the Data Plane
   Conclusion

5. Conclusion
   Adopting a Service Mesh
Preface
Who This Report Is For
The intended readers are developers, operators, architects, and infrastructure (IT) leaders who face the operational challenges of distributed systems. Technologists need to understand the various capabilities of, and paths to, service meshes so that they can make an informed decision about selecting and investing in an architecture and deployment model. This will allow them to provide visibility, resiliency, traffic control, and security for their distributed application services.
Acknowledgments
Many thanks to Matt Turner, Ronald Petty, Karthik Gaekwad, Alex Blewitt, Dr. Girish Ranganathan (Dr. G), and the occasional two t's Matt Baldwin for their many efforts to ensure the technical correctness of this report.
CHAPTER 1
Service Mesh Fundamentals
preexisting or created, are used to address distributed systems challenges in microservices environments. It's in these environments that many teams first consider their path to a service mesh. The sheer volume of services that must be managed on an individual, distributed basis (versus centrally, as with monoliths) and the challenges of ensuring the reliability, observability, and security of these services cannot be overcome with outmoded paradigms, hence the need to reincarnate prior thinking and approaches. New tools and techniques must be adopted.
Given the distributed (and often ephemeral) nature of microservices, and how central the network is to their functioning, it behooves us to reflect on the fallacies that networks are reliable, are without latency, have infinite bandwidth, and that communication is guaranteed (it's worth reflecting on the fact that these same assumptions are held for service components using internal function calls). Consider how critical the ability to control and secure service communication is to distributed systems that rely on network calls for every transaction, every time an application is invoked. You begin to understand why you are under-tooled and why running more than a few microservices on a network topology that is in constant flux is so difficult. In the age of microservices, a new layer of tooling for the caretaking of services is needed: a service mesh.
A service mesh is:

• A services-first network
• A developer-driven network
• A network that is primarily concerned with removing the need for developers to build infrastructure concerns into their application code
• A network that empowers operators with the ability to declaratively define network behavior, node identity, and traffic flow through policy
The more services, the more value derived from the mesh. In subsequent chapters, I show how service meshes provide value outside of the use of microservices and containers and help modernize existing services (running on virtual or bare-metal servers) as well.
Network engineers also receive training in the OSI model. The OSI
model is shown in Figure 1-2, as a refresher for those who have not
seen it in some time. We will refer to various layers of this model
throughout the book.
Observability
Many organizations are initially attracted to the uniform observability that service meshes provide. No complex system is ever fully healthy. Service-level telemetry illuminates where your system is behaving sickly, helping to answer difficult questions like why your requests are slow to respond. Identifying when a specific
Traffic control
Service meshes provide granular, declarative control over network traffic, determining where a request is routed, to perform a canary release, for example. Resiliency features typically include circuit breaking, latency-aware load balancing, eventually consistent service discovery, timeouts, deadlines, and retries.

Timeouts provide cancellation of service requests when a request doesn't return to the client within a predefined time. Timeouts limit the amount of time spent on any individual request and are enforced at the point when a response is considered invalid or too long for a client (user) to wait for. Deadlines are an advanced service mesh feature in that they facilitate feature-level timeouts (across a collection of requests) rather than independent service timeouts, helping to avoid retry storms. Deadlines deduct the time left to handle a request at each step, propagating elapsed time with each downstream service call as the request travels through the mesh. Timeouts and deadlines, illustrated in Figure 1-8, can be considered enforcers of your service-level objectives (SLOs).
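To make the mechanics concrete, the following is a minimal sketch in Go (standard library only) of how a single deadline shrinks as it propagates across downstream calls. This is the behavior a service mesh enforces in its proxies rather than in your application code; the service URLs and durations here are hypothetical.

    package main

    import (
        "context"
        "fmt"
        "net/http"
        "time"
    )

    // callService issues one downstream request, honoring whatever time
    // remains on the caller's deadline rather than a fixed per-hop timeout.
    func callService(ctx context.Context, url string) error {
        req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
        if err != nil {
            return err
        }
        resp, err := http.DefaultClient.Do(req) // canceled automatically when the deadline passes
        if err != nil {
            return err
        }
        return resp.Body.Close()
    }

    func main() {
        // One deadline for the whole feature: every downstream hop inherits it,
        // so the remaining time budget shrinks as the request travels.
        ctx, cancel := context.WithTimeout(context.Background(), 500*time.Millisecond)
        defer cancel()

        // Hypothetical downstream services; each call consumes part of the budget.
        for _, url := range []string{"http://inventory.local/stock", "http://pricing.local/quote"} {
            if err := callService(ctx, url); err != nil {
                fmt.Println("request failed or deadline exceeded:", err)
                return
            }
            if d, ok := ctx.Deadline(); ok {
                fmt.Println("time remaining in budget:", time.Until(d))
            }
        }
    }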
When a service call times out or returns unsuccessfully, you might choose to retry the request. Simple retries bear the risk of making things worse by retrying the same call to a service that is already under water (retrying three times = 300% more service load). Retry budgets (aka maximum retries), however, provide the benefit of multiple tries but with a limit, so as not to overload what is already a load-challenged service. Some service meshes take the elimination
Instead of retrying and adding more load to the service, you might elect to fail fast and disconnect the service, disallowing calls to it. Circuit breaking provides configurable timeouts (or failure thresholds) to ensure safe maximums and facilitate graceful failure, commonly for slow-responding services. Using a service mesh as a separate layer to implement circuit breaking avoids placing excessive overhead on applications (services) at a time when they are already oversubscribed.
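As a rough illustration of the mechanism a mesh implements on your behalf, here is a minimal, hypothetical circuit breaker in Go (standard library only); in a service mesh, the equivalent thresholds are configured declaratively in the proxy rather than written into each service.

    package main

    import (
        "errors"
        "fmt"
        "sync"
        "time"
    )

    // Breaker trips open after maxFailures consecutive failures and fails fast
    // until cooldown has elapsed, letting a struggling service recover instead
    // of being hammered with retries.
    type Breaker struct {
        mu          sync.Mutex
        failures    int
        maxFailures int
        openedAt    time.Time
        cooldown    time.Duration
    }

    var ErrOpen = errors.New("circuit open: failing fast")

    func (b *Breaker) Call(fn func() error) error {
        b.mu.Lock()
        if b.failures >= b.maxFailures && time.Since(b.openedAt) < b.cooldown {
            b.mu.Unlock()
            return ErrOpen // no load reaches the slow or failing service
        }
        b.mu.Unlock()

        err := fn()

        b.mu.Lock()
        defer b.mu.Unlock()
        if err != nil {
            b.failures++
            if b.failures >= b.maxFailures {
                b.openedAt = time.Now() // (re)open the circuit
            }
            return err
        }
        b.failures = 0 // a success closes the circuit again
        return nil
    }

    func main() {
        b := &Breaker{maxFailures: 3, cooldown: 5 * time.Second}
        for i := 0; i < 5; i++ {
            err := b.Call(func() error { return errors.New("upstream unavailable") })
            fmt.Println("attempt", i, "error:", err)
        }
    }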
Rate limiting (throttling) is used to ensure the stability of a service so that when one client causes a spike in requests, the service continues to run smoothly for other clients. Rate limits are usually measured over a period of time, but you can use different algorithms (fixed or sliding window, sliding log, etc.). Rate limits are typically operationally focused on ensuring that your services aren't oversubscribed.
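To show the idea behind the fixed-window algorithm mentioned above, here is a small, hypothetical per-client rate limiter in Go; meshes apply the same logic at the proxy, configured by policy rather than code.

    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    // FixedWindowLimiter allows at most limit requests per client per window.
    // Sliding-window and sliding-log variants smooth out bursts at window
    // boundaries in exchange for a little more bookkeeping.
    type FixedWindowLimiter struct {
        mu      sync.Mutex
        limit   int
        window  time.Duration
        counts  map[string]int
        started map[string]time.Time
    }

    func NewFixedWindowLimiter(limit int, window time.Duration) *FixedWindowLimiter {
        return &FixedWindowLimiter{
            limit:   limit,
            window:  window,
            counts:  map[string]int{},
            started: map[string]time.Time{},
        }
    }

    // Allow reports whether the given client may make another request now.
    func (l *FixedWindowLimiter) Allow(client string) bool {
        l.mu.Lock()
        defer l.mu.Unlock()
        now := time.Now()
        if now.Sub(l.started[client]) > l.window {
            l.started[client] = now // start a fresh window for this client
            l.counts[client] = 0
        }
        if l.counts[client] >= l.limit {
            return false // this client has spent its budget; others are unaffected
        }
        l.counts[client]++
        return true
    }

    func main() {
        limiter := NewFixedWindowLimiter(2, time.Second)
        for i := 0; i < 4; i++ {
            fmt.Println("client-a allowed:", limiter.Allow("client-a"))
        }
    }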
Security
Most service meshes provide a certificate authority to manage keys and certificates for securing service-to-service communication. Certificates are generated per service and provide a unique identity for that service. When sidecar proxies are used (discussed later in Chapter 3), they take on the identity of the service and perform life-cycle management of certificates (generation, distribution, refresh, and revocation) on behalf of the service. In sidecar proxy deployments, you'll typically find that local TCP connections are established between the service and sidecar proxy, whereas mutual Transport Layer Security (mTLS) connections are established between proxies, as demonstrated in Figure 1-9.
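For intuition about what the proxies negotiate on the wire, here is a minimal sketch of a mutual TLS server in Go using the standard crypto/tls package; the certificate file names are hypothetical, and in a mesh this handshake happens between sidecar proxies, with the CA material delivered and rotated by the control plane.

    package main

    import (
        "crypto/tls"
        "crypto/x509"
        "log"
        "net/http"
        "os"
    )

    func main() {
        // Hypothetical file names; in a mesh, the sidecar proxy receives these
        // from the control plane's certificate authority and rotates them.
        caPEM, err := os.ReadFile("mesh-ca.pem")
        if err != nil {
            log.Fatal(err)
        }
        caPool := x509.NewCertPool()
        caPool.AppendCertsFromPEM(caPEM)

        cert, err := tls.LoadX509KeyPair("service-cert.pem", "service-key.pem")
        if err != nil {
            log.Fatal(err)
        }

        server := &http.Server{
            Addr: ":8443",
            TLSConfig: &tls.Config{
                Certificates: []tls.Certificate{cert},        // this workload's identity
                ClientAuth:   tls.RequireAndVerifyClientCert, // the "mutual" in mTLS
                ClientCAs:    caPool,                         // trust only peers signed by the mesh CA
            },
        }
        log.Fatal(server.ListenAndServeTLS("", ""))
    }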
Encrypting traffic internal to your application is an important security consideration. Your application's service calls are no longer kept inside a single monolith via localhost; they are exposed over the network. Allowing service calls without TLS on the transport is setting
Decoupling at Layer 5
Service meshes help you avoid bloated service code, fat on infrastructure concerns.
You can avoid duplicative work in making services production-ready by singularly addressing load balancing, autoscaling, rate limiting, traffic routing, and so on. Teams avoid inconsistency of implementation across different services to the extent that the same central controls are provided for retries and budgets, failover, deadlines, cancellation, and so forth. Implementations done in silos lead to fragmented, nonuniform policy application and difficult debugging.
Service meshes insert a dedicated infrastructure layer between Dev and Ops, separating the common concerns of service communication by providing independent control over them. The service mesh is a networking model that sits at a layer of abstraction above TCP/IP. Without a service mesh, operators are still tied to developers for many concerns, as they need new application builds to control network traffic, change authorization behavior, implement resiliency, and so on. The decoupling of Dev and Ops is key to providing autonomous, independent iteration.
Decoupling is an important trend in the industry. If you have a significant number of services, you have at least these three roles:
Conclusion
The data plane carries the actual application request traffic between service instances. The control plane configures the data plane, provides a point of aggregation for telemetry, and also provides APIs for modifying the mesh's behavior. The management plane extends governance and backend systems integration and further empowers personas other than strictly operators, while also significantly benefitting developers and product/service owners.

Decoupling of Dev and Ops avoids diffusion of the responsibility of service management, centralizing control over these concerns in a new infrastructure layer: Layer 5.

Service meshes make it possible for services to regain a consistent, secure way to establish identity within a datacenter and, furthermore, to do so based on strong cryptographic primitives rather than deployment topology.
With each deployment of a service mesh, developers are relieved of
their infrastructure concerns and can refocus on their primary task
(creating business logic). More-seasoned software engineers might
have difficulty breaking the habit and trusting that the service mesh
will provide, or even displacing the psychological dependency on
their client libraries.
Many organizations find themselves in the situation of having incorporated too many infrastructure concerns into application code. Service meshes are a necessary building block when composing production-grade microservices. The power of easily deployable service meshes will allow many smaller organizations to enjoy features previously available only to large enterprises.
CHAPTER 2
Contrasting Technologies

Client Libraries
Client libraries (sometimes referred to as microservices frameworks) became very popular when microservices gained a foothold in modern application design, as a means to avoid rewriting the same logic in every service. Example frameworks include the following (a sketch of the pattern they share follows the list):
Twitter Finagle
An open source remote procedure call (RPC) library built on
Netty for engineers who want a strongly typed language on the
Java Virtual Machine (JVM). Finagle is written in Scala.
Netflix Hystrix
An open source latency and fault tolerance library designed to
isolate points of access to remote systems, services, and third-
party libraries; stop cascading failure; and enable resilience.
Hystrix is written in Java.
Netflix Ribbon
An open source Inter-Process Communication (IPC) library with built-in software load balancers. Ribbon is written in Java.
Go kit
An open source toolkit for building microservices (or elegant monoliths) with gRPC as the primary messaging pattern and with pluggable serialization and transport. Go kit is written in Go.
Other examples include Dropwizard, Spring Boot, Akka, and so on.
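To illustrate the pattern these frameworks share, resilience logic compiled into every service binary, here is a small, hypothetical retry-with-timeout middleware in Go. It is not the API of any of the libraries above, only a sketch of the kind of code they put inside your application (and that a service mesh moves into the proxy).

    package main

    import (
        "context"
        "errors"
        "fmt"
        "time"
    )

    // Endpoint is the shape of a remote call, as many client libraries model it.
    type Endpoint func(ctx context.Context) error

    // WithRetry wraps an endpoint with a bounded retry loop and per-try timeout.
    // With a client library, this logic ships inside every service; with a
    // service mesh, it moves into the proxy and is configured, not compiled.
    func WithRetry(next Endpoint, attempts int, perTry time.Duration) Endpoint {
        return func(ctx context.Context) error {
            var err error
            for i := 0; i < attempts; i++ {
                tryCtx, cancel := context.WithTimeout(ctx, perTry)
                err = next(tryCtx)
                cancel()
                if err == nil {
                    return nil
                }
            }
            return fmt.Errorf("all %d attempts failed: %w", attempts, err)
        }
    }

    func main() {
        flaky := func(ctx context.Context) error { return errors.New("upstream unavailable") }
        call := WithRetry(flaky, 3, 100*time.Millisecond)
        fmt.Println(call(context.Background()))
    }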
this is changing, and their use for influencing or implementing business logic is on the rise.
API Gateways
How do API gateways interplay with service meshes?
This is a very common question, and the nuanced answer puzzles
many, particularly given that within the category of API gateways
lies a subspectrum. API gateways come in a few forms:
NGINX
As a stable, efficient, ubiquitous L7 proxy, NGINX is commonly found at the core of API gateways. It may be used on its own or wrapped with additional features to facilitate container orchestrator-native integration or additional self-service functionality for developers. For example:
Envoy
The Envoy project has also been used as the foundation for API
gateways:
Ambassador
With its basis in Envoy, Ambassador is an API gateway for
microservices that functions standalone or as a Kubernetes
Ingress Controller.
API Management
API gateways complement other components of the API management ecosystem, such as API marketplaces and API publishing portals, both of which are surfacing in service mesh offerings. API management solutions provide analytics, business data, adjunct provider services like single sign-on, and API versioning control. Many of the API management vendors have moved API management systems to a single point of architecture, designing their API gateways to be implemented at the edge.
An API gateway can call downstream services via service mesh by
offloading application network functions to the service mesh. Some
API management capabilities that are oriented toward developer
engagement can overlap with service mesh management planes in
the following ways:
Container Orchestrators
Why is my container orchestrator not enough? What if I'm not using containers? What do you need to continuously deliver and operate microservices? Leaving CI and deployment pipelines aside for the moment, you need much of what the container orchestrator provides at an infrastructure level, plus what it doesn't provide at a services level. Table 2-1 takes a look at these capabilities.
Additional key capabilities include simple application health and
performance monitoring, application deployments, and application
secrets.
Service meshes are a dedicated layer for managing service-to-service communication, whereas container orchestrators necessarily started with, and focus on, automating containerized infrastructure and overcoming the problems of ephemeral infrastructure and distributed systems. Applications are why we run infrastructure, though. Applications have been, and still are, the North Star of our focus. There are enough service- and application-level concerns that additional platform/management layers are needed.
Container orchestrators like Kubernetes have different mechanisms
for routing traffic into the cluster. Ingress Controllers in Kubernetes
expose the services to networks external to the cluster. Ingresses can
terminate Secure Sockets Layer (SSL) connections, execute rewrite
rules, and support WebSockets and sometimes TCP/UDP, but they
don’t address the rest of service-level needs.
API gateways address some of these needs and are commonly
deployed on a container orchestrator as an edge proxy. Edge proxies
provide services with Layer 4 to Layer 7 management while using
the container orchestrator for reliability, availability, and scalability
of container infrastructure.
Service Meshes
In a world of many service meshes, you have a choice when it comes to which service mesh(es) to adopt. For many of you, your organization will end up with more than one type of service mesh. Every organization I've been in has multiple hypervisors for running VMs, multiple container runtimes, different container orchestrators in use, and so on. Infrastructure diversity is a reality for enterprises.

Diversity is driven by a broad set of workload requirements. Workloads vary from those that are process-based to those that are event-driven in their design. Some run on bare metal, while other workloads execute in functions. Others represent each and every style of deployment artifact (container, virtual machine, and so on) in between. Different organizations need different scopes of service mesh functionality. Consequently, different service meshes are built with slightly divergent use cases in mind, and therefore, the architecture and deployment models of service meshes differ between
Multi-Vendor Service Mesh Interoperation (Hamlet)
This is a set of API standards for enabling service mesh federation. Created by VMware.

Service Mesh Interface (SMI)
This is a standard interface for service meshes on Kubernetes. Created by Microsoft; Meshery is the official SMI conformance tool used to ensure that a cluster is properly configured and that its behavior conforms to official SMI specifications.
Data Plane
Service proxies (gateways) are the elements of the data plane. How many are present is a factor of both the number of services you're running and the design of the service mesh's deployment model. Some service mesh projects have built new proxies, while many others have leveraged existing proxies. Envoy is a popular choice as the data plane element.
Nelson
Takes advantage of integrations with Envoy, Prometheus, Vault, and Nomad to provide Git-centric, developer-driven deployments with an automated build-and-release workflow. Open source. From Verizon Labs. Written in Scala.
Control Plane
Control plane offerings include the following:
Consul
Announced its service mesh intentions in v1.5 and became a full service mesh in v1.8. Consul uses Envoy as its data plane, offering multicluster federation.
• Open and closed source. From HashiCorp. Primarily written in Go.
Linkerd
Linkerd is hosted by the Cloud Native Computing Foundation
(CNCF) and has undergone two major releases with significant
architectural changes and an entirely different code base used
between the two versions.
Linkerd v1
The first version of Linkerd was built on top of Twitter Finagle.
Pronounced “linker-dee,” it includes both a proxying data plane
and a control plane, Namerd (“namer-dee”), all in one package.
• Open source. Written primarily in Scala.
• Data plane can be deployed in a node proxy model (common) or in a proxy sidecar (not common). Proven scale, having served more than one trillion service requests.
• Supports services running within container orchestrators
and as standalone virtual or physical machines.
• Service discovery abstractions to unite multiple systems.
Linkerd v2
The second major version of Linkerd is based on a project formerly known as Conduit, a Kubernetes-native and Kubernetes-only service mesh announced as a project in December 2017. In contrast to Istio, and learning from Linkerd v1, Linkerd v2's design principles revolve around a minimalist architecture and
Interface (SMI) specification as its API, NGINX Service Mesh
presents its control plane as a CLI, meshctl.
• Open and closed source. From NGINX. Primarily written
in C.
Other examples include Open Service Mesh, Maesh, Kuma, and App
Mesh.
Many other service meshes are available. This list is intended to give you a sense of the diversity of service meshes available today. See the community-maintained Layer5 service mesh landscape page for a complete listing of service meshes and their details.
Management Plane
The management plane resides a level above the control plane and offers a range of potential functions spanning operational patterns, business systems integration, and the enhancement of application logic, while operating across different service meshes. Among its uses, a management plane can perform workload and mesh configuration validation, whether in preparation for onboarding a workload onto the mesh or in continuously vetting configuration as you update to new versions of the components running your control and data planes or new versions of your applications. Management planes help organizations running a service mesh get the most out of their investment. One aspect of managing service meshes is performance management, a function at which Meshery excels.
Meshery
The service mesh management plane for adopting, operating,
and developing on different service meshes, Meshery integrates
business processes and application logic into service meshes by
deploying custom WebAssembly (WASM) modules as filters in
Envoy-based data planes. It provides governance, policy and
performance, and configuration management of service meshes
with a visual topology for designing service mesh deployments
and managing the fine-grained traffic control of a service mesh.
• Open source. Created by Layer5. Primarily written in Go.
Proposed for adoption in the CNCF.
Conclusion
Client libraries (microservices frameworks) come with their own set of challenges. Service meshes move these concerns into the service proxy and decouple them from the application code. API gateways are the technology with the most overlap in functionality, but they are deployed at the edge, not on every node or within every pod. As service mesh deployments evolve, I'm seeing an erosion of separately deployed API gateways and an inclusion of them within the service mesh. Between client libraries and API gateways, service meshes offer enough consolidated functionality to either diminish the need for them or replace them entirely with a single layer of control.

Container orchestrators have so many distributed systems challenges to address within lower-layer infrastructure that they've yet to holistically address service- and application-level needs. Service meshes offer a robust set of observability, security, traffic, and application controls beyond those of Kubernetes. Service meshes are a necessary layer of cloud native infrastructure. There are many service meshes available, along with service mesh specifications that abstract and unify their functionality.
CHAPTER 3
Adoption and Evolutionary Architectures
Piecemeal Adoption
Desperate to gain an understanding of what’s going on across their
distributed infrastructure, many organizations seek to benefit from
auto-instrumented observability first, taking baby steps in their path
to a full service mesh after initial success and operational comfort
have been achieved. Stepping into using a service mesh for its ability
to provide enhanced observability is a high-value, relatively safe first
step. First steps for others might be on a parallel path. A financial
organization, for example, might seek improved security with strong
identity (assignment of a certificate to each individual service) and
strong encryption through mutual TLS between each service, while
others might begin with an ingress proxy as their entryway to a
larger service mesh deployment.
Consider an organization that has hundreds of existing services running on virtual machines (VMs) external to the service mesh, with little to no service-to-service traffic; rather, nearly all of the traffic flows from the client to the service and back to the client. This organization can deploy a service mesh ingress (e.g., Istio Ingress Gateway) and begin gaining granular traffic control (e.g., path rewrites) and detailed service monitoring without immediately deploying hundreds of service proxies (Figure 3-1).
You can start with a full service mesh deployment from the get-go,
or you can work your way up to one.
Retrofitting a Deployment
Recognize that although some greenfield projects have the luxury of incorporating a service mesh from the start, most organizations will have existing services (monoliths or otherwise) that they'll need to onboard to the mesh. Rather than in containers, these services could be running in VMs or on bare-metal hosts. Fear not! Some service meshes squarely address such environments and help with the modernization of such services, allowing organizations to renovate their services inventory by:
Evolutionary Architectures
Different phases of adoption provide multiple paths to service mesh
architectures.
Client Libraries
Some people consider libraries to be the first service meshes. Figure 3-2 illustrates how the use of a library requires that your architecture has application code either extending or using primitives of the chosen library(ies). Additionally, your architecture must consider whether to use language-specific frameworks and (potentially) the application servers to run them.
Client Libraries as a Proxyless Service Mesh
Recognizing the merits of the service mesh design, gRPC, a high-performance, open source remote procedure call framework, has worked to support Envoy's xDS APIs such that gRPC can be dynamically (re)configured, as shown in Figure 3-4.
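As a rough sketch of what proxyless gRPC looks like from the application's side (assuming Go and gRPC's xDS support; the target name here is hypothetical, and the control plane location normally comes from a bootstrap file referenced by the GRPC_XDS_BOOTSTRAP environment variable):

    package main

    import (
        "context"
        "log"
        "time"

        "google.golang.org/grpc"
        "google.golang.org/grpc/credentials/insecure"
        _ "google.golang.org/grpc/xds" // registers the xds:/// resolver and balancer
    )

    func main() {
        // With the xds package imported, the client fetches its configuration
        // (endpoints, load-balancing policy, routes) directly from an xDS
        // control plane instead of relying on a sidecar proxy.
        ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
        defer cancel()

        conn, err := grpc.DialContext(ctx, "xds:///inventory.example.svc", // hypothetical service name
            grpc.WithTransportCredentials(insecure.NewCredentials()))
        if err != nil {
            log.Fatalf("dial failed: %v", err)
        }
        defer conn.Close()
        // Stubs generated from your .proto files would use conn as usual.
    }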
Service meshes are also used to enforce policy about what egress
traffic is leaving your cluster. Typically, this is accomplished in one
of a couple of ways:
• Registering the external services with your service mesh (so that they can match traffic against the external destination) and configuring traffic control rules to both allow and govern external service calls (e.g., provide timeouts on external services)
• Calling external services directly without registering them with your service mesh but configuring your mesh to allow traffic destined for external services (maybe for a specific IP range) to bypass the service proxies
Router Mesh
Depicted in Figure 3-6, a router mesh performs service discovery and provides load balancing for service-to-service communication. All service-to-service communication flows through the router mesh, which provides retries and circuit breaking through active health checks (measuring the response time of a service and breaking the circuit when a latency/timeout threshold is crossed).
Given the following disadvantages, I generally recommend skipping this model, if you can:

Advantages
• A starting point for building a brand-new microservices architecture or for migrating from a monolith

Disadvantages
• When the number of services increases, it becomes difficult to manage
• A crutch on your path to a better architecture that can be overwhelmed as a single point of failure
• Easier to scale distribution of configuration information than it
is with sidecar proxies (if you’re not using a control plane).
• This model is useful for deployments that are primarily physical
or virtual server based. Good for large monolithic applications.
Disadvantages
• Coarse support for encryption of service-to-service communication, provided by host-to-host level encryption and authentication policies.
• Blast radius of a proxy failure includes all applications on the node, which is essentially equivalent to losing the node itself.
• Not a transparent entity; services must be aware of its existence.
Sidecar Proxies with a Control Plane
Most service mesh projects and their deployment efforts promote and support this deployment model foremost. In this model, you provision a control plane (and service mesh) and get the logs and traces out of the service proxies. A powerful aspect of a full service mesh is that it moves away from thinking of proxies as isolated components and acknowledges the network they form as something valuable unto itself. In essence, the control plane is what takes service proxies and forms them into a service mesh. When you're using the control plane, you have a service mesh, as illustrated in Figure 3-9.
Disadvantages
• Sidecar footprint: per-service overhead of running a service proxy sidecar
Conclusion
In many respects, deployment of a control plane is what defines a
service mesh. Otherwise, what you have is an unmanaged collection
of service proxies.
Service meshes support onboarding existing (noncontainerized) services onto the mesh. Service meshes can be deployed across multiple clusters. Federation of disparate service meshes is now being facilitated.
As technology evolves, capabilities are sometimes commoditized and pushed down the stack. Data plane components will become mostly commoditized. Standards like TCP/IP incorporated solutions to flow control and many other problems into the network stack itself. This means that such code still exists, but it has been extracted from your application into the underlying networking layer provided by your operating system.
It’s commonplace to find deployments with load balancers deployed
external to the cluster handling north-south traffic in addition to the
ingress/egress proxies that handle east-west traffic within the service
mesh. Over time, these two separate tiers of networking will look
more and more alike.
The Power of the Data Plane
Control planes bring much-needed element management to operators. Data planes composed of any number of service proxies need control planes to go about the task of applying service mesh-specific use cases to their fleet of service proxies. Configuration management, telemetry collection, infrastructure-centric authorization, identity, and so on are common functions delivered by a control plane. However, their true source of power is drawn significantly from the service proxy. Users commonly find themselves in need of customizing the chain of traffic filters (modules) that service proxies use to do much of their heavy lifting. Different technologies are used to provide data plane extensibility and, consequently, additional custom data plane intelligence, including:
Lua
A scripting language for execution inside a Just-In-Time compiler, LuaJIT

WebAssembly (WASM)
A virtual stack machine as a compilation target for different languages to use as an execution environment
Swappable Sidecars
Functionality of the service mesh's proxy is one of the more important considerations when adopting a service mesh. From the perspective of a developer, much significance is given to a proxy's cloud native integrations (e.g., with OpenTelemetry/OpenTracing, Prometheus, and so on). Surprisingly, a developer may not be very interested in a proxy's APIs. The service mesh control plane is the point of…well, control for managing the configuration of proxies. A developer will, however, be interested in a management plane's APIs.
A top item on the list of developers' demands for proxies is protocol support. Generally, protocol considerations can be broken into two types:

TCP, UDP, HTTP
A network team-centric consideration in which efficiency, performance, offload, and load-balancing algorithm support are evaluated. Support for HTTP/2 often takes top billing.

gRPC, NATS, Kafka
A developer-centric consideration in which the top item on the list is application-level protocols, specifically those commonly used in modern distributed application designs.
NGINX
While you won't be able to use NGINX as a proxy to displace Envoy (recall that the nginMesh project was set aside), you might want to use NGINX based on your operational expertise, need for a battle-tested proxy, or integration with F5 load balancers. You might be looking for caching, web application firewall (WAF), or other functionality available in NGINX Plus, as well. An enhanced version of NGINX Plus that interfaces natively with Kubernetes is the service proxy used in the NGINX Service Mesh data plane.
CPX
You might choose to deploy the Citrix Service Mesh (which is an Istio control plane with a CPX data plane) if you have an existing investment in Citrix's Application Delivery Controllers and have them across your diverse infrastructure, including new microservices and existing monoliths.
MOSN
MOSN can deploy as an Istio data plane. You might choose to deploy MOSN if you need to highly customize your service proxy and are a Golang shop. MOSN supports a multiprotocol framework, and you can access private protocols with a unified routing framework. It also has a multiprocess plug-in mechanism, through which independent MOSN processes can be easily extended with plug-ins for management, bypass, and other functional module extensions.
The arrival of choice in service proxies for Istio has generated a lot
of excitement. Linkerd’s integration was created early in Istio’s 0.1.6
release. Similarly, the ability to use NGINX as a service proxy
through the nginMesh project (see Figure 4-2) was provided early in
the Istio release cycle.
Extensible Adapters
Management planes and control planes are responsible for enforcing access control, usage, and other policies across the service mesh. In order to do so, they collect telemetry data from service proxies. Some service meshes gather telemetry as shown in Figure 4-3. The service mesh control plane uses one or more telemetry adapters to collect and pass along these signals. The control plane will use multiple adapters either for different types of telemetry (traces, logs, metrics) or for transmitting telemetry to external monitoring providers.
Figure 4-4. Data plane performing the heavy lifting to ensure efficient
processing of packets and telemetry generation. “Mixerless Istio” is an
example of this model.
The Performance of the Data Plane
Considering the possibilities of what can be achieved when the intelligence of the management plane and the power of the data plane are combined, service meshes pack quite a punch in terms of facilitating configuration management, central identity, telemetry collection, traffic policy, application infrastructure logic, and so on. With this understanding, consider that the more value you try to derive from a service mesh, the more work you will ask it to do. The more work a service mesh does, the more its efficiency becomes a concern. While benefiting from a service mesh's features, you may ponder what overhead your service mesh is incurring. This is one of the most common questions service mesh users have.
CHAPTER 5
Conclusion
• Provide compelling topology and dependency graphs that allow
you to not only visualize the service mesh and its workloads but
design them as well.
• Participate in application life-cycle management but would benefit from shifting left to incorporate:
— Deeper automated canary support with integration into continuous integration systems, which would improve the deployment pipelines of many software projects.
— Automatic API documentation, perhaps integrating with toolkits like Swagger or ReadMe.io.
— API function and interface discovery.
• Participate in service and product management by enabling service owners to offload any number of examples of application logic to the service mesh, allowing developers to hand off "application infrastructure logic" to the layer just below the application, Layer 5:
— Perform A/B testing directly with service users without the need for developer or operator intervention.
— Control service pricing based on the accounting of service request traffic in the context of the tenant/user making these requests.
— Forgo requesting specific application logic changes from the development team and instead deploy a network traffic filter to reason over and control user and service behavior.
• Deeper observability to move beyond distributed tracing alone and into full application performance monitoring, leveraging deep packet inspection for business-layer insights.
• Multitenancy to allow multiple control planes running on the
same platform.
• Multicluster and cross-cluster such that each certificate authority
shares the same root certificate and workloads can authenticate
each other across clusters within the same mesh.
• Cross-mesh to allow interoperability between heterogeneous
types of service meshes.
• Improve on the integration of performance management tools
like Meshery to identify ideal mesh resiliency configurations by
facilitating load testing of your services’ response times so that
you can tune queue sizes, timeouts, retry budgets, and so on, accordingly. Meshes provide fault and delay injection. What are appropriate deadlines for your services under different loads?
• Advanced circuit breaking with fallback paths, identifying alternate services to respond as opposed to 503 Service Unavailable errors.
• Pluggable certificate authority components so that external CAs can be integrated.
Table 5-1. Factors in considering how strongly you need a service mesh

Concern: Service communication
  Begin considering a service mesh: Low interservice communication.
  Strongly consider a service mesh: High interservice communication.
  Consider that: The higher the volume of requests to internal and external services, the more insight and control you will need, and the higher the return on investment your service mesh will deliver.

Concern: Observability
  Begin considering a service mesh: Edge focus, where metrics and usage cover response time to clients and request failure rates.
  Strongly consider a service mesh: Uniform and ubiquitous observability is key for understanding service behavior.
  Consider that: You can bring much insight immediately with little effort.
• Use Meshery to identify antipatterns and also to analyze the
state and configuration of your service mesh against known best
practices.
• Inspect service request headers with your service mesh and
annotate when requests fail to help you identify whether the
failure is in your workloads or the service mesh.
About the Author
Lee Calcote is the founder and CEO of Layer5, where the community helps organizations harness the value of service meshes as a maintainer of Meshery, Service Mesh Performance (SMP), and Service Mesh Interface (SMI). Previously, Calcote stewarded technology strategy and innovation across SolarWinds as head of CTO technology initiatives. He led software-defined data center engineering at Seagate, delivering predictive analytics and modern systems management. Calcote held various leadership positions at Cisco, where he created Cisco's cloud management platform and pioneered software-defined network orchestration and autonomic remote management services.

In addition to his role at Layer5, Calcote serves in various industry bodies, chairing the Cloud Native Computing Foundation (CNCF) SIG Network, and formerly served in the Distributed Management Task Force (DMTF), delivering Redfish 1.0, and in the Center for Internet Security (CIS), delivering the Docker Benchmark 1.0.

He serves on Cisco's advisory board, and formerly advised startups Twistlock and Octarine, acquired by Palo Alto Networks and VMware, respectively. As a Docker Captain and Cloud Native Ambassador, he is a frequent speaker in the cloud native ecosystem. Calcote is the coauthor of Istio: Up and Running and the forthcoming Service Mesh Patterns (both O'Reilly), in addition to titles with other publishers. He holds a bachelor's degree in computer science and a master's degree in business administration, and retains a list of industry certifications.