Architecting Distributed Transactional Applications
Data-Intensive Distributed Transactional Applications
Guy Harrison, Andrew Marshall & Charles Custer
Table of Contents
Serverless or Dedicated Deployment?
Kubernetes
Placement Policies
Multiregion Database Deployments
Distributed Database Consensus
Survival Goals
Locality Rules
Summary
CHAPTER 1
Planning for a Distributed Transactional Application
After reading this report, we hope you’ll have a good handle on
the business and technology motivations for modern distributed
architectures and will be familiar with the architectural patterns
and software frameworks most widely deployed across the indus‐
try. In particular, you should be well equipped to understand the
role that technologies and patterns such as Docker, Kubernetes,
and distributed transactional databases play in modern distributed
architectures.
The upshot of these advances is a reduced total cost of ownership
for distributed applications, as well as reduced complexity in
application design, implementation, and maintenance.
Summary
Modern enterprises require highly available, globally scoped, and
scalable software solutions. These requirements are best met by
distributed transactional application architectures.
Today, there exists a well-proven cloud-based architectural pattern
for distributed transactional applications. This pattern involves the
use of public cloud platforms, microservices, Docker containers,
Kubernetes, and a distributed transactional database.
In Chapter 2, we’ll take a deep dive into the architecture of the
application layer, and in Chapter 3 we will examine the distributed
database layer.
Each region will contain multiple zones—typically at least three
zones per region. These zones usually represent a specific data cen‐
ter within the region that has no common point of failure with other
zones. So, the zones represent regional redundancy that allows a
region to continue to function even if an individual data center fails.
Figure 2-1 illustrates three regions in a public cloud, each of which
has three zones. In a typical public cloud, there are many regions—
for instance, Google currently supports 35 regions.
In the remainder of this report, we are going to assume that you are
deploying into a public cloud that implements such a region/zone
scheme. However, even if you are deploying on your own hardware,
you are likely to emulate this sort of regional separation and inter‐
regional redundancy.
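To make the arithmetic of zone redundancy concrete, here is a small
illustrative sketch in Python. The zone names follow the Google Cloud
convention of region plus a letter suffix, but all names are hypothetical.
The check it performs, that a majority of replicas must remain after any
single zone is lost, is the same reasoning that resurfaces with database
consensus in Chapter 3.

```python
# Illustrative sketch with hypothetical zone names: a replica placement
# survives a zone failure only if a majority of replicas remains after
# any single zone is lost.
def survives_zone_failure(replica_zones: list[str]) -> bool:
    """True if a majority of replicas remains after losing any one zone."""
    majority = len(replica_zones) // 2 + 1
    return all(
        sum(zone != failed for zone in replica_zones) >= majority
        for failed in set(replica_zones)
    )

# Three replicas spread across three zones tolerate any single zone failure.
print(survives_zone_failure(["us-east1-a", "us-east1-b", "us-east1-c"]))  # True

# Two of three replicas in the same zone do not.
print(survives_zone_failure(["us-east1-a", "us-east1-a", "us-east1-b"]))  # False
```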
Microservices
In a microservices architecture, units of application functionality are
developed and deployed independently. These microservices inter‐
act to deliver the overall application functionality. Microservices
development is based on some fundamental principles:
A microservice should have a single concern.
A microservice should aspire to satisfy just one item of func‐
tionality. The service should be an “expert” at that one function
and should not be concerned with any other functions.
The microservice should be independent of all other services.
The microservice should not be directly dependent on some
other service and should be testable and deployable independ‐
ently of all other services.
The microservice should be small enough to be developed by
a single team.
The “two-pizza rule” developed at Amazon stipulated that each
team should be small enough to be fed by two pizzas. The core
principle is that a microservice should be small enough that
interteam dependencies do not arise.
The microservice should be ephemeral.
A microservice should not hold state in such a way that pre‐
vents another microservice from taking over should it fail. In
practice, this simply means that each microservice interaction is
completely independent of past or future interactions.
The microservice implementation should be opaque.
The internal implementation of the microservice should be of
no concern to other services.
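To make these principles concrete, here is a minimal sketch of a
single-concern, stateless microservice. We use Flask for illustration
only; the service name, routes, and rate table are all hypothetical, and
a real service would load its data from a durable store rather than a
module-level dictionary.

```python
# A minimal single-concern microservice: currency conversion and nothing
# else. It holds no session state between requests, so any replica can
# serve any call, and its internals stay opaque behind a small HTTP API.
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical static rates for illustration; a real service would load
# these from a data store so that restarts and failovers lose nothing.
RATES = {"USD": 1.0, "EUR": 0.92, "AUD": 1.52}

@app.route("/convert")
def convert():
    amount = float(request.args.get("amount", "0"))
    src = request.args.get("from", "USD")
    dst = request.args.get("to", "EUR")
    converted = amount / RATES[src] * RATES[dst]
    return jsonify({"amount": converted, "currency": dst})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

Because each call is self-contained, the service satisfies the ephemerality
principle: if the instance serving a request fails, a retry against any
other replica yields the same result.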
Containers
Microservices are often deployed in containers. Containers package
all the software dependencies necessary to execute a microservice
independently. This includes a minimal operating system image,
together with any libraries and other dependencies necessary to
support the microservice’s code.
By far, the most common mechanism for creating and deploying
containers is Docker. Docker containers provide significant advan‐
tages for the deployment of microservices. By encapsulating all
dependencies, developers can be confident that their service will
run on any Docker platform. Docker containers are less resource-
intensive than full-blown virtualization, and Docker containers iso‐
late the internals of the microservice and thereby improve security.
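As a sketch of what this packaging looks like, here is a minimal,
hypothetical Dockerfile for the currency service above: a slim base image
supplies the operating system layer, and the remaining instructions add
only the dependencies and code the service itself needs.

```dockerfile
# Minimal OS layer plus a pinned Python runtime.
FROM python:3.12-slim

WORKDIR /app

# Install only the dependencies the service itself needs.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Add the service code and declare how to run it.
COPY service.py .
EXPOSE 8080
CMD ["python", "service.py"]
```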
Multiregion Kubernetes
Kubernetes clusters are generally confined to a single region, though
a cluster can span multiple availability zones. While it is technically
possible to deploy a “stretch” Kubernetes cluster with nodes across
multiple regions, the latency penalty that results is likely to be
intolerable.
Consequently, each region will typically have its own Kubernetes
cluster and will use a global load balancer to route requests to
the appropriate cluster. The solution looks somewhat similar to the
“old-school” pattern shown in Figure 2-2, except that the global load
balancer routes requests not to services exposed within a VM but
to services exposed in a Kubernetes cluster. Figure 2-4 illustrates the
configuration.
The major public clouds all offer a global load balancing service
that is usually adequate for a single-cloud solution. However, if you
want to load balance between cloud platforms, or between noncloud
premises, third-party global load balancing services are offered by
the major content delivery network vendors, such as Cloudflare and
Akamai.
Event Management
In a distributed application, there is often a need for a reliable
messaging or event management layer that allows microservices to
coordinate their work. Often, these take the form of work queues
that allow work requiring the interaction of multiple microservices
to progress reliably and asynchronously.
It’s possible to use the database as the one and only common com‐
munication layer between microservices, but this can increase the
load on what is often already a critical component in transactional
latency and is wasteful since these messages do not need to have
long-term persistence.
Consequently, many distributed applications employ a distributed
messaging service to support message requests between services.
Such a messaging service guarantees that messages will survive Pod
or node failures, without necessarily requiring long-term storage
once a work request has been fully processed.
The distributed messaging service most widely used in modern
applications is Apache Kafka, though native cloud messaging serv‐
ices such as Google Cloud Pub/Sub are also commonly used.
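As an illustration of such a work queue, the following sketch uses the
kafka-python client, one of several available Kafka clients. The topic
name, broker address, and message contents are hypothetical.

```python
# Hypothetical work queue between microservices, using Apache Kafka via
# the kafka-python client. One service enqueues work; another consumes it.
import json
from kafka import KafkaConsumer, KafkaProducer

BROKERS = ["kafka-0.example.internal:9092"]  # placeholder address

# Producer side: an order service records work to be done asynchronously.
producer = KafkaProducer(
    bootstrap_servers=BROKERS,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",  # wait until the write is replicated before acknowledging
)
producer.send("fulfillment-requests", {"order_id": 42, "sku": "A-100"})
producer.flush()

# Consumer side: a fulfillment service processes requests at its own pace.
consumer = KafkaConsumer(
    "fulfillment-requests",
    bootstrap_servers=BROKERS,
    group_id="fulfillment-workers",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for message in consumer:
    print("processing", message.value)
```

The acks="all" setting is what provides the guarantee described above:
the message is replicated across brokers before the producer considers it
sent, so it survives the failure of an individual Pod or node.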
In many distributed designs, transient messages between services do
not pass between regions. Regions operate independently of each
other; if a region fails, messages within that region may be lost or
at least unavailable until the region is recovered. In this scenario,
the messaging system runs encapsulated within each region, which
keeps latency low and performance high.
However, in a regional failure scenario, if another region is expected
to pick up incomplete work from the failed region, the messaging
solution must span regions. In these scenarios, latency is increased
since messages must be replicated across regions before processing.
In some cases, a compromise solution can be implemented in which
messages are replicated asynchronously across regions. In this case,
some messages might be lost if a region fails, but hopefully the bulk
of work in progress is transferred to the new region. Figure 2-5
illustrates the scenario in which each region has its own Kafka
messaging service and the two are kept in sync via replication.
The developer experience for a serverless platform is very
straightforward: the developer supplies only application code, and the
platform takes care of provisioning, scaling, routing, and availability.
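For instance, an HTTP-triggered function in the style of Google Cloud
Functions reduces to a single Python handler; the sketch below is
hypothetical, and on that platform the handler receives a Flask request
object.

```python
# Hypothetical HTTP-triggered serverless function (Google Cloud Functions
# style). There is no server, container, or cluster for the developer to
# manage; the platform invokes the handler and scales instances on demand.
def convert(request):
    amount = float(request.args.get("amount", "0"))
    # Business logic is written exactly as in any other service.
    return {"amount": amount * 0.92, "currency": "EUR"}
```

Deployment is then a single CLI invocation, after which the platform
scales instances up and down with demand, including down to zero.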
Summary
Modern cloud platforms and containerization solutions provide
cost-effective and low-risk mechanisms for deploying a globally
distributed application. Legacy application architectures can be
deployed to cloud-based virtual machines with global load balancers
providing fault tolerance and workload distribution. Docker con‐
tainers orchestrated by Kubernetes provide a more robust and effec‐
tive means of deploying microservice-based applications. Serverless
platforms provide an even higher level of abstraction and simplifica‐
tion, albeit with some limitations on flexibility.
These platforms work well with the stateless and transient compute-
oriented components of an application. However, almost all dis‐
tributed applications must maintain a consistent and persistent data
store that coordinates and records the activities of the microservices
that deliver the application’s functionality. In Chapter 3, we’ll dig
into the configuration of such a distributed transactional database.
CHAPTER 3
Distributing and Scaling the Storage Layer
Distributed Database Examples
In the following sections, we describe the capabilities
common to modern distributed transactional database
systems. Such systems include CockroachDB, Google
Spanner, YugabyteDB, Microsoft Cosmos DB, and
others.
However, not all distributed transactional databases
implement all the features outlined in this chapter, and
this report does not attempt to compare the feature
sets of various distributed transactional database sys‐
tems. In most cases, the terminology and capabilities
described are those of CockroachDB, with which the
authors are most familiar. Other databases may use
different terminology or have different capabilities.
Figure 3-1. Major public clouds have redundant network links between each location, reducing the likelihood of network partitions
• You are paying only for the resources you use, so if your appli‐
cation has peaks and troughs of activity, you will probably save
money.
• Resources applied to your workload will scale dynamically—as
the workload demands increase or decrease, CPU and memory
will be adjusted to suit. As a result, you may not need to
perform benchmarks or otherwise determine ahead of time how
much capacity your workload requires.
Kubernetes
We spoke at length about Kubernetes in Chapter 2. Kubernetes is
almost a no-brainer for a modern application, providing advantages
in deployment, portability, manageability, and scaling. It’s also an
attractive framework for running databases—most of the database
vendors use Kubernetes in their own cloud deployments.
However, provided you can arrange for your Kubernetes nodes to
be running in the same data center as your database nodes, we don’t
think it is mandatory to use Kubernetes as the database platform
for a Kubernetes-based application layer. The types of workloads
encountered by the database are very different from those in the
application layer, and it may be that the two workloads won’t coexist
all that well if colocated in the same Kubernetes cluster.
Default Kubernetes settings are rarely appropriate for database
deployments. Historically, Kubernetes has been used for applications
rather than databases. For applications, CPU and memory allocation
have been more influential than I/O management. Consequently,
many Kubernetes clusters—particularly those on cloud platforms—
are configured with economical “storage by the GB” disks. For
instance, when creating a Kubernetes cluster on Google Cloud Plat‐
form, the default disk type for the node pool is standard persistent
disk, whereas an SSD persistent disk is a much better option for a
database deployment.
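As a sketch of how you might override that default, the following uses
the official kubernetes Python client to define an SSD-backed storage
class on GKE for database volumes. The class name is hypothetical; the
provisioner and disk type shown are those of GKE's persistent-disk CSI
driver.

```python
# Hypothetical sketch: define an SSD-backed StorageClass for database
# volumes instead of the default standard persistent disk.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside the cluster

ssd_class = client.V1StorageClass(
    metadata=client.V1ObjectMeta(name="database-ssd"),
    provisioner="pd.csi.storage.gke.io",   # GKE persistent-disk CSI driver
    parameters={"type": "pd-ssd"},         # SSD rather than standard disk
    volume_binding_mode="WaitForFirstConsumer",
)
client.StorageV1Api().create_storage_class(ssd_class)
```

Database Pods then simply request this class in their persistent volume
claims.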
Placement Policies
We argued earlier that the advantages of fully managed or serverless
cloud-based deployments for your database layer are compelling.
A fully managed deployment reduces the human costs involved
in managing the distributed database and usually reduces the
operational risks involved in a complex, multiregion deployment.
However, there are a few considerations that might lead you to a
self-managed cloud deployment.
In a fully managed deployment, you don’t have access to all the
fine-grained configuration options that will be available in a self-
managed deployment. For instance, you won’t be able to modify the
Linux kernel configuration or all the database tuning parameters,
and access to logs and other diagnostic information will be reduced.
Furthermore, you won’t have completely fine-grained control over
the placement of your database nodes. Most fully managed options
allow you to determine the region in which each node will exist, but
not normally the default zone. It might be difficult or impossible to
ensure that an application node and database node are in the same
zone (in effect, within the same data center).
In some cases, you might want to control the placement of nodes
even within the data center. For instance, most clouds allow for
placement policies that can encourage two nodes to be located phys‐
ically close to one another—in the same rack, for instance. This
optimization is not available when using a fully managed service.
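As one hedged example of what such control looks like in a self-managed
deployment, the sketch below uses the google-cloud-compute client to
create a "group placement" resource policy that collocates VMs on nearby
hardware. The project, region, and policy names are placeholders, and the
exact client calls should be treated as an assumption to verify against
the library version you are using.

```python
# Hypothetical sketch: a group-placement resource policy that asks the
# cloud to place instances (for example, a database node and the
# application nodes that call it) physically close together.
from google.cloud import compute_v1

policy = compute_v1.ResourcePolicy(
    name="db-app-collocated",
    group_placement_policy=compute_v1.ResourcePolicyGroupPlacementPolicy(
        collocation="COLLOCATED",  # place instances close together
        vm_count=2,
    ),
)

client = compute_v1.ResourcePoliciesClient()
client.insert(
    project="my-project",
    region="us-east1",
    resource_policy_resource=policy,
)
```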
Multiregion Database Deployments
We discussed the concepts of zones and regions in Chapter 2. The
concept of regions and zones is common to the major cloud plat‐
forms, allowing the provisioning of services with no single point
of failure, either globally or locally. In almost all cases, we use the
same region and zone definitions for the database as for the applica‐
tion layer. Any mismatch between application regions and database
regions will increase latency and jeopardize availability should a
region fail.
As with the application layer, we define regions and zones such that
low-latency requests can be satisfied in each region and application
service can continue even if computing resources in one of the zones
fail.
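In CockroachDB's SQL dialect (the terminology this report follows),
matching the database's regions to the application's regions is a
declarative operation. The database and region names below are
hypothetical.

```sql
-- Declare the same regions for the database that the application layer uses.
ALTER DATABASE app SET PRIMARY REGION "us-east1";
ALTER DATABASE app ADD REGION "europe-west1";
ALTER DATABASE app ADD REGION "asia-south1";

-- Ask the database to keep serving even if an entire region is lost.
ALTER DATABASE app SURVIVE REGION FAILURE;
```

The survival goal is what drives the write-latency trade-off noted in the
Summary: surviving a regional failure requires replicas, and therefore
consensus participants, in more than one region.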
Locality Rules
Regardless of the survival goal, we may be able to fine-tune the
distribution of data within a table to optimize access from specific
regions. This is primarily done to optimize low-latency requests
from various regions and can also be used to comply with legal
requirements for data domiciling.
Tables in a distributed database may have locality rules that deter‐
mine how their data will be distributed across zones:
Global table
This table will be optimized for low-latency reads from any
region.
Regional table
This table will be optimized for low-latency reads and writes
from a single region.
Regional-by-row table
This table will have specific rows optimized for low-latency
reads and writes for a region. Different rows in the table can be
assigned to specific regions.
With a global table, replicas for all rows within the table will be
duplicated in each region. This ensures that read time is optimized
but creates the highest overhead for writes because all regions must
coordinate on a write request. Global tables are suitable for relatively
static lookup tables that are relevant across all regions. A product
table might be a relevant example—product information is often
shared across regions and not subject to frequent updates; therefore,
performance is optimized if each region has a complete copy of
the product table. The downside is that writes to the product table
will require participation from all regions and therefore be relatively
slow.
With a regional table, as much replica information as possible
(subject to failure goals) for all ranges in the table is in a single
region. This makes sense either if that region is more important
to the business than other regions or if the data is particularly
relevant to that region. For instance, if an internationalized application
kept the error messages for each language in separate tables, it might
make sense to locate each of those tables in the region where that
language predominates.
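Expressed in CockroachDB's SQL, for instance, the three locality rules
map to one-line statements. The table and region names here are
hypothetical.

```sql
-- A read-mostly lookup table, replicated for fast reads in every region.
ALTER TABLE products SET LOCALITY GLOBAL;

-- A table that matters mostly to one region.
ALTER TABLE error_messages_fr SET LOCALITY REGIONAL BY TABLE IN "europe-west1";

-- A table whose rows belong to different regions (for example,
-- per-customer rows pinned to each customer's home region).
ALTER TABLE customers SET LOCALITY REGIONAL BY ROW;
```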
Summary
A distributed transactional application will need a database platform
that can maintain consistency across widely separated geographi‐
cal regions while simultaneously supporting low-latency operations
from each of those regions.
Modern distributed transactional databases can span multiple geog‐
raphies and potentially survive failures of entire regions. Multi‐
region configurations also allow you to fine-tune the distribution
of data such that data resides where it is most likely to be used, thus
reducing latency for both reads and writes.
There are some trade-offs between latency and availability. Most
critically, where regional survival is required, some increase in write
latency will occur because multiple regions will have to participate
in transaction consensus.
For most organizations, only public clouds will offer the global
scope and redundancy necessary for a successful distributed data‐
base deployment. Database vendors offer fully managed cloud plat‐
forms that reduce administrative overhead and operational risk.
Some vendors also offer serverless options that can further reduce
complexity and optimize billing. However, in some cases, a do-it-
yourself deployment on a public cloud can deliver the ultimate per‐
formance optimizations by fine-tuning the placement of application
and database nodes.
The requirements of modern applications often demand a dis‐
tributed, transactional solution. In the past, such solutions were
only available to the largest and most sophisticated organizations.
Today, the existence of public cloud platforms and container tech‐
nologies such as Docker and Kubernetes, together with distributed
transactional database platforms, allows virtually any team to imple‐
ment a distributed transactional application.
We hope this report is a useful starting point for those embarking on
the journey to a distributed transactional architecture.