CloudNative II
Cloud Native –
Communication & Data
Outline
• Introduction
• Pillars of Cloud Native
• Cloud Native Applications
• Cloud-native Communication Patterns
• Cloud-native Data Patterns
• Cloud-native Resiliency
• Monitoring & Health
• DevOps
Query
• Many times, one microservice might need to query another, requiring
an immediate response to complete an operation.
• A shopping basket microservice may need product information and a
price to add an item to its basket.
• There are many approaches for implementing query operations.
Request-Response Messaging
• One option for implementing this scenario is for the calling back-end microservice to
make direct HTTP requests to the microservices it needs to query.
• While direct HTTP calls between microservices are relatively simple to implement, care
should be taken to minimize this practice.
• To start, these calls are always synchronous and will block the operation until a result is
returned or the request times out.
• What were once self-contained, independent services, able to evolve independently and
deploy frequently, now become coupled to each other.
• As coupling among microservices increases, their architectural benefits diminish.
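To make the direct-call approach concrete, here is a minimal C# sketch, assuming a hypothetical catalog endpoint and Product type (both invented for illustration):

```csharp
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;

// Hypothetical DTO for illustration; not part of the source material.
public record Product(string Id, string Name, decimal Price);

public class CatalogClient
{
    private readonly HttpClient _http; // BaseAddress assumed to point at the catalog service

    public CatalogClient(HttpClient http) => _http = http;

    // The basket service awaits here, blocking the operation until the
    // catalog service responds or the request times out.
    public async Task<Product?> GetProductAsync(string productId) =>
        await _http.GetFromJsonAsync<Product>($"/api/products/{productId}");
}
```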
Request/Reply Pattern
• Another approach for decoupling synchronous HTTP messages is a Request-Reply Pattern, which uses
queuing communication.
• Communication using a queue is always a one-way channel, with a producer sending the message and
consumer receiving it.
• With this pattern, both a request queue and response queue are implemented, shown in figure below.
• Here, the message producer creates a query-based message that contains a unique correlation ID and
places it into a request queue.
• The consuming service dequeues the message, processes it, and places the response into the
response queue with the same correlation ID.
• The producer service dequeues the message, matches it with the correlation ID and continues
processing.
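The correlation mechanics might look like the following minimal sketch, which uses in-memory queues as stand-ins for broker-backed request and response queues (all types and values are illustrative):

```csharp
using System;
using System.Collections.Concurrent;

public record QueryMessage(Guid CorrelationId, string ProductId);
public record ReplyMessage(Guid CorrelationId, decimal Price);

public class RequestReplyDemo
{
    // In-memory stand-ins for broker-backed request and response queues.
    private readonly BlockingCollection<QueryMessage> _requestQueue = new();
    private readonly BlockingCollection<ReplyMessage> _responseQueue = new();

    public decimal QueryPrice(string productId)
    {
        // Producer: tag the query with a unique correlation ID.
        var correlationId = Guid.NewGuid();
        _requestQueue.Add(new QueryMessage(correlationId, productId));

        // Consumer (normally a separate service): dequeue, process, and
        // reply on the response queue with the same correlation ID.
        var request = _requestQueue.Take();
        _responseQueue.Add(new ReplyMessage(request.CorrelationId, 9.99m));

        // Producer: match the reply to the outstanding request.
        var reply = _responseQueue.Take();
        return reply.CorrelationId == correlationId
            ? reply.Price
            : throw new InvalidOperationException("Correlation ID mismatch");
    }
}
```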
Commands
• Another type of communication interaction is a command.
• A microservice may need another microservice to perform an action.
• The Ordering microservice may need the Shipping microservice to create a shipment for an
approved order.
• In figure below, one microservice, called a Producer, sends a message to another microservice, the
Consumer, commanding it to do something.
Commands
Most often, the Producer doesn't require a response and can fire-and-forget the
message.
If a reply is needed, the Consumer sends a separate message back to Producer on
another channel.
A command message is best sent asynchronously with a message queue, supported by
a lightweight message broker.
In the previous diagram, note how a queue separates and decouples both services.
A message queue is an intermediary construct through which a producer and consumer
pass a message.
• Queues implement an asynchronous, point-to-point messaging pattern.
• The Producer knows where a command needs to be sent and routes appropriately.
• The queue guarantees that a message is processed by exactly one of the consumer instances
that are reading from the channel.
• In this scenario, either the producer or consumer service can scale out without affecting the
other.
• Moreover, technologies can be disparate on each side, meaning that we might have a Java
microservice calling a Golang microservice.
Message queues are backing services.
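A minimal sketch of queue-based, fire-and-forget command handling, using an in-process channel as a stand-in for a broker-backed queue (the command type and names are invented for illustration):

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Illustrative command message; the type and names are invented.
public record CreateShipmentCommand(Guid OrderId, string Address);

public class CommandDemo
{
    // An unbounded channel standing in for a broker-backed queue.
    private readonly Channel<CreateShipmentCommand> _queue =
        Channel.CreateUnbounded<CreateShipmentCommand>();

    // Producer (Ordering): fire-and-forget, no reply is awaited.
    public async Task SubmitOrderAsync(Guid orderId, string address) =>
        await _queue.Writer.WriteAsync(new CreateShipmentCommand(orderId, address));

    // Consumer (Shipping): each message is processed by exactly one
    // consumer instance reading from the channel.
    public async Task RunShippingWorkerAsync(CancellationToken ct)
    {
        await foreach (var cmd in _queue.Reader.ReadAllAsync(ct))
            Console.WriteLine($"Creating shipment for order {cmd.OrderId}");
    }
}
```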
Events
Message queuing is an effective way to implement communication where a
producer can asynchronously send a consumer a message.
However, what happens when many different consumers are interested in
the same message?
• A dedicated message queue for each consumer wouldn't scale well and would
become difficult to manage.
To address this scenario, we move to the third type of message interaction,
the event.
• One microservice announces that an action has occurred.
• Other microservices, if interested, react to the action, or event.
• This is also known as the event-driven architectural style.
Eventing is a two-step process.
• For a given state change, a microservice publishes an event to a message
broker, making it available to any other interested microservice.
• The interested microservice is notified by subscribing to the event in the
message broker.
• The Publish/Subscribe pattern is used to implement event-based
communication.
Events
Event-Driven messaging - a shopping basket microservice publishing an event with two other
microservices subscribing to it.
• Note the event bus component that sits in the middle of the communication
channel.
• It's a custom class that encapsulates the message broker and decouples it
from the underlying application.
• The ordering and inventory microservices independently act on the event
with no knowledge of each other, nor of the shopping basket microservice.
• When the registered event is published to the event bus, they act upon it.
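A minimal sketch of such an event bus class, here as a purely in-process implementation; a production version would delegate publish and subscribe to the underlying message broker (the event type is invented for illustration):

```csharp
using System;
using System.Collections.Generic;

// Illustrative event; the type is invented for this sketch.
public record ProductAddedToBasketEvent(string ProductId, int Quantity);

// Minimal in-process event bus; a production version would delegate
// Publish/Subscribe to the underlying message broker.
public class EventBus
{
    private readonly Dictionary<Type, List<Action<object>>> _handlers = new();

    // Interested microservices register a handler per event type.
    public void Subscribe<TEvent>(Action<TEvent> handler)
    {
        if (!_handlers.TryGetValue(typeof(TEvent), out var list))
            _handlers[typeof(TEvent)] = list = new List<Action<object>>();
        list.Add(e => handler((TEvent)e));
    }

    // The publisher has no knowledge of who, if anyone, is subscribed.
    public void Publish<TEvent>(TEvent @event)
    {
        if (_handlers.TryGetValue(typeof(TEvent), out var list))
            foreach (var handle in list) handle(@event!);
    }
}

// Usage: ordering and inventory subscribe independently, then react
// when the basket service publishes the event.
// bus.Subscribe<ProductAddedToBasketEvent>(e => Console.WriteLine(e.ProductId));
// bus.Publish(new ProductAddedToBasketEvent("sku-42", 1));
```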
Topic
gRPC
REST-based communication is widely implemented.
• REST is a flexible architectural style that defines CRUD-based operations against entity
resources.
• Clients interact with resources across HTTP with a request/response communication model.
• A newer communication technology, gRPC, has gained tremendous momentum across
the cloud-native community.
• gRPC is a modern, high-performance framework that evolves the age-old remote
procedure call (RPC) protocol.
• At the application level, gRPC streamlines messaging between clients and back-end
services.
• Originating from Google, gRPC is open source and part of the Cloud Native Computing
Foundation (CNCF) ecosystem of cloud-native offerings.
gRPC
• A typical gRPC client app will expose a local, in-process function that implements a
business operation.
• Under the covers, that local function invokes another function on a remote machine.
• What appears to be a local call essentially becomes a transparent out-of-process call
to a remote service.
• The RPC plumbing abstracts the point-to-point networking communication,
serialization, and execution between computers.
• In cloud-native applications, developers often work across programming languages,
frameworks, and technologies.
• This interoperability complicates message contracts and the plumbing required for
cross-platform communication.
• gRPC provides a "uniform horizontal layer" that abstracts these concerns.
• Developers code in their native platform focused on business functionality, while gRPC
handles communication plumbing.
• gRPC offers comprehensive support across most popular development stacks,
including Java, JavaScript, C#, Go, Swift, and Node.js.
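A minimal client-side sketch using the grpc-dotnet stack, assuming C# types generated from the Greeter contract (greet.proto) that ships with the standard grpc-dotnet template:

```csharp
using System;
using Grpc.Net.Client;

// Assumes client types (Greeter.GreeterClient, HelloRequest) generated
// by Grpc.Tools from the greet.proto contract in the grpc-dotnet template.
using var channel = GrpcChannel.ForAddress("https://localhost:5001");
var client = new Greeter.GreeterClient(channel);

// Looks like a local method call; gRPC serializes the request with
// Protocol Buffers and sends it to the remote service over HTTP/2.
var reply = await client.SayHelloAsync(new HelloRequest { Name = "Basket" });
Console.WriteLine(reply.Message);
```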
25
gRPC Benefits
gRPC uses HTTP/2 for its transport protocol.
While compatible with HTTP 1.1, HTTP/2 features many advanced
capabilities:
• A binary framing protocol for data transport - unlike HTTP 1.1, which is text based.
• Multiplexing support for sending multiple parallel requests over the same connection
- HTTP 1.1 limits processing to one request/response message at a time.
• Bidirectional full-duplex communication for sending both client requests and server
responses simultaneously.
• Built-in streaming enabling requests and responses to asynchronously stream large
data sets.
• Header compression that reduces network usage.
gRPC is lightweight and offers high performance:
• It can be up to 8x faster than JSON serialization with messages 60-80% smaller.
gRPC Usage
• Synchronous backend microservice-to-microservice communication where an
immediate response is required to continue processing.
• Polyglot environments that need to support mixed programming platforms.
• Low latency and high throughput communication where performance is
critical.
• Point-to-point real-time communication - gRPC can push messages in real
time without polling and has excellent support for bi-directional streaming.
• Network constrained environments – binary gRPC messages are always
smaller than an equivalent text-based JSON message.
Service Mesh
• Note in the figure how messages are intercepted by a proxy that runs alongside
each microservice.
• Each proxy can be configured with traffic rules specific to the microservice.
• It understands messages and can route them across your services and the
outside world.
Service Mesh
Along with managing service-to-service communication, the Service Mesh
provides support for service discovery and load balancing.
Once configured, a service mesh is highly functional.
• The mesh retrieves a corresponding pool of instances from a service discovery
endpoint.
• It sends a request to a specific service instance, recording the latency and response
type of the result.
• It chooses the instance most likely to return a fast response based on different
factors, including the observed latency for recent requests.
A service mesh manages traffic, communication, and networking concerns at
the application level.
• It understands messages and requests.
• A service mesh typically integrates with a container orchestrator.
• Kubernetes supports an extensible architecture in which a service mesh can be
added.
Istio
While a few service mesh options currently exist, Istio is the most popular at
the time of this writing.
Istio is a joint venture from IBM, Google, and Lyft.
It's an open-source offering that can be integrated into a new or existing
distributed application.
The technology provides a consistent and complete solution to secure,
connect, and monitor microservices.
Its features include:
• Secure service-to-service communication in a cluster with strong identity-based
authentication and authorization.
• Automatic load balancing for HTTP, gRPC, WebSocket, and TCP traffic.
• Fine-grained control of traffic behavior with rich routing rules, retries, failovers, and
fault injection.
• A pluggable policy layer and configuration API supporting access controls, rate
limits, and quotas.
• Automatic metrics, logs, and traces for all traffic within a cluster, including cluster
ingress and egress.
Envoy
A key component of an Istio implementation is a proxy service called the
Envoy proxy.
It runs alongside each service and provides a platform-agnostic foundation for
the following features:
• Dynamic service discovery.
• Load balancing.
• TLS termination.
• HTTP and gRPC proxies.
• Circuit breaker resiliency.
• Health checks.
• Rolling updates with canary deployments.
Envoy is deployed as a sidecar to each microservice in the cluster.
Outline
• Introduction
• Pillars of Cloud Native
• Cloud Native Applications
• Cloud-native Communication Patterns
• Cloud-native Data Patterns
• Cloud-native Resiliency
• Monitoring & Health
• DevOps
ACID
ACID is an acronym that refers to the four key properties that define a
transaction: Atomicity, Consistency, Isolation, and Durability.
ACID transactions guarantee that each read, write, or modification of a table
has the following properties:
• Atomicity
− Each statement in a transaction (to read, write, update or delete data) is treated as a single unit.
− Either the entire statement is executed, or none of it is executed.
− This property prevents data loss and corruption from occurring if, for example, your streaming data source
fails mid-stream.
• Consistency
− Ensures that transactions only make changes to tables in predefined, predictable ways.
− Transactional consistency ensures that corruption or errors in your data do not create unintended
consequences for the integrity of your table.
• Isolation
− When multiple users are reading and writing from the same table all at once, isolation of their transactions
ensures that the concurrent transactions don't interfere with or affect one another.
− Each request can occur as though they were occurring one by one, even though they're actually occurring
simultaneously.
• Durability
− Ensures that changes to your data made by successfully executed transactions will be saved, even in the
event of system failure.
Cross-service Queries
• While microservices are independent and focus on specific functional capabilities, like
inventory, shipping, or ordering, they frequently require integration with other microservices.
• Often the integration involves one microservice querying another for data.
Cross-service Queries
One option is a direct HTTP call from the shopping basket to the catalog and
pricing microservices.
• However synchronous HTTP calls couple microservices together, reducing their
autonomy and diminishing their architectural benefits.
We could also implement a request-reply pattern with separate inbound and outbound
queues for each service.
• However, this pattern is complicated and requires plumbing to correlate request and
response messages.
• While it does decouple the backend microservice calls, the calling service must still
synchronously wait for the call to complete.
• Network congestion, transient faults, or an overloaded microservice can result
in long-running and even failed operations.
Instead, a widely accepted pattern for removing cross-service dependencies is the
Materialized View Pattern, illustrated in the sketch below.
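A minimal sketch of the idea, with invented types and event names: the shopping basket service maintains its own denormalized copy of catalog data, kept eventually consistent through published events, so queries never leave the service.

```csharp
using System.Collections.Concurrent;

// Illustrative event published by the catalog service.
public record ProductPriceChangedEvent(string ProductId, string Name, decimal Price);

// Denormalized row owned by the basket service.
public record ProductView(string ProductId, string Name, decimal Price);

public class BasketProductView
{
    private readonly ConcurrentDictionary<string, ProductView> _products = new();

    // The event handler keeps the local view eventually consistent
    // with the catalog service's data.
    public void Handle(ProductPriceChangedEvent e) =>
        _products[e.ProductId] = new ProductView(e.ProductId, e.Name, e.Price);

    // Queries are served locally; no cross-service call is required.
    public ProductView? GetProduct(string productId) =>
        _products.TryGetValue(productId, out var view) ? view : null;
}
```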
Distributed Transactions
• While querying data across microservices is difficult, implementing a transaction
across several microservices is even more complex.
• The inherent challenge of maintaining data consistency across independent data
sources in different microservices can't be overstated.
• The lack of distributed transactions in cloud-native applications means that you
must manage distributed transactions programmatically.
• You move from a world of immediate consistency to that of eventual consistency.
CAP Theorem
As a way to understand the differences between these types of
databases, consider the CAP theorem, a set of principles applied
to distributed systems that store state.
The theorem states that distributed data systems will offer a
trade-off between consistency, availability, and partition tolerance,
and that any database can guarantee only two of the three properties:
• Consistency:
− Every node in the cluster responds with the most recent data, even if the system
must block the request until all replicas update.
− If you query a "consistent system" for an item that is currently updating, you'll
wait for that response until all replicas successfully update.
− However, you'll receive the most current data.
• Availability:
− Every node returns an immediate response, even if that response isn't the most
recent data.
− If you query an "available system" for an item that is updating, you'll get the
best possible answer the service can provide at that moment.
• Partition Tolerance.
− Guarantees the system continues to operate even if a replicated data node fails
or loses connectivity with other replicated data nodes.
The CAP theorem explains the tradeoffs associated with managing
consistency and availability during a network partition.
However, tradeoffs with respect to consistency and performance
also exist in the absence of a network partition.
Referential integrity states that all references in data are valid, and that if one attribute's value references
another attribute's value, then the referenced value must exist.
Database as a Service
• Cloud-native applications favor data services exposed as a Database
as a Service (DBaaS).
• Fully managed by a cloud vendor, these services provide built-in
security, scalability, and monitoring.
• Instead of owning the service, users simply consume it as a backing
service.
• The provider operates the resource at scale and bears the
responsibility for performance and maintenance.
• They can be configured across cloud availability zones and regions to
achieve high availability.
• They all support just-in-time capacity and a pay-as-you-go model.
NewSQL Databases
• NewSQL is an emerging database technology that combines the distributed scalability
of NoSQL with the ACID guarantees of a relational database.
• NewSQL databases are important for business systems that must process high-volumes
of data, across distributed environments, with full transactional support and ACID
compliance.
• While a NoSQL database can provide massive scalability, it does not guarantee data
consistency.
• Intermittent problems from inconsistent data can place a burden on the development
team.
• Developers must construct safeguards into their microservice code to manage
problems caused by inconsistent data.
• A key design goal for NewSQL databases is to work natively in Kubernetes, taking
advantage of the platform's resiliency and scalability.
• NewSQL databases are designed to thrive in ephemeral cloud environments where
underlying virtual machines can be restarted or rescheduled at a moment's notice.
• The databases are designed to survive node failures without data loss nor downtime.
NewSQL Databases
The Cloud Native Computing Foundation (CNCF) features several NewSQL
database projects.
CQRS
• CQRS (Command and Query Responsibility Segregation) is an architectural pattern that can help
maximize performance, scalability, and security.
• The pattern separates operations that read data from those operations that write data.
• For normal scenarios, the same entity model and data repository object are used for both
read and write operations.
• However, a high-volume data scenario can benefit from separate models and data tables
for reads and writes.
• To improve performance, the read operation could query against a highly denormalized
representation of the data to avoid expensive repetitive table joins and table locks.
• The write operation, known as a command, would update against a fully normalized
representation of the data that would guarantee consistency.
• A mechanism needs to be implemented to keep both representations in sync.
• Typically, whenever the write table is modified, it publishes an event that replicates the
modification to the read table.
• Implementing CQRS can improve application performance for cloud-native services.
• However, it does result in a more complex design.
• This principle should be carefully and strategically applied to those sections of a cloud-
native application that will benefit from it.
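A minimal sketch of the separation, with invented types: commands flow to a write repository backed by normalized tables, while queries read a denormalized view kept in sync by events.

```csharp
using System;
using System.Threading.Tasks;

// Invented types for this sketch.
public record PlaceOrderCommand(Guid OrderId, string ProductId, int Quantity);
public record OrderSummary(Guid OrderId, string ProductName, decimal Total);

public interface IOrderWriteRepository
{
    Task SaveAsync(PlaceOrderCommand command);          // fully normalized tables
}

public interface IOrderReadRepository
{
    Task<OrderSummary?> GetSummaryAsync(Guid orderId);  // denormalized view
}

public class OrderService
{
    private readonly IOrderWriteRepository _writes;
    private readonly IOrderReadRepository _reads;

    public OrderService(IOrderWriteRepository writes, IOrderReadRepository reads) =>
        (_writes, _reads) = (writes, reads);

    // Command side: writes against the normalized model; a change event
    // (not shown) replicates the modification to the read table.
    public Task PlaceOrderAsync(PlaceOrderCommand cmd) => _writes.SaveAsync(cmd);

    // Query side: a cheap read against the denormalized representation,
    // avoiding repetitive joins and table locks.
    public Task<OrderSummary?> GetOrderAsync(Guid id) => _reads.GetSummaryAsync(id);
}
```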
CQRS
Event Sourcing
A system typically stores the current state of a data entity.
• If a user changes their phone number, for example, the customer record is updated with
the new number.
• We always know the current state of a data entity, but each update overwrites the previous
state.
• In most cases, this model works fine.
In high-volume systems, however, overhead from transactional locking and
frequent update operations can impact database performance and responsiveness,
and limit scalability.
Event Sourcing takes a different approach to capturing data.
Each operation that affects data is persisted to an event store.
Instead of updating the state of a data record, we append each change to a sequential list
of past events.
The Event Store becomes the system of record for the data.
It's used to propagate various materialized views within the bounded context of a
microservice.
Event Sourcing
• In the figure, note how each entry (in blue) for a user's shopping cart is appended to an
underlying event store.
• In the adjoining materialized view, the system projects the current state by replaying all the
events associated with each shopping cart.
• This view, or read model, is then exposed back to the UI.
• Events can also be integrated with external systems and applications or queried to determine
the current state of an entity.
• With this approach, history is maintained.
• You know not only the current state of an entity, but also how you reached this state.
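A minimal sketch of the append-and-replay mechanics, with invented event types:

```csharp
using System.Collections.Generic;
using System.Linq;

// Invented event types for this sketch.
public abstract record CartEvent(string CartId);
public record ItemAdded(string CartId, string ProductId, int Quantity) : CartEvent(CartId);
public record ItemRemoved(string CartId, string ProductId) : CartEvent(CartId);

public class CartEventStore
{
    private readonly List<CartEvent> _events = new(); // append-only; no updates or deletes

    public void Append(CartEvent e) => _events.Add(e);

    // Materialized view: project the cart's current contents by
    // replaying every event recorded for it, in order.
    public Dictionary<string, int> ProjectCart(string cartId)
    {
        var cart = new Dictionary<string, int>();
        foreach (var e in _events.Where(ev => ev.CartId == cartId))
        {
            switch (e)
            {
                case ItemAdded a:
                    cart[a.ProductId] = cart.GetValueOrDefault(a.ProductId) + a.Quantity;
                    break;
                case ItemRemoved r:
                    cart.Remove(r.ProductId);
                    break;
            }
        }
        return cart;
    }
}
```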
Event Sourcing
Mechanically speaking, event sourcing simplifies the write model.
• There are no updates or deletes.
• Appending each data entry as an immutable event minimizes contention, locking, and
concurrency conflicts associated with relational databases.
Building read models with the materialized view pattern enables you to decouple
the view from the write model and choose the best data store to optimize the
needs of your application UI.
While event sourcing can provide increased performance and scalability, it comes
at the expense of complexity and a learning curve.
Caching Architecture
Cloud native applications typically implement a distributed caching architecture.
The cache is hosted as a cloud-based backing service, separate from the microservices.
In the figure, note how the cache is independent of and shared by the microservices.
• The cache is invoked by the API Gateway.
• The gateway serves as a front end for all incoming requests.
• The distributed cache increases system responsiveness by returning cached data whenever possible.
• Additionally, separating the cache from the services allows the cache to scale up or out independently to meet increased
traffic demands.
The figure presents a common caching pattern known as the cache-aside pattern.
• For an incoming request, you first query the cache (step #1) for a response.
• If found, the data is returned immediately.
• If the data doesn't exist in the cache (known as a cache miss), it's retrieved from a local database in a downstream service
(step #2).
• It's then written to the cache for future requests (step #3), and returned to the caller.
• Care must be taken to periodically evict cached data so that the system remains timely and consistent.
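A minimal generic sketch of the cache-aside steps, using an in-memory dictionary as a stand-in for a distributed cache such as Redis (all names are illustrative):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public class CacheAside<TKey, TValue> where TKey : notnull
{
    // Stand-in for a distributed cache such as Redis.
    private readonly ConcurrentDictionary<TKey, TValue> _cache = new();
    private readonly Func<TKey, Task<TValue>> _loadFromDatabase;

    public CacheAside(Func<TKey, Task<TValue>> loadFromDatabase) =>
        _loadFromDatabase = loadFromDatabase;

    public async Task<TValue> GetAsync(TKey key)
    {
        // Step 1: query the cache; on a hit, return immediately.
        if (_cache.TryGetValue(key, out var cached))
            return cached;

        // Step 2: cache miss; read from the downstream service's database.
        var value = await _loadFromDatabase(key);

        // Step 3: write to the cache for future requests. A real entry
        // would carry an expiration so data is eventually evicted.
        _cache[key] = value;
        return value;
    }
}
```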
Caching Architecture
• As a shared cache grows, it might prove beneficial to partition its data across multiple nodes.
• Doing so can help minimize contention and improve scalability.
• Many caching services support the ability to dynamically add and remove nodes and
rebalance data across partitions.
• This approach typically involves clustering.
• Clustering exposes a collection of federated nodes as a seamless, single cache.
• Internally, however, the data is dispersed across the nodes following a predefined distribution
strategy that balances the load evenly.
Readings
• Architecting Cloud Native .NET Applications for Azure.
https://dotnet.microsoft.com/en-us/download/e-book/cloud-native-azure/pdf