Lecture Notes
Cloud Applications
Jeffrey Richter: Microsoft Azure Software Architect, Wintellect Co-Founder, & Author
Architecting Distributed Cloud Apps
http://aka.ms/RichterCloudApps
[email protected]
www.linkedin.com/in/JeffRichter
@JeffRichter
Course purpose
Properly architecting distributed cloud apps requires a new mindset toward software development and introduces many new terms and patterns.
The purpose of this course is to delve into many of these terms, patterns & engineering trade-offs while remaining technology-agnostic.
Topics include: orchestrators, datacenters, containers, networking, messaging, versioning, configuration, storage services, and disaster recovery.
Why cloud apps?
Feature        Past                                   Present
Clients        Enterprise/Intranet                    Public/Internet
Demand         Stable (small)                         Dynamic (small → massive)
Datacenter     Single tenant                          Multi-tenant
Operations     People (expensive)                     Automation (cheap)
Scale          Up via few reliable (expensive) PCs    Out via lots of (cheap) commodity PCs
Failure        Unlikely but possible                  Very likely
Machine loss   Catastrophic                           Normal (no big deal)
We must do things differently when building cloud apps.
Website example
Diagram: a Load Balancer fronts Web Site instances #1-#3; the Web Site calls an Inventory Service (instances #1-#2) with its own data store and an Orders Service (instances #1-#4) with its own data store.
Each service solves a domain-specific problem & has exclusive access to its own data store.
4 reasons to split a monolith into microservices
1. Scale independently (balance cost with speed): run many Photo Share Service instances while running fewer Thumbnail Service instances, or vice versa.
2. Different technology stacks: e.g., Photo Share Service on .NET, Thumbnail Service on node.js.
3. 2+ clients (clients adopt new features at will): Photo Share Service (V1) & Video Share Service (V1) can each call the Thumbnail Service's V1 or V2 API; backward compatibility must be maintained.
4. Conflicting dependencies: Photo Share Service can use SharedLib-v1 while Thumbnail Service uses SharedLib-v7.
Microservice architecture benefit myths
Myth: Microservices offer small, easy-to-understand/manage code bases
  A monolith can use OOP & libraries (requires developer discipline)
  Library changes cause build failures (not runtime failures)
Myth: A failing service doesn't impact other services
  Many services require dependencies to be fully functioning
  Hard to write/test code that gracefully recovers when a dependency fails
Myth: We run multiple service instances, so there is no such thing as "failure"
  A monolith is up/down completely; no recovery code
  An orchestrator restarts failed instances, keeping them up
Composing SLAs for dependent services
Diagram: a request depends on a chain of services (Service-1 → Service-2, Service-3, Service-4), some fronted by a load balancer, each with its own SLA; the caller's effective SLA is the product of its dependencies' SLAs. What about the network's SLA?
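A quick worked example (illustrative numbers, not from the slides) showing why the composite SLA of a chain of dependencies is lower than any single SLA — a minimal C# sketch:

    // Hypothetical SLAs for four dependent services in a call chain (illustrative numbers only)
    double[] slas = { 0.9995, 0.9995, 0.999, 0.9999 };

    double composite = 1.0;
    foreach (double sla in slas)
        composite *= sla;                      // chain availability = product of each dependency's availability

    Console.WriteLine($"Composite SLA: {composite:P3}");   // ≈ 99.790% — lower than any individual SLA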
http://12factor.net
12-factor services (1-5)
1. Single root repo; don't share code with another service
2. Deploy dependent libs with the service
3. No config in code; read it from environment variables
4. Handle unresponsive service dependencies robustly
5. Strictly separate build, release, & run steps
   Build: builds a version of the code repo & gathers dependencies
   Release: combines a build with config → ReleaseId (immutable)
   Run: runs the service in the execution environment
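A minimal sketch of factor 3 — reading config from the environment rather than from code; the variable name ORDERS_DB_CONNECTION is hypothetical:

    // Factor 3: config comes from environment variables set by the release, not from source code.
    // "ORDERS_DB_CONNECTION" is a hypothetical variable name, not from the course.
    string? connectionString = Environment.GetEnvironmentVariable("ORDERS_DB_CONNECTION");
    if (string.IsNullOrEmpty(connectionString))
        throw new InvalidOperationException("ORDERS_DB_CONNECTION is not set");  // fail fast; don't fall back to a value baked into code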
12-factor services (6-12)
6. Service is 1+ stateless processes & shares nothing
7. Service listens on ports; avoid using (web) hosts
8. Use processes for isolation; multiple processes for concurrency
9. Processes can crash/be killed quickly & start fast
10. Keep dev, staging, & prod environments similar
11. Treat logs as event streams
12. Run admin/management tasks as one-off processes
The 12 factors are all about…
Services should be simple to build, test, & deploy
Services should be lightweight
  Few dependencies (OS/language/runtime/libraries), run fast, & use less RAM
Services should give reproducible results on a developer PC as well as in test, staging, & production clouds
Containers
Container images & containers
A container image is immutable & defines a version of a single service with its dependencies (runtimes, etc.)
  Use the same container image everywhere: dev, test, staging, production
A container runs an image on a PC/VM in an isolated environment
Multiple containers (services) can run side-by-side within a single PC/VM
Diagram: three containers on one PC/VM — Svc-A:v1 (Lib-L v2, Runtime v5), Svc-B:v3 (Lib-L v3, Runtime v7), Svc-A:v2 (Lib-L v3, Lib-M v2, Runtime v6) — each carrying its own library & runtime versions.
Isolation versus density
More isolation ← PC · VM · Hyper-V Container · Container · Process → more density
  PC: hardware not shared
  VM: hardware shared via a hypervisor (Xen/Hyper-V)
  Hyper-V Container: hardware shared via a hypervisor
  Container: hardware & kernel shared
  Process: hardware & kernel shared
Orchestrator starts containers on the cluster's PCs/VMs
Diagram: the orchestrator (acting as a Docker client) tells the Docker daemon (ports 2375 & 2376) on a PC/VM to "docker run Svc-A:v1"; the daemon pulls the Svc-A:v1 image from the container image registry (which also holds Svc-B:v3 & Svc-A:v2) into its local registry and starts the container.
🛈 The orchestrator can restrict a container's RAM & CPU usage
CI: Continuous Integration
CD: Continuous Delivery & Deployment
Diagram: code check-ins go into a source code repository; Continuous Integration (1) checks out the code, (2) builds it, (3) creates a container image and pushes it to the container image registry; Continuous Delivery deploys the image to Test & Staging environments; Continuous Deployment deploys it to Production.
Fallacies of distributed computing
Fallacy                         Effect
The network is reliable         App needs error handling/retry
Latency is zero                 App should minimize # of requests
Bandwidth is infinite           App should send small payloads
The network is secure           App must secure its data/authenticate requests
Topology doesn't change         Changes affect latency, bandwidth, & endpoints
There is one administrator      Changes affect ability to reach destination
Transport cost is zero          Costs must be budgeted
The network is homogeneous      Affects reliability, latency, & bandwidth
Service endpoints
Original design: IP:Port ≈ PC:Service
  Designed to allow a client to talk to a specific service running on a specific PC
  On 1 IP, you can't have 2+ services listening on the same port at the same time
Today: 1 PC hosts many VMs & 1 VM hosts many containers; each can run a service desiring the same port
Virtualization hacks are required to make this work
  Routing tables, SNAT/DNAT, modification to client code, etc.
We need something better, but too much legacy exists: network cards, …
Service scalability & high-availability
Making things worse, we run multiple service instances
  For service failure/recovery & scale up/down
  So, instances' endpoints dynamically change over the service's lifetime
Ideally, we'd like to abstract this from client code
  Each client wants a single stable endpoint as the face of the dynamically-changing service instances' endpoints
Typically, this is accomplished via a reverse proxy
  NOTE: Every request goes through the RP, causing an extra network hop
  We're losing some performance to gain a lot of simplification
Forward & reverse proxies
Diagram: clients in the client infrastructure send requests through a forward proxy; a reverse proxy in the server infrastructure distributes incoming requests across Server-1 & Server-2.
Forward proxy processes outgoing requests:
  • Content filtering (ex: censoring, translation)
  • Caching
  • Logging, monitoring
  • Client anonymization
Reverse proxy processes incoming requests:
  • Stable client endpoint over changing server instances' endpoints
  • Load balancing (Layers 4 [udp/tcp] & 7 [http]), server selection, A/B testing
  • SSL termination
  • Caching
  • Authentication/validation
  • Tenant throttling/billing
  • Some DDoS mitigation
Cluster DNS & service reverse proxy
⚠ It's impossible to keep endpoints in sync as service instances come/go; client code must be robust against this
Diagram: cluster DNS maps the Inventory service to the RP-I endpoint and the Orders service to the RP-O endpoint. A load balancer fronts Web Site instances #1-#3; each Web Site instance reaches Inventory instances #1-#3 through RP-I and Orders instances #1-#2 through RP-O.
⚠ WS #1 could fail before Inventory #3 replies
Reverse proxy load balancer service probes
Diagram: the RP load balancer periodically probes each Inventory instance. Inventory #1 replies HTTP 503, Inventory #2 doesn't reply, Inventory #3 replies HTTP 200 — the LB stops sending traffic to the unhealthy instances.
Probe configuration examples:
1. Seconds=15, Port=80, Path=HealthProbe.aspx
2. Seconds=15, Port=8080, Path=HealthProbe.aspx
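As a sketch of the probe target (using ASP.NET Core minimal APIs as one possible implementation; the path mirrors the HealthProbe.aspx example above, and DependenciesAreHealthy is a hypothetical check):

    // Minimal health-probe endpoint an RP/LB could poll every N seconds.
    var builder = WebApplication.CreateBuilder(args);
    var app = builder.Build();

    bool ready = true;   // flip to false during graceful shutdown so the LB drains traffic

    app.MapGet("/HealthProbe.aspx", () =>
        (ready && DependenciesAreHealthy())
            ? Results.Ok("healthy")            // HTTP 200: keep sending traffic to this instance
            : Results.StatusCode(503));        // HTTP 503: LB should stop sending traffic

    app.Run();

    static bool DependenciesAreHealthy() => true;  // placeholder for real checks (DB reachable, queue reachable, …)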
Turning a monolith into a microservice
Explicit, language-agnostic, multi-version API contract
  (loss of IntelliSense, refactoring & compile-time type-safety)
  var result = Method(arg1, arg2);   // an in-process call becomes a (de)serialized network request
Performance: worse; increases network congestion; unpredictable timing
Unreliable: requires retries, timeouts, & circuit breakers
  Server code must be idempotent
Security: requires authentication, authorization, & encryption
  Required in a VNET for compliance or when running 3rd-party (untrusted) code
Diagnostics: network issues, perf counters/events/logs, causality/call …
4 reasons to split a monolith into microservices (recap)
1. Scale independently (balance cost with speed)
2. Different technology stacks (e.g., .NET vs node.js)
3. 2+ clients adopting new features at will (backward compatibility must be maintained)
4. Conflicting dependencies (SharedLib-v1 vs SharedLib-v7)
API versioning
Versioning is an illusion; you must always be backward compatible
  You're really always adding new APIs & stating that the latest version is preferred
The required "version" indicates which API to call
  http://api.contoso.com/v1.0/products/users
  http://api.contoso.com/products/users?api-version=1.0
  http://api.contoso.com/products/users?api-version=2016-12-07
Add a new API when changing mandatory parameters, payload format, error codes (fault contract), or behavior
Defining network API contracts
Define explicit, formal cross-language API/data contracts
  "Contracts" defined via code do not work; do not do this
  Ex: DateTime can be null in Java but not in .NET; not all languages support templates/generics, nullable types, etc.
  Consider https://www.openapis.org/ & http://swagger.io/
  Use tools to create language-specific client libraries
Beware of (de)serialization RAM/CPU costs
Use cross-language data transfer formats
  Ex: JSON/XML, Avro, Protocol Buffers, FlatBuffers, Thrift, Bond, etc.
  Consider embedding a version number in the data structure
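A minimal sketch of embedding a version number in a transferred data structure (the field names and the use of System.Text.Json are my own choices, not the course's):

    using System.Text.Json;

    // Hypothetical v2 order DTO that carries its own schema version.
    var order = new OrderV2(SchemaVersion: 2, OrderId: "o-123", Total: 47.65m, Currency: "USD");
    string json = JsonSerializer.Serialize(order);
    Console.WriteLine(json);   // {"SchemaVersion":2,"OrderId":"o-123","Total":47.65,"Currency":"USD"}
    // A v1 reader that ignores unknown fields can still parse this payload;
    // a v2 reader checks SchemaVersion before relying on the newer Currency field.

    public sealed record OrderV2(int SchemaVersion, string OrderId, decimal Total, string? Currency);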
Beware leaky RPC-like abstractions
To "simplify" programming, many technologies try to map method calls to network requests
  Examples: RPC, RMI, CORBA, DCOM, WCF, etc.
These frequently don't work well due to
  Network fallacies (lack of retries, timeouts, & circuit breakers)
  Chatty (method) versus chunky (network) conversations
  Language-specific data type conversions (ex: dates, times, durations)
  Versioning: which version to call on the server?
  Authentication: how to handle expiring tokens?
  Logging: log request parameters/headers/payload, reply headers/payload?
  NOTE: Servers' clocks are not absolutely synchronized
Clients must retry failed network operations
Client code must retry operations due to
  Network fallacies (timeout, topology changes [avoid sticky sessions])
  Server throttling
Don't immediately retry if the service is unavailable or on an error reply
  Never assume a dependent service is already up & running
  To prevent DDoS-attacking yourself, use exponential back-off & circuit breakers
  https://msdn.microsoft.com/en-us/library/dn589784.aspx
Client retries assume the server handles requests idempotently
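A minimal exponential back-off retry sketch (not any particular library's retry API; the retry limits are illustrative and the URL reuses the earlier example):

    using System.Net.Http;

    var http = new HttpClient();
    const int maxAttempts = 5;

    for (int attempt = 1; attempt <= maxAttempts; attempt++)
    {
        try
        {
            HttpResponseMessage reply = await http.GetAsync("http://api.contoso.com/products/users?api-version=1.0");
            if (reply.IsSuccessStatusCode) break;            // success: stop retrying
            // non-success (e.g., 503 unavailable, 429 throttled): fall through, back off, & retry
            // (real code distinguishes retryable from non-retryable status codes)
        }
        catch (HttpRequestException) { /* network fallacy: transient failure; back off & retry */ }

        if (attempt == maxAttempts) throw new Exception("Operation failed after retries");
        await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt)));   // 2s, 4s, 8s, … — add jitter in real code
    }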
Services must implement operations idempotently
An idempotent operation can be performed 2+ times with no ill effect
Methods that input/process/output are idempotent
  Repeatedly creating a thumbnail of a specific photo produces the same result
Methods with side-effects are not idempotent
  Repeatedly adding $100 to a specific account produces different results
  Exactly-once semantics require making such operations idempotent (e.g., by detecting duplicate requests)
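A minimal sketch of making the "add $100" example idempotent by keying each request with a client-generated operation id (names are illustrative; a real service would persist the ids in its data store, not in memory):

    var processedOps = new HashSet<Guid>();          // in a real service this lives in the data store
    decimal balance = 0m;

    void Deposit(Guid operationId, decimal amount)
    {
        if (!processedOps.Add(operationId)) return;  // duplicate retry of an already-applied request: ignore
        balance += amount;                           // side-effect happens at most once per operationId
    }

    var opId = Guid.NewGuid();
    Deposit(opId, 100m);
    Deposit(opId, 100m);                             // client retry after a lost reply: no double-credit
    Console.WriteLine(balance);                      // 100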
Service update options (diagrams)
Diagrams: cluster nodes transitioning from V1 to V2 under different strategies — delete & upload, rolling update (instances updated a few at a time, so V1 & V2 run together), and blue-green deployment, where traffic is migrated (or a VIP/proxy swap is performed) from the V1 cluster to the V2 cluster.
Comparing service update options
Feature                  Delete & Upload              Rolling Update               Blue-Green Deployment
Add'l hardware costs     None                         None                         1x to 2x
Service availability     Downtime                     Reduced scale                Same
Failed update recovery   Downtime until V1 redeployed Reduced scale until rollback Immediate after swap back
V2 testability           Not with V1                  Not with V1                  With V1
Protocol/Schema change   1-Phase                      2-Phase                      1-Phase
Of course, you can perform some updates one way & other updates a different way.
Rolling update: how to version APIs
All API requests must pass version info, starting with v1
New service versions must be backward compatible
What about intra-service instance requests?
  During a rolling update, old & new service instances run together
  Failure occurs if a v2 instance makes a v2 API request to a v1 service instance
  Fix by performing a 2-phase update
  1. Deploy v2 service instances (which accept v2 & v1 API requests) but never send v2 API requests
  2. After all instances are v2, reconfigure the instances to send v2 API requests
Gracefully shutting down a service instance
12-factor services are stopped via SIGTERM or Ctrl-C
  Your service code should intercept this, and then…
Drain inflight requests before stopping the process
  Use an integer representing inflight requests; initialize it to 1 (not 0)
  As requests start/complete, increment/decrement the integer
  To stop, answer all future LB probes with "not ready" so the LB stops sending traffic
  When you're sure the LB has stopped sending traffic (~30 seconds), decrement the integer
  When the integer hits 0, the service process can safely terminate
  NOTE: Don't let long-running inflight requests prevent process termination
Service Configuration & Secrets
Service (re)configuration
Use config for info that shouldn't be in source code
  Account names, secrets (passwords/certificates), DB connection strings, etc.
  Use Cryptographic Message Syntax (CMS) to avoid clear-text secrets
12-factor services pass config via environment variables
  To change config: stop the process & restart it with new environment variable values
When using a rolling upgrade to reconfigure, roll back if the new config causes service instance(s) to fail
Cryptographic Message Syntax (CMS)
Use CMS to avoid cleartext secrets
  CMS encrypts/decrypts messages via RFC-3852
Secret producer
  Encrypts the cleartext secret for a recipient using a certificate and embeds the certificate's thumbprint in the cyphertext
  Sets the desired setting's value to the cyphertext
Secret consumer
  Gets the desired setting's cyphertext value
  Decrypts the cyphertext, producing the cleartext secret for use in the service code
  Decryption automatically uses the certificate referenced by the embedded thumbprint; the certificate must be available
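One concrete way to do this in .NET is EnvelopedCms from the System.Security.Cryptography.Pkcs package; a minimal sketch (certificate acquisition and error handling elided):

    using System.Security.Cryptography.Pkcs;
    using System.Security.Cryptography.X509Certificates;
    using System.Text;

    public static class SecretCms
    {
        // Producer: encrypt a cleartext secret for the holder of recipientCert's private key
        public static byte[] EncryptSecret(string cleartext, X509Certificate2 recipientCert)
        {
            var cms = new EnvelopedCms(new ContentInfo(Encoding.UTF8.GetBytes(cleartext)));
            cms.Encrypt(new CmsRecipient(recipientCert));   // recipient (cert reference) is embedded in the cyphertext
            return cms.Encode();                            // store this blob as the config setting's value
        }

        // Consumer: decrypt the setting's cyphertext back into the cleartext secret
        public static string DecryptSecret(byte[] cyphertext)
        {
            var cms = new EnvelopedCms();
            cms.Decode(cyphertext);
            cms.Decrypt();                                  // uses the referenced certificate; its private key must be installed
            return Encoding.UTF8.GetString(cms.ContentInfo.Content);
        }
    }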
Leader Election
Leader election
Picks 1 service instance to coordinate tasks among the others
  The leader can "own" some procedure or access to some resource
  At a certain time, choose 1 instance to do billing, report generation, etc.
Commonly used to ensure data consistency
  Aggregates results from multiple instances together
  Conserves resources by reducing the chance of work being done by multiple service instances
Problem: if the leader dies, elect a new leader (quickly)
  These algorithms are hard to implement due to race conditions & …
Leader election via a lease
All service instances execute:
  while (!AskDB_IsProcessingDone()) {
     bool isLeader = RequestLease();        // atomically acquire a time-limited lease in the DB
     if (isLeader) {
        ProcessAndRenewLease();             // NOTE: may crash here; the lease expires & is abandoned
        TellDB_ProcessingIsDone();
     } else { /* Continuously try to become the new leader */ }
     Delay();                               // avoid DDoS-ing the DB
  }
Diagram: the database holds the lease record (leasee, lease expiration time, work-done flag); Service #1 holds the lease (expiration 2017-07-27, Done=false) until it expires, then Service #3 acquires the lease and finishes the work (Done=true).
Leader election via queue message
At a certain time, insert 1 msg into a queue
All service instances execute:
  while (true) {
     Msg msg = TryDequeueMsg();             // a dequeued msg becomes invisible to other instances for a while
     if (msg != null) {
        /* This instance is the leader */
        ProcessMsg();                       // NOTE: may crash here; the msg becomes visible again
        DeleteMsg(msg);
     } else { /* Continuously try to become the new leader */ }
     Delay();                               // avoid queue DDoS
  }
Data Storage Services
Data storage service considerations
Building reliable & scalable services that manage state is substantially harder than building stateless services
  Due to data size/speed, partitioning, replication, leader election, consistency, security, disaster recovery, backup/restore, costs, administration, etc.
So, use a robust/hardened storage service instead
  When selecting a storage service, fully understand your service's requirements & the trade-offs when comparing available services
  It is common to use multiple storage services from within a single service
Data temperature
Diagram: a Load Balancer fronts the web/compute tier, which serves hot data from a Cache and calls other internal tiers/storage for the rest.
Object Storage Services
Object storage services
The most frequently-used storage service
  Used for documents, images, audio, video, etc.
  Fast & inexpensive: pay for GB/month storage, I/O requests, & egress bytes
All cloud providers offer an object storage service
  Minimal lock-in: it's relatively easy to move objects across providers if you avoid provider-specific features
Object storage services offer public (read-only) access
  Give object URLs to clients; the URL goes to the storage service, reducing load on your other services!
How a CDN works
Diagram: the origin server (an object storage service) lives in the US West DC; CDN PoPs around the world cache copies of the object; nearby clients #1 & #2 fetch the object from their local PoP instead of the origin.
Database Storage Services
DB storage services
Store many small, related entities
  Features: query, joins, indexing, sorting, stored procs, viewers/editors, etc.
Relational DBs (SQL) require an expensive PC for better size/perf
  For data relationships: a customer → orders
  Supports sorts, joins, & ACID updates
Non-relational DBs (NoSQL) spread data across many cheap PCs
  For customer preferences, shopping carts, product catalogs, session state, etc.
  Cheaper, faster, bigger & flexible data models (entity ≈ in-memory object)
Relational DB vs non-relational DB
Diagram: a load balancer fronts multiple replicas, each holding entities AAA & BBB; a relational DB scales up on one expensive machine, while a non-relational DB spreads/replicates the data across many cheap PCs.
Data Consistency
Data consistency
Strong: 2+ records form a relationship at the same time
  ACID transactions: Atomicity, Consistency, Isolation, Durability
  Goal: it looks like 1 thing at a time is happening, even if the work is complex
  Done via distributed txs/locks across stores; hurts perf & is not fault tolerant
Weak: 2+ records form a relationship eventually
  BASE transactions: Basically Available, Soft state, Eventual consistency
  Done via communication retries & idempotency across stores
CAP theorem: when facing a network Partition (stores can't talk to each other), you must choose between Consistency & Availability
  Diagram (Replication: network partition [failure]): database stores can't replicate to each other during the partition.
CQRS
Diagram: the User Interface sends commands to the service's Command Processor, which writes to a data store (tables) and returns an ACK; queries go to a Query Processor, which reads from a data store (views); the two data stores are kept eventually consistent.
Event Sourcing
Commonly used with CQRS & Big Data
Save events in append-only & immutable tables

Event Source Table for Jeffrey's Account
Timestamp             Amount      Memo
2017-04-01T09:00:05   +$0.00      New account
2017-04-16T08:28:36   +$100.00    Paycheck
2017-04-17T01:05:22   -$52.35     Restaurant
…                     …           …

A Snapshot View for All Accounts
Account    Timestamp             Amount
Jeffrey    2017-04-17T01:05:22   +$47.65
…          …                     …

Pros
  When reading, no event locking (good perf)
  Write bugs unlikely & can't corrupt immutable data
  Easy to (re)build today, historical, or audit views
Cons
  Boundless storage (but it's cheap)
  Replaying data is time-consuming (improve with snapshots)
Implementing eventual consistency
A client can determine when data is consistent
  Client reads entities A & B; if A references B but B doesn't reference A (yet), the client assumes the relationship doesn't exist (yet)
Use fault-tolerant message queues & the Saga pattern to guarantee that all operations eventually complete
  The Saga pattern compromises atomicity in order to give greater availability
Saga pattern
A Saga Execution Coordinator (SEC) attempts txs in concurrent or risk-centric order
  For fault tolerance, operations may be retried & must be idempotent
2 recovery modes
  Backwards: if a tx fails, undo all successful txs
  Forwards: retry every tx until all are successful
Diagram: a Trip Saga coordinates txs against the Rental Car Reservation Service & the Airplane Reservation Service.
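A minimal forward-recovery saga sketch (the step names, endpoints, and trip id are illustrative; each step is retried until it succeeds, so every step must be idempotent):

    // Forward recovery: keep retrying each tx until the whole saga completes.
    var steps = new (string Name, Func<Task<bool>> Execute)[]
    {
        ("ReserveRentalCar", () => CallIdempotentAsync("rental-car/reservations", "trip-42")),
        ("ReserveFlight",    () => CallIdempotentAsync("airplane/reservations",  "trip-42")),
    };

    foreach (var step in steps)
    {
        while (!await step.Execute())                       // retry this tx until it succeeds
            await Task.Delay(TimeSpan.FromSeconds(5));      // back off between attempts
        Console.WriteLine($"{step.Name} completed");
    }

    static Task<bool> CallIdempotentAsync(string resource, string operationId) =>
        Task.FromResult(true);   // placeholder for a real idempotent service call keyed by operationId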
Concurrency, Versioning, & Backup
Concurrency control
A data entity maintains its integrity for competing accessors via concurrency control
Pessimistic: an accessor locks 1+ entries (blocking other accessors), modifies the entries, & then unlocks them
  Bad scalability (1 accessor at a time) & what if the locker fails to release the lock?
Optimistic: an accessor gets 1+ entries with version IDs (etags), modifies the entries' values, & updates the entries only if the versions haven't changed (the entries still contain the original values)
Optimistic concurrency: two instances adding $23 & $32 to Jeff's balance
Diagram: the DB partition holds Account: Jeff (Version 0001, Balance $100), Account: Aidan (Version 0023, Balance $768), and Account: Grant (Version 0762, Balance $444).
Service #1 reads Jeff at Version 0001/$100, writes $123; the update succeeds and the version becomes 0002.
Service #2 also read Version 0001/$100 and tries to write $132, but the version is now 0002, so its update fails; it re-reads (Version 0002, $123), recomputes $155, and successfully updates to Version 0003.
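A minimal etag-style sketch of the retry loop both service instances run (an in-memory tuple stands in for the DB partition; a real store would perform the conditional write as a compare-and-swap/etag-checked update):

    var account = (Version: 1L, Balance: 100m);               // Account: Jeff

    bool TryUpdate(long expectedVersion, decimal newBalance)
    {
        if (account.Version != expectedVersion) return false; // someone updated it since we read it
        account = (expectedVersion + 1, newBalance);          // write succeeds; version bumps
        return true;
    }

    void AddToBalance(decimal amount)                         // what each service instance runs
    {
        while (true)
        {
            var (version, balance) = account;                 // read the entry + its version (etag)
            if (TryUpdate(version, balance + amount)) return; // conditional write: retry on version mismatch
        }
    }

    AddToBalance(23m);                                        // Service #1
    AddToBalance(32m);                                        // Service #2
    Console.WriteLine(account);                               // (3, 155) — both deposits applied exactly once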
Versioning data schemas
Use formal, language-agnostic data schemas
  The data is the truth, not the language data type
All data must specify version info, starting with v1
New services must be infinitely backward compatible
  Service v1 might create an entry that isn't accessed for years
During a rolling update, v1 & v2 instances run together
  Failure occurs if a v2 instance writes the v2 data schema & a v1 instance reads it
  Fix by performing a 2-phase update
Backup & restore
Needed to recover from corrupt data due to a coding bug or a hacker attack
  Detecting data corruption is a hard, domain-specific problem
You periodically back up data in order to restore it to a known good state
  NOTE: Restore usually incurs some downtime & data loss
  Many DBs don't support making a consistent backup across all partitions
    Try hard to avoid cross-partition relationships
  Incremental backups are faster than a full backup but hurt restore performance
  Make sure you test restore
Recovery point & time objectives
Recovery Point Objective (RPO)
  Maximum data (in minutes) the business can afford to lose
Recovery Time Objective (RTO)
  Maximum downtime the business can afford when restoring data
Data loss & recovery downtime cannot be completely prevented
  The earth is a single point of failure
Deciding RPO & RTO is mostly a business decision
NOTE: The smaller the RPO/RTO, the more expensive it is to run the service
Disaster Recovery (DR)
Disaster recovery
Dealing with a datacenter outage
  Code: easy, upload it to another DC
  Data: hard, data must be replicated across DCs
  Latency: ~133ms round trip for ½ way around the earth (best case)
Create similar clusters in different geographical regions
When data changes in a cluster, replicate it to the other cluster(s)
  Usually batch changes & replicate periodically
  The delay is the RPO, as the first cluster could die before sending the next batch
  The more clusters, the more …
Active/passive architecture
Datacenter-A (active) takes traffic & periodically replicates data changes to Datacenter-B (passive)
  DC-A handles all traffic spikes
  DC-B has wasted capacity
  Code development is easy
  Failover is infrequently tested
  An admin decides when to fail over & manually initiates it
Diagram: all traffic flows to Datacenter-A (Active); replication flows from Datacenter-A to Datacenter-B (Passive).
Active/active architecture
Datacenters A & B both take traffic & periodically replicate data changes to the other DC
  Both DCs handle spikes
  Less expensive & less wasted capacity
  Continuously tested
  Development is harder: data inconsistency or dual reads
  Failover is fast & automatic
Diagram: traffic flows to both Datacenter-A (Active) & Datacenter-B (Active); replication flows between them.