
Architecting Distributed Cloud Applications
Jeffrey Richter
Microsoft Azure Software Architect, Wintellect Co-Founder, & Author

Architecting Distributed Cloud Apps
 A 6.5hr technology-agnostic course
 YouTube: http://aka.ms/RichterCloudApps
 EdX: https://aka.ms/edx-devops200_9x-about

Contact: [email protected] | www.linkedin.com/in/JeffRichter | @JeffRichter
Course purpose
 Properly architecting distributed cloud apps requires a new mindset towards software development and introduces many new terms and patterns
 The purpose of this course is to delve into many of these terms, patterns & engineering trade-offs while being technology-agnostic
 Topics include: orchestrators, datacenters, containers, networking, messaging, versioning, configuration, storage services, and disaster recovery
Why cloud apps?
Feature      | Past                                | Present
Clients      | Enterprise/Intranet                 | Public/Internet
Demand       | Stable (small)                      | Dynamic (small → massive)
Datacenter   | Single tenant                       | Multi-tenant
Operations   | People (expensive)                  | Automation (cheap)
Scale        | Up via few reliable (expensive) PCs | Out via lots of (cheap) commodity PCs
Failure      | Unlikely but possible               | Very likely
Machine loss | Catastrophic                        | Normal (no big deal)

 We must do things differently when building cost-effective, failure-resilient solutions

Example       | Past                          | Present
Exceptions    | Catch, swallow & keep running | Crash & restart
Communication | In order, exactly once        | Out of order; clients must retry & servers must be idempotent
Cloud computing is all about embracing
failure
 Some reasons why a service instance may fail
(stop)
 Developer: Unhandled exception
 DevOps: Scaling the number of service instances down
 DevOps: Updating service code to a new version
 Orchestrator: Moving service code from one machine to another
 Force majeure: Hardware failure (power supply, fans [overheating],
hard disk, network controller, router, bad network cable, etc.)
 Force majeure: Data center outages (natural disasters, attacks)
 Since failure is inevitable & unavoidable,
embrace it
 Architect assuming failures will happen; think cattle, not pets
 Use an orchestrator that avoids single points of failure
Infrastructure/Platform/Containers/Functions
as a Service
aka Orchestrators
 Manage a cluster's (set of PCs/VMs) lifecycle, networking, health, upgrades, & scaling; deploy/run service code
 [Diagram: within a region, the orchestrator pulls service code from a repository and deploys/runs it on the PCs/VMs of the cluster's virtual network, fronted by a load balancer]
Regions, availability zones, & fault domains
 [Diagram: a region contains multiple availability zones (each with independent power & networking) connected by a private fiber-optic network; each AZ holds racks of PCs, each PC hosts VMs; your app's public endpoint fronts the region]
 A fault domain is a unit of failure
 Hierarchy: Planet / Region / Availability Zone / Rack / PC / VM
 Intra-service communication (replication): more fault tolerance = higher latency
Applications consist of many (micro)services
 [Diagram: an e-commerce application where a load balancer fronts Web Site instances #1–#3; the web site calls an Inventory Service (instances #1–#2 with their own data store) and an Orders Service (instances #1–#4 with their own data store)]
 Each service solves a domain-specific problem & has exclusive access to its own data store
4 reasons to split a monolith into microservices
 Scale independently (balance cost with speed): run many Photo Share Service instances alongside fewer Thumbnail Service instances, scaling each on its own
 Different technology stacks: ex: Photo Share Service on .NET, Thumbnail Service on node.js
 2+ clients (clients adopt new features at will): Photo Share Service (V1) and Video Share Service (V1) both call the Thumbnail Service (V1 & V2); backward compatibility must be maintained
 Conflicting dependencies: Photo Share Service uses SharedLib-v1 while Thumbnail Service uses SharedLib-v7
Microservice architecture benefits myths
 Myth: Microservices offer small,
easy-to-understand/manage code bases
 A monolith can use OOP & libraries (requires developer discipline)
 Library changes cause build failures (not runtime failures)
 Myth: A failing service doesn’t impact other
services
 Many services require dependencies be fully functioning
 Hard to write/test code that gracefully recovers when dependency
fails
 We run multiple service instances so there is no such thing as
“failure”
 A monolith is up/down completely; no recovery code
 Orchestrator restarts failed instances keeping them up
Composing SLAs for dependent services
 [Diagram: a request flows through a chain of services (Service → Service → … → Service); each hop's SLA compounds — see the sketch below]
 What about the network's SLA?

Each service's SLA | 1 service        | 2 services       | 3 services       | n services
99.99%             | 99.99% (260s/mo) | 99.98% (520s/mo) | 99.97% (780s/mo) | 99.99%^n ((n × 260s)/mo)
99.999%            | 99.999% (26s/mo) | 99.998% (52s/mo) | 99.997% (78s/mo) | 99.999%^n ((n × 26s)/mo)
Auto-scaling service instances
 [Diagram: left, clients post work to a queue consumed by Service-1…Service-4 (periodically check queue length); right, a load balancer spreads requests across Service-1…Service-3 (periodically check resource usage)]
 Periodically check queue length/resource usage (see the sketch below)
   If growing → scale up; if shrinking → scale down
 Scheduled (day/night, weekdays/weekends/holidays)
   You're predicting load based on what you expect
   Potentially dangerous, as actual load may be different than predicted
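 A minimal sketch of the queue-length flavor of this loop. The helpers (get_queue_length, add_instance, remove_instance, get_instance_count) and all thresholds are illustrative assumptions supplied by your queue service and orchestrator, not names from the course:

   import time

   SCALE_UP_DEPTH = 100      # queue depth above which we add an instance
   SCALE_DOWN_DEPTH = 10     # queue depth below which we remove an instance
   MIN_INSTANCES, MAX_INSTANCES = 2, 20
   CHECK_INTERVAL_SECONDS = 60

   def autoscale_loop(get_queue_length, add_instance, remove_instance, get_instance_count):
       """Periodically check queue length; if growing, scale up; if shrinking, scale down."""
       while True:
           depth = get_queue_length()
           count = get_instance_count()
           if depth > SCALE_UP_DEPTH and count < MAX_INSTANCES:
               add_instance()
           elif depth < SCALE_DOWN_DEPTH and count > MIN_INSTANCES:
               remove_instance()
           time.sleep(CHECK_INTERVAL_SECONDS)   # don't hammer the metrics API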
12-Factor Services (Apps)

http://12factor.net
12-factor services (1-5)
1. Single root repo; don’t share code with
another service
2. Deploy dependent libs with service
3. No config in code; read from environment
vars
4. Handle unresponsive service dependencies
robustly
5. Strictly separate build, release, & run steps
 Build: Builds a version of the code repo & gathers dependencies
 Release: Combines build with config → ReleaseId (immutable)
 Run: Runs service in execution environment
12-factor services (6-12)
6. Service is 1+ stateless processes & shares nothing
7. Service listens on ports; avoid using (web) hosts
8. Use processes for isolation; multiple for concurrency
9. Processes can crash/be killed quickly & start fast
10. Keep dev, staging, & prod environments similar
11. Treat logs as event streams
12. Run admin/management tasks as one-off processes
The 12 factors are all about…
 Services should be simple to build, test, &
deploy
 Services should be lightweight
 Few dependencies (OS/language/runtime/libraries), run fast, & use
less RAM
 Services should give reproducible results on
developer PC as well as test, staging, &
production clouds
Containers
Container images & containers
 A container image is immutable & defines a version of a single service with its dependencies (runtimes, etc.)
   Use the same container image everywhere: dev, test, staging, production
 A container runs an image in an isolated environment on a PC/VM
 Multiple containers (services) can run side-by-side within a single PC/VM
 [Diagram: one PC/VM hosting three containers: Svc-A:v1 (Lib-L v2, Runtime v5), Svc-B:v3 (Lib-L v3, Runtime v7), and Svc-A:v2 (Lib-L v3, Lib-M v2, Runtime v6)]
Isolation versus density
 Moving left to right (PC → VM → Hyper-V Container → Container → Process) trades isolation for density

                                    | PC         | VM         | Hyper-V Container | Container  | Process
Hardware                            | Not shared | Shared     | Shared            | Shared     | Shared
OS Kernel                           | Not shared | Not shared | Not shared        | Shared     | Shared
System Resources (ex: File System)  | Not shared | Not shared | Not shared        | Not shared | Shared
OS kernel & container images
 A container image must match the host's OS kernel (Linux/Windows)
 However, a Windows Hyper-V container can host a Windows or Linux container image
 [Diagram: left, a PC whose OS kernel (Linux/Windows) runs Container-1 and Container-2 directly plus Hyper-V Containers 3 and 4, each with its own kernel; right, a PC with a hypervisor (Xen/Hyper-V) running VMs, each VM's OS kernel hosting containers C-1…C-4]
Orchestrator starts containers on cluster's PCs/VMs
 [Diagram: the orchestrator acts as a Docker client, sending "docker run Svc-A:v1" to the Docker daemon (ports 2375 & 2376) on a PC/VM; the daemon pulls Svc-A:v1 from the container image registry (which also holds Svc-B:v3 and Svc-A:v2) into the local registry and starts the container]
 🛈 Orchestrator can restrict a container's RAM & CPU usage
CI: Continuous Integration
CD: Continuous Delivery & Continuous Deployment
 Continuous Integration (triggered by code check-ins to the source code repository):
   1. Checks out the code
   2. Builds it
   3. Creates a container image (pushed to the container image registry)
 [Diagram: Continuous Delivery promotes the image from the registry to the Test and Staging environments; Continuous Deployment promotes it to Production]
 Modern DevOps is all about automation; any failures…
Networking
Communication
8 fallacies of distributed computing
http://www.rgoarchitects.com/Files/fallacies.pdf

Fallacy                    | Effect
The network is reliable    | App needs error handling/retry
Latency is zero            | App should minimize # of requests
Bandwidth is infinite      | App should send small payloads
The network is secure      | App must secure its data/authenticate requests
Topology doesn't change    | Changes affect latency, bandwidth, & endpoints
There is one administrator | Changes affect ability to reach destination
Transport cost is zero     | Costs must be budgeted
The network is homogeneous | Affects reliability, latency, & bandwidth
Service endpoints
 Original design: IP:Port → PC:Service
   Designed to allow a client to talk to a specific service running on a specific PC
   On 1 IP, you can't have 2+ services listening on the same port at the same time
 Today: 1 PC hosts many VMs & 1 VM hosts many containers; each can run a service desiring the same port
 Virtualization (hacks) are required to make this work
   Routing tables, SNAT/DNAT, modification to client code, etc.
   We need something better, but too much legacy exists: network cards, …
Service scalability & high-availability
 Making things worse, we run multiple service
instances
 For service failure/recovery & scale up/down
 So, instances’ endpoints dynamically change over the service’s
lifetime
 Ideally, we’d like to abstract this from client code
 Each client wants a single stable endpoint as the face of the
dynamically-changing service instances’ endpoints
 Typically, this is accomplished via a reverse
proxy
 NOTE: Every request goes through the RP causing an extra network
hop
 We’re losing some performance to gain a lot of simplification
Forward & reverse proxies
 [Diagram: clients in the client infrastructure reach the Internet through a (forward) proxy; a reverse proxy in the server infrastructure fronts Server-1 and Server-2]
 A forward proxy processes outgoing requests:
   • Content filtering (ex: censoring, translation)
   • Caching
   • Logging, monitoring
   • Client anonymization
 A reverse proxy processes incoming requests:
   • Stable client endpoint over changing server instances' endpoints
   • Load balancing (Levels 4 [udp/tcp] & 7 [http]), server selection, A/B testing
   • SSL termination
   • Caching
   • Authentication/validation
   • Tenant throttling/billing
   • Some DDoS mitigation
Cluster DNS & service reverse proxy
 [Diagram: cluster DNS maps Inventory → the RP-I endpoint and Orders → the RP-O endpoint; a load balancer fronts Web Site #1–#3, which reach Inventory #1–#3 through RP-I and Orders #1–#2 through RP-O]
 ⚠ It's impossible to keep endpoints in sync as service instances come/go; client code must be robust against this
 ⚠ WS #1 could fail before Inventory #3 replies
Reverse proxy load balancer service probes
 Probe configuration examples: 1. Seconds=15, Port=80, Path=HealthProbe.aspx  2. Seconds=15, Port=8080, Path=HealthProbe.aspx
 [Diagram: the RP load balancer periodically probes Inventory #1–#3; instances answering HTTP 200 stay in rotation, while an instance answering 503 or not replying receives no traffic]
Turning a monolith into a microservice
 Requires an explicit, language-agnostic, multi-version API contract (loss of IntelliSense, refactoring & compile-time type-safety)
 var result = Method(arg1, arg2); becomes a (de)serialized network request
 In-process call → network request
   Performance: worse, increases network congestion, unpredictable timing
   Unreliable: requires retries, timeouts, & circuit breakers
     Server code must be idempotent
   Security: requires authentication, authorization, & encryption
     Required in a VNET for compliance or when running 3rd-party (untrusted) code
   Diagnostics: network issues, perf counters/events/logs, causality/call …
4 reasons to split a monolith into microservices
 Scale independently (balance cost with speed): run many Photo Share Service instances alongside fewer Thumbnail Service instances, scaling each on its own
 Different technology stacks: ex: Photo Share Service on .NET, Thumbnail Service on node.js
 2+ clients (clients adopt new features at will): Photo Share Service (V1) and Video Share Service (V1) both call the Thumbnail Service (V1 & V2); backward compatibility must be maintained
 Conflicting dependencies: Photo Share Service uses SharedLib-v1 while Thumbnail Service uses SharedLib-v7
API versioning
 Is an illusion; you must always be backward compatible
   You're really always adding new APIs & stating that the latest version is preferred
 The required "version" indicates which API to call
   http://api.contoso.com/v1.0/products/users
   http://api.contoso.com/products/users?api-version=1.0
   http://api.contoso.com/products/users?api-version=2016-12-07
 Add a new API when changing mandatory parameters, payload format, error codes (fault contract), or behavior
Defining network API contracts
 Define explicit, formal cross-language API/data contracts
   "Contracts" defined via code do not work; do not do this
   Ex: DateTime can be null in Java but not in .NET; not all languages support templates/generics, nullable types, etc.
   Consider https://www.openapis.org/ & http://swagger.io/
   Use tools to create language-specific client libraries
 Beware of (de)serialization RAM/CPU costs
 Use cross-language data transfer formats
   Ex: JSON/XML, Avro, Protocol Buffers, FlatBuffers, Thrift, Bond, etc.
   Consider embedding a version number in the data structure (see the sketch below)
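 A minimal sketch of embedding a version number in a JSON payload so readers can dispatch on it; the field names (schema_version, order_id, items) and the v1 layout are illustrative assumptions, not part of the course:

   import json

   def serialize_order_v2(order_id: str, items: list) -> str:
       # Stamp every payload with the schema version it was written with.
       return json.dumps({"schema_version": 2, "order_id": order_id, "items": items})

   def deserialize_order(payload: str) -> dict:
       doc = json.loads(payload)
       version = doc.get("schema_version", 1)   # hypothetical v1 payloads predate the field
       if version == 1:
           # Tolerant reader: upgrade the old shape to the current in-memory form.
           doc = {"schema_version": 2, "order_id": doc["id"], "items": doc.get("items", [])}
       return doc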
Beware leaky RPC-like abstractions
 To "simplify" programming, many technologies try to map method calls → network requests
   Examples: RPC, RMI, CORBA, DCOM, WCF, etc.
 These frequently don’t work well due to
 Network fallacies (lack of retries, timeouts, & circuit breakers)
 Chatty (method) versus chunky (network) conversations
 Language-specific data type conversions (ex: dates, times, durations)
 Versioning: Which version to call on the server?
 Authentication: How to handle expiring tokens?
 Logging: Log request parameters/headers/payload, reply
headers/payload?
 NOTE: Servers’ clocks are not absolutely synchronized
Clients must retry failed network
operations
 Client code must retry operations due to
 Network fallacies (timeout, topology changes [avoid sticky sessions])
 Server throttling
 Don’t immediately retry if service unavailable or on error reply
 Never assume a dependent service is already up & running
 To prevent DDoS attacking yourself
 Use exponential back-off & circuit breakers (see the sketch below)
   https://msdn.microsoft.com/en-us/library/dn589784.aspx
 Client retries assume server handles request
idempotently
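 A minimal retry sketch with exponential back-off and jitter, assuming a hypothetical call() that raises TransientError on timeouts/throttling; the exception type and limits are illustrative, not from the course:

   import random
   import time

   class TransientError(Exception):
       """Raised by call() for timeouts, throttling, or 5xx-style failures."""

   def call_with_retries(call, max_attempts=5, base_delay=0.5, max_delay=30.0):
       for attempt in range(1, max_attempts + 1):
           try:
               return call()
           except TransientError:
               if attempt == max_attempts:
                   raise                     # give up; let a circuit breaker/caller decide
               delay = min(max_delay, base_delay * (2 ** (attempt - 1)))
               time.sleep(delay * random.uniform(0.5, 1.5))   # jitter avoids retry storms

 Because the same request may reach the server more than once, the server must handle it idempotently (next slides).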
Services must implement operations idempotently
 An idempotent operation can be performed 2+ times with no ill effect
 Methods that input/process/output are idempotent
   Repeatedly creating a thumbnail of a specific photo produces the same result
 Methods with side-effects are not idempotent
   Repeatedly adding $100 to a specific account produces different results
 Retry + idempotency → exactly-once semantics
Idempotent CRUD considerations
Operation           | HTTP Verb              | What to do
C: id = Create()    | POST                   | See the pattern below
R: data = Read(id)  | GET/HEAD/OPTIONS/TRACE | Naturally idempotent
U: Update(id, data) | PUT                    | Last writer wins
D: Delete(id)       | DELETE                 | If already gone, OK
 HTTP requires most verbs (not POST) be idempotent
 Idempotency pattern (see the sketch below)
   1. Client: asks server to create a unique ID, or client (if trusted) creates an ID
   2. Client: sends ID & desired operation to server → may be retried
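 A minimal sketch of the create pattern: the client supplies a unique operation ID, and the server remembers which IDs it has already applied so a retried request isn't applied twice. The in-memory dictionaries stand in for the service's data store and are illustrative only:

   import uuid

   orders = {}            # order_id -> order record   (stand-in for the data store)
   processed_ops = {}     # operation_id -> order_id   (remembers already-applied creates)

   def client_create_order(items):
       op_id = str(uuid.uuid4())           # client-generated unique ID; safe to resend
       return create_order(op_id, items)   # this call may be retried with the same op_id

   def create_order(op_id, items):
       if op_id in processed_ops:                # retry of an operation we already did
           return processed_ops[op_id]           # return the original result; no new order
       order_id = str(uuid.uuid4())
       orders[order_id] = {"items": items}
       processed_ops[op_id] = order_id           # in a real store, record this atomically with the write
       return order_id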
Messaging
Communication
http://ReactiveManifesto.org/
Messaging communication
 The request/reply pattern is frequently not the
best
 Client may send to busy (not idle) service instance
 Client may crash/scale down while waiting for service instance’s reply
 So, consider messaging communication instead
 Resource efficient
 Client doesn’t wait for service reply (no blocked threads/long-lived
locks)
 Service instance pulls work vs busy service instances pushed more
work
 Services don’t need listening endpoints; clients/services talk to
queue service
 Resilient: client/service instances can come, go, & move at will
 If a service instance fails, another instance processes the message (1+ delivery, …)
Messaging with queues
 [Diagram: inside the cluster, a load balancer fronts WebSite #1–#3; the websites post work to queue Q-A, consumed by Service-A #1–#3; Service-A posts to Q-B, consumed by Service-B #1–#2; replies flow back through per-website queues Q-WS1–Q-WS3]
 🛈 Request/reply isn't required; Service-B #1 could post to Q-WS1, not to Q-A
 🛈 All Service-A & Service-B instances could go down and recovery is automatic when any come back up; but if WebSite #1 goes down, the originator must retry
Fault-tolerant message processing
 Get msg: DequeueCount++ & hides msg for n seconds
   If DequeueCount > threshold (2), log bad msg & delete it
   Else, after processing msg, delete msg from queue
 NOTE: Msgs can be processed 1+ times & out of order (see the sketch below)
 [Diagram: Clients 1–3 enqueue messages; Service-1 and Service-2 dequeue and process them concurrently]
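 A minimal sketch of this consumer loop against a hypothetical queue client exposing receive(visibility_timeout), msg.dequeue_count, msg.id, and delete(msg) — names chosen for illustration; real queue SDKs differ:

   POISON_THRESHOLD = 2              # matches the slide's example threshold
   VISIBILITY_TIMEOUT_SECONDS = 30

   def process_messages(queue, handle, log):
       while True:
           msg = queue.receive(visibility_timeout=VISIBILITY_TIMEOUT_SECONDS)
           if msg is None:
               continue                           # nothing to do right now
           if msg.dequeue_count > POISON_THRESHOLD:
               log("bad message", msg.id)         # it keeps failing; get it out of the way
               queue.delete(msg)
               continue
           handle(msg.body)                       # may run 1+ times & out of order -> must be idempotent
           queue.delete(msg)                      # only delete after successful processing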
Additional queue features
 A msg can be sent to multiple “subscribers”
 Allows single msg to be broadcast & processed in parallel
 Ex: Chat msgs, weather/stock/news updates
 Message TTL
 Prevents costs from skyrocketing should consumers never come
online or take too long to process messages
 Consumer-specified invisibility timeout
 Short: Service failure lets another service process the msg right away
 Long: Prevents msg from being processed multiple times
 Service can periodically update timeout if msg actively being
processed
 Service can periodically update msg content enabling efficient
continuation on failure
At-most-once message processing
 Good for time-sensitive data that expires/gets
replaced
 Ex: stock prices, temperature, sports scores, etc.
 Pattern
 Client places msg in queue with maximum TTL
 Service gets msg setting invisibility timeout > maximum TTL
 If consumer crashes, msg expires before becoming visible again
 Result: Msg is processed 0 or 1 times
Versioning Service Code
Service update options
 Delete & Upload: take down the cluster's V1 instances, then upload & start V2
 Rolling Update: upgrade the cluster's instances from V1 to V2 a few at a time; V1 & V2 run side by side during the update
 Blue-Green Deployment (or across 2 clusters): stand up a full V2 deployment next to V1, then do a controlled reverse-proxy migration (or VIP swap)
Comparing service update options
Feature                | Delete & Upload              | Rolling Update               | Blue-Green Deployment
Add'l hardware costs   | None                         | None                         | 1x to 2x
Service availability   | Downtime                     | Reduced scale                | Same
Failed update recovery | Downtime until V1 redeployed | Reduced scale until rollback | Immediate after swap back
V2 testability         | Not with V1                  | Not with V1                  | With V1
Protocol/Schema change | 1-Phase                      | 2-Phase                      | 1-Phase
 Of course, you can perform some updates one way & other updates a different way
Rolling update: how to version APIs
 All API requests must pass version info, starting with v1
 New service versions must be backward compatible
 What about intra-service instance requests?
   During a rolling update, old & new service instances run together
   Failure occurs if a v2 instance makes a v2 API request to a v1 service instance
   Fix by performing a 2-phase update
     1. Deploy v2 service instances (which accept v2 & v1 API requests) but never send v2 API requests
     2. After all instances are v2, reconfigure instances to send v2 API requests
Gracefully shutting down a service instance
 12-factor services are stopped via SIGTERM or Ctrl-C
   Your service code should intercept this, and then…
 Drain inflight requests before stopping the process (see the sketch below)
   Use an integer representing requests inflight; initialize it to 1 (not 0)
   As requests start/complete, increment/decrement the integer
   To stop, answer all future LB probes with "not ready" so the LB stops sending traffic
   When you're sure the LB has stopped sending traffic (~30 seconds), decrement the integer
   When the integer hits 0, the service process can safely terminate
   NOTE: Don't let long-running inflight requests prevent process termination
Service Configuration
& Secrets
Service (re)configuration
 Use config for info that shouldn't be in source code
   Account names, secrets (passwords/certificates), DB connection strings, etc.
   Use Cryptographic Message Syntax (CMS) to avoid clear-text secrets
 12-factor services pass config via environment variables (see the sketch below)
   Change config: stop the process & restart it with new environment variable values
 When using a rolling upgrade to reconfigure, roll back if the new config causes service instance(s) to fail
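 A minimal sketch of reading configuration from environment variables at startup; the variable names (SVC_DB_CONNECTION, SVC_LISTEN_PORT, SVC_SEND_V2_API) are made up for illustration:

   import os

   class Config:
       """Everything the service needs that isn't code comes from the environment."""
       def __init__(self):
           self.db_connection = os.environ["SVC_DB_CONNECTION"]          # fail fast if missing
           self.listen_port = int(os.environ.get("SVC_LISTEN_PORT", "8080"))
           self.send_v2_api = os.environ.get("SVC_SEND_V2_API", "false") == "true"

   config = Config()   # reconfiguring = restarting the process with new env var values

 A flag like send_v2_api is also how the second phase of a 2-phase rolling update can be switched on without redeploying code.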
Cryptographic Message Syntax (CMS)
 Use CMS to avoid cleartext secrets
 CMS encrypts/decrypts messages via RFC-3852
 Secret producer
 Encrypts cleartext secret for a recipient using a certificate and
embeds the certificate’s thumbprint in cyphertext
 Set the desired setting’s value to the cyphertext
 Secret consumer
 Get the desired setting’s cyphertext value
 Decrypts cyphertext producing the cleartext secret for use in the
service code
 Decryption automatically uses the certificate referenced by the
embedded thumbprint; the certificate must be available
Leader Election
Leader election
 Picks 1 service instance to coordinate tasks
among others
 Leader can “own” some procedure or access to some resource
 At a certain time, chose 1 instance to do billing, report generation,
etc.
 Commonly used to ensure data consistency
 Aggregates results from multiple instances together
 Conserves resources by reducing chance of work being done by
multiple
service instances
 Problem: If the leader dies, elect a new leader (quickly)
   These algorithms are hard to implement due to race conditions & …
Leader election via a lease
 All service instances execute (see the runnable sketch below):

   while (!AskDB_IsProcessingDone()) {
      bool isLeader = RequestLease()
      if (isLeader) {
         ProcessAndRenewLease()   // NOTE: may crash; lease abandoned
         TellDB_ProcessingIsDone()
      } else { /* Continuously try to become the new leader */ }
      Delay()   // Avoid DB DDoS
   }

 [Diagram: Services #1–#3 compete for a lease row in the database (Leasee, Lease expiration time, Work done); the holder of an unexpired lease is the leader; once the lease expires, another instance (ex: #3) can take it]
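 A minimal runnable sketch of the lease loop using Python's sqlite3 as a stand-in for the shared database; a real deployment would use a shared store with an atomic conditional update, and the table/column names here are invented for the example:

   import sqlite3
   import time
   import uuid

   LEASE_SECONDS = 15
   INSTANCE_ID = str(uuid.uuid4())

   db = sqlite3.connect("lease.db", isolation_level=None)   # stands in for a shared database
   db.execute("CREATE TABLE IF NOT EXISTS lease (id INTEGER PRIMARY KEY CHECK (id = 1), "
              "holder TEXT, expires REAL, done INTEGER DEFAULT 0)")
   db.execute("INSERT OR IGNORE INTO lease (id, holder, expires, done) VALUES (1, NULL, 0, 0)")

   def request_lease() -> bool:
       """Atomically take (or renew) the lease if it is free, expired, or already ours."""
       now = time.time()
       cur = db.execute(
           "UPDATE lease SET holder = ?, expires = ? "
           "WHERE id = 1 AND (holder IS NULL OR holder = ? OR expires < ?)",
           (INSTANCE_ID, now + LEASE_SECONDS, INSTANCE_ID, now))
       return cur.rowcount == 1        # we are the leader iff the conditional update applied

   def processing_done() -> bool:
       return db.execute("SELECT done FROM lease WHERE id = 1").fetchone()[0] == 1

   def leader_election_loop(process_and_renew_lease):
       """process_and_renew_lease() does the work, calling request_lease() periodically to renew."""
       while not processing_done():
           if request_lease():
               process_and_renew_lease()   # may crash; the lease simply expires and another instance takes over
               db.execute("UPDATE lease SET done = 1 WHERE id = 1 AND holder = ?", (INSTANCE_ID,))
           time.sleep(LEASE_SECONDS / 3)   # delay to avoid hammering the DB while waiting to become leader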
Leader election via queue message
 At a certain time, insert 1 msg into a queue
 All service instances execute:

   while (true) {
      Msg msg = TryDequeueMsg()
      if (msg != null) {
         /* This instance is the leader */
         ProcessMsg()   // NOTE: may crash; msg becomes visible again
         DeleteMsg(msg)
      } else { /* Continuously try to become the new leader */ }
      Delay()   // Avoid queue DDoS
   }
Data Storage Services
Data storage service considerations
 Building reliable & scalable services that
manage state is substantially harder than
building stateless services
 Due to data size/speed, partitioning, replication, leader election,
consistency, security, disaster recovery, backup/restore, costs,
administration, etc.
 So, use a robust/hardened storage service
instead
 When selecting a storage service, fully understand your service’s
requirements & the trade-offs when comparing available services
 It is common to use multiple storage services from within a single
service
Data temperature
             | Hot                 | Warm                       | Cold
Storage      | RAM, local SSD/disk | Network service & SSD/disk | Network service & tape
Latency      | ms                  | ms-sec                     | min-hour
Cost/GB      | $$-$                | $-¢¢                       | ¢
Request rate | Very high           | High                       | Low
Durability   | Low-high            | High                       | Very high
Max size     | MB-GB               | GB-TB                      | PB-EB
Item size    | B-KB                | KB-MB                      | KB-TB
A cache can improve performance but introduces stale (inconsistent) data
 [Diagram: a load balancer fronts stateless web & stateless compute tiers; they consult a cache before going to the storage service or other internal tiers]
Object Storage Services
Object storage services
 The most frequently-used storage service
   Used for documents, images, audio, video, etc.
   Fast & inexpensive: GB/month storage, I/O requests, & egress bytes
 All cloud providers offer an object storage service
   Minimal lock-in: it's relatively easy to move objects across providers if you avoid provider-specific features
 Object storage services offer public (read-only) access
   Give object URLs to clients; the URL goes to the storage service, reducing load on your other services!
How a CDN works
 [Diagram: the origin server (an object storage service in a US West datacenter) holds the object; CDN PoPs around the world cache copies of it and serve nearby clients #1 and #2]
Database Storage
Services
DB storage services
 Store many small, related entities
   Features: query, joins, indexing, sorting, stored procs, viewers/editors, etc.
 Rel-DBs (SQL) require expensive PCs for better size/perf
   For data relationships: a customer → orders
   Supports sorts, joins, & ACID updates
 NonRel-DBs (NoSQL) spread data across many cheap PCs
   For customer preferences, shopping carts, product catalogs, session state, etc.
   Cheaper, faster, bigger & flexible data models (entity ≈ in-memory object)
Relational DB vs non-relational DB
 [Diagram: left, Services #1–#5 share one relational database (a single partition) supporting complex CRUD, joins, sorts, stored procs, & cross-table transactions; right, Services #1–#5 use a non-relational database spread across Partitions #1–#3 supporting simple CRUD, with joins, sorts, etc. done elsewhere]
Data partitioning & replicas
 Data is partitioned for size, speed, or both
   Architecting a service's partitions is often the hardest part of designing a service
   Cross-partition ops require network hops & different/distributed transactions
   How many partitions depends on how much data you'll have in the future, and how you intend to access that data
 Each partition's data is replicated for reliability
   Replicating state increases the chance of data surviving 1+ simultaneous failures
   But more replicas increase cost & the network latency to sync replicas
   For some scenarios, data loss is OK
Replication: No failure scenario (consistency & availability)
 [Diagram: a load balancer fronts a database with three replica stores; with no failure, every replica holds the same records (AAA, BBB), so the DB is both consistent and available]
Data Consistency
Data consistency
 Strong: 2+ records form the relationship at the same time
   ACID transactions: Atomicity, Consistency, Isolation, Durability
   Goal: it looks like 1 thing at a time is happening, even if the work is complex
   Done via distributed txs/locks across stores; hurts perf & not fault tolerant
 Weak: 2+ records form the relationship eventually
   BASE transactions: Basically Available, Soft state, Eventual consistency
   Done via communication retries & idempotency across stores
 CAP theorem states
   When facing a network Partition (stores can't talk to each other), you must choose between Consistency & Availability
Replication: Network partition (failure)
 [Diagram: a network partition separates the database's replica stores; one side has applied a change (AAA) while the other still holds the old value (BBB)]
 Consistency: if enough stores don't ack the change, the DB won't respond, to avoid returning inconsistent data; a new store may come up
 Availability: stores don't have to ack the change; the DB may respond with inconsistent data (AAA or BBB)
Consistency or availability: which is
better?
 Businesses love the service responding to
customers
 Developers love trusting data; but do you
really get this?
 No distributed tx across 2+ services' DBs
   Ex: You can't atomically transfer an item from the Inventory service → Order service
 Web page/cache data gets out of sync with back-end truth
 CQRS: writes are asynchronous; reads are synchronous
 Apology-based computing
 If software models the real world, then the real world is the truth
 Physical example: Item physically destroyed during shipping
CQRS: Command Query Responsibility Segregation
 Decouples the command & query data models
   Each view can be complex (with relations) & (re)built in the background
 [Diagram: the user interface sends commands to the service's command processor, which writes to the command data store (tables) and returns an ACK; data flows with eventual consistency into the query data store (views), which the query processor reads to answer queries]
Event Sourcing
 Commonly used with CQRS & Big Data
 Save events in append-only & immutable tables (see the sketch below)
 [Example: the event source table for Jeffrey's account holds +$0.00 "New account" (2017-04-01T09:00:05), +$100.00 "Paycheck" (2017-04-16T08:28:36), and -$52.35 "Restaurant" (2017-04-17T01:05:22); the snapshot view for all accounts shows Jeffrey at +$47.65 as of 2017-04-17T01:05:22]
 Pros
   • When reading, no event locking (good perf)
   • Write bugs are unlikely & can't corrupt immutable data
   • Easy to (re)build today, historical, or audit views
 Cons
   • Boundless storage (but it's cheap)
   • Replaying data is time-consuming (improve with periodic snapshots)
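 A minimal sketch of the idea: append immutable events and fold them to (re)build a snapshot view; it reproduces the $47.65 balance from the example above (the tuple layout is illustrative):

   from datetime import datetime

   # Append-only event stream for one account: (timestamp, amount, memo)
   events = [
       (datetime(2017, 4, 1, 9, 0, 5),     0.00, "New account"),
       (datetime(2017, 4, 16, 8, 28, 36), 100.00, "Paycheck"),
       (datetime(2017, 4, 17, 1, 5, 22),  -52.35, "Restaurant"),
   ]

   def append(event):
       events.append(event)      # events are only ever appended, never updated or deleted

   def snapshot(as_of=None):
       """Rebuild the account balance by replaying events up to a point in time."""
       return sum(amount for ts, amount, _ in events if as_of is None or ts <= as_of)

   print(snapshot())                                    # 47.65 -- today's view
   print(snapshot(datetime(2017, 4, 16, 23, 59, 59)))   # 100.00 -- a historical view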
Implementing eventual consistency
 A client can determine when data is consistent
 Client reads entities A & B; If A references B but B doesn’t reference A
(yet),
client assumes the relationship doesn’t exist (yet)
 Use fault-tolerant message queues & the Saga
pattern
to guarantee that all operations eventually
complete
 Saga pattern compromises atomicity in order to give greater
availability
Saga pattern
 The Saga Execution Controller (SEC) attempts txs in concurrent or risk-centric order
   For fault tolerance, operations may be retried & must be idempotent
 2 recovery modes (see the sketch below)
   Backwards: If a tx fails, undo all successful txs
   Forwards: Retry every tx until all are successful
 [Diagram: Trip Sagas are queued to the Saga Execution Controller, which calls the Rental Car, Hotel, and Airplane Reservation Services]
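 A minimal sketch of the backwards-recovery mode: each step pairs an action with a compensating action, and if any step fails the saga undoes the completed ones in reverse order. The reservation/cancellation functions named in the comment are placeholders, not a real API:

   def run_saga(steps):
       """steps: list of (action, compensation) callables. Backwards recovery on failure."""
       completed = []
       for action, compensation in steps:
           try:
               action()                          # may be retried internally; must be idempotent
               completed.append(compensation)
           except Exception:
               for undo in reversed(completed):  # undo all successful txs, newest first
                   undo()
               raise

   # Illustrative trip-booking saga (reserve_* / cancel_* are stand-ins):
   # run_saga([(reserve_car, cancel_car), (reserve_hotel, cancel_hotel), (reserve_flight, cancel_flight)])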
Concurrency, Versioning,
& Backup
Concurrency control
 A data entity maintains its integrity for
competing accessors via concurrency control
 Pessimistic: accessor locks 1+ entries (blocking
other accessors), modifies entries, & then
unlocks them
 Bad scalability (1 accessor at a time) & what if locker fails to release
the lock?
 Optimistic: accessor gets 1+ entries with
version IDs (etags), modifies entries’ values, &
updates entries if versions haven’t changed
(still contain the original values)
Optimistic concurrency: two instances adding $23 & $32 to Jeff's balance
 [Diagram: both services read Jeff's account (Version 0001, Balance $100) from the DB partition. Service #1 writes $123 conditioned on Version 0001; it succeeds and the version becomes 0002. Service #2's write of $132 conditioned on Version 0001 fails, so it re-reads (Version 0002, $123), recomputes, and writes $155, producing Version 0003. See the sketch below.]
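 A minimal sketch of that read-modify-write loop against a hypothetical store exposing get(key) -> (value, version) and put_if_version(key, value, expected_version) -> bool (an etag-style conditional update; the method names are invented for illustration):

   def add_to_balance(store, account_key, amount, max_attempts=10):
       """Optimistic concurrency: retry the conditional write until no one races us."""
       for _ in range(max_attempts):
           balance, version = store.get(account_key)        # read the value + its version (etag)
           new_balance = balance + amount                    # compute off the value we read
           if store.put_if_version(account_key, new_balance, version):
               return new_balance                            # nobody changed it since we read it
           # else: another writer won (the version moved on); re-read and try again
       raise RuntimeError("too much contention on " + account_key)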
Versioning data schemas
 Use formal, language-agnostic data schemas
 The data is the truth; not the language data type
 All data must specify version info starting with
v1
 New services must be infinitely backward
compatible
 Service v1 might create an entry that isn’t accessed for years
 During rolling update, v1 & v2 instances run
together
 Failure occurs if v2 instance writes v2 data schema & v1 instance
reads it
 Fix by performing a 2-phase update
Backup & restore
 Needed to recover from corrupt data due to
coding bug or a hacker attack
 Detecting data corruption is a hard domain-specific problem
 You periodically backup data in order to restore
it to a known good state
 NOTE: Restore usually incurs some downtime & data loss
 Many DBs don’t support making a consistent backup across all
partitions
 Try hard to avoid cross-partition relationships
 Incremental backups are faster than a full backup but hurt restore
performance
 Make sure you test restore
Recovery point & time objectives
 Recovery Point Objective (RPO)
   Maximum data (in minutes) the business can afford to lose
 Recovery Time Objective (RTO)
   Maximum downtime the business can afford while restoring data
 Data loss & recovery cannot be completely prevented
   The earth is a single point of failure
   Deciding RPO & RTO is mostly a business decision
   NOTE: The smaller the RPO/RTO, the more expensive it is to run the service
Disaster Recovery (DR)
Disaster recovery
 Dealing with a datacenter outage
   Code: Easy, upload it to another DC
   Data: Hard, data must be replicated across DCs
   Latency: ~133ms round trip for ½ way around the earth (best case)
 Create similar clusters in different geographical regions
 When data changes in a cluster, replicate it to the other cluster
   Usually batch changes & replicate periodically
   The delay is the RPO, as the first cluster could die before sending the next batch
   The more clusters, the more …
Active/passive architecture
 Datacenter-A (active) takes traffic & periodically replicates data changes to Datacenter-B (passive)
   DC-A handles all traffic spikes
   DC-B has wasted capacity
   Code development is easy
   Failover is infrequently tested
   Admin decides when to fail over & manually initiates it
 [Diagram: traffic flows to Datacenter-A (Active), which replicates to Datacenter-B (Passive)]
Active/active architecture
 Datacenters A & B both take traffic & periodically replicate data changes to the other DC
   Both DCs handle spikes
   Less expensive & less wasted capacity
   Continuously tested
   Development is harder: data inconsistency or dual reads
   Failover is fast & automatic
 [Diagram: traffic flows to both Datacenter-A (Active) and Datacenter-B (Active), which replicate to each other]