REPEAT 2 Architecture Patterns For Multi-Region Active-Active ARC213-R2
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Agenda
Design principles
Foundational pillars
Werner Vogels
CTO Amazon.com
Guarding against failure of your applications in one region
Applications in Canada: Service 1, Service 2, Service 3, Service 4
Applications in Mumbai: Service 1, Service 2, Service 3, Service 4
2. Waste money
Advantage of multi-region active-active architecture
Backbone
Region A Region B
Minimal data replication requirements
Transactions
Catalogue information
Events, objects
Server logs
High-level metrics
monitoring
Pattern 1: Read local, write global
Users in India -> Web server (India): read & write
Users in Canada -> Web server (Canada): read; write goes to the India region
Users in India -> Web server (India): read + write
Users in Canada -> Web server (Canada): read + write
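The "read local, write global" pattern can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the presenters' implementation: the `Region` and `ReadLocalWriteGlobal` classes, the in-memory dictionaries, and the `replicate()` step are all hypothetical stand-ins for real data stores and asynchronous replication.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """A hypothetical region holding a local copy of the data."""
    name: str
    store: dict = field(default_factory=dict)


class ReadLocalWriteGlobal:
    """Route reads to the caller's region and all writes to one write region."""

    def __init__(self, regions, write_region):
        self.regions = {r.name: r for r in regions}
        self.write_region = write_region

    def read(self, user_region, key):
        # Reads never leave the user's region, so they may return stale data.
        return self.regions[user_region].store.get(key)

    def write(self, user_region, key, value):
        # Every write goes to the single write region, wherever the user is.
        self.regions[self.write_region].store[key] = value
        return self.write_region

    def replicate(self):
        # Stand-in for asynchronous replication out of the write region.
        source = self.regions[self.write_region].store
        for name, region in self.regions.items():
            if name != self.write_region:
                region.store.update(source)


router = ReadLocalWriteGlobal([Region("india"), Region("canada")], write_region="india")
router.write("canada", "cart:42", ["book"])
before = router.read("canada", "cart:42")  # not yet replicated to Canada
router.replicate()
after = router.read("canada", "cart:42")   # visible locally after replication
```

The gap between `before` and `after` is the eventual-consistency window that users in the non-write region have to tolerate.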
Eventual consistency
Idempotency
Static stability
Exponential backoff
Circuit breaking
Throttling
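Two of the design principles listed here, idempotency and exponential backoff, work together: a call can only be retried safely if repeating it has no extra effect. The sketch below shows one common way to retry with capped, jittered exponential delays. The function names, default values, and the "full jitter" choice are assumptions made for illustration and do not come from the slides.

```python
import random
import time


def backoff_delays(base=0.1, cap=5.0, attempts=5, rng=random.random):
    """Yield one capped, jittered delay per retry (the "full jitter" variant)."""
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng() * ceiling


def call_with_retries(operation, attempts=5, sleep=time.sleep, **backoff):
    """Call operation(); on failure wait a growing, jittered delay and retry.

    The operation is assumed to be idempotent, otherwise retrying is unsafe.
    """
    delays = list(backoff_delays(attempts=attempts - 1, **backoff))
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:  # a real client would catch specific, retryable errors
            if attempt == attempts - 1:
                raise
            sleep(delays[attempt])


calls = {"count": 0}


def flaky():
    # Hypothetical dependency that fails twice, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = call_with_retries(flaky, sleep=lambda _: None)
```

Passing `sleep` as a parameter keeps the example fast to run; in production code the default `time.sleep` would apply the delays.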
High availability
Region topology
[Figure: an AWS Region made up of AZs and Transit AZs; an AWS Availability Zone (AZ) made up of data centers; world map showing the number of AZs per Region]
Ohio
Availability Zone A Availability Zone B Availability Zone C
VPC
High availability
Data replication
S3 cross-region replication
Region A Region B
Backbone
• Automatically replicate data to any other AWS Region
• Replicate by object, bucket, or prefix
• Replication time control
S3 replication metrics
BytesPendingReplication
ReplicationLatency
OperationsPendingReplication
Amazon Elastic Block Store snapshots
• Point-in-time backup
• Stored in S3
• Incremental
• Cross-region copy
EBS volume -> EBS snapshot
DynamoDB Global Tables
Replica (Europe)
Replica (US)
Replica (Asia)
Global Tables management
Amazon CloudWatch metrics
• ReplicationLatency: Elapsed time of propagating an update from one replica to another
• PendingReplicationCount: Number of items written to one replica but not yet propagated to other regions
Amazon RDS cross-region replication
[Diagram: one Master with four cross-region Replicas]
Multi-region consolidation of analytics data
US East
Amazon Redshift
US West
VPC VPC
Database Database
AWS Global Infrastructure
VPC VPC
AWS backbone
VPC peering
Database Database
Foundational pillars of a multi-region active-active architecture
Resource A
Amazon
Route 53
Resource B
*Latency numbers are only examples
Traffic routing with Amazon Route 53
Latency-based routing
Geolocation routing
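The two routing policies named above differ only in what drives the decision: measured latency versus the client's location. The sketch below illustrates that decision logic only. It does not call the Route 53 API, and the function names, latency figures, and country mapping are hypothetical.

```python
def latency_based(latencies_ms):
    """Return the region with the lowest measured latency for this client."""
    return min(latencies_ms, key=latencies_ms.get)


def geolocation(client_country, country_to_region, default_region):
    """Return the region mapped to the client's country, else a default."""
    return country_to_region.get(client_country, default_region)


# Hypothetical numbers and mappings, for illustration only.
latencies = {"ca-central-1": 18, "ap-south-1": 210}
geo_map = {"CA": "ca-central-1", "IN": "ap-south-1"}

chosen_by_latency = latency_based(latencies)
chosen_by_geo = geolocation("IN", geo_map, default_region="ca-central-1")
```

Latency-based routing adapts to network conditions, while geolocation routing gives a fixed, predictable mapping, which can matter when data must stay in a particular region.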
Resource A
in Canada Central
[Diagram: example DNS answers returned per resource, e.g. 54.86.52.59, 52.45.82.211, 1.2.3.4, 2.3.4.5, 3.4.5.6]
Canada Central
Instances
Instances
Traffic routing with Amazon CloudFront + Lambda@Edge
Canada Central
Instances
Instances
Foundational pillars of a multi-region active-active architecture
StackSet https://amzn.to/2tNVHQl
● Work experience:
○ Network Engineer
○ Corporate IT
○ Small Startups
○ Freelance Work
○ LinkedIn (professional social network)
○ Wish (mobile-first ecommerce platform)
About Us
Who We Are
Leading mobile commerce platform in US and EU.
Our Mission
To offer the most affordable, convenient, and effective mobile shopping mall in the world.
Global Reach
500M+ Users
1M+ Merchants
200M+ Items
Wish (the company)
600+ Employees
7 Offices
$11.2B Valuation
High-Level Architecture (before)
● Read-heavy application
○ ~90%+ reads
○ Globally sharded+replicated database
[Diagram: AWS Cloud and On-premises data centers DC1, DC2, DC3, DC4 joined by a Backbone]
Cross-AZ transfer costs
● What?
○ Service fan-out means at each hop we have a chance for cross-AZ traffic
[Chart: Bandwidth total (Gbps)]
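To see why fan-out makes cross-AZ traffic hard to avoid, the per-hop chance can be worked out under two simplifying assumptions that are not stated on the slide: each hop picks its target AZ uniformly at random, and hops are independent. The function names below are hypothetical.

```python
def cross_az_probability_per_hop(num_azs):
    """Chance one hop leaves the caller's AZ when targets are spread evenly."""
    return (num_azs - 1) / num_azs


def probability_any_cross_az(num_azs, hops):
    """Chance that at least one of `hops` independent hops crosses an AZ."""
    stay_local = 1 / num_azs
    return 1 - stay_local ** hops


per_hop = cross_az_probability_per_hop(3)    # 2/3 with three AZs
three_hops = probability_any_cross_az(3, 3)  # 1 - (1/3)**3 = 26/27
```

Under these assumptions, a request that fans out across three hops in a three-AZ region almost always crosses an AZ boundary at least once, unless routing deliberately prefers targets in the caller's own AZ.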
● Monitoring
○ Prometheus: scraping, storage, alerting
○ Promxy: aggregation, alerting
○ Trickster: caching
High-Level Monitoring Architecture (before)
Promxy
● Scalability
○ ICE: InsufficientInstanceCapacity
● The cloud is elastic, until it isn’t
○ An issue in “crunch” times
○ An instance type might become “hot” in one or more AZs
○ Mitigate by using many instance types
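The mitigation above, spreading across many instance types, can be sketched as a fallback loop. This is an illustration under assumptions rather than the speakers' tooling: `launch_with_fallback`, `fake_launch`, and the exception class are hypothetical, and a real implementation would call the cloud provider's launch API and catch its capacity error instead.

```python
class InsufficientInstanceCapacity(Exception):
    """Stand-in for the ICE error named on the slide."""


def launch_with_fallback(launch, instance_types, availability_zones):
    """Try each (instance type, AZ) pair in order until one launch succeeds."""
    tried = []
    for instance_type in instance_types:
        for az in availability_zones:
            try:
                return launch(instance_type, az), tried
            except InsufficientInstanceCapacity:
                tried.append((instance_type, az))
    raise InsufficientInstanceCapacity(f"no capacity in any of {tried}")


def fake_launch(instance_type, az):
    # Hypothetical provider call: pretend c5.xlarge is "hot" in every AZ.
    if instance_type == "c5.xlarge":
        raise InsufficientInstanceCapacity(instance_type)
    return f"{instance_type}@{az}"


instance, skipped = launch_with_fallback(
    fake_launch, ["c5.xlarge", "m5.xlarge"], ["us-east-1a", "us-east-1b"]
)
```

The order of `instance_types` expresses preference, so the cheapest or best-fitting type is tried first and the alternatives absorb demand only when that type runs out of capacity.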
Why?
● Availability
● At our scale, an outage is too costly
○ Any DR adds complexity, not always worth it
The plan (v1): Active/Active
● Deprecations
○ Chef
○ Icinga
○ Graphite
○ Outdated OS versions
○ Etc.
The plan (v1): Newer systems
● Saltstack
● Kubernetes
● Prometheus
The plan (v1): Change plan
● Context/background
○ Very little cross-AZ traffic, but some
○ Need to support cross-region as well (e.g., notifications cluster)
● Solutions
○ Inter-region VPC peering for AWS <-> AWS
○ Backbone for Colo <-> Colo
Hurdle 2: Data consistency
[Diagram: Region Replication across DC1, DC2, DC3, DC4; P = Primary (read & write), S = Secondary (read); one P and three S]
Hurdle 2: Data consistency
● S3
○ Images, static content, etc.
○ Unidirectional cross-region replication
AWS Cloud
Region Region
Replication over AWS backbone
Hurdle 3: Traffic routing
AWS Cloud
Region Region
Global Promxy Global Alerts Global Promxy
Regional Alerts
Regional Promxy Regional Promxy
Local Alerts
The best laid plans of mice and men
often go awry
- Robert Burns
Adjustments
AWS Cloud
Region Region
On-premises
Backbone
DC1 DC2 DC3 DC4
S S P S
High-Level Architecture (after)
AWS Cloud
Region Region
read
write & consistent read
On-premises
Backbone
DC1 DC2 DC3 DC4
S S P S
Takeaways
Visit aws.amazon.com/training/path-architecting/
Thank you!
Girish Dilip Patil: linkedin.com/in/girish-cloud
Jonathan Dion: linkedin.com/in/jotdion, @jotdion
Thomas Jackson: [email protected], linkedin.com/in/jacksontj, github.com/jacksontj