DB Scalable
• In the end you want to build a Web 2.0 app that can serve millions of users with ZERO
downtime
The Variables
• Scalability - Number of users / sessions / transactions / operations
the entire system can perform
• Performance – Optimal utilization of resources
• Responsiveness – Time taken per operation
• Availability - Probability of the application or a portion of the
application being available at any given point in time
• Downtime Impact - The impact of a downtime of a
server/service/resource - number of users, type of impact etc
• Cost
• Maintenance Effort
High: scalability, availability, performance & responsiveness
Low: downtime impact, cost & maintenance effort
The Factors
• Platform selection
• Hardware
• Application Design
• Database/Data store Structure and Architecture
• Deployment Architecture
• Storage Architecture
• Abuse prevention
• Monitoring mechanisms
• … and more
Let's Start …
• We will now build an example architecture for an
example app using the following iterative incremental
steps –
• Inspect current Architecture
• Identify Scalability Bottlenecks
• Identify SPOFs and Availability Issues
• Identify Downtime Impact Risk Zones
• Apply one of -
• Vertical Scaling
• Vertical Partitioning
• Horizontal Scaling
• Horizontal Partitioning
• Repeat process
Step 1 – Let's Start …
[Diagram: Appserver and DBServer on a single node]
Step 2 - Vertical Scaling
• Introduction
• Increasing the hardware resources without changing the number of nodes
• Referred to as “Scaling up” the Server
• Advantages
• Simple to implement
• Disadvantages
• Finite limit
• Hardware does not scale linearly (diminishing returns for each incremental unit)
• Requires downtime
• Increases downtime Impact
• Incremental costs increase exponentially
[Diagram: one Appserver/DBServer node with additional CPUs and RAM]
Step 3 – Vertical Partitioning (Services)
• Introduction
• Deploying each service on a separate node
• Positives
• Increases per application Availability
• Task-based specialization, optimization and tuning possible
• better cache performance
• Reduces context switching
• No changes to App required
• Flexibility increases
• Example
• www.blah.com
• mail.blah.com
• images.blah.com
• shopping.blah.com
• my.blah.com
• etc.
[Diagram: AppServer and DBServer nodes]
Vertical Partitioning (Services)
• Disadvantages
• lower peak capacity
• sub-optimal resource utilization
• coarse load balancing across servers/services
• finite Scalability
• management costs
Understanding Vertical Partitioning
• The term Vertical Partitioning denotes –
• Increase in the number of nodes by distributing the tasks/functions
• Each node (or cluster) performs separate Tasks
• Each node (or cluster) is different from the other
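A minimal sketch of task-based routing, assuming requests are dispatched by hostname. The hostnames are the slide's own examples; the node names and the `node_for` helper are hypothetical.

```python
# Hypothetical mapping from a service hostname to the node (or cluster) that runs it.
SERVICE_NODES = {
    "www.blah.com": "node-www",
    "mail.blah.com": "node-mail",
    "images.blah.com": "node-images",
    "shopping.blah.com": "node-shopping",
    "my.blah.com": "node-my",
}


def node_for(host: str) -> str:
    """Return the node dedicated to the service named by the request's host."""
    try:
        return SERVICE_NODES[host.lower()]
    except KeyError:
        raise LookupError(f"no node is deployed for service {host!r}") from None
```

Each service maps to exactly one entry, so one node cannot absorb another node's load. That is the coarse load balancing listed among the disadvantages.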
Understanding Horizontal Scaling
• The term Horizontal Scaling denotes –
• Increase in the number of nodes by replicating the nodes
• Each node performs the same Tasks
• Each node is identical
• Typically, the collection of nodes may be known as a cluster
• Also referred to as “Scaling Out”
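Because every node is identical, any node can serve any request. A minimal sketch, assuming a simple round-robin policy (the slides do not name one); the node names are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hand each incoming request to the next node in a fixed rotation."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one node is required")
        self._cycle = itertools.cycle(list(nodes))

    def pick(self) -> str:
        return next(self._cycle)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
picks = [balancer.pick() for _ in range(6)]
```

Scaling out then amounts to adding another identical entry to the node list.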
Load Balancer – Session Management
• Central Session Store
• Introduces SPOF
• An additional variable
• Session reads and writes generate Disk + Network I/O
• Also known as a Shared Session Store Cluster
[Diagram: three AppServers sharing one Session Store]
Load Balancer – Session Management
• Clustered Session Management
• Easier to set up
• No SPOF
• Network I/O grows rapidly with the number of nodes (roughly quadratically if every node replicates to every other node)
• In very rare circumstances a request may get stale session data
• User request reaches subsequent node faster than inter-node message
• Inter-node communication fails
[Diagram: Load Balancer in front of three AppServers]
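A rough model of the replication traffic, assuming every session write is copied to every other node. The slides do not specify the replication scheme; `replication_messages` and the write counts are hypothetical.

```python
def replication_messages(nodes: int, writes_per_node: int) -> int:
    """Messages sent when every session write is copied to every other node."""
    if nodes < 1 or writes_per_node < 0:
        raise ValueError("nodes must be >= 1 and writes_per_node >= 0")
    return nodes * writes_per_node * (nodes - 1)


# Same write rate per node, growing cluster size.
for n in (2, 4, 8, 16):
    print(n, replication_messages(n, writes_per_node=100))
```

Under this assumption, doubling the node count roughly quadruples the message count. That is why the recommendation limits this approach to a smaller number of app servers.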
Load Balancer – Session Management
• Recommendation
• Use Clustered Session Management if you have –
• Smaller Number of App Servers
• Fewer Session writes
• Use a Central Session Store elsewhere
• Use sticky sessions only if you have to
Load Balancer – Removing SPOF
Active-Passive LB
• Multi-Master
• Writes can be sent to any of the multiple masters which replicate them
to other masters and slaves
• Conflict Management required
• Deadlocks possible if same data is simultaneously modified at multiple
places
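A minimal sketch of why multi-master needs conflict management, assuming each master keeps a per-key version counter and rejects a remote update built on a version it has already moved past. This detection scheme and the `MasterNode` class are illustrative assumptions, not a method the slides prescribe.

```python
class MasterNode:
    """Toy master that accepts local writes and applies updates from peers."""

    def __init__(self, name):
        self.name = name
        self.versions = {}  # key -> (value, version)

    def write(self, key, value):
        """Apply a local write and return the update to ship to other masters."""
        _, base = self.versions.get(key, (None, 0))
        self.versions[key] = (value, base + 1)
        return (key, value, base)  # base = version this write was built on

    def apply_remote(self, update):
        """Apply a peer's update; return False when it conflicts with local state."""
        key, value, base = update
        _, local = self.versions.get(key, (None, 0))
        if local != base:
            return False  # both masters changed the key from the same base
        self.versions[key] = (value, base + 1)
        return True


a, b = MasterNode("a"), MasterNode("b")
clean = b.apply_remote(a.write("city", "Pune"))  # only one master wrote: no conflict

from_a = a.write("name", "x")  # the same key is modified on both masters
from_b = b.write("name", "y")  # before either update has been replicated
conflict_on_b = not b.apply_remote(from_a)
conflict_on_a = not a.apply_remote(from_b)
```

The sketch only detects the conflict. Choosing which write survives is the conflict-management policy the slide says is required.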
Replication Considerations
• Asynchronous
• Guaranteed, but out-of-band replication from Master to Slave
• Master updates its own db and returns a response to client
• Replication from Master to Slave takes place asynchronously
• Faster response to a client
• Slave data is marginally behind the Master
• Requires modification to App to send critical reads and writes to
master, and load balance all other reads
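The application change described above can be sketched as a small router. This is a sketch only: `ReplicaRouter`, its method names, and the server names are hypothetical, and round-robin is an assumed balancing policy.

```python
import itertools


class ReplicaRouter:
    """Send writes and critical reads to the master; rotate other reads over slaves."""

    def __init__(self, master, slaves):
        self.master = master
        # Fall back to the master when there are no slaves to read from.
        self._readers = itertools.cycle(list(slaves) or [master])

    def for_write(self):
        return self.master

    def for_read(self, critical=False):
        # A critical read must not see lagging data, so it bypasses the slaves.
        return self.master if critical else next(self._readers)


router = ReplicaRouter("master", ["slave-1", "slave-2"])
```

A read that must see the user's own latest write would pass `critical=True`. Everything else tolerates the slaves being marginally behind.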
Replication Considerations
• Synchronous
• Guaranteed, in-band replication from Master to Slave
• Master updates its own db, and confirms all slaves have updated their
db before returning a response to client
• Slower response to a client
• Slaves have the same data as the Master at all times
• Requires modification to App to send writes to master and load balance
all reads
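A toy in-memory model contrasting the two modes, assuming a single master and dictionary-backed nodes. `Node`, `ReplicatingMaster`, and `flush` are hypothetical names; `flush` stands in for the background replication a real database would run.

```python
class Node:
    def __init__(self):
        self.data = {}


class ReplicatingMaster(Node):
    def __init__(self, slaves, synchronous):
        super().__init__()
        self.slaves = slaves
        self.synchronous = synchronous
        self.pending = []  # updates not yet shipped to the slaves (async only)

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            for slave in self.slaves:  # in-band: finish before responding
                slave.data[key] = value
        else:
            self.pending.append((key, value))  # out-of-band: ship later
        return "ok"

    def flush(self):
        """Deliver queued updates to every slave."""
        for key, value in self.pending:
            for slave in self.slaves:
                slave.data[key] = value
        self.pending.clear()


async_slave = Node()
async_master = ReplicatingMaster([async_slave], synchronous=False)
async_master.write("balance", 10)
stale = async_slave.data.get("balance")  # slave has not received the update yet
async_master.flush()
fresh = async_slave.data.get("balance")  # slave has caught up

sync_slave = Node()
sync_master = ReplicatingMaster([sync_slave], synchronous=True)
sync_master.write("balance", 10)
in_step = sync_slave.data.get("balance")  # slave updated before the write returned
```

In the synchronous case the write returns only after every slave is updated, which is where the slower client response comes from.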
Replication Considerations
• Replication at RDBMS level
• Support may exist in the RDBMS or through a third-party tool
• Faster and more reliable
• App must send writes to Master, reads to any db and critical
reads to Master
• Replication at Driver level
• Driver layer ensures
• writes are performed on all connected DBs
• Reads are load balanced
• Critical reads are sent to a Master
• In most cases RDBMS agnostic
• Slower and in some cases less reliable
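A minimal sketch of driver-level replication, using in-memory SQLite databases from Python's standard library as stand-ins for separate DB servers. `ReplicatingDriver` is hypothetical, and treating the first connection as the master is an assumption.

```python
import itertools
import sqlite3


class ReplicatingDriver:
    """Toy driver layer: fan writes out to every DB, spread reads across them."""

    def __init__(self, connections):
        if not connections:
            raise ValueError("at least one connection is required")
        self._connections = list(connections)
        self._master = self._connections[0]  # assumption: first DB acts as master
        self._readers = itertools.cycle(self._connections)

    def write(self, sql, params=()):
        # No cross-DB transaction: a failure part-way leaves the DBs out of step.
        for conn in self._connections:
            conn.execute(sql, params)
            conn.commit()

    def read(self, sql, params=(), critical=False):
        conn = self._master if critical else next(self._readers)
        return conn.execute(sql, params).fetchall()


dbs = [sqlite3.connect(":memory:") for _ in range(3)]
driver = ReplicatingDriver(dbs)
driver.write("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
driver.write("INSERT INTO users (name) VALUES (?)", ("alice",))
rows = [driver.read("SELECT name FROM users") for _ in range(3)]
```

The write loop hints at why this approach is slower and in some cases less reliable: every write waits for all databases, and nothing here recovers from a partial failure.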
Real Application Cluster
• All DB Servers in the cluster share a common storage area on a SAN
[Diagram: three DBServers attached to a shared SAN]
• Use Master-Slave Async replication
• Write your layer to ensure
• writes are sent to a single DB
• reads are load balanced
[Diagram: one DBServer above two DBServers, backed by a SAN]
Step 7 – Vertical / Horizontal Partitioning (DB)
• Introduction
• Increasing the number of DB Clusters by dividing the data
• Options
• Vertical Partitioning - Dividing tables / columns
• Horizontal Partitioning - Dividing by rows (value)
[Diagram: Load Balanced App Servers in front of a DB Cluster of three DBs on a SAN]
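A minimal sketch of horizontal partitioning, assuming rows are divided by a numeric user id with a modulo rule. The slides only say "by rows (value)"; the cluster names and `cluster_for_user` are hypothetical.

```python
DB_CLUSTERS = ["db-cluster-0", "db-cluster-1", "db-cluster-2"]  # hypothetical names


def cluster_for_user(user_id: int) -> str:
    """Pick the cluster that stores every row belonging to this user id."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    return DB_CLUSTERS[user_id % len(DB_CLUSTERS)]
```

Every cluster holds the same tables but a different subset of rows. With a plain modulo rule, changing the number of clusters remaps most ids, so a lookup table or a similar indirection may be preferable in practice.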
Vertical Partitioning (DB)
• Take a set of tables and move them onto another DB
• E.g. in a social network - the users table and the friends table can be on separate DB clusters
• Each DB Cluster has different tables
[Diagram: App Cluster connected to DB Cluster 1 and DB Cluster 2, each with its own DBs and SAN]
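A minimal sketch of the social network example, using two in-memory SQLite databases as stand-ins for DB Cluster 1 and DB Cluster 2. The table columns, sample rows, and `friend_names` helper are hypothetical.

```python
import sqlite3

users_db = sqlite3.connect(":memory:")    # stands in for DB Cluster 1
friends_db = sqlite3.connect(":memory:")  # stands in for DB Cluster 2

users_db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
users_db.executemany(
    "INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob"), (3, "carol")]
)
friends_db.execute("CREATE TABLE friends (user_id INTEGER, friend_id INTEGER)")
friends_db.executemany("INSERT INTO friends VALUES (?, ?)", [(1, 2), (1, 3)])


def friend_names(user_id):
    """Join users and friends in the application, since no single DB holds both."""
    ids = [
        row[0]
        for row in friends_db.execute(
            "SELECT friend_id FROM friends WHERE user_id = ? ORDER BY friend_id",
            (user_id,),
        )
    ]
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    rows = users_db.execute(
        f"SELECT name FROM users WHERE id IN ({placeholders}) ORDER BY id", ids
    )
    return [row[0] for row in rows]
```

Once the tables sit on separate clusters, a SQL join across them is no longer available, so the application performs the lookup in two steps.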
Thank you!