How To Design A System To Scale To Your First 100 Million Users - by Anh T. Dang - Level Up Coding
But the current architecture has the following disadvantages.
There is no failover or redundancy: if the server goes down, the whole
system goes down.
Usually, the Domain Name System (DNS) is a paid service provided by the
hosting company rather than something running on your own server.
We must decide how to scale the system. There are two types of scaling:
scale-up and scale-out.
Scaling up: add more RAM and CPU to the existing server
This is also called "vertical scaling": maximizing the resources of a
system to expand its ability to handle an increasing load. For example,
we give our server more power by adding RAM and CPU.
Vertical scaling is a good option for small systems that can afford the
hardware upgrade, but it comes with serious limitations as follows.
When we upgrade the RAM in the system, we must shut down the server, so
if the system has only one server, downtime is unavoidable.
Scaling up applies not only to hardware but also to software; for
example, it includes optimizing queries and application code.
With the growth of the number of users, one server is never enough
Horizontal scaling often costs more initially because we need several
servers for even the most basic setup, but it pays off at a later stage.
We need to make trade-offs.
The code of the system also needs changes to allow parallelism and
distribution of work among multiple servers.
Typically a load balancer sits between the client and the server,
accepting incoming network and application traffic and distributing it
across multiple backend servers using various algorithms. It can be
placed in various spots: for example, between the clients and the web
servers, and between the web servers and the database servers.
HAProxy and NGINX are two popular open-source load balancing
software.
If server 1 goes offline, all the traffic will be routed to server 2 and
server 3, so the website won't go offline. You can also add a new
healthy server to the server pool to balance the load.
When the traffic is growing rapidly, you only need to add more servers to
the web server pool and the load balancer will route the traffic for you.
Fastest response time: the request is directed to the server with the
fastest (recent or average) response time.
Weighted: under the weighted strategy, more powerful servers receive
more requests than weaker ones.
IP Hash: a hash of the client's IP address is computed to direct the
request to a server, so the same client consistently reaches the same
server.
Layer 4: the load balancer uses the information provided by TCP at the
transport layer. At this layer, it usually selects a server without
looking at the content of the request.
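To make the strategies above concrete, here is a minimal sketch of three of them in Python. The server names and weights are made up for illustration; a real load balancer would also track health checks and live response times.

```python
import hashlib
import itertools

class LoadBalancer:
    """Toy load balancer illustrating round-robin, weighted, and IP-hash."""

    def __init__(self, servers):
        # servers: list of (name, weight) pairs, e.g. [("web1", 1), ("web2", 2)]
        self.servers = servers
        self._rr = itertools.cycle([name for name, _ in servers])
        # Weighted round-robin: repeat each server in the rotation by its weight,
        # so stronger servers receive proportionally more requests.
        self._wrr = itertools.cycle(
            [name for name, weight in servers for _ in range(weight)]
        )

    def round_robin(self):
        # Plain rotation: each call returns the next server in order.
        return next(self._rr)

    def weighted(self):
        # Stronger servers appear more often in the expanded rotation.
        return next(self._wrr)

    def ip_hash(self, client_ip):
        # Hash the client's IP so the same client always lands on the same server.
        digest = hashlib.sha256(client_ip.encode()).hexdigest()
        return self.servers[int(digest, 16) % len(self.servers)][0]
```

For example, with servers `[("web1", 1), ("web2", 2)]`, the weighted rotation returns web1 once for every two times it returns web2.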
SQL tuning.
Master-slave replication
The master-slave replication technique enables data from one database
server (the master) to be replicated to one or more other database
servers (the slaves), as shown in the figure below.
All updates are made to the master
The data would then ripple through the slaves until all of the data is
consistent across the servers.
If the master server goes down for whatever reason, the data will still
be available via the slaves, but new writes won't be possible.
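The application side of this setup is a small routing decision: writes go to the master, reads are spread across the slaves. A minimal sketch, assuming hypothetical server names:

```python
import itertools

class ReplicationRouter:
    """Client-side routing for a master-slave replicated database."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "CREATE", "ALTER")

    def __init__(self, master, slaves):
        self.master = master
        # Round-robin over the read replicas to spread read load.
        self._slaves = itertools.cycle(slaves)

    def route(self, sql):
        # Any statement that modifies data must go to the master;
        # everything else can be served by a slave.
        if sql.lstrip().upper().startswith(self.WRITE_VERBS):
            return self.master
        return next(self._slaves)
```

Note the trade-off mentioned above: if the master fails, `route` would keep serving reads from the slaves, but writes have nowhere to go until a slave is promoted.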
Master-master replication
Each database server can act as a master while the other servers are
also treated as masters. At some point in time, all of the masters sync
up to make sure they all have correct, up-to-date data.
All nodes read and write all data
If one master fails, the other database servers can operate normally and
pick up the slack. When the database server is back online, it will catch
up using replication.
Federation
Federation (or functional partitioning) splits up databases by function. For
example, instead of a single, monolithic database, you could have three
databases: forums, users, and products, resulting in less read and write
traffic to each database and therefore less replication lag.
Smaller databases result in more data that can fit in memory, which in turn
results in more cache hits due to improved cache locality. With no single
central master serializing writes you can write in parallel, increasing
throughput.
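At the application level, federation boils down to a lookup table mapping each functional area to its own database server. A sketch, with hypothetical hostnames:

```python
# One database per function, as in the forums/users/products example above.
# The hostnames here are made up for illustration.
FEDERATED_DBS = {
    "forums": "forums-db.internal",
    "users": "users-db.internal",
    "products": "products-db.internal",
}

def db_for(function: str) -> str:
    """Return the database server responsible for a functional area."""
    if function not in FEDERATED_DBS:
        raise KeyError(f"no federated database for {function!r}")
    return FEDERATED_DBS[function]
```

The cost shows up when a query spans two functions (say, joining users to forum posts): the join now crosses two servers and must be done in application code.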
Sharding
Sharding (also known as data partitioning) is a technique to break up a
big database into many smaller parts such that each smaller database
manages only a subset of the data.
In the ideal case, we have different users all talking to different database
nodes. It helps to improve the manageability, performance, availability, and
load balancing of a system.
Each user only has to talk to one server, so gets rapid responses from that
server.
Horizontal partitioning
In this technique, we put different rows into different tables. For
example, if we are storing profiles of users in a table, we can decide
that users with IDs less than 1000 are stored in one table, and users
with IDs from 1000 to 1999 are stored in another.
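The ID-range scheme above can be sketched in a few lines. A hash-based variant is also shown, since it spreads sequential IDs evenly across shards:

```python
SHARD_SIZE = 1000  # users 0-999 on shard 0, 1000-1999 on shard 1, ...

def range_shard(user_id: int) -> int:
    """Range-based sharding: map a user ID to a shard index by ID range."""
    return user_id // SHARD_SIZE

def hash_shard(user_id: int, num_shards: int = 4) -> int:
    """Hash-based sharding: sequential IDs land on different shards,
    which spreads load more evenly than contiguous ranges."""
    # A real system would use a proper hash (and often consistent hashing,
    # so that adding a shard doesn't remap most keys).
    return user_id % num_shards
```

The downside of range sharding is that new users, whose IDs are all in the newest range, pile onto one shard; hashing trades that away at the cost of making range scans harder.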
Vertical partitioning
In this case, we divide our data to store tables related to a specific feature in
their own server. For example, if we are building an Instagram-like system —
where we need to store data related to users, photos they upload, and people
they follow — we can decide to place user profile information on one DB
server, friend lists on another, and photos on a third server.
We can use this method when a data store is likely to need to scale
beyond the resources of a single storage node, or to improve performance
by reducing contention in the data store. But keep in mind that sharding
comes with some common problems as follows.
Database joins become more expensive and are not feasible in certain
cases.
The data distribution can be non-uniform, leaving one shard with a
disproportionate share of the load (a "hot" shard).
Denormalization
Denormalization attempts to improve read performance at the expense of
some write performance. Redundant copies of the data are written in
multiple tables to avoid expensive joins.
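A small sketch of the idea, using dictionaries to stand in for tables: at write time we copy the author's name into each post row, so the read path never needs a join. The table and field names are made up for illustration.

```python
# "users" table: the authoritative copy of each user's name.
users = {1: {"name": "alice"}}

# Denormalized "posts" table: each row carries a redundant user_name.
posts = []

def write_post(post_id, user_id, text):
    # Write path pays extra: copy the author's name into the post row.
    posts.append({
        "id": post_id,
        "user_id": user_id,
        "user_name": users[user_id]["name"],  # redundant copy
        "text": text,
    })

def read_post(post_id):
    # Read path wins: everything needed is in one row, no join required.
    return next(p for p in posts if p["id"] == post_id)
```

The catch is keeping the copies consistent: if alice renames herself, every one of her post rows must be updated, which is exactly the write cost the technique trades for cheap reads.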
In most systems, reads can heavily outnumber writes 100:1 or even 1000:1. A
read resulting in a complex database join can be very expensive, spending a
significant amount of time on disk operations.
SQL
Relational databases store data in rows and columns. Each row contains
all the information about one entity, and each column holds one separate
data point across entities.
Some of the most popular relational databases are MySQL, Oracle, MS SQL
Server, SQLite, Postgres, and MariaDB.
NoSQL
Also called non-relational databases, these are usually grouped into
five main categories: Key-Value, Graph, Column, Document, and Blob
stores.
Key-Value stores
Data is stored in an array of key-value pairs. The "key" is an attribute
name that is linked to a "value".
Document databases
Data is stored in documents (often JSON-like), grouped into collections.
MongoDB and CouchDB are well-known examples.
Wide-column databases
Columnar databases are best suited for analyzing large datasets; big
names include Cassandra and HBase.
Graph databases
These databases are used to store data whose relations are best represented
in a graph. Data is saved in graph structures with nodes (entities), properties
(information about the entities), and lines (connections between the
entities).
Blob databases
Blobs are more like a key/value store for files and are accessed through APIs
like Amazon S3, Windows Azure Blob Storage, Google Cloud Storage,
Rackspace Cloud Files, or OpenStack Swift.
In the above scenario, the load balancer can achieve maximum efficiency
because it can select any server for optimal request handling.
Advanced Concepts
Caching
Load balancing helps you scale horizontally across an ever-increasing
number of servers, but caching will enable you to make vastly better use of
the resources you already have, so that the data may be served faster during
subsequent requests.
If data is not in the cache, get it from the database, then save it to the cache and read from there
By adding caches to our servers, we can avoid reading the webpage or data
directly from the server, thus reducing both the response time and the load
on our server. This helps in making our application more scalable.
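The read path just described is the classic cache-aside pattern: check the cache first, and only on a miss fall through to the database, saving the result for subsequent requests. A sketch, with a plain dictionary standing in for a real cache like Redis or Memcached:

```python
cache = {}

def get_data(key, load_from_db):
    """Cache-aside read: serve from cache if possible, otherwise load
    from the database, populate the cache, and return the value."""
    if key in cache:
        return cache[key]          # cache hit: no database work at all
    value = load_from_db(key)      # cache miss: the expensive read
    cache[key] = value             # save it for subsequent requests
    return value
```

A production version would also set an expiry time on each entry and invalidate (or update) the cached copy when the underlying row changes, otherwise readers can see stale data.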
Caching can be applied at many layers such as the database layer, web server
layer, and network layer.
The use of a CDN improves page load time because data is retrieved from
the location closest to the user. It also increases the availability of
content, since the content is stored at multiple locations.
The CDN servers make requests to our Web server to validate the content
being cached and update them if required. The content being cached is
usually static such as HTML pages, images, JavaScript files, CSS files, etc.
Go Global
When your app goes global, you will own and operate data centers around
the world to keep your products running 24 hours a day, 7 days a week.
Incoming requests are routed to the "best" data center based on GeoDNS.
SQL Tuning.
Elastic Computing.
Easy, right?