Database Sharding

Database sharding is the process of dividing data into partitions that can be stored across multiple database instances. It partitions data using a key attribute, such as user ID, to determine which database instance a record belongs to. This improves performance and scalability by distributing data and queries across instances.

Uploaded by

TABAHI YADAV


What is database sharding?

Database sharding is the process of dividing data into partitions that are then stored across
multiple database instances.
The data is partitioned on a key. This key is an attribute of the data that you are storing.

Working

Suppose there are 1000 users in your database and you have 5 database servers. You want to shard
the data on userID, so you can partition the data as follows:

userID 000 - 199 -> database 1.
userID 200 - 399 -> database 2.
userID 400 - 599 -> database 3.
userID 600 - 799 -> database 4.
userID 800 - 999 -> database 5.

Now, if the user with userID 546 wants to perform read/write operations, the request only goes to
database instance 3. And since that instance holds only 200 userIDs instead of 1000, query
processing is faster.

Note: This is an example of range-based sharding.
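
The mapping above can be sketched in Python as follows. This is a minimal illustration, not a production router: the names RANGE_STARTS, MAX_USER_ID and shard_for_user are assumptions introduced here, and the function only returns the database number instead of opening a connection.

```python
from bisect import bisect_right

# Lower bound of each database's userID range, in ascending order
# (taken from the example: 000-199, 200-399, 400-599, 600-799, 800-999).
RANGE_STARTS = [0, 200, 400, 600, 800]
MAX_USER_ID = 999


def shard_for_user(user_id: int) -> int:
    """Return the 1-based database number that stores the given userID."""
    if not 0 <= user_id <= MAX_USER_ID:
        raise ValueError(f"userID {user_id} is outside 0..{MAX_USER_ID}")
    # bisect_right counts how many range starts are <= user_id,
    # which is exactly the 1-based database number.
    return bisect_right(RANGE_STARTS, user_id)


print(shard_for_user(546))  # 3
```

Keeping the range boundaries in a sorted list means a lookup is a binary search, and changing a boundary only requires editing the list (plus moving the affected rows).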

Types of Sharding Architectures

Range-Based Sharding
In this method, we partition the data based on ranges of the key.
It is very easy to implement.
Data may not be evenly distributed across shards, for example when most keys fall into one range.
[Figure: range-based sharding example with 5 tuples and 3 shards]

https://get.interviewready.io/

Key-Based Sharding / Hash-Based Sharding
In this method, we compute a hash value of the key (here the key is one of the attributes of
the data). This hash value determines the shard that stores the data.
A simple hash function can distribute data unevenly, and with a modulo-style hash, changing
the number of shards remaps most keys. Consistent hashing can be used to mitigate this.
[Figure: hash-based sharding example with 6 tuples and 3 shards, using the simple hash function
h(x) = x % 3]
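
As a rough sketch of the hash function h(x) = x % 3, the Python snippet below places six keys on three shards. The key values 1 to 6 are hypothetical (the values in the original figure are not known), and the helper names shard_for_key and place are assumptions made for this illustration. The last lines also show what happens to a modulo hash when a fourth shard is added, which is one common motivation for consistent hashing.

```python
from collections import defaultdict

NUM_SHARDS = 3


def shard_for_key(key: int, num_shards: int = NUM_SHARDS) -> int:
    """Simple hash function from the text: h(x) = x % num_shards."""
    return key % num_shards


def place(keys, num_shards: int = NUM_SHARDS) -> dict:
    """Group keys by the shard they hash to."""
    shards = defaultdict(list)
    for key in keys:
        shards[shard_for_key(key, num_shards)].append(key)
    return dict(shards)


# Six hypothetical keys standing in for the six tuples of the example.
keys = [1, 2, 3, 4, 5, 6]
print(place(keys))  # {1: [1, 4], 2: [2, 5], 0: [3, 6]}

# Going from 3 to 4 shards changes the hash to x % 4, so many keys
# now belong to a different shard and would have to be moved.
moved = [k for k in keys if shard_for_key(k, 3) != shard_for_key(k, 4)]
print(moved)  # [3, 4, 5, 6]
```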

Directory-Based Sharding
In this method, we create a lookup table that uses the shard key to record which shard holds
which data. The lookup table maps each key to its shard.
It is more flexible than range-based and key-based sharding, because any key can be assigned to any shard.
The lookup table is a single point of failure unless it is itself replicated.
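
A directory can be sketched as a plain dictionary from key to shard name. The class ShardDirectory, its methods, and the key and shard names below are all hypothetical; a real deployment would keep this table in a replicated store rather than in process memory.

```python
class ShardDirectory:
    """Lookup table mapping each shard key to the shard that holds it."""

    def __init__(self):
        self._lookup = {}

    def assign(self, key, shard):
        """Record (or change) which shard holds the given key."""
        self._lookup[key] = shard

    def locate(self, key):
        """Return the shard for a key, or raise KeyError if it is unknown."""
        try:
            return self._lookup[key]
        except KeyError:
            raise KeyError(f"no shard registered for key {key!r}") from None


directory = ShardDirectory()
directory.assign("user:17", "shard-A")
directory.assign("user:42", "shard-B")
print(directory.locate("user:17"))  # shard-A

# Flexibility: after copying the data, moving a key to another shard
# only needs a directory update, not a new hash function or range.
directory.assign("user:17", "shard-C")
print(directory.locate("user:17"))  # shard-C
```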

Difference between Horizontal Partitioning and Sharding
In horizontal partitioning, we split a table into multiple tables within the same database instance,
whereas in sharding we split the table into multiple tables across multiple database instances.
Because horizontal partitioning uses a single database instance, the partitioned tables must
have different names. In sharding, the tables are stored in different database instances,
so the table names can be the same.
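
The naming difference can be demonstrated with Python's built-in sqlite3 module. This is only a sketch: each in-memory SQLite database stands in for a separate database instance, and the table and column names are assumptions chosen for the illustration.

```python
import sqlite3

# Horizontal partitioning: one database instance, so every partition
# needs its own table name.
single = sqlite3.connect(":memory:")
single.execute("CREATE TABLE users_0_499 (user_id INTEGER PRIMARY KEY, name TEXT)")
single.execute("CREATE TABLE users_500_999 (user_id INTEGER PRIMARY KEY, name TEXT)")

# Sharding: separate instances (here, separate in-memory databases),
# so the same table name can be reused on each of them.
shard_a = sqlite3.connect(":memory:")
shard_b = sqlite3.connect(":memory:")
for shard in (shard_a, shard_b):
    shard.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")


def tables(conn):
    """List the table names defined in one database instance."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


print(tables(single))   # ['users_0_499', 'users_500_999']
print(tables(shard_a))  # ['users']
print(tables(shard_b))  # ['users']
```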

Advantages of Sharding
High availability
Even if one shard crashes, the other shards keep functioning and can still process queries. So the
database as a whole remains partially functional.
Provides security
Users can be restricted to certain shards, so you can implement different access control
mechanisms on different shards.
Faster query processing
Since the dataset on each server is small, its index is also small. This results
in faster query processing.
Increased read and write throughput
Both read and write capacity increase, as long as each operation touches only one shard.
High scalability
Partitioning the data and storing it in different shards provides scalability in terms of data and
memory: because the load is spread over multiple machines, each shard uses less memory
and the network bandwidth of a single machine does not saturate.

Disadvantages of Sharding

Complexity
The server has to know how to route a query to the appropriate shard. If we add the code for
finding the shard to the server, the server becomes more complex.
Transactions and rollback
A single local transaction cannot span tables that live in different shards. Transactions and
rollbacks across shards therefore need a distributed protocol such as two-phase commit, which is
complex and slow, and many sharded setups simply do not support them.
Joins across shards
If we want to join two tables from two different shards, the query has to go to both
shards, pull out the data and join it across the network. This is a very expensive operation.
Infrastructure cost
Sharding requires more machines and computing power than a single database server. Without
proper optimization, the increase in cost can be significant.
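
To make the cost of a cross-shard join concrete, here is a minimal sketch using Python's sqlite3 module. Two in-memory databases stand in for two shards; the users and orders tables, their columns and the sample rows are all invented for the illustration. Because neither shard can see the other's table, the application has to fetch rows from both and join them itself.

```python
import sqlite3

# Two in-memory databases stand in for two separate shards.
users_shard = sqlite3.connect(":memory:")
orders_shard = sqlite3.connect(":memory:")

users_shard.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")
users_shard.executemany(
    "INSERT INTO users VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")]
)

orders_shard.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, item TEXT)"
)
orders_shard.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(100, 1, "book"), (101, 2, "pen"), (102, 1, "lamp")],
)


def join_users_orders():
    """Join users and orders in application code, one query per shard."""
    # Step 1: pull the needed rows out of the first shard.
    names = dict(users_shard.execute("SELECT user_id, name FROM users"))
    # Step 2: pull the needed rows out of the second shard.
    orders = orders_shard.execute(
        "SELECT order_id, user_id, item FROM orders ORDER BY order_id"
    )
    # Step 3: join in memory; in a real system both result sets
    # would have crossed the network before this point.
    return [(names[user_id], item) for _, user_id, item in orders if user_id in names]


print(join_users_orders())  # [('Asha', 'book'), ('Ravi', 'pen'), ('Asha', 'lamp')]
```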

Hierarchical Sharding
It is very difficult to increase or decrease the number of shards, so in practice the number
of shards is usually fixed.

Since the number of shards is fixed, one of the shards could grow too big. To solve this issue, we can
shard the large shard again. Every shard has a manager that maps each request to the correct
mini-shard. This is known as hierarchical sharding.

In the example, the first level has 3 shards and uses directory-based sharding. Shard 0 is
then partitioned again using key-based sharding.
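
The two-level routing in this example might look like the sketch below. The directory keys (region names), the choice of four mini-shards and the function name route are assumptions for illustration; only the overall shape (a directory on the first level, key-based sharding inside Shard 0) comes from the text.

```python
# First level: directory-based lookup from a hypothetical region to a shard.
TOP_LEVEL_DIRECTORY = {"asia": 0, "europe": 1, "america": 2}

# Second level: only shard 0 has grown too big, so only it is split
# into mini-shards (the count of 4 is an arbitrary choice).
MINI_SHARD_COUNTS = {0: 4}


def route(region: str, user_id: int):
    """Return (shard, mini_shard); mini_shard is None for unsplit shards."""
    shard = TOP_LEVEL_DIRECTORY[region]
    mini_count = MINI_SHARD_COUNTS.get(shard)
    if mini_count is None:
        return shard, None
    # The shard's manager applies key-based sharding inside the shard.
    return shard, user_id % mini_count


print(route("asia", 546))    # (0, 2)
print(route("europe", 546))  # (1, None)
```

The first-level shard count stays fixed at 3; only the overloaded shard gains an extra routing step.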

Master-Slave Architecture for High Availability

If we want to keep querying the data in a shard even when a database instance goes offline, we can
use a master-slave architecture.

In a master-slave architecture, multiple slaves keep a copy of the master's data. A write
request always goes to the master, while read requests are distributed
evenly across the slaves. If the master fails, the slaves elect a new master from among themselves.
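
The routing rules above can be sketched as follows. The class ReplicatedShard and the node names are hypothetical, reads are spread round-robin as one simple way to distribute them evenly, and fail_master merely promotes the first slave as a stand-in for a real election protocol.

```python
from itertools import cycle


class ReplicatedShard:
    """One shard made of a master and several slaves (names only)."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        self._reads = cycle(self.slaves)

    def node_for_write(self):
        """Writes always go to the master."""
        return self.master

    def node_for_read(self):
        """Reads rotate over the slaves; fall back to the master if none are left."""
        if not self.slaves:
            return self.master
        return next(self._reads)

    def fail_master(self):
        """Simulate a master failure by promoting the first slave.

        A real system would run an election; this is only a placeholder.
        """
        if not self.slaves:
            raise RuntimeError("no slave available to promote")
        self.master = self.slaves.pop(0)
        self._reads = cycle(self.slaves)
        return self.master


shard = ReplicatedShard("db3-master", ["db3-slave-a", "db3-slave-b", "db3-slave-c"])
print(shard.node_for_write())                     # db3-master
print([shard.node_for_read() for _ in range(4)])  # slave-a, slave-b, slave-c, slave-a
print(shard.fail_master())                        # db3-slave-a
print(shard.node_for_write())                     # db3-slave-a
```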

