
5 Partitioning

Partitioning involves splitting large tables into multiple chunks or partitions that are distributed across database nodes. This helps distribute data and queries more evenly to avoid overloaded nodes. Partitions can be split by key ranges or hashed key ranges. Local indexes only index local data while global indexes index all data but require coordination. When adding nodes, some partitions are split and moved to keep most keys in the same place while balancing load. The partitioning strategy must be chosen to best support the query patterns and scaling needs of the application.

Uploaded by

rohit kumar


Partitioning

What is partitioning?
In large systems we deal with very large volumes of data, and as a result a table
may become too big, or receive too many queries, for a single machine.
Partitioning is splitting this table up into many chunks to go on various database
nodes.
Partitioning is often used in conjunction with replication.
Objectives of partitioning
On each node we want:
● A relatively similar amount of data
● A relatively similar amount of reads and writes to the data

If we are unable to achieve this, certain nodes will be overloaded relative to
others; these overloaded nodes are known as hot spots
Methodologies for partitioning
● By ranges of keys
● By ranges of the hash of keys
○ Note: do not take a hash of the key and then do a modulo with the number of nodes, as this
will cause the location of most keys to change whenever a node is added or removed
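
To make the note above concrete, here is a minimal sketch that counts how many keys change nodes when a cluster grows from 4 to 5 nodes under hash-mod-N placement. The key names, the node counts, and the choice of MD5 as the hash are assumptions for illustration only; none of them come from the slides.

```python
import hashlib

def stable_hash(key: str) -> int:
    """Deterministic hash (Python's built-in hash() of str is salted per process)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

def node_by_modulo(key: str, num_nodes: int) -> int:
    """The scheme the note warns against: hash(key) mod number of nodes."""
    return stable_hash(key) % num_nodes

keys = [f"user-{i}" for i in range(10_000)]  # hypothetical keys

# Count keys whose node changes when the cluster grows from 4 to 5 nodes.
moved = sum(1 for k in keys if node_by_modulo(k, 4) != node_by_modulo(k, 5))
fraction_moved = moved / len(keys)
print(f"{fraction_moved:.0%} of keys change nodes")
```

A key stays put only when its hash gives the same remainder modulo 4 and modulo 5, which happens for about one hash in five, so roughly 80% of keys move. With hash ranges, only the keys in the ranges handed to the new node would move.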
Key Range Partitioning
● Ranges of keys are not necessarily even in size, since some ranges may be
hotter than others
● Keep keys sorted within a partition to best support range
queries

Pros:

● Simple and allows for effective range queries

Cons:

● Have to actually determine the ranges to make sure they are relatively even
in data and load (can be done manually or by the database)
● Can easily lead to hot spots (for example, when partitioning by ranges of
timestamps, all writes for the current time go to the same partition)
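
As a rough illustration of key range partitioning, the sketch below keeps a sorted list of split points and finds a key's partition by binary search. The split points and the partition numbering are hypothetical; a real database would choose and adjust them based on data and load.

```python
from bisect import bisect_right

# Hypothetical split points for 4 partitions:
# partition 0 holds keys < "f", 1 holds ["f", "m"), 2 holds ["m", "t"), 3 holds >= "t".
SPLIT_POINTS = ["f", "m", "t"]

def partition_for(key: str) -> int:
    """Binary search over the sorted split points."""
    return bisect_right(SPLIT_POINTS, key)

def partitions_for_range(lo: str, hi: str) -> list[int]:
    """Partitions that may hold keys in the inclusive range [lo, hi]."""
    return list(range(partition_for(lo), partition_for(hi) + 1))

print(partition_for("jordan"))          # partition holding the key "jordan"
print(partitions_for_range("g", "k"))   # a narrow range query touches one partition
```

Because neighbouring keys live in the same or adjacent partitions, a range query only needs to contact the partitions whose ranges overlap the query.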
Hash Range Partitioning
Take a hash of the key, and put it into the proper partition

[Figure: the key "jordan" goes through the hash function and produces po23av,
which falls into the partition for hash range f22476:r91mbb. The three partitions
cover the hash ranges :f22476, f22476:r91mbb, and r91mbb:]
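
The lookup described above can be sketched as follows. This is only an illustration: the 32-bit hash space, the use of MD5, the three equal ranges, and the function names are all assumptions, not details from the slides.

```python
import hashlib
from bisect import bisect_right

HASH_SPACE = 2 ** 32
NUM_PARTITIONS = 3  # assumption: three partitions

# Boundaries that cut the hash space into equal, contiguous ranges.
BOUNDARIES = [HASH_SPACE * (i + 1) // NUM_PARTITIONS
              for i in range(NUM_PARTITIONS - 1)]

def key_hash(key: str) -> int:
    """Map a key to a position in the 32-bit hash space."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")

def partition_for(key: str) -> int:
    """Find the hash range (partition) that contains the key's hash."""
    return bisect_right(BOUNDARIES, key_hash(key))

# Spread of 9,000 hypothetical keys across the three partitions.
counts = [0] * NUM_PARTITIONS
for i in range(9_000):
    counts[partition_for(f"player-{i}")] += 1
print(counts)
```

Since the boundaries are ranges of the hash rather than a modulo of the node count, a boundary can later be moved or split without relocating keys outside the affected range.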
Hash Range Partitioning Tradeoffs
Pros:
● Keys are evenly distributed between nodes (assuming good hash function)

Cons:
● Range queries on the partition key are no longer efficient: adjacent keys are
scattered, so every partition has to be checked
● A single key with a lot of activity will still lead to a hot spot
Indexes in a partitioned database configuration
Recall: An index is additional metadata that maps a field value to the locations
of the rows that have that value

Secondary Index
Position: point guard - [10, 14, 21, 37]
Position: center - [1, 8, 12, 19]
Position: power forward - [5, 11]
Position: small forward - [6, 7, 13, 22]
Position: shooting guard - [3, 15, 16]
Secondary Index Options
● Local Indexes
● Global Indexes
Local Indexes
Idea: Each partition holds a secondary index that covers only the rows stored on that partition

Partition 1
ID  Name             Position
1   Michael Jordan   Shooting Guard
2   LeBron James     Small Forward
3   Kobe Bryant      Shooting Guard

Secondary index on partition 1
Position: point guard - []
Position: center - []
Position: power forward - []
Position: small forward - [2]
Position: shooting guard - [1, 3]

Partition 2
ID  Name             Position
65  Khris Middleton  Small Forward
66  Chris Paul       Point Guard
67  Dwight Howard    Center

Secondary index on partition 2
Position: point guard - [66]
Position: center - [67]
Position: power forward - []
Position: small forward - [65]
Position: shooting guard - []
Local Index Tradeoffs
Pros:
● Fast on write because a write only updates the index on the partition that
stores the row
Cons:
● Slow on read because a query on the secondary index has to ask every
partition and combine the results
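
The tradeoff above can be sketched with a toy in-memory model. The `Partition` class and `find_by_position` function are hypothetical names, and the rows are the ones from the local index example.

```python
from collections import defaultdict

class Partition:
    """One partition: its rows plus a local index on position."""
    def __init__(self):
        self.rows = {}                        # id -> (name, position)
        self.by_position = defaultdict(list)  # position -> ids on THIS partition

    def insert(self, row_id, name, position):
        # A write touches only this partition: the row and its local index.
        self.rows[row_id] = (name, position)
        self.by_position[position].append(row_id)

def find_by_position(partitions, position):
    """Scatter/gather read: every partition has to be asked."""
    ids = []
    for p in partitions:
        ids.extend(p.by_position.get(position, []))
    return sorted(ids)

p1, p2 = Partition(), Partition()
p1.insert(1, "Michael Jordan", "Shooting Guard")
p1.insert(2, "LeBron James", "Small Forward")
p1.insert(3, "Kobe Bryant", "Shooting Guard")
p2.insert(65, "Khris Middleton", "Small Forward")
p2.insert(66, "Chris Paul", "Point Guard")
p2.insert(67, "Dwight Howard", "Center")

print(find_by_position([p1, p2], "Small Forward"))  # [2, 65]
```

The read loops over all partitions even though only some of them hold matching rows, which is where the read cost comes from.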
Global Indexes
Idea: Partition the secondary index itself; each index entry can contain references to data on any partition

Partition 1
ID  Name             Position
1   Michael Jordan   Shooting Guard
2   LeBron James     Small Forward
3   Kobe Bryant      Shooting Guard

Part of the global secondary index stored on partition 1
Position: point guard - [66]
Position: center - [67]

Partition 2
ID  Name             Position
65  Khris Middleton  Small Forward
66  Chris Paul       Point Guard
67  Dwight Howard    Center

Part of the global secondary index stored on partition 2
Position: power forward - []
Position: small forward - [2, 65]
Position: shooting guard - [1, 3]

Global Index Tradeoffs
Pros:
● Fast on read because all entries for a given index value are kept on one
partition node
Cons:
● Slow on write because a single row write may need to update index entries
that live on other partitions
● May require a distributed transaction (imagine the case where one write
succeeds and the other fails)
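
A minimal sketch of the same tradeoff, under stated assumptions: `INDEX_HOME` (which partition stores each index term) mirrors the layout in the global index example, `row_home` is a hypothetical key range rule, and the `Cluster` class is an in-memory stand-in rather than any real database API.

```python
from collections import defaultdict

# Assumed placement of index terms, mirroring the example layout.
INDEX_HOME = {"Point Guard": 0, "Center": 0,
              "Power Forward": 1, "Small Forward": 1, "Shooting Guard": 1}

def row_home(row_id: int) -> int:
    """Hypothetical key range rule: ids below 65 live on partition 0."""
    return 0 if row_id < 65 else 1

class Cluster:
    def __init__(self, num_partitions=2):
        self.rows = [dict() for _ in range(num_partitions)]
        self.index = [defaultdict(list) for _ in range(num_partitions)]

    def insert(self, row_id, name, position):
        """Write the row and its index entry; return the partitions touched."""
        r, i = row_home(row_id), INDEX_HOME[position]
        self.rows[r][row_id] = (name, position)
        # The index entry may live on a different partition than the row.
        # Nothing here makes the two writes atomic.
        self.index[i][position].append(row_id)
        return {r, i}

    def find_by_position(self, position):
        """A read goes to exactly one partition: the term's home."""
        return sorted(self.index[INDEX_HOME[position]].get(position, []))

cluster = Cluster()
touched = cluster.insert(1, "Michael Jordan", "Shooting Guard")
cluster.insert(2, "LeBron James", "Small Forward")
cluster.insert(3, "Kobe Bryant", "Shooting Guard")
cluster.insert(65, "Khris Middleton", "Small Forward")
print(touched)                                     # {0, 1}
print(cluster.find_by_position("Small Forward"))   # [2, 65]
```

The insert for row 1 touches two partitions, and if the second write failed after the first succeeded the index would disagree with the data; that gap is what a distributed transaction would close.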
Rebalancing Partitions
If a node is added or removed, the goal is to keep the majority of the keys in the
same place, and only move a few from each node so that we do not use a ton of
bandwidth remapping every key (recall to use hash ranges instead of modulo)
Fixed Number of Partitions

In this system, we have 20 partitions no matter how many nodes there are
When a new server joins, take a few whole partitions from each existing server
(the white chunks in the slide's figure) and pass them to the new server!
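
One way to picture this rebalancing step is the sketch below. The node names, the starting layout (4 nodes with 5 partitions each), and the donor selection rule are assumptions; the only point taken from the slides is that whole partitions move and the partition count stays at 20.

```python
def add_node(assignment: dict[int, str], new_node: str) -> dict[int, str]:
    """Return a new partition -> node map after new_node joins.

    Whole partitions are taken, one at a time, from whichever existing node
    currently holds the most, until the new node has its fair share.
    """
    result = dict(assignment)
    old_nodes = sorted(set(result.values()))
    fair_share = len(result) // (len(old_nodes) + 1)
    for _ in range(fair_share):
        counts = {n: 0 for n in old_nodes}
        for owner in result.values():
            if owner in counts:
                counts[owner] += 1
        donor = max(old_nodes, key=lambda n: counts[n])
        stolen = max(p for p, owner in result.items() if owner == donor)
        result[stolen] = new_node
    return result

NUM_PARTITIONS = 20
before = {p: f"node{p // 5}" for p in range(NUM_PARTITIONS)}  # 4 nodes x 5 partitions
after = add_node(before, "node4")

moved = [p for p in before if before[p] != after[p]]
print(f"{len(moved)} of {NUM_PARTITIONS} partitions moved")
```

Keys never change partition here; only the partition-to-node mapping changes, so the data that crosses the network is limited to the partitions handed to the new node.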
Fixed Number of Partitions - Considerations
Choose a number of partitions that is reasonable:
● If too low, each partition will get too big and we will not be able to scale the
application further (additionally, transferring a partition to another node will
take a long time)
● If too high, there will be a lot of overhead on disk devoted to each partition

If the size of your dataset is going to vary significantly in the future, a fixed
number of partitions may not be a good fit
Dynamic Partitioning
Certain databases will adjust partition ranges dynamically so that they can reduce
hot spots as the data access patterns change over time:
● Once a partition becomes too big, it is split into two pieces and one is
assigned to another node
● Automatic repartitioning is not always good: if the database incorrectly
assumes a node is down when there is actually just a slow network, it will
repartition, putting even more strain on the network
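
The splitting behaviour can be sketched as below. `MAX_KEYS` is a hypothetical threshold (real systems typically split on size rather than key count), keys are assumed to be distinct, and assigning the new half to another node is left out.

```python
from bisect import bisect_right, insort

MAX_KEYS = 4  # hypothetical split threshold

class DynamicRangeTable:
    """Key range partitions that split in two when they grow too big."""
    def __init__(self):
        self.lower_bounds = [""]   # partition i starts at lower_bounds[i]
        self.partitions = [[]]     # sorted keys held by each partition

    def insert(self, key: str) -> None:
        i = bisect_right(self.lower_bounds, key) - 1
        part = self.partitions[i]
        insort(part, key)
        if len(part) > MAX_KEYS:
            # Split at the median key; the right half becomes a new partition.
            mid = len(part) // 2
            left, right = part[:mid], part[mid:]
            self.partitions[i] = left
            self.partitions.insert(i + 1, right)
            self.lower_bounds.insert(i + 1, right[0])

table = DynamicRangeTable()
for key in ["a", "b", "c", "d", "e", "f", "g"]:
    table.insert(key)
print(table.lower_bounds)
print(table.partitions)
```

The number of partitions grows with the data, so a small dataset starts with one partition and a large one ends up with many.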
Fixed number of partitions per node
● Each node has a certain number of partitions on it which grow in size
proportionally to the dataset
● If a new node joins the cluster it will split some of the partitions on existing
nodes into two pieces, and take those for itself
● Very similar to the consistent hashing algorithm; having many partitions per
node helps even out unfair data splits
Sharding Summary
Unlike replication, which is always important to have (to increase availability),
partitioning adds a lot of complexity to a system and should mainly be used
when the dataset has gotten big enough that putting the whole table on a single
node is infeasible.

Generally requires some sort of coordination service or gossip protocol in order
to keep track of which range corresponds to which partition
Sharding Summary Continued
Partitioning Methodology:

● Key ranges are better when we need to perform range queries

● Key hash ranges are better when we want to more evenly distribute data

Index Choices:

● Local indexes optimize for write speed

● Global indexes optimize for read speed

Rebalancing Choices:

● Fixed number of partitions is simpler to reason about, but requires choosing a good number
● A changing number of partitions may scale better, but doing so automatically may lead to
unnecessary rebalancing and put extra stress on our databases
