0% found this document useful (0 votes)
4 views

Partitioning for Database Performance

Database partitioning addresses issues like slow queries, large storage requirements, and operational struggles by splitting large tables into smaller partitions based on specific keys. It is particularly useful for very large tables, frequent data archiving, and queries targeting subsets of data. Various partitioning types include range, list, and hash, each with its own pros and cons regarding performance and management.

Uploaded by

Vignesh M
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
4 views

Partitioning for Database Performance

Database partitioning addresses issues like slow queries, large storage requirements, and operational struggles by splitting large tables into smaller partitions based on specific keys. It is particularly useful for very large tables, frequent data archiving, and queries targeting subsets of data. Various partitioning types include range, list, and hash, each with its own pros and cons regarding performance and management.

Uploaded by

Vignesh M
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 3

Database Partitioning

What problem are we solving?


🐌 slow queries
scanning a massive tables, even with indexes
sometimes we need to get only a small slice of data
💾 storage
indexes grow big and slow
backups, restore, or replication of large monolithic get heavy
🥊 operational struggles
dropping old data is slow and painful
archiving old records becomes a nightmare

How to use it ?
splitting one large table into smaller (called partitions) based on a key (created_at, user_id, etc.)

When to use it ?
🐘 very large tables (millions to billions of rows)
if you table is growing rapidly and you're noticing slower reads, writes, it's time to
think partitioning.
especially true for time-series data (logs, events, metrics, etc.)
🕢 queries that often target subsets of data
show me users from region X
give me data from the 2022
🪣 need to archive or delete data often
if you're regularly deleting old records (e.g. logs older than 90 days) you can just
DROP a whole partition - way faster than DELETE

How does it work under the hood?


for instance, you can create partition databases by year column values

CREATE TABLE users (


id INT,
name TEXT,
created_at DATE
) PARTITION BY RANGE (created_at);

CREATE TABLE users_2024 PARTITION OF users FOR VALUES FROM ('2024-01-01') TO


('2025-01-01');
CREATE TABLE users_2025 PARTITION OF users FOR VALUES FROM ('2025-01-01') TO
('2026-01-01');

INSERT INTO users (id, name, created_at) VALUES (1, 'Alice', '2024-04-10');
INSERT INTO users (id, name, created_at) VALUES (2, 'Bob', '2025-04-10');

select * from users; -- alice + bob


select * from users_2024; -- alice
select * from users_2025; -- bob

Types or patterns available


🕰️ range partition
PARTITION BY RANGE (created_at)
-- Partition 2: 2024-01-01 to 2025-01-01
-- Partition 1: 2023-01-01 to 2024-01-01

📒 list partition
PARTITION BY LIST (country)
-- Partition 1: 'US', 'CA'
-- Partition 2: 'UK', 'DE'

🔢 hash partition
PARTITION BY HASH (user_id)
PARTITIONS 4;
-- Partition 0: user_id % 4 = 0
-- Partition 1: user_id % 4 = 1

Pros & Cons


👍
🔥 faster execution
🤏 local indexes smaller and faster to search
💅 spread data across multiple files or machines
⌫ drop, truncate or archive individual partitions
🌶️ writes spread across partitions can reduce contention on hot tables
👎
🕰️ need to manage partitions over time
🔑 foreign keys often not supported across partitions
👣 the db has to figure out which partition to route row into
😧 once a partition strategy is chosen, changing it later is a pain

You might also like