TechVault Distributed Databases

The document discusses the challenges of centralized databases, including excessive data, slow query processing, and server failures. It contrasts centralized databases with distributed databases, highlighting advantages such as improved scalability, performance, and geographic distribution. Additionally, it covers concepts like sharding, replication, and the CAP theorem, which addresses consistency, availability, and partition tolerance in database systems.

Uploaded by

Zeyneb El Yebdri
Copyright
© All Rights Reserved

Centralized Databases

Challenges

• Too much data

• Slow query processing

• Server failures

[Figure: Users → Application → Database, with callouts "Scaling storage and processing" and "Server Failure"]

[Figure: Centralized Database vs. Distributed Database]


Distributed Database

[Figure: a distributed database made up of clusters, each cluster made up of nodes]

[Figure: within a cluster, one node is the Leader (also labeled Master or Coordinator); the other nodes are Followers (also labeled Segments)]

Sharding

[Figure: Centralized Database vs. Distributed Database]

SELECT count(*)
FROM orders

[Figure: the same query runs on each of three nodes, which return 20, 15, and 30; the combined result is 65]
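
The counting example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each node is stood in for by an in-memory SQLite database, the shard sizes (20, 15, 30) are taken from the figure, and the names `make_shard`, `build_shards`, and `distributed_count` are hypothetical helpers, not part of any real distributed database.

```python
import sqlite3

# Shard sizes chosen to match the figure: 20, 15 and 30 orders.
SHARD_ROW_COUNTS = [20, 15, 30]


def make_shard(first_order_id, n_rows):
    """Create an in-memory SQLite database standing in for one node."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO orders (order_id) VALUES (?)",
        [(first_order_id + i,) for i in range(n_rows)],
    )
    return conn


def build_shards():
    """Split a run of consecutive order IDs across the shards."""
    shards = []
    next_id = 1
    for n_rows in SHARD_ROW_COUNTS:
        shards.append(make_shard(next_id, n_rows))
        next_id += n_rows
    return shards


def distributed_count(shards):
    """Run the same query on every shard and add up the partial counts."""
    partial_counts = [
        shard.execute("SELECT count(*) FROM orders").fetchone()[0]
        for shard in shards
    ]
    return sum(partial_counts)


shards = build_shards()
total_orders = distributed_count(shards)
print(total_orders)  # 20 + 15 + 30 = 65
```

A plain count works this way because every order lives on exactly one shard, so the partial counts can simply be added.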
Replication
• Synchronous
• Asynchronous

[Figure: the Master replicates to a StandBy; Segments replicate to a Replica]
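
The difference between the two replication modes can be illustrated with a toy model. This is a sketch under assumptions, not how any particular database implements replication: the `Master` and `Standby` classes, the `ship_pending` step, and the key `"order:1"` are all hypothetical.

```python
from collections import deque


class Standby:
    """Standby copy that applies writes shipped from the master."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Master:
    """Master node; `synchronous` selects the replication mode."""

    def __init__(self, standby, synchronous):
        self.data = {}
        self.standby = standby
        self.synchronous = synchronous
        self.pending = deque()  # writes not yet shipped (async mode only)

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Acknowledge only after the standby has applied the write.
            self.standby.apply(key, value)
        else:
            # Acknowledge immediately; the write is shipped later.
            self.pending.append((key, value))
        return "ack"

    def ship_pending(self):
        """Background step in async mode: drain the queue to the standby."""
        while self.pending:
            self.standby.apply(*self.pending.popleft())


sync_standby = Standby()
Master(sync_standby, synchronous=True).write("order:1", "paid")

async_standby = Standby()
async_master = Master(async_standby, synchronous=False)
async_master.write("order:1", "paid")
lag_before = "order:1" not in async_standby.data  # standby is behind
async_master.ship_pending()
```

In this model the synchronous standby is never behind an acknowledged write, at the cost of a slower write path. The asynchronous standby can lag, so a failover before `ship_pending` runs would lose the queued write.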
Distributed Databases
Advantages

• Scalability
• Improved Performance
• High Availability
• Geographic Distribution
Query Processing

SELECT count(*)
FROM orders

[Figure: each of three nodes runs the query on its own rows and returns 20, 15, and 30; the partial counts add up to 65]

SELECT count(DISTINCT customer_id)
FROM orders

Queries considered for each node to answer the query above:

• SELECT count(*) FROM orders
• SELECT count(DISTINCT customer_id) FROM orders
• SELECT customer_id FROM orders
• SELECT DISTINCT customer_id FROM orders
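
A distinct count cannot be combined the way a plain count can, because the same customer may have orders on more than one node. The sketch below illustrates this. It assumes three shards modeled as in-memory SQLite databases, and the customer IDs are invented for the example.

```python
import sqlite3

# Hypothetical data: customer_id of each order, per shard.
# Customers 1 and 2 have orders on more than one shard.
SHARD_CUSTOMER_IDS = [
    [1, 1, 2],
    [2, 3],
    [1, 4, 4],
]


def make_shard(customer_ids):
    """Create an in-memory SQLite database standing in for one node."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER)")
    conn.executemany(
        "INSERT INTO orders (customer_id) VALUES (?)",
        [(cid,) for cid in customer_ids],
    )
    return conn


shards = [make_shard(ids) for ids in SHARD_CUSTOMER_IDS]

# Pushing the user query down unchanged and adding the results overcounts,
# because customers present on several shards are counted once per shard.
summed_distinct = sum(
    s.execute("SELECT count(DISTINCT customer_id) FROM orders").fetchone()[0]
    for s in shards
)

# Each shard returns its distinct IDs; the coordinator merges and counts.
merged_ids = set()
for s in shards:
    rows = s.execute("SELECT DISTINCT customer_id FROM orders").fetchall()
    merged_ids.update(cid for (cid,) in rows)
distinct_customers = len(merged_ids)

print(summed_distinct, distinct_customers)  # 6 4
```

Sending back every `customer_id` without `DISTINCT` would also give the right answer after merging, but each node would ship one value per order instead of one per customer.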
Consistency

What should happen to a query when a node is offline?

• Wait until the node is back online?
• How long to wait?
• Is there a backup/replica?
• Return results from the other nodes?
• …
CAP Theorem
Consistency
• Every read receives the latest write or an error (no stale results)

Availability
• Every request receives a response (but not necessarily the latest data)

Partition Tolerance
• The system continues to operate even if network failures drop or delay messages between nodes, leaving some nodes unreachable
CAP Theorem

• CA: Centralized Databases
• CP or AP: Distributed Databases
CAP Theorem

• CP (prioritize consistency)
  • No stale results
  • Return error (unavailable)
• AP (prioritize availability)
  • Return stale result (not consistent)
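
The two behaviors during a partition can be sketched as a single read function with a mode switch. This is a toy model under assumptions: the `Node` class, the `read` function, the `Unavailable` exception, and the sample values are all hypothetical. A real system decides this through its replication and quorum settings rather than a flag.

```python
class Unavailable(Exception):
    """Raised when a consistent answer cannot be given."""


class Node:
    def __init__(self, value):
        self.value = value
        self.reachable = True


def read(local, primary, mode):
    """Read through `local`, which may lag behind `primary`.

    mode="CP": answer only if the primary can confirm the latest write.
    mode="AP": always answer from the local copy, even if it is stale.
    """
    if mode == "AP":
        return local.value
    if mode == "CP":
        if not primary.reachable:
            raise Unavailable("cannot confirm the latest write")
        return primary.value
    raise ValueError(f"unknown mode: {mode}")


primary = Node("shipped")    # holds the latest write
local = Node("pending")      # replica that missed the latest write
primary.reachable = False    # partition: the primary is cut off

ap_result = read(local, primary, mode="AP")  # stale but available
try:
    cp_result = read(local, primary, mode="CP")
except Unavailable:
    cp_result = "error"      # consistent but unavailable
```

With the primary unreachable, the availability-first read returns the stale value `"pending"`, while the consistency-first read refuses to answer.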