Chapter 4 discusses distributed database systems, covering concepts, design, query processing, and transaction management. It highlights advantages such as fault tolerance and scalability, while also addressing challenges like data consistency and complexity. The chapter outlines key features, types of distributed databases, and essential considerations for effective design and management.


Chapter 4: Distributed Database System

Instructor: Melaku M.

Target Group: G3 SE
Outline

❖ Concepts of distributed database

❖ Distributed database design

❖ Distributed query processing

❖ Distributed transaction management and recovery


Concepts of distributed database
❖ A distributed database is a collection of interrelated data distributed across multiple physical locations. These locations may be on different computers, servers, or even in different geographical regions.

❖ It is logically integrated and appears as a single database to the user, even though it is physically distributed across multiple sites.

❖ The database systems that run on each site are independent of each other.

❖ Transactions may access data at one or more sites.


Centralized DBS
• Logically integrated, physically centralized

Traditionally: one large mainframe DBMS + n "dumb" terminals


Distributed DBS
• Data logically integrated (i.e., access based on one schema).
• Data physically distributed among multiple database nodes.
• Processing is distributed among multiple database nodes.

Traditionally: m mainframes for the DBMSs + n terminals


Advantages of Distributed DBMS
❖Fault Tolerance
▪ The system continues to operate even if some nodes fail.
▪ Achieved through data replication and redundancy.
▪ Redundancy ensures availability during failures.
❖Scalability
▪ Distributed databases can handle a growing amount of data and traffic by
adding more nodes.
❖Improved Performance: Localized access to data reduces latency.
❖Geographical Distribution: Useful for global applications.
❖Transparency of distribution
❖Efficiency
Problems/Challenges of Distributed Databases
– Complexity of design, implementation, and maintenance.
– Data consistency: Maintaining data consistency across nodes is difficult.
– Failure recovery
– Latency: Network delays can impact performance.
– Security: More nodes increase the attack surface for potential breaches.
Problems/Challenges of Distributed Databases (cont'd)
❖ Need for complex and expensive software: a DDBMS demands complex and often expensive software to provide data transparency and coordination across the several sites.

❖ Processing overhead: even simple operations may require a large number of communications and additional calculations to provide uniformity in data across the sites.

❖ Data integrity: the need to update data at multiple sites poses data-integrity problems.

❖ Overheads from improper data distribution: responsiveness of queries is largely dependent upon proper data distribution. Improper data distribution often leads to very slow response to user requests.
Clients with Centralized Server Architecture

❖ Objects are stored and administered on the server.
❖ Objects are processed (accessed and modified) on workstations [sometimes on the server too].
Clients with Distributed Server Architecture
Types of Distributed Databases
❖Distributed databases can be classified into the following
categories:
a. Homogeneous Distributed Database
b. Heterogeneous Distributed Database
a. Homogeneous Distributed Databases
➢ In a homogeneous distributed database:

❖ All sites have identical software.
▪ All participating nodes (databases) use the same database management system (DBMS) and schema.

❖ Sites are aware of each other and agree to cooperate in processing user requests.

❖ The system appears to the user as a single system.

❖ Easier to manage and maintain.

❖ Example: multiple MySQL databases in different locations.


b. Heterogeneous distributed database
➢ In a heterogeneous distributed database:

❖ Different sites may use different schemas and software (DBMS).

❖ Example: one node using MySQL and another using PostgreSQL.

❖ Requires middleware or translation layers for communication.

• Differences in schema are a major problem for query processing.

• Differences in software are a major problem for transaction processing.

❖ Sites may not be aware of each other and may provide only limited facilities for cooperation in transaction processing.
Key Features of Distributed Database
A. Autonomy
• Determines the extent to which individual nodes can operate independently.
• Design autonomy: independence of data model usage and transaction management techniques among nodes.
• Communication autonomy: determines the extent to which each node can decide on sharing information with other nodes.
• Execution autonomy: determines the extent to which each node can schedule and execute local operations independently.
Cont’d
B. Distributed Data Storage

• Assume the relational data model.

❖ Replication: the system maintains multiple copies of data, stored at different sites, for faster retrieval and fault tolerance.

❖ Fragmentation: a relation is partitioned into several fragments stored at distinct sites.

❖ Replication and fragmentation can be combined:
➢ A relation is partitioned into several fragments, and the system maintains several identical replicas of each fragment.
C. Data Replication
❖ Copying data to multiple nodes/sites improves availability and fault tolerance.

❖ A relation or fragment of a relation is replicated if it is stored redundantly at two or more sites.

❖ Partial replication: only some fragments/relations are replicated on selected nodes. Balances fault tolerance against storage and communication costs.

❖ Full replication of a relation is the case where the relation is stored at all sites. Fully redundant databases are those in which every site contains a copy of the entire database.

❖ Types of replication:

✓ Synchronous replication: changes are propagated immediately to all replicas.

✓ Asynchronous replication: changes are propagated with some delay.
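The two replication modes can be sketched in Python; `Replica`, `SyncPrimary`, and `AsyncPrimary` are illustrative names, not a real DBMS API:

```python
import queue
import threading

class Replica:
    """A stand-in for one replica site holding a copy of the data."""
    def __init__(self):
        self.data = {}
    def apply(self, key, value):
        self.data[key] = value

class SyncPrimary:
    """Synchronous replication: a write returns only after every replica applied it."""
    def __init__(self, replicas):
        self.replicas = replicas
    def write(self, key, value):
        for r in self.replicas:
            r.apply(key, value)          # propagate immediately, in-line

class AsyncPrimary:
    """Asynchronous replication: writes are logged and shipped later by a background thread."""
    def __init__(self, replicas):
        self.replicas = replicas
        self.log = queue.Queue()
        threading.Thread(target=self._ship, daemon=True).start()
    def write(self, key, value):
        self.log.put((key, value))       # returns before replicas are updated
    def _ship(self):
        while True:
            key, value = self.log.get()
            for r in self.replicas:
                r.apply(key, value)      # propagated with some delay
            self.log.task_done()
```

The synchronous variant gives every replica the latest value at the cost of write latency; the asynchronous variant returns immediately but replicas lag behind the primary.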


Replication (cont.)
❖ Advantages of Replication
✓ Availability: failure of a site containing relation r does not make r unavailable if replicas exist.

✓ Parallelism: queries on r may be processed by several nodes in parallel.

✓ Reduced data transfer: relation r is available locally at each site containing a replica of r.

✓ Fault tolerance: if one node fails, data is still available on other nodes.

✓ Improved performance: queries can access the nearest replica, reducing latency.

✓ Load balancing: distributes query load across replicas.


Replication (cont.)
❖ Disadvantages of Replication
✓ Increased cost of updates: each replica of relation r must be updated.

✓ Increased complexity of concurrency control: concurrent updates to distinct replicas may lead to inconsistent data unless special concurrency-control mechanisms are implemented.

✓ Consistency management: all replicas must be updated during write operations.

✓ Increased overhead: storage and communication costs increase with replication.


D. Data Fragmentation
❖Fragmentation is the process of dividing a database into smaller, more
manageable pieces (fragments). These fragments can be distributed
across nodes to improve performance and availability.

❖ Types of Fragmentation:
• Horizontal Fragmentation

• Vertical Fragmentation

• Mixed (Hybrid) Fragmentation
D. Data Fragmentation (cont'd)
Horizontal Fragmentation

❖Divides a table into subsets of rows based on a condition.

❖Each fragment contains a subset of rows, and the union of all fragments
reconstructs the original table.

❖Example:
• Employee_US = SELECT * FROM Employee WHERE Country = 'US';

• Employee_EU = SELECT * FROM Employee WHERE Country = 'EU';


D. Data Fragmentation (cont'd)
Vertical Fragmentation

❖Divides a table into subsets of columns.

❖Each fragment contains specific columns, and the join of all fragments
reconstructs the original table.

❖Example:
• Employee_Personal = SELECT ID, Name, Address FROM Employee;
• Employee_Job = SELECT ID, Salary, Department FROM Employee;
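The same idea for vertical fragments, again with an in-memory SQLite database and made-up rows; note that each fragment keeps the key ID so the join can rebuild the relation:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Employee(ID INTEGER PRIMARY KEY, Name TEXT, Address TEXT,
                          Salary INTEGER, Department TEXT);
    INSERT INTO Employee VALUES (1,'Ana','Addis Ababa',900,'IT'),
                                (2,'Ben','Adama',700,'HR');

    -- Vertical fragments: column subsets, each carrying the key ID.
    CREATE TABLE Employee_Personal AS SELECT ID, Name, Address FROM Employee;
    CREATE TABLE Employee_Job      AS SELECT ID, Salary, Department FROM Employee;
""")

# Joining the fragments on ID reconstructs the original table.
rebuilt = con.execute("""
    SELECT p.ID, p.Name, p.Address, j.Salary, j.Department
    FROM Employee_Personal p JOIN Employee_Job j ON p.ID = j.ID
    ORDER BY p.ID
""").fetchall()
assert rebuilt == con.execute("SELECT * FROM Employee ORDER BY ID").fetchall()
```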
D. Data Fragmentation (cont'd)
Mixed (Hybrid) Fragmentation

❖Combines horizontal and vertical fragmentation.

❖Example:
First, apply horizontal fragmentation (split by country), then apply
vertical fragmentation (split by columns) to each fragment.
Horizontal Fragmentation of account Relation
Vertical Fragmentation of employee_info Relation
Benefits of Fragmentation:
• Improved Performance: queries can access only the relevant fragments instead of the entire dataset.

• Parallelism: fragments can be processed independently on different nodes.

• Localization: fragments can be stored close to where they are most frequently accessed.
E. Data Transparency
❖ Data transparency: the degree to which a system user may remain unaware of the details of how and where the data items are stored in a distributed system.

❖ From the user's perspective, the distributed database should appear as a single, unified system.

❖Types of transparency include:


✓Location Transparency: Users don’t need to know where data is located.
✓Replication Transparency: Users don’t need to know about data replication.
✓Fragmentation Transparency: Users don’t need to know how data is partitioned.
Distributed database design
❖ Refers to the process of designing a database system where data is distributed across multiple physical locations (nodes) connected via a network.

❖ The primary goal is to ensure that the database system is efficient, scalable, reliable, and capable of providing high performance while addressing challenges like data distribution, replication, and consistency.

❖ Distributed database design involves strategically planning how data is stored and accessed across multiple physical locations (nodes).
Key Considerations
1. Data Distribution
❖ Ensure that data is appropriately distributed across multiple nodes (via fragmentation and replication) to balance workload and minimize communication costs.
2. Transparency
❖Provide a user experience where the distributed nature of the database is hidden:
✓ Location Transparency: Users don't need to know where the data resides.
✓ Replication Transparency: Users don't need to know if data is replicated across sites.
✓ Fragmentation Transparency: Users don't need to know if data is divided into fragments.

3. Data Locality
❖Placing data closer to the users or applications that frequently access it to minimize
network latency.
Key Considerations
4. Transaction Management: ensuring data consistency and preventing conflicts when multiple transactions access and modify data concurrently.

5. Fault Tolerance: implementing mechanisms to handle node failures or network disruptions, such as data replication and automatic failover.

6. Security: implementing robust security measures to protect data from unauthorized access and ensure data privacy.
Distributed Database Design Goal

1. Scalability: the system should be able to easily scale to accommodate increasing data volumes and user demands.
2. Reliability and Availability: ensure the system can withstand node or network failures while providing uninterrupted service.
3. Performance: optimize query execution and minimize data transfer between nodes.
4. Consistency/Data Integrity: maintain data consistency across nodes, particularly in systems with data replication.
5. Flexibility: the system should be adaptable to changing business requirements.
Steps involved in DDB design
1. Requirement Analysis
❖ Understand the application requirements, such as:
• Data access patterns (e.g., which data is accessed most frequently and by whom).

• Query types and expected workload (e.g., read-heavy or write-heavy).

• Performance requirements (e.g., response time, throughput).

• Reliability and fault tolerance needs.

• Scalability requirements for future growth.

❖ Identify the geographical locations of users and data sources to minimize latency.
Steps involved in DDB design
2. Data Modeling
❖ Create a high-level logical schema of the database using techniques like Entity-Relationship (ER) modeling to represent data structure.

❖ Ensure the schema is normalized to remove redundancies and dependencies.

❖ Define relationships between entities, constraints, and business rules.


Steps involved in DDB design
3. Data Distribution Design:
❖ Data Fragmentation: fragment data into smaller, manageable pieces based on access patterns and performance requirements.

❖ Data Replication: replicate data to improve availability and fault tolerance.

Steps involved in DDB design
4. Data Allocation
❖Decide where to place the fragments across the nodes in the distributed
system.
❖Place data on nodes based on access frequency and network topology.
❖Consider the following data allocation strategies:
• Centralized Allocation: All fragments/data are stored in a single node (not truly
distributed). Simple but lacks scalability and fault tolerance.
• Partitioned Allocation: Each fragment is stored on a single/different node. Reduces
storage overhead but may lead to high communication costs.
• Replicated Allocation: Fragments are replicated across multiple nodes for fault
tolerance and query performance. But increases storage requirements and
consistency management overhead.
• Hybrid Allocation: Some fragments are replicated, and others are partitioned based
on access patterns and application requirements.
Steps involved in DDB design
5. Transaction Processing Design:
•Implement concurrency control and failure recovery mechanisms.
6. Performance Evaluation:
•Test the distributed database system to ensure it meets performance
requirements.
7. Maintenance and Evolution:
•Continuously monitor and adjust the system as needed.
Key Challenges in Distributed Database Design
➢Data Distribution Complexity: Determining the optimal way to fragment, allocate, and
replicate data is challenging.
➢Consistency Management
➢Fault Tolerance: Handling node and network failures while ensuring data integrity and
availability.
➢Query Optimization: Optimizing queries over distributed data to minimize communication
and processing costs.
➢Concurrency Control: Managing concurrent access to distributed data to avoid conflicts
and ensure correctness.
Transaction Management in Distributed Databases
❖ A transaction is a program comprising a collection of database operations, executed as a logical unit of data processing. The operations performed in a transaction include one or more database operations such as insert, delete, update, or retrieve.
✓ read_item() − reads a data item from storage into main memory.

✓ modify_item() − changes the value of an item in main memory.

✓ write_item() − writes the modified value from main memory back to storage.
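The three primitives can be sketched as follows, assuming a simple dict-backed storage and a per-transaction main-memory buffer; the names mirror the list above but this is an illustration, not a real DBMS API:

```python
# "Stable storage" for the whole database, and a sample data item.
storage = {"balance": 100}

def read_item(name, buffer):
    buffer[name] = storage[name]        # storage -> main memory

def modify_item(name, buffer, fn):
    buffer[name] = fn(buffer[name])     # change the value in main memory only

def write_item(name, buffer):
    storage[name] = buffer[name]        # main memory -> storage

# A tiny transaction: withdraw 30 from the balance.
buf = {}                                # this transaction's main-memory buffer
read_item("balance", buf)
modify_item("balance", buf, lambda v: v - 30)
write_item("balance", buf)
assert storage["balance"] == 70
```

Note that until `write_item` runs, the change exists only in the transaction's buffer; this separation is what makes rollback possible.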


Cont’d
❖ A distributed transaction is one that spans multiple database nodes (it may access data at several sites).
❖ Transactions must comply with the ACID properties even when the participating databases are spread over a network.
❖ Each site has a local transaction manager responsible for:
✓ Maintaining a log for recovery purposes.
✓ Participating in coordinating the concurrent execution of the transactions executing at that site.

❖ Each site has a transaction coordinator, which is responsible for:
✓ Starting the execution of transactions that originate at the site.
✓ Distributing subtransactions to appropriate sites for execution.
✓ Coordinating the termination of each transaction that originates at the site, which may result in the transaction being committed at all sites or aborted at all sites.
Cont’d
❖ Each sub-transaction is executed on a different database, but all sub-transactions are part of the same logical transaction.

❖ Transaction management in distributed databases is a complex but essential aspect of ensuring data integrity and consistency.

❖ By understanding the challenges and employing appropriate techniques, developers can build reliable and scalable distributed applications.
Challenges in Distributed Transaction Management:
❖ Atomicity: ensuring that all operations within a transaction are either fully committed or completely rolled back, even if some nodes fail.

❖ Consistency: maintaining the validity of data across all nodes after a transaction is completed.

❖ Isolation: preventing interference between concurrent transactions, ensuring that each transaction sees a consistent view of the data.

❖ Durability: guaranteeing that once a transaction is committed, it will not be lost due to failures.

❖ Concurrency Control: coordinating access to shared data among multiple transactions to prevent conflicts.
Key Techniques for Distributed Transaction Management:
Commit Protocols
❖ Commit protocols are used to ensure atomicity across sites:
✓ A transaction which executes at multiple sites must either be committed at all the sites or aborted at all the sites.

✓ It is not acceptable to have a transaction committed at one site and aborted at another.

❖ The two-phase commit (2PC) protocol is widely used.

❖ The three-phase commit (3PC) protocol is more complicated and more expensive, but avoids some drawbacks of the two-phase commit protocol. This protocol is not used in practice.
Key Techniques for Distributed Transaction Management:

1. Two-Phase Commit (2PC):


❖A standard protocol for ensuring atomicity.

❖Involves two phases:


• Voting Phase: the coordinator asks all participants if they can commit.

• Commit/Rollback Phase: if all participants vote yes, the coordinator instructs them to commit; otherwise, it instructs them to roll back.

❖ Ensures atomicity but can be susceptible to blocking in case of failures.
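The two phases can be sketched as follows; `Participant` and `two_phase_commit` are illustrative stand-ins that omit the timeouts, logging, and recovery a real 2PC implementation needs:

```python
class Participant:
    """A site taking part in the distributed transaction."""
    def __init__(self, can_commit=True):
        self.can_commit = can_commit
        self.state = "active"
    def prepare(self):              # voting phase: vote yes or no
        return self.can_commit
    def commit(self):
        self.state = "committed"
    def rollback(self):
        self.state = "aborted"

def two_phase_commit(participants):
    # Phase 1 (voting): the coordinator asks every participant if it can commit.
    votes = [p.prepare() for p in participants]
    # Phase 2 (commit/rollback): commit everywhere only if every vote is yes;
    # otherwise instruct every participant to roll back.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.rollback()
    return "aborted"
```

A single "no" vote (or an unreachable participant, treated as a "no") aborts the transaction at every site, which is exactly the all-or-nothing property the protocol exists to guarantee.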


Cont’d

2. Three-Phase Commit (3PC):


• An extension of 2PC that addresses some of its limitations.
• Introduces an intermediate precommit phase to reduce the risk of
blocking.
• More complex but can improve availability in certain failure scenarios.
Cont’d
3. Concurrency Control Mechanisms:
❖ Locking: a common technique where nodes acquire locks on data before accessing it, preventing other transactions from modifying the same data concurrently.

❖ Timestamp Ordering: assigns timestamps to transactions and ensures that operations are executed in the order of their timestamps.

❖ Optimistic Concurrency Control: assumes that conflicts are rare and only checks for conflicts at the end of a transaction. If conflicts occur, the transaction is aborted and retried.
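The validate-at-commit idea behind optimistic concurrency control can be sketched with per-item version numbers; `VersionedStore` is an illustrative simplification, not a real DBMS mechanism:

```python
class VersionedStore:
    """Each item carries a version; a commit is valid only if nobody
    committed a newer version since the transaction read the item."""
    def __init__(self):
        self.data = {}                       # key -> (value, version)
    def read(self, key):
        return self.data.get(key, (None, 0))
    def commit(self, key, new_value, read_version):
        # Validation phase: abort if the item changed after we read it.
        _, current = self.data.get(key, (None, 0))
        if current != read_version:
            return False                     # conflict: caller must retry
        self.data[key] = (new_value, current + 1)
        return True

store = VersionedStore()
store.data["x"] = (10, 1)

# Transaction T1 reads x, but a concurrent transaction T2 commits first.
val, ver = store.read("x")
assert store.commit("x", 99, ver)            # T2 commits; version becomes 2
assert store.commit("x", val + 1, ver) is False  # T1 fails validation, must retry
```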
Cont’d
4. Distributed Deadlock Detection:
• Algorithms to detect and resolve deadlocks, where two or more
transactions are waiting for each other to release resources.
• Common approaches include centralized and distributed deadlock
detection algorithms.
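The centralized approach can be sketched as cycle detection over a wait-for graph, where an edge Ti → Tj means Ti waits for a resource held by Tj; a minimal illustrative version:

```python
def has_deadlock(wait_for):
    """wait_for: dict mapping a transaction to the transactions it waits for.
    A deadlock exists exactly when the wait-for graph contains a cycle."""
    def dfs(node, stack, visited):
        visited.add(node)
        stack.add(node)
        for nxt in wait_for.get(node, []):
            if nxt in stack:
                return True                  # back edge: cycle => deadlock
            if nxt not in visited and dfs(nxt, stack, visited):
                return True
        stack.discard(node)
        return False

    visited = set()
    return any(dfs(t, set(), visited) for t in wait_for if t not in visited)

assert has_deadlock({"T1": ["T2"], "T2": ["T1"]})      # mutual wait: deadlock
assert not has_deadlock({"T1": ["T2"], "T2": ["T3"]})  # a chain, no cycle
```

In a distributed detector the same graph is assembled from per-site fragments, which is where most of the added complexity lives.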
Failures in a Distributed System
❖Types of Failure:

– Transaction failure

– Node failure

– Media failure

– Network failure
Distributed Query Processing
❖ Distributed query processing refers to executing a query in a distributed database system where the data is stored/spread across multiple sites or nodes connected via a network.

❖ The goal of distributed query processing is to retrieve results efficiently while minimizing costs such as communication overhead, data transfer, and response time, and to maximize performance by strategically distributing the processing workload among the available nodes.

❖ For centralized systems, the primary criterion for measuring the cost of a particular strategy is the number of disk accesses.

❖ In a distributed system, other issues must also be taken into account, such as the cost of transmitting data over the network.
Key Components of Distributed Query Processing

A. Query Decomposition: The input query (usually in SQL) is decomposed into smaller
subqueries or operations, which can be executed independently or in parallel on different
nodes.

B. Data Localization: Determines where the data required by the query resides. Subqueries
are directed to the appropriate nodes storing the relevant data.

C. Query Optimization: Focuses on reducing the cost of query execution. Choosing the most
efficient execution plan involves considering various factors, such as data locality, amount
of data transfer, communication costs, and available processing power at each node.

D. Query Execution: Executes subqueries on the distributed nodes. Collects and aggregates
the results from all the nodes to form the final result.
Query Optimization Techniques/Common Approaches
1. Join Optimization: Minimize the cost of joining tables across nodes by:
• Reducing data transfer.

• Using semi-joins to prune unnecessary data.

2. Data Reduction / Heuristic Optimization: apply filters (selections) and projections at the local nodes where the data resides, reducing the amount of data sent over the network.

3. Parallelism: leverage parallel processing by executing subqueries simultaneously on multiple nodes.
Query Optimization Techniques/Common Approaches
❖ Data Shipping: moving data to the node where the query originates or to a central processing node. This is suitable when the amount of data to be transferred is relatively small.

❖ For example, consider a query over account ⋈ depositor ⋈ branch, where account, depositor, and branch are stored at sites S1, S2, and S3 respectively, and the query is issued at site SI. Possible strategies include:

• Ship copies of all three relations to site SI and process the entire query locally at SI.

• Ship a copy of the account relation to site S2 and compute temp1 = account ⋈ depositor at S2. Ship temp1 from S2 to S3, and compute temp2 = temp1 ⋈ branch at S3. Ship the result temp2 to SI.

❖ The following factors must be considered:

• the amount of data being shipped

• the cost of transmitting a data block between sites

• the relative processing speed at each site

Example of Distributed Query Processing
• Consider a distributed database with two tables:

• Table A is stored on Node 1.

• Table B is stored on Node 2.

• Query:
SELECT A.name, B.salary
FROM A, B
WHERE A.id = B.id AND B.salary > 50000;
Example of Distributed Query Processing
1. Decomposition:
Break the query into subqueries:
• Retrieve rows from B where salary > 50000 (processed locally on Node 2).
• Perform a join between A and the filtered rows of B.
2. Optimization:
• Push the salary > 50000 condition to Node 2 to reduce transferred data.
• Use a semi-join to minimize data movement during the join operation.
3. Execution:
• Node 2 sends only the filtered rows of B to Node 1.
• Node 1 performs the join and returns the final result.
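The plan above can be sketched with plain Python lists standing in for the two nodes' tables; the sample rows are made up for illustration:

```python
table_a = [  # Node 1: ids and names
    {"id": 1, "name": "Ana"},
    {"id": 2, "name": "Ben"},
    {"id": 3, "name": "Cal"},
]
table_b = [  # Node 2: ids and salaries
    {"id": 1, "salary": 60000},
    {"id": 2, "salary": 40000},
    {"id": 3, "salary": 75000},
]

# Node 2: push the selection salary > 50000 down, so only qualifying rows travel.
shipped = [row for row in table_b if row["salary"] > 50000]

# Node 1: join the local table with the shipped rows and project name, salary.
salaries = {row["id"]: row["salary"] for row in shipped}
result = [
    {"name": a["name"], "salary": salaries[a["id"]]}
    for a in table_a if a["id"] in salaries
]
assert result == [{"name": "Ana", "salary": 60000},
                  {"name": "Cal", "salary": 75000}]
```

Without the pushdown, all three rows of B would cross the network; with it, only the two qualifying rows do, which is the whole point of the optimization.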
