Abdulrahman A. Mohamed Mobile: +254 713 500 814 Email: [email protected].

ke

TECHNICAL UNIVERSITY OF MOMBASA


CCI 4301: ADVANCED DATABASE MANAGEMENT SYSTEMS
WEEK 5: INTRODUCTION TO DISTRIBUTED DATABASES

OUTLINE

• Introduction to Distributed Databases

• Distributed Transactions

• Distributed Deadlock

1. Introduction to Distributed Databases:

Definition: A distributed database is a database system in which data is distributed
across multiple sites or nodes connected through a network. Each node can
independently process data and execute queries, while the entire system appears to users
and applications as a single, logically integrated database.

Explanation: Distributed databases are designed to address various challenges, such as data

scalability, availability, and fault tolerance. Here's a detailed breakdown:

• Data Distribution: Data is partitioned or replicated across different nodes in a

distributed database system. This distribution can be based on factors like data access

patterns, geographic locations, or other criteria.

• Network Connectivity: Nodes in a distributed database are connected through a

network, enabling them to communicate and share data. These networks can be

local area networks (LANs), wide area networks (WANs), or even the internet.

• Data Independence: Each node manages its own portion of the data (often called local
autonomy), while users and applications are shielded from where the data is physically
stored. As a result, changes to one part of the database do not necessarily affect other parts.

• Scalability: Distributed databases offer scalability by adding more nodes to the

system as data volume or user load increases. This helps handle growing data and

user demands.

• Fault Tolerance: Distributed databases provide fault tolerance by replicating data

across multiple nodes. If one node fails, data can still be accessed from other nodes,

ensuring system availability.
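
The data distribution and fault tolerance points above can be illustrated with a minimal Python sketch. It assumes hash-based partitioning and a replication factor of 2. The node names, the `replicas_for` function, and the `Cluster` class are hypothetical and invented for illustration. The sketch does not re-synchronize a node that missed writes while it was down.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names
REPLICATION_FACTOR = 2  # assumption: every key is stored on two nodes


def replicas_for(key: str) -> list[str]:
    """Return the nodes that should hold `key`, primary first."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(NODES)
    # The replicas are the primary plus the next nodes in the list.
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]


class Cluster:
    """In-memory stand-in for a set of database nodes."""

    def __init__(self) -> None:
        self.storage = {node: {} for node in NODES}
        self.down: set[str] = set()  # nodes currently marked as failed

    def put(self, key: str, value) -> None:
        # Write to every replica that is currently reachable.
        for node in replicas_for(key):
            if node not in self.down:
                self.storage[node][key] = value

    def get(self, key: str):
        # Read from the first reachable replica that has the key.
        for node in replicas_for(key):
            if node not in self.down and key in self.storage[node]:
                return self.storage[node][key]
        raise KeyError(key)
```

If the primary node for a key is added to `cluster.down`, `get` still succeeds by reading the second replica. It raises `KeyError` only when every replica is unavailable.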

Examples and Scenarios:

1. Global Retail Chain: A global retail chain operates in multiple countries, each with

its own store database. These store databases are part of a distributed database

system. When a customer places an order online, the system seamlessly retrieves

inventory data from the nearest store's database, processes the order, and updates

the inventory. The distributed database ensures data consistency and availability

across all stores.

2. Social Media Platform: A social media platform stores user profiles, posts, and media

files in a distributed database. User data is partitioned based on geographic regions.

When a user logs in, the system retrieves data from the nearest data center,

providing low-latency access. User-generated content is replicated to multiple data

centers for fault tolerance.

3. Financial Services: Large financial institutions use distributed databases to manage


customer accounts, transactions, and financial instruments. Different branches and

departments may have their databases, which are part of a distributed system. Cross-

branch transactions are coordinated to ensure data consistency.
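
The "nearest store" and "nearest data center" routing in the scenarios above could be sketched as follows. The site names and coordinates are illustrative assumptions, and great-circle distance is used only as a rough stand-in for network latency. A real system would more likely measure latency directly.

```python
import math

# Hypothetical data centers with approximate (latitude, longitude) in degrees.
DATA_CENTERS = {
    "nairobi": (-1.29, 36.82),
    "frankfurt": (50.11, 8.68),
    "singapore": (1.35, 103.82),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_data_center(user_location, available=None) -> str:
    """Pick the closest data center, optionally restricted to reachable ones."""
    candidates = list(available) if available is not None else list(DATA_CENTERS)
    return min(candidates,
               key=lambda name: haversine_km(user_location, DATA_CENTERS[name]))
```

Passing `available` lets the caller exclude a failed site, so that requests fall back to the next-closest replica.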

Practical Application in the Field:

• Distributed databases find practical application in various fields, including e-

commerce, finance, telecommunications, healthcare, and more.

• Cloud computing providers like Amazon Web Services (AWS), Microsoft Azure, and

Google Cloud Platform offer distributed database services that enable organizations

to scale and manage data efficiently.

• Decentralized blockchain networks, such as Bitcoin and Ethereum, use a replicated
distributed ledger, a specialized form of distributed database, to record and verify
transactions across a global network of nodes.

• IoT (Internet of Things) applications generate massive amounts of data that can be

efficiently managed and analyzed using distributed databases, ensuring real-time

decision-making.

2. Distributed Transactions:

Definition: A distributed transaction is a set of operations on a distributed
database system that is treated as a single, indivisible unit of work. These operations can
span multiple nodes, and the transaction must maintain the ACID (Atomicity, Consistency,
Isolation, Durability) properties even in a distributed environment.

Explanation: Distributed transactions are challenging due to the need to coordinate

operations across multiple nodes while ensuring data consistency and integrity. Here's a

detailed breakdown:

• Atomicity: A distributed transaction must be atomic, meaning that it is treated as a


single unit. All operations within the transaction must either succeed or fail as a

whole.

• Consistency: A distributed transaction must maintain consistency, ensuring that the

database is in a valid state before and after the transaction.

• Isolation: Concurrent transactions must not interfere with one another. Isolation
levels control how much of one transaction's in-progress changes are visible to other
concurrently executing transactions.

• Durability: Once a distributed transaction is committed, its changes must be durable,

even in the face of system failures. Distributed systems employ various techniques

like replication and logging for durability.
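
The notes do not name a specific protocol, but one common way to obtain atomicity across nodes is two-phase commit. The following Python sketch shows the idea under simplifying assumptions: the `Participant` class and `two_phase_commit` function are hypothetical, everything runs in one process, and the durable logging and failure recovery that a real implementation needs are omitted.

```python
class Participant:
    """One node's share of a distributed transaction (hypothetical interface)."""

    def __init__(self, name: str, balance: int) -> None:
        self.name = name
        self.balance = balance
        self.pending = None  # staged change awaiting the coordinator's decision

    def prepare(self, delta: int) -> bool:
        """Phase 1: vote yes only if the change can be applied locally."""
        if self.balance + delta < 0:
            return False  # vote no: the account would be overdrawn
        self.pending = delta
        return True

    def commit(self) -> None:
        """Phase 2 (commit): make the staged change permanent."""
        if self.pending is not None:
            self.balance += self.pending
            self.pending = None

    def rollback(self) -> None:
        """Phase 2 (abort): discard the staged change."""
        self.pending = None


def two_phase_commit(changes) -> bool:
    """Apply all (participant, delta) pairs or none. Returns True on commit."""
    prepared = []
    for participant, delta in changes:
        if participant.prepare(delta):
            prepared.append(participant)
        else:
            # Any "no" vote aborts the whole transaction.
            for p in prepared:
                p.rollback()
            return False
    for participant, _ in changes:
        participant.commit()
    return True
```

For example, a transfer of 30 between accounts on two nodes is written as `two_phase_commit([(a, -30), (b, 30)])`. If either node votes no, neither balance changes.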

Examples and Scenarios:

1. Online Banking: When a customer transfers money between accounts in an online

banking system, a distributed transaction ensures that the amount is deducted from

one account and added to another, even if these accounts are managed on different

servers or data centers.

2. Inventory Management: In a distributed inventory management system for a retail

chain, when a new product is added to the inventory, the transaction ensures that

the product details are updated in all relevant databases across various store

locations.

3. E-commerce Checkout: During the checkout process on an e-commerce platform, a

distributed transaction handles various tasks, including verifying product

availability, deducting the purchase amount, and updating order history records, all

while maintaining data consistency.


Practical Application in the Field:

• Distributed transactions are widely used in banking and financial systems to ensure

the accuracy of fund transfers, account updates, and transaction records.

• They play a crucial role in e-commerce platforms, order processing systems, and

supply chain management, where data consistency and integrity are vital.

• Distributed databases and transactions are employed in cloud computing

environments to support scalable and fault-tolerant applications across multiple

data centers.

• In distributed systems like blockchain networks, every transaction is treated as a

distributed transaction, with nodes across the network reaching consensus on the

transaction's validity and order.

3. Distributed Deadlock:

Definition: Distributed Deadlock refers to a situation in a distributed computing
environment where multiple processes or transactions, running on different nodes or
systems, become deadlocked due to conflicting resource requests. A deadlock occurs when
each process or transaction is waiting for a resource held by another, resulting in a standstill
where none can proceed.
Explanation: Distributed deadlock is a challenging issue in distributed systems because it
involves multiple nodes or systems, each with its own set of resources and processes. Here's
a more detailed breakdown:
• Resource Allocation: In a distributed system, resources can include data objects, files,
database records, or any other shared resource that multiple processes or
transactions may need to access.
• Resource Locking: Processes typically request locks or access rights to these resources
to ensure data consistency and integrity. Locks prevent multiple processes from
simultaneously modifying the same resource.
• Resource Conflicts: Distributed deadlock occurs when multiple processes or
transactions acquire some locks and then attempt to request additional locks that
are currently held by other processes. This results in a circular wait condition where
each process is waiting for a resource held by another.
• Deadlock Detection and Resolution: Detecting and resolving distributed deadlocks
require specialized algorithms and coordination mechanisms. These mechanisms
may involve a central deadlock detection process or a distributed approach where
nodes communicate to resolve deadlocks.
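
The centralized detection approach mentioned above can be sketched in Python as follows. The sketch assumes each node reports a local wait-for graph, a mapping from a waiting transaction to the transactions holding the locks it needs. A central detector merges these graphs and searches for a cycle. The function names and the graph representation are assumptions made for illustration.

```python
def merge_wait_for_graphs(local_graphs):
    """Union the per-node wait-for edges into one global graph."""
    merged = {}
    for graph in local_graphs:
        for waiter, holders in graph.items():
            merged.setdefault(waiter, set()).update(holders)
    return merged


def find_deadlock(graph):
    """Return a list of transactions that form a cycle, or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    colour = {}
    path = []

    def visit(txn):
        colour[txn] = GREY
        path.append(txn)
        for holder in graph.get(txn, ()):
            state = colour.get(holder, WHITE)
            if state == GREY:
                # Reached a transaction already on the path: circular wait.
                return path[path.index(holder):]
            if state == WHITE:
                cycle = visit(holder)
                if cycle:
                    return cycle
        path.pop()
        colour[txn] = BLACK
        return None

    for txn in list(graph):
        if colour.get(txn, WHITE) == WHITE:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None
```

Suppose node 1 reports `{"T1": {"T2"}}` and node 2 reports `{"T2": {"T1"}}`. Neither local graph contains a cycle, but the merged graph does. This is why a deadlock that spans nodes cannot be found by any single node in isolation.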
Examples and Scenarios:
1. Database Transactions: In a distributed database system, multiple transactions across
different nodes may access and update data items. If one transaction holds a lock
on a data item and waits for a lock held by another transaction on a different
node, while that second transaction waits for the item locked by the first, a
distributed deadlock occurs.
2. Resource Sharing in a Cloud Cluster: In a cloud computing environment, multiple
virtual machines (VMs) or containers may share common resources such as CPU,
memory, or storage. If VM A requests access to a resource currently used by VM B,
VM B simultaneously requests a resource held by VM C, and VM C in turn requests a
resource held by VM A, the resulting circular wait among the VMs is a distributed deadlock.
3. Distributed File System: In a distributed file system, multiple clients may access
shared files. If two or more clients request exclusive access to files while holding
locks on other files, and these locks conflict with other clients' requests, a distributed
deadlock can occur.
Practical Application in the Field:
• Distributed deadlock resolution and prevention mechanisms are essential in
distributed systems, especially in databases, cloud computing, and distributed file
systems.
• Distributed deadlock detection algorithms, such as those that search a wait-for graph
(a simplified resource allocation graph) for cycles, help identify deadlocks and
initiate corrective actions.
• Some distributed systems employ timeouts and resource preemption techniques to
break deadlocks and allow transactions to proceed.
• Load balancers and resource management systems in cloud computing
environments can use deadlock detection to keep resource requests from blocking
indefinitely and to keep allocation fair among multiple users or applications.
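
The timeout technique mentioned above can be sketched with Python's standard `threading.Lock`, whose `acquire` method accepts a `timeout` argument. The helper names `acquire_all` and `run_with_retry` are hypothetical, and local thread locks stand in for locks held on remote nodes. A transaction that cannot obtain every lock in time releases what it holds, which breaks any circular wait, and then tries again after a random pause.

```python
import random
import threading
import time


def acquire_all(locks, timeout_s: float = 0.05) -> bool:
    """Try to take every lock; on timeout release those held and report failure."""
    held = []
    for lock in locks:
        if lock.acquire(timeout=timeout_s):
            held.append(lock)
        else:
            # Giving up our locks lets the competing transaction proceed.
            for h in reversed(held):
                h.release()
            return False
    return True


def run_with_retry(locks, work, attempts: int = 3, timeout_s: float = 0.05):
    """Run `work()` while holding all locks, retrying if acquisition times out."""
    for _ in range(attempts):
        if acquire_all(locks, timeout_s):
            try:
                return work()
            finally:
                for lock in reversed(locks):
                    lock.release()
        # Random backoff reduces the chance that both sides collide again.
        time.sleep(random.uniform(0, timeout_s))
    raise TimeoutError("could not acquire all locks")
```

Timeouts are simple but imprecise: a transaction that is merely slow may be aborted even though no deadlock exists, so the timeout value is a trade-off between responsiveness and unnecessary retries.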
Addressing distributed deadlock scenarios is crucial for maintaining the reliability and
performance of distributed systems, as it ensures that resources are efficiently utilized, and
transactions can proceed without being indefinitely stuck in a deadlock state.
