Distributed Database Systems Guide

The document provides an overview of Distributed Database Systems (DDBS), highlighting their integration of database and computer network technologies. It discusses the benefits of DDBS, such as scalability, fault tolerance, and improved performance, as well as architectural models like client/server and peer-to-peer. Additionally, it covers distributed query processing, design strategies, and fragmentation rules essential for effective DDBS implementation.

Uploaded by

katiavilma97

Note: a complete chapter devoted to this presentation is available in the
course handout on the Moodle platform.

Distributed Databases
RABAH MOKHTARI
Introduction
Distributed database system (DDBS) technology is the union of two
approaches to data processing: database system and computer network
technologies.
1- Database systems
Database systems have taken us from a paradigm of data processing in
which each application defined and maintained its own data to one in which
the data are defined and administered centrally.
2- Computer network technologies
The technology of computer networks, on the other hand, promotes a mode of
work that goes against all centralization efforts.

Distributed Data Processing
 Distributed data processing is a computing model in which data
processing is distributed across multiple computers or nodes in a
network.

 The processing can be done in parallel, allowing for faster and more
efficient processing of large amounts of data.

 Each node in the network has access to a subset of the data, and the
nodes work together to process the data and generate the desired
output.
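
As a minimal sketch of this model (not taken from the slides), the following Python code lets each "node" process only its own subset of the data and then combines the partial results. The threads merely stand in for separate machines, and the names `partitions`, `process_partition`, and `distributed_sum` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data set, already split into three partitions; each
# partition stands in for the subset of data held by one node.
partitions = [
    [4, 8, 15],
    [16, 23],
    [42],
]


def process_partition(rows):
    """Local work done by one node: sum only the rows it holds."""
    return sum(rows)


def distributed_sum(parts):
    """Run the local step on every partition in parallel, then combine."""
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        partial_results = list(pool.map(process_partition, parts))
    # Final step: merge the partial results into the desired output.
    return sum(partial_results)


total = distributed_sum(partitions)
print(total)
```

In a real deployment the local step would run on different machines and the partial results would travel over the network; only the structure (local processing, then combination) carries over.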

Distributed Database system
 A distributed database system is a database system whose data is spread
across multiple, geographically distributed computers.

 In a distributed database system, the data is partitioned or replicated
across multiple nodes, and the nodes work together to process queries and
transactions from clients.

 A DDBS is not a system in which, despite the existence of a network, the
database resides at only one node of the network.

DDBS benefits
 Scalability: Distributed database systems can scale horizontally by adding
more nodes to the network. This allows the system to handle large volumes
of data and high transaction rates.

 Fault tolerance: Distributed database systems can continue to operate
even if one or more nodes fail. Data can be replicated across multiple nodes,
so if one node fails, another node can take over without loss of data.

 Improved performance: By distributing the data and processing across
multiple nodes, distributed database systems can improve performance by
processing queries and transactions in parallel.
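
The fault-tolerance point can be illustrated with a small sketch, assuming full replication and a deliberately naive failover rule (read from the first live replica). The classes `Node` and `ReplicatedStore` are hypothetical, and the sketch ignores recovery and consistency between replicas.

```python
class Node:
    """A hypothetical storage node holding a local copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True


class ReplicatedStore:
    """Writes go to every live replica; reads use the first live replica."""

    def __init__(self, nodes):
        self.nodes = nodes

    def put(self, key, value):
        for node in self.nodes:
            if node.alive:
                node.data[key] = value

    def get(self, key):
        for node in self.nodes:
            if node.alive and key in node.data:
                return node.data[key]
        raise KeyError(key)


nodes = [Node("n1"), Node("n2"), Node("n3")]
store = ReplicatedStore(nodes)
store.put("emp:1", "Ana")

nodes[0].alive = False  # simulate the failure of one node
value_after_failure = store.get("emp:1")
print(value_after_failure)
```

The read still succeeds after `n1` fails because the other replicas hold the same value. A node that was down during a write would miss it, which is why real systems add repair and consistency protocols on top of replication.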

Distributed DBMS architecture
 The architecture of a system defines its structure.

This means that the components of the system are identified, the
function of each component is specified, and the interrelationships
and interactions among these components are defined.

 The specification of the architecture of a system requires
identification of the various modules, with their interfaces and
interrelationships, in terms of the data and control flow through the
system.

ANSI/SPARC Architecture
 The ANSI/SPARC architecture is an early milestone in the field of database
systems.

 It was developed by the Standards Planning and Requirements Committee
(SPARC) of the American National Standards Institute (ANSI) in the 1970s,
when the field of database management was still in its early stages.

 It helped to establish many of the fundamental concepts and principles that
are still used today.

The ANSI/SPARC architecture defines three levels of abstraction for a
database system.

 External level: It describes how data is viewed by different users and
groups, and how data is accessed and manipulated by applications. Each
external schema is tailored to meet the specific needs of a particular user or
application.

 Conceptual level: It describes the overall logical structure of the
database. The conceptual schema is independent of any particular application
or user, and is used to ensure that all data in the database is consistent
and integrated.

 Internal level: It describes how data is physically stored and accessed by
the computer system. It defines the storage structures and access methods
used by the DBMS to manage the data.
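
One way to picture the three levels is the following sketch, which uses Python's built-in `sqlite3` module. The table `emp`, the view `emp_directory`, and the index `idx_emp_ename` are hypothetical names chosen for illustration. The mapping (table as conceptual schema, view as external schema, index as internal-level decision) is a common simplification, not a definition from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the logical structure shared by all applications.
conn.execute(
    "CREATE TABLE emp (eno INTEGER PRIMARY KEY, ename TEXT, salary REAL)"
)

# Internal level: an access-path decision that queries never mention.
conn.execute("CREATE INDEX idx_emp_ename ON emp (ename)")

# External level: a view tailored to one group of users (no salaries).
conn.execute("CREATE VIEW emp_directory AS SELECT eno, ename FROM emp")

conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(1, "Ana", 5200.0), (2, "Ben", 4800.0)],
)

query = "SELECT eno, ename FROM emp_directory ORDER BY eno"
directory = conn.execute(query).fetchall()

# Changing the internal level leaves the external view's answer unchanged.
conn.execute("DROP INDEX idx_emp_ename")
directory_without_index = conn.execute(query).fetchall()
print(directory == directory_without_index)
```

Dropping the index changes how the data is accessed but not what the view returns, which is the kind of data independence the layered architecture is meant to provide.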
Architectural Models for
Distributed DBMSs
The ways in which a distributed DBMS can be architected can be classified in
terms of: the autonomy of local systems, their distribution, and their
heterogeneity.

Autonomy
Autonomy refers to the distribution of control, not of data. It indicates the
degree to which individual DBMSs can operate independently.
 The local operations of the individual DBMSs are not affected by their
participation in the distributed system.
 The manner in which the individual DBMSs process queries and optimize
them should not be affected by the execution of global queries that access
multiple databases.
 System consistency or operation should not be compromised when
individual DBMSs join or leave the distributed system.

Distribution
 Distribution refers to the distribution of data over multiple sites.

 There are two alternative classes: client/server distribution and
peer-to-peer distribution (or full distribution).

Heterogeneity
 Heterogeneity refers to the presence of diversity or differences in a
distributed database environment in terms of data models, query languages,
and transaction management protocols.

Client/Server architecture
 Client/server DBMSs entered the computing scene at the beginning of the
1990s and have made a significant impact on both DBMS technology and the way
we do computing.

 The functions of the system are divided into two classes: server functions
and client functions.

 This provides a two-level architecture, which makes it easier to manage the
complexity of modern DBMSs and the complexity of distribution.

 Many widely used DBMSs follow the client/server architecture, for example
Microsoft SQL Server, Oracle Database, MySQL, and PostgreSQL.

Peer-To-Peer architecture
 After a decade of popularity of client/server computing, peer-to-peer
systems have made a comeback in the last few years as an alternative
architecture for distributed DBMSs.

 Apache Cassandra is a good example of a peer-to-peer DDBMS: it uses an
entirely peer-to-peer architecture.

 All nodes in a Cassandra cluster can accept reads and writes.
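
The idea that any peer can accept a request can be sketched as follows. This is an illustrative toy, not Cassandra's implementation: the class `Peer`, the helper `make_cluster`, and the modulo-based placement rule are all assumptions made for the example.

```python
import zlib


class Peer:
    """A hypothetical peer: stores some keys and can coordinate any request."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.peers = []  # every peer in the cluster, including this one

    def _owner(self, key):
        # Deterministic placement: hash the key, pick a peer by modulo.
        index = zlib.crc32(key.encode("utf-8")) % len(self.peers)
        return self.peers[index]

    def write(self, key, value):
        # Any peer accepts the write and forwards it to the owning peer.
        self._owner(key).data[key] = value

    def read(self, key):
        # Any peer accepts the read and fetches it from the owning peer.
        return self._owner(key).data[key]


def make_cluster(names):
    peers = [Peer(name) for name in names]
    for peer in peers:
        peer.peers = peers
    return peers


a, b, c = make_cluster(["a", "b", "c"])
a.write("emp:1", "Ana")  # write accepted by peer a
value = c.read("emp:1")  # read served through peer c
print(value)
```

No peer plays a special server role here: each one runs the same code and can act as the entry point for a client.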

Distributed query processing
 Distributed query processing is the process of executing a database query
that involves data stored on multiple nodes or servers in a distributed
database system.
 When a query is submitted, it must be broken down into smaller subqueries
that can be executed on different nodes in parallel.
 The results must be combined to form the final result set.
 Distributed query processing involves several steps, including query
optimization, query decomposition, data fragmentation and
distribution, data transfer, local processing, and result consolidation.

The goal of distributed query processing is to minimize the amount of data
that needs to be transferred between nodes and to maximize parallelism in
the execution of subqueries in order to improve query performance.
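
The effect of this goal on data transfer can be shown with a small sketch. It compares two hypothetical strategies for a selection over a fragmented relation and counts the rows that would cross the network. The fragment contents and the function names `ship_everything` and `filter_locally` are assumptions made for illustration.

```python
# Hypothetical fragments of an employee relation, one per node.
# Each row is (eno, ename, salary).
fragments = {
    "node1": [(1, "Ana", 5200), (2, "Ben", 4800)],
    "node2": [(3, "Cleo", 6100), (4, "Dev", 3900)],
    "node3": [(5, "Eli", 4500)],
}


def ship_everything(frags, min_salary):
    """Naive strategy: transfer all rows, then filter at the query site."""
    transferred = [row for rows in frags.values() for row in rows]
    result = [row for row in transferred if row[2] >= min_salary]
    return result, len(transferred)


def filter_locally(frags, min_salary):
    """Each node runs the selection as a subquery and ships only matches."""
    transferred = []
    for rows in frags.values():
        transferred.extend(row for row in rows if row[2] >= min_salary)
    return transferred, len(transferred)


naive_result, naive_cost = ship_everything(fragments, 5000)
local_result, local_cost = filter_locally(fragments, 5000)
print(naive_cost, local_cost)
```

Both strategies return the same rows, but the second transfers only the matching rows, and the per-node selections could run in parallel.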

Query processing problem


 The main function of a relational query processor is to transform a high-
level query (typically, in relational calculus) into an equivalent lower-level
query (typically, in some variation of relational algebra).
 The low-level query actually implements the execution strategy for the
query. The transformation must achieve both correctness and efficiency.

Since equivalent execution strategies can consume very different amounts of
computing resources, the main difficulty is to select the execution strategy
that minimizes resource consumption.

Query processing problem (Example)
Consider the following simple user query: “Find the names of employees who
are managing a project”.

The expression of the query in relational calculus using the SQL syntax is

Two equivalent relational algebra queries that are correct transformations of the
query above are:

It is intuitively obvious that the second query, which avoids the Cartesian
product of EMP and ASG, consumes far fewer computing resources than the
first, and thus should be retained.
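
The difference between the two strategies can be made concrete with a sketch. The schemas EMP(eno, ename) and ASG(eno, pno, resp), the condition resp = "Manager", and the sample rows are assumptions made for this example. The sketch compares a product-then-select strategy with a select-then-join strategy by the size of the intermediate result each one builds.

```python
# Hypothetical instances; the schemas EMP(eno, ename) and
# ASG(eno, pno, resp) are assumed for illustration.
EMP = [(1, "Ana"), (2, "Ben"), (3, "Cleo"), (4, "Dev")]
ASG = [
    (1, "P1", "Manager"),
    (2, "P1", "Analyst"),
    (3, "P2", "Manager"),
    (3, "P3", "Engineer"),
]


def product_then_select(emp, asg):
    """Strategy 1: build EMP x ASG, then select and project."""
    product = [(e, a) for e in emp for a in asg]  # Cartesian product
    names = {
        e[1] for e, a in product if e[0] == a[0] and a[2] == "Manager"
    }
    return names, len(product)


def select_then_join(emp, asg):
    """Strategy 2: select managers in ASG first, then join with EMP."""
    managers = [a for a in asg if a[2] == "Manager"]  # selection first
    manager_enos = {a[0] for a in managers}
    # Join on eno via a set lookup, keeping only the employee name.
    names = {e[1] for e in emp if e[0] in manager_enos}
    return names, len(managers)


names_1, size_1 = product_then_select(EMP, ASG)
names_2, size_2 = select_then_join(EMP, ASG)
print(names_1 == names_2, size_1, size_2)
```

On this toy input both strategies return the same names, but the first materializes 16 intermediate tuples and the second only 2. The gap grows with the product of the relation sizes.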

 In a centralized context, query execution strategies can be well expressed in an
extension of relational algebra.
 The main role of a centralized query processor is to choose, for a given query,
the best relational algebra query among all equivalent ones.
 In a distributed system, relational algebra is not enough to express execution
strategies. It must be supplemented with operators for exchanging data between
sites.
 In addition to choosing and ordering the relational algebra operators, the
distributed query processor must also select the best sites to process data, and
possibly the way data should be transformed.

Query processing problem (Example 2)
 We consider the following query

 We assume that relations EMP and ASG are horizontally fragmented as follows

 Fragments ASG1, ASG2, EMP1, and EMP2 are stored at sites 1, 2, 3, and 4,
respectively, and the result is expected at site 5.
 Two equivalent distributed execution strategies for the above query are possible.

Distributed database design
In the design of a distributed DBMS, the distribution of applications involves
two things:
 the distribution of the distributed DBMS software, and
 the distribution of the application programs that run on it.

Two major strategies have been identified for designing distributed
databases: the top-down approach and the bottom-up approach.

Top-down approach

Distribution design

Fragmentation alternatives
Vertical and horizontal fragmentation

Correctness Rules of Fragmentation
Completeness
Reconstruction
Disjointness
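
The three rules can be checked mechanically on a toy relation. The sketch below uses the usual textbook reading of the rules (the slide definitions themselves are not reproduced in this text): completeness means every row appears in some fragment, reconstruction means the relation can be rebuilt from its fragments, and disjointness means no row appears in two horizontal fragments. The relation `EMP`, its fragments, and the helper names are hypothetical.

```python
# Hypothetical relation EMP(eno, ename, salary); eno is the key.
EMP = {(1, "Ana", 5200), (2, "Ben", 4800), (3, "Cleo", 6100)}

# Horizontal fragmentation: subsets of rows chosen by a predicate.
EMP1 = {row for row in EMP if row[2] < 5000}
EMP2 = {row for row in EMP if row[2] >= 5000}

# Vertical fragmentation: subsets of columns, each keeping the key.
NAMES = {(eno, ename) for eno, ename, _ in EMP}
PAY = {(eno, salary) for eno, _, salary in EMP}


def is_complete(relation, fragments):
    """Completeness: every row of the relation appears in some fragment."""
    return all(any(row in frag for frag in fragments) for row in relation)


def is_disjoint(fragments):
    """Disjointness: no row appears in more than one fragment."""
    seen = set()
    for frag in fragments:
        if seen & frag:
            return False
        seen |= frag
    return True


def reconstruct_horizontal(fragments):
    """Reconstruction for horizontal fragments: take their union."""
    return set().union(*fragments)


def reconstruct_vertical(names, pay):
    """Reconstruction for vertical fragments: join them on the key."""
    pay_by_eno = dict(pay)
    return {(eno, ename, pay_by_eno[eno]) for eno, ename in names}


print(is_complete(EMP, [EMP1, EMP2]), is_disjoint([EMP1, EMP2]))
print(reconstruct_horizontal([EMP1, EMP2]) == EMP)
print(reconstruct_vertical(NAMES, PAY) == EMP)
```

Note that the vertical fragments deliberately repeat the key `eno`; without it the join used for reconstruction would not be possible.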

