NATIONAL ENGINEERING SCHOOL OF TUNIS
Distributed Systems
Level: 3rd year Software Engineering
Instructor: Dr. Wafa MEFTEH
Plan
1 - Key Elements and Architectures
2 - Distributed Databases
3 - Agent-Based Modeling, Simulation and Programming
4 - Distributed Artificial Intelligence
Wafa MEFTEH - ENIT 25/11/2024
Part 2: Distributed Databases
3 - Processing & Optimisation of Distributed Queries
Challenges
The execution rules and query optimization methods defined for a
centralized context are still valid, but we must consider:
• Fragmentation and distribution of data at different sites.
• The cost of the inter-site communication needed to transfer data.
Fragmentation, with or without replication, mainly affects updates, while communication cost mainly affects ordinary queries.
Updates
An update on a relation of the global schema translates into several updates on different fragments.
1. First, identify the fragments affected by the update operation.
2. Then decompose the operation accordingly into a set of update operations on these fragments.
Updates
• Insert: identify the appropriate horizontal fragment from the conditions defining each fragment, then insert the tuple into all corresponding vertical fragments to maintain data integrity and alignment.
• Delete: search for the tuple within the fragments most likely to contain it, then remove the corresponding attribute values across all associated vertical fragments to ensure consistency.
• Update: identify the relevant tuples, apply the necessary modifications, and relocate them to the appropriate fragments as required to maintain data accuracy and alignment.
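The insert rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fragment names (H1, H2, V1, V2), the conditions on ville and the helper route_insert are hypothetical and do not come from the course's fragmentation schema.

```python
# Hypothetical horizontal fragments, each defined by a condition on the tuple.
HORIZONTAL = {
    "H1": lambda t: t["ville"] == "Paris",
    "H2": lambda t: t["ville"] != "Paris",
}
# Hypothetical vertical fragments; each one keeps the key nclient.
VERTICAL = {
    "V1": ("nclient", "nom"),
    "V2": ("nclient", "ville"),
}

def route_insert(tup):
    """Return the (fragment, projected tuple) pairs that an insert must write."""
    targets = [name for name, cond in HORIZONTAL.items() if cond(tup)]
    if len(targets) != 1:
        raise ValueError("the tuple must satisfy exactly one horizontal condition")
    h = targets[0]
    # Project the tuple onto every vertical fragment of the chosen horizontal one.
    return [(f"{h}.{v}", {a: tup[a] for a in attrs}) for v, attrs in VERTICAL.items()]

writes = route_insert({"nclient": 7, "nom": "Dupont", "ville": "Paris"})
```

Here writes contains one entry per vertical fragment of H1, which is the "insert into all vertical fragments" step of the rule.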
Updates Examples
The horizontal fragment concerned
can be found with the CCs (in this
case, it is CC3).
Next, insert the tuple into all the
vertical fragments.
Updates Examples
Using the CCs, we find that CC3 and CC4 are involved here.
So, we search only the corresponding fragments.
Updates Examples
No CC is involved.
All fragments must be searched.
Updates Examples
CC3, CC4 and CC5 are affected.
We apply the modification and verify that the CCs still hold.
• Since numeq=1, the tuple must be removed from F3 and F5.
• The tuple is then moved into fragment 4.
Query on DDB
In a distributed environment, queries formulated at the global level are broken down into sub-queries.
These sub-queries are sent to the systems at the local sites, where they are executed.
The local results are then combined to build the answer to the global query.
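As a minimal sketch of this decomposition, the following Python example uses two in-memory SQLite databases to stand in for two local sites. The data, the placement of clients on sites and the helpers make_site and global_query are assumptions made for illustration only.

```python
import sqlite3

def make_site(rows):
    """Create an in-memory database standing in for one local site."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Client (nclient INTEGER PRIMARY KEY, nom TEXT, ville TEXT)")
    db.executemany("INSERT INTO Client VALUES (?, ?, ?)", rows)
    return db

# Hypothetical placement: site 1 holds the Paris clients, site 2 the others.
site1 = make_site([(1, "Dupont", "Paris"), (2, "Martin", "Paris")])
site2 = make_site([(3, "Durand", "Lyon")])

def global_query(sql, sites):
    """Send the same sub-query to every site and combine the local results."""
    answer = []
    for db in sites:
        answer.extend(db.execute(sql).fetchall())
    return sorted(answer)

names = global_query("SELECT nom FROM Client", [site1, site2])
```

Each site answers from its own fragment, and the global answer is the union of the local results.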
Query on DDB
This is the process described here, for global queries initially formulated in SQL.
The queries are rewritten in algebraic form so that they can be reduced and optimized.
The fragmentation schema determines which local queries are sent to which sites.
Fragmentation of Dist-Queries
1. Construction of the global execution plan: express the query as an algebraic tree.
❖ leaf = relation
❖ node = relational operation
2. Expression of the plan in terms of fragments: replace each leaf with the reconstruction program of its global relation.
3. Plan transformation: apply reduction techniques to eliminate unnecessary operations.
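Steps 1 and 2 can be sketched with a tiny tree representation. The node encoding (operator, argument, children) and the reconstruction program "Client = Client1 ∪ Client2" are assumptions chosen for the illustration, not the schema used in the course.

```python
# A node is a triple (operator, argument, children).
def rel(name):
    return ("rel", name, [])

def project(attrs, child):
    return ("project", attrs, [child])

def union(*children):
    return ("union", None, list(children))

# Assumed reconstruction program: Client is the union of its two fragments.
RECONSTRUCTION = {"Client": union(rel("Client1"), rel("Client2"))}

def localize(node):
    """Step 2: replace each leaf naming a global relation by its reconstruction program."""
    op, arg, children = node
    if op == "rel":
        return RECONSTRUCTION.get(arg, node)
    return (op, arg, [localize(c) for c in children])

plan = project(["nom"], rel("Client"))  # step 1: SELECT nom FROM Client
localized = localize(plan)              # step 2: plan expressed on the fragments
```

Step 3 then works on the localized tree, removing the branches that cannot contribute to the result.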
Fragmentation of Dist-Queries
Examples
Considering:
Client(nclient, nom, ville)
Cde(ncde, #nclient, produit, qte)
Fragmentation Schema
Fragmentation of Dist-Queries
Examples
SELECT nom FROM Client;
The algebraic tree of
the query →
Fragmentation of Dist-Queries
Examples
Reduction of horizontal fragmentation
Rule: eliminate access to unnecessary fragments
SELECT nom FROM Client WHERE ville = 'Paris';
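A minimal sketch of this reduction, assuming (the fragmentation schema is not reproduced here) that Client1 holds the clients with ville = 'Paris' and Client2 all the others. The helper prune_horizontal is hypothetical.

```python
# Assumed fragment definitions: a condition on ville for each fragment.
FRAGMENTS = {
    "Client1": lambda ville: ville == "Paris",
    "Client2": lambda ville: ville != "Paris",
}

def prune_horizontal(ville):
    """Keep only the fragments that can contain tuples with the requested ville."""
    return [name for name, accepts in FRAGMENTS.items() if accepts(ville)]

kept = prune_horizontal("Paris")
```

Under this assumption the query only needs to be sent to the site holding Client1; the access to Client2 is eliminated.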
Fragmentation of Dist-Queries
Examples
Reduction of vertical fragmentation
Rule: eliminate access to basic relations that do not have attributes useful for the result.
SELECT nclient FROM Cde;
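The same idea can be sketched for vertical fragments. The fragments CdeA and CdeB below are hypothetical (the actual vertical fragmentation of Cde is not reproduced here); each keeps the key ncde so that Cde can be rebuilt by a join.

```python
KEY = "ncde"
# Hypothetical vertical fragmentation of Cde.
VERTICAL = {
    "CdeA": {"ncde", "nclient"},
    "CdeB": {"ncde", "produit", "qte"},
}

def prune_vertical(needed):
    """Keep the fragments that hold at least one needed non-key attribute."""
    kept = [name for name, attrs in VERTICAL.items() if (attrs - {KEY}) & needed]
    # A query on the key alone can be answered from any single fragment.
    return kept or [next(iter(VERTICAL))]

kept = prune_vertical({"nclient"})
```

For SELECT nclient FROM Cde, only CdeA is accessed and the join with CdeB disappears from the plan.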
Fragmentation of Dist-Queries
Examples
Reduction of derived horizontal fragmentation
Rule: distribute joins over unions and apply the reductions for horizontal fragmentation.
SELECT * FROM Client, Cde WHERE Client.nclient = Cde.nclient AND ville = 'Paris';
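The effect of the rule can be checked on a small example. The data and the fragment contents are assumptions: Client1 holds the Paris clients, Client2 the others, and each Cde fragment holds the orders of the clients in the matching Client fragment (derived fragmentation).

```python
def join(clients, orders):
    """Join on nclient, returning merged dictionaries."""
    return [{**c, **o} for c in clients for o in orders if c["nclient"] == o["nclient"]]

def paris(rows):
    """Selection ville = 'Paris'."""
    return [r for r in rows if r["ville"] == "Paris"]

# Hypothetical fragment contents.
client1 = [{"nclient": 1, "nom": "Dupont", "ville": "Paris"}]
client2 = [{"nclient": 2, "nom": "Durand", "ville": "Lyon"}]
cde1 = [{"ncde": 10, "nclient": 1, "produit": "P1", "qte": 5}]
cde2 = [{"ncde": 11, "nclient": 2, "produit": "P2", "qte": 20}]

# Unreduced plan: rebuild both global relations, join, then select.
naive = paris(join(client1 + client2, cde1 + cde2))

# Reduced plan: the cross joins (Client1 with Cde2, Client2 with Cde1) are empty,
# and the selection eliminates the Client2 branch, so one local join remains.
reduced = join(paris(client1), cde1)
```

Both plans return the same tuples, but the reduced plan touches only Client1 and Cde1.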
Execution Plan
In a centralized database, an execution plan is the sequence of algebraic operators used to compute a query.
In a distributed database, it is the sequence of algebraic operators and inter-site data exchanges used to compute a query.
Execution Plan
SELECT nom FROM Client, Cde
WHERE Client.nclient = Cde.nclient AND qte > 10 ;
Two execution plans are possible.
[Figure: the two execution plans; label: algebraically optimal]
Execution Plan
Rule-based optimization
• Principle: execute the least expensive operators (projection, selection) first, to reduce the size of the input data for the most expensive operators (join).
• Methodology: push selections, then projections, as far down the tree as possible.
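The benefit of pushing the selection below the join can be seen on synthetic data. Everything in this sketch (table sizes, the distribution of qte, the nested-loop join) is an assumption chosen to make the effect visible.

```python
def join(clients, orders):
    """Nested-loop join on nclient (the expensive operator)."""
    return [{**c, **o} for c in clients for o in orders if c["nclient"] == o["nclient"]]

# Hypothetical data: 100 clients, 1 000 orders, qte cycling over 0..19.
clients = [{"nclient": i, "nom": f"client{i}"} for i in range(100)]
orders = [{"ncde": i, "nclient": i % 100, "qte": i % 20} for i in range(1000)]

# Plan A: join first, select afterwards. The join reads all 1 000 orders.
plan_a = [r["nom"] for r in join(clients, orders) if r["qte"] > 10]

# Plan B: selection pushed below the join. The join reads only the selected orders.
selected = [o for o in orders if o["qte"] > 10]
plan_b = [r["nom"] for r in join(clients, selected)]
```

Both plans give the same answer, but in plan B the join input shrinks from 1 000 orders to 450.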
Execution Plan
Example
Suppose that:
• size(Cde1) = size(Cde2) = 10 000 tuples
• size(Client1) = size(Client2) = 2 000 tuples
• Transfer cost of 1 tuple = 1
• Selectivity (qte > 10) = 1%
Execution Plan
Example
Cost comparison of the two solutions
Solution 1:
1. Transfer Cde1 + Cde2 = 20 000 tuples
2. Transfer Client1 + Client2 = 4 000 tuples
Total: 24 000 tuples
Solution 2:
1. Transfer C1 + C2 = 200 tuples
2. Transfer C3 + C4 = 200 tuples
Total: 400 tuples
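The comparison can be reproduced with a short calculation. The reading of C1 and C2 as the locally selected Cde tuples and of C3 and C4 as the local join results is an assumption, as is the hypothesis that each selected order joins with exactly one client.

```python
SIZE_CDE = 10_000      # tuples in each of Cde1 and Cde2
SIZE_CLIENT = 2_000    # tuples in each of Client1 and Client2
COST_PER_TUPLE = 1     # transfer cost of one tuple
SELECTIVITY = 0.01     # fraction of Cde tuples with qte > 10

# Solution 1: ship every fragment to the site that evaluates the query.
cost_1 = COST_PER_TUPLE * (2 * SIZE_CDE + 2 * SIZE_CLIENT)

# Solution 2: select locally and ship only the selected tuples (C1, C2),
# then ship the join results (C3, C4).
selected = round(SIZE_CDE * SELECTIVITY)  # tuples selected per Cde fragment
# Assumption: one matching client per selected order, so |C3| = |C1| and |C4| = |C2|.
cost_2 = COST_PER_TUPLE * (2 * selected + 2 * selected)

ratio = cost_1 // cost_2
```

With these figures, cost_1 is 24 000 and cost_2 is 400, so the second solution transfers 60 times fewer tuples.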
Complexity of Distributed Queries
In a centralized database, only the I/O and CPU factors determine the complexity
of a query.
The complexity of a query in a DDB is defined by:
• Input/output on disks: the cost of data access.
• CPU cost: this is the cost of data processing to perform algebraic operations
(joins, selections, etc.).
• Communication on the network: this is the time needed to exchange a volume
of data between sites involved in the execution of a query.
Complexity of Distributed Queries
We distinguish between the total cost and the global response time of a query:
• Total cost: the sum of all the times required to complete a query. It includes the execution times at the different sites, the data accesses, and the communication times between the sites involved.
• Global response time: the elapsed time between the start and the end of the query's execution. Because some operations can be performed in parallel at multiple sites, the global response time is generally less than the total cost.
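The difference between the two measures can be illustrated with a simple model. The timings are hypothetical, and the model assumes that the local sub-queries run fully in parallel and are followed by a single merge step.

```python
# Hypothetical local processing times in seconds, one per site, run in parallel.
local_times = {"site1": 4.0, "site2": 6.0, "site3": 5.0}
# Hypothetical final transfer and merge step, which waits for all the sites.
merge_time = 2.0

# Total cost: every second of work counts, wherever it is spent.
total_cost = sum(local_times.values()) + merge_time

# Response time: the parallel phase lasts as long as the slowest site.
response_time = max(local_times.values()) + merge_time
```

Here the total cost is 17 seconds of work while the user waits only 8 seconds.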
Data Transfer
The transfer time of a message is the sum of the access time and the transmission time (data volume / transmission rate).
The access time is negligible on a local network but can reach a few seconds for transmissions over long distances or via satellite.
Under these conditions, data must be processed in complete sets rather than item by item: the inter-site transfer unit is a relation or a fragment.
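A small calculation shows why the transfer unit matters when the access time is not negligible. The fragment size, link rate and access time below are hypothetical values, and transfer_time is an illustrative helper.

```python
def transfer_time(volume_bits, rate_bits_per_s, access_time_s=0.0):
    """Access time plus transmission time (data volume / transmission rate)."""
    return access_time_s + volume_bits / rate_bits_per_s

# Hypothetical figures: a fragment of 1 000 tuples of 1 000 bits each,
# a 1 Mbit/s link, and an access time of 0.5 s per message.
TUPLES, BITS_PER_TUPLE, RATE, ACCESS = 1_000, 1_000, 1_000_000, 0.5

# One message carrying the whole fragment pays the access time once.
as_fragment = transfer_time(TUPLES * BITS_PER_TUPLE, RATE, ACCESS)

# One message per tuple pays the access time 1 000 times.
tuple_by_tuple = TUPLES * transfer_time(BITS_PER_TUPLE, RATE, ACCESS)
```

Shipping the fragment in one message takes about 1.5 seconds in this model, against roughly 500 seconds when each tuple is sent separately.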
Thanks,
See You Next Session
NchaALLAH