Chapter - 7 Distributed Database System

Chapter Six discusses distributed databases and client-server architectures, focusing on concepts such as data fragmentation, replication, and allocation. It outlines the advantages of distributed databases, including increased reliability, improved performance, and easier scalability, as well as the types of distributed database systems. Additionally, it addresses query processing, concurrency control, and recovery challenges specific to distributed environments.

Uploaded by

eexit65


Chapter Six

Distributed Databases and Client-Server Architectures

Outline
1. Distributed Database Concepts
2. Data Fragmentation, Replication and Allocation
3. Types of Distributed Database Systems
4. Query Processing
5. Concurrency Control and Recovery
6. 3-Tier Client-Server Architecture

Distributed Database Concepts
• A transaction can be executed by multiple networked computers in a unified manner.
• A distributed database (DDB) processes a unit of execution (a transaction) in a distributed manner.
• Definitions:
– A distributed database (DDB) is a collection of multiple logically related databases distributed over a computer network.
– A distributed database management system (DDBMS) is a software system that manages a distributed database while making the distribution transparent to the user.

• Advantages
– Management of distributed data with different levels of transparency:
• The physical placement of data (files, relations, etc.) is hidden from the user (distribution transparency).
– For example, the EMPLOYEE, PROJECT, and WORKS_ON tables may be fragmented horizontally and stored at several sites, possibly with replication.
[Figure: EMPLOYEE, PROJECT, and WORKS_ON fragments stored, with replication, at several sites.]

– Distribution and network transparency:
• Users do not have to worry about the operational details of the network.
– Location transparency: a command can be issued from any location without affecting how it works.
– Naming transparency: any named object (files, relations, etc.) can be accessed from any location.
– Replication transparency:
• Copies of data can be stored at multiple sites, as shown in the above diagram.
• This is done to minimize access time to the required data.
– Fragmentation transparency:
• A relation can be fragmented horizontally (creating a subset of the tuples of the relation) or vertically (creating a subset of the columns of the relation).
• Other Advantages
– Increased reliability and availability:
• Reliability refers to system uptime, that is, the system is running most of the time. Availability is the probability that the system is continuously available (usable or accessible) during a time interval.
• A distributed database system has multiple nodes (computers); if one fails, the others remain available to do the job.
– Improved performance:
• A distributed DBMS fragments the database to keep data closer to where it is needed most.
• This significantly reduces data management (access and modification) time.
– Easier expansion (scalability):
• New nodes (computers) can be added at any time without changing the entire configuration.
Data Fragmentation, Replication and Allocation
• Data Fragmentation
– Splits a relation into logically related and correct parts. A relation can be fragmented in two ways:
• Horizontal Fragmentation
• Vertical Fragmentation
• Horizontal fragmentation
– A horizontal fragment is a subset of the tuples of a relation: those tuples that satisfy a selection condition.
– Consider the Employee relation with the selection condition (DNO = 5). All tuples that satisfy this condition form a subset, which is a horizontal fragment of the Employee relation.
– A selection condition may be composed of several conditions connected by AND or OR.
– Derived horizontal fragmentation: the partitioning of a primary relation is applied to the secondary relations that are related to it through foreign keys.
• Vertical fragmentation
– A vertical fragment is a subset of a relation formed from a subset of its columns. Thus a vertical fragment of a relation contains the values of the selected columns only. No selection condition is used in vertical fragmentation.
– Consider the Employee relation. A vertical fragment of Employee can be created by keeping the values of Name, Bdate, Sex, and Address.
– Because no condition ties the fragments together, each fragment must include the primary key attribute of the parent relation Employee. In this way all vertical fragments of a relation are connected.
• Representation
– Horizontal fragmentation
• Each horizontal fragment of a relation R can be specified by a $\sigma_{C_i}(R)$ operation in the relational algebra.
• Complete horizontal fragmentation: a set of horizontal fragments whose conditions $C_1, C_2, \ldots, C_n$ include all the tuples in R; that is, every tuple in R satisfies ($C_1$ OR $C_2$ OR … OR $C_n$).
• Disjoint complete horizontal fragmentation: no tuple in R satisfies ($C_i$ AND $C_j$) for any $i \neq j$.
• To reconstruct R from complete horizontal fragments, a UNION is applied.
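The completeness and disjointness conditions for horizontal fragments, and reconstruction by UNION, can be sketched as follows. This is a minimal illustration that assumes relations are modeled as lists of dictionaries; the sample tuples and the helper names (`select`, `is_complete`, `is_disjoint`, `union`) are hypothetical and not part of the slides.

```python
# Horizontal fragmentation sketch. Relations are modeled as lists of dicts
# (an assumption for illustration only).

def select(relation, condition):
    """Relational SELECT: keep the tuples that satisfy the condition."""
    return [t for t in relation if condition(t)]

def is_complete(relation, conditions):
    """True if every tuple satisfies C1 OR C2 OR ... OR Cn."""
    return all(any(c(t) for c in conditions) for t in relation)

def is_disjoint(relation, conditions):
    """True if no tuple satisfies Ci AND Cj for i != j."""
    return all(sum(1 for c in conditions if c(t)) <= 1 for t in relation)

def union(fragments):
    """Relational UNION: merge the fragments and drop duplicate tuples."""
    seen, result = set(), []
    for fragment in fragments:
        for t in fragment:
            key = tuple(sorted(t.items()))
            if key not in seen:
                seen.add(key)
                result.append(t)
    return result

# Hypothetical sample data for the Employee relation.
employee = [
    {"ssn": "111", "lname": "Smith", "dno": 5},
    {"ssn": "222", "lname": "Wong", "dno": 4},
    {"ssn": "333", "lname": "Borg", "dno": 1},
]
conditions = [
    lambda t: t["dno"] == 5,
    lambda t: t["dno"] == 4,
    lambda t: t["dno"] == 1,
]
fragments = [select(employee, c) for c in conditions]
reconstructed = union(fragments)
```

With these three conditions the fragmentation is both complete and disjoint, so the union of the fragments contains exactly the original tuples.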
– Vertical fragmentation
• A vertical fragment of a relation R can be specified by a $\pi_{L_i}(R)$ operation in the relational algebra.
• Complete vertical fragmentation: a set of vertical fragments whose projection lists $L_1, L_2, \ldots, L_n$ include all the attributes in R but share only the primary key of R. In this case the projection lists satisfy the following two conditions:
• $L_1 \cup L_2 \cup \ldots \cup L_n = \mathrm{ATTRS}(R)$
• $L_i \cap L_j = \mathrm{PK}(R)$ for any $i \neq j$, where ATTRS(R) is the set of attributes of R and PK(R) is the primary key of R.
• To reconstruct R from complete vertical fragments, an OUTER UNION is applied.
– Mixed (Hybrid) fragmentation
• A combination of vertical fragmentation and horizontal fragmentation.
• This is achieved by a SELECT-PROJECT operation, which is represented by $\pi_{L}(\sigma_{C}(R))$.
• If C = True (select all tuples) and L ≠ ATTRS(R), we get a vertical fragment; if C ≠ True and L = ATTRS(R), we get a horizontal fragment; and if C ≠ True and L ≠ ATTRS(R), we get a mixed fragment.
• If C = True and L = ATTRS(R), then R itself can be considered a fragment.
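The two conditions on projection lists for a complete vertical fragmentation can be checked mechanically. The sketch below assumes relations are lists of dictionaries and uses hypothetical helper names and sample data. Note that `rebuild` merges fragments on the primary key (the effect of a full outer join on the key) rather than implementing the OUTER UNION operator named in the slides; that substitution is an assumption made to keep the example short.

```python
# Vertical fragmentation sketch. Relations are lists of dicts (illustrative
# assumption); attribute names and data are hypothetical.

def project(relation, attrs):
    """Relational PROJECT: keep only the listed attributes, no duplicates."""
    seen, result = set(), []
    for t in relation:
        row = tuple((a, t[a]) for a in attrs)
        if row not in seen:
            seen.add(row)
            result.append(dict(row))
    return result

def is_complete_vertical(attrs_r, pk, projection_lists):
    """Check L1 u ... u Ln = ATTRS(R) and Li n Lj = PK(R) for i != j."""
    covers = set().union(*projection_lists) == set(attrs_r)
    share_only_pk = all(
        set(a) & set(b) == set(pk)
        for i, a in enumerate(projection_lists)
        for b in projection_lists[i + 1:]
    )
    return covers and share_only_pk

def rebuild(fragments, pk):
    """Merge vertical fragments tuple by tuple on the primary key."""
    merged = {}
    for fragment in fragments:
        for t in fragment:
            key = tuple(t[a] for a in pk)
            merged.setdefault(key, {}).update(t)
    return list(merged.values())

attrs = ["ssn", "name", "bdate", "salary", "dno"]
employee = [
    {"ssn": "111", "name": "Smith", "bdate": "1965-01-09", "salary": 30000, "dno": 5},
    {"ssn": "222", "name": "Wong", "bdate": "1955-12-08", "salary": 40000, "dno": 4},
]
lists = [["ssn", "name", "bdate"], ["ssn", "salary", "dno"]]
fragments = [project(employee, attr_list) for attr_list in lists]
rebuilt = rebuild(fragments, ["ssn"])
```

Because every fragment carries the primary key `ssn`, each original tuple can be reassembled from its pieces.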
• Fragmentation schema
– A definition of a set of fragments (horizontal, vertical, or mixed) that includes all attributes and tuples in the database and satisfies the condition that the whole database can be reconstructed from the fragments by applying some sequence of OUTER UNION (or OUTER JOIN) and UNION operations.
• Allocation schema
– It describes the distribution of fragments to sites of distributed
databases. It can be fully or partially replicated or can be
partitioned.
• Data Replication
– Database is replicated to all sites.
– In full replication the entire database is replicated and in partial
replication some selected part is replicated to some of the sites.
– Data replication is achieved through a replication schema.
• Data Distribution (Data Allocation)
– This is relevant only in the case of partial replication or partition.
– The selected portion of the database is distributed to the database sites.
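As a rough illustration of how an allocation schema distinguishes full replication, partial replication, and partitioning, the sketch below models the schema as a mapping from fragment names to the set of sites holding a copy. The fragment names, site names, and the `classify` function are hypothetical.

```python
# Allocation schema sketch: fragment name -> set of sites holding a copy.
# All names here are hypothetical.

def classify(allocation, sites):
    """Label an allocation as fully replicated, partitioned, or partial."""
    all_sites = set(sites)
    if all(holders == all_sites for holders in allocation.values()):
        return "fully replicated"
    if all(len(holders) == 1 for holders in allocation.values()):
        return "partitioned"
    return "partially replicated"

sites = ["site1", "site2", "site3"]

full = {"EMP_D5": {"site1", "site2", "site3"}, "EMP_D4": {"site1", "site2", "site3"}}
partitioned = {"EMP_D5": {"site1"}, "EMP_D4": {"site2"}}
partial = {"EMP_D5": {"site1", "site2"}, "EMP_D4": {"site3"}}

labels = [classify(a, sites) for a in (full, partitioned, partial)]
```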
Types of Distributed Database Systems

• Homogeneous
– All sites of the database system have an identical setup, i.e., the same database system software.
• For example, all sites run Oracle, or DB2, or Sybase, or some other database system.
– The underlying operating systems may differ; they can be a mixture of Linux, Windows, Unix, etc.
[Figure: five sites, each running Oracle on Windows, Unix, or Linux, connected by a communications network.]

• Heterogeneous
– Federated: Each site may run a different database system, but data access is managed through a single conceptual schema.
• This implies that the degree of local autonomy is minimal. Each site must adhere to a centralized access policy. There may be a global schema.
– Multidatabase: There is no single global conceptual schema. For data access, a schema is constructed dynamically as needed by the application software.
[Figure: five sites running object-oriented, relational, hierarchical, and network DBMSs on Unix, Windows, and Linux, connected by a communications network.]

• Federated Database Management Systems Issues
– Differences in data models:
• Relational, object-oriented, hierarchical, network, etc.
– Differences in constraints:
• Each site may have its own data access and processing constraints.
– Differences in query language:
• Some sites may use SQL, some may use SQL-89, some may use SQL-92, and so on.

Query Processing in Distributed Databases
• Issues
– Cost of transferring data (files and results) over the network.
• This cost is usually high so some optimization is necessary.
• Example relations: Employee at site 1 and Department at site 2
– Employee at site 1: 10,000 rows. Row size = 100 bytes. Table size = $10^6$ bytes.
– Department at site 2: 100 rows. Row size = 35 bytes. Table size = 3,500 bytes.
• Q: For each employee, retrieve the employee name and the name of the department where the employee works.
• Q: $\pi_{Fname, Lname, Dname}(Employee \bowtie_{Dno = Dnumber} Department)$
• Schemas:
– Employee(Fname, Minit, Lname, SSN, Bdate, Address, Sex, Salary, Superssn, Dno)
– Department(Dname, Dnumber, Mgrssn, Mgrstartdate)


• Result
– The result of this query will have 10,000 tuples, assuming
that every employee is related to a department.
– Suppose each result tuple is 40 bytes long. The query is
submitted at site 3 and the result is sent to this site.
– Problem: Employee and Department relations are not
present at site 3.

• Strategies:
1. Transfer Employee and Department to site 3.
• Total transfer bytes = 1,000,000 + 3500 = 1,003,500
bytes.
2. Transfer Employee to site 2, execute join at site 2 and send
the result to site 3.
• Query result size = 40 * 10,000 = 400,000 bytes. Total
transfer size = 400,000 + 1,000,000 = 1,400,000 bytes.
3. Transfer Department relation to site 1, execute the join at site
1, and send the result to site 3.
• Total bytes transferred = 400,000 + 3500 = 403,500
bytes.
• Optimization criteria: minimizing data transfer.
– Preferred approach: strategy 3.

• Consider the query
– Q’: For each department, retrieve the department name and the name of the department manager.
• Relational algebra expression:
– $\pi_{Fname, Lname, Dname}(Department \bowtie_{Mgrssn = SSN} Employee)$

• The result of this query will have 100 tuples, assuming that every department has a manager. The execution strategies are:
1. Transfer Employee and Department to the result site and perform the join at site 3.
• Total bytes transferred = 1,000,000 + 3500 = 1,003,500 bytes.
2. Transfer Employee to site 2, execute the join at site 2, and send the result to site 3. Query result size = 40 * 100 = 4000 bytes.
• Total transfer size = 4000 + 1,000,000 = 1,004,000 bytes.
3. Transfer the Department relation to site 1, execute the join at site 1, and send the result to site 3.
• Total transfer size = 4000 + 3500 = 7500 bytes.
– Preferred strategy: Choose strategy 3.

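The transfer totals for the three strategies, for both Q and Q’ with site 3 as the result site, follow directly from the table sizes given earlier. The short sketch below reproduces that arithmetic; the function and constant names are hypothetical, and only data transfer is counted, as in the slides.

```python
# Data-transfer cost of the three strategies (result site = site 3).
# Sizes come from the example; names are illustrative.

EMPLOYEE_BYTES = 10_000 * 100    # Employee at site 1: 1,000,000 bytes
DEPARTMENT_BYTES = 100 * 35      # Department at site 2: 3,500 bytes
RESULT_TUPLE_BYTES = 40          # assumed size of one result tuple

def strategy_costs(result_tuples):
    """Bytes transferred by each strategy for a result of the given size."""
    result_bytes = result_tuples * RESULT_TUPLE_BYTES
    return {
        1: EMPLOYEE_BYTES + DEPARTMENT_BYTES,   # ship both relations to site 3
        2: EMPLOYEE_BYTES + result_bytes,       # join at site 2, ship result
        3: DEPARTMENT_BYTES + result_bytes,     # join at site 1, ship result
    }

costs_q = strategy_costs(10_000)        # Q: one result tuple per employee
costs_q_prime = strategy_costs(100)     # Q': one result tuple per department

best_q = min(costs_q, key=costs_q.get)
best_q_prime = min(costs_q_prime, key=costs_q_prime.get)
```

In both cases strategy 3 transfers the least data, because it ships the small Department relation instead of the large Employee relation.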
• Now suppose the result site is 2. Possible strategies :
1. Transfer Employee relation to site 2, execute the query and
present the result to the user at site 2.
• Total transfer size = 1,000,000 bytes for both queries Q
and Q’.
2. Transfer Department relation to site 1, execute join at site 1
and send the result back to site 2.
• Total transfer size for Q = 400,000 + 3500 = 403,500 bytes
and for Q’ = 4000 + 3500 = 7500 bytes.

• Semijoin:
– Objective is to reduce the number of tuples in a relation
before transferring it to another site.
• Example execution of Q or Q’:
1. Project the join attributes of Department at site 2, and
transfer them to site 1. For Q, 4 * 100 = 400 bytes are
transferred and for Q’, 9 * 100 = 900 bytes are transferred.
2. Join the transferred file with the Employee relation at site
1, and transfer the required attributes from the resulting
file to site 2. For Q, 34 * 10,000 = 340,000 bytes are
transferred and for Q’, 39 * 100 = 3900 bytes are
transferred.
3. Execute the query by joining the transferred file with
Department and present the result to the user at site 2.
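Adding up the two transfers in the semijoin steps above gives the total cost for each query, which can then be compared with the strategies for result site 2. The sketch below only repeats that arithmetic; the function and parameter names are hypothetical.

```python
# Semijoin transfer cost for the example (result site = site 2).
# Attribute and tuple sizes are the ones quoted in the steps above.

def semijoin_cost(join_attr_bytes, dept_tuples, shipped_tuple_bytes, matching_tuples):
    """Bytes sent to site 1 (step 1) plus bytes sent back to site 2 (step 2)."""
    step1 = join_attr_bytes * dept_tuples            # projected join column
    step2 = shipped_tuple_bytes * matching_tuples    # reduced Employee data
    return step1 + step2

cost_q = semijoin_cost(4, 100, 34, 10_000)      # 400 + 340,000
cost_q_prime = semijoin_cost(9, 100, 39, 100)   # 900 + 3,900

# For comparison: shipping Department to site 1 and the join result back.
ship_department_q = 3_500 + 400_000
ship_department_q_prime = 3_500 + 4_000
```

For Q the semijoin transfers 340,400 bytes instead of 403,500, and for Q’ it transfers 4,800 bytes instead of 7,500.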

Concurrency Control and Recovery
• Distributed Databases encounter a number of concurrency
control and recovery problems which are not present in
centralized databases. Some of them are listed below.

– Dealing with multiple copies of data items:

• The concurrency control must maintain global
consistency. Likewise the recovery mechanism must
recover all copies and maintain consistency after
recovery.
– Failure of individual sites:
• Database availability must not be affected due to the
failure of one or two sites and the recovery scheme must
recover them before they are available for use.

– Communication link failure:
• This failure may create network partition which would
affect database availability even though all database
sites may be running.
– Distributed commit:
• A transaction may be split into parts that are executed at a number of sites. This requires a two-phase or three-phase commit protocol for transaction commit.
– Distributed deadlock:
• Since transactions are processed at multiple sites, two or
more sites may get involved in deadlock. This must be
resolved in a distributed manner.
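To make the distributed commit item more concrete, here is a minimal sketch of the two-phase commit idea: a coordinator first collects a vote from every participating site, then tells all of them to commit only if every vote was yes. The `Participant` class and `two_phase_commit` function are hypothetical, and the sketch deliberately ignores logging, time-outs, and site or link failures, which a real protocol must handle.

```python
# Two-phase commit sketch (no logging, time-outs, or failure handling).

class Participant:
    """One site taking part in a distributed transaction (hypothetical)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote yes (ready) or no.
        self.state = "ready" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    """Coordinator logic: commit only if every participant votes yes."""
    votes = [p.prepare() for p in participants]        # phase 1: voting
    decision = "commit" if all(votes) else "abort"
    for p in participants:                             # phase 2: decision
        if decision == "commit":
            p.commit()
        else:
            p.abort()
    return decision

ok_sites = [Participant("site1"), Participant("site2")]
mixed_sites = [Participant("site1"), Participant("site2", can_commit=False)]
ok_decision = two_phase_commit(ok_sites)
mixed_decision = two_phase_commit(mixed_sites)
```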

• Distributed concurrency control based on a distinguished copy of a data item
– Primary site technique: A single site is designated as the primary site, which serves as the coordinator for transaction management.
[Figure: Sites 1 to 5 connected by a communications network, with one site designated as the primary site.]

• Transaction management:
– Concurrency control and commit are managed by this site.
– In two phase locking, this site manages locking and
releasing data items. If all transactions follow two-phase
policy at all sites, then serializability is guaranteed.

– Advantages:
• An extension to the centralized two phase locking so
implementation and management is simple.
• Data items are locked only at one site but they can be
accessed at any site.
– Disadvantages:
• All transaction management activities go to primary site
which is likely to overload the site.
• If the primary site fails, the entire system is inaccessible.
– To aid recovery, a backup site is designated which behaves as a shadow of the primary site. In case of primary site failure, the backup site can act as the primary site.
• Primary Copy Technique:
– In this approach, instead of a site, one copy of each data item is designated as its primary copy. To lock a data item, only the primary copy of that data item is locked.
• Advantages:
– Since primary copies are distributed at various sites, a
single site is not overloaded with locking and unlocking
requests.
• Disadvantages:
– Identification of a primary copy is complex. A distributed
directory must be maintained, possibly at all sites.
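A minimal sketch of the primary copy idea, assuming a directory that maps each data item to the site holding its primary copy and assuming exclusive locks only. The class name, method names, and site names are hypothetical; a real system would also need shared locks, queues of waiting transactions, and a replicated directory.

```python
# Primary copy locking sketch: exclusive locks only, single process.

class PrimaryCopyLocks:
    def __init__(self, directory):
        self.directory = directory   # data item -> site of its primary copy
        self.locks = {}              # (site, item) -> transaction holding it

    def lock(self, item, txn):
        """Lock the primary copy; return its site, or None if it is taken."""
        site = self.directory[item]
        holder = self.locks.get((site, item))
        if holder is None or holder == txn:
            self.locks[(site, item)] = txn
            return site
        return None

    def unlock(self, item, txn):
        """Release the lock if this transaction holds it."""
        site = self.directory[item]
        if self.locks.get((site, item)) == txn:
            del self.locks[(site, item)]

manager = PrimaryCopyLocks({"X": "site2", "Y": "site4"})
first = manager.lock("X", "T1")     # T1 locks X at site2
second = manager.lock("X", "T2")    # T2 is refused while T1 holds X
other = manager.lock("Y", "T2")     # Y's primary copy is at another site
manager.unlock("X", "T1")
third = manager.lock("X", "T2")     # now T2 can lock X
```

Because the primary copies of X and Y live at different sites, lock requests for them go to different sites, which is the load-spreading advantage listed above.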

• Recovery from a coordinator failure
– In both approaches a coordinator site or copy may become
unavailable. This will require the selection of a new
coordinator.
– Primary site approach with no backup site:
• Aborts and restarts all active transactions at all sites.
Elects a new coordinator and initiates transaction
processing.
– Primary site approach with backup site:
• Suspends all active transactions, designates the backup site as the primary site, and identifies a new backup site.
• The new primary site receives all transaction management information to resume processing.
– Primary and backup sites fail or no backup site:
• Use election process to select a new coordinator site.

• Concurrency control based on voting:
– There is no primary copy or coordinator.
– A lock request is sent to all sites that hold a copy of the data item.
– If a majority of these sites grant the lock, the requesting transaction gets the data item.
– The outcome (lock granted or denied) is sent to all these sites.
– To avoid an unacceptably long wait, a time-out period is defined. If the requesting transaction does not receive any vote information within this period, the transaction is aborted.
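The voting rule above can be sketched as a small decision function. This is an illustration under simplifying assumptions: each site's reply is modeled as `True` (grant), `False` (deny), or `None` (no reply before the time-out), and the function name `request_lock_by_vote` is hypothetical.

```python
# Voting-based lock decision sketch.
# votes: one entry per site holding a copy; None means no reply in time.

def request_lock_by_vote(votes, total_copies):
    """Return 'granted', 'denied', or 'abort' for one lock request."""
    received = [v for v in votes if v is not None]
    if not received:
        # No vote information arrived within the time-out period.
        return "abort"
    granted = sum(1 for v in received if v)
    if granted > total_copies / 2:
        return "granted"    # a majority of the copies granted the lock
    return "denied"

majority = request_lock_by_vote([True, True, False], total_copies=3)
no_majority = request_lock_by_vote([True, False, None], total_copies=3)
timed_out = request_lock_by_vote([None, None, None], total_copies=3)
```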

Client-Server Database Architecture

• It consists of clients running client software, a set of servers which provide all database functionalities, and a reliable communication infrastructure.

[Figure: Servers 1 to n connected to Clients 1 to n.]

• Clients reach the server for the desired service, but the server does not reach out to clients.
• The server software is responsible for local data management at a site, much like centralized DBMS software.
• The client software is responsible for most of the distribution functions.
• The communication software manages communication among clients and servers.
• The processing of an SQL query goes as follows:
– The client parses a user query and decomposes it into a number of independent subqueries. Each subquery is sent to the appropriate site for execution.
– Each server processes its subquery and sends the result to the client.
– The client combines the results of the subqueries and produces the final result.
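The three processing steps above can be sketched with in-memory data standing in for the servers. Everything here is a simplifying assumption: the `servers` dictionary, the `run_subquery` and `client_query` functions, and the sample rows are hypothetical, each subquery is just a filter over one site's local rows, and no real network communication takes place.

```python
# Client-server query processing sketch with in-memory "servers".

servers = {
    "site1": [{"ssn": "111", "lname": "Smith", "dno": 5}],
    "site2": [
        {"ssn": "222", "lname": "Wong", "dno": 4},
        {"ssn": "333", "lname": "Narayan", "dno": 5},
    ],
}

def run_subquery(site, predicate):
    """Server side: evaluate the subquery against the site's local data."""
    return [row for row in servers[site] if predicate(row)]

def client_query(predicate):
    """Client side: send one subquery per site, then combine the results."""
    partial_results = [run_subquery(site, predicate) for site in servers]
    return [row for part in partial_results for row in part]

dept5 = client_query(lambda row: row["dno"] == 5)
names = [row["lname"] for row in dept5]
```

Here the client simply concatenates the partial results; a query involving joins or aggregates would need a more elaborate combination step.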
