Data Communication Basics CH 7

This document discusses distributed databases and client-server architectures. It covers distributed database concepts including fragmentation, replication and allocation of data across multiple sites. It describes types of distributed database systems as homogeneous or heterogeneous. It discusses query processing in distributed databases including issues around data transfer costs over networks and optimization techniques. Concurrency control and recovery challenges in distributed databases are also outlined. Finally, it introduces the 3-tier client-server architecture.

Uploaded by

Ukasha Mohammednur


Chapter six

Distributed Databases
and
Client-Server Architectures

Outline
1. Distributed Database Concepts
2. Data Fragmentation, Replication and Allocation
3. Types of Distributed Database Systems
4. Query Processing
5. Concurrency Control and Recovery
6. 3-Tier Client-Server Architecture

1. Distributed Database Concepts
• A transaction can be executed by multiple networked
computers in a unified manner.
• A distributed database (DDB) processes a unit of execution (a
transaction) in a distributed manner.
• Definitions:
– A distributed database (DDB) is a collection of multiple
logically related databases distributed over a computer network.
– A distributed database management system (DDBMS) is a
software system that manages a distributed database while
making the distribution transparent to the user.
– Distribution transparency means that the physical placement
of data (files, relations, etc.) is not known to the user.

• The EMPLOYEE, PROJECT, and WORKS_ON tables
may be fragmented horizontally and stored, possibly with
replication, at different sites.
[Figure: EMPLOYEE, PROJECT, and WORKS_ON fragments stored and replicated across sites.]

Remark:
• Each site has a DBMS and:
– Holds fragments (replicated or unique).
– Is linked to the other sites by the network.
– Can handle local users.
– Participates in at least one global request.
• Advantages of DDB:
i. Distribution and network transparency:
• Users do not have to worry about operational details of the
network.
– Location transparency: a command can be issued from any
location without affecting how it works.
– Naming transparency: any named object (files, relations,
etc.) can be accessed from any location.
ii. Replication transparency:
• Copies of a data item can be stored at multiple sites, as shown
in the figure above.
• This is done to minimize access time to the required data.
iii. Fragmentation transparency:
• A relation can be fragmented horizontally (a subset of its
rows) or vertically (a subset of its columns).
iv. Increased reliability and availability:
• Reliability is the probability that the system is running
(not down) at a given point in time. Availability is the
probability that the system is continuously available (usable
or accessible) during a time interval.
• A distributed database system has multiple nodes
(computers), so if one fails the others are still available to do
the job.
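As a rough illustration of why several nodes raise availability, the sketch below computes the chance that at least one of n copies of a data item is reachable. It assumes, purely for illustration, that sites fail independently and that each site is up with the same probability; neither assumption comes from the slides, and the function name is hypothetical.

```python
def availability_of_any_replica(p_site_up: float, n_replicas: int) -> float:
    """Probability that at least one of n independent replicas is reachable."""
    if not 0.0 <= p_site_up <= 1.0:
        raise ValueError("p_site_up must be a probability in [0, 1]")
    if n_replicas < 1:
        raise ValueError("n_replicas must be at least 1")
    # All replicas are down with probability (1 - p)^n; take the complement.
    return 1.0 - (1.0 - p_site_up) ** n_replicas


if __name__ == "__main__":
    # Assumed per-site availability of 0.9, for one to three replicas.
    for n in (1, 2, 3):
        print(n, round(availability_of_any_replica(0.9, n), 4))
```

Under these assumptions each extra copy shrinks the chance of a total outage by a factor of (1 - p), which is the intuition behind replicating data at multiple sites.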
v. Improved performance:
• A distributed DBMS fragments the database to keep data
closer to where it is needed most.
• This reduces data management (access and modification)
time significantly.
vi. Easier expansion (scalability):
• New nodes (computers) can be added at any time without
changing the entire configuration.
• Disadvantages of Distributed Databases
i. Complexity: Data replication, failure recovery, and network
management make the system more complex than a centralized
DBMS.
ii. Cost: Because a DDBMS needs more people and more hardware,
maintaining and running it can be more expensive than a
centralized system.
iii. Connecting dissimilar machines: Additional layers
of operating system software are needed to translate and
coordinate the flow of data between machines.
iv. Data integrity and security: Because data maintained
by a distributed system can be accessed from many locations in the
network, controlling the integrity of the database can be difficult.
2. Data Replication and Fragmentation: Distributed data storage
• There are two approaches to storing a relation in a distributed
database: replication and fragmentation.
I. Data Replication
• The system maintains several identical copies of the relation and
stores each copy at a different site.
• In general, replication improves the performance of read operations
and increases the availability of data to read-only transactions.
However, update transactions incur greater overhead, because every
copy must be updated.
II. Data Fragmentation
– Split a relation into logically related and correct parts.
– The main reasons for fragmenting a relation are:
• Efficiency: data that is not needed by the local applications is not
stored.
• Parallelism: a transaction can be divided into several subqueries that
operate on fragments, which increases the degree of concurrency.
– However, reconstructing the whole relation requires accessing data
from all sites that contain part of the relation.
• A relation can be fragmented in two ways:
• Horizontal fragmentation
– A horizontal fragment is a subset of the rows of a relation:
those rows that satisfy a selection condition.
– Consider the Employee relation with the selection condition
(DNO = 5). All rows that satisfy this condition form a
horizontal fragment of the Employee relation.
– A selection condition may be composed of several
conditions connected by AND or OR.
• Vertical fragmentation
– A vertical fragment is a subset of the columns of a relation,
so it contains only the values of the selected columns.
– Consider the Employee relation. A vertical fragment can be
created by keeping the values of Name, Bdate, Sex, and Address.
– Because no condition is used to create a vertical fragment,
each fragment must include the primary key attribute of the
parent relation Employee. In this way all vertical fragments of a
relation are connected.
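The two kinds of fragment can be sketched in a few lines of Python. The sample rows, the choice of SSN as primary key, and the helper names are assumptions made for illustration; only the condition DNO = 5 and the column list Name, Bdate, Sex, Address come from the example above.

```python
# Hypothetical sample rows; column names follow the Employee example.
EMPLOYEE = [
    {"SSN": "111", "Name": "A. Smith", "Bdate": "1985-01-09", "Sex": "M",
     "Address": "Houston", "Salary": 30000, "DNO": 5},
    {"SSN": "222", "Name": "B. Wong", "Bdate": "1975-12-08", "Sex": "M",
     "Address": "Dallas", "Salary": 40000, "DNO": 4},
    {"SSN": "333", "Name": "C. Zelaya", "Bdate": "1988-07-19", "Sex": "F",
     "Address": "Spring", "Salary": 25000, "DNO": 5},
]


def horizontal_fragment(relation, predicate):
    """Return copies of the rows that satisfy the selection condition."""
    return [dict(row) for row in relation if predicate(row)]


def vertical_fragment(relation, columns, key="SSN"):
    """Return the selected columns, always keeping the primary key."""
    kept = [key] + [c for c in columns if c != key]
    return [{c: row[c] for c in kept} for row in relation]


if __name__ == "__main__":
    dept5 = horizontal_fragment(EMPLOYEE, lambda row: row["DNO"] == 5)
    personal = vertical_fragment(EMPLOYEE, ["Name", "Bdate", "Sex", "Address"])
    print([row["SSN"] for row in dept5])
    print(list(personal[0]))
```

Keeping the key in every vertical fragment is what lets the fragments be joined back together later.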
• Representation
• Three rules must be followed during fragmentation:
– Completeness: if a relation r is decomposed into fragments
r1, r2, …, rn, each data item found in r must appear
in at least one fragment.
– Reconstruction: it must be possible to define a relational
operation that reconstructs the relation r from the fragments.
– Disjointness: if a data item di appears in fragment ri, it
should not appear in any other fragment. (For vertical fragments
the primary key is the exception, since it is repeated in every fragment.)
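For horizontal fragments, the three rules can be checked mechanically. The sketch below is one possible check, assuming each row is identified by a unique key (here a hypothetical SSN column) and that reconstruction is the union of the fragments; the function name and sample data are illustrative only.

```python
def check_horizontal_fragmentation(relation, fragments, key="SSN"):
    """Check completeness, disjointness and reconstruction for row fragments."""
    original_keys = [row[key] for row in relation]
    fragment_keys = [row[key] for frag in fragments for row in frag]

    # Completeness: every row of r appears in at least one fragment.
    complete = set(original_keys) <= set(fragment_keys)
    # Disjointness: no row appears in more than one fragment.
    disjoint = len(fragment_keys) == len(set(fragment_keys))
    # Reconstruction: the union of the fragments gives back exactly r.
    rebuilt = {row[key]: row for frag in fragments for row in frag}
    reconstructs = (sorted(rebuilt) == sorted(original_keys)
                    and all(rebuilt[row[key]] == row for row in relation))
    return {"complete": complete, "disjoint": disjoint,
            "reconstructs": reconstructs}


if __name__ == "__main__":
    r = [{"SSN": "1", "DNO": 5}, {"SSN": "2", "DNO": 4}, {"SSN": "3", "DNO": 5}]
    r1 = [row for row in r if row["DNO"] == 5]
    r2 = [row for row in r if row["DNO"] != 5]
    print(check_horizontal_fragmentation(r, [r1, r2]))
```

Fragments defined by a condition and its negation, as in the demo, satisfy all three rules by construction.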

3. Types of Distributed Database Systems

• Homogeneous
– All sites of the database system have an identical setup,
i.e., the same database system software.
– The system may have little or no local autonomy.
– The underlying operating systems can be a mixture
of Linux, Windows, Unix, etc.
[Figure: Sites 1 to 5, each running Oracle on Unix, Linux, or Windows, connected by a communications network.]
• Heterogeneous
– At least one of the databases must be from a different vendor. There are two variants:
– Federated: Each site may run a different database system, but data access
is managed through a single conceptual schema.
• This implies that the degree of local autonomy is minimal. Each site
must adhere to a centralized access policy. There may be a global
schema.
– Multidatabase: There is no single conceptual global schema. For data access,
a schema is constructed dynamically as needed by the application software.
[Figure: Sites 1 to 5 running different kinds of DBMS (object-oriented, relational, hierarchical, network) on Unix, Linux, and Windows, connected by a communications network.]
4. Query Processing in Distributed Databases
• Issues
– Cost of transferring data (files and results) over the network.
• This cost is usually high, so some optimization is necessary.
• Example: suppose there are three sites, with the Employee relation at site 1,
the Department relation at site 2, and no relation at site 3.
– Employee at site 1: 10,000 rows. Row size = 100 bytes. Table size = 10^6
bytes.
– Department at site 2: 100 rows. Row size = 35 bytes. Table size = 3,500
bytes.
– A query is initiated from site 3 to retrieve, for each employee, the first name
(15 bytes), last name (15 bytes), and department name (10 bytes), a total of
40 bytes per result row.
• Q: For each employee, retrieve the employee's Fname, Lname, and department
name.
• Q: π_{Fname, Lname, Dname} (Employee ⋈_{Dno = Dnumber} Department)

Employee(Fname, Minit, Lname, SSN, Bdate, Address, Sex, Salary, Superssn, Dno)

Department(Dname, Dnumber, Mgrssn, Mgrstartdate)

• Assumptions
– The result of this query has 10,000 rows, assuming
that every employee is related to a department.
– Each result row is 40 bytes long. The query is
submitted at site 3 and the result is sent to this site.
– Problem: neither the Employee nor the Department relation is
present at site 3.

• Which strategy minimizes the data transfer cost?
• Strategies for minimizing data transfer:
1. Transfer Employee and Department to site 3.
• Total bytes transferred = 1,000,000 + 3,500 = 1,003,500 bytes.
2. Transfer Employee to site 2, execute the join at site 2, and send
the result to site 3.
• Employee from site 1 to site 2: 1,000,000 bytes.
• Query result size = 40 * 10,000 = 400,000 bytes.
• Total bytes transferred = 1,000,000 + 400,000 = 1,400,000 bytes.
3. Transfer Department to site 1, execute the join at site
1, and send the result to site 3.
• Department from site 2 to site 1: 3,500 bytes.
• Query result size = 40 * 10,000 = 400,000 bytes.
• Total bytes transferred = 3,500 + 400,000 = 403,500 bytes.
– Preferred approach: strategy 3.
Example 2: Consider the query
– Q’: For each department, retrieve the department name and the Fname and Lname of
the department manager.
• Relational algebra expression:
– π_{Fname, Lname, Dname} (Employee ⋈_{Mgrssn = SSN} Department)
• The result of this query has 100 tuples, assuming that every department has
a manager. The execution strategies are:
1. Transfer Employee and Department to the result site and perform the join at
site 3.
• Total bytes transferred = 1,000,000 + 3,500 = 1,003,500 bytes.
2. Transfer Employee to site 2, execute the join at site 2, and send the result to site 3.
• Site 1 to site 2: 1,000,000 bytes.
• Site 2 to site 3: query result size = 40 * 100 = 4,000 bytes.
• Total bytes transferred = 1,000,000 + 4,000 = 1,004,000 bytes.
3. Transfer Department to site 1, execute the join at site 1, and send the result
to site 3.
• Total bytes transferred = 3,500 + 4,000 = 7,500 bytes.
Preferred strategy: strategy 3.
Example 3: Now suppose the result is needed at site 2. Possible
strategies:
1. Transfer the Employee relation to site 2, execute the query, and
present the result to the user at site 2.
• Total bytes transferred = 1,000,000 bytes for both queries
Q and Q’.
2. Transfer the Department relation to site 1, execute the join at site 1,
and send the result back to site 2.
• Total bytes transferred for
– Q = 3,500 + 400,000 = 403,500 bytes
– Q’ = 3,500 + 4,000 = 7,500 bytes

• Preferred strategy: strategy 2.
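The arithmetic in the three examples follows one pattern: pick a site to run the join, ship every relation that is not already there, and ship the result if it is needed elsewhere. The sketch below enumerates the candidate join sites under that simple cost model. The function names are hypothetical, and the model counts only bytes transferred (no local processing cost), as the examples do.

```python
from typing import Dict, Tuple

# Sizes and placements taken from the worked examples.
RELATION_SITE = {"Employee": 1, "Department": 2}
RELATION_BYTES = {"Employee": 1_000_000, "Department": 3_500}


def transfer_cost(join_site: int, result_site: int, result_bytes: int) -> int:
    """Bytes moved if the join runs at join_site and the answer goes to result_site."""
    # Ship every relation that is not already stored at the join site.
    shipped = sum(size for name, size in RELATION_BYTES.items()
                  if RELATION_SITE[name] != join_site)
    # Ship the join result only if it is produced somewhere else.
    if join_site != result_site:
        shipped += result_bytes
    return shipped


def best_join_site(result_site: int, result_bytes: int,
                   sites=(1, 2, 3)) -> Tuple[int, Dict[int, int]]:
    """Return the cheapest join site and the cost of every candidate."""
    costs = {s: transfer_cost(s, result_site, result_bytes) for s in sites}
    return min(costs, key=costs.get), costs


if __name__ == "__main__":
    q_bytes = 40 * 10_000      # result size of Q
    q_prime_bytes = 40 * 100   # result size of Q'
    for label, size in (("Q", q_bytes), ("Q'", q_prime_bytes)):
        for result_site in (3, 2):
            print(label, "result at site", result_site,
                  best_join_site(result_site, size))
```

In every case the cheapest plan joins at site 1, because moving the small Department relation costs far less than moving Employee.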

5. Concurrency Control and Recovery
• Distributed databases encounter a number of concurrency
control and recovery problems that are not present in
centralized databases. Some of them are listed below.

– Dealing with multiple copies of data items:
• Concurrency control must maintain global consistency
across the copies. Likewise, the recovery mechanism must
recover all copies and keep them consistent after
recovery.
– Failure of individual sites:
• Database availability must not be affected by the
failure of one or two sites, and the recovery scheme must
recover failed sites before they are made available for use again.
– Communication link failure:
• This failure may partition the network, which
affects database availability even though all database
sites may be running.
– Distributed commit:
• A transaction may be fragmented and executed by a
number of sites. This requires a two-phase or
three-phase commit approach for transaction commit.
– Distributed deadlock:
• Since transactions are processed at multiple sites, two or
more sites may become involved in a deadlock. This must be
resolved in a distributed manner.
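The distributed commit item mentions a two-phase commit. A minimal in-memory sketch of that idea is shown below: the coordinator first collects a vote from every participating site, then tells all of them to commit only if every vote was yes. The class and function names are hypothetical, and the sketch leaves out logging, time-outs, and failure handling, which a real protocol needs.

```python
class Participant:
    """A site that executes part of a transaction (hypothetical model)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote yes only if the local part can be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Run both phases and return the global decision."""
    # Phase 1 (voting): ask every site whether it can commit.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): commit everywhere or abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"


if __name__ == "__main__":
    sites = [Participant("site1"), Participant("site2", can_commit=False)]
    print(two_phase_commit(sites), [p.state for p in sites])
```

The point of the two phases is that no site commits until every site has promised it can, so the sites never end up with a mixed outcome.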

5.1 Distributed Concurrency Control
i. Primary site technique: A single site is designated as the
primary site, which serves as the coordinator for transaction
management.
[Figure: Sites 1 to 5 connected by a communications network, with one site designated as the primary site.]
• Transaction management:
– Concurrency control and commit are managed by the primary site.
– In two-phase locking, this site manages the locking and
releasing of data items. If all transactions follow the two-phase
policy at all sites, then serializability is guaranteed.

– Advantages:
• It is an extension of centralized two-phase locking, so
implementation and management are simple.
• Data items are locked only at one site, but they can be
accessed at any site.
– Disadvantages:
• All transaction management activities go to the primary site,
which is likely to overload it.
• If the primary site fails, the entire system is inaccessible.
– To aid recovery, a backup site is designated that behaves as
a shadow of the primary site. If the primary site fails, the
backup site can act as the primary site.
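A minimal sketch of the primary site technique is a single lock table that every transaction must consult, wherever the data is stored. The class below is hypothetical and simplified: it supports exclusive locks only and releases all of a transaction's locks together at the end, which is one way of following the two-phase policy.

```python
class PrimarySiteLockManager:
    """Single lock table kept at the primary site (exclusive locks only)."""

    def __init__(self):
        self.lock_table = {}  # data item -> id of the transaction holding it

    def lock(self, item, txn):
        """Grant the lock if the item is free or already held by txn."""
        holder = self.lock_table.get(item)
        if holder is None or holder == txn:
            self.lock_table[item] = txn
            return True
        return False

    def release_all(self, txn):
        """Shrinking phase: drop every lock held by txn at once."""
        held = [item for item, t in self.lock_table.items() if t == txn]
        for item in held:
            del self.lock_table[item]


if __name__ == "__main__":
    primary = PrimarySiteLockManager()
    print(primary.lock("x", "T1"))  # T1 gets the lock
    print(primary.lock("x", "T2"))  # T2 must wait
    primary.release_all("T1")
    print(primary.lock("x", "T2"))  # now T2 gets it
```

Because every request goes through this one table, the design is simple, but the same table is also the bottleneck and single point of failure noted above.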
ii. Primary copy technique:
– In this approach, instead of a site, a data item partition is
designated as the primary copy. To lock a data item, only the
primary copy of that data item is locked.
• Advantages:
– Since primary copies are distributed over various sites, no
single site is overloaded with locking and unlocking
requests.
• Disadvantages:
– Identifying the primary copy is complex. A distributed
directory must be maintained, possibly at all sites.
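The primary copy technique can be sketched as a directory that maps each data item to the site holding its primary copy, plus one lock table per site. The names below are hypothetical, only exclusive locks are modelled, and the directory is a plain dictionary rather than a truly distributed structure.

```python
class PrimaryCopyLocking:
    """Lock each data item at the site that holds its primary copy."""

    def __init__(self, directory):
        # directory: data item -> site holding the primary copy
        self.directory = dict(directory)
        self.site_locks = {site: {} for site in set(self.directory.values())}

    def lock(self, item, txn):
        """Return the site that granted the lock, or None if it is held."""
        site = self.directory[item]          # directory lookup
        table = self.site_locks[site]
        if table.get(item, txn) != txn:      # held by another transaction
            return None
        table[item] = txn
        return site

    def unlock(self, item, txn):
        site = self.directory[item]
        if self.site_locks[site].get(item) == txn:
            del self.site_locks[site][item]


if __name__ == "__main__":
    locks = PrimaryCopyLocking({"x": 1, "y": 2})
    print(locks.lock("x", "T1"), locks.lock("y", "T2"), locks.lock("x", "T2"))
```

Requests for x and y go to different sites, so the locking load is spread out, at the price of keeping the directory correct everywhere.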

Recovery from a coordinator failure
• In both approaches a coordinator site or copy may become
unavailable. This requires the selection of a new
coordinator.
– Primary site approach with no backup site:
• Abort and restart all active transactions at all sites.
Elect a new coordinator and initiate transaction
processing.
– Primary site approach with backup site:
• Suspend all active transactions, designate the backup
site as the primary site, and identify a new backup site.
• The new primary site receives all transaction management
information so that it can resume processing.
– Primary and backup sites fail, or there is no backup site:
• Use an election process to select a new coordinator site.
iii. Concurrency control based on voting:
– There is no primary copy or coordinator.
– A lock request is sent to all sites that hold a copy of the data item.
– If a majority of sites grant the lock, the requesting transaction
gets the data item.
– The locking outcome (granted or denied) is sent to all these
sites.
– To avoid an unacceptably long wait, a time-out period is defined.
If the requesting transaction does not receive any vote
information within this period, the transaction is aborted.
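The voting steps above can be sketched as two small functions: one collects a vote from every site that holds a copy, and one applies the majority rule. The names are hypothetical, a missing reply stands in for a site that did not answer before the time-out, and the "denied" outcome label is an assumption for the case where replies arrive but no majority grants the lock.

```python
from typing import Dict, Iterable, Optional


def collect_votes(item: str, txn: str,
                  site_lock_tables: Dict[str, Dict[str, str]],
                  unreachable: Iterable[str] = ()) -> Dict[str, Optional[bool]]:
    """Ask every replica site; None means no reply before the time-out."""
    silent = set(unreachable)
    votes: Dict[str, Optional[bool]] = {}
    for site, table in site_lock_tables.items():
        if site in silent:
            votes[site] = None
        else:
            holder = table.get(item)
            votes[site] = holder is None or holder == txn
    return votes


def decide(votes: Dict[str, Optional[bool]]) -> str:
    """Apply the majority rule over all sites that hold a copy."""
    replied = [v for v in votes.values() if v is not None]
    if not replied:
        return "aborted"            # no vote information at all
    granted = sum(1 for v in replied if v)
    return "granted" if granted > len(votes) / 2 else "denied"


if __name__ == "__main__":
    tables = {"A": {}, "B": {}, "C": {"x": "T9"}}
    print(decide(collect_votes("x", "T1", tables)))
    print(decide(collect_votes("x", "T1", tables, unreachable=["A"])))
    print(decide(collect_votes("x", "T1", tables, unreachable=["A", "B", "C"])))
```

Counting the majority over all copies, not just over the sites that replied, keeps two transactions from each collecting a "majority" on opposite sides of a network partition.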

Client-Server Database Architecture

• It consists of clients running client software, a set of servers
that provide all database functionality, and a reliable
communication infrastructure.
[Figure: Clients 1 to n connected to Servers 1 to n.]
Three-tier client/server architecture

Many Web applications use an architecture called the three-tier
architecture, which adds an intermediate layer between the client
and the database server. This intermediate layer is called the Web
server. It plays an intermediary role by storing
business rules (constraints) that are used to access data from the
database server.
It can also improve database security by checking a client's credentials
before forwarding a request to the database server. The intermediate
server accepts requests from the client, processes them, sends
database commands to the database server, and then acts as a conduit
for passing (partially) processed data from the database server to the
clients.
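The role of the intermediate layer can be sketched with Python's standard sqlite3 module standing in for the database server. Everything specific here is an assumption for illustration: the table, the sample rows, the credential store (plaintext only to keep the sketch short), and the business rule that a department number must be a positive integer.

```python
import sqlite3

# Hypothetical credential store; a real system would not keep plaintext passwords.
CREDENTIALS = {"alice": "secret"}


def make_database_server():
    """Stand-in for the database tier: an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee (ssn TEXT PRIMARY KEY, lname TEXT, dno INTEGER)")
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?)",
        [("111", "Smith", 5), ("222", "Wong", 4), ("333", "Zelaya", 5)])
    return conn


def middle_tier(conn, user, password, dno):
    """Check credentials and a business rule, then forward the query."""
    if CREDENTIALS.get(user) != password:
        raise PermissionError("invalid credentials")
    if not isinstance(dno, int) or dno <= 0:
        raise ValueError("department number must be a positive integer")
    rows = conn.execute(
        "SELECT lname FROM employee WHERE dno = ? ORDER BY lname",
        (dno,)).fetchall()
    # Pass (partially) processed data back to the client.
    return [lname for (lname,) in rows]


if __name__ == "__main__":
    db = make_database_server()
    print(middle_tier(db, "alice", "secret", 5))
```

The client never talks to the database directly, so a request with bad credentials or an invalid department number is rejected before any SQL is sent.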

• Clients reach the server for the desired service, but the server does not
reach clients.
• The server software is responsible for local data management
at a site, much like centralized DBMS software.
• The client software is responsible for most of the distribution
functions.
• The communication software manages communication among
clients and servers.
• The processing of a SQL query goes as follows:
– The client parses a user query and decomposes it into a number
of independent subqueries. Each subquery is sent to the
appropriate site for execution.
– Each server processes its subquery and sends the result to the
client.
– The client combines the results of the subqueries and produces
the final result.
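The three processing steps can be sketched for the simple case where each server holds a horizontal fragment of the same relation, so the client sends the same selection to every site and unions the answers. The site names, sample rows, and function names are hypothetical, and real clients would also have to handle joins and fragments that need different subqueries.

```python
# Hypothetical horizontal fragments of Employee, one per server site.
SITES = {
    "site1": [{"SSN": "111", "Salary": 30000, "DNO": 5},
              {"SSN": "333", "Salary": 25000, "DNO": 5}],
    "site2": [{"SSN": "222", "Salary": 40000, "DNO": 4}],
}


def server_execute(fragment, predicate):
    """Server side: run the subquery against the local fragment."""
    return [row for row in fragment if predicate(row)]


def client_query(sites, predicate):
    """Client side: send one subquery per site, then combine the results."""
    # Steps 1 and 2: each site evaluates the subquery on its own data.
    partial = {site: server_execute(fragment, predicate)
               for site, fragment in sites.items()}
    # Step 3: union the partial results into the final answer.
    combined = [row for rows in partial.values() for row in rows]
    return sorted(combined, key=lambda row: row["SSN"])


if __name__ == "__main__":
    well_paid = client_query(SITES, lambda row: row["Salary"] >= 30000)
    print([row["SSN"] for row in well_paid])
```

The servers only ever see their local fragment; knowing which sites to ask and how to merge the answers is the client's job, which is what is meant by the client handling most of the distribution functions.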
