0% found this document useful (0 votes)
3 views

Ch6-Introduction to Distributed Database (2)

Chapter 6 introduces distributed databases, defining them as collections of logically related databases distributed over a network, managed by a distributed database management system. It discusses the advantages of distributed databases, including transparency, reliability, performance, and scalability, as well as types of systems (homogeneous and heterogeneous). The chapter also covers distributed query processing strategies and the client-server architecture used in managing these databases.

Uploaded by

Tade Mersha
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
3 views

Ch6-Introduction to Distributed Database (2)

Chapter 6 introduces distributed databases, defining them as collections of logically related databases distributed over a network, managed by a distributed database management system. It discusses the advantages of distributed databases, including transparency, reliability, performance, and scalability, as well as types of systems (homogeneous and heterogeneous). The chapter also covers distributed query processing strategies and the client-server architecture used in managing these databases.

Uploaded by

Tade Mersha
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 22

Chapter 6

Introduction to Distributed
Database
Chapter Outline
1. Distributed Database definition
2. Distributed Database Advantages
3. Types of Distributed Database Systems
4. Distributed Query Processing
5. Client-Server Database Architecture

Advanced Database System- Ch6: Introduction to Distributed Database Slide 2


Distributed Database Definition
 A distributed database (DDB) is a collection of
multiple logically related database distributed
over a computer network,
 A distributed database management system as
a software system that manages a distributed
database while making the distribution
transparent to the user.

Database 1
Database 3
Communication Network
Server 1
Server 3

Database 2

Server 2

Advanced Database System- Ch6: Introduction to Distributed Database Slide 3


Distributed Database Advantages
 Management of distributed data with
different levels of transparency:
 This refers to the physical placement of
data (files, relations, etc.) which is not
known to the user (distribution
transparency).

Advanced Database System- Ch6: Introduction to Distributed Database Slide 4


Distributed Database Advantages…
 Distribution and Network transparency:
 Users do not have to worry about operational
details of the network.
 There is Location transparency, which refers to
freedom of issuing command from any location
without affecting its working.
 Then there is Naming transparency, which
allows access to any names object (files, relations,
etc.) from any location.

Advanced Database System- Ch6: Introduction to Distributed Database Slide 5


Distributed Database Advantages…
 Replication transparency:
 It allows to store copies of a data at multiple
sites as shown in the above diagram.
 This is done to minimize access time to the
required data.
 Fragmentation transparency:
 Allows to fragment a relation horizontally
(create a subset of tuples of a relation) or
vertically (create a subset of columns of a
relation).

Advanced Database System- Ch6: Introduction to Distributed Database Slide 6


Distributed Database Advantages…
 The EMPLOYEE, PROJECT, and WORKS_ON
tables may be fragmented horizontally and
stored with possible replication as shown
below.

Advanced Database System- Ch6: Introduction to Distributed Database Slide 7


Distributed Database Advantages…
 Increased reliability and availability:
 Reliability refers to system live time, that is,
system is running efficiently most of the time.
Availability is the probability that the system is
continuously available (usable or accessible)
during a time interval.
 A distributed database system has multiple
nodes (computers) and if one fails then others
are available to do the job.

Advanced Database System- Ch6: Introduction to Distributed Database Slide 8


Distributed Database Advantages…
 Improved performance:
 A distributed DBMS fragments the database to
keep data closer to where it is needed most.
 This reduces data management (access and
modification) time significantly.
 Easier expansion (scalability):
 Allows new nodes (computers) to be added
anytime without chaining the entire
configuration.

Advanced Database System- Ch6: Introduction to Distributed Database Slide 9


Types of Distributed Database Systems
 Homogeneous
 All sites of the
database system
have identical setup,
i.e., same database Window
Site 5 Unix
system software. Oracle Site 1
 The underlying Oracle
operating system Window
may be different. Site 4 Communications
 For example, all network
sites run Oracle or
DB2, or Sybase or Oracle
some other Site 3 Site 2
database system. Linux Oracle Linux Oracle
 The underlying
operating systems
can be a mixture of
Linux, Window, Unix,
etc.
Advanced Database System- Ch6: Introduction to Distributed Database Slide 10
Types of Distributed Database Systems
 Heterogeneous
 Each site may run different database system with one
or more varieties of operating system

Object Unix Relational


Oriented Site 5 Unix
Site 1
Hierarchical
Window
Site 4 Communications
network

Network
Object DBMS
Oriented Site 3 Site 2 Relational
Linux Linux

Advanced Database System- Ch6: Introduction to Distributed Database Slide 11


Query Processing in Distributed Databases
 Issues
 Cost of transferring data (files and results) over the
network.
 This cost is usually high so some optimization is
necessary.
 Example relations: Employee at site 1 and Department at
Site 2
 Employee at site 1. 10,000 rows. Row size = 100 bytes.
Table size = 106 bytes.
Fname Minit Lname SSN Bdate Address Sex Salary Superssn Dno
 Department at Site 2. 100 rows. Row size = 35 bytes. Table
size = 3,500 bytes. Dname Dnumber Mgrssn Mgrstartdate
 Q: For each employee, retrieve employee name and
department name Where the employee works.
 Q: Fname,Lname,Dname (Employee Dno = Dnumber Department)

Advanced Database System- Ch6: Introduction to Distributed Database Slide 12


Query Processing in Distributed Databases
 Result
 The result of this query will have 10,000
tuples, assuming that every employee is
related to a department.
 Suppose each result tuple is 40 bytes long.
The query is submitted at site 3 and the result
is sent to this site.
 Problem: Employee and Department relations
are not present at site 3.

Advanced Database System- Ch6: Introduction to Distributed Database Slide 13


Query Processing in Distributed Databases
 Strategies:
1. Transfer Employee and Department to site 3.
 Total transfer bytes = 1,000,000 + 3500 = 1,003,500
bytes.
2. Transfer Employee to site 2, execute join at site 2 and
send the result to site 3.
 Query result size = 40 * 10,000 = 400,000 bytes. Total
transfer size = 400,000 + 1,000,000 = 1,400,000 bytes.
3. Transfer Department relation to site 1, execute the join
at site 1, and send the result to site 3.
 Total bytes transferred = 400,000 + 3500 = 403,500
bytes.
 Optimization criteria: minimizing data transfer.

Advanced Database System- Ch6: Introduction to Distributed Database Slide 14


Query Processing in Distributed Databases
 Strategies:
1. Transfer Employee and Department to site 3.
 Total transfer bytes = 1,000,000 + 3500 = 1,003,500
bytes.
2. Transfer Employee to site 2, execute join at site 2 and
send the result to site 3.
 Query result size = 40 * 10,000 = 400,000 bytes. Total
transfer size = 400,000 + 1,000,000 = 1,400,000 bytes.
3. Transfer Department relation to site 1, execute the join
at site 1, and send the result to site 3.
 Total bytes transferred = 400,000 + 3500 = 403,500
bytes.
 Optimization criteria: minimizing data transfer.
 Preferred approach: strategy 3.

Advanced Database System- Ch6: Introduction to Distributed Database Slide 15


Query Processing in Distributed Databases
 Consider the query
 Q’: For each department, retrieve the
department name and the name of the
department manager
 Relational Algebra expression:
 Fname,Lname,Dname (Employee Mgrssn = SSN
Department)

Advanced Database System- Ch6: Introduction to Distributed Database Slide 16


Query Processing in Distributed Databases
 The result of this query will have 100 tuples,
assuming that every department has a manager, the
execution strategies are:
1. Transfer Employee and Department to the result site
and perform the join at site 3.
 Total bytes transferred = 1,000,000 + 3500 = 1,003,500
bytes.
2. Transfer Employee to site 2, execute join at site 2 and
send the result to site 3. Query result size = 40 * 100 =
4000 bytes.
 Total transfer size = 4000 + 1,000,000 = 1,004,000 bytes.
3. Transfer Department relation to site 1, execute join at
site 1 and send the result to site 3.
 Total transfer size = 4000 + 3500 = 7500 bytes.

Advanced Database System- Ch6: Introduction to Distributed Database Slide 17


Query Processing in Distributed Databases
 The result of this query will have 100 tuples,
assuming that every department has a manager, the
execution strategies are:
1. Transfer Employee and Department to the result site
and perform the join at site 3.
 Total bytes transferred = 1,000,000 + 3500 = 1,003,500
bytes.
2. Transfer Employee to site 2, execute join at site 2 and
send the result to site 3. Query result size = 40 * 100 =
4000 bytes.
 Total transfer size = 4000 + 1,000,000 = 1,004,000 bytes.
3. Transfer Department relation to site 1, execute join at
site 1 and send the result to site 3.
 Total transfer size = 4000 + 3500 = 7500 bytes.
 Preferred strategy: Choose strategy 3.

Advanced Database System- Ch6: Introduction to Distributed Database Slide 18


Query Processing in Distributed Databases
 Now suppose the result site is 2. Possible
strategies :
1. Transfer Employee relation to site 2, execute
the query and present the result to the user at
site 2.
 Total transfer size = 1,000,000 bytes for both
queries Q and Q’.
2. Transfer Department relation to site 1, execute
join at site 1 and send the result back to site 2.
 Total transfer size for Q = 400,000 + 3500 =
403,500 bytes and for Q’ = 4000 + 3500 = 7500
bytes.

Advanced Database System- Ch6: Introduction to Distributed Database Slide 19


Client-Server Database Architecture
 It consists of clients running client software, a
set of servers which provide all database
functionalities and a reliable communication
infrastructure.
Server 1 Client 1

Client 2

Server 2 Client 3

Server n Client n

Advanced Database System- Ch6: Introduction to Distributed Database Slide 20


Client-Server Database Architecture
 Clients reach server for desired service, but
server does reach clients.
 The server software is responsible for local
data management at a site, much like
centralized DBMS software.
 The client software is responsible for most of
the distribution function.
 The communication software manages
communication among clients and servers.

Advanced Database System- Ch6: Introduction to Distributed Database Slide 21


Client-Server Database Architecture
 The processing of a SQL queries goes as
follows:
 Client parses a user query and decomposes it
into a number of independent sub-queries.
Each subquery is sent to appropriate site for
execution.
 Each server processes its query and sends the
result to the client.
 The client combines the results of subqueries
and produces the final result.

Advanced Database System- Ch6: Introduction to Distributed Database Slide 22

You might also like