RDBMS - Module5 - Distributed and Parallel DB


Uploaded by

shiniii
Copyright © All Rights Reserved

CHAPTER 7

Distributed and Parallel Databases

7.1 DISTRIBUTED DATABASES


(GGSIPU, 2011; MDU, Dec 2009, May 2009-10; KU)
A distributed database is a collection of multiple databases that are stored on several computers at various locations and connected to one another through a computer network.

A user sees a distributed database as a single database located on a single computer. The user need not know that the particular data being accessed may be located at some other site. A distributed database management system (DDBMS) is a set of programs that uses a client-server architecture to process information requests.
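This location transparency can be pictured with a minimal sketch in which a global catalog, kept by the DDBMS, maps each relation to the site that stores it, so the caller asks for a relation by name only. The site names, the `Course` relation and the function `read_relation` are hypothetical, chosen for illustration; the dictionary lookup stands in for what would really be a network request to the remote site.

```python
from typing import Dict, List, Tuple

Row = Tuple  # a tuple of attribute values

# Hypothetical sites, each holding some relations in its own local database.
SITES: Dict[str, Dict[str, List[Row]]] = {
    "site1": {"Student": [(1, "Ashu", "CSE", 95), (2, "Binoy", "CSE", 84)]},
    "site2": {"Course": [("CSE", "Computer Science"), ("IT", "Information Technology")]},
}

# Global catalog maintained by the DDBMS: relation name -> site that stores it.
CATALOG: Dict[str, str] = {
    relation: site for site, tables in SITES.items() for relation in tables
}


def read_relation(name: str) -> List[Row]:
    """Return every tuple of a relation; the caller never names a site."""
    site = CATALOG[name]            # the DDBMS resolves the location
    return list(SITES[site][name])  # stands in for a request sent to that site


print(read_relation("Course"))
```

If `Course` were later moved to another site, only `CATALOG` would change; code that calls `read_relation("Course")` would be unaffected, which is the point of hiding data location from the user.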

7.1.1 Types of Distributed Databases


There are two types of distributed databases:

(i) Homogeneous distributed databases: The databases stored at the various geographical locations run identical database software.

(ii) Heterogeneous distributed databases: The databases stored at the various geographical locations run different database software. For example, one site may be running an Oracle database while another runs DB2.

7.1.2 Design of Distributed Database


Different techniques used for designing distributed databases are:

1. Data Fragmentation: In this technique, a decision is made about which portion of the database is to be stored at which location. A relation is broken into different fragments, which are physically stored across various sites. The ways of fragmenting a relation are:
(a) Horizontal Fragmentation: A relation R is partitioned into many
relations where each new relation consists of some tuples of relation
R. These new relations are distributed across various sites.

Example: Consider the following Student relation:

Roll No   Name       Branch   Marks
1         Ashu       CSE      95
2         Binoy      CSE      84
3         Himanshu   IT       79
4         Naina      CSE      70
5         Rashmi     IT       65

Fig. 7.1: Student Relation

This relation can be partitioned according to the Branch field of a student, i.e. as follows:

Student_Frag1 = $\sigma_{Branch = \text{'CSE'}}(Student)$
Student_Frag2 = $\sigma_{Branch = \text{'IT'}}(Student)$

Student_Frag1
Roll No   Name    Branch   Marks
1         Ashu    CSE      95
2         Binoy   CSE      84
4         Naina   CSE      70

Student_Frag2
Roll No   Name       Branch   Marks
3         Himanshu   IT       79
5         Rashmi     IT       65

Fig. 7.2: Horizontal Fragmentation

(b) Vertical Fragmentation: A relation R is partitioned into many relations, where each new relation consists of only certain attributes of relation R. An additional attribute, Tuple_Id, which specifies the logical or physical address of a tuple, is added to each new relation so that the original tuples can be reconstructed.

Example: The Student relation is partitioned into two new relations: one contains the Roll No. and Marks fields, while the other contains the Name and Branch fields of a student.

Student_Vfrag1 = $\pi_{RollNo,\, Marks,\, Tuple\_Id}(Student)$
Student_Vfrag2 = $\pi_{Name,\, Branch,\, Tuple\_Id}(Student)$

Student_Vfrag1
Roll No   Marks   Tuple_Id
1         95      1
2         84      2
3         79      3
4         70      4
5         65      5

Student_Vfrag2
Name       Branch   Tuple_Id
Ashu       CSE      1
Binoy      CSE      2
Himanshu   IT       3
Naina      CSE      4
Rashmi     IT       5

Fig. 7.3: Vertical Fragmentation
(c) Mixed Fragmentation: In this type of fragmentation, a relation is first partitioned horizontally and the new relation obtained is then further partitioned vertically, or a relation is first partitioned vertically and the new relation obtained is then further partitioned horizontally.

Example:
Stud = $\pi_{RollNo,\, Name}(\sigma_{Branch = \text{'CSE'}}(Student))$

Stud
Roll No   Name
1         Ashu
2         Binoy
4         Naina

Fig. 7.4: Mixed Fragmentation

2. Data Replication: It refers to maintaining more than one copy of the data at several different sites, i.e. identical replicas of a relation are stored at more than one site.

The two types of data replication are:
(a) Fully Replicated Database: A copy of the entire database is stored at more than one site.

(b) Partially Replicated Database: Only some portion of the database is replicated at other sites.


3. Data Allocation: Data allocation is the strategy by which one decides how to place data at the different sites. In the centralised strategy, the data and the DBMS are stored at a single site, and users at other sites access this data through a network. Another strategy is to partition the data and store the partitions at different sites, or to keep different copies of the same data at several sites.
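The three fragmentation schemes of Data Fragmentation can be checked on the Student relation of Fig. 7.1. The sketch below models a relation as a Python list of tuples; `select` (σ) and `project` (π) are hypothetical helpers written for this illustration, not part of any DBMS. It also shows the usual reconstruction rules: the union of the horizontal fragments, and a join on Tuple_Id of the vertical fragments, should each give back the original relation.

```python
# Student relation of Fig. 7.1: (Roll No, Name, Branch, Marks)
STUDENT = [
    (1, "Ashu", "CSE", 95),
    (2, "Binoy", "CSE", 84),
    (3, "Himanshu", "IT", 79),
    (4, "Naina", "CSE", 70),
    (5, "Rashmi", "IT", 65),
]


def select(rel, pred):
    """Selection (sigma): keep only the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]


def project(rel, cols):
    """Projection (pi) on column positions; removes duplicates, keeps order."""
    out = []
    for t in rel:
        row = tuple(t[c] for c in cols)
        if row not in out:
            out.append(row)
    return out


# Horizontal fragmentation on Branch (Fig. 7.2).
frag1 = select(STUDENT, lambda t: t[2] == "CSE")
frag2 = select(STUDENT, lambda t: t[2] == "IT")
# Reconstruction: the union of the fragments is the original relation.
assert sorted(frag1 + frag2) == sorted(STUDENT)

# Vertical fragmentation (Fig. 7.3): append a Tuple_Id, then project.
with_id = [t + (i,) for i, t in enumerate(STUDENT, start=1)]
vfrag1 = project(with_id, (0, 3, 4))  # (Roll No, Marks, Tuple_Id)
vfrag2 = project(with_id, (1, 2, 4))  # (Name, Branch, Tuple_Id)
# Reconstruction: join the two fragments on Tuple_Id.
by_id = {tid: (name, branch) for name, branch, tid in vfrag2}
rebuilt = [(roll, *by_id[tid], marks) for roll, marks, tid in vfrag1]
assert sorted(rebuilt) == sorted(STUDENT)

# Mixed fragmentation (Fig. 7.4): horizontal first, then vertical.
stud = project(select(STUDENT, lambda t: t[2] == "CSE"), (0, 1))
print(stud)  # [(1, 'Ashu'), (2, 'Binoy'), (4, 'Naina')]
```

Without the Tuple_Id column, the join step would have nothing reliable to match on, which is why vertical fragments carry it.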

7.1.3 Architecture of Distributed Database

The following three architectures are used in distributed databases:

1. Shared Nothing Architecture: Every computer located at the various sites has its own local database. All these computers are connected via a network, but none of them shares its database with the others.

[Figure: Sites 1 to 4, each with its own local database, connected by a network]
Fig. 7.5: Shared Nothing Architecture



2. Centralised Database: Every computer located at the various sites is connected through a network and shares a common database.
[Figure: Sites 1 to 4 connected to one centralised database]
Fig. 7.6: Centralised Database Architecture

3. Truly Distributed Database: Every computer located at the various sites and connected through a network has its own local database. However, all these databases are shared.

[Figure: Sites 1 to 4, each with a local database, all databases shared over the network]
Fig. 7.7: Truly Distributed Architecture

7.2 PARALLEL DATABASES


In parallel databases, multiple processors work in parallel to perform various operations concurrently. For example, one CPU might be loading data while another is executing a query at the same time.

7.2.1 Architecture of Parallel Databases


(MDU,Dec 2009, May 2009, 2010, 2011, KU)
The three most popular architectures of parallel databases are listed below; a fourth, the hierarchical architecture, is a combination of them.

1. Shared Memory Architecture: As the name suggests, all the processors and disks share a common memory. The processors, disks and memory are connected through a communication network. A processor may also have a local cache, so that references to the shared memory are avoided whenever possible. Processors communicate with each other through memory writes.

[Figure: processors, disks and a shared memory joined by an interconnection network]
Fig. 7.8: Shared Memory Architecture of Parallel Database

Advantages:
(a) Data access is fast, as processors communicate through memory writes.
(b) Low communication overhead.
Disadvantages:
(a) Cache coherency: if an update is made to the shared memory, it must also be reflected in every local cache that holds that data.
(b) The architecture is not scalable beyond 32 or 64 processors.
2. Shared Disk Architecture: In this architecture there are multiple processors, each with its own private memory, but they all share common disks via an interconnection network.

[Figure: processors, each with a private memory, sharing disks over an interconnection network]
Fig. 7.9: Shared Disk Architecture of Parallel Database



Advantages:
(a) Since each processor has its own memory, the memory bus is not a bottleneck.
(b) If one processor or its memory fails, the other processors can take over.
(c) Load balancing is easy.
Disadvantages:
(a) Scalability problems: as the number of processors increases, the number of disk accesses also increases, and the interconnection to the disks becomes a bottleneck.
(b) As more processors are added, the existing processors slow down because of increased contention for memory access and network bandwidth.
3. Shared Nothing Architecture: Every processor connected to the interconnection network has its own individual memory and disk. All communication is done through a high-speed communication network.

[Figure: processors, each with its own memory and disk, linked by an interconnection network]
Fig. 7.10: Shared Nothing Architecture of Parallel Database

Advantages:
(a) Better scalability: since no resources are shared, contention among processors is minimised.
(b) High speed: queries are executed at the individual nodes, so only queries requiring access to a non-local disk, and their results, pass through the network.
(c) Supports a large number of processors.
Disadvantages:
(a) Communication costs are higher.
(b) Load balancing is difficult.
(c) The cost of non-local disk access is higher than in a shared disk architecture.
(d) Since there is no sharing of disks or data, if one processor fails, its data becomes inaccessible to the other processors.
4. Hierarchical Architecture

[Figure: shared memory and shared disk nodes joined by an interconnection network]
Fig. 7.11: Hierarchical Architecture of Parallel Database

It is a combination of the shared memory, shared disk and shared nothing architectures. At the top level, the system can be seen as a shared nothing system. Each node can in turn be a shared memory system; alternatively, each node can be a shared disk system whose member systems are themselves shared memory systems.

Advantages:
(a) Higher performance: higher speed-up and scale-up can be attained with a larger number of CPUs.
(b) Flexibility: nodes can easily be added or removed.
(c) A single system can serve many users.
7.2.2 Query Parallelism
Query parallelism refers to executing multiple queries in parallel, or decomposing a single query into parts so that they can all be executed in parallel.
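Decomposing one query can be illustrated with an aggregate such as the average of Marks: each partition computes a partial SUM and COUNT independently, and a final step combines the partial results. This is a minimal sketch assuming the Student relation of Fig. 7.1 split by tuple position (tuple i goes to partition i mod n); `partial_sum_count` and `parallel_avg_marks` are hypothetical names. Threads are used only to show the structure; in CPython they do not give true CPU parallelism, whereas a parallel DBMS would run each part on a separate processor.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Row = Tuple[int, str, str, int]  # (Roll No, Name, Branch, Marks)

STUDENT: List[Row] = [
    (1, "Ashu", "CSE", 95),
    (2, "Binoy", "CSE", 84),
    (3, "Himanshu", "IT", 79),
    (4, "Naina", "CSE", 70),
    (5, "Rashmi", "IT", 65),
]


def partial_sum_count(partition: List[Row]) -> Tuple[int, int]:
    """Sub-query run independently on one partition: SUM(Marks), COUNT(*)."""
    return sum(t[3] for t in partition), len(partition)


def parallel_avg_marks(relation: List[Row], n_parts: int = 3) -> float:
    """AVG(Marks) decomposed into per-partition partial aggregates."""
    # Split by position: tuple i goes to partition i mod n_parts.
    parts = [relation[i::n_parts] for i in range(n_parts)]
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(partial_sum_count, parts))
    # Combine step: add the partial sums and counts, then divide once.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


print(parallel_avg_marks(STUDENT))  # 78.6
```

Each partition returns a sum and a count rather than its own average because the partitions differ in size: averaging the three partial averages (82.5, 74.5 and 79) would not give the true average of 78.6.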
Techniques to achieve query parallelism are:

1. Input/output parallelism: A relation is partitioned and stored on multiple disks to reduce retrieval time. Each partition is then processed in parallel, and the results are finally combined. Various strategies to partition a relation are:
(a) Hash partitioning: Every tuple of a relation is hashed on some partitioning attribute of the relation. If the hash function returns value i, the tuple is stored on disk i.

(b) Round robin partitioning: The ith tuple of the relation is stored on disk number i mod n, where n is the number of disks, so tuples are distributed evenly across all the disks.

(c) Range partitioning: Contiguous ranges of attribute values are assigned to each disk. For example, range partitioning with three disks numbered
