Parallel and Distributed Databases in DBMS

The document covers the concepts of parallel and distributed databases, detailing architectures, performance measurement, and various types of parallelism such as interquery and intraquery. It also discusses distributed databases, their types (homogeneous and heterogeneous), and strategies for distributed data storage including fragmentation and replication. Key advantages and disadvantages of these systems are highlighted, emphasizing their impact on performance and reliability.

Uploaded by

spectar

Parallel and Distributed Databases

Syllabus Content
• Parallel Database:
• Architecture, I/O Parallelism, Interquery, Intraquery
• Intraoperation and Interoperation Parallelism
• Distributed Databases
• Types of Distributed Database Systems,
• Distributed Data Storage, Distributed Query Processing
How to measure Performance of Database
• Single Processor
Parallel Database
• A parallel DBMS is a database management system that runs on
multiple processors and disks.
• It combines two or more processors and disks so that operations
execute faster than they would on a single processor.
• It is designed to execute operations concurrently.
• Architectural Models
• There are several architectural models for parallel databases:
• Shared memory architecture.
• Shared disk architecture.
• Shared nothing architecture.
Parallel Database
• Shared Memory System
• Every processor can access a common main memory, made up of
one or more memory modules, through an
interconnection channel.
• This architecture is also commonly known as SMP
or Symmetric Multiprocessing.
• Shared Disk System
• A Shared Disk System is a DBMS architecture in which
every processor can access all the disks
through an interconnection network.
• Each processor has its own private (local) memory.
Parallel Database
• Shared Nothing System
• A Shared Nothing System is a DBMS architecture in which
every processor has its own memory and its own disk.
• The processors communicate with one another
over an interconnection network.
• Each processor acts as a server for the data
stored on its own disk.
I/O parallelism in parallel database
• I/O parallelism refers to reducing the time required to retrieve relations from disk
by partitioning the relations across multiple disks.
• The most common form of data partitioning in a parallel database environment is
horizontal partitioning.
• In horizontal partitioning, the tuples of a relation are divided (or declustered)
among many disks, so that each tuple resides on one disk.
• Partitioning Techniques
• Three basic data-partitioning strategies are described below. Assume that there are n disks,
D_0, D_1, ..., D_{n-1}, across which the data are to be partitioned.
I/O parallelism in parallel database
• Round Robin Partitioning
• List Partitioning
• Hash Partitioning
• Range Partitioning
• Round-robin.
• This strategy scans the relation in any order and sends the i-th tuple to disk
D_{i mod n}.
• The round-robin scheme ensures an even distribution of tuples across
disks; that is, each disk has approximately the same number of tuples as the
others.
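• Round-robin example: i is the record number, n is the number of disks, and
i mod n gives the disk on which record i is placed.
• With n = 3 disks (Disk0, Disk1, Disk2):
• 1 mod 3 = 1, so record 1 goes to Disk1
• 2 mod 3 = 2, so record 2 goes to Disk2
• 3 mod 3 = 0, so record 3 goes to Disk0
• 4 mod 3 = 1, so record 4 goes to Disk1
• 5 mod 3 = 2, so record 5 goes to Disk2
• 6 mod 3 = 0, so record 6 goes to Disk0
• 7 mod 3 = 1, so record 7 goes to Disk1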
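Disadvantages
• Round-robin is well suited only to full table scans.
• It is not suitable for point queries or range queries, because a matching tuple
may be on any disk and all n disks must be searched. For example:
• SELECT * FROM employee WHERE name = 'sam';
• SELECT * FROM employee WHERE id BETWEEN 3 AND 5;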
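The round-robin rule and its main drawback can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the disks are modelled as in-memory lists, and the function names and the sample employee records are hypothetical, not taken from the slides.

```python
def round_robin_partition(records, n_disks):
    """Send the i-th record (counting from 1) to disk number i mod n_disks."""
    disks = [[] for _ in range(n_disks)]
    for i, record in enumerate(records, start=1):
        disks[i % n_disks].append(record)
    return disks


def point_query(disks, predicate):
    """Round-robin placement says nothing about where a value lives,
    so a point query has to look at every disk."""
    disks_scanned = 0
    matches = []
    for disk in disks:
        disks_scanned += 1
        matches.extend(r for r in disk if predicate(r))
    return matches, disks_scanned


# Hypothetical sample data: seven employee records with ids 1..7.
names = ["ann", "bob", "sam", "eve", "joe", "kim", "lee"]
employees = [{"id": i, "name": name} for i, name in enumerate(names, start=1)]

disks = round_robin_partition(employees, 3)
sizes = [len(d) for d in disks]
matches, scanned = point_query(disks, lambda r: r["name"] == "sam")
```

With seven records and three disks the partitions differ in size by at most one record, yet the lookup for 'sam' still has to touch all three disks.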
I/O parallelism in parallel database
• Hash partitioning.
• This declustering strategy designates one or more attributes from the given relation's schema as
the partitioning attributes.
• A hash function is chosen whose range is {0, 1, ..., n-1}. Each tuple of the original relation is
hashed on the partitioning attributes. If the hash function returns i, then the tuple is placed on
disk D_i.
• Range partitioning.
• This strategy distributes contiguous attribute-value ranges to each disk. It chooses a partitioning
attribute, A, and a partitioning vector.
• The relation is partitioned as follows. Let [v_0, v_1, ..., v_{n-2}] denote the partitioning vector, such
that, if i < j, then v_i < v_j. Consider a tuple t such that t[A] = x. If x < v_0, then t goes on disk D_0. If x ≥
v_{n-2}, then t goes on disk D_{n-1}. If v_i ≤ x < v_{i+1}, then t goes on disk D_{i+1}.
• For example, range partitioning with three disks numbered 0, 1, and 2 and the partitioning vector [5, 40]
assigns tuples with values less than 5 to disk 0, values from 5 up to (but not including) 40 to disk 1,
and values of 40 or more to disk 2.
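The two placement rules above can be sketched in Python as follows. The sketch makes several assumptions that are not in the slides: `zlib.crc32` stands in for the hash function (any function with range {0, ..., n-1} would do), the partitioning vector is [5, 40], and all function names are hypothetical.

```python
import bisect
import zlib


def hash_disk_for(value, n_disks):
    """Disk index for a partitioning-attribute value under hash partitioning.
    zlib.crc32 is only a stand-in hash, chosen because it is stable across runs."""
    return zlib.crc32(str(value).encode()) % n_disks


def hash_partition(records, key, n_disks):
    disks = [[] for _ in range(n_disks)]
    for record in records:
        disks[hash_disk_for(record[key], n_disks)].append(record)
    return disks


def range_disk_for(x, vector):
    """Disk index under range partitioning with a sorted vector [v_0, ..., v_{n-2}]:
    x < v_0 gives 0, v_i <= x < v_{i+1} gives i + 1, x >= v_{n-2} gives n - 1.
    bisect_right counts the vector entries that are <= x, which is that index."""
    return bisect.bisect_right(vector, x)


def disks_for_range_query(low, high, vector):
    """Disks that can hold values in [low, high]; the others can be skipped."""
    return list(range(range_disk_for(low, vector), range_disk_for(high, vector) + 1))


records = [{"id": i} for i in range(1, 21)]
hash_disks = hash_partition(records, "id", 3)

vector = [5, 40]
placements = [range_disk_for(x, vector) for x in (4, 5, 39, 40, 100)]
touched = disks_for_range_query(10, 20, vector)
```

Unlike round-robin, both schemes let a point query on the partitioning attribute go to a single disk, and range partitioning also lets a range query skip the disks whose ranges do not overlap the requested interval.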
Interquery and Intraquery parallelism
• Interquery Parallelism
• In interquery parallelism, different queries or transactions execute in parallel with
one another.
• This form of parallelism can increase transaction throughput. The response
times of individual transactions are no faster than they would be if the
transactions were run in isolation.
• Thus, the primary use of interquery parallelism is to scale up a transaction
processing system to support a larger number of transactions per
second.
• Interquery parallelism is the easiest form of parallelism to support in a database
system, particularly in a shared-memory parallel system.
• Database systems designed for single-processor systems can be used with few or
no changes on a shared-memory parallel architecture, since even sequential
database systems support concurrent processing.
Intraquery parallelism
• Intraquery parallelism is the execution of a single query in parallel on
multiple processors and disks.
• Using intraquery parallelism is essential for speeding up long-running queries.
• It decomposes a serial SQL query into lower-level operations such as scan,
join, sort, and aggregation, which can then be executed in parallel.
• To illustrate the parallel evaluation of a query, consider a query that requires a
relation to be sorted. Suppose that the relation has been partitioned across
multiple disks by range partitioning on some attribute, and the sort is
requested on the partitioning attribute. The sort operation can be
implemented by sorting each partition in parallel, then concatenating the
sorted partitions to get the final sorted relation.
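The range-partitioned sort described above can be sketched as follows. This is a minimal illustration rather than how any particular DBMS implements it: worker threads stand in for separate processors (in CPython they will not give a real CPU speed-up), and the partitioning vector [5, 40] and the function names are assumptions.

```python
import bisect
from concurrent.futures import ThreadPoolExecutor


def range_partition(values, vector):
    """Place each value in the partition chosen by the sorted partitioning vector."""
    partitions = [[] for _ in range(len(vector) + 1)]
    for v in values:
        partitions[bisect.bisect_right(vector, v)].append(v)
    return partitions


def parallel_range_sort(partitions):
    """Sort every partition independently, then concatenate in partition order.
    No merge step is needed: every value in partition k is smaller than
    every value in partition k + 1."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        sorted_parts = list(pool.map(sorted, partitions))  # map preserves order
    result = []
    for part in sorted_parts:
        result.extend(part)
    return result


values = [42, 7, 19, 3, 88, 5, 40, 61, 1, 23]
partitions = range_partition(values, [5, 40])
result = parallel_range_sort(partitions)
```

The concatenation is correct only because the sort key is the partitioning attribute; with hash or round-robin partitioning the sorted partitions would have to be merged instead.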
Intraoperation and Interoperation Parallelism
• We may be able to pipeline the output of one operation to another operation.
The two operations can then be executed in parallel on separate processors,
one generating output that is consumed by the other, even as it is generated.
• In summary, the execution of a single query can be parallelized in two ways:
• Intraoperation parallelism.
• We can speed up processing of a query by parallelizing the execution of each
individual operation, such as sort, select, project, and join.
• Interoperation parallelism.
• We can speed up processing of a query by executing in parallel the different
operations in a query expression.
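Pipelining, one form of interoperation parallelism, can be sketched with two threads connected by a queue. This is an illustration only: the threads stand in for processors, and the operator names `scan` and `select`, the queue size, and the sample rows are assumptions rather than anything specified in the slides.

```python
import queue
import threading

DONE = object()  # sentinel that marks the end of the tuple stream


def scan(rows, out_q):
    """Producer operation: emit tuples one at a time."""
    for row in rows:
        out_q.put(row)
    out_q.put(DONE)


def select(in_q, out_q, predicate):
    """Consumer operation: filter tuples as they arrive, before the scan finishes."""
    while True:
        row = in_q.get()
        if row is DONE:
            out_q.put(DONE)
            return
        if predicate(row):
            out_q.put(row)


def run_pipeline(rows, predicate):
    scan_to_select = queue.Queue(maxsize=2)  # small buffer forces the operators to overlap
    results = queue.Queue()                  # unbounded, so select never blocks on output
    producer = threading.Thread(target=scan, args=(rows, scan_to_select))
    consumer = threading.Thread(target=select, args=(scan_to_select, results, predicate))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    out = []
    while True:
        row = results.get()
        if row is DONE:
            return out
        out.append(row)


even_rows = run_pipeline(range(10), lambda r: r % 2 == 0)
```

Intraoperation parallelism would instead run several copies of the same operation, for example one `select` per partition, each on its own share of the data.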
Distributed Databases
• What is a distributed database?
• A distributed database system is one in which the
data belonging to a single logical database is
distributed across two or more physical databases
to improve reliability and availability.
• In a distributed database, not all storage devices
are attached to a common processor. Data may be
stored at multiple sites that are separate from
each other.
• In a distributed database, the data is spread or
replicated among several databases which are
physically separate from each other. These
databases are connected through a network so
that they appear as a single database to the
user.
Types of Distributed Databases
• Distributed databases can be broadly
classified into homogeneous and
heterogeneous distributed database
environments
• Homogeneous Distributed Databases
• In a homogeneous distributed database, all
the sites use identical DBMS and operating
systems. Its properties are:
• The sites use very similar software.
• The sites use identical DBMS or DBMS from
the same vendor.
• Each site is aware of all other sites and
cooperates with other sites to process user
requests.
• The database is accessed through a single
interface as if it is a single database.
Types of Distributed Databases
• There are two types of homogeneous
distributed databases:
1. Autonomous: Each database is
independent and functions on its
own. The databases are integrated by a
controlling application and use
message passing to share data
updates.
2. Non-autonomous: Data is distributed
across the homogeneous nodes and a
central or master DBMS coordinates
data updates across the sites.
Types of Distributed Databases
• Heterogeneous Distributed Databases
• In a heterogeneous distributed database,
different sites have different operating systems,
DBMS products and data models. Its properties
are:
• Different sites use dissimilar schemas and
software.
• The system may be composed of a variety of
DBMSs like relational, network, hierarchical or
object oriented.
• Query processing is complex due to dissimilar
schemas.
• Transaction processing is complex due to
dissimilar software.
• A site may not be aware of other sites and so
there is limited co-operation in processing user
requests.
Types of Distributed Databases
• Types of Heterogeneous Distributed
Databases
1. Federated: The heterogeneous
database systems are independent in
nature and are integrated so
that they function as a single
database system.
2. Un-federated: The database systems
employ a central coordinating module
through which the databases are
accessed.
Distributed Data Storage

• Distributed data storage is the deliberate distribution of pieces of data
(called data fragments) across sites to improve database performance and data
availability for end users.
• It aims to reduce the overall cost of transaction processing while still
providing accurate data quickly in a DDBMS.
• Distributed data storage is one of the key steps in building a
distributed database system.
• There are two common strategies used in data allocation: data
fragmentation and data replication.
Distributed Data Storage
• Fragmentation:
In this approach, the relations are fragmented (i.e., divided into smaller parts) and
each fragment is stored at the site where it is required.
• Fragmentation is the process of dividing relations or tables into several partitions stored at
multiple sites. It divides a database into subtables and subrelations so that data can
be distributed and stored efficiently. Fragmentation of relations can be done in the following ways:
• Horizontal fragmentation (splitting by rows): The relation is fragmented into groups of
tuples so that each tuple is assigned to at least one fragment.
• For example, in the student schema, if the details of all students of the Computer Science
course need to be maintained at the School of Computer Science, then the designer will
horizontally fragment the database as follows:
• CREATE TABLE COMP_STD AS
• SELECT * FROM STUDENT
• WHERE COURSE = 'Computer Science';
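The horizontal fragment above can be tried out with Python's built-in sqlite3 module. This is a sketch under stated assumptions: a single in-memory database stands in for the separate sites, STUDENT is cut down to three columns, the sample rows are made up, and the second fragment OTHER_STD is a hypothetical name introduced so that the original relation can be reconstructed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (Regd_No INTEGER PRIMARY KEY, Name TEXT, Course TEXT);
INSERT INTO STUDENT VALUES
  (1, 'Asha', 'Computer Science'),
  (2, 'Ben',  'Physics'),
  (3, 'Chen', 'Computer Science'),
  (4, 'Dev',  'Mathematics');

-- Fragment kept at the School of Computer Science.
CREATE TABLE COMP_STD AS
  SELECT * FROM STUDENT WHERE Course = 'Computer Science';

-- Remaining rows (this sketch assumes Course is never NULL).
CREATE TABLE OTHER_STD AS
  SELECT * FROM STUDENT WHERE Course <> 'Computer Science';
""")

comp_rows = conn.execute(
    "SELECT Regd_No FROM COMP_STD ORDER BY Regd_No").fetchall()

# A horizontally fragmented relation is reconstructed with a union of its fragments.
rebuilt = conn.execute(
    "SELECT * FROM COMP_STD UNION ALL SELECT * FROM OTHER_STD ORDER BY Regd_No"
).fetchall()
original = conn.execute("SELECT * FROM STUDENT ORDER BY Regd_No").fetchall()
```

Because the two selection predicates are disjoint and together cover every row, the union of the fragments returns exactly the rows of STUDENT.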
Distributed Data Storage
• Vertical fragmentation (splitting by columns):
• The schema of the relation is divided into smaller schemas. Each fragment must contain a
common candidate key so as to ensure a lossless join.
• In certain cases, a hybrid of fragmentation and replication is used.
• For example, consider a University database that keeps records of all registered
students in a STUDENT table with the following schema:
• STUDENT(Regd_No, Name, Course, Address, Semester, Fees, Marks)
• Now, the fees details are maintained in the accounts section. In this case, the designer will
fragment the database as follows:
• CREATE TABLE STD_FEES AS
• SELECT Regd_No, Fees
• FROM STUDENT;
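The lossless-join requirement can be checked with a small sqlite3 sketch. The assumptions here are that one in-memory database stands in for the separate sites, that the sample rows are made up, and that STD_INFO is a hypothetical name for the fragment holding the remaining columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (
  Regd_No INTEGER PRIMARY KEY, Name TEXT, Course TEXT,
  Address TEXT, Semester INTEGER, Fees REAL, Marks INTEGER);
INSERT INTO STUDENT VALUES
  (1, 'Asha', 'Computer Science', 'Pune',  3, 1200.0, 81),
  (2, 'Ben',  'Physics',          'Delhi', 1,  900.0, 74);

-- Fragment for the accounts section, as in the slide.
CREATE TABLE STD_FEES AS
  SELECT Regd_No, Fees FROM STUDENT;

-- Remaining columns; the key Regd_No is repeated in both fragments.
CREATE TABLE STD_INFO AS
  SELECT Regd_No, Name, Course, Address, Semester, Marks FROM STUDENT;
""")

fee_columns = [d[0] for d in conn.execute("SELECT * FROM STD_FEES").description]

# Joining the fragments on the shared key rebuilds the original relation.
rebuilt = conn.execute("""
  SELECT i.Regd_No, i.Name, i.Course, i.Address, i.Semester, f.Fees, i.Marks
  FROM STD_INFO AS i JOIN STD_FEES AS f ON i.Regd_No = f.Regd_No
  ORDER BY i.Regd_No
""").fetchall()
original = conn.execute("SELECT * FROM STUDENT ORDER BY Regd_No").fetchall()
```

If Regd_No were left out of either fragment there would be no way to match a fee to its student, which is why every vertical fragment carries the key.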
Distributed Data Storage
•Hybrid Fragmentation
• In hybrid fragmentation, a combination of horizontal and vertical
fragmentation techniques is used.
• Hybrid fragmentation can be done in two alternative ways:
• First generate a set of horizontal fragments; then generate vertical
fragments from one or more of the horizontal fragments.
• First generate a set of vertical fragments; then generate horizontal
fragments from one or more of the vertical fragments.
Distributed Data Storage
• Replication:
In this approach, the entire relation is stored redundantly at two or more sites. If the entire database is
available at all sites, it is a fully redundant database. Hence, in replication, systems maintain copies of data.
• This is advantageous as it increases the availability of data at different sites.
• However, it has certain disadvantages as well. Data needs to be kept up to date. Any change made at one site
needs to be recorded at every site where that relation is stored, or else it may lead to inconsistency. This is a lot of
overhead. Also, concurrency control becomes considerably more complex, as concurrent access now needs to be checked
over a number of sites.
• Advantages of Data Replication
• Reliability: If any site fails, the database system continues to work since a copy is available at another
site.
• Reduction in Network Load: Since local copies of data are available, query processing can be done with reduced
network usage, particularly during peak hours. Data updating can be done at off-peak hours.
• Quicker Response: Availability of local copies of data ensures quick query processing and consequently quick
response time.
• Simpler Transactions: Transactions require fewer joins of tables located at different sites and minimal
coordination across the network. Thus, they become simpler in nature.
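The trade-off between availability and update overhead can be made concrete with a toy model. This is a sketch under stated assumptions: it uses a simple read-one, write-all policy (one of several possible policies, and not one the slides prescribe), sites are in-memory dictionaries, and the class and attribute names are hypothetical.

```python
class ReplicatedRelation:
    """Toy read-one / write-all replication over in-memory sites."""

    def __init__(self, site_names):
        self.sites = {name: {} for name in site_names}
        self.up = {name: True for name in site_names}
        self.update_messages = 0  # counts per-site update operations

    def write(self, key, value):
        # Every copy must receive the change, otherwise the replicas diverge.
        down = [name for name in self.sites if not self.up[name]]
        if down:
            raise RuntimeError(f"cannot update all copies, sites down: {down}")
        for copy in self.sites.values():
            copy[key] = value
            self.update_messages += 1

    def read(self, key, local_site):
        # Prefer the local copy; fall back to any other live site.
        candidates = [local_site] + [n for n in self.sites if n != local_site]
        for name in candidates:
            if self.up[name]:
                return self.sites[name][key], name
        raise RuntimeError("no live replica holds the relation")


rel = ReplicatedRelation(["A", "B", "C"])
rel.write("acct-1", 100)       # one logical write, three physical updates
rel.up["A"] = False            # site A fails
value, served_by = rel.read("acct-1", local_site="A")
```

Reads survive the failure of site A because another copy exists, while the single logical write cost three updates, and under this strict policy a further write would be refused until site A is back.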
Distributed Data Storage
•Types of Data Replication In DBMS
•Transactional Replication
•Snapshot Replication
•Merge Replication

• Transactional Replication
• Transactional replication first makes a complete copy of the database and then copies each new data change. In this type of
data replication, changes to the database are propagated in near real time and in the same order as they occur. This preserves
transactional consistency.
• Snapshot Replication
• Snapshot replication is perhaps the simplest type of data replication. It copies a "snapshot" of the database, that is,
the state of the database exactly as it is at a specific point in time, without tracking subsequent changes or updates to
the data. This kind of replication is helpful when changes to the database are infrequent.
• Merge Replication
• Merge replication combines data from several databases into a single database. This type of data replication tracks
subsequent data changes and schema modifications made at publishers and subscribers and synchronizes them to the
database using merge agents. A great advantage of merge replication is that it allows publishers and subscribers to
modify the database independently.
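The difference between snapshot and transactional replication can be sketched with a toy publisher that keeps an ordered change log. Everything here is an assumption made for illustration: the class names are hypothetical, the data is a plain dictionary, and real replication products work at the level of transactions and agents rather than single key updates.

```python
import copy


class Publisher:
    def __init__(self):
        self.data = {}
        self.log = []  # ordered record of every change

    def set(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class SnapshotSubscriber:
    """Holds the publisher's state as of the last refresh, nothing newer."""

    def __init__(self):
        self.data = {}

    def refresh(self, publisher):
        self.data = copy.deepcopy(publisher.data)


class TransactionalSubscriber:
    """Starts from a full copy, then replays later changes in log order."""

    def __init__(self, publisher):
        self.data = copy.deepcopy(publisher.data)
        self.applied = len(publisher.log)

    def sync(self, publisher):
        for key, value in publisher.log[self.applied:]:
            self.data[key] = value
        self.applied = len(publisher.log)


pub = Publisher()
pub.set("x", 1)

snap = SnapshotSubscriber()
snap.refresh(pub)                   # point-in-time copy
txn = TransactionalSubscriber(pub)  # initial full copy

pub.set("x", 2)
pub.set("y", 3)
txn.sync(pub)                       # replay the two later changes in order
```

After the two later updates the snapshot copy still shows the old state until it is refreshed again, while the transactional copy matches the publisher.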
