Parallel and Distributed Databases in DBMS
Syllabus Content
• Parallel Database:
• Architecture, I/O Parallelism, Interquery, Intraquery
• Intraoperation and Interoperation Parallelism
• Distributed Databases
• Types of Distributed Database Systems
• Distributed Data Storage, Distributed Query Processing
How to measure Performance of Database
• Single Processor
Parallel Database
• A parallel DBMS is a database management system that runs on
multiple processors and disks.
• It combines two or more processors with multiple disks so that
operations execute faster.
• It is designed to execute operations concurrently.
• Architectural Models
• There are several architectural models for parallel databases:
• Shared memory architecture.
• Shared disk architecture.
• Shared nothing architecture.
• Shared Memory System
• Every processor can access and process data in the
shared memory modules through an interconnection
channel.
• This architecture is commonly known as symmetric
multiprocessing (SMP).
• Shared Disk System
• A Shared Disk System is a database management
system architecture in which every processor can
access all of the disks through an interconnection
network.
• Each processor also has its own local memory.
• Shared Nothing System
• A Shared Nothing System is a database management
system architecture in which every processor has
its own disk and its own memory.
• The processors communicate with one another
over an interconnection network.
• Each processor acts as a server for the data
stored on its own disk.
I/O parallelism in parallel database
• I/O parallelism refers to reducing the time required to retrieve relations from disk
by partitioning the relations on multiple disks.
•Partitioning Techniques
•The basic data-partitioning strategies assume that there are n disks,
D0, D1, ..., Dn−1, across which the data are to be partitioned.
• Round Robin Partitioning
• List Partitioning
• Hash Partitioning
• Range Partitioning
•Round-robin.
•This strategy scans the relation in any order and sends the i-th tuple to
disk D(i mod n).
•The round-robin scheme ensures an even distribution of tuples across
disks; that is, each disk has approximately the same number of tuples as the
others.
• i – record (tuple) number
• n – number of disks
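The round-robin rule can be sketched in a few lines of Python. This is a minimal illustration, not a storage-engine implementation: the function name `round_robin_partition`, the tuple labels, and the use of in-memory lists to stand in for disks are all assumptions made for the example.

```python
def round_robin_partition(tuples, n):
    """Send the i-th tuple (0-based scan order) to disk D(i mod n)."""
    disks = [[] for _ in range(n)]  # disks[k] stands in for disk Dk
    for i, row in enumerate(tuples):
        disks[i % n].append(row)
    return disks


# Hypothetical relation of 10 tuples spread over n = 3 disks.
rows = [f"t{i}" for i in range(10)]
disks = round_robin_partition(rows, 3)
for k, contents in enumerate(disks):
    print(f"D{k}: {contents}")
```

With 10 tuples and 3 disks, the disk sizes differ by at most one tuple, which is the even distribution the scheme is meant to provide.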
• Suppose the fee details are maintained by the accounts section. In this case, the designer
fragments the database by columns, as follows:
    CREATE TABLE STD_FEES AS
    SELECT Regd_No, Fees
    FROM STUDENT;
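To see what this statement produces, it can be run against an in-memory SQLite database from Python. The STUDENT schema and the sample rows below are hypothetical; only the table names and the columns Regd_No and Fees come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical STUDENT schema: Name and Course are assumed columns.
conn.execute(
    "CREATE TABLE STUDENT ("
    "Regd_No INTEGER PRIMARY KEY, Name TEXT, Course TEXT, Fees INTEGER)"
)
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?, ?)",
    [(1, "Asha", "CS", 5000), (2, "Ravi", "EE", 4500)],
)

# The fragmentation statement from the example: keep only two columns.
conn.execute("CREATE TABLE STD_FEES AS SELECT Regd_No, Fees FROM STUDENT")

# Inspect the fragment: its column names and its rows.
columns = [row[1] for row in conn.execute("PRAGMA table_info(STD_FEES)")]
fragment = conn.execute(
    "SELECT Regd_No, Fees FROM STD_FEES ORDER BY Regd_No"
).fetchall()
print(columns, fragment)
```

The fragment keeps Regd_No, so each row in STD_FEES can still be matched to its row in STUDENT.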
Distributed Data Storage
•Hybrid Fragmentation
•In hybrid fragmentation, a combination of horizontal and vertical
fragmentation techniques is used.
•Hybrid fragmentation can be done in two alternative ways:
•First generate a set of horizontal fragments; then generate vertical
fragments from one or more of the horizontal fragments.
•First generate a set of vertical fragments; then generate horizontal
fragments from one or more of the vertical fragments.
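The two orderings can be sketched with plain Python data. Everything here is illustrative: the sample rows, the Course column, and the helper names `horizontal_fragment` and `vertical_fragment` are assumptions, with horizontal fragmentation modelled as selecting rows and vertical fragmentation as selecting columns.

```python
# Hypothetical student relation, one dict per tuple.
students = [
    {"Regd_No": 1, "Name": "Asha", "Course": "CS", "Fees": 5000},
    {"Regd_No": 2, "Name": "Ravi", "Course": "EE", "Fees": 4500},
    {"Regd_No": 3, "Name": "Meena", "Course": "CS", "Fees": 5200},
]


def horizontal_fragment(rows, predicate):
    """Keep whole tuples that satisfy the predicate (a subset of rows)."""
    return [row for row in rows if predicate(row)]


def vertical_fragment(rows, columns):
    """Keep only the named columns of every tuple (a subset of columns)."""
    return [{c: row[c] for c in columns} for row in rows]


# Way 1: horizontal fragments first, then a vertical fragment of one of them.
cs_students = horizontal_fragment(students, lambda r: r["Course"] == "CS")
cs_fees = vertical_fragment(cs_students, ["Regd_No", "Fees"])

# Way 2: a vertical fragment first, then a horizontal fragment of it.
fees = vertical_fragment(students, ["Regd_No", "Course", "Fees"])
cs_fees_alt = horizontal_fragment(fees, lambda r: r["Course"] == "CS")

print(cs_fees)
print(cs_fees_alt)
```

In the second ordering the vertical fragment has to retain the Course column, because the later horizontal split filters on it.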
•Fragmentation Example
• Replication –
In this approach, an entire relation is stored redundantly at two or more sites. If the entire database is
available at all sites, it is a fully redundant database. In replication, therefore, the system maintains copies of data.
• This is advantageous because it increases the availability of data at different sites.
• However, it has disadvantages as well. The copies must be kept up to date: any change made at one site
must be applied at every site where that relation is stored, or the copies become inconsistent. This adds
considerable overhead. Concurrency control also becomes more complex, because concurrent access must now be
checked across several sites.
• Advantages of Data Replication
• Reliability − In case of failure of any site, the database system continues to work since a copy is available at another
site(s).
• Reduction in Network Load − Since local copies of data are available, query processing can be done with reduced
network usage, particularly during peak hours. Data updates can be applied during off-peak hours.
• Quicker Response − Availability of local copies of data ensures quick query processing and consequently quick
response time.
• Simpler Transactions − Transactions require fewer joins of tables located at different sites and minimal
coordination across the network. Thus, they become simpler in nature.
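The consistency concern described for replication can be shown with a toy model. This is a sketch under simplifying assumptions: the class `ReplicatedRelation`, its method names, and the site names are hypothetical, each site's copy is a Python dict, and there are no failures or concurrent writers.

```python
class ReplicatedRelation:
    """Toy model: one full copy of a key-value relation per site."""

    def __init__(self, sites):
        self.copies = {site: {} for site in sites}

    def write_all(self, key, value):
        # Apply the change at every site that stores the relation.
        for copy in self.copies.values():
            copy[key] = value

    def write_local(self, site, key, value):
        # Apply the change at one site only (no propagation).
        self.copies[site][key] = value

    def read(self, site, key):
        # Reads are served from the local copy, with no network access.
        return self.copies[site].get(key)

    def is_consistent(self):
        copies = list(self.copies.values())
        return all(copy == copies[0] for copy in copies)


rel = ReplicatedRelation(["site_A", "site_B"])

rel.write_all("fees:1", 5000)
consistent_after_write_all = rel.is_consistent()

rel.write_local("site_A", "fees:1", 5500)
consistent_after_local_write = rel.is_consistent()

print(consistent_after_write_all, consistent_after_local_write)
print(rel.read("site_A", "fees:1"), rel.read("site_B", "fees:1"))
```

Reads stay local and cheap, but a write that is not applied at every site leaves the two copies disagreeing, which is the overhead and inconsistency risk noted above.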
•Types of Data Replication In DBMS
•Transactional Replication
•Snapshot Replication
•Merge Replication
•Transactional Replication
•Transactional Replication first makes a complete copy of the database and then copies each new data change. In this type of
Data Replication, changes to the database are synced in real time and in the same order in which they occur. This guarantees
transactional consistency.
•Snapshot Replication
•Snapshot Replication is perhaps the simplest type of Data Replication: it copies “snapshots” of the database. It
replicates the state of the database as it is at a specific point in time and does not track later changes or updates to
the data. This kind of replication is helpful when changes to the database are infrequent.
•Merge Replication
•Merge Replication combines data from several databases into a single database. This type of Data Replication tracks
subsequent data changes and schema modifications made at publishers and subscribers and synchronizes them to the
database using merge agents. A key advantage of Merge Replication is that it allows publishers and subscribers to
modify the database independently.
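The difference between snapshot and transactional replication can be sketched as follows. This is an illustrative model only, not the mechanism of any particular database product: the database is a Python dict, and the change-log format and the function names `snapshot_replicate` and `apply_change` are assumptions. Merge replication is not modelled here.

```python
import copy


def snapshot_replicate(publisher):
    """Copy the publisher's state as it is at this moment."""
    return copy.deepcopy(publisher)


def apply_change(db, change):
    """Apply one logged change; the (op, key, value) format is hypothetical."""
    op, key, value = change
    if op == "put":
        db[key] = value
    elif op == "delete":
        db.pop(key, None)


publisher = {"a": 1, "b": 2}

# Snapshot replication: a point-in-time copy, nothing tracked afterwards.
subscriber_snapshot = snapshot_replicate(publisher)

# Transactional replication: start from a full copy, then replay every
# change in the same order in which it occurred at the publisher.
subscriber_txn = snapshot_replicate(publisher)
change_log = [("put", "a", 10), ("delete", "b", None), ("put", "c", 3)]
for change in change_log:
    apply_change(publisher, change)
    apply_change(subscriber_txn, change)

print(publisher, subscriber_txn, subscriber_snapshot)
```

After the changes, the transactional subscriber matches the publisher, while the snapshot subscriber still holds the earlier state until a new snapshot is taken.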