Parallel Database: Architecture for Parallel Databases, Parallel Query Evaluation, Parallelizing Individual Operations
1. Introduction
2. Architecture for Parallel Databases
3. Parallel Query Evaluation
4. Parallelizing Individual Operations
Introduction
What is a Centralized Database?
- All the data is maintained at a single site, and the processing of individual transactions is assumed to be essentially sequential.
PARALLEL DBMSs
WHY DO WE NEED THEM?
[Figure: a database of 10,000,000,000,000 bytes scanned at 10 MB/s]
Parallelism:
divide a big problem
into many smaller ones
to be solved in parallel.
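To make the motivation concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes that the 10 MB/s figure is the read rate of a single disk, that 1 MB is 10^6 bytes, and that the data can be spread evenly over several disks with no coordination cost. The function name `scan_seconds` and the disk count of 1,000 are illustrative, not from the slides.

```python
def scan_seconds(total_bytes: int, bytes_per_second: int, n_disks: int = 1) -> float:
    """Idealized time to read total_bytes spread evenly over n_disks,
    each read independently at bytes_per_second."""
    return total_bytes / (bytes_per_second * n_disks)

TOTAL = 10_000_000_000_000   # 10^13 bytes, the figure on the slide
RATE = 10 * 1_000_000        # 10 MB/s, assuming 1 MB = 10^6 bytes

one_disk = scan_seconds(TOTAL, RATE)
many_disks = scan_seconds(TOTAL, RATE, n_disks=1000)  # hypothetical disk count

print(f"1 disk:     {one_disk / 86_400:.1f} days")
print(f"1000 disks: {many_disks / 60:.1f} minutes")
```

Under these assumptions a single disk needs 1,000,000 seconds (about 11.6 days) for one full scan, while 1,000 disks read in parallel need about 17 minutes.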
Parallel DB
A parallel database system seeks to improve performance by parallelizing operations such as loading data, building indexes, and evaluating queries, using multiple CPUs and disks in parallel.
PARALLEL DBMSs
BENEFITS OF A PARALLEL DBMS
INTERQUERY PARALLELISM
It is possible to process a number of transactions in parallel with each other.
Improves Throughput.
INTRAQUERY PARALLELISM
It is possible to process ‘sub-tasks’ of a transaction in parallel with each other.
PARALLEL DBMSs
HOW TO MEASURE THE BENEFITS
Speed-Up
– Adding more resources results in proportionally less running time for a
fixed amount of data.
10 seconds to scan a DB of 10,000 records using 1 CPU
1 second to scan a DB of 10,000 records using 10 CPUs
Scale-Up
If resources are increased in proportion to an increase in data/problem
size, the overall time should remain constant
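The two measures can be written as simple ratios. The sketch below is one common way to express them; the function names and the scale-up timings are hypothetical, while the speed-up numbers are the ones from the scan example above.

```python
def speed_up(time_before: float, time_after: float) -> float:
    """Same data, more resources: how many times faster the run became."""
    return time_before / time_after

def scale_up(time_small: float, time_large: float) -> float:
    """Time for the original problem on the original system divided by the
    time for the proportionally larger problem on the proportionally larger
    system. A value of 1.0 means the time stayed constant (ideal scale-up)."""
    return time_small / time_large

# Scan example: 10,000 records take 10 s on 1 CPU and 1 s on 10 CPUs.
print(speed_up(10.0, 1.0))    # 10.0, i.e. linear speed-up for a 10x resource increase

# Hypothetical timings: doubling both data and CPUs raises the time from 10 s to 12.5 s.
print(scale_up(10.0, 12.5))   # 0.8, i.e. sub-linear scale-up
```

Speed-up is linear when the ratio equals the factor by which resources grew; scale-up is linear when the ratio stays at 1.0.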
Architectures for Parallel Databases
The basic idea behind a parallel DB is to carry out evaluation steps in parallel whenever possible.
1. Shared Memory
2. Shared Disk
3. Shared Nothing
Shared Memory
Advantages:
1. It is closer to a conventional machine, easy to program
2. OS services are leveraged to utilize the additional CPUs
3. Overhead is low
Disadvantages:
1. It leads to a bottleneck problem
2. Expensive to build
3. It is less sensitive to partitioning
Shared Disk
Advantages:
1. Almost the same as shared memory
Disadvantages:
1. More interference
2. Increases network bandwidth requirements
3. Shared disk is less sensitive to partitioning
Shared Nothing
Advantages:
1. It provides linear scale-up and linear speed-up
2. Shared nothing benefits from "good" partitioning
3. Cheap to build
Disadvantages:
1. Hard to program
2. Addition of new nodes requires reorganizing
PARALLEL DBMSs
SPEED-UP
[Figure: number of transactions/second plotted against number of CPUs. Labeled values: 2000/Sec, 1600/Sec, 1000/Sec; annotation: "Sub-linear speed-up".]
1. Parallel DB / D.S.Jagli, 09/01/20
PARALLEL DBMSs
SCALE-UP
[Figure: number of transactions/second plotted against number of CPUs and database size. Labeled points: 5 CPUs with a 1 GB database, 10 CPUs with a 2 GB database.]
PARALLEL QUERY EVALUATION
Different Types of DBMS Parallelism
Parallel evaluation of a relational query in a DBMS with a shared-nothing architecture:
1. Inter-query parallelism
Multiple queries run on different sites.
2. Intra-query parallelism
A single query is executed in parallel on different sites.
a) Intra-operator parallelism
All machines work together to compute a given operation (scan, sort, join).
b) Inter-operator parallelism
Each operator may run concurrently on a different site (exploits pipelining).
In order to evaluate different operators in parallel, we need to evaluate each operator in the query plan in parallel.
Data Partitioning
Types of Partitioning
1. Horizontal Partitioning: the tuples of a relation are divided among many disks such that each tuple resides on one disk.
It lets us exploit the I/O bandwidth of the disks by reading and writing them in parallel.
It reduces the time required to retrieve relations from disk by partitioning the relations across multiple disks.
a) Range Partitioning
b) Hash Partitioning
c) Round Robin Partitioning
2. Vertical Partitioning
1.Range Partitioning
Tuples are sorted (conceptually), and n ranges are chosen for the sort key values so that each range contains roughly the same number of tuples; tuples in range i are assigned to processor i.
E.g.:
sailor_id 1-10 assigned to disk 1
sailor_id 11-20 assigned to disk 2
sailor_id 21-30 assigned to disk 3
Range partitioning can lead to data skew; that is, partitions with widely varying numbers of tuples across disks.
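A minimal Python sketch of range partitioning, using the sailor_id ranges from the example with inclusive upper bounds 10, 20, and 30. The function name `range_partition` and the sample ids are hypothetical; the sample is deliberately lopsided to show how skew appears.

```python
from bisect import bisect_left
from collections import Counter

def range_partition(key, upper_bounds):
    """Return the 0-based index of the range that holds key.
    upper_bounds[i] is the inclusive upper bound of range i.
    Keys below the first range are not rejected in this sketch."""
    i = bisect_left(upper_bounds, key)
    if i == len(upper_bounds):
        raise ValueError(f"key {key} is above the last range")
    return i

UPPER_BOUNDS = [10, 20, 30]   # range 0 -> disk 1, range 1 -> disk 2, range 2 -> disk 3

# Hypothetical sailor ids, clustered at the low end of the key space.
sailor_ids = [1, 2, 3, 4, 5, 6, 7, 8, 25]
counts = Counter(range_partition(s, UPPER_BOUNDS) for s in sailor_ids)

for i in range(len(UPPER_BOUNDS)):
    print(f"disk {i + 1}: {counts[i]} tuples")
```

With this sample, eight of the nine tuples land on disk 1 and none on disk 2. This is the skew problem: the ranges only balance the load if they are chosen to match the actual key distribution.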
2.Hash Partitioning
A hash function is applied to selected fields of a tuple to determine its
processor.
Hash partitioning has the additional virtue that it keeps data evenly
distributed even if the data grows and shrinks over time.
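A minimal Python sketch of hash partitioning. The choice of CRC-32 as the hash function is an assumption made for illustration (any hash that spreads keys well would do), and the function name `hash_partition` is hypothetical.

```python
import zlib
from collections import Counter

def hash_partition(key, n_processors: int) -> int:
    """Map a partitioning-field value to a processor index in [0, n_processors).
    CRC-32 is used here only because it is deterministic across runs."""
    data = str(key).encode("utf-8")
    return zlib.crc32(data) % n_processors

# Distribute hypothetical sailor ids 1..30 over 3 processors.
counts = Counter(hash_partition(sid, 3) for sid in range(1, 31))
print(sorted(counts.items()))
```

Because the target processor depends only on the key's hash, an equality lookup on the partitioning field needs to visit just one processor, and newly inserted tuples are placed without any range boundaries having to be re-chosen.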
3.Round Robin Partitioning
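Tuples are assigned to processors in rotation: the i-th tuple goes to processor i mod n.
If only a subset of the tuples (e.g., those that satisfy the selection condition age = 20) is required, hash partitioning and range partitioning are better than round-robin partitioning, because they let us access only the disks that contain matching tuples.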
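A minimal Python sketch of round-robin partitioning, assuming the usual definition in which the i-th tuple goes to processor i mod n. The function name and the sample (sailor_id, age) tuples are hypothetical.

```python
def round_robin_partition(tuples, n_processors):
    """Assign the i-th tuple to partition i mod n_processors."""
    partitions = [[] for _ in range(n_processors)]
    for i, t in enumerate(tuples):
        partitions[i % n_processors].append(t)
    return partitions

# Hypothetical (sailor_id, age) tuples.
sailors = [(1, 20), (2, 35), (3, 20), (4, 41), (5, 20), (6, 28)]
parts = round_robin_partition(sailors, 3)

# Which partitions hold at least one tuple with age = 20?
holding_matches = [i for i, p in enumerate(parts)
                   if any(age == 20 for _, age in p)]
print([len(p) for p in parts])
print(holding_matches)
```

The partitions come out perfectly balanced, but placement ignores the tuple's values, so the matching tuples end up on every partition and the selection has to scan all of them.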
[Figure: Range, Hash, and Round Robin partitioning shown side by side, each over five partitions labeled A...E, F...J, K...N, O...S, T...Z]
Parallelizing Sequential Operator Evaluation Code
An elegant software architecture for parallel DBMSs enables us to readily parallelize existing code for sequentially evaluating a relational operator.
Techniques
1. Bulk loading & scanning
2. Sorting
3. Joins
1.Bulk Loading and scanning
scanning a relation: Pages can be read in parallel while scanning a
relation, and the retrieved tuples can then be merged, if the relation is
partitioned across several disks.
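A minimal Python sketch of a parallel scan. In-memory lists stand in for the per-disk partitions, and a thread pool stands in for the parallel readers; both are assumptions for illustration, as are the names `scan_partition` and `parallel_scan`.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import chain

def scan_partition(partition, predicate):
    """Sequential scan of one partition (one 'disk')."""
    return [t for t in partition if predicate(t)]

def parallel_scan(partitions, predicate=lambda t: True):
    """Scan every partition concurrently, then merge the retrieved tuples."""
    workers = max(1, len(partitions))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_partition = pool.map(scan_partition, partitions,
                                 [predicate] * len(partitions))
        return list(chain.from_iterable(per_partition))

# Hypothetical relation split across three disks.
disks = [[1, 2], [3], [4, 5]]
print(parallel_scan(disks))
print(parallel_scan(disks, predicate=lambda t: t % 2 == 0))
```

Note that `scan_partition` is ordinary sequential scan code; only the thin wrapper around it is parallel, which is the sense in which existing operator code can be reused.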
2.Parallel Sorting
Parallel sorting steps:
1. First redistribute all tuples in the relation using range partitioning.
2. Each processor then sorts the tuples assigned to it
3. The entire sorted relation can be retrieved by visiting the processors in
an order corresponding to the ranges assigned to them.
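The three steps above can be sketched in Python as follows. The split points, the thread pool standing in for the processors, and the name `parallel_sort` are assumptions made for illustration.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor

def parallel_sort(values, split_points):
    """Sort values using range partitioning.
    split_points are the inclusive upper bounds of all ranges but the last,
    so k - 1 split points produce k ranges."""
    # Step 1: redistribute all tuples by range.
    buckets = [[] for _ in range(len(split_points) + 1)]
    for v in values:
        buckets[bisect_left(split_points, v)].append(v)
    # Step 2: each "processor" sorts the tuples assigned to it.
    with ThreadPoolExecutor(max_workers=len(buckets)) as pool:
        sorted_buckets = list(pool.map(sorted, buckets))
    # Step 3: visit the processors in range order and concatenate.
    return [v for bucket in sorted_buckets for v in bucket]

print(parallel_sort([25, 3, 17, 10, 30, 1, 11], split_points=[10, 20]))
```

No final merge step is needed: every value in range i is smaller than every value in range i + 1, so concatenating the locally sorted ranges already yields the fully sorted relation.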
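3.Parallel Join
1. The basic idea is to decompose the join of A and B into a collection of k smaller joins.
2. By using the same partitioning function for both A and B, we ensure that the union of the k smaller joins computes the join of A and B.
Approaches:
Hash-Join
Sort-merge-join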
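The partitioned-join idea (the same partitioning function applied to both A and B, then the union of k smaller joins) can be sketched in Python as follows. Hash partitioning on the join attribute and a simple in-memory local join are assumptions chosen for brevity; the function names and sample relations are hypothetical.

```python
from collections import defaultdict

def partition(relation, key_index, k):
    """Split a relation into k partitions by hashing the join attribute."""
    parts = [[] for _ in range(k)]
    for t in relation:
        parts[hash(t[key_index]) % k].append(t)
    return parts

def local_join(a_part, b_part, a_key, b_key):
    """Join one pair of partitions: build an index on A, probe it with B."""
    index = defaultdict(list)
    for a in a_part:
        index[a[a_key]].append(a)
    return [a + b for b in b_part for a in index.get(b[b_key], [])]

def partitioned_join(A, B, a_key, b_key, k):
    """Union of the k smaller joins; each pair could run on its own processor."""
    a_parts = partition(A, a_key, k)
    b_parts = partition(B, b_key, k)   # same function as for A
    result = []
    for a_part, b_part in zip(a_parts, b_parts):
        result.extend(local_join(a_part, b_part, a_key, b_key))
    return result

A = [(1, "a"), (2, "b"), (3, "c")]
B = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
print(sorted(partitioned_join(A, B, 0, 0, k=3)))
```

Because both relations are partitioned by the same function, tuples with equal join values always land in partitions with the same index, so no matching pair is ever split across two different smaller joins.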
Sort-merge-join
Partition A and B by dividing the range of the join attribute into k disjoint subranges and placing A and B tuples into partitions according to the subrange to which their values belong.
The output of the join of A and B may be split into several data streams.
[Figure: Dataflow Network of Operators for Parallel Join]