2 Parallel Databases
Parallel Databases
• Architecture
• Data Partitioning Strategy
• Interquery and Intraquery Parallelism
• Parallel Query Optimization
Measuring Performance of a Database
Throughput
• The number of tasks that can be completed in
a given time interval
Response Time
• The amount of time it takes to complete a
single task from the time it is submitted
Single Processor, Single Disk Systems
• Throughput and response time were not satisfactory
[Figure: a single processor connected to a single disk that holds the data]
Problems with Single Processor Systems
• The growth of the internet led to millions of users
accessing websites and to increased data collection
from users.
• This data, running into terabytes, is used for
data analytics and decision support.
• Single processor systems cannot efficiently
handle decision support queries on such large data.
• Single processor systems cannot efficiently
handle a large number of concurrent transactions.
PARALLEL DATABASES
Parallel Databases
• Parallel systems improve processing and I/O
speeds by using multiple processors and disks
in parallel.
Types of Parallel Machines
Coarse Grain Parallel Machine
• A coarse-grain parallel machine consists of a small
number of powerful processors
• All high-end machines today offer some degree of
coarse-grain parallelism: at least two or four processors
Fine Grain Parallel Machines
• A fine-grain parallel machine uses thousands of
smaller processors.
• Such machines support a much larger degree of parallelism
Measuring Performance of Parallel
Processing Systems
• Speedup
• Scaleup
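Speedup
• Speedup = TS / TL
• TS: time taken by the original system (one processor, one disk) to complete Task1
• TL: time taken by the parallel system (several processors and disks) to complete the same Task1
[Figure: Task1 run on a single processor-disk pair, and the same Task1 run on three processor-disk pairs working in parallel, each producing the output]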
Example - Speedup
• If the original system took 60 seconds to
perform the task and the parallel system with
3 parallel processors took 20 seconds to
complete the task then
Speedup = 60/20 = 3
• Ideally, speedup grows linearly with the number of
parallel processors; in practice it is often sublinear
because of startup costs, interference, and skew.
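The arithmetic in the example can be checked with a few lines of Python. This is only a sketch: the function name `speedup` and its parameter names are chosen here for illustration and do not come from the slides.

```python
def speedup(time_small: float, time_large: float) -> float:
    """Speedup = TS / TL for the same task on the original and the parallel system."""
    if time_small <= 0 or time_large <= 0:
        raise ValueError("both times must be positive")
    return time_small / time_large


# Figures from the example: 60 seconds originally, 20 seconds on 3 processors.
n_processors = 3
s = speedup(60, 20)
print(s)                   # 3.0
print(s == n_processors)   # True: the speedup is linear in this example
```

A speedup equal to the number of processors is the ideal case; a smaller value means part of the added hardware is lost to overhead.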
Speedup Curve
Scaleup - Example
• Scaleup is the factor that expresses how much
more work can be done in the same time
period by a system n times larger.
• If the original system can process 100
transactions in a given amount of time, and
the parallel system can process 300
transactions in this amount of time, then the
value of scaleup would be equal to 3. That is,
300/100 = 3
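• Scaleup = VolumeL / VolumeS, measured over a constant time
• VolumeS = 100 (work done by the original system)
• VolumeL = 300 (work done by the parallel system)
[Figure: a single processor-disk pair and a three processor-disk pair system, each running for the same constant time and producing its output]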
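The scaleup figure from the example can be reproduced the same way. This is a minimal sketch; the names `scaleup` and `is_linear` are illustrative and not taken from the slides.

```python
def scaleup(volume_small: float, volume_large: float) -> float:
    """Scaleup = VolumeL / VolumeS, both measured over the same time interval."""
    if volume_small <= 0:
        raise ValueError("volume_small must be positive")
    return volume_large / volume_small


def is_linear(factor: float, n_times_larger: int) -> bool:
    """Scaleup is linear when a system n times larger does n times the work."""
    return factor == n_times_larger


# Figures from the example: 100 transactions originally, 300 on the parallel system.
factor = scaleup(100, 300)
print(factor)                 # 3.0
print(is_linear(factor, 3))   # True
```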
Scaleup Curve
Parallel Database Architecture
• Shared Memory
• Shared Disk
• Shared Nothing
Shared Memory Architecture
[Figure: architecture diagram; the components communicate through an interconnection network]
Advantages/Disadvantages of Shared Nothing Architectures
• The communication network is used only for non-local
disk access.
• Each node functions as a server for the data on the
disks it owns.
• The drawback is that non-local disk access is costly.
How Parallelism is achieved
• I/O Parallelism: List, Hash, and Range Partitioning
• Interquery Parallelism
• Intraquery Parallelism: Interoperation Parallelism
I/O Parallelism
• I/O parallelism refers to reducing the time
required to retrieve relations from disk by
partitioning the relations on multiple disks.
Partitioning Example
Id Name Branch
1 Sam Chennai
2 Ram Vellore
3 Tom Mumbai
4 Chris Mumbai
5 Jeff Vellore
6 Mohan Vellore
7 Rahul Chennai
Horizontal Partitioning
Id Name Branch
1 Sam Chennai
2 Ram Vellore
3 Tom Mumbai
4 Chris Mumbai
5 Jeff Vellore
6 Mohan Vellore
7 Rahul Chennai
[Figure: the rows of the table are divided among DISK 1, DISK 2, and DISK 3; every disk keeps the same columns Id, Name, Branch]
Basic Partitioning Strategies
• Round Robin Partitioning
• List Partitioning
• Hash Partitioning
• Range Partitioning
Round Robin Partitioning
[Figure: the rows of the table (Id, Name, Branch) are dealt out in turn to DISK 0, DISK 1, and DISK 2; for example, rows 1 (Sam), 4 (Chris), and 7 (Rahul) land on the same disk]
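A small Python sketch can make the round robin placement concrete. It assumes the common rule that the i-th row, counting from 1, goes to disk i mod n; the slides do not state the rule, and the function name `round_robin_partition` is illustrative.

```python
# Rows of the example employee table: (id, name, branch).
EMPLOYEES = [
    (1, "Sam", "Chennai"),
    (2, "Ram", "Vellore"),
    (3, "Tom", "Mumbai"),
    (4, "Chris", "Mumbai"),
    (5, "Jeff", "Vellore"),
    (6, "Mohan", "Vellore"),
    (7, "Rahul", "Chennai"),
]


def round_robin_partition(rows, n_disks):
    """Send the i-th row (counting from 1) to disk i mod n_disks (assumed rule)."""
    disks = [[] for _ in range(n_disks)]
    for i, row in enumerate(rows, start=1):
        disks[i % n_disks].append(row)
    return disks


disks = round_robin_partition(EMPLOYEES, 3)
for disk_no, rows in enumerate(disks):
    print(f"DISK {disk_no}:", [row[0] for row in rows])
# DISK 0: [3, 6]
# DISK 1: [1, 4, 7]
# DISK 2: [2, 5]
```

The rows are spread almost evenly, but nothing about a row's values tells you which disk holds it, so a lookup by name or id has to visit every disk.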
Disadvantages
• Only suitable for full table scans
• Not suitable for point queries or range
queries, because every disk must be searched.
• Point query: SELECT * FROM employee WHERE name = 'sam';
• Range query: SELECT * FROM employee WHERE id BETWEEN 3 AND 5;
List Partitioning
• List partitioning enables you to explicitly
control how rows map to partitions by
specifying a list of discrete values for the
partitioning key in the description for each
partition.
• For a table with a Branch column as the
partitioning key, the Tamilnadu partition might
contain the values Chennai and Vellore, while the
Maharashtra partition might contain Mumbai.
List Partitioning
Partition Key: Branch
Id Name Branch
1 Sam Chennai
2 Ram Vellore
3 Tom Mumbai
4 Chris Mumbai
5 Jeff Vellore
6 Mohan Vellore
7 Rahul Chennai
[Figure: each row is placed on the disk of the partition whose value list contains its Branch]
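List partitioning can be sketched in Python as a lookup from key value to partition name. The partition names and value lists follow the Tamilnadu/Maharashtra example; the function `list_partition` and the choice to raise an error for unlisted values are assumptions made for this illustration.

```python
# Rows of the example employee table: (id, name, branch).
EMPLOYEES = [
    (1, "Sam", "Chennai"),
    (2, "Ram", "Vellore"),
    (3, "Tom", "Mumbai"),
    (4, "Chris", "Mumbai"),
    (5, "Jeff", "Vellore"),
    (6, "Mohan", "Vellore"),
    (7, "Rahul", "Chennai"),
]

# Each partition explicitly lists the key values it accepts.
PARTITION_VALUES = {
    "Tamilnadu": {"Chennai", "Vellore"},
    "Maharashtra": {"Mumbai", "Pune"},
}


def list_partition(rows, partition_values, key_index=2):
    """Place each row in the partition whose value list contains its key."""
    lookup = {value: name
              for name, values in partition_values.items()
              for value in values}
    partitions = {name: [] for name in partition_values}
    for row in rows:
        key = row[key_index]
        if key not in lookup:
            # Assumed behaviour for this sketch: reject rows with unlisted keys.
            raise KeyError(f"no partition accepts branch {key!r}")
        partitions[lookup[key]].append(row)
    return partitions


for name, rows in list_partition(EMPLOYEES, PARTITION_VALUES).items():
    print(name, [row[0] for row in rows])
# Tamilnadu [1, 2, 5, 6, 7]
# Maharashtra [3, 4]
```

Because the mapping is explicit, a query that filters on Branch only needs to read the one partition that lists that value.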
Oracle Implementation
create table employee_branch(
id number,name varchar2(10),
branch varchar2(10), income number)
partition by list(branch)
(
partition Tamilnadu
values('chennai','vellore'),
partition Maharashtra
values('mumbai','pune')
);
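Let's insert some records
1 row created.
1 row created.
What happens when a user inserts a record with
a branch that doesn't match any partition?
• Without a DEFAULT partition, Oracle rejects the row with error
ORA-14400 (inserted partition key does not map to any partition).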
Viewing data from a partition
SELECT <column_name_list> FROM <table_name> PARTITION (<partition_name>);
Hash Partitioning
• If we have 10 departments and partition on the department number, then according to
the hash function, all the employees of department 1 go to disk 1, department 2 to
disk 2, and so on.
• As another example, we could choose the EName of the employees as the partitioning
attribute and apply a hash function to it.
[Figure: the partition key is fed to a hash function, which maps each row to Partition 1, Partition 2, or Partition 3]
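The idea of hashing a partition key to pick a partition can be sketched in Python. The hash used here, CRC32 of the key's text modulo the number of partitions, is an assumption chosen because it is deterministic; it is not the function from the slides, nor the one Oracle uses internally.

```python
import zlib

# Rows of the example employee table: (id, name, branch).
EMPLOYEES = [
    (1, "Sam", "Chennai"),
    (2, "Ram", "Vellore"),
    (3, "Tom", "Mumbai"),
    (4, "Chris", "Mumbai"),
    (5, "Jeff", "Vellore"),
    (6, "Mohan", "Vellore"),
    (7, "Rahul", "Chennai"),
]


def partition_of(key, n_partitions):
    """Map a key to a partition number in [0, n_partitions) (illustrative hash)."""
    return zlib.crc32(str(key).encode("utf-8")) % n_partitions


def hash_partition(rows, n_partitions, key_index=0):
    """Distribute rows by hashing the column at key_index."""
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        partitions[partition_of(row[key_index], n_partitions)].append(row)
    return partitions


# Partition on the name column (index 1), as in the EName example.
for number, rows in enumerate(hash_partition(EMPLOYEES, 3, key_index=1)):
    print(f"Partition {number}:", [row[1] for row in rows])

# A point query on the key only needs to look at one partition.
print("Sam is stored in partition", partition_of("Sam", 3))
```

Equal keys always hash to the same partition, which is what makes point queries on the partitioning attribute cheap; range queries gain nothing, since neighbouring key values are scattered.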
Hash Partitioning - Oracle
create table employee_branch1(
id number,
name varchar2(10),
branch varchar2(10),
income number)
partition by hash(id)
(
partition p1,
partition p2,
partition p3);
Inserting some records
SQL> insert into employee_branch1 values(1,'sam','vellore',2000);
1 row created.
1 row created.
1 row created.
Range Partitioning
The range partitioning strategy partitions the data
based on the values of the partitioning attribute.
We first choose a vector of range boundaries
on which to partition.
For example, records with a Salary from 100
to 5000 go to disk 1, 5001 to 10000 to disk 2,
and so on.
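Range partitioning amounts to a binary search of the key in the boundary vector. The sketch below uses the boundaries 5000 and 10000 from the example, treated as inclusive upper bounds; the sample rows, the function name `range_partition`, and the use of Python's `bisect_left` are illustrative assumptions.

```python
from bisect import bisect_left

# Illustrative rows: (id, name, branch, salary).
EMPLOYEES = [
    (1, "Sam", "Chennai", 1000),
    (2, "Ram", "Vellore", 5000),
    (3, "Tom", "Mumbai", 40000),
    (4, "Chris", "Mumbai", 20000),
    (5, "Jeff", "Vellore", 28000),
    (6, "Mohan", "Vellore", 3000),
    (7, "Rahul", "Chennai", 38000),
]

# Inclusive upper bounds: up to 5000 -> disk 1, 5001..10000 -> disk 2, above -> disk 3.
BOUNDARIES = [5000, 10000]


def range_partition(rows, boundaries, key_index):
    """Place each row in the first range whose upper bound is >= its key."""
    disks = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        disks[bisect_left(boundaries, row[key_index])].append(row)
    return disks


for disk_no, rows in enumerate(range_partition(EMPLOYEES, BOUNDARIES, 3), start=1):
    print(f"disk {disk_no}:", [row[0] for row in rows])
# disk 1: [1, 2, 6]
# disk 2: []
# disk 3: [3, 4, 5, 7]
```

The empty second disk shows the main risk of this scheme: if the boundaries do not follow the data distribution, the partitions become skewed.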
Range Partitioning
Partition Key: Salary
Id Name Branch Salary
1 Sam Chennai 1000
2 Ram Vellore 5000
3 Tom Mumbai 40000
4 Chris Mumbai 20000
5 Jeff Vellore 28000
6 Mohan Vellore 3000
7 Rahul Chennai 38000
Partition: <10000
Id Name Branch Salary
1 Sam Chennai 1000
2 Ram Vellore 5000
6 Mohan Vellore 3000
Partition: 10000 to <30000
Id Name Branch Salary
4 Chris Mumbai 20000
5 Jeff Vellore 28000
Partition: >30000
Id Name Branch Salary
3 Tom Mumbai 40000
7 Rahul Chennai 38000
Oracle Implementation – Range Partitioning
1 row created.
1 row created.
1 row created.
Viewing Records from each partition
SQL> select * from employee_branch2 partition(p0);
Partition: <1000
Id Name Branch Salary
Partition: 1001 to <30000
Id Name Branch Salary
4 Chris Mumbai 20000
5 Jeff Vellore 28000
Partition: >30000
Id Name Branch Salary
3 Tom Mumbai 40000
7 Rahul Chennai 38000