
2 Parallel Databases

Parallel databases improve performance by using multiple processors and disks to process queries and tasks in parallel. There are different architectures for parallel databases including shared memory, shared disk, and shared nothing. The shared nothing architecture scales well as it partitions data across independent nodes that communicate over a network. Parallelism is achieved through techniques like I/O parallelism which partitions relations across multiple disks, and query parallelism which breaks queries into sub-queries run in parallel.

Uploaded by

Shel Coop

Parallel Databases

• Architecture
• Data Partitioning Strategy
• Interquery and Intraquery Parallelism
• Parallel Query Optimization
Measuring Performance of a Database

Throughput
• The number of tasks that can be completed in
a given time interval

Response Time
• The amount of time it takes to complete a
single task from the time it is submitted
Single Processor, Single Disk Systems
[Figure: a single processor with main memory, exchanging data with a single disk]
• Throughput and response time were not satisfactory
Problems with Single Processor Systems
• Growth of the internet led to millions of users
accessing websites and to increased data collection
from users
• Such data, running into terabytes, is used for
data analytics and decision support.
• Single processor systems cannot efficiently
handle decision support queries on such large data sets.
• Single processor systems cannot efficiently
handle a large number of concurrent transactions
PARALLEL DATABASES
Parallel Databases
• Parallel systems improve processing and I/O
speeds by using multiple processors and disks
in parallel.
Types of Parallel Machines
Coarse Grain Parallel Machine
• A coarse-grain parallel machine consists of a small
number of powerful processors
• All high-end machines today offer some degree of
coarse-grain parallelism: at least two or four processors
Fine Grain Parallel Machines
• A Fine-grain parallel machine uses thousands of
smaller processors.
• They support a larger degree of parallelism
Measuring Performance of Parallel
Processing Systems
• Speedup
• Scaleup
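Speedup
Speedup = TS / TL
• TS – time taken to run Task1 on the original system (one processor, one disk)
• TL – time taken to run the same Task1 on the larger parallel system (several processors and disks)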
Example - Speedup
• If the original system took 60 seconds to
perform the task and the parallel system with
3 parallel processors took 20 seconds to
complete the task then
Speedup = 60/20 = 3
• Ideally, speedup grows linearly with the number of parallel
processors; in practice it is usually somewhat lower.
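The speedup formula can be checked with a few lines of Python. This is only a sketch: the function name `speedup` and its parameter names are chosen here for illustration, and the timings are the ones from the example on this slide.

```python
def speedup(time_small: float, time_large: float) -> float:
    """Speedup = TS / TL: time on the original system divided by
    time on the larger (parallel) system for the same task."""
    if time_large <= 0:
        raise ValueError("time_large must be positive")
    return time_small / time_large


# Timings from the example: 60 s on the original system, 20 s on 3 processors.
s = speedup(60, 20)
print(s)  # 3.0

# Speedup is called linear when it equals the number of processors used.
print(s == 3)  # True
```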
Speedup Curve
Scaleup - Example
• Scaleup is the factor that expresses how much
more work can be done in the same time
period by a system n times larger.
• If the original system can process 100
transactions in a given amount of time, and
the parallel system can process 300
transactions in this amount of time, then the
value of scaleup would be equal to 3. That is,
300/100 = 3
Scaleup
Scaleup = VolumeL / VolumeS
• VolumeS = 100 – tasks (Task1 to Task100) completed by the original system (one processor, one disk)
• VolumeL = 300 – tasks (Task1 to Task300) completed by the larger parallel system in the same, constant time
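The scaleup ratio can be computed the same way. The sketch below uses the volume-based definition given on these slides (work completed in a fixed time); the function name `scaleup` is an assumption made for illustration.

```python
def scaleup(volume_small: int, volume_large: int) -> float:
    """Scaleup = VolumeL / VolumeS: work done by the larger system divided by
    work done by the original system in the same amount of time."""
    if volume_small <= 0:
        raise ValueError("volume_small must be positive")
    return volume_large / volume_small


# Volumes from the example: 100 transactions originally, 300 on the parallel system.
print(scaleup(100, 300))  # 3.0
```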
Scaleup Curve
Parallel Database Architecture
• Shared Memory
• Shared Disk
• Shared Nothing
Shared Memory Architecture

Processor1 Processor2 Processor3 Processor4


cache cache cache cache

Interconnection Network

Global Shared Primary Memory

Disk1 Disk2 Disk3 Disk4


Shared Memory
• Multiple Processors share secondary disk
storage and also share primary memory
• In a shared memory system multiple
processors are attached to an interconnection
network and can access a common region of
the main memory.
Advantages of Shared Memory Architecture

• Efficient Communication between Processors


• A Processor can send messages to another
Processor by using Memory Writes
Shared Memory Disadvantages
• Architecture is not scalable beyond 64
Processors
• Communication Network is a Bottleneck
• These Architectures have large memory
caches at each processor. Maintaining Cache
coherency becomes an increasing overhead.
Shared Disk Architecture
• Each Processor has a private main memory
and access to all disks using a common
interconnection network
• Multiple processors share secondary disk
storage but each has their own primary
memory
Shared Disk Architecture
Main Main Main Main
Memory Memory Memory Memory

Processor1 Processor2 Processor3 Processor4

Interconnection Network

Disk1 Disk2 Disk3 Disk4


Advantages/Disadvantages of Shared Disk
Architecture
• Fault Tolerance – If a processor fails, other
processors can take over its tasks since all
processors can access all disks.
• Disadvantage – As the system scales up, the
communication network becomes a
bottleneck
Problems with Shared Memory and Shared
Disk Architecture
• As more processors are added, the existing
processors slow down because of increased
contention for memory and network bandwidth.
• A system with 1000 processors is only 4%
as effective as a single processor
system
Shared Nothing Architecture
• Each processor has a local main memory and
disk space.
• No two processors can access the same
storage area.
• All communication between processors
happens over a communication network.
• There is a homogeneity of nodes in a shared
nothing architecture
Shared Nothing Architecture
Disk 1 Disk 2 Disk 3 Disk 4

Main Main Main Main


Memory Memory Memory Memory

Processor 1 Processor 2 Processor 3 Processor 4

Interconnection Network
Advantages/Disadvantages of Shared
Nothing Architectures
• Communication network is used for non local
disk access.
• Each node functions as a server for the data
• Drawback is that non local disk access is costly
How Parallelism is achieved
Parallelism
• I/O Parallelism: Round Robin, List, Hash and Range Partitioning
• Interquery Parallelism
• Intraquery Parallelism: Intraoperation Parallelism and Interoperation Parallelism
I/O Parallelism
• I/O parallelism refers to reducing the time
required to retrieve relations from disk by
partitioning the relations on multiple disks.
Partitioning Example
Id Name Branch
1 Sam Chennai
2 Ram Vellore
3 Tom Mumbai
4 Chris Mumbai
5 Jeff Vellore
6 Mohan Vellore
7 Rahul Chennai
Horizontal Partitioning
[Figure: the rows of the table above are split horizontally across DISK 1, DISK 2 and DISK 3; each disk holds a subset of the rows with the same columns (Id, Name, Branch)]
Basic Partitioning Strategies
• Round Robin Partitioning
• List Partitioning
• Hash Partitioning
• Range Partitioning
Round Robin Partitioning

• This strategy scans the relation in any order
and sends the i-th tuple to disk number D(i mod n),
where n is the number of disks.
• The round-robin scheme ensures an even
distribution of tuples across disks; that is, each
disk has approximately the same number of
tuples as the others.
Round Robin Partitioning
i – record number
n – number of disks (here n = 3)
i mod n is used for splitting records

Id Name Branch i mod n
1 Sam Chennai 1 mod 3 = 1
2 Ram Vellore 2 mod 3 = 2
3 Tom Mumbai 3 mod 3 = 0
4 Chris Mumbai 4 mod 3 = 1
5 Jeff Vellore 5 mod 3 = 2
6 Mohan Vellore 6 mod 3 = 0
7 Rahul Chennai 7 mod 3 = 1

DISK 0
Id Name Branch
3 Tom Mumbai
6 Mohan Vellore

DISK 1
Id Name Branch
1 Sam Chennai
4 Chris Mumbai
7 Rahul Chennai

DISK 2
Id Name Branch
2 Ram Vellore
5 Jeff Vellore
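The round-robin rule is easy to reproduce in code. The following Python sketch assumes the seven-row example table from these slides and three disks; `round_robin_partition` is an illustrative name, not part of any database product.

```python
EMPLOYEES = [
    (1, "Sam", "Chennai"),
    (2, "Ram", "Vellore"),
    (3, "Tom", "Mumbai"),
    (4, "Chris", "Mumbai"),
    (5, "Jeff", "Vellore"),
    (6, "Mohan", "Vellore"),
    (7, "Rahul", "Chennai"),
]


def round_robin_partition(rows, n_disks):
    """Send the i-th row (1-based, in scan order) to disk i mod n_disks."""
    disks = {d: [] for d in range(n_disks)}
    for i, row in enumerate(rows, start=1):
        disks[i % n_disks].append(row)
    return disks


disks = round_robin_partition(EMPLOYEES, 3)
for disk_no, rows in disks.items():
    print(disk_no, [row[0] for row in rows])
# 0 [3, 6]
# 1 [1, 4, 7]
# 2 [2, 5]
```

Because the disk depends only on the position of the row in the scan, the disks differ in size by at most one row, but nothing about a row's values tells a query where to look for it.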
Disadvantages
• Only suitable for full table scans
• Not suitable for point queries or range
queries, because matching tuples may be on any disk and every disk must be searched:
• Select * from employee where name='sam';
• Select * from employee where id between 3
and 5;
List Partitioning
• List partitioning enables you to explicitly
control how rows map to partitions by
specifying a list of discrete values for the
partitioning key in the description for each
partition.
• For a table with a Branch column as the
partitioning key, the Tamilnadu partition might
contain the values Chennai and Vellore, and the
Maharashtra partition might contain Mumbai.
List Partitioning
Partition Key - Branch

Id Name Branch
1 Sam Chennai
2 Ram Vellore
3 Tom Mumbai
4 Chris Mumbai
5 Jeff Vellore
6 Mohan Vellore
7 Rahul Chennai

Tamilnadu Partition (DISK 0)
Id Name Branch
1 Sam Chennai
2 Ram Vellore
5 Jeff Vellore
6 Mohan Vellore
7 Rahul Chennai

Maharashtra Partition (DISK 1)
Id Name Branch
3 Tom Mumbai
4 Chris Mumbai
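List partitioning amounts to a lookup from key value to partition name. The Python sketch below illustrates the idea on the example table; the function `list_partition` and its `default` parameter are assumptions made for illustration and do not describe how any particular database implements it.

```python
EMPLOYEES = [
    (1, "Sam", "Chennai"),
    (2, "Ram", "Vellore"),
    (3, "Tom", "Mumbai"),
    (4, "Chris", "Mumbai"),
    (5, "Jeff", "Vellore"),
    (6, "Mohan", "Vellore"),
    (7, "Rahul", "Chennai"),
]

# Each partition is described by an explicit list of key values.
PARTITIONS = {
    "Tamilnadu": {"Chennai", "Vellore"},
    "Maharashtra": {"Mumbai", "Pune"},
}


def list_partition(rows, partitions, key_index, default=None):
    """Place each row in the partition whose value list contains its key.

    If no list contains the key, the row goes to `default` when one is
    given; otherwise an error is raised.
    """
    value_to_name = {v: name for name, values in partitions.items() for v in values}
    result = {name: [] for name in partitions}
    if default is not None:
        result[default] = []
    for row in rows:
        name = value_to_name.get(row[key_index], default)
        if name is None:
            raise ValueError(f"key {row[key_index]!r} does not map to any partition")
        result[name].append(row)
    return result


parts = list_partition(EMPLOYEES, PARTITIONS, key_index=2)
print([row[0] for row in parts["Tamilnadu"]])    # [1, 2, 5, 6, 7]
print([row[0] for row in parts["Maharashtra"]])  # [3, 4]
```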
Oracle Implementation
create table employee_branch(
id number,name varchar2(10),
branch varchar2(10), income number)
partition by list(branch)
(
partition Tamilnadu
values('chennai','vellore'),
partition Maharashtra
values('mumbai','pune')
);
Let's insert some Records
What happens when a user inserts a record with
a branch that doesn't match any partition?

insert into employee_branch
values(1,'sam','trichy',5000);
Partition Key Error
SQL> insert into employee_branch
values(1,'sam','trichy',5000);
insert into employee_branch
values(1,'sam','trichy',5000)
*
ERROR at line 1:
ORA-14400: inserted partition key does not map
to any partition
Default Partition
• The DEFAULT partition enables you to avoid
specifying all possible values for a list-
partitioned table by using a default partition,
so that all rows that do not map to any other
partition do not generate an error.
Default Partition Oracle
create table employee_branch1(
id number,
name varchar2(10),
branch varchar2(10),
income number)
partition by list(branch)
(
partition Tamilnadu
values ('vellore','chennai'),
partition Maharashtra
values ('mumbai','pune'),
partition unknown_branch
values (default)
);
Inserting records
SQL> insert into employee_branch1
values(123,'sam','vellore',5000);

1 row created.

SQL> insert into employee_branch1
values(123,'sam','trichy',5000);

1 row created.
Viewing data from a partition
SELECT <column_name_list> FROM <table_name> PARTITION (<partition_name>);

SQL> select * from employee_branch partition (Tamilnadu);

        ID NAME       BRANCH         INCOME
---------- ---------- ---------- ----------
         1 sam        vellore          5000
         2 ram        chennai         20000

SQL> select * from employee_branch partition (Maharashtra);

        ID NAME       BRANCH         INCOME
---------- ---------- ---------- ----------
         3 rahul      mumbai          60000
Hash Partitioning
• Hash partitioning maps data to partitions
based on a hashing algorithm that applies to
the partitioning key that you identify.
• The hashing algorithm evenly distributes rows
among partitions, giving partitions
approximately the same size.
• For example, consider the following table:
EMPLOYEE(ENo, EName, DeptNo, Salary, Age)
• If we choose the DeptNo attribute as the partitioning attribute and we have 10 disks to
distribute the data, then the following could be a hash function:
h(DeptNo) = DeptNo mod 10
• If we have 10 departments, then according to the hash function, all the employees of
department 1 will go to disk 1, department 2 to disk 2 and so on; department 10 goes to disk 0, since 10 mod 10 = 0.
• As another example, if we choose the EName of the employees as the partitioning attribute,
then we could have the following hash function:
h(EName) = (sum of the ASCII values of every character in the name) mod n,
where n is the number of disks/partitions needed.


[Figure: Sample – the partition key is passed to the hash function (partition key mod N), whose result selects Partition 1, Partition 2 or Partition 3]
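Both hash functions described for the EMPLOYEE table can be written directly in Python. This is a minimal sketch; the function names `hash_dept` and `hash_name` are chosen here for illustration, and real systems use their own internal hash functions.

```python
def hash_dept(dept_no: int, n_disks: int = 10) -> int:
    """h(DeptNo) = DeptNo mod n_disks."""
    return dept_no % n_disks


def hash_name(name: str, n_disks: int) -> int:
    """h(EName) = (sum of the character codes in the name) mod n_disks."""
    return sum(ord(ch) for ch in name) % n_disks


print(hash_dept(1))    # 1
print(hash_dept(2))    # 2
print(hash_dept(10))   # 0, because 10 mod 10 = 0

# 'S' = 83, 'a' = 97, 'm' = 109, so the sum is 289 and 289 mod 3 = 1.
print(hash_name("Sam", 3))  # 1
```

The same key always hashes to the same disk, which is why a point query on the partitioning attribute needs to visit only one disk.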
Hash Partitioning - oracle
create table employee_branch1(
id number,
name varchar2(10),
branch varchar2(10),
income number)
partition by hash(id)
(
partition p1,
partition p2,
partition p3);
Inserting some records
SQL> insert into employee_branch1 values(1,'sam','vellore',2000);

1 row created.

SQL> insert into employee_branch1 values(2,'ram','chennai',3000);

1 row created.

SQL> insert into employee_branch1 values(3,'tom','mumbai',4000);

1 row created.
Range Partitioning
Range partitioning distributes the data
based on ranges of the partitioning attribute's values.
We need to choose a partitioning vector (the set of range
boundaries) on which to partition.
For example, the records with Salary in the range 100
to 5000 will be on disk 1, 5001 to 10000 on disk 2,
and so on.
Range Partitioning
Partition Key - Salary

Id Name Branch Salary
1 Sam Chennai 1000
2 Ram Vellore 5000
3 Tom Mumbai 40000
4 Chris Mumbai 20000
5 Jeff Vellore 28000
6 Mohan Vellore 3000
7 Rahul Chennai 38000

Salary < 10000
Id Name Branch Salary
1 Sam Chennai 1000
2 Ram Vellore 5000
6 Mohan Vellore 3000

10000 <= Salary < 30000
Id Name Branch Salary
4 Chris Mumbai 20000
5 Jeff Vellore 28000

Salary >= 30000
Id Name Branch Salary
3 Tom Mumbai 40000
7 Rahul Chennai 38000
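Range partitioning can be sketched as a binary search over the partitioning vector. The Python example below assumes the boundaries 10000 and 30000 from the figure, with each boundary belonging to the higher partition; `range_partition` is an illustrative name.

```python
from bisect import bisect_right

EMPLOYEES = [
    (1, "Sam", "Chennai", 1000),
    (2, "Ram", "Vellore", 5000),
    (3, "Tom", "Mumbai", 40000),
    (4, "Chris", "Mumbai", 20000),
    (5, "Jeff", "Vellore", 28000),
    (6, "Mohan", "Vellore", 3000),
    (7, "Rahul", "Chennai", 38000),
]

# Partitioning vector: partition 0 holds salary < 10000,
# partition 1 holds 10000 <= salary < 30000, partition 2 holds the rest.
VECTOR = [10000, 30000]


def range_partition(rows, vector, key_index):
    """Assign each row to the partition whose range contains its key."""
    parts = [[] for _ in range(len(vector) + 1)]
    for row in rows:
        parts[bisect_right(vector, row[key_index])].append(row)
    return parts


for number, rows in enumerate(range_partition(EMPLOYEES, VECTOR, key_index=3)):
    print(number, [row[0] for row in rows])
# 0 [1, 2, 6]
# 1 [4, 5]
# 2 [3, 7]
```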
Oracle Implementation – Range Partitioning

create table employee_branch2(
id number,
name varchar2(10),
branch varchar2(10),
salary number)
partition by range(salary)
(
partition p0 values less than(10000),
partition p1 values less than(30000),
partition p2 values less than (maxvalue));
Let's insert some records
SQL> insert into employee_branch2 values(1,'sam','vellore',1000);

1 row created.

SQL> insert into employee_branch2 values(2,'ram','chennai',25000);

1 row created.

SQL> insert into employee_branch2 values(3,'tom','mumbai',35000);

1 row created.
Viewing Records from each partition
SQL> select * from employee_branch2 partition(p0);

        ID NAME       BRANCH         SALARY
---------- ---------- ---------- ----------
         1 sam        vellore          1000

SQL> select * from employee_branch2 partition(p1);

        ID NAME       BRANCH         SALARY
---------- ---------- ---------- ----------
         2 ram        chennai         25000
Viewing Records from each partition

SQL> select * from employee_branch2 partition(p2);

        ID NAME       BRANCH         SALARY
---------- ---------- ---------- ----------
         3 tom        mumbai          35000
Inserting a record into the partition
insert into employee_branch2 partition(p0)
values(54,'Jim','US',8000);
Updating a record in a partition
• update employee_branch2 partition(p0) set
name='waters' where id=1;
Delete a record from a partition
• delete from employee_branch2 partition(p0)
where id=54;
Viewing Partitions on a table
select table_name,partition_name from
user_tab_partitions;
Partitioning Techniques and their Support for
Different Types of Access
Round Robin
• Useful for reading entire relations
• Point queries and range queries must search all
n disks, so they are expensive to process.
Hash Partitioning
• Good for point queries on the partitioning
attribute, which go to a single disk
• Not good for range queries, even on the partitioning
attribute, because hashing does not keep nearby values together
• Not good for point or range queries on a non-partitioning
attribute
Range Partitioning
• Well suited for range and point queries on the
partitioning attribute
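The difference between the schemes shows up in how many disks a query on the partitioning attribute has to visit. The Python sketch below is a simplified model under stated assumptions: three disks, an integer partitioning attribute (salary), hash function `value mod 3`, and range boundaries 10000 and 30000. The function names are illustrative only.

```python
from bisect import bisect_right

N_DISKS = 3
VECTOR = [10000, 30000]  # assumed range-partitioning vector on salary


def disks_for_point_query(scheme, value):
    """Disks that must be searched for `attribute = value`."""
    if scheme == "round_robin":
        return set(range(N_DISKS))          # the row could be anywhere
    if scheme == "hash":
        return {value % N_DISKS}            # the hash names one disk
    if scheme == "range":
        return {bisect_right(VECTOR, value)}
    raise ValueError(f"unknown scheme: {scheme}")


def disks_for_range_query(scheme, low, high):
    """Disks that must be searched for `attribute between low and high`."""
    if scheme in ("round_robin", "hash"):
        return set(range(N_DISKS))          # nearby values are scattered
    if scheme == "range":
        first = bisect_right(VECTOR, low)
        last = bisect_right(VECTOR, high)
        return set(range(first, last + 1))  # only the overlapping ranges
    raise ValueError(f"unknown scheme: {scheme}")


for scheme in ("round_robin", "hash", "range"):
    print(scheme,
          sorted(disks_for_point_query(scheme, 5000)),
          sorted(disks_for_range_query(scheme, 5000, 20000)))
# round_robin [0, 1, 2] [0, 1, 2]
# hash [2] [0, 1, 2]
# range [0] [0, 1]
```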
Handling of Skew
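• Skew – Some partitions get more tuples and
some partitions get fewer tuples
Two Types of Skew
• Attribute-value skew – some values of the partitioning attribute occur in many more tuples than others
• Partition skew – the partitioning vector or function is badly chosen, so partitions are unbalanced even without attribute-value skew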
Attribute value Skew
Partition Key - Branch

Id Name Branch
1 Sam Chennai
2 Ram Vellore
3 Tom Mumbai
4 Chris Trichy
5 Jeff Vellore
6 Mohan Vellore
7 Rahul Chennai

Tamilnadu Partition (DISK 0)
Id Name Branch
1 Sam Chennai
2 Ram Vellore
4 Chris Trichy
5 Jeff Vellore
6 Mohan Vellore
7 Rahul Chennai

Maharashtra Partition (DISK 1)
Id Name Branch
3 Tom Mumbai
Partition Skew
Partition Key - Salary

Id Name Branch Salary
1 Sam Chennai 1000
2 Ram Vellore 5000
3 Tom Mumbai 40000
4 Chris Mumbai 20000
5 Jeff Vellore 28000
6 Mohan Vellore 3000
7 Rahul Chennai 38000

Salary < 1000
Id Name Branch Salary
(no rows)

1000 <= Salary < 30000
Id Name Branch Salary
1 Sam Chennai 1000
2 Ram Vellore 5000
4 Chris Mumbai 20000
5 Jeff Vellore 28000
6 Mohan Vellore 3000

Salary >= 30000
Id Name Branch Salary
3 Tom Mumbai 40000
7 Rahul Chennai 38000
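Partition skew can be made visible by counting tuples per partition. The Python sketch below compares two partitioning vectors on the example salaries. The `skew_factor` measure (largest partition divided by the average partition size) is an assumption introduced here for illustration, not a definition from these slides.

```python
from bisect import bisect_right

SALARIES = [1000, 5000, 40000, 20000, 28000, 3000, 38000]


def partition_sizes(values, vector):
    """Count how many values fall into each range defined by `vector`."""
    sizes = [0] * (len(vector) + 1)
    for value in values:
        sizes[bisect_right(vector, value)] += 1
    return sizes


def skew_factor(sizes):
    """Largest partition size divided by the average size (1.0 = balanced)."""
    return max(sizes) / (sum(sizes) / len(sizes))


balanced = partition_sizes(SALARIES, [10000, 30000])
skewed = partition_sizes(SALARIES, [1000, 30000])
print(balanced, round(skew_factor(balanced), 2))  # [3, 2, 2] 1.29
print(skewed, round(skew_factor(skewed), 2))      # [0, 5, 2] 2.14
```

With the badly chosen boundary at 1000 the first partition stays empty while the middle one holds five of the seven tuples, so the disk holding it becomes the bottleneck for a parallel scan.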
