2 Parallel Databases

Parallel databases improve performance by using multiple processors and disks to process queries and tasks in parallel. There are different architectures for parallel databases including shared memory, shared disk, and shared nothing. The shared nothing architecture scales well as it partitions data across independent nodes that communicate over a network. Parallelism is achieved through techniques like I/O parallelism which partitions relations across multiple disks, and query parallelism which breaks queries into sub-queries run in parallel.

Uploaded by

Shel Coop

Parallel Databases

Parallel Databases
• Architecture
• Data Partitioning Strategy
• Interquery and Intraquery Parallelism
• Parallel Query Optimization
Measuring Performance of a Database

Throughput
• The number of tasks that can be completed in
a given time interval

Response Time
• The amount of time it takes to complete a
single task from the time it is submitted
Single Processor, Single Disk Systems

[Figure: a single processor with main memory and one disk; data moves between the disk, main memory, and the processor.]
• Throughput and response time were not satisfactory
Problems with Single Processor Systems
• Growth of the internet led to millions of users
accessing websites and to increased data collection
from users
• Such data, running into terabytes, is used for
data analytics and decision support.
• Single processor systems cannot efficiently
handle decision support queries on such large data.
• Single processor systems cannot efficiently
handle a large number of concurrent transactions
PARALLEL DATABASES
Parallel Databases
• Parallel systems improve processing and I/O
speeds by using multiple processors and disks
in parallel.
Types of Parallel Machines
Coarse Grain Parallel Machine
• A coarse-grain parallel machine consists of a small
number of powerful processors
• All high-end machines today offer some degree of
coarse-grain parallelism: at least two or four processors
Fine Grain Parallel Machines
• A Fine-grain parallel machine uses thousands of
smaller processors.
• They support a larger degree of parallelism
Measuring Performance of Parallel
Processing Systems
• Speedup
• Scaleup
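Speedup
Speedup = TS / TL
• TS: time taken by the original (smaller) system to complete Task1
• TL: time taken by the larger (parallel) system to complete the same Task1
[Figure: one processor-disk pair (P, D) completing Task1 in time TS, beside several processor-disk pairs completing Task1 in time TL; each system produces the same output.]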
Example - Speedup
• If the original system took 60 seconds to
perform the task and the parallel system with
3 parallel processors took 20 seconds to
complete the task then
Speedup = 60/20 = 3
• Ideally, speedup grows in proportion to the number
of parallel processors (linear speedup).
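The arithmetic in this example can be checked with a few lines of Python. This is only a sketch: the function names `speedup` and `is_linear` are illustrative and do not come from the slides.

```python
def speedup(time_small, time_large):
    """Speedup = TS / TL: time on the original system divided by the
    time on the larger (parallel) system for the same task."""
    if time_large <= 0:
        raise ValueError("time_large must be positive")
    return time_small / time_large


def is_linear(time_small, time_large, n_processors):
    """True when the speedup equals the number of processors."""
    return abs(speedup(time_small, time_large) - n_processors) < 1e-9


print(speedup(60, 20))       # 3.0
print(is_linear(60, 20, 3))  # True
```

With 60 seconds on the original system and 20 seconds on 3 processors, the speedup is exactly 3, so this example is a case of linear speedup.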
Speedup Curve
Scaleup - Example
• Scaleup is the factor that expresses how much
more work can be done in the same time
period by a system n times larger.
• If the original system can process 100
transactions in a given amount of time, and
the parallel system can process 300
transactions in this amount of time, then the
value of scaleup would be equal to 3. That is,
300/100 = 3
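The same calculation as a short Python sketch. The name `scaleup` and its parameters are illustrative, and both volumes are assumed to be measured over the same time period.

```python
def scaleup(volume_small, volume_large):
    """Scaleup = VolumeL / VolumeS: work completed by the larger system
    divided by work completed by the original system in the same time."""
    if volume_small <= 0:
        raise ValueError("volume_small must be positive")
    return volume_large / volume_small


print(scaleup(100, 300))  # 3.0
```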
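Scaleup
Scaleup = VolumeL / VolumeS, measured over the same (constant) time
• VolumeS = 100 tasks (Task1 to Task100) completed by the original system
• VolumeL = 300 tasks (Task1 to Task300) completed by the larger system
[Figure: one processor-disk pair (P, D) beside several processor-disk pairs, each group running for the same constant time and producing its output.]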
Scaleup Curve
Parallel Database Architecture
• Shared Memory
• Shared Disk
• Shared Nothing
Shared Memory Architecture

Processor1 Processor2 Processor3 Processor4

cache cache cache cache

Interconnection Network

Global Shared Primary Memory

Disk1 Disk2 Disk3 Disk4

Shared Memory
• Multiple Processors share secondary disk
storage and also share primary memory
• In a shared memory system multiple
processors are attached to an interconnection
network and can access a common region of
the main memory.
Advantages of Shared Memory Architecture

• Efficient Communication between Processors

• A Processor can send messages to another
Processor by using Memory Writes
Shared Memory Disadvantages
• Architecture is not scalable beyond 64
Processors
• Communication Network is a Bottleneck
• These Architectures have large memory
caches at each processor. Maintaining Cache
coherency becomes an increasing overhead.
Shared Disk Architecture
• Each Processor has a private main memory
and access to all disks using a common
interconnection network
• Multiple processors share secondary disk
storage but each has their own primary
memory
Shared Disk Architecture
Main Main Main Main
Memory Memory Memory Memory

Processor1 Processor2 Processor3 Processor4

Interconnection Network

Disk1 Disk2 Disk3 Disk4

Advantages/Disadvantages of Shared Disk
Architecture
• Fault Tolerance – If a Processor fails other
processors can take up its task since all
processors can access all disks.

• Disadvantages – As we scale up, the
communication network becomes a
bottleneck
Problems with Shared Memory and Shared
Disk Architecture
• As more processors are added, the existing
processors slow down because of increased
contention for memory and network bandwidth.
• A system with 1000 processors is only 4
percent as effective as a single processor
system
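The 4 percent figure can be reproduced with a simple interference model. The sketch below assumes, as an illustration that the slide itself does not state, that each additional processor slows every processor down by 1 percent. The function name `effective_speedup` and the `slowdown` parameter are hypothetical.

```python
def effective_speedup(n, slowdown=0.01):
    """Work rate of n processors relative to one processor, assuming
    (hypothetically) that every added processor slows each processor
    by the fraction `slowdown`."""
    return n * (1.0 - slowdown) ** (n - 1)


# Find the processor count with the highest effective speedup.
best_n = max(range(1, 1001), key=effective_speedup)
print(best_n)
print(round(effective_speedup(best_n)))    # 37
print(round(effective_speedup(1000), 3))   # 0.044, about 4 percent
```

Under this assumption the speedup peaks at roughly 37 with about 100 processors, and adding more processors after that makes the whole system slower.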
Shared Nothing Architecture
• Each processor has a local main memory and
disk space.
• No two processors can access the same
storage area.
• All communication between processors
happens over a communication network.
• There is a homogeneity of nodes in a shared
nothing architecture
Shared Nothing Architecture
Disk 1 Disk 2 Disk 3 Disk 4

Main Main Main Main

Memory Memory Memory Memory

Processor 1 Processor 2 Processor 3 Processor 4

Interconnection Network
Advantages/Disadvantages of Shared
Nothing Architectures
• Communication network is used for non local
disk access.
• Each node functions as a server for the data
• Drawback is that non local disk access is costly
How Parallelism is achieved
Parallelism
• I/O Parallelism
  - Round Robin Partitioning
  - List Partitioning
  - Hash Partitioning
  - Range Partitioning
• Interquery Parallelism
• Intraquery Parallelism
  - Intraoperation Parallelism
  - Interoperation Parallelism
I/O Parallelism
• I/O parallelism refers to reducing the time
required to retrieve relations from disk by
partitioning the relations on multiple disks.
Partitioning Example
Id Name Branch
1 Sam Chennai
2 Ram Vellore
3 Tom Mumbai
4 Chris Mumbai
5 Jeff Vellore
6 Mohan Vellore
7 Rahul Chennai
Horizontal Partitioning
Id Name Branch
1 Sam Chennai
2 Ram Vellore
3 Tom Mumbai
4 Chris Mumbai
5 Jeff Vellore
6 Mohan Vellore
7 Rahul Chennai

[Figure: the rows of the relation are split across DISK 1, DISK 2, and DISK 3; each disk holds a table with the same columns (Id, Name, Branch).]
Basic Partitioning Strategies
• Round Robin Partitioning
• List Partitioning
• Hash Partitioning
• Range Partitioning
Round Robin Partitioning

• This strategy scans the relation in any order
and sends the i-th tuple to disk number
D(i mod n).
• The round-robin scheme ensures an even
distribution of tuples across disks; that is, each
disk has approximately the same number of
tuples as the others.
Round Robin Partitioning
i – record number
n – number of disks
i mod n is used for splitting records

Id Name Branch i mod 3
1 Sam Chennai 1 mod 3 = 1
2 Ram Vellore 2 mod 3 = 2
3 Tom Mumbai 3 mod 3 = 0
4 Chris Mumbai 4 mod 3 = 1
5 Jeff Vellore 5 mod 3 = 2
6 Mohan Vellore 6 mod 3 = 0
7 Rahul Chennai 7 mod 3 = 1

DISK 0
Id Name Branch
3 Tom Mumbai
6 Mohan Vellore

DISK 1
Id Name Branch
1 Sam Chennai
4 Chris Mumbai
7 Rahul Chennai

DISK 2
Id Name Branch
2 Ram Vellore
5 Jeff Vellore
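The i mod n rule can be sketched in Python as follows. The function name `round_robin_partition` is illustrative, and the record number i is assumed to start at 1, as in the example.

```python
from collections import defaultdict


def round_robin_partition(tuples, n_disks):
    """Send the i-th tuple (counting from 1) to disk number i mod n."""
    disks = defaultdict(list)
    for i, row in enumerate(tuples, start=1):
        disks[i % n_disks].append(row)
    return dict(disks)


employees = [
    (1, "Sam", "Chennai"), (2, "Ram", "Vellore"), (3, "Tom", "Mumbai"),
    (4, "Chris", "Mumbai"), (5, "Jeff", "Vellore"), (6, "Mohan", "Vellore"),
    (7, "Rahul", "Chennai"),
]

disks = round_robin_partition(employees, 3)
for disk_no in sorted(disks):
    print("DISK", disk_no, [row[0] for row in disks[disk_no]])
# DISK 0 [3, 6]
# DISK 1 [1, 4, 7]
# DISK 2 [2, 5]
```

The disk is chosen from the position of the tuple in the scan, not from any attribute value, which is why the tuples end up spread evenly.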
Disadvantages of Round Robin Partitioning
• Only suitable for full table scans
• Not suitable for point queries or range
queries, for example:
• Select * from employee where name='sam';
• Select * from employee where id between 3
and 5;
List Partitioning
• List partitioning enables you to explicitly
control how rows map to partitions by
specifying a list of discrete values for the
partitioning key in the description for each
partition.
• For a table with a Branch column as the
partitioning key, the Tamilnadu partition might
contain values Chennai and Vellore , the
Maharashtra partition might contain Mumbai
List Partitioning
Partition Key - Branch

Id Name Branch
1 Sam Chennai
2 Ram Vellore
3 Tom Mumbai
4 Chris Mumbai
5 Jeff Vellore
6 Mohan Vellore
7 Rahul Chennai

Tamilnadu Partition (DISK 0)
Id Name Branch
1 Sam Chennai
2 Ram Vellore
5 Jeff Vellore
6 Mohan Vellore
7 Rahul Chennai

Maharashtra Partition (DISK 1)
Id Name Branch
3 Tom Mumbai
4 Chris Mumbai
Oracle Implementation
create table employee_branch(
id number,name varchar2(10),
branch varchar2(10), income number)
partition by list(branch)
(
partition Tamilnadu
values('chennai','vellore'),
partition Maharashtra
values('mumbai','pune')
);
Let's insert some records
What happens when a user inserts a record with
a branch that doesn't match any partition?

insert into employee_branch
values(1,'sam','trichy',5000);
Partition Key Error
SQL> insert into employee_branch
values(1,'sam','trichy',5000);
insert into employee_branch
values(1,'sam','trichy',5000)
*
ERROR at line 1:
ORA-14400: inserted partition key does not map
to any partition
Default Partition
• The DEFAULT partition enables you to avoid
specifying all possible values for a list-
partitioned table by using a default partition,
so that all rows that do not map to any other
partition do not generate an error.
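The effect of a default partition can be modelled outside the database with a small Python sketch. The names `PARTITION_VALUES` and `list_partition` are illustrative. The lookup is case-sensitive, on the assumption that the partition values are compared as plain strings.

```python
# Partition name -> list of branch values, as in the list-partitioned table.
PARTITION_VALUES = {
    "Tamilnadu": {"chennai", "vellore"},
    "Maharashtra": {"mumbai", "pune"},
}


def list_partition(branch, default=None):
    """Return the partition whose value list contains `branch`.

    Without a default partition, an unmapped key is an error; with one,
    the row falls into the default partition instead.
    """
    for name, values in PARTITION_VALUES.items():
        if branch in values:
            return name
    if default is None:
        raise KeyError(f"partition key {branch!r} does not map to any partition")
    return default


print(list_partition("vellore"))                           # Tamilnadu
print(list_partition("trichy", default="unknown_branch"))  # unknown_branch
```

Calling `list_partition("trichy")` without a default raises `KeyError`, which plays the role of the partition key error in this model.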
Default Partition Oracle
create table employee_branch1(
id number,
name varchar2(10),
branch varchar2(10),
income number)
partition by list(branch)
(
partition Tamilnadu
values ('vellore','chennai'),
partition Maharashtra
values ('mumbai','pune'),
partition unknown_branch
values (default)
);
Inserting records
SQL> insert into employee_branch1
values(123,'sam','vellore',5000);

1 row created.

SQL> insert into employee_branch1
values(123,'sam','trichy',5000);

1 row created.
Viewing data from a partition
SELECT <column_name_list> FROM <table_name> PARTITION (<partition_name>);

SQL> select * from employee_branch partition (Tamilnadu);

ID NAME BRANCH INCOME

---------- ---------- ---------- ----------
1 sam vellore 5000
2 ram chennai 20000

SQL> select * from employee_branch partition (Maharashtra);

ID NAME BRANCH INCOME

---------- ---------- ---------- ----------
3 rahul mumbai 60000
Hash Partitioning
• Hash partitioning maps data to partitions
based on a hashing algorithm that applies to
the partitioning key that you identify.
• The hashing algorithm evenly distributes rows
among partitions, giving partitions
approximately the same size.
• For example, consider the following table:
EMPLOYEE(ENo, EName, DeptNo, Salary, Age)

• If we choose the DeptNo attribute as the partitioning attribute and have 10 disks to
distribute the data, then the following would be a hash function:
h(DeptNo) = DeptNo mod 10

• If we have 10 departments, then according to the hash function, all the employees of
department 1 go to disk 1, those of department 2 to disk 2, and so on. Department 10 goes to disk 0, since 10 mod 10 = 0.
• As another example, if we choose the EName of the employees as the partitioning attribute,
then we could have the following hash function:

h(EName) = (sum of the ASCII values of every character in the name) mod n,

where n is the number of disks/partitions needed.
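Both hash functions are easy to try out in Python. This is a sketch only: `hash_dept` and `hash_name` are illustrative names, and `ord` is used for the character values, which equals the ASCII value for ASCII names.

```python
def hash_dept(dept_no, n_disks=10):
    """h(DeptNo) = DeptNo mod n."""
    return dept_no % n_disks


def hash_name(name, n_disks):
    """h(EName) = (sum of the character codes in the name) mod n."""
    return sum(ord(ch) for ch in name) % n_disks


print(hash_dept(1))         # 1
print(hash_dept(10))        # 0
print(hash_name("Sam", 3))  # 83 + 97 + 109 = 289, and 289 mod 3 = 1
```

Every employee with the same DeptNo lands on the same disk, so a point query on DeptNo needs to read only one disk.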

[Figure: the partition key is passed to the hash function (partition key mod N), which selects Partition 1, Partition 2, or Partition 3.]
Hash Partitioning - oracle
create table employee_branch1(
id number,
name varchar2(10),
branch varchar2(10),
income number)
partition by hash(id)
(
partition p1,
partition p2,
partition p3);
Inserting some records
SQL> insert into employee_branch1 values(1,'sam','vellore',2000);

1 row created.

SQL> insert into employee_branch1 values(2,'ram','chennai',3000);

1 row created.

SQL> insert into employee_branch1 values(3,'tom','mumbai',4000);

1 row created.
Range Partitioning
Range partitioning partitions the data based on
ranges of the partitioning attribute's values.
We need to choose a partitioning vector, that is,
the set of range boundaries on which to partition.
For example, the records with Salary in the range 100
to 5000 will be on disk 1, 5001 to 10000 on disk 2,
and so on.
Range Partitioning
Partition Key - Salary

Id Name Branch Salary
1 Sam Chennai 1000
2 Ram Vellore 5000
3 Tom Mumbai 40000
4 Chris Mumbai 20000
5 Jeff Vellore 28000
6 Mohan Vellore 3000
7 Rahul Chennai 38000

Salary below 10000
Id Name Branch Salary
1 Sam Chennai 1000
2 Ram Vellore 5000
6 Mohan Vellore 3000

Salary from 10000 to below 30000
Id Name Branch Salary
4 Chris Mumbai 20000
5 Jeff Vellore 28000

Salary 30000 and above
Id Name Branch Salary
3 Tom Mumbai 40000
7 Rahul Chennai 38000
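The range lookup in this example can be sketched in Python with a binary search over the partitioning vector. The names `BOUNDARIES` and `range_partition` are illustrative. A salary equal to a boundary is assumed to go to the higher partition, so that each partition holds values strictly less than its upper bound.

```python
from bisect import bisect_right

# Partitioning vector: partition 0 holds salary < 10000,
# partition 1 holds 10000 <= salary < 30000, partition 2 holds the rest.
BOUNDARIES = [10000, 30000]


def range_partition(salary, boundaries=BOUNDARIES):
    """Return the partition number for `salary`."""
    return bisect_right(boundaries, salary)


salaries = {1: 1000, 2: 5000, 3: 40000, 4: 20000, 5: 28000, 6: 3000, 7: 38000}
partitions = {0: [], 1: [], 2: []}
for emp_id, salary in salaries.items():
    partitions[range_partition(salary)].append(emp_id)

print(partitions)  # {0: [1, 2, 6], 1: [4, 5], 2: [3, 7]}
```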
Oracle Implementation – Range Partitioning

create table employee_branch2(
id number,
name varchar2(10),
branch varchar2(10),
salary number)
partition by range(salary)
(
partition p0 values less than(10000),
partition p1 values less than(30000),
partition p2 values less than (maxvalue));
Let's insert some records
SQL> insert into employee_branch2 values(1,'sam','vellore',1000);

1 row created.

SQL> insert into employee_branch2 values(2,'ram','chennai',25000);

1 row created.

SQL> insert into employee_branch2 values(3,'tom','mumbai',35000);

1 row created.
Viewing Records from each partition
SQL> select * from employee_branch2 partition(p0);

ID NAME BRANCH SALARY

---------- ---------- ---------- ----------
1 sam vellore 1000

SQL> select * from employee_branch2 partition(p1);

ID NAME BRANCH SALARY

---------- ---------- ---------- ----------
2 ram chennai 25000
Viewing Records from each partition

SQL> select * from employee_branch2 partition(p2);

ID NAME BRANCH SALARY

---------- ---------- ---------- ----------
3 tom mumbai 35000
Inserting a record into the partition
insert into employee_branch2 partition(p0)
values(54,'Jim','US',8000);
Updating a record in a partition
• update employee_branch2 partition(p0) set
name='waters' where id=1;
Delete a record from a partition
• delete from employee_branch2 partition(p0)
where id=54;
Viewing Partitions on a table
select table_name,partition_name from
user_tab_partitions;
Partitioning Techniques and their Support for
different types of access
Round Robin
• Useful for reading entire relations
• Point queries and range queries must access all
n disks, which makes them costly to process.
Hash Partitioning
• Good for point queries on the partitioning
attribute
• Not good for range queries, because hashing does
not keep nearby values on the same disk
• Not good for point or range queries on a non
partitioning attribute
Range Partitioning
• Well suited for range and point queries on the
partitioning attribute
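The comparison can be made concrete by counting which disks a query on the partitioning attribute has to read under each scheme. The sketch below is illustrative only: `disks_for_query` is a hypothetical helper, the hash function is assumed to be key mod n on an integer key, and range boundaries follow the "less than" convention.

```python
from bisect import bisect_right


def disks_for_query(scheme, n_disks, lo, hi, boundaries=None):
    """Disks that must be read for partitioning-attribute values in
    [lo, hi]; lo == hi is a point query."""
    if scheme == "round_robin":
        # Placement ignores attribute values, so every disk is read.
        return list(range(n_disks))
    if scheme == "hash":
        if lo == hi:
            return [lo % n_disks]    # one disk for a point query
        return list(range(n_disks))  # a range may hash to any disk
    if scheme == "range":
        if boundaries is None:
            raise ValueError("range scheme needs boundaries")
        first = bisect_right(boundaries, lo)
        last = bisect_right(boundaries, hi)
        return list(range(first, last + 1))
    raise ValueError(f"unknown scheme: {scheme}")


print(disks_for_query("round_robin", 3, 5000, 5000))               # [0, 1, 2]
print(disks_for_query("hash", 3, 5000, 5000))                      # [2]
print(disks_for_query("hash", 3, 3000, 8000))                      # [0, 1, 2]
print(disks_for_query("range", 3, 5000, 5000, [10000, 30000]))     # [0]
print(disks_for_query("range", 3, 5000, 20000, [10000, 30000]))    # [0, 1]
```

Only range partitioning narrows a range query to a subset of the disks; hash partitioning narrows point queries only.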
Handling of Skew
• Skew – some partitions receive many more
tuples than others
Two Types of Skew
• Attribute-value skew
• Partition skew
Attribute Value Skew
Partition Key - Branch

Id Name Branch
1 Sam Chennai
2 Ram Vellore
3 Tom Mumbai
4 Chris Trichy
5 Jeff Vellore
6 Mohan Vellore
7 Rahul Chennai

Tamilnadu Partition (DISK 0)
Id Name Branch
1 Sam Chennai
2 Ram Vellore
4 Chris Trichy
5 Jeff Vellore
6 Mohan Vellore
7 Rahul Chennai

Maharashtra Partition (DISK 1)
Id Name Branch
3 Tom Mumbai
Partition Skew
Partition Key - Salary

Id Name Branch Salary
1 Sam Chennai 1000
2 Ram Vellore 5000
3 Tom Mumbai 40000
4 Chris Mumbai 20000
5 Jeff Vellore 28000
6 Mohan Vellore 3000
7 Rahul Chennai 38000

Salary below 1000
Id Name Branch Salary
(no rows)

Salary from 1000 to below 30000
Id Name Branch Salary
1 Sam Chennai 1000
2 Ram Vellore 5000
4 Chris Mumbai 20000
5 Jeff Vellore 28000
6 Mohan Vellore 3000

Salary 30000 and above
Id Name Branch Salary
3 Tom Mumbai 40000
7 Rahul Chennai 38000
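Partition skew can be quantified by comparing partition sizes for two choices of range boundaries. This is a sketch under stated assumptions: `partition_sizes` and `skew_ratio` are hypothetical helpers, and the ratio of the largest partition to the mean partition size is just one possible measure of skew, not one given in the slides.

```python
from bisect import bisect_right
from collections import Counter


def partition_sizes(values, boundaries):
    """Number of tuples in each range partition."""
    counts = Counter(bisect_right(boundaries, v) for v in values)
    return [counts.get(p, 0) for p in range(len(boundaries) + 1)]


def skew_ratio(sizes):
    """Largest partition divided by the mean partition size;
    1.0 means perfectly balanced."""
    return max(sizes) / (sum(sizes) / len(sizes))


salaries = [1000, 5000, 40000, 20000, 28000, 3000, 38000]

balanced = partition_sizes(salaries, [10000, 30000])
skewed = partition_sizes(salaries, [1000, 30000])

print(balanced, round(skew_ratio(balanced), 2))  # [3, 2, 2] 1.29
print(skewed, round(skew_ratio(skewed), 2))      # [0, 5, 2] 2.14
```

With boundaries at 1000 and 30000, one partition is empty and another holds five of the seven tuples, so the disk holding that partition does most of the work.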
