
Elective-I Advanced Database Management Systems

UNIT II

Introduction to Parallel databases, Parallel database architecture, speedup, scale-up, I/O parallelism, Comparison of Inter-query and Intra-query parallelism, parallel query evaluation, implementation issues of Parallel query evaluation.
Textbooks:
1. Korth, Sudarshan, Silberschatz, “Database System Concepts”, McGraw Hill Publication, 2013.
2. Elmasri, Navathe, “Fundamentals of Database Systems”, Pearson, 2013.
3. Thomas Connolly, Carolyn Begg, “Database Systems: A Practical Approach to Design, Implementation & Management”, Pearson, 2013.
4. Michael Gertz, Sushil Jajodia, “Handbook of Database Security, Applications and Trends”, Springer, 2008.

Load: 4 hrs (Theory) + 1 hr (Tutorial) | Credit: 5 | Total marks: 100 | Sessional marks: 20 | University marks: 80 | Total: 100
Parallel Databases
Outline
 Introduction
 Architecture
 Automatic data partitioning
 Parallel Scan, Sorting and Aggregates
 Parallel Joins
 Dataflow Network for Joins
 Complex Parallel query plans
 Parallel query optimization
 Summary
Learning Outcomes
 At the end of this session, students will be able to
 Describe the need for parallel databases and their goals
 Illustrate the parallel database architecture
DBMS: The || Success Story
 DBMSs are the most (only?) successful application of parallelism.
 Teradata, Tandem, Thinking Machines
 Every major DBMS vendor has some || server
 Workstation manufacturers now depend on || DB server sales.
 Reasons for success:
 Bulk-processing (= partition ||-ism).
 Natural pipelining.
 Inexpensive hardware can do the trick
 Users/app-programmers don’t need to think in ||
Parallel Database Introduction
 Sometimes client-server and centralized systems are not efficient enough to handle huge amounts of data at high data-transfer rates
 The need to improve efficiency gave birth to the concept of the parallel database
 A parallel database system improves the performance of data processing by using multiple resources in parallel, e.g., multiple CPUs and disks working in parallel
 It also performs many operations in parallel, such as data loading and query processing
Parallel DBMS: Intro
 Parallelism is natural to DBMS processing
 Pipeline parallelism: many machines each doing one step in a multi-step process.
 Partition parallelism: many machines doing the same thing to different pieces of data.
 Both are natural in DBMS!

[Figure: pipeline parallelism chains sequential programs so each performs one step; partition parallelism runs copies of a sequential program on different pieces of data, with outputs split N ways and inputs merged M ways.]
Need of parallelism
 Parallel machines are becoming quite common and affordable
 Prices of microprocessors, memory and disks have dropped sharply
 Recent desktop computers feature multiple processors, and this trend is projected to accelerate
Goal of parallelism
 Large scale parallel database systems are used increasingly for
 Storing large volumes of data
 Processing time consuming decision support queries
 Providing high throughput for transaction processing
Need of Parallel Databases
 Improved performance
 Improved availability of data
 Improved reliability
 Provides distributed access to data
Parallel Database Architecture
 Machines are physically close to each other, e.g., in the same server room
 Machines connect with dedicated high-speed LANs and switches
 Communication cost is assumed to be small
 Can use shared-memory, shared-disk or shared-nothing architecture
Why Parallel Access To Data?
At a bandwidth of 10 MB/s, scanning 1 terabyte takes 1.2 days; with 1,000-way parallelism (1,000 disks at 10 MB/s each), 1 terabyte can be scanned in 1.5 minutes.
Parallelism: divide a big problem into many smaller ones to be solved in parallel.
Parallel Database System vs Distributed Database System

Parallel Database System:
 It is implemented on a multiprocessor and comprises a single DB system
 Sharing of resources
 There is symmetry and homogeneity of sites

Distributed Database System:
 DB is geographically separated
 No sharing of resources
 There may be homogeneity or heterogeneity of sites
Types of Parallel Database Architectures
(Architecture Issue: Shared What?)

 Shared Memory (SMP): easy to program; expensive to build; difficult to scale up. Examples: Sequent, SGI, Sun.
 Shared Disk: intermediate between the two. Examples: VMScluster, Sysplex.
 Shared Nothing (network): hard to program; cheap to build; easy to scale up. Examples: Tandem, Teradata, SP2.
What Systems Work This Way
(as of 9/1995)

Shared Nothing:
 Teradata: 400 nodes
 Tandem: 110 nodes
 IBM / SP2 / DB2: 128 nodes
 Informix/SP2: 48 nodes
 ATT & Sybase: ? nodes

Shared Disk:
 Oracle: 170 nodes
 DEC Rdb: 24 nodes

Shared Memory:
 Informix: 9 nodes
 RedBrick: ? nodes
Types of Parallel Database Architectures
 Shared Memory System
 Shared Disk System
 Shared Nothing System
Shared Memory System
 It uses multiple processors attached to a global shared memory via an interconnection channel or communication bus
 Shared-memory systems keep a large cache at each processor, so references to the shared memory are avoided where possible
 If a processor performs a write to a memory location, the copies of that data cached at other processors must be updated or invalidated
Advantages of Shared Memory System
 Data is easily accessible to any processor
 One processor can send messages to the others efficiently

Disadvantages of Shared Memory System
 Waiting time of processors increases with the number of processors
 Bandwidth problem
Shared Disk System
 A shared-disk system uses multiple processors, each of which can access all disks via an interconnection channel; every processor also has its own local memory
 Since each processor has its own memory, data sharing is efficient
 Systems built around this architecture are called clusters

Advantages of Shared Disk System
 Fault tolerance is achieved

Disadvantages of Shared Disk System
 It has limited scalability, as large amounts of data travel through the interconnection channel
 If more processors are added, the existing processors are slowed down
Shared Nothing System
 Each processor in a shared-nothing system has its own local memory and local disk
 Processors communicate with each other through an interconnection channel
 Any processor can act as a server for the data stored on its local disk

Advantages of Shared Nothing System
 Processors and disks can be added as required in a shared-nothing system
 It can support many processors, which makes the system more scalable

Disadvantages of Shared Nothing System
 Data partitioning is required
 The cost of communication for accessing a non-local disk is much higher
Measuring Performance of a Database
 Throughput: the number of tasks that can be completed in a given time interval
 Response Time: the amount of time it takes to complete a single task from the time it is submitted
Different Types of DBMS ||-ism

 Intra-operator parallelism
 get all machines working to compute a given operation (scan, sort, join)
 Inter-operator parallelism
 each operator may run concurrently on a different site (exploits pipelining)
 Inter-query parallelism
 different queries run on different sites
 We’ll focus on intra-operator ||-ism
Single Processor, Single Disk System
[Figure: data flows from the single disk through main memory to the single processor.]
Parallel Databases
 Parallel systems improve processing and I/O speeds by using multiple
processors and disks in parallel
Measuring Performance of Parallel Processing Systems
 Speed-Up: more resources means proportionally less time for a given amount of data. [Ideal curve: throughput (Xact/sec.) grows linearly with the degree of ||-ism.]
 Scale-Up: if resources are increased in proportion to the increase in data size, time is constant. [Ideal curve: response time (sec./Xact) stays flat as the degree of ||-ism grows.]
Speed-Up
[Figure: the same task runs on a single processor and disk in time Ts, and on multiple processors and disks in time Tl.]

Speedup = Ts / Tl

where Ts is the time taken on the smaller (single-processor) system and Tl the time taken on the larger (parallel) system.
Example – Speed Up
 If the original system took 60 seconds to perform a task, and the parallel system with 3 parallel processors took 20 seconds to complete the same task, then
 Speedup = 60/20 = 3
 Speedup increases with the number of parallel processors
[Ideal speedup curve: speedup grows linearly with the number of processors.]
Scale-Up example
 Scale-up is the factor that expresses how much more work can be done in the same time period by a system n times larger
 If the original system can process 100 transactions in a given amount of time and the parallel system can process 300 transactions in the same amount of time, then the value of scale-up is 300/100 = 3
 Thrice as much hardware can process thrice as much data
[Ideal scale-up curve: response time stays constant as the number of processors and the transaction volume grow together.]


Scale-Up
[Figure: a single processor/disk system processes tasks 1..100 (Volume_s = 100) in time Ts, while a system n times larger processes tasks 1..300 (Volume_l = 300) in time Tl.]

Scaleup = Volume_l / Volume_s, with the elapsed times Ts and Tl held equal.
Parallel Databases
 Introduction

 I/O Parallelism

 Interquery Parallelism

 Intraquery Parallelism

 Intraoperation Parallelism

 Interoperation Parallelism

 Design of Parallel Systems

Introduction
 Parallel machines are becoming quite common and affordable
 Prices of microprocessors, memory and disks have dropped sharply
 Recent desktop computers feature multiple processors and this trend is
projected to accelerate
 Databases are growing increasingly large
 Large volumes of transaction data are collected and stored for later analysis.
 Multimedia objects like images are increasingly stored in databases
 Large-scale parallel database systems increasingly used for:
 Storing large volumes of data
 Processing time-consuming decision-support queries
 Providing high throughput for transaction processing

Parallelism in Databases
 Data can be partitioned across multiple disks for parallel I/O.
 Individual relational operations (e.g., sort, join, aggregation) can be
executed in parallel
 data can be partitioned and each processor can work independently on its
own partition.
 Queries are expressed in high level language (SQL, translated to
relational algebra)
 makes parallelization easier.
 Different queries can be run in parallel with each other. Concurrency
control takes care of conflicts.
 Thus, databases naturally lend themselves to parallelism.

39
I/O Parallelism
 Reduce the time required to retrieve relations from disk by partitioning the relations across multiple disks.
 Horizontal partitioning – tuples of a relation are divided among many disks such that each tuple resides on one disk.
 Partitioning techniques (number of disks = n):
Round-robin: scan the relation in any order and send the ith tuple to disk i mod n. This ensures that each disk holds a nearly equal number of tuples.
Hash partitioning:
 Choose one or more attributes as the partitioning attributes.
 Choose a hash function h with range 0…n - 1.
 Let i denote the result of hash function h applied to the partitioning attribute value of a tuple; the tuple is then placed on disk i.
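As an illustration (not from the textbook), here is a minimal Python sketch of these two placement rules, modeling the n disks as lists and using Python's built-in hash as a stand-in for h:

def round_robin_partition(tuples, n):
    # Send the i-th tuple (in scan order) to disk i mod n.
    disks = [[] for _ in range(n)]
    for i, t in enumerate(tuples):
        disks[i % n].append(t)
    return disks

def hash_partition(tuples, n, attr):
    # Place each tuple on disk h(partitioning-attribute value), range 0..n-1.
    disks = [[] for _ in range(n)]
    for t in tuples:  # tuples are dicts keyed by attribute name
        disks[hash(t[attr]) % n].append(t)
    return disks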
I/O Parallelism (Cont.)
 Range partitioning:
 Choose an attribute as the partitioning attribute.
 It distributes contiguous attribute-value ranges to the disks.
 A partitioning vector [v0, v1, ..., vn-2] is chosen.
 Let v be the partitioning attribute value of a tuple. Tuples with vi ≤ v < vi+1 go to disk i + 1. Tuples with v < v0 go to disk 0, and tuples with v ≥ vn-2 go to disk n-1.
E.g., with a partitioning vector [5,11], a tuple with partitioning attribute value of 2 will go to disk 0, a tuple with value 8 will go to disk 1, while a tuple with value 20 will go to disk 2.
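A small sketch of this range-placement rule (assuming Python's standard bisect module; disk numbering as in the text):

import bisect

def range_partition_disk(v, vector):
    # The number of vector entries <= v is exactly the disk index above:
    # v < v0 -> disk 0, vi <= v < vi+1 -> disk i+1, v >= v(n-2) -> disk n-1.
    return bisect.bisect_right(vector, v)

# With vector [5, 11]: value 2 -> disk 0, value 8 -> disk 1, value 20 -> disk 2.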
Partitioning Example

Id Name Branch
1 Shyam Chennai
2 Ram Nagpur
3 Tom Mumbai
4 Chris Mumbai
5 Jeff Nagpur
6 Mohan Nagpur
7 Rahul Chennai
Horizontal Partitioning
Id Name Branch
1 Shyam Chennai
2 Ram Nagpur
3 Tom Mumbai
4 Chris Mumbai
5 Jeff Nagpur
6 Mohan Nagpur
7 Rahul Chennai

[Figure: the tuples above are distributed across Disk 1, Disk 2 and Disk 3, each disk holding a horizontal fragment with schema (Id, Name, Branch).]

Basic Partitioning strategies
 Round Robin Partitioning
 List Partitioning
 Hash Partitioning
 Range Partitioning
Round Robin Partitioning
 This strategy scans the relation in any order and sends the ith tuple to disk number i mod n
 The Round Robin scheme ensures an even distribution of tuples across disks; that is, each disk has approximately the same number of tuples as the others
Round Robin Partitioning
i – record number; n – number of disks; i mod n is used for splitting records

Source table (Id, Name, Branch): 1 Shyam Chennai; 2 Ram Nagpur; 3 Tom Mumbai; 4 Chris Mumbai; 5 Jeff Nagpur; 6 Mohan Nagpur; 7 Rahul Chennai

Disk 1: 1 Shyam Chennai; 4 Chris Mumbai; 7 Rahul Chennai
Disk 2: 2 Ram Nagpur; 5 Jeff Nagpur
Disk 3: 3 Tom Mumbai; 6 Mohan Nagpur
Disadvantages
 Only suitable for full table scans
 Not suitable for point queries or range queries
 Select * from employee where name = 'sam';
 Select * from employee where id between 3 and 5;
List Partitioning
 List Partitioning enables you to specify explicitly how rows map to partitions, by listing the discrete values of the partitioning key that belong in each partition.
 For a table with a Branch column as the partitioning key, the Maharashtra partition might contain the values Mumbai and Nagpur, while the Tamilnadu partition might contain Chennai.
List Partitioning
Source table (Id, Name, Branch): 1 Shyam Chennai; 2 Ram Nagpur; 3 Tom Mumbai; 4 Chris Mumbai; 5 Jeff Nagpur; 6 Mohan Nagpur; 7 Rahul Chennai

Maharashtra partition (Disk 1): 2 Ram Nagpur; 3 Tom Mumbai; 4 Chris Mumbai; 5 Jeff Nagpur; 6 Mohan Nagpur
Tamilnadu partition (Disk 2): 1 Shyam Chennai; 7 Rahul Chennai
Oracle Implementation

Create table Employee_Branch(id number, name varchar2(10), branch varchar2(10), income number)
Partition by list(branch)
(
partition Maharashtra values('Mumbai', 'Nagpur'),
partition Tamilnadu values('Chennai', 'Vellore'),
partition unknown_branch values(default)
);
Hash Partitioning
 Hash partitioning maps data to partitions based on a hashing algorithm applied to the partitioning key that you identify.
 The hashing algorithm evenly distributes rows among partitions, giving partitions approximately the same size.
Sample
[Figure: a hash function (partition key mod n) maps each partition key to one of the partitions, e.g. Partition 1, Partition 2 or Partition 3.]
Range Partitioning
 The range partitioning strategy partitions the data based on ranges of the partitioning attribute values.
 We need to choose the set of range boundaries (the partitioning vector) on which to partition.
 For example, records with salary in the range 100 to 5000 go to disk 1, 5001 to 10000 to disk 2, and so on.
Range Partitioning (partition key: Salary)
Source table (Id, Name, Branch, Salary): 1 Shyam Chennai 1000; 2 Ram Nagpur 5000; 3 Tom Mumbai 40000; 4 Chris Mumbai 20000; 5 Jeff Nagpur 28000; 6 Mohan Nagpur 3000; 7 Rahul Chennai 38000

Disk 1 (salary < 10000): 1 Shyam Chennai 1000; 2 Ram Nagpur 5000; 6 Mohan Nagpur 3000
Disk 2 (10000 ≤ salary < 30000): 4 Chris Mumbai 20000; 5 Jeff Nagpur 28000
Disk 3 (salary ≥ 30000): 3 Tom Mumbai 40000; 7 Rahul Chennai 38000
Partitioning Techniques and Their Support for Different Types of Access
 Round Robin
 Useful for reading the entire relation
 Point queries and range queries must access all n disks and are complicated to process
 Hash Partitioning
 Good for point queries on the partitioning attribute
 Not good for point or range queries on non-partitioning attributes
 Range Partitioning
 Well suited for range and point queries on the partitioning attribute
Handling of Skew
 Skew – some partitions get more tuples and some get fewer
 Two types of skew:
 Attribute Skew
 Partition Skew
Attribute Value Skew
Source table (Id, Name, Branch): 1 Shyam Chennai; 2 Ram Nagpur; 3 Tom Mumbai; 4 Chris Pune; 5 Jeff Nagpur; 6 Mohan Nagpur; 7 Rahul Mumbai

Maharashtra partition (Disk 0): 2 Ram Nagpur; 3 Tom Mumbai; 4 Chris Pune; 5 Jeff Nagpur; 6 Mohan Nagpur
Tamilnadu partition (Disk 1): 1 Shyam Chennai
The Maharashtra values dominate the partitioning attribute, so one partition receives almost all of the tuples.
Partition Skew (partition key: Salary)
Source table (Id, Name, Branch, Salary): 1 Shyam Chennai 1000; 2 Ram Nagpur 5000; 3 Tom Mumbai 40000; 4 Chris Mumbai 20000; 5 Jeff Nagpur 28000; 6 Mohan Nagpur 3000; 7 Rahul Chennai 38000

With a badly chosen partitioning vector [1000, 30000]:
Disk 1 (salary < 1000): no tuples
Disk 2 (1000 ≤ salary < 30000): 1 Shyam Chennai 1000; 2 Ram Nagpur 5000; 6 Mohan Nagpur 3000; 4 Chris Mumbai 20000; 5 Jeff Nagpur 28000
Disk 3 (salary ≥ 30000): 3 Tom Mumbai 40000; 7 Rahul Chennai 38000
The middle partition receives most of the relation while the first stays empty.
Comparison of Partitioning Techniques
 Evaluate how well partitioning techniques support the following types of data access:
1. Scanning the entire relation.
2. Locating a tuple associatively – point queries.
 i.e., tuples that have a specified value for a specified attribute.
 E.g., emp_name = “Ram”.
3. Locating all tuples such that the value of a given attribute lies within a specified range – range queries.
 E.g., 10000 < salary < 25000.
Comparison of Partitioning Techniques (Cont.)
Round robin:
 Advantages
 Best suited for sequential scan of the entire relation on each query.
 All disks have almost an equal number of tuples; retrieval work is thus well balanced between disks.
 Disadvantages
 Range queries are difficult to process
 No clustering -- tuples are scattered across all disks
Comparison of Partitioning Techniques (Cont.)
Hash partitioning:
 Good for sequential access
 Assuming the hash function is good and the partitioning attributes form a key, tuples will be equally distributed between disks
 Retrieval work is then well balanced between disks.
 Good for point queries on the partitioning attribute
 Can look up a single disk, leaving the others available for answering other queries.
 An index on the partitioning attribute can be local to a disk, making lookup and update more efficient
 No clustering, so difficult to answer range queries
Comparison of Partitioning Techniques (Cont.)
 Range partitioning:
 Provides data clustering by partitioning attribute value.
 Good for sequential access
 Good for point queries on partitioning attribute: only one disk needs to
be accessed.
 For range queries on partitioning attribute, one to a few disks may
need to be accessed
 Remaining disks are available for other queries.
 Good if result tuples are from one to a few blocks.
 If many blocks are to be fetched, they are still fetched from one to a few
disks, and potential parallelism in disk access is wasted
 Example of execution skew.

Partitioning a Relation across Disks
 If a relation contains only a few tuples, which will fit into a single disk block, then assign the relation to a single disk.
 Large relations are preferably partitioned across all the available disks.
 If a relation consists of m disk blocks and there are n disks available in the system, then the relation should be allocated min(m, n) disks.
Handling of Skew
 The distribution of tuples to disks may be skewed — that is, some disks
have many tuples, while others may have fewer tuples.
 Types of skew:
 Attribute-value skew.
 Some values appear in the partitioning attributes of many tuples; all the tuples
with the same value for the partitioning attribute end up in the same partition.
 Can occur with range-partitioning and hash-partitioning.
 Partition skew.
 With range-partitioning, badly chosen partition vector may assign too many tuples
to some partitions and too few to others.
 Less likely with hash-partitioning if a good hash-function is chosen.

Handling Skew in Range-Partitioning
 To create a balanced partitioning vector (assuming partitioning attribute
forms a key of the relation):
 Sort the relation on the partitioning attribute.
 Construct the partition vector by scanning the relation in sorted order as follows.
 After every 1/nth of the relation has been read, the value of the partitioning attribute
of the next tuple is added to the partition vector.
 n denotes the number of partitions to be constructed.
 Duplicate entries or imbalances can result if duplicates are present in
partitioning attributes.
 Alternative technique based on histograms used in practice

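A minimal sketch of this procedure (assuming the partitioning-attribute values fit in memory and there are at least n of them; as noted above, duplicates can still cause imbalance):

def balanced_partition_vector(values, n):
    # Sort on the partitioning attribute, then after every 1/n-th of the
    # relation take the next value as a partition-vector entry.
    ordered = sorted(values)
    step = len(ordered) // n
    return [ordered[i * step] for i in range(1, n)]

# Example: balanced_partition_vector([1000, 5000, 40000, 20000, 28000, 3000, 38000], 3)
# returns [5000, 28000], splitting the seven salaries into groups of 2, 2 and 3.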
Handling Skew using Histograms
 Balanced partitioning vector can be constructed from histogram in a
relatively straightforward fashion
 Assume uniform distribution within each range of the histogram
 Histogram can be constructed by scanning relation, or sampling (blocks
containing) tuples of the relation

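One way this might look in code (a sketch under the stated uniformity assumption; the (low, high, count) bucket format is made up for illustration):

def vector_from_histogram(buckets, n):
    # buckets: sorted list of (low, high, count) histogram ranges.
    total = sum(count for _, _, count in buckets)
    targets = [total * i / n for i in range(1, n)]
    cuts, seen, t = [], 0, 0
    for low, high, count in buckets:
        # Emit each cut point that falls inside this bucket, interpolating
        # linearly (the uniform-distribution-within-a-range assumption).
        while t < len(targets) and count > 0 and seen + count >= targets[t]:
            frac = (targets[t] - seen) / count
            cuts.append(low + frac * (high - low))
            t += 1
        seen += count
    return cuts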
Handling Skew Using Virtual Processor
Partitioning
 Skew in range partitioning can be handled elegantly using virtual
processor partitioning:
 create a large number of partitions (say 10 to 20 times the number of
processors)
 Assign virtual processors to partitions either in round-robin fashion or based on
estimated cost of processing each virtual partition
 Basic idea:
 If any normal partition would have been skewed, it is very likely the skew is
spread over a number of virtual partitions
 Skewed virtual partitions get spread across a number of processors, so work gets
distributed evenly!

Automatic Data Partitioning
Partitioning a table: Range (A...E | F...J | K...N | O...S | T...Z), Hash, or Round Robin
 Range: good for equijoins, range queries, group-by
 Hash: good for equijoins
 Round Robin: good to spread load
Shared-disk and shared-memory systems are less sensitive to partitioning; shared-nothing benefits from "good" partitioning.
Parallel Scans
 Scan in parallel, and merge.
 Selection may not require all sites for range or hash partitioning.
 Indexes can be built at each partition.
Parallel Sorting

 Current records:
 8.5 Gb/minute, shared-nothing; Datamation benchmark in 2.41 secs (UCB
students http://now.cs.berkeley.edu/NowSort/)
 Idea:
 Scan in parallel, and range-partition as you go.
 As tuples come in, begin “local” sorting on each
 Resulting data is sorted, and range-partitioned.
 Problem: skew!
 Solution: “sample” the data at start to determine partition points.
Parallel Joins

 Nested loop:
 Each outer tuple must be compared with each inner tuple that might join.
 Easy for range partitioning on join cols, hard otherwise!
 Sort-Merge (or plain Merge-Join):
 Sorting gives range-partitioning.
 Merging partitioned tables is local.
Parallel Hash Join
[Figure: phase 1 of hash join. The original relations (R, then S) are read from disk through B main-memory buffers; a hash function h assigns each input tuple to one of B-1 partitions, which are written back to disk.]

 In the first phase, partitions get distributed to different sites:
 A good hash function automatically distributes work evenly!
 Do the second phase at each site.
 Almost always the winner for equi-join.
Dataflow Network for || Join
 Good use of split/merge makes it easier to build parallel versions of sequential join code.
Complex Parallel Query Plans
 Complex Queries: Inter-Operator parallelism
 Pipelining between operators:
 note that sort and phase 1 of hash-join block the pipeline!!
 Bushy Trees
[Figure: a bushy tree. A ⋈ B is computed on sites 1-4, R ⋈ S on sites 5-8, and the final join on sites 1-8.]
N×M-way Parallelism
[Figure: partitioned data (A...E, F...J, K...N, O...S, T...Z) flows through parallel Join, Sort and Merge operators; N inputs, M outputs, no bottlenecks. Partitioned and pipelined data flows.]
Observations
 It is relatively easy to build a fast parallel query executor
 It is hard to write a robust and world-class parallel query optimizer.
 There are many tricks.
 One quickly hits the complexity barrier.
 Still open research!
Parallel Query Optimization
 Common approach: 2 phases
 Pick best sequential plan (System R algorithm)
 Pick degree of parallelism based on current system parameters.
 "Bind" operators to processors
 Use query tree.
What’s Wrong With That?
 Best serial plan != best || plan! Why?
 Trivial counter-example:
 Table partitioned with local secondary index at two nodes
 Range query: all of node 1 and 1% of node 2.
 Node 1 should do a scan of its partition.
 Node 2 should use the secondary index.
[Figure: node 1 (A..M) uses a table scan; node 2 (N..Z) uses an index scan.]
Examples of Parallel Databases
Parallel DBMS Summary
 ||-ism natural to query processing:
 Both pipeline and partition ||-ism!
 Shared-Nothing vs. Shared-Mem
 Shared-disk too, but less standard
 Shared-mem easy, costly. Doesn't scale up.
 Shared-nothing cheap, scales well, harder to implement.
 Intra-op, Inter-op, & Inter-query ||-ism all possible.
|| DBMS Summary, cont.
 Data layout choices important
 Most DB operations can be done partition-||
 Sort.
 Sort-merge join, hash-join.
 Complex plans.
 Allow for pipeline-||ism, but sorts, hashes block the pipeline.
 Partition ||-ism achieved via trees.
|| DBMS Summary, cont.
 Hardest part of the equation: optimization.
 2-phase optimization simplest, but can be ineffective.
 More complex schemes still at the research stage.
 We haven't said anything about Xacts, logging.
 Easy in shared-memory architecture.
 Takes some care in shared-nothing.
5 min Quiz
 What is the primary reason for using a parallel DBMS?
 List two reasons for the success of || DBMS.
 In N×M parallelism, what do N and M stand for?
 Is optimization the hardest part in || DBMS (Yes/No)?
Query Parallelism
 In a parallel database system, parallelism is used to improve the performance of the system
 It is achieved through query parallelism
 Transaction throughput is increased by the parallel execution of one or more queries
Forms of query parallelism
Inter-query Parallelism
 "Parallelism among queries"
 Different queries or transactions are executed in parallel with one another
 Main aim: scaling up transaction-processing systems
[Figure: queries 1..n run on processors 1..n, producing results 1..n in parallel.]
Intra-query Parallelism
 "Parallelism within a query"
 Execution of a single query in parallel on multiple processors and disks
 Main aim: speeding up long-running queries
[Figure: query 1 is split into sub-queries 1.1..1.n, executed on processors 1..n; the sub-results are combined into result 1.]
Intra-query parallelism
 Execution of a single query can be parallelized in two ways:

Intra-operation Parallelism:
 Each individual operation in a query is parallelized
 Example: parallel sort or parallel search; a single operation, such as sorting, is parallelized

Inter-operation parallelism:
 Different operations in a query expression are executed in parallel
 Example: simultaneous sorting and searching; the sorting and searching operations are parallelized
Can you guess the type of parallel query execution for the following figure?
Intra-operation Parallelism

Partitioned Parallelism
 Parallelism due to the data being partitioned
 The degree of parallelism increases with the number of records in a table
Intra-operation Parallelism
Inter-operation Parallelism
Within the same query or transaction, different operations are executed concurrently.

Pipeline Parallelism:
 Output records of operation A are consumed by a second operation B, even before operation A has finished producing its output
 Like an assembly line, multiple operations are executing simultaneously

Independent Parallelism:
 Multiple operations in a query that do not depend on one another are executed in parallel
 Does not provide a high degree of parallelism
 Only useful for a small number of processors
I. Interquery Parallelism
 Queries/transactions execute in parallel with one another.
 Increases transaction throughput; used primarily to scale up a
transaction processing system to support a larger number of
transactions per second.
 Easiest form of parallelism to support, particularly in a shared-memory
parallel database, because even sequential database systems support
concurrent processing.
 More complicated to implement on shared-disk or shared-nothing
architectures
 Locking and logging must be coordinated by passing messages between
processors.
 Data in a local buffer may have been updated at another processor.
 Cache-coherency has to be maintained — reads and writes of data in buffer
must find latest version of data.

Cache Coherency Protocol
 Example of a cache coherency protocol for shared disk systems:
 Before reading/writing to a page, the page must be locked in shared/exclusive
mode.
 On locking a page, the page must be read from disk
 Before unlocking a page, the page must be written to disk if it was modified.
 More complex protocols with fewer disk reads/writes exist.
 Cache coherency protocols for shared-nothing systems are similar. Each
database page is assigned a home processor. Requests to fetch the page or
write it to disk are sent to the home processor.

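A schematic rendering of this protocol (hypothetical lock_mgr and disk objects; a sketch of the idea, not a real DBMS API):

class CoherentPage:
    def __init__(self, page_id, lock_mgr, disk):
        self.page_id, self.lock_mgr, self.disk = page_id, lock_mgr, disk
        self.data, self.dirty = None, False

    def acquire(self, exclusive):
        # Before reading/writing, lock the page in shared/exclusive mode.
        self.lock_mgr.lock(self.page_id, exclusive)
        # On locking, (re)read the page from disk: another processor may
        # have written a newer version since we last cached it.
        self.data = self.disk.read(self.page_id)
        return self.data

    def release(self):
        # Before unlocking, flush the page to disk if it was modified, so
        # the next processor to lock it sees the latest version.
        if self.dirty:
            self.disk.write(self.page_id, self.data)
            self.dirty = False
        self.lock_mgr.unlock(self.page_id)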
II. Intraquery Parallelism
1. Intraoperation parallelism:
 a) Parallel Sort
  i. Range-Partitioning Sort
  ii. Parallel External Sort-Merge
 b) Parallel Join
  i. Partitioned Join
  ii. Fragment & Replicate Joins
  iii. Partitioned Parallel Hash Join
  iv. Parallel Nested-Loop Join
2. Interoperation parallelism
 a) Pipeline parallelism
 b) Independent parallelism
II. Intraquery Parallelism
 Execution of a single query in parallel on multiple processors/disks;
important for speeding up long-running queries.
 Two complementary forms of intraquery parallelism :
 Intraoperation Parallelism – parallelize the execution of each individual
operation in the query.
 Interoperation Parallelism – execute the different operations in a query
expression in parallel.
 The first form scales better with increasing parallelism because
the number of tuples processed by each operation is typically more than
the number of operations in a query

Parallel Processing of Relational Operations
 Our discussion of parallel algorithms assumes:
 read-only queries
 shared-nothing architecture
 n processors, P0, ..., Pn-1, and n disks D0, ..., Dn-1, where disk Di is associated
with processor Pi.
 If a processor has multiple disks they can simply simulate a single disk
Di.
 Shared-nothing architectures can be efficiently simulated on shared-
memory and shared-disk systems.
 Algorithms for shared-nothing systems can thus be run on shared-memory
and shared-disk systems.
 However, some optimizations may be possible.

a. Parallel Sort
i) Range-Partitioning Sort
 Choose processors P0, ..., Pm, where m ≤ n - 1, to do the sorting.
 Create a range-partition vector with m entries on the sorting attributes
 Redistribute the relation using range partitioning
 all tuples that lie in the ith range are sent to processor Pi
 Pi stores the tuples it receives temporarily on disk Di.
 This step requires I/O and communication overhead.
 Each processor Pi sorts its partition of the relation locally.
 Each processor executes the same operation (sort) in parallel with the other processors, without any interaction with them (data parallelism).
 The final merge operation is trivial: range partitioning ensures that, for 1 ≤ i < j ≤ m, the key values in processor Pi are all less than the key values in Pj.
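A single-process simulation of the algorithm (a sketch only; each list plays the role of one processor's partition):

import bisect

def range_partition_sort(tuples, vector, key):
    # Step 1: redistribute tuples by range on the sorting attribute.
    parts = [[] for _ in range(len(vector) + 1)]
    for t in tuples:
        parts[bisect.bisect_right(vector, t[key])].append(t)
    # Step 2: each "processor" sorts its partition locally.
    for p in parts:
        p.sort(key=lambda t: t[key])
    # Step 3: the final merge is trivial concatenation, since every key in
    # partition i is less than every key in partition j for i < j.
    return [t for p in parts for t in p]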
Parallel Sort (Cont.)
ii) Parallel External Sort-Merge
 Assume the relation has already been partitioned among disks D0, ..., Dn-1 (in
whatever manner).
 Each processor Pi locally sorts the data on disk Di.
 The sorted runs on each processor are then merged to get the final sorted
output.
 Parallelize the merging of sorted runs as follows:
 The sorted partitions at each processor Pi are range-partitioned across the
processors P0, ..., Pm-1.
 Each processor Pi performs a merge on the streams as they are received, to get a
single sorted run.
 The sorted runs on processors P0,..., Pm-1 are concatenated to get the final result.

b. Parallel Join
 The join operation requires pairs of tuples to be tested to see if they satisfy the join condition; if they do, the pair is added to the join output.
 Parallel join algorithms attempt to split the pairs to be tested over several processors.
 Each processor then computes part of the join locally.
 In a final step, the results from each processor can be collected together to produce the final result.
i) Partitioned Join
 For equi-joins and natural joins, it is possible to partition the two input relations across the processors and compute the join locally at each processor.
 Let r and s be the input relations, and suppose we want to compute r ⋈r.A=s.B s.
 r and s are each partitioned into n partitions, denoted r0, r1, ..., rn-1 and s0, s1, ..., sn-1.
 Either range partitioning or hash partitioning can be used.
 r and s must be partitioned on their join attributes (r.A and s.B), using the same range-partitioning vector or hash function.
 Partitions ri and si are sent to processor Pi.
 Each processor Pi locally computes ri ⋈ri.A=si.B si. Any of the standard join methods can be used.
Partitioned Join (Cont.)
[Figure: partitions ri and si are routed to processor Pi, which computes ri ⋈ si locally.]
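A compact sketch of the idea (hash partitioning on the join attributes with the same function for both relations; the local join below is a plain nested loop, but any standard method would do):

def partitioned_join(r, s, n, a, b):
    r_parts = [[] for _ in range(n)]
    s_parts = [[] for _ in range(n)]
    for t in r:
        r_parts[hash(t[a]) % n].append(t)   # partition r on r.A
    for t in s:
        s_parts[hash(t[b]) % n].append(t)   # partition s on s.B, same hash
    result = []
    for i in range(n):                      # each Pi joins its own partitions
        for tr in r_parts[i]:
            for ts in s_parts[i]:
                if tr[a] == ts[b]:
                    result.append((tr, ts))
    return result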
ii) Fragment-and-Replicate Join
 Partitioning is not possible for some join conditions
 e.g., non-equijoin conditions, such as r.A > s.B.
 For joins where partitioning is not applicable, parallelization can be accomplished by the fragment-and-replicate technique
 Depicted on the next slide
 Special case – asymmetric fragment-and-replicate. Steps:
1. One of the relations, say r, is partitioned; any partitioning technique can be used.
2. The other relation, s, is replicated across all the processors.
3. Processor Pi then locally computes the join of ri with all of s, using any join technique.
Depiction of Fragment-and-Replicate Joins
[Figure: in asymmetric fragment-and-replicate, each processor Pi receives partition ri and a full copy of s; in the general case, processors Pi,j form an n × m grid, with ri replicated across row i and sj replicated down column j.]
Fragment-and-Replicate Join (Cont.)
 General case: reduces the sizes of the relations at each processor.
 r is partitioned into n partitions, r0, r1, ..., rn-1, and s is partitioned into m partitions, s0, s1, ..., sm-1.
 Any partitioning technique may be used.
 There must be at least m * n processors.
 Label the processors as P0,0, P0,1, ..., P0,m-1, P1,0, ..., Pn-1,m-1.
 Pi,j computes the join of ri with sj. In order to do so, ri is replicated to Pi,0, Pi,1, ..., Pi,m-1, while sj is replicated to P0,j, P1,j, ..., Pn-1,j.
 Any join technique can be used at each processor Pi,j.
Fragment-and-Replicate Join (Cont.)
 Both versions of fragment-and-replicate work with any join condition, since
every tuple in r can be tested with every tuple in s.
 Usually has a higher cost than partitioning, since one of the relations (for
asymmetric fragment-and-replicate) or both relations (for general fragment-
and-replicate) have to be replicated.
 Sometimes asymmetric fragment-and-replicate is preferable even though
partitioning could be used.
 E.g., say s is small and r is large, and already partitioned. It may be cheaper to
replicate s across all processors, rather than repartition r and s on the join
attributes.

iii) Partitioned Parallel Hash-Join
Parallelizing partitioned hash join:
 Assume s is smaller than r and therefore s is chosen as the build relation.
1. A hash function h1 takes the join attribute value of each tuple in s and
maps this tuple to one of the n processors.
 Each processor Pi reads the tuples of s that are on its disk Di, and sends
each tuple to the appropriate processor based on hash function h1. Let si
denote the tuples of relation s that are sent to processor Pi.
2. As tuples of relation s are received at the destination processors, they are
partitioned further using another hash function, h2, which is used to
compute the hash-join locally. (Cont.)

Partitioned Parallel Hash-Join (Cont.)
3. Once the tuples of s have been distributed, the larger relation r is redistributed across the n processors using the hash function h1.
 Let ri denote the tuples of relation r that are sent to processor Pi.
 As the r tuples are received at the destination processors, they are repartitioned using the function h2 (just as the probe relation is partitioned in the sequential hash-join algorithm).
4. Each processor Pi executes the build and probe phases of the hash-join algorithm on the local partitions ri and si of r and s to produce a partition of the final result of the hash-join.
Note: hash-join optimizations can be applied to the parallel case
 e.g., the hybrid hash-join algorithm can be used to cache some of the incoming tuples in memory, avoiding the cost of writing them out and reading them back in.
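Putting the two hash functions together in a sketch (h1 routes tuples to processors; h2 is modeled here by Python's dict hashing; the parallel steps are simulated sequentially):

def partitioned_parallel_hash_join(r, s, n, a, b):
    h1 = lambda v: hash(v) % n          # processor assignment
    s_at = [[] for _ in range(n)]
    r_at = [[] for _ in range(n)]
    for t in s:                         # distribute the build relation s
        s_at[h1(t[b])].append(t)
    for t in r:                         # then redistribute the probe relation r
        r_at[h1(t[a])].append(t)
    result = []
    for i in range(n):                  # local build and probe at each Pi
        table = {}                      # local partitioning, in the role of h2
        for ts in s_at[i]:
            table.setdefault(ts[b], []).append(ts)
        for tr in r_at[i]:
            for ts in table.get(tr[a], []):
                result.append((tr, ts))
    return result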
iv) Parallel Nested-Loop Join
 Assume that
 relation s is much smaller than relation r and that r is stored by partitioning.
 there is an index on a join attribute of relation r at each of the partitions of
relation r.
 Use asymmetric fragment-and-replicate, with relation s being replicated,
and using the existing partitioning of relation r.
 Each processor Pj where a partition of relation s is stored reads the tuples
of relation s stored in Dj, and replicates the tuples to every other
processor Pi.
 At the end of this phase, relation s is replicated at all sites that store tuples of
relation r.
 Each processor Pi performs an indexed nested-loop join of relation s with
the ith partition of relation r.

Other Relational Operations
1. Selection σθ(r)
 If θ is of the form ai = v, where ai is an attribute and v a value:
 If r is partitioned on ai, the selection is performed at a single processor.
 If θ is of the form l <= ai <= u (i.e., θ is a range selection) and the relation has been range-partitioned on ai:
 The selection is performed at each processor whose partition overlaps with the specified range of values.
 In all other cases, the selection is performed in parallel at all the processors.
Other Relational Operations (Cont.)
2. Duplicate elimination
 Perform by using either of the parallel sort techniques
 eliminate duplicates as soon as they are found during sorting.
 Can also partition the tuples (using either range- or hash- partitioning) and
perform duplicate elimination locally at each processor.

3. Projection
 Projection without duplicate elimination can be performed as tuples are read in
from disk in parallel.
 If duplicate elimination is required, any of the above duplicate elimination
techniques can be used.

Other Relational Operations (Cont.)
4. Grouping or Aggregation:
 Partition the relation on the grouping attributes and then compute the
aggregate values locally at each processor.
 Can reduce cost of transferring tuples during partitioning by partly
computing aggregate values before partitioning.
 Consider the sum aggregation operation:
 Perform aggregation operation at each processor Pi on those tuples stored on
disk Di
 results in tuples with partial sums at each processor.
 Result of the local aggregation is partitioned on the grouping attributes, and the
aggregation performed again at each processor Pi to get the final result.
 Fewer tuples need to be sent to other processors during partitioning.

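A sketch of the partial-aggregation optimization for SUM (hypothetical helper names; tuples as dicts):

def local_partial_sums(tuples, group_attr, value_attr):
    # Phase 1: each processor aggregates its own tuples first, so only one
    # (group, partial-sum) pair per group is shipped during partitioning.
    sums = {}
    for t in tuples:
        sums[t[group_attr]] = sums.get(t[group_attr], 0) + t[value_attr]
    return sums

def combine_partial_sums(partials):
    # Phase 2: after repartitioning on the grouping attribute, each
    # processor adds up the partial sums it received.
    total = {}
    for part in partials:
        for group, s in part.items():
            total[group] = total.get(group, 0) + s
    return total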
Cost of Parallel Evaluation of Operations
 If there is no skew in the partitioning and no overhead due to the parallel evaluation, the expected time taken is 1/n of the sequential time (i.e., a speed-up of n)
 If skew and overheads are also to be taken into account, the time taken by a parallel operation can be estimated as
Tpart + Tasm + max(T0, T1, ..., Tn-1)
 Tpart is the time for partitioning the relations
 Tasm is the time for assembling the results
 Ti is the time taken for the operation at processor Pi
 this needs to be estimated taking into account the skew and the time wasted in contention.
Cost of Parallel Evaluation of Operations
 Calculating the cost of parallel evaluation of an operation must account for:
1. Start-up cost
2. Skew
3. Contention for resources
4. Cost of assembling
2. Interoperation parallelism
a) Pipeline Parallelism
b) Independent Parallelism
a) Pipelined parallelism
 Here the output tuples of one operation A are consumed by a second operation B.
 Pipelining is basically used for sequential access.
 It is possible to run operations A and B simultaneously on different processors,
 so that B consumes tuples in parallel with A producing them.
a) Pipelined parallelism
 Consider a join of four relations:
 r1 ⋈ r2 ⋈ r3 ⋈ r4
 Set up a pipeline that computes the three joins in parallel
 Let P1 be assigned the computation of temp1 = r1 ⋈ r2
 Let P2 be assigned the computation of temp2 = temp1 ⋈ r3
 And P3 be assigned the computation of temp2 ⋈ r4
 Each of these operations can execute in parallel, sending result tuples it computes to the next operation even as it is computing further results
 Provided a pipelineable join evaluation algorithm (e.g., indexed nested-loops join) is used
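The pipeline can be mimicked with Python generators, where each stage consumes tuples as soon as the previous stage yields them (a sketch: the attribute names 'a', 'b', 'c' are invented, and an indexed-nested-loops-style probe keeps each stage pipelineable):

def join_stage(left_stream, stored_relation, lkey, rkey):
    # Index the stored relation once, then probe it with each incoming
    # tuple, yielding joined tuples immediately (no blocking).
    index = {}
    for t in stored_relation:
        index.setdefault(t[rkey], []).append(t)
    for lt in left_stream:
        for rt in index.get(lt[lkey], []):
            yield {**lt, **rt}   # merge fields; assumes disjoint names

# temp1 = r1 join r2, temp2 = temp1 join r3, result = temp2 join r4:
# result = join_stage(join_stage(join_stage(iter(r1), r2, 'a', 'a'),
#                                r3, 'b', 'b'), r4, 'c', 'c')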
Factors Limiting Utility of Pipeline Parallelism
 Pipeline parallelism is useful since it avoids writing intermediate results to disk
 Useful with a small number of processors, but does not scale up well with more processors. One reason is that pipeline chains do not attain sufficient length.
 Cannot pipeline relational operators which do not produce output until all inputs have been accessed (e.g., aggregate and sort)
 Little speedup is obtained for the frequent cases of skew in which one operator's execution cost is much higher than the others.
b. Independent Parallelism
 Operations in a query expression that do not depend on one another can be executed in parallel
 Consider a join of four relations:
 r1 ⋈ r2 ⋈ r3 ⋈ r4
 Let P1 be assigned the computation of temp1 = r1 ⋈ r2
 And P2 be assigned the computation of temp2 = r3 ⋈ r4
 And P3 be assigned the computation of temp1 ⋈ temp2
 P1 and P2 can work independently in parallel
 P3 has to wait for input from P1 and P2
 Can pipeline the output of P1 and P2 to P3, combining independent parallelism and pipelined parallelism
 Does not provide a high degree of parallelism
 Useful with a lower degree of parallelism
 Less useful in a highly parallel system
Query Optimization
 Query optimization in parallel databases is significantly more complex than query optimization in sequential databases.
 Cost models are more complicated, since we must take into account partitioning costs and issues such as skew and resource contention.
 When scheduling an execution tree in a parallel system, we must decide:
 How to parallelize each operation and how many processors to use for it.
 What operations to pipeline, what operations to execute independently in parallel, and what operations to execute sequentially, one after the other.
1. Determining the amount of resources to allocate for each operation is a problem.
 E.g., allocating more processors than optimal can result in high communication overhead.
2. Long pipelines should be avoided, as the final operation may wait a long time for inputs while holding precious resources
Query Optimization (Cont.)
 The number of parallel evaluation plans from which to choose is much larger than the number of sequential evaluation plans.
 Therefore, heuristics are needed during optimization
 Two alternative heuristics for choosing parallel plans:
 No pipelining and no inter-operation parallelism; just parallelize every operation across all processors.
 Finding the best plan is now much easier --- use the standard optimization technique, but with a new cost model
 The Volcano parallel database popularized the exchange-operator model
 an exchange operator is introduced into query plans to partition and distribute tuples
 each operation works independently on local data on each processor, in parallel with other copies of the operation
 First choose the most efficient sequential plan, and then choose how best to parallelize the operations in that plan.
 Can explore pipelined parallelism as an option
 Choosing a good physical organization (partitioning technique) is important to speed up queries.
Design of Parallel Systems
Some issues in the design of parallel systems:
 Parallel loading of data from external sources is needed in order to handle
large volumes of incoming data.
1. Resilience to failure of some processors or disks.
 Probability of some disk or processor failing is higher in a parallel system.
 Operation (perhaps with degraded performance) should be possible in spite of
failure.
 Redundancy achieved by storing extra copy of every data item at another
processor.
 Eg: Teradata & Informix XPS

Design of Parallel Systems (Cont.)
2. On-line reorganization of data and schema changes must be supported.
 For example, index construction on terabyte databases can take hours or days
even on a parallel system.
 Need to allow other processing (insertions/deletions/updates) to be performed on
relation even as index is being constructed.
 Basic idea: index construction tracks changes and “catches up” on changes at
the end.
 Also need support for on-line repartitioning and schema changes
(executed concurrently with other processing).
 Eg: Compaq Himalaya

142

You might also like