Parallel Databases Pres

Parallel databases allow multiple machines to access a single database simultaneously for increased performance. They are increasingly used to store large volumes of data and handle high transaction volumes. Queries can be parallelized by partitioning data across disks and executing relational operations concurrently. There are different architectures for parallel databases, including shared memory, shared disk, and shared nothing approaches. Performance is measured based on speedup and scaleup.

Uploaded by

uzaifa
Copyright
© Attribution Non-Commercial (BY-NC)
Parallel Databases

04/25/2005 Yan Huang - CSCI5330 Database Implementation – Parallel Database

Introduction

• A variety of hardware architectures allow multiple computers to share access to data. A parallel database can allow access to a single database by users on multiple machines, with increased performance.

Purpose
• Databases are growing increasingly large.
  ◦ Large volumes of transaction data are collected and stored for later analysis.
  ◦ Multimedia objects like images are increasingly stored in databases.

Large-scale parallel database

• Large-scale parallel database systems are increasingly used for:
  ◦ storing large volumes of data
  ◦ providing high throughput for transaction processing

Queries

• Queries are expressed in a high-level language (SQL), which makes parallelization easier.
• Different queries can be run in parallel with each other. Concurrency control takes care of conflicts.
• Thus, databases naturally lend themselves to parallelism.

Parallelism in Databases
• Data can be partitioned across multiple disks for parallel I/O.
• Individual relational operations (e.g., sort, join, aggregation) can be executed in parallel.
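
As a minimal sketch of the idea above (not taken from the original slides), the following Python runs an aggregation on each partition independently and then merges the partial results. The names `partial_sum` and `parallel_sum` and the sample data are hypothetical. Threads are used only to show the structure; CPU-bound work in CPython would normally need separate processes or machines to run truly in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(partition):
    """Local aggregation: a worker sums only the values in its own partition."""
    return sum(partition)


def parallel_sum(partitions, max_workers=4):
    """Aggregate every partition concurrently, then merge the partial sums."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(partial_sum, partitions))
    return sum(partials)


# Hypothetical data: one list of values per disk.
partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
total = parallel_sum(partitions)
print(total)
```

The same two-step shape (local work per partition, then a merge) is what makes operations such as aggregation easy to parallelize once the data is partitioned.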

Parallel Database Architectures

• Shared memory
• Shared disk
• Shared nothing
• Hierarchical


• Shared memory -- processors share a common memory. (Extremely efficient communication between processors.)
• Shared disk -- processors share a common disk. (Shared-disk systems can scale to a somewhat larger number of processors, but communication between processors is slower.)

• Shared nothing -- each processor has exclusive access to its own main memory and disk(s); processors share neither.
• Hierarchical -- hybrid of the above architectures. (Combines characteristics of shared-memory, shared-disk, and shared-nothing architectures.)

Parallel Level
• A coarse-grain parallel machine consists of a small number of powerful processors.

Massively Parallel
• A massively parallel or fine-grain parallel machine utilizes thousands of smaller processors.

Parallel System Performance Measure
• Speedup = (small system elapsed time) / (large system elapsed time)
• Scaleup = (small system, small problem elapsed time) / (big system, big problem elapsed time)
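
A small worked example may make the two ratios concrete. The Python below is only a sketch: the function names and all timings are hypothetical, chosen to illustrate the definitions rather than taken from any measurement.

```python
def speedup(small_system_time, large_system_time):
    """Same problem, run on a small system and on a larger one."""
    return small_system_time / large_system_time


def scaleup(small_system_small_problem_time, big_system_big_problem_time):
    """Problem size and system size are both increased."""
    return small_system_small_problem_time / big_system_big_problem_time


# Hypothetical timings in seconds.
# Same query: 120 s on 1 node, 40 s on 4 nodes.
s = speedup(120.0, 40.0)
# 1 node on the small problem takes 100 s;
# 4 nodes on a 4x larger problem take 125 s.
u = scaleup(100.0, 125.0)
print(s, u)
```

Here the speedup is 3 on a system with 4 times the resources, and the scaleup is 0.8. Both fall short of the ideal values (4 and 1 respectively), which is what "sublinear" means for these measures.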

Database Performance Measures
• Throughput -- the number of tasks that can be completed in a given time interval.
• Response time -- the amount of time it takes to complete a single task from the time it is submitted.
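
To show how the two measures differ, here is a minimal Python sketch that computes both from a list of tasks. The `(submit_time, finish_time)` representation, the function names, and the sample values are assumptions made for illustration.

```python
def throughput(tasks, interval_start, interval_end):
    """Tasks completed per unit of time within the given interval."""
    completed = sum(
        1 for _, finish in tasks if interval_start <= finish <= interval_end
    )
    return completed / (interval_end - interval_start)


def response_times(tasks):
    """Time from submission to completion, one value per task."""
    return [finish - submit for submit, finish in tasks]


# Hypothetical tasks as (submit_time, finish_time) pairs, in seconds.
tasks = [(0.0, 2.0), (1.0, 4.0), (3.0, 5.0), (6.0, 10.0)]
tp = throughput(tasks, 0.0, 10.0)
rts = response_times(tasks)
print(tp, rts)
```

Throughput describes the system as a whole (0.4 tasks per second here), while response time is a per-task figure, so a system can improve one without improving the other.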

Factors Limiting Speedup and Scaleup

Speedup and scaleup are often sublinear due to:

  ◦ Startup costs: The cost of starting up multiple processes may dominate computation time if the degree of parallelism is high.
  ◦ Interference: Processes accessing shared resources (e.g., system bus, disks, or locks) compete with each other.
  ◦ Skew: Overall execution time is determined by the slowest of the tasks executing in parallel.
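
The three factors can be put into a toy cost model. The Python below is a deliberately simplified sketch and not a model from the slides: it assumes startup cost grows linearly with the number of processes and treats interference as a simple multiplier, and every name and number in it is hypothetical.

```python
def parallel_elapsed(task_times, startup_per_process, interference_factor=1.0):
    """Toy model of elapsed time for tasks that run in parallel."""
    # Startup: assumed here to grow with the number of processes started.
    startup = startup_per_process * len(task_times)
    # Skew: the slowest task decides when the whole job finishes.
    # Interference: modeled as a multiplier (>= 1) on that task's time.
    slowest = max(task_times) * interference_factor
    return startup + slowest


serial_time = 100  # hypothetical time on a single processor

# The same 100 units of work split over 4 processes, evenly and unevenly.
balanced = parallel_elapsed([25, 25, 25, 25], startup_per_process=1)
skewed = parallel_elapsed([70, 10, 10, 10], startup_per_process=1)

speedup_balanced = serial_time / balanced
speedup_skewed = serial_time / skewed
print(balanced, skewed, speedup_balanced, speedup_skewed)
```

Even the evenly split case stays below the ideal speedup of 4 because of startup cost, and the skewed split does much worse because one process carries most of the work.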

Parallel Database Issues
• Data Partitioning
• Parallel Query Processing

I/O Parallelism
• Horizontal partitioning – tuples of a relation are divided among many disks such that each tuple resides on one disk.
• Partitioning techniques (number of disks = n):
  ◦ Round-robin
  ◦ Hash partitioning
  ◦ Range partitioning
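
The slide only names the three techniques, so the Python below fills in their usual textbook definitions as an assumption: round-robin sends the i-th tuple to disk i mod n, hash partitioning applies a hash function with range 0..n-1 to the partitioning attribute, and range partitioning uses a sorted partitioning vector of n-1 boundary values. All function names, the CRC32-based hash, and the sample keys are hypothetical.

```python
import zlib
from bisect import bisect_right


def round_robin_disk(i, n):
    """The i-th tuple in insertion order goes to disk i mod n."""
    return i % n


def hash_disk(key, n):
    """Hash the partitioning attribute to a disk in 0..n-1 (CRC32 for illustration)."""
    return zlib.crc32(str(key).encode("utf-8")) % n


def range_disk(key, vector):
    """vector holds n-1 sorted boundaries; keys below vector[0] go to disk 0."""
    return bisect_right(vector, key)


def partition(keys, n, disk_of):
    """Distribute keys over n disks; disk_of(i, key) picks the disk."""
    disks = [[] for _ in range(n)]
    for i, key in enumerate(keys):
        disks[disk_of(i, key)].append(key)
    return disks


keys = [5, 12, 27, 18, 3, 30]  # hypothetical partitioning-attribute values
n = 3
by_round_robin = partition(keys, n, lambda i, k: round_robin_disk(i, n))
by_hash = partition(keys, n, lambda i, k: hash_disk(k, n))
by_range = partition(keys, n, lambda i, k: range_disk(k, [10, 20]))
print(by_round_robin, by_hash, by_range)
```

Round-robin ignores the key entirely, which is why it balances load well but gives no help in locating a particular tuple; the other two place a tuple according to its key.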

Typical Database Query Types
• Sequential scan
• Point query
• Range query

Comparison of Partitioning Techniques

                  Round Robin             Hashing             Range
Sequential Scan   Best/good parallelism   Good                Good
Point Query       Difficult               Good for hash key   Good for range vector
Range Query       Difficult               Difficult           Good for range vector
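
One way to read the comparison is to ask how many disks a query has to touch. The Python below is a sketch under stated assumptions: the query is on the partitioning attribute, hash partitioning uses a CRC32-based hash chosen only for illustration, and range partitioning uses a sorted vector of n-1 boundaries. The function names and sample values are hypothetical.

```python
import zlib
from bisect import bisect_right


def hash_disk(key, n):
    """Illustrative hash function mapping a key to a disk in 0..n-1."""
    return zlib.crc32(str(key).encode("utf-8")) % n


def disks_to_scan(scheme, n, lo, hi, vector=None):
    """Disks that must be read for lo <= key <= hi (lo == hi is a point query)."""
    if scheme == "range":
        # Only the disks whose key ranges overlap [lo, hi] are needed.
        first = bisect_right(vector, lo)
        last = bisect_right(vector, hi)
        return list(range(first, last + 1))
    if scheme == "hash" and lo == hi:
        # A point query on the hash key goes to exactly one disk.
        return [hash_disk(lo, n)]
    # Round-robin (any query) and hash (range query): key location is unknown.
    return list(range(n))


n = 3
vector = [10, 20]  # hypothetical partitioning vector for 3 disks
point_rr = disks_to_scan("round_robin", n, 15, 15)
point_hash = disks_to_scan("hash", n, 15, 15)
point_range = disks_to_scan("range", n, 15, 15, vector)
range_hash = disks_to_scan("hash", n, 12, 18)
range_range = disks_to_scan("range", n, 12, 18, vector)
print(point_rr, point_hash, point_range, range_hash, range_range)
```

"Difficult" in the table corresponds to the cases where every disk must be searched, and "Good" to the cases where the partitioning scheme narrows the search to one disk or a few.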
