ADVDB OR DD Parallel Aau
1
Agenda
Database models
Object Oriented model
Object relational model
Distributed Database
Introduction
DDBMS Architecture
DDB Design
Distributed Query Processing
2
1. Database Model
3
Recap
Database
Types of databases
Relational
Attributes, tuple; order; relationship
Non-relational
Topology; relationship; order
Object Oriented model
4
Relational model
Mathematical set: Table = {<a1: v1, …, an: vn>}
Two-dimensional => attributes, tuples
No Ordering between attributes, no ordering among
tuples
Relationship
1-1
1-n
n-m is not supported directly; it is decomposed into two 1-n relationships
5
Relational model: Example
Students(ID, Fullname, sex, telno, hobbies, address, …)
Departments (DepID, dName, ChairPerson, teleno,… )
Course(CID, Title, Description, Crhr, …)
Employees(EID, FullName, sex, address, qualification, hobbies, photo,
…)
Rules:
Entity constraint
Referential integrity rule
How many telno/ hobbies a student or an Employee can have?
What is address/ qualification and what can be its content?
Relationship
Students and department
Employees and Departments
Students & courses
Any relationship/ similarity between Student and Employee
6
Non-relational
Topology – hierarchical, network-based, tree, graph, linked list, …
Relationship: 1-1, 1-n, n-m
Order is important
E.g., XML, OODB, …
7
OO database concept
Representing complex object
Encapsulation
Class
Inheritance
8
OO database concept
Association: the link between entities in an application.
It is represented by means of references between objects.
It can be binary or ternary, and may have an inverse.
9
ADVANTAGES OF OODB
An integrated repository of information that is shared by
multiple users, multiple products, multiple applications on
multiple platforms.
It also solves the following problems:
The semantic gap: the real world and the conceptual model are very
similar.
Impedance mismatch: programming languages and database systems
must be interfaced to solve application problems, but the language
style and data structures of a programming language (such as C) and of
the DBMS (such as Oracle) differ. The OODB supports general-purpose
programming within the OODB framework.
New application requirements: especially in OA, CAD, CAM, and CASE,
object orientation is the most natural and convenient.
10
Complex object model
Allows
Sets of atomic values
Tuple-valued attributes
Sets of tuples (nested relations)
General set and tuple constructors
Object identity
Thus, formally
Every atomic value in A is an object.
If a1, ..., an are attribute names in N, and O1, ..., On are objects,
then T = [a1:O1, ..., an:On] is also an object, and T.ai retrieves the
value Oi.
If O1, ..., On are objects, then S = {O1, ..., On} is also an object.
11
Object Model
An object is defined by a triple (OID, type constructor, state)
where OID is the unique object identifier,
type constructor is its type (such as atom, tuple, set, list, array, bag,
etc.) and state is its actual value.
Example:
(i1, atom, 'John')
(i2, atom, 30)
(i3, atom, 'Mary')
(i4, atom, 'Mark')
(i5, atom, 'Vicki')
(i6, tuple, [Name:i1, Age:i2])
(i7, set, {i4, i5})
(i8, tuple, [Name:i3, Friends:i7])
(i9, set, {i6, i8})
12
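The (OID, type constructor, state) triples above can be made concrete with a small, hypothetical sketch (the `store` dictionary and `deref` helper are inventions for illustration, not part of any real OODB): objects live in a store keyed by OID, and reading a tuple or set state dereferences the OIDs it contains.

```python
# Sketch of the object model example: each entry is (type constructor, state).
# Tuple states map attribute names to OIDs; set states are sets of OIDs.
store = {
    "i1": ("atom", "John"),
    "i2": ("atom", 30),
    "i3": ("atom", "Mary"),
    "i4": ("atom", "Mark"),
    "i5": ("atom", "Vicki"),
    "i6": ("tuple", {"Name": "i1", "Age": "i2"}),
    "i7": ("set", {"i4", "i5"}),
    "i8": ("tuple", {"Name": "i3", "Friends": "i7"}),
    "i9": ("set", {"i6", "i8"}),
}

def deref(oid):
    """Recursively replace OIDs by their states to materialize a value."""
    ctor, state = store[oid]
    if ctor == "atom":
        return state
    if ctor == "tuple":
        return {attr: deref(o) for attr, o in state.items()}
    if ctor == "set":
        # sort OIDs so the materialized list is deterministic
        return [deref(o) for o in sorted(state)]
    raise ValueError(f"unknown constructor {ctor}")
```

Materializing i8 yields Mary together with her friends Mark and Vicki, which shows how object identity lets i7 be shared or updated independently of the tuples that reference it.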
OBJECT-ORIENTED DATABASES
OODB = Object Orientation + Database Capabilities
13
OODB
RESEARCH PROTOTYPES
ORION: Lisp-based system
IRIS: Functional data model, version control, object-SQL.
Galileo: Strongly typed language, complex objects.
PROBE.
POSTGRES: Extended relational database supporting objects.
COMMERCIAL OODB
O2: O2 Technology. Language O2C to define classes, methods and types. Supports multiple
inheritance. C++ compatible. Supports an extended SQL language O2SQL which can refer
to complex objects.
G-Base: Lisp-based system, supports ADT, multiple inheritance of classes.
CORBA: Standards for distributed objects.
GemStone: Earliest OODB supporting object identity, inheritance, encapsulation. Language
OPAL is based upon Smalltalk.
Ontos: C++ based system, supports encapsulation, inheritance, ability to construct
complex objects.
Object Store: C++ based system. A good feature is that it supports the creation of indexes.
Statics: Supports entity types, set valued attributes, and inheritance of entity types and
methods.
14
OODB
COMMERCIAL OODB
Relational DB Extensions: Many relational systems support
OODB extensions.
User-defined functions (dBase).
User-defined ADTs (POSTGRES)
Very-long multimedia fields (BLOB or Binary Large Object). (DB2
from IBM, SQL from SYBASE, Informix, Interbase)
15
OODB Implementation Strategies
Develop novel database data model or data language (SIM)
Extend an existing database language with object-oriented
capabilities. (IRIS, O2 and VBASE/ONTOS extended SQL)
Extend existing object-oriented programming language
with database capabilities (GemStone OPAL extended
SmallTalk)
Extendable object-oriented DBMS library (ONTOS)
16
ODL A Class With Key and Extent
A class definition with “extent”, “key”, and more elaborate
attributes; still relatively straightforward
class Person (extent persons key ssn) {
attribute struct Pname {string fname …} name;
attribute string ssn;
attribute date birthdate;
…
short age();
}
An OQL query over a class extent reads much like SQL, e.g. (over a departments extent):
SELECT d.name
FROM departments d
WHERE d.college = 'Engineering';
Review Questions
What are the main assumptions in Relational model?
What are the basic features of relational model?
State the drawbacks/limitations of the relational database model.
State the advantages of the Object-Oriented database model.
Explain the challenges of the Object-Oriented database model.
19
Object-Relational Data Models
Extend the relational data model by including object
orientation and constructs to deal with added data types.
Allow attributes of tuples to have complex types, including
non-atomic values such as nested relations.
Preserve relational foundations, in particular the
declarative access to data, while extending modeling
power.
Upward compatibility with existing relational languages.
20
Nested Relations
Motivation:
Permit non-atomic domains (atomic = indivisible)
Example of a non-atomic domain: a set of integers, or a set of tuples
Allows more intuitive modeling for applications with complex
data
Intuitive definition:
allow relations wherever we allow atomic (scalar) values –
relations within relations
Retains mathematical foundation of relational model
Violates first normal form.
21
Example of a Nested Relation
Example: library information system
Each book has
title,
a set of authors,
Publisher, and
a set of keywords
Non-1NF relation books
22
1NF Version of Nested Relation
1NF version of books
flat-books
23
4NF Decomposition of Nested Relation
Remove awkwardness of flat-books by assuming that the
following multi-valued dependencies hold:
title ↠ author
title ↠ keyword
title ↠ pub-name, pub-branch
Decompose flat-books into 4NF using the schemas:
(title, author)
(title, keyword)
(title, pub-name, pub-branch)
24
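The decomposition above can be sketched with invented sample data (one book with two authors and two keywords — the rows are illustrative, not from the slides): splitting the nested record into the three 4NF schemas and then re-joining them on title shows exactly why flat-books is redundant.

```python
# One nested book record (sample data for illustration).
books = [
    {"title": "Compilers", "authors": ["Smith", "Jones"],
     "publisher": ("McGraw-Hill", "New York"),
     "keywords": ["parsing", "analysis"]},
]

# The three 4NF relations: (title, author), (title, keyword),
# (title, pub-name, pub-branch).
title_author = [(b["title"], a) for b in books for a in b["authors"]]
title_keyword = [(b["title"], k) for b in books for k in b["keywords"]]
title_pub = [(b["title"], *b["publisher"]) for b in books]

# flat-books = natural join of the 4NF relations on title:
# every (author, keyword) combination appears, so each book contributes
# |authors| * |keywords| rows.
flat_books = [
    (t1, a, k, pn, pb)
    for (t1, a) in title_author
    for (t2, k) in title_keyword if t2 == t1
    for (t3, pn, pb) in title_pub if t3 == t1
]
```

The single nested book becomes 2 × 2 = 4 flat rows, which is the redundancy (and the lost one-to-one correspondence) discussed on the next slides.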
4NF Decomposition of flat–books
25
Problems with 4NF Schema
4NF design requires users to include joins in their
queries.
1NF relational view flat-books defined by join of 4NF
relations:
eliminates the need for users to perform joins,
but loses the one-to-one correspondence between tuples and books
And has a large amount of redundancy
Nested relations representation is much more natural
here.
26
Complex Types and SQL:1999
Extensions to SQL to support complex types include:
Collection and large object types
Nested relations are an example of collection types
Structured types
Nested record structures like composite attributes
Inheritance
Object orientation
Including object identifiers and references
27
Collection Types
Set type (not in SQL:1999)
create table books (
…..
keyword-set setof(varchar(20))
……
)
Sets are an instance of collection types. Other instances
include
Arrays (are supported in SQL:1999)
E.g. author-array varchar(20) array[10]
Can access elements of array in usual fashion:
E.g. author-array[1]
Multisets (not supported in SQL:1999)
I.e., unordered collections, where an element may occur multiple
times
Nested relations are sets of tuples
SQL:1999 supports arrays of tuples
28
Large Object Types
Large object types
clob: Character large objects
book-review clob(10KB)
blob: binary large objects
image blob(10MB)
movie blob (2GB)
29
Structured and Collection Types
(PostgreSQL)
Structured types can be declared and used in SQL
CREATE TYPE Publisher as (name varchar(20),
branch varchar(20));
31
Structured Types (Cont.)
Add two records into the books table
Insert into books (title, authors, pub_date, pub, keywords) values
('Compilers', '{"Smith","Jones"}', now()::date,
 row('McGraw-Hill','New York')::publisher, '{"Parsing","Analysis"}'),
('Networks', '{"Jones","Frick"}', now()::date,
 row('Oxford','London')::publisher, '{"Internet","Web"}');
Retrieve the content of the books table – two rows will be returned
Select * from Books;
34
Inheritance in PostgreSQL
PostgreSQL supports only table inheritance, not the type
inheritance defined in SQL:1999
create type Person_Ty as (PID varchar (20), fullname name_type,
address full_address);
create table People of Person_ty;
Create table Emps (id serial, salary numeric) INHERITS (people);
-- inherits columns of the base table people
Inserting data into the Emps table also adds part of the data to the base
table people, but the reverse is not true
Insert into emps (pid, fullname,address,salary) values (1245, row('Dawit',
'bekele')::name_type, row('DZ','AM')::full_address, 9878)
35
Structured and Collection Types (Oracle)
Structured types can be declared and used in SQL
CREATE OR REPLACE TYPE Publisher as Object (name varchar(20), branch
varchar(20));
/
CREATE OR REPLACE TYPE VA as VARRAY (5) of VARCHAR(30);
/
CREATE OR REPLACE TYPE Book AS OBJECT (title varchar(20), authors VA,
pub_date date, pub Publisher, keywords VA);
/
Structured types can be used to create tables
37
Structured Types (Cont.)
Creating tables without creating an intermediate type
For example, the table books could also be defined as follows:
Create table books
(title varchar(20), authors VA,
pub_date date, pub Publisher, keywords VA);
38
Creation of Values of Complex Types
Values of structured types are created using
constructor functions
E.g. Publisher('McGraw-Hill', 'New York')
Note: a value is not an object
39
Creation of Values of Complex Types
To insert the preceding tuple into the relation books
Insert into books (title, authors, pub, keywords) values
('Compilers', VA('Smith', 'Jones'),
Publisher('McGraw-Hill', 'New York'), VA('parsing','analysis'));
40
Inheritance (Oracle)
Suppose that we have the following type definition for people:
create or replace type Person_typ as Object
(name varchar(20),
address varchar(20)) not final;
/
[Figure: type hierarchy with Person_Typ as supertype of Teacher_Typ and Student_Typ]
Using inheritance to define the student and teacher types:
create or replace type Student_typ UNDER Person_typ
(degree varchar(20),
department varchar(20)) not final;
/
41
Reference Types
Object-oriented languages provide the ability to create
and refer to objects.
In SQL:1999
References are to tuples, and
References must be scoped,
I.e., can only point to tuples in one specified table
42
Reference Declaration in SQL:1999
E.g., define a type Department with a field name and a field
head that is a reference to a Person_typ, scoped to the table people:
create type Department as Object
(name varchar(20), head ref Person_typ);
43
Initializing Reference Typed Values
In Oracle, to create a tuple with a reference value, first
create the tuple with a null reference and then set the
reference separately using the function ref(p) applied to a
tuple variable
45
Nested Table
CREATE TYPE animal_ty AS OBJECT (breed
VARCHAR(25), name VARCHAR(25), birthdate DATE);
/
CREATE TYPE animals_nt AS TABLE OF animal_ty;
/
CREATE TABLE breeder (breederName VARCHAR(25),
animals animals_nt)
nested table animals store as animals_nt_tab;
48
Comparison of O-O and O-R Databases
Relational systems
simple data types, powerful query languages, high protection.
Persistent-programming-language-based OODBs
complex data types, integration with programming language,
high performance.
Object-relational systems
complex data types, powerful query languages, high protection.
Note: Many real systems blur these boundaries
E.g. persistent programming language built as a wrapper on a
relational database offers first two benefits, but may have poor
performance.
49
Template for the review report
Introduction
Motivation
Problem statement
Proposed solution
Critiques by the reviewers (both positive and negative)
Conclusion
50
Distributed Database
51
Outline
Distributed Database
Introduction
DDBMS Architecture
DDB Design
Distributed Query Processing
52
1. Introduction to Distributed
Database
53
File Systems
[Figure: Programs 1–3 each carry their own data description and access their own file (File 1–File 3); the same data is stored redundantly across the files]
54
Database Management
[Figure: application programs 1–3 (each with data semantics) share a single DATABASE through common data description and data manipulation services]
55
Quiz
https://app.sli.do/event/mDVTamzrFpEb4VMLxvzRgh/live/q
uestions
Or https://www.slido.com/
Code: #2177058
56
Objective of database technology
The key objective of DBS is Integration not centralization
57
Motivation
Distributed database systems arise from combining integration (database technology) with distribution (computer networks).
Integration ≠ centralization
58
Quiz
https://app.sli.do/event/mDVTamzrFpEb4VMLxvzRgh/live/q
uestions
Or https://www.slido.com/
Code: #2177058
59
What is distributed …
Processing logic or processing elements
Functions
Data
Control
60
Classification of Distributed computing
Criteria [Bochmann, 1983]
Degree of coupling – how closely the processing elements are
connected together
Amount of Data exchanged/ amount of local processing
Weak vs strong coupling
Interconnection structure
Point-to-point interconnection between processing units
Common interconnection channel
Interdependence of components
Synchronization between components
Synchronous or asynchronous
61
What is a Distributed Database
System?
A distributed database (DDB) is a collection of multiple,
logically interrelated databases distributed over a computer
network.
A distributed database management system (D–DBMS) is
the software that manages the DDB and provides an
access mechanism that makes this distribution
transparent to the users.
Distributed database system (DDBS) = DDB + D–DBMS
62
What is not a DDBS?
A timesharing computer system
A loosely or tightly coupled multiprocessor system
A database system which resides at one of the nodes of a
network of computers - this is a centralized database on
a network node
63
Shared-Memory Architecture
[Figure: processors P1 … Pn sharing a common memory]
64
Shared-Disk Architecture
[Figure: several computer systems sharing common secondary storage]
65
Shared-Nothing Architecture
[Figure: shared-nothing nodes, each with private memory and disk, connected only by a switch]
66
Centralized DBMS on a Network
[Figure: sites 1–4 connected by a communication network; the database resides at a single site]
67
Distributed DBMS Environment
[Figure: sites 1–4 connected by a communication network, with DBMS functionality at every site]
68
Implicit Assumptions
Data stored at a number of sites ➯ each site logically
consists of a single processor.
Processors at different sites are interconnected by a
computer network ➯ no multiprocessors (those are
parallel database systems)
Distributed database is a database, not a collection of files
➯ data logically related as exhibited in the users’ access
patterns
relational data model
D-DBMS is a full-fledged DBMS
not remote file system, not a TP system
69
Promises of Distributed DBMS
Transparent management of distributed, fragmented, and
replicated data
Improved reliability/availability through distributed
transactions
Improved performance
Easier and more economical system expansion
70
Transparency
Transparency is the separation of the higher level
semantics of a system from the lower level
implementation issues.
Fundamental issue is to provide
Data independence in the distributed environment
Network (distribution) transparency
Replication transparency
Fragmentation transparency
horizontal fragmentation: selection
vertical fragmentation: projection
hybrid
71
Review questions
List any four promises of distributed databases, with
examples.
Explain network, location, fragmentation and replication
transparency types with example.
73
Introduction: Architecture
Defines the structure of the system, i.e.,
the components of the system are identified,
the functions of each component are specified, and
the interrelationships and interactions among these
components are defined
74
DBMS Standardization
Reference Model
A conceptual framework whose purpose is to divide standardization
work into manageable pieces and to show at a general level how these
pieces are related to one another. (e.g., ISO/OSI)
75
DBMS Standardization …
2. Function-based
Classes of users are identified together with the functionality that the system
will provide for each class (e.g., ISO/OSI)
The objectives of the system are clearly identified. But it gives very little
insight into how these objectives are attained
3. Data-based
Identify the different types of data and specify the functional units that will
realize and/or use data according to these views.
76
ANSI/SPARC Architecture
77
Conceptual Schema Definition
RELATION PROJ [
KEY = {PNO}
ATTRIBUTES = {
PNO : CHARACTER(7)
PNAME : CHARACTER(20)
BUDGET : NUMERIC(7)
LOC : CHARACTER(15)
}
]
RELATION ASG [
KEY = {ENO,PNO}
ATTRIBUTES = {
ENO : CHARACTER(9)
PNO : CHARACTER(7)
RESP : CHARACTER(10)
DUR : NUMERIC(3)
}
]
78
Internal Schema Definition
RELATION EMP [
KEY = {ENO}
ATTRIBUTES = {
ENO : CHARACTER(9)
ENAME : CHARACTER(15)
TITLE : CHARACTER(10)
}
]
INTERNAL_REL E [
INDEX ON E# CALL EMINX
FIELD = {
E# : BYTE(9)
ENAME : BYTE(15)
TIT : BYTE(10)
}
]
79
External View Definition –
Example 1
Create a BUDGET view from the PROJ relation
80
External View Definition –
Example 2
Create a Payroll view from relations EMP and
Pay
81
Architectural models for Distributed DBMS
Ways to organize multiple databases for sharing across
multiple DBMSs
82
Dimensions of the Problem
Distribution
Heterogeneity
83
Dimensions of the Problem
Autonomy
Refers to the degree to which individual DBMSs can operate independently
Various dimensions:
Design autonomy: Ability of a component DBMS to decide on issues related to its own
data model, design and transaction management techniques.
84
Architectural alternatives
(A0, D0, H0): logically integrated system
A set of homogeneous multiple DBMSs
86
Architectural …
(A1, D1, H1): distributed heterogeneous federated DBMS
(A2, D0, H0): fully autonomous (multidatabase) system; the
components do not know how to talk to each other
It is an autonomous collection of homogeneous DBMSs
A multi-DBMS is the software that manages the multidatabase
and provides transparent access to it
88
Client/server
Task distribution
89
Advantages of Client-Server Architectures
More efficient division of labor
Horizontal and vertical scaling of resources
Better price/performance on client machines
Ability to use familiar tools on client machines
Client access to remote data (via standards)
Full DBMS functionality provided to client workstations
Overall better system price/performance
90
Problems With Multiple-Client/Single Server
Server forms bottleneck
Server forms single point of failure
Database scaling difficult
91
Multiple client- multiple server
94
Components of DDBMS
95
MDBMS architecture with GCS
96
Components of a Multi-DBMS
97
Review questions
State the three reference database architectures in DDB.
Considering the data-oriented approach (ANSI/SPARC), explain:
its three layers and the associated actors
at least three advantages
List the building blocks of the client-server database
architecture along with the services in each component.
Explain the main characteristics of the P2P database
architecture.
What is a multi-database?
98
3. Distributed Database Design
99
Introduction
The design of a DDB involves
making decisions on the placement of data and programs across
the sites of a computer network, as well as possibly designing
the network itself
100
Design strategies
Top-down
Based on designing systems from scratch
Begins with the requirement analysis that defines the
environment of the system and elicits both the data and
processing needs of all potential database users
It is applicable for the design of homogeneous databases
Bottom-up
When the databases already exist at a number of sites
Design involves integrating databases into one database
Integrate Local schema into Global schema
It is ideal in the context of heterogeneous databases
101
Top-Down Design Process
102
Distribution Design Issues
Why fragment at all?
103
Reasons for Fragmentation
Can't we just distribute relations?
What is a reasonable unit of distribution?
Relation:
views are subsets of relations → locality
but extra communication
Fragments of relations (sub-relations):
concurrent execution of a number of transactions that access
different portions of a relation
but views that cannot be defined on a single fragment require extra
processing
and semantic data control (especially integrity enforcement) is more
difficult
104
Correctness of fragmentation
Given a relation R and its fragments R1, …, Rn, the
fragmentation is correct if the following three conditions
hold:
Completeness: every tuple of the original relation appears in some
fragment
Reconstructability: there exists an operator that can rebuild the
original relation from the fragments
Disjointness: there is no overlap between fragments
105
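The three rules above are mechanical enough to check in code. A minimal sketch, with an invented EMP relation and a union-based reconstruction (so it applies to horizontal fragmentation; the function name and sample data are assumptions for illustration):

```python
def check_horizontal_fragmentation(relation, fragments):
    """Return (complete, reconstructable, disjoint) for a horizontal
    fragmentation, where reconstruction is the union of the fragments."""
    all_tuples = set().union(*fragments) if fragments else set()
    complete = relation <= all_tuples          # every tuple lands somewhere
    reconstructable = all_tuples == relation   # union rebuilds exactly R
    # if fragment sizes sum to the union's size, no tuple is duplicated
    disjoint = sum(len(f) for f in fragments) == len(all_tuples)
    return complete, reconstructable, disjoint

# Invented sample relation EMP(ENO, SAL), fragmented on salary.
emp = {("E1", 25000), ("E2", 40000), ("E3", 31000)}
low = {t for t in emp if t[1] <= 30000}
high = {t for t in emp if t[1] > 30000}
```

With the complementary predicates SAL ≤ 30000 and SAL > 30000 all three checks pass; replacing `high` with the whole relation would keep completeness and reconstruction but break disjointness.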
Fragmentation Alternatives-Horizontal
106
Fragmentation Alternatives-Vertical
107
Degree of Fragmentation
108
Fragmentation
Horizontal Fragmentation (HF)
Primary Horizontal Fragmentation (PHF)
109
PHF – Information Requirements
Application Information
minterm selectivity: sel(mi)
The number of tuples of the relation that would be accessed by a
user query which is specified according to a given minterm predicate
mi
access frequencies: acc(qi)
The frequency with which a user application accesses data. If Q = {q1,
q2, …, qq} is a set of user queries, acc(qi) indicates the access
frequency of the query qi in a given period
Acc(mi) is computed from the acc(qi) that constitute the minterm
110
Primary Horizontal Fragmentation
Definition:
Rj = σFj (R ), 1 ≤ j ≤ w
where Fj is a selection formula, which is (preferably) a minterm
predicate
A horizontal fragment Ri of relation R consists of all the tuples
of R which satisfy a minterm predicate mi
Given a set of minterm predicates M, there are as many
horizontal fragments of relation R as there are minterm
predicates
Set of horizontal fragments also referred to as minterm
fragments
111
PHF – Algorithm
Given:
A relation R, the set of simple predicates Pr
Output:
The set of fragments of R = {R1, R2,…,Rw} which obey the
fragmentation rules.
Preliminaries :
1. Pr should be complete
2. Pr should be minimal
112
Completeness of Simple Predicates
A set of simple predicates Pr is said to be complete iff
any two tuples of the same minterm fragment defined on
Pr have the same probability of being accessed by every
application
Example:
Assume PROJ[PNO, PNAME, BUDGET, LOC] has two
applications defined on it
Find the budgets of projects at each location (1)
Find projects with budgets less than $200000 (2)
113
Completeness of Simple Predicates
According to (1),
Pr = {LOC=“Montreal”, LOC=“New York”, LOC=“Paris”}
which is not complete with respect to (2).
Modify
Pr = {LOC=“Montreal”, LOC=“New York”, LOC=“Paris”,
BUDGET≤200000, BUDGET>200000}
which is complete.
114
Minimality of Simple Predicates
If a predicate influences how fragmentation is performed,
(i.e., causes a fragment f to be further fragmented into,
say, fi and fj) then there should be at least one application
that accesses fi and fj differently
In other words, the simple predicate should be relevant in
determining a fragmentation.
If all the predicates of a set Pr are relevant, then Pr is
minimal
115
Minimality of Simple Predicates
Example:
Pr = {LOC=“Montreal”, LOC=“New York”, LOC=“Paris”,
BUDGET≤200000, BUDGET>200000} is complete and minimal.
However, if we add
PNAME = “Instrumentation”
then Pr is no longer minimal, since no application accesses the
resulting fragments differently.
116
COM_MIN Algorithm
Given:
a relation R and a set of simple predicates Pr
Output:
a complete and minimal set of simple predicates Pr' for Pr
Rule 1:
a relation or fragment is partitioned into at least two
parts which are accessed differently by at least one
application.
117
COM_MIN Algorithm
❶ Initialization
find a pi ∈ Pr such that pi partitions R according to Rule 1
set Pr' = pi ;
Pr ←Pr – pi;
F ←fi
❷ Iteratively add predicates to Pr' until it is complete
find a pj ∈ Pr such that pj partitions some fk defined according to a
minterm predicate over Pr' according to Rule 1
set Pr' = Pr' ∪ pj ;
Pr ← Pr – pj ;
F ← F ∪ fj
if ∃pk ∈ Pr' which is non-relevant then
Pr' ← Pr' – pk
F ← F – fk
118
PHORIZONTAL Algorithm
Makes use of COM_MIN to perform fragmentation
Input:
a relation R and a set of simple predicates Pr
Output:
a set of minterm predicates M according to which
relation R is to be fragmented
119
PHF – Example
Two candidate relations : Skill and PROJ.
Fragmentation of relation Skill
Application: check the salary info and determine raises
Employee records are kept at two sites, so the application runs at both sites
Simple predicates
p1: SAL ≤ 30000
p2: SAL > 30000
Pr = {p1, p2}, which is complete and minimal; Pr' = Pr
Minterm predicates
m1: (SAL ≤ 30000)
m2: NOT(SAL ≤ 30000) = (SAL > 30000)
120
PHF - Example
Skill1 Skill2
121
PHF - Example
Fragmentation of relation PROJ
Applications:
Find the name and budget of projects given their location
Issued at three sites
Access project information according to budget
one site accesses projects with BUDGET ≤ 200000, the other those with BUDGET > 200000
Simple predicates
For application (1)
p1 : LOC = “Montreal”
p2 : LOC = “New York”
p3 : LOC = “Paris”
For application (2)
p4 : BUDGET ≤ 200000
p5 : BUDGET > 200000
Pr = Pr' = {p1,p2,p3,p4,p5}
122
PHF – Example
Fragmentation of relation PROJ continued
Minterm fragments left after elimination
m1 : (LOC = “Montreal”) ∧ (BUDGET ≤ 200000)
m2 : (LOC = “Montreal”) ∧ (BUDGET > 200000)
m3 : (LOC = “New York”) ∧ (BUDGET ≤ 200000)
m4 : (LOC = “New York”) ∧ (BUDGET > 200000)
m5 : (LOC = “Paris”) ∧ (BUDGET ≤ 200000)
m6 : (LOC = “Paris”) ∧ (BUDGET > 200000)
123
PHF Correctness
Completeness
Since Pr' is complete and minimal, the selection predicates are
complete
Reconstruction
If relation R is fragmented into FR = {R1,R2,…,Rr}
R = ∪ Ri, ∀Ri ∈ FR
Disjointness
Minterm predicates that form the basis of fragmentation
should be mutually exclusive.
124
Review Question
Given relation EMP (ENO, ENAME, Title,…), let p1: TITLE <
“Programmer” and p2: TITLE > “Programmer” be two
simple predicates. Assume each attribute is string type.
(a) Perform a horizontal fragmentation of relation EMP
with respect to {p1, p2}
(b) Explain why the resulting fragmentation (EMP1, EMP2)
does not fulfill the correctness rules of fragmentation.
(c) Explain what should be done to make the fragmentation
correct
125
Derived Horizontal Fragmentation
Defined on a member relation of a link according to a
selection operation specified on its owner.
Each link is an equijoin.
Equijoin can be implemented by means of semi-joins.
126
DHF – Definition
Given a link L where owner(L)=S and member(L)=R, the
derived horizontal fragments of R are defined as
Ri = R ⋉Si, 1≤i≤w
127
DHF- Example
Given link L1 where owner(L1) = SKILL and member(L1) = EMP:
EMP1 = EMP ⋉ SKILL1
EMP2 = EMP ⋉ SKILL2
where
SKILL1 = σSAL≤30000(SKILL)
SKILL2 = σSAL>30000(SKILL)
128
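The semijoins above can be sketched directly (the sample rows and the assumption that EMP and SKILL join on TITLE are illustrative): a semijoin keeps the member tuples whose join-attribute value appears in the owner fragment.

```python
def semijoin(member, owner, attr):
    """R ⋉ S on attr: member tuples whose attr value occurs in owner."""
    keys = {o[attr] for o in owner}
    return [m for m in member if m[attr] in keys]

# Invented owner relation SKILL(TITLE, SAL), fragmented on salary.
skill = [{"TITLE": "Elect. Eng.", "SAL": 40000},
         {"TITLE": "Programmer", "SAL": 24000}]
skill1 = [s for s in skill if s["SAL"] <= 30000]
skill2 = [s for s in skill if s["SAL"] > 30000]

# Invented member relation EMP(ENO, TITLE), joined to SKILL on TITLE.
emp = [{"ENO": "E1", "TITLE": "Elect. Eng."},
       {"ENO": "E2", "TITLE": "Programmer"}]

emp1 = semijoin(emp, skill1, "TITLE")  # employees whose title is low-salary
emp2 = semijoin(emp, skill2, "TITLE")  # employees whose title is high-salary
```

Each EMP tuple follows its owner tuple into exactly one derived fragment, which is the referential-integrity and disjointness argument of the next slide.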
DHF – Correctness
Completeness
Let R be the member relation of a link whose owner is relation S which
is fragmented as FS = {S1, S2, ..., Sn}. Furthermore, let A be the join
attribute between R and S. Then, for each tuple t of R, there should be a
tuple t' of S such that: t[A] = t'[A]
i.e., Referential integrity :(tuples of any fragment of the member
relation are also in the owner relation)
Reconstruction
Reconstruction of a global relation R from its fragments {R1, R2,
…, Rn} is performed by the union operator (R is union of its
fragments)
Disjointness
In DHF disjointness is guaranteed only if the join graph
between the owner and the member fragments is simple.
129
Vertical Fragmentation
Has been studied within the centralized context
design methodology
physical clustering
More difficult than horizontal, because more alternatives
exist
Two approaches :
Grouping: attributes to fragments
Splitting: relation to fragments
130
Review question
Given relations PAY (Title, Sal) and EMP (ENO, ENAME, Title),
let p1: SAL < 30000 and p2: SAL ≥ 30000 be two simple
predicates.
a) Perform a primary horizontal fragmentation of PAY with
respect to these predicates to obtain PAY1, and PAY2.
b) Using the fragmentation of PAY, perform further derived
horizontal fragmentation for EMP.
c) Show completeness, reconstruction, and disjointness of the
fragmentation of EMP.
131
VF
Overlapping fragments
grouping
Non-overlapping fragments
splitting
We do not consider the replicated key attributes to be
overlapping
Advantage:
Easier to enforce functional dependencies (for integrity
checking etc.)
132
VF – Information requirements
Application Information
Attribute affinities
a measure that indicates how closely related the attributes are
This is obtained from more primitive usage data
Attribute usage values
Given a set of queries Q = {q1, q2,…, qq} that will run on the relation
R[A1, A2,…, An]
133
VF – Definition of use(qi,Aj)
Consider the following 4 queries for relation PROJ
q1: SELECT BUDGET FROM PROJ WHERE PNO = Value
q2: SELECT PNAME, BUDGET FROM PROJ
q3: SELECT PNAME FROM PROJ WHERE LOC = Value
q4: SELECT SUM(BUDGET) FROM PROJ WHERE LOC = Value
Let A1= PNO, A2= PNAME, A3= BUDGET, A4= LOC
134
VF – Affinity Measure aff(Ai,Aj)
The attribute affinity measure between two attributes Ai
and Aj of a relation R[A1, A2, …, An] with respect to the set
of applications Q = {q1, q2, …, qq} is defined as follows:
aff(Ai, Aj) = Σ (over all queries qk that use both Ai and Aj) Σ (over all sites l) acc_l(qk)
Then
aff(A1, A3) = 15·1 + 20·1 + 10·1 = 45
and the attribute affinity matrix AA is
136
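The affinity computation can be sketched as follows. The use() sets come from the four queries on slide 134; the per-site access frequencies are assumed values consistent with aff(A1, A3) = 15 + 20 + 10 = 45 above (the frequency table itself is not on these slides):

```python
# use(q, A): which attributes each query touches (from the q1–q4 queries).
use = {
    "q1": {"PNO", "BUDGET"},
    "q2": {"PNAME", "BUDGET"},
    "q3": {"PNAME", "LOC"},
    "q4": {"BUDGET", "LOC"},
}

# Assumed access frequencies per query at three sites.
acc = {"q1": [15, 20, 10], "q2": [5, 0, 0],
       "q3": [25, 25, 25], "q4": [3, 0, 0]}

def aff(ai, aj):
    """Sum the total access frequency of every query that uses both
    attributes — the attribute affinity measure aff(Ai, Aj)."""
    return sum(sum(acc[q]) for q, attrs in use.items()
               if ai in attrs and aj in attrs)
```

Only q1 uses both PNO and BUDGET, so aff(PNO, BUDGET) = 15 + 20 + 10 = 45, matching the slide; the diagonal entry aff(BUDGET, BUDGET) sums every query touching BUDGET.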
VF – Clustering Algorithm
Take the attribute affinity matrix AA and reorganize the
attribute orders to form clusters where the attributes in
each cluster demonstrate high affinity to one another
137
Bond Energy Algorithm
Input: The AA matrix
Output: The clustered affinity matrix CA which is a
perturbation of AA
❶ Initialization: Place and fix one of the columns of AA in
CA
❷ Iteration: Place the remaining n-i columns in the
remaining i+1 positions in the CA matrix. For each
column, choose the placement that makes the most
contribution to the global affinity measure
❸ Row order: Order the rows according to the column
ordering
138
cont(Ai, Ak, Aj) = 2·bond(Ai, Ak) + 2·bond(Ak, Aj) − 2·bond(Ai, Aj)
139
BEA – Example
Consider the following AA matrix and the corresponding CA matrix
where A1 and A2 have been placed.
Place A3:
Ordering (0-3-1):
cont(A0,A3,A1) = 2bond(A0 , A3)+2bond(A3 , A1)–2bond(A0 , A1)
= 2* 0 + 2* 4410 – 2*0 = 8820
Ordering (1-3-2):
cont(A1,A3,A2) = 2bond(A1 , A3)+2bond(A3 , A2)–2bond(A1,A2)
= 2* 4410 + 2* 890 – 2*225 = 10150
Ordering (2-3-4): cont (A2,A3,A4) = 1780
140
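The bond and cont computations above can be reproduced in a short sketch. The AA matrix below is the attribute-affinity matrix for this example (shown only as a figure on the omitted slide; its values are consistent with the bond(A1, A3) = 4410, bond(A3, A2) = 890 and bond(A1, A2) = 225 used above), and a `placed` set models the BEA rule that the boundary column A0 and columns not yet in CA contribute a bond of 0:

```python
# Attribute affinity matrix AA for PROJ (A1=PNO, A2=PNAME, A3=BUDGET, A4=LOC).
AA = {
    "A1": {"A1": 45, "A2": 0,  "A3": 45, "A4": 0},
    "A2": {"A1": 0,  "A2": 80, "A3": 5,  "A4": 75},
    "A3": {"A1": 45, "A2": 5,  "A3": 53, "A4": 3},
    "A4": {"A1": 0,  "A2": 75, "A3": 3,  "A4": 78},
}

def bond(ax, ay, placed):
    """bond(Ax, Ay) = sum over rows z of AA[z][Ax] * AA[z][Ay];
    0 if either column is the A0 boundary or not yet placed in CA."""
    if ax not in placed or ay not in placed:
        return 0
    return sum(AA[z][ax] * AA[z][ay] for z in AA)

def cont(ai, ak, aj, placed):
    """Net contribution of placing Ak between Ai and Aj."""
    return (2 * bond(ai, ak, placed)
            + 2 * bond(ak, aj, placed)
            - 2 * bond(ai, aj, placed))

# A1 and A2 are already in CA; A3 is the candidate being placed.
placed = {"A1", "A2", "A3"}
```

This reproduces the three slide values — cont(A0, A3, A1) = 8820, cont(A1, A3, A2) = 10150, cont(A2, A3, A4) = 1780 — so A3 is placed between A1 and A2, the ordering with the largest contribution.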
BEA: Example
141
Partitioning Algorithm
The objective is to find sets of attributes that are accessed
alone in most cases, i.e., to divide a set of clustered
attributes {A1, A2, …, An} into two (or more) sets {A1, A2, …, Ai}
and {Ai+1, …, An} such that no (or few)
applications access both (or more than one) of the
sets.
142
Partitioning algorithm
Define
TA = top set of attributes {A1, …, Ai}; BA = bottom set {Ai+1, …, An}
AQ(qi) = {Aj | use(qi, Aj) = 1}
TQ = {qi | AQ(qi) ⊆ TA}
BQ = {qi | AQ(qi) ⊆ BA}
OQ = Q − (TQ ∪ BQ) // set of applications that access both TA and BA
and
CTQ = total number of accesses to attributes by applications that access
only TA
CBQ = total number of accesses to attributes by applications that access
only BA
COQ = total number of accesses to attributes by applications that access
both TA and BA
Then find the point along the diagonal that maximizes
z = CTQ ∗ CBQ − COQ²
143
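The split-point search above can be sketched as follows; the clustered order comes from the BEA example, and the per-query access totals are the frequencies assumed earlier.

```python
# Sketch: evaluating z = CTQ*CBQ - COQ^2 at each split point along the
# clustered attribute order (query access sets and frequencies assumed).
order = ["A1", "A3", "A2", "A4"]          # clustered order from BEA
AQ = {"q1": {"A1", "A3"}, "q2": {"A2", "A3"},
      "q3": {"A2", "A4"}, "q4": {"A3", "A4"}}
freq = {"q1": 45, "q2": 5, "q3": 75, "q4": 3}  # total accesses per query

def z_at(i):
    """Split after position i: TA = order[:i+1], BA = order[i+1:]."""
    TA, BA = set(order[:i + 1]), set(order[i + 1:])
    CTQ = sum(freq[q] for q in AQ if AQ[q] <= TA)
    CBQ = sum(freq[q] for q in AQ if AQ[q] <= BA)
    COQ = sum(freq[q] for q in AQ if not (AQ[q] <= TA or AQ[q] <= BA))
    return CTQ * CBQ - COQ ** 2

best = max(range(len(order) - 1), key=z_at)  # best split point index
```

Here the maximum falls after position 1, i.e. the split {A1, A3} | {A2, A4}.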
Partitioning algorithm
Two problems :
❶ Cluster forming in the middle of the CA matrix
Shift a row up and a column left and apply the algorithm to
find the “best” partitioning point
Do this for all possible shifts
Cost O(m²)
❷ More than two clusters
m-way partitioning
try 1, 2, …, m–1 split points along diagonal and try to find the best
point for each of these
Cost O(2^m)
144
VF correctness
A relation R, defined over attribute set A and key K, generates the
vertical partitioning FR = {R1, R2, …, Rr}.
Completeness
The following should be true for A:
A = ∪Ri∈FR ARi
Reconstruction
Reconstruction can be achieved by
R = ⋈K Ri, ∀Ri ∈ FR (join on the key K)
Disjointness
TID's are not considered to be overlapping since they are maintained
by the system
Duplicated keys are not considered to be overlapping
145
Hybrid fragmentation
146
Allocation
Problem Statement
Given
F = {F1, F2, …, Fn} fragments
S = {S1, S2, …, Sm} network sites
Q = {q1, q2,…, qq} applications
Find the "optimal" distribution of F to S.
Optimality
Minimal cost
Communication + storage + processing (read & update)
Cost in terms of time (usually)
Performance
Response time and/or throughput
Constraints
Per site constraints (storage & processing)
147
Information Requirements
Database information
selectivity of fragments
size of a fragment
Application information
access types and numbers
access localities
Site information
unit cost of storing data at a site
unit cost of processing at a site
Communication network information
bandwidth
latency
communication overhead
148
Allocation Model
General Form
min(Total Cost)
subject to
response time constraint
storage constraint
processing constraint
Decision Variable
150
Allocation Model
Total Cost
Query processing cost + cost of storing a fragment at a site
Storage Cost (of fragment Fj at Sk)
(unit storage cost at Sk) * (size of Fj) * xjk
Query Processing Cost (for one query)
processing component + transmission component
151
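The storage-cost term above can be sketched directly from its definition. All unit costs, fragment sizes, and the placement matrix x below are invented for illustration.

```python
# Sketch of the storage-cost term of the allocation model:
# cost of storing fragment Fj at site Sk = (unit storage cost at Sk) * size(Fj) * x[j][k]
unit_storage = [1.0, 0.5, 2.0]      # assumed unit storage cost per site
size = [100, 250]                   # assumed fragment sizes
x = [[1, 0, 0],                     # x[j][k] = 1 iff fragment Fj is placed at site Sk
     [0, 1, 1]]                     # F2 is replicated at S2 and S3

storage_cost = sum(unit_storage[k] * size[j] * x[j][k]
                   for j in range(len(size))
                   for k in range(len(unit_storage)))
```

The full objective adds the query-processing term (processing + transmission) to this storage term before minimizing over x.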
Allocation Model
Query Processing Cost
Processing component
Access cost
152
Allocation Model
Query Processing Cost
Transmission component
Cost of updates
update message cost + acknowledgment cost
Retrieval Cost
(cost of retrieval command + cost of sending back the result)
153
Allocation Model
Constraints
Response time
Execution time of query <= max allowable response time for that
query
Storage constraints
Storage requirement of a fragment at that site <=storage capacity at
that site
154
Allocation Model
Attempts to reduce the solution space
assume all candidate partitionings are known and select the
“best” partitioning
ignore replication at first
sliding window on fragments
155
4. Distributed Query Processing
156
Introduction
Query Processing
query
Processor
Query optimization
How do we determine the “best” execution plan?
158
Query processing problem
Example
SELECT ENAME
FROM EMP,ASG
WHERE EMP.ENO = ASG.ENO
AND DUR > 37
159
Example …
160
Cost of Alternatives
Assume:
size(EMP) = 400, size(ASG) = 1000
tuple access cost = 1 unit; tuple transfer cost = 10 units
Strategy 1
produce ASGi: (10+10)∗tuple access cost 20
transfer ASGi to the sites of EMP: (10+10)∗tuple transfer cost 200
produce EMPi : (10+10) ∗tuple access cost∗2 40
transfer EMPi to result site: (10+10) ∗tuple transfer cost 200
Total cost 460
Strategy 2
transfer EMP to site 5:400∗tuple transfer cost 4,000
transfer ASGi to site 5 :1000∗tuple transfer cost 10,000
produce ASGi:1000∗tuple access cost 1,000
join EMPi and ASGi:400∗20∗tuple access cost 8,000
Total cost 23,000
161
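The two totals above can be reproduced with a short calculation using the slide's unit costs (tuple access = 1, tuple transfer = 10):

```python
# Reproducing the costs of the two execution strategies from the slide.
ACCESS, TRANSFER = 1, 10

# Strategy 1: reduce the relations at their sites, ship only the reduced data.
strategy1 = ((10 + 10) * ACCESS          # produce ASGi
             + (10 + 10) * TRANSFER      # transfer ASGi to the sites of EMP
             + (10 + 10) * ACCESS * 2    # produce EMPi
             + (10 + 10) * TRANSFER)     # transfer EMPi to the result site

# Strategy 2: ship both relations whole to site 5 and do all work there.
strategy2 = (400 * TRANSFER              # transfer EMP to site 5
             + 1000 * TRANSFER           # transfer ASG to site 5
             + 1000 * ACCESS             # produce ASGi
             + 400 * 20 * ACCESS)        # join EMPi and ASGi
```

Strategy 1 costs 460 units against 23,000 for Strategy 2, which is why shipping reduced intermediate results matters so much in a WAN.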
Objective of Query processing
To transform a high-level query on a distributed database into a low-level language on local databases
Minimize a cost function
I/O cost + CPU cost + communication cost
These might have different weights in different distributed
environments
Wide area networks
communication cost will dominate
low bandwidth
low speed
high protocol overhead
Local area networks
communication cost not that dominant
total cost function should be considered
162
Complexity of Relational Operations
Assume
• relations of cardinality n
• sequential scan
Operation: Complexity
Select: O(n)
Project (without duplicate elimination): O(n)
Project (with duplicate elimination): O(n log n)
Group: O(n log n)
Join: O(n log n)
Semi-join: O(n log n)
Division: O(n log n)
Set Operations: O(n log n)
Cartesian Product: O(n²)
163
Characterization of Query processors
Four characteristics that hold for Centralized query processors
Language
Input language – relational calculus or relational algebra
Types of optimization
Exhaustive search
cost-based
Optimal
combinatorial complexity in the number of relations
Heuristics
not optimal
regroup common sub-expressions
perform selection, projection first
replace a join by a series of semi-joins
reorder operations to reduce intermediate relation size
optimize individual operations
164
Optimization Timing
Static
compilation optimize prior to the execution
difficult to estimate the size of the intermediate results ⇒ error propagation
can amortize over many executions
E.g. R*
Dynamic
run time optimization
exact information on the intermediate relation sizes
have to reoptimize for multiple executions
E.g. Distributed INGRES
Hybrid
compile using a static algorithm
if the error in estimate sizes > threshold, reoptimize at run time
E.g. MERMAID
165
Statistics
Relation
cardinality
size of a tuple
fraction of tuples participating in a join with another relation
Attribute
cardinality of domain
actual number of distinct values
Common assumptions
independence between different attribute values
uniform distribution of attribute values within their domain
166
Decision Sites
Centralized
single site determines the “best” schedule
simple
need knowledge about the entire distributed database
Distributed
cooperation among sites to determine the schedule
need only local information
cost of cooperation
Hybrid
one site determines the global schedule
each site optimizes the local subqueries
167
Network Topology
Wide area networks (WAN)
characteristics
low bandwidth
low speed
high protocol overhead
communication cost will dominate; ignore all other cost factors
global schedule to minimize communication cost
local schedules according to centralized query optimization
Local area networks (LAN)
communication cost not that dominant
total cost function should be considered
broadcasting can be exploited (e.g. joins) to optimize query
processing
special algorithms exist for star networks
168
Exploitation of Replicated Fragments
In distributed query processing, queries on global relations are mapped into queries on physical fragments of relations by translating relations into fragments – localization
Replication is needed for increasing reliability and availability
169
Use of semijoins
A semijoin reduces the size of the operand relation, but it increases the number of messages and the local processing time
E.g. SDD-1, designed for slow wide area networks, uses semijoins extensively
170
Layers of Query Processing
171
Query Decomposition
Input : Calculus query on global relations
1. Normalization
manipulate query quantifier and qualification
2. Analysis
detect and reject “incorrect” queries
possible for only a subset of relational calculus
3. Simplification
eliminate redundant predicates
4. Restructuring
calculus query is restructured into algebraic query
more than one translation is possible
use transformation rules
172
Normalization
Lexical and syntactic analysis
check validity (similar to compilers)
check for attributes and relations
type checking on the qualification
Put into normal form
Conjunctive normal form
(p11∨p12∨…∨p1n) ∧…∧ (pm1∨pm2∨…∨pmn)
Disjunctive normal form
(p11∧p12 ∧…∧p1n) ∨…∨ (pm1 ∧pm2∧…∧pmn)
OR's mapped into union
AND's mapped into join or selection
173
Analysis
Remove incorrect queries
Type incorrect
If any of its attribute or relation names are not defined in the global
schema
If operations are applied to attributes of the wrong type
Semantically incorrect
Components do not contribute in any way to the generation of the
result
Only a subset of relational calculus queries can be tested for
correctness
Those that do not contain disjunction and negation
Technique to detect incorrect queries
connection graph (query graph) that represents the semantics of the query
join graph
174
Analysis – Example
SELECT ENAME,RESP
FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO
AND ASG.PNO = PROJ.PNO
AND PNAME = "CAD/CAM"
AND DUR ≥ 36
AND TITLE = "Programmer"
175
Analysis
If the query graph is not connected, the query is wrong.
SELECT ENAME,RESP, PNAME
FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO
AND PNAME = "CAD/CAM"
AND DUR ≥ 36
AND TITLE = "Programmer"
176
Simplification
Use transformation rules
elimination of redundancy
idempotency rules
p1 ∧ ¬( p1) ⇔ false
p1 ∧ (p1 ∨ p2) ⇔ p1
p1 ∨ false ⇔ p1
application of transitivity
use of integrity rules
177
Simplification – Example
SELECT TITLE
FROM EMP
WHERE EMP.ENAME = “J. Doe”
OR (NOT(EMP.TITLE = “Programmer”)
AND (EMP.TITLE = “Programmer” OR EMP.TITLE = “Elect. Eng.”)
AND NOT(EMP.TITLE = “Elect. Eng.”))
SELECT TITLE
FROM EMP
WHERE EMP.ENAME = “J. Doe”
178
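The equivalence of the two queries above can be verified by brute force over the three elementary predicates p1 = (ENAME = "J. Doe"), p2 = (TITLE = "Programmer"), p3 = (TITLE = "Elect. Eng."):

```python
# Brute-force check that the WHERE clause simplifies to p1 alone,
# by trying all truth assignments of the elementary predicates.
from itertools import product

def original(p1, p2, p3):
    return p1 or ((not p2) and (p2 or p3) and (not p3))

def simplified(p1, p2, p3):
    return p1

equivalent = all(original(*v) == simplified(*v)
                 for v in product([False, True], repeat=3))
```

The check succeeds on every valuation, so the idempotency rules alone (without even using the integrity constraint that p2 and p3 exclude each other) justify the simplification.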
Restructuring
Convert relational calculus to
relational algebra
Make use of query trees
Example
Find the names of employees other than
J. Doe who worked on the CAD/CAM
project for either 1 or 2 years.
SELECT ENAME
FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO
AND ASG.PNO = PROJ.PNO
AND ENAME ≠ “J. Doe”
AND PNAME = “CAD/CAM”
AND (DUR = 12 OR DUR = 24)
179
Restructuring –Transformation Rules
Commutativity of binary operations
R×S⇔S×R
R join S ⇔S join R
R∪S⇔S∪R
Associativity of binary operations
( R × S ) × T ⇔ R × (S × T)
( R join S) join T ⇔ R join (S join T)
Idempotence of unary operations
ΠA’(ΠA”(R)) ⇔ ΠA’(R)
σp1(A1)(σp2(A2)(R)) ⇔ σp1(A1) ∧ p2(A2)(R)
where R[A] and A' ⊆ A, A" ⊆ A and A' ⊆ A"
Commuting selection with projection
180
Restructuring –Transformation Rules
Commuting selection with binary operations
σp(A)(R × S) ⇔ (σp(A) (R)) × S
σp(Ai)(R join(Aj,Bk) S) ⇔ (σp(Ai)(R)) join(Aj,Bk) S
σp(Ai)(R ∪ T) ⇔ σp(Ai)(R) ∪ σp(Ai)(T)
where Ai belongs to R and T
Commuting projection with binary operations
ΠC(R × S) ⇔ΠA’(R) × ΠB’(S)
ΠC(R join(Aj,Bk) S)⇔ΠA’(R) join(Aj,Bk) ΠB’(S)
ΠC(R ∪ S) ⇔ΠC (R) ∪ ΠC (S)
where R[A] and S[B]; C = A' ∪ B' where A' ⊆ A, B' ⊆ B
181
Example
Example
Find the names of employees other than
J. Doe who worked on the CAD/CAM
project for either 1 or 2 years
SELECT ENAME
FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO
AND ASG.PNO = PROJ.PNO
AND ENAME ≠ “J. Doe”
AND PNAME = “CAD/CAM”
AND (DUR = 12 OR DUR = 24)
182
Equivalent Query
183
Restructuring
σDur=12 v Dur=24
184
Step 2 – Data Localization
Input: Algebraic query on distributed relations
Determine which fragments are involved
Localization program
substitute for each global query its materialization program
➠ optimize
185
Example
Assume
EMP is fragmented into EMP1, EMP2,
EMP3 as follows:
EMP1=σENO≤“E3”(EMP)
EMP2= σ“E3”<ENO≤“E6”(EMP)
EMP3=σENO>“E6”(EMP)
ASG fragmented into ASG1 and ASG2 as
follows:
ASG1=σENO≤“E3”(ASG)
ASG2=σENO>“E3”(ASG)
186
Provides Parallelism
187
Eliminates …
188
Reduction for PHF
Reduction with selection
Relation R and FR={R1, R2, …, Rw} where Rj=σpj(R)
σpi(Rj) = φ if ∀x in R: ¬(pi(x) ∧ pj(x))
Example
EMP1=σENO≤“E3”(EMP)
EMP2= σ“E3”<ENO≤“E6”(EMP)
EMP3=σENO>“E6”(EMP)
SELECT *
FROM EMP
WHERE ENO=“E5”
189
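The selection-reduction rule above can be sketched in Python. The fragment predicates are the ones from the example; comparing ENO values as strings is an assumption that happens to work for this "E1"…"E9" naming.

```python
# Sketch: deciding which horizontal fragments can contribute tuples to
# WHERE ENO = "E5", given the fragmentation predicates.
fragments = {  # fragment name -> predicate on ENO
    "EMP1": lambda eno: eno <= "E3",
    "EMP2": lambda eno: "E3" < eno <= "E6",
    "EMP3": lambda eno: eno > "E6",
}

def relevant(eno):
    """Fragments whose predicate is compatible with the query predicate ENO = eno;
    the others are empty selections and are pruned from the localized query."""
    return [name for name, pred in fragments.items() if pred(eno)]
```

For ENO = "E5" only EMP2 survives, so the localized query touches a single fragment.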
Reduction for PHF
Reduction with join
Possible if fragmentation is done on join attribute
Distribute join over union
(R1 ∪ R2) join S ⇔ (R1 join S) ∪ (R2 join S)
Given Ri = σpi(R) and Rj = σpj(R)
Ri join Rj = φ if ∀x in Ri, ∀y in Rj: ¬(pi(x) ∧ pj(y))
190
Reduction for PHF
Reduction with join - Example
Assume EMP is fragmented into three fragments and ASG into two:
EMP1=σENO≤“E3”(EMP)
EMP2= σ“E3”<ENO≤“E6”(EMP)
EMP3=σENO>“E6”(EMP)
ASG1: σENO ≤ "E3"(ASG)
ASG2: σENO > "E3"(ASG)
Consider the query
SELECT * FROM EMP, ASG
WHERE EMP.ENO=ASG.ENO
191
Reduction for PHF
Reduction with join
Distribute join over unions
Apply the reduction rule
192
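The join-reduction rule can be sketched by representing each fragment as an interval of ENO values: EMPi ⋈ ASGj is empty whenever the two intervals do not overlap. The sentinel bounds are an assumption of this sketch.

```python
# Sketch: join reduction for PHF. Each fragment predicate is an interval
# (low, high] of ENO values; a fragment join survives only if the
# intervals overlap, i.e. the conjunction of predicates is satisfiable.
EMP = {"EMP1": ("", "E3"), "EMP2": ("E3", "E6"), "EMP3": ("E6", "\uffff")}
ASG = {"ASG1": ("", "E3"), "ASG2": ("E3", "\uffff")}

def overlaps(a, b):
    """True iff intervals (low, high] a and b intersect."""
    return a[0] < b[1] and b[0] < a[1]

# distribute the join over the unions, then prune the empty joins
surviving = [(e, a) for e, ei in EMP.items()
             for a, ai in ASG.items() if overlaps(ei, ai)]
```

Of the six fragment joins, only EMP1 ⋈ ASG1, EMP2 ⋈ ASG2 and EMP3 ⋈ ASG2 remain, and these three can be evaluated in parallel.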
Reduction for VF
Find useless (not empty) intermediate relations
Relation R defined over attributes A = {A1, ..., An} vertically
fragmented as Ri = ΠA'(R) where A' ⊆ A:
ΠD,K(Ri) is useless if the set of projection attributes D is not in A’
Example: EMP1= ΠENO,ENAME(EMP); EMP2= ΠENO,TITLE (EMP)
SELECT ENAME
FROM EMP
193
Reduction for DHF
Rule :
Distribute joins over unions
Apply the join reduction for horizontal fragmentation
Example
ASG1: ASG JoinENO EMP1
ASG2: ASG JoinENO EMP2
EMP1: σTITLE=“Programmer” (EMP)
EMP2: σTITLE<>“Programmer” (EMP)
Query
SELECT *
FROM EMP, ASG
WHERE ASG.ENO = EMP.ENO
AND EMP.TITLE = “Mech. Eng.”
194
Reduction for DHF
195
Reduction for DHF
Joins over unions
196
Reduction for Hybrid Fragmentation
Combine the rules already specified:
Remove empty relations generated by contradicting selections
on horizontal fragments
Remove useless relations generated by projections on vertical
fragments
Distribute joins over unions in order to isolate and remove
useless joins
197
Reduction for Hybrid Fragmentation
Example
Consider the following hybrid
fragmentation:
EMP1=σENO≤"E4" (ΠENO,ENAME(EMP))
EMP2=σENO>"E4" (ΠENO,ENAME(EMP))
EMP3= ΠENO,TITLE(EMP)
and the query
SELECT ENAME
FROM EMP
WHERE ENO=“E5”
198
Global Query Optimization
Input: Fragment query
Find the best (not necessarily optimal) global schedule
Minimize a cost function
Distributed join processing
Bushy vs. linear trees
Which relation to ship where?
Ship-whole vs ship-as-needed
Decide on the use of semijoins
Semijoin saves on communication at the expense of more local
processing.
Join methods
nested loop vs ordered joins (merge join or hash join)
199
Cost-Based Optimization
Solution space
The set of equivalent algebra expressions (query trees).
Cost function (in terms of time)
I/O cost + CPU cost + communication cost
These might have different weights in different distributed
environments (LAN vs WAN).
Can also maximize throughput
Search algorithm
How do we move inside the solution space?
Exhaustive search, heuristic algorithms (iterative improvement,
simulated annealing, genetic,…)
200
5. Concurrency Control
201
Recap
What is a transaction?
What are the main problems that may occur if different transactions are allowed to access data concurrently?
Any mechanism to detect such problems?
Considering distributed database system, what can be
distributed?
202
Concurrency Control in Distributed
Database
Concurrency control schemes dealt with handling of data
as part of concurrent transactions.
Various locking protocols are used for handling
concurrent transactions in centralized database systems.
There are no major differences between the schemes in centralized and distributed databases. The only major difference is the way the lock manager deals with replicated data.
203
Locking protocols
1. Single lock manager approach
2. Distributed lock manager approach
a) Primary Copy protocol
b) Majority protocol
c) Biased protocol
d) Quorum Consensus protocol
204
Single Lock Manager - Concurrency Control
in Distributed Database
205
Single Lock Manager …
1. Transaction T1 @S5 request for data
item D
2. The initiator site S5’s Transaction
manager sends the lock request to lock
data item D to the lock-manager site S3.
The Lock-manager at site S3 will look for the
availability of the data item D.
3. If the requested item is not locked by
any other transactions, the lock-manager
site responds with lock grant message
to the initiator site S5.
4. The initiator site S5 can use the data
item D from any of the sites S1, S2, and
S6 for completing the Transaction T1.
5. After successful completion of the
Transaction T1, the Transaction manager
of S5 releases the lock by sending the
unlock request to the lock-manager site
S3.
206
Primary Copy Protocol
207
Majority Based Protocol
A transaction which needs to lock data item Q has to request and lock data item Q in half+one of the sites in which Q is replicated (i.e., a majority of the sites in which Q is replicated).
The lock-managers of all the sites in which Q is replicated are responsible for handling lock and unlock requests locally and individually.
Irrespective of the lock type (read or write, i.e., Shared or Exclusive), we need to lock half+one sites.
208
Majority Based Protocol
209
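The half+one rule above can be sketched in a few lines; the site names used in the test are made up.

```python
# Minimal sketch of the majority protocol's grant decision:
# a lock on Q is acquired only when half+one of the sites
# replicating Q have granted the local lock request.
def majority_lock(replica_sites, grants):
    """replica_sites: sites holding a replica of Q.
    grants: set of sites that granted the local lock request."""
    needed = len(replica_sites) // 2 + 1        # half + one
    return len(grants & set(replica_sites)) >= needed
```

Because any two majorities intersect, two conflicting transactions can never both assemble a majority of grants for the same item.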
Parallel Databases
210
Parallel Databases
Introduction
I/O Parallelism
Interquery Parallelism
Intraquery Parallelism
Intraoperation Parallelism
Interoperation Parallelism
Design of Parallel Systems
211
Introduction
Parallel machines are becoming quite common and affordable
Prices of microprocessors, memory and disks have dropped sharply
Recent desktop computers feature multiple processors and this
trend is projected to accelerate
Databases are growing increasingly large
large volumes of transaction data are collected and stored for later
analysis.
multimedia objects like images are increasingly stored in databases
Large-scale parallel database systems increasingly used for:
storing large volumes of data
processing time-consuming decision-support queries
providing high throughput for transaction processing
212
Parallelism in Databases
Data can be partitioned across multiple disks for parallel
I/O.
Individual relational operations (e.g., sort, join,
aggregation) can be executed in parallel
data can be partitioned and each processor can work
independently on its own partition.
Queries are expressed in high level language (SQL,
translated to relational algebra)
makes parallelization easier.
Different queries can be run in parallel with each other.
Concurrency control takes care of conflicts.
Thus, databases naturally lend themselves to parallelism.
213
Modes of Parallelism
At the heart of all parallel machines is a collection of
processors.
Each processor has its own local cache
Classify parallel architectures into three broad groups
The most tightly coupled architectures shared memory
A less tightly coupled architecture shares disk but not
memory.
Shared nothing
214
Shared-Memory
215
Shared-Nothing
all processors have their own memory and their own disk or disks
the shared-nothing architecture is the most commonly used architecture for database systems
Used by Teradata, IBM, Sybase, Microsoft for OLAP
Prototypes: Gamma, Bubba, Grace, Prisma, EDS
+ Extensibility, availability
- Complexity, difficult load balancing
217
Hybrid Architectures
Various possible combinations of the three basic
architectures are possible to obtain different trade-offs
between cost, performance, extensibility, availability, etc.
Hybrid architectures try to obtain the advantages of
different architectures:
efficiency and simplicity of shared-memory
extensibility and cost of either shared disk or shared nothing
2 main kinds: NUMA and cluster
218
I/O Parallelism
Reduce the time required to retrieve relations from disk
by partitioning the relations on multiple disks.
Horizontal partitioning – tuples of a relation are divided
among many disks such that each tuple resides on one
disk.
Partitioning techniques (number of disks = n):
Round-robin: Send the ith tuple inserted in the relation to disk i
mod n.
Hash partitioning: send a tuple with partitioning-attribute value v to disk h(v), where h is a hash function with range 0 … n-1
219
I/O Parallelism (Cont.)
Range partitioning: break tuples up into contiguous
ranges of keys, requires a key that can be ordered linearly
Choose an attribute as the partitioning attribute.
A partitioning vector [vo, v1, ..., vn-2] is chosen.
Let v be the partitioning attribute value of a tuple. Tuples such that vi ≤ v < vi+1 go to disk i+1. Tuples with v < v0 go to disk 0 and tuples with v ≥ vn-2 go to disk n-1.
E.g., with a partitioning vector [5,11], a tuple with partitioning
attribute value of 2 will go to disk 0, a tuple with value 8 will go
to disk 1, while a tuple with value 20 will go to disk2.
220
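The three partitioning techniques can be sketched as small routing functions; the range vector [5, 11] is the one from the example above.

```python
# Sketch of the three I/O partitioning strategies for n disks.
from bisect import bisect_right

def round_robin(i, n):
    """Disk for the i-th tuple inserted into the relation."""
    return i % n

def hash_part(value, n):
    """Disk chosen by hashing the partitioning-attribute value."""
    return hash(value) % n

def range_part(value, vector=(5, 11)):
    """Disk chosen by binary search in the partitioning vector."""
    return bisect_right(vector, value)
```

With vector [5, 11], a tuple with value 2 goes to disk 0, value 8 to disk 1, and value 20 to disk 2, exactly as on the slide.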
Comparison of Partitioning Techniques
Evaluate how well partitioning techniques support the
following types of data access:
1.Scanning the entire relation.
2.Locating a tuple (identify query) associatively – point
queries.
Example: r.A = 25.
3.Locating a set of tuples based on the value of a given
attribute lies within a specified range – range queries.
Example: 10 ≤ r.A < 25.
221
Comparison of Partitioning Techniques(Cont.)
Round robin:
Advantages
Best suited for sequential scan of entire relation on each query.
All disks have almost an equal number of tuples; retrieval work
is thus well balanced between disks.
Disadvantages
Range queries are difficult to process
No clustering - tuples are scattered across all disks
222
Comparison of Partitioning Techniques(Cont.)
Hash partitioning:
Good for sequential access
Assuming hash function is good, and partitioning attributes
form a key, tuples will be equally distributed between disks
Retrieval work is then well balanced between disks.
Good for point queries on partitioning attribute
Can lookup single disk, leaving others available for answering
other queries.
Index on partitioning attribute can be local to disk, making
lookup and update more efficient
No clustering, so difficult to answer range queries
223
Range partitioning
Partition requires a partitioning attribute A usually the
primary key
A vector of dimension n-1 partitions A
Vector {v0, v1, …, vn-2}
Each tuple t goes into:
Partition 0 if t[A] < v0
Partition n-1 if t[A] ≥ vn-2
Partition k if vk-1 ≤ t[A] < vk, 1 ≤ k ≤ n-2
With simple range partitioning, #disks = #partitions
224
Comparison of Partitioning Techniques (Cont.)
Range partitioning:
Provides data clustering by partitioning attribute value.
Good for sequential access
Good for point queries on partitioning attribute: only one disk
needs to be accessed.
For range queries on partitioning attribute, one to a few disks
may need to be accessed
Remaining disks are available for other queries.
Good if result tuples are from one to a few blocks.
If many blocks are to be fetched, they are still fetched from one to a
few disks, and potential parallelism in disk access is wasted
Example of execution skew.
225
Partitioning a Relation across Disks
If a relation contains only a few tuples which will fit into a
single disk block, then assign the relation to a single disk.
Large relations are preferably partitioned across all the
available disks.
If a relation consists of m disk blocks and there are n
disks available in the system, then the relation should be
allocated min(m,n) disks.
226
Handling of Skew
The distribution of tuples to disks may be skewed —
that is, some disks have many tuples, while others may
have fewer tuples.
Types of skew:
Attribute-value skew.
when lots of tuples are clustered around the same (or nearly same
value) i.e. some values appear in the partitioning attributes of many
tuples; all the tuples with the same value for the partitioning attribute
end up in the same partition.
Can occur with range-partitioning and hash-partitioning.
Partition skew.
With range-partitioning, badly chosen partition vector may assign too
many tuples to some partitions and too few to others.
Less likely with hash-partitioning if a good hash-function is chosen.
227
Handling Skew in Range-Partitioning
To create a balanced partitioning vector (assuming
partitioning attribute forms a key of the relation):
Sort the relation on the partitioning attribute.
Construct the partition vector by scanning the relation in
sorted order as follows.
After every 1/nth of the relation has been read, the value of the
partitioning attribute of the next tuple is added to the partition
vector.
n denotes the number of partitions to be constructed.
Duplicate entries or imbalances can result if duplicates are
present in partitioning attributes.
Alternative technique based on histograms used in
practice
228
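The balanced-vector construction described above can be sketched as follows, assuming the partitioning attribute forms a key:

```python
# Sketch: build a balanced range-partitioning vector by sorting the
# relation on the partitioning attribute and taking every 1/n-th value.
def balanced_vector(values, n):
    """Return n-1 cut points that split `values` into n near-equal ranges."""
    s = sorted(values)                 # sort on the partitioning attribute
    step = len(s) // n                 # 1/n-th of the relation
    return [s[(i + 1) * step] for i in range(n - 1)]
```

For 12 distinct key values and n = 3, the resulting vector splits the relation into three partitions of four tuples each; with duplicated values the cut points can collide, which is why histogram-based techniques are used in practice.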
Handling Skew using Histograms
Balanced partitioning vector can be constructed from histogram in a
relatively straightforward fashion
Assume uniform distribution within each range of the histogram
Histogram can be constructed by scanning relation, or sampling (blocks
containing) tuples of the relation.
229
Handling Skew Using Virtual Processor
Partitioning
Skew in range partitioning can be handled elegantly using
virtual processor partitioning:
create a large number of partitions (say 10 to 20 times the
number of processors)
Assign virtual processors to partitions either in round-robin
fashion or based on estimated cost of processing each virtual
partition
Basic idea:
If any normal partition would have been skewed, it is very likely
the skew is spread over a number of virtual partitions
Skewed virtual partitions get spread across a number of
processors, so work gets distributed evenly!
230
Interquery Parallelism
It is a form of parallelism where many different Queries or
Transactions are executed in parallel with one another on many
processors
Increases transaction throughput; used primarily to scale up a
transaction processing system to support a larger number of
transactions per second.
Easiest form of parallelism to support, particularly in a shared-
memory parallel database, because even sequential database systems
support concurrent processing.
More complicated to implement on shared-disk or shared-nothing
architectures
Locking and logging must be coordinated by passing messages between
processors.
Data in a local buffer may have been updated at another processor.
Cache-coherency has to be maintained - reads and writes of data in
buffer must find latest version of data.
231
Intraquery Parallelism
Execution of a single query in parallel on multiple
processors/disks; important for speeding up long-running
queries.
SELECT * FROM Email ORDER BY Start_Date;
Two complementary forms of intraquery parallelism :
Intraoperation Parallelism – parallelize the execution of
each individual operation in the query.
SELECT * FROM Email ORDER BY Start_Date; //(Sort
Operation)
SELECT * FROM Student, CourseRegd WHERE
Student.Regno = CourseRegd.Regno; //(Join)
233
Intraquery Parallelism
Interoperation Parallelism – execute the different
operations in a query expression in parallel.
A single query may involve multiple operations at once.
SELECT AVG(Salary) FROM Employee GROUP BY Dept_Id;
234
Parallel Processing of Relational Operations
The discussion of parallel algorithms assumes:
read-only queries
shared-nothing architecture
n processors, P0, ..., Pn-1, and n disks D0, ..., Dn-1, where disk Di is
associated with processor Pi.
If a processor has multiple disks they can simply simulate
a single disk Di.
Shared-nothing architectures can be efficiently simulated
on shared-memory and shared-disk systems.
Algorithms for shared-nothing systems can thus be run on
shared-memory and shared-disk systems.
However, some optimizations may be possible.
235
Parallel Sort
Range-Partitioning Sort
Assumptions:
Assume n processors, P0, P1, …, Pn-1 and n disks D0, D1, …, Dn-1.
Disk Di is associated with Processor Pi.
Relation R is partitioned into R0, R1, …, Rn-1 using Round-robin technique or
Hash Partitioning technique or Range Partitioning technique (if range partitioned
on some other attribute other than sorting attribute)
Objective:
to sort a relation (table) Ri that resides on n disks on an attribute A in parallel.
i.e. choose processors P0, ..., Pm, where m ≤ n-1, to do the sorting.
Step 1: Partition the relations Ri on the sorting attribute A at every
processor using a range vector v. Send the partitioned records which fall in
the ith range to Processor Pi where they are temporarily stored in Di.
Step 2: Sort each partition locally at each processor Pi, and send the sorted results for merging with all the other sorted results, which is a trivial process.
236
Assume that relation Employee(Emp_ID, EName, Salary) is permanently
partitioned using the Round-robin technique into 3 disks D0, D1, and D2, which are
associated with processors P0, P1, and P2. At processors P0, P1, and P2, the relations
are named Employee0, Employee1 and Employee2 respectively.
237
Step 2: Sort each temporary table in ascending order and later merge
238
Parallel Sort (Cont.)
Parallel External Sort-Merge
Assume the relation has already been partitioned among disks
D0, ..., Dn-1.
Each processor Pi locally sorts the data on disk Di.
The sorted runs on each processor are then merged to get
the final sorted output.
Parallelize the merging of sorted runs as follows:
The sorted partitions at each processor Pi are range-partitioned
across the processors P0, ..., Pm-1.
Each processor Pi performs a merge on the streams as they are
received, to get a single sorted run.
The sorted runs on processors P0,..., Pm-1 are concatenated to get the
final result.
239
SELECT * FROM Employee ORDER BY Salary;
range vector v = [14000, 24000]
240
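The range-partitioning sort for the query above can be simulated in a few lines; the salary values in the test are invented, and the range vector is the one from the slide.

```python
# Simulating the parallel range-partitioning sort on Salary
# with the range vector v = [14000, 24000].
from bisect import bisect_right

def range_partition_sort(salaries, vector=(14000, 24000)):
    parts = [[] for _ in range(len(vector) + 1)]
    for s in salaries:               # step 1: route each tuple to its range partition
        parts[bisect_right(vector, s)].append(s)
    for p in parts:                  # step 2: each processor sorts its partition locally
        p.sort()
    out = []                         # concatenating the partitions gives the global order
    for p in parts:
        out += p
    return out
```

Because the partitions are disjoint ranges, no merge across processors is needed: simple concatenation of the locally sorted partitions yields the globally sorted result.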
Parallel Join
The join operation requires pairs of tuples to be tested
to see if they satisfy the join condition, and if they do, the
pair is added to the join output.
Parallel join algorithms attempt to split the pairs to be
tested over several processors. Each processor then
computes part of the join locally.
In a final step, the results from each processor can be
collected together to produce the final result.
241
Partitioned Join
For equi-joins and natural joins, it is possible to partition the
two input relations across the processors, and compute the
join locally at each processor.
Let r and s be the input relations, and we want to compute r ⋈r.A=s.B s.
r and s each are partitioned into n partitions, denoted r0, r1, ...,
rn-1 and s0, s1, ..., sn-1.
Can use either range partitioning or hash partitioning.
r and s must be partitioned on their join attributes r.A and s.B),
using the same range-partitioning vector or hash function.
Partitions ri and si are sent to processor Pi,
Each processor Pi locally computes ri ⋈ri.A=si.B si. Any of the
standard join methods can be used.
242
Partitioned Join (Cont.)
243
244
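The partitioned equi-join can be sketched as follows: both relations are hash-partitioned on the join attribute with the same function, each partition pair is joined locally, and the local results are unioned. The toy relations are invented.

```python
# Sketch of a partitioned equi-join on the first component of each tuple.
def partitioned_join(r, s, n=3):
    """r, s: lists of (join_key, payload) tuples."""
    rp = [[] for _ in range(n)]
    sp = [[] for _ in range(n)]
    for t in r:                       # same hash function for both relations,
        rp[hash(t[0]) % n].append(t)  # so matching keys land in the same partition
    for t in s:
        sp[hash(t[0]) % n].append(t)
    out = []
    for i in range(n):                # processor Pi joins ri with si locally
        out += [(a, b) for a in rp[i] for b in sp[i] if a[0] == b[0]]
    return out
```

Correctness rests on using the *same* partitioning function for r and s: any pair of tuples with equal join keys is guaranteed to meet at the same processor.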
Fragment-and-Replicate Join
Partitioning not possible for some join conditions
e.g., non-equijoin conditions, such as r.A > s.B.
For joins where partitioning is not applicable, parallelization can be accomplished by the fragment-and-replicate technique
Special case - asymmetric fragment-and-replicate:
One of the relations, say r, is partitioned; any partitioning
technique can be used.
The other relation, s, is replicated across all the processors.
Processor Pi then locally computes the join of ri with all of s
using any join technique.
245
Depiction of Fragment-and-Replicate Joins
246
Fragment-and-Replicate Join (Cont.)
General case: reduces the sizes of the relations at each
processor.
r is partitioned into n partitions,r0, r1, ..., r n-1; s is partitioned
into m partitions, s0, s1, ..., sm-1.
Any partitioning technique may be used.
There must be at least m * n processors.
Label the processors as: P0,0, P0,1, ..., P0,m-1, P1,0, ..., Pn-1,m-1.
Pi,j computes the join of ri with sj. In order to do so, ri is
replicated to Pi,0, Pi,1, ..., Pi,m-1, while si is replicated to P0,i, P1,i, ...,
Pn-1,i
Any join technique can be used at each processor Pi,j.
247
Fragment-and-Replicate Join (Cont.)
Both versions of fragment-and-replicate work with any
join condition, since every tuple in r can be tested with
every tuple in s.
Usually has a higher cost than partitioning, since one of
the relations (for asymmetric fragment-and-replicate) or
both relations (for general fragment-and-replicate) have
to be replicated.
Sometimes asymmetric fragment-and-replicate is
preferable even though partitioning could be used.
E.g., say s is small and r is large, and already partitioned. It may
be cheaper to replicate s across all processors, rather than
repartition r and s on the join attributes.
248
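Asymmetric fragment-and-replicate for a non-equijoin such as r.A > s.B can be sketched as follows; the round-robin split of r and the toy data are assumptions of this sketch.

```python
# Sketch of asymmetric fragment-and-replicate: r is partitioned
# (round-robin here), s is replicated to every processor, and each
# processor joins its r-fragment with the whole of s on r.A > s.B.
def fragment_and_replicate(r, s, n=2):
    parts = [r[i::n] for i in range(n)]   # partition r across n processors
    out = []
    for ri in parts:                      # each Pi computes ri join(A>B) s locally
        out += [(a, b) for a in ri for b in s if a > b]
    return out
```

Every r-tuple is tested against every s-tuple exactly once, which is why the technique works for arbitrary join conditions, at the cost of replicating s.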
Partitioned Parallel Hash-Join
Parallelizing partitioned hash join:
Assume s is smaller than r and therefore s is chosen as the
build relation.
A hash function h1 takes the join attribute value of each tuple
in s and maps this tuple to one of the n processors.
Each processor Pi reads the tuples of s that are on its disk Di,
and sends each tuple to the appropriate processor based on
hash function h1. Let si denote the tuples of relation s that are
sent to processor Pi.
As tuples of relation s are received at the destination
processors, they are partitioned further using another hash
function, h2, which is used to compute the hash-join locally.
(Cont.)
249
Partitioned Parallel Hash-Join (Cont.)
Once the tuples of s have been distributed, the larger relation r is
redistributed across the n processors using the hash function h1
Let ri denote the tuples of relation r that are sent to processor Pi.
As the r tuples are received at the destination processors, they are
repartitioned using the function h2
(just as the probe relation is partitioned in the sequential hash-join
algorithm).
Each processor Pi executes the build and probe phases of the hash-
join algorithm on the local partitions ri and si of r and s to produce a
partition of the final result of the hash-join.
Note: Hash-join optimizations can be applied to the parallel case
e.g., the hybrid hash-join algorithm can be used to cache some of the
incoming tuples in memory and avoid the cost of writing them and
reading them back in.
250
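The two-level use of hash functions above can be sketched as follows: h1 routes tuples to processors, and a local dictionary plays the role of the h2 buckets at each processor. Processors are simulated sequentially; all names are illustrative.

```python
# Sketch of the partitioned parallel hash join (illustrative only).

def parallel_hash_join(r, s, key, n=4):
    h1 = lambda t: hash(t[key]) % n         # maps a tuple to a processor
    # Distribute the (smaller) build relation s, then the probe relation r.
    s_at = [[] for _ in range(n)]
    for t in s:
        s_at[h1(t)].append(t)
    r_at = [[] for _ in range(n)]
    for t in r:
        r_at[h1(t)].append(t)
    out = []
    for i in range(n):                      # local hash join at each P_i
        table = {}                          # h2's buckets, keyed on the
        for t in s_at[i]:                   # join attribute (build phase)
            table.setdefault(t[key], []).append(t)
        for t in r_at[i]:                   # probe phase
            out += [t + b for b in table.get(t[key], [])]
    return out
```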
Parallel Nested-Loop Join
Assume that
relation s is much smaller than relation r and that r is stored by
partitioning.
there is an index on a join attribute of relation r at each of the
partitions of relation r.
Use asymmetric fragment-and-replicate, with relation s being
replicated, and using the existing partitioning of relation r.
Each processor Pj where a partition of relation s is stored
reads the tuples of relation s stored in Dj, and replicates the
tuples to every other processor Pi.
At the end of this phase, relation s is replicated at all sites that store
tuples of relation r.
Each processor Pi performs an indexed nested-loop join of
relation s with the ith partition of relation r.
251
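A sketch of this scheme: r keeps its existing partitions, the small relation s is replicated to every site, and a local dictionary stands in for the index on r's join attribute. Illustrative only.

```python
# Sketch of the parallel indexed nested-loop join (illustrative only).

def parallel_indexed_nl_join(r_parts, s, r_key, s_key):
    out = []
    for r_i in r_parts:                 # one iteration per processor P_i
        index = {}                      # stands in for the index on r_i
        for t in r_i:
            index.setdefault(t[r_key], []).append(t)
        for u in s:                     # replicated copy of s probes it
            out += [t + u for t in index.get(u[s_key], [])]
    return out
```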
Other Relational Operations
Selection σθ(r)
If θ is of the form ai = v, where ai is an attribute and v a
value:
If r is partitioned on ai the selection is performed at a single
processor.
If θ is of the form l <= ai <= u (i.e., θ is a range selection)
and the relation has been range-partitioned on ai
Selection is performed at each processor whose partition
overlaps with the specified range of values.
In all other cases: the selection is performed in parallel at
all the processors.
252
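Routing a range selection under range partitioning can be sketched as follows. The partition vector [b0, ..., b_{n-2}] defines n partitions; only processors whose range overlaps the predicate need to run the selection. Function names are illustrative.

```python
# Sketch: which processors must evaluate l <= a_i <= u under
# range partitioning (illustrative only).

def partition_of(v, boundaries):
    """Index of the range partition that holds value v."""
    for i, b in enumerate(boundaries):
        if v <= b:
            return i
    return len(boundaries)

def processors_for_selection(boundaries, lo, hi):
    """Processors whose partition overlaps the range [lo, hi]."""
    return list(range(partition_of(lo, boundaries),
                      partition_of(hi, boundaries) + 1))
```

A point selection ai = v is the special case lo == hi, which yields a single processor.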
Other Relational Operations (Cont.)
Duplicate elimination
Perform using either of the parallel sort techniques,
eliminating duplicates as soon as they are found during sorting.
Can also partition the tuples (using either range- or hash-
partitioning) and perform duplicate elimination locally at each
processor.
Projection
Projection without duplicate elimination can be performed as
tuples are read in from disk in parallel.
If duplicate elimination is required, any of the above duplicate
elimination techniques can be used.
253
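The partition-based variant of duplicate elimination is a one-liner idea: identical tuples hash to the same processor, so each processor can deduplicate locally. A minimal sketch, with the processors simulated as a list of sets:

```python
# Sketch of parallel duplicate elimination via hash partitioning
# (illustrative only).

def parallel_distinct(tuples, n=3):
    parts = [set() for _ in range(n)]
    for t in tuples:
        parts[hash(t) % n].add(t)       # local elimination at each P_i
    return set().union(*parts)
```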
Grouping/Aggregation
Partition the relation on the grouping attributes and then
compute the aggregate values locally at each processor.
Can reduce cost of transferring tuples during partitioning
by partly computing aggregate values before partitioning.
Consider the sum aggregation operation:
Perform aggregation operation at each processor Pi on those
tuples stored on disk Di
results in tuples with partial sums at each processor.
Result of the local aggregation is partitioned on the grouping
attributes, and the aggregation performed again at each
processor Pi to get the final result.
Fewer tuples need to be sent to other processors during
partitioning.
254
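The two-phase sum above can be sketched as follows: each source processor pre-aggregates its local tuples, so only one partial-sum tuple per group crosses the network; the partial sums are then repartitioned on the grouping attribute and combined. All names are illustrative.

```python
# Sketch of a parallel grouped sum with partial pre-aggregation
# (illustrative, processors simulated sequentially).

def parallel_sum_by_group(local_partitions, n=3):
    # Phase 1: local partial sums at each source processor.
    partials = []
    for part in local_partitions:
        local = {}
        for group, value in part:
            local[group] = local.get(group, 0) + value
        partials.append(local)
    # Phase 2: repartition partial sums on the grouping attribute
    # and aggregate again at each destination processor P_i.
    final = [{} for _ in range(n)]
    for local in partials:
        for group, value in local.items():
            dest = final[hash(group) % n]
            dest[group] = dest.get(group, 0) + value
    merged = {}
    for d in final:
        merged.update(d)                # groups are disjoint across P_i
    return merged
```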
Cost of Parallel Evaluation of Operations
If there is no skew in the partitioning, and there is no
overhead due to the parallel evaluation, a parallel operation
on n processors is expected to run in 1/n of the sequential
time (a speed-up of n)
If skew and overheads are also to be taken into account,
the time taken by a parallel operation can be estimated as
Tpart + Tasm + max (T0, T1, …, Tn-1)
Tpart is the time for partitioning the relations
Tasm is the time for assembling the results
Ti is the time taken for the operation at processor Pi
this needs to be estimated taking into account the skew, and the time
wasted in contentions.
255
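The cost estimate is simple enough to state in code; the point is that with skew the slowest processor's time dominates. Numbers below are made up for illustration.

```python
# The estimate Tpart + Tasm + max(T0, ..., Tn-1) in code (illustrative).

def parallel_op_time(t_part, t_asm, t_local):
    """Partitioning time + assembly time + slowest local operation."""
    return t_part + t_asm + max(t_local)
```

For example, with local times [5, 5, 20] one skewed processor drags the estimate to t_part + t_asm + 20, even though the average local time is only 10.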
Interoperator Parallelism
Pipelined parallelism
Consider a join of four relations
r1 ⋈ r2 ⋈ r3 ⋈ r4
Set up a pipeline that computes the three joins in parallel
Let P1 be assigned the computation of
temp1 = r1 ⋈ r2
And P2 be assigned the computation of temp2 = temp1 ⋈ r3
And P3 be assigned the computation of temp2 ⋈ r4
Each of these operations can execute in parallel, sending result
tuples it computes to the next operation even as it is
computing further results
Provided a pipelineable join evaluation algorithm (e.g. indexed nested
loops join) is used
256
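The pipeline above can be sketched with Python generators: each join stage yields result tuples as soon as they are computed, so the next stage starts consuming before its input is complete. The join is a natural join on the first attribute; all relation contents are illustrative.

```python
# Pipelined parallelism sketched with generators (illustrative only).

def pipelined_join(left, right):
    index = {}                          # index on the stored right input
    for t in right:
        index.setdefault(t[0], []).append(t)
    for t in left:                      # stream tuples through the stage
        for m in index.get(t[0], []):
            yield t + m[1:]             # emit as soon as a match is found

r1, r2, r3, r4 = [(1, 'a')], [(1, 'b')], [(1, 'c')], [(1, 'd')]
temp1 = pipelined_join(iter(r1), r2)      # P1: r1 join r2
temp2 = pipelined_join(temp1, r3)         # P2: consumes P1's output lazily
result = list(pipelined_join(temp2, r4))  # P3: temp2 join r4
```

No intermediate result is materialized: temp1 and temp2 are lazy generators, mirroring how a pipelineable join (such as indexed nested loops) avoids writing intermediate results to disk.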
Factors Limiting Utility of Pipeline Parallelism
Pipeline parallelism is useful since it avoids writing
intermediate results to disk
Useful with small number of processors, but does not
scale up well with more processors. One reason is that
pipeline chains do not attain sufficient length.
Cannot pipeline operators which do not produce output
until all inputs have been accessed (e.g. aggregate and
sort)
Little speedup is obtained for the frequent cases of skew
in which one operator's execution cost is much higher
than the others.
257
Independent Parallelism
Independent parallelism
Consider a join of four relations
r1 ⋈ r2 ⋈ r3 ⋈ r4
Let P1 be assigned the computation of temp1 = r1 ⋈ r2
And P2 be assigned the computation of temp2 = r3 ⋈ r4
And P3 be assigned the computation of temp1 ⋈ temp2
P1 and P2 can work independently in parallel
P3 has to wait for input from P1 and P2
Can pipeline output of P1 and P2 to P3, combining independent parallelism
and pipelined parallelism
Does not provide a high degree of parallelism
useful with a lower degree of parallelism.
less useful in a highly parallel system,
258
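Independent parallelism can be sketched with threads: P1 and P2 compute their joins concurrently, and P3 waits for both results. The join is a natural join on the first attribute; the data is illustrative.

```python
# Independent parallelism sketched with a thread pool (illustrative only).
from concurrent.futures import ThreadPoolExecutor

def join(a, b):
    """Natural join on the first attribute (the key is kept once)."""
    return [x + y[1:] for x in a for y in b if x[0] == y[0]]

r1, r2 = [(1, 'a')], [(1, 'b')]
r3, r4 = [(1, 'c')], [(1, 'd')]

with ThreadPoolExecutor(max_workers=2) as ex:
    f1 = ex.submit(join, r1, r2)        # P1: temp1 = r1 join r2
    f2 = ex.submit(join, r3, r4)        # P2: temp2 = r3 join r4 (independent)
    temp1, temp2 = f1.result(), f2.result()

result = join(temp1, temp2)             # P3: waits for P1 and P2
```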
Design of Parallel Systems
Some issues in the design of parallel systems:
Parallel loading of data from external sources is needed in
order to handle large volumes of incoming data.
Resilience to failure of some processors or disks.
Probability of some disk or processor failing is higher in a
parallel system.
Operation (perhaps with degraded performance) should be
possible in spite of failure.
Redundancy achieved by storing extra copy of every data item
at another processor.
259
Design of Parallel Systems (Cont.)
Online reorganization of data and schema changes must
be supported.
For example, index construction on terabyte databases can
take hours or days even on a parallel system.
Need to allow other processing (insertions/deletions/updates) to be
performed on relation even as index is being constructed.
Basic idea: index construction tracks changes and “catches up”
on changes at the end.
Also need support for online repartitioning and schema
changes (executed concurrently with other processing).
260