
Database Technologies

[email protected]

1
Agenda
 Database models
 Object Oriented model
 Object relational model
 Distributed Database
 Introduction
 DDBMS Architecture
 DDB Design
 Distributed Query Processing

2
1. Database Model

3
Recap
 Database
 Types of databases
 Relational
 Attributes, tuple; order; relationship
 Non-relational
 Topology; relationship; order
 Object Oriented model

4
Relational model
 Mathematical set: Table = {<a1: v1, …, an: vn>}
 Two dimensional => attribute, tuples
 No Ordering between attributes, no ordering among
tuples
 Relationship
 1-1
 1-n
 n-m is not supported directly; it is decomposed into two 1-n relationships
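As a sketch of this idea, a relation can be modeled as an unordered set of tuples, and an n-m relationship (e.g., Students and Courses) resolved through a junction relation carrying two 1-n links. The names and data below are illustrative, not from the slides.

```python
# Relations as unordered sets of tuples (no ordering among tuples).
students = {("S1", "Abebe"), ("S2", "Sara")}          # Students(ID, Fullname)
courses = {("C1", "Databases"), ("C2", "Networks")}   # Course(CID, Title)

# The n-m Students-Courses relationship becomes two 1-n links
# through this associative (junction) relation:
enrollments = {("S1", "C1"), ("S1", "C2"), ("S2", "C1")}

def courses_of(student_id):
    """Follow the two 1-n links to resolve the original n-m relationship."""
    return {title for (sid, cid) in enrollments
            for (c, title) in courses
            if sid == student_id and c == cid}
```

Resolving `courses_of("S1")` walks both 1-n links, which is what a join over the junction table does in SQL.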

5
Relational model: Example
 Students(ID, Fullname, sex, telno, hobbies, address, …)
 Departments (DepID, dName, ChairPerson, teleno,… )
 Course(CID, Title, Description, Crhr, …)
 Employees(EID, FullName, sex, address, qualification, hobbies, photo,
…)
 Rules:
 Entity constraint
 Referential integrity rule
 How many telno/ hobbies a student or an Employee can have?
 What is address/ qualification and what can be its content?
 Relationship
 Students and department
 Employees and Departments
 Students & courses
 Any relationship/ similarity between Student and Employee

6
Non-relational
 Topology – hierarchical, network-based, tree, graph, linked list, …
 Relationships: 1-1, 1-n, n-m
 Order is important
 E.g., XML, OODB, …

7
OO database concept
 Representing complex object
 Encapsulation
 Class
 Inheritance

8
OO database concept
 Association: is the link between entities in an application.
It is represented by means of references between objects.
It can be binary or ternary, and may have an inverse (reverse) reference.

SELECT p.name, p.empl.company_name
FROM p IN Persons

9
ADVANTAGES OF OODB
 An integrated repository of information that is shared by
multiple users, multiple products, multiple applications on
multiple platforms.
 It also solves the following problems:
 The semantic gap: the real world and the conceptual model are very
similar.
 Impedance mismatch: Programming languages and database systems
must be interfaced to solve application problems. But the language
style, data structures, of a programming language (such as C) and the
DBMS (such as Oracle) are different. The OODB supports general
purpose programming in the OODB framework.
 New application requirements: Especially in OA, CAD, CAM, CASE,
object-orientation is the most natural and most convenient.

10
Complex object model
 Allows
 Sets of atomic values
 Tuple-valued attributes
 Sets of tuples (nested relations)
 General set and tuple constructors
 Object identity
 Thus, formally
 Every atomic value in A is an object.
 If a1, ..., an are attribute names in N, and O1, ..., On are objects,
then T = [a1:O1, ..., an:On] is also an object, and T.ai retrieves the
value Oi.
 If O1, ..., On are objects, then S = {O1, ..., On} is an object.

11
Object Model
 An object is defined by a triple (OID, type constructor, state)
 where OID is the unique object identifier,
 type constructor is its type (such as atom, tuple, set, list, array, bag,
etc.) and state is its actual value.
Example:
(i1, atom, 'John')
(i2, atom, 30)
(i3, atom, 'Mary')
(i4, atom, 'Mark')
(i5, atom, 'Vicki')
(i6, tuple, [Name:i1, Age:i2])
(i7, set, {i4, i5})
(i8, tuple, [Name:i3, Friends:i7])
(i9, set, {i6, i8})
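The (OID, type constructor, state) triples above can be mirrored directly in code. A minimal sketch, where the dictionary and the `deref` helper are hypothetical illustrations rather than any OODB API:

```python
# Objects stored as OID -> (type_constructor, state), matching i1..i9 above.
objects = {
    "i1": ("atom", "John"),
    "i2": ("atom", 30),
    "i3": ("atom", "Mary"),
    "i4": ("atom", "Mark"),
    "i5": ("atom", "Vicki"),
    "i6": ("tuple", {"Name": "i1", "Age": "i2"}),
    "i7": ("set", {"i4", "i5"}),
    "i8": ("tuple", {"Name": "i3", "Friends": "i7"}),
    "i9": ("set", {"i6", "i8"}),
}

def deref(oid):
    """Recursively resolve an OID into a plain Python value."""
    kind, state = objects[oid]
    if kind == "atom":
        return state
    if kind == "tuple":
        return {attr: deref(o) for attr, o in state.items()}
    if kind == "set":
        # Sort OIDs so the result is deterministic in this sketch.
        return [deref(o) for o in sorted(state)]
```

Dereferencing i8 yields Mary together with her set of friends, showing how object identity lets tuples and sets share sub-objects.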

12
OBJECT-ORIENTED DATABASES
 OODB = Object Orientation + Database Capabilities

 May provide the following features:


 persistence
 support of transactions
 simple querying of bulk data
 concurrency control
 resilience and recovery
 security
 versioning
 integrity
 performance issues
 DATA MODELS:
 Complex object model
 Semantic data model such as Extended ER (EER) model

13
OODB
 RESEARCH PROTOTYPES
 ORION: Lisp-based system
 IRIS: Functional data model, version control, object-SQL.
 Galileo: Strong typed language, complex objects.
 PROBE
 POSTGRES: Extended relational database supporting objects.
 COMMERCIAL OODB
 O2: O2 Technology. Language O2C to define classes, methods and types. Supports multiple
inheritance. C++ compatible. Supports an extended SQL language O2SQL which can refer
to complex objects.
 G-Base: Lisp-based system, supports ADT, multiple inheritance of classes.
 CORBA: Standards for distributed objects.
 GemStone: Earliest OODB supporting object identity, inheritance, encapsulation. Language
OPAL is based upon Smalltalk.
 Ontos: C++ based system, supports encapsulation, inheritance, ability to construct
complex objects.
 Object Store: C++ based system. A good feature is that it supports the creation of indexes.
 Statics: Supports entity types, set valued attributes, and inheritance of entity types and
methods.

14
OODB
 COMMERCIAL OODB
 Relational DB Extensions: Many relational systems support
OODB extensions.
 User-defined functions (dBase).
 User-defined ADTs (POSTGRES)
 Very-long multimedia fields (BLOB or Binary Large Object). (DB2
from IBM, SQL from SYBASE, Informix, Interbase)

15
OODB Implementation Strategies
 Develop novel database data model or data language (SIM)
 Extend an existing database language with object-oriented
capabilities. (IRIS, O2 and VBASE/ONTOS extended SQL)
 Extend existing object-oriented programming language
with database capabilities (GemStone OPAL extended
SmallTalk)
 Extendable object-oriented DBMS library (ONTOS)

16
ODL A Class With Key and Extent
 A class definition with “extent”, “key”, and more elaborate
attributes; still relatively straightforward
class Person (extent persons key ssn) {
attribute struct Pname {string fname …} name;
attribute string ssn;
attribute date birthdate;

short age();
}

class Department (extent departments) {
attribute string name;
attribute string college;
}
Simple OQL Queries
 Basic syntax: select…from…where…
SELECT d.name
FROM d in departments
WHERE d.college = 'Engineering';
 An entry point to the database is needed for each query

SELECT d.name
FROM departments d
WHERE d.college = 'Engineering';
Review Questions
 What are the main assumptions of the relational model?
 What are the basic features of the relational model?
 State the drawbacks/limitations of the relational database
model.
 State the advantages of the Object Oriented database model.
 Explain the challenges of the Object Oriented database model.

19
Object-Relational Data Models
 Extend the relational data model by including object
orientation and constructs to deal with added data types.
 Allow attributes of tuples to have complex types, including
non-atomic values such as nested relations.
 Preserve relational foundations, in particular the
declarative access to data, while extending modeling
power.
 Upward compatibility with existing relational languages.

20
Nested Relations
 Motivation:
 Permit non-atomic domains (atomic ≡ indivisible)
 Example of non-atomic domain: set of integers, or set of tuples
 Allows more intuitive modeling for applications with complex
data
 Intuitive definition:
 allow relations whenever we allow atomic (scalar) values -
relations within relations
 Retains mathematical foundation of relational model
 Violates first normal form.

21
Example of a Nested Relation
 Example: library information system
 Each book has
 title,
 a set of authors,
 Publisher, and
 a set of keywords
 Non-1NF relation books

22
1NF Version of Nested Relation
 1NF version of books

flat-books

23
4NF Decomposition of Nested Relation
 Remove awkwardness of flat-books by assuming that the
following multi-valued dependencies hold:
 title →→ author
 title →→ keyword
 title →→ pub-name, pub-branch
 Decompose flat-doc into 4NF using the schemas:
 (title, author)
 (title, keyword)
 (title, pub-name, pub-branch)

24
4NF Decomposition of flat–books

25
Problems with 4NF Schema
 4NF design requires users to include joins in their
queries.
 1NF relational view flat-books defined by join of 4NF
relations:
 eliminates the need for users to perform joins,
 but loses the one-to-one correspondence between tuples and
documents.
 And has a large amount of redundancy
 Nested relations representation is much more natural
here.
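To make the trade-off concrete, a small sqlite3 sketch with made-up rows shows the join a user must write over the 4NF schemas and the redundancy it produces (table and column names are illustrative):

```python
import sqlite3

# The three 4NF relations from the decomposition above.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE book_author  (title TEXT, author TEXT);
CREATE TABLE book_keyword (title TEXT, keyword TEXT);
CREATE TABLE book_pub     (title TEXT, pub_name TEXT, pub_branch TEXT);
INSERT INTO book_author  VALUES ('Compilers','Smith'), ('Compilers','Jones');
INSERT INTO book_keyword VALUES ('Compilers','parsing'), ('Compilers','analysis');
INSERT INTO book_pub     VALUES ('Compilers','McGraw-Hill','New York');
""")

# flat-books = join of the 4NF fragments. Note the redundancy:
# 2 authors x 2 keywords = 4 tuples for a single book.
flat_books = con.execute("""
SELECT a.title, a.author, k.keyword, p.pub_name, p.pub_branch
FROM book_author a
JOIN book_keyword k ON k.title = a.title
JOIN book_pub p     ON p.title = a.title
""").fetchall()
```

One book with two authors and two keywords expands to four flat tuples, which is exactly the redundancy the nested representation avoids.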

26
Complex Types and SQL:1999
 Extensions to SQL to support complex types include:
 Collection and large object types
 Nested relations are an example of collection types
 Structured types
 Nested record structures like composite attributes
 Inheritance
 Object orientation
 Including object identifiers and references

27
Collection Types
 Set type (not in SQL:1999)
create table books (
…..
keyword-set setof(varchar(20))
……
)
 Sets are an instance of collection types. Other instances
include
 Arrays (are supported in SQL:1999)
 E.g. author-array varchar(20) array[10]
 Can access elements of array in usual fashion:
 E.g. author-array[1]
 Multisets (not supported in SQL:1999)
 I.e., unordered collections, where an element may occur multiple
times
 Nested relations are sets of tuples
 SQL:1999 supports arrays of tuples

28
Large Object Types
 Large object types
 clob: Character large objects
book-review clob(10KB)
 blob: binary large objects
image blob(10MB)
movie blob (2GB)

29
Structured and Collection Types
(PostgreSQL)
Structured types can be declared and used in SQL
CREATE TYPE Publisher as (name varchar(20),
branch varchar(20));

CREATE TYPE Book AS (title varchar(20), authors text[],
pub_date date, pub Publisher, keywords text[]);

 Structured types can be used to create tables

CREATE TABLE books of Book


30
Structured Types (Cont.)
 Creating tables without creating an intermediate type
 For example, the table books could also be defined as
follows:
Create table books (title varchar(20),authors text[],
pub_date date, pub Publisher, keywords text[])

31
Structured Types (Cont.)
Add two records into the books table
Insert into books (title, authors, pub_date, pub, keywords) values
('Compilers', '{"Smith","Jones"}', now()::date,
 row('McGraw-Hill','New York')::Publisher, '{"Parsing","Analysis"}'),
('Networks', '{"Jones","Frick"}', now()::date,
 row('Oxford','London')::Publisher, '{"Internet","Web"}');

Retrieve the content of the books table – two rows will be returned
Select * from Books;

Unnesting the nested relation (arrays):

Select title, c.* as authors, (pub).name, (pub).branch, k.* as keywords
from Books b, unnest(authors) c, unnest(keywords) k;
32
Structured Types (Cont.) – Nested Table
Create Table Departments (dID serial primary key, dname varchar(20) not null);
Insert INTO Departments (dname) values ('Sales'), ('Marketing'), ('Production'), ('IT');
Create type name_type as (fname varchar(20), lname varchar(20));
Create type Edu_ty as (name varchar(20), Institution varchar(20), year varchar(5));
Create table Employees (Id serial not null primary key, fullname name_type,
telno text[], edu Edu_ty[], salary numeric, dId int references departments(did));
Insert into Employees (fullname, telno, edu, salary, did) values
(row('dawit','alemu')::name_type, '{"0111222922","0911631715"}',
Array[row('MSc','AAU','2015')::Edu_ty,
row('Bsc','AAU','2013')::Edu_ty,
row('Diploma','BU','1999')::Edu_ty],
14000, 4);

33
Structured Types (Cont.)
Define a function that returns a table:
Create function getEmployee (eID int) returns Table (id int, fname varchar(20),
lname varchar(20), telno varchar(20),cred_name varchar(20), awarding_Inst
varchar(20), award_year varchar(20),depart_ID int, dname varchar(20)) AS $$
SELECT e.id,(fullname).fname as fname, (fullname).lname as lname, t.* as telno,
(ed).name as cred_name, (ed).Institution as awarding_Inst,(ed).year as
award_year,e.did as depart_ID, dname
from Employees e, unnest(telno) t, unnest(edu) ed, departments d
where (d.did = e.did) and (e.id = $1);
$$ LANGUAGE SQL;

-- Function as data source


Select * from getEmployee(1);

34
Inheritance in PostgreSQL
 PostgreSQL supports only table inheritance, not the type
inheritance that SQL:1999 supports
create type Person_Ty as (PID varchar (20), fullname name_type,
address full_address);
create table People of Person_ty;
Create table Emps (id serial, salary numeric) INHERITS (people);
-- inherits columns of the base table people

Inserting data into the Emps table also adds part of the data to the base
table people, but the reverse is not true
Insert into emps (pid, fullname,address,salary) values (1245, row('Dawit',
'bekele')::name_type, row('DZ','AM')::full_address, 9878)

35
Structured and Collection Types (Oracle)
 Structured types can be declared and used in SQL
CREATE OR REPLACE TYPE Publisher as Object (name varchar(20), branch
varchar(20));
/
CREATE OR REPLACE TYPE VA as VARRAY (5) of VARCHAR(30);
/
CREATE OR REPLACE TYPE Book AS OBJECT (title varchar(20), authors VA,
pub_date date, pub Publisher, keywords VA);
/
 Structured types can be used to create tables

create table books of Book

37
Structured Types (Cont.)
 Creating tables without creating an intermediate type
 For example, the table books could also be defined as follows:
Create table books
(title varchar(20), authors VA,
pub_date date, pub Publisher, keywords VA);

 Methods can be part of the type definition of a structured type:


Create or Replace type Employee_Ty as Object
(name varchar(20), salary int,
MEMBER function giveraise (percent IN int) return NUMBER);

 Method body is created separately


CREATE OR REPLACE TYPE BODY Employee_Ty AS
MEMBER Function giveraise(percent IN int ) return NUMBER IS
begin
RETURN (salary + ( salary * percent) / 100);
end giveraise;
END;
/

38
Creation of Values of Complex Types
 Values of structured types are created using
constructor functions
 E.g. Publisher('McGraw-Hill', 'New York')
Note: a value is not an object

39
Creation of Values of Complex Types
 To insert the preceding tuple into the relation books
Insert into books (title, authors, pub, keywords) values
('Compilers', VA('Smith', 'Jones'),
Publisher('McGraw-Hill', 'New York'), VA('parsing','analysis'));

Insert into books (title, authors, pub, keywords) values
('Introduction to Programming', VA('Sample', 'Jones', 'Test'),
Publisher('McGraw-Hill', 'New York'), VA('Modularity','analysis'));

Select Title, a.* from Books b, table(b.authors) a

40
Inheritance
 Suppose that we have the following type definition for people:
create or replace type Person_typ as Object
(name varchar(20),
address varchar(20)) not final;
/
 Using inheritance to define the student and teacher types
(Student_typ and Teacher_typ are subtypes of Person_typ):
create or replace type Student_typ UNDER Person_typ
(degree varchar(20),
department varchar(20)) not final;
/
create or replace type Teacher_typ UNDER Person_typ
(salary integer,
department varchar(20)) not final;
/

 Subtypes can redefine methods by using OVERRIDING MEMBER in place of
MEMBER in the method declaration

41
Reference Types
 Object-oriented languages provide the ability to create
and refer to objects.
 In SQL:1999
 References are to tuples, and
 References must be scoped,
 I.e., can only point to tuples in one specified table

42
Reference Declaration in SQL:1999
 E.g. define a type Department with a field name and a field
head which is a reference to the Person in table people as
scope
create type Department as Object
(name varchar(20), head ref Person_typ )

The table departments is defined as follows


create table departments of Department

43
Initializing Reference Typed Values
 In Oracle, to create a tuple with a reference value, first
create the tuple with a null reference and then set the
reference separately using the function ref(p) applied to a
tuple variable

 E.g. create a department with name CS and head being


the person named John
insert into departments values ('CS', null);
update departments d
set head = (select ref(p) from people p
where p.name = 'John')
where d.name = 'CS' and d.head is null;
/
44
Querying with Structured Types
 Find the title and the name of the publisher of each book.
select title, publisher.name
from books

 Note, the use of the dot notation to access fields of the


composite attribute (structured type) publisher

45
Nested Table
CREATE TYPE animal_ty AS OBJECT (breed
VARCHAR(25), name VARCHAR(25), birthdate DATE);
/
CREATE TYPE animals_nt AS TABLE OF animal_ty;
/
CREATE TABLE breeder (breederName VARCHAR(25),
animals animals_nt)
nested table animals store as animals_nt_tab;
breederName | Animals (nested rows of Breed, Name, Birthdate)
46
Nested Table
 CREATE TABLE breeder (breederName VARCHAR(25),
animals animals_nt) nested table animals store as
animals_nt_tab;
INSERT INTO breeder VALUES (
'John Smith ',
animals_nt(
animal_ty('DOG', 'BUTCH', '31-MAR-01'),
animal_ty('DOG', 'ROVER', '05-JUN-01'),
animal_ty('DOG', 'JULIO', '10-JUN-01') )
);
breederName: John Smith
Animals: (DOG, BUTCH, 31-MAR-01), (DOG, ROVER, 05-JUN-01), (DOG, JULIO, 10-JUN-01)
47
Nested Table

SELECT breederName, N.Name, N.BirthDate
FROM breeder, TABLE(breeder.Animals) N;

SELECT breederName, N.Name, N.BirthDate
FROM breeder, TABLE(breeder.Animals) N
WHERE N.Name = 'JULIO';

48
Comparison of O-O and O-R Databases
 Relational systems
 simple data types, powerful query languages, high protection.
 Persistent-programming-language-based OODBs
 complex data types, integration with programming language,
high performance.
 Object-relational systems
 complex data types, powerful query languages, high protection.
 Note: Many real systems blur these boundaries
 E.g. persistent programming language built as a wrapper on a
relational database offers first two benefits, but may have poor
performance.

49
Template for the review report
 Introduction
 Motivation
 Problem statement
 Proposed solution
 Critiques by the reviewers (both positive and negative)
 Conclusion

50
Distributed Database

51
Outline
 Distributed Database
 Introduction
 DDBMS Architecture
 DDB Design
 Distributed Query Processing

52
1. Introduction to Distributed
Database

53
File Systems

(Figure: Programs 1–3 each embed their own data description and access
their own files, File 1–3, resulting in redundant data across files.)

54
Database Management

(Figure: Application programs 1–3, each with data semantics, share one
DATABASE through common data description and data manipulation services.)

55
Quiz

 https://app.sli.do/event/mDVTamzrFpEb4VMLxvzRgh/live/questions
 Or https://www.slido.com/

 Code: #2177058

56
Objective of database technology
 The key objective of DBS is Integration not centralization

 It is possible to achieve integration without centralization

57
Motivation

(Figure: database technology contributes integration, computer networks
contribute distribution; combined, they yield distributed database systems.)

Integration ≠ centralization

58
Quiz

 https://app.sli.do/event/mDVTamzrFpEb4VMLxvzRgh/live/questions
 Or https://www.slido.com/

 Code: #2177058

59
What is distributed …
 Processing logic or processing elements
 Functions
 Data
 Control

60
Classification of Distributed computing
 Criteria [Bochmann, 1983]
 Degree of coupling – how closely the processing elements are
connected together
 Amount of Data exchanged/ amount of local processing
 Weak vs strong coupling
 Interconnection structure
 Point-to-point interconnection b/n processing units
 Common interconnection channel
 Interdependence of components
 Synchronization between components
 Synchronous or asynchronous

61
What is a Distributed Database
System?
 A distributed database (DDB) is a collection of multiple,
logically interrelated databases distributed over a computer
network.
 A distributed database management system (D–DBMS) is
the software that manages the DDB and provides an
access mechanism that makes this distribution
transparent to the users.
 Distributed database system (DDBS) = DDB + D–DBMS

62
What is not a DDBS?
 A timesharing computer system
 A loosely or tightly coupled multiprocessor system
 A database system which resides at one of the nodes of a
network of computers - this is a centralized database on
a network node

63
Shared-Memory Architecture

P1 P2 … Pn

Memory

Examples: symmetric multiprocessors (Sequent, Encore) and
some mainframes (IBM 3090, Bull's DPS8)

64
Shared-Disk Architecture
Computer System Computer System Computer System

CPU CPU CPU

Memory Memory Memory

Shared
Secondary
Storage

Examples: DEC's VAXcluster, IBM's IMS/VS Data Sharing

65
Shared-Nothing Architecture

Computer System Computer System Computer System

CPU CPU CPU


Memory Memory Memory

Switch

Examples: Teradata's DBC, Tandem, Intel's Paragon,
NCR's 3600 and 3700

66
Centralized DBMS on a Network

Site 1

Site 2

Communication
Network

Site 3
Site 4

67
Distributed DBMS Environment
Site 1

Site 2

Communication
Network

Site 3
Site 4

68
Implicit Assumptions
 Data stored at a number of sites ➯ each site logically
consists of a single processor.
 Processors at different sites are interconnected by a
computer network ➯ no multiprocessors
 parallel database systems
 Distributed database is a database, not a collection of files
➯ data logically related as exhibited in the users’ access
patterns
 relational data model
 D-DBMS is a full-fledged DBMS
 not remote file system, not a TP system

69
Promises of Distributed DBMS
 Transparent management of distributed, fragmented, and
replicated data
 Improved reliability/availability through distributed
transactions
 Improved performance
 Easier and more economical system expansion

70
Transparency
 Transparency is the separation of the higher level
semantics of a system from the lower level
implementation issues.
 Fundamental issue is to provide
 Data independence in the distributed environment
 Network (distribution) transparency
 Replication transparency
 Fragmentation transparency
 horizontal fragmentation: selection
 vertical fragmentation: projection
 hybrid
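A toy sketch of the last two points: a horizontal fragment is a selection and a vertical fragment a projection over the same relation. The PROJ rows below are illustrative, not from the slides.

```python
# An illustrative PROJ relation as a list of dict rows.
proj = [
    {"PNO": "P1", "PNAME": "Instrumentation", "BUDGET": 150000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "DB Develop",      "BUDGET": 135000, "LOC": "New York"},
    {"PNO": "P3", "PNAME": "CAD/CAM",         "BUDGET": 250000, "LOC": "New York"},
]

# Horizontal fragment: a selection on LOC.
proj_ny = [t for t in proj if t["LOC"] == "New York"]

# Vertical fragments: projections that both keep the key PNO,
# so the original relation can be rebuilt by joining on PNO.
proj_v1 = [{"PNO": t["PNO"], "BUDGET": t["BUDGET"]} for t in proj]
proj_v2 = [{"PNO": t["PNO"], "PNAME": t["PNAME"], "LOC": t["LOC"]} for t in proj]
```

Fragmentation transparency means users query PROJ as a whole while the system maps the query onto fragments like these.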

71
Review questions
 List any four promises of distributed databases, with examples.
 Explain the network, location, fragmentation and replication
transparency types, with examples.

 Given customer account (AID, Name, sex, region, subcity,
woreda, kebele, HouseNo, EmploymentType, …) data in four
branches (b1, b2, b3, b4) of CBE, which transparency fails,
and why, in the following SQL statement?
Select * from account@b1 union
Select * from account@b2 union
Select * from account@b3 union
Select * from account@b4
72
2. Distributed DBMS Architecture

73
Introduction: Architecture
 Defines the structure of the system, i.e.,
 The components of the system are identified
 The functions of each component are specified and
 The interrelationships and interactions among these
components are defined

 The three "reference" architectures for distributed DBMS
 Client/server
 Peer-to-peer distributed DBMS
 Multidatabase system

74
DBMS Standardization
 Reference Model
 A conceptual framework whose purpose is to divide standardization
work into manageable pieces and to show at a general level how these
pieces are related to one another. (e.g., ISO/OSI)

 The three approaches

1. Component-based
 Components of the system are defined together with the
interrelationships between components.
 Recommended for design and implementation of systems

75
DBMS Standardization …
2. Function-based
 Classes of users are identified together with the functionality that the system
will provide for each class (e.g., ISO/OSI)
 The objectives of the system are clearly identified, but this gives very little
insight into how these objectives are attained

3. Data-based
 Identify the different types of data and specify the functional units that will
realize and/or use data according to these views.
 As data is the central resource that a DBMS manages, the datalogical approach is
preferable for standardization activities (e.g., ANSI/SPARC)

76
ANSI/SPARC Architecture

77
Conceptual Schema Definition
RELATION PROJ [
KEY = {PNO}
ATTRIBUTES = {
PNO : CHARACTER(7)
PNAME : CHARACTER(20)
BUDGET : NUMERIC(7)
LOC : CHARACTER(15)
}
]
RELATION ASG [
KEY = {ENO,PNO}
ATTRIBUTES = {
ENO : CHARACTER(9)
PNO : CHARACTER(7)
RESP : CHARACTER(10)
DUR : NUMERIC(3)
}
]

78
Internal Schema Definition
RELATION EMP [
KEY = {ENO}
ATTRIBUTES = {
ENO : CHARACTER(9)
ENAME : CHARACTER(15)
TITLE : CHARACTER(10)
}
]

INTERNAL_REL E [
INDEX ON E# CALL EMINX
FIELD = {
E# : BYTE(9)
ENAME : BYTE(15)
TIT : BYTE(10)
}
]

79
External View Definition –
Example 1
Create a BUDGET view from the PROJ relation

CREATE VIEW BUDGET(PNAME, BUD)
AS SELECT PNAME, BUDGET FROM PROJ

80
External View Definition –
Example 2
Create a Payroll view from relations EMP and
Pay

CREATE VIEW PAYROLL (EMP_NO, EMP_NAME, SAL)
AS SELECT EMP.ENO, EMP.ENAME, PAY.SAL
FROM EMP, PAY
WHERE EMP.TITLE = PAY.TITLE

81
Architectural models for Distributed DBMS
 Ways of organizing multiple databases for sharing among
multiple DBMSs

82
Dimensions of the Problem
 Distribution
 Whether the components of the system (those that deal with data)
are located on the same machine or not
 Client-server vs peer-to-peer
 Heterogeneity
 Various levels (hardware, communications, operating system)
 DBMS heterogeneity is the important one:
 data model, query language, transaction management algorithms
83
Dimensions of the Problem
 Autonomy
 Refers to the degree to which individual DBMSs can operate independently
 Autonomy is a function of communication, execution of transactions, and dependency
 Various dimensions:
 Design autonomy: ability of a component DBMS to decide on issues related to its own
data model, design and transaction management techniques.
 Communication autonomy: ability of a component DBMS to decide whether and how
to communicate with other DBMSs, i.e., what type of information it wants to provide
to other DBMSs or to the software that controls their global execution.
 Execution autonomy: ability of a component DBMS to execute local operations in any
manner it wants to.

84
Architectural alternatives
 A0, D0, H0: logically integrated system
 A set of homogeneous multiple DBMSs
 E.g., a shared-everything multiprocessor system
 A0, D0, H1: logically integrated heterogeneous system
 E.g., integrating network, hierarchical and relational databases
residing on the same machine
 A0, D1, H0: the database is distributed (client-server) even
though an integrated view is presented to users
85
Architectural …
 A0, D2, H0: the same type of transparency is provided to users in a
fully distributed environment
 There is no distinction between client and server
 A1, D0, H0: semi-autonomous systems (federated DBMS)
 Components have significant autonomy in their execution
 but their participation in a federation indicates that they are willing
to cooperate with others in executing user requests that access
multiple databases
 Components are homogeneous and not distributed
86
Architectural …
 (A1, D1, H1): distributed heterogeneous federated DBMS
 (A2, D0, H0): fully autonomous (multidatabase system). The
components do not know how to talk to each other
 Autonomous collections of homogeneous DBMSs
 The multi-DBMS is the software that manages these
multi-databases and provides transparent access to them
 (A2, D0, H1): autonomous, heterogeneous DBMSs
 (A2, D1, H1): client/server distributed heterogeneous systems
 (A2, D2, H1): P2P distributed heterogeneous systems
87
Client/server

88
Client/server
 Task distribution

89
Advantages of Client-Server Architectures
 More efficient division of labor
 Horizontal and vertical scaling of resources
 Better price/performance on client machines
 Ability to use familiar tools on client machines
 Client access to remote data (via standards)
 Full DBMS functionality provided to client workstations
 Overall better system price/performance

90
Problems With Multiple-Client/Single Server
 Server forms bottleneck
 Server forms single point of failure
 Database scaling difficult

91
Multiple client – multiple server

 Each client manages its own connection to the
appropriate server
92
Multiple client – multiple server

 Each client knows its own home server, which
communicates with other servers as required
93
Peer-to-Peer Distributed Systems

94
Components of DDBMS

95
MDBMS architecture with GCS

96
Components of a Multi-DBMS

97
Review questions
 State the three reference database architectures in DDB
 Considering the data-oriented approach (ANSI/SPARC), explain
 its three layers and associated actors
 at least three advantages
 List the building blocks of the client-server database
architecture along with the services in each component.
 Explain the main characteristics of the P2P database
architecture
 What is a multidatabase?

98
3. Distributed Database Design

99
Introduction
 The design of a DDB involves
 Making decisions on the placement of data and programs across
the sites of a computer network, as well as possibly designing
the network itself

 In a DDBMS, the placement of applications entails
 Placement of the DDBMS software; and
 Placement of the applications that run on the DB

100
Design strategies
 Top-down
 Based on designing systems from scratch
 Begins with the requirement analysis that defines the
environment of the system and elicits both the data and
processing needs of all potential database users
 It is applicable for the design of homogeneous databases
 Bottom-up
 When the databases already exist at a number of sites
 Design involves integrating databases into one database
 Integrate Local schema into Global schema
 It is ideal in the context of heterogeneous databases

101
Top-Down Design Process

102
Distribution Design Issues
 Why fragment at all?

 How should we fragment?

 How much should we fragment?

 Is there any way to test the correctness of decomposition?

 How should we allocate?

 What is the necessary information for fragmentation and


allocation?

103
Reasons for Fragmentation
 Can't we just distribute relations?
 What is a reasonable unit of distribution?
 relation
 views are subsets of relations ⇒ locality
 extra communication
 fragments of relations (sub-relations)
 concurrent execution of a number of transactions that access
different portions of a relation
 views that cannot be defined on a single fragment will require extra
processing
 semantic data control (especially integrity enforcement) is more
difficult
104
Correctness of fragmentation
 Given a relation R and its fragments R1, …, Rn, the
fragmentation is correct if the following three conditions
hold:
 Completeness: every tuple of the original relation appears in
some fragment
 Reconstructability: there exists an operator that can regenerate
the original relation from the fragments
 Disjointness: there is no overlap between fragments

105
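The three rules above can be checked mechanically when fragments are represented as sets of tuples. A minimal sketch in Python (the EMP tuples and the split point are illustrative assumptions, not from the slides):

```python
def is_correct_horizontal_fragmentation(relation, fragments):
    """Check completeness, reconstructability, and disjointness
    for a horizontal fragmentation given as sets of tuples."""
    union = set().union(*fragments)
    complete = all(t in union for t in relation)             # every tuple appears somewhere
    reconstructable = union == set(relation)                 # union rebuilds the original
    disjoint = sum(len(f) for f in fragments) == len(union)  # no tuple in two fragments
    return complete and reconstructable and disjoint

# hypothetical EMP tuples: (ENO, TITLE)
emp = {("E1", "Elect. Eng."), ("E2", "Syst. Anal."), ("E3", "Mech. Eng.")}
emp1 = {t for t in emp if t[0] <= "E2"}
emp2 = {t for t in emp if t[0] > "E2"}
print(is_correct_horizontal_fragmentation(emp, [emp1, emp2]))  # True
```

Dropping a fragment breaks completeness, and an overlapping split breaks disjointness; both make the function return False.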
Fragmentation Alternatives-Horizontal

106
Fragmentation Alternatives-Vertical

107
Degree of Fragmentation

Finding the suitable level of partitioning within this range

108
Fragmentation
 Horizontal Fragmentation (HF)
 Primary Horizontal Fragmentation (PHF)

 Derived Horizontal Fragmentation (DHF)

 Vertical Fragmentation (VF)

 Hybrid Fragmentation (HF)

109
PHF – Information Requirements
 Application Information
 minterm selectivity: sel(mi)
 The number of tuples of the relation that would be accessed by a
user query which is specified according to a given minterm predicate
mi
 access frequencies: acc(qi)
 The frequency with which a user application accesses data. If Q = {q1,
q2, …, qq} is a set of user queries, acc(qi) indicates the access
frequency of the query qi in a given period
 Acc(mi) is computed from the acc(qi) that constitute the minterm

110
Primary Horizontal Fragmentation
Definition:
Rj = σFj (R ), 1 ≤ j ≤ w
where Fj is a selection formula, which is (preferably) a minterm
predicate
 A horizontal fragment Ri of relation R consists of all the tuples
of R which satisfy a minterm predicate mi
 Given a set of minterm predicates M, there are as many
horizontal fragments of relation R as there are minterm
predicates
 Set of horizontal fragments also referred to as minterm
fragments
111
PHF – Algorithm
Given:
A relation R, the set of simple predicates Pr

Output:
The set of fragments of R = {R1, R2,…,Rw} which obey the
fragmentation rules.

Preliminaries :
1. Pr should be complete
2. Pr should be minimal

112
Completeness of Simple Predicates
 A set of simple predicates Pr is said to be complete IFF
any two tuples belonging to the same minterm fragment
defined on Pr have the same probability of being
accessed by any application
 Example:
 Assume PROJ[PNO, PNAME, BUDGET, LOC] has two
applications defined on it
 Find the budgets of projects at each location (1)
 Find projects with budgets less than $200000 (2)

113
Completeness of Simple Predicates
 According to (1),
 Pr = {LOC=“Montreal”, LOC=“New York”, LOC=“Paris”}
which is not complete with respect to (2).

 Modify
 Pr = {LOC=“Montreal”, LOC=“New York”, LOC=“Paris”,
BUDGET≤200000, BUDGET>200000}

which is complete.

114
Minimality of Simple Predicates
 If a predicate influences how fragmentation is performed,
(i.e., causes a fragment f to be further fragmented into,
say, fi and fj) then there should be at least one application
that accesses fi and fj differently
 In other words, the simple predicate should be relevant in
determining a fragmentation.
 If all the predicates of a set Pr are relevant, then Pr is
minimal

115
Minimality of Simple Predicates
Example :
 Pr ={LOC=“Montreal”, LOC=“New York”, LOC=“Paris”,
BUDGET≤200000, BUDGET>200000}

is minimal (in addition to being complete).

 However, if we add
PNAME = “Instrumentation”

then Pr is not minimal.

116
COM_MIN Algorithm
 Given:
a relation R and a set of simple predicates Pr
 Output:
a complete and minimal set of simple predicates Pr' for Pr
 Rule 1:
a relation or fragment is partitioned into at least two
parts which are accessed differently by at least one
application.

117
COM_MIN Algorithm
❶ Initialization
 find a pi ∈ Pr such that pi partitions R according to Rule 1
 set Pr' = {pi};
 Pr ← Pr – {pi};
 F ← {fi}
❷ Iteratively add predicates to Pr' until it is complete
 find a pj ∈ Pr such that pj partitions some fragment fk defined
according to a minterm predicate over Pr', according to Rule 1
 set Pr' = Pr' ∪ {pj};
 Pr ← Pr – {pj};
 F ← F ∪ {fj}
 if ∃pk ∈ Pr' which is non-relevant then
 Pr' ← Pr' – {pk}
 F ← F – {fk}
118
PHORIZONTAL Algorithm
 Makes use of COM_MIN to perform fragmentation
 Input:
a relation R and a set of simple predicates Pr
 Output:
a set of minterm predicates M according to which
relation R is to be fragmented

❶ Pr' ← COM_MIN (R, Pr)


❷ determine the set M of minterm predicates
❸ determine the set I of implications among pi ∈ Pr'
❹ eliminate the contradictory minterms from M

119
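Steps ❷ and ❹ can be sketched by enumerating every conjunction of the simple predicates and their negations, then dropping candidates that select nothing. Checking emptiness against sample data, as below, is a cheap stand-in for real implication analysis; the PROJ tuples are assumed:

```python
from itertools import product

def minterms(simple_preds):
    """Each minterm takes every simple predicate either positively
    or negated: 2^n candidate minterms for n simple predicates."""
    result = []
    for signs in product([True, False], repeat=len(simple_preds)):
        result.append([(p, s) for p, s in zip(simple_preds, signs)])
    return result

def satisfies(tuple_, minterm):
    return all(p(tuple_) == sign for p, sign in minterm)

# PROJ example from the slides: simple predicates on BUDGET
p4 = lambda t: t["BUDGET"] <= 200000
p5 = lambda t: t["BUDGET"] > 200000
ms = minterms([p4, p5])  # 4 candidate minterms
proj = [{"BUDGET": 150000}, {"BUDGET": 250000}]
# contradictory minterms (e.g. p4 ∧ p5) select no tuple and are eliminated
nonempty = [m for m in ms if any(satisfies(t, m) for t in proj)]
print(len(nonempty))  # 2
```

The two surviving minterms are exactly (BUDGET ≤ 200000) and (BUDGET > 200000), matching m1 and m2 of the Skill example on the next slide.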
PHF – Example
 Two candidate relations : Skill and PROJ.
 Fragmentation of relation Skill
 Application: Check the salary info and determine raises
 Employee records are kept at two sites, so the application runs at two sites
 Simple predicates
 p1: SAL ≤ 30000
 p2: SAL > 30000
 Pr = {p1, p2}, which is complete and minimal, so Pr' = Pr
 Minterm predicates
 m1: (SAL ≤ 30000)
 m2: NOT(SAL ≤ 30000) = (SAL > 30000)

120
PHF - Example

Skill1 Skill2

121
PHF - Example
 Fragmentation of relation PROJ
 Applications:
 Find the name and budget of projects given their location
 Issued at three sites
 Access project information according to budget
 one site accesses projects with BUDGET ≤ 200000, the other accesses those with BUDGET > 200000
 Simple predicates
 For application (1)
 p1 : LOC = “Montreal”
 p2 : LOC = “New York”
 p3 : LOC = “Paris”
 For application (2)
 p4 : BUDGET ≤ 200000
 p5 : BUDGET > 200000
 Pr = Pr' = {p1,p2,p3,p4,p5}

122
PHF – Example
 Fragmentation of relation PROJ continued
 Minterm fragments left after elimination
m1 : (LOC = “Montreal”) ∧ (BUDGET ≤ 200000)
m2 : (LOC = “Montreal”) ∧ (BUDGET > 200000)
m3 : (LOC = “New York”) ∧ (BUDGET ≤ 200000)
m4 : (LOC = “New York”) ∧ (BUDGET > 200000)
m5 : (LOC = “Paris”) ∧ (BUDGET ≤ 200000)
m6 : (LOC = “Paris”) ∧ (BUDGET > 200000)

123
PHF Correctness
 Completeness
 Since Pr' is complete and minimal, the selection predicates are
complete
 Reconstruction
 If relation R is fragmented into FR = {R1,R2,…,Rr}
R = ∪∀Ri ∈FR Ri
 Disjointness
 Minterm predicates that form the basis of fragmentation
should be mutually exclusive.

124
Review Question
Given relation EMP (ENO, ENAME, Title,…), let p1: TITLE <
“Programmer” and p2: TITLE > “Programmer” be two
simple predicates. Assume each attribute is string type.
(a) Perform a horizontal fragmentation of relation EMP
with respect to {p1, p2}
(b) Explain why the resulting fragmentation (EMP1, EMP2)
does not fulfill the correctness rules of fragmentation.
(c) Explain what should be done to make the fragmentation
correct

125
Derived Horizontal Fragmentation
 Defined on a member relation of a link according to a
selection operation specified on its owner.
 Each link is an equijoin.
 Equijoin can be implemented by means of semi-joins.

126
DHF – Definition
 Given a link L where owner(L)=S and member(L)=R, the
derived horizontal fragments of R are defined as
Ri = R ⋉Si, 1≤i≤w

 where w is the maximum number of fragments that will


be defined on R and
Si = σFi (S)

 where Fi is the formula according to which the primary


horizontal fragment Si is defined.

127
DHF- Example
 Given link L1 where owner(L1)=SKILL and member(L1)=EMP
EMP1 = EMP ⋉ SKILL1
EMP2 = EMP ⋉ SKILL2
 where
SKILL1 = σSAL≤30000(SKILL)
SKILL2 = σSAL>30000(SKILL)

128
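The fragments above are built with the semijoin EMP ⋉ SKILLi. A sketch of the operator over illustrative tuples (the attribute values are assumptions, not the slides' data):

```python
def semijoin(member, owner, attr):
    """R ⋉ S: tuples of the member relation whose join-attribute
    value appears in the owner fragment."""
    owner_keys = {t[attr] for t in owner}
    return [t for t in member if t[attr] in owner_keys]

# hypothetical data: SKILL fragmented on SAL, EMP derived from it
skill1 = [{"TITLE": "Syst. Anal.", "SAL": 25000}]
skill2 = [{"TITLE": "Elect. Eng.", "SAL": 40000}]
emp = [{"ENO": "E1", "TITLE": "Elect. Eng."},
       {"ENO": "E2", "TITLE": "Syst. Anal."}]
emp1 = semijoin(emp, skill1, "TITLE")  # EMP ⋉ SKILL1
emp2 = semijoin(emp, skill2, "TITLE")  # EMP ⋉ SKILL2
print([t["ENO"] for t in emp1], [t["ENO"] for t in emp2])
```

Each EMP tuple lands in the fragment of the SKILL partition its TITLE belongs to, which is exactly the derived fragmentation.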
DHF – Correctness
 Completeness
 Let R be the member relation of a link whose owner is relation S which
is fragmented as FS = {S1, S2, ..., Sn}. Furthermore, let A be the join
attribute between R and S. Then, for each tuple t of R, there should be a
tuple t' of S such that: t[A] = t'[A]
 i.e., referential integrity (tuples of any fragment of the member
relation are also in the owner relation)
 Reconstruction
 Reconstruction of a global relation R from its fragments {R1, R2,
…, Rn} is performed by the union operator (R is union of its
fragments)
 Disjointness
 In DHF disjointness is guaranteed only if the join graph
between the owner and the member fragments is simple.
129
Vertical Fragmentation
 Has been studied within the centralized context
 design methodology
 physical clustering
 More difficult than horizontal, because more alternatives
exist
 Two approaches :
Grouping: attributes to fragments
Splitting: relation to fragments

130
Review question
 Given relation PAY (Title, Sal) and EMP(ENO, ENAME, Title),
let p1: SAL < 30000 and p2: SAL ≥ 30000 be two simple
predicates.
a) Perform a primary horizontal fragmentation of PAY with
respect to these predicates to obtain PAY1, and PAY2.
b) Using the fragmentation of PAY, perform further derived
horizontal fragmentation for EMP.
c) Show completeness, reconstruction, and disjointness of the
fragmentation of EMP.

131
VF
 Overlapping fragments
 grouping
 Non-overlapping fragments
 splitting
 We do not consider the replicated key attributes to be
overlapping
 Advantage:
 Easier to enforce functional dependencies (for integrity
checking etc.)

132
VF – Information requirements
 Application Information
 Attribute affinities
 a measure that indicates how closely related the attributes are
 This is obtained from more primitive usage data
 Attribute usage values
 Given a set of queries Q = {q1, q2,…, qq} that will run on the relation
R[A1, A2,…, An], use(qi, Aj) = 1 if attribute Aj is referenced by
query qi, and 0 otherwise

 use(qi,•) can be defined accordingly

133
VF – Definition of use(qi,Aj)
 Consider the following 4 queries for relation PROJ
q1: SELECT BUDGET q2: SELECT PNAME,BUDGET
FROM PROJ FROM PROJ
WHERE PNO=Value
q3: SELECT PNAME q4: SELECT SUM(BUDGET)
FROM PROJ FROM PROJ
WHERE LOC=Value WHERE LOC=Value
Let A1= PNO, A2= PNAME, A3= BUDGET, A4= LOC

134
VF – Affinity Measure aff(Ai,Aj)
 The attribute affinity measure between two attributes Ai
and Aj of a relation R[A1, A2, …, An] with respect to the set
of applications Q = (q1, q2, …, qq) is defined as follows:

aff(Ai, Aj) = Σ (over all queries qk that access both Ai and Aj) Σl ref_l(qk) ∗ acc_l(qk)

where ref_l(qk) is the number of accesses to attributes Ai and Aj for
each execution of query qk at site Sl, and acc_l(qk) is the access
frequency of query qk at site Sl
135
Example
 Assume each query in the previous example accesses the
attributes once during each execution

 Also assume the access frequencies


of each query in per site

 Then
aff(A1, A3) = 15*1 + 20*1+10*1
= 45
 and the attribute affinity matrix AA is

136
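The aff(A1, A3) = 45 figure can be reproduced from the usage matrix and the per-site access frequencies. Only q1's frequencies (15, 20, 10) are quoted in the slide; the other rows of acc below are illustrative assumptions, and ref_l(qk) is taken as 1 for every access:

```python
# use[k][j] = 1 if query qk references attribute Aj
# (A1 = PNO, A2 = PNAME, A3 = BUDGET, A4 = LOC)
use = [
    [1, 0, 1, 0],  # q1
    [0, 1, 1, 0],  # q2
    [0, 1, 0, 1],  # q3
    [0, 0, 1, 1],  # q4
]
# acc[k][l] = access frequency of qk at site Sl
acc = [
    [15, 20, 10],  # q1 (from the slide)
    [5, 0, 0],     # q2 (assumed)
    [25, 25, 25],  # q3 (assumed)
    [3, 0, 0],     # q4 (assumed)
]

def aff(i, j):
    """Attribute affinity: sum the frequencies of every query
    that uses both Ai and Aj, over all sites."""
    return sum(sum(acc[k]) for k in range(len(use))
               if use[k][i] and use[k][j])

print(aff(0, 2))  # aff(A1, A3) = 15 + 20 + 10 = 45
```

Only q1 accesses both A1 and A3, so the sum collapses to q1's three site frequencies.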
VF – Clustering Algorithm
 Take the attribute affinity matrix AA and reorganize the
attribute orders to form clusters where the attributes in
each cluster demonstrate high affinity to one another

 Bond Energy Algorithm (BEA) has been used for
clustering of entities. BEA finds an ordering of entities
(e.g. attributes) that maximizes the global affinity measure

AM = max Σi=1..n Σj=1..n aff(Ai, Aj) [aff(Ai, Aj−1) + aff(Ai, Aj+1) + aff(Ai−1, Aj) + aff(Ai+1, Aj)]

137
Bond Energy Algorithm
Input: The AA matrix
Output: The clustered affinity matrix CA which is a
perturbation of AA
❶ Initialization: Place and fix one of the columns of AA in
CA
❷ Iteration: Place the remaining n-i columns in the
remaining i+1 positions in the CA matrix. For each
column, choose the placement that makes the most
contribution to the global affinity measure
❸ Row order: Order the rows according to the column
ordering

138
cont(Ai, Ak, Aj) = 2bond(Ai, Ak) + 2bond(Ak, Aj) − 2bond(Ai, Aj),
where bond(Ax, Ay) = Σz=1..n aff(Az, Ax) ∗ aff(Az, Ay)

139
BEA – Example
 Consider the following AA matrix and the corresponding CA matrix
where A1 and A2 have been placed.

Place A3:
Ordering (0-3-1):
cont(A0,A3,A1) = 2bond(A0 , A3)+2bond(A3 , A1)–2bond(A0 , A1)
= 2* 0 + 2* 4410 – 2*0 = 8820
Ordering (1-3-2):
cont(A1,A3,A2) = 2bond(A1 , A3)+2bond(A3 , A2)–2bond(A1,A2)
= 2* 4410 + 2* 890 – 2*225 = 10150
Ordering (2-3-4): cont (A2,A3,A4) = 1780

140
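The cont computation depends on bond(Ax, Ay) = Σz aff(Az, Ax) ∗ aff(Az, Ay). A sketch using an AA matrix whose values are consistent with the bond numbers 4410, 890, and 225 quoted in the example:

```python
# attribute affinity matrix AA for A1..A4 (values chosen to be
# consistent with the bonds used in the slide example)
AA = [
    [45,  0, 45,  0],
    [ 0, 80,  5, 75],
    [45,  5, 53,  3],
    [ 0, 75,  3, 78],
]

def bond(x, y):
    """bond(Ax, Ay) = sum over z of aff(Az, Ax) * aff(Az, Ay)."""
    return sum(AA[z][x] * AA[z][y] for z in range(len(AA)))

def cont(i, k, j):
    """Contribution of placing Ak between Ai and Aj."""
    return 2 * bond(i, k) + 2 * bond(k, j) - 2 * bond(i, j)

print(bond(0, 2))     # bond(A1, A3) = 4410
print(cont(0, 2, 1))  # ordering (1-3-2): 10150
```

The printed values match the slide's ordering (1-3-2) computation: 2∗4410 + 2∗890 − 2∗225 = 10150.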
BEA: Example

141
Partitioning Algorithm
 The objective is to find sets of attributes that are accessed
solely in most cases, i.e., to divide a set of clustered
attributes {A1, A2, …, An} into two (or more) sets {A1, A2, …, Ai}
and {Ai+1, …, An} such that there are no (or minimal)
applications that access both (or more than one) of the
sets

142
Partitioning algorithm
Define (for a split point that divides the clustered attributes of CA
into a top set TA and a bottom set BA)
AQ(qi) = {Aj | use(qi, Aj) = 1}
TQ = {qi | AQ(qi) ⊆ TA}
BQ = {qi | AQ(qi) ⊆ BA}
OQ = Q – {TQ ∪ BQ} //set of applications that access both TA and BA
and
CTQ = total number of accesses to attributes by applications that access
only TA
CBQ = total number of accesses to attributes by applications that access
only BA
COQ = total number of accesses to attributes by applications that access
both TA and BA
Then find the point along the diagonal that maximizes

z = CTQ ∗ CBQ − COQ²
143
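The split-point search can be sketched directly: slide a point along the clustered attribute ordering, classify each query as TQ, BQ, or OQ, and compute z. The two-query workload below is a toy assumption:

```python
def best_split(attrs, queries, acc):
    """Try each split point along the CA diagonal.
    queries maps each query to its set of attributes;
    acc maps each query to its total access count.
    Returns (z, TA, BA) for the split maximizing z = CTQ*CBQ - COQ^2."""
    best = None
    for i in range(1, len(attrs)):
        ta, ba = set(attrs[:i]), set(attrs[i:])
        ctq = sum(acc[q] for q, a in queries.items() if a <= ta)
        cbq = sum(acc[q] for q, a in queries.items() if a <= ba)
        coq = sum(acc[q] for q, a in queries.items()
                  if not (a <= ta or a <= ba))
        z = ctq * cbq - coq ** 2
        if best is None or z > best[0]:
            best = (z, ta, ba)
    return best

# hypothetical workload: q1 touches only {A1, A3}, q3 only {A2, A4}
queries = {"q1": {"A1", "A3"}, "q3": {"A2", "A4"}}
acc = {"q1": 45, "q3": 75}
z, ta, ba = best_split(["A1", "A3", "A2", "A4"], queries, acc)
print(z, sorted(ta), sorted(ba))
```

The split between A3 and A2 wins because no query then straddles the two fragments, so COQ = 0 and z = 45 ∗ 75.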
Partitioning algorithm
Two problems :
❶ Cluster forming in the middle of the CA matrix
 Shift a row up and a column left and apply the algorithm to
find the “best” partitioning point
 Do this for all possible shifts
 Cost O(m2)
❷ More than two clusters
 m-way partitioning
 try 1, 2, …, m–1 split points along diagonal and try to find the best
point for each of these
 Cost O(2m)

144
VF correctness
 A relation R, defined over attribute set A and key K, generates
the vertical partitioning FR = {R1, R2, …, Rr}
 Completeness
 The following should be true for A:
 A = ∪ A(Ri)
 Reconstruction
 Reconstruction can be achieved by the join over the key K:
R = ⋈K Ri, ∀Ri ∈ FR
 Disjointness
 TIDs are not considered to be overlapping since they are maintained
by the system
 Duplicated keys are not considered to be overlapping

145
Hybrid fragmentation

146
Allocation
 Problem Statement
 Given
F = {F1, F2, …, Fn} fragments
S = {S1, S2, …, Sm} network sites
Q = {q1, q2,…, qq} applications
Find the "optimal" distribution of F to S.
 Optimality
 Minimal cost
 Communication + storage + processing (read & update)
 Cost in terms of time (usually)
 Performance
 Response time and/or throughput
 Constraints
 Per site constraints (storage & processing)

147
Information Requirements
 Database information
 selectivity of fragments
 size of a fragment
 Application information
 access types and numbers
 access localities
 Communication network information
 unit cost of storing data at a site
 unit cost of processing at a site
 Computer system information
 bandwidth
 latency
 communication overhead

148
Allocation Model
 General Form
min(Total Cost)
subject to
response time constraint
storage constraint
processing constraint

Decision Variable

xij = 1 if fragment Fi is stored at site Sj


0 otherwise

150
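Given the decision variable xij, the storage term of the total cost is a plain weighted sum. A minimal sketch with made-up unit costs and fragment sizes:

```python
def storage_cost(x, unit_cost, size):
    """Sum over fragments i and sites j of
    (unit storage cost at Sj) * (size of Fi) * x[i][j]."""
    return sum(unit_cost[j] * size[i] * x[i][j]
               for i in range(len(x)) for j in range(len(x[0])))

# hypothetical allocation: 2 fragments over 3 sites
x = [[1, 0, 1],   # F1 stored (replicated) at S1 and S3
     [0, 1, 0]]   # F2 stored at S2
unit_cost = [1, 2, 1]   # per-unit storage cost at each site
size = [100, 50]        # fragment sizes
print(storage_cost(x, unit_cost, size))  # 100*1 + 100*1 + 50*2 = 300
```

The query-processing and transmission terms of the model would be added to this sum under the response-time, storage, and processing constraints listed on the following slides.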
Allocation Model
 Total Cost
Query processing cost + cost of storing a fragment at a site
 Storage Cost (of fragment Fj at Sk)
(unit storage cost at Sk) * (size of Fj) * xjk
 Query Processing Cost (for one query)
processing component + transmission component

151
Allocation Model
 Query Processing Cost

Processing component

access cost + integrity enforcement cost + concurrency


control cost

 Access cost

(no of update accesses + no of read accesses) * xij * local processing


cost at a site

152
Allocation Model
 Query Processing Cost
Transmission component

cost of processing updates + cost of processing retrievals

 Cost of updates
update message cost + acknowledgment cost

 Retrieval Cost
(cost of retrieval command + cost of sending back the result)

153
Allocation Model
 Constraints
 Response time
 Execution time of query <= max allowable response time for that
query
 Storage constraints
 Storage requirement of a fragment at that site <=storage capacity at
that site

 Processing constraint (for a site)


 Processing load of a query at that site <= processing capacity of that
site

154
Allocation Model
 Attempts to reduce the solution space
 assume all candidate partitioning are known and select the
“best” partitioning
 ignore replication at first
 sliding window on fragments

155
4. Distributed Query Processing

156
Introduction
 Query Processing

high level user query

query
Processor

low level data manipulation


commands
157
Query Processing Components
 Query language that is used
 SQL

 Query execution methodology


 The steps that one goes through in executing high level
(declarative) user queries.

 Query optimization
 How do we determine the “best” execution plan?

158
Query processing problem
Example

SELECT ENAME
FROM EMP,ASG
WHERE EMP.ENO = ASG.ENO
AND DUR > 37

159
Example …

160
Cost of Alternatives
 Assume:
 size(EMP) = 400, size(ASG) = 1000
 tuple access cost = 1 unit; tuple transfer cost = 10 units
 Strategy 1
 produce ASGi: (10+10)∗tuple access cost 20
 transfer ASGi to the sites of EMP: (10+10)∗tuple transfer cost 200
 produce EMPi : (10+10) ∗tuple access cost∗2 40
 transfer EMPi to result site: (10+10) ∗tuple transfer cost 200
Total cost 460
 Strategy 2
 transfer EMP to site 5: 400 ∗ tuple transfer cost 4,000
 transfer ASG to site 5: 1000 ∗ tuple transfer cost 10,000
 produce ASG': 1000 ∗ tuple access cost 1,000
 join EMP and ASG': 400 ∗ 20 ∗ tuple access cost 8,000
Total cost 23,000

161
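The two totals follow from plugging the unit costs into the counts listed above; a sketch that reproduces them:

```python
TUPLE_ACCESS = 1     # cost units per tuple access
TUPLE_TRANSFER = 10  # cost units per tuple transfer

# Strategy 1: select at the ASG sites, ship fragments, join locally
strategy1 = ((10 + 10) * TUPLE_ACCESS        # produce ASG'1, ASG'2
             + (10 + 10) * TUPLE_TRANSFER    # ship ASG'i to the EMP sites
             + (10 + 10) * TUPLE_ACCESS * 2  # produce EMP'i (join)
             + (10 + 10) * TUPLE_TRANSFER)   # ship results to the result site

# Strategy 2: ship both relations to site 5, then select and join there
strategy2 = (400 * TUPLE_TRANSFER    # transfer EMP
             + 1000 * TUPLE_TRANSFER # transfer ASG
             + 1000 * TUPLE_ACCESS   # produce ASG'
             + 400 * 20 * TUPLE_ACCESS)  # join EMP and ASG'

print(strategy1, strategy2)  # 460 23000
```

The 50× gap is almost entirely the communication term, which is why the objective on the next slide weights communication so heavily in wide area networks.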
Objective of Query processing
 To transform a high-level query on a distributed database into low
level language on local databases
 Minimize a cost function
I/O cost + CPU cost + communication cost
 These might have different weights in different distributed
environments
 Wide area networks
 communication cost will dominate
 low bandwidth
 low speed
 high protocol overhead
 Local area networks
 communication cost not that dominant
 total cost function should be considered

162
Complexity of Relational Operations
Assume
• relations of cardinality n
• sequential scan

Operation                                     Complexity
Select, Project (without duplicates)          O(n)
Project (with duplicate elimination), Group   O(n log n)
Join, Semi-join, Division, Set Operations     O(n log n)
Cartesian Product                             O(n²)
163
Characterization of Query processors
 Four characteristics that hold for Centralized query processors
 Language
 Input language – relational calculus or relational algebra
 Types of optimization
 Exhaustive search
 cost-based
 Optimal
 combinatorial complexity in the number of relations
 Heuristics
 not optimal
 regroup common sub-expressions
 perform selection, projection first
 replace a join by a series of semi-joins
 reorder operations to reduce intermediate relation size
 optimize individual operations

164
Optimization Timing
 Static
 compile-time optimization, prior to execution
 difficult to estimate the size of the intermediate results ⇒ error
propagation
 can amortize over many executions
 E.g. R*
 Dynamic
 run time optimization
 exact information on the intermediate relation sizes
 have to reoptimize for multiple executions
 E.g. Distributed INGRES
 Hybrid
 compile using a static algorithm
 if the error in estimated sizes > threshold, reoptimize at run time
 E.g. MERMAID

165
Statistics
 Relation
 cardinality
 size of a tuple
 fraction of tuples participating in a join with another relation
 Attribute
 cardinality of domain
 actual number of distinct values
 Common assumptions
 independence between different attribute values
 uniform distribution of attribute values within their domain

166
Decision Sites
 Centralized
 single site determines the “best” schedule
 simple
 need knowledge about the entire distributed database
 Distributed
 cooperation among sites to determine the schedule
 need only local information
 cost of cooperation
 Hybrid
 one site determines the global schedule
 each site optimizes the local subqueries

167
Network Topology
 Wide area networks (WAN)
 characteristics
 low bandwidth
 low speed
 high protocol overhead
 communication cost will dominate; ignore all other cost factors
 global schedule to minimize communication cost
 local schedules according to centralized query optimization
 Local area networks (LAN)
 communication cost not that dominant
 total cost function should be considered
 broadcasting can be exploited (e.g. joins) to optimize query
processing
 special algorithms exist for star networks

168
Exploitation of Replicated Fragments
 In distributed query processing, queries on global relations are
mapped into queries on physical fragments by translating
relations into fragments – localization
 Replication is needed for increasing reliability and availability

 Optimization algorithms might exploit the existence of


replicated fragments at run time to minimize
communication time

169
Use of semijoins
 Semijoin reduces the size of the operand relation
 But it increases the number of messages and the local
processing time
 E.g. SDD-1, designed for slow wide area networks, uses
semijoins extensively

170
Layers of Query Processing

171
Query Decomposition
 Input : Calculus query on global relations
1. Normalization
 manipulate query quantifier and qualification
2. Analysis
 detect and reject “incorrect” queries
 possible for only a subset of relational calculus
3. Simplification
 eliminate redundant predicates
4. Restructuring
 calculus query is restructured into algebraic query
 more than one translation is possible
 use transformation rules

172
Normalization
 Lexical and syntactic analysis
 check validity (similar to compilers)
 check for attributes and relations
 type checking on the qualification
 Put into normal form
 Conjunctive normal form
(p11∨p12∨…∨p1n) ∧…∧ (pm1∨pm2∨…∨pmn)
 Disjunctive normal form
(p11∧p12 ∧…∧p1n) ∨…∨ (pm1 ∧pm2∧…∧pmn)
 OR's mapped into union
 AND's mapped into join or selection

173
Analysis
 Remove incorrect queries
 Type incorrect
 If any of its attribute or relation names are not defined in the global
schema
 If operations are applied to attributes of the wrong type
 Semantically incorrect
 Components do not contribute in any way to the generation of the
result
 Only a subset of relational calculus queries can be tested for
correctness
 Those that do not contain disjunction and negation
 Technique to detect incorrect queries
 connection graph (query graph) that represent the semantic of the query
 join graph

174
Analysis – Example
SELECT ENAME,RESP
FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO
AND ASG.PNO = PROJ.PNO
AND PNAME = "CAD/CAM"
AND DUR ≥ 36
AND TITLE = "Programmer"

175
Analysis
 If the query graph is not connected, the query is wrong.
SELECT ENAME,RESP, PNAME
FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO
AND PNAME = "CAD/CAM"
AND DUR ≥ 36
AND TITLE = "Programmer"

176
Simplification
 Use transformation rules
 elimination of redundancy
 idempotency rules
p1 ∧ ¬( p1) ⇔ false
p1 ∧ (p1 ∨ p2) ⇔ p1
p1 ∨ false ⇔ p1

 application of transitivity
 use of integrity rules

177
Simplification – Example
SELECT TITLE
FROM EMP
WHERE EMP.ENAME = “J. Doe”
OR (NOT(EMP.TITLE = “Programmer”)
AND (EMP.TITLE = “Programmer”)
OR EMP.TITLE = “Elect. Eng.”)
AND NOT(EMP.TITLE = “Elect. Eng.”) )

SELECT TITLE
FROM EMP
WHERE EMP.ENAME = “J. Doe”

178
Restructuring
 Convert relational calculus to
relational algebra
 Make use of query trees

Example
Find the names of employees other than
J. Doe who worked on the CAD/CAM
project for either 1 or 2 years.

SELECT ENAME
FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO
AND ASG.PNO = PROJ.PNO
AND ENAME ≠ “J. Doe”
AND PNAME = “CAD/CAM”
AND (DUR = 12 OR DUR = 24)

179
Restructuring –Transformation Rules
 Commutativity of binary operations
 R×S⇔S×R
 R join S ⇔S join R
 R∪S⇔S∪R
 Associativity of binary operations
 ( R × S ) × T ⇔ R × (S × T)
 ( R join S) join T ⇔ R join (S join T)
 Idempotence of unary operations
 ΠA’(ΠA”(R)) ⇔ ΠA’(R)
 σp1(A1)(σp2(A2)(R)) = σp1(A1) ∧ p2(A2)(R)
where R[A] and A' ⊆ A, A" ⊆ A and A' ⊆ A"
 Commuting selection with projection

180
Restructuring –Transformation Rules
 Commuting selection with binary operations
 σp(A)(R × S) ⇔ (σp(A) (R)) × S
 σp(Ai)(R join(Aj,Bk) S) ⇔ (σp(Ai)(R)) join(Aj,Bk) S
 σp(Ai)(R ∪ T) ⇔ σp(Ai)(R) ∪ σp(Ai)(T)
where Ai belongs to R and T
 Commuting projection with binary operations
 ΠC(R × S) ⇔ΠA’(R) × ΠB’(S)
 ΠC(R join(Aj,Bk) S)⇔ΠA’(R) join(Aj,Bk) ΠB’(S)
 ΠC(R ∪ S) ⇔ΠC (R) ∪ ΠC (S)
where R[A] and S[B]; C = A' ∪ B' where A' ⊆ A, B' ⊆ B

181
Example
Example
Find the names of employees other than
J. Doe who worked on the CAD/CAM
project for either 1 or 2 years

SELECT ENAME
FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO
AND ASG.PNO = PROJ.PNO
AND ENAME ≠ “J. Doe”
AND PNAME = “CAD/CAM”
AND (DUR = 12 OR DUR = 24)

182
Equivalent Query

183
Restructuring

(query tree: selection σDUR=12 ∨ DUR=24 pushed down toward the leaves)

184
Step 2 – Data Localization
 Input: Algebraic query on distributed relations
 Determine which fragments are involved
 Localization program
 substitute for each global query its materialization program
 ➠ optimize

185
Example
 Assume
 EMP is fragmented into EMP1, EMP2,
EMP3 as follows:
 EMP1=σENO≤“E3”(EMP)
 EMP2= σ“E3”<ENO≤“E6”(EMP)
 EMP3=σENO>“E6”(EMP)
 ASG fragmented into ASG1 and ASG2 as
follows:
 ASG1=σENO≤“E3”(ASG)
 ASG2=σENO>“E3”(ASG)

Replace EMP by (EMP1∪EMP2∪EMP3 ) and


ASG by (ASG1 ∪ ASG2) in any query

186
Provides Parallellism

187
Eliminates …

188
Reduction for PHF
 Reduction with selection
 Relation R and FR={R1, R2, …, Rw} where Rj=σ pj(R)
σ pi(Rj)= φ if ∀x in R: ¬(pi(x) ∧ pj(x))
EMP1=σENO≤“E3”(EMP)
Example
EMP2= σ“E3”<ENO≤“E6”(EMP)
SELECT *
EMP3=σENO>“E6”(EMP)
FROM EMP
WHERE ENO=“E5”

189
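Reduction with selection can be sketched as follows: a fragment is dropped when its defining predicate cannot hold together with the query predicate. Testing over a sample domain, as below, is a cheap stand-in for real predicate implication analysis:

```python
def reduce_selection(fragments, query_pred, sample_domain):
    """Keep only fragments whose defining predicate can be satisfied
    together with the query predicate (checked over a sample domain)."""
    kept = []
    for name, frag_pred in fragments:
        if any(frag_pred(v) and query_pred(v) for v in sample_domain):
            kept.append(name)
    return kept

# EMP fragments from the slide, with predicates on ENO
fragments = [
    ("EMP1", lambda eno: eno <= "E3"),
    ("EMP2", lambda eno: "E3" < eno <= "E6"),
    ("EMP3", lambda eno: eno > "E6"),
]
query = lambda eno: eno == "E5"          # WHERE ENO = "E5"
domain = [f"E{i}" for i in range(1, 10)]  # sample ENO values
print(reduce_selection(fragments, query, domain))  # ['EMP2']
```

Only EMP2 survives, so the localized query accesses a single site instead of three.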
Reduction for PHF
 Reduction with join
 Possible if fragmentation is done on join attribute
 Distribute join over union
(R1 ∪ R2) join S ⇔ (R1 join S) ∪ (R2 join S)
 Given Ri = σpi(R) and Rj = σpj(R)
Ri join Rj = φ if ∀x in Ri, ∀y in Rj: ¬(pi(x) ∧ pj(y))

190
Reduction for PHF
 Reduction with join - Example
 Assume EMP is fragmented into three as before, and ASG into two:
ASG1: σENO ≤ "E3"(ASG)
ASG2: σENO > "E3"(ASG) EMP1=σENO≤“E3”(EMP)
 Consider the query EMP2= σ“E3”<ENO≤“E6”(EMP)
SELECT * FROM EMP, ASG EMP3=σENO>“E6”(EMP)
WHERE EMP.ENO=ASG.ENO

191
Reduction for PHF
 Reduction with join
 Distribute join over unions
 Apply the reduction rule

192
Reduction for VF
 Find useless (not empty) intermediate relations
Relation R defined over attributes A = {A1, ..., An} vertically

fragmented as Ri = ΠA'(R) where A' ⊆ A:
ΠD,K(Ri) is useless if the set of projection attributes D is not in A’
Example: EMP1= ΠENO,ENAME(EMP); EMP2= ΠENO,TITLE (EMP)
SELECT ENAME
FROM EMP

193
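The uselessness test reduces to a set check on the projected attributes; a minimal sketch:

```python
def useful_fragments(fragments, needed_attrs, key="ENO"):
    """A vertical fragment is useful for a projection only if it
    carries at least one needed attribute besides the key."""
    return [name for name, attrs in fragments
            if (set(needed_attrs) - {key}) & (set(attrs) - {key})]

# vertical fragments from the slide
fragments = [
    ("EMP1", ["ENO", "ENAME"]),
    ("EMP2", ["ENO", "TITLE"]),
]
# SELECT ENAME FROM EMP
print(useful_fragments(fragments, ["ENAME"]))  # ['EMP1']
```

EMP2 is useless for this query and the join reconstructing EMP can be eliminated entirely.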
Reduction for DHF
 Rule :
 Distribute joins over unions
 Apply the join reduction for horizontal fragmentation

Example
ASG1: ASG JoinENO EMP1
ASG2: ASG JoinENO EMP2
EMP1: σTITLE=“Programmer” (EMP)
EMP2: σTITLE<>“Programmer” (EMP)

Query
SELECT *
FROM EMP, ASG
WHERE ASG.ENO = EMP.ENO
AND EMP.TITLE = “Mech. Eng.”

194
Reduction for DHF

195
Reduction for DHF
Joins over unions

Elimination of the empty intermediate relations (left sub-tree)

196
Reduction for Hybrid Fragmentation
 Combine the rules already specified:
 Remove empty relations generated by contradicting selections
on horizontal fragments
 Remove useless relations generated by projections on vertical
fragments
 Distribute joins over unions in order to isolate and remove
useless joins

197
Reduction for Hybrid Fragmentation
Example
Consider the following hybrid
fragmentation:
EMP1=σENO≤"E4" (ΠENO,ENAME(EMP))
EMP2=σENO>"E4"
(ΠENO,ENAME(EMP))
EMP3= ΠENO,TITLE(EMP)
and the query
SELECT ENAME
FROM EMP
WHERE ENO=“E5”

198
Global Query Optimization
 Input: Fragment query
 Find the best (not necessarily optimal) global schedule
 Minimize a cost function
 Distributed join processing
 Bushy vs. linear trees
 Which relation to ship where?
 Ship-whole vs ship-as-needed
 Decide on the use of semijoins
 Semijoin saves on communication at the expense of more local
processing.
 Join methods
 nested loop vs ordered joins (merge join or hash join)

199
Cost-Based Optimization
 Solution space
 The set of equivalent algebra expressions (query trees).
 Cost function (in terms of time)
 I/O cost + CPU cost + communication cost
 These might have different weights in different distributed
environments (LAN vs WAN).
 Can also maximize throughput
 Search algorithm
 How do we move inside the solution space?
 Exhaustive search, heuristic algorithms (iterative improvement,
simulated annealing, genetic,…)

200
5. Concurrency Control

201
Recap
 What is a transaction?
 What are the main problems that may occur if different
transactions are allowed to access data concurrently?
 Any mechanism to detect such problems?
 Considering distributed database system, what can be
distributed?

202
Concurrency Control in Distributed
Database
 Concurrency control schemes deal with the handling of data
accessed by concurrent transactions.
 Various locking protocols are used for handling
concurrent transactions in centralized database systems.
 There are no major differences between the schemes in
centralized and distributed databases. The only major
difference is the way the lock manager deals
with replicated data.

203
Locking protocols
1. Single lock manager approach
2. Distributed lock manager approach
a) Primary Copy protocol
b) Majority protocol
c) Biased protocol
d) Quorum Consensus protocol

204
Single Lock Manager - Concurrency Control
in Distributed Database

205
Single Lock Manager …
1. Transaction T1 @S5 requests data
item D
2. The initiator site S5’s Transaction
manager sends the lock request to lock
data item D to the lock-manager site S3.
 The Lock-manager at site S3 will look for the
availability of the data item D.
3. If the requested item is not locked by
any other transactions, the lock-manager
site responds with lock grant message
to the initiator site S5.
4. The initiator site S5 can use the data
item D from any of the sites S1, S2, and
S6 for completing the Transaction T1.
5. After successful completion of the
Transaction T1, the Transaction manager
of S5 releases the lock by sending the
unlock request to the lock-manager site
S3.

206
Primary Copy Protocol

207
Majority Based Protocol
 A transaction which needs to lock data item Q has to
request and lock data item Q at half+one of the sites in which Q
is replicated (i.e., a majority of the sites in which Q is
replicated).
 The lock-managers of all the sites in which Q is replicated
are responsible for handling lock and unlock requests
locally individually.
 Irrespective of the lock types (read or write, i.e, Shared
or Exclusive), we need to lock half+one sites.

208
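The half+one requirement guarantees that any two lock quorums over the same item intersect, which is what prevents conflicting locks from being granted simultaneously. A tiny sketch:

```python
def majority(n_replicas):
    """Number of sites whose lock grant is needed under the
    majority protocol: half plus one."""
    return n_replicas // 2 + 1

# any two majorities over the same replicas always overlap
# in at least one site, so that site blocks the second request
for n in (3, 4, 5):
    m = majority(n)
    assert m + m > n  # two quorums cannot be disjoint

print(majority(5))  # 3
```

Note that, unlike the biased protocol, even a shared (read) lock must be obtained at a majority of the replicas.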
Majority Based Protocol

209
Parallel Databases

210
Parallel Databases
 Introduction
 I/O Parallelism
 Interquery Parallelism
 Intraquery Parallelism
 Intraoperation Parallelism
 Interoperation Parallelism
 Design of Parallel Systems

211
Introduction
 Parallel machines are becoming quite common and affordable
 Prices of microprocessors, memory and disks have dropped sharply
 Recent desktop computers feature multiple processors and this
trend is projected to accelerate
 Databases are growing increasingly large
 large volumes of transaction data are collected and stored for later
analysis.
 multimedia objects like images are increasingly stored in databases
 Large-scale parallel database systems increasingly used for:
 storing large volumes of data
 processing time-consuming decision-support queries
 providing high throughput for transaction processing

212
Parallelism in Databases
 Data can be partitioned across multiple disks for parallel
I/O.
 Individual relational operations (e.g., sort, join,
aggregation) can be executed in parallel
 data can be partitioned and each processor can work
independently on its own partition.
 Queries are expressed in high level language (SQL,
translated to relational algebra)
 makes parallelization easier.
 Different queries can be run in parallel with each other.
Concurrency control takes care of conflicts.
 Thus, databases naturally lend themselves to parallelism.

213
Modes of Parallelism
 At the heart of all parallel machines is a collection of
processors.
 Each processor has its own local cache
 Classify parallel architectures into three broad groups
 The most tightly coupled architectures shared memory
 A less tightly coupled architecture shares disk but not
memory.
 Shared nothing

214
Shared-Memory

Each processor has access to all the memory of all the


processors. That is, there is a single physical address space
for the entire machine, rather than one address space for
each processor - Network cost, low extensibility

215
Shared-Disk

• every processor has its own memory, which is not accessible
directly from other processors. However, the disks are accessible
from any of the processors through the communication network.
• complexity, potential performance problem for cache coherency
216
Shared-Nothing

all processors have their own memory and their own disk or disks
the shared-nothing architecture is the most commonly used architecture for database systems
Used by Teradata, IBM, Sybase, Microsoft for OLAP
Prototypes: Gamma, Bubba, Grace, Prisma, EDS
+ Extensibility, availability
- Complexity, difficult load balancing

217
Hybrid Architectures
 Various possible combinations of the three basic
architectures are possible to obtain different trade-offs
between cost, performance, extensibility, availability, etc.
 Hybrid architectures try to obtain the advantages of
different architectures:
 efficiency and simplicity of shared-memory
 extensibility and cost of either shared disk or shared nothing
 2 main kinds: NUMA and cluster

218
I/O Parallelism
 Reduce the time required to retrieve relations from disk
by partitioning the relations on multiple disks.
 Horizontal partitioning – tuples of a relation are divided
among many disks such that each tuple resides on one
disk.
 Partitioning techniques (number of disks = n):
Round-robin: Send the ith tuple inserted in the relation to disk i
mod n.
Hash partitioning: choose a hash function h on the partitioning
attribute with range 0 ... n-1; a tuple whose partitioning-attribute
value is v is sent to disk h(v)
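The two techniques above can be sketched in Python. This is a toy illustration, not any DBMS's code: the "disks" are plain lists, and Python's built-in hash() stands in for the partitioning hash function.

```python
def round_robin_partition(tuples, n):
    """Send the i-th inserted tuple to disk i mod n."""
    disks = [[] for _ in range(n)]
    for i, t in enumerate(tuples):
        disks[i % n].append(t)
    return disks

def hash_partition(tuples, n, key=lambda t: t):
    """Send each tuple to disk h(v), where v is its
    partitioning-attribute value."""
    disks = [[] for _ in range(n)]
    for t in tuples:
        disks[hash(key(t)) % n].append(t)
    return disks
```

Round-robin balances tuple counts exactly; hash partitioning balances them only as well as the hash function spreads the attribute values.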

219
I/O Parallelism (Cont.)
 Range partitioning: break tuples up into contiguous
ranges of keys, requires a key that can be ordered linearly
 Choose an attribute as the partitioning attribute.
 A partitioning vector [vo, v1, ..., vn-2] is chosen.
 Let v be the partitioning attribute value of a tuple. Tuples such
that vi ≤ v < vi+1 go to disk i + 1. Tuples with v < v0 go to disk 0 and
tuples with v ≥ vn-2 go to disk n-1.
E.g., with a partitioning vector [5,11], a tuple with partitioning
attribute value of 2 will go to disk 0, a tuple with value 8 will go
to disk 1, while a tuple with value 20 will go to disk 2.
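The disk-selection rule can be expressed as a one-line lookup (a sketch; bisect_right gives exactly the "vi ≤ v < vi+1 goes to disk i+1" behavior described above, with the function name invented here):

```python
import bisect

def range_partition_disk(v, vector):
    """Disk index for partitioning-attribute value v, given the
    partitioning vector [v0, ..., v_{n-2}]:
    v < v0 -> disk 0, vi <= v < vi+1 -> disk i+1, v >= v_{n-2} -> disk n-1."""
    return bisect.bisect_right(vector, v)
```

With the vector [5, 11] from the example, value 2 maps to disk 0, value 8 to disk 1, and value 20 to disk 2.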

220
Comparison of Partitioning Techniques
 Evaluate how well partitioning techniques support the
following types of data access:
1.Scanning the entire relation.
2.Locating a tuple associatively – point
queries.
 Example: r.A = 25.
3.Locating a set of tuples based on the value of a given
attribute lies within a specified range – range queries.
 Example: 10 ≤ r.A < 25.

221
Comparison of Partitioning Techniques(Cont.)
Round robin:
 Advantages
 Best suited for sequential scan of entire relation on each query.
 All disks have almost an equal number of tuples; retrieval work
is thus well balanced between disks.
 Range queries are difficult to process
 No clustering - tuples are scattered across all disks

222
Comparison of Partitioning Techniques(Cont.)
Hash partitioning:
 Good for sequential access
 Assuming hash function is good, and partitioning attributes
form a key, tuples will be equally distributed between disks
 Retrieval work is then well balanced between disks.
 Good for point queries on partitioning attribute
 Can lookup single disk, leaving others available for answering
other queries.
 Index on partitioning attribute can be local to disk, making
lookup and update more efficient
 No clustering, so difficult to answer range queries

223
Range partitioning
 Partitioning requires a partitioning attribute A, usually the
primary key
 A vector of n-1 values partitions A into n ranges
 Vector {v0, v1, …, vn-2}
 Each tuple t goes into:
 Partition 0 if t[A] < v0
 Partition n-1 if t[A] ≥ vn-2
 Partition k if vk-1 ≤ t[A] < vk, 1 ≤ k ≤ n-2
 Simple range partitioning: #disks = #partitions

224
Comparison of Partitioning Techniques (Cont.)
Range partitioning:
 Provides data clustering by partitioning attribute value.
 Good for sequential access
 Good for point queries on partitioning attribute: only one disk
needs to be accessed.
 For range queries on partitioning attribute, one to a few disks
may need to be accessed
 Remaining disks are available for other queries.
 Good if result tuples are from one to a few blocks.
 If many blocks are to be fetched, they are still fetched from one to a
few disks, and potential parallelism in disk access is wasted
 Example of execution skew.

225
Partitioning a Relation across Disks
 If a relation contains only a few tuples which will fit into a
single disk block, then assign the relation to a single disk.
 Large relations are preferably partitioned across all the
available disks.
 If a relation consists of m disk blocks and there are n
disks available in the system, then the relation should be
allocated min(m,n) disks.

226
Handling of Skew
 The distribution of tuples to disks may be skewed —
that is, some disks have many tuples, while others may
have fewer tuples.
 Types of skew:
 Attribute-value skew.
 when lots of tuples are clustered around the same (or nearly same
value) i.e. some values appear in the partitioning attributes of many
tuples; all the tuples with the same value for the partitioning attribute
end up in the same partition.
 Can occur with range-partitioning and hash-partitioning.
 Partition skew.
 With range-partitioning, badly chosen partition vector may assign too
many tuples to some partitions and too few to others.
 Less likely with hash-partitioning if a good hash-function is chosen.

227
Handling Skew in Range-Partitioning
 To create a balanced partitioning vector (assuming
partitioning attribute forms a key of the relation):
 Sort the relation on the partitioning attribute.
 Construct the partition vector by scanning the relation in
sorted order as follows.
 After every 1/nth of the relation has been read, the value of the
partitioning attribute of the next tuple is added to the partition
vector.
 n denotes the number of partitions to be constructed.
 Duplicate entries or imbalances can result if duplicates are
present in partitioning attributes.
 Alternative technique based on histograms used in
practice
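The construction above can be sketched as below, assuming the partitioning-attribute values are already sorted and that their count is a multiple of n (function name invented):

```python
def balanced_partition_vector(sorted_values, n):
    """After every 1/n-th of the sorted relation has been read,
    record the partitioning-attribute value of the next tuple.
    Returns the n-1 entry partition vector."""
    step = len(sorted_values) // n
    return [sorted_values[(i + 1) * step] for i in range(n - 1)]
```

For the values 0..11 and n = 3, this yields the vector [4, 8], giving three partitions of four tuples each.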

228
Handling Skew using Histograms
Balanced partitioning vector can be constructed from histogram in a
relatively straightforward fashion
Assume uniform distribution within each range of the histogram
Histogram can be constructed by scanning relation, or sampling (blocks
containing) tuples of the relation.

229
Handling Skew Using Virtual Processor
Partitioning
 Skew in range partitioning can be handled elegantly using
virtual processor partitioning:
 create a large number of partitions (say 10 to 20 times the
number of processors)
 Assign virtual processors to partitions either in round-robin
fashion or based on estimated cost of processing each virtual
partition
 Basic idea:
 If any normal partition would have been skewed, it is very likely
the skew is spread over a number of virtual partitions
 Skewed virtual partitions get spread across a number of
processors, so work gets distributed evenly!
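A sketch of the round-robin assignment of virtual partitions to real processors (names invented). With many more virtual partitions than processors, a run of consecutive skewed partitions is spread over all processors:

```python
def assign_virtual_partitions(n_virtual, n_procs):
    """Round-robin mapping of virtual partition index -> real processor."""
    return {vp: vp % n_procs for vp in range(n_virtual)}
```

With 40 virtual partitions on 4 processors, the skewed virtual partitions 8–11 (say, one hot key range) land on four different processors instead of one.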

230
Interquery Parallelism
 It is a form of parallelism where many different Queries or
Transactions are executed in parallel with one another on many
processors
 Increases transaction throughput; used primarily to scale up a
transaction processing system to support a larger number of
transactions per second.
 Easiest form of parallelism to support, particularly in a shared-
memory parallel database, because even sequential database systems
support concurrent processing.
 More complicated to implement on shared-disk or shared-nothing
architectures
 Locking and logging must be coordinated by passing messages between
processors.
 Data in a local buffer may have been updated at another processor.
 Cache-coherency has to be maintained - reads and writes of data in
buffer must find latest version of data.

231
Intraquery Parallelism
 Execution of a single query in parallel on multiple
processors/disks; important for speeding up long-running
queries.
SELECT * FROM Email ORDER BY Start_Date;
 Two complementary forms of intraquery parallelism :
 Intraoperation Parallelism – parallelize the execution of
each individual operation in the query.
 SELECT * FROM Email ORDER BY Start_Date; //(Sort
Operation)
 SELECT * FROM Student, CourseRegd WHERE
Student.Regno = CourseRegd.Regno; //(Join)

233
Intraquery Parallelism
 Interoperation Parallelism – execute the different
operations in a query expression in parallel.
 A single query may involve multiple operations at once.
 SELECT AVG(Salary) FROM Employee GROUP BY Dept_Id;

 It can be achieved in two ways


1. Pipelined Parallelism: the result produced by one operation is
consumed by the next operation in the pipeline
Example: r1 ⋈ r2 ⋈ r3 ⋈ r4 (i.e., there is a logical dependency)
2. Independent Parallelism: Operations that do not depend
on each other can be executed in parallel at different
processors

234
Parallel Processing of Relational Operations
 The discussion of parallel algorithms assumes:
 read-only queries
 shared-nothing architecture
 n processors, P0, ..., Pn-1, and n disks D0, ..., Dn-1, where disk Di is
associated with processor Pi.
 If a processor has multiple disks they can simply simulate
a single disk Di.
 Shared-nothing architectures can be efficiently simulated
on shared-memory and shared-disk systems.
 Algorithms for shared-nothing systems can thus be run on
shared-memory and shared-disk systems.
 However, some optimizations may be possible.

235
Parallel Sort
Range-Partitioning Sort
 Assumptions:
 Assume n processors, P0, P1, …, Pn-1 and n disks D0, D1, …, Dn-1.
 Disk Di is associated with Processor Pi.
 Relation R is partitioned into R0, R1, …, Rn-1 using Round-robin technique or
Hash Partitioning technique or Range Partitioning technique (if range partitioned
on some other attribute other than sorting attribute)
 Objective:
 to sort a relation (table) Ri that resides on n disks on an attribute A in parallel.
 i.e. choose processors P0, ..., Pm, where m ≤ n - 1, to do sorting.
 Step 1: Partition the relations Ri on the sorting attribute A at every
processor using a range vector v. Send the partitioned records which fall in
the ith range to Processor Pi where they are temporarily stored in Di.
 Step 2: Sort each partition locally at each processor Pi. And, send the
sorted results for merging with all the other sorted results which is trivial
process.
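The two steps can be sketched as follows. This is a single-process simulation: each input list plays the role of one processor's local partition, and each range list the role of one destination processor's temporary storage.

```python
import bisect

def range_partition_sort(local_partitions, vector):
    """Step 1: each processor redistributes its tuples by range;
    Step 2: each range is sorted locally. Concatenating the sorted
    ranges in vector order yields the fully sorted relation."""
    ranges = [[] for _ in range(len(vector) + 1)]
    for part in local_partitions:       # in parallel, one part per Pi
        for t in part:
            ranges[bisect.bisect_right(vector, t)].append(t)
    for r in ranges:                    # local sort at each processor
        r.sort()
    return [t for r in ranges for t in r]
```

With the range vector [14000, 24000] from the Employee example, no merge phase is needed: the sorted ranges are simply concatenated.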

236
 Assume that relation Employee(Emp_ID, EName, Salary) is permanently
partitioned using the Round-robin technique into 3 disks D0, D1, and D2, which are
associated with processors P0, P1, and P2. At processors P0, P1, and P2, the relations
are named Employee0, Employee1 and Employee2 respectively.

 SELECT * FROM Employee ORDER BY Salary;


 Step 1: Construct range vector of form: v[v0, v1, …, vn-2].
 Assume the range vector v[14000, 24000] representing range 0 (14000 and less),
range 1 (14001 to 24000) and range 2 (24001 and more).
 Redistribute Employee 0, Employee 1 and Employee 2 using these range vectors and
store it in temporary disk

237
 Step 2: Sort each temporary table in ascending order and later merge

238
Parallel Sort (Cont.)
Parallel External Sort-Merge
 Assume the relation has already been partitioned among disks
D0, ..., Dn-1.
 Each processor Pi locally sorts the data on disk Di.
 The sorted runs on each processor are then merged to get
the final sorted output.
 Parallelize the merging of sorted runs as follows:
 The sorted partitions at each processor Pi are range-partitioned
across the processors P0, ..., Pm-1.
 Each processor Pi performs a merge on the streams as they are
received, to get a single sorted run.
 The sorted runs on processors P0,..., Pm-1 are concatenated to get the
final result.

239
SELECT * FROM Employee ORDER BY Salary;

v[14000, 24000]

240
Parallel Join
 The join operation requires pairs of tuples to be tested
to see if they satisfy the join condition, and if they do, the
pair is added to the join output.
 Parallel join algorithms attempt to split the pairs to be
tested over several processors. Each processor then
computes part of the join locally.
 In a final step, the results from each processor can be
collected together to produce the final result.

241
Partitioned Join
 For equi-joins and natural joins, it is possible to partition the
two input relations across the processors, and compute the
join locally at each processor.
 Let r and s be the input relations, and we want to compute
r ⋈r.A=s.B s.
 r and s each are partitioned into n partitions, denoted r0, r1, ...,
rn-1 and s0, s1, ..., sn-1.
 Can use either range partitioning or hash partitioning.
 r and s must be partitioned on their join attributes (r.A and s.B),
using the same range-partitioning vector or hash function.
 Partitions ri and si are sent to processor Pi,
 Each processor Pi locally computes ri ⋈ri.A=si.B si. Any of the
standard join methods can be used.
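A single-process sketch of a partitioned equi-join, using hash partitioning. The parameters a and b (tuple index positions for the join attributes) are invented for illustration:

```python
def partitioned_join(r, s, n, a, b):
    """Partition r on t[a] and s on t[b] with the same hash function,
    then join each pair of co-located partitions locally."""
    rp = [[] for _ in range(n)]
    sp = [[] for _ in range(n)]
    for t in r:
        rp[hash(t[a]) % n].append(t)
    for t in s:
        sp[hash(t[b]) % n].append(t)
    out = []
    for i in range(n):                  # local join at "processor" Pi
        for tr in rp[i]:
            for ts in sp[i]:
                if tr[a] == ts[b]:
                    out.append(tr + ts)
    return out
```

Because both relations are partitioned with the same function on the join attribute, matching tuples always land in the same partition pair.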

242
Partitioned Join (Cont.)

243
244
Fragment-and-Replicate Join
 Partitioning not possible for some join conditions
 e.g., non-equijoin conditions, such as r.A > s.B.
 For joins where partitioning is not applicable,
parallelization can be accomplished by the fragment-and-
replicate technique
 Special case - asymmetric fragment-and-replicate:
 One of the relations, say r, is partitioned; any partitioning
technique can be used.
 The other relation, s, is replicated across all the processors.
 Processor Pi then locally computes the join of ri with all of s
using any join technique.

245
Depiction of Fragment-and-Replicate Joins

246
Fragment-and-Replicate Join (Cont.)
 General case: reduces the sizes of the relations at each
processor.
 r is partitioned into n partitions, r0, r1, ..., rn-1; s is partitioned
into m partitions, s0, s1, ..., sm-1.
 Any partitioning technique may be used.
 There must be at least m * n processors.
 Label the processors as: P0,0, P0,1, ..., P0,m-1, P1,0, ..., Pn-1,m-1.
 Pi,j computes the join of ri with sj. In order to do so, ri is
replicated to Pi,0, Pi,1, ..., Pi,m-1, while sj is replicated to P0,j, P1,j, ...,
Pn-1,j
 Any join technique can be used at each processor Pi,j.

247
Fragment-and-Replicate Join (Cont.)
 Both versions of fragment-and-replicate work with any
join condition, since every tuple in r can be tested with
every tuple in s.
 Usually has a higher cost than partitioning, since one of
the relations (for asymmetric fragment-and-replicate) or
both relations (for general fragment-and-replicate) have
to be replicated.
 Sometimes asymmetric fragment-and-replicate is
preferable even though partitioning could be used.
 E.g., say s is small and r is large, and already partitioned. It may
be cheaper to replicate s across all processors, rather than
repartition r and s on the join attributes.
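The asymmetric variant can be sketched as below (pred is an arbitrary join predicate, so a non-equijoin such as r.A > s.B works; names invented):

```python
def asym_fragment_replicate_join(r_partitions, s, pred):
    """r stays partitioned across processors; the smaller relation s is
    replicated to every processor, so every (r, s) pair gets tested."""
    out = []
    for ri in r_partitions:             # one iteration per processor Pi
        for tr in ri:
            for ts in s:                # full copy of s at this processor
                if pred(tr, ts):
                    out.append((tr, ts))
    return out
```

Each processor tests only its own r-partition against s, splitting the pairs to be tested across processors while supporting any join condition.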
248
Partitioned Parallel Hash-Join
Parallelizing partitioned hash join:
 Assume s is smaller than r and therefore s is chosen as the
build relation.
 A hash function h1 takes the join attribute value of each tuple
in s and maps this tuple to one of the n processors.
 Each processor Pi reads the tuples of s that are on its disk Di,
and sends each tuple to the appropriate processor based on
hash function h1. Let si denote the tuples of relation s that are
sent to processor Pi.
 As tuples of relation s are received at the destination
processors, they are partitioned further using another hash
function, h2, which is used to compute the hash-join locally.
(Cont.)

249
Partitioned Parallel Hash-Join (Cont.)
 Once the tuples of s have been distributed, the larger relation r is
redistributed across the n processors using the hash function h1
 Let ri denote the tuples of relation r that are sent to processor Pi.
 As the r tuples are received at the destination processors, they are
repartitioned using the function h2
 (just as the probe relation is partitioned in the sequential hash-join
algorithm).
 Each processor Pi executes the build and probe phases of the hash-
join algorithm on the local partitions ri and si of r and s to produce a
partition of the final result of the hash-join.
 Note: Hash-join optimizations can be applied to the parallel case
 e.g., the hybrid hash-join algorithm can be used to cache some of the
incoming tuples in memory and avoid the cost of writing them and
reading them back in.
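A sketch of the algorithm above. A Python dict stands in for the local hash table built with h2; the function and parameter names (a and b are join-attribute positions) are invented:

```python
def parallel_hash_join(r, s, n, a, b):
    """s (the smaller relation) is the build relation. h1 routes tuples
    to one of n processors; each processor then runs a local build/probe
    hash join on its pair of partitions."""
    h1 = lambda v: hash(v) % n
    sp = [[] for _ in range(n)]
    rp = [[] for _ in range(n)]
    for t in s:                         # distribute the build relation first
        sp[h1(t[b])].append(t)
    for t in r:                         # then redistribute the probe relation
        rp[h1(t[a])].append(t)
    out = []
    for i in range(n):                  # local hash join at processor Pi
        table = {}
        for t in sp[i]:                 # build phase (dict stands in for h2)
            table.setdefault(t[b], []).append(t)
        for t in rp[i]:                 # probe phase
            for m in table.get(t[a], []):
                out.append(t + m)
    return out
```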

250
Parallel Nested-Loop Join
 Assume that
 relation s is much smaller than relation r and that r is stored by
partitioning.
 there is an index on a join attribute of relation r at each of the
partitions of relation r.
 Use asymmetric fragment-and-replicate, with relation s being
replicated, and using the existing partitioning of relation r.
 Each processor Pj where a partition of relation s is stored
reads the tuples of relation s stored in Dj, and replicates the
tuples to every other processor Pi.
 At the end of this phase, relation s is replicated at all sites that store
tuples of relation r.
 Each processor Pi performs an indexed nested-loop join of
relation s with the ith partition of relation r.
251
Other Relational Operations
Selection (r)
 If  is of the form ai = v, where ai is an attribute and v a
value.
 If r is partitioned on ai the selection is performed at a single
processor.
 If  is of the form l <= ai <= u (i.e.,  is a range selection)
and the relation has been range-partitioned on ai
 Selection is performed at each processor whose partition
overlaps with the specified range of values.
 In all other cases: the selection is performed in parallel at
all the processors.
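The routing decision can be sketched as below, assuming range partitioning with a vector like the earlier [5, 11] example (function name invented):

```python
import bisect

def route_selection(vector, n, point=None, bounds=None):
    """Which processors must run the selection: one for a point query on
    the partitioning attribute, the overlapping ones for a range query
    (bounds = (l, u)), and all n processors otherwise."""
    if point is not None:
        return [bisect.bisect_right(vector, point)]
    if bounds is not None:
        lo, hi = bounds
        return list(range(bisect.bisect_right(vector, lo),
                          bisect.bisect_right(vector, hi) + 1))
    return list(range(n))
```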

252
Other Relational Operations (Cont.)
 Duplicate elimination
 Perform by using either of the parallel sort techniques
 eliminate duplicates as soon as they are found during sorting.
 Can also partition the tuples (using either range- or hash-
partitioning) and perform duplicate elimination locally at each
processor.

 Projection
 Projection without duplicate elimination can be performed as
tuples are read in from disk in parallel.
 If duplicate elimination is required, any of the above duplicate
elimination techniques can be used.

253
Grouping/Aggregation
 Partition the relation on the grouping attributes and then
compute the aggregate values locally at each processor.
 Can reduce cost of transferring tuples during partitioning
by partly computing aggregate values before partitioning.
 Consider the sum aggregation operation:
 Perform aggregation operation at each processor Pi on those
tuples stored on disk Di
 results in tuples with partial sums at each processor.
 Result of the local aggregation is partitioned on the grouping
attributes, and the aggregation performed again at each
processor Pi to get the final result.
 Fewer tuples need to be sent to other processors during
partitioning.
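The sum example can be sketched as below; each local dict holds one processor's partial sums before they are repartitioned on the grouping key (names invented):

```python
def parallel_group_sum(local_partitions, n):
    """Each processor computes partial sums on its own (group, value)
    tuples; the partial results are then repartitioned on the grouping
    attribute and summed again to give the final groups."""
    partials = []
    for part in local_partitions:       # local pre-aggregation at each Pi
        acc = {}
        for group, value in part:
            acc[group] = acc.get(group, 0) + value
        partials.append(acc)
    final = [{} for _ in range(n)]      # repartition on grouping attribute
    for acc in partials:
        for group, subtotal in acc.items():
            d = final[hash(group) % n]
            d[group] = d.get(group, 0) + subtotal
    return final
```

Only one partial-sum tuple per group leaves each processor, which is the transfer saving the slide describes.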
254
Cost of Parallel Evaluation of Operations
 If there is no skew in the partitioning, and there is no
overhead due to the parallel evaluation, the expected time
taken is 1/n of the sequential time (i.e., a speed-up of n)
 If skew and overheads are also to be taken into account,
the time taken by a parallel operation can be estimated as
Tpart + Tasm + max (T0, T1, …, Tn-1)
 Tpart is the time for partitioning the relations
 Tasm is the time for assembling the results
 Ti is the time taken for the operation at processor Pi
 this needs to be estimated taking into account the skew, and the time
wasted in contentions.

255
Interoperator Parallelism
 Pipelined parallelism
 Consider a join of four relations
 r1 ⋈ r2 ⋈ r3 ⋈ r4
 Set up a pipeline that computes the three joins in parallel
 Let P1 be assigned the computation of
temp1 = r1 ⋈ r2
 And P2 be assigned the computation of temp2 = temp1 ⋈ r3
 And P3 be assigned the computation of temp2 ⋈ r4
 Each of these operations can execute in parallel, sending result
tuples it computes to the next operation even as it is
computing further results
 Provided a pipelineable join evaluation algorithm (e.g. indexed nested
loops join) is used
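Pipelined dataflow can be sketched with Python generators: each stage consumes tuples from the previous stage as they are produced, so no intermediate result is materialized. Selection and projection stages are used here for brevity instead of joins:

```python
def scan(relation):
    """Leaf operator: streams tuples one at a time."""
    for t in relation:
        yield t

def select_stage(source, pred):
    """Consumes upstream tuples as they arrive, passing matches on."""
    for t in source:
        if pred(t):
            yield t

def project_stage(source, f):
    """Applies f to each tuple as it flows through the pipeline."""
    for t in source:
        yield f(t)

# select even values from the scan, then square them, tuple by tuple
pipeline = project_stage(select_stage(scan(range(10)),
                                      lambda x: x % 2 == 0),
                         lambda x: x * x)
```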

256
Factors Limiting Utility of Pipeline
Parallelism
 Pipeline parallelism is useful since it avoids writing
intermediate results to disk
 Useful with small number of processors, but does not
scale up well with more processors. One reason is that
pipeline chains do not attain sufficient length.
 Cannot pipeline operators which do not produce output
until all inputs have been accessed (e.g. aggregate and
sort)
 Little speedup is obtained for the frequent cases of skew
in which one operator's execution cost is much higher
than the others.

257
Independent Parallelism
 Independent parallelism
 Consider a join of four relations
r1 ⋈ r2 ⋈ r3 ⋈ r4
 Let P1 be assigned the computation of temp1 = r1 ⋈ r2
 And P2 be assigned the computation of temp2 = r3 ⋈ r4
 And P3 be assigned the computation of temp1 ⋈ temp2
 P1 and P2 can work independently in parallel
 P3 has to wait for input from P1 and P2
 Can pipeline output of P1 and P2 to P3, combining independent parallelism
and pipelined parallelism
 Does not provide a high degree of parallelism
 useful with a lower degree of parallelism.
 less useful in a highly parallel system,

258
Design of Parallel Systems
Some issues in the design of parallel systems:
 Parallel loading of data from external sources is needed in
order to handle large volumes of incoming data.
 Resilience to failure of some processors or disks.
 Probability of some disk or processor failing is higher in a
parallel system.
 Operation (perhaps with degraded performance) should be
possible in spite of failure.
 Redundancy achieved by storing extra copy of every data item
at another processor.

259
Design of Parallel Systems (Cont.)
 Online reorganization of data and schema changes must
be supported.
 For example, index construction on terabyte databases can
take hours or days even on a parallel system.
 Need to allow other processing (insertions/deletions/updates) to be
performed on relation even as index is being constructed.
 Basic idea: index construction tracks changes and “catches up”
on changes at the end.
 Also need support for online repartitioning and schema
changes (executed concurrently with other processing).

260
