Adbms Unit 2

1. The query is parsed by the query parser to verify syntax and convert it to an internal representation. 2. The query optimizer then determines the most efficient execution plan by estimating the cost of various plans using statistics from the catalog manager. 3. The chosen execution plan is then evaluated by the query execution engine to retrieve the query results from storage and return them to the user.

Uploaded by

Richin Kolvekar
Copyright
© © All Rights Reserved

Established as per the Section 2(f) of the UGC Act, 1956

Approved by AICTE, COA and BCI, New Delhi

Advanced DBMS
Course code: M21DES212

School of Computer Science and


Applications
Abhay Kumar Srivastav
LECTURE -1

Agenda
• Prerequisites
• Overview of Query Evaluation
• Query
• System catalogs
• Information stored in the catalog
• How the catalog stores data
• Introduction to operator evaluation
• Simple techniques used to develop algorithms for each operator
PREREQUISITES

1. Basic knowledge of databases
2. Database relationships
3. Memory management in operating systems
4. SQL
5. Basic knowledge of RDBMS


UNIT-2 OVERVIEW OF QUERY EVALUATION, EXTERNAL SORTING AND RELATIONAL QUERY OPTIMIZER
1. The system catalog
2. Introduction to operator evaluation
3. Algorithm for relational operations
4. Introduction to query optimization
5. When does a DBMS sort data?
6. A simple two-way merge sort
7. External merge sort
8. Evaluating Relational Operators: The Selection operation
11. General selection conditions
12. The Projection operation
13. The Join operation
14. The Set operations
15. Aggregate operations
16. The impact of buffering
OVERVIEW OF QUERY EVALUATION

• Here we study how queries are evaluated in a relational DBMS.
• How does a DBMS describe the data that it manages, including tables and indexes (that is, the way the data is stored or represented)?
• The descriptive data, or metadata, is stored in special tables called the system catalogs and is used to find the best way to evaluate a query.
QUERY

 A query is a request for information from a database.


SYSTEM CATALOGS

• The descriptive data, or metadata, is stored in special tables called the system catalogs and is used to find the best way to evaluate a query.
THE SYSTEM CATALOG

Where does the DBMS maintain information about every table?

• A relational DBMS maintains information about every table and index that it contains.
• This descriptive information is itself stored in a collection of special tables called the catalog tables, also known as the data dictionary.
THE SYSTEM CATALOG

Information stored in the catalog:

• At a minimum, the system catalog stores system-wide information such as the size of the buffer pool and the page size.
THE SYSTEM CATALOG

Information stored about every table in the catalog:

• Table name
• File name (or some identifier)
• File structure (e.g., heap file) of the file in which it is stored
• Attribute name and type of each of its attributes
• Index name of each index on that table
• The integrity constraints (e.g., primary key and foreign key constraints) on the table
THE SYSTEM CATALOG

Information stored about each index in the catalog:

• Index name
• Structure (e.g., B+ tree) of the index
• Search key attributes

For each view:

• View name and definition
THE SYSTEM CATALOG
Information stored in the catalog (cont.)
• The catalog also stores statistics about tables and indexes. These statistics are updated periodically (together with the date of the update).

The following information is commonly stored:

• Cardinality: the number of rows in a table.
• Size: the number of pages NPages(R) for each table R.
THE SYSTEM CATALOG
Information stored in the catalog (cont.)

• Index Cardinality: the number of distinct key values in the index.
• Index Size: the number of pages INPages(I) for each index I. (For a B+ tree index I, we take the number of leaf pages.)
• Index Height: the number of non-leaf levels of a tree index.
THE SYSTEM CATALOG

How catalog stores data:


In an RDBMS, the system catalog is itself a collection of tables.
Example: we store information about the attributes of tables in a catalog table called Attribute_Cat:
Attribute_Cat(attr_name: string, rel_name: string, type: string, position: integer)
Ex: suppose we have two tables:
1. Sailors(sid: integer, sname: string, rating: integer, age: real)
2. Reserves(sid: integer, bid: integer, day: dates, rname: string)
THE SYSTEM CATALOG

How catalog stores data Cont…:


1. Sailors(sid: integer, sname: string, rating: integer, age: real)
2. Reserves(sid: integer, bid: integer, day: dates, rname: string)

attr_Name Rel_Name Type Position
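The rows of this table are not shown on the slide. As a rough sketch, the snippet below derives one Attribute_Cat row per attribute from the two schemas above. The dictionary `SCHEMAS`, the helper `attribute_cat_rows`, and the choice to number positions from 1 are assumptions made for illustration, not part of any real DBMS catalog API.

```python
# Hypothetical in-memory stand-in for the schemas of the two example tables.
SCHEMAS = {
    "Sailors": [("sid", "integer"), ("sname", "string"),
                ("rating", "integer"), ("age", "real")],
    "Reserves": [("sid", "integer"), ("bid", "integer"),
                 ("day", "dates"), ("rname", "string")],
}


def attribute_cat_rows(schemas):
    """Return one (attr_name, rel_name, type, position) row per attribute."""
    rows = []
    for rel_name, attrs in schemas.items():
        # Assumption: positions are numbered from 1 within each relation.
        for position, (attr_name, attr_type) in enumerate(attrs, start=1):
            rows.append((attr_name, rel_name, attr_type, position))
    return rows


ROWS = attribute_cat_rows(SCHEMAS)
for row in ROWS:
    print(row)
```

Each printed tuple corresponds to one row under the header attr_name, rel_name, type, position.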


INTRODUCTION TO OPERATOR EVALUATION

• Several alternative algorithms are available for implementing each relational operator, and for most operators no algorithm is universally superior.
• Several factors influence which algorithm performs best.


INTRODUCTION TO OPERATOR
EVALUATION

Why different algorithms?

The factors that influence algorithm performance are:
• the size of the table,
• existing indexes and sort orders,
• the size of the available buffer pool and the buffer replacement policy.

• In this section we discuss some of the common techniques used in developing evaluation algorithms for relational operators,
• and introduce the concept of access paths, which are the different ways in which rows of a table can be retrieved.
INTRODUCTION TO OPERATOR EVALUATION

• Three Common Techniques:

The algorithms for the various relational operators actually have a lot in common. A few simple techniques are used to develop algorithms for each operator:

1) Indexing: If a selection or join condition is specified, use an index to examine just the tuples that satisfy the condition.
INTRODUCTION TO OPERATOR
EVALUATION

 Three Common Techniques:

2) Iteration: Examine all tuples in an input table, one after the other. If we need only a few fields and there is an index whose key contains all these fields, then instead of examining data tuples we can scan all index data entries.

3) Partitioning: By partitioning tuples on a sort key, we can often decompose an operation into a less expensive collection of operations on partitions.
Note: Sorting and hashing are two commonly used partitioning techniques.
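To make the three techniques concrete, here is a minimal sketch on a toy list of Reserves rows. The data, the dictionary used as an "index", and the `partition` helper are all made up for illustration; a real DBMS works on disk pages rather than Python lists.

```python
from collections import defaultdict

# Toy Reserves rows: (sid, bid, rname). The values are invented.
reserves = [(1, 101, "Joe"), (2, 102, "Ann"), (1, 103, "Joe"), (3, 101, "Bob")]

# Indexing: build a lookup on rname once, then touch only matching tuples.
rname_index = defaultdict(list)
for row in reserves:
    rname_index[row[2]].append(row)
by_index = rname_index["Joe"]

# Iteration: examine every tuple, one after the other.
by_scan = [row for row in reserves if row[2] == "Joe"]


# Partitioning: split tuples by a hash of sid so each part can be
# processed on its own (tuples with equal sid land in the same part).
def partition(rows, num_parts):
    parts = [[] for _ in range(num_parts)]
    for row in rows:
        parts[hash(row[0]) % num_parts].append(row)
    return parts


parts = partition(reserves, 2)
```

Both `by_index` and `by_scan` contain the same rows; the difference is how many tuples had to be examined to find them.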
SUMMARY

 Overview of Query Evaluation


 Query
 system catalogs
 Information stored in the catalog:
 How catalog stores data
 operator evaluation
 techniques are used to develop algorithms for each operator
LECTURE -2

OBJECTIVES
• Query Evaluation
• System catalogs
• Information stored in the catalog
• Techniques used to develop algorithms for each operator
• Evaluation algorithms
• Select
• Projection
• Join
ALGORITHMS FOR RELATIONAL
OPERATIONS

• Discussion of evaluation algorithms for selection (σ), projection (∏) and join (⋈)

Selection
σ R.attr op value (R)
SELECT * FROM student
WHERE age > 40

If there is no index on R.attr, scan R.

If one or more indexes are available, use an index.


ALGORITHMS FOR RELATIONAL
OPERATIONS
Selection
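Analyze the query:
• If the index is clustered, all qualifying rname values are stored together, so the cost is about 100 I/Os.

• If the index is unclustered, the cost can be up to 10,000 I/Os, because each qualifying tuple may require reading a different page.

Rule of thumb:

If more than about 5% of the tuples are to be retrieved, a plain file scan is probably cheaper than using an unclustered index.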
ALGORITHMS FOR RELATIONAL
OPERATIONS

PROJECTION
• Selects specific columns.
• The expensive part is ensuring that no duplicates appear in the result.

Ex: project sid, bid from R (Reserves)

Duplicates can exist because a sailor might have reserved a given boat on several days.
ALGORITHMS FOR RELATIONAL
OPERATIONS
Analysis
1) Without eliminating duplicates (no DISTINCT keyword is used).
Solution: projection needs only iteration (examine every tuple of the table and keep the wanted columns), irrespective of whether an index is clustered or unclustered.

2) Eliminating duplicates:
To eliminate duplicates we have to use partitioning.
Partitioning steps:
1) Scan R to produce (sid, bid) pairs.
2) Sort these pairs.
3) Scan the sorted pairs and discard duplicates.
Note: if a suitable index is available, the entire process takes much less time.
ALGORITHMS FOR RELATIONAL
OPERATIONS

JOIN
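Join is an expensive and very commonly used operation.

Ex: join R and S (Reserves and Sailors)
condition: R.sid = S.sid
Analysis:
1) An index is available on the sid column of S (index nested loops join).
Procedure 1: scan R and, for each tuple of R, use the index to find the matching tuples in S.
Takes 221,000 I/Os.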
ALGORITHMS FOR RELATIONAL
OPERATIONS

JOIN

2) We do not have an index that matches the join condition.

Procedure 2:
1) Scan and sort both tables on the join column.
2) Scan the sorted tables to find matches.
Takes 7,500 I/Os.

This procedure is called sort-merge join.


SUMMARY

 Evaluation algorithms
 Select
 Projection
 Join
LECTURE -3 & 4

OBJECTIVES

• Evaluation algorithms
• Select
• Projection
• Join
• Query parsing
• Optimization & Execution
• Query Evaluation Plans
• DBMS sort data
• External sorting
QUERY PARSING, OPTIMIZATION &
EXECUTION:
Figure: query parsing, optimization and execution. Query → Query Parser → parsed query → Query Optimizer (Plan Generator and Plan Cost Estimator, using the Catalog Manager) → evaluation plan → Query Plan Evaluator.

Commercial optimizers: Current relational DBMS optimizers are very complex pieces of software with many closely guarded details, and they typically represent 40 to 50 man-years of development effort!


QUERY PARSING, OPTIMIZATION &
EXECUTION:
Query Evaluation Plans
SELECT S.sname FROM Reserves R, Sailors S
WHERE R.sid = S.sid
AND R.bid = 100 AND S.rating > 5

This query can be expressed in relational algebra as follows:
∏sname(σbid=100 ∧ rating>5(Reserves ⋈sid=sid Sailors))

Figure: the same expression drawn as a tree, with ∏sname at the root, σbid=100 ∧ rating>5 below it, and the join ⋈sid=sid over Reserves and Sailors at the bottom.

When the input table to a unary operator (e.g., selection or projection) is pipelined into it, we sometimes say that the operator is applied on-the-fly.
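The idea of pipelining can be sketched with Python generators: each operator pulls tuples from the operator below it, so the selection and projection are applied on-the-fly without storing the join result. The sample rows and the function names `join`, `select` and `project` are assumptions for illustration only.

```python
# Made-up sample data; attribute names follow the Sailors/Reserves schemas.
sailors = [{"sid": 1, "sname": "Dustin", "rating": 7},
           {"sid": 2, "sname": "Lubber", "rating": 3},
           {"sid": 3, "sname": "Rusty", "rating": 10}]
reserves = [{"sid": 1, "bid": 100}, {"sid": 2, "bid": 100},
            {"sid": 3, "bid": 101}]


def join(left, right):
    """Nested-loops join of Reserves and Sailors on sid = sid."""
    for r in left:
        for s in right:
            if r["sid"] == s["sid"]:
                yield {**r, **s}


def select(rows):
    """Selection bid = 100 AND rating > 5, applied on-the-fly."""
    return (t for t in rows if t["bid"] == 100 and t["rating"] > 5)


def project(rows):
    """Projection on sname, applied on-the-fly."""
    return (t["sname"] for t in rows)


# The plan is the nesting of operators; tuples flow bottom-up one at a time.
result = list(project(select(join(reserves, sailors))))
print(result)
```

Only the final `list(...)` call materializes anything; the intermediate join and selection results are never stored as whole tables.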
OVERVIEW OF QUERY
EVALUATION

WHEN DOES A DBMS SORT DATA?

• Sorting a collection of records on some (search) key is a very useful operation.
• The key can be a single attribute or an ordered list of attributes.
• Sorting is required in a variety of situations, including the following important ones:
OVERVIEW OF QUERY
EVALUATION

Sorting is needed in the following situations:

• Users may want answers in some order, for example, by increasing age.
• Sorting records is the first step in bulk loading a tree index.
• Sorting is useful for eliminating duplicate copies in a collection of records.
• A widely used algorithm for performing a very important relational algebra operation, called join, requires a sorting step.
OVERVIEW OF QUERY
EVALUATION

What is external sorting and when do we need it?

• When the data to be sorted is too large to fit into the available main memory, we need external sorting.

Goal of external sorting: minimize the cost of disk accesses.


SUMMARY

 Query parsing
 Optimization & Execution
 Query Evaluation Plans
 Dbms sort data
 external sorting
QUIZ

1. What is the meaning of the "SELECT" clause in MySQL?
a) Show me all columns and rows
b) Show me all columns
c) Show me all rows
d) None of the mentioned
Answer: a

2. Which is a unary operation?
a) Selection operation
b) Primitive operation
c) Projection operation
d) Generalized selection
Answer: d
LECTURE -5

OBJECTIVES
• Query parsing
• Optimization & Execution
• Query Evaluation Plans
• DBMS sort data
• External sorting
• Merge sort
• Two-way merge sort
• Methodology of two-way merge sort
• Two-Way Merge Sort
MERGE SORT
A SIMPLE TWO WAY MERGE SORT

• This algorithm utilizes only three pages of main memory, and it is presented only for pedagogical purposes.
• In practice, many more pages of memory are available, and we want our sorting algorithm to use the additional memory effectively.
• When sorting a file, several sorted subfiles are typically generated in intermediate steps. We refer to each sorted subfile as a run.
A SIMPLE TWO WAY MERGE SORT

• If the entire file does not fit into the available main memory, we can sort it by breaking it into smaller subfiles, sorting these subfiles, and then merging them using a minimal amount of main memory at any given time.

Figure: a file split into subfiles SF-1, SF-2 and SF-3, with one subfile (SF-1) held in memory at a time. (SF = subfile.)


A SIMPLE TWO WAY MERGE SORT

Methodology of two-way merge sort

If the number of pages in the input file is $2^k$, for some $k$, then:
• Pass 0 produces $2^k$ sorted runs of one page each,
• Pass 1 produces $2^{k-1}$ sorted runs of two pages each,
• Pass 2 produces $2^{k-2}$ sorted runs of four pages each,
• and so on, until Pass $k$ produces one sorted run of $2^k$ pages.
A SIMPLE TWO WAY MERGE SORT

Methodology of two-way merge sort

• In each pass, we read every page in the file, process it, and write it out. Therefore we have two disk I/Os per page, per pass. The number of passes is $\lceil \log_2 N \rceil + 1$, where $N$ is the number of pages in the file. The overall cost is $2N(\lceil \log_2 N \rceil + 1)$ I/Os.
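As a quick check of this formula, the sketch below computes the number of passes and the total I/O cost for a file of N pages. The function name `two_way_sort_cost` is an assumption; the ceiling of log2 N is computed with integer arithmetic to avoid floating-point rounding.

```python
def two_way_sort_cost(num_pages):
    """Return (passes, total I/Os) for the simple two-way merge sort."""
    # (num_pages - 1).bit_length() equals ceil(log2(num_pages)) for num_pages >= 1.
    passes = (num_pages - 1).bit_length() + 1
    # Every pass reads and writes every page once: 2 I/Os per page per pass.
    return passes, 2 * num_pages * passes


print(two_way_sort_cost(7))
print(two_way_sort_cost(8))
```

For a 7-page file this gives 4 passes (Pass 0 to Pass 3) and 2 * 7 * 4 = 56 I/Os.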
A SIMPLE TWO WAY MERGE SORT

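Two-Way Merge Sort: Requires 3 Buffers

• Pass 0: Read a page, sort it, write it out. Only one buffer page is used.
• Pass 1, 2, 3, …, etc.: three buffer pages are used.

Figure: two input buffers (INPUT 1, INPUT 2) and one output buffer (OUTPUT) in main memory, between the disk that is read and the disk that is written.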
A SIMPLE TWO WAY MERGE SORT

Two-Way External Merge Sort
• In each pass we read and write each page in the file.
• Idea: divide and conquer: sort subfiles and merge.

Input file:            3,4 | 6,2 | 9,4 | 8,7 | 5,6 | 3,1 | 2
PASS 0 (1-page runs):  3,4 | 2,6 | 4,9 | 7,8 | 5,6 | 1,3 | 2
PASS 1 (2-page runs):  2,3 4,6 | 4,7 8,9 | 1,3 5,6 | 2
PASS 2 (4-page runs):  2,3 4,4 6,7 8,9 | 1,2 3,5 6
PASS 3 (8-page runs):  1,2 2,3 3,4 4,5 6,6 7,8 9
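The passes in this example can be reproduced with a small in-memory simulation. This is only a sketch: runs are plain Python lists rather than pages on disk, and the names `merge`, `two_way_merge_sort`, `PAGES` and `HISTORY` are chosen here for illustration.

```python
def merge(run_a, run_b):
    """Merge two sorted runs into one sorted run."""
    merged, i, j = [], 0, 0
    while i < len(run_a) and j < len(run_b):
        if run_a[i] <= run_b[j]:
            merged.append(run_a[i])
            i += 1
        else:
            merged.append(run_b[j])
            j += 1
    return merged + run_a[i:] + run_b[j:]


def two_way_merge_sort(pages):
    """Return the list of runs produced by each pass (Pass 0 first)."""
    runs = [sorted(page) for page in pages]  # Pass 0: sort each page alone
    history = [runs]
    while len(runs) > 1:
        # Merge runs pairwise; an unpaired last run is carried over unchanged.
        runs = [merge(runs[k], runs[k + 1]) if k + 1 < len(runs) else runs[k]
                for k in range(0, len(runs), 2)]
        history.append(runs)
    return history


PAGES = [[3, 4], [6, 2], [9, 4], [8, 7], [5, 6], [3, 1], [2]]
HISTORY = two_way_merge_sort(PAGES)
for pass_no, runs in enumerate(HISTORY):
    print("PASS", pass_no, runs)
```

The printed runs match the figure: seven 1-page runs, then four, then two, then a single sorted run.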
A SIMPLE TWO WAY MERGE SORT

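Two-Way External Merge Sort

Figure: Pass 0 with a buffer pool of B = 4 pages. The input file (3,4 | 6,2 | 9,4 | 8,7 | 5,6 | 3,1 | 2) is read four pages at a time; the buffer pool is shown holding 3,4 | 6,2 | 9,4 | 8,7. The 1st output run is 2,3 | 4,4 | 6,7 | 8,9 and the 2nd output run is 1,2 | 3,5 | 6.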
SUMMARY

 Merge sort
 Two way merge sort
 Methodology of two way merge sort
 Two-Way merge Sort
QUIZ

1. The operation on a relation X produces Y, such that Y contains only selected attributes of X. Such an operation is:
a) Projection
b) Intersection
c) Union
d) Difference
Answer: a

2. A _____ is a query that retrieves rows from more than one table or view:
a) Start
b) End
c) Join
d) All of the mentioned
Answer: c
LECTURE -6

OBJECTIVES
• Merge sort
• Two-way merge sort
• Methodology of two-way merge sort
• Two-Way Merge Sort
• General External Merge Sort
• Cost of External Merge Sort
• Number of Passes of External Sort
• Minimizing the number of runs
GENERAL EXTERNAL MERGE SORT

More than 3 buffer pages. How can we utilize them?

To sort a file with N pages using B buffer pages:
• Pass 0: use all B buffer pages. Produce $\lceil N/B \rceil$ sorted runs of B pages each.
• Pass 1, 2, …, etc.: merge B-1 runs at a time.

Figure: B main memory buffers between disk and disk: B-1 input buffers (INPUT 1, INPUT 2, ..., INPUT B-1) and one OUTPUT buffer.
GENERAL EXTERNAL MERGE SORT

COST OF EXTERNAL MERGE SORT
1. E.g., with 5 buffer pages, to sort a 108-page file:
   1. Pass 0: $\lceil 108/5 \rceil = 22$ sorted runs of 5 pages each (last run is only 3 pages)
   2. Pass 1: $\lceil 22/4 \rceil = 6$ sorted runs of 20 pages each (last run is only 8 pages)
   3. Pass 2: 2 sorted runs, 80 pages and 28 pages
   4. Pass 3: Sorted file of 108 pages
2. In each pass we read and write 108 pages; thus the total cost is 2 * 108 * 4 = 864 I/Os. Applying our formula, we have $N_1 = \lceil 108/5 \rceil = 22$ and cost $2N(\lceil \log_{B-1} N_1 \rceil + 1) = 2 \cdot 108 \cdot (\lceil \log_4 22 \rceil + 1) = 864$, as expected.
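The worked example can be checked by counting runs pass by pass instead of evaluating the logarithm. This is a sketch under the assumption that B is at least 3; the function name `external_sort_cost` is made up for illustration.

```python
def external_sort_cost(num_pages, num_buffers):
    """Return (passes, total I/Os) for external merge sort with B buffers."""
    # Pass 0 produces ceil(N / B) runs; -(-a // b) is integer ceiling division.
    runs = -(-num_pages // num_buffers)
    passes = 1
    while runs > 1:
        # Each later pass merges B - 1 runs at a time.
        runs = -(-runs // (num_buffers - 1))
        passes += 1
    # Every pass reads and writes all N pages.
    return passes, 2 * num_pages * passes


print(external_sort_cost(108, 5))
```

For N = 108 and B = 5 the run counts go 22, 6, 2, 1, giving 4 passes and 864 I/Os, in agreement with the formula.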
GENERAL EXTERNAL MERGE SORT

Number of Passes of External Sort


GENERAL EXTERNAL MERGE SORT

Minimizing the number of runs

• In Pass 0 we read in B pages at a time and sort them internally to produce $\lceil N/B \rceil$ runs of B pages each.
• To minimize the number of runs (by making each run longer), replacement sort is used.
• Suppose that the file is to be sorted in ascending order on some search key k. Tuples are appended to the output in ascending order by k value. The idea is to repeatedly pick the tuple in the current set with the smallest k value that is still greater than the largest k value in the output buffer and append it to the output buffer.
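A minimal sketch of this idea, using a heap as the "current set", is given below. It works on individual key values rather than pages, and the names `replacement_sort_runs`, `capacity` and `deferred` are assumptions for illustration. A tuple that is smaller than the last value written cannot join the current run, so it is set aside for the next run.

```python
import heapq

_END = object()  # sentinel marking the end of the input


def replacement_sort_runs(records, capacity):
    """Split records into sorted runs using a current set of `capacity` tuples."""
    source = iter(records)
    current = []
    for _ in range(capacity):
        item = next(source, _END)
        if item is _END:
            break
        current.append(item)
    heapq.heapify(current)

    runs, run, deferred = [], [], []
    while current:
        smallest = heapq.heappop(current)
        run.append(smallest)          # append to the output of the current run
        item = next(source, _END)     # refill the freed slot from the input
        if item is not _END:
            if item >= smallest:
                heapq.heappush(current, item)   # can still join this run
            else:
                deferred.append(item)           # must wait for the next run
        if not current:
            runs.append(run)
            run = []
            current, deferred = deferred, []
            heapq.heapify(current)
    return runs


DATA = [3, 4, 6, 2, 9, 4, 8, 7, 5, 6, 3, 1, 2]
RUNS = replacement_sort_runs(DATA, capacity=2)
print(RUNS)
```

With room for only two tuples in memory, the first run already contains four values, which illustrates how replacement sort produces runs longer than the memory it uses.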
SUMMARY

 General External Merge Sort


 Cost of External Merge Sort
 Number of Passes of External Sort
 Minimizing the number of runs
QUIZ

1. How many join types are there in a join condition?
a) 2
b) 3
c) 4
d) 5
Answer: d

2. In which join are records from the right table that have no matching key in the left table included in the result set?
a) Left outer join
b) Right outer join
c) Full outer join
d) Half outer join
Answer: b
LECTURE -7

OBJECTIVES

• General External Merge Sort
• Cost of External Merge Sort
• Number of Passes of External Sort
• Minimizing the number of runs
• Selection Operation
• No index, Unsorted Data
• No index, Sorted Data
• B+ Tree index available for equality selection
• Hash Index used for Equality selection
• Problems with scanning
EVALUATING RELATIONAL
OPERATORS

Selection operation
What are the alternative algorithms for selection?
1. File scan
2. If the data is sorted, binary search
3. B+ tree index
4. Hashing
EVALUATING RELATIONAL
OPERATORS

Selection operation
Which alternative algorithms are best for the select operation under different conditions?
Conditions:
1) No index, unsorted data
• Use a file scan.
2) No index, sorted data
• Use binary search to locate the first matching element.
• Scan the file from that element until the condition no longer matches.
EVALUATING RELATIONAL
OPERATORS

3) B+ tree index available for equality selection

• Clustered B+ tree for equality selection: good, low cost.
• Unclustered B+ tree for equality selection: cost depends on the number of tuples satisfying the condition.

4) Hash index used for equality selection

Selection operation
SELECT * FROM Reserves R WHERE R.rname = 'Joe'
EVALUATING RELATIONAL
OPERATORS

Scanning the entire relation

Figure: each rname value in the relation (Job, Abc, Ace, ..., Joe) is compared with 'Joe' in turn until a match is found.
EVALUATING RELATIONAL
OPERATORS
Selection operation
Problems with scanning:
A scan takes 1000 I/Os if Reserves contains 1000 pages. This is expensive if only a few tuples have rname = 'Joe'.
What is the solution?
Use an index if a suitable index is available:
• A B+ tree index on rname is useful.
• But a B+ tree index on bid is not useful.
SUMMARY

 Selection Operation
 No index, Unsorted Data
 No index, Sorted Data
 B+ Tree index available for equality selection
 Hash Index used for Equality selection
 Problems with scanning
QUIZ

1. A sequence of primitive operations that can be used to evaluate a query is called a __________
a) Query evaluation algebra
b) Query evaluation plan
c) Query evaluation primitive
d) Query evaluation engine
Answer: b

2. Which string function returns the index of the first occurrence of a substring?
a) INSERT()
b) INSTR()
c) INSTRING()
d) INFSTR()
Answer: b
LECTURE -8

OBJECTIVES

• Selection Operation
• No index, Unsorted Data
• No index, Sorted Data
• B+ Tree index available for equality selection
• Hash Index used for Equality selection
• Problems with scanning
• Evaluate selection operation Data striping
• No Index, Unsorted Data
• No Index, sorted Data
• Selection using B+ tree
• Hash Index, Equality Selection
UNDERSTANDING VARIOUS
ALGORITHMS USED TO EVALUATE
SELECTION OPERATION
1) No Index, Unsorted Data

• Given a selection of the form σ R.attr op value (R), if there is no index on R.attr and R is not sorted on R.attr, we have to scan the entire relation. Therefore, the most selective access path is a file scan. For each tuple, we must test the condition R.attr op value and add the tuple to the result if the condition is satisfied.
UNDERSTANDING VARIOUS
ALGORITHMS USED TO EVALUATE
SELECTION OPERATION
1) No Index, Unsorted Data

• Select rectangle from table where value = 9

Figure: a grid of unsorted values, in which every cell must be examined to find the 9s.

What is the solution to this problem, as there is no index and the data is not sorted?
Solution: scan the entire relation.
UNDERSTANDING VARIOUS
ALGORITHMS USED TO EVALUATE
SELECTION OPERATION
2) No Index, sorted Data

• Given a selection of the form σ R.attr op value (R), if there is no index on R.attr but R is physically sorted on R.attr, we can utilize the sort order by doing a binary search to locate the first tuple that satisfies the selection condition.
UNDERSTANDING VARIOUS
ALGORITHMS USED TO EVALUATE
SELECTION OPERATION
2) No Index, sorted Data

• Further, we can then retrieve all tuples that satisfy the selection condition by starting at this location and scanning R until the selection condition is no longer satisfied.
• The access method in this case is a sorted-file scan with selection condition σ R.attr op value (R).


UNDERSTANDING VARIOUS
ALGORITHMS USED TO EVALUATE
SELECTION OPERATION
2) No Index, sorted Data
1. Select rectangle from table where value = 9
Solution:
1) Use binary search to find the first tuple that satisfies the selection condition.
2) Then retrieve all tuples that satisfy the selection condition, until the condition is no longer satisfied.
Cost of the binary search is O(log2 M), where M is the number of pages.
In the Reserves table, if there are 1000 pages, then
Cost ≈ log2 1000 ≈ 10 I/Os [In
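The two-step procedure (binary search for the first match, then scan forward while the condition holds) can be sketched with Python's standard `bisect` module. The function name `select_equal_sorted` and the sample values are assumptions; the list stands in for a file that is physically sorted on the selection attribute.

```python
from bisect import bisect_left


def select_equal_sorted(sorted_values, value):
    """Return all entries equal to `value` from a list sorted in ascending order."""
    # Step 1: binary search for the first position whose entry is >= value.
    pos = bisect_left(sorted_values, value)
    # Step 2: scan forward until the condition is no longer satisfied.
    result = []
    while pos < len(sorted_values) and sorted_values[pos] == value:
        result.append(sorted_values[pos])
        pos += 1
    return result


print(select_equal_sorted([1, 2, 2, 4, 9, 9, 9, 12], 9))
```

Only the matching entries and one non-matching entry after them are examined once the starting position has been found.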
UNDERSTANDING VARIOUS
ALGORITHMS USED TO EVALUATE
SELECTION OPERATION
3) Selection using B+ tree
• If a clustered B+ tree index is available on R.attr, the best strategy for selection conditions σ R.attr op value (R) in which op is not equality is to use the index.
• This strategy is also a good access path for equality selections, although a hash index on R.attr would be a little better.
UNDERSTANDING VARIOUS
ALGORITHMS USED TO EVALUATE
SELECTION OPERATION
3) Selection using B+ tree

If the index is unclustered, the cost depends on the number of tuples satisfying the selection condition.

Procedure for selection: search the index to find the entries that point to qualifying records, then scan the records.
UNDERSTANDING VARIOUS
ALGORITHMS USED TO EVALUATE
SELECTION OPERATION
3) Selection using B+ tree

The cost of retrieving qualifying tuples from R depends on two factors:

1) The number of qualifying tuples.
2) Whether the index is clustered or unclustered.

• If the index is clustered, the cost is probably just one page I/O (it is likely that all qualifying tuples are contained in the same page).
• If the index is unclustered, each index entry could point to a tuple on a different page, so the cost could be one I/O per qualifying tuple.
UNDERSTANDING VARIOUS
ALGORITHMS USED TO EVALUATE
SELECTION OPERATION
3) Selection using B+ tree

How can we reduce the number of I/Os?

Solution: sort the rids by their page-id, so that each data page is fetched only once.
Note: using an unclustered index for a range selection could be expensive.
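The effect of sorting rids by page-id can be illustrated with a small count of page fetches. This is a sketch under a deliberately pessimistic assumption: only one data page is kept in memory, so a page is re-read whenever the page-id changes. The rids, written as (page_id, slot) pairs, are invented.

```python
def page_fetches(rids):
    """Count page reads when a page is re-read each time the page id changes."""
    fetches, last_page = 0, None
    for page_id, _slot in rids:
        if page_id != last_page:
            fetches += 1
            last_page = page_id
    return fetches


# Rids in the order the index returns them (hypothetical values).
rids = [(7, 1), (2, 4), (7, 3), (2, 0), (5, 2), (7, 0)]
unsorted_cost = page_fetches(rids)
sorted_cost = page_fetches(sorted(rids))
print(unsorted_cost, sorted_cost)
```

After sorting, all rids on the same page are adjacent, so the number of fetches drops to the number of distinct pages.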
UNDERSTANDING VARIOUS
ALGORITHMS USED TO EVALUATE
SELECTION OPERATION
4) Hash Index, Equality Selection
If a hash index is available on R.attr and op is equality, the best way to implement the selection σ R.attr op value (R) is obviously to use the index to retrieve qualifying tuples.

• The cost includes a few I/Os to retrieve the appropriate bucket, plus the cost of retrieving the qualifying tuples.


SUMMARY

 Evaluate selection operation Data striping


 No Index, Unsorted Data
 No Index, sorted Data
 Selection using B+ tree
 Hash Index, Equality Selection
QUIZ

1. In a __________ we organize the search keys, with their associated pointers, into a hash file structure.
a) Hash file organization
b) Hash index organization
c) Hashing address
d) None of the mentioned
Answer: b

2. In a __________, we obtain the address of the disk block containing a desired record directly by computing a function on the search key value of the record.
a) Hash file organization
b) Hash index organization
c) Hashing address
d) None of the mentioned
Answer: a
LECTURE -9

OBJECTIVES

• Evaluate selection operation Data striping
• No Index, Unsorted Data
• No Index, sorted Data
• Selection using B+ tree
• Hash Index, Equality Selection
• Selection condition in select operation
• CNF and Index Matching
• Evaluating Selection without disjunction
• Example
• Selection with disjunction
• Example
GENERAL SELECTION CONDITION IN
SELECT OPERATION

CNF (conjunctive normal form) and Index Matching

• To process a selection operation with a general selection condition, we first express the condition in conjunctive normal form (CNF), that is, as a collection of conjuncts that are connected through the use of the ∧ operator.
• Each conjunct consists of one or more terms connected by ∨.
• Conjuncts that contain ∨ are said to be disjunctive, or to contain disjunction.


GENERAL SELECTION CONDITION IN
SELECT OPERATION

CNF (conjunctive normal form) and Index Matching

1. CNF and Index Matching
Ex: suppose that we have a selection on Reserves with the condition
(day < 8/9/02 ∧ rname = 'Joe') ∨ bid = 5 ∨ sid = 3
We can rewrite this in conjunctive normal form as
(day < 8/9/02 ∨ bid = 5 ∨ sid = 3) ∧ (rname = 'Joe' ∨ bid = 5 ∨ sid = 3)
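That the CNF form is equivalent to the original condition, (day < 8/9/02 AND rname = 'Joe') OR bid = 5 OR sid = 3, can be checked by brute force over all truth assignments of the four terms. In this sketch each term is replaced by a boolean variable; the names `day_ok`, `joe`, `bid5` and `sid3` are stand-ins chosen for illustration.

```python
from itertools import product


def original(day_ok, joe, bid5, sid3):
    """(day < 8/9/02 AND rname = 'Joe') OR bid = 5 OR sid = 3"""
    return (day_ok and joe) or bid5 or sid3


def cnf(day_ok, joe, bid5, sid3):
    """(day < 8/9/02 OR bid = 5 OR sid = 3) AND (rname = 'Joe' OR bid = 5 OR sid = 3)"""
    return (day_ok or bid5 or sid3) and (joe or bid5 or sid3)


# Compare both forms on all 2^4 = 16 truth assignments.
EQUIVALENT = all(original(*v) == cnf(*v)
                 for v in product([False, True], repeat=4))
print(EQUIVALENT)
```

The rewrite is an instance of distributing OR over AND: (A AND B) OR C is the same as (A OR C) AND (B OR C).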
GENERAL SELECTION CONDITION IN
SELECT OPERATION

2. Evaluating Selection without disjunction

• When the selection does not contain disjunction, that is, it is a conjunction of terms, we have two evaluation options to consider.
• First option: we can retrieve tuples using a file scan or a single index that matches some conjuncts (and which we estimate to be the most selective access path) and apply all nonprimary conjuncts in the selection to each retrieved tuple.
GENERAL SELECTION CONDITION IN
SELECT OPERATION

Second option: we can try to utilize several indexes.

• If several indexes containing data entries with rids (i.e., Alternatives (2) or (3)) match conjuncts in the selection, we can use these indexes to compute sets of rids of candidate tuples.
• We can then intersect these sets of rids, typically by first sorting them, and then retrieve those records whose rids are in the intersection.
• If additional conjuncts are present in the selection, we can apply these conjuncts to discard some of the candidate tuples from the result.
GENERAL SELECTION
CONDITION IN SELECT
OPERATION
2. Evaluating Selection without disjunction
• Condition: day < 8/9/02 ∧ bid = 3 ∧ sid = 4

How do we retrieve the records?

1. A B+ tree index on day is available.
2. A hash index on sid is available.
3. What is the procedure followed to retrieve the records?
GENERAL SELECTION
CONDITION IN SELECT
OPERATION
2. Evaluating Selection without disjunction
Procedure:
• Condition: day < 8/9/02 ∧ bid = 3 ∧ sid = 4
Step A: data selected by the B+ tree index on day < 8/9/02
GENERAL SELECTION CONDITION IN
SELECT OPERATION
2. Evaluating Selection without disjunction
Procedure:
• Condition: day < 8/9/02 ∧ bid = 3 ∧ sid = 4

Step B: data selected by the hash index on sid = 4
GENERAL SELECTION CONDITION IN
SELECT OPERATION
2. Evaluating Selection without disjunction
Intersect these two sets of rids  ( A & B )
GENERAL SELECTION CONDITION IN
SELECT OPERATION
2. Evaluating Selection without disjunction
Apply BID= 3 on this table
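The whole procedure (two rid sets, their intersection, then the remaining conjunct) can be sketched as follows. The records, the rid names and the use of set comprehensions as stand-ins for the B+ tree and hash index lookups are assumptions for illustration. Dates are written as ISO strings so that string comparison matches date order, and 8/9/02 is read here as 9 August 2002, which is also an assumption.

```python
# Hypothetical Reserves records keyed by rid.
records = {
    "r1": {"sid": 4, "bid": 3, "day": "2002-07-01"},
    "r2": {"sid": 4, "bid": 5, "day": "2002-06-15"},
    "r3": {"sid": 4, "bid": 3, "day": "2002-10-02"},
    "r4": {"sid": 9, "bid": 3, "day": "2002-01-20"},
}

# A: rids that the B+ tree index on day would return for day < 8/9/02.
rids_day = {rid for rid, rec in records.items() if rec["day"] < "2002-08-09"}
# B: rids that the hash index on sid would return for sid = 4.
rids_sid = {rid for rid, rec in records.items() if rec["sid"] == 4}

# Intersect the two rid sets (sorted so records are fetched in rid order).
candidates = sorted(rids_day & rids_sid)

# Fetch the candidate records and apply the remaining conjunct bid = 3.
result = [rid for rid in candidates if records[rid]["bid"] == 3]
print(candidates, result)
```

Only the records in the intersection are fetched, and bid = 3 is checked on those records because no index matches it.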
GENERAL SELECTION CONDITION IN
SELECT OPERATION
3. Selection with disjunction

(day < 8/9/02 ∨ rname = 'Joe')

• Suppose that the only available indexes are a hash index on rname and a hash index on sid.
• We can retrieve tuples satisfying the condition rname = 'Joe' by using the index on rname.
• However, day < 8/9/02 requires a file scan.
• Therefore, the most selective access path in this example is a file scan; since the file is scanned anyway, rname = 'Joe' can be checked on each retrieved tuple.
GENERAL SELECTION CONDITION IN
SELECT OPERATION
3. Selection with disjunction
(day < 8/9/02 ∨ rname = 'Joe')

Figure: the available indexes are a hash index on rname and a hash index on sid.
SUMMARY

 selection condition in select operation


 CNF and Index Matching
 Evaluating Selection without disjunction
 Example
 Selection with disjunction
 Example
QUIZ

1. The process of designating subgroupings within the entity set is called _______
a) Specialization
b) Division
c) Aggregation
d) Finalization
Answer: a

2. The similarities between entity sets can be expressed by which of the following features?
a) Specialization
b) Generalization
c) Uniquation
d) Inheritance
Answer: b
LECTURE -10

OBJECTIVES

• Selection condition in select operation
• CNF and Index Matching
• Evaluating Selection without disjunction
• Selection with disjunction
• The Projection operation
• Projection Based on Sorting
• Partitioning
• Sorting Vs Hashing
• Use of Indexes for Projections
THE PROJECTION OPERATION

SELECT DISTINCT R.sid, R.bid FROM Reserves R

• To implement projection, we have to do the following:
1. Remove unwanted attributes (i.e., those not specified in the projection).
2. Eliminate any duplicate tuples produced.
Note: the second step is the difficult one.
• There are two basic algorithms for projection:
1. Based on sorting
2. Based on hashing
THE PROJECTION OPERATION

• Projection Based on Sorting

The algorithm based on sorting has the following steps (at least
conceptually):

 Scan R and produce a set of tuples that contain only desired attributes.
 Sort this set of tuples
 Scan the sorted result by comparing adjacent tuples, and discard
duplicates.
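The three steps of the sorting-based algorithm can be sketched in a few lines. This is an in-memory illustration only: a real system would use external sorting for step 2. The function name `project_distinct_by_sorting` and the sample Reserves rows are made up.

```python
def project_distinct_by_sorting(rows, columns):
    """Project `rows` onto `columns` and remove duplicates by sorting."""
    # Step 1: scan and keep only the desired attributes.
    projected = [tuple(row[c] for c in columns) for row in rows]
    # Step 2: sort the projected tuples.
    projected.sort()
    # Step 3: scan the sorted result, comparing adjacent tuples.
    result = []
    for t in projected:
        if not result or result[-1] != t:
            result.append(t)
    return result


RESERVES = [
    {"sid": 1, "bid": 101, "day": "10/10"},
    {"sid": 2, "bid": 101, "day": "10/11"},
    {"sid": 1, "bid": 101, "day": "10/12"},  # same sailor, same boat, other day
    {"sid": 1, "bid": 102, "day": "10/12"},
]
DISTINCT_PAIRS = project_distinct_by_sorting(RESERVES, ["sid", "bid"])
print(DISTINCT_PAIRS)
```

Because duplicates are adjacent after sorting, a single comparison with the previous tuple is enough to detect them, and the output comes out sorted as a side effect.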
THE PROJECTION OPERATION

• Projection Based on Hashing

There are two phases:


 partitioning
 duplicate elimination.
THE PROJECTION OPERATION
• Partitioning
1. We use one input buffer page and B-1 output buffer pages.
2. Relation R is read into the input buffer page, one page at a time.
The input page is processed as follows:
1. For each tuple, remove unwanted attributes and then apply a hash function h to the combination of all remaining attributes, which assigns the tuple to one of the B-1 partitions.
2. Two tuples that belong to different partitions are guaranteed not to be duplicates because they have different hash values.
3. In the duplicate elimination phase, each partition is read and hashed into an in-memory table; if a new tuple hashes to the same value as some existing tuple, compare the two to check whether the new tuple is a duplicate.
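A rough in-memory sketch of the two phases is shown below. Partitions are Python lists instead of output buffer pages, Python's built-in `hash` plays the role of h, and a per-partition `set` plays the role of the in-memory hash table; the function name and sample rows are assumptions.

```python
def project_distinct_by_hashing(rows, columns, num_partitions):
    """Project `rows` onto `columns` and remove duplicates by hashing."""
    # Partitioning phase: drop unwanted attributes, hash the rest.
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        t = tuple(row[c] for c in columns)
        partitions[hash(t) % num_partitions].append(t)

    # Duplicate elimination phase: duplicates can only meet inside
    # one partition, so each partition is handled independently.
    result = []
    for part in partitions:
        seen = set()
        for t in part:
            if t not in seen:
                seen.add(t)
                result.append(t)
    return result


RESERVES = [
    {"sid": 1, "bid": 101, "day": "10/10"},
    {"sid": 2, "bid": 101, "day": "10/11"},
    {"sid": 1, "bid": 101, "day": "10/12"},
    {"sid": 1, "bid": 102, "day": "10/12"},
]
PAIRS = project_distinct_by_hashing(RESERVES, ["sid", "bid"], num_partitions=3)
print(sorted(PAIRS))
```

Unlike the sorting-based approach, the output order depends on the hash values, which is why the usage line sorts the result before printing.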
THE PROJECTION OPERATION
Sorting Vs Hashing

• Sorting is superior to hashing if we have many duplicates.
• Sorting is also superior whenever the distribution of hash values is non-uniform, because then some partitions may be too large for the hash table to fit in main memory during duplicate elimination.
• A useful side effect of using sorting is that the output is sorted.
• Sorting uses external sorting, and most database systems already have a utility function for external sorting, so it is easy to use for projections.
Use of Indexes for Projections

• Neither the hashing nor the sorting approach utilizes any existing indexes.
• An existing index is useful if its key includes all the attributes we wish to retain; in that case we can do an index-only scan for the projection.
SUMMARY

 The Projection operation


 Projection Based on Sorting
 Partitioning
 Sorting Vs Hashing
 Use of Indexes for
Projections
QUIZ

1. A domain is ______ if elements of the domain are considered to be indivisible units.
a) Atomic
b) Subatomic
c) Substructure
d) Subset
Answer: a

2. If every non-key attribute is functionally dependent on the primary key, then the relation will be in
a) First normal form
b) Second normal form
c) Third normal form
d) Fourth normal form
Answer: b
LECTURE -11 & 12

OBJECTIVES

• The Projection operation
• Projection Based on Sorting
• Partitioning
• Sorting Vs Hashing
• Use of Indexes for Projections
• JOIN OPERATIONS
• Simple Nested Loops
• Block Nested Loops Join
• Index Nested Loops Join
• Sort-Merge Join
• HASH Join
JOIN OPERATIONS

SELECT * FROM Reserves R, Sailors S WHERE R.sid = S.sid

• Alternative techniques for implementing joins.
The following algorithms enumerate all pairs of tuples, keep those that match the join condition, and discard the non-matching ones:
• simple nested loops
• block nested loops
JOIN OPERATIONS

Simple Nested Loops:

foreach tuple r in R do
    foreach tuple s in S do
        if r_i == s_j then add <r, s> to result
• Simple nested loops: scan the outer relation and, for each outer tuple, scan the inner relation.
• Every row in R is matched against every row in S.
JOIN OPERATIONS
Block Nested Loops:

foreach block of B-2 pages of R do
    foreach page of S do {
        for all matching in-memory tuples r in the R-block and s in the S-page,
            add <r, s> to result
    }
JOIN OPERATIONS

Block Nested Loops Join

 Use one page as an input buffer for scanning the inner S, one page as the
output buffer, and use all remaining pages to hold a "block" of the outer R.
 For each matching tuple r in R-block, s in S-page, add <r, s> to result.
Then read the next R-block, scan S again, and so on.
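A rough Python sketch of this buffer usage is given below. It assumes pages are modelled as lists of tuples whose first field is the join key; `paginate`, `block_nested_loops_join`, and the sample rows are hypothetical names introduced only for illustration.

```python
def paginate(rows, rows_per_page):
    """Split a list of rows into fixed-size 'pages' (lists of rows)."""
    return [rows[i:i + rows_per_page] for i in range(0, len(rows), rows_per_page)]


def block_nested_loops_join(outer_pages, inner_pages, num_buffers,
                            key=lambda row: row[0]):
    """Join page lists using B-2 buffers for the outer block.

    One buffer is reserved for the inner input page and one for output.
    Returns the joined pairs and the number of scans of the inner relation.
    """
    if num_buffers < 3:
        raise ValueError("need at least 3 buffer pages")
    block_size = num_buffers - 2
    result = []
    inner_scans = 0
    for start in range(0, len(outer_pages), block_size):
        # Load the next block of outer pages into memory.
        block = [row for page in outer_pages[start:start + block_size] for row in page]
        inner_scans += 1
        for page in inner_pages:         # one pass over the inner per outer block
            for s in page:
                for r in block:
                    if key(r) == key(s):
                        result.append((r, s))
    return result, inner_scans


# Hypothetical data: (sid, bid) and (sid, sname)
reserves_pages = paginate([(22, 101), (58, 103), (22, 102), (31, 104)], 1)
sailors_pages = paginate([(22, "Dustin"), (31, "Lubber"), (58, "Rusty")], 2)
pairs, scans = block_nested_loops_join(reserves_pages, sailors_pages, num_buffers=4)
print(len(pairs), "result tuples,", scans, "scans of the inner relation")
```

With 4 outer pages and B = 4, the inner relation is scanned twice rather than once per outer tuple, which is the saving the block variant is after.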
JOIN OPERATIONS
Index Nested Loops Join

 If there is an index on the join column of one relation (say S), we can
make it the inner relation and exploit the index: for each tuple of the outer
relation, probe the index for the matching inner tuples.
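As a small sketch of the idea, the Python code below stands in for an ordered index with a sorted key list searched by `bisect`. This is only an assumption for illustration (a real DBMS would use a B+ tree or hash index); all function names and sample rows are hypothetical.

```python
from bisect import bisect_left


def build_sorted_index(rows, key):
    """Simulate an ordered index: parallel lists of keys and rows, sorted on key."""
    ordered = sorted(rows, key=lambda row: row[key])
    return [row[key] for row in ordered], ordered


def index_lookup(index, value):
    """Return all indexed rows whose key equals value."""
    keys, rows = index
    pos = bisect_left(keys, value)       # binary search instead of a full scan
    matches = []
    while pos < len(keys) and keys[pos] == value:
        matches.append(rows[pos])
        pos += 1
    return matches


def index_nested_loops_join(outer, inner_index, outer_key):
    """For each outer row, probe the index on the inner relation."""
    return [(r, s) for r in outer for s in index_lookup(inner_index, r[outer_key])]


sailors = [{"sid": 58, "sname": "Rusty"}, {"sid": 22, "sname": "Dustin"},
           {"sid": 31, "sname": "Lubber"}]
reserves = [{"sid": 22, "bid": 101}, {"sid": 58, "bid": 103}, {"sid": 99, "bid": 104}]

sailors_index = build_sorted_index(sailors, "sid")
matches = index_nested_loops_join(reserves, sailors_index, "sid")
print([(r["bid"], s["sname"]) for r, s in matches])
```

The inner relation is never scanned in full; each outer tuple costs one index probe.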
JOIN OPERATIONS
Sort-Merge Join
 Suppose two salespeople attend a conference and each collects over 100
business cards from potential new customers. They now each have a pile of
cards in random order, and they want to see which cards appear in both piles.

 The salespeople alphabetize their piles, and then they call off names one at
a time.

 Because both piles of cards have been sorted, it becomes much easier to
find the names that appear in both piles.
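The business-card analogy can be written as a short Python sketch: sort both inputs, then advance two cursors in step. The function name, the tuple layout (join key in the first field), and the sample names are assumptions made for this example.

```python
def sort_merge_join(left, right, key=lambda row: row[0]):
    """Equijoin two lists of tuples by sorting both and merging."""
    left = sorted(left, key=key)
    right = sorted(right, key=key)
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = key(left[i]), key(right[j])
        if lk < rk:
            i += 1                       # left key too small: advance left
        elif lk > rk:
            j += 1                       # right key too small: advance right
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and key(right[j_end]) == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and key(left[i]) == lk:
                for s in right[j:j_end]:
                    result.append((left[i], s))
                i += 1
            j = j_end
    return result


pile_a = [("Mori", "pile A"), ("Chen", "pile A"), ("Okafor", "pile A")]
pile_b = [("Okafor", "pile B"), ("Diaz", "pile B"), ("Chen", "pile B")]
common = sort_merge_join(pile_a, pile_b)
print([a[0] for a, b in common])
```

After sorting, each pile is read only once during the merge, apart from re-reading runs of duplicate keys.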
JOIN OPERATIONS

Sort-Merge Join
Example:
SELECT /*+ ORDERED */ ename, dept.deptno
FROM emp, dept WHERE dept.deptno = emp.deptno
JOIN OPERATIONS
Sort-Merge Join
When to use a sort-merge join?
 SORT-MERGE joins can be used only for equijoins (WHERE D.deptno =
E.deptno, as opposed to WHERE D.deptno >= E.deptno).
 They require temporary segments for sorting if SORT_AREA_SIZE or the
automatic memory parameters such as MEMORY_TARGET are set too small, which
can lead to extra memory utilization.
JOIN OPERATIONS
HASH Join
 HASH joins are the usual choice of the Oracle optimizer when the
memory is set up to accommodate them.
 In a HASH join, Oracle accesses one table (usually the smaller of the
joined results) and builds a hash table on the join key in memory.
Procedure Followed:
Oracle first builds a hash table on the join key of one table, then scans the
other table and probes the hash table for matching rows. When using an
ORDERED hint, the first table in the FROM clause is the table used to build
the hash table.
SELECT E.Ename, D.DeptNo FROM Employee E, Dept D WHERE D.DeptNo =
E.DeptNo
JOIN OPERATIONS
HASH Join

HASH joins can be effective when the lack of a useful index renders
NESTED LOOPS joins inefficient.
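The build-and-probe idea behind a hash join can be sketched in Python as follows. This is a simplified in-memory illustration, not Oracle's implementation; the function name and the sample Employee/Dept rows are hypothetical.

```python
def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Equijoin: hash the (smaller) build input, then probe with the other."""
    table = {}
    for b in build_rows:                 # build phase: one pass over the build input
        table.setdefault(b[build_key], []).append(b)
    result = []
    for p in probe_rows:                 # probe phase: one pass over the probe input
        for b in table.get(p[probe_key], []):
            result.append((p, b))
    return result


dept = [{"DeptNo": 10, "Dname": "Sales"}, {"DeptNo": 20, "Dname": "Research"}]
employee = [{"Ename": "Asha", "DeptNo": 10}, {"Ename": "Ben", "DeptNo": 20},
            {"Ename": "Chitra", "DeptNo": 10}, {"Ename": "Dev", "DeptNo": 30}]

# Dept is the smaller input, so it is used to build the hash table.
rows = hash_join(dept, employee, "DeptNo", "DeptNo")
print([(e["Ename"], d["DeptNo"]) for e, d in rows])
```

Each input is read once and no index is needed, which matches the case described above where nested loops would be inefficient.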
AGGREGATE OPERATIONS

AVG, MIN, MAX, SUM, and COUNT.

SELECT AVG(S.age) FROM Sailors S

The basic algorithm for aggregate operators consists of scanning the
entire Sailors relation and maintaining some running information
about the scanned tuples.
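For AVG, the running information is a (total, count) pair. A minimal Python sketch of the scan, assuming Sailors is a list of dict rows with hypothetical sample ages:

```python
def scan_average(rows, column):
    """Compute AVG(column) in a single scan, keeping (total, count)."""
    total = 0.0
    count = 0
    for row in rows:                     # one pass over the relation
        total += row[column]
        count += 1
    return total / count if count else None   # AVG over no rows is NULL


sailors = [{"sid": 22, "age": 45.0}, {"sid": 31, "age": 55.0}, {"sid": 58, "age": 35.0}]
average_age = scan_average(sailors, "age")
print(average_age)
```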
SUMMARY

 JOIN OPERATIONS
 Simple Nested Loops
 Block Nested Loops Join
 Index Nested Loops Join
 Sort-Merge Join
 HASH Join
QUIZ

Which of the following is not a built-in aggregate function in SQL?
a) avg
b) max
c) total
d) count
ANSWER: c

We apply the aggregate function to a group of sets of tuples using the
_______ clause.
a) group by
b) group
c) group set
d) group attribute
ANSWER: a
LECTURE -13

OBJECTIVES

 JOIN OPERATIONS
 Simple Nested Loops
 Block Nested Loops Join
 Index Nested Loops Join
 Sort-Merge Join
 HASH Join
 Set Operation
 Union
 Intersection
 Set difference
SET OPERATION

 SQL set operations combine the results of two or more SQL SELECT
statements.
 The DBMS supports the corresponding relational set operators as well.
 The major relational set operators are union, intersection and set
difference.
UNION

 UNION combines the results of two queries into a single result table,
eliminating duplicate rows.

SELECT Student_Name FROM Art_Students
UNION
SELECT Student_Name FROM Dance_Students
INTERSECTION

 The intersection operator gives the data values common to the two data
sets that are intersected.

SELECT Student_Name FROM Art_Students
INTERSECT
SELECT Student_Name FROM Dance_Students
SET DIFFERENCE

 The set difference operator takes two sets and returns the values that are
in the first set but not in the second set. (MINUS is Oracle's keyword;
standard SQL uses EXCEPT.)

SELECT Student_Name FROM Art_Students
MINUS
SELECT Student_Name FROM Dance_Students
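The three set operators can be tried out with Python's built-in `sqlite3` module, as sketched below. Note the assumptions: SQLite spells set difference as EXCEPT rather than MINUS, and the student names and the helper `names` are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Art_Students (Student_Name TEXT);
    CREATE TABLE Dance_Students (Student_Name TEXT);
    INSERT INTO Art_Students VALUES ('Asha'), ('Ben'), ('Chitra');
    INSERT INTO Dance_Students VALUES ('Ben'), ('Chitra'), ('Dev');
""")


def names(set_operator):
    """Run the two SELECTs combined by the given set operator."""
    sql = (f"SELECT Student_Name FROM Art_Students {set_operator} "
           "SELECT Student_Name FROM Dance_Students")
    # Sort in Python because the result order is not guaranteed without ORDER BY.
    return sorted(row[0] for row in conn.execute(sql))


union_names = names("UNION")            # students in either class
common_names = names("INTERSECT")       # students in both classes
art_only_names = names("EXCEPT")        # Art students who are not Dance students
print(union_names, common_names, art_only_names)
```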
SUMMARY

 Set Operation
 Union
 intersection
 Set difference
QUIZ

The ___________ operation, denoted by −, allows us to find tuples that are in
one relation but are not in another.
a) Union
b) Set-difference
c) Difference
d) Intersection
ANSWER: b

In precedence of set operators, the expression is evaluated from
a) Left to left
b) Left to right
c) Right to left
d) From user specification
ANSWER: b
LECTURE -14

OBJECTIVES

 Set Operation
 Union
 Intersection
 Set difference
 Aggregate operations
 Aggregation based on SORTING
 Aggregation based on HASHING
AGGREGATE OPERATIONS

Aggregate Operation    Running Information
SUM                    Total of the values retrieved
AVG                    (Total, Count) of the values retrieved
COUNT                  Count of the values retrieved
MIN                    Smallest value retrieved
MAX                    Largest value retrieved
Aggregate operators can also be used in combination with a GROUP BY
clause
AGGREGATE OPERATIONS

For queries with grouping, there are two good evaluation algorithms that
do not rely on an existing index:

1) Sorting
2) Hashing
(Both algorithms are instances of the partitioning technique.)
AGGREGATE OPERATIONS

Aggregation based on SORTING:

Ex: SELECT AVG(S.age)
FROM Sailors S GROUP BY S.sid

1) Sort the relation on the grouping attribute.
2) Then scan it again to compute the result of the aggregate operation
for each group.
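The two steps can be sketched in Python as below. To make the groups non-trivial, the sample data groups on a hypothetical `rating` column rather than `sid`; the function name and rows are likewise assumptions for illustration.

```python
def sorted_group_average(rows, group_key, value_key):
    """GROUP BY via sorting: sort on the grouping attribute, then scan once."""
    ordered = sorted(rows, key=lambda row: row[group_key])   # step 1: sort
    results = []
    current = None
    total = 0.0
    count = 0
    for row in ordered:                                      # step 2: scan
        if count and row[group_key] != current:
            # Group boundary: emit the finished group and reset running info.
            results.append((current, total / count))
            total, count = 0.0, 0
        current = row[group_key]
        total += row[value_key]
        count += 1
    if count:
        results.append((current, total / count))             # emit the last group
    return results


sailors = [{"rating": 7, "age": 45.0}, {"rating": 8, "age": 55.0},
           {"rating": 7, "age": 35.0}]
by_rating = sorted_group_average(sailors, "rating", "age")
print(by_rating)
```

Because the input is sorted, all tuples of a group are adjacent, so only one group's running information is held at a time.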
AGGREGATE OPERATIONS

Aggregation based on HASHING:

1) Build a hash table (in main memory, if possible) on the grouping attribute.
2) The entries have the form (grouping-value, running-info).
3) The running information depends on the aggregate operation.
4) Scan the relation: for each tuple, probe the hash table to find the group
to which the tuple belongs and update its running information.
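A minimal Python sketch of these steps, using a dict as the in-memory hash table and (total, count) as the running information for AVG. The function name, the `rating` grouping column, and the sample rows are hypothetical.

```python
def hash_group_average(rows, group_key, value_key):
    """GROUP BY via hashing: one scan, updating per-group running info."""
    table = {}                           # grouping-value -> [total, count]
    for row in rows:
        info = table.setdefault(row[group_key], [0.0, 0])   # probe or create entry
        info[0] += row[value_key]
        info[1] += 1
    return {group: total / count for group, (total, count) in table.items()}


sailors = [{"rating": 7, "age": 45.0}, {"rating": 8, "age": 55.0},
           {"rating": 7, "age": 35.0}]
averages = hash_group_average(sailors, "rating", "age")
print(averages)
```

Unlike the sorting approach, the input is scanned in its stored order, but the table must hold running information for every group at once.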
SUMMARY

 Aggregate operations
 Aggregation based on SORTING
 Aggregation based on HASHING
QUIZ

_____________ can help us detect poor E-R design.
a) Database Design Process
b) E-R Design Process
c) Relational scheme
d) Functional dependencies
ANSWER: d

In which of the following does each related entity set have its own schema,
with an additional schema for the relationship set?
a) A many-to-many relationship set
b) A multivalued attribute of an entity set
c) A one-to-many relationship set
d) All of the mentioned
ANSWER: a
LECTURE -15

OBJECTIVES

 Aggregate operations
 Aggregation based on SORTING
 Aggregation based on HASHING
 The impact of buffering
 Important
 Example
THE IMPACT OF BUFFERING

 In implementations of relational operators, effective use of the buffer
pool is very important.

 We explicitly considered the size of the buffer pool in determining
algorithm parameters for several of the algorithms.
THE IMPACT OF BUFFERING

There are three main points to note:


 If several operations execute concurrently, they share the buffer pool.

 If an operation has a pattern of repeated page accesses, we can increase
the likelihood of finding a page in memory by a good choice of
replacement policy or by reserving a sufficient number of buffers for the
operation.

 If tuples are accessed using an index, especially an unclustered index,
each tuple retrieved is likely to require us to bring in a new page;
therefore, the buffer pool fills up quickly, leading to a high level of
paging activity.
THE IMPACT OF BUFFERING

Ex: Consider a simple nested loops join:
For each tuple of the outer relation, we repeatedly scan all pages in the inner
relation.
Soln: If we have enough buffer pages to hold the entire inner relation, the
replacement policy is irrelevant.

Otherwise, the replacement policy becomes critical.
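Why the policy matters can be seen in a small simulation. The sketch below counts page reads for repeated sequential scans of the inner relation under LRU and MRU replacement. It is a simplified model under stated assumptions: only inner-relation pages compete for the buffers, and the function name and page counts are hypothetical.

```python
from collections import OrderedDict


def count_page_reads(num_inner_pages, num_buffers, num_scans, policy):
    """Count disk reads for repeated sequential scans under 'LRU' or 'MRU'."""
    pool = OrderedDict()                 # page ids, least recently used first
    reads = 0
    for _ in range(num_scans):
        for page in range(num_inner_pages):
            if page in pool:
                pool.move_to_end(page)   # hit: mark as most recently used
                continue
            reads += 1                   # miss: the page must be read from disk
            if len(pool) >= num_buffers:
                # MRU evicts the newest entry, LRU evicts the oldest.
                pool.popitem(last=(policy == "MRU"))
            pool[page] = None
    return reads


# Inner relation of 5 pages, scanned 3 times.
fits = count_page_reads(5, 5, 3, "LRU")
lru_reads = count_page_reads(5, 4, 3, "LRU")
mru_reads = count_page_reads(5, 4, 3, "MRU")
print(fits, lru_reads, mru_reads)
```

When the inner relation fits, each page is read once regardless of policy. With one buffer too few, LRU always evicts the page that will be needed soonest, so every access becomes a read, while MRU keeps most of the relation resident.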


SUMMARY

 The impact of buffering
 Important
 Example
QUIZ

In order to reduce the overhead in retrieving the records from the storage
space we use
a) Logs buffer
b) Log buffer
c) Medieval space
d) Lower records
ANSWER: b

The order of log records in the stable storage ____________ as the order in
which they were written to the log buffer.
a) Must be exactly the same
b) Can be different
c) Is opposite
d) Can be partially same
ANSWER: a
