Adbms Unit 2
Advanced DBMS
Course code: M21DES212
Agenda
Prerequisites
1. How the catalog stores data
2. Database relationships
3. Memory management in the OS
4. SQL
Here we study how queries are evaluated in a relational DBMS.
How does a DBMS describe the data that it manages, including tables and
indexes? (the way of storing or representing data)
The descriptive data, or metadata, is stored in special tables called the
system catalogs and is used to find the best way to evaluate a query.
QUERY
At a minimum
A few simple techniques are used to develop algorithms for each operator:
2) Iteration: Examine all tuples in an input table, one after the other.
If we need only a few fields and there is an index whose key contains all
these fields, instead of examining data tuples, we can scan all index data
entries.
OBJECTIVES
Query Evaluation
System catalogs
Information stored in the catalog
Techniques used to develop algorithms for each operator
Evaluation algorithms
Select
Projection
Join
ALGORITHMS FOR RELATIONAL
OPERATIONS
• If index is un-clustered cost is 10,000 I/O because we have to read all pages.
Rule of Thumb:
PROJECTION
• Select specific column
• The expensive part is ensuring that no duplicates appear in the result
2) Eliminate duplicates:
To eliminate duplicates we have to use partitioning.
Partitioning steps:
1) Scan R to obtain the (sid, bid) pairs
2) Sort these pairs
3) Scan the sorted pairs and discard duplicates.
Note: if a suitable index is available, the entire process takes much less time
ALGORITHMS FOR RELATIONAL
OPERATIONS
JOIN
Joins are expensive and very commonly used operations
JOIN
Evaluation algorithms
Select
Projection
Join
LECTURE -3 & 4
OBJECTIVES
Evaluation plan
When the data to be sorted is too large to fit into available main
memory we need external sorting
Query parsing
Optimization & Execution
Query Evaluation Plans
Dbms sort data
external sorting
QUIZ
OBJECTIVES
Query parsing
Optimization & Execution
Query Evaluation Plans
DBMS sort data
External sorting
Merge sort
Two-way merge sort
Methodology of two-way merge sort
MERGE SORT
A SIMPLE TWO WAY MERGE SORT
In practice, many more pages of memory are available, and we want our
sorting algorithm to use the additional memory effectively
When sorting a file, several sorted sub files are typically generated in
intermediate steps
we refer to each sorted sub file as a run.
A SIMPLE TWO WAY MERGE SORT
• if the entire file does not fit into the available main memory, we can sort it by
breaking it into smaller subfiles, sorting these subfiles, and then merging them
using a minimal amount of main memory at any given time.
[Figure: a file divided into subfiles SF-1, SF-2, SF-3; subfile SF-1 is brought into memory to be sorted]
In each pass, we read every page in the file, process it, and write it
out. Therefore we have two disk I/Os per page, per pass. The
number of passes is $\lceil \log_2 N \rceil + 1$, where N is the number of pages in
the file. The overall cost is $2N(\lceil \log_2 N \rceil + 1)$ I/Os.
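As a quick check of the cost formula, here is a minimal Python sketch. The function name `two_way_sort_cost` is an illustrative assumption, not something from the slides; it only evaluates the formula and does not sort anything.

```python
def two_way_sort_cost(num_pages):
    """Return (passes, total_io) for a two-way external merge sort.

    passes   = ceil(log2(N)) + 1
    total_io = 2 * N * passes   (each pass reads and writes every page)
    """
    if num_pages < 1:
        raise ValueError("the file must contain at least one page")
    # (N - 1).bit_length() equals ceil(log2(N)) for N >= 1, using integers only.
    merge_passes = (num_pages - 1).bit_length()
    passes = merge_passes + 1
    total_io = 2 * num_pages * passes
    return passes, total_io


# A 7-page file needs 4 passes and 2 * 7 * 4 = 56 I/Os.
passes, total_io = two_way_sort_cost(7)
```

Integer arithmetic is used for the logarithm so that exact powers of two are not affected by floating-point rounding.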
A SIMPLE TWO WAY MERGE SORT
[Figure: Two-Way External Merge Sort of a seven-page input file, using input and OUTPUT buffer pages]
Input file: 3,4 | 6,2 | 9,4 | 8,7 | 5,6 | 3,1 | 2
PASS 0 (1-page runs): 3,4 | 2,6 | 4,9 | 7,8 | 5,6 | 1,3 | 2
Final pass (8-page run): 1,2 | 2,3 | 3,4 | 4,5 | 6,6 | 7,8 | 9
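The passes of a two-way merge sort can be sketched in Python as follows. This is a simplified in-memory illustration, not a disk-based implementation: Python lists stand in for pages, the name `two_way_merge_sort` is hypothetical, and the seven-page input is one reading of the input file shown in the figure.

```python
from heapq import merge

PAGE_SIZE = 2  # records per page (assumption: two records per page, as in the figure)


def two_way_merge_sort(pages):
    """Sort a list of pages; return (sorted_pages, number_of_passes)."""
    # Pass 0: sort each page on its own, producing 1-page runs.
    runs = [sorted(page) for page in pages]
    passes = 1
    # Later passes: merge runs two at a time until one run remains.
    while len(runs) > 1:
        merged_runs = []
        for i in range(0, len(runs), 2):
            pair = runs[i:i + 2]  # the last "pair" may hold a single run
            merged_runs.append(list(merge(*pair)))
        runs = merged_runs
        passes += 1
    records = runs[0] if runs else []
    # Split the final run back into pages for display.
    sorted_pages = [records[i:i + PAGE_SIZE]
                    for i in range(0, len(records), PAGE_SIZE)]
    return sorted_pages, passes


input_file = [[3, 4], [6, 2], [9, 4], [8, 7], [5, 6], [3, 1], [2]]
sorted_pages, passes = two_way_merge_sort(input_file)
```

For this input the run counts go 7, 4, 2, 1, which is four passes in total and agrees with the formula for N = 7.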
Merge sort
Two way merge sort
Methodology of two way merge sort
Two-Way merge Sort
QUIZ
OBJECTIVES
Merge sort
General External Merge Sort
[Figure: general external merge sort. B main memory buffers sit between disk and disk: input buffers INPUT 1, INPUT 2, ..., INPUT B-1 and one OUTPUT buffer]
GENERAL EXTERNAL MERGE SORT
Pass 0 produces $\lceil N/B \rceil$ sorted runs of B pages each.
COST OF EXTERNAL MERGE
E.g., with 5 buffer pages, to sort a 108-page file:
1. Pass 0: $\lceil 108/5 \rceil$ = 22 sorted runs of 5 pages each (last run is
only 3 pages)
2. Pass 1: $\lceil 22/4 \rceil$ = 6 sorted runs of 20 pages each (last run is
only 8 pages)
3. Pass 2: 2 sorted runs, 80 pages and 28 pages
4. Pass 3: Sorted file of 108 pages
In each pass we read and write 108 pages; thus the total cost is 2 * 108 * 4 =
864 I/Os. Applying our formula, we have $N_1 = \lceil 108/5 \rceil = 22$ and cost
$2N(\lceil \log_{B-1} N_1 \rceil + 1) = 2 \cdot 108 \cdot (\lceil \log_4 22 \rceil + 1) = 864$, as expected.
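The pass-by-pass arithmetic above can be reproduced with a short Python sketch. The function name `external_sort_plan` is an illustrative assumption; it only counts runs and I/Os and does not perform any sorting.

```python
def external_sort_plan(num_pages, num_buffers):
    """Return (runs_after_each_pass, total_io) for a general external merge sort.

    Pass 0 sorts B pages at a time, giving ceil(N / B) runs.
    Each later pass merges B - 1 runs at a time.
    """
    if num_pages < 1 or num_buffers < 3:
        raise ValueError("need at least one page and at least three buffers")
    runs = -(-num_pages // num_buffers)        # ceil(N / B) using integers
    runs_after_each_pass = [runs]
    while runs > 1:
        runs = -(-runs // (num_buffers - 1))   # ceil(runs / (B - 1))
        runs_after_each_pass.append(runs)
    # Every pass reads and writes each page once: 2 * N I/Os per pass.
    total_io = 2 * num_pages * len(runs_after_each_pass)
    return runs_after_each_pass, total_io


# The example from the slide: 108 pages, 5 buffer pages.
runs_per_pass, total_io = external_sort_plan(108, 5)
```

For 108 pages and 5 buffers, the run counts are 22, 6, 2 and 1, so there are four passes and 864 I/Os.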
GENERAL EXTERNAL MERGE SORT
How many join types are there in a join condition?
a) 2
b) 3
c) 4
d) 5
Answer: d

Which join includes in the result set the records from the right table that have no matching key in the left table?
a) Left outer join
b) Right outer join
c) Full outer join
d) Half outer join
Answer: b
LECTURE -7
OBJECTIVES
Selection operation
What are the alternative algorithms for selection?
1. File scan
2. Sort the data and apply binary search
3. B+ tree
4. Hashing
EVALUATING RELATIONAL
OPERATORS
Selection operation
Which alternative algorithms are best for the select operation under
different conditions?
Conditions:
1) No index, unsorted data
Uses a file scan
2) No index, sorted data
Uses binary search to locate the first matching element, then a file scan
from that element until the condition no longer matches
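The sorted-data case can be sketched in Python as below. This is a minimal illustration that assumes the records are already sorted on the selection attribute; the function names and the sample Reserves tuples are hypothetical.

```python
def first_position(records, value, key):
    """Binary search: index of the first record whose key is >= value."""
    lo, hi = 0, len(records)
    while lo < hi:
        mid = (lo + hi) // 2
        if key(records[mid]) < value:
            lo = mid + 1
        else:
            hi = mid
    return lo


def select_equal(records, value, key):
    """Equality selection on records sorted by key."""
    i = first_position(records, value, key)
    result = []
    # Scan forward from the first match until the condition stops holding.
    while i < len(records) and key(records[i]) == value:
        result.append(records[i])
        i += 1
    return result


# Hypothetical (sid, bid, rname) tuples, sorted by rname.
reserves = [(22, 101, "Dustin"), (58, 103, "Joe"), (31, 102, "Joe"), (64, 104, "Lubber")]
joes = select_equal(reserves, "Joe", key=lambda r: r[2])
```

Only the matching tuples and the probes of the binary search are touched, in contrast to the file scan used for unsorted data.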
EVALUATING RELATIONAL
OPERATORS
[Figure: scanning rname values against 'Joe'; 'Jre' does not match, 'Joe' gives MATCH FOUND]
EVALUATING RELATIONAL
OPERATORS
Selection operation
Problems with scanning:
Scanning takes 1000 I/Os if Reserves contains 1000 pages.
This is expensive if only a few tuples have rname = 'Joe'.
What is the solution?
Use an index if a suitable one is available:
A B+ tree index on rname is useful
But a B+ tree index on bid is not useful
SUMMARY
Selection Operation
No index, Unsorted Data
No index, Sorted Data
B+ Tree index available for equality selection
Hash Index used for Equality selection
Problems with scanning
QUIZ
OBJECTIVES
[Figure: table of unsorted values]
What is the solution to this problem, as there is no index and the data is not sorted?
Solution: scan the entire relation
UNDERSTANDING VARIOUS
ALGORITHMS USED TO EVALUATE
SELECTION OPERATION
2) No Index, sorted Data
For selection conditions $\sigma_{R.attr\ op\ value}(R)$ in which op is not equality, the best strategy is to use the index.
This strategy is also a good access path for equality selections, although a hash
index on R.attr would be a little better.
UNDERSTANDING VARIOUS
ALGORITHMS USED TO EVALUATE
SELECTION OPERATION
3) Selection using B+ tree
Procedure for selection: search the index to find the entry that points to the
record, then scan the records.
OBJECTIVES
We can retrieve tuples using a file scan or a single index that matches some
conjuncts (and which we estimate to be the most selective access path) and
apply all nonprimary conjuncts in the selection to each retrieved tuple.
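A minimal Python sketch of this strategy follows. It assumes a condition of the form rname = 'Joe' AND bid = 5 AND sid = 3, with a hash index on sid chosen as the most selective access path; the dictionary-based index, the function names and the sample rows are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical Reserves rows.
reserves = [
    {"sid": 3, "bid": 5, "rname": "Joe"},
    {"sid": 3, "bid": 7, "rname": "Joe"},
    {"sid": 4, "bid": 5, "rname": "Joe"},
    {"sid": 3, "bid": 5, "rname": "Ann"},
]


def build_hash_index(rows, attr):
    """Stand-in for a hash index: maps an attribute value to its rows."""
    index = defaultdict(list)
    for row in rows:
        index[row[attr]].append(row)
    return index


def select_with_index(index, index_value, remaining_conjuncts):
    """Fetch candidates through the index, then apply the other conjuncts."""
    candidates = index.get(index_value, [])
    return [row for row in candidates
            if all(row[attr] == value
                   for attr, value in remaining_conjuncts.items())]


sid_index = build_hash_index(reserves, "sid")
matches = select_with_index(sid_index, 3, {"bid": 5, "rname": "Joe"})
```

Only the rows reached through the index are tested against the remaining conjuncts, so the more selective the chosen index, the fewer tuples are retrieved.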
GENERAL SELECTION CONDITION IN
SELECT OPERATION
hash index on rname
hash index on sid
SUMMARY
The process of designating subgroupings within the entity set is called _______
a) Specialization
b) Division
c) Aggregation
d) Finalization
ANSWER: a

The similarities between the entity set can be expressed by which of the following features?
a) Specialization
b) Generalization
c) Uniquation
d) Inheritance
ANSWER: b
LECTURE -10
OBJECTIVES
The algorithm based on sorting has the following steps (at least
conceptually):
Scan R and produce a set of tuples that contain only desired attributes.
Sort this set of tuples
Scan the sorted result by comparing adjacent tuples, and discard
duplicates.
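The three steps can be sketched in Python as follows. This is an in-memory illustration only (a real DBMS would use external sorting for step 2); the function name `project_sorted` and the sample rows are assumptions.

```python
def project_sorted(rows, attrs):
    """Projection with duplicate elimination, based on sorting."""
    # Step 1: scan R and keep only the desired attributes.
    projected = [tuple(row[a] for a in attrs) for row in rows]
    # Step 2: sort the projected tuples.
    projected.sort()
    # Step 3: scan the sorted result, comparing adjacent tuples.
    result = []
    for t in projected:
        if not result or result[-1] != t:
            result.append(t)
    return result


# Hypothetical Reserves rows with attributes sid, bid, day.
reserves = [
    {"sid": 28, "bid": 103, "day": "12/4"},
    {"sid": 22, "bid": 101, "day": "10/10"},
    {"sid": 28, "bid": 103, "day": "11/3"},
    {"sid": 22, "bid": 101, "day": "10/11"},
]
pairs = project_sorted(reserves, ["sid", "bid"])
```

Because duplicates are adjacent after sorting, a single comparison with the previous tuple is enough to discard them.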
THE PROJECTION OPERATION
A useful side effect of sorting is that the output is sorted by default.
Sorting uses external sorting, and most database systems have a utility
function for external sorting, so projection is easy to implement.
Use of Indexes for Projections
Neither the hashing nor the sorting approach utilizes any existing
indexes.
An existing index is useful if its key includes all the attributes; in
that case we can do an index-only scan for the projection.
SUMMARY
OBJECTIVES
Use one page as an input buffer for scanning the inner S, one page as the
output buffer, and all remaining pages to hold a "block" of the outer R.
For each matching pair of tuples r in the R-block and s in the S-page, add <r, s> to the result.
Then read the next R-block, scan S again, and so on.
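A minimal Python sketch of block nested loops join is given below. Lists of pages stand in for the relations on disk, `block_size` plays the role of the buffer pages left for the outer relation, and all names and sample tuples are illustrative assumptions.

```python
def block_nested_loops_join(outer_pages, inner_pages, block_size, match):
    """Join two paged relations; return (result, number_of_inner_scans)."""
    result = []
    inner_scans = 0
    for start in range(0, len(outer_pages), block_size):
        # Load the next block of the outer relation R into memory.
        block = [r for page in outer_pages[start:start + block_size] for r in page]
        inner_scans += 1
        # Scan the inner relation S one page at a time.
        for s_page in inner_pages:
            for s in s_page:
                for r in block:
                    if match(r, s):
                        result.append((r, s))
    return result, inner_scans


# Hypothetical pages of (key, payload) tuples.
r_pages = [[(1, "a"), (2, "b")], [(3, "c"), (4, "d")], [(5, "e")]]
s_pages = [[(2, "x"), (3, "y")], [(5, "z"), (9, "w")]]
joined, scans = block_nested_loops_join(
    r_pages, s_pages, block_size=2, match=lambda r, s: r[0] == s[0])
```

With three outer pages and a block of two pages, the inner relation is scanned twice rather than once per outer page, which is the point of reading the outer relation in blocks.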
JOIN OPERATIONS
Index Nested Loops Join
If there is an index on the join column of one relation (say S), we can
make it the inner relation and exploit the index.
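The idea can be sketched in Python, with a dictionary standing in for the index on the join column of S. The function names and the sample tuples are hypothetical.

```python
from collections import defaultdict


def build_index(rows, key):
    """Stand-in for an index on the join column of the inner relation."""
    index = defaultdict(list)
    for row in rows:
        index[key(row)].append(row)
    return index


def index_nested_loops_join(outer_rows, inner_index, outer_key):
    """For each outer tuple, probe the index instead of scanning the inner relation."""
    result = []
    for r in outer_rows:
        for s in inner_index.get(outer_key(r), []):
            result.append((r, s))
    return result


# Hypothetical relations: R(sid, bid) is the outer, S(sid, sname) is the inner.
r_rows = [(22, 101), (31, 102), (22, 103), (99, 104)]
s_rows = [(22, "Dustin"), (31, "Lubber"), (58, "Rusty")]
s_index = build_index(s_rows, key=lambda s: s[0])
joined = index_nested_loops_join(r_rows, s_index, outer_key=lambda r: r[0])
```

Each outer tuple costs one index probe, so the inner relation is never scanned in full.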
JOIN OPERATIONS
Sort-Merge Join
Suppose two salespeople attend a conference and each collect over 100
business cards from potential new customers. They now each have a pile of
cards in random order, and they want to see how many cards are
duplicated in both piles.
The salespeople alphabetize their piles, and then they call off names one at
a time.
Because both piles of cards have been sorted, it becomes much easier to
find the names that appear in both piles
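The card analogy corresponds to the following Python sketch of a sort-merge join. It sorts both inputs in memory and assumes an equality join condition; the function name and the sample emp and dept tuples are illustrative assumptions.

```python
def sort_merge_join(r_rows, s_rows, r_key, s_key):
    """Equijoin of two relations by sorting both and merging."""
    r_sorted = sorted(r_rows, key=r_key)
    s_sorted = sorted(s_rows, key=s_key)
    result = []
    i = j = 0
    while i < len(r_sorted) and j < len(s_sorted):
        rk, sk = r_key(r_sorted[i]), s_key(s_sorted[j])
        if rk < sk:
            i += 1          # advance the side with the smaller key
        elif rk > sk:
            j += 1
        else:
            # Find the group of S tuples that share this key.
            j_end = j
            while j_end < len(s_sorted) and s_key(s_sorted[j_end]) == rk:
                j_end += 1
            # Pair every R tuple with this key against the whole S group.
            while i < len(r_sorted) and r_key(r_sorted[i]) == rk:
                for s in s_sorted[j:j_end]:
                    result.append((r_sorted[i], s))
                i += 1
            j = j_end
    return result


# Hypothetical emp(ename, deptno) and dept(deptno, dname) tuples.
emp = [("Smith", 20), ("Allen", 30), ("Ward", 30), ("King", 10)]
dept = [(10, "ACCOUNTING"), (20, "RESEARCH"), (30, "SALES"), (40, "OPERATIONS")]
joined = sort_merge_join(emp, dept, r_key=lambda e: e[1], s_key=lambda d: d[0])
```

Once both inputs are sorted, each is read once from front to back, just as the two salespeople call off names in alphabetical order.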
JOIN OPERATIONS
Sort-Merge Join
Example:
SELECT /*+ ORDERED */ ename, dept.deptno
FROM emp, dept
WHERE dept.deptno = emp.deptno
JOIN OPERATIONS
Sort-Merge Join
When to use Sort merge Join?
SORT-MERGE joins can be used only for equijoins (WHERE D.deptno =
E.deptno, as opposed to WHERE D.deptno >= E.deptno)
Sort-merge joins may require temporary segments for sorting
(if SORT_AREA_SIZE or the automatic memory parameters like
MEMORY_TARGET are set too small). This can lead to extra memory utilization.
JOIN OPERATIONS
HASH Join
HASH joins are the usual choice of the Oracle optimizer when the
memory is set up to accommodate them.
In a HASH join, Oracle accesses one table (usually the smaller of the
joined results) and builds a hash table on the join key in memory.
Procedure Followed:
Oracle first builds a hash table to facilitate the operation and then loops
through the hash table. When using an ORDERED hint, the first table in
the FROM clause is the table used to build the hash table.
SELECT E.Ename, D.DeptNo
FROM Employee E, Dept D
WHERE D.DeptNo = E.DeptNo
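The build and probe phases of a hash join can be sketched in Python. This is a generic in-memory illustration, not a description of Oracle internals; the function name and the sample Employee and Dept tuples are assumptions.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Equijoin: build a hash table on one input, probe it with the other."""
    # Build phase: hash the (usually smaller) input on the join key.
    table = defaultdict(list)
    for b in build_rows:
        table[build_key(b)].append(b)
    # Probe phase: look up each tuple of the other input.
    result = []
    for p in probe_rows:
        for b in table.get(probe_key(p), []):
            result.append((p, b))
    return result


# Hypothetical Dept(DeptNo, DName) and Employee(Ename, DeptNo) tuples.
dept = [(10, "ACCOUNTING"), (20, "RESEARCH"), (30, "SALES")]
employee = [("Smith", 20), ("Allen", 30), ("King", 10), ("Ford", 50)]
joined = hash_join(dept, employee,
                   build_key=lambda d: d[0], probe_key=lambda e: e[1])
```

Dept is used as the build input here because it is the smaller table, so the hash table is more likely to fit in memory.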
JOIN OPERATIONS
HASH Join
HASH joins can be effective when the lack of a useful index renders
NESTED LOOPS joins inefficient.
AGGREGATE OPERATIONS
JOIN OPERATIONS
Simple Nested Loops
Block Nested Loops Join
Index Nested Loops Join
Sort-Merge Join
HASH Join
QUIZ
OBJECTIVES
Set Operation
Union
intersection
Set difference
QUIZ
OBJECTIVES
For queries with grouping, there are two good evaluation algorithms that
do not rely on an existing index:
1) based on sorting
2) Hashing
(Both algorithms are instances of the partitioning technique)
AGGREGATE OPERATIONS
1) build a hash table (in main memory, if possible) on the grouping attribute.
2) The entries have the form (grouping-value, running-info)
3) The running information depends on the aggregate operation
4) Scan the relation; for each tuple, probe the hash table to find the group
to which the tuple belongs, and update the running information
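The four steps can be sketched in Python for an AVG aggregate, where the running information is a (sum, count) pair. The function name and the sample rows are illustrative assumptions, and a Python dictionary stands in for the in-memory hash table.

```python
def hash_aggregate_avg(rows, group_attr, value_attr):
    """Compute AVG(value_attr) per group using a hash table."""
    # Entries have the form grouping-value -> running-info, here (sum, count).
    running = {}
    for row in rows:
        group = row[group_attr]
        # Probe the hash table for the tuple's group and update its running info.
        total, count = running.get(group, (0, 0))
        running[group] = (total + row[value_attr], count + 1)
    # Turn the running information into the final aggregate value.
    return {group: total / count for group, (total, count) in running.items()}


# Hypothetical Sailors rows grouped by rating.
sailors = [
    {"rating": 7, "age": 30},
    {"rating": 7, "age": 40},
    {"rating": 9, "age": 20},
]
avg_age = hash_aggregate_avg(sailors, "rating", "age")
```

For SUM or COUNT the running information would be a single number, and for MIN or MAX it would be the smallest or largest value seen so far.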
SUMMARY
Aggregate operations
Aggregation based on SORTING
Aggregation based on HASHING
QUIZ
_____________ can help us detect poor E-R design.
a) Database Design Process
b) E-R Design Process
c) Relational scheme
d) Functional dependencies
ANSWER: d

For which of the following does each related entity set have its own schema, with an additional schema for the relationship set?
a) A many-to-many relationship set
b) A multivalued attribute of an entity set
c) A one-to-many relationship set
d) All of the mentioned
ANSWER: a
LECTURE -15
OBJECTIVES
In order to reduce the overhead in retrieving the records from the storage space we use
a) Logs
b) Log buffer
c) Medieval space
d) Lower records
ANSWER: b

The order of log records in the stable storage ____________ as the order in which they were written to the log buffer.
a) Must be exactly the same
b) Can be different
c) Is opposite
d) Can be partially same
ANSWER: a