DBMS Unit 4
The cost of a query is the time it takes to hit the database and return the result. It covers the whole of query processing: the time taken to parse and translate the query, optimize it, evaluate it, execute it, and return the result to the user. Although this is usually a fraction of a second, it is made up of several sub-tasks, each with its own cost. Executing the optimized query involves hitting primary and secondary memory according to the file organization method; depending on the file organization and the indexes used, the time taken to retrieve the data may vary.
The majority of a query's time is spent accessing data in memory. Several factors determine this access time: disk I/O time, CPU time, network access time, etc. Disk access time is the time taken to search for and find a record in secondary memory and return the result. It dominates query processing time, so the other costs can usually be ignored in comparison with disk I/O time.
When calculating disk I/O time, usually only two factors are considered: seek time and transfer time. Seek time is the time taken to locate a single record in disk memory and is represented by tS. For example, to find the student ID of a student named 'John', the system fetches from memory based on the index and the file organization method; the time taken to reach the disk block and search for his ID is the seek time. The time taken by the disk to return the fetched result to the processor / user is the transfer time and is represented by tT.
Suppose a query needs to seek S times to fetch the records, and B blocks have to be returned to the user. Then the disk I/O cost is calculated as:
(S * tS) + (B * tT)
That is, it is the sum of the total time for S seeks and the total time to transfer B blocks. Other costs such as CPU cost and RAM cost are ignored here because they are comparatively small, so disk I/O alone is taken as the cost of the query. We also have to consider the worst-case cost: the maximum time the query can take, for example when the buffers are full or no buffers are available. The memory space / buffers available depend on the number of queries executing in parallel; since all queries share the buffers, the number of buffers / blocks available to our query is unpredictable, and the processor may have to wait until it gets all the memory blocks it needs.
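The formula above can be sketched directly in code. This is a minimal illustration; the seek and transfer times used here are made-up example values, not figures from the text.

```python
# Disk I/O cost = (S * tS) + (B * tT), as defined above.
# t_seek_ms and t_transfer_ms are hypothetical per-operation times in ms;
# real values depend entirely on the disk hardware.
def disk_io_cost(seeks, blocks, t_seek_ms=4.0, t_transfer_ms=0.1):
    return seeks * t_seek_ms + blocks * t_transfer_ms

# A query that seeks 10 times and transfers 100 blocks:
print(disk_io_cost(10, 100))  # 50.0 (ms)
```

Note that the seek term usually dominates, which is why reducing the number of seeks (via indexes and file organization) matters more than reducing the number of blocks transferred.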
Query Processing in DBMS
Query processing is the activity of extracting data from the database. It takes several steps to fetch the data. The steps involved are:
1. Parsing and Translation
2. Optimization
3. Evaluation
Query processing includes several activities for data retrieval. The user writes queries in a high-level database language such as SQL; these are translated into expressions that can be used at the physical level of the file system. After this, the actual evaluation of the queries and a variety of query-optimizing transformations take place. Before processing a query, the system must therefore translate it from its human-readable form into an internal representation. SQL, or Structured Query Language, is the best choice for humans, but it is not well suited as the system's internal representation of the query; relational algebra is well suited for that internal representation.
The translation step in query processing is handled by the parser. When a user executes a query, the parser checks the syntax of the query, verifies the names of the relations in the database, the tuples, and finally the required attribute values, in order to generate the internal form of the query. The parser creates a tree of the query, known as a 'parse tree', which is then translated into relational algebra; in the process, every use of a view in the query is replaced by its definition. Thus, to make the system understand the user query, it is translated into relational algebra. After translating the given query, each relational algebra operation can be executed using different algorithms. This is how query processing begins.
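As a hypothetical illustration (the query and table below are invented for this example, not taken from the text), a user query such as `SELECT name FROM STUDENT WHERE age = 18` would be translated into the relational algebra expression:

```latex
\Pi_{\text{name}}\bigl(\sigma_{\text{age}=18}(\text{STUDENT})\bigr)
```

Here the selection σ filters rows by the predicate, and the projection Π keeps only the requested columns; this expression is the internal form the later steps work on.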
Evaluation:
In addition to translating the query into relational algebra, the translated expression must be annotated with instructions specifying how each operation is to be evaluated. After translating the user query, the system then executes a query evaluation plan.
o The annotations in the evaluation plan may refer to the algorithms to be used for
the particular index or the specific operations.
o Such relational algebra with annotations is referred to as Evaluation
Primitives. The evaluation primitives carry the instructions needed for the
evaluation of the operation.
o Thus, a query evaluation plan defines a sequence of primitive operations used for
evaluating a query. The query evaluation plan is also referred to as the query
execution plan.
o A query execution engine is responsible for generating the output of the given
query. It takes the query execution plan, executes it, and finally makes the
output for the user query.
Optimization:
o The cost of query evaluation can vary for different types of queries. The system is responsible for constructing the evaluation plan, so the user need not write the query in its most efficient form.
o For optimizing a query, the query optimizer should have an estimated cost
analysis of each operation. It is because the overall operation cost depends on
the memory allocations to several operations, execution costs, and so on.
Finally, after selecting an evaluation plan, the system evaluates the query and produces
the output of the query.
Materialization:
In this method, a query is broken into individual sub-queries, and their results are then used to build the final result. To be more specific, suppose there is a requirement to find the students who are studying in class 'DESIGN_01'.
Here we can observe two queries: one is to select the CLASS_ID of ‘DESIGN_01’ and another
is to select the student details of the CLASS_ID retrieved in the first query.
The DBMS does the same. It breaks the query into the two parts mentioned above; once broken, it evaluates the first query and stores its result in a temporary table in memory. This temporary table is then used to evaluate the second query.
This is an example of a two-level query in the materialization method. There can be any number of levels, with a correspondingly large number of temporary tables.
Although this method looks simple, its evaluation cost is always higher. It takes time to evaluate the query and write into the temporary table, then to read from that temporary table and run the next-level query, and so on. Hence the cost of evaluation in this method is:
Cost = cost of individual SELECTs + cost of writing into temporary tables
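The two-level materialization described above can be sketched with SQLite. The schema, table names, and data here are illustrative assumptions, not taken from the text; the point is that the inner query's result is explicitly written to a temporary table before the outer query reads it.

```python
import sqlite3

# Illustrative schema and data (assumed for this sketch).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE CLASS   (CLASS_ID INTEGER, CLASS_NAME TEXT);
    CREATE TABLE STUDENT (STD_ID INTEGER, STD_NAME TEXT, CLASS_ID INTEGER);
    INSERT INTO CLASS   VALUES (1, 'DESIGN_01'), (2, 'DESIGN_02');
    INSERT INTO STUDENT VALUES (10, 'John', 1), (11, 'Mia', 2);
""")

# Level 1: evaluate the inner query and materialize it into a temp table.
con.execute("""
    CREATE TEMP TABLE T1 AS
    SELECT CLASS_ID FROM CLASS WHERE CLASS_NAME = 'DESIGN_01'
""")

# Level 2: the outer query reads from the materialized temporary table.
rows = con.execute("""
    SELECT STD_NAME FROM STUDENT
    WHERE CLASS_ID IN (SELECT CLASS_ID FROM T1)
""").fetchall()
print(rows)  # [('John',)]
```

The extra `CREATE TEMP TABLE` write is exactly the overhead the cost formula above charges for.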
Pipelining:
In this method, the DBMS does not store records in temporary tables. Instead, each sub-query's result is passed directly to the next query for processing: the queries are processed one after another, each using the result of the previous query.
In the example above, the CLASS_ID of DESIGN_01 is passed to the STUDENT table to get the student details.
In this method there is no extra cost of writing into temporary tables; only the cost of evaluating the individual queries remains, hence it performs better than materialization.
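Python generators give a compact model of pipelining: each stage yields rows to the next stage directly, with no temporary table in between. The data and stage names are illustrative assumptions.

```python
# Illustrative data (assumed): (class_id, class_name) and (std_id, name, class_id).
CLASS   = [(1, 'DESIGN_01'), (2, 'DESIGN_02')]
STUDENT = [(10, 'John', 1), (11, 'Mia', 2)]

def select_class(name):
    # Lower-level stage: yields matching CLASS_IDs one at a time.
    for class_id, class_name in CLASS:
        if class_name == name:
            yield class_id

def select_students(class_ids):
    # Higher-level stage: consumes the upstream stage's rows directly.
    ids = set(class_ids)
    for std_id, std_name, class_id in STUDENT:
        if class_id in ids:
            yield std_name

result = list(select_students(select_class('DESIGN_01')))
print(result)  # ['John']
```

No intermediate result is ever written out; rows flow from one stage to the next, which is the cost saving pipelining offers over materialization.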
Alternatively, the results of the lower-level queries need not be passed to the higher level automatically. A result is passed up only when the higher level requests it; until then, the lower level retains the result value and its state, transferring it to the next level only on request.
In our example above, the CLASS_ID for DESIGN_01 is retrieved, but it is passed to the STUDENT query only when that query requests it. Once the request arrives, the value is passed on and the STUDENT query is processed.
In the other variant, the lower-level queries eagerly push their results to the higher-level queries without waiting for them to be requested. The lower-level query creates a buffer to store its results, and the higher-level query pulls results from that buffer for its use. If the buffer is full, the lower-level query waits for the higher-level query to empty it. These two variants are also known as PULL (demand-driven) and PUSH (producer-driven) pipelining respectively.
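The producer-driven (push) variant with a bounded buffer can be sketched with a thread and a blocking queue. The buffer size and row values are illustrative assumptions.

```python
import queue
import threading

# Bounded buffer between the two levels: the producer blocks when it is full.
buf = queue.Queue(maxsize=2)
SENTINEL = object()   # marks the end of the producer's output

def lower_level_query():
    # Eagerly pushes rows without waiting for requests from above.
    for row in [1, 2, 3, 4, 5]:
        buf.put(row)          # blocks while the buffer is full
    buf.put(SENTINEL)

threading.Thread(target=lower_level_query).start()

# The higher-level query pulls rows from the buffer at its own pace.
consumed = []
while (row := buf.get()) is not SENTINEL:
    consumed.append(row)
print(consumed)  # [1, 2, 3, 4, 5]
```

The `maxsize` on the queue is what makes the producer wait when the consumer falls behind, matching the "waits for the higher level query to empty it" behaviour described above.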
There are further variations of pipelining, such as linear and non-linear pipelining, left-deep trees, right-deep trees, etc.
1. Cost based Optimization (Physical)
We have seen so far how a query can be processed based on indexes and joins, and how queries can be transformed into relational expressions. The query optimizer uses these techniques to determine which process or expression to use for evaluating the query, based on the cost of the query. A query can take different paths based on indexes, constraints, sorting methods, etc. This method relies mainly on statistics such as record size, number of records, number of records per block, number of blocks, table size, whether the whole table fits in a block, the organization of tables, the uniqueness of column values, the size of columns, etc.
Suppose we have a series of tables joined in a query:
T1 ⋈ T2 ⋈ T3 ⋈ T4 ⋈ T5 ⋈ T6
This query can be evaluated in any order: we can start by taking any two tables and evaluate the joins from there. In general, the join can be performed in (2(n−1))! / (n−1)! ways. For example, with 5 tables involved in the join, there are 8! / 4! = 1680 combinations. When the query optimizer runs, however, it does not evaluate all of these. It uses dynamic programming, generating the cost of the join order for each combination of tables only once; the least cost for each table combination is then stored in the database and reused. That is, given a set of tables T = {T1, T2, T3, ..., Tn}, it generates and stores the least-cost combination for the tables.
• Dynamic Programming
As we learnt above, the least cost for joins over any combination of tables is generated here. These values are stored in the database, and when those tables appear in a query, the stored combination is selected for evaluating it.
While generating the costs, it follows the steps below:
Suppose we have a set of tables T = {T1, T2, T3, ..., Tn} in a database. The optimizer picks the first table and computes the cost of joining it with each of the remaining tables in T, choosing the best cost; it continues in the same way with the rest of the tables in T. In total it considers the 2^n − 1 non-empty subsets of T, selects the lowest cost for each, and stores it. When a query uses those tables, the stored costs are looked up and the best combination is used to evaluate the query. This is called dynamic programming.
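The subset-by-subset procedure above can be sketched as follows. This is a toy, not a real optimizer: the cost model (joining two inputs costs the product of their sizes, and the result is assumed to be 10% of that product) and the table cardinalities are deliberate simplifying assumptions.

```python
from itertools import combinations

# Illustrative cardinalities for three tables (assumed values).
card = {'T1': 100, 'T2': 10, 'T3': 1000}

def best_join_order(tables):
    # best[subset] = (cheapest cost, estimated result size, plan as text)
    best = {frozenset([t]): (0, card[t], t) for t in tables}
    for k in range(2, len(tables) + 1):
        for subset in map(frozenset, combinations(tables, k)):
            # Try every way of splitting the subset into two smaller plans.
            for split in range(1, k):
                for left in map(frozenset, combinations(subset, split)):
                    right = subset - left
                    lc, ls, lp = best[left]
                    rc, rs, rp = best[right]
                    cost = lc + rc + ls * rs   # toy join-cost model
                    if subset not in best or cost < best[subset][0]:
                        best[subset] = (cost, ls * rs * 0.1, f'({lp} JOIN {rp})')
    return best[frozenset(tables)]

cost, size, plan = best_join_order(['T1', 'T2', 'T3'])
print(cost, plan)  # cheapest plan joins the two small tables first
```

Each subset's best cost is computed once and stored in `best`, exactly the memoization the text describes.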
In this method, the time required to find the optimized query is on the order of 3^n, where n is the number of tables. Suppose we have 5 tables; then the time required is 3^5 = 243, which is less than evaluating all the combinations of tables and then deciding the best one (1680). The space required for computing and storing the costs is on the order of 2^n; in the above example, it is 2^5 = 32.
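The figures quoted above can be checked directly:

```python
from math import factorial

n = 5
# All possible join orders for n tables: (2(n-1))! / (n-1)!
all_orders = factorial(2 * (n - 1)) // factorial(n - 1)
dp_time  = 3 ** n   # dynamic-programming time, order 3^n
dp_space = 2 ** n   # dynamic-programming space, order 2^n
print(all_orders, dp_time, dp_space)  # 1680 243 32
```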
• Left Deep Trees
This is another method of determining the cost of joins. Here, the tables and joins are represented as a tree: a join always forms the root of the (sub)tree, with a base table kept on its right-hand side, while the left-hand side points to the next join. The tree therefore grows deeper and deeper on the left, hence the name left-deep tree.
Here, instead of calculating the best join cost for every set of tables, the best cost of joining each table onto the current result is calculated. In this method, the time required to find the optimized query is on the order of n · 2^n, where n is the number of tables. Suppose we have 5 tables; then the time required is 5 · 2^5 = 160, which is less than with full dynamic programming. The space required for computing and storing the costs is again on the order of 2^n; in the above example, it is 2^5 = 32, the same as dynamic programming.
• Interesting Sort Orders
This method is an enhancement of dynamic programming. While calculating the best join-order costs, it also considers sorted intermediate results, on the assumption that computing join orders on sorted tables is more efficient. That is, suppose we have unsorted tables T1, T2, T3, ..., Tn and a join over them:
(T1 ⋈ T2) ⋈ T3 ⋈ ... ⋈ Tn
This method uses a hash join or a merge join to calculate the cost. A hash join simply joins the tables; a merge join produces sorted output but is costlier than a hash join. Even though the merge join is costlier at this stage, when the result moves on to the join with the third table, that join needs less effort to sort its inputs, because its first input is already the sorted result of the first two tables. Hence the merge join can reduce the total cost of the query.
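A minimal sort-merge join sketch shows why this works. The data is illustrative, and for brevity the sketch assumes both inputs are already sorted by the join key and that keys are unique on each side.

```python
# Sort-merge join of two lists of (key, value) rows, both sorted by key.
def merge_join(left, right):
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, lv = left[i]
        rk, rv = right[j]
        if lk == rk:
            out.append((lk, lv, rv))
            i += 1
            j += 1
        elif lk < rk:
            i += 1
        else:
            j += 1
    return out

T1 = [(1, 'a'), (2, 'b'), (4, 'c')]
T2 = [(2, 'x'), (3, 'y'), (4, 'z')]
print(merge_join(T1, T2))  # [(2, 'b', 'x'), (4, 'c', 'z')]
```

The output stays sorted on the join key, which is precisely the "interesting order" a subsequent merge join can exploit without re-sorting.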
But when the number of tables involved in the join is relatively small, this cost/space difference is hardly noticeable.
All these cost-based optimizations are expensive and are suitable for large volumes of data. There is another method of optimization, called heuristic optimization, which is cheaper in comparison with cost-based optimization.
2. Heuristic Optimization (Logical)
This method is also known as rule-based optimization. It is based on the equivalence rules for relational expressions, which reduce the number of query combinations to consider and hence the cost of the query.
This method creates a relational tree for the given query based on the equivalence rules. By providing alternative ways of writing and evaluating the query, these rules point to a better path for evaluating it. The rules need not hold in all cases, so the result must be examined after applying them. The most important rules followed in this method are listed below:
• Perform all selection operations as early as possible in the query. This should be the first set of actions on the tables in the query: by performing selections early, we reduce the number of records involved in the query rather than carrying whole tables through it.
Suppose we have a query to retrieve the students with age 18 and studying in class DESIGN_01.
We can get all the student details from STUDENT table, and class details from CLASS table.
We can write this query in two different ways.
Both queries return the same result, but on closer inspection the first query joins the two tables first and only then applies the filters; it traverses whole tables to perform the join, so more records are involved. The second query applies the filters on each table first, which reduces the number of records from each table (in the CLASS table, it reduces to a single record in this case!), and then joins these intermediate tables. Hence the cost in the second case is comparatively lower.
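The difference can be made concrete by counting rows touched under each plan. The schema, data, and predicates here are illustrative assumptions standing in for the STUDENT/CLASS example above.

```python
# Illustrative rows (assumed): (name, age, class_id) and (class_id, class_name).
STUDENT = [('John', 18, 1), ('Mia', 19, 1), ('Raj', 18, 2)]
CLASS   = [(1, 'DESIGN_01'), (2, 'DESIGN_02')]

# Plan 1: join first, filter later. A naive join examines every row pair.
joined = [(s, c) for s in STUDENT for c in CLASS if s[2] == c[0]]
plan1_rows = len(STUDENT) * len(CLASS)
final1 = [(s, c) for s, c in joined if s[1] == 18 and c[1] == 'DESIGN_01']

# Plan 2: filter each table first, then join the small intermediates.
s_small = [s for s in STUDENT if s[1] == 18]
c_small = [c for c in CLASS if c[1] == 'DESIGN_01']
plan2_rows = len(s_small) * len(c_small)
final2 = [(s, c) for s in s_small for c in c_small if s[2] == c[0]]

print(plan1_rows, plan2_rows)  # 6 2
print(final1 == final2)        # True: same answer, fewer rows touched
```

Even on three rows the pushed-down plan examines a third of the pairs; on real tables the gap grows with table size.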
Rather than rewriting the query text, the optimizer builds the relational algebra expression and tree for the case above.
• Perform all projections as early as possible in the query. This is similar to early selection, but reduces the number of columns rather than the number of records.
Suppose, for example, we have to select only the student name, address, and class name of students aged 18 from the STUDENT and CLASS tables.
Here again, both queries look alike and return the same results, but when we compare the number of records and attributes involved at each stage, the second query handles fewer of both and is therefore more efficient.
• The next step is to perform the most restrictive joins and selections first. 'Most restrictive' means choosing the tables and views whose join or selection results in comparatively few records. Any query performs better when tables with few records are joined first. Throughout the heuristic method of optimization, the rules aim to keep the number of records small at each stage so that query performance is better, and the same applies here.
Suppose we have STUDENT, CLASS and TEACHER tables. Any student can attend only one
class in an academic year and only one teacher takes a class. But a class can have more than 50
students. Now we have to retrieve STUDENT_NAME, ADDRESS, AGE, CLASS_NAME and
TEACHER_NAME of each student in a school.
∏ STD_NAME, ADDRESS, AGE, CLASS_NAME, TEACHER_NAME ((STUDENT ⋈_CLASS_ID CLASS) ⋈_TECH_ID TEACHER) — not so efficient
∏ STD_NAME, ADDRESS, AGE, CLASS_NAME, TEACHER_NAME (STUDENT ⋈_CLASS_ID (CLASS ⋈_TECH_ID TEACHER)) — efficient
The first expression joins STUDENT, the largest table, first, selecting the student records of every class. This produces a very large intermediate table, which is then joined with a small one, so many records are traversed. In the second expression, CLASS and TEACHER are joined first; since they are in a one-to-one relationship here, the intermediate result is small, and joining it with STUDENT gives the final result. Hence the second expression is more efficient.
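The intermediate sizes for the two join orders can be checked on toy data. The cardinalities are illustrative assumptions matching the ratios described above: 10 classes, one teacher per class, 50 students per class.

```python
# Illustrative rows (assumed):
CLASS   = [(c, f'CLASS_{c}', c) for c in range(10)]       # (class_id, name, teacher_id)
TEACHER = [(t, f'TEACHER_{t}') for t in range(10)]        # (teacher_id, name)
STUDENT = [(s, f'STD_{s}', s % 10) for s in range(500)]   # (std_id, name, class_id)

# Order 1: (STUDENT JOIN CLASS) first -- the intermediate is large.
sc = [(s, c) for s in STUDENT for c in CLASS if s[2] == c[0]]
print(len(sc))   # 500 rows feed into the second join

# Order 2: (CLASS JOIN TEACHER) first -- the intermediate is tiny.
ct = [(c, t) for c in CLASS for t in TEACHER if c[2] == t[0]]
print(len(ct))   # 10 rows feed into the second join
```

Starting from the 10-row intermediate instead of the 500-row one is exactly the saving the "most restrictive join first" rule buys.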
• Sometimes we can combine above heuristic steps with cost based optimization technique
to get better results.
None of these rules holds in every case; the outcome also depends on table size, column size, the type of selection, projection, join, sort, constraints, indexes, statistics, etc. The rules above describe generally good ways of optimizing queries.