Advance Concept in Data Bases Unit-2 by Arun Pratap Singh

The document discusses query processing and optimization in databases. It describes the basic steps in query processing as parsing and translation, optimization, and evaluation. Query optimization aims to determine the most efficient execution plan by considering costs and selecting the lowest cost plan. The document outlines the stages of query optimization as parsing, decomposition, normalization, semantic analysis, simplification, reconstruction and cost-based plan selection.


PREPARED BY ARUN PRATAP SINGH MTECH 2nd SEMESTER

QUERY PROCESSING AND OPTIMIZATION INTRODUCTION :
A database management system manages a large volume of data, which can be retrieved by
submitting queries expressed in a high-level query language such as SQL.
Whenever a query is submitted to the database system, a number of activities are performed
to process that query. Query processing includes translation of high-level queries into
low-level expressions that can be used at the physical level of the file system, query optimization,
and actual execution of the query to get the result. Query optimization is a process in which
multiple query-execution plans for satisfying a query are examined and the most efficient query
plan is identified for execution.

Basic Steps in Query Processing :-
1. Parsing and translation
2. Optimization
3. Evaluation

Parsing and translation :

Check syntax and verify relations.
Translate the query into an equivalent relational algebra expression.

Optimization :

Generate an optimal evaluation plan (the one with the lowest cost) for the query.

Evaluation :

The query-execution engine takes an (optimal) evaluation plan, executes that plan, and returns
the answers to the query.

UNIT : II

QUERY OPTIMIZATION :

Query optimization is a function of many relational database management systems. The query
optimizer attempts to determine the most efficient way to execute a given query by considering the
possible query plans.
Generally, the query optimizer cannot be accessed directly by users: once queries are submitted to
the database server and parsed by the parser, they are passed to the query optimizer, where
optimization occurs. However, some database engines allow guiding the query optimizer with hints.

What is Query Optimization?
Suppose you were given a chance to visit 15 pre-selected cities in Europe. The
only constraint is time.
-> Would you visit the cities in just any order, or would you have a plan?
Plan:
-> Place the 15 cities in different groups based on their proximity to each other.
-> Start with one group and move on to the next group.
The important point here is that you would visit the cities in a more organized
manner, and the time constraint mentioned earlier would be dealt with efficiently.

Query Optimization works in a similar way:
There can be many different ways to get an answer to a given query. The result is
the same in all scenarios.

The DBMS strives to process the query in the most efficient way (in terms of time) to
produce the answer.

Cost = Time needed to get all answers

Starting with System R, most commercial DBMSs use cost-based optimizers.
The cost estimate should be accurate and easy to compute. It also needs to be
logically consistent, so that the plan estimated to have the least cost is consistently the cheapest one.


Steps in a Cost-based query optimization :

1. Parsing
2. Transformation
3. Implementation
4. Plan selection based on cost estimates

Query Flow :

Query Parser : Verifies the validity of the SQL statement and translates the query into an internal
structure using relational calculus.
Query Optimizer : Finds the best expression among the various equivalent algebraic
expressions. The criterion used is cheapness (lowest cost).
Code Generator/Interpreter : Makes calls to the query processor based on the
work done by the optimizer.
Query Processor : Executes the calls obtained from the code generator.

Cost-based query Optimization: Algebraic Expressions
Suppose we have the following query:
SELECT p.pname, d.dname
FROM Patients p, Doctors d
WHERE p.doctor = d.dname
AND d.dgender = 'M'
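
The idea that one query has several equivalent algebraic expressions can be sketched in Python. This is only an illustration under stated assumptions: the sample rows, the function names and the use of a naive nested-loop join are all hypothetical and are not taken from the notes. Both plans return the same answer, but they compare a different number of tuple pairs.

```python
# Hypothetical sample data; the column names follow the query in the text.
patients = [
    {"pname": "Asha", "doctor": "Rao"},
    {"pname": "Ben", "doctor": "Mehta"},
    {"pname": "Chen", "doctor": "Rao"},
    {"pname": "Dev", "doctor": "Iyer"},
]
doctors = [
    {"dname": "Rao", "dgender": "M"},
    {"dname": "Mehta", "dgender": "F"},
    {"dname": "Iyer", "dgender": "M"},
]


def plan_join_then_select(patients, doctors):
    """Join first, then apply the gender selection, then project."""
    comparisons = len(patients) * len(doctors)  # pairs examined by the nested loop
    joined = [(p, d) for p in patients for d in doctors
              if p["doctor"] == d["dname"]]
    selected = [(p, d) for p, d in joined if d["dgender"] == "M"]
    return [(p["pname"], d["dname"]) for p, d in selected], comparisons


def plan_select_then_join(patients, doctors):
    """Apply the gender selection first, then join the smaller input."""
    male_doctors = [d for d in doctors if d["dgender"] == "M"]
    comparisons = len(patients) * len(male_doctors)
    joined = [(p, d) for p in patients for d in male_doctors
              if p["doctor"] == d["dname"]]
    return [(p["pname"], d["dname"]) for p, d in joined], comparisons


result_a, work_a = plan_join_then_select(patients, doctors)
result_b, work_b = plan_select_then_join(patients, doctors)
print(result_a == result_b)  # same answer from both plans
print(work_a, work_b)        # pairs compared by each plan
```

Pushing the selection below the join shrinks one join input, which is the kind of difference a cost-based optimizer tries to estimate before execution.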


SYNTAX ANALYZER :

The syntax analyzer takes the query from the user, parses it into tokens, and analyses the
tokens and their order to make sure they comply with the rules of the language grammar. If
an error is found in the query submitted by the user, the query is rejected, and an error code together
with an explanation of why the query was rejected is returned to the user.

A simple form of the language grammar that could be used to implement SQL statements is given
below :
QUERY = SELECT + FROM + WHERE
SELECT = SELECT + <COLUMN LIST>
FROM = FROM + <TABLE LIST>
WHERE = WHERE + VALUE1 OP VALUE2
VALUE1 = VALUE / COLUMN NAME
VALUE2 = VALUE / COLUMN NAME
OP = >, <, >=, <=, =, <>
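
A syntax analyzer for the simplified grammar above could look roughly like the following Python sketch. It is a minimal illustration, not how any real DBMS parses SQL: the function names (`tokenize`, `parse_query`), the token pattern and the error messages are assumptions made for this example, and only a single `VALUE1 OP VALUE2` condition is accepted, as in the grammar.

```python
import re

# Longer operators come first so that ">=" is not split into ">" and "=".
TOKEN_RE = re.compile(r"\s*(>=|<=|<>|[><=,]|'[^']*'|[A-Za-z_][\w.]*|\d+)")
OPERATORS = {">", "<", ">=", "<=", "=", "<>"}
KEYWORDS = {"SELECT", "FROM", "WHERE"}


def tokenize(text):
    """Split the query text into tokens, rejecting unknown characters."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_name(tok):
    return (re.fullmatch(r"[A-Za-z_][\w.]*", tok) is not None
            and tok.upper() not in KEYWORDS)


def is_value(tok):
    return is_name(tok) or tok.isdigit() or tok.startswith("'")


def expect(tokens, pos, keyword):
    """Check that the token at pos is the given keyword; return the next position."""
    if pos >= len(tokens) or tokens[pos].upper() != keyword:
        found = tokens[pos] if pos < len(tokens) else "end of query"
        raise SyntaxError(f"expected {keyword}, found {found}")
    return pos + 1


def parse_name_list(tokens, pos):
    """Parse NAME (',' NAME)* and return the names and the next position."""
    names = []
    while True:
        if pos >= len(tokens) or not is_name(tokens[pos]):
            raise SyntaxError("expected a column or table name")
        names.append(tokens[pos])
        pos += 1
        if pos < len(tokens) and tokens[pos] == ",":
            pos += 1
            continue
        return names, pos


def parse_query(text):
    """QUERY = SELECT + FROM + WHERE, with WHERE = VALUE1 OP VALUE2."""
    tokens = tokenize(text)
    pos = expect(tokens, 0, "SELECT")
    columns, pos = parse_name_list(tokens, pos)
    pos = expect(tokens, pos, "FROM")
    tables, pos = parse_name_list(tokens, pos)
    pos = expect(tokens, pos, "WHERE")
    if pos + 3 != len(tokens):
        raise SyntaxError("WHERE must be followed by VALUE1 OP VALUE2")
    left, op, right = tokens[pos:pos + 3]
    if not (is_value(left) and op in OPERATORS and is_value(right)):
        raise SyntaxError(f"invalid condition: {left} {op} {right}")
    return {"select": columns, "from": tables, "where": (left, op, right)}


print(parse_query("SELECT pname FROM Patients WHERE age >= 60"))
```

A query that breaks the grammar, for example one with the FROM keyword missing, raises `SyntaxError` with a short explanation, mirroring the rejection step described above.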

QUERY DECOMPOSITION :

The aims of query decomposition are:
(1) To transform a high-level query into a relational algebra query.
(2) To check that the query is syntactically and semantically correct.

The typical stages of query decomposition are analysis, normalization, semantic
analysis, simplification, and query restructuring.

ANALYSIS :


NORMALIZATION :

SEMANTIC ANALYSIS :


QUERY SIMPLIFIER :


QUERY RECONSTRUCTION :

QUERY OPTIMIZATION :


The main aim of query optimization is to choose the most efficient way of implementing the
relational algebra operations at the lowest possible cost. Therefore, the query optimizer
should not depend solely on heuristic rules; it should also estimate the cost of executing
the different strategies and find the strategy with the minimum cost estimate. The method
of optimizing a query by choosing the strategy that results in the minimum cost is called
cost-based query optimization. Cost-based query optimization uses formulae that estimate the
costs of a number of options and selects the one with the lowest estimated cost. The cost
functions used in query optimization are estimates and not exact cost
functions, so the optimization may select a query execution strategy that is not the optimal
one.
The cost of an operation is heavily dependent on its selectivity, that is, the proportion of the
input relation(s) that forms the output. In general, different algorithms are suitable for low-
and high-selectivity queries. In order for a query optimizer to choose a suitable algorithm for
an operation, an estimate of the cost of executing that algorithm must be provided. The cost
of an algorithm depends on the cardinality of its input. To estimate the cost of different
query execution strategies, the query tree is viewed as containing a series of basic operations
which are linked in order to perform the query. Each basic operation has an associated cost
function whose argument(s) are the cardinality of its input(s). It is also important to know
the expected cardinality of an operation's output, since this forms the input to the next
operation in the tree. The expected cardinalities are derived from statistical estimates of a
query's selectivity, that is, the proportion of tuples satisfying the query.
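
The way cost functions and expected cardinalities flow up a query tree can be sketched as follows. This is a toy model under loud assumptions: cost is counted in "tuples examined", the join is a plain nested loop, and the cardinalities and selectivities in the example are invented numbers, not figures from the notes. The names `Estimate`, `scan`, `selection` and `nested_loop_join` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    cardinality: float  # expected number of output tuples
    cost: float         # accumulated cost, in tuples examined (assumed unit)


def scan(n_tuples):
    """Leaf of the query tree: read every tuple of a base relation."""
    return Estimate(cardinality=n_tuples, cost=n_tuples)


def selection(child, selectivity):
    """Examine every input tuple; a fraction `selectivity` survives."""
    return Estimate(cardinality=child.cardinality * selectivity,
                    cost=child.cost + child.cardinality)


def nested_loop_join(left, right, selectivity):
    """Compare every pair of input tuples; a fraction of pairs match."""
    pairs = left.cardinality * right.cardinality
    return Estimate(cardinality=pairs * selectivity,
                    cost=left.cost + right.cost + pairs)


# Assumed statistics: 10,000 patients, 100 doctors, half of the doctors
# satisfy the selection, and each patient matches exactly one doctor.
patients, doctors = scan(10_000), scan(100)

plan_a = selection(nested_loop_join(patients, doctors, 1 / 100), 0.5)
plan_b = nested_loop_join(patients, selection(doctors, 0.5), 1 / 100)

best = min((plan_a, plan_b), key=lambda plan: plan.cost)
print(plan_a, plan_b)
print("cheaper plan:", "A" if best is plan_a else "B")
```

Each operation only needs the cardinality estimate of its inputs, so an error in one selectivity estimate propagates to every operation above it in the tree.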
Cost Components of Query Execution :-
The success of estimating the size and cost of intermediate relational algebra operations depends
on the amount and accuracy of the statistical information stored with the DBMS.
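
As a rough illustration of how stored statistics turn into estimates, the sketch below derives a block count and an equality selectivity from catalog-style figures. The specific statistics chosen (tuple count, blocking factor, distinct values per attribute) and the uniform-distribution assumption are assumptions of this example, as are the class and method names; the notes do not specify which statistics are kept.

```python
import math
from dataclasses import dataclass


@dataclass
class RelationStats:
    n_tuples: int         # number of tuples in the relation
    blocking_factor: int  # tuples that fit in one disk block
    n_distinct: dict      # attribute name -> number of distinct values

    def n_blocks(self):
        """Disk blocks needed to hold the relation."""
        return math.ceil(self.n_tuples / self.blocking_factor)

    def equality_selectivity(self, attribute):
        """Fraction of tuples matching `attribute = constant`,
        assuming values are uniformly distributed."""
        return 1 / self.n_distinct[attribute]

    def estimated_matches(self, attribute):
        return self.n_tuples * self.equality_selectivity(attribute)


# Hypothetical statistics for the Doctors relation.
doctors = RelationStats(n_tuples=100, blocking_factor=20,
                        n_distinct={"dgender": 2, "dname": 100})

print(doctors.n_blocks())                    # blocks a full scan would read
print(doctors.estimated_matches("dgender"))  # expected tuples for dgender = 'M'
```

If the stored figures are stale or the uniformity assumption does not hold, every estimate built on them is off, which is why the accuracy of the statistics matters so much.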


STRUCTURE OF QUERY EVALUATION PLAN :


PIPELINING AND MATERIALIZATION :


SOME QUESTIONS
Q . 1 Explain inter-query parallelism ?
Ans : Inter-query parallelism is a form of parallelism in the evaluation of database queries, in
which several different queries execute concurrently on multiple processors to improve the
overall throughput of the system.
When multiple non-conflicting requests are submitted to a database management system,
the system can execute them in parallel to improve the overall throughput. This form of parallelism is
called inter-query parallelism. Inter-query parallelism is a consequence of the concurrency of user
requests. It is orthogonal to intra-query parallelism, in which several processors cooperate for the faster
execution of a single query.
Inter-query parallelism results from the ability to execute multiple queries at the same time, while
intra-query parallelism is achieved by breaking up a single query into a number of subqueries, each of which is
executed at a different site, accessing a different part of the distributed database.
If user access to the distributed database consisted only of querying (i.e., read-only access),
then provision of inter-query and intra-query parallelism would
imply that as much of the database as possible should be replicated. However, since most database
accesses are not read-only, the mixing of read and update operations requires the implementation of
elaborate concurrency control and commit protocols.
o Queries/transactions execute in parallel with one another.
o Increases transaction throughput; used primarily to scale up a transaction processing system to
support a larger number of transactions per second.
o Easiest form of parallelism to support, particularly in a shared-memory parallel database, because
even sequential database systems support concurrent processing.

o More complicated to implement on shared-disk or shared-nothing architectures:
o Locking and logging must be coordinated by passing messages between processors.
o Data in a local buffer may have been updated at another processor.
o Cache coherency has to be maintained: reads and writes of data in the buffer must find the
latest version of the data.
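
The basic shape of inter-query parallelism, several independent read-only queries submitted at once, can be sketched with a thread pool. This is only an illustration: the in-memory table `ROWS` and the three query functions are hypothetical, and in CPython threads interleave rather than run CPU-bound work truly in parallel, so the sketch shows the structure of concurrent submission, not a throughput measurement.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical read-only table shared by all queries.
ROWS = [{"id": i, "amount": i * 10} for i in range(1, 101)]


def count_large(rows):
    """Query 1: how many rows have amount > 500?"""
    return sum(1 for row in rows if row["amount"] > 500)


def total_amount(rows):
    """Query 2: sum of all amounts."""
    return sum(row["amount"] for row in rows)


def max_id(rows):
    """Query 3: largest id."""
    return max(row["id"] for row in rows)


def run_concurrently(queries, rows, workers=3):
    """Submit every query at once; each runs independently of the others."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(query, rows) for query in queries]
        return [future.result() for future in futures]


print(run_concurrently([count_large, total_amount, max_id], ROWS))
```

Because the queries only read the table, no coordination between them is needed; the locking and cache-coherency issues listed above appear once updates are mixed in.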
Cache Coherency Protocol
o Example of a cache coherency protocol for shared-disk systems:
o Before reading/writing to a page, the page must be locked in shared/exclusive mode.
o On locking a page, the page must be read from disk.
o Before unlocking a page, the page must be written to disk if it was modified.
o More complex protocols with fewer disk reads/writes exist.
o Cache coherency protocols for shared-nothing systems are similar. Each database page is assigned
a home processor. Requests to fetch the page or write it to disk are sent to the home processor.
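
The simple shared-disk protocol listed above (lock the page, read it from disk, write it back if modified, then unlock) can be sketched in Python. This is a simplified model under stated assumptions: the `SharedDisk` and `Processor` classes are hypothetical, a dictionary stands in for the disk, threads stand in for processors, and only exclusive locks are modeled (shared mode is omitted).

```python
import threading


class SharedDisk:
    """Stand-in for the shared disk: page contents plus one lock per page."""

    def __init__(self):
        self.pages = {}
        self._locks = {}
        self._guard = threading.Lock()

    def lock_for(self, page_id):
        with self._guard:
            return self._locks.setdefault(page_id, threading.Lock())


class Processor:
    """A processor with its own local buffer of pages."""

    def __init__(self, disk):
        self.disk = disk
        self.buffer = {}

    def read_page(self, page_id):
        with self.disk.lock_for(page_id):          # lock before reading
            # On locking, always re-read from disk so a stale buffered copy is never used.
            self.buffer[page_id] = self.disk.pages.get(page_id, 0)
            return self.buffer[page_id]

    def update_page(self, page_id, update):
        with self.disk.lock_for(page_id):          # lock before writing
            old_value = self.disk.pages.get(page_id, 0)   # read from disk on locking
            new_value = update(old_value)
            self.buffer[page_id] = new_value
            if new_value != old_value:
                # Write the modified page back to disk before unlocking.
                self.disk.pages[page_id] = new_value


disk = SharedDisk()
proc_a, proc_b = Processor(disk), Processor(disk)

proc_a.update_page("p1", lambda value: value + 5)
print(proc_b.read_page("p1"))  # proc_b sees proc_a's update via the disk
```

Every access pays for a disk read, and every modification for a disk write, which is why the notes mention that more complex protocols with fewer disk reads and writes exist.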

Q . 2 Discuss cost estimation in query optimization.
Ans: Explained above, in the discussion of cost-based query optimization under QUERY OPTIMIZATION.
