Dbms Seminar

The document outlines the steps involved in query processing and optimization in distributed databases, including scanning, parsing, validating, and evaluating queries. It discusses various algorithms for selecting and joining operations, emphasizing the importance of query optimization techniques such as heuristic rules and cost estimation for efficient execution. The document also details the cost components associated with query execution, including access, storage, computation, memory usage, and communication costs.

UNIT - 5

QUERY PROCESSING & DISTRIBUTED DATABASES

VALARMATHI M
II – CSE D
OVERVIEW
Query Processing Steps:
1. Scanning, parsing, validating, and translation
2. Optimization
3. Evaluation
OPTIMIZATION
Query Optimization: Among all equivalent evaluation plans, choose the one with the lowest cost.
• Cost is estimated using statistical information from the database catalog, e.g. the number of tuples in each relation, the size of tuples, etc.
• E.g., σ_{salary<75000}(π_{salary}(instructor)) is equivalent to π_{salary}(σ_{salary<75000}(instructor)).
• Consider total rows as 1000 and the number of rows satisfying the given condition as 400. Applying the selection first passes only 400 rows to the projection instead of 1000.
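The two equivalent plans can be compared with a minimal sketch. The 1000 rows, the salary values, and the function names are hypothetical; duplicates are kept (bag semantics) so that row counts stay visible.

```python
# Hypothetical data: 1000 instructor rows, 400 of them with salary < 75000.
instructor = [{"id": i, "salary": 60000 if i < 400 else 90000} for i in range(1000)]

def project_then_select(rows):
    """Plan A: project salary first, then apply the selection."""
    projected = [{"salary": r["salary"]} for r in rows]
    selected = [r for r in projected if r["salary"] < 75000]
    # Second value: rows handed from the first operator to the second.
    return selected, len(projected)

def select_then_project(rows):
    """Plan B: apply the selection first, then project salary."""
    selected = [r for r in rows if r["salary"] < 75000]
    projected = [{"salary": r["salary"]} for r in selected]
    return projected, len(selected)

result_a, passed_a = project_then_select(instructor)
result_b, passed_b = select_then_project(instructor)
print(result_a == result_b, passed_a, passed_b)
```

Both plans return the same rows; they differ only in how many rows the second operator has to process.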
ALGORITHMS FOR
SELECT OPERATION
For simple queries:
• Linear Search: Scan each file block and test all records to see whether they satisfy the selection condition. Linear search can be applied regardless of the selection condition, the ordering of records in the file, or the availability of indices.
• Binary Search: Possible only when the records are stored in sorted order on the attribute used in the selection condition.
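A minimal sketch of the two searches over an in-memory list of records. The record layout and function names are assumptions for illustration; a real DBMS works on disk blocks rather than Python lists.

```python
from bisect import bisect_left

# Records sorted on O_ID (hypothetical in-memory stand-in for a sorted file).
orders = [
    {"O_ID": 1, "P_NAME": "AAA"},
    {"O_ID": 2, "P_NAME": "BBB"},
    {"O_ID": 3, "P_NAME": "CCC"},
]

def linear_search(records, key, value):
    """Test every record; works for any ordering."""
    return [r for r in records if r[key] == value]

def binary_search(sorted_records, key, value):
    """Requires sorted_records to be ordered on key."""
    # Keys are extracted up front only for brevity of the sketch.
    keys = [r[key] for r in sorted_records]
    i = bisect_left(keys, value)
    matches = []
    while i < len(keys) and keys[i] == value:
        matches.append(sorted_records[i])
        i += 1
    return matches
```

Both functions return the same records for an equality condition; binary search simply needs fewer comparisons when the input is sorted.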

Index: An index in a DBMS contains the key values from the indexed columns and pointers (references) to the corresponding records.
SELECT using index values:
i) Using primary key:
• Equality comparison on a key attribute – returns a single record.
• Comparison condition <, >, <=, >= on a key attribute – returns multiple records.
ii) Using secondary key:
• Equality comparison is made on a non-key attribute.
• If the field has only unique values, a single record is returned.
• If duplicates exist, multiple records are returned.
Eg: Order
O_ID (Primary)   P_NAME   C_ID
1                AAA      1001
2                BBB      1002
3                CCC      1001
Linear, Binary: σ_{O_ID=2}(Order)
Primary – single: σ_{O_ID=3}(Order)
Primary – multiple: σ_{O_ID>=2}(Order)
Secondary – single: σ_{C_ID=1002}(Order)
Secondary – multiple: σ_{C_ID=1001}(Order) (or) σ_{C_ID>1000}(Order)
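The index-based selections on the Order table can be sketched as follows. The dictionaries standing in for the primary and secondary indexes, and all function names, are assumptions made for illustration.

```python
from collections import defaultdict

orders = [
    {"O_ID": 1, "P_NAME": "AAA", "C_ID": 1001},
    {"O_ID": 2, "P_NAME": "BBB", "C_ID": 1002},
    {"O_ID": 3, "P_NAME": "CCC", "C_ID": 1001},
]

# Primary index: unique key -> record position.
primary_index = {row["O_ID"]: pos for pos, row in enumerate(orders)}

# Secondary index: non-key value -> list of record positions.
secondary_index = defaultdict(list)
for pos, row in enumerate(orders):
    secondary_index[row["C_ID"]].append(pos)

def select_primary_eq(o_id):
    """Equality on the key: at most one record."""
    pos = primary_index.get(o_id)
    return [] if pos is None else [orders[pos]]

def select_primary_ge(o_id):
    """Range condition on the key: possibly many records."""
    return [orders[primary_index[k]] for k in sorted(primary_index) if k >= o_id]

def select_secondary_eq(c_id):
    """Equality on a non-key attribute: one or many records."""
    return [orders[pos] for pos in secondary_index.get(c_id, [])]
```

Here `select_secondary_eq(1002)` yields one record, while `select_secondary_eq(1001)` yields two, because C_ID holds duplicates.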
For complex queries:
• Conjunctive Select using Individual Index – The database scans a separate index for each condition and finds the intersection of the matching records.
• Conjunctive Select using Composite Index – The composite index directly filters records based on multiple conditions together, improving efficiency.
• Conjunctive Select using Record Pointers – The indexes retrieve record pointers for each condition, and only records matching all conditions are selected.
• Disjunctive Selection by Union of Identifiers – The result includes records that satisfy at least one of the given conditions.
Eg: Order
O_ID   P_ID   C_ID
1      101    1001
2      102    1002
3      101    1003
4      103    1001
5      102    1003
Primary key – O_ID
Composite key – (P_ID, C_ID)
Query: σ_{P_ID=101 AND C_ID=1003}(Order)
1. Conjunctive selection using individual index: The two conditions are evaluated separately and the results are finally intersected.
2. Conjunctive selection using composite index: The composite keys are (101,1001), (102,1002), (101,1003) … (102,1003), so the entry (101,1003) is looked up directly.
3. Conjunctive selection using record pointers:
For P_ID=101, the record pointers (O_ID) are 1, 3.
For C_ID=1003, the record pointers (O_ID) are 3, 5.
Intersecting both, we get:
O_ID   P_ID   C_ID
3      101    1003
4. Disjunctive selection by union of identifiers:
Query: σ_{P_ID=101 OR C_ID=1002}(Order)
O_ID   P_ID   C_ID
1      101    1001
2      102    1002
3      101    1003
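The example above can be sketched with sets of record identifiers. The index dictionaries and function names are hypothetical; O_ID serves as the record pointer, as in the example.

```python
from collections import defaultdict

# (O_ID, P_ID, C_ID) rows from the example table.
orders = [(1, 101, 1001), (2, 102, 1002), (3, 101, 1003),
          (4, 103, 1001), (5, 102, 1003)]
by_oid = {row[0]: row for row in orders}

p_index = defaultdict(set)          # P_ID -> set of O_IDs
c_index = defaultdict(set)          # C_ID -> set of O_IDs
composite_index = defaultdict(set)  # (P_ID, C_ID) -> set of O_IDs
for o_id, p_id, c_id in orders:
    p_index[p_id].add(o_id)
    c_index[c_id].add(o_id)
    composite_index[(p_id, c_id)].add(o_id)

def conjunctive_by_pointers(p_id, c_id):
    """AND: intersect the record pointers from the two individual indexes."""
    ids = p_index.get(p_id, set()) & c_index.get(c_id, set())
    return [by_oid[i] for i in sorted(ids)]

def conjunctive_by_composite(p_id, c_id):
    """AND: a single lookup in the composite index."""
    return [by_oid[i] for i in sorted(composite_index.get((p_id, c_id), set()))]

def disjunctive_by_union(p_id, c_id):
    """OR: union of the record pointers from the two individual indexes."""
    ids = p_index.get(p_id, set()) | c_index.get(c_id, set())
    return [by_oid[i] for i in sorted(ids)]
```

The pointer sets {1, 3} and {3, 5} intersect to {3}, matching the single row in the example; the union for the disjunctive query gives O_IDs 1, 2, and 3.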
ALGORITHMS FOR JOIN
OPERATION
Employee (E):
Emp_ID   Name      Dept_ID
1        Alice     101
2        Bob       102
3        Charlie   101
4        David     103

Department (D):
Dept_ID   Dept_Name
101       HR
102       IT
103       Sales

1. Nested Loop Join (Brute Force):
• It compares each row in the Employee (E) table with every row in the Department (D) table.
Example:
• Alice (Dept_ID = 101) is compared with all Dept_IDs in D. It matches with HR.
• Bob (Dept_ID = 102) is compared with all Dept_IDs in D. It matches with IT.
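A minimal sketch of the brute-force comparison on the Employee and Department tables. The tuple layout and the comparison counter are assumptions added for illustration.

```python
employees = [(1, "Alice", 101), (2, "Bob", 102), (3, "Charlie", 101), (4, "David", 103)]
departments = [(101, "HR"), (102, "IT"), (103, "Sales")]

def nested_loop_join(emps, depts):
    """Compare every employee row with every department row."""
    result = []
    comparisons = 0
    for emp_id, name, e_dept in emps:          # outer loop over E
        for d_dept, dept_name in depts:        # inner loop over D
            comparisons += 1
            if e_dept == d_dept:
                result.append((emp_id, name, e_dept, dept_name))
    return result, comparisons

joined, comparisons = nested_loop_join(employees, departments)
print(joined, comparisons)
```

With 4 employee rows and 3 department rows, the join performs 4 × 3 = 12 comparisons to produce 4 result rows.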
2. Single Loop Join (Index Nested Loop Join):
• Uses an index on the join column Dept_ID for fast lookup.
• Instead of checking every row, it directly finds matches using the index.
Index on Dept_ID (pointing to Employee rows):
Dept_ID   Pointer to Employee Row
101       (1, Alice, 101), (3, Charlie, 101)
102       (2, Bob, 102)
103       (4, David, 103)
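A minimal sketch of a single-loop join that uses an index like the one tabulated above. It assumes the index maps Dept_ID to Employee rows and that the single loop runs over Department; the dictionary index and function names are hypothetical.

```python
from collections import defaultdict

employees = [(1, "Alice", 101), (2, "Bob", 102), (3, "Charlie", 101), (4, "David", 103)]
departments = [(101, "HR"), (102, "IT"), (103, "Sales")]

def build_dept_index(emps):
    """Index on Dept_ID: Dept_ID -> list of employee rows."""
    index = defaultdict(list)
    for row in emps:
        index[row[2]].append(row)
    return index

def index_nested_loop_join(depts, emp_index):
    """One loop over Department; matches come from index lookups."""
    result = []
    for dept_id, dept_name in depts:
        for emp_id, name, _ in emp_index.get(dept_id, []):
            result.append((emp_id, name, dept_id, dept_name))
    return result

emp_index = build_dept_index(employees)
joined = index_nested_loop_join(departments, emp_index)
```

Only matching rows are touched in the inner step, so no employee row is compared against a department it does not belong to.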

3. Sort-Merge Join (SMJ):
• Used when both tables are sorted on the join key.
• Efficient for large datasets, as each sorted table is scanned only once instead of rescanning one table for every row of the other.
• Works in two steps:
o Sorting Phase – Both tables are sorted based on the join key.
o Merge Phase – Tables are scanned once in order, matching values efficiently.
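The two phases can be sketched as below. This is an illustrative in-memory version with hypothetical names; it assumes the join key is the first field of each right-hand row and handles duplicate keys on both sides.

```python
employees = [(1, "Alice", 101), (2, "Bob", 102), (3, "Charlie", 101), (4, "David", 103)]
departments = [(101, "HR"), (102, "IT"), (103, "Sales")]

def sort_merge_join(left, right, left_key, right_key):
    # Sorting phase: order both inputs on the join key.
    left = sorted(left, key=left_key)
    right = sorted(right, key=right_key)
    result = []
    i = j = 0
    # Merge phase: advance two cursors through the sorted inputs.
    while i < len(left) and j < len(right):
        lk, rk = left_key(left[i]), right_key(right[j])
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and right_key(right[j_end]) == lk:
                j_end += 1
            # Pair every left row with that key against the run.
            while i < len(left) and left_key(left[i]) == lk:
                for r in right[j:j_end]:
                    result.append(left[i] + r[1:])  # drop the duplicate key column
                i += 1
            j = j_end
    return result

joined = sort_merge_join(employees, departments,
                         left_key=lambda e: e[2], right_key=lambda d: d[0])
```

If the inputs are already sorted on the join key, the sorting phase costs little and the merge needs only one pass over each table.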
4. Hash Join:
• Works by hashing one table (the smaller one) and probing it with the other.
• Used when no sorting or index exists on the join column.
• Works in two steps:
o Build Phase – Create a hash table for the smaller table (e.g., Department).
o Probe Phase – Scan the larger table (Employee) and match using the hash table.
Hash table, Fn = (Dept_ID % 10):
Bucket   Dept_Name
3        Sales
2        IT
1        HR
Resultant table:
Emp_ID   Name      Dept_ID   Dept_Name
1        Alice     101       HR
3        Charlie   101       HR
2        Bob       102       IT
4        David     103       Sales
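The build and probe phases can be sketched with the same hash function, Dept_ID % 10. The bucket dictionary and function names are assumptions for illustration.

```python
employees = [(1, "Alice", 101), (2, "Bob", 102), (3, "Charlie", 101), (4, "David", 103)]
departments = [(101, "HR"), (102, "IT"), (103, "Sales")]

def hash_fn(dept_id):
    return dept_id % 10

def hash_join(emps, depts):
    # Build phase: hash the smaller table (Department) into buckets.
    buckets = {}
    for dept_id, dept_name in depts:
        buckets.setdefault(hash_fn(dept_id), []).append((dept_id, dept_name))
    # Probe phase: scan the larger table (Employee) once.
    result = []
    for emp_id, name, e_dept in emps:
        for d_dept, dept_name in buckets.get(hash_fn(e_dept), []):
            # Re-check the key: different Dept_IDs can share a bucket.
            if d_dept == e_dept:
                result.append((emp_id, name, e_dept, dept_name))
    return result, buckets

joined, buckets = hash_join(employees, departments)
```

The key re-check in the probe phase matters in general: with this hash function, Dept_IDs 101 and 111 would land in the same bucket.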
Final Conclusion:
• All joins produce the same result; only the processing method differs.
• Nested Loop: Simple but slow.
• Index Nested Loop: Faster if an index is available.
• Sort-Merge: Efficient if data is already sorted.
• Hash Join: Best for large datasets with enough memory.
QUERY OPTIMIZATION USING HEURISTICS
Introduction:
Heuristic query optimization applies predefined rules (heuristics) to rearrange and simplify a query before execution, improving efficiency. These rules help reduce query cost by minimizing data retrieval and computation overhead.

Rules:
1) Draw the initial query tree.
2) Move SELECT operations down the tree.
3) Apply the more restrictive SELECT operations first.
4) Replace CARTESIAN PRODUCT and SELECT operations with JOIN operations.
5) Move PROJECT operations down the tree.

Eg:
Employee (Fname, Lname, ssn, Bdate, Address, Dno);
Works_for (Essn, Pno, hours);
Project (Pname, Pnum, Plocation, Dnum)
Step 1: Initial (canonical) query tree for SQL query Q.
Step 2: Moving SELECT operations down the query tree.
Step 3: Applying the more restrictive SELECT operation first.
Step 4: Replacing CARTESIAN PRODUCT and SELECT with JOIN operations.
Step 5: Moving PROJECT operations down the query tree.
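The effect of these steps can be illustrated with a small sketch. The query used here (last names of employees working on a project named 'Aquarius'), the sample rows, and the function names are all hypothetical and are not taken from query Q; only the attribute names come from the schema above.

```python
from itertools import product

# Hypothetical sample rows using a subset of the schema's attributes.
employee = [{"Lname": f"L{i}", "ssn": i} for i in range(1, 6)]
works_for = [{"Essn": e, "Pno": p} for e, p in [(1, 10), (2, 10), (3, 20), (4, 30), (5, 20)]]
project = [{"Pname": n, "Pnum": p} for n, p in [("Aquarius", 10), ("Beta", 20), ("Gamma", 30)]]

def canonical_plan():
    """CARTESIAN PRODUCT of all tables first, then one big SELECT, then PROJECT."""
    cart = list(product(employee, works_for, project))
    names = [e["Lname"] for e, w, p in cart
             if p["Pname"] == "Aquarius" and p["Pnum"] == w["Pno"] and w["Essn"] == e["ssn"]]
    return sorted(names), len(cart)

def heuristic_plan():
    """Restrictive SELECT first, then joins instead of product plus SELECT."""
    p_sel = [p for p in project if p["Pname"] == "Aquarius"]
    pw = [(p, w) for p in p_sel for w in works_for if p["Pnum"] == w["Pno"]]
    pwe = [(p, w, e) for p, w in pw for e in employee if w["Essn"] == e["ssn"]]
    largest = max(len(p_sel), len(pw), len(pwe))
    return sorted(e["Lname"] for _, _, e in pwe), largest

canonical_names, canonical_size = canonical_plan()
heuristic_names, heuristic_size = heuristic_plan()
print(canonical_names == heuristic_names, canonical_size, heuristic_size)
```

Both plans give the same answer, but the canonical plan materializes 5 × 5 × 3 = 75 intermediate tuples, while the rearranged plan never holds more than 2.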
COST ESTIMATION
• The main aim of query optimization is to choose the most efficient way of implementing the relational algebra operations at the lowest possible cost.
• The query optimizer should not depend solely on heuristic rules; it should also estimate the cost of executing the different strategies and pick the strategy with the minimum cost estimate.
• The cost functions are only estimates and not exact values.
• The cost depends on the cardinality of the inputs.
Cost Components of Query Execution
• The cost of executing a query includes the following components:
i) Access cost to secondary storage.
ii) Storage cost.
iii) Computation cost.
iv) Memory usage cost.
v) Communication cost.
i) Access Cost to Secondary Storage
• Disk I/O Cost → Reading/writing tables and indexes from disk.
• Index Lookup Cost → Searching for indexed records.
• Sequential vs. Random Access Cost → Sequential scans are cheaper than random accesses.

ii) Storage Cost
• Data Storage Cost → Space occupied by tables and indexes.
• Index Storage Cost → Extra space required for maintaining indexes.
• Temporary Storage Cost → Space needed for intermediate query results.

iii) Computation Cost
• CPU Cost → Processing operations like filtering (WHERE), sorting (ORDER BY), joining, and aggregation.
• Function Evaluation Cost → Cost of executing functions (e.g., AVG(), SUM()).
• Join Operation Cost → Nested loop, hash join, or sort-merge join computation costs.

iv) Memory Usage Cost
• Buffer Pool Cost → Memory required to store frequently accessed pages.
• Sorting and Hashing Cost → Memory used in operations like sorting (ORDER BY) and hashing (HASH JOIN).
• Intermediate Result Storage Cost → Memory used for temporary query results.

v) Communication Cost (For Distributed Databases)
• Data Transfer Cost → Cost of sending query results between servers.
• Query Coordination Cost → Overhead of coordinating execution across multiple nodes.
• Network Latency Cost → Time delay in data transmission over the network.
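One simple way to combine the five components is to add them up per candidate plan and pick the minimum. The sketch below assumes an unweighted sum, and every number in it is a made-up unit cost for illustration, not a measured or catalog-derived value.

```python
from dataclasses import dataclass

@dataclass
class PlanCost:
    name: str
    access: int          # secondary-storage access cost
    storage: int         # storage cost
    computation: int     # CPU cost
    memory: int          # memory usage cost
    communication: int   # network cost (distributed databases)

    def total(self):
        # Assumption: components are simply added, with equal weight.
        return (self.access + self.storage + self.computation
                + self.memory + self.communication)

def cheapest(plans):
    """Pick the plan with the minimum estimated total cost."""
    return min(plans, key=lambda p: p.total())

# Hypothetical estimates for two strategies of the same query.
plans = [
    PlanCost("nested loop join", access=120, storage=0, computation=40, memory=5, communication=0),
    PlanCost("hash join", access=30, storage=0, computation=15, memory=20, communication=0),
]
best = cheapest(plans)
print(best.name, best.total())
```

A real optimizer would weight the components differently (disk access and network transfer usually dominate) and derive the estimates from catalog statistics.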
