
Unit-3 RDBMS-1

The document discusses Query Processing and Optimization in relational databases, focusing on relational algebra operations such as selection, projection, union, intersection, difference, Cartesian product, and rename. It also covers equivalence rules for transforming relational expressions and provides an overview of SQL joins, including inner, left, right, and full joins with examples. The content emphasizes the importance of these concepts in efficiently querying and manipulating data within relational databases.

Uploaded by

9923022056
Copyright
© © All Rights Reserved

Unit -III

Query Processing and Optimization: Types of Evaluation of Relational Algebra Expressions, Query Equivalence, Join strategies, Query Optimization Algorithms.

What is Relational Algebra?


Relational algebra is a set of operations used to query and manipulate data in a
relational database. Its operations can be expressed in SQL, and it gives users a
precise way to describe which data they want from the database tables.
Relational algebra includes operations such as filtering rows and combining
tables. This "algebra" is the foundation for most database queries, and it lets
us extract the required information from a database through the SQL query language.
Types of Relational Operations

1. Selection(σ)
2. Projection(π)
3. Union(∪)
4. Set Intersection(∩)
5. Set Difference(-)
6. Cartesian Product(×)
7. Rename(ρ)
1. Selection(σ) :
▪ The selection operation filters the rows of a table: it returns only
those rows that satisfy a given condition.
▪ It is used to select the required tuples of a relation.
Notation: σ p(r)
Where:
σ denotes the selection operator
r is the relation
p is the selection predicate, a propositional logic formula that may use connectives such as AND, OR and NOT.
The terms of the predicate may use relational operators such as =, ≠, ≥, <, >, ≤.
For example: LOAN Relation

BRANCH_NAME  LOAN_NO  AMOUNT
Downtown     L-17     1000
Redwood      L-23     2000
Perryride    L-15     1500
Downtown     L-14     1500
Mianus       L-13     500
Roundhill    L-11     900
Perryride    L-16     1300

Input:
σ BRANCH_NAME="Perryride" (LOAN)
Output:

BRANCH_NAME  LOAN_NO  AMOUNT
Perryride    L-15     1500
Perryride    L-16     1300
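
The selection above can be sketched in Python as a filter over a list of tuples. This is only an illustration: the `select` helper and the dictionary layout are assumptions made for this sketch, not part of any database system.

```python
# LOAN relation from the example, one dictionary per tuple.
LOAN = [
    {"BRANCH_NAME": "Downtown", "LOAN_NO": "L-17", "AMOUNT": 1000},
    {"BRANCH_NAME": "Redwood", "LOAN_NO": "L-23", "AMOUNT": 2000},
    {"BRANCH_NAME": "Perryride", "LOAN_NO": "L-15", "AMOUNT": 1500},
    {"BRANCH_NAME": "Downtown", "LOAN_NO": "L-14", "AMOUNT": 1500},
    {"BRANCH_NAME": "Mianus", "LOAN_NO": "L-13", "AMOUNT": 500},
    {"BRANCH_NAME": "Roundhill", "LOAN_NO": "L-11", "AMOUNT": 900},
    {"BRANCH_NAME": "Perryride", "LOAN_NO": "L-16", "AMOUNT": 1300},
]


def select(relation, predicate):
    """sigma_predicate(relation): keep the tuples for which predicate holds."""
    return [row for row in relation if predicate(row)]


perryride_loans = select(LOAN, lambda row: row["BRANCH_NAME"] == "Perryride")
for row in perryride_loans:
    print(row["BRANCH_NAME"], row["LOAN_NO"], row["AMOUNT"])
```

The string comparison is case-sensitive, so the constant in the predicate has to match the stored value "Perryride" exactly.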

2. Projection(π) :
While the selection operation works on rows, the projection operation of relational
algebra works on columns. It keeps only the listed columns of a relation and
drops all the others. Because the result is itself a relation, duplicate rows
are eliminated.

Notation: ∏ A1, A2, ..., An (r)

Where
A1, A2, ..., An are attribute names of relation r.
Example: CUSTOMER RELATION
NAME     STREET   CITY
Jones    Main     Harrison
Smith    North    Rye
Hays     Main     Harrison
Curry    North    Rye
Johnson  Alma     Brooklyn
Brooks   Senator  Brooklyn

Input:
∏ NAME, CITY (CUSTOMER)
Output:
NAME     CITY
Jones    Harrison
Smith    Rye
Hays     Harrison
Curry    Rye
Johnson  Brooklyn
Brooks   Brooklyn
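
A minimal Python sketch of projection follows. The `project` helper and the tuple layout are assumptions for illustration. Unlike a plain SQL SELECT, relational-algebra projection removes duplicate rows, which the second call makes visible.

```python
# CUSTOMER relation from the example, stored as tuples in column order.
COLUMNS = ("NAME", "STREET", "CITY")
CUSTOMER = [
    ("Jones", "Main", "Harrison"),
    ("Smith", "North", "Rye"),
    ("Hays", "Main", "Harrison"),
    ("Curry", "North", "Rye"),
    ("Johnson", "Alma", "Brooklyn"),
    ("Brooks", "Senator", "Brooklyn"),
]


def project(rows, columns, wanted):
    """pi_wanted(rows): keep only the wanted columns and drop duplicate rows."""
    positions = [columns.index(name) for name in wanted]
    result = []
    for row in rows:
        projected = tuple(row[i] for i in positions)
        if projected not in result:  # a relation is a set, so duplicates collapse
            result.append(projected)
    return result


name_city = project(CUSTOMER, COLUMNS, ["NAME", "CITY"])
cities = project(CUSTOMER, COLUMNS, ["CITY"])
print(name_city)
print(cities)
```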

Common Example Tables for Union, Intersection and Difference

DEPOSITOR RELATION
CUSTOMER_NAME  ACCOUNT_NO
Johnson        A-101
Smith          A-121
Mayes          A-321
Turner         A-176
Johnson        A-273
Jones          A-472
Lindsay        A-284

BORROW RELATION
CUSTOMER_NAME  LOAN_NO
Jones          L-17
Smith          L-23
Hayes          L-15
Jackson        L-14
Curry          L-93
Smith          L-11
Williams       L-17

3. Union(∪) :
The union operator combines the results of two queries into a single
result. Both queries must return the same number of columns, with
compatible data types in corresponding columns.

Notation: R ∪ S

o A union operation has the following properties:
o R and S must have the same number of attributes, with compatible domains.
o Duplicate tuples are eliminated automatically.
Example: Using the above DEPOSITOR table and BORROW table
Input:
∏ CUSTOMER_NAME (BORROW) ∪ ∏ CUSTOMER_NAME (DEPOSITOR)

Output:
CUSTOMER_NAME
Johnson
Smith
Hayes
Turner
Jones
Lindsay
Jackson
Curry
Williams
Mayes

4. Set Intersection(∩) :
Set intersection returns only those rows that are common to both
relations.

Notation: R ∩ S

Example: Using the above DEPOSITOR table and BORROW table

Input:
∏ CUSTOMER_NAME (BORROW) ∩ ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Smith
Jones

5. Set Difference(-) :
Set difference returns the rows that are present in one relation but not in
the other.

Notation: R - S

Example: Using the above DEPOSITOR table and BORROW table

Input:
∏ CUSTOMER_NAME (BORROW) - ∏ CUSTOMER_NAME (DEPOSITOR)

Output:
CUSTOMER_NAME
Jackson
Hayes
Williams
Curry
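
The three set operations map directly onto Python's built-in set type. The sketch below reproduces the DEPOSITOR and BORROW examples; the variable names are chosen here for illustration only.

```python
DEPOSITOR = [
    ("Johnson", "A-101"), ("Smith", "A-121"), ("Mayes", "A-321"),
    ("Turner", "A-176"), ("Johnson", "A-273"), ("Jones", "A-472"),
    ("Lindsay", "A-284"),
]
BORROW = [
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"),
    ("Jackson", "L-14"), ("Curry", "L-93"), ("Smith", "L-11"),
    ("Williams", "L-17"),
]

# Projection on CUSTOMER_NAME; building a set removes duplicates.
depositors = {name for name, _ in DEPOSITOR}
borrowers = {name for name, _ in BORROW}

union = borrowers | depositors          # customers with a loan or an account
intersection = borrowers & depositors   # customers with both
difference = borrowers - depositors     # customers with a loan but no account

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```

Swapping the operands of the difference gives a different result (depositors without a loan), which is why set difference is not commutative.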

6. Cartesian Product(×) :
The Cartesian product operator combines every row of one table with every row of
another table, producing all possible combinations. It is mostly used as a building
block for more complex operations such as joins.

Notation: E × D

Example:
EMPLOYEE
EMP_ID  EMP_NAME  EMP_DEPT
1       Smith     A
2       Harry     C
3       John      B

DEPARTMENT
DEPT_NO  DEPT_NAME
A        Marketing
B        Sales
C        Legal

Input:
EMPLOYEE × DEPARTMENT
Output:

EMP_ID  EMP_NAME  EMP_DEPT  DEPT_NO  DEPT_NAME
1       Smith     A         A        Marketing
1       Smith     A         B        Sales
1       Smith     A         C        Legal
2       Harry     C         A        Marketing
2       Harry     C         B        Sales
2       Harry     C         C        Legal
3       John      B         A        Marketing
3       John      B         B        Sales
3       John      B         C        Legal
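
As a rough sketch, the Cartesian product can be computed with `itertools.product`. Filtering the product on EMP_DEPT = DEPT_NO then yields a join, which shows why the product is described as a building block for joins. The tuple layout is an assumption for illustration.

```python
from itertools import product

EMPLOYEE = [(1, "Smith", "A"), (2, "Harry", "C"), (3, "John", "B")]
DEPARTMENT = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]

# EMPLOYEE × DEPARTMENT: concatenate every employee tuple with every department tuple.
cross = [emp + dept for emp, dept in product(EMPLOYEE, DEPARTMENT)]

# Selection on the product (EMP_DEPT = DEPT_NO) keeps only the matching pairs.
joined = [row for row in cross if row[2] == row[3]]

print(len(cross))
for row in joined:
    print(row)
```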
7. Rename(ρ) :
The rename operator gives a temporary name to a relation or to its columns.
It is very useful for avoiding ambiguity, especially in complex queries.
Example: We can use the rename operator to rename the STUDENT relation to
STUDENT1.

ρ(STUDENT1, STUDENT)

Equivalence Rules
An equivalence rule says that expressions of two forms are equivalent
because both produce the same output on every legal database instance. It
means that we can replace an expression of the first form with one of the
second form, and an expression of the second form with one of the first form.

The query optimizer uses equivalence rules to transform relational-algebra
expressions into logically equivalent expressions that may be cheaper to
evaluate. For describing each rule, we will use the following symbols:

θ, θ1, θ2 … : Used for denoting the predicates.

L1, L2, L3 … : Used for denoting the list of attributes.

E, E1, E2 …. : Represents the relational-algebra expressions.

Let's discuss a number of equivalence rules:


Rule 1: Cascade of σ
A conjunctive selection can be broken up into a sequence of individual
selections. Such a transformation is known as a cascade of σ.
σ_{θ1 ∧ θ2}(E) = σ_θ1(σ_θ2(E))
Rule 2: Commutative Rule

a) Selection operations are commutative.

σ_θ1(σ_θ2(E)) = σ_θ2(σ_θ1(E))
b) Theta join (⋈_θ) is commutative.
E1 ⋈_θ E2 = E2 ⋈_θ E1
The two sides list the attributes in a different order, so the rule holds only
if the order of attributes is ignored. Natural join is a special case of theta
join, and natural join is also commutative.
Rule 3: Cascade of ∏
In a sequence of projection operations, only the final (outermost) projection
is needed and the others can be omitted. Such a transformation is
referred to as a cascade of ∏.

∏_L1(∏_L2(. . . (∏_Ln(E)) . . . )) = ∏_L1(E), where L1 ⊆ L2 ⊆ . . . ⊆ Ln

Rule 4: We can combine selections with Cartesian products as well as theta joins

1. σ_θ(E1 × E2) = E1 ⋈_θ E2

2. σ_θ1(E1 ⋈_θ2 E2) = E1 ⋈_{θ1 ∧ θ2} E2
Rule 5: Associative Rule

a) Natural join operations are associative.

(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)
b) Theta joins are associative in the following manner:
(E1 ⋈_θ1 E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_θ2 E3)

In the theta-join rule, θ2 involves attributes from E2 and E3 only. Any of the
conditions may be empty, and it follows that the Cartesian product is
also associative.

Rule 6: Distribution of the Selection operation over the Theta join.

The selection operation distributes over the theta-join operation under the
following two conditions:

a) When all the attributes in the selection condition θ0 belong to only one of the
expressions being joined (here E1).
σ_θ0(E1 ⋈_θ E2) = (σ_θ0(E1)) ⋈_θ E2
b) When the selection condition θ1 involves the attributes of E1 only, and θ2 involves
the attributes of E2 only.

σ_{θ1 ∧ θ2}(E1 ⋈_θ E2) = (σ_θ1(E1)) ⋈_θ (σ_θ2(E2))

Rule 7: Distribution of the projection operation over the theta join.

The projection operation distributes over the theta-join operation under the
following two conditions. Let L1 and L2 be sets of attributes of E1 and E2,
respectively.

a) Assume that the join condition θ involves only attributes in L1 ∪ L2.
Then, we get the following expression:

∏_{L1 ∪ L2}(E1 ⋈_θ E2) = (∏_L1(E1)) ⋈_θ (∏_L2(E2))

b) Consider a join E1 ⋈_θ E2. Let L3 be the attributes of E1 that are involved in
the join condition θ but are not in L1 ∪ L2. Similarly, let L4 be the attributes of
E2 that are involved in the join condition θ but are not in L1 ∪ L2.
Thus, we get the following expression:

∏_{L1 ∪ L2}(E1 ⋈_θ E2) = ∏_{L1 ∪ L2}((∏_{L1 ∪ L3}(E1)) ⋈_θ (∏_{L2 ∪ L4}(E2)))

Rule 8: The union and intersection set operations are commutative.

E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1

However, set difference operations are not commutative.

Rule 9: The union and intersection set operations are associative.

(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3)
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3)

Rule 10: Distribution of selection operation on the intersection, union, and set
difference operations.

The below expression shows the distribution performed over the set difference
operation.
σ_p(E1 − E2) = σ_p(E1) − σ_p(E2)
The selection operation distributes over ∪ and ∩ in the same way, by replacing − with ∪ or ∩.
Further, we get:

σ_p(E1 − E2) = σ_p(E1) − E2

This second form also holds when − is replaced with ∩, but not when it is replaced with ∪.

Rule 11: Distribution of the projection operation over the union operation.
This rule states that we can distribute the projection operation over the union
operation for the given expressions.

∏_L(E1 ∪ E2) = (∏_L(E1)) ∪ (∏_L(E2))

Apart from these discussed equivalence rules, there are various other equivalence
rules also.
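
An equivalence rule can be spot-checked by evaluating both sides on a small instance. The sketch below does this for Rule 1 (cascade of σ) and Rule 6(a) (pushing a selection below a theta join). The relations, the predicates and the helper names are all assumptions made for this illustration. Agreement on one instance is a sanity check and not a proof.

```python
from itertools import product


def select(relation, predicate):
    """sigma: keep the tuples that satisfy the predicate."""
    return {t for t in relation if predicate(t)}


def theta_join(left, right, theta):
    """Theta join: concatenate the pairs of tuples that satisfy theta."""
    return {a + b for a, b in product(left, right) if theta(a, b)}


# E1(emp_id, dept, salary) and E2(dept_no, dept_name): hypothetical sample data.
E1 = {(1, "A", 900), (2, "C", 1500), (3, "B", 2000), (4, "A", 1200)}
E2 = {("A", "Marketing"), ("B", "Sales"), ("C", "Legal")}


def theta1(t):
    return t[2] > 1000      # salary > 1000, uses an attribute of E1 only


def theta2(t):
    return t[1] != "B"      # dept <> 'B'


def join_condition(a, b):
    return a[1] == b[0]     # E1.dept = E2.dept_no


# Rule 1: sigma_{theta1 AND theta2}(E1) = sigma_theta1(sigma_theta2(E1))
rule1_lhs = select(E1, lambda t: theta1(t) and theta2(t))
rule1_rhs = select(select(E1, theta2), theta1)

# Rule 6(a): sigma_theta1(E1 JOIN E2) = sigma_theta1(E1) JOIN E2
rule6_lhs = select(theta_join(E1, E2, join_condition), theta1)
rule6_rhs = theta_join(select(E1, theta1), E2, join_condition)

print(rule1_lhs == rule1_rhs, rule6_lhs == rule6_rhs)
```

The right-hand side of Rule 6(a) joins three tuples of E1 instead of four, which is the reason optimizers prefer to apply selections early.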

Joins in SQL – Detailed Explanation with Examples

What are Joins in SQL?


A JOIN in SQL is used to combine rows from two or more tables based on a related
column between them. It helps retrieve data that exists in multiple tables and establish
relationships between them.
Types of Joins
The most common types of joins are:
1. INNER JOIN – Returns matching rows from both tables.
2. LEFT JOIN (LEFT OUTER JOIN) – Returns all rows from the left table and
matching rows from the right table.
3. RIGHT JOIN (RIGHT OUTER JOIN) – Returns all rows from the right table and
matching rows from the left table.
4. FULL JOIN (FULL OUTER JOIN) – Returns all rows from both tables; unmatched
rows will have NULL values.

Example Tables
Let's consider two tables:
1. Employees Table
employee_id  name     department_id
1            Alice    101
2            Bob      102
3            Charlie  NULL
4            David    104

2. Departments Table
department_id  department_name
101            HR
102            IT
103            Finance

1. INNER JOIN
Definition:
• Returns only the matching rows from both tables.
• Rows without a match are excluded.

Query:
SELECT employees.employee_id, employees.name, departments.department_name
FROM employees
INNER JOIN departments
ON employees.department_id = departments.department_id;
Output:
employee_id  name   department_name
1            Alice  HR
2            Bob    IT

Explanation:
• Alice’s department_id = 101 matches with HR.
• Bob’s department_id = 102 matches with IT.
• Charlie (department_id NULL) and David (department_id 104) are excluded because
neither value matches a row in Departments.

2. LEFT JOIN (LEFT OUTER JOIN)


Definition:
• Returns all rows from the left table (Employees) and matching rows from the right
table (Departments).
• If there is no match, it returns NULL for columns from the right table.

Query:
SELECT employees.employee_id, employees.name, departments.department_name
FROM employees
LEFT JOIN departments
ON employees.department_id = departments.department_id;
Output:
employee_id  name     department_name
1            Alice    HR
2            Bob      IT
3            Charlie  NULL
4            David    NULL

Explanation:
• Alice and Bob have matching department_id values, so they get their department
names.
• Charlie and David do not have a matching department_id, so NULL is returned for
department_name.

3. RIGHT JOIN (RIGHT OUTER JOIN)


Definition:
• Returns all rows from the right table (Departments) and matching rows from the left
table (Employees).
• If there is no match, it returns NULL for columns from the left table.

Query:
SELECT employees.employee_id, employees.name, departments.department_name
FROM employees
RIGHT JOIN departments
ON employees.department_id = departments.department_id;
Output:
employee_id  name   department_name
1            Alice  HR
2            Bob    IT
NULL         NULL   Finance

Explanation:
• Alice and Bob have matching department_id values, so their names appear.
• The Finance department has no matching employees, so NULL appears for
employee_id and name.

4. FULL JOIN (FULL OUTER JOIN)


Definition:
• Returns all rows from both tables.
• If a row does not have a match, it will return NULL for missing values.

Query:
SELECT employees.employee_id, employees.name, departments.department_name
FROM employees
FULL JOIN departments
ON employees.department_id = departments.department_id;
Output:
employee_id  name     department_name
1            Alice    HR
2            Bob      IT
3            Charlie  NULL
4            David    NULL
NULL         NULL     Finance

Explanation:
• Alice and Bob have matching department_id values, so they appear normally.
• Charlie and David have no matching department, so NULL is returned for department_name.
• The Finance department has no employees, so NULL is returned for employee_id and
name.
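
The INNER JOIN and LEFT JOIN results can be reproduced with Python's built-in `sqlite3` module, as sketched below. The in-memory database and the table aliases are assumptions for this illustration. RIGHT JOIN and FULL JOIN are left out because SQLite supports them only from version 3.39 onwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_id INTEGER, name TEXT, department_id INTEGER);
CREATE TABLE departments (department_id INTEGER, department_name TEXT);
INSERT INTO employees VALUES
    (1, 'Alice', 101), (2, 'Bob', 102), (3, 'Charlie', NULL), (4, 'David', 104);
INSERT INTO departments VALUES (101, 'HR'), (102, 'IT'), (103, 'Finance');
""")

# The same query is run with two join types; ORDER BY makes the output stable.
QUERY = """
SELECT e.employee_id, e.name, d.department_name
FROM employees e
{join_type} JOIN departments d ON e.department_id = d.department_id
ORDER BY e.employee_id
"""

inner_rows = conn.execute(QUERY.format(join_type="INNER")).fetchall()
left_rows = conn.execute(QUERY.format(join_type="LEFT")).fetchall()

print(inner_rows)
print(left_rows)  # SQL NULL is returned to Python as None
```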

Advanced Query Optimization in DBMS


Query optimization is the process of analyzing a query and choosing an execution
plan that computes its result using fewer resources. The main goal of
query optimization is to find an execution plan that reduces the time
required to process the query.
Two main objectives of Query Optimization are:
• Determine the optimal plan to access the database.
• Reduce the time required to execute the query.
Components of Optimizer
The optimizer has three components:
• Transformer
• Estimator
• Plan Generator

[Figure: Components of the optimizer]
Let's discuss each one by one:
Transformer: It takes the parsed query as input, represented as a set of query
blocks, and determines whether it is advantageous to change the form of the query
to reduce the cost of execution.
Estimator: It determines the overall cost of an execution plan. The estimator uses
three different measures to determine cost:
• Selectivity: The fraction of rows in a row set that satisfy a predicate.
• Cardinality: The number of rows returned by each operation in an
execution plan.
• Cost: The estimated resource consumption of a plan.
To estimate cost, the optimizer uses the following factors:
• System resources (CPU, Memory and I/O)
• Cardinality
• Size of initial data set
Plan Generator: It explores various plans for a query block by trying different access
paths, join methods and join orders. After comparing them, the optimizer picks the
plan with the lowest cost.

Methods Of Query Optimization in DBMS


There are following two methods of Query Optimization in DBMS:
• Cost Based Query Optimization in DBMS
• Adaptive Query Optimization
Cost Based Query Optimization in DBMS
In Cost Based Query Optimization, the optimizer associates a numerical value (known
as cost) with each step of a feasible plan for a given query. These values are then
combined to get a cost estimate for that plan. After estimating the cost of all
feasible plans, the optimizer picks the plan with the lowest cost estimate.
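
The idea of costing each feasible plan and keeping the cheapest can be sketched as follows. The statistics, the cost constants and the two-plan search space are hypothetical values chosen only for illustration. Real optimizers use far more detailed cost models.

```python
# Hypothetical statistics and cost constants (assumptions, not real DBMS values).
TABLE_ROWS = 100_000
ROWS_PER_PAGE = 100
SEQUENTIAL_PAGE_COST = 1.0   # cost of reading one page in a sequential scan
RANDOM_PAGE_COST = 4.0       # cost of one random page read through an index


def full_scan_cost():
    """Read every page of the table once."""
    return (TABLE_ROWS / ROWS_PER_PAGE) * SEQUENTIAL_PAGE_COST


def index_scan_cost(selectivity):
    """Pessimistic model: one random page read per matching row."""
    estimated_cardinality = TABLE_ROWS * selectivity
    return estimated_cardinality * RANDOM_PAGE_COST


def choose_plan(selectivity):
    """Cost every feasible plan and return the name of the cheapest one."""
    plans = {
        "full table scan": full_scan_cost(),
        "index scan": index_scan_cost(selectivity),
    }
    return min(plans, key=plans.get)


print(choose_plan(0.001))  # highly selective predicate
print(choose_plan(0.5))    # predicate that matches half the table
```

Under this model the index scan wins only while the predicate is selective enough. That is why the estimator's selectivity and cardinality figures matter to the final choice.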
Adaptive Query Optimization in DBMS
In Adaptive Query Optimization, the optimizer is allowed to change an execution plan
at run time and to gather new information that improves later optimizations. It is
helpful when the existing statistics are not sufficient to generate a good plan.
[Figure: Feature set for Adaptive Query Optimization]


Advanced Query Optimization Techniques
Query Explainers: Query explainer tools show how the database executes a query,
which helps in understanding and optimizing the query plan.
Example: In SQL, EXPLAIN is an example of a query explainer
EXPLAIN SELECT * FROM department WHERE students > 80
Index Optimizations: When creating indexes, it is important to choose a suitable index
type. A suitable index improves search performance, avoids full table scans and
reduces resource consumption.
Batch Query: Processing multiple queries in the same batch reduces system overhead
by minimizing the number of database connections and round trips. Multiple operations
can also be processed in a single transaction, which further reduces overhead.
In-Memory Storage: In-memory databases can speed up read operations for queries
that require low latency. They are also helpful for caching the results of frequently
executed queries.
Data Denormalization: Data denormalization reduces the need for complex joins and
so improves read performance. It suits read-intensive systems in which the same
queries are executed frequently.
Bitmap Index Usage: A bitmap index is useful when a field has a limited number of
distinct values. It is effective for queries that filter on fields with
low cardinality.
Automatic Tuning Optimizers
Optimizers perform different actions depending on how they are invoked.
There are two modes:
• Normal Optimization: The optimizer parses the query and produces an execution
plan within a specific time limit.
• SQL Tuning Advisor Optimization: The optimizer performs additional analysis to
produce a more efficient plan. Its output is a series of actions, along with their
expected benefits, for improving the plan.

Query Processing in DBMS


Query processing is the set of activities involved in extracting data from the
database. Fetching the data takes several steps. The steps involved are:

1. Parsing and translation


2. Optimization
3. Evaluation

The query processing works in the following way:

Parsing and Translation


Query processing includes several activities for data retrieval. Users write their
queries in a high-level database language such as SQL. Before a query can be
evaluated, the system must translate it into an expression that can be used at the
physical level of the file system. A variety of query-optimizing transformations and
the actual evaluation of the query then take place. SQL (Structured Query Language)
is well suited to humans because it is readable, but it is not well suited as the
internal representation of a query inside the system. Relational algebra is well
suited for the internal representation of a query. The translation step works like
the parser of a compiler. When a user executes a query, the parser checks the syntax
of the query and verifies that the relation names and attribute names used in it
exist in the database. The parser then builds a tree of the query, known as the
'parse tree', and translates it into relational algebra. During this translation,
every use of a view in the query is replaced by the expression that defines the view.

[Figure: Working of query processing]
Suppose a user executes a query. In SQL, suppose the user wants to fetch the names
of the employees whose salary is greater than 10000. The following query does this:
select emp_name from Employee where salary>10000;
To make the system understand the user query, it needs to be translated into
relational algebra. The query can be written in relational algebra in more than one
equivalent way, for example:
o π_emp_name (σ_salary>10000 (Employee))
o π_emp_name (σ_salary>10000 (π_emp_name, salary (Employee)))
After translating the given query, we can execute each relational algebra operation
using one of several different algorithms. This is how query processing begins.
Evaluation
Besides translating the query into relational algebra, the system must annotate
the translated expression with instructions that specify how to evaluate each
operation. After translating the user query, the system then executes a query
evaluation plan.
Query Evaluation Plan
o In order to fully evaluate a query, the system needs to construct a query evaluation
plan.
o The annotations in the evaluation plan may refer to the algorithms to be used for the
particular index or the specific operations.
o Such relational algebra with annotations is referred to as Evaluation Primitives. The
evaluation primitives carry the instructions needed for the evaluation of the operation.
o Thus, a query evaluation plan defines a sequence of primitive operations used for
evaluating a query. The query evaluation plan is also referred to as the query
execution plan.
o A query execution engine is responsible for generating the output of the given query.
It takes the query execution plan, executes it, and finally makes the output for the user
query.
Optimization
o The cost of query evaluation can vary from one evaluation plan to another. Because the
system is responsible for constructing the evaluation plan, the user does not need to
write the query efficiently.
o Usually, a database system generates an efficient query evaluation plan, one that
minimizes cost. This task is performed by the database system and is known
as Query Optimization.
o To optimize a query, the query optimizer needs an estimated cost for
each operation, because the overall cost depends on the memory
allocated to the operations, their execution costs, and so on.
Finally, after selecting an evaluation plan, the system evaluates the query and
produces the output of the query.

Query Optimization Algorithms in RDBMS – Detailed Explanation


What is Query Optimization?
Query Optimization in Relational Database Management Systems (RDBMS) is the
process of choosing the most efficient way to execute a SQL query. The database
engine evaluates different execution plans and selects the one with the lowest cost in
terms of time, CPU, and memory usage.

1. Phases of Query Optimization


Query optimization happens in multiple stages:

a. Query Parsing and Translation


• The SQL query is parsed to check for syntax errors.
• The query is converted into an internal representation (e.g., relational algebra).

b. Query Rewriting (Logical Optimization)


• The optimizer transforms the query to an equivalent but more efficient form.
• Example:
SELECT * FROM employees WHERE age > 25 AND age > 30;
Can be optimized as:
SELECT * FROM employees WHERE age > 30;

c. Query Execution Plan Generation (Physical Optimization)


• The optimizer analyzes different execution plans using various strategies.
• The best plan is selected based on cost estimation.

2. Query Optimization Algorithms

(A) Heuristic-Based Optimization (Rule-Based Optimization)


Heuristic optimization uses predefined rules to transform queries into a more
efficient format.

Techniques in Heuristic Optimization

1. Predicate Pushdown – Move conditions as close to data retrieval as possible.


SELECT * FROM employees WHERE age > 30 AND department_id = 101;
o Instead of scanning all employees, the optimizer first filters department_id =
101, then checks age > 30.

2. Join Reordering – Change the order of joins to optimize performance.


o If Table A has 1 million rows and Table B has 1,000 rows, join Table B first
for efficiency.

3. Projection Pushdown – Select only necessary columns early.


SELECT name FROM employees WHERE department_id = 101;
o Instead of fetching all columns and then filtering, the optimizer retrieves only
the name column.

4. Eliminating Redundant Expressions


SELECT name, department_id FROM employees WHERE age > 30 AND age > 25;
o age > 25 is redundant because every row with age > 30 also satisfies it.

(B) Cost-Based Optimization (CBO)


Cost-Based Optimization (CBO) analyzes different query execution plans and
selects the most efficient one based on cost estimation.

Components of CBO
• Statistics Collection – The optimizer uses statistics such as:
o Table size
o Indexes
o Data distribution (histograms)
o Number of rows
• Selectivity Estimation – Determines the fraction of rows that meet a condition.
• Cost Estimation – Computes the cost of different execution plans using:
o I/O cost (disk reads and writes)
o CPU cost (time to process tuples)
o Memory cost (amount of RAM used)

Example of Cost-Based Optimization


Query:
SELECT * FROM orders WHERE customer_id = 101;
• If customer_id is indexed, the optimizer prefers an index scan over a full table scan.
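
One way to observe this choice is SQLite's EXPLAIN QUERY PLAN statement, run here through Python's `sqlite3` module. The table contents and the index name `idx_orders_customer` are assumptions for this sketch. The exact wording of the plan text differs between SQLite versions, so the sketch only prints it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 500, i * 1.5) for i in range(5000)],
)


def plan(sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT * FROM orders WHERE customer_id = 101"

plan_without_index = plan(query)  # no index yet: the table is scanned
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_with_index = plan(query)     # the optimizer can now search the index

print(plan_without_index)
print(plan_with_index)
```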

3. Query Execution Strategies (Algorithms)


(A) Table Access Methods
1. Full Table Scan (Sequential Scan)
o Reads the entire table.
o Used when: No index is present or when fetching a large portion of data.
2. Index Scan
o Uses an index to locate relevant rows quickly.
o Used when: A highly selective condition exists (WHERE customer_id = 101).
3. Index Seek
o Efficiently finds specific values in a B-tree index.
o Used when: Exact matches or range scans (BETWEEN queries).

(B) Join Algorithms


1. Nested Loop Join
o Algorithm: Iterates through each row in the first table and compares it with
each row in the second table.
o Efficiency: Slow for large tables.
o Example:
SELECT e.name, d.department_name
FROM employees e
JOIN departments d ON e.department_id = d.department_id;
o If employees has 10,000 rows and departments has 100 rows, this results in
10,000 × 100 = 1,000,000 comparisons.
2. Hash Join
o Algorithm:
▪ Builds a hash table from one table.
▪ Probes the hash table using rows from the second table.
o Best for: Large datasets.
o Example:
SELECT * FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;
o If customers is smaller, it is used to build a hash table for quick lookups.
3. Merge Join
o Algorithm:
▪ Sorts both tables on the join column.
▪ Merges them efficiently in a single pass.
o Best for: Pre-sorted datasets.
o Example:
SELECT * FROM students s
JOIN marks m ON s.student_id = m.student_id;
o If both tables are already sorted by student_id, no extra sort is needed and a merge join is usually an efficient choice.
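
The three join algorithms can be sketched in Python as below. This is a simplified in-memory illustration for equality joins. The function names and the tuple layout are assumptions, and rows with a NULL (None) join key never match, as in SQL.

```python
from collections import defaultdict

employees = [(1, "Alice", 101), (2, "Bob", 102), (3, "Charlie", None), (4, "David", 104)]
departments = [(101, "HR"), (102, "IT"), (103, "Finance")]


def nested_loop_join(left, right, lkey, rkey):
    """Compare every left row with every right row: len(left) * len(right) checks."""
    return [l + r for l in left for r in right
            if l[lkey] is not None and l[lkey] == r[rkey]]


def hash_join(left, right, lkey, rkey):
    """Build a hash table on the right input, then probe it with each left row."""
    table = defaultdict(list)
    for r in right:
        if r[rkey] is not None:
            table[r[rkey]].append(r)
    return [l + r for l in left for r in table.get(l[lkey], [])]


def merge_join(left, right, lkey, rkey):
    """Sort both inputs on the join key, then merge them in one pass."""
    left = sorted((t for t in left if t[lkey] is not None), key=lambda t: t[lkey])
    right = sorted((t for t in right if t[rkey] is not None), key=lambda t: t[rkey])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][lkey], right[j][rkey]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows with this key and pair it with each left row.
            j_end = j
            while j_end < len(right) and right[j_end][rkey] == lk:
                j_end += 1
            while i < len(left) and left[i][lkey] == lk:
                out.extend(left[i] + r for r in right[j:j_end])
                i += 1
            j = j_end
    return out


nested = nested_loop_join(employees, departments, 2, 0)
hashed = hash_join(employees, departments, 2, 0)
merged = merge_join(employees, departments, 2, 0)
print(nested == hashed == merged, nested)
```

All three return the same rows. They differ in how much work they do: the nested loop compares every pair, the hash join reads each input once, and the merge join pays for sorting unless the inputs are already ordered.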

(C) Sorting Algorithms


1. External Merge Sort
o Used for large datasets that don’t fit into memory.
o Sorts data in chunks, then merges them.
2. Index Sorting
o If an index exists on the sorting column, the optimizer avoids sorting
manually.
Example:
SELECT * FROM products ORDER BY price;
• If an index exists on price, the rows can be read in index order, so no separate sort step is needed.
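
External merge sort can be sketched as follows: sorted runs of limited size are written to temporary files and then combined with a k-way merge. The helper names and the chunk size are assumptions for illustration. For brevity the merged result is returned as a list, whereas a real system would stream it back to disk.

```python
import heapq
import os
import tempfile


def write_run(sorted_chunk):
    """Write one sorted run to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".run", text=True)
    with os.fdopen(fd, "w") as run_file:
        run_file.writelines(f"{value}\n" for value in sorted_chunk)
    return path


def external_merge_sort(values, chunk_size):
    """Sort integers using sorted runs on disk followed by a k-way merge."""
    run_paths = []
    chunk = []
    for value in values:
        chunk.append(value)
        if len(chunk) == chunk_size:          # memory budget reached: flush a run
            run_paths.append(write_run(sorted(chunk)))
            chunk = []
    if chunk:
        run_paths.append(write_run(sorted(chunk)))

    run_files = [open(path) for path in run_paths]
    try:
        runs = [(int(line) for line in run_file) for run_file in run_files]
        # heapq.merge lazily merges the already sorted runs.
        return list(heapq.merge(*runs))
    finally:
        for run_file in run_files:
            run_file.close()
        for path in run_paths:
            os.remove(path)


prices = [40, 10, 30, 20, 60, 50, 15]
sorted_prices = external_merge_sort(prices, chunk_size=3)
print(sorted_prices)
```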
(D) Aggregation Algorithms
1. Hash Aggregation
o Uses a hash table to store grouped values.
o Efficient for large datasets.
2. Sort-Based Aggregation
o Sorts the dataset first, then applies aggregation.
Example:
SELECT department_id, COUNT(*) FROM employees GROUP BY department_id;
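• If department_id is indexed, the rows can be read already ordered by department_id, so sort-based aggregation needs no extra sort. Without such an order, hash aggregation is often the cheaper choice.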
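
Both aggregation strategies can be sketched for the COUNT(*) ... GROUP BY query above. The input list and the function names are assumptions for illustration only.

```python
from itertools import groupby

# department_id of each employee row (hypothetical sample data).
department_ids = [101, 102, 101, 104, 102, 101]


def hash_aggregate(keys):
    """Keep one counter per group in a hash table; the input order does not matter."""
    counts = {}
    for key in keys:
        counts[key] = counts.get(key, 0) + 1
    return counts


def sort_aggregate(keys):
    """Sort first so equal keys are adjacent, then count each run of equal keys."""
    return {key: sum(1 for _ in group) for key, group in groupby(sorted(keys))}


hash_counts = hash_aggregate(department_ids)
sort_counts = sort_aggregate(department_ids)
print(hash_counts)
print(sort_counts)
```

If the input already arrives ordered by the grouping column, the `sorted` call in `sort_aggregate` becomes unnecessary and the groups can be counted in a single pass.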

4. Query Optimization Techniques


(A) Index Optimization
• Use B-tree indexes for range queries (BETWEEN).
• Use bitmap indexes for low-cardinality columns (gender).
(B) Partitioning
• Split large tables into partitions for faster access.
Example:
PARTITION BY RANGE (order_date);
• Queries for order_date > '2024-01-01' scan only the partitions that can contain matching rows.
(C) Caching & Materialized Views
• Frequently used queries can be cached or stored as materialized views.
Example:
CREATE MATERIALIZED VIEW top_customers AS
SELECT customer_id, COUNT(*) AS order_count
FROM orders GROUP BY customer_id;
• Instead of recomputing the query, the database reads from the view.

5. Query Optimization Workflow


1. Parse SQL query and convert it to relational algebra.
2. Apply heuristic optimizations (predicate pushdown, projection).
3. Generate multiple execution plans.
4. Estimate cost for each plan.
5. Choose the lowest-cost plan.
6. Execute the query.
