
Module - 1

Query processing involves extracting data from a database through steps such as parsing, optimization, and evaluation, converting high-level queries into low-level expressions. SQL is the most commonly used query language, and SQL queries are translated into relational algebra for execution. Various algorithms, including join and sorting algorithms, are employed to execute query operations efficiently.

ADBMS

Dr. Nitish Kumar Ojha


Query Processing
• Query processing is the set of activities involved in extracting data from a database.

• Fetching data from the database takes several steps:

1. Parsing and translation
2. Optimization
3. Evaluation

• High-level queries are converted into low-level expressions during query processing.

• It is a methodical procedure that spans the physical level of the file system, query optimization, and the actual execution of the query to obtain the result.

• It requires a basic understanding of relational algebra and file organization, and it covers the full range of tasks involved in getting data out of the database.

• It consists of converting high-level database language queries into expressions that can be used at the file system's physical level.
Parsing and Translation

• Query processing includes several activities for data retrieval. Initially, the user's query is written in a high-level database language such as SQL.

• It is then translated into expressions that can be used at the physical level of the file system.

• After this, a variety of query-optimizing transformations and the actual evaluation of the query take place. Thus, before processing a query, the system must translate it from its human-readable form into an internal representation.

• SQL (Structured Query Language) is well suited to humans, but it is not well suited as the system's internal representation of a query.

• When a user executes a query, the parser generates the internal form of the query: it checks the syntax of the query, verifies that the relations named in the query exist in the database, and then checks the attributes the query refers to.

• The parser builds a tree for the query, known as a parse tree, and then translates it into relational algebra.

• In doing so, it also replaces every use of a view in the query with the view's definition.
• Suppose a user wants to fetch the names of the employees whose salary is greater than 10000. In SQL, the query is:
• select emp_name from Employee where salary>10000;

• To make the system understand the user query, it needs to be translated into relational algebra. This query can be written in relational algebra in more than one equivalent way, for example:
• πemp_name (σsalary>10000 (Employee))
• πemp_name (σsalary>10000 (πemp_name, salary (Employee)))
• In both forms the selection is applied while the salary attribute is still available; salary cannot be projected away before the selection that tests it.

• After translating the query, each relational algebra operation can be executed using one of several different algorithms.

• This is how query processing begins.
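The translation of the salary query into selection and projection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `select` and `project` and the three sample rows are hypothetical, and a real DBMS works on disk blocks rather than in-memory lists.

```python
def select(rows, predicate):
    """Sigma: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def project(rows, attributes):
    """Pi: keep only the named attributes, dropping duplicate result rows."""
    seen, result = set(), []
    for row in rows:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result

# Hypothetical sample data for the Employee relation.
employee = [
    {"emp_name": "Asha", "salary": 12000},
    {"emp_name": "Ravi", "salary": 9000},
    {"emp_name": "Meera", "salary": 15000},
]

# Plan A: select first, then project emp_name.
plan_a = project(select(employee, lambda r: r["salary"] > 10000), ["emp_name"])

# Plan B: project the two needed attributes, select, then project emp_name.
plan_b = project(
    select(project(employee, ["emp_name", "salary"]), lambda r: r["salary"] > 10000),
    ["emp_name"],
)
print(plan_a)
print(plan_a == plan_b)
```

Both plans return the same rows, which is what allows an optimizer to choose between equivalent expressions purely on cost.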


Translating SQL Queries into Relational Algebra

• In practice, SQL is the query language that is used in most commercial RDBMSs.

• An SQL query is first translated into an equivalent extended relational algebra expression, represented as a query tree data structure, that is then optimized.

• Typically, SQL queries are decomposed into query blocks, which form the basic units that can be translated into the algebraic operators and optimized.
• A query block contains a single SELECT-FROM-WHERE expression, as well as GROUP BY and HAVING clauses if these are part of the block.

• Hence, nested queries within a query are identified as separate query blocks.

• Because SQL includes aggregate operators, such as MAX, MIN, SUM, and COUNT, these operators must also be included in the extended algebra.
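Consider the following SQL query on the EMPLOYEE relation in the previous figure:

SELECT Lname, Fname
FROM EMPLOYEE
WHERE Salary > ( SELECT MAX (Salary)
                 FROM EMPLOYEE
                 WHERE Dno=5 );

• This query contains two blocks. The inner block is (SELECT MAX (Salary) FROM EMPLOYEE WHERE Dno=5), and the outer block is SELECT Lname, Fname FROM EMPLOYEE WHERE Salary > c, where c stands for the result returned by the inner block.
• The inner block could be translated into the following extended relational algebra expression:
ℱMAX Salary (σDno=5 (EMPLOYEE))
• The outer block could be translated into:
πLname, Fname (σSalary>c (EMPLOYEE))
• The query optimizer would then choose an execution plan for each query block.
• Notice that in the above example, the inner block needs to be evaluated only once to produce the maximum salary of employees in department 5, which is then used, as the constant c, by the outer block.

• This is called an uncorrelated nested query (a nested query without correlation with the outer query).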
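The "evaluate the inner block once, then reuse the constant c" idea can be sketched as follows. The function names `inner_block` and `outer_block` and the sample rows are assumptions made for illustration; they are not part of the source.

```python
# Hypothetical sample rows for the EMPLOYEE relation.
employee = [
    {"Lname": "Smith", "Fname": "John", "Salary": 30000, "Dno": 5},
    {"Lname": "Wong", "Fname": "Franklin", "Salary": 40000, "Dno": 5},
    {"Lname": "Wallace", "Fname": "Jennifer", "Salary": 43000, "Dno": 4},
    {"Lname": "Borg", "Fname": "James", "Salary": 55000, "Dno": 1},
]

def inner_block(rows):
    """Inner block: maximum salary among employees of department 5."""
    return max(r["Salary"] for r in rows if r["Dno"] == 5)

def outer_block(rows, c):
    """Outer block: names of employees whose salary exceeds the constant c."""
    return [(r["Lname"], r["Fname"]) for r in rows if r["Salary"] > c]

c = inner_block(employee)          # evaluated exactly once
result = outer_block(employee, c)  # c is reused for every outer row
print(c, result)
```

Because the inner block does not refer to any attribute of the outer block, its value cannot change from one outer row to the next, so computing it once is safe.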
Basic Algorithms for executing query operations
• Executing query operations draws on several building blocks: relational algebra, join algorithms, sorting algorithms, query decomposition, and query tree transformations.

• Relational algebra
  • A formal framework for defining operations on relational databases
  • Used to manipulate and retrieve data based on user queries
  • Common operations include selection, projection, join, and aggregation

• Join algorithms
  • Used to join two tables together; the optimizer picks the most efficient one for the case at hand
  • Examples of join algorithms include nested loop joins, hash joins, and merge joins

• Sorting algorithms
  • Used to sort a dataset, for example to answer an ORDER BY clause
  • Merge-based algorithms such as external merge sort divide the dataset into smaller parts, sort them individually, and then merge them back together

• Query decomposition
  • Involves rewriting a calculus query in a normalized form
  • Normalization involves manipulating the query quantifiers and qualification by applying logical operators

• Transforming query trees
  • Involves rearranging the query tree, or applying certain rules to it, to obtain a more efficient execution
  • This is done during optimization to reduce the cost of executing the query
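The divide, sort, and merge approach to sorting mentioned above can be sketched as a two-phase merge sort. This is a simplified illustration: the function name and the `run_size` parameter are assumptions, and in-memory lists stand in for the sorted runs a DBMS would write to disk.

```python
import heapq

def external_merge_sort(values, run_size):
    """Sort `values` by sorting fixed-size runs and then merging them."""
    # Phase 1: split the input into runs of at most `run_size` items
    # and sort each run on its own.
    runs = [sorted(values[i:i + run_size]) for i in range(0, len(values), run_size)]
    # Phase 2: k-way merge of the sorted runs.
    return list(heapq.merge(*runs))

salaries = [12000, 9000, 15000, 7000, 11000, 8000, 20000]
print(external_merge_sort(salaries, run_size=3))
```

The run size plays the role of the available memory: each run must fit in memory to be sorted, while the merge phase only needs the current head of each run.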
Join algorithms in Database
• Two basic algorithms for computing the natural join and conditional join of two relations in a database are the nested loop join and the block nested loop join.

• To understand these algorithms, we will assume there are two relations, R and S. Relation R has TR tuples and occupies BR blocks. Relation S has TS tuples and occupies BS blocks. We will also assume relation R is the outer relation and S is the inner relation.
Nested Loop Join
• In the nested loop join algorithm, each tuple in the outer relation is compared with every tuple in the inner relation; only then is the next tuple of the outer relation considered.

• All pairs of tuples which satisfy the join condition are added to the result of the join.
• This algorithm is called nested loop join because it consists of nested for loops.
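A minimal sketch of both algorithms follows, with relations held as Python lists of dictionaries. The function names, the `block_size` parameter, and the sample tuples are assumptions for illustration; list slices stand in for disk blocks.

```python
def nested_loop_join(outer, inner, condition):
    """For each outer tuple, scan the whole inner relation."""
    result = []
    for r in outer:
        for s in inner:
            if condition(r, s):
                result.append({**r, **s})
    return result

def block_nested_loop_join(outer, inner, condition, block_size):
    """Same pairs are compared, but the inner relation is scanned
    once per outer block instead of once per outer tuple."""
    result = []
    for i in range(0, len(outer), block_size):
        outer_block = outer[i:i + block_size]
        for j in range(0, len(inner), block_size):
            inner_block = inner[j:j + block_size]
            for r in outer_block:
                for s in inner_block:
                    if condition(r, s):
                        result.append({**r, **s})
    return result

# Hypothetical sample relations R (outer) and S (inner).
R = [{"ID": 1, "Name": "John"}, {"ID": 2, "Name": "Sarah"}, {"ID": 3, "Name": "David"}]
S = [{"ID": 1, "Address": "123 Main St."}, {"ID": 2, "Address": "456 Elm St."},
     {"ID": 4, "Address": "789 Oak St."}]

same_id = lambda r, s: r["ID"] == s["ID"]
nlj = nested_loop_join(R, S, same_id)
bnlj = block_nested_loop_join(R, S, same_id, block_size=2)
print(nlj)
```

In the worst case, with memory for only one block of each relation, the tuple-at-a-time version reads the inner relation once per outer tuple (about TR × BS + BR block transfers), while the block version reads it once per outer block (about BR × BS + BR), which is why the block variant is usually preferred.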
• Join operation:
A join operation combines data from two or more tables based on a common
column or columns. The join operation is performed using the JOIN keyword in
SQL, and it returns a single result set that contains columns from all the tables
involved in the join.
• For example, let’s say we have two tables, Table1 and Table2, with the following
data:
• Table1
ID | Name
1 | John
2 | Sarah
3 | David
• Table2
ID | Address
1 | 123 Main St.
2 | 456 Elm St.
4 | 789 Oak St.
• If we want to combine the data from these two tables based on the ID column, we can
perform an inner join using the following SQL query:
• SELECT Table1.ID, Table1.Name, Table2.Address
FROM Table1
INNER JOIN Table2
ON Table1.ID = Table2.ID
• Result:
ID | Name | Address
1 | John | 123 Main St.
2 | Sarah | 456 Elm St.
• If we want to retrieve the names of the people who have an address
in Table2, we can use a nested query as follows:
• SELECT Name
FROM Table1
WHERE ID IN (SELECT ID FROM Table2)
• Result:
Name
John
Sarah
• In this case, the nested query is executed first, and it returns the ID
values of the rows in Table2. These ID values are then used to
evaluate the outer query, which retrieves the names of the people
who have those ID values in Table1.
• The choice between using a join operation or a nested query depends on
the specific requirements of the task at hand.

• Joins are often faster and more efficient for large datasets, but nested
queries can be more flexible and allow for more complex conditions to be
evaluated.
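The two queries above can be tried out with Python's built-in `sqlite3` module. This is a sketch under the assumption of an in-memory SQLite database; the ORDER BY clauses are added only to make the row order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Table2 (ID INTEGER PRIMARY KEY, Address TEXT);
    INSERT INTO Table1 VALUES (1, 'John'), (2, 'Sarah'), (3, 'David');
    INSERT INTO Table2 VALUES (1, '123 Main St.'), (2, '456 Elm St.'), (4, '789 Oak St.');
""")

# Inner join on the common ID column.
join_rows = conn.execute("""
    SELECT Table1.ID, Table1.Name, Table2.Address
    FROM Table1
    INNER JOIN Table2 ON Table1.ID = Table2.ID
    ORDER BY Table1.ID
""").fetchall()

# Nested query: names of people whose ID appears in Table2.
nested_rows = conn.execute("""
    SELECT Name FROM Table1
    WHERE ID IN (SELECT ID FROM Table2)
    ORDER BY ID
""").fetchall()

print(join_rows)
print(nested_rows)
```

David (ID 3) has no matching row in Table2 and the address with ID 4 has no matching person, so neither appears in either result.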
Linear search (brute force)
• Linear search is a brute-force approach where elements in the list or array are
sequentially checked from the beginning to the end until the desired
element is found. The algorithm compares each element with the target value
until a match is found or the entire list has been traversed.

• Linear search can be used to search for the smallest or largest value in an
unsorted list rather than searching for a match. It can do so by keeping track of
the largest (or smallest) value and updating as necessary as the algorithm
iterates through the dataset.
• Working of Linear Search Algorithm in Data Structures

• Let's suppose we need to find element 6 in the given array or list. We will
work according to the above-given algorithm.

• Start from the first element, and compare the key=6 with each element x.
Implementation of Linear Search Algorithm in Different Programming
Languages
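A minimal Python sketch of linear search is given here, together with the variant that tracks the largest value. The function names and the sample array are assumptions; the array used in the original figure is not available.

```python
def linear_search(items, key):
    """Return the index of the first element equal to key, or -1 if absent."""
    for index, x in enumerate(items):
        if x == key:
            return index
    return -1

def linear_max(items):
    """Scan once, keeping track of the largest value seen so far."""
    largest = items[0]
    for x in items[1:]:
        if x > largest:
            largest = x
    return largest

data = [4, 9, 2, 6, 7]  # hypothetical array
print(linear_search(data, 6))
print(linear_max(data))
```

Both functions inspect each element at most once, so the running time grows linearly with the length of the list.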
Query Tree and Query Graph
• Once the alternative access paths for computation of a relational algebra
expression are derived, the optimal access path is determined.

• In a centralized system, query processing is done with the following aim −

• Minimization of response time of query (time taken to produce the results to user’s query).

• Maximize system throughput (the number of requests that are processed in a given amount of time).

• Reduce the amount of memory and storage required for processing.

• Increase parallelism.
Query Parsing and Translation -

• Initially, the SQL query is scanned. Then it is parsed to look for syntactical
errors and correctness of data types.

• If the query passes this step, the query is decomposed into smaller query
blocks. Each block is then translated to equivalent relational algebra
expression.
Steps for Query Optimization -
• Step 1 − Query Tree Generation

• A query tree is a tree data structure representing a relational algebra expression. The tables of the query are represented as leaf nodes. The relational algebra operations are represented as the internal nodes. The root represents the query as a whole.

• During execution, an internal node is executed whenever its operand tables are available. The node is then replaced by the result table. This process continues for all internal nodes until the root node is executed and replaced by the result table.

• At the root node, we PROJECT (π) the required attributes as the output based on the given conditions.

• Example 1: Build the query tree for a join of R and S on R.P = S.P.
• Step 1: Write the relations you want to operate on as the tree's leaf nodes. Here R and S are the relations.
• Step 2: Add the condition (here R.P = S.P) with the relational algebra operator as an internal node (the parent node of these two leaf nodes).

JOIN R and S where R.P = S.P

• Step 3: Now add the root node, which on execution gives the output of the query.
• Example 2: Suppose we have a query:
• For every project located at ‘Stanford’, list the project number (Pnumber), the
controlling department number (Dnum), and the department manager’s last
name (Lname), address (Address), and birth date (Bdate).
• Step 1: We will begin with executing the first leaf node PROJECT, and the
corresponding internal node σPlocation = ‘Stanford’ as we need these resulting tuples
to execute the next operation.
• Step 2: Similarly, we will execute the leaf node DEPARTMENT and the
intermediate/internal node ⋈ Dnum=Dnumber so that we can move to the next
operation.
• Step 3: We execute the next operation with the leaf node EMPLOYEE and
intermediate node ⋈ Mgr_ssn=Ssn.
• Step 4: Now add the root node i.e., πPnumber, Dnum, Lname, Address,
Bdate to get the output of the query on execution.
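The four steps of Example 2 can be sketched as a small tree structure whose post-order traversal yields the execution order. The `Node` class and `execution_order` function are hypothetical names used for illustration; nothing is actually evaluated, the sketch only shows when each node becomes executable.

```python
class Node:
    """A query tree node: leaves are relations, internal nodes are operations."""
    def __init__(self, label, *children):
        self.label = label
        self.children = children

def execution_order(node):
    """Post-order traversal: a node runs only after all of its operands."""
    order = []
    for child in node.children:
        order.extend(execution_order(child))
    order.append(node.label)
    return order

tree = Node(
    "π Pnumber, Dnum, Lname, Address, Bdate",
    Node(
        "⋈ Mgr_ssn=Ssn",
        Node(
            "⋈ Dnum=Dnumber",
            Node("σ Plocation='Stanford'", Node("PROJECT")),
            Node("DEPARTMENT"),
        ),
        Node("EMPLOYEE"),
    ),
)

for step in execution_order(tree):
    print(step)
```

The selection on PROJECT appears before either join in the printed order, which matches the description: the filtered project tuples must exist before the join with DEPARTMENT can run.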
• Features of Query Tree in Relational Algebra –

1. Hierarchical Structure: Query trees organize relational algebra operations into a hierarchical structure, making it easier to understand the sequence of operations in a query.

2. Visualization: A query tree provides a visual representation of relational algebra expressions, which helps in debugging queries and understanding query execution plans.

3. Optimization Potential: Query trees allow database systems to apply optimization techniques such as reordering operations or using alternative access paths to improve query performance.
• Advantages of Query Tree in Relational Algebra –

1. Optimization: Query trees enable query optimization by allowing the database system to explore different execution plans, potentially speeding up query execution.

2. Modularity: Query trees break complex queries into smaller and more manageable components, which can facilitate the optimization process and make it easier to reason about query execution.

3. Flexible Optimization Strategies: Database systems can use query trees to implement various optimization strategies, such as join reordering, predicate pushdown, and index selection, to improve query performance.
• Disadvantages of Query Tree in Relational Algebra –

1.Complexity: Query trees can be complex, especially for large and complex
queries. Managing and optimizing these trees can require significant
computational resources and sophisticated optimization algorithms.

2. Overhead: Building and traversing the query tree imposes overhead on query processing. Although this overhead is usually negligible for small queries, it can be significant for larger queries with many operations.

3.Limited Optimizations: Despite their ability to optimize, query trees may not
always yield significant performance improvements. In some cases, the
overhead associated with optimization may exceed the performance gain
achieved through query optimization.
• The performance of a query plan is determined largely by the order in
which the tables are joined.

• For example, when joining 3 tables A, B, C of size 10 rows, 10,000 rows, and
1,000,000 rows, respectively, a query plan that joins B and C first can take
several orders-of-magnitude more time to execute than one that joins A and C
first.

• Most query optimizers determine join order via a dynamic programming algorithm pioneered by IBM's System R database project.
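The idea of choosing a join order by dynamic programming can be sketched as follows. This is a deliberately simplified illustration, not the System R algorithm itself: it considers only left-deep plans, ignores access paths and sort orders, and uses an assumed cost model (the sum of estimated intermediate result sizes). The table sizes come from the example above; the selectivity values are invented for the sketch.

```python
from itertools import combinations

def best_join_order(card, sel):
    """Dynamic programming over subsets of tables (left-deep plans only).

    card: {table: row count}
    sel:  {frozenset({a, b}): assumed join selectivity}; a missing pair
          means no join predicate, i.e. a Cartesian product (1.0).
    Returns (cost, estimated_rows, order) for the cheapest plan found.
    """
    tables = sorted(card)
    # best[subset] = (cost so far, estimated rows, join order)
    best = {frozenset([t]): (0.0, float(card[t]), (t,)) for t in tables}
    for size in range(2, len(tables) + 1):
        for subset in map(frozenset, combinations(tables, size)):
            for last in sorted(subset):
                rest = subset - {last}
                cost, rows, order = best[rest]
                factor = 1.0
                for t in rest:
                    factor *= sel.get(frozenset({t, last}), 1.0)
                new_rows = rows * card[last] * factor
                candidate = (cost + new_rows, new_rows, order + (last,))
                if subset not in best or candidate[0] < best[subset][0]:
                    best[subset] = candidate
    return best[frozenset(tables)]

card = {"A": 10, "B": 10_000, "C": 1_000_000}
# Assumed selectivities, chosen only for illustration.
sel = {frozenset({"A", "C"}): 1e-6, frozenset({"B", "C"}): 1e-4}

cost, rows, order = best_join_order(card, sel)
print(order, round(cost))
```

Under these assumed numbers, joining A with C first keeps the intermediate result tiny, whereas joining B with C first produces about a million intermediate rows, so the dynamic program settles on a plan that starts with A and C.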
• The major reasons for SQL Query Optimizations are:

• Enhancing Performance: The main reason for SQL query optimization is to reduce response time and improve query performance. The time between request and response needs to be minimized for a better user experience.

• Reduced Execution Time: Query optimization reduces CPU time, so results are obtained faster. This helps websites respond quickly without significant lag.

• Enhanced Efficiency: Query optimization reduces the time spent on hardware, so servers run efficiently with lower power and memory consumption.
• Best Practices For SQL Query Optimization –

1. Use indexes
2. Use the WHERE clause instead of HAVING where possible
3. Avoid queries inside a loop
4. Select only the needed columns instead of SELECT *
5. Add EXPLAIN to the beginning of queries
6. Keep wildcards at the end of phrases
7. Use EXISTS instead of COUNT() to test for existence
8. Avoid Cartesian products
9. Consider denormalization
10. Optimize JOIN operations
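Two of these practices, using indexes and inspecting the plan with EXPLAIN, can be tried together with Python's built-in `sqlite3` module. This is a sketch assuming SQLite, whose statement is `EXPLAIN QUERY PLAN`; other systems use different EXPLAIN syntax and output. The table contents and the index name `idx_employee_salary` are made up for the example, and the exact plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee (emp_name, salary) VALUES (?, ?)",
    [(f"emp{i}", 5000 + i * 10) for i in range(1000)],
)

query = "SELECT emp_name FROM Employee WHERE salary > 10000"

def plan(sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

plan_before = plan(query)  # no index yet: expected to be a full table scan
conn.execute("CREATE INDEX idx_employee_salary ON Employee (salary)")
plan_after = plan(query)   # expected to search via the new index

print(plan_before)
print(plan_after)
```

Comparing the two plan strings shows whether the optimizer actually picked up the index, which is the main reason to run EXPLAIN before and after a tuning change.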
Functional dependencies in DBMS
• In relational database management, a functional dependency is a constraint that specifies the relationship between two sets of attributes, where one set determines the value of the other.

• It is denoted as X → Y, where the attribute set on the left side of the arrow, X, is called the determinant, and Y is called the dependent.
• What is Functional Dependency?

• A functional dependency occurs when one attribute uniquely determines another attribute within a relation. It is a constraint that describes how attributes in a table relate to each other.

• If attribute A functionally determines attribute B, we write this as A → B.

• Functional dependencies are used to mathematically express relations among database entities and are very important for understanding advanced concepts in relational database systems.
• From the above table we can conclude some valid functional dependencies:
• roll_no → {name, dept_name, dept_building}: roll_no determines the values of the fields name, dept_name and dept_building, so this is a valid functional dependency.
• roll_no → dept_name: since roll_no determines the whole set {name, dept_name, dept_building}, it also determines its subset dept_name.
• dept_name → dept_building: dept_name identifies dept_building, since every row with the same dept_name has the same dept_building.
• More valid functional dependencies: roll_no → name, {roll_no, name} → {dept_name, dept_building}, etc.
• Here are some invalid functional dependencies:
• name → dept_name: students with the same name can have different dept_name values, so this is not a valid functional dependency.
• dept_building → dept_name: there can be multiple departments in the same building. For example, in the above table departments ME and EC are in the same building B2, so dept_building → dept_name is an invalid functional dependency.
• More invalid functional dependencies: name → roll_no, {name, dept_name} → roll_no, dept_building → roll_no, etc.
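Whether a functional dependency is violated by a given set of rows can be checked mechanically. The sketch below uses a hypothetical `holds` function and made-up student rows (the source table is not reproduced here); the rows are chosen so that ME and EC share building B2, as the text describes.

```python
def holds(rows, lhs, rhs):
    """Return True if no pair of rows violates the dependency lhs -> rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False  # same determinant value, different dependent value
    return True

# Hypothetical rows, invented for illustration.
students = [
    {"roll_no": 42, "name": "abc", "dept_name": "CO", "dept_building": "A4"},
    {"roll_no": 43, "name": "pqr", "dept_name": "IT", "dept_building": "A3"},
    {"roll_no": 44, "name": "xyz", "dept_name": "CO", "dept_building": "A4"},
    {"roll_no": 45, "name": "xyz", "dept_name": "IT", "dept_building": "A3"},
    {"roll_no": 46, "name": "mno", "dept_name": "EC", "dept_building": "B2"},
    {"roll_no": 47, "name": "jkl", "dept_name": "ME", "dept_building": "B2"},
]

print(holds(students, ["roll_no"], ["name", "dept_name", "dept_building"]))
print(holds(students, ["dept_name"], ["dept_building"]))
print(holds(students, ["name"], ["dept_name"]))
print(holds(students, ["dept_building"], ["dept_name"]))
```

A check like this can only refute a dependency. A dependency that happens to hold in one sample of rows is still invalid if the rules of the application allow a violating row to be inserted later.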
• Types of Functional Dependencies in DBMS –

1.Trivial functional dependency

2.Non-Trivial functional dependency

3.Multivalued functional dependency

4.Transitive functional dependency


For each bike model (bike_model):

1. There is a group of colors (color) and a group of manufacturing years (manuf_year).

2. The colors do not depend on the manufacturing year, and the manufacturing year does not depend on the colors; they are independent.

3. The sets of color and manuf_year are linked only to bike_model.

That's what makes it a multivalued dependency.
• Functional dependency is a very important concept in database management systems for ensuring data consistency and accuracy.

• In this lecture, we discussed the concept behind functional dependencies and why they are important.

• We covered valid and invalid functional dependencies and the most important types of functional dependencies in RDBMS.

• We have also discussed the advantages of FDs.


Normalization
• Normalization is the process of efficiently organizing data in a database with two goals in mind.

• First goal: eliminate redundant data.
  • for example, storing the same data in more than one table.

• Second goal: ensure data dependencies make sense.
  • for example, only storing related data in a table.
Benefits of Normalization
• Less storage space
• Quicker updates
• Less data inconsistency
• Clearer data relationships
• Easier to add data
• Flexible Structure
Normalization
We discuss four normal forms: first, second, third, and Boyce-Codd normal forms (1NF, 2NF, 3NF, and BCNF).

Normalization is a process that “improves” a database design by generating relations that are of higher normal forms.

The objective of normalization: “to create relations where every dependency is on the key, the whole key, and nothing but the key”.
91.2914 68
Normalization
There is a sequence to normal forms:
1NF is considered the weakest,
2NF is stronger than 1NF,
3NF is stronger than 2NF, and
BCNF is considered the strongest

Also,
any relation that is in BCNF, is in 3NF;
any relation in 3NF is in 2NF; and
any relation in 2NF is in 1NF.

Normalization

[Figure: the normal forms shown as nested sets, with 1NF outermost, then 2NF, then 3NF, and BCNF innermost. A relation in BCNF is also in 3NF; a relation in 3NF is also in 2NF; a relation in 2NF is also in 1NF.]

Normalization
We consider a relation in BCNF to be fully normalized.

The benefit of higher normal forms is that update semantics for the affected data are
simplified.

This means that applications required to maintain the database are simpler.

A design that has a lower normal form than another design has more redundancy.
Uncontrolled redundancy can lead to data integrity problems.

First we introduce the concept of functional dependency.

May 2005 91.2914 71


Functional Dependencies
We say an attribute B has a functional dependency on another attribute A if any two records that have the same value for A must also have the same value for B.

We illustrate this as:

A → B

Example: Suppose we keep track of employee email addresses, and we only track one email address for each employee. Suppose each employee is identified by their unique employee number.

We say there is a functional dependency of email address on employee number:

employee number → email address

Functional Dependencies
EmpNum | EmpEmail | EmpFname | EmpLname
123 | [email protected] | John | Doe
456 | [email protected] | Peter | Smith
555 | [email protected] | Alan | Lee
633 | [email protected] | Peter | Doe
787 | [email protected] | Alan | Lee

If EmpNum is the PK then the FDs:
EmpNum → EmpEmail
EmpNum → EmpFname
EmpNum → EmpLname
must exist.

Functional Dependencies
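EmpNum → EmpEmail
EmpNum → EmpFname
EmpNum → EmpLname

[Figure: three different ways you might see FDs depicted: the list of dependencies above, a diagram with arrows from EmpNum to each of EmpEmail, EmpFname, and EmpLname, and a diagram with the arrows drawn over the schema EmpNum, EmpEmail, EmpFname, EmpLname.]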

Determinant
Functional Dependency

EmpNum → EmpEmail

Attribute on the LHS is known as the determinant


• EmpNum is a determinant of EmpEmail

Transitive dependency

Consider attributes A, B, and C, where A → B and B → C.

Functional dependencies are transitive, which means that we also have the functional dependency A → C.
We say that C is transitively dependent on A through B.
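Transitivity is what makes it possible to compute everything a set of attributes determines, usually called the attribute closure. The sketch below is a minimal version of that standard computation; the function name `closure` and the representation of FDs as pairs of Python sets are assumptions made for illustration.

```python
def closure(attributes, fds):
    """Attribute closure: every attribute functionally determined by `attributes`.

    fds is a list of (lhs, rhs) pairs of attribute sets.
    """
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole determinant, we gain the dependent.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(sorted(closure({"A"}, fds)))
```

Starting from A, the first pass adds B and the next adds C, which is exactly the transitive dependency A → C described above.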

First Normal Form
We say a relation is in 1NF if all values stored in the relation are single-valued and atomic.

1NF places restrictions on the structure of relations. Values must be simple.

First Normal Form
The following is not in 1NF:

EmpNum | EmpPhone | EmpDegrees
123 | 233-9876 |
333 | 233-1231 | BA, BSc, PhD
679 | 233-1231 | BSc, MSc

EmpDegrees is a multi-valued field:
employee 679 has two degrees: BSc and MSc
employee 333 has three degrees: BA, BSc, PhD

First Normal Form
EmpNum | EmpPhone | EmpDegrees
123 | 233-9876 |
333 | 233-1231 | BA, BSc, PhD
679 | 233-1231 | BSc, MSc

To obtain 1NF relations we must, without loss of information, replace the above with two relations, Employee and EmployeeDegree.

First Normal Form
Employee
EmpNum | EmpPhone
123 | 233-9876
333 | 233-1231
679 | 233-1231

EmployeeDegree
EmpNum | EmpDegree
333 | BA
333 | BSc
333 | PhD
679 | BSc
679 | MSc

An outer join between Employee and EmployeeDegree will produce the information we saw before.
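The split into two 1NF relations, and the outer join that recovers the original information, can be sketched as follows. The variable names and the `left_outer_join` helper are assumptions for illustration; regrouping the degrees into a list is just one way to show that nothing was lost.

```python
# Rows as in the unnormalized table; EmpDegrees holds a comma-separated list.
unnormalized = [
    (123, "233-9876", ""),
    (333, "233-1231", "BA, BSc, PhD"),
    (679, "233-1231", "BSc, MSc"),
]

# Employee keeps the single-valued attributes.
employee = [(num, phone) for num, phone, _ in unnormalized]

# EmployeeDegree gets one row per (employee, degree) pair.
employee_degree = [
    (num, degree.strip())
    for num, _, degrees in unnormalized
    for degree in degrees.split(",")
    if degree.strip()
]

def left_outer_join(employees, degrees):
    """One row per employee, with that employee's degrees regrouped into a list."""
    return [
        (num, phone, [d for n, d in degrees if n == num])
        for num, phone in employees
    ]

print(employee_degree)
print(left_outer_join(employee, employee_degree))
```

Employee 123 has no degrees, so it has no row in EmployeeDegree; an outer join (rather than an inner join) is needed to keep that employee in the result.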

Second Normal Form
A relation is in 2NF if it is in 1NF, and every non-key attribute is fully dependent on each
candidate key. (That is, we don’t have any partial functional dependency.)

• 2NF (and 3NF) both involve the concepts of key and non-key attributes.

• A key attribute is any attribute that is part of a key; any attribute that is not a key
attribute, is a non-key attribute.

• Relations that are not in BCNF have data redundancies

• A relation in 2NF will not have any partial dependencies

Third Normal Form
• A relation is in 3NF if the relation is in 1NF and all determinants of non-key attributes are
candidate keys
That is, for any functional dependency X → Y, where Y is a non-key attribute (or a set of non-key attributes), X is a candidate key.
• This definition of 3NF differs from BCNF only in the specification of non-key attributes - 3NF is
weaker than BCNF. (BCNF requires all determinants to be candidate keys.)
• A relation in 3NF will not have any transitive dependencies
of non-key attribute on a candidate key through another non-key attribute.

Third Normal Form
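EmpNum | EmpName | DeptNum | DeptName

With EmpNum as the key and DeptNum → DeptName (as the attribute names suggest), DeptName is transitively dependent on EmpNum through DeptNum, so this relation is not in 3NF.

We correct the situation by decomposing the original relation into two 3NF relations. Note the decomposition is lossless.

EmpNum | EmpName | DeptNum

DeptNum | DeptName

Verify these two relations are in 3NF.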

Second Normal Form:
Each column must depend on the *entire* primary key.
Third Normal Form:
Each column must depend *directly* on the primary key.
Boyce-Codd Normal Form (BCNF)
A relation is in BCNF if and only if every determinant is a candidate key.

The difference between 3NF and BCNF is that for a functional dependency A → B, 3NF allows this dependency in a relation if B is a primary-key attribute and A is not a candidate key, whereas BCNF insists that for this dependency to remain in a relation, A must be a candidate key.
ClientInterview
clientNo | interviewDate | interviewTime | staffNo | roomNo
CR76 | 13-May-02 | 10.30 | SG5 | G101
CR76 | 13-May-02 | 12.00 | SG5 | G101
CR74 | 13-May-02 | 12.00 | SG37 | G102
CR56 | 1-Jul-02 | 10.30 | SG5 | G102

• FD1: clientNo, interviewDate → interviewTime, staffNo, roomNo (Primary key)

• FD2: staffNo, interviewDate, interviewTime → clientNo (Candidate key)

• FD3: roomNo, interviewDate, interviewTime → clientNo, staffNo (Candidate key)

• FD4: staffNo, interviewDate → roomNo (not a candidate key)

• As a consequence, the ClientInterview relation may suffer from update anomalies.

• For example, two tuples have to be updated if the roomNo needs to be changed for staffNo SG5 on 13-May-02.
Example of BCNF(2)
To transform the ClientInterview relation to BCNF, we must remove the violating functional dependency by creating two new relations called Interview and StaffRoom, as shown below:

Interview(clientNo, interviewDate, interviewTime, staffNo)
StaffRoom(staffNo, interviewDate, roomNo)

Interview
clientNo | interviewDate | interviewTime | staffNo
CR76 | 13-May-02 | 10.30 | SG5
CR76 | 13-May-02 | 12.00 | SG5
CR74 | 13-May-02 | 12.00 | SG37
CR56 | 1-Jul-02 | 10.30 | SG5

StaffRoom
staffNo | interviewDate | roomNo
SG5 | 13-May-02 | G101
SG37 | 13-May-02 | G102
SG5 | 1-Jul-02 | G102

BCNF Interview and StaffRoom relations
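The BCNF test, "every determinant is a candidate key", can be checked mechanically by computing the attribute closure of each determinant. The sketch below is a simplified illustration with hypothetical function names `closure` and `bcnf_violations`; it tests whether each determinant is a superkey of the relation, and it expects the caller to pass only the FDs that apply to the relation being checked.

```python
def closure(attributes, fds):
    """All attributes functionally determined by `attributes` under `fds`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(relation, fds):
    """Return the FDs whose determinant does not determine the whole relation."""
    return [
        (lhs, rhs) for lhs, rhs in fds
        if not closure(lhs, fds) >= set(relation)
    ]

client_interview = {"clientNo", "interviewDate", "interviewTime", "staffNo", "roomNo"}
fd1 = ({"clientNo", "interviewDate"}, {"interviewTime", "staffNo", "roomNo"})
fd2 = ({"staffNo", "interviewDate", "interviewTime"}, {"clientNo"})
fd3 = ({"roomNo", "interviewDate", "interviewTime"}, {"clientNo", "staffNo"})
fd4 = ({"staffNo", "interviewDate"}, {"roomNo"})

violations = bcnf_violations(client_interview, [fd1, fd2, fd3, fd4])
print(violations)

# After decomposition, FD4 lives entirely inside StaffRoom, where its
# determinant is the key.
staff_room = {"staffNo", "interviewDate", "roomNo"}
print(bcnf_violations(staff_room, [fd4]))
```

Only FD4 is reported for ClientInterview, because the closure of {staffNo, interviewDate} adds roomNo and nothing else. In StaffRoom the same determinant covers every attribute, so the violation disappears.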


Another BCNF Example

Example taken from Dr. Lee’s 2004 lecture notes


Sources:
• http://www.troubleshooters.com/littstip/ltnorm.html
• http://www.cs.jcu.edu.au/Subjects/cp1500/1998/Lecture_Notes/normalisation/3nf.html
• Dr. Lee’s Fall 2004 lecture notes
Completed. Next Class.
Thanks
