Unit-3 RDBMS-1
1. Selection(σ)
2. Projection(π)
3. Union(U)
4. Set Intersection(∩)
5. Set Difference(-)
6. Cartesian Product(X)
7. Rename(ρ)
1. Selection(σ) :
▪ The selection operation filters the rows of a table by a given condition. It
retrieves only those rows that satisfy the condition specified in the query.
▪ It is used to select the required tuples of a relation.
Notation: σ p(r)
Where:
σ is the selection operator
r is the relation
p is the selection predicate, a propositional logic formula that may use the connectives AND, OR and NOT.
The predicate can also use relational operators such as =, ≠, ≥, <, >, ≤.
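As a minimal sketch of σ p(r), the Python snippet below models a relation as a list of dictionaries and the predicate p as a function. The `select` helper, the AMOUNT column and the LOAN rows are hypothetical and made up for illustration; they are not the rows of the LOAN relation used in these notes.

```python
# Hypothetical LOAN rows; the values are illustrative only.
LOAN = [
    {"BRANCH_NAME": "Downtown", "LOAN_NO": "L-17", "AMOUNT": 1000},
    {"BRANCH_NAME": "perryride", "LOAN_NO": "L-15", "AMOUNT": 1500},
    {"BRANCH_NAME": "perryride", "LOAN_NO": "L-16", "AMOUNT": 1300},
]


def select(predicate, relation):
    """sigma_p(r): keep only the tuples for which the predicate holds."""
    return [row for row in relation if predicate(row)]


# The predicate may combine comparisons with and / or / not.
result = select(
    lambda t: t["BRANCH_NAME"] == "perryride" and t["AMOUNT"] > 1400,
    LOAN,
)
print(result)
```

The schema of the result is the same as the schema of the input; selection only changes which rows survive.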
For example: LOAN Relation
Input:
σ BRANCH_NAME="perryride" (LOAN)
Output:
2. Projection(π) :
While the selection operation works on rows, the projection operation of relational
algebra works on columns. It picks the listed columns from a relation, discards all
the remaining columns, and removes duplicate rows from the result.
Notation: ∏ A1, A2, A3 (r)
Where
A1, A2, A3 are attribute names of relation r.
Example: CUSTOMER RELATION
NAME STREET CITY
Input:
∏ NAME, CITY (CUSTOMER)
Output:
NAME CITY
Jones Harrison
Smith Rye
Hays Harrison
Curry Rye
Johnson Brooklyn
Brooks Brooklyn
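The projection above can be sketched in Python as follows. The NAME and CITY values are taken from the example output; the STREET values and the `project` helper are assumptions added only so that the snippet is self-contained.

```python
# NAME and CITY come from the example output; the STREET values are made up.
SCHEMA = ("NAME", "STREET", "CITY")
CUSTOMER = [
    ("Jones", "Main", "Harrison"),
    ("Smith", "North", "Rye"),
    ("Hays", "Main", "Harrison"),
    ("Curry", "North", "Rye"),
    ("Johnson", "Alma", "Brooklyn"),
    ("Brooks", "Senator", "Brooklyn"),
]


def project(attributes, relation, schema):
    """pi_{A1..An}(r): keep the named columns and drop duplicate tuples."""
    positions = [schema.index(a) for a in attributes]
    seen, out = set(), []
    for row in relation:
        picked = tuple(row[i] for i in positions)
        if picked not in seen:  # a relation is a set, so duplicates vanish
            seen.add(picked)
            out.append(picked)
    return out


name_city = project(("NAME", "CITY"), CUSTOMER, SCHEMA)
cities = project(("CITY",), CUSTOMER, SCHEMA)
print(name_city)
print(cities)
```

Projecting onto CITY alone returns fewer rows than the input, because each city appears only once in the result.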
DEPOSITOR RELATION
CUSTOMER_NAME ACCOUNT_NO
Johnson A-101
Smith A-121
Mayes A-321
Turner A-176
Johnson A-273
Jones A-472
Lindsay A-284
BORROW RELATION
CUSTOMER_NAME LOAN_NO
Jones L-17
Smith L-23
Hayes L-15
Jackson L-14
Curry L-93
Smith L-11
Williams L-17
3. Union(U) :
The union operator combines the results of two relations (or queries) into a single
result and eliminates duplicate rows. Both inputs must have the same number of columns,
with matching data types in corresponding columns.
Notation: R ∪ S
Input:
∏ CUSTOMER_NAME (BORROW) ∪ ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Johnson
Smith
Hayes
Turner
Jones
Lindsay
Jackson
Curry
Williams
Mayes
4. Set Intersection(∩) :
Set intersection returns only those rows that are present in both relations.
Notation: R ∩ S
Input:
∏ CUSTOMER_NAME (BORROW) ∩ ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Smith
Jones
5. Set Difference(-) :
Set difference returns the rows that are present in one relation but not in the
other.
Notation: R - S
Input:
∏ CUSTOMER_NAME (BORROW) - ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Jackson
Hayes
Williams
Curry
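The three set operations can be checked against the DEPOSITOR and BORROW relations with Python's built-in sets. This is only a sketch: the variable names are assumptions, and the projection on CUSTOMER_NAME is done with a set comprehension.

```python
DEPOSITOR = [
    ("Johnson", "A-101"), ("Smith", "A-121"), ("Mayes", "A-321"),
    ("Turner", "A-176"), ("Johnson", "A-273"), ("Jones", "A-472"),
    ("Lindsay", "A-284"),
]
BORROW = [
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"),
    ("Jackson", "L-14"), ("Curry", "L-93"), ("Smith", "L-11"),
    ("Williams", "L-17"),
]

# Projection on CUSTOMER_NAME; a set removes duplicates such as the second Smith.
depositors = {name for name, _ in DEPOSITOR}
borrowers = {name for name, _ in BORROW}

union = borrowers | depositors         # R ∪ S
intersection = borrowers & depositors  # R ∩ S
difference = borrowers - depositors    # R - S

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```

Note that set difference is not symmetric: `depositors - borrowers` gives a different result from `borrowers - depositors`.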
6. Cartesian Product(X) :
The Cartesian product operator combines every row of one table with every row of
another table, producing all possible combinations. It is mostly used as a building
block for more complex operations such as joins.
Notation: E X D
Example:
EMPLOYEE
EMP_ID EMP_NAME EMP_DEPT
1 Smith A
2 Harry C
3 John B
DEPARTMENT
DEPT_NO DEPT_NAME
A Marketing
B Sales
C Legal
Input:
EMPLOYEE X DEPARTMENT
Output:
EMP_ID EMP_NAME EMP_DEPT DEPT_NO DEPT_NAME
1 Smith A A Marketing
1 Smith A B Sales
1 Smith A C Legal
2 Harry C A Marketing
2 Harry C B Sales
2 Harry C C Legal
3 John B A Marketing
3 John B B Sales
3 John B C Legal
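A short Python sketch of the same product, using `itertools.product` from the standard library. The final step, which keeps only the rows where EMP_DEPT equals DEPT_NO, is an added illustration of how a join can be built from a product followed by a selection.

```python
from itertools import product

EMPLOYEE = [(1, "Smith", "A"), (2, "Harry", "C"), (3, "John", "B")]
DEPARTMENT = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]

# EMPLOYEE X DEPARTMENT: concatenate every EMPLOYEE tuple with every DEPARTMENT tuple.
cross = [e + d for e, d in product(EMPLOYEE, DEPARTMENT)]

# A join is a selection over the product: keep rows where EMP_DEPT = DEPT_NO.
joined = [row for row in cross if row[2] == row[3]]

print(len(cross))
for row in joined:
    print(row)
```

The product of a relation with m rows and a relation with n rows always has m × n rows, which is why it is rarely evaluated on its own for large tables.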
7. Rename(ρ) :
The rename operator gives a temporary name to a relation or to its columns. It is
very useful for avoiding ambiguity, especially in complex queries.
Example: We can use the rename operator to rename STUDENT relation to
STUDENT1.
ρ(STUDENT1, STUDENT)
Equivalence Rules
An equivalence rule states that two forms of an expression are equivalent because
both produce the same output on every legal database instance. We can therefore
replace an expression of the first form with one of the second form, and vice versa.
The query optimizer applies such equivalence rules to relational-algebra expressions in
order to transform an expression into a logically equivalent one. For describing each
rule, we will use the following symbols:
θ, θ1, θ2, ... denote predicates.
L1, L2, L3, ... denote lists of attributes.
E, E1, E2, ... denote relational-algebra expressions.
Rule 4: We can combine selections with Cartesian products as well as with theta joins:
σθ (E1 × E2) = E1 ⋈ θ E2
σθ1 (E1 ⋈ θ2 E2) = E1 ⋈ θ1∧θ2 E2
Theta joins are associative in the following manner:
(E1 ⋈ θ1 E2) ⋈ θ2∧θ3 E3 = E1 ⋈ θ1∧θ3 (E2 ⋈ θ2 E3)
In this rule, θ2 involves attributes from E2 and E3 only. Any of the conditions may
be empty, from which it follows that the Cartesian product is also associative.
The selection operation distributes over the theta-join operation under the
following two conditions:
a) When the selection condition θ0 involves only the attributes of one of the
expressions being joined (here E1):
σθ0 (E1 ⋈ θ E2) = (σθ0 (E1)) ⋈ θ E2
b) When the selection condition θ1 involves the attributes of E1 only, and θ2 involves
the attributes of E2 only:
σθ1∧θ2 (E1 ⋈ θ E2) = (σθ1 (E1)) ⋈ θ (σθ2 (E2))
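Case (a) is the basis of "pushing selections down" below a join. The following Python sketch checks it on one small instance; the `theta_join` helper and the sample relations are assumptions made for illustration, and a single instance illustrates the rule but does not prove it.

```python
def theta_join(left, right, theta):
    """E1 join_theta E2: concatenate every pair of tuples that satisfies theta."""
    return [l + r for l in left for r in right if theta(l, r)]


E1 = [(1, "Smith", "A"), (2, "Harry", "C"), (3, "John", "B")]  # (EMP_ID, EMP_NAME, EMP_DEPT)
E2 = [("A", "Marketing"), ("B", "Sales"), ("C", "Legal")]      # (DEPT_NO, DEPT_NAME)


def theta(l, r):
    return l[2] == r[0]  # join condition: EMP_DEPT = DEPT_NO


def theta0(row):
    return row[0] >= 2   # selection condition: uses only EMP_ID, an attribute of E1


# Left-hand side: join first, then select.
select_after_join = [row for row in theta_join(E1, E2, theta) if theta0(row)]

# Right-hand side: select on E1 first, then join the smaller input.
select_before_join = theta_join([l for l in E1 if theta0(l)], E2, theta)

print(select_after_join == select_before_join)
```

Both sides return the same rows, but the right-hand side feeds fewer tuples into the join, which is why an optimizer usually prefers it.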
The set operations union and intersection are commutative:
E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1
Rule 10: Distribution of the selection operation over the intersection, union, and set
difference operations.
The expression below shows the distribution over the set difference operation:
σp (E1 − E2) = σp(E1) − σp(E2)
The selection operation distributes over ∪ and ∩ in the same way, by replacing − with
∪ or ∩. Further, we get:
σp (E1 − E2) = σp(E1) − E2
This second form also holds when − is replaced by ∩, but not when it is replaced by ∪.
Rule 11: Distribution of the projection operation over the union operation.
This rule states that the projection operation distributes over the union operation:
∏L (E1 ∪ E2) = (∏L (E1)) ∪ (∏L (E2))
Apart from the rules discussed here, there are various other equivalence rules.
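Rules 10 and 11 can be checked on a small instance with Python sets. The relations E1 and E2, the predicate p and the helper names `sigma` and `pi` are all hypothetical; the check illustrates the rules on one instance and is not a proof.

```python
# Hypothetical relations with schema (CUSTOMER_NAME, LOAN_NO as an integer).
E1 = {("Jones", 17), ("Smith", 23), ("Hayes", 15), ("Curry", 93)}
E2 = {("Smith", 23), ("Curry", 93), ("Turner", 40)}


def sigma(p, r):
    """Selection: the tuples of r that satisfy p."""
    return {t for t in r if p(t)}


def pi(positions, r):
    """Projection: the listed column positions, with duplicates removed."""
    return {tuple(t[i] for i in positions) for t in r}


def p(t):
    return t[1] > 16


# Rule 10: selection distributes over set difference, union and intersection.
rule10_difference = sigma(p, E1 - E2) == (sigma(p, E1) - sigma(p, E2))
rule10_shortcut = sigma(p, E1 - E2) == (sigma(p, E1) - E2)
rule10_union = sigma(p, E1 | E2) == (sigma(p, E1) | sigma(p, E2))
rule10_intersection = sigma(p, E1 & E2) == (sigma(p, E1) & sigma(p, E2))

# Rule 11: projection distributes over union.
rule11 = pi((0,), E1 | E2) == (pi((0,), E1) | pi((0,), E2))

print(rule10_difference, rule10_shortcut, rule10_union, rule10_intersection, rule11)
```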
Example Tables
Let's consider two tables:
1. Employees Table
employee_id name department_id
1 Alice 101
2 Bob 102
3 Charlie NULL
4 David 104
2. Departments Table
department_id department_name
101 HR
102 IT
103 Finance
1. INNER JOIN
Definition:
• Returns only the matching rows from both tables.
• Rows without a match are excluded.
Query:
SELECT employees.employee_id, employees.name, departments.department_name
FROM employees
INNER JOIN departments
ON employees.department_id = departments.department_id;
Output:
employee_id name department_name
1 Alice HR
2 Bob IT
Explanation:
• Alice’s department_id = 101 matches with HR.
• Bob’s department_id = 102 matches with IT.
• Charlie and David are excluded: Charlie's department_id is NULL, and David's
department_id = 104 does not match any row in Departments.
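2. LEFT JOIN
Definition:
• Returns all rows from the left table and the matching rows from the right table.
• Where there is no match, the right table's columns are NULL.
Query: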
SELECT employees.employee_id, employees.name, departments.department_name
FROM employees
LEFT JOIN departments
ON employees.department_id = departments.department_id;
Output:
employee_id name department_name
1 Alice HR
2 Bob IT
3 Charlie NULL
4 David NULL
Explanation:
• Alice and Bob have matching department_id values, so they get their department
names.
• Charlie and David do not have a matching department_id, so NULL is returned for
department_name.
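The INNER JOIN and LEFT JOIN results above can be reproduced with Python's built-in `sqlite3` module and an in-memory database. This is a sketch under the assumption that the two example tables are created as shown; the table aliases `e` and `d` and the ORDER BY clause are additions that keep the query short and the row order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (department_id INTEGER PRIMARY KEY, department_name TEXT);
CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER);
INSERT INTO departments VALUES (101, 'HR'), (102, 'IT'), (103, 'Finance');
INSERT INTO employees VALUES (1, 'Alice', 101), (2, 'Bob', 102),
                             (3, 'Charlie', NULL), (4, 'David', 104);
""")

# The same query text is reused; only the join keyword changes.
QUERY = """
SELECT e.employee_id, e.name, d.department_name
FROM employees AS e
{join} departments AS d ON e.department_id = d.department_id
ORDER BY e.employee_id
"""

inner_rows = conn.execute(QUERY.format(join="INNER JOIN")).fetchall()
left_rows = conn.execute(QUERY.format(join="LEFT JOIN")).fetchall()

print(inner_rows)
print(left_rows)
```

SQL NULL values are returned to Python as `None`. Charlie is dropped by the INNER JOIN because a NULL department_id is never equal to any value, not even to another NULL.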
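3. RIGHT JOIN
Definition:
• Returns all rows from the right table and the matching rows from the left table.
• Where there is no match, the left table's columns are NULL.
Query: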
SELECT employees.employee_id, employees.name, departments.department_name
FROM employees
RIGHT JOIN departments
ON employees.department_id = departments.department_id;
Output:
employee_id name department_name
1 Alice HR
2 Bob IT
NULL NULL Finance
Explanation:
• Alice and Bob have matching department_id values, so their names appear.
• The Finance department has no matching employees, so NULL appears for
employee_id and name.
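4. FULL JOIN
Definition:
• Returns all rows from both tables.
• Where a row has no match in the other table, that table's columns are NULL.
Query: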
SELECT employees.employee_id, employees.name, departments.department_name
FROM employees
FULL JOIN departments
ON employees.department_id = departments.department_id;
Output:
employee_id name department_name
1 Alice HR
2 Bob IT
3 Charlie NULL
4 David NULL
NULL NULL Finance
Explanation:
• Alice and Bob have matching department_id values, so they appear normally.
• Charlie and David have no department, so NULL is returned for department_name.
• The Finance department has no employees, so NULL is returned for employee_id and
name.
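Some database engines, including older SQLite builds, do not support RIGHT JOIN or FULL JOIN directly. The sketch below shows a common workaround using Python's `sqlite3` module: a RIGHT JOIN is written as a LEFT JOIN with the tables swapped, and a FULL JOIN as the UNION of the two LEFT JOINs. The table setup repeats the example tables; the aliases `e` and `d` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (department_id INTEGER PRIMARY KEY, department_name TEXT);
CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER);
INSERT INTO departments VALUES (101, 'HR'), (102, 'IT'), (103, 'Finance');
INSERT INTO employees VALUES (1, 'Alice', 101), (2, 'Bob', 102),
                             (3, 'Charlie', NULL), (4, 'David', 104);
""")

# RIGHT JOIN emulated as a LEFT JOIN with departments on the left.
right_rows = conn.execute("""
SELECT e.employee_id, e.name, d.department_name
FROM departments AS d
LEFT JOIN employees AS e ON e.department_id = d.department_id
ORDER BY d.department_id
""").fetchall()

# FULL JOIN emulated as the UNION of both LEFT JOIN directions.
# UNION removes the matched rows that would otherwise appear twice.
full_rows = conn.execute("""
SELECT e.employee_id, e.name, d.department_name
FROM employees AS e
LEFT JOIN departments AS d ON e.department_id = d.department_id
UNION
SELECT e.employee_id, e.name, d.department_name
FROM departments AS d
LEFT JOIN employees AS e ON e.department_id = d.department_id
""").fetchall()

print(right_rows)
print(full_rows)
```

One caveat of this emulation is that UNION also removes genuine duplicate rows; that is harmless here because every row in the result is distinct.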
Components of optimizers
The optimizer has three components: the transformer, the estimator, and the plan
generator.
Transformer: It takes the parsed query as input, represented as a set of query
blocks, and determines whether it is advantageous to rewrite the query into a form
that reduces the cost of execution.
Estimator: It determines the overall cost of an execution plan. The estimator uses three
different measures to determine cost:
• Selectivity: The fraction of rows in a row set that a predicate selects.
• Cardinality: The number of rows returned by each operation in an
execution plan.
• Cost: The estimated resource consumption of a plan.
To estimate cost, the optimizer uses the following factors:
• System resources (CPU, memory and I/O)
• Cardinality
• Size of the initial data set
Plan Generator: It explores various plans for a query block by trying different access
paths, join methods and join orders. After comparing them, the optimizer picks the
plan with the lowest cost.
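The interaction between the estimator and the plan generator can be sketched with a toy model in Python. Everything in it is an assumption made for illustration: the table names, the row counts, the join selectivities, and the cost formula (the sum of the estimated cardinalities of the intermediate results). Real optimizers use far richer cost models and do not enumerate every join order.

```python
from itertools import permutations

# Hypothetical statistics: row counts and pairwise join selectivities.
ROWS = {"orders": 10_000, "customers": 1_000, "regions": 10}
JOIN_SELECTIVITY = {
    frozenset({"orders", "customers"}): 1 / 1_000,
    frozenset({"customers", "regions"}): 1 / 10,
}


def selectivity(joined, new_table):
    """Combined selectivity of the predicates linking new_table to the joined tables.

    A pair without a join predicate contributes 1.0, i.e. a Cartesian product.
    """
    s = 1.0
    for table in joined:
        s *= JOIN_SELECTIVITY.get(frozenset({table, new_table}), 1.0)
    return s


def plan_cost(order):
    """Toy cost: the sum of the estimated cardinalities of all intermediate results."""
    joined = [order[0]]
    cardinality = ROWS[order[0]]
    cost = 0.0
    for table in order[1:]:
        cardinality = cardinality * ROWS[table] * selectivity(joined, table)
        cost += cardinality
        joined.append(table)
    return cost


# Plan generator: enumerate every left-deep join order and keep the cheapest.
plans = {order: plan_cost(order) for order in permutations(ROWS)}
best = min(plans, key=plans.get)

for order, cost in sorted(plans.items(), key=lambda item: item[1]):
    print(" -> ".join(order), round(cost))
print("chosen:", " -> ".join(best))
```

In this model the cheapest orders join the two small tables first, and the most expensive ones start with a join that has no predicate and therefore degenerates into a Cartesian product.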
Components of the Cost-Based Optimizer (CBO)
• Statistics Collection – The optimizer uses statistics such as:
o Table size
o Indexes
o Data distribution (histograms)
o Number of rows
• Selectivity Estimation – Determines the fraction of rows that meet a condition.
• Cost Estimation – Computes the cost of different execution plans using:
o I/O cost (disk reads and writes)
o CPU cost (time to process tuples)
o Memory cost (amount of RAM used)
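The three steps above can be sketched in Python as a choice between two access paths for an equality predicate. The statistics, the unit costs and the cost formulas are hypothetical; memory cost is left out to keep the sketch short, and the index estimate assumes one page read per matching row.

```python
# Hypothetical statistics collected for one table.
STATS = {
    "num_rows": 10_000,
    "num_pages": 500,
    # Histogram on the department column: value -> number of rows.
    "histogram": {"HR": 500, "IT": 7_500, "Finance": 2_000},
}

# Assumed unit costs; a real optimizer calibrates these for the system.
IO_COST_PER_PAGE = 1.0
CPU_COST_PER_ROW = 0.01


def selectivity(value):
    """Fraction of rows expected to satisfy department = value."""
    return STATS["histogram"].get(value, 0) / STATS["num_rows"]


def full_scan_cost():
    """Read every page and process every row."""
    return STATS["num_pages"] * IO_COST_PER_PAGE + STATS["num_rows"] * CPU_COST_PER_ROW


def index_lookup_cost(value):
    """Assume one page read per matching row (pessimistic, unclustered index)."""
    matching_rows = selectivity(value) * STATS["num_rows"]
    return matching_rows * IO_COST_PER_PAGE + matching_rows * CPU_COST_PER_ROW


def choose_access_path(value):
    costs = {"full scan": full_scan_cost(), "index lookup": index_lookup_cost(value)}
    return min(costs, key=costs.get)


for department in STATS["histogram"]:
    print(department, selectivity(department), choose_access_path(department))
```

With these numbers the index is chosen only for the most selective value; for values that match a large share of the table, reading every page once is estimated to be cheaper than one page read per matching row.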