DBMS Chapter 3 & 4
Syllabus:
1. Overview
2. The form of basic SQL Query: UNION, INTERSECT & EXCEPT
3. Join operations: Equi join and Non-equi join
4. Nested Queries: correlated and uncorrelated
5. Aggregate functions
6. Null values
7. Views
8. Triggers
1. Overview:
What is SQL? SQL (Structured Query Language) is the standard language for defining, querying, and manipulating data in relational database management systems (RDBMS).
Why SQL? It provides a single, declarative way to create databases and tables, insert and update records, and retrieve exactly the data you ask for, independent of how the RDBMS stores it internally.
When executing an SQL command for any RDBMS, the system determines the best way to carry out your request, and the SQL engine figures out how to interpret the task.
There are various components included in this process. These components are −
• Query Dispatcher
• Optimization Engines
• Classic Query Engine
• SQL Query Engine, etc.
A classic query engine handles all the non-SQL queries, but an SQL query engine will not handle logical files.
[Diagram: SQL Architecture]
SQL Basic Commands: refer to Chapter 1, Database Languages.
In SQL, set operators are used to combine the results of two or more SELECT statements into
a single result set. The commonly used set operators are UNION, INTERSECT, and EXCEPT
(or MINUS in some databases). These operators allow you to perform operations on sets of
rows rather than individual rows. Here's an overview of SQL set operators.
➢ The result sets of all the queries must have the same number of columns.
➢ The data type of each column in one result set must be compatible with the data type of its corresponding column in the other result sets.
➢ For sorting the result, the ORDER BY clause can be applied to the last query only.
Example: the queries below use two sample tables, Speakers and Authors, each containing a name column.
UNION
➢ Union combines the results of two queries into a single result set of all matching
rows.
➢ Both queries must return the same number of columns with compatible data types for the union to be valid.
➢ All duplicate records are removed automatically unless UNION ALL is used.
➢ Generally, it can be useful in applications where tables are not perfectly normalized,
for example, a data warehouse application.
Syntax:
SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;
Example-1:
You want to invite all the Speakers and Authors for the annual conference. Hence, how will
you prepare the invitation list?
Syntax:
SELECT name FROM Speakers
UNION
SELECT name FROM Authors
ORDER BY name;
Output:
As you can see, the default sort order is ascending, and the ORDER BY clause must be used in the last query only, not in both queries.
UNION ALL
Example-2
You want to give a prize to all the Speakers and Authors at the annual conference. Hence,
how will you prepare the prize list?
Syntax:
SELECT name FROM Speakers
UNION ALL
SELECT name FROM Authors
ORDER BY name;
Output:
INTERSECT
It takes the results of two queries and returns only those rows that are common to both result sets. It removes duplicate records from the final result set.
Syntax
SELECT column_name(s) FROM table1
INTERSECT
SELECT column_name(s) FROM table2;
Example-3
You want the list of people who are Speakers and they are also Authors. Hence, how will you
prepare such a list?
Syntax:
SELECT name FROM Speakers
INTERSECT
SELECT name FROM Authors
ORDER BY name;
Output:
EXCEPT:
It takes the distinct rows of the first query and returns only those rows that do not appear in the second result set.
Syntax :
SELECT column_name(s) FROM table1
EXCEPT
SELECT column_name(s) FROM table2;
Example-4
You want the list of people who are only Speakers and they are not Authors. Hence, how will
you prepare such a list?
Syntax:
SELECT name FROM Speakers
EXCEPT
SELECT name FROM Authors
ORDER BY name;
Output:
Example-5
You want the list of people who are only Authors and they are not Speakers. Hence, how will
you prepare such a list?
Syntax:
SELECT name FROM Authors
EXCEPT
SELECT name FROM Speakers
ORDER BY name;
Output:
➢ UNION combines results from both tables.
➢ UNION ALL combines two or more result sets into a single set, including all
duplicate rows.
➢ INTERSECT returns only the rows that are common to both result sets.
➢ EXCEPT returns the rows that appear in the first result set but not in the second result set.
3. Join operations
A SQL Join statement combines data or rows from two or more tables based on a common
field between them.
➢ SQL Joins are mostly used when a user is trying to extract data from multiple tables (which have one-to-many or many-to-many relationships with each other) at one time.
➢ Large databases are often prone to data redundancy, i.e., repeated data that leads to insertion, deletion, and updation anomalies. By using SQL Joins, we promote database normalization, which reduces data redundancy.
➢ In SQL, joins are mainly of two types: INNER JOINS and OUTER JOINS.
Joins can be classified as follows:
• INNER JOINS: theta join (non-equi join), equi join, and natural join
• OUTER JOINS: left join, right join, and full outer join
Equi-Join: An equi-join is a type of join where the matching condition is based on equality
between columns from different tables. It matches rows where the specified columns have the
same values.
➢ The Equi Join in SQL returns only the rows where the tables being compared match on the common column field. It does not display NULL or unmatched data.
➢ The equality operator in the Equi Join operation is used to refer to the equality in
the WHERE clause. However, it returns the same result when we use
the JOIN keyword with the ON clause along with column names and their respective
tables
Example :
SELECT *
FROM TableName1, TableName2
WHERE TableName1.ColumnName = TableName2.ColumnName;
OR
SELECT *
FROM TableName1
JOIN TableName2
ON TableName1.ColumnName = TableName2.ColumnName;
➢ An equi join is any JOIN operation that uses only the equals sign. If a query has more than one join condition, of which one condition has an equals sign and another doesn't, the query is considered a non-equi join in SQL.
Non-equi join :
➢ A non-equi join, also known as a range join or a theta join, is a type of join operation
where the joining condition involves operators other than equality, such as greater
than (>), less than (<), greater than or equal to (>=), less than or equal to (<=), or
not equal to (!= or <>).
➢ Non-Equi Join is also a type of INNER Join in which we need to retrieve data from
multiple tables. In a non-equi join, rows are matched based on a range of values
rather than a direct equality. Non-equi joins are less common and are often used to
solve specific data analysis problems.
SELECT *
FROM TableName1, TableName2
WHERE TableName1.ColumnName [ > | < | >= | <= | != | BETWEEN ] TableName2.ColumnName;
EQUI-Join Example:
Suppose we have two tables, namely state and city, which contain the name of the states and
the name of the cities, respectively. In this example, we will map the cities with the states in
which they are present.
[Sample tables: State(State_Id, State_Name) and City(City_Id, City_Name)]
Now, if we execute a query of Equi-join using the equality operation and the WHERE clause,
then
SELECT *
FROM state, city
WHERE state.State_Id = city.City_Id;
Output:
State_Id State_Name City_Id City_Name
2 Uttarakhand 2 Dehradun
2 Uttarakhand 2 Rishikesh
3 Madhyapradesh 3 Gwalior
Now, if we execute a query of Non-Equi-join using any operator other than the equality
operator, such as >(greater than) with the WHERE clause –
SELECT *
FROM test1,test2
WHERE test1.SNo > test2.SNo;
Output
INNER JOIN syntax:
SELECT *
FROM table1
INNER JOIN table2
ON table1.column = table2.column;
NATURAL JOIN
➢ SQL Natural Join is a type of Inner join based on the condition that columns having
the same name and datatype are present in both the tables to be joined.
SELECT *
FROM table1
NATURAL JOIN table2;
SQL Outer joins return both matched and unmatched rows of data, depending on the type of outer join. Outer joins are sub-divided into the following types:
➢ A left join returns all the rows from the left table and the matching rows from the
right table. If there are no matching rows in the right table, NULL values are
included for the columns of the right table. The syntax for a left join is as follows:
SELECT *
FROM table1
LEFT JOIN table2
ON table1.column = table2.column;
A right join returns all the rows from the right table and the matching rows from the left table.
If there are no matching rows in the left table, NULL values are included for the columns of
the left table. The syntax for a right join is as follows:
SELECT *
FROM table1
RIGHT JOIN table2
ON table1.column = table2.column;
A full join returns all the rows from both the left and right tables. If there are no matching rows
in either table, NULL values are included for the columns of the non-matching table.
SELECT *
FROM table1
FULL OUTER JOIN table2
ON table1.column = table2.column;
Theta join :
In SQL, a theta join, also known as a non-equi join or a range join, is a type of join operation
where the joining condition involves comparison operators other than equality (=).
The syntax for a theta join typically involves using the JOIN keyword followed by the tables
being joined and the join condition with the desired comparison operator(s). Here's an example:
SELECT *
FROM table1
JOIN table2
ON table1.column > table2.column; -- any comparison operator other than = can be used here
Examples :
[Sample tables: Customers(Id, Cust_Name) and Orders(Or_Id, Cust_Id)]
1. Inner join :
SELECT Orders.Or_Id ,Customers.Cust_Name
FROM Orders
INNER JOIN Customers
ON Orders.Cust_Id=Customers.Id;
Output:
Or_Id Cust_Name
601 Ram
603 Raj
2. LEFT JOIN and 3. RIGHT JOIN
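Sketches of the corresponding queries, reusing the tables and column names from the inner join example above (illustrative; the result rows depend on the table data):
LEFT JOIN:
SELECT Orders.Or_Id, Customers.Cust_Name
FROM Orders
LEFT JOIN Customers
ON Orders.Cust_Id = Customers.Id;
RIGHT JOIN:
SELECT Orders.Or_Id, Customers.Cust_Name
FROM Orders
RIGHT JOIN Customers
ON Orders.Cust_Id = Customers.Id;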
4. Nested Queries:
➢ A nested query in SQL contains a query inside another query. The outer query will
use the result of the inner query. For instance, a nested query can have
two SELECT statements, one on the inner query and the other on the outer query.
In independent nested queries, the execution order is from the innermost query to the outer
query. An outer query won't be executed until its inner query completes its execution. The outer
query uses the result of the inner query. Operators such as IN, NOT IN, ALL, and ANY are
used to write independent nested queries.
➢ If a subquery does not use any references from the outer query, it is called an independent (uncorrelated) subquery.
➢ The IN operator checks if a column value in the outer query's result is present in
the inner query's result. The final result will have rows that satisfy the IN condition.
➢ The NOT IN operator checks if a column value in the outer query's result is not
present in the inner query's result. The final result will have rows that satisfy
the NOT IN condition.
➢ The ALL operator compares a value from the outer query's result with all the values of the inner query's result and returns the row only if the comparison holds for all of those values.
➢ The ANY operator compares a value from the outer query's result with all the values of the inner query's result and returns the row if the comparison holds for at least one of them.
In co-related nested queries, the inner query uses the values from the outer query to execute
the inner query for every row processed by the outer query. The co-related nested queries run
slowly because the inner query is executed for every row of the outer query's result.
Syntax:
SELECT column1, column2, ....
FROM table1 outer
WHERE column1 operator
(SELECT column1, column2
FROM table2
WHERE expr1 = outer.expr2);
Examples :
We will use the Employees and Awards tables below to understand independent and co-related nested queries. We will be using Oracle SQL syntax in our queries.
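A minimal sketch of the two tables, assuming columns that the later examples reference (all names and types here are assumptions, not the original definitions):
CREATE TABLE Employees (
    id     INT PRIMARY KEY,
    name   VARCHAR(50),
    role   VARCHAR(30),   -- e.g. 'Developer', 'Manager'
    salary INT
);
CREATE TABLE Awards (
    award_id   INT PRIMARY KEY,
    emp_id     INT REFERENCES Employees(id),
    award_date DATE
);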
Example 1: IN
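An illustrative IN query over the sketched tables: select every employee who has received an award.
SELECT *
FROM Employees
WHERE id IN (SELECT emp_id FROM Awards);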
Output:
Example 2: NOT IN
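Correspondingly, an illustrative NOT IN query: select every employee who has never received an award.
SELECT *
FROM Employees
WHERE id NOT IN (SELECT emp_id FROM Awards);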
Output :
Example 3: ALL
➢ Select all Developers who earn more than all the Managers (column names as sketched above):
SELECT *
FROM Employees
WHERE role = 'Developer'
AND salary > ALL (SELECT salary
                  FROM Employees
                  WHERE role = 'Manager');
Output :
Example 4: ANY
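An illustrative ANY query over the sketched columns: select all employees who earn more than at least one Manager.
SELECT *
FROM Employees
WHERE salary > ANY (SELECT salary
                    FROM Employees
                    WHERE role = 'Manager');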
Output :
Example 5: correlated subquery
➢ Select all employees whose salary is above the average salary of employees in their role (column names as sketched above).
SELECT *
FROM Employees e1
WHERE salary > (SELECT AVG(salary)
                FROM Employees e2
                WHERE e2.role = e1.role);
Output :
5. Aggregate functions
An aggregate function in SQL performs a calculation on multiple values and returns a single value. SQL provides many aggregate functions, including AVG, COUNT, SUM, MIN, and MAX. An aggregate function ignores NULL values when it performs the calculation, except for COUNT(*).
1. COUNT() Function
The COUNT() aggregate function returns the total number of rows from a database table that
matches the defined criteria in the SQL query.
Syntax
COUNT(*) OR COUNT(COLUMN_NAME)
Example :
The given table named EMP_DATA consists of data concerning 10 employees working in the
same organization in different departments.
1. Suppose you want to know the total number of employees working in the organization. You can do so with the query given below.
SELECT COUNT(*) FROM EMP_DATA;
As COUNT(*) returns the total number of rows and the table named EMP_DATA provided
above consists of 10 rows, so the COUNT(*) function returns 10. The output is printed as
shown below.
Output: 10
Note: Except for COUNT(*), all other SQL aggregate functions ignore NULL values.
2. Suppose you need to count the number of people who are getting a salary. The query given below achieves this.
SELECT COUNT(Salary) FROM EMP_DATA;
Output : 9
Here, the Salary column is passed as a parameter to the COUNT() function, and hence, this
query returns the number of non NULL values from the column Salary, i.e. 9.
3. Suppose you need to count the number of distinct departments present in the organization. The following query achieves this.
SELECT COUNT(DISTINCT Department) FROM EMP_DATA;
Output: 3
The above query returns the total number of distinct non NULL values over the column
Department i.e. 3 (Marketing, Production, R&D). The DISTINCT keyword makes sure that
only non-repetitive values are counted.
4. What if you want to calculate the number of people whose salaries are at least a given amount (say 70,000)? Check out the example below.
SELECT COUNT(*) FROM EMP_DATA WHERE Salary >= 70000;
Output : 5
The query returns the number of rows where the salary of the employee is greater than or equal
to 70,000 i.e 5.
2. SUM() Function
The SUM() function takes the name of the column as an argument and returns the sum of all
the non NULL values in that column. It works only on numeric fields(i.e the columns contain
only numeric values). When applied to columns containing both non-numeric(ex - strings) and
numeric values, only numeric values are considered. If no numeric values are present, the
function returns 0.
Syntax:
The function name is SUM() and the name of the column to be considered is passed as an
argument to the function.
SUM(COLUMN_NAME)
Example:
1. Suppose you need to build a budget for the organization and need to know the total amount required to pay the salaries of all the employees, i.e., the sum of all the values present in the column Salary. You can refer to the example given below.
SELECT SUM(Salary) FROM EMP_DATA;
Output :646000
The above-mentioned query returns the sum of all non-NULL values over the column Salary
i.e 80000 + 76000 + 76000 + 84000 + 80000 + 64000 + 60000 + 60000 + 66000 = 646000
2. What if you need to consider only distinct salaries? The following query will help you achieve that.
SELECT SUM(DISTINCT Salary) FROM EMP_DATA;
Output : 430000
The DISTINCT keyword makes sure that only non-repetitive values are considered. The query
returns the sum of all distinct non NULL values over the column Salary i.e. 80000 + 76000 +
84000 + 64000 + 60000 + 66000 = 430000.
3. Suppose you need to know the collective salary for a given department (say Marketing). The query given below achieves this.
SELECT SUM(Salary) FROM EMP_DATA WHERE Department = 'Marketing';
Output :160000
The query returns the sum of salaries of employees who are working in the Marketing
Department i.e 80000 + 80000 = 160000.
Note: There are 3 rows consisting of Marketing as Department value but the third value is a
NULL value. Thus, the sum is returned considering only the first two entries having Marketing
as Department.
3. AVG() Function
The AVG() aggregate function uses the name of the column as an argument and returns the
average of all the non NULL values in that column. It works only on numeric fields(i.e the
columns contain only numeric values).
Note: When applied to columns containing both non-numeric (ex - strings) and numeric values,
only numeric values are considered. If no numeric values are present, the function returns 0.
Syntax:
The function name is AVG() and the name of the column to be considered is passed as an
argument to the function.
AVG(COLUMN_NAME)
Example:
1. To find the average salary paid by the organization:
SELECT AVG(Salary) FROM EMP_DATA;
Output: 71777.77777
Here, the column name Salary is passed as an argument and thus the values present in column
Salary are considered. The above query returns the average of all non NULL values present in
the Salary column of the table.
Average = (80000 + 76000 + 76000 + 84000 + 80000 + 64000 + 60000 + 60000 + 66000 ) / 9
= 646000 / 9 = 71777.77777
2. If you need to consider only distinct salaries, the following query will help you out.
SELECT AVG(DISTINCT Salary) FROM EMP_DATA;
Output : 71666.66666
The query returns the average of all non NULL distinct values present in the Salary column of
the table.
4. MIN() Function
The MIN() function takes the name of the column as an argument and returns the minimum value present in the column. MIN() returns NULL when no row is selected.
Syntax:
The function name is MIN() and the name of the column to be considered is passed as an
argument to the function.
MIN(COLUMN_NAME)
Example:
1. Suppose you want to find out what is the minimum salary that is provided by the
organization. The MIN() function can be used here with the column name as an
argument.
SELECT MIN(Salary) FROM EMP_DATA;
Output :60000
The query returns the minimum value of all the values present in the mentioned column i.e
60000.
2. Suppose you need to know the minimum salary of an employee belonging to the Production department. The following query will help you achieve that.
SELECT MIN(Salary) FROM EMP_DATA WHERE Department = 'Production';
Output:60000
The query returns the minimum value of all the values present in the mentioned column and
has Production as Department value i.e 60000.
5. MAX() Function
The MAX() function takes the name of the column as an argument and returns the maximum
value present in the column. MAX() returns NULL when no row is selected.
Syntax:
The function name is MAX() and the name of the column to be considered is passed as an
argument to the function.
MAX(COLUMN_NAME)
Example :
1. Suppose you want to find out the maximum salary that is provided by the organization. The MAX() function can be used here with the column name as an argument.
SELECT MAX(Salary) FROM EMP_DATA;
Output : 84000
The query returns the maximum value of all the values present in the mentioned column i.e
84000.
2. Suppose you need to know the maximum salary of an employee belonging to the R&D department. The following query will help you achieve that.
SELECT MAX(Salary) FROM EMP_DATA WHERE Department = 'R&D';
Output : 84000
The query returns the maximum value of all the values present in the mentioned column and
has R&D as Department value i.e 84000.
6. Null values
➢ In SQL, NULL represents the absence of a value. It is used to indicate that a data
point does not have a value or that the value is unknown or undefined. Here are
some important points to understand about handling NULL values in SQL:
➢ NULL is not the same as an empty string or zero. It is a distinct value that signifies
the absence of a value.
➢ NULL values can be used in columns of any data type, including numeric, string,
date, and other data types.
➢ When performing comparisons involving NULL values, the result is always
unknown (neither true nor false). Therefore, you cannot use standard equality
operators like = or <> to compare NULL values.
➢ To check for NULL values in SQL, you use the IS NULL or IS NOT NULL
operators. For example:
SELECT * FROM table_name WHERE column_name IS NULL;
➢ When performing calculations involving NULL values, the result is usually NULL.
However, some database systems have specific behaviors when NULL values are
involved in calculations, so it's important to consult the documentation for your
specific database management system (DBMS) to understand its behavior.
➢ When inserting or updating data in a table, you can explicitly set a column to NULL if you want to represent the absence of a value. For example:
UPDATE table_name SET column_name = NULL WHERE condition;
➢ NULL values can also be used in joins and filtering conditions. For example, you can include NULL values in a result set using a LEFT JOIN, as in the sketch at the end of this section.
➢ It's important to handle NULL values appropriately in your SQL queries to ensure
accurate and reliable data processing. Be aware of any specific behavior and
handling of NULL values in your chosen database system, as it can vary between
different database management systems.
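A small illustrative sketch of these points (table and column names are hypothetical):
-- Comparison: find rows whose phone number is unknown
SELECT * FROM employees WHERE phone IS NULL;
-- Calculation: salary + 500 is NULL whenever salary is NULL
SELECT name, salary + 500 AS projected_salary FROM employees;
-- LEFT JOIN: customers with no orders appear with NULLs in the
-- order columns, so filtering on IS NULL finds them
SELECT c.id, c.name
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
WHERE o.id IS NULL;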
7. Triggers
➢ In SQL, a trigger is a database object that is associated with a table and
automatically executes a set of actions in response to certain database events, such
as INSERT, UPDATE, or DELETE operations on the table. Triggers are useful for
enforcing data integrity rules, auditing changes, maintaining derived data, or
implementing complex business logic within the database. Here's an overview of
SQL triggers:
➢ Syntax: The basic syntax to create a trigger in SQL is as follows:
CREATE TRIGGER trigger_name
{BEFORE | AFTER}
{INSERT | UPDATE | DELETE}
ON table_name
[FOR EACH ROW]
[WHEN (condition)]
BEGIN
-- Trigger actions here (trigger body)
END;
Types of Triggers
DML Triggers
After Triggers
➢ These triggers execute after the database has processed a specified event (such as
an INSERT, UPDATE, or DELETE statement). AFTER triggers are commonly used
to perform additional processing or auditing tasks after a data modification has
occurred.
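For instance, a minimal sketch of an AFTER trigger used for auditing (MySQL syntax; the emp and emp_audit tables and all column names are assumptions):
CREATE TRIGGER emp_salary_audit
AFTER UPDATE ON emp
FOR EACH ROW
-- record every salary change in an audit table
INSERT INTO emp_audit (emp_id, old_salary, new_salary, changed_at)
VALUES (OLD.id, OLD.salary, NEW.salary, NOW());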
INSTEAD OF Triggers
➢ These triggers are defined on views and fire instead of the DML statement (INSERT, UPDATE, or DELETE) issued against the view.
DDL Triggers
➢ These triggers fire in response to data definition language (DDL) statements like
CREATE, ALTER, or DROP.
LOGON Triggers
➢ These triggers fire when a user session is established (at logon).
LOGOFF Triggers
➢ These triggers fire when a user session is about to end (at logoff).
SERVERERROR Triggers
➢ These triggers fire when a server-side error occurs.
EXAMPLE :
Let’s take an example. Let’s assume a student table with column id, first_name, last_name,
and full_name.
Query 1: create the table.
CREATE TABLE student (
    id INT,
    first_name VARCHAR(40),
    last_name VARCHAR(40),
    full_name VARCHAR(80)
);
Query 2: create a trigger that fills in full_name before every insert (MySQL syntax; the trigger name and column sizes are illustrative).
CREATE TRIGGER student_full_name
BEFORE INSERT ON student
FOR EACH ROW
SET NEW.full_name = CONCAT(NEW.first_name, ' ', NEW.last_name);
Query 3: insert a row, leaving full_name for the trigger to fill (sample values are illustrative).
INSERT INTO student (id, first_name, last_name) VALUES (1, 'Ram', 'Kumar');
Query 4:
/* Display all the records from the table */
SELECT * FROM student;
OUTPUT: full_name is populated automatically by the trigger, e.g. the inserted row appears as id = 1, first_name = 'Ram', last_name = 'Kumar', full_name = 'Ram Kumar'.
8. Views in SQL
Refer to Chapter 2.
Chapter 4: DEPENDENCIES AND NORMAL FORMS
Syllabus:
Schema: A database schema is a blueprint that represents the tables and relations of a data set.
Good database schema design is essential to making your data tractable so that you can make
sense of it and build the dashboards, reports, and data models that you need.
A good schema design is crucial for the efficient and effective management of data in a database
system. It plays a fundamental role in determining how data is organized, stored, and retrieved,
and impacts the overall performance, scalability, and maintainability of the system. Here are
some key reasons why a good schema design is important:
Data organization: A well-designed schema helps in structuring data in a logical and organized
manner. It defines the tables, relationships, and constraints that govern the data model, ensuring
data integrity and consistency. This organization facilitates easy navigation and understanding
of the data, making it more manageable and accessible.
Query performance: The schema design significantly impacts the performance of database
queries. By properly structuring tables, defining appropriate indexes, and optimizing data
types, a good schema design can enhance query execution speed and minimize resource
consumption. Efficient query performance leads to faster response times and improved overall
system performance.
Data integrity and consistency: A good schema design enforces data integrity and ensures
consistency. By defining appropriate constraints, such as primary keys, foreign keys, unique
constraints, and check constraints, it prevents the insertion of invalid or inconsistent data. This
helps maintain data quality and reliability throughout the system.
Scalability: A well-designed schema allows for easy scalability as the volume and complexity
of data grow. By considering future requirements and potential expansion, a good schema
design can accommodate evolving needs without significant rework or performance
degradation. This scalability is crucial for applications and systems that need to handle
increasing data loads over time.
Maintainability and extensibility: A good schema design simplifies the maintenance and
evolution of the database system. It provides a solid foundation for making changes and
additions to the schema without causing disruptions or data inconsistencies. A well-designed
schema also allows for seamless integration with new features or modules, making the system
more extensible and adaptable to future enhancements.
Data analysis and reporting: A well-designed schema facilitates effective data analysis and
reporting. By structuring data in a way that aligns with the analytical needs of the system, a
good schema design enables efficient querying, aggregation, and summarization of data. This,
in turn, supports decision-making processes and enables the extraction of meaningful insights
from the data
In summary, a good schema design is essential for data organization, query performance, data
integrity, scalability, maintainability, and data analysis. It is a foundational element in the
design and implementation of a robust and efficient database system.
Bad schema designs can lead to several problems that can hinder the efficient management and
utilization of data in a database system. Here are some common problems encountered with
bad schema designs:
➢ Poor query performance: A bad schema design can result in slow and inefficient query
performance. This can be due to a lack of proper indexing, inappropriate data types, or
inefficient table relationships. Slow queries can negatively impact the overall system
performance and user experience.
➢ Data redundancy and inconsistency: Inadequate schema designs can lead to data
redundancy and inconsistency. Redundant data takes up unnecessary storage space and
can cause data integrity issues when updates or modifications are made. Inconsistent
data, such as conflicting values or duplicate records, can lead to inaccurate results and
unreliable information.
➢ Difficulty in data maintenance: Bad schema designs can make data maintenance
challenging and error-prone. Without proper constraints and relationships, it becomes
harder to enforce data integrity and ensure consistent updates. This can lead to data
corruption, data loss, or difficulties in updating and modifying data in a controlled and
reliable manner.
➢ Lack of scalability: A poorly designed schema may lack scalability, making it difficult
to accommodate future growth and evolving data requirements. This can result in
performance degradation and the need for extensive schema modifications when the
system needs to handle increased data volumes or changes in data structure.
➢ Limited flexibility and extensibility: Bad schema designs can restrict the flexibility
and extensibility of the database system. It may be challenging to add new features or
modify existing ones without significant schema changes. This can lead to increased
development time, complexity, and potential disruptions to the system.
➢ Data analysis and reporting challenges: Inefficient schema designs can make data
analysis and reporting difficult. Poorly organized data, lack of appropriate relationships,
or inconsistent naming conventions can hinder the extraction of meaningful insights
from the data. This can limit the effectiveness of decision-making processes and hinder
the overall value derived from the data.
➢ Increased development and maintenance costs: Bad schema designs can result in
higher development and maintenance costs. Fixing or modifying a poorly designed
schema requires significant effort and resources. It may involve rewriting queries,
restructuring tables, or migrating data, which can be time-consuming and error-prone.
In summary, bad schema designs can lead to poor query performance, data redundancy and
inconsistency, difficulties in data maintenance, limited scalability and flexibility, challenges in
data analysis and reporting, and increased development and maintenance costs. It is crucial to
invest time and effort in designing a well-thought-out schema to avoid these problems and
ensure the efficient management of data in a database system
3. FUNCTIONAL DEPENDENCY
For any relation R, attribute Y is functionally dependent on attribute X (usually the primary key) if each value of X is associated with exactly one value of Y. It is denoted by X -> Y, where X is called the determinant and Y is called the dependent.
Here the Emp_Id attribute can uniquely identify the Emp_Name attribute of the employee table, because if we know the Emp_Id, we can tell the employee name associated with it.
Emp_Id → Emp_Name
William Armstrong in 1974 suggested a few rules related to functional dependency, known as Armstrong's axioms or the RAT rules:
1. Reflexivity– If Y is a subset of X, then X → Y.
2. Augmentation– If X → Y, then XZ → YZ for any set of attributes Z.
3. Transitivity– If X → Y and Y → Z, then X → Z.
Secondary Rules
1. Union– Union rule says, if X determines Y and X determines Z, then X must also
determine Y and Z. If X → Y and X → Z then X → YZ
2. Decomposition– This rule says, if X determines Y and Z together, then X determines Y and X determines Z separately. If X → YZ then X → Y and X → Z.
➢ The Closure of Functional Dependency means the complete set of all possible
attributes that can be functionally derived from given functional dependency using the
inference rules known as Armstrong’s Rules.
➢ If “F” is a set of functional dependencies, then the closure of “F” is denoted by “{F}+”.
There are three steps to calculate closure of functional dependency. These are:
➢ Step-1 : Add the attributes which are present on Left Hand Side in the original
functional dependency.
➢ Step-2 : Now, add the attributes present on the Right Hand Side of the functional
dependency.
➢ Step-3 : With the help of attributes present on Right Hand Side, check the other
attributes that can be derived from the other given functional dependencies. Repeat this
process until all the possible attributes which can be derived are added in the closure.
Example: consider a relation Student(Roll_No, Name, Marks, Location) with the functional dependencies Roll_No → Name and Name → Marks, Location. To compute {Roll_No}+:
Step-1: Add the attribute on the LHS to the closure.
{Roll_No}+ = {Roll_No}
Step-2: Add the attributes present on the RHS of the original functional dependency to the closure.
{Roll_No}+ = {Roll_No, Name}
Step-3: Add the other possible attributes which can be derived using attributes already present in the closure. The Roll_No attribute cannot functionally determine any further attribute by itself, but the Name attribute can determine other attributes such as Marks and Location using the 2nd functional dependency. Hence:
{Roll_No}+ = {Roll_No, Name, Marks, Location}
Similarly, we can calculate the closure for the other attributes, e.g. “Name”.
Step-1: Add attributes present on the LHS of the functional dependency to the closure.
{Name}+ = {Name}
Step-2: Add the attributes present on the RHS of the functional dependency to the closure.
{Name}+ = {Name, Marks, Location}
➢ Step-3: Since we don't have any functional dependency where the “Marks” or “Location” attribute functionally determines any other attribute, we cannot add more attributes to the closure. Hence the complete closure of Name would be:
{Name}+ = {Name, Marks, Location}
➢ NOTE : We don’t have any Functional dependency where marks and location can
functionally determine any attribute. Hence, for those attributes we can only add the
attributes themselves in their closures. Therefore,
➢ {Marks}+ = {Marks}
➢ {Location}+ = { Location}
Example :
In a relation R(ABCD) ,given functional dependencies {A->B , B->C , C->D} find closure
of each attribute.
{A}+ = {ABCD}
{B}+ = {BCD}
{C}+ = {CD}
{D}+ = {D}
Here attribute A has all the attributes of the relation in its closure, so it is a candidate key of the relation.
Example :
In a relation R(ABCD) ,given functional dependencies {A->B , B->C , C->D, D->A} find
closure of each attribute.
{A}+ = {ABCD}
{B}+ = {BCDA}
{C}+ = {CDAB}
{D}+ = {DABC}
Here every attribute has all the attributes of the relation in its closure, so A, B, C and D are each candidate keys of the relation.
6. MINIMAL COVERS:
➢ A minimal cover is a simplified and reduced version of the given set of functional dependencies.
➢ Since it is a reduced version, it is also called an irreducible set.
➢ It is also called a canonical cover.
Characteristics:
A canonical cover, or minimal cover, is a minimum set of FDs equivalent to the given set. A set of FDs is called a canonical cover of F if each FD in it is:
➢ a simple FD (exactly one attribute on the right side),
➢ a left-reduced FD (no extraneous attribute on the left side), and
➢ a non-redundant FD.
Need :
• Working with the set containing extraneous functional dependencies increases the
computation time.
• Therefore, the given set is reduced by eliminating the useless functional
dependencies.
• This reduces the computation time and working with the irreducible set becomes
easier.
Steps to find the canonical cover:
Step-01:
Write all the functional dependencies such that each contains exactly one attribute on its right side.
Step-02:
➢ Consider each functional dependency one by one from the set obtained in Step-01.
➢ Determine whether it is essential or non-essential.
To determine whether a functional dependency is essential or not, compute the closure of its left side twice-
• once by considering that the particular functional dependency is present in the set, and
• once by considering that the particular functional dependency is not present in the set.
If the two closures are the same, the functional dependency is non-essential and is eliminated; if they differ, it is essential and is kept.
Step-03:
• Consider the newly obtained set of functional dependencies after performing Step-02.
• Check if there is any functional dependency that contains more than one attribute on its left side.
Then the following two cases are possible-
Case-01: No-
• There exists no functional dependency containing more than one attribute on its
left side.
• In this case, the set obtained in Step-02 is the canonical cover.
Case-02: Yes-
• There exists at least one functional dependency containing more than one attribute on its left side.
• In this case, consider all such functional dependencies one by one.
• Check if their left side can be reduced: if the closure of a proper subset of the left side matches the closure of the entire left side, replace the left side with that subset.
• The set of functional dependencies obtained after this step is the canonical cover.
PRACTICE PROBLEM BASED ON FINDING CANONICAL COVER-
Problem-
The following functional dependencies hold true for the relational scheme R ( W , X , Y , Z )
–
X→W
WZ → XY
Y → WXZ
Write the irreducible equivalent for this set of functional dependencies.
Solution-
Step-01:
Write all the functional dependencies such that each contains exactly one attribute on its right
side-
X→W
WZ → X
WZ → Y
Y→W
Y→X
Y→Z
Step-02:
Check the essentiality of each functional dependency one by one.
For X → W:
• Considering X → W, (X)+ = { X , W }
• Ignoring X → W, (X)+ = { X }
Now,
• Clearly, the two results are different.
• Thus, we conclude that X → W is essential and can not be eliminated.
For WZ → X:
• Considering WZ → X, (WZ)+ = { W , X , Y , Z }
• Ignoring WZ → X, (WZ)+ = { W , X , Y , Z } (WZ → Y gives Y, and then Y → X gives X)
Now,
• Clearly, the two results are same.
• Thus, we conclude that WZ → X is non-essential and can be eliminated.
Eliminating WZ → X, our set of functional dependencies reduces to-
X → W
WZ → Y
Y → W
Y → X
Y → Z
For Y → W:
• Considering Y → W, (Y)+ = { W , X , Y , Z }
• Ignoring Y → W, (Y)+ = { W , X , Y , Z }
Now,
• Clearly, the two results are same.
• Thus, we conclude that Y → W is non-essential and can be eliminated.
Eliminating Y → W, our set of functional dependencies reduces to-
X→W
WZ → Y
Y→X
Y→Z
For Y → X:
• Considering Y → X, (Y)+ = { W , X , Y , Z }
• Ignoring Y → X, (Y)+ = { Y , Z }
Now,
• Clearly, the two results are different.
• Thus, we conclude that Y → X is essential and can not be eliminated.
For Y → Z:
• Considering Y → Z, (Y)+ = { W , X , Y , Z }
• Ignoring Y → Z, (Y)+ = { W , X , Y }
Now,
• Clearly, the two results are different.
• Thus, we conclude that Y → Z is essential and can not be eliminated.
From here, our essential functional dependencies are-
X→W
WZ → Y
Y→X
Y→Z
Step-03:
• Consider the functional dependencies having more than one attribute on their left side.
• Check if their left side can be reduced.
In our set,
• Only WZ → Y contains more than one attribute on its left side.
• Considering WZ → Y, (WZ)+ = { W , X , Y , Z }
Now,
• Consider all the possible subsets of WZ.
• Check if the closure result of any subset matches to the closure result of WZ.
(W)+ = { W }
(Z)+ = { Z }
Clearly,
• None of the subsets has the same closure result as that of the entire left side.
• Thus, we conclude that we can not write WZ → Y as W → Y or Z → Y.
• Thus, set of functional dependencies obtained in step-02 is the canonical cover.
Example 2:
Minimize {A->C, AC->D, E->H, E->AD}
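One possible working, following the steps above:
Step-01: write singleton right-hand sides: A → C, AC → D, E → H, E → A, E → D.
Left reduction: in AC → D, attribute C is extraneous, because (A)+ = { A , C , D } already contains D using A → C and AC → D. Replace AC → D with A → D.
Redundancy check: ignoring E → D, (E)+ = { E , H , A , C , D } still contains D (via E → A and A → D), so E → D is non-essential and can be eliminated. Each remaining dependency is essential, since dropping it shrinks the closure of its left side.
Minimal cover: { A → C , A → D , E → A , E → H }.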
7. NORMALIZATION
Normalization is the process of organizing the data and the attributes of a database. It is
performed to reduce the data redundancy in a database and to ensure that data is stored
logically.
➢ It helps to divide large database tables into smaller tables and make a relationship
between them. It can remove the redundant data and ease to add, manipulate or delete
table fields.
➢ Data redundancy in DBMS means having the same data but at multiple places.
➢ It is necessary to remove data redundancy because it causes anomalies in a database
which makes it very hard for a database administrator to maintain it.
➢ A normal form is a set of criteria against which each relation is evaluated; normalization removes undesirable multi-valued, join, and partial or transitive functional dependencies from a relation.
The motivation for normal forms in database design is to eliminate data redundancy and
anomalies, ensure data integrity, and promote efficient data management.
Normal forms provide guidelines and principles for structuring the database schema to achieve
these objectives. Here are some key motivations for normal forms
➢ Eliminate data redundancy: Redundant data occurs when the same information is
repeated across multiple records or tables. This redundancy wastes storage space and
can lead to inconsistencies when updating or modifying data. Normal forms help
identify and eliminate redundant data by organizing data into separate tables based on
their functional dependencies.
➢ Prevent update anomalies: Update anomalies occur when modifying data results in
inconsistencies or unintended changes. For example, if the same data is stored in
multiple places and not all instances are updated correctly, inconsistencies can arise.
Normal forms help prevent these anomalies by ensuring that data is stored in a way that
allows for easy and controlled updates without introducing inconsistencies.
➢ Maintain data integrity: Data integrity refers to the accuracy, validity, and consistency
of data. Normal forms help enforce data integrity by defining appropriate constraints,
such as primary keys, foreign keys, and entity relationships. These constraints ensure
that data is correctly and consistently represented, preventing the insertion of invalid or
inconsistent data.
➢ Simplify data management and maintenance: Normal forms provide guidelines for
organizing data in a logical and structured manner. By following these guidelines,
database management and maintenance tasks become more manageable and less prone
to errors. Normalized schemas are typically easier to understand, navigate, and modify,
reducing the complexity and effort required for data management activities
➢ Support efficient query processing: Normal forms can contribute to improved query
performance and efficiency. By reducing data redundancy and organizing data based on
functional dependencies, normalized schemas allow for more efficient retrieval and
manipulation of data. Well-designed indexes and relationships based on normal forms
can speed up query execution and improve overall system performance.
➢ Facilitate data integration and interoperability: Normalized schemas provide a
standardized and consistent way of representing data. This promotes data integration
and interoperability across different systems and applications. By adhering to normal
forms, databases can easily exchange and share data without conflicts or
inconsistencies, enabling seamless integration and collaboration.
➢ Adapt to evolving data requirements: Normal forms provide a foundation for a flexible
and extensible database design. By organizing data based on functional dependencies and
avoiding data anomalies, normalized schemas can adapt to changing data requirements
without significant schema modifications. This scalability and flexibility are crucial for
accommodating future data growth and evolving business needs
➢ In summary, the motivations for normal forms in database design are to eliminate data
redundancy and anomalies, ensure data integrity, simplify data management, support
efficient query processing, facilitate data integration, and adapt to evolving data
requirements. By following normal forms, database designers can create well-structured
and efficient database schemas that promote reliable and effective data management
Insertion anomalies: This occurs when we are not able to insert data into a database because
some attributes may be missing at the time of insertion.
Updation anomalies: This occurs when the same data items are repeated with the same
values and are not linked to each other.
Deletion anomalies: This occurs when deleting one part of the data deletes the other
necessary information from the database.
8. NORMAL FORMS:
The process of normalization helps us divide a larger table in the database into various smaller tables and then link them using relationships. Normal forms are basically useful for reducing the overall redundancy (repeated data) in the tables present in a database, so as to ensure logical storage.
There are four normal forms that are usually used in relational databases: first normal form (1NF), second normal form (2NF), third normal form (3NF), and Boyce-Codd normal form (BCNF).
First Normal Form (1NF): a relation is in first normal form if every attribute contains only atomic (single) values, i.e. there are no multi-valued attributes.
➢ Here, the Course Content column is a multi-valued attribute. So, this relation is not in 1NF.
➢ We re-arrange the relation (table) as below, to convert it to First Normal Form.
To convert this table into 1NF, we make a new row for each value of Course Content, as shown below.
For a relational table to be in second normal form, it must satisfy the following rules:
➢ It must be in first normal form.
➢ It must not contain any partial dependency, i.e. no non-prime attribute may depend on a proper subset of any candidate key.
If a partial dependency exists, we can divide the table to remove the partially dependent attributes and move them to some other table where they fit in well.
Example : Student_Project relation
We see here in Student_Project relation that the prime key attributes are Stu_ID and Proj_ID.
According to the rule, non-key attributes, i.e. Stu_Name and Proj_Name must be dependent
upon both and not on any of the prime key attribute individually.
But we find that Stu_Name can be identified by Stu_ID and Proj_Name can be identified by
Proj_ID independently. This is called partial dependency, which is not allowed in Second
Normal Form.
We break the relation in two, keeping Stu_Name with Stu_ID and Proj_Name with Proj_ID, so that no partial dependency remains.
A transitive dependency exists when the following three conditions hold:
• X -> Y
• Y does not -> X
• Y -> Z
(so that X -> Z holds transitively through Y).
For a relational table to be in third normal form, it must satisfy the following rules:
➢ It must be in second normal form.
➢ It must not contain any transitive dependency; equivalently, for each non-trivial functional dependency X -> Y, either X is a superkey or Y is a prime attribute.
If a transitive dependency exists, we can divide the table to remove the transitively dependent attributes and place them in a new table along with a copy of the determinant.
Example :
We find that in the above Student_detail relation, Stu_ID is the key and only prime key
attribute. We find that City can be identified by Stu_ID as well as Zip itself. Neither Zip is
a superkey nor is City a prime attribute. Additionally, Stu_ID → Zip → City, so there
exists transitive dependency.
To bring this relation into third normal form, we break the relation into two relations, (Stu_ID, Stu_Name, Zip) and (Zip, City), as follows –
The 2NF and 3NF impose some extra conditions on dependencies on candidate keys and
remove redundancy caused by that. However, there may still exist some dependencies that
cause redundancy in the database. These redundancies are removed by a more strict normal
form known as BCNF.
For a relational table to be in Boyce-Codd normal form, it must satisfy the following rules:
➢ It must be in third normal form.
➢ For every non-trivial functional dependency X -> Y, X must be a superkey of the table.
Example: consider a table <EmployeeProjectLead> with attributes Employee Code, Project ID, and Project Leader.
The above table satisfies all the normal forms till 3NF, but it violates the rules of BCNF because
the candidate key of the above table is {Employee Code, Project ID}. For the non-trivial
functional dependency, Project Leader -> Project ID, Project ID is a prime attribute but Project
Leader is a non-prime attribute. This is not allowed in BCNF.
To convert the given table into BCNF, we decompose it into two tables:
Thus, we’ve converted the <EmployeeProjectLead> table into BCNF by decomposing it into
<EmployeeProject> and <ProjectLead> tables.
9.DECOMPOSITIONS AND DESIRABLE PROPERTIES
➢ A relation in BCNF is free of redundancy and a relation schema in 3NF comes close. If
a relation schema is not in one of these normal forms, the FDs that cause a violation
can give us insight into the potential problems.
➢ A decomposition of a relation schema R consists of replacing the relation schema by
two (or more) relation schemas that each contain a subset of the attributes of R and
together include all attributes in R.
➢ When a relation in the relational model is not in an appropriate normal form, decomposition of the relation is required. In a database, breaking down a table into multiple tables is termed decomposition.
Attribute Preservation:
➢ Using functional dependencies the algorithms decompose the universal relation schema
R in a set of relation schemas D = { R1, R2, ….. Rn } relational database schema, where
‘D’ is called the Decomposition of R.
➢ The attributes in R will appear in at least one relation schema Ri in the decomposition,
i.e., no attribute is lost. This is called the Attribute Preservation condition of
decomposition.
Dependency Preservation:
➢ A decomposition is dependency-preserving if every functional dependency of the original relation either appears directly in one of the decomposed relations or can be inferred from the dependencies that do.
For example, suppose there is a relation R (A, B, C, D) with functional dependency set (A->BC). The relation R is decomposed into R1(ABC) and R2(AD), which is dependency preserving because FD A->BC is a part of relation R1(ABC).
Lossless Join:
➢ Lossless join property is a feature of decomposition supported by normalization. It is
the ability to ensure that any instance of the original relation can be identified from
corresponding instances in the smaller relations.
➢ The relation is said to be lossless decomposition if natural joins of all the decomposition
give the original relation.
➢ Suppose R is decomposed into two relations X and Y. The decomposition is lossless if at least one of the following holds:
➢ X intersection Y -> X, that is, all attributes common to both X and Y functionally determine ALL the attributes in X; or
➢ X intersection Y -> Y, that is, all attributes common to both X and Y functionally determine ALL the attributes in Y.
➢ In other words, if X intersection Y forms a super key of either X or Y, the decomposition of R is a lossless decomposition.
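As a quick worked check of this rule: let R(A, B, C) with F = { A -> B } be decomposed into X(A, B) and Y(A, C). Here X intersection Y = { A }, and since A -> B we have {A}+ = { A , B }, i.e. the common attribute A determines all the attributes of X. So A is a super key of X and the decomposition is lossless. Had we decomposed R into X(A, B) and Y(B, C) instead, the common attribute B would determine neither all of X nor all of Y, so that decomposition is not guaranteed to be lossless.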