
UNIT -III

What is SQL?
SQL is Structured Query Language, which is a computer language for storing,
manipulating and retrieving data stored in a relational database.

SQL is the standard language for Relational Database Systems. All Relational
Database Management Systems (RDBMS) like MySQL, MS Access, Oracle,
Sybase, Informix, Postgres and SQL Server use SQL as their standard database
language.

However, they also use different dialects, such as −

• MS SQL Server uses T-SQL,
• Oracle uses PL/SQL,
• the MS Access version of SQL is called JET SQL (native format), etc.

Why SQL?
SQL is widely popular because it offers the following advantages −

• Allows users to access data in relational database management systems.
• Allows users to describe the data.
• Allows users to define the data in a database and manipulate that data.
• Allows SQL to be embedded within other languages using SQL modules, libraries & pre-compilers.
• Allows users to create and drop databases and tables.
• Allows users to create views, stored procedures and functions in a database.
• Allows users to set permissions on tables, procedures and views.

A Brief History of SQL


• 1970 − Dr. Edgar F. "Ted" Codd of IBM is known as the father of relational
databases. He described a relational model for databases.
• 1974 − Structured Query Language appeared.
• 1978 − IBM worked to develop Codd's ideas and released a product named
System/R.
• 1979 − Relational Software (which later came to be known as Oracle)
released the first commercial relational database.
• 1986 − SQL was standardized by ANSI, with ISO following in 1987.
SQL Process
When you execute an SQL command for any RDBMS, the system
determines the best way to carry out your request, and the SQL engine figures out
how to interpret the task.

There are various components included in this process.

These components are −

• Query Dispatcher
• Optimization Engines
• Classic Query Engine
• SQL Query Engine, etc.

A classic query engine handles all the non-SQL queries, but a SQL query engine
won't handle logical files.


SQL Commands
The standard SQL commands to interact with relational databases are CREATE,
SELECT, INSERT, UPDATE, DELETE and DROP. These commands can be
classified into the following groups based on their nature −

These SQL commands are mainly categorized into five categories:
1. DDL – Data Definition Language
2. DQL – Data Query Language
3. DML – Data Manipulation Language
4. DCL – Data Control Language
5. TCL – Transaction Control Language
Now, we will see all of these in detail.

DDL (Data Definition Language)


DDL or Data Definition Language actually consists of the SQL commands that can
be used to define the database schema. It simply deals with descriptions of the
database schema and is used to create and modify the structure of database objects
in the database.
DDL is a set of SQL commands used to create, modify, and delete database
structures but not data. These commands are normally not used by a general user,
who should be accessing the database via an application.
List of DDL Commands
Some DDL commands and their syntax are:
• CREATE – Creates the database or its objects (table, index, function, view, stored procedure, trigger).
  Syntax: CREATE TABLE table_name (column1 data_type, column2 data_type, ...);
• DROP – Deletes objects from the database.
  Syntax: DROP TABLE table_name;
• ALTER – Alters the structure of the database.
  Syntax: ALTER TABLE table_name ADD COLUMN column_name data_type;
• TRUNCATE – Removes all records from a table, including all space allocated for the records.
  Syntax: TRUNCATE TABLE table_name;
• COMMENT – Adds comments to the data dictionary.
  Syntax: COMMENT 'comment_text' ON TABLE table_name;
• RENAME – Renames an object existing in the database.
  Syntax: RENAME TABLE old_table_name TO new_table_name;

DQL (Data Query Language)


DQL statements are used for performing queries on the data within schema objects.
The purpose of a DQL command is to get some schema relation based on the
query passed to it. DQL can be defined as the component of SQL that allows
getting data from the database and imposing order upon it. It includes the
SELECT statement.
This command allows getting the data out of the database to perform operations
on it. When a SELECT is fired against a table or tables, the result is compiled into
a temporary table, which is displayed or perhaps received by the program,
i.e. a front-end.
DQL Command
There is only one DQL command in SQL i.e.
• SELECT – Retrieves data from the database.
  Syntax: SELECT column1, column2, ... FROM table_name WHERE condition;

DML (Data Manipulation Language)


The SQL commands that deal with the manipulation of data present in the database
belong to DML or Data Manipulation Language, and this includes most of the SQL
statements. It is the component of SQL that provides access to the data in the
database. In some classifications, DCL statements are grouped together with DML
statements.
List of DML commands
Some DML commands and their syntax are:
• INSERT – Inserts data into a table.
  Syntax: INSERT INTO table_name (column1, column2, ...) VALUES (value1, value2, ...);
• UPDATE – Updates existing data within a table.
  Syntax: UPDATE table_name SET column1 = value1, column2 = value2 WHERE condition;
• DELETE – Deletes records from a database table.
  Syntax: DELETE FROM table_name WHERE condition;
• LOCK – Controls table concurrency.
  Syntax: LOCK TABLE table_name IN lock_mode;
• CALL – Calls a PL/SQL or Java subprogram.
  Syntax: CALL procedure_name(arguments);
• EXPLAIN PLAN – Describes the access path to data.
  Syntax: EXPLAIN PLAN FOR SELECT * FROM table_name;

DCL (Data Control Language)


DCL includes commands such as GRANT and REVOKE which mainly deal with
the rights, permissions, and other controls of the database system.
List of DCL commands:
Two important DCL commands and their syntax are:
• GRANT – Assigns new privileges to a user account, allowing access to specific database objects, actions, or functions.
  Syntax: GRANT privilege_type [(column_list)] ON [object_type] object_name TO user [WITH GRANT OPTION];
• REVOKE – Removes previously granted privileges from a user account, taking away their access to certain database objects or actions.
  Syntax: REVOKE [GRANT OPTION FOR] privilege_type [(column_list)] ON [object_type] object_name FROM user [CASCADE];
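As a concrete illustration, the following sketch grants and then revokes privileges on an employees table for a hypothetical user account report_user (both names are illustrative):

GRANT SELECT, INSERT ON employees TO report_user;
REVOKE INSERT ON employees FROM report_user;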

TCL (Transaction Control Language)


Transactions group a set of tasks into a single execution unit. Each transaction
begins with a specific task and ends when all the tasks in the group are successfully
completed. If any of the tasks fail, the transaction fails.
Therefore, a transaction has only two results: success or failure. The following
TCL commands are used to control the execution of a transaction:
List of TCL Commands
Some TCL commands and their syntax are:
• BEGIN TRANSACTION – Starts a new transaction.
  Syntax: BEGIN TRANSACTION [transaction_name];
• COMMIT – Saves all changes made during the transaction.
  Syntax: COMMIT;
• ROLLBACK – Undoes all changes made during the transaction.
  Syntax: ROLLBACK;
• SAVEPOINT – Creates a savepoint within the current transaction.
  Syntax: SAVEPOINT savepoint_name;
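A minimal sketch of a complete transaction using these commands (MySQL-style keywords; the employees table and values are illustrative):

START TRANSACTION;
UPDATE employees SET salary = salary * 1.10 WHERE department = 'Sales';
SAVEPOINT after_raise;
DELETE FROM employees WHERE employee_id = 123;
ROLLBACK TO SAVEPOINT after_raise; -- undo only the DELETE
COMMIT; -- keep the salary update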

Important SQL Commands


Some of the most important SQL commands are:
1. SELECT: Used to retrieve data from a database.
2. INSERT: Used to add new data to a database.
3. UPDATE: Used to modify existing data in a database.
4. DELETE: Used to remove data from a database.
5. CREATE TABLE: Used to create a new table in a database.
6. ALTER TABLE: Used to modify the structure of an existing table.
7. DROP TABLE: Used to delete an entire table from a database.
8. WHERE: Used to filter rows based on a specified condition.
9. ORDER BY: Used to sort the result set in ascending or descending
order.
10. JOIN: Used to combine rows from two or more tables based on a
related column between them.
SQL Commands With Examples
The examples below demonstrate how to use SQL commands. Here is a list of
popular SQL commands with examples.
• SELECT: SELECT * FROM employees;
• INSERT: INSERT INTO employees (first_name, last_name, email) VALUES ('John', 'Doe', 'john.doe@example.com');
• UPDATE: UPDATE employees SET email = 'jane.doe@example.com' WHERE first_name = 'Jane' AND last_name = 'Doe';
• DELETE: DELETE FROM employees WHERE employee_id = 123;
• CREATE TABLE: CREATE TABLE employees (employee_id INT PRIMARY KEY, first_name VARCHAR(50), last_name VARCHAR(50));
• ALTER TABLE: ALTER TABLE employees ADD COLUMN phone VARCHAR(20);
• DROP TABLE: DROP TABLE employees;
• WHERE: SELECT * FROM employees WHERE department = 'Sales';
• ORDER BY: SELECT * FROM employees ORDER BY hire_date DESC;
• JOIN: SELECT e.first_name, e.last_name, d.department_name FROM employees e JOIN departments d ON e.department_id = d.department_id;

These are common examples of some important SQL commands. The examples
provide a better understanding of the SQL commands and teach the correct way to
use them.
Conclusion
SQL commands are the foundation of an effective database management system.
Whether you are defining data, manipulating it, or managing access to it, SQL
provides a full set of tools. With this detailed guide, we hope you have gained a
deeper understanding of SQL commands, their categories, and their syntax, with
examples.
SQL | Constraints
Constraints are the rules that we can apply on the type of data in a table. That is,
we can specify the limit on the type of data that can be stored in a particular
column in a table using constraints.
The available constraints in SQL are:

• NOT NULL: This constraint tells that we cannot store a null value in a
column. That is, if a column is specified as NOT NULL then we will
not be able to store null in this particular column.
• UNIQUE: This constraint, when specified with a column, tells that all
the values in the column must be unique. That is, a value must not be
repeated across the rows of that column.
• PRIMARY KEY: A primary key is a field which can uniquely identify
each row in a table. This constraint is used to specify a field in a
table as the primary key.
• FOREIGN KEY: A foreign key is a field which refers to the field that
uniquely identifies each row (usually the primary key) of another table.
This constraint is used to specify a field as a foreign key.
• CHECK: This constraint helps to validate the values of a column to
meet a particular condition. That is, it helps to ensure that the value
stored in a column meets a specific condition.
• DEFAULT: This constraint specifies a default value for the column
when no value is specified by the user.
How to specify constraints?
We can specify constraints at the time of creating the table using CREATE
TABLE statement. We can also specify the constraints after creating a table using
ALTER TABLE statement.
Syntax:
Below is the syntax to create constraints using CREATE TABLE statement at the
time of creating the table.

CREATE TABLE sample_table
(
column1 data_type(size) constraint_name,
column2 data_type(size) constraint_name,
column3 data_type(size) constraint_name,
....
);

sample_table: Name of the table to be created.
data_type: Type of data that can be stored in the field.
constraint_name: Name of the constraint, for example NOT NULL, UNIQUE,
PRIMARY KEY, etc.
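Constraints can also be added to an existing table with the ALTER TABLE statement; a minimal sketch (the constraint name is illustrative):

ALTER TABLE sample_table
ADD CONSTRAINT uq_column1 UNIQUE (column1);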
Let us see each of the constraints in detail.

1. NOT NULL –
If we specify a field in a table to be NOT NULL, then the field will never accept a
null value. That is, you will not be allowed to insert a new row in the table
without specifying a value for this field.
For example, the below query creates a table Student with the fields ID and
NAME as NOT NULL. That is, we are bound to specify values for these two
fields every time we wish to insert a new row.

CREATE TABLE Student
(
ID int(6) NOT NULL,
NAME varchar(10) NOT NULL,
ADDRESS varchar(20)
);
2. UNIQUE –
This constraint helps to uniquely identify each row in the table, i.e. for a
particular column, all the rows should have unique values. We can have more
than one UNIQUE column in a table.
For example, the below query creates a table Student where the field ID is
specified as UNIQUE, i.e. no two students can have the same ID.

CREATE TABLE Student
(
ID int(6) NOT NULL UNIQUE,
NAME varchar(10),
ADDRESS varchar(20)
);
3. PRIMARY KEY –
A primary key is a field which uniquely identifies each row in the table. If a field
in a table is specified as the primary key, then the field cannot contain NULL
values and all rows must have unique values for this field. In other words, it is a
combination of the NOT NULL and UNIQUE constraints.
A table can have only one field as its primary key. The below query creates a table
named Student and specifies the field ID as the primary key.

CREATE TABLE Student
(
ID int(6) NOT NULL UNIQUE,
NAME varchar(10),
ADDRESS varchar(20),
PRIMARY KEY(ID)
);
4. FOREIGN KEY –
A foreign key is a field in a table which refers to the field that uniquely identifies
the rows of another table. That is, this field points to the primary key of another
table. This creates a kind of link between the tables.
Consider the two tables as shown below:

Orders

O_ID ORDER_NO C_ID
1 2253 3
2 3325 3
3 4521 2
4 8532 1

Customers

C_ID NAME ADDRESS
1 RAMESH DELHI
2 SURESH NOIDA
3 DHARMESH GURGAON
As we can clearly see, the field C_ID in the Orders table refers to C_ID, the
primary key of the Customers table, which uniquely identifies each row in the
Customers table. Therefore, C_ID is a foreign key in the Orders table.
Syntax:

CREATE TABLE Orders
(
O_ID int NOT NULL,
ORDER_NO int NOT NULL,
C_ID int,
PRIMARY KEY (O_ID),
FOREIGN KEY (C_ID) REFERENCES Customers(C_ID)
);
5. CHECK –
Using the CHECK constraint we can specify a condition for a field, which should
be satisfied at the time of entering values for this field.
For example, the below query creates a table Student and specifies the condition
for the field AGE as (AGE >= 18). That is, the user will not be allowed to enter
any record in the table with AGE < 18.

CREATE TABLE Student
(
ID int(6) NOT NULL,
NAME varchar(10) NOT NULL,
AGE int NOT NULL CHECK (AGE >= 18)
);
6. DEFAULT –
This constraint is used to provide a default value for a field. That is, if at the
time of entering a new record the user does not specify a value for the field,
the default value is assigned to it.
For example, the below query will create a table named Student and specify the
default value for the field AGE as 18.

CREATE TABLE Student
(
ID int(6) NOT NULL,
NAME varchar(10) NOT NULL,
AGE int DEFAULT 18
);

SQL Trigger | Student Database



A trigger is a stored procedure in a database that is automatically invoked whenever a
special event occurs in the database. For example, a trigger can be invoked when a
row is inserted into a specified table or when specific table columns are updated. In
simple words, a trigger is a collection of SQL statements with a particular name that
is stored in the database. It belongs to a specific class of stored procedures that
are automatically invoked in response to database server events. Every trigger has a
table attached to it.
Because a trigger cannot be called directly, unlike a stored procedure, it is referred
to as a special procedure. A trigger is automatically called whenever a data
modification event against a table takes place, which is the main distinction between
a trigger and a procedure. On the other hand, a stored procedure must be called
directly.
The following are the key differences between triggers and stored procedures:
1. Triggers cannot be manually invoked or executed.
2. There is no chance that triggers will receive parameters.
3. A transaction cannot be committed or rolled back inside a trigger.
Syntax:
create trigger [trigger_name]
[before | after]
{insert | update | delete}
on [table_name]
[for each row]
[trigger_body]
Explanation of Syntax
1. Create trigger [trigger_name]: Creates or replaces an existing trigger
with the trigger_name.
2. [before | after]: This specifies when the trigger will be executed.
3. {insert | update | delete}: This specifies the DML operation.
4. On [table_name]: This specifies the name of the table associated with the
trigger.
5. [for each row]: This specifies a row-level trigger, i.e., the trigger will be
executed for each affected row.
6. [trigger_body]: This provides the operation to be performed when the
trigger is fired.
Why Do We Employ Triggers?
When we need to carry out some actions automatically in certain desirable scenarios,
triggers will be useful. For instance, we need to be aware of the frequency and timing
of changes to a table that is constantly changing. In such cases, we could create a
trigger to insert the required data into a different table if the primary table underwent
any changes.
Different Trigger Types in SQL Server
Three categories of triggers exist:
1. DDL Triggers
2. DML Triggers
3. Logon Triggers
DDL Triggers
The Data Definition Language (DDL) command events such as Create_table,
Create_view, drop_table, Drop_view, and Alter_table cause the DDL triggers to be
activated.
SQL Server
create trigger safety
on database
for
create_table, alter_table, drop_table
as
print 'You cannot create, alter or drop tables in this database.'
rollback;
DML Triggers
DML triggers are set off by Data Manipulation Language (DML) command
events, i.e. INSERT, UPDATE, and DELETE statements against the associated
table.
SQL Server
create trigger deep
on emp
for
insert, update, delete
as
print 'You cannot insert, update or delete from this table.'
rollback;
Logon Triggers
Logon triggers fire in response to a LOGON event. The LOGON event takes place
when a user session is established with a SQL Server instance, after the
authentication phase of logging in finishes but before the user session is actually
established. As a result, the PRINT statement messages and any errors generated by
the trigger are sent to the SQL Server error log. Logon triggers do not fire if
authentication fails. These triggers can be used to track login activity or to set a
limit on the number of sessions that a given login can have, in order to audit and
manage server sessions.
How does SQL Server Show Trigger?
The show or list trigger is useful when we have many databases with many tables.
This query is very useful when the table names are the same across multiple
databases. We can view a list of every trigger available in the SQL Server by using
the command below:
Syntax:
SELECT name, is_instead_of_trigger
FROM sys.triggers
WHERE type = 'TR';
The SQL Server Management Studio makes it very simple to display or list all
triggers that are available for any given table. The following steps will help us
accomplish this:
• Go to the Databases menu, select the desired database, and then expand it.
• Select the Tables menu and expand it.
• Select any specific table and expand it.
We will get various options here. When we choose the Triggers option, it displays
all the triggers available in this table.
BEFORE and AFTER Trigger
BEFORE triggers run the trigger action before the triggering statement is run.
AFTER triggers run the trigger action after the triggering statement is run.
Example
Consider a Student Report database in which students' marks assessments are
recorded. In such a schema, create a trigger so that the total and the percentage of
the specified marks are automatically inserted whenever a record is inserted.
Here, the trigger must be invoked before the record is inserted, so the BEFORE
tag is used.
Suppose the Student table stores each student's subject marks; in MySQL its
schema can be inspected with:
mysql> desc Student;
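A minimal MySQL sketch of a trigger for this problem statement, assuming the Student table has columns subj1, subj2, and subj3 (each marked out of 100) plus total and per (these column names are assumptions):

create trigger stud_marks
before insert
on Student
for each row
set NEW.total = NEW.subj1 + NEW.subj2 + NEW.subj3,
    NEW.per = (NEW.subj1 + NEW.subj2 + NEW.subj3) * 100 / 300;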

The above SQL statement creates a trigger in the student database so that whenever
subject marks are entered, the trigger computes the total and the percentage before
the data is inserted, and stores them along with the entered values.
In this way, triggers can be created and executed in the databases.


Advantage of Triggers
The benefits of using triggers in SQL Server include the following:
1. Database object rules are established by triggers, which cause changes to
be undone if they are not met.
2. The trigger will examine the data and, if necessary, make changes.
3. We can enforce data integrity thanks to triggers.
4. Data is validated using triggers before being inserted or updated.
5. Triggers assist us in maintaining a records log.
6. Because they do not need to be compiled each time they are run,
triggers improve the performance of SQL queries.
7. The client-side code is reduced by triggers, saving time and labor.
8. Trigger maintenance is simple.
Disadvantage of Triggers
The drawbacks of using triggers in SQL Server include the following:
1. Triggers can provide only extended validations; simple validations
must still be implemented with constraints such as NOT NULL,
UNIQUE, CHECK, and FOREIGN KEY.
2. Automatic triggers are used, and the user is unaware of when they are
being executed. Consequently, it is difficult to troubleshoot issues that
arise in the database layer.
3. The database server’s overhead may increase as a result of triggers.
4. In a single CREATE TRIGGER statement, we can specify the same
trigger action for multiple user actions, such as INSERT and UPDATE.
5. Only the current database is available for creating triggers, but they can
still make references to objects outside the database.

UNION and UNION ALL, EXCEPT, and INTERSECT

The SET operators are mainly used to combine the results of more than one SELECT statement and
return a single result set to the user.
The set operators work on complete rows of queries, so the results of the queries must have the
same column names and column order, and the types of the columns must be compatible. There are the
following 4 set operators in SQL Server:
1. UNION: Combine two or more result sets into a single set without duplicates.
2. UNION ALL: Combine two or more result sets into a single set, including all
duplicates.
3. INTERSECT: Takes the data from both result sets, which are in common.
4. EXCEPT: Takes the data from the first result set but not in the second result set
(i.e., no matching to each other)
Rules on Set Operations:
1. The result sets of all queries must have the same number of columns.
2. In every result set, the data type of each column must be compatible (well-
matched) with the data type of its corresponding column in other result sets.
3. An ORDER BY clause, if needed, should be part of the last SELECT statement to sort the
result. The column names or aliases of the result come from the first SELECT statement.
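For instance, with the TableA and TableB scripts shown below, a sorted UNION is written with a single ORDER BY at the end (a sketch):

SELECT ID, Name, Gender, Department FROM TableA
UNION
SELECT ID, Name, Gender, Department FROM TableB
ORDER BY Name;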
Understand the Differences Between These Operators with Examples.
Use the SQL Script to create and populate the two tables we will use in our examples.
CREATE TABLE TableA
(
ID INT,
Name VARCHAR(50),
Gender VARCHAR(10),
Department VARCHAR(50)
)
GO

INSERT INTO TableA VALUES(1, 'Pranaya', 'Male','IT')
INSERT INTO TableA VALUES(2, 'Priyanka', 'Female','IT')
INSERT INTO TableA VALUES(3, 'Preety', 'Female','HR')
INSERT INTO TableA VALUES(3, 'Preety', 'Female','HR')
GO
Fetch the records:
SELECT * FROM TableA

CREATE TABLE TableB
(
ID INT,
Name VARCHAR(50),
Gender VARCHAR(10),
Department VARCHAR(50)
)
GO

INSERT INTO TableB VALUES(2, 'Priyanka', 'Female','IT')
INSERT INTO TableB VALUES(3, 'Preety', 'Female','HR')
INSERT INTO TableB VALUES(4, 'Anurag', 'Male','IT')
GO
Fetch the records:
SELECT * FROM TableB

UNION Operator:
The Union operator will return all the unique rows from both queries. Notice that the duplicates are
removed from the result set.
SELECT ID, Name, Gender, Department FROM TableA
UNION
SELECT ID, Name, Gender, Department FROM TableB
Result:
ID Name Gender Department
1 Pranaya Male IT
2 Priyanka Female IT
3 Preety Female HR
4 Anurag Male IT
Purpose: The UNION operator combines the result sets of two or more SELECT statements into
a single result set.
Distinct Values: It removes duplicate rows between the various SELECT statements.
Use Case: You would use UNION when listing all distinct rows from multiple tables or queries.
UNION ALL Operator:
The UNION ALL operator returns all the rows from both queries, including the duplicates.
SELECT ID, Name, Gender, Department FROM TableA
UNION ALL
SELECT ID, Name, Gender, Department FROM TableB
Result:
ID Name Gender Department
1 Pranaya Male IT
2 Priyanka Female IT
3 Preety Female HR
3 Preety Female HR
2 Priyanka Female IT
3 Preety Female HR
4 Anurag Male IT
INTERSECT Operator:
The INTERSECT operator retrieves the common unique rows from the left and right queries.
Notice the duplicates are removed.
SELECT ID, Name, Gender, Department FROM TableA
INTERSECT
SELECT ID, Name, Gender, Department FROM TableB
Result:
ID Name Gender Department
2 Priyanka Female IT
3 Preety Female HR
Purpose: The INTERSECT operator returns all rows common to both SELECT statements.
Distinct Values: Like UNION and EXCEPT, INTERSECT also removes duplicates.
Use Case: You would use INTERSECT when you need to find rows that are shared between two
tables or queries.
EXCEPT Operator:
The EXCEPT operator will return unique rows from the left query that aren’t in the right query’s
results.
SELECT ID, Name, Gender, Department FROM TableA
EXCEPT
SELECT ID, Name, Gender, Department FROM TableB
Result:
ID Name Gender Department
1 Pranaya Male IT
If you want the rows that are present in Table B but not in Table A, reverse the queries.
SELECT ID, Name, Gender, Department FROM TableB
EXCEPT
SELECT ID, Name, Gender, Department FROM TableA
Result:
ID Name Gender Department
4 Anurag Male IT
Purpose: The EXCEPT operator returns all rows from the first SELECT statement that are absent
in the second SELECT statement’s results.
Distinct Values: It automatically removes duplicates.
Use Case: EXCEPT is used when you want to find rows in one query that are not found in another.
It’s useful for finding differences between tables or queries.
Note: For all these 4 operators to work, the following 2 conditions must be met
1. The number and the order of the columns must be the same in both the queries
2. The data types must be the same or at least compatible
For example, if the number of columns is different, you will get the following error
Msg 205, Level 16, State 1, Line 1
All queries combined using a UNION, INTERSECT, or EXCEPT operator must have an
equal number of expressions in their target lists.
Differences Between UNION EXCEPT and INTERSECT Operators in SQL Server:
In SQL Server, the UNION, EXCEPT, and INTERSECT operators combine or manipulate the
results of two or more SELECT statements. These operators help you perform set operations on
the result sets of those SELECT statements. Here are the main differences between these operators:
UNION Operator:
1. The UNION operator combines the result sets of two or more SELECT statements
into a single result set.
2. It removes duplicate rows from the combined result set by default.
3. The columns in the SELECT statements must have compatible data types, and the
number of columns in each SELECT statement must be the same.
4. The order of rows in the final result set may not be the same as in the individual
SELECT statements unless you use the ORDER BY clause.
EXCEPT Operator:
1. The EXCEPT operator retrieves the rows present in the first result set but not in
the second result set.
2. It returns distinct rows from the first result set that do not have corresponding
rows in the second result set.
3. The columns in both SELECT statements must have compatible data types, and
the number of columns in both statements must be the same.
INTERSECT Operator:
1. The INTERSECT operator is used to retrieve the rows that are common to both
result sets.
2. It returns distinct rows appearing in the first and second result sets.
3. The columns in both SELECT statements must have compatible data types, and
the number of columns in both statements must be the same.
So, UNION combines result sets, EXCEPT returns rows from the first set that are not in the second
set, and INTERSECT returns common rows between two result sets. It’s important to ensure that
the data types and the number of columns match when using these operators, and you can use the
ORDER BY clause to control the order of the final result set if needed.

Structured Query Language (SQL) is a programming language used to manage data stored
in a relational database. SQL has the ability to nest queries: a nested query is a query within another
query. Nested queries allow for more complex and specific data retrieval. In this article, we will
discuss nested queries in SQL, their syntax, and examples.

Nested Query
In SQL, a nested query involves a query that is placed within another query. Output of the inner
query is used by the outer query. A nested query has two SELECT statements: one for the inner
query and another for the outer query.
Syntax of Nested Queries
The basic syntax of a nested query involves placing one query inside of another query. Inner query
or subquery is executed first and returns a set of values that are then used by the outer query. The
syntax for a nested query is as follows:

SELECT column1, column2, ...
FROM table1
WHERE column1 IN ( SELECT column1
                   FROM table2
                   WHERE condition );

Types of Nested Queries in SQL


Subqueries can be either correlated or non-correlated

Non-correlated (or Independent) Nested Queries

Non-correlated (or independent) subqueries are executed independently of the outer query.
Their results are passed to the outer query.

Correlated Nested Queries

Correlated subqueries are executed once for each row of the outer query. They use values from the
outer query to return results.

Execution Order in Independent Nested Queries


In independent nested queries, the execution order is from the innermost query to the outer query.
An outer query won't be executed until its inner query completes its execution. The outer query
uses the result of the inner query.

Operators Used in Independent Nested Queries

IN Operator

This operator checks if a column value in the outer query's result is present in the inner query's
result. The final result will have rows that satisfy the IN condition.

NOT IN Operator

This operator checks if a column value in the outer query's result is not present in the inner query's
result. The final result will have rows that satisfy the NOT IN condition.

ALL Operator

This operator compares a value of the outer query's result with all the values of the inner query's
result and returns the row if it matches all the values.

ANY Operator

This operator compares a value of the outer query's result with all the inner query's result values
and returns the row if there is a match with any value.
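As an illustration, the following sketch uses NOT IN against the employees and departments sample tables defined later in this section; it lists the employees who are not in the Finance department (John, Mary, Bob, and Tom with the sample data):

SELECT emp_name
FROM employees
WHERE dept_id NOT IN (SELECT dept_id
                      FROM departments
                      WHERE dept_name = 'Finance');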

Execution Order in Co-related Nested Queries


In correlated nested queries, the inner query uses values from the outer query, and the execution
order is different from that of independent nested queries.

• First, the outer query selects the first row.
• The inner query uses a value from the selected row, executes, and returns a result set.
• The outer query uses the result set returned by the inner query to determine whether the
selected row should be included in the final output.
• Steps 2 and 3 are repeated for each row in the outer query's result set.
• This process can be resource-intensive and may lead to performance issues if the query is
not optimized properly.

Operators Used in Co-related Nested Queries

In co-related nested queries, the following operators can be used

EXISTS Operator

This operator checks whether a subquery returns any row. If it returns at least one row, the EXISTS
operator returns true and the outer query continues to execute. If the subquery returns no row, the
EXISTS operator returns false and the outer query stops execution.

NOT EXISTS Operator

This operator checks whether a subquery returns no rows. If the subquery returns no row, the NOT
EXISTS operator returns true, and the outer query continues to execute. If the subquery returns at
least one row, the NOT EXISTS operator returns false, and the outer query stops execution.

ANY Operator

This operator compares a value of the outer query's result with one or more values returned by the
inner query. If the comparison is true for any one of the values returned by the inner query, the row
is included in the final result.

ALL Operator

This operator compares a value of the outer query's result with all the values returned by the inner
query. Only if the comparison is true for all the values returned by the inner query, the row is
included in the final result.

These operators are used to create co-related nested queries that depend on values from the outer
query for execution.
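For instance, with the sample tables shown below, a NOT EXISTS query lists the employees who have made no sale (only Alice with the given data):

SELECT emp_name
FROM employees
WHERE NOT EXISTS (SELECT 1
                  FROM sales
                  WHERE sales.emp_id = employees.emp_id);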

Examples

Consider the following sample table to execute nested queries on these.


Table: employees table

emp_id emp_name dept_id

1 John 1

2 Mary 2

3 Bob 1

4 Alice 3

5 Tom 1

Table: departments table

dept_id dept_name

1 Sales

2 Marketing

3 Finance

Table: sales table

sale_id emp_id sale_amt

1 1 1000

2 2 2000

3 3 3000

4 1 4000

5 5 5000

6 3 6000

7 2 7000

Example 1: Find the names of all employees in the Sales department.


Required query

SELECT emp_name
FROM employees
WHERE dept_id IN (SELECT dept_id
FROM departments
WHERE dept_name = 'Sales');

Output

emp_name

John

Bob

Tom

Example 2: Find the names of all employees who have made a sale

Required query

SELECT emp_name
FROM employees
WHERE EXISTS (SELECT emp_id
FROM sales
WHERE employees.emp_id = sales.emp_id);

Output

emp_name

John

Mary

Bob

Alice

Tom
This query selects all employees from the "employees" table where there exists a sale record in the
"sales" table for that employee.

Example 3: Find the names of all employees who have made sales greater than $1000.

Required query

SELECT emp_name
FROM employees
WHERE emp_id = ANY (SELECT emp_id
                    FROM sales
                    WHERE sale_amt > 1000);

Output

emp_name

John

Mary

Bob

Tom

This query selects all employees from the "employees" table whose emp_id matches any of the
emp_ids in the "sales" table where the sale amount is greater than $1000. Alice is not returned
because she has made no sale; every other employee has at least one sale above $1000.

Aggregate functions
SQL Aggregate functions are functions where the values of multiple rows are
grouped as input on certain criteria to form a single value result of more significant
meaning.
It is used to summarize data, by combining multiple values to form a single result.
SQL Aggregate functions are mostly used with the GROUP BY clause of the
SELECT statement.
Various Aggregate Functions
1. Count()
2. Sum()
3. Avg()
4. Min()
5. Max()
Aggregate Functions in SQL
Below is the list of SQL aggregate functions, with example results computed on
the demo Employee table shown in the next section:
Count():
• Count(*): Returns the total number of records, i.e. 6.
• Count(salary): Returns the number of non-NULL values in the column
salary, i.e. 5.
• Count(Distinct salary): Returns the number of distinct non-NULL values
in the column salary, i.e. 5.
Sum():
• Sum(salary): Sums all non-NULL values of the column salary, i.e. 3120.
• Sum(Distinct salary): Sums all distinct non-NULL values, i.e. 3120
(all five salaries are distinct).
Avg():
• Avg(salary) = Sum(salary) / Count(salary) = 3120/5 = 624.
• Avg(Distinct salary) = Sum(Distinct salary) / Count(Distinct salary) =
3120/5 = 624.
Min():
• Min(salary): Minimum value in the salary column, excluding NULL, i.e. 403.
Max():
• Max(salary): Maximum value in the salary column, i.e. 802.
Demo SQL Database
In this tutorial on aggregate functions, we will use the following table for
examples:
Id Name Salary
1 A 802
2 B 403
3 C 604
4 D 705
5 E 606
6 F NULL

You can also create this table on your system, by writing the following queries:
MySQL

CREATE TABLE Employee (
Id INT PRIMARY KEY,
Name CHAR(1), -- adjust data type and length if names can be longer than a single character
Salary DECIMAL(10,2) -- adjust precision and scale if needed for salaries
);

INSERT INTO Employee (Id, Name, Salary)
VALUES (1, 'A', 802),
(2, 'B', 403),
(3, 'C', 604),
(4, 'D', 705),
(5, 'E', 606),
(6, 'F', NULL);
Aggregate Function Example
In this example, we will use multiple aggregate functions on the data.
Queries
-- Count the number of employees
SELECT COUNT(*) AS TotalEmployees FROM Employee;

-- Calculate the total salary
SELECT SUM(Salary) AS TotalSalary FROM Employee;

-- Find the average salary
SELECT AVG(Salary) AS AverageSalary FROM Employee;

-- Get the highest salary
SELECT MAX(Salary) AS HighestSalary FROM Employee;

-- Determine the lowest salary
SELECT MIN(Salary) AS LowestSalary FROM Employee;
Output
TotalEmployees
6
TotalSalary
3120
AverageSalary
624
HighestSalary
802
LowestSalary
403
Key Takeaways about SQL Aggregate Functions
• Aggregate functions in SQL operate on a group of values and return a
single result.
• They are often used with the GROUP BY clause to summarize the
grouped data.
• Aggregate functions operate on non-NULL values only (except
COUNT(*)).
• Commonly used aggregate functions are
– MIN(), MAX(), COUNT(), AVG(), and SUM().
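Since aggregate functions are most often combined with GROUP BY, here is a minimal sketch; it assumes a hypothetical Dept column on the Employee table (not present in the demo table above):

-- average salary per department (Dept is a hypothetical column)
SELECT Dept, AVG(Salary) AS AvgSalary
FROM Employee
GROUP BY Dept;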
NULL values in SQL
In SQL there may be some records in a table that do not have a value or data for
every field, and such fields are termed NULL values.
NULL values are possible because, at the time of data entry, the information may
not be available. So SQL supports a special value known as NULL which is used to
represent the values of attributes that may be unknown or may not apply to a
tuple. SQL places a NULL value in a field in the absence of a user-defined value.
For example, the Apartment_number attribute of an address applies only to
addresses that are in apartment buildings and not to other types of residences.
So, NULL values are those values for which there is no data value in the particular
field in the table.
Importance of NULL Value
• It is important to understand that a NULL value differs from a zero value.
• A NULL value is used to represent a missing value, but it usually has one
of three different interpretations:
• The value unknown (value exists but is not known)
• Value not available (exists but is purposely withheld)
• Attribute not applicable (undefined for this tuple)
• It is often not possible to determine which of the meanings is intended.
Hence, SQL does not distinguish between the different meanings of
NULL.
Principles of NULL values
• Setting a NULL value is appropriate when the actual value is unknown,
or when a value is not meaningful.
• A NULL value is not equivalent to a value of ZERO if the data type is a
number, and is not equivalent to spaces if the data type is character.
• A NULL value can be inserted into columns of any data type.
• A NULL value evaluates to NULL in any expression.
• If a column contains a NULL value, the UNIQUE, FOREIGN KEY, and
CHECK constraints ignore that value.
In general, each NULL value is considered to be different from every other NULL
in the database. When a NULL is involved in a comparison operation, the result is
considered to be UNKNOWN. Hence, SQL uses a three-valued logic with
values True, False, and Unknown. It is, therefore, necessary to define the results
of three-valued logical expressions when the logical connectives AND, OR, and
NOT are used.
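A consequence of this three-valued logic is that a row whose compared value is NULL satisfies neither a predicate nor its negation. With the Employee table defined below, for example:

-- Naveen (NULL Salary) is returned by neither query:
SELECT Fname FROM Employee WHERE Salary > 40000;
SELECT Fname FROM Employee WHERE NOT (Salary > 40000);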

How To Test for NULL Values?


SQL allows queries that check whether an attribute value is NULL. Rather than
using = or <> to compare an attribute value to NULL, SQL uses IS and IS NOT.
This is because SQL considers each NULL value as being distinct from every other
NULL value, so equality comparison is not appropriate.
Now, consider the following Employee Table.
Query:
CREATE TABLE Employee (
Fname VARCHAR(50),
Lname VARCHAR(50),
SSN VARCHAR(11),
Phoneno VARCHAR(15),
Salary FLOAT
);

INSERT INTO Employee (Fname, Lname, SSN, Phoneno, Salary)
VALUES
('Shubham', 'Thakur', '123-45-6789', '9876543210', 50000.00),
('Aman', 'Chopra', '234-56-7890', NULL, 45000.00),
('Aditya', 'Arpan', NULL, '8765432109', 55000.00),
('Naveen', 'Patnaik', '345-67-8901', NULL, NULL),
('Nishant', 'Jain', '456-78-9012', '7654321098', 60000.00);
Output:
Fname Lname SSN Phoneno Salary
Shubham Thakur 123-45-6789 9876543210 50000.00
Aman Chopra 234-56-7890 NULL 45000.00
Aditya Arpan NULL 8765432109 55000.00
Naveen Patnaik 345-67-8901 NULL NULL
Nishant Jain 456-78-9012 7654321098 60000.00
The IS NULL Operator


Suppose we want to find the Fname and Lname of the employees having no SSN.
Then the query will be:
Query:
SELECT Fname, Lname FROM Employee WHERE SSN IS NULL;
Output:
Fname Lname
Aditya Arpan
The IS NOT NULL Operator


Now we find the count of the number of employees having SSNs.
Query:
SELECT COUNT(*) AS Count FROM Employee WHERE SSN IS NOT NULL;
Output:
Count
4
Updating NULL Values in a Table


We can update the NULL values present in a table using the UPDATE statement
in SQL. To do so, we can use the IS NULL operator in the WHERE clause to select
the rows with NULL values and then we can set the new value using the SET
keyword.
Let’s suppose that we want to update SSN in the row where it is NULL.
Query:
UPDATE Employee
SET SSN = '789-01-2345'
WHERE Fname = 'Aditya' AND Lname = 'Arpan';

SELECT * FROM Employee;

Output:
Fname Lname SSN Phoneno Salary
Shubham Thakur 123-45-6789 9876543210 50000.00
Aman Chopra 234-56-7890 NULL 45000.00
Aditya Arpan 789-01-2345 8765432109 55000.00
Naveen Patnaik 345-67-8901 NULL NULL
Nishant Jain 456-78-9012 7654321098 60000.00
Complex integrity constraints


In SQL, integrity constraints are rules that ensure data integrity in a database. They can be
classified into two categories: simple and complex.

Simple integrity constraints include primary keys, unique constraints, and foreign keys. These
constraints ensure that data in a table is unique and consistent, and they prevent certain
operations that would violate these rules.

On the other hand, complex integrity constraints are more advanced rules that cannot be
expressed using simple constraints. They are typically used to enforce business rules, such as
limiting the range of values that can be entered into a column or ensuring that certain
combinations of values are present in a table.

There are several ways to implement complex integrity constraints in SQL, including:

1. Check Constraints: Check constraints are used to restrict the range of values that can be
entered into a column. For example, a check constraint can be used to ensure that a date
column only contains dates in a certain range.

CREATE TABLE Employee (
ID INT PRIMARY KEY,
Name VARCHAR(50),
Age INT,
Salary DECIMAL(10,2),
CONSTRAINT CHK_Age CHECK (Age >= 18 AND Age <= 65),
CONSTRAINT CHK_Salary CHECK (Salary >= 0)
);
In this example, we are creating a table called "Employee" with columns for ID, Name, Age,
and Salary. We have added two check constraints, one to ensure that the age is between 18
and 65, and another to ensure that the salary is greater than or equal to 0.

2. Assertions: Assertions are used to specify complex rules that cannot be expressed using
simple single-column constraints. They are typically used to enforce business rules, such as
ensuring that a customer's age is greater than a certain value. (Standard SQL defines CREATE
ASSERTION for this, but most database systems do not implement it, so such rules are commonly
written as table-level CHECK constraints, as below.)

CREATE TABLE Customer (
ID INT PRIMARY KEY,
Name VARCHAR(50),
Age INT,
Gender VARCHAR(10),
CONSTRAINT CHK_Age CHECK (Age >= 18),
CONSTRAINT CHK_Gender CHECK (Gender IN ('M', 'F')),
CONSTRAINT CHK_Female_Age CHECK (Gender <> 'F' OR Age >= 21)
);

In this example, we are creating a table called "Customer" with columns for ID, Name, Age,
and Gender. We have added three constraints: one to ensure that the age is greater than or
equal to 18, another to ensure that the gender is either "M" or "F", and a third to ensure that if
the gender is "F", the age is greater than or equal to 21.

3. Triggers: Triggers are special procedures that are executed automatically in response to
certain events, such as an insert or update operation on a table. Triggers can be used to
implement complex business rules that cannot be expressed using simple constraints.

CREATE TABLE Orders (
ID INT PRIMARY KEY,
CustomerID INT,
OrderDate DATE,
Amount DECIMAL(10,2),
CONSTRAINT FK_Customer_Order FOREIGN KEY (CustomerID) REFERENCES Customer(ID)
);

CREATE TRIGGER TR_Order_Check_Amount
BEFORE INSERT ON Orders
FOR EACH ROW
BEGIN
IF NEW.Amount <= 0 THEN
SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Amount must be greater than zero.';
END IF;
END;

In this example, we are creating a table called "Order" with columns for ID, CustomerID,
OrderDate, and Amount. We have added a foreign key constraint to ensure that the
CustomerID in the Order table matches the ID in the Customer table. We have also created a
trigger called "TR_Order_Check_Amount" that will be executed before an insert operation
on the Order table. The trigger checks if the amount being inserted is greater than zero. If it is
not, an error message will be generated and the insert operation will be cancelled.

These are just a few examples of how complex integrity constraints can be implemented in
SQL to ensure the accuracy and consistency of data in a database.

Overall, complex integrity constraints are essential for ensuring the integrity and consistency
of data in a database. They allow developers to enforce complex business rules and prevent
data inconsistencies, ensuring that the data in the database remains accurate and trustworthy

Schema Refinement
The Problem of Redundancy in Database

Redundancy means having multiple copies of the same data in the database. This
problem arises when a database is not normalized. Suppose a table of student details
attributes is: student ID, student name, college name, college rank, and course opted.
Student_ID Name Contact College Course Rank
100 Himanshu 7300934851 GEU B.Tech 1
101 Ankit 7900734858 GEU B.Tech 1
102 Ayush 7300936759 GEU B.Tech 1
103 Ravi 7300901556 GEU B.Tech 1

It can be observed that values of attribute college name, college rank, and course are
being repeated which can lead to problems. Problems caused due to redundancy are:
• Insertion anomaly
• Deletion anomaly
• Updation anomaly
Insertion Anomaly
If a student detail has to be inserted whose course is not being decided yet then
insertion will not be possible till the time course is decided for the student.
Student_ID Name Contact College Course Rank
100 Himanshu 7300934851 GEU NULL 1

This problem happens when the insertion of a data record is not possible without
adding some additional unrelated data to the record.
Deletion Anomaly
If the details of students in this table are deleted then the details of the college will
also get deleted which should not occur by common sense. This anomaly happens
when the deletion of a data record results in losing some unrelated information that
was stored as part of the record that was deleted from a table.
It is not possible to delete some information without losing some other information
in the table as well.
Updation Anomaly
Suppose the rank of the college changes then changes will have to be all over the
database which will be time-consuming and computationally costly.
Student_ID Name Contact College Course Rank
100 Himanshu 7300934851 GEU B.Tech 1
101 Ankit 7900734858 GEU B.Tech 1
102 Ayush 7300936759 GEU B.Tech 1
103 Ravi 7300901556 GEU B.Tech 1

All places should be updated, If updation does not occur at all places then the
database will be in an inconsistent state.
Redundancy in a database occurs when the same data is stored in multiple places.
Redundancy can cause various problems such as data inconsistencies, higher storage
requirements, and slower data retrieval.
Problems Caused Due to Redundancy
• Data Inconsistency: Redundancy can lead to data inconsistencies, where
the same data is stored in multiple locations, and changes to one copy of
the data are not reflected in the other copies. This can result in incorrect
data being used in decision-making processes and can lead to errors and
inconsistencies in the data.
• Storage Requirements: Redundancy increases the storage requirements
of a database. If the same data is stored in multiple places, more storage
space is required to store the data. This can lead to higher costs and
slower data retrieval.
• Update Anomalies: Redundancy can lead to update anomalies, where
changes made to one copy of the data are not reflected in the other
copies. This can result in incorrect data being used in decision-making
processes and can lead to errors and inconsistencies in the data.
• Performance Issues: Redundancy can also lead to performance issues,
as the database must spend more time updating multiple copies of the
same data. This can lead to slower data retrieval and slower overall
performance of the database.
• Security Issues: Redundancy can also create security issues, as multiple
copies of the same data can be accessed and manipulated by
unauthorized users. This can lead to data breaches and compromise
the confidentiality, integrity, and availability of the data.
• Maintenance Complexity: Redundancy can increase the complexity of
database maintenance, as multiple copies of the same data must be
updated and synchronized. This can make it more difficult to
troubleshoot and resolve issues and can require more time and resources
to maintain the database.
• Data Duplication: Redundancy can lead to data duplication, where the
same data is stored in multiple locations, resulting in wasted storage
space and increased maintenance complexity. This can also lead to
confusion and errors, as different copies of the data may have different
values or be out of sync.
• Data Integrity: Redundancy can also compromise data integrity, as
changes made to one copy of the data may not be reflected in the other
copies. This can result in inconsistencies and errors and can make it
difficult to ensure that the data is accurate and up-to-date.
• Usability Issues: Redundancy can also create usability issues, as users
may have difficulty accessing the correct version of the data or may be
confused by inconsistencies and errors. This can lead to frustration and
decreased productivity, as users spend more time searching for the
correct data or correcting errors.
To prevent redundancy in a database, normalization techniques can be used.
Normalization is the process of organizing data in a database to eliminate
redundancy and improve data integrity. Normalization involves breaking down a
larger table into smaller tables and establishing relationships between them. This
reduces redundancy and makes the database more efficient and reliable.
Advantages of Redundant Data
• Enhanced Query Performance: By eliminating the need for intricate
joins, redundancy helps expedite data retrieval.
• Offline Access: In offline circumstances, redundant copies allow data
access even in the absence of continuous connectivity.
• Increased Availability: Redundancy helps to increase fault tolerance,
which makes data accessible even in the event of server failures.
Disadvantages of Redundant Data
• Increased storage requirements: Redundant data takes up additional
storage space within the database, which can increase costs and slow
down performance.
• Inconsistency: If the same data is stored in multiple places within the
database, there is a risk that updates or changes made to one copy of the
data may not be reflected in other copies, leading to inconsistency and
potentially incorrect results.
• Difficulty in maintenance: With redundant data, it becomes more
difficult to maintain the accuracy and consistency of the data. It requires
more effort and resources to ensure that all copies of the data are updated
correctly.
• Increased risk of errors: When data is redundant, there is a greater risk
of errors in the database. For example, if the same data is stored in
multiple tables, there is a risk of inconsistencies between the tables.
• Reduced flexibility: Redundancy can reduce the flexibility of the
database. For example, if a change needs to be made to a particular piece
of data, it may need to be updated in multiple places, which can be time-
consuming and error-prone.
Conclusion
In databases, data redundancy is a prevalent issue. It can cause a number of
problems, such as inconsistent data, wasted storage space, decreased database
performance, and increased security risk.
The most effective technique to reduce redundancy is to normalize the database.
The use of views, materialized views, and foreign keys are additional techniques
to reduce redundancy.
Decomposition In DBMS
Decomposition refers to the division of tables into multiple tables to produce
consistency in the data. This article explains the definition of decomposition,
the types of decomposition in DBMS, and its properties.
What is Decomposition in DBMS?
When we divide a table into multiple tables or divide a relation into multiple
relations, then this process is termed Decomposition in DBMS. We perform
decomposition in DBMS when we want to process a particular data set. It is
performed in a database management system when we need to ensure consistency
and remove anomalies and duplicate data present in the database. When we
perform decomposition in DBMS, we must try to ensure that no information or
data is lost.


Types of Decomposition
There are two types of Decomposition:
• Lossless Decomposition
• Lossy Decomposition


Lossless Decomposition
The process in which we can regain the original relation R with the help of joins
on the multiple relations formed after decomposition is termed lossless
decomposition. It is used to remove redundant data from the database while
retaining the useful information. Lossless decomposition tries to ensure the
following things:
• While regaining the original relation, no information should be lost.
• If we perform a join operation on the sub-divided relations, we must get
back the original relation.
Example:
There is a relation called R(A, B, C)
A B C

55 16 27

48 52 89

Now we decompose this relation into two sub relations R1 and R2


R1(A, B)
A B

55 16

48 52

R2(B, C)
B C

16 27

52 89

After performing the Join operation we get the same original relation
A B C

55 16 27

48 52 89
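In SQL, regaining the original relation corresponds to joining the decomposed tables on their common attribute; a minimal sketch, assuming R1 and R2 exist as tables:

SELECT R1.A, R1.B, R2.C
FROM R1
JOIN R2 ON R1.B = R2.B;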

Lossy Decomposition
As the name suggests, lossy decomposition means that when we perform a join
operation on the sub-relations, it does not result in the same relation that was
decomposed. After the join operation, we find some extraneous (spurious) tuples.
These extra tuples make it difficult for the user to identify the original tuples.
Example:
We have a relation R(A, B, C)
A B C

1 2 1

2 5 3

3 3 3

Now, we decompose it into sub-relations R1 and R2


R1(A, B)
A B

1 2

2 5

3 3

R2(B, C)
B C

2 1

5 3

3 3

Now, after performing the join operation:

A B C
1 2 1
2 5 3
2 3 3
3 5 3
3 3 3

Properties of Decomposition
• Lossless: All the decomposition that we perform in Database
management system should be lossless. All the information should not
be lost while performing the join on the sub-relation to get back the
original relation. It helps to remove the redundant data from the
database.
• Dependency Preservation: Dependency Preservation is an important
technique in database management system. It ensures that the functional
dependencies between the entities is maintained while performing
decomposition. It helps to improve the database efficiency, maintain
consistency and integrity.
• Lack of Data Redundancy: Data redundancy is generally termed
duplicate or repeated data. This property states that the decomposition
performed should not suffer from redundant data. It helps us to get rid
of unwanted data and focus only on the useful data or information.
Conclusion
Decomposition is an important concept in database management systems. It refers to
the method of splitting a relation into multiple relations so that database
operations can be performed efficiently. There are two types of decomposition: one is
lossless and the other is lossy decomposition. The properties of decomposition
help us maintain consistency, reduce redundant data, and remove anomalies.

Problems Related to Decomposition

1. Loss of Information
• Lossless (non-loss) decomposition: When a relation is decomposed into two or more smaller relations,
and the original relation can be perfectly reconstructed by taking the natural join of the
decomposed relations, it is termed a lossless decomposition. If not, it is termed a "lossy
decomposition."
• Example: Consider a table `R(A, B, C)` with only the dependency `A → B`. Decomposing it
into `R1(A, B)` and `R2(B, C)` is not guaranteed to be lossless, because the shared attribute `B`
is not a key of either fragment (the lossless-join test requires the common attributes to form a
superkey of at least one of the fragments).
Example: Consider a relation R(A,B,C) with the following data:
|A |B |C |
|----|----|----|
|1 |X |P |
|1 |Y |P |
|2 |Z |Q |

Suppose we decompose R into R1(A,B) and R2(A,C).


R1(A, B):

|A |B |
|----|----|
|1 |X |
|1 |Y |
|2 |Z |

R2(A, C) (the duplicate row (1, P) collapses into a single row after projection,
since relations are sets):

|A |C |
|----|----|
|1 |P |
|2 |Q |

Now, if we take the natural join of R1 and R2 on attribute A, we get back the original relation
R. Therefore, this is a lossless decomposition.

2. Loss of Functional Dependency


• Once tables are decomposed, certain functional dependencies might not be preserved, which
can lead to the inability to enforce specific integrity constraints.
• Example: If you have the functional dependency `A → B` in the original table, but in the
decomposed tables, there is no table with both `A` and `B`, this functional dependency can't be
preserved.
Example: Let's consider a relation R with attributes A,B, and C and the following functional
dependencies:
A → B
B → C
Now, suppose we decompose R into two relations:
R1(A,B) with FD A → B
R2(B,C) with FD B → C
In this case, the decomposition is dependency-preserving because all the functional
dependencies of the original relation R can be found in the decomposed relations R1 and R2.
We do not need to join R1 and R2 to enforce or check any of the functional dependencies.
However, if we had a functional dependency in R, say A → C, which cannot be determined
from either R1 or R2 without joining them, then the decomposition would not be dependency-
preserving for that specific FD.
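To see why a non-preserved dependency is expensive to enforce, here is a hypothetical check for A → C: since no single fragment contains both A and C, the check must join R1 and R2 and look for values of A associated with more than one value of C (any rows returned are violations):

```sql
-- A -> C cannot be verified inside R1(A, B) or R2(B, C) alone
SELECT j.A
FROM (SELECT R1.A, R2.C
      FROM R1 JOIN R2 ON R1.B = R2.B) AS j
GROUP BY j.A
HAVING COUNT(DISTINCT j.C) > 1;   -- a non-empty result means A -> C is violated
```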

3. Increased Complexity
• Decomposition leads to an increase in the number of tables, which can complicate queries and
maintenance tasks. While tools and ORM (Object-Relational Mapping) libraries can mitigate
this to some extent, it still adds complexity.

4. Redundancy
• Incorrect decomposition might not eliminate redundancy, and in some cases, can even introduce
new redundancies.

5. Performance Overhead
• An increased number of tables, while aiding normalization, can also lead to more complex SQL
queries involving multiple joins, which can introduce performance overheads.
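For example, a report that previously scanned one wide table may now need joins across several decomposed tables (all table and column names below are illustrative, not from the text above):

```sql
-- Rebuilding a "student report" from normalized fragments
SELECT st.Student_Name, c.Course_Name, c.Faculty
FROM Student       AS st
JOIN StudentCourse AS sc ON sc.Student_ID = st.Student_ID
JOIN Course        AS c  ON c.Course_ID   = sc.Course_ID
ORDER BY st.Student_Name;
```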

Best Practices
1. Ensure decomposition is non-lossy. After decomposition, it should be possible to recreate the
original data using natural joins.
2. Preserve functional dependencies to enforce integrity constraints.
3. Strike a balance. While normalization and decomposition are essential, in some scenarios (like
reporting databases), a certain level of denormalization might be preferred for performance
reasons.
4. Regularly review and optimize the database design, especially as the application's
requirements evolve.

In essence, while decomposition is a powerful tool in achieving database normalization and
reducing anomalies, it must be done thoughtfully and judiciously to avoid introducing new
problems.

Reasoning About Functional Dependencies

1. Trivial Dependency
- If Y is a subset of X, then the dependency X → Y is trivial.
- For example, in {A, B} → {A}, the dependency is trivial because A is part of {A, B}.
Example: For attributes A, B, C:
• A → A is a trivial dependency because an attribute always determines itself.
• AB → A is a trivial dependency because the combined attributes A and
B always determine A, as it is a subset.
• ABC → AC is a trivial dependency for the same reason; the combined
attributes A, B, and C always determine A and C.

2. Full Functional Dependency
- An attribute is fully functionally dependent on a set of attributes X if it is
functionally dependent on X and not functionally dependent on any proper subset of X.
Example: Consider a relation StudentCourses that has the following attributes:

• StudentID (unique identifier for each student)
• CourseID (unique identifier for each course)
• Instructor (name of the instructor teaching the course)
The relation is used to keep track of which student is enrolled in which course and who the
instructor for that course is.
Now, assume we have the following functional dependency:
(StudentID,CourseID) → Instructor
This means that a combination of a specific student and a specific course will determine who
the instructor is.
Thus, Instructor is fully functionally dependent on the combined attributes StudentID and
CourseID.

3. Transitive Dependency
- If A → B and B → C, then A has a transitive dependency on C through B.
Example: Consider a relation Employees with the following attributes:

• EmployeeID (unique identifier for each employee)
• EmployeeName
• Department (department in which the employee works)
• DepartmentLocation (location of the department)
Now, let's consider the following functional dependencies:
EmployeeID → Department
Department → DepartmentLocation
From the above functional dependencies:
An EmployeeID determines the Department an employee works in.
A Department determines its DepartmentLocation.
However, the DepartmentLocation is also dependent on the EmployeeID through Department.
This means the DepartmentLocation has a transitive dependency on EmployeeID via
Department.
This kind of transitive dependency can lead to redundancy.

4. Closure
- The closure of a set of attributes X with respect to a set of functional dependencies FD,
denoted X+, is the set of attributes that are functionally determined by X.
- For example, given the FDs {A → B, B → C}, the closure of {A}, denoted A+, is
{A, B, C}.
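As a slightly larger worked example (an illustrative FD set, not taken from the section above), let F = {A → B, B → C, CD → E} and compute {A, D}+ by repeatedly applying any FD whose left-hand side is already contained in the current result:
1. Start with {A, D}.
2. A → B applies (A is in the set), giving {A, B, D}.
3. B → C applies, giving {A, B, C, D}.
4. CD → E applies (both C and D are now in the set), giving {A, B, C, D, E}.
No further FDs add anything, so {A, D}+ = {A, B, C, D, E}. Since the closure contains every attribute, {A, D} would be a superkey of a relation R(A, B, C, D, E).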

Introduction to Normal Forms


In database management systems (DBMS), normalization is employed to organize
relational databases efficiently: it eliminates redundant data, keeps data
dependencies sensible, and ensures data integrity. The process of normalization is
divided into several stages, called "normal forms." Each normal form has a specific
set of rules and criteria that a database schema must meet.
Here's a brief overview of the main normal forms:

First Normal Form (1NF)


• Each table should have a primary key.
• Atomic values: Each attribute (column) of a table should hold only a single value, meaning no
repeating groups or arrays.
• All entries in any column must be of the same kind.

Second Normal Form (2NF)


• It meets all the requirements of 1NF.
• It ensures that non-key attributes are fully functionally dependent on the primary key. In other
words, if a table has a composite primary key, then every non-key attribute should be dependent
on the full set of primary key attributes.
Third Normal Form (3NF)
• It meets all the requirements of 2NF.
• It ensures that the non-key columns are functionally dependent only on the primary key. This
means there should be no transitive dependencies.

Boyce-Codd Normal Form (BCNF)


• Meets all requirements of 3NF.
• For any non-trivial functional dependency, X → Y, X should be a superkey. It's a more stringent
version of 3NF.

Fourth Normal Form (4NF)


• Meets all the requirements of BCNF.
• Every non-trivial multi-valued dependency must have a superkey as its determinant. This deals with
separating independent multi-valued facts, ensuring that one key attribute cannot independently
determine multiple unrelated sets of values in the same table.

Fifth Normal Form (5NF or Project-Join Normal Form, PJNF)

• It deals with cases where certain projections of your data must be recreatable from other
projections.

Sixth Normal Form (6NF)


• Often considered when dealing with temporal databases (databases that have time-dependent
data).
• Deals with how data evolves over time and is less commonly discussed in most relational
database design contexts.

First Normal Form (1NF) in DBMS


The First Normal Form (1NF) is the first step in the normalization process of organizing data
within a relational database to reduce redundancy and improve data integrity. A relation (table)
is said to be in 1NF if it adheres to the following rules:
1. Atomic Values:

• Each attribute (column) contains only atomic (indivisible) values. This means values in each
column are indivisible units and there should be no sets, arrays, or lists.
• For example, a column called "Phone Numbers" shouldn't contain multiple phone numbers for
a single record. Instead, you'd typically break it into additional rows or another related table.
2. Primary Key:

• Each table should have a primary key that uniquely identifies each row. This ensures that each
row in the table can be uniquely identified.
3. No Duplicate Rows:

• There shouldn’t be any duplicate rows in the table. This is often ensured by the use of the
primary key.
4. Order Doesn't Matter:

• The order in which data is stored doesn't matter in the context of 1NF (or any of the normal
forms). Relational databases don't guarantee an order for rows in a table unless explicitly sorted.
5. Single Valued Attributes:

• Columns should not contain multiple values of the same type. For example, a column "Skills"
shouldn't contain a list like "Java, Python, C++" for a single record. Instead, these skills should
be split across multiple rows or placed in a separate related table.

Example for First Normal Form (1NF)


Consider a table with a structure:

| Student_ID | Subjects |
|------------|-------------------|
|1 | Math, English |
|2 | English, Science |
|3 | Math, History |

The table above is not in 1NF because the "Subjects" column contains multiple values.
To transform it to 1NF:

| Student_ID | Subject |
|------------|-----------|
|1 | Math |
|1 | English |
|2 | English |
|2 | Science |
|3 | Math |
|3 | History |
Now, each combination of "Student_ID" and "Subject" is unique, and every attribute contains
only atomic values, ensuring the table is in 1NF.
Achieving 1NF is a fundamental step in database normalization, laying the foundation for
further normalization processes to eliminate redundancy and ensure data integrity.
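A minimal SQL sketch of the 1NF design above (the table and column names are hypothetical): the composite primary key makes each Student_ID/Subject pair unique, and every column holds a single atomic value:

```sql
CREATE TABLE Student_Subjects (
    Student_ID INT         NOT NULL,
    Subject    VARCHAR(50) NOT NULL,
    PRIMARY KEY (Student_ID, Subject)   -- one row per student/subject pair
);

INSERT INTO Student_Subjects VALUES
    (1, 'Math'), (1, 'English'),
    (2, 'English'), (2, 'Science'),
    (3, 'Math'), (3, 'History');
```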

Second Normal Form (2NF) in DBMS


The Second Normal Form (2NF) is the next stage in the normalization process after the First
Normal Form (1NF). A relation is in 2NF if:
1. It is already in 1NF:

• This means the relation contains only atomic values, there are no duplicate rows, and it has a
primary key.
2. No Partial Dependencies:

• All non-key attributes (i.e., columns that aren't part of the primary key) should be functionally
dependent on the *entire* primary key. This rule is especially relevant for tables with composite
primary keys (i.e., primary keys made up of more than one column).
• In simpler terms, no column should depend on just a part of the composite primary key.

Example for Second Normal Form


Let's consider a table that keeps track of the courses that students are enrolled in, with the
faculty who teach those courses:

| Student_ID | Course_ID | Course_Name | Faculty |
|------------|-----------|-------------|---------|
| 1          | C1        | Math        | Mr. A   |
| 1          | C2        | English     | Ms. B   |
| 2          | C1        | Math        | Mr. A   |
| 3          | C3        | History     | Ms. C   |

Here, a combination of `Student_ID` and `Course_ID` can be considered the primary key,
because a student can be enrolled in multiple courses, and each course might be taken by many
students.
However, you'll notice that `Course_Name` and `Faculty` depend only on `Course_ID` and
not on the combination of `Student_ID` and `Course_ID`. This is a partial dependency.
To bring the table to 2NF, we need to remove the partial dependencies:
StudentCourse Table

| Student_ID | Course_ID |
|------------|-----------|
|1 | C1 |
|1 | C2 |
|2 | C1 |
|3 | C3 |

Course Table

| Course_ID | Course_Name | Faculty |
|-----------|-------------|---------|
| C1        | Math        | Mr. A   |
| C2        | English     | Ms. B   |
| C3        | History     | Ms. C   |

Now, the `StudentCourse` table relates students to courses, and the `Course` table holds
information about each course. There are no more partial dependencies.
It's worth noting that while 2NF does improve the structure of our database by reducing
redundancy and eliminating partial dependencies, it might not eliminate all anomalies or
redundancy. Further normalization forms (like 3NF and BCNF) address additional types of
dependencies and potential issues.
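A sketch of the decomposed 2NF schema in SQL (names are hypothetical): the foreign key from `StudentCourse` to `Course` keeps the two tables consistent:

```sql
CREATE TABLE Course (
    Course_ID   VARCHAR(10) PRIMARY KEY,
    Course_Name VARCHAR(50) NOT NULL,
    Faculty     VARCHAR(50) NOT NULL
);

CREATE TABLE StudentCourse (
    Student_ID INT         NOT NULL,
    Course_ID  VARCHAR(10) NOT NULL,
    PRIMARY KEY (Student_ID, Course_ID),
    FOREIGN KEY (Course_ID) REFERENCES Course (Course_ID)
);
```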

Third Normal Form (3NF) in DBMS


The Third Normal Form (3NF) is a further step in the normalization process after achieving
Second Normal Form (2NF). A relation is considered to be in 3NF if:
1. It is already in 2NF:

• This means the relation has no partial dependencies of non-key attributes on the primary key.
2. No Transitive Dependencies:

• All non-key attributes are functionally dependent only on the primary key and not on any other
non-key attributes. If there is a dependency of one non-key attribute on another non-key
attribute, it is called a transitive dependency, and such a dependency violates 3NF.

Simply put, in 3NF, non-key attributes should not depend on other non-key attributes; they
should only depend on the primary key.

Example for Third Normal Form (3NF)


Consider a table storing information about products sold by different vendors:
| Product_ID | Product_Name | Vendor_Name | Vendor_Address |
|------------|--------------|-------------|-----------------|
| P1 | Laptop | TechCorp | 123 Tech St. |
| P2 | Mouse | TechCorp | 123 Tech St. |
| P3 | Chair | FurniShop | 456 Furni Rd. |

In the table above, `Product_ID` is the primary key. We can see that `Vendor_Address`
depends on `Vendor_Name` rather than `Product_ID`, which represents a transitive
dependency.
To convert this table to 3NF, we can split it into two tables:
Product Table

| Product_ID | Product_Name | Vendor_Name |
|------------|--------------|-------------|
| P1         | Laptop       | TechCorp    |
| P2         | Mouse        | TechCorp    |
| P3         | Chair        | FurniShop   |

Vendor Table

| Vendor_Name | Vendor_Address |
|-------------|-----------------|
| TechCorp | 123 Tech St. |
| FurniShop | 456 Furni Rd. |

Now, the `Product` table has `Product_ID` as the primary key, and all attributes in this table
depend only on the primary key. The `Vendor` table has `Vendor_Name` as its primary key,
and the address in this table depends only on the vendor name.
This normalization eliminates the transitive dependency and reduces redundancy. If we need
to change a vendor's address, we now only have to make the change in one place in the `Vendor`
table.
To further refine the database structure, we might proceed to other normalization forms like
BCNF, but 3NF is often sufficient for many practical applications and strikes a good balance
between minimizing redundancy and maintaining a manageable schema.
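A corresponding SQL sketch (hypothetical names): `Vendor_Name` in `Product` references the `Vendor` table, so the address lives in exactly one place:

```sql
CREATE TABLE Vendor (
    Vendor_Name    VARCHAR(50)  PRIMARY KEY,
    Vendor_Address VARCHAR(100) NOT NULL
);

CREATE TABLE Product (
    Product_ID   VARCHAR(10) PRIMARY KEY,
    Product_Name VARCHAR(50) NOT NULL,
    Vendor_Name  VARCHAR(50) NOT NULL,
    FOREIGN KEY (Vendor_Name) REFERENCES Vendor (Vendor_Name)
);

-- Changing a vendor's address now touches a single row:
UPDATE Vendor SET Vendor_Address = '789 New Ave.' WHERE Vendor_Name = 'TechCorp';
```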

Boyce-Codd Normal Form (BCNF) in DBMS


Boyce-Codd Normal Form (BCNF) is an advanced step in the normalization process, and it's
a stronger version of the Third Normal Form (3NF). In fact, every relation in BCNF is also in
3NF, but the converse isn't necessarily true. BCNF was introduced to handle certain anomalies
that 3NF does not deal with.
A relation is in BCNF if:
1. It is already in 3NF.
2. For every non-trivial functional dependency X → Y, X is a superkey. This
essentially means that the only determinants in the relation are superkeys.
Here, "non-trivial" means that Y is not a subset of X, and a "superkey" is a set of attributes that
functionally determines all other attributes in the relation.

Example for Boyce-Codd Normal Form (BCNF)


Consider a university scenario where professors supervise student theses in various topics.
Now, let's assume each professor can only supervise one topic, but multiple professors can
supervise the same topic.
Initial Table

| Student | Professor | Topic   |
|---------|-----------|---------|
| Alice   | Mr. A     | Math    |
| Bob     | Mr. B     | Math    |
| Charlie | Mr. C     | Physics |

Here:

• Each professor is associated with exactly one topic.
• The primary key is {Student, Professor}, meaning a professor can supervise multiple students,
and a student may be supervised by more than one professor.
• There's a functional dependency {Professor} → {Topic}, since each professor supervises only
one topic.
Now, observe that {Professor} is not a superkey (because the primary key is a combination of
Student and Professor), but it determines another attribute in the table (Topic). This violates
the definition of BCNF.
To bring this table into BCNF, we can decompose it into two tables:
StudentSupervision Table:

| Student | Professor |
|---------|-----------|
| Alice | Mr. A |
| Bob | Mr. B |
| Charlie | Mr. C |
ProfessorTopic Table:

| Professor | Topic |
|-----------|--------|
| Mr. A | Math |
| Mr. B | Math |
| Mr. C | Physics|

This decomposition removes the dependency whose determinant was not a superkey and ensures
that the only determinants are superkeys, making the structure adhere to BCNF.
In practice, BCNF is a highly normalized form, and while it can minimize redundancy, it can
also increase the complexity of the database design. Designers often have to make trade-offs
between achieving higher normal forms and maintaining simplicity, depending on the specific
use case and requirements of the system.
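In SQL, the BCNF decomposition can be sketched as follows (hypothetical names): the primary key on `Professor` in `ProfessorTopic` enforces the rule that each professor supervises exactly one topic:

```sql
CREATE TABLE ProfessorTopic (
    Professor VARCHAR(50) PRIMARY KEY,   -- Professor -> Topic, so Professor is the key
    Topic     VARCHAR(50) NOT NULL
);

CREATE TABLE StudentSupervision (
    Student   VARCHAR(50) NOT NULL,
    Professor VARCHAR(50) NOT NULL,
    PRIMARY KEY (Student, Professor),
    FOREIGN KEY (Professor) REFERENCES ProfessorTopic (Professor)
);
```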

Fourth Normal Form (4NF) in DBMS

The Fourth Normal Form (4NF) is an advanced level in the normalization process, aiming to
handle certain types of anomalies that aren't addressed by the normal forms up to BCNF.
Specifically, 4NF addresses multi-valued dependencies.
A relation is in 4NF if:
1. It is already in BCNF (and therefore also in 3NF).
2. It contains no non-trivial multi-valued dependencies other than those whose determinant is
a superkey.
To clarify, consider a relation R with attributes X, Y, and Z. We say that there is a multi-valued
dependency from X to Y, denoted X ↠ Y, if for a single value of X there is a set of values of Y
associated with it, independent of the values of Z.

Example for Fourth Normal Form (4NF)


Let's illustrate 4NF with a scenario involving students, their hobbies, and the courses they've
taken:
Initial Table

| Student_ID | Hobby    | Course    |
|------------|----------|-----------|
| S1         | Painting | Math      |
| S1         | Painting | Physics   |
| S1         | Hiking   | Math      |
| S1         | Hiking   | Physics   |
| S2         | Reading  | Chemistry |
| S2         | Reading  | Biology   |

In the table:

• For student `S1`, there are two hobbies (`Painting` and `Hiking`) and two courses (`Math` and
`Physics`), resulting in a combination of every hobby with every course.
• This design exhibits the multi-valued dependencies Student_ID ↠ Hobby and
Student_ID ↠ Course.
To bring the table to 4NF, we can decompose it into two separate tables:
StudentHobbies Table:

| Student_ID | Hobby |
|------------|------------|
| S1 | Painting |
| S1 | Hiking |
| S2 | Reading |

StudentCourses Table:

| Student_ID | Course |
|------------|------------|
| S1 | Math |
| S1 | Physics |
| S2 | Chemistry |
| S2 | Biology |

With this separation:

• The `StudentHobbies` table lists the hobbies of each student.
• The `StudentCourses` table lists the courses taken by each student.
There are no more multi-valued dependencies. This setup not only reduces redundancy but also
prevents the possibility of certain types of inconsistencies and anomalies in the data.
For most practical applications, normalization up to 3NF or BCNF is often adequate. However,
when specific types of redundancy or data anomalies are a concern, proceeding to 4NF or even
5NF can be beneficial.
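A SQL sketch of the 4NF design above (hypothetical names): each independent multi-valued fact gets its own table with a composite key:

```sql
CREATE TABLE StudentHobbies (
    Student_ID VARCHAR(10) NOT NULL,
    Hobby      VARCHAR(50) NOT NULL,
    PRIMARY KEY (Student_ID, Hobby)
);

CREATE TABLE StudentCourses (
    Student_ID VARCHAR(10) NOT NULL,
    Course     VARCHAR(50) NOT NULL,
    PRIMARY KEY (Student_ID, Course)
);
```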
Fifth Normal Form (5NF or PJNF) in DBMS

The Fifth Normal Form (5NF), also known as Project-Join Normal Form (PJNF), is a further
step in the normalization process. It aims to address redundancy arising from certain types of
join dependencies that aren't covered by earlier normal forms.
A relation is in 5NF or PJNF if:
1. It is already in 4NF.
2. Every non-trivial join dependency in the relation is implied by the candidate keys.
A join dependency occurs in a relation R when R can always be reconstructed by joining
multiple projections of R. It is written *(R1, R2, ..., Rn), which means that when R is
decomposed into the projections R1, R2, ..., Rn, the natural join of these projections
results in the original relation R.
The join dependency is non-trivial if none of the projections Ri is equal to R.

Example for Fifth Normal Form (5NF)


Consider a relation involving suppliers, parts, and projects:
Initial Table (SupplierPartsProjects):

| Supplier | Part | Project |
|----------|------|---------|
| S1       | P1   | J1      |
| S1       | P2   | J1      |
| S1       | P1   | J2      |
| S1       | P2   | J2      |
| S2       | P2   | J2      |

Assume the following business rule for our example: whenever a supplier s supplies part p,
part p is used in project j, and supplier s supplies some part to project j, then supplier s
must supply part p to project j.
Given this rule, the following non-trivial join dependency holds on the table:

*( (Supplier, Part), (Supplier, Project), (Part, Project) )
To decompose the relation into 5NF:
SupplierParts:

| Supplier | Part |
|----------|-------|
| S1 | P1 |
| S1 | P2 |
| S2 | P2 |

SupplierProjects:

| Supplier | Project |
|----------|---------|
| S1 | J1 |
| S1 | J2 |
| S2 | J2 |

PartsProjects:

| Part | Project |
|-------|---------|
| P1 | J1 |
| P2 | J1 |
| P1 | J2 |
| P2 | J2 |

Now, these decomposed tables eliminate the redundancy caused by the specific constraints and
join dependencies of the original relation. When you take the natural join of these tables, you
will get back the original table.
It's worth noting that reaching 5NF can lead to an increased number of tables, which can
complicate queries and database operations. Thus, achieving 5NF should be a conscious
decision made based on the specific requirements and constraints of a given application.
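A quick SQL sketch of the reconstruction (hypothetical names): the three-way join below should return exactly the rows of the original SupplierPartsProjects table, confirming the join dependency:

```sql
SELECT sp.Supplier, sp.Part, sj.Project
FROM SupplierParts    AS sp
JOIN SupplierProjects AS sj ON sj.Supplier = sp.Supplier
JOIN PartsProjects    AS pp ON pp.Part    = sp.Part
                           AND pp.Project = sj.Project;
```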

Sixth Normal Form (6NF) in DBMS


The Sixth Normal Form (6NF) is a level of database normalization that specifically deals with
temporal data. While other normal forms focus primarily on eliminating redundancy and
ensuring logical consistency, 6NF aims to efficiently handle the historical data in temporal
databases.
A relation is said to be in 6NF if:
1. It is already in 5NF.
2. All temporal data (data that has time-bound attributes) is segregated into its own separate
table, such that it allows for efficient insertion, deletion, and modification of temporally
bounded data without the need to update non-temporal data.
Temporal databases require special attention to represent and query data spanning across
different time frames or versions. There could be valid-time (the time for which a fact is valid
in the real world) and transaction-time (the time at which a fact is stored in the database).

Example for Sixth Normal Form (6NF)


Consider a table with employee salaries over time. Employees may receive raises, and we wish
to keep a history of all their past salaries.
Initial Table (EmployeeSalaries):

| EmployeeID | Salary  | ValidFrom  | ValidTo    |
|------------|---------|------------|------------|
| E1         | ₹50,000 | 2021-01-01 | 2022-01-01 |
| E1         | ₹55,000 | 2022-01-01 | 2023-01-01 |
| E2         | ₹60,000 | 2021-06-01 | 2022-06-01 |

In the above table, each row specifies the salary of an employee for a specific time interval. As
you can imagine, updates (like giving a raise) could become complicated and might require
adjustments in the `ValidTo` and `ValidFrom` columns, especially if you have multiple date
ranges.
To bring this into 6NF, you could decompose the table into separate relations, one capturing
the essence of the entity (e.g., the employee and some constant attributes) and others capturing
the temporal aspects.
Employee:

| EmployeeID | OtherConstantAttributes |
|------------|-------------------------|
| E1 | ... |
| E2 | ... |

EmployeeSalaryHistory:

| EmployeeID | Salary  | ValidFrom  | ValidTo    |
|------------|---------|------------|------------|
| E1         | ₹50,000 | 2021-01-01 | 2022-01-01 |
| E1         | ₹55,000 | 2022-01-01 | 2023-01-01 |
| E2         | ₹60,000 | 2021-06-01 | 2022-06-01 |

By segregating the time-variant data in its own table, operations related to time-bound
attributes become more efficient and clearer. This structure makes it easier to handle and query
temporal data.
In practice, 6NF is specialized, and its application is restricted to systems that demand intricate
temporal data management. Also, while 6NF facilitates the handling of temporal data, it can
introduce complexity in the form of multiple tables, which might require complex joins during
querying.
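A sketch of this temporal split in SQL (hypothetical names; syntax per standard SQL as in PostgreSQL): the salary history lives in its own table keyed by employee and validity start, so a raise only inserts or closes history rows without touching the non-temporal data:

```sql
CREATE TABLE Employee (
    EmployeeID   VARCHAR(10) PRIMARY KEY,
    EmployeeName VARCHAR(50) NOT NULL
);

CREATE TABLE EmployeeSalaryHistory (
    EmployeeID VARCHAR(10)    NOT NULL,
    Salary     DECIMAL(10, 2) NOT NULL,
    ValidFrom  DATE           NOT NULL,
    ValidTo    DATE           NOT NULL,
    PRIMARY KEY (EmployeeID, ValidFrom),
    FOREIGN KEY (EmployeeID) REFERENCES Employee (EmployeeID)
);

-- Point-in-time query: salary of E1 on a given date (half-open interval)
SELECT Salary
FROM EmployeeSalaryHistory
WHERE EmployeeID = 'E1'
  AND ValidFrom <= DATE '2021-06-15'
  AND ValidTo   >  DATE '2021-06-15';
```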