Unit 3 DBMS
What is SQL?
SQL is Structured Query Language, which is a computer language for storing,
manipulating and retrieving data stored in a relational database.
SQL is the standard language for relational database systems. All the relational
database management systems (RDBMS) like MySQL, MS Access, Oracle, Sybase,
Informix, PostgreSQL and SQL Server use SQL as their standard database
language.
Why SQL?
SQL is widely popular because it offers the following advantages −
• Query Dispatcher
• Optimization Engines
• Classic Query Engine
• SQL Query Engine, etc.
A classic query engine handles all the non-SQL queries, but a SQL query engine
won't handle logical files.
SQL Commands
The standard SQL commands to interact with relational databases are CREATE,
SELECT, INSERT, UPDATE, DELETE and DROP. These commands can be
classified into the following groups based on their nature −
• ALTER TABLE – Alter the structure of the database:
  ALTER TABLE table_name ADD column_name data_type;
• COMMENT – Add comments to the data dictionary:
  COMMENT 'comment_text' ON TABLE table_name;
• RENAME – Rename an object existing in the database:
  RENAME old_table_name TO new_table_name;
• LOCK – Control table concurrency:
  LOCK TABLE table_name IN lock_mode;
• CALL – Call a PL/SQL or Java subprogram:
  CALL procedure_name(arguments);
• ALTER TABLE example:
  ALTER TABLE employees ADD COLUMN phone VARCHAR(20);
• DROP TABLE example:
  DROP TABLE employees;
These are common examples of some important SQL commands. The examples provide
a better understanding of the SQL commands and show the correct way to use
them.
Conclusion
SQL commands are the foundation of an effective database management system.
Whether you are manipulating or managing data, SQL provides the full set of
tools. With this detailed guide, we hope you have gained a deep understanding
of SQL commands, their categories, and their syntax with examples.
SQL | Constraints
Constraints are the rules that we can apply on the type of data in a table. That is,
we can specify the limit on the type of data that can be stored in a particular
column in a table using constraints.
The available constraints in SQL are:
• NOT NULL: This constraint tells that we cannot store a NULL value in a
column. That is, if a column is specified as NOT NULL, then we will not
be able to store NULL in that column.
• UNIQUE: This constraint when specified with a column, tells that all
the values in the column must be unique. That is, the values in any row
of a column must not be repeated.
• PRIMARY KEY: A primary key is a field which can uniquely identify
each row in a table. And this constraint is used to specify a field in a
table as primary key.
• FOREIGN KEY: A foreign key is a field that refers to the primary key of
another table, linking the rows of the two tables. This constraint is
used to specify a field as a foreign key.
• CHECK: This constraint helps to validate the values of a column to
meet a particular condition. That is, it helps to ensure that the value
stored in a column meets a specific condition.
• DEFAULT: This constraint specifies a default value for the column
when no value is specified by the user.
How to specify constraints?
We can specify constraints at the time of creating the table using CREATE
TABLE statement. We can also specify the constraints after creating a table using
ALTER TABLE statement.
Syntax:
Below is the syntax to create constraints using CREATE TABLE statement at the
time of creating the table.
1. NOT NULL –
If we specify a field in a table to be NOT NULL, the field will never accept a
NULL value: you will not be allowed to insert a new row in the table without
specifying a value for this field.
For example, the below query creates a table Student with the fields ID and
NAME as NOT NULL. That is, we are bound to specify values for these two
fields every time we wish to insert a new row.
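The missing query can be sketched as follows. This is a minimal, hypothetical version run through SQLite from Python; the ADDRESS column and the sample values are assumptions added for illustration.

```python
import sqlite3

# Hypothetical Student table with ID and NAME declared NOT NULL.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE Student (
        ID      INTEGER NOT NULL,
        NAME    TEXT    NOT NULL,
        ADDRESS TEXT
    )
""")

# Supplying both NOT NULL fields succeeds.
cur.execute("INSERT INTO Student (ID, NAME) VALUES (1, 'RAM')")

# Omitting NAME violates the NOT NULL constraint and the row is rejected.
error = None
try:
    cur.execute("INSERT INTO Student (ID) VALUES (2)")
except sqlite3.IntegrityError as e:
    error = e
print(error)  # NOT NULL constraint failed: Student.NAME
```

The second insert never reaches the table: only the first row is stored.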
Customers

C_ID NAME ADDRESS
1 RAMESH DELHI
2 SURESH NOIDA
3 DHARMESH GURGAON

The field C_ID is the primary key of the Customers table, i.e. it uniquely
identifies each row of the Customers table. When the same field C_ID appears
in the Orders table to link each order to a customer, it is a foreign key in
the Orders table.
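A hedged sketch of this relationship, run with SQLite from Python. The Orders columns (O_ID, AMOUNT) are assumptions, since the original Orders data is not shown in the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only with this pragma
cur = conn.cursor()
cur.execute("CREATE TABLE Customers (C_ID INTEGER PRIMARY KEY, NAME TEXT, ADDRESS TEXT)")
cur.execute("""
    CREATE TABLE Orders (
        O_ID   INTEGER PRIMARY KEY,
        AMOUNT INTEGER,
        C_ID   INTEGER,
        FOREIGN KEY (C_ID) REFERENCES Customers (C_ID)
    )
""")
cur.executemany("INSERT INTO Customers VALUES (?, ?, ?)",
                [(1, 'RAMESH', 'DELHI'), (2, 'SURESH', 'NOIDA'), (3, 'DHARMESH', 'GURGAON')])

cur.execute("INSERT INTO Orders VALUES (1, 2000, 2)")  # C_ID 2 exists: accepted
fk_error = None
try:
    cur.execute("INSERT INTO Orders VALUES (2, 1500, 9)")  # no customer 9: rejected
except sqlite3.IntegrityError as e:
    fk_error = e
print(fk_error)  # FOREIGN KEY constraint failed
```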
DML Triggers
DML triggers fire in response to Data Manipulation Language (DML) events:
INSERT, UPDATE, and DELETE statements executed against a table or view.
SQL Server
CREATE TRIGGER deep
ON emp
FOR INSERT, UPDATE, DELETE
AS
PRINT 'You cannot insert, update or delete rows in this table';
ROLLBACK;
Logon Triggers
logon triggers are fires in response to a LOGON event. When a user session is
created with a SQL Server instance after the authentication process of logging is
finished but before establishing a user session, the LOGON event takes place. As a
result, the PRINT statement messages and any errors generated by the trigger will
all be visible in the SQL Server error log. Authentication errors prevent logon
triggers from being used. These triggers can be used to track login activity or set a
limit on the number of sessions that a given login can have in order to audit and
manage server sessions.
How does SQL Server Show Triggers?
Listing triggers is useful when we have many databases with many tables,
especially when the table names are the same across multiple databases. We can
view a list of every trigger available in SQL Server by using the query below:
Syntax:
SELECT name, is_instead_of_trigger
FROM sys.triggers
WHERE type = 'TR';
The SQL Server Management Studio makes it very simple to display or list all
triggers that are available for any given table. The following steps will help us
accomplish this:
• Go to the Databases menu, select the desired database, and then expand it.
• Select the Tables menu and expand it.
• Select any specific table and expand it.
We will get various options here. When we choose the Triggers option, it displays
all the triggers available in this table.
BEFORE and AFTER Trigger
BEFORE triggers run the trigger action before the triggering statement is run.
AFTER triggers run the trigger action after the triggering statement is run.
Example
Given Student Report Database, in which student marks assessment is recorded. In
such a schema, create a trigger so that the total and percentage of specified marks
are automatically inserted whenever a record is inserted.
Here, the trigger must be invoked before the record is inserted, so the BEFORE
tag is used.
Suppose the database schema is:

Query:
mysql> DESC Student;

A trigger is then created on the Student table so that whenever subject marks
are entered, before the row is inserted into the database, the trigger computes
the total and percentage from the entered marks and stores them along with the
entered values.
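A sketch of such a trigger, using SQLite from Python for demonstration. SQLite cannot assign to NEW.* in a BEFORE trigger, so an AFTER INSERT trigger updates the row instead; in MySQL the same idea would be a BEFORE INSERT trigger using SET NEW.total = ... . The column names (subj1, subj2, subj3, total, per) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE Student (
        tid INTEGER PRIMARY KEY, name TEXT,
        subj1 INTEGER, subj2 INTEGER, subj3 INTEGER,
        total INTEGER, per REAL
    )
""")
# AFTER INSERT trigger fills total and percentage for the new row.
cur.execute("""
    CREATE TRIGGER stud_marks AFTER INSERT ON Student
    BEGIN
        UPDATE Student
        SET total = NEW.subj1 + NEW.subj2 + NEW.subj3,
            per   = (NEW.subj1 + NEW.subj2 + NEW.subj3) * 100.0 / 300
        WHERE tid = NEW.tid;
    END
""")
cur.execute("INSERT INTO Student (tid, name, subj1, subj2, subj3) VALUES (1, 'ABC', 70, 80, 90)")
total, per = cur.execute("SELECT total, per FROM Student WHERE tid = 1").fetchone()
print(total, per)  # 240 80.0
```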
Set Operators
The SET operators are mainly used to combine the result of more than one SELECT
statement and return a single result set to the user.
The set operators work on complete rows of queries, so the results of the queries must have the
same column name and column order, and the types of columns must be compatible. There are the
following 4 set operators in SQL Server:
1. UNION: Combine two or more result sets into a single set without duplicates.
2. UNION ALL: Combine two or more result sets into a single set, including all
duplicates.
3. INTERSECT: Takes the data from both result sets, which are in common.
4. EXCEPT: Takes the data from the first result set but not in the second result set
(i.e., no matching to each other)
Rules on Set Operations:
1. The result sets of all queries must have the same number of columns.
2. In every result set, the data type of each column must be compatible (well-
matched) with the data type of its corresponding column in other result sets.
3. An ORDER BY clause should be part of the last select statement to sort the result.
The first select statement must find out the column names or aliases.
Understand the Differences Between These Operators with Examples.
Use the SQL script below to create the two tables we will use in our examples
(the INSERT statements that populate them are not shown here).

CREATE TABLE TableA
(
  ID INT,
  Name VARCHAR(50),
  Gender VARCHAR(10),
  Department VARCHAR(50)
)
GO

CREATE TABLE TableB
(
  ID INT,
  Name VARCHAR(50),
  Gender VARCHAR(10),
  Department VARCHAR(50)
)
GO

Fetch the records:
SELECT * FROM TableA
SELECT * FROM TableB
UNION Operator:
The Union operator will return all the unique rows from both queries. Notice that the duplicates are
removed from the result set.
SELECT ID, Name, Gender, Department FROM TableA
UNION
SELECT ID, Name, Gender, Department FROM TableB
Result:
Purpose: The UNION operator combines the result sets of two or more SELECT statements into
a single result set.
Distinct Values: It removes duplicate rows between the various SELECT statements.
Use Case: You would use UNION when listing all distinct rows from multiple tables or queries.
UNION ALL Operator:
The UNION ALL operator returns all the rows from both queries, including the duplicates.
SELECT ID, Name, Gender, Department FROM TableA
UNION ALL
SELECT ID, Name, Gender, Department FROM TableB
Result:
INTERSECT Operator:
The INTERSECT operator retrieves the common unique rows from the left and right queries.
Notice the duplicates are removed.
SELECT ID, Name, Gender, Department FROM TableA
INTERSECT
SELECT ID, Name, Gender, Department FROM TableB
Result:
Purpose: The INTERSECT operator returns all rows common to both SELECT statements.
Distinct Values: Like UNION and EXCEPT, INTERSECT also removes duplicates.
Use Case: You would use INTERSECT when you need to find rows that are shared between two
tables or queries.
EXCEPT Operator:
The EXCEPT operator will return unique rows from the left query that aren’t in the right query’s
results.
SELECT ID, Name, Gender, Department FROM TableA
EXCEPT
SELECT ID, Name, Gender, Department FROM TableB
Result:
If you want the rows that are present in TableB but not in TableA, reverse the queries:
SELECT ID, Name, Gender, Department FROM TableB
EXCEPT
SELECT ID, Name, Gender, Department FROM TableA
Result:
Purpose: The EXCEPT operator returns all rows from the first SELECT statement that are absent
in the second SELECT statement’s results.
Distinct Values: It automatically removes duplicates.
Use Case: EXCEPT is used when you want to find rows in one query that are not found in another.
It’s useful for finding differences between tables or queries.
Note: For all these 4 operators to work, the following 2 conditions must be met
1. The number and the order of the columns must be the same in both the queries
2. The data types must be the same or at least compatible
For example, if the number of columns is different, you will get the following error
Msg 205, Level 16, State 1, Line 1
All queries combined using a UNION, INTERSECT, or EXCEPT operator must have an
equal number of expressions in their target lists.
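The behaviour of all four operators can be seen side by side in a small runnable sketch. SQLite from Python is used here; the two tables and their rows are assumptions standing in for TableA/TableB, with one overlapping row to show how each operator treats duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE TableA (ID INT, Name TEXT)")
cur.execute("CREATE TABLE TableB (ID INT, Name TEXT)")
cur.executemany("INSERT INTO TableA VALUES (?, ?)", [(1, 'Mark'), (2, 'Steve'), (3, 'Ben')])
cur.executemany("INSERT INTO TableB VALUES (?, ?)", [(1, 'Mark'), (4, 'Pam')])

q = "SELECT ID, Name FROM TableA {} SELECT ID, Name FROM TableB"
union     = cur.execute(q.format("UNION")).fetchall()      # 4 distinct rows
union_all = cur.execute(q.format("UNION ALL")).fetchall()  # 5 rows, duplicate kept
intersect = cur.execute(q.format("INTERSECT")).fetchall()  # the one shared row
except_   = cur.execute(q.format("EXCEPT")).fetchall()     # rows only in TableA
print(len(union), len(union_all), intersect, sorted(except_))
```

The duplicate row (1, 'Mark') is collapsed by UNION and INTERSECT, kept twice by UNION ALL, and removed entirely by EXCEPT.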
Differences Between UNION EXCEPT and INTERSECT Operators in SQL Server:
In SQL Server, the UNION, EXCEPT, and INTERSECT operators combine or manipulate the
results of two or more SELECT statements. These operators help you perform set operations on
the result sets of those SELECT statements. Here are the main differences between these operators:
UNION Operator:
1. The UNION operator combines the result sets of two or more SELECT statements
into a single result set.
2. It removes duplicate rows from the combined result set by default.
3. The columns in the SELECT statements must have compatible data types, and the
number of columns in each SELECT statement must be the same.
4. The order of rows in the final result set may not be the same as in the individual
SELECT statements unless you use the ORDER BY clause.
EXCEPT Operator:
1. The EXCEPT operator retrieves the rows present in the first result set but not in
the second result set.
2. It returns distinct rows from the first result set that do not have corresponding
rows in the second result set.
3. The columns in both SELECT statements must have compatible data types, and
the number of columns in both statements must be the same.
INTERSECT Operator:
1. The INTERSECT operator is used to retrieve the rows that are common to both
result sets.
2. It returns distinct rows appearing in the first and second result sets.
3. The columns in both SELECT statements must have compatible data types, and
the number of columns in both statements must be the same.
So, UNION combines result sets, EXCEPT returns rows from the first set that are not in the second
set, and INTERSECT returns common rows between two result sets. It’s important to ensure that
the data types and the number of columns match when using these operators, and you can use the
ORDER BY clause to control the order of the final result set if needed.
In the next article, I will discuss the Joins in SQL Server with Examples. In this article, I try to
explain the differences between UNION EXCEPT and INTERSECT Operators in SQL
Server with some examples. I hope this article will help you with your needs. I would like to have
your feedback. Please post your feedback, questions, or comments about this article.
Structured Query Language (SQL) is a programming language used to manage data
stored in a relational database. SQL supports nested queries: a nested query is
a query placed within another query, which allows for more complex and specific
data retrieval. In this article, we will discuss nested queries in SQL, their
syntax, and examples.
Nested Query
In SQL, a nested query involves a query that is placed within another query. Output of the inner
query is used by the outer query. A nested query has two SELECT statements: one for the inner
query and another for the outer query.
Syntax of Nested Queries
The basic syntax of a nested query involves placing one query inside another.
The inner query (subquery) is executed first and returns a set of values that
are then used by the outer query. The general form is:

SELECT column_name
FROM table_name
WHERE column_name OPERATOR
      (SELECT column_name FROM table_name WHERE condition);

Correlated subqueries, in contrast, are executed once for each row of the outer
query. They use values from the outer query to return results.
IN Operator
This operator checks if a column value in the outer query's result is present in the inner query's
result. The final result will have rows that satisfy the IN condition.
NOT IN Operator
This operator checks if a column value in the outer query's result is not present in the inner query's
result. The final result will have rows that satisfy the NOT IN condition.
ALL Operator
This operator compares a value of the outer query's result with all the values of the inner query's
result and returns the row if it matches all the values.
ANY Operator
This operator compares a value of the outer query's result with all the inner query's result values
and returns the row if there is a match with any value.
EXISTS Operator
This operator checks whether a subquery returns any row. If the subquery
returns at least one row, EXISTS evaluates to true and the current row of the
outer query is included in the result; if it returns no row, EXISTS evaluates
to false and the row is excluded.
NOT EXISTS Operator
This operator checks whether a subquery returns no rows. If the subquery
returns no row, NOT EXISTS evaluates to true and the current row of the outer
query is included in the result; if the subquery returns at least one row, NOT
EXISTS evaluates to false and the row is excluded.
These operators are used to create co-related nested queries that depend on values from the outer
query for execution.
Examples
Consider the following three tables.

employees:
emp_id emp_name dept_id
1 John 1
2 Mary 2
3 Bob 1
4 Alice 3
5 Tom 1

departments:
dept_id dept_name
1 Sales
2 Marketing
3 Finance

sales:
sale_id emp_id sale_amt
1 1 1000
2 2 2000
3 3 3000
4 1 4000
5 5 5000
6 3 6000
7 2 7000

Example 1: Find the names of all employees who work in the Sales department
Required query
SELECT emp_name
FROM employees
WHERE dept_id IN (SELECT dept_id
FROM departments
WHERE dept_name = 'Sales');
Output
emp_name
John
Bob
Tom
Example 2: Find the names of all employees who have made a sale
Required query
SELECT emp_name
FROM employees
WHERE EXISTS (SELECT emp_id
FROM sales
WHERE employees.emp_id = sales.emp_id);
Output
emp_name
John
Mary
Bob
Tom
This query selects every employee from the "employees" table for whom at least
one sale record exists in the "sales" table. Alice (emp_id 4) has no record in
the "sales" table, so she is not returned.
Example 3: Find the names of all employees who have made sales greater than $1000.
Required query
SELECT emp_name
FROM employees
WHERE emp_id = ANY (SELECT emp_id
FROM sales
WHERE sale_amt > 1000);
Output
emp_name
John
Mary
Bob
Tom
This query selects the employees whose emp_id matches any of the emp_ids in the
"sales" table with a sale amount greater than $1000. (Using = ALL here would
return no rows, because a single emp_id cannot be equal to every value in that
list.) John, Mary, Bob, and Tom have each made a sale greater than $1000, so
their names are returned; Alice has not.
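The example queries above can be reproduced end to end with SQLite from Python. SQLite does not support ANY/ALL, so the last comparison is expressed with IN, which behaves like = ANY; the table data is exactly the sample data shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employees (emp_id INT, emp_name TEXT, dept_id INT)")
cur.execute("CREATE TABLE departments (dept_id INT, dept_name TEXT)")
cur.execute("CREATE TABLE sales (sale_id INT, emp_id INT, sale_amt INT)")
cur.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                [(1, 'John', 1), (2, 'Mary', 2), (3, 'Bob', 1), (4, 'Alice', 3), (5, 'Tom', 1)])
cur.executemany("INSERT INTO departments VALUES (?, ?)",
                [(1, 'Sales'), (2, 'Marketing'), (3, 'Finance')])
cur.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                [(1, 1, 1000), (2, 2, 2000), (3, 3, 3000), (4, 1, 4000),
                 (5, 5, 5000), (6, 3, 6000), (7, 2, 7000)])

# Example 1: non-correlated subquery with IN
in_sales = [r[0] for r in cur.execute(
    "SELECT emp_name FROM employees WHERE dept_id IN "
    "(SELECT dept_id FROM departments WHERE dept_name = 'Sales')")]

# Example 2: correlated subquery with EXISTS
made_sale = [r[0] for r in cur.execute(
    "SELECT emp_name FROM employees WHERE EXISTS "
    "(SELECT 1 FROM sales WHERE sales.emp_id = employees.emp_id)")]

# Example 3: IN standing in for = ANY
gt1000 = [r[0] for r in cur.execute(
    "SELECT emp_name FROM employees WHERE emp_id IN "
    "(SELECT emp_id FROM sales WHERE sale_amt > 1000)")]
print(in_sales, made_sale, gt1000)
```

Note that Alice (emp_id 4) appears in no result: she is in Finance and has no row in the sales table.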
Aggregate functions
SQL Aggregate functions are functions where the values of multiple rows are
grouped as input on certain criteria to form a single value result of more significant
meaning.
It is used to summarize data, by combining multiple values to form a single result.
SQL Aggregate functions are mostly used with the GROUP BY clause of the
SELECT statement.
Various Aggregate Functions
1. Count()
2. Sum()
3. Avg()
4. Min()
5. Max()
Aggregate Functions in SQL
Below is the list of SQL aggregate functions, with examples
Count():
• Count(*): Returns the total number of records, i.e. 6.
• Count(salary): Returns the number of non-NULL values in the column
salary, i.e. 5.
• Count(Distinct salary): Returns the number of distinct non-NULL values
in the column salary, i.e. 4.
Sum():
• sum(salary): Sum all Non-Null values of Column salary i.e., 310
• sum(Distinct salary): Sum of all distinct Non-Null values i.e., 250.
Avg():
• Avg(salary) = Sum(salary) / count(salary) = 310/5
• Avg(Distinct salary) = sum(Distinct salary) / Count(Distinct Salary) =
250/4
Min():
• Min(salary): Minimum value in the salary column except NULL i.e.,
40.
Max():
• Max(salary): Maximum value in the salary i.e., 80.
Demo SQL Database
In this tutorial on aggregate functions, we will use the following table for
examples:
Id Name Salary
1 A 80
2 B 40
3 C 60
4 D 70
5 E 60
6 F NULL
You can also create this table on your system by writing the corresponding
CREATE TABLE and INSERT queries in MySQL.
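A runnable sketch of the table and all the aggregate results above, using SQLite from Python; the table name Emp is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Emp (Id INT, Name TEXT, Salary INT)")
cur.executemany("INSERT INTO Emp VALUES (?, ?, ?)",
                [(1, 'A', 80), (2, 'B', 40), (3, 'C', 60),
                 (4, 'D', 70), (5, 'E', 60), (6, 'F', None)])

# Every aggregate from the text in one query; NULL salaries are ignored
# by all aggregates except COUNT(*).
row = cur.execute("""
    SELECT COUNT(*), COUNT(Salary), COUNT(DISTINCT Salary),
           SUM(Salary), SUM(DISTINCT Salary),
           AVG(Salary), MIN(Salary), MAX(Salary)
    FROM Emp
""").fetchone()
print(row)  # (6, 5, 4, 310, 250, 62.0, 40, 80)
```

The results match the worked values above: 5 non-NULL salaries summing to 310, four distinct values summing to 250, average 310/5 = 62.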
Simple integrity constraints include primary keys, unique constraints, and foreign keys. These
constraints ensure that data in a table is unique and consistent, and they prevent certain
operations that would violate these rules.
On the other hand, complex integrity constraints are more advanced rules that cannot be
expressed using simple constraints. They are typically used to enforce business rules, such as
limiting the range of values that can be entered into a column or ensuring that certain
combinations of values are present in a table.
There are several ways to implement complex integrity constraints in SQL, including:
1. Check Constraints: Check constraints are used to restrict the range of values that can be
entered into a column. For example, a check constraint can be used to ensure that a date
column only contains dates in a certain range.
2. Assertions: Assertions are used to specify complex rules that cannot be expressed using
simple constraints. They are typically used to enforce business rules, such as ensuring that a
customer's age is greater than a certain value.
In this example, we are creating a table called "Customer" with columns for ID, Name, Age,
and Gender. We have added three constraints: one to ensure that the age is greater than or
equal to 18, another to ensure that the gender is either "M" or "F", and a third to ensure that if
the gender is "F", the age is greater than or equal to 21.
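The described table can be sketched as follows; the column types and sample rows are assumptions, and SQLite is used from Python to exercise the CHECK constraints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE Customer (
        ID     INTEGER PRIMARY KEY,
        Name   TEXT,
        Age    INTEGER CHECK (Age >= 18),
        Gender TEXT    CHECK (Gender IN ('M', 'F')),
        CHECK (Gender <> 'F' OR Age >= 21)   -- if Gender is 'F', Age must be >= 21
    )
""")
cur.execute("INSERT INTO Customer VALUES (1, 'Asha', 25, 'F')")  # passes all checks

violations = 0
for row in [(2, 'Ben', 16, 'M'),    # Age < 18
            (3, 'Cy', 30, 'X'),     # Gender not 'M' or 'F'
            (4, 'Dee', 19, 'F')]:   # 'F' but under 21
    try:
        cur.execute("INSERT INTO Customer VALUES (?, ?, ?, ?)", row)
    except sqlite3.IntegrityError:
        violations += 1
print(violations)  # 3
```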
3. Triggers: Triggers are special procedures that are executed automatically in response to
certain events, such as an insert or update operation on a table. Triggers can be used to
implement complex business rules that cannot be expressed using simple constraints.
In this example, we are creating a table called "Order" with columns for ID, CustomerID,
OrderDate, and Amount. We have added a foreign key constraint to ensure that the
CustomerID in the Order table matches the ID in the Customer table. We have also created a
trigger called "TR_Order_Check_Amount" that will be executed before an insert operation
on the Order table. The trigger checks if the amount being inserted is greater than zero. If it is
not, an error message will be generated and the insert operation will be cancelled.
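A sketch of the described setup, again via SQLite from Python. SQL Server would raise the error with RAISERROR inside a T-SQL trigger; SQLite's equivalent is RAISE(ABORT, ...). Column types and sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
cur = conn.cursor()
cur.execute("CREATE TABLE Customer (ID INTEGER PRIMARY KEY, Name TEXT)")
# "Order" is a reserved word, so the table name is quoted.
cur.execute("""
    CREATE TABLE "Order" (
        ID INTEGER PRIMARY KEY,
        CustomerID INTEGER REFERENCES Customer (ID),
        OrderDate TEXT,
        Amount REAL
    )
""")
# BEFORE INSERT trigger rejecting non-positive amounts, as described above.
cur.execute("""
    CREATE TRIGGER TR_Order_Check_Amount
    BEFORE INSERT ON "Order"
    BEGIN
        SELECT RAISE(ABORT, 'Amount must be greater than zero')
        WHERE NEW.Amount <= 0;
    END
""")
cur.execute("INSERT INTO Customer VALUES (1, 'RAMESH')")
cur.execute('INSERT INTO "Order" VALUES (1, 1, \'2024-01-05\', 250.0)')  # accepted
trigger_error = None
try:
    cur.execute('INSERT INTO "Order" VALUES (2, 1, \'2024-01-06\', -10.0)')
except sqlite3.IntegrityError as e:
    trigger_error = e
print(trigger_error)  # Amount must be greater than zero
```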
These are just a few examples of how complex integrity constraints can be implemented in
SQL to ensure the accuracy and consistency of data in a database.
Overall, complex integrity constraints are essential for ensuring the integrity and consistency
of data in a database. They allow developers to enforce complex business rules and prevent
data inconsistencies, ensuring that the data in the database remains accurate and trustworthy.
Schema Refinement
The Problem of Redundancy in Database
Redundancy means having multiple copies of the same data in the database. This
problem arises when a database is not normalized. Suppose a table of student
details has the attributes: student ID, student name, contact, college name,
course opted, and college rank.

Student_ID Name Contact College Course Rank

It can be observed that the values of the attributes college name, college
rank, and course are repeated, which can lead to problems. Problems caused due
to redundancy are:
• Insertion anomaly
• Deletion anomaly
• Updation anomaly
Insertion Anomaly
If the details of a student whose course has not yet been decided have to be
inserted, then insertion will not be possible until the course is decided.
This problem happens when the insertion of a data record is not possible without
adding some additional unrelated data to the record.
Deletion Anomaly
If the details of students in this table are deleted then the details of the college will
also get deleted which should not occur by common sense. This anomaly happens
when the deletion of a data record results in losing some unrelated information that
was stored as part of the record that was deleted from a table.
It is not possible to delete some information without losing some other information
in the table as well.
Updation Anomaly
Suppose the rank of the college changes; then the change will have to be made
everywhere the rank is stored in the database, which is time-consuming and
computationally costly.
If the update does not occur at every place, the database will be left in an
inconsistent state.
Redundancy in a database occurs when the same data is stored in multiple places.
Redundancy can cause various problems such as data inconsistencies, higher storage
requirements, and slower data retrieval.
Problems Caused Due to Redundancy
• Data Inconsistency: Redundancy can lead to data inconsistencies, where
the same data is stored in multiple locations, and changes to one copy of
the data are not reflected in the other copies. This can result in incorrect
data being used in decision-making processes and can lead to errors and
inconsistencies in the data.
• Storage Requirements: Redundancy increases the storage requirements
of a database. If the same data is stored in multiple places, more storage
space is required to store the data. This can lead to higher costs and
slower data retrieval.
• Update Anomalies: Redundancy can lead to update anomalies, where
changes made to one copy of the data are not reflected in the other
copies. This can result in incorrect data being used in decision-making
processes and can lead to errors and inconsistencies in the data.
• Performance Issues: Redundancy can also lead to performance issues,
as the database must spend more time updating multiple copies of the
same data. This can lead to slower data retrieval and slower overall
performance of the database.
• Security Issues: Redundancy can also create security issues, as multiple
copies of the same data can be accessed and manipulated by
unauthorized users. This can lead to data breaches and compromise
the confidentiality, integrity, and availability of the data.
• Maintenance Complexity: Redundancy can increase the complexity of
database maintenance, as multiple copies of the same data must be
updated and synchronized. This can make it more difficult to
troubleshoot and resolve issues and can require more time and resources
to maintain the database.
• Data Duplication: Redundancy can lead to data duplication, where the
same data is stored in multiple locations, resulting in wasted storage
space and increased maintenance complexity. This can also lead to
confusion and errors, as different copies of the data may have different
values or be out of sync.
• Data Integrity: Redundancy can also compromise data integrity, as
changes made to one copy of the data may not be reflected in the other
copies. This can result in inconsistencies and errors and can make it
difficult to ensure that the data is accurate and up-to-date.
• Usability Issues: Redundancy can also create usability issues, as users
may have difficulty accessing the correct version of the data or may be
confused by inconsistencies and errors. This can lead to frustration and
decreased productivity, as users spend more time searching for the
correct data or correcting errors.
To prevent redundancy in a database, normalization techniques can be used.
Normalization is the process of organizing data in a database to eliminate
redundancy and improve data integrity. Normalization involves breaking down a
larger table into smaller tables and establishing relationships between them. This
reduces redundancy and makes the database more efficient and reliable.
Advantages of Redundant Data
• Enhanced Query Performance: By eliminating the need for intricate
joins, redundancy helps expedite data retrieval.
• Offline Access: In offline circumstances, redundant copies allow data
access even in the absence of continuous connectivity.
• Increased Availability: Redundancy helps to increase fault tolerance,
which makes data accessible even in the event of server failures.
Disadvantages of Redundant Data
• Increased storage requirements: Redundant data takes up additional
storage space within the database, which can increase costs and slow
down performance.
• Inconsistency: If the same data is stored in multiple places within the
database, there is a risk that updates or changes made to one copy of the
data may not be reflected in other copies, leading to inconsistency and
potentially incorrect results.
• Difficulty in maintenance: With redundant data, it becomes more
difficult to maintain the accuracy and consistency of the data. It requires
more effort and resources to ensure that all copies of the data are updated
correctly.
• Increased risk of errors: When data is redundant, there is a greater risk
of errors in the database. For example, if the same data is stored in
multiple tables, there is a risk of inconsistencies between the tables.
• Reduced flexibility: Redundancy can reduce the flexibility of the
database. For example, if a change needs to be made to a particular piece
of data, it may need to be updated in multiple places, which can be time-
consuming and error-prone.
Conclusion
In databases, data redundancy is a prevalent issue. It can cause a number of
problems, such as inconsistent data, wasted storage space, decreased database
performance, and increased security risk.
The most effective technique to reduce redundancy is to normalize the database.
The use of views, materialized views, and foreign keys are additional techniques
to reduce redundancy.
Decomposition In DBMS
Decomposition refers to the division of tables into multiple tables to produce
consistency in the data. This article explains the definition of decomposition
in DBMS, the types of decomposition, and its properties.
What is Decomposition in DBMS?
When we divide a table into multiple tables or divide a relation into multiple
relations, then this process is termed Decomposition in DBMS. We perform
decomposition in DBMS when we want to process a particular data set. It is
performed in a database management system when we need to ensure consistency
and remove anomalies and duplicate data present in the database. When we
perform decomposition in DBMS, we must try to ensure that no information or
data is lost.
Decomposition in DBMS
Types of Decomposition
There are two types of Decomposition:
• Lossless Decomposition
• Lossy Decomposition
Types of Decomposition
Lossless Decomposition
The process in which we can regain the original relation R, with the help of
joins, from the multiple relations formed after decomposition is termed
lossless decomposition. It is used to remove redundant data from the database
while retaining the useful information. Lossless decomposition tries to ensure
the following things:
• While regaining the original relation, no information should be lost.
• If we perform join operation on the sub-divided relations, we must get
the original relation.
Example:
There is a relation called R(A, B, C):

A B C
55 16 27
48 52 89

We decompose it into R1(A, B) and R2(B, C):

R1(A, B)
A B
55 16
48 52

R2(B, C)
B C
16 27
52 89

After performing the join operation on R1 and R2 we get back the same original
relation:

A B C
55 16 27
48 52 89
Lossy Decomposition
As the name suggests, lossy decomposition means that when we perform a join
operation on the sub-relations, it does not result in the same relation that
was decomposed. After the join operation we find some extraneous tuples, and
these extra tuples make it difficult for the user to identify the original
tuples.
Example:
We have a relation R(A, B, C):

A B C
1 2 1
2 5 3
3 3 3

Suppose we decompose it into R1(A, C) and R2(B, C):

R1(A, C)
A C
1 1
2 3
3 3

R2(B, C)
B C
2 1
5 3
3 3

Joining R1 and R2 on the common attribute C gives:

A B C
1 2 1
2 5 3
2 3 3
3 5 3
3 3 3

The tuples (2, 3, 3) and (3, 5, 3) are extraneous: they were not part of the
original relation, so the decomposition is lossy.
Properties of Decomposition
• Lossless: All the decomposition that we perform in Database
management system should be lossless. All the information should not
be lost while performing the join on the sub-relation to get back the
original relation. It helps to remove the redundant data from the
database.
• Dependency Preservation: Dependency Preservation is an important
technique in database management system. It ensures that the functional
dependencies between the entities is maintained while performing
decomposition. It helps to improve the database efficiency, maintain
consistency and integrity.
• Lack of Data Redundancy: Data Redundancy is generally termed as
duplicate data or repeated data. This property states that the
decomposition performed should not suffer redundant data. It will help
us to get rid of unwanted data and focus only on the useful data or
information.
Conclusion
Decomposition is an important term in database management systems. It refers to
the method of splitting a relation into multiple relations so that database
operations are performed efficiently. There are two types of decomposition:
lossless and lossy. The properties of decomposition help us to maintain
consistency, reduce redundant data, and remove anomalies.
1. Loss of Information
• Non-loss decomposition: When a relation is decomposed into two or more smaller relations,
and the original relation can be perfectly reconstructed by taking the natural join of the
decomposed relations, it is termed a lossless (non-loss) decomposition. If not, it is termed a
"lossy decomposition."
• Example: Consider a table `R(A, B, C)` with only the dependency `A → B`. If you decompose it
into `R1(A, B)` and `R2(B, C)`, the result may be lossy: the shared attribute `B` is not a key of
either sub-relation, so the natural join can produce spurious tuples instead of recreating the
original table.
Example: Consider a relation R(A,B,C) with the following data:
|A |B |C |
|----|----|----|
|1 |X |P |
|1 |Y |P |
|2 |Z |Q |
R1(A, B):
|A |B |
|----|----|
|1 |X |
|1 |Y |
|2 |Z |
R2(A, C):
|A |C |
|----|----|
|1 |P |
|1 |P |
|2 |Q |
Now, if we take the natural join of R1 and R2 on attribute A, we get back the original relation
R. Therefore, this is a lossless decomposition.
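This lossless join can be checked with a few lines of plain Python (a sketch using the example's data):

```python
# Relation R(A, B, C) from the example.
R = {(1, "X", "P"), (1, "Y", "P"), (2, "Z", "Q")}

# Decompose into R1(A, B) and R2(A, C); A is the shared attribute.
R1 = {(a, b) for (a, b, c) in R}
R2 = {(a, c) for (a, b, c) in R}

# Natural join on A reconstructs R exactly, so the decomposition is lossless.
joined = {(a1, b, c) for (a1, b) in R1 for (a2, c) in R2 if a1 == a2}

print(joined == R)  # True
```

Here the join succeeds because every A value in R1 pairs only with the C values it originally co-occurred with.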
3. Increased Complexity
• Decomposition leads to an increase in the number of tables, which can complicate queries and
maintenance tasks. While tools and ORM (Object-Relational Mapping) libraries can mitigate
this to some extent, it still adds complexity.
4. Redundancy
• Incorrect decomposition might not eliminate redundancy, and in some cases, can even introduce
new redundancies.
5. Performance Overhead
• An increased number of tables, while aiding normalization, can also lead to more complex SQL
queries involving multiple joins, which can introduce performance overheads.
Best Practices
1. Ensure decomposition is non-lossy. After decomposition, it should be possible to recreate the
original data using natural joins.
2. Preserve functional dependencies to enforce integrity constraints.
3. Strike a balance. While normalization and decomposition are essential, in some scenarios (like
reporting databases), a certain level of denormalization might be preferred for performance
reasons.
4. Regularly review and optimize the database design, especially as the application's
requirements evolve.
1. Trivial Dependency
- If Y is a subset of X, then the dependency X -> Y is trivial.
- For example, in {A, B} -> {A}, the dependency is trivial because A is part of {A, B}.
3. Transitive Dependency
- If A -> B and B -> C, then A has a transitive dependency on C through B.
Example: Consider a relation Employees(Emp_ID, Dept_ID, Dept_Name). If Emp_ID -> Dept_ID
and Dept_ID -> Dept_Name, then Emp_ID -> Dept_Name is a transitive dependency through
Dept_ID.
4. Closure
- The closure of a set of attributes X with respect to a set of functional dependencies FD,
denoted as X+, is the set of attributes that are functionally determined by X.
- For example, given FDs: {A -> B, B -> C}, the closure of {A}, denoted as A+, would be {A,
B, C}.
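The closure computation described above can be sketched as a small fixed-point loop (plain Python; representing each FD as a (lhs, rhs) pair of attribute sets is an assumption of this sketch):

```python
def closure(attrs, fds):
    """Compute the closure X+ of the attribute set `attrs` under the
    functional dependencies `fds`, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already in the closure,
            # everything on the right side follows as well.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# FDs from the example: A -> B, B -> C
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(sorted(closure({"A"}, fds)))  # ['A', 'B', 'C']
```

The loop repeats until no FD adds a new attribute, which is exactly the definition of X+.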
1. Atomic Values:
• Each attribute (column) contains only atomic (indivisible) values. This means values in each
column are indivisible units and there should be no sets, arrays, or lists.
• For example, a column called "Phone Numbers" shouldn't contain multiple phone numbers for
a single record. Instead, you'd typically break it into additional rows or another related table.
2. Primary Key:
• Each table should have a primary key that uniquely identifies each row. This ensures that each
row in the table can be uniquely identified.
3. No Duplicate Rows:
• There shouldn’t be any duplicate rows in the table. This is often ensured by the use of the
primary key.
4. Order Doesn't Matter:
• The order in which data is stored doesn't matter in the context of 1NF (or any of the normal
forms). Relational databases don't guarantee an order for rows in a table unless explicitly sorted.
5. Single Valued Attributes:
• Columns should not contain multiple values of the same type. For example, a column "Skills"
shouldn't contain a list like "Java, Python, C++" for a single record. Instead, these skills should
be split across multiple rows or placed in a separate related table.
| Student_ID | Subjects |
|------------|-------------------|
|1 | Math, English |
|2 | English, Science |
|3 | Math, History |
The table above is not in 1NF because the "Subjects" column contains multiple values.
To transform it to 1NF:
| Student_ID | Subject |
|------------|-----------|
|1 | Math |
|1 | English |
|2 | English |
|2 | Science |
|3 | Math |
|3 | History |
Now, each combination of "Student_ID" and "Subject" is unique, and every attribute contains
only atomic values, ensuring the table is in 1NF.
Achieving 1NF is a fundamental step in database normalization, laying the foundation for
further normalization processes to eliminate redundancy and ensure data integrity.
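The 1NF transformation above can be sketched in plain Python (column names follow the example):

```python
# Non-1NF rows: the "Subjects" column holds a comma-separated list,
# i.e. a multi-valued attribute.
students = [(1, "Math, English"), (2, "English, Science"), (3, "Math, History")]

# Split each list into one (Student_ID, Subject) row to reach 1NF.
rows_1nf = [(sid, subject.strip())
            for sid, subjects in students
            for subject in subjects.split(",")]

for row in rows_1nf:
    print(row)
```

Each output row now holds a single atomic subject value, matching the 1NF table shown above.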
1. It is in 1NF:
• This means the relation contains only atomic values, there are no duplicate rows, and it has a
primary key.
2. No Partial Dependencies:
• All non-key attributes (i.e., columns that aren't part of the primary key) should be functionally
dependent on the *entire* primary key. This rule is especially relevant for tables with composite
primary keys (i.e., primary keys made up of more than one column).
• In simpler terms, no column should depend on just a part of the composite primary key.
StudentCourse Table
| Student_ID | Course_ID |
|------------|-----------|
|1 | C1 |
|1 | C2 |
|2 | C1 |
|3 | C3 |
Course Table
Now, the `StudentCourse` table relates students to courses, and the `Course` table holds
information about each course. There are no more partial dependencies.
It's worth noting that while 2NF does improve the structure of our database by reducing
redundancy and eliminating partial dependencies, it might not eliminate all anomalies or
redundancy. Further normalization forms (like 3NF and BCNF) address additional types of
dependencies and potential issues.
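A minimal sqlite3 sketch of the decomposed 2NF schema (the `Course_Name` column and the course names are assumptions for illustration, since the Course table's contents are not shown above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Course facts depend only on Course_ID, so they live in their own table.
# Course_Name is a hypothetical attribute added for illustration.
cur.execute("CREATE TABLE Course (Course_ID TEXT PRIMARY KEY, Course_Name TEXT)")

# The enrolment relation keeps only the composite key: no non-key attribute
# depends on just part of (Student_ID, Course_ID), which satisfies 2NF.
cur.execute("""CREATE TABLE StudentCourse (
                   Student_ID INTEGER,
                   Course_ID  TEXT REFERENCES Course(Course_ID),
                   PRIMARY KEY (Student_ID, Course_ID))""")

cur.executemany("INSERT INTO Course VALUES (?, ?)",
                [("C1", "Math"), ("C2", "English"), ("C3", "Science")])
cur.executemany("INSERT INTO StudentCourse VALUES (?, ?)",
                [(1, "C1"), (1, "C2"), (2, "C1"), (3, "C3")])

# A join recombines the facts when needed.
cur.execute("""SELECT s.Student_ID, c.Course_Name
               FROM StudentCourse s JOIN Course c USING (Course_ID)
               ORDER BY s.Student_ID, c.Course_Name""")
print(cur.fetchall())  # [(1, 'English'), (1, 'Math'), (2, 'Math'), (3, 'Science')]
```

Course details are now stored once per course, regardless of how many students enrol.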
1. It is in 2NF:
• This means the relation has no partial dependencies of non-key attributes on the primary key.
2. No Transitive Dependencies:
• All non-key attributes are functionally dependent only on the primary key and not on any other
non-key attributes. If one non-key attribute depends on another non-key attribute, that is called
a transitive dependency, and such a dependency violates 3NF.
Simply put, in 3NF, non-key attributes should not depend on other non-key attributes; they
should only depend on the primary key.
In the table above, `Product_ID` is the primary key. We can see that `Vendor_Address`
depends on `Vendor_Name` rather than `Product_ID`, which represents a transitive
dependency.
To convert this table to 3NF, we can split it into two tables:
Product Table
Vendor Table
| Vendor_Name | Vendor_Address |
|-------------|-----------------|
| TechCorp | 123 Tech St. |
| FurniShop | 456 Furni Rd. |
Now, the `Product` table has `Product_ID` as the primary key, and all attributes in this table
depend only on the primary key. The `Vendor` table has `Vendor_Name` as its primary key,
and the address in this table depends only on the vendor name.
This normalization eliminates the transitive dependency and reduces redundancy. If we need
to change a vendor's address, we now only have to make the change in one place in the `Vendor`
table.
To further refine the database structure, we might proceed to other normalization forms like
BCNF, but 3NF is often sufficient for many practical applications and strikes a good balance
between minimizing redundancy and maintaining a manageable schema.
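The Vendor/Product split can be sketched with sqlite3 (the Product table's columns and Product_ID values are assumptions for illustration, since that table's contents are not shown above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Vendor_Address depends on Vendor_Name, not on Product_ID,
# so it moves into its own table to remove the transitive dependency.
cur.execute("CREATE TABLE Vendor (Vendor_Name TEXT PRIMARY KEY, Vendor_Address TEXT)")
cur.executemany("INSERT INTO Vendor VALUES (?, ?)",
                [("TechCorp", "123 Tech St."), ("FurniShop", "456 Furni Rd.")])

# Product rows reference the vendor by name only.
cur.execute("""CREATE TABLE Product (
                   Product_ID INTEGER PRIMARY KEY,
                   Vendor_Name TEXT REFERENCES Vendor(Vendor_Name))""")
cur.executemany("INSERT INTO Product VALUES (?, ?)",
                [(1, "TechCorp"), (2, "TechCorp"), (3, "FurniShop")])

# Changing a vendor's address now touches exactly one row.
cur.execute("UPDATE Vendor SET Vendor_Address = ? WHERE Vendor_Name = ?",
            ("789 New Ave.", "TechCorp"))

cur.execute("""SELECT p.Product_ID, v.Vendor_Address
               FROM Product p JOIN Vendor v USING (Vendor_Name)
               ORDER BY p.Product_ID""")
print(cur.fetchall())
```

Every product of TechCorp immediately sees the new address through the join, with no risk of inconsistent copies.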
Here:
StudentProfessor Table:
| Student | Professor |
|---------|-----------|
| Alice | Mr. A |
| Bob | Mr. B |
| Charlie | Mr. C |
ProfessorTopic Table:
| Professor | Topic |
|-----------|--------|
| Mr. A | Math |
| Mr. B | Math |
| Mr. C | Physics|
This decomposition eliminates the partial dependency and ensures that the only determinants
are superkeys, making the structure adhere to BCNF.
In practice, BCNF is a highly normalized form, and while it can minimize redundancy, it can
also increase the complexity of the database design. Designers often have to make trade-offs
between achieving higher normal forms and maintaining simplicity, depending on the specific
use case and requirements of the system.
The Fourth Normal Form (4NF) is an advanced level in the normalization process, aiming to
handle certain types of anomalies which aren't addressed by the Third Normal Form (3NF).
Specifically, 4NF addresses multi-valued dependencies.
A relation is in 4NF if:
1. It is already in 3NF.
2. No non-trivial multi-valued dependencies exist. A multi-valued dependency occurs when one
attribute determines a set of values of another attribute independently of the remaining
attributes, rather than being determined by a candidate key alone.
To clarify, consider a relation R with attributes X, Y, and Z. We say that there is a multi-valued
dependency from X to Y, denoted X ↠ Y, if for a single value of X there are multiple values
of Y associated with it, independent of Z.
In the table:
• For student `S1`, there are two hobbies (`Painting` and `Hiking`) and two courses (`Math` and
`Physics`), resulting in a combination of every hobby with every course.
• This design suggests a multi-valued dependency between `Student_ID` and `Hobby`, and also
between `Student_ID` and `Course`.
To bring the table to 4NF, we can decompose it into two separate tables:
StudentHobbies Table:
| Student_ID | Hobby |
|------------|------------|
| S1 | Painting |
| S1 | Hiking |
| S2 | Reading |
StudentCourses Table:
| Student_ID | Course |
|------------|------------|
| S1 | Math |
| S1 | Physics |
| S2 | Chemistry |
| S2 | Biology |
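The 4NF decomposition can be checked in plain Python: joining the two tables on `Student_ID` regenerates every hobby/course combination of the original table, so no information is lost (a sketch using the example's data):

```python
# The two 4NF tables from the example.
hobbies = {("S1", "Painting"), ("S1", "Hiking"), ("S2", "Reading")}
courses = {("S1", "Math"), ("S1", "Physics"),
           ("S2", "Chemistry"), ("S2", "Biology")}

# Natural join on Student_ID rebuilds every hobby x course combination,
# which is exactly what the original (non-4NF) table stored.
joined = {(s1, h, c) for (s1, h) in hobbies for (s2, c) in courses if s1 == s2}

print(len(joined))  # 2 hobbies x 2 courses for S1, plus 1 x 2 for S2 -> 6 rows
```

Storing hobbies and courses separately avoids repeating each hobby once per course (and vice versa) in a single table.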
The Fifth Normal Form (5NF), also known as Project-Join Normal Form (PJNF), is a further
step in the normalization process. It aims to address redundancy arising from certain types of
join dependencies that aren't covered by earlier normal forms.
A relation is in 5NF or PJNF if:
1. It is already in BCNF.
2. Every non-trivial join dependency in the relation is implied by the candidate keys.
A join dependency occurs in a relation R when it is always possible to reconstruct R by joining
multiple projections of R. A join dependency is represented as {R1, R2, ..., Rn} ⟶ R, which
means that when R is decomposed into R1, R2, ..., Rn, the natural join of these projections
results in the original relation R.
The join dependency is non-trivial if none of the projections Ri is equal to R.
1. Every part supplied for a project is supplied by all suppliers supplying any part for that project.
2. Every part supplied by a supplier is supplied by that supplier for all projects to which that
supplier supplies any part.
Given the above constraints, the following join dependencies exist on the table:
SupplierParts:
| Supplier | Part |
|----------|-------|
| S1 | P1 |
| S1 | P2 |
| S2 | P2 |
SupplierProjects:
| Supplier | Project |
|----------|---------|
| S1 | J1 |
| S1 | J2 |
| S2 | J2 |
PartsProjects:
| Part | Project |
|-------|---------|
| P1 | J1 |
| P2 | J1 |
| P1 | J2 |
| P2 | J2 |
Now, these decomposed tables eliminate the redundancy caused by the specific constraints and
join dependencies of the original relation. When you take the natural join of these tables, you
will get back the original table.
It's worth noting that reaching 5NF can lead to an increased number of tables, which can
complicate queries and database operations. Thus, achieving 5NF should be a conscious
decision made based on the specific requirements and constraints of a given application.
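Although the original Supplier/Part/Project table is not reproduced above, the reconstruction by joining the three projections can be sketched in plain Python (a (Supplier, Part, Project) tuple exists exactly when all three pairwise tables agree):

```python
# The three 5NF projections from the example.
supplier_parts = {("S1", "P1"), ("S1", "P2"), ("S2", "P2")}
supplier_projects = {("S1", "J1"), ("S1", "J2"), ("S2", "J2")}
parts_projects = {("P1", "J1"), ("P2", "J1"), ("P1", "J2"), ("P2", "J2")}

# Three-way natural join: keep (s, p, j) only when the supplier supplies
# the part, the supplier supplies the project, and the part is used in
# the project.
rebuilt = {(s, p, j)
           for (s, p) in supplier_parts
           for (s2, j) in supplier_projects if s2 == s
           for (p2, j2) in parts_projects if p2 == p and j2 == j}

print(sorted(rebuilt))
```

Under the stated constraints this join dependency holds, so the three small tables carry the same information as the original relation without its redundancy.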
In the above table, each row specifies the salary of an employee for a specific time interval. As
you can imagine, updates (like giving a raise) could become complicated and might require
adjustments in the `ValidTo` and `ValidFrom` columns, especially if you have multiple date
ranges.
To bring this into 6NF, you could decompose the table into separate relations, one capturing
the essence of the entity (e.g., the employee and some constant attributes) and others capturing
the temporal aspects.
Employee:
| EmployeeID | OtherConstantAttributes |
|------------|-------------------------|
| E1 | ... |
| E2 | ... |
EmployeeSalaryHistory:
| EmployeeID | Salary | ValidFrom | ValidTo |
|------------|--------|-----------|---------|
| E1 | ... | ... | ... |
| E2 | ... | ... | ... |
By segregating the time-variant data in its own table, operations related to time-bound
attributes become more efficient and clearer. This structure makes it easier to handle and query
temporal data.
In practice, 6NF is specialized, and its application is restricted to systems that demand intricate
temporal data management. Also, while 6NF facilitates the handling of temporal data, it can
introduce complexity in the form of multiple tables, which might require complex joins during
querying.