
Dbms Notes 1

The relational model organizes data into tables (relations) with rows and columns. This provides an efficient and flexible way to store and access structured data. Data is stored in relations (tables) with attributes (columns) representing properties and tuples (rows) representing records. Keys are used to uniquely identify rows. Constraints ensure data accuracy. Anomalies like insertion, deletion, and modification anomalies can occur due to redundancy and are addressed by normalization. The relational model is widely used due to advantages like simplicity, manageability, and flexibility.

Uploaded by

life hacker

Relational Model in DBMS

The relational model for database management is an approach to logically represent and
manage the data stored in a database. In this model, the data is organized into a
collection of two-dimensional, inter-related tables, also known as relations. Each
relation is a collection of columns and rows, where the columns represent the attributes
of an entity and the rows (or tuples) represent the records.

Using tables to store the data provides a straightforward, efficient, and flexible way to
store and access structured information. Because of this simplicity, the model makes
sorting and accessing data easy, which is why it is used widely around the world for data
storage and processing.

Let's look at a scenario to understand the relational model:

Consider a case where you wish to store the name, the CGPA attained, and the roll number
of all the students of a particular class. This structured data can be easily stored in a table
as described below:
As we can notice from the above relation:

• Any given row of the relation indicates a student, i.e., the row of the table describes
a real-world entity.
• The columns of the table indicate the attributes related to the entity. In this case,
the roll number, CGPA, and the name of the student.

NOTE: A database organized in terms of the relational model is known as a relational
database, and the software used to manage it is known as a relational database management
system (RDBMS). Hence, the relational model describes how data is stored in relational
databases.

Highlights:

1. Relational Model stores the data in tables (relations).
2. It makes data sorting and data access easier.
3. Provides a standard way to organize data in databases.

Relational Model Concepts

As discussed earlier, a relational database is based on the relational model. This database
consists of various components based on the relational model. These include:
• Relation : Two-dimensional table used to store a collection of data elements.
• Tuple : Row of the relation, depicting a real-world entity.
• Attribute/Field : Column of the relation, depicting a property of the entity that the
relation describes.
• Attribute Domain : Set of pre-defined atomic values that an attribute can take, i.e., it
describes the legal values for that attribute.
• Degree : It is the total number of attributes present in the relation.
• Cardinality : It is the total number of tuples (rows) present in the relation.
• Relational Schema : It is the logical blueprint of the relation, i.e., it describes the
design and the structure of the relation. It contains the table name, its attributes, and
their types:

TABLE_NAME(ATTRIBUTE_1 TYPE_1, ATTRIBUTE_2 TYPE_2, ...)

For our Student relation example, the relational schema will be:

STUDENT(ROLL_NUMBER INTEGER, NAME VARCHAR(20), CGPA FLOAT)

• Relational Instance : It is the collection of records present in the relation at a given time.
• Relation Key : It is an attribute or a group of attributes that can be used to uniquely identify
an entity in a table or to determine the relationship between two tables. Relation keys can
be of 6 different types:
1. Candidate Key
2. Super Key
3. Composite Key
4. Primary Key
5. Alternate Key
6. Foreign Key
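
As a minimal sketch of the terms above, the snippet below builds the STUDENT relation in an in-memory SQLite database using Python's standard `sqlite3` module. The three sample rows are hypothetical (the original table is not reproduced in these notes), and SQLite's `REAL` type stands in for `FLOAT`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Relational schema: STUDENT(ROLL_NUMBER INTEGER, NAME VARCHAR(20), CGPA FLOAT)
conn.execute(
    "CREATE TABLE STUDENT ("
    "ROLL_NUMBER INTEGER PRIMARY KEY, NAME VARCHAR(20), CGPA REAL)"
)

# Relational instance: the rows present at this moment (hypothetical data).
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?)",
    [(1, "Asha", 8.9), (2, "Bilal", 7.4), (3, "Chen", 9.1)],
)

cursor = conn.execute("SELECT * FROM STUDENT")
attributes = [column[0] for column in cursor.description]
tuples = cursor.fetchall()

degree = len(attributes)     # number of attributes (columns)
cardinality = len(tuples)    # number of tuples (rows)
print(attributes, degree, cardinality)
```

Inserting or deleting a row changes the instance and the cardinality, while the schema and the degree stay the same.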

Highlights:

1. A Relation is a collection of rows (tuples) and columns (attributes).
2. In a relation, the tuples depict real-world entities, while the attributes are the
properties that describe them.
3. Structure of the relation is described by the relational schema.
4. Relational keys are used to uniquely identify a row in a table or to determine the
relationship between two tables.

Constraints in Relational Model

Relational models make use of some rules to ensure the accuracy and accessibility of the
data. These rules or constraints are known as Relational Integrity Constraints. They are
checked before any insertion, deletion, or update is performed on the data present in a
relational database. These constraints include:

• Domain Constraint : It specifies that every value of an attribute must lie inside that
attribute's set of legal values. It is implemented with the help of the Attribute Domain concept.
• Key Constraint : It states that every relation must contain an attribute or a set of attributes
(Primary Key) that can uniquely identify a tuple in that relation. This key can never be NULL
or contain the same value for two different tuples.
• Referential Integrity Constraint : It is defined between two inter-related tables. It states that
if a relation refers to a key attribute of a different (or the same) table through a foreign key,
then every referenced key value must exist in the referenced table.
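
The three constraints can be tried out with Python's standard `sqlite3` module. This is only a sketch: the table layout, the age range of 18 to 65, and the helper name `rejected` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE DEPARTMENT (D_NO TEXT PRIMARY KEY, D_NAME TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE EMPLOYEE ("
    "E_NO TEXT PRIMARY KEY, "                       # key constraint
    "AGE INTEGER CHECK (AGE BETWEEN 18 AND 65), "   # domain constraint
    "D_NO TEXT REFERENCES DEPARTMENT(D_NO))"        # referential integrity constraint
)
conn.execute("INSERT INTO DEPARTMENT VALUES ('D-1', 'HR')")
conn.execute("INSERT INTO EMPLOYEE VALUES ('E-1', 30, 'D-1')")


def rejected(sql):
    """Return True if the statement is refused because it violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


domain_violation = rejected("INSERT INTO EMPLOYEE VALUES ('E-2', 12, 'D-1')")     # age outside the domain
key_violation = rejected("INSERT INTO EMPLOYEE VALUES ('E-1', 40, 'D-1')")        # duplicate primary key
reference_violation = rejected("INSERT INTO EMPLOYEE VALUES ('E-3', 40, 'D-9')")  # no such department
print(domain_violation, key_violation, reference_violation)
```

Each of the three inserts is refused before it reaches the table, so EMPLOYEE still contains only E-1.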

Highlights:

1. To ensure data accuracy and accessibility, Relational Integrity Constraints are implemented.
2. They include domain, key, and referential integrity constraints.

Anomalies in Relational Model

When we notice unexpected behavior while working with a relational database, the cause is
often too much redundancy in the data stored in the database. This redundancy can cause
anomalies of various types, such as:

• Insertion Anomalies - It is the inability to insert data in the database due to the
absence of other data. For example: Suppose we are dividing the whole class into
groups for a project and the GroupNumber attribute is defined so that null values are
not allowed. If a new student is admitted to the class but not immediately assigned to
a group, then this student can't be inserted into the database.
• Deletion Anomalies - It is the accidental loss of data in the database upon deletion of
any other data element. For example: Suppose we have an employee relation that
contains the details of the employee along with the department they are working in.
Now, if a department has one employee working in it and we remove the information
of this employee from the table, the data related to the department is lost as well.
• Modification/Update Anomalies - It is the data inconsistency that arises from data
redundancy and partial update of data in the database. For example: Suppose the same
piece of data is stored in several rows. If a user updates it in only some of those rows,
without realizing that it is stored redundantly, the rows will disagree with each other
and the database becomes inconsistent.

All these anomalies can lead to unexpected behavior and inconvenience for the user. These
anomalies can be removed with the help of a process known as normalization.
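
The update and deletion anomalies, and the way a normalized design avoids them, can be sketched in plain Python. All names and rows below are hypothetical and serve only as an illustration.

```python
# Denormalized relation: department details are repeated in every employee row.
employee_dept = [
    {"emp": "E-1", "dept": "D-1", "dept_name": "HR"},
    {"emp": "E-2", "dept": "D-1", "dept_name": "HR"},
    {"emp": "E-3", "dept": "D-2", "dept_name": "IT"},
]

# Update anomaly: renaming D-1 in only one of its rows leaves two names for one department.
employee_dept[0]["dept_name"] = "People"
names_for_d1 = {row["dept_name"] for row in employee_dept if row["dept"] == "D-1"}

# Deletion anomaly: removing the only employee of D-2 also removes every trace of D-2.
remaining = [row for row in employee_dept if row["emp"] != "E-3"]
known_departments = {row["dept"] for row in remaining}

# Normalized design: each fact is stored once, in its own relation.
department = {"D-1": "HR", "D-2": "IT"}
employee = {"E-1": "D-1", "E-2": "D-1", "E-3": "D-2"}
del employee["E-3"]             # D-2 is still recorded in `department`
department["D-1"] = "People"    # one update, seen by every employee of D-1

print(sorted(names_for_d1), sorted(known_departments), sorted(department))
```

In the denormalized list, D-1 ends up with two different names and D-2 disappears; in the two-relation version neither problem can occur.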

Highlights:

• Any unexpected behavior in a relational database can be caused by an anomaly.
• Anomaly occurs mainly due to the presence of data redundancy in the database.
• Anomalies are of 3 types i.e., Insertion, Updation, and Deletion anomaly.

Codd Rules in DBMS

Edgar F. Codd, the creator of the relational model, proposed 13 rules known as Codd
Rules (numbered 0 to 12 in his original list, and therefore often called Codd's 12 rules).
They state that:

For a database to be considered a perfect relational database, it must follow the following
rules:

1. Foundation Rule - The database must be able to manage data in relational form.
2. Information Rule - All data stored in the database must exist as a value of some table cell.
3. Guaranteed Access Rule - Every unique data element should be accessible by only a
combination of the table name, primary key value, and the column name.
4. Systematic Treatment of NULL values - Database must support NULL values, independent of
data type, to represent missing or inapplicable information in a systematic way.
5. Active Online Catalog - The organization of the database must exist in an online catalog that
can be queried by authorized users.
6. Comprehensive Data Sub-Language Rule - Database must support at least one language
that supports: data definition, view definition, data manipulation, integrity constraints,
authorization, and transaction boundaries.
7. View Updating Rule - All views should be theoretically and practically updatable by the
system.
8. Relational Level Operation Rule - The database must support high-level insertion, updation,
and deletion operations.
9. Physical Data Independence Rule - Applications that access the database must remain
unaffected when the way the data is physically stored or accessed is changed.
10. Logical Data Independence Rule - Any change in the logical representation of the data
(structure of the tables) must not affect the user's view.
11. Integrity independence - Integrity constraints must be defined and stored in the database
itself rather than in application programs, so that changing them does not require any change
at the application level.
12. Distribution independence - The database must work properly even if the data is stored in
multiple locations; end-users should not be able to tell that the data is distributed.
13. Non-subversion Rule - Accessing the data by low-level relational language should not be
able to bypass the integrity rules and constraints expressed in the high-level relational
language.

Highlights:

1. Codd Rules are a set of 13 rules that a perfect relational database must follow.
2. Codd Rules were introduced by Edgar F. Codd to resolve the database standardization
problem.

Advantages of using the relational model

The advantages and reasons due to which the relational model in DBMS is widely accepted
as a standard are:

• Simple and Easy To Use - Storing data in tables is much easier to understand and implement
as compared to other storage techniques.
• Manageability - Because of the independent nature of each relation in a relational database,
it is easy to manipulate and manage. This improves the performance of the database.
• Query capability - With the introduction of relational algebra, relational databases provide
easy access to data via a high-level query language like SQL.
• Data integrity - With the introduction and implementation of relational constraints, the
relational model can maintain data integrity in the database.

Highlights:

• Relational databases are simple to use, easy to manage, provide data integrity, and are
query capable.
• All the advantages of relational databases are because of the use of tables and constraints.

Disadvantages of using the relational model

The main disadvantages of the relational model in DBMS appear while dealing with a huge
amount of data:

• The performance of the relational model depends upon the number of relations present in
the database.
• As the number of tables increases, the requirement of physical memory increases.
• The structure becomes complex and queries take longer to respond.
• Because of all these factors, the cost of implementing a relational database increases.

Highlights:

1. Relational databases work perfectly well for a limited number of relations.
2. Increasing the amount of data can lead to performance and storage issues with relational
databases.

Conclusion

• Relational model in DBMS is an approach to logically represent and manage the data stored
in a database by storing data in tables.
• Relations, Attributes and Tuples, Degree and Cardinality, Relational Schema and Relation
instance, and Relation Keys are some important components of the Relational Model.
• To maintain data integrity, constraints such as domain, key, and referential integrity are
implemented in the relational model.
• Presence of redundancy in data can lead to insertion, deletion, and updation anomalies in a
relational database.
• A perfect relational database follows and implements all the 13 Codd Rules.
• Because of the use of tables and constraints, relational models are simple to use, easy to
manage, provide data integrity, and are query capable.
• Increasing the amount of data can lead to performance and storage issues with relational
databases.

Keys in DBMS

A key is an attribute or a set of attributes that can uniquely identify an entire row in a
table.

In simple databases, one attribute is enough to uniquely identify all rows. However, that’s not
always the case. Consider the following example,

There are 3 sections (A, B, C) in the 5th grade of a certain school. Based on the alphabetic
order, the students are given a roll number, which uniquely identifies them in a classroom.

However, since there are 3 such sections, there will be many students who have the same roll
number but are in different sections.

In this case, we can uniquely identify a student using their section and roll numbers together.
For example, (5B, 12) represents exactly one student, who is in 5B and has the roll number
12.

Various Types of Keys in Database Management Systems

There are various classes of keys in DBMS. Some of them are:

1) Super Keys

A Super Key is essentially just a key, i.e. a set of one or more attributes that can uniquely
identify every row in a table.

2) Composite Keys

A Composite Key is a key that contains more than one attribute. In the student table
mentioned above, the key – (Section, Roll Number) is a Composite Key. This key can contain
any number of attributes (greater than 1). Trivially, the key involving all the columns in the
table is the largest Composite Key possible.

3) Candidate Keys

A Candidate Key is a key that contains the least possible attributes while still being able
to uniquely identify any table row, i.e. removing any attribute from it would stop it from
being a key. Again, in the student table mentioned above, Roll Number cannot be a candidate
key, since it cannot identify a student across the 5th grade. Similarly, the key (Section,
Roll Number, Name) cannot be a candidate key since we can make do with (Section, Roll Number)
as a key, which has 1 less attribute.

Therefore, the key (Section, Roll Number) is a candidate key.

4) Primary Key

One candidate key from the set of all possible candidate keys is chosen to be the primary key.
This primary key is used to identify rows once decided, which reduces the complexity of data
retrieval since we would rely on only 1 key for most queries. A primary key cannot have null
values, since a row whose key is NULL could not be told apart from other rows.

5) Alternate Keys

After a primary key is chosen from the set of candidate keys, the leftover keys are called
Alternate Keys.

6) Foreign Key

A Foreign Key in table X is an attribute (or a set of attributes) whose values refer to the
primary key of another table Y. It is used to identify the rows in table Y from the point of
view of table X.

For example, if each college student had a proctor, we could put the details of the proctor on
the student table itself. But, since many students can have the same proctor, doing so will
result in redundant data. To eliminate this redundancy, we can create a separate proctor table
and mention the proctor’s id for each student in the student table.

In this scenario, proctor_Id is the foreign key in the Student table and is used to cross-
reference the proctor’s details.

What is Super Key in DBMS?

Since a super key is just a key that can uniquely identify a row, all candidate keys come
under the bracket of super keys. The set of all Super Keys is a superset of the set of all
Candidate Keys.

Essentially, the super keys from which no attribute can be removed form the candidate keys.
With this, we can create all super keys by just pairing the candidate keys with other table
columns.

Super keys in DBMS are important since they’re the starting point of keys, normal forms, and
more!

Importance of Super Key in DBMS

The main purpose of a super key is just to identify rows in the table. In many cases, you can't
identify a row with any random column, since a column with duplicates will not be able to
identify a unique row.

Super Keys remove this ambiguity and make data retrieval easy.

Super Keys vs. Candidate Keys

Consider the following example,


In this table, we can't identify rows uniquely by section or roll number alone. However, we
can club them together to form the key - (section, roll number), which can identify all rows
uniquely. Therefore, (section, roll number) is a super key.

Interestingly, we can't identify every row with fewer than these 2 columns, which makes
(section, roll number) a Candidate Key.

We can do the same with the key (first name, last name) for this table specifically, but it is
possible for 2 people to have the same first and last names, which is why it is avoided.

We can create other super keys using the candidate keys. For example, the key (section, roll
number, first name) is also a Super Key. It is not a candidate key, since it doesn't use the least
number of columns required to uniquely identify a row.
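
The relationship between super keys and candidate keys can be checked mechanically. The sketch below uses hypothetical rows (the table referred to above is not reproduced in these notes) and a hypothetical helper `is_super_key`. Note that it only tests the rows it is given; a real key is a property of the schema and must hold for every possible instance.

```python
from itertools import combinations

# Hypothetical rows for illustration only.
columns = ("section", "roll_number", "first_name")
rows = [
    ("5A", 1, "Aman"),
    ("5A", 2, "Aman"),
    ("5B", 1, "Aman"),
    ("5B", 2, "Riya"),
]


def is_super_key(attrs):
    """True if no two rows agree on all of the given attributes."""
    positions = [columns.index(attr) for attr in attrs]
    projected = [tuple(row[i] for i in positions) for row in rows]
    return len(set(projected)) == len(rows)


# Try every non-empty combination of columns.
super_keys = [
    combo
    for size in range(1, len(columns) + 1)
    for combo in combinations(columns, size)
    if is_super_key(combo)
]

# A candidate key is a super key with no proper subset that is also a super key.
candidate_keys = [
    key for key in super_keys
    if not any(set(other) < set(key) for other in super_keys)
]

print(super_keys)
print(candidate_keys)
```

For these rows, (section, roll_number) and (section, roll_number, first_name) are super keys, and only (section, roll_number) is a candidate key.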

Conclusion

• Super key in DBMS is the key that can uniquely identify any row in a table.
• Candidate Keys are super keys with the least number of columns.
• We can generate the set of all super keys using the candidate keys as a base.

Unit-2

Relational Algebra

Relational Algebra is a procedural query language that takes relations as input and
generates a relation as output. Relational algebra mainly provides a theoretical
foundation for relational databases and SQL.

Types of Relational Operations in DBMS

In Relational Algebra, we have two types of Operations.

Basic Operations

Derived Operations

Applying these operations over relations/tables will give us new relations as output.
Basic Operations

Six fundamental operations are mentioned below. The majority of data retrieval
operations are carried out by these. Let's know them one by one.

But, before moving into detail, let's have two tables or we can say
relations STUDENT(ROLL, NAME, AGE) and EMPLOYEE(EMPLOYEE_NO, NAME,
AGE) which will be used in the below examples.

STUDENT

ROLL  NAME     AGE
1     Aman     20
2     Atul     18
3     Baljeet  19
4     Harsh    20
5     Prateek  21
6     Prateek  23

EMPLOYEE

EMPLOYEE_NO  NAME     AGE
E-1          Anant    20
E-2          Ashish   23
E-3          Baljeet  25
E-4          Harsh    20
E-5          Pranav   22

Select (σ)

Select operation is done by the Selection Operator, which is represented by "sigma"(σ). It is
used to retrieve the tuples(rows) of a table that satisfy the given condition. It is a unary
operator, which means it requires only one operand.

Notation : σ p(R)
Where σ is used to represent SELECTION
R is used to represent RELATION
p is the logic formula (the selection condition)
Let's understand this with an example:
Suppose we want the row(s) from STUDENT Relation where "AGE" is 20

σ AGE=20 (STUDENT)

This will return the following output:

ROLL NAME AGE


1 Aman 20
4 Harsh 20
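
The same selection can be sketched in Python, with each tuple held as a dictionary. The function name `select` and the list-of-dictionaries representation are choices made for this illustration, not part of relational algebra itself.

```python
STUDENT = [
    {"ROLL": 1, "NAME": "Aman", "AGE": 20},
    {"ROLL": 2, "NAME": "Atul", "AGE": 18},
    {"ROLL": 3, "NAME": "Baljeet", "AGE": 19},
    {"ROLL": 4, "NAME": "Harsh", "AGE": 20},
    {"ROLL": 5, "NAME": "Prateek", "AGE": 21},
    {"ROLL": 6, "NAME": "Prateek", "AGE": 23},
]


def select(predicate, relation):
    """σ_p(R): keep the tuples of R for which the condition p holds."""
    return [row for row in relation if predicate(row)]


aged_20 = select(lambda row: row["AGE"] == 20, STUDENT)
for row in aged_20:
    print(row["ROLL"], row["NAME"], row["AGE"])
```

Selection never changes the columns; it only decides which rows survive.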

Project (∏)

Project operation is done by the Projection Operator, which is represented by "pi"(∏). It is
used to retrieve certain attributes(columns) from the table. It is also known as vertical
partitioning as it separates the table vertically. It is also a unary operator.

Notation : ∏ a(r)
Where ∏ is used to represent PROJECTION
r is used to represent RELATION
a is the attribute list
Let's understand this with an example:
Suppose we want the names of all students from STUDENT Relation.

∏ NAME(STUDENT)
This will return the following output:

NAME
Aman
Atul
Baljeet
Harsh
Prateek

As you can see from the above output it eliminates duplicates.


For multiple attributes, we can separate them using a ",".

∏ ROLL,NAME(STUDENT)

Above code will return two columns, ROLL and NAME.

ROLL NAME
1 Aman
2 Atul
3 Baljeet
4 Harsh
5 Prateek
6 Prateek
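
A small Python sketch of projection, including the duplicate elimination seen in the first example. The helper name `project` and the representation of tuples as dictionaries are assumptions made for illustration.

```python
STUDENT = [
    {"ROLL": 1, "NAME": "Aman", "AGE": 20},
    {"ROLL": 2, "NAME": "Atul", "AGE": 18},
    {"ROLL": 3, "NAME": "Baljeet", "AGE": 19},
    {"ROLL": 4, "NAME": "Harsh", "AGE": 20},
    {"ROLL": 5, "NAME": "Prateek", "AGE": 21},
    {"ROLL": 6, "NAME": "Prateek", "AGE": 23},
]


def project(attributes, relation):
    """∏_a(R): keep only the listed attributes and drop duplicate tuples."""
    result = []
    for row in relation:
        projected = tuple(row[attribute] for attribute in attributes)
        if projected not in result:   # a relation is a set, so duplicates vanish
            result.append(projected)
    return result


names = project(["NAME"], STUDENT)
roll_and_name = project(["ROLL", "NAME"], STUDENT)
print(names)
print(roll_and_name)
```

Projecting on NAME alone leaves five tuples because the two "Prateek" rows collapse into one; projecting on ROLL and NAME keeps all six because the roll numbers differ.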

Union (∪)

Union operation is done by the Union Operator, which is represented by "union"(∪). It is the
same as the union operator from set theory, i.e., it selects all tuples from both relations,
with the requirement that both relations must have the same set of attributes. It is a
binary operator as it requires two operands.

Notation: R ∪ S
Where R is the first relation
S is the second relation

If the relations don't have the same set of attributes, then the union of such relations is
not defined.

Let's have an example to clarify the concept:


Suppose we want all the names from STUDENT and EMPLOYEE relation.

∏ NAME(STUDENT) ∪ ∏ NAME(EMPLOYEE)

NAME

Aman

Anant

Ashish

Atul

Baljeet

Harsh

Pranav

Prateek

As we can see from the above output it also eliminates duplicates.
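
The union above can be sketched in Python with each relation stored as a pair of attribute names and a set of tuples. This representation and the function name `union` are assumptions for illustration; the check compares only the number of attributes, which is a simplification of the "same set of attributes" requirement.

```python
def union(r, s):
    """R ∪ S for relations given as (attribute_names, set_of_tuples)."""
    r_attributes, r_rows = r
    s_attributes, s_rows = s
    if len(r_attributes) != len(s_attributes):
        raise ValueError("union is not defined: different number of attributes")
    return r_attributes, r_rows | s_rows   # set union removes duplicates


student_names = (("NAME",), {("Aman",), ("Atul",), ("Baljeet",), ("Harsh",), ("Prateek",)})
employee_names = (("NAME",), {("Anant",), ("Ashish",), ("Baljeet",), ("Harsh",), ("Pranav",)})

attributes, rows = union(student_names, employee_names)
print(attributes, sorted(rows))
```

"Baljeet" and "Harsh" occur in both inputs but only once in the result, which has eight tuples.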

Set Difference (-)

Set Difference as its name indicates is the difference between two relations (R-S). It is
denoted by a "Hyphen"(-) and it returns all the tuples(rows) which are in relation R but not
in relation S. It is also a binary operator.
Notation : R - S
Where R is the first relation
S is the second relation

Just like union, the set difference also requires both relations to have the same set of
attributes.

Let's take an example where we would like to know the names of students who are in
STUDENT Relation but not in EMPLOYEE Relation.

∏ NAME(STUDENT) - ∏ NAME(EMPLOYEE)

This will give us the following output:

NAME

Aman

Atul

Prateek

Cartesian product (X)

Cartesian product is denoted by the "X" symbol. Let's say we have two relations R and S.
Cartesian product will combine every tuple(row) from R with all the tuples from S. I know
it sounds complicated, but once we look at an example, you'll see what I mean.
Notation: R X S
Where R is the first relation
S is the second relation

As we can see from the notation it is also a binary operator.

Let's combine the two relations STUDENT and EMPLOYEE.

STUDENT X EMPLOYEE

ROLL NAME AGE EMPLOYEE_NO NAME AGE

1 Aman 20 E-1 Anant 20

1 Aman 20 E-2 Ashish 23

1 Aman 20 E-3 Baljeet 25

1 Aman 20 E-4 Harsh 20

1 Aman 20 E-5 Pranav 22

2 Atul 18 E-1 Anant 20

2 Atul 18 E-2 Ashish 23

2 Atul 18 E-3 Baljeet 25

2 Atul 18 E-4 Harsh 20

2 Atul 18 E-5 Pranav 22

. . . And so on.
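
A minimal Python sketch of the Cartesian product, using `itertools.product` from the standard library. Tuples are stored as plain Python tuples in the column order of the two tables.

```python
from itertools import product

STUDENT = [
    (1, "Aman", 20), (2, "Atul", 18), (3, "Baljeet", 19),
    (4, "Harsh", 20), (5, "Prateek", 21), (6, "Prateek", 23),
]
EMPLOYEE = [
    ("E-1", "Anant", 20), ("E-2", "Ashish", 23), ("E-3", "Baljeet", 25),
    ("E-4", "Harsh", 20), ("E-5", "Pranav", 22),
]

# Every tuple of STUDENT is concatenated with every tuple of EMPLOYEE.
cartesian = [student + employee for student, employee in product(STUDENT, EMPLOYEE)]

print(len(cartesian))
print(cartesian[0])
```

The result has 6 × 5 = 30 tuples, each with 3 + 3 = 6 attributes, which is why the table above is cut short.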

Rename (ρ)

Rename operation is denoted by "Rho"(ρ). As its name suggests, it is used to rename the
output relation. Rename is a unary operator: it operates on a single relation, and the new
name is a parameter rather than a second operand.
Notation: ρ(R,S)
Where R is the new relation name
S is the old relation name

Let's have an example to clarify this


Suppose we are fetching the names of students from STUDENT relation. We would like to
rename this relation as STUDENT_NAME.

ρ(STUDENT_NAME,∏ NAME(STUDENT))

STUDENT_NAME
NAME

Aman

Atul

Baljeet

Harsh

Prateek

As you can see, this output relation is named "STUDENT_NAME".
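
Rename can be sketched in Python by giving a relation an explicit name. The `Relation` class and the `rename` function below are hypothetical helpers for illustration, and the starting name "RESULT" is just a placeholder for the unnamed output of the projection.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    name: str
    attributes: tuple
    rows: frozenset


def rename(new_name, relation):
    """ρ(new_name, R): same attributes and tuples, only the relation name changes."""
    return relation._replace(name=new_name)


projected = Relation(
    name="RESULT",  # placeholder name for the output of ∏ NAME(STUDENT)
    attributes=("NAME",),
    rows=frozenset({("Aman",), ("Atul",), ("Baljeet",), ("Harsh",), ("Prateek",)}),
)
student_name = rename("STUDENT_NAME", projected)
print(student_name.name, student_name.attributes, len(student_name.rows))
```

Only the name differs between `projected` and `student_name`; the attributes and tuples are untouched.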

Takeaway

• Select (σ) is used to retrieve tuples(rows) based on certain conditions.
• Project (∏) is used to retrieve attributes(columns) from the relation.
• Union (∪) is used to retrieve all the tuples from two relations.
• Set Difference (-) is used to retrieve the tuples which are present in R but not in S(R-S).
• Cartesian product (X) is used to combine each tuple from the first relation with each tuple
from the second relation.
• Rename (ρ) is used to rename the output relation.

Derived Operations

Also known as extended operations, these operations can be derived from basic operations
and hence named Derived Operations. These include three operations: Join Operations,
Intersection operations, and Division operations.

Let's study them one by one.

Join Operations

Join Operations in DBMS are binary operations that allow us to combine two relations; more
relations can be combined by applying joins repeatedly.
They are further classified into two types: Inner Join and Outer Join.

First, let's have two relations. EMPLOYEE consists of E_NO, E_NAME, CITY and EXPERIENCE.
The EMPLOYEE table contains the employee's information such as id, name, city, and
experience (in years). The other relation is DEPARTMENT, consisting of D_NO, D_NAME, E_NO
and MIN_EXPERIENCE.

The DEPARTMENT table defines the mapping of an employee to their department. It contains the
Department Number, Department Name, Employee Id of the employee working in that
department, and the minimum experience required (in years) to be in that department.

EMPLOYEE
E_NO E_NAME CITY EXPERIENCE
E-1 Ram Delhi 04
E-2 Varun Chandigarh 09
E-3 Ravi Noida 03
E-4 Amit Bangalore 07

DEPARTMENT

D_NO D_NAME E_NO MIN_EXPERIENCE


D-1 HR E-1 03
D-2 IT E-2 05
D-3 Marketing E-3 02

Also, let's have the Cartesian Product of the above two relations. It will be much easier to
understand Join Operations when we have the Cartesian Product.

E_NO E_NAME CITY EXPERIENCE D_NO D_NAME E_NO MIN_EXPERIENCE

E-1 Ram Delhi 04 D-1 HR E-1 03

E-1 Ram Delhi 04 D-2 IT E-2 05

E-1 Ram Delhi 04 D-3 Marketing E-3 02

E-2 Varun Chandigarh 09 D-1 HR E-1 03

E-2 Varun Chandigarh 09 D-2 IT E-2 05

E-2 Varun Chandigarh 09 D-3 Marketing E-3 02

E-3 Ravi Noida 03 D-1 HR E-1 03

E-3 Ravi Noida 03 D-2 IT E-2 05

E-3 Ravi Noida 03 D-3 Marketing E-3 02

E-4 Amit Bangalore 07 D-1 HR E-1 03

E-4 Amit Bangalore 07 D-2 IT E-2 05

E-4 Amit Bangalore 07 D-3 Marketing E-3 02

Inner Join

When we perform an Inner Join, only those tuples are returned that satisfy the join condition.
It is classified into three types: Theta Join, Equi Join and Natural Join.

Theta Join (θ)

Theta Join combines two relations using a condition. This condition is represented by the
symbol "theta"(θ). Here conditions can be inequality conditions such as >,<,>=,<=, etc.
Notation : R ⋈θ S
Where R is the first relation
S is the second relation

Let's have a simple example to understand this.

Suppose we want a relation where EXPERIENCE from EMPLOYEE >= MIN_EXPERIENCE from
DEPARTMENT.

EMPLOYEE ⋈ EMPLOYEE.EXPERIENCE>=DEPARTMENT.MIN_EXPERIENCE DEPARTMENT

E_NO  E_NAME  CITY        EXPERIENCE  D_NO  D_NAME     E_NO  MIN_EXPERIENCE
E-1   Ram     Delhi       04          D-1   HR         E-1   03
E-1   Ram     Delhi       04          D-3   Marketing  E-3   02
E-2   Varun   Chandigarh  09          D-1   HR         E-1   03
E-2   Varun   Chandigarh  09          D-2   IT         E-2   05
E-2   Varun   Chandigarh  09          D-3   Marketing  E-3   02
E-3   Ravi    Noida       03          D-1   HR         E-1   03
E-3   Ravi    Noida       03          D-3   Marketing  E-3   02
E-4   Amit    Bangalore   07          D-1   HR         E-1   03
E-4   Amit    Bangalore   07          D-2   IT         E-2   05
E-4   Amit    Bangalore   07          D-3   Marketing  E-3   02

To compute this, go through the Cartesian Product: every tuple/row in which
EXPERIENCE >= MIN_EXPERIENCE is inserted into the output relation.
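
This "filter the Cartesian Product" view of a theta join translates directly into Python. The function name `theta_join` is a choice made for this sketch; tuples follow the column order of the EMPLOYEE and DEPARTMENT tables.

```python
EMPLOYEE = [
    ("E-1", "Ram", "Delhi", 4),
    ("E-2", "Varun", "Chandigarh", 9),
    ("E-3", "Ravi", "Noida", 3),
    ("E-4", "Amit", "Bangalore", 7),
]
DEPARTMENT = [
    ("D-1", "HR", "E-1", 3),
    ("D-2", "IT", "E-2", 5),
    ("D-3", "Marketing", "E-3", 2),
]


def theta_join(r, s, condition):
    """R ⋈θ S: pairs from the Cartesian product of R and S that satisfy θ."""
    return [a + b for a in r for b in s if condition(a, b)]


# θ: EMPLOYEE.EXPERIENCE (index 3) >= DEPARTMENT.MIN_EXPERIENCE (index 3)
qualified = theta_join(EMPLOYEE, DEPARTMENT, lambda e, d: e[3] >= d[3])
for row in qualified:
    print(row)
```

Of the 12 tuples in the Cartesian Product, 10 pass the condition; the two that fail are Ram and Ravi paired with IT, which needs 5 years.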

Equi Join

Equi Join is a special case of theta join where the condition can only contain
equality (=) comparisons.
A non-equijoin is the inverse of an equi join: it occurs when you join on a condition
other than "=".

Let's have an example where we would like to join EMPLOYEE and DEPARTMENT relation
where E_NO from EMPLOYEE = E_NO from DEPARTMENT.

EMPLOYEE ⋈EMPLOYEE.E_NO = DEPARTMENT.E_NO DEPARTMENT

E_NO E_NAME CITY EXPERIENCE D_NO D_NAME E_NO MIN_EXPERIENCE

E-1 Ram Delhi 04 D-1 HR E-1 03

E-2 Varun Chandigarh 09 D-2 IT E-2 05

E-3 Ravi Noida 03 D-3 Marketing E-3 02

To compute this, go through the Cartesian Product: every tuple in which both E_NO values are
the same is inserted into the output relation.

Natural Join (⋈)

A natural join does not take an explicit comparison condition, and unlike a Cartesian
product it does not simply concatenate every pair of tuples. A Natural Join can be performed
only if the two relations share at least one common attribute. Furthermore, the common
attributes must share the same name and domain.
Natural join combines the tuples whose values for the common attributes are the same in both
relations and removes the duplicate columns.
Preferably, Natural Join is performed on the foreign key.
Notation : R ⋈ S
Where R is the first relation
S is the second relation

Let's say we want to join EMPLOYEE and DEPARTMENT relation with E_NO as a common
attribute.

Notice, here E_NO has the same name in both the relations and also consists of the same
domain, i.e., in both relations E_NO is a string.

EMPLOYEE ⋈ DEPARTMENT
E_NO E_NAME CITY EXPERIENCE D_NO D_NAME MIN_EXPERIENCE

E-1 Ram Delhi 04 D-1 HR 03

E-2 Varun Chandigarh 09 D-2 IT 05

E-3 Ravi Noida 03 D-3 Marketing 02

Unlike the Equi Join above, where the output has two E_NO columns, here we have only one
E_NO column. This is because Natural Join automatically keeps a single copy of a common
attribute.
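
A natural join can be sketched in Python by storing each tuple as a dictionary, so that attributes with the same name are easy to find. The function name `natural_join` and this representation are assumptions made for illustration.

```python
EMPLOYEE = [
    {"E_NO": "E-1", "E_NAME": "Ram", "CITY": "Delhi", "EXPERIENCE": 4},
    {"E_NO": "E-2", "E_NAME": "Varun", "CITY": "Chandigarh", "EXPERIENCE": 9},
    {"E_NO": "E-3", "E_NAME": "Ravi", "CITY": "Noida", "EXPERIENCE": 3},
    {"E_NO": "E-4", "E_NAME": "Amit", "CITY": "Bangalore", "EXPERIENCE": 7},
]
DEPARTMENT = [
    {"D_NO": "D-1", "D_NAME": "HR", "E_NO": "E-1", "MIN_EXPERIENCE": 3},
    {"D_NO": "D-2", "D_NAME": "IT", "E_NO": "E-2", "MIN_EXPERIENCE": 5},
    {"D_NO": "D-3", "D_NAME": "Marketing", "E_NO": "E-3", "MIN_EXPERIENCE": 2},
]


def natural_join(r, s):
    """R ⋈ S: match on every attribute name the two relations share."""
    if not r or not s:
        return []
    common = [attribute for attribute in r[0] if attribute in s[0]]
    return [
        {**a, **b}   # merging the dictionaries keeps one copy of each common attribute
        for a in r
        for b in s
        if all(a[attribute] == b[attribute] for attribute in common)
    ]


joined = natural_join(EMPLOYEE, DEPARTMENT)
for row in joined:
    print(row)
```

Here the only shared attribute is E_NO, so the result has three tuples with seven attributes each, and E-4 is left out because no DEPARTMENT tuple mentions it.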

Outer Join

Unlike Inner Join, which includes only the tuples that satisfy the given condition, Outer
Join also includes some/all of the tuples which don't satisfy the given condition. It is also
of three types: Left Outer Join, Right Outer Join, and Full Outer Join.

Let's say we have two relations R and S. Below is the representation of Left, Right, and Full
Outer Joins.

[Figure: Left, Right, and Full Outer Joins of R and S]

Left Outer Join

As we can see from the diagram, Left Outer Join returns the matching tuples (tuples
present in both relations) and the tuples which are only present in the Left Relation, here R.

For a tuple of R that has no matching tuple in S, the attributes/columns of the Right
Relation, here S, are made NULL in the output relation.

Let's understand this a bit more using an example:

EMPLOYEE ⟕EMPLOYEE.E_NO = DEPARTMENT.E_NO DEPARTMENT

Here we are combining EMPLOYEE and DEPARTMENT relation with the constraint that
EMPLOYEE's E_NO must be equal to DEPARTMENT's E_NO.
E_NO E_NAME CITY EXPERIENCE D_NO D_NAME MIN_EXPERIENCE

E-1 Ram Delhi 04 D-1 HR 03

E-2 Varun Chandigarh 09 D-2 IT 05

E-3 Ravi Noida 03 D-3 Marketing 02

E-4 Amit Bangalore 07 - - -

As you can see here, all the tuples from the left relation, i.e., EMPLOYEE, are present. E-4
does not satisfy the given condition, since no tuple in DEPARTMENT has the same E_NO, but it
is still included in the output relation, because a Left Outer Join keeps every tuple of the
left relation. The DEPARTMENT attributes in E-4's row have no values to take, which is why
they are marked as NULL.
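
A left outer join on a single shared attribute can be sketched as follows, with Python's `None` standing in for NULL. The function name and its parameters (`on`, `s_attributes`) are hypothetical choices for this illustration.

```python
EMPLOYEE = [
    {"E_NO": "E-1", "E_NAME": "Ram", "CITY": "Delhi", "EXPERIENCE": 4},
    {"E_NO": "E-2", "E_NAME": "Varun", "CITY": "Chandigarh", "EXPERIENCE": 9},
    {"E_NO": "E-3", "E_NAME": "Ravi", "CITY": "Noida", "EXPERIENCE": 3},
    {"E_NO": "E-4", "E_NAME": "Amit", "CITY": "Bangalore", "EXPERIENCE": 7},
]
DEPARTMENT = [
    {"D_NO": "D-1", "D_NAME": "HR", "E_NO": "E-1", "MIN_EXPERIENCE": 3},
    {"D_NO": "D-2", "D_NAME": "IT", "E_NO": "E-2", "MIN_EXPERIENCE": 5},
    {"D_NO": "D-3", "D_NAME": "Marketing", "E_NO": "E-3", "MIN_EXPERIENCE": 2},
]


def left_outer_join(r, s, on, s_attributes):
    """R ⟕ S on one shared attribute; unmatched tuples of R get None for S's attributes."""
    result = []
    for a in r:
        matches = [{**a, **b} for b in s if a[on] == b[on]]
        if not matches:
            # No partner in S: keep the tuple of R and pad S's attributes with None (NULL).
            padding = {attribute: None for attribute in s_attributes}
            matches = [{**padding, **a}]
        result.extend(matches)
    return result


joined = left_outer_join(EMPLOYEE, DEPARTMENT, "E_NO", ["D_NO", "D_NAME", "MIN_EXPERIENCE"])
for row in joined:
    print(row["E_NO"], row["E_NAME"], row["D_NO"], row["D_NAME"], row["MIN_EXPERIENCE"])
```

Calling the same function with the two relations swapped, and EMPLOYEE's own attributes as the padding list, gives the rows of the right outer join.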

Right Outer Join

Right Outer Join returns the matching tuples and the tuples which are only present in the
Right Relation, here S.

As with the Left Outer Join, for a tuple of S that has no matching tuple in R, the
attributes of the Left Relation, here R, are made NULL in the output relation.

We will combine EMPLOYEE and DEPARTMENT relations with the same constraint as
above.

EMPLOYEE ⟖EMPLOYEE.E_NO = DEPARTMENT.E_NO DEPARTMENT


E_NO E_NAME CITY EXPERIENCE D_NO D_NAME MIN_EXPERIENCE

E-1 Ram Delhi 04 D-1 HR 03

E-2 Varun Chandigarh 09 D-2 IT 05

E-3 Ravi Noida 03 D-3 Marketing 02

As all the tuples from DEPARTMENT relation have a corresponding E_NO in EMPLOYEE
relation, no EMPLOYEE attribute in the output is NULL. E-4 does not appear, because it has
no matching tuple in DEPARTMENT.

Full Outer Join

Full Outer Join returns all the tuples from both relations. For a tuple that has no
matching tuple in the other relation, the attributes of the other relation are made NULL in
the output relation.

Again, combine the EMPLOYEE and DEPARTMENT relation with the same constraint.

EMPLOYEE ⟗EMPLOYEE.E_NO = DEPARTMENT.E_NO DEPARTMENT


E_NO E_NAME CITY EXPERIENCE D_NO D_NAME MIN_EXPERIENCE

E-1 Ram Delhi 04 D-1 HR 03

E-2 Varun Chandigarh 09 D-2 IT 05

E-3 Ravi Noida 03 D-3 Marketing 02

E-4 Amit Bangalore 07 - - -

Intersection (∩)

The Intersection operation is performed by the Intersection operator, which is represented
by "intersection" (∩). It is the same as the intersection operator from set theory, i.e., it
selects all the tuples that are present in both relations. It is a binary operator, as it
requires two operands, and it eliminates duplicates.
Notation : R ∩ S
Where R is the first relation
S is the second relation

Let's have an example to clarify the concept:


Suppose we want the names which are present in STUDENT as well as in EMPLOYEE
relation, Relations we used in Basic Operations.

∏ NAME(STUDENT) ∩ ∏ NAME(EMPLOYEE)
NAME

Baljeet

Harsh

Division (÷)

The Division operation is represented by the "division" (÷ or /) operator and is used in queries
that involve keywords such as "every" or "all".
Notation : R(X,Y)/S(Y)
Here,
R is the first relation from which data is retrieved.
S is the second relation that will help to retrieve the data.
X and Y are the attributes/columns present in the relations. A relation can have multiple
attributes, but keep in mind that the attributes of S must be a proper subset of the attributes of R.
The result contains each value of X from R that appears in a tuple <X,Y> for every value of Y
present in S.
This is easier to understand with an example.

Let's have two relations, ENROLLED and COURSE. ENROLLED consist of two attributes
STUDENT_ID and COURSE_ID. It denotes the map of students who are enrolled in given
courses.

COURSE contains the list of courses available.


See, here attributes/columns of COURSE relation are a proper subset of
attributes/columns of ENROLLED relation. Hence Division operation can be used here.

ENROLLED

STUDENT_ID COURSE_ID

Student_1 DBMS

Student_2 DBMS

Student_1 OS

Student_3 OS

COURSE

COURSE_ID

DBMS

OS

Now the query is to return the STUDENT_ID of students who are enrolled in every course.

ENROLLED(STUDENT_ID, COURSE_ID)/COURSE(COURSE_ID)

This will return the following relation as output.

STUDENT_ID

Student_1
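The division above can be sketched in a few lines of Python. This is an illustration under simple assumptions, not a DBMS implementation: relations are plain lists of tuples, and the helper name divide is hypothetical.

```python
def divide(r, s):
    """Return the X values of r(X, Y) that are paired with every Y value in s."""
    required = set(s)
    seen = {}  # maps each X value to the set of Y values it appears with
    for x, y in r:
        seen.setdefault(x, set()).add(y)
    return {x for x, ys in seen.items() if required <= ys}


enrolled = [
    ("Student_1", "DBMS"),
    ("Student_2", "DBMS"),
    ("Student_1", "OS"),
    ("Student_3", "OS"),
]
course = ["DBMS", "OS"]

result = divide(enrolled, course)
print(result)
```

Only Student_1 is paired with both DBMS and OS, so it is the only value in the result.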

SQL Commands
o SQL commands are instructions used to communicate with the database. They are also
used to perform specific tasks, functions, and queries on data.
o SQL can perform various tasks such as creating a table, adding data to tables, dropping a
table, modifying a table, and setting permissions for users.

Types of SQL Commands


There are five types of SQL commands: DDL, DML, DCL, TCL, and DQL.

1. Data Definition Language (DDL)


o DDL changes the structure of the table, for example by creating, deleting, or altering a
table.
o In many database systems (Oracle, for example), DDL commands are auto-committed, which
means their changes are saved permanently in the database.

Here are some commands that come under DDL:

o CREATE
o ALTER
o DROP
o TRUNCATE

a. CREATE: It is used to create a new table in the database.

Syntax:

CREATE TABLE TABLE_NAME (COLUMN_NAME DATATYPES[,....]);

Example:

CREATE TABLE EMPLOYEE(Name VARCHAR2(20), Email VARCHAR2(100), DOB DATE);

b. DROP: It is used to delete both the structure and the records stored in the table.

Syntax:

DROP TABLE table_name;

Example:

DROP TABLE EMPLOYEE;

c. ALTER: It is used to alter the structure of a table. The change could be either to
modify the characteristics of an existing attribute or to add a new attribute.

Syntax:

To add a new column in the table:

ALTER TABLE table_name ADD column_name COLUMN-definition;

To modify an existing column in the table:

ALTER TABLE table_name MODIFY(column_definitions....);

Example:

ALTER TABLE STU_DETAILS ADD(ADDRESS VARCHAR2(20));
ALTER TABLE STU_DETAILS MODIFY (NAME VARCHAR2(20));

d. TRUNCATE: It is used to delete all the rows from the table and free the space
those rows occupied. The table structure itself is kept.

Syntax:

TRUNCATE TABLE table_name;

Example:

TRUNCATE TABLE EMPLOYEE;


2. Data Manipulation Language (DML)
o DML commands are used to modify the data in the database. They are responsible for all
forms of changes to the data.
o DML commands are not auto-committed, which means their changes are not permanent until
they are committed. Until then, they can be rolled back.

Here are some commands that come under DML:

o INSERT
o UPDATE
o DELETE

a. INSERT: The INSERT statement is a SQL statement used to insert data into a row of
a table.

Syntax:

INSERT INTO TABLE_NAME
(col1, col2, col3,.... col N)
VALUES (value1, value2, value3, .... valueN);

Or

INSERT INTO TABLE_NAME
VALUES (value1, value2, value3, .... valueN);

For example:

INSERT INTO javatpoint (Author, Subject) VALUES ('Sonoo', 'DBMS');

b. UPDATE: This command is used to update or modify the value of a column in the table.

Syntax:

UPDATE table_name SET [column_name1 = value1,...column_nameN = valueN] [WHERE CONDITION]

For example:

UPDATE students
SET User_Name = 'Sonoo'
WHERE Student_Id = '3';

c. DELETE: It is used to remove one or more rows from a table.

Syntax:

DELETE FROM table_name [WHERE condition];

For example:

DELETE FROM javatpoint
WHERE Author = 'Sonoo';

3. Data Control Language (DCL)
DCL commands are used to grant authority to, and take it back from, any database user.

Here are some commands that come under DCL:

o Grant
o Revoke

a. Grant: It is used to give users access privileges on database objects.

Example:

GRANT SELECT, UPDATE ON MY_TABLE TO SOME_USER, ANOTHER_USER;

b. Revoke: It is used to take back permissions from the user.

Example:

REVOKE SELECT, UPDATE ON MY_TABLE FROM USER1, USER2;


4. Transaction Control Language (TCL)
TCL commands are used together with DML commands such as INSERT, DELETE, and
UPDATE.

DDL operations such as creating or dropping tables are automatically committed in systems
like Oracle, which is why TCL commands cannot be used to undo them there.

Here are some commands that come under TCL:


o COMMIT
o ROLLBACK
o SAVEPOINT

a. Commit: The COMMIT command is used to permanently save the changes made by the
current transaction to the database.

Syntax:

COMMIT;

Example:

DELETE FROM CUSTOMERS
WHERE AGE = 25;
COMMIT;

b. Rollback: The ROLLBACK command is used to undo changes that have not yet been
committed to the database.

Syntax:

ROLLBACK;

Example:

DELETE FROM CUSTOMERS
WHERE AGE = 25;
ROLLBACK;

c. SAVEPOINT: It marks a point within a transaction to which the transaction can later be
rolled back, without rolling back the entire transaction.

Syntax:

SAVEPOINT SAVEPOINT_NAME;
ROLLBACK TO SAVEPOINT SAVEPOINT_NAME;
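
How COMMIT, ROLLBACK, and SAVEPOINT interact can be seen in a short experiment. The sketch below uses Python's built-in sqlite3 module rather than the Oracle-style syntax of these notes. The CUSTOMERS rows and the savepoint name before_age_delete are made up for illustration.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# transaction is controlled explicitly with BEGIN / COMMIT below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE CUSTOMERS (ID INTEGER, AGE INTEGER)")
conn.execute("INSERT INTO CUSTOMERS VALUES (1, 25), (2, 30), (3, 25)")

conn.execute("BEGIN")
conn.execute("DELETE FROM CUSTOMERS WHERE ID = 2")        # kept
conn.execute("SAVEPOINT before_age_delete")                # mark a point
conn.execute("DELETE FROM CUSTOMERS WHERE AGE = 25")       # will be undone
conn.execute("ROLLBACK TO SAVEPOINT before_age_delete")    # undo back to the mark
conn.execute("COMMIT")                                     # make the rest permanent

remaining = conn.execute("SELECT ID, AGE FROM CUSTOMERS ORDER BY ID").fetchall()
print(remaining)
```

Only the delete issued after the savepoint is undone; the delete of ID 2 is committed.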
5. Data Query Language (DQL)
DQL is used to fetch the data from the database.

It uses only one command:

o SELECT

a. SELECT: It is used to retrieve attributes from the rows that satisfy the condition given
in the WHERE clause. The list of attributes after SELECT corresponds to the projection
operation of relational algebra, and the WHERE clause corresponds to the selection operation.

Syntax:

SELECT expressions
FROM TABLES
WHERE conditions;

For example:

SELECT emp_name
FROM employee
WHERE age > 20;

What are Integrity Constraints in DBMS?

In Database Management Systems, integrity constraints are a predefined set of rules that are
applied to table fields (columns) or relations to ensure that the overall validity, integrity,
and consistency of the data in the database table is maintained. All the conditions or rules
in the integrity constraints are evaluated every time an insert, update, delete, or alter
operation is performed on a table. The data can be inserted, updated, deleted, or
altered only if the constraint evaluates to True. Thus, integrity constraints are
useful in preventing accidental damage to the database by an authorized user.

Types of Integrity Constraints


Domain Constraint

Domain integrity constraint contains a certain set of rules or conditions to restrict the kind
of attributes or values a column can hold in the database table. The data type of a domain
can be string, integer, character, DateTime, currency, etc.

Example:

Consider a Student's table having Roll No, Name, Age, Class of students.

Roll No Name Age Class

101 Adam 14 6

102 Steve 16 8

103 David 8 4

104 Bruce 18 12

105 Tim 6 A

In the above student's table, the value A in the last row last column violates the
domain integrity constraint because the Class attribute contains only integer
values while A is a character.
Entity Integrity Constraint

Entity Integrity Constraint is used to ensure that the primary key cannot be null. A primary
key is used to identify individual records in a table and if the primary key has a null value,
then we can't identify those records. There can be null values anywhere in the table
except the primary key column.

Example:

Consider Employees table having Id, Name, and salary of employees

ID Name Salary
1101 Jackson 40000
1102 Harry 60000
1103 Steve 80000
1104 Ash 1800000
NULL James 36000
In the above employee's table, we can see that the ID column is the primary key
and contains a null value in the last row which violates the entity integrity
constraint.
Referential Integrity Constraint

Referential Integrity Constraint ensures that the relationship between two related
database tables always stays valid. A foreign key in one table must either reference a
value that exists in the referenced column (usually the primary key) of the other table,
or be null.

Example:

Consider an Employee and a Department table where Dept_ID acts as a foreign key
between the two tables

Employees Table

ID Name Salary Dept_ID


1101 Jackson 40000 3
1102 Harry 60000 2
1103 Steve 80000 4
1104 Ash 1800000 3
1105 James 36000 1

Department Table

Dept_ID Dept_Name
1 Sales
2 HR
3 Technical

In the above example, Dept_ID acts as a foreign key in the Employees table and
a primary key in the Department table. The row having Dept_ID = 4 violates the
referential integrity constraint since the value 4 does not appear in the Dept_ID
(primary key) column of the Department table.
Key constraint

A key is a set of attributes used to identify an entity uniquely within its entity set.
There can be multiple keys in a single entity set, but out of these, only one
is chosen as the primary key. A primary key can only contain unique, non-null values in
the relational database table.

Example:
Consider a student's table

Roll No Name Age Class


101 Adam 14 6
102 Steve 16 8
103 David 8 4
104 Bruce 18 12
102 Tim 6 2

The last row of the student's table violates the key integrity constraint since Roll
No 102 is repeated twice in the primary key column. A primary key must be
unique and not null therefore duplicate values are not allowed in the Roll No
column of the above student's table.
Conclusion

o Integrity Constraints in Database Management Systems are the set of predefined rules
responsible for maintaining the quality and consistency of data in the database.
o Evaluation against the rules mentioned in the integrity constraint is done every time an
insert, update, delete, or alter operation is performed on the table.
o Integrity Constraints in DBMS are of 4 types:

1. Domain Constraint
2. Entity Constraint
3. Referential Integrity Constraint
4. Key Constraint
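
The four constraint types can be observed by letting a database reject bad rows. The following is a minimal sketch with Python's built-in sqlite3 module. The column types, the CHECK on Salary (standing in for a domain constraint), and the helper try_insert are assumptions made for this illustration, and SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE Department (Dept_ID INT PRIMARY KEY NOT NULL, Dept_Name TEXT);
CREATE TABLE Employees (
    ID INT PRIMARY KEY NOT NULL,                        -- entity + key constraint
    Name TEXT,
    Salary INTEGER CHECK (typeof(Salary) = 'integer'),  -- domain constraint
    Dept_ID INT REFERENCES Department(Dept_ID)          -- referential constraint
);
INSERT INTO Department VALUES (1, 'Sales'), (2, 'HR'), (3, 'Technical');
INSERT INTO Employees VALUES (1101, 'Jackson', 40000, 3);
""")


def try_insert(row):
    """Insert one Employees row; return 'ok' or the database's error message."""
    try:
        conn.execute("INSERT INTO Employees VALUES (?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)


results = {
    "domain": try_insert((1102, "Harry", "sixty thousand", 2)),  # non-integer salary
    "entity": try_insert((None, "James", 36000, 1)),             # NULL primary key
    "referential": try_insert((1103, "Steve", 80000, 4)),        # Dept_ID 4 is missing
    "key": try_insert((1101, "Tim", 36000, 1)),                  # duplicate primary key
    "valid": try_insert((1104, "Ash", 1800000, 3)),
}
for kind, outcome in results.items():
    print(kind, "->", outcome)
```

Each of the first four inserts is refused with an IntegrityError, while the last row, which satisfies every rule, is accepted.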

Aggregate functions in SQL


In database management an aggregate function is a function where the
values of multiple rows are grouped together as input on certain criteria to
form a single value of more significant meaning.

Various Aggregate Functions


1) Count()
2) Sum()
3) Avg()
4) Min()
5) Max()
The SQL COUNT(), AVG() and SUM() Functions
The COUNT() function returns the number of rows that matches a specified
criterion.

COUNT() Syntax
SELECT COUNT(column_name)
FROM table_name
WHERE condition;

The AVG() function returns the average value of a numeric column.

AVG() Syntax
SELECT AVG(column_name)
FROM table_name
WHERE condition;

The SUM() function returns the total sum of a numeric column.

SUM() Syntax
SELECT SUM(column_name)
FROM table_name
WHERE condition;

The SQL MIN() and MAX() Functions


The MIN() function returns the smallest value of the selected column.

The MAX() function returns the largest value of the selected column.

MIN()
SELECT MIN(column_name)
FROM table_name
WHERE condition;
MAX()
SELECT MAX(column_name)
FROM table_name
WHERE condition;

Now let us understand each aggregate function with an example:


Id Name Salary
-----------------------
1 A 80
2 B 40
3 C 60
4 D 70
5 E 60
6 F Null

Count():
Count(*): Returns the total number of records, i.e., 6.
Count(salary): Returns the number of non-NULL values in the salary column, i.e., 5.
Count(Distinct salary): Returns the number of distinct non-NULL values in the salary
column, i.e., 4.

Sum():
sum(salary): Sum of all non-NULL values of the salary column, i.e., 310.
sum(Distinct salary): Sum of all distinct non-NULL values, i.e., 250.

Avg():
Avg(salary) = Sum(salary) / Count(salary) = 310/5 = 62
Avg(Distinct salary) = Sum(Distinct salary) / Count(Distinct salary) = 250/4 = 62.5

Min() and Max():
Min(salary): Minimum value in the salary column, ignoring NULL, i.e., 40.
Max(salary): Maximum value in the salary column, ignoring NULL, i.e., 80.
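
The figures above can be checked by running the aggregates against the same six rows. This is a small sketch using Python's built-in sqlite3 module; the table name Emp is an assumption, since the notes do not name the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (Id INTEGER, Name TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Emp VALUES (?, ?, ?)",
    [(1, "A", 80), (2, "B", 40), (3, "C", 60),
     (4, "D", 70), (5, "E", 60), (6, "F", None)],  # None is stored as NULL
)

# All aggregates except COUNT(*) skip the NULL salary.
row = conn.execute("""
SELECT COUNT(*), COUNT(Salary), COUNT(DISTINCT Salary),
       SUM(Salary), SUM(DISTINCT Salary),
       AVG(Salary), AVG(DISTINCT Salary),
       MIN(Salary), MAX(Salary)
FROM Emp
""").fetchone()
print(row)
```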

SQL JOIN
A JOIN clause is used to combine rows from two or more tables, based on a
related column between them.

Different Types of SQL JOINs


Here are the different types of the JOINs in SQL:

o (INNER) JOIN: Returns records that have matching values in both tables
o LEFT (OUTER) JOIN: Returns all records from the left table, and the
matched records from the right table
o RIGHT (OUTER) JOIN: Returns all records from the right table, and the
matched records from the left table
o FULL (OUTER) JOIN: Returns all records from both tables, combining rows
where there is a match and filling the missing side with NULL otherwise
SQL INNER JOIN Keyword
The INNER JOIN keyword selects records that have matching values in both
tables.

INNER JOIN Syntax


SELECT column_name(s)
FROM table1
INNER JOIN table2
ON table1.column_name = table2.column_name;

Below is a selection from the "Orders" table:

OrderID CustomerID EmployeeID OrderDate ShipperID

10308 2 7 1996-09-18 3

10309 37 3 1996-09-19 1

10310 77 8 1996-09-20 2

And a selection from the "Customers" table:


CustomerID CustomerName ContactName Address City PostalCode Country

1 Alfreds Maria Anders Obere Str. Berlin 12209 Germany


Futterkiste 57

2 Ana Trujillo Ana Trujillo Avda. de la México 05021 Mexico


Emparedados y Constitución D.F.
helados 2222

3 Antonio Moreno Antonio Mataderos México 05023 Mexico


Taquería Moreno 2312 D.F.

SELECT Orders.OrderID, Customers.CustomerName


FROM Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID;

SQL LEFT JOIN Keyword


The LEFT JOIN keyword returns all records from the left table (table1), and
the matching records from the right table (table2). If there is no match, the
columns from the right side contain NULL.

LEFT JOIN
SELECT column_name(s)
FROM table1
LEFT JOIN table2
ON table1.column_name = table2.column_name;

Note: In some databases LEFT JOIN is called LEFT OUTER JOIN.


SQL LEFT JOIN Example
The following SQL statement will select all customers, and any orders they
might have:

Example
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID
ORDER BY Customers.CustomerName;
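
To see the difference between INNER JOIN and LEFT JOIN on the sample rows, the sketch below loads them with Python's built-in sqlite3 module. Only the OrderID, CustomerID, and CustomerName columns are kept, which is a simplification for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (OrderID INTEGER, CustomerID INTEGER);
CREATE TABLE Customers (CustomerID INTEGER, CustomerName TEXT);
INSERT INTO Orders VALUES (10308, 2), (10309, 37), (10310, 77);
INSERT INTO Customers VALUES
  (1, 'Alfreds Futterkiste'),
  (2, 'Ana Trujillo Emparedados y helados'),
  (3, 'Antonio Moreno Taquería');
""")

# INNER JOIN: only orders whose CustomerID exists in Customers.
inner_rows = conn.execute("""
SELECT Orders.OrderID, Customers.CustomerName
FROM Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID
""").fetchall()

# LEFT JOIN: every customer, with None (NULL) where no order matches.
left_rows = conn.execute("""
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID
ORDER BY Customers.CustomerName
""").fetchall()

print(inner_rows)
print(left_rows)
```

With these sample rows only order 10308 has a matching customer, so the inner join returns one row, while the left join returns all three customers.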

The SQL ORDER BY Keyword


The ORDER BY keyword is used to sort the result-set in ascending or
descending order.

The ORDER BY keyword sorts the records in ascending order by default. To
sort the records in descending order, use the DESC keyword.

ORDER BY Syntax
SELECT column1, column2, ...
FROM table_name
ORDER BY column1, column2, ... ASC|DESC;

SQL RIGHT JOIN Keyword


The RIGHT JOIN keyword returns all records from the right table (table2),
and the matching records from the left table (table1). If there is no match, the
columns from the left side contain NULL.
RIGHT JOIN
SELECT column_name(s)
FROM table1
RIGHT JOIN table2
ON table1.column_name = table2.column_name;

Note: In some databases RIGHT JOIN is called RIGHT OUTER JOIN.

SQL FULL OUTER JOIN Keyword


The FULL OUTER JOIN keyword returns all records from both the left (table1) and
right (table2) tables. Matching rows are combined, and rows without a match are
kept with NULL in the other table's columns.

Tip: FULL OUTER JOIN and FULL JOIN are the same.

FULL OUTER JOIN Syntax


SELECT column_name(s)
FROM table1
FULL OUTER JOIN table2
ON table1.column_name = table2.column_name
WHERE condition;

Note: FULL OUTER JOIN can potentially return very large result-sets!

Relational Set Operators in DBMS


DBMS supports relational set operators as well. The major relational set operators
are union, intersection and set difference. All of these can be implemented in DBMS
using different queries.
The relational set operators are explained below using the following two example tables.

Art_Students

Student_Number Student_Name

1 John

2 Mary

3 Damon

Dance_Students

Student_Number Student_Name

2 Mary

3 Damon

6 Matt

Union
Union combines the results of two queries into a single result in the form of a table.
The two results must be union-compatible, i.e., have the same number of columns with
compatible data types. Union removes all duplicates from the data and only displays
distinct values. If duplicate values are required in the resultant data, then UNION ALL
is used.
An example of union is −

Select Student_Name from Art_Students


UNION
Select Student_Name from Dance_Students

This will display the names of all the students in the table Art_Students and
Dance_Students i.e John, Mary, Damon and Matt.

Intersection
The intersection operator gives the data values common to the two data sets
that are intersected. As with union, the two data sets must have the same number of
columns with compatible data types. Intersection also removes all duplicates before
displaying the result.
An example of intersection is −

Select Student_Name from Art_Students


INTERSECT
Select Student_Name from Dance_Students

This will display the names of the students in the table Art_Students and in the table
Dance_Students i.e all the students that have taken both art and dance classes
.Those are Mary and Damon in this example.

Set difference
The set difference operator takes two sets and returns the values that are in the
first set but not in the second set. It is written MINUS in Oracle and EXCEPT in
standard SQL and most other systems.
An example of set difference is −

Select Student_Name from Art_Students


MINUS
Select Student_Name from Dance_Students

This will display the names of all the students in table Art_Students but not in table
Dance_Students i.e the students who are taking art classes but not dance classes.
That is John in this example.
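
The three set operators can be run against the two example tables as a quick check. This sketch uses Python's built-in sqlite3 module, where set difference is spelled EXCEPT rather than MINUS. The helper names is hypothetical and sorts the result only to make the output predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Art_Students (Student_Number INTEGER, Student_Name TEXT);
CREATE TABLE Dance_Students (Student_Number INTEGER, Student_Name TEXT);
INSERT INTO Art_Students VALUES (1, 'John'), (2, 'Mary'), (3, 'Damon');
INSERT INTO Dance_Students VALUES (2, 'Mary'), (3, 'Damon'), (6, 'Matt');
""")


def names(operator):
    """Combine the two name lists with the given set operator, sorted by name."""
    sql = (
        "SELECT Student_Name FROM Art_Students "
        f"{operator} "
        "SELECT Student_Name FROM Dance_Students"
    )
    return sorted(row[0] for row in conn.execute(sql))


union_names = names("UNION")
intersect_names = names("INTERSECT")
difference_names = names("EXCEPT")  # MINUS in Oracle

print(union_names, intersect_names, difference_names)
```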

Views in SQL
o Views in SQL are considered as a virtual table. A view also contains rows and
columns.
o To create the view, we can select the fields from one or more tables present in
the database.
o A view can either have specific rows based on certain condition or all the rows
of a table.

Sample table:
Student_Detail
STU_ID NAME ADDRESS

1 Stephan Delhi

2 Kathrin Noida

3 David Ghaziabad

4 Alina Gurugram

Student_Marks

STU_ID NAME MARKS AGE

1 Stephan 97 19

2 Kathrin 86 21

3 David 74 18

4 Alina 90 20

5 John 96 18

1. Creating view
A view can be created using the CREATE VIEW statement. We can create a view from
a single table or multiple tables.

Syntax:

CREATE VIEW view_name AS


SELECT column1, column2.....
FROM table_name
WHERE condition;
2. Creating View from a single table
In this example, we create a View named DetailsView from the table Student_Detail.

Query:

CREATE VIEW DetailsView AS
SELECT NAME, ADDRESS
FROM Student_Detail
WHERE STU_ID < 4;

Just like a table, the view can be queried to see its data.

SELECT * FROM DetailsView;

Output:

NAME ADDRESS

Stephan Delhi

Kathrin Noida

David Ghaziabad

3. Creating View from multiple tables


A view from multiple tables can be created by simply including multiple tables in the
SELECT statement.

In the given example, a view named MarksView is created from two tables,
Student_Detail and Student_Marks.

Query:

CREATE VIEW MarksView AS
SELECT Student_Detail.NAME, Student_Detail.ADDRESS, Student_Marks.MARKS
FROM Student_Detail, Student_Marks
WHERE Student_Detail.NAME = Student_Marks.NAME;

To display the data of the view MarksView: SELECT * FROM MarksView;


NAME ADDRESS MARKS

Stephan Delhi 97

Kathrin Noida 86

David Ghaziabad 74

Alina Gurugram 90

4. Deleting View
A view can be deleted using the Drop View statement.

Syntax

DROP VIEW view_name;

Example:

If we want to delete the View MarksView, we can do this as:

DROP VIEW MarksView;
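
The life cycle of a view (create, query, drop) can be sketched with Python's built-in sqlite3 module, reusing the Student_Detail sample rows. The ORDER BY in the query is added here only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student_Detail (STU_ID INTEGER, NAME TEXT, ADDRESS TEXT);
INSERT INTO Student_Detail VALUES
  (1, 'Stephan', 'Delhi'), (2, 'Kathrin', 'Noida'),
  (3, 'David', 'Ghaziabad'), (4, 'Alina', 'Gurugram');

-- The view stores the query, not a copy of the rows.
CREATE VIEW DetailsView AS
SELECT NAME, ADDRESS
FROM Student_Detail
WHERE STU_ID < 4;
""")

# A view is queried exactly like a table.
rows = conn.execute("SELECT * FROM DetailsView ORDER BY NAME").fetchall()
print(rows)

# Dropping the view leaves the underlying table untouched.
conn.execute("DROP VIEW DetailsView")
views_left = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'view'"
).fetchall()
table_rows = conn.execute("SELECT COUNT(*) FROM Student_Detail").fetchone()[0]
```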

SQL - Sub Queries

A Subquery, Inner query, or Nested query is a query within another SQL query, most often
embedded within the WHERE clause.
A subquery is used to return data that will be used in the main query as a condition
to further restrict the data to be retrieved.
Subqueries can be used with the SELECT, INSERT, UPDATE, and DELETE statements
along with operators like =, <, >, >=, <=, IN, BETWEEN, etc.
There are a few rules that subqueries must follow −
o Subqueries must be enclosed within parentheses.
o A subquery can have only one column in the SELECT clause, unless
multiple columns are in the main query for the subquery to compare its
selected columns.
o An ORDER BY clause is generally not allowed, or has no effect, in a subquery,
although the main query can use an ORDER BY. GROUP BY can be used in a
subquery, but it groups rows rather than sorting them.
o Subqueries that return more than one row can only be used with
multiple value operators such as the IN operator.
o The SELECT list cannot include any references to values that evaluate
to a BLOB, ARRAY, CLOB, or NCLOB.
o A subquery cannot be immediately enclosed in a set function.
o The BETWEEN operator cannot be used with a subquery. However, the
BETWEEN operator can be used within the subquery.

Syntax: There is no general syntax for subqueries. However, subqueries are most
frequently used with the SELECT statement, as shown below:

SELECT column_name
FROM table_name
WHERE column_name expression operator
( SELECT COLUMN_NAME FROM TABLE_NAME WHERE ... );

Sample Table:
DATABASE

NAME ROLL_NO LOCATION PHONE_NUMBER

Ram 101 Chennai 9988775566

Raj 102 Coimbatore 8877665544

Sasi 103 Madurai 7766553344

Ravi 104 Salem 8989898989

Sumathi 105 Kanchipuram 8989856868

STUDENT
NAME ROLL_NO SECTION

Ravi 104 A

Sumathi 105 B

Raj 102 A

Sample Queries

To display NAME, LOCATION, PHONE_NUMBER of the students from DATABASE table whose
section is A

SELECT NAME, LOCATION, PHONE_NUMBER FROM DATABASE
WHERE ROLL_NO IN
(SELECT ROLL_NO FROM STUDENT WHERE SECTION = 'A');

Explanation: The subquery "SELECT ROLL_NO FROM STUDENT WHERE SECTION = 'A'" executes first and
returns the ROLL_NO values of the students in the STUDENT table whose SECTION is 'A'. The outer
query then returns the NAME, LOCATION, and PHONE_NUMBER from the DATABASE table for the students
whose ROLL_NO was returned by the subquery.

Output:

NAME LOCATION PHONE_NUMBER

Ravi Salem 8989898989

Raj Coimbatore 8877665544
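
The subquery above can be run as written with Python's built-in sqlite3 module. In this sketch the table name DATABASE is wrapped in double quotes because it is also an SQL keyword, phone numbers are stored as text, and an ORDER BY is added only to fix the output order. All three choices are assumptions of this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "DATABASE" (NAME TEXT, ROLL_NO INTEGER, LOCATION TEXT, PHONE_NUMBER TEXT);
CREATE TABLE STUDENT (NAME TEXT, ROLL_NO INTEGER, SECTION TEXT);
INSERT INTO "DATABASE" VALUES
  ('Ram', 101, 'Chennai', '9988775566'),
  ('Raj', 102, 'Coimbatore', '8877665544'),
  ('Sasi', 103, 'Madurai', '7766553344'),
  ('Ravi', 104, 'Salem', '8989898989'),
  ('Sumathi', 105, 'Kanchipuram', '8989856868');
INSERT INTO STUDENT VALUES ('Ravi', 104, 'A'), ('Sumathi', 105, 'B'), ('Raj', 102, 'A');
""")

# The inner SELECT yields roll numbers 104 and 102; the outer query filters on them.
section_a = conn.execute("""
SELECT NAME, LOCATION, PHONE_NUMBER
FROM "DATABASE"
WHERE ROLL_NO IN (SELECT ROLL_NO FROM STUDENT WHERE SECTION = 'A')
ORDER BY ROLL_NO
""").fetchall()
print(section_a)
```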

Insert Query Example:


Table1: Student1

NAME ROLL_NO LOCATION PHONE_NUMBER

Ram 101 chennai 9988773344

Raju 102 coimbatore 9090909090

Ravi 103 salem 8989898989

Table2: Student2

NAME ROLL_NO LOCATION PHONE_NUMBER

Raj 111 chennai 8787878787

Sai 112 mumbai 6565656565


Sri 113 coimbatore 7878787878

To insert Student2 into Student1 table:


INSERT INTO Student1 SELECT * FROM Student2;

Output:

NAME ROLL_NO LOCATION PHONE_NUMBER

Ram 101 chennai 9988773344

Raju 102 coimbatore 9090909090

Ravi 103 salem 8989898989

Raj 111 chennai 8787878787

Sai 112 mumbai 6565656565

Sri 113 coimbatore 7878787878

To delete students from the Student2 table whose roll number is the same as that of a student
in the Student1 table with location chennai:

DELETE FROM Student2
WHERE ROLL_NO IN (SELECT ROLL_NO
                  FROM Student1
                  WHERE LOCATION = 'chennai');

Output:

1 row deleted successfully.

Display Student2 table:

NAME ROLL_NO LOCATION PHONE_NUMBER

Sai 112 mumbai 6565656565

Sri 113 coimbatore 7878787878

To update the name of the students in the Student2 table to geeks where their location is the
same as that of Raju or Ravi in the Student1 table:

UPDATE Student2
SET NAME = 'geeks'
WHERE LOCATION IN (SELECT LOCATION
                   FROM Student1
                   WHERE NAME IN ('Raju', 'Ravi'));

Output:

1 row updated successfully.

Display Student2 table:

NAME ROLL_NO LOCATION PHONE_NUMBER

Sai 112 mumbai 6565656565

geeks 113 coimbatore 7878787878

Unit-3

Entity(data) Integrity Rule in RDBMS

For Entity Integrity Rule, each table has a Primary Key.

Primary Key cannot have NULL value.


<Student>

Student_ID Student_Awards Stud

Above, you can see that our primary key is Student_ID. We cannot
consider Student_Awards as the primary key since not every student would have
received an award.
Let us see another example −
<Employee>

Employee_ID Employee_Name Employee_Age

In the above table, the Primary Key is Employee_ID


Let us now summarize the Entity Integrity Rule −

o Make sure that each tuple in a table is unique.
o Every table must have a primary key, for example, Student_ID for a
Student table.
o Every entity is unique.
o The relation's primary key must have unique values for each row.
o The primary key cannot have a NULL value and must be unique.
o For example, Employee_ID cannot be null in an Employee table.

Functional dependency

Functional dependency in DBMS, as the name suggests, is a relationship between
attributes of a table in which one attribute depends on another. Introduced by E. F. Codd,
it helps in preventing data redundancy and in detecting bad designs.
To understand the concept thoroughly, let us consider a relation P with attributes
A and B. Functional dependency is represented by -> (arrow sign), so the functional
dependency between the attributes is written as −

A -> B

This means that B is functionally dependent on A: A is the determinant, and each value of A
is associated with exactly one value of B.


Example
The following is an example that would make it easier to understand functional
dependency −
We have a <Department> table with two attributes − DeptId and DeptName.

DeptId = Department ID
DeptName = Department Name

The DeptId is our primary key. Here, DeptId uniquely identifies
the DeptName attribute. This is because if you want to know the department name,
you first need to have the DeptId.


DeptId DeptName

001 Finance

002 Marketing

003 HR

Therefore, the functional dependency between DeptId and DeptName can
be stated as DeptName is functionally dependent on DeptId −

DeptId -> DeptName
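
Whether a functional dependency holds in a given set of rows can be checked mechanically: no two rows may agree on the left-hand side and differ on the right-hand side. The following Python sketch illustrates this. The helper name holds and the extra "Audit" row are hypothetical and only serve the example.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}  # maps each lhs value combination to the rhs values first seen with it
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False  # same determinant, different dependent values
    return True


department = [
    {"DeptId": "001", "DeptName": "Finance"},
    {"DeptId": "002", "DeptName": "Marketing"},
    {"DeptId": "003", "DeptName": "HR"},
]

fd_ok = holds(department, ["DeptId"], ["DeptName"])

# Hypothetical extra row: DeptId 001 now maps to two names, so the FD breaks.
broken = department + [{"DeptId": "001", "DeptName": "Audit"}]
fd_broken = holds(broken, ["DeptId"], ["DeptName"])

print(fd_ok, fd_broken)
```

Note that such a check can only show that a dependency is violated by the current data. Whether a dependency should hold is a design decision about the relation.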


Types of Functional Dependencies in DBMS

1. Trivial functional dependency


2. Non-Trivial functional dependency
3. Multivalued functional dependency
4. Transitive functional dependency

Trivial Functional Dependency in DBMS

o In Trivial functional dependency, a dependent is always a subset of the determinant. In
other words, a functional dependency is called trivial if the attributes on the right side are
a subset of the attributes on the left side of the functional dependency.
o X → Y is called a trivial functional dependency if Y is a subset of X.
o For example, consider the Employee table below.

Employee_Id Name

1 Zayn

2 Phobe

3 Hikki

4 David

o Here, { Employee_Id, Name } → { Name } is a Trivial functional dependency, since the
dependent { Name } is a subset of the determinant { Employee_Id, Name }.
o { Employee_Id } → { Employee_Id } and { Name } → { Name } are also Trivial.

Non-Trivial Functional Dependency in DBMS

o It is the opposite of Trivial functional dependency. Formally speaking, in Non-Trivial
functional dependency, the dependent is not a subset of the determinant.
o X → Y is called a Non-trivial functional dependency if Y is not a subset of X. So, if X
and Y are sets of attributes and Y is not a subset of X, then the functional dependency
X → Y is called a Non-trivial functional dependency.
o For example, consider the Employee table below.

Employee_Id Name Age

1 Zayn 24

2 Phobe 34

3 Hikki 26

4 David 29

o Here, { Employee_Id } → { Name } is a non-trivial functional dependency
because Name (dependent) is not a subset of Employee_Id (determinant).
o Similarly, { Employee_Id, Name } → { Age } is also a non-trivial functional
dependency.

Multivalued Functional Dependency in DBMS

o In Multivalued functional dependency, as the term is used in these notes, the attributes in
the dependent set are not dependent on each other.
o For example, given X → { Y, Z }, if there is no functional dependency between Y and Z, then
it is called a Multivalued functional dependency.

o For example, consider the Employee table below.

Employee_Id Name Age

1 Zayn 24

2 Phobe 34

3 Hikki 26

4 David 29

4 Phobe 24

o Here, { Employee_Id } → { Name, Age } is a Multivalued functional dependency, since the
dependent attributes Name and Age are not functionally dependent on each other (i.e.,
neither Name → Age nor Age → Name holds).

Transitive Functional Dependency in DBMS

o Consider two functional dependencies A → B and B → C. Then, according to the transitivity
axiom, A → C must also hold. This is called a transitive functional dependency.
o In other words, in a Transitive functional dependency the dependent is indirectly
dependent on the determinant.

o For example, consider the Employee table below.

Employee_Id Name Department Street Number

1 Zayn CD 11

2 Phobe AB 24

3 Hikki CD 11

4 David PQ 71

5 Phobe LM 21

o Here, { Employee_Id } → { Department } and { Department } → { Street Number } hold
true. Hence, according to the axiom of transitivity, { Employee_Id } → { Street Number } is a
valid functional dependency.
Armstrong’s Axioms/Properties of Functional Dependency in DBMS

William Armstrong in 1974 suggested a few rules related to functional dependency. They are
called RAT rules.

1. Reflexivity: If A is a set of attributes and B is a subset of A, then the functional
dependency A → B holds true.
o For example, { Employee_Id, Name } → { Name } is valid.
2. Augmentation: If a functional dependency A → B holds true, then appending the same
attributes to both sides of the dependency doesn't affect the dependency. It
remains true.
o For example, if X → Y holds true, then ZX → ZY also holds true.
o For example, if { Employee_Id, Name } → { Name } holds true, then { Employee_Id,
Name, Age } → { Name, Age } also holds true.
3. Transitivity: If two functional dependencies X → Y and Y → Z hold true, then X →
Z also holds true by the rule of Transitivity.
o For example, if { Employee_Id } → { Name } holds true and { Name } → {
Department } holds true, then { Employee_Id } → { Department } also holds true.
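
A common way to apply these rules in practice is to compute the closure of a set of attributes, i.e., everything that can be derived from it with the given functional dependencies. The sketch below is a standard textbook technique, not something taken from these notes. The function name closure and the representation of dependencies as pairs of sets are assumptions.

```python
def closure(attrs, fds):
    """Return all attributes functionally determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs, each side being a set of attribute names.
    """
    result = set(attrs)  # reflexivity: attrs determines itself
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already derived, the right side follows.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


fds = [
    ({"Employee_Id"}, {"Name"}),
    ({"Name"}, {"Department"}),
]

emp_closure = closure({"Employee_Id"}, fds)
name_closure = closure({"Name"}, fds)
print(sorted(emp_closure), sorted(name_closure))
```

Department appears in the closure of { Employee_Id } even though no listed dependency states it directly, which is the transitivity rule at work.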

Advantages of Functional Dependency in DBMS

Let's discuss some of the advantages of Functional dependency,

1. It is used to maintain the quality of data in the database.


2. It expresses the facts about the database design.
3. It helps in clearly defining the meanings and constraints of databases.
4. It helps to identify bad designs.
5. Identifying functional dependencies helps to remove data redundancy, so that the same
values are not repeated at multiple locations in the same database table.
6. The process of Normalization starts with identifying the candidate keys in the
relation. Without functional dependency, it's impossible to find candidate keys and
normalize the database.

Normalization
Why Do We Need Normalization?

Normalization is used to reduce data redundancy. A database anomaly is a flaw in the
database that occurs because of poor planning and redundancy. Normalization provides a
method to remove the following anomalies from the database and bring it to a more
consistent state:

1. Insertion anomalies: This occurs when we are not able to insert data into a
database because some other attributes are missing at the time of insertion.
2. Update anomalies: This occurs when the same data item is stored in several places
that are not linked to each other, so an update applied to only some of the copies
leaves the data inconsistent.
3. Deletion anomalies: This occurs when deleting one part of the data also deletes
other necessary information from the database.

Types of Normal Forms:


Normalization works through a series of stages called Normal forms. The normal forms
apply to individual relations. The relation is said to be in particular normal form if it
satisfies constraints.

Following are the various types of Normal forms:

Normal Forms

1NF A relation is in 1NF if every attribute contains only atomic values.

2NF A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary
key.

3NF A relation is in 3NF if it is in 2NF and no transitive dependency exists.

BCNF A stronger definition of 3NF is known as Boyce-Codd normal form.

4NF A relation is in 4NF if it is in Boyce-Codd normal form and has no multi-valued dependency.

5NF A relation is in 5NF if it is in 4NF, does not contain any join dependency, and joining is lossless.

First Normal Form (1NF)


o A relation is in 1NF if every attribute contains only atomic values.
o An attribute of a table cannot hold multiple values. It must hold only a
single value.
o First normal form disallows multi-valued attributes, composite attributes, and
their combinations.

Example: Relation EMPLOYEE is not in 1NF because of the multi-valued attribute
EMP_PHONE.

EMPLOYEE table:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385, 9064738238 UP

20 Harry 8574783832 Bihar

12 Sam 7390372389, 8589830302 Punjab

The decomposition of the EMPLOYEE table into 1NF is shown below:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE

14 John 7272826385 UP

14 John 9064738238 UP

20 Harry 8574783832 Bihar

12 Sam 7390372389 Punjab

12 Sam 8589830302 Punjab
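The conversion to 1NF can be sketched as a simple flattening step. The snippet below is a minimal illustration using the EMPLOYEE rows above; representing the multi-valued EMP_PHONE as a Python list and the function name `to_1nf` are assumptions made for the example.

```python
# EMPLOYEE rows with EMP_PHONE held as a list (the multi-valued attribute).
employee = [
    (14, "John", ["7272826385", "9064738238"], "UP"),
    (20, "Harry", ["8574783832"], "Bihar"),
    (12, "Sam", ["7390372389", "8589830302"], "Punjab"),
]

def to_1nf(rows):
    """Emit one row per phone number so every field holds a single value."""
    return [
        (emp_id, name, phone, state)
        for emp_id, name, phones, state in rows
        for phone in phones
    ]

employee_1nf = to_1nf(employee)
for row in employee_1nf:
    print(row)
```

Note that EMP_ID alone no longer identifies a row after flattening; the pair (EMP_ID, EMP_PHONE) does.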


Second Normal Form (2NF)
o In 2NF, the relation must be in 1NF.
o In the second normal form, all non-key attributes are fully functionally dependent
on the primary key.

Example: Let's assume a school stores the data of teachers and the subjects they
teach. In a school, a teacher can teach more than one subject.

TEACHER table

TEACHER_ID SUBJECT TEACHER_AGE

25 Chemistry 30

25 Biology 30

47 English 35

83 Math 38

83 Computer 38

In the given table, the non-prime attribute TEACHER_AGE is dependent on TEACHER_ID,
which is a proper subset of the candidate key {TEACHER_ID, SUBJECT}. That is why the
table violates the rule for 2NF.

To convert the given table into 2NF, we decompose it into two tables:

TEACHER_DETAIL table:

TEACHER_ID TEACHER_AGE

25 30

47 35

83 38

TEACHER_SUBJECT table:

TEACHER_ID SUBJECT

25 Chemistry

25 Biology

47 English

83 Math

83 Computer
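The partial dependency and the decomposition can be checked against the sample rows. This is a sketch only: the helper names `fd_holds` and `project` are illustrative, and a dependency that holds in sample data is merely consistent with the data, not proven for the schema.

```python
COLUMNS = ("TEACHER_ID", "SUBJECT", "TEACHER_AGE")
teacher = [
    (25, "Chemistry", 30),
    (25, "Biology", 30),
    (47, "English", 35),
    (83, "Math", 38),
    (83, "Computer", 38),
]

def fd_holds(rows, columns, lhs, rhs):
    """Check whether lhs -> rhs holds in the given rows (sample data only)."""
    left = [columns.index(c) for c in lhs]
    right = [columns.index(c) for c in rhs]
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in left)
        value = tuple(row[i] for i in right)
        # setdefault stores the first value seen for a key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True

def project(rows, columns, keep):
    """Project onto the columns in keep, dropping duplicates, keeping order."""
    idx = [columns.index(c) for c in keep]
    return list(dict.fromkeys(tuple(row[i] for i in idx) for row in rows))

# TEACHER_AGE depends on TEACHER_ID alone: a partial dependency.
partial = fd_holds(teacher, COLUMNS, ["TEACHER_ID"], ["TEACHER_AGE"])
print(partial)  # True

teacher_detail = project(teacher, COLUMNS, ["TEACHER_ID", "TEACHER_AGE"])
teacher_subject = project(teacher, COLUMNS, ["TEACHER_ID", "SUBJECT"])
print(teacher_detail)  # [(25, 30), (47, 35), (83, 38)]
```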

Third Normal Form (3NF)


o A relation is in 3NF if it is in 2NF and does not contain any transitive
dependency.
o 3NF is used to reduce data duplication. It is also used to achieve data
integrity.
o If a relation in 2NF has no transitive dependency for non-prime attributes, then
it is in third normal form.

A relation is in third normal form if it satisfies at least one of the following conditions for
every non-trivial functional dependency X → Y:

1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.

Example:

EMPLOYEE_DETAIL table:

EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY

222 Harry 201010 UP Noida

333 Stephan 02228 US Boston

444 Lan 60007 US Chicago

555 Katharine 06389 UK Norwich

666 John 462007 MP Bhopal

Super keys in the table above:

{EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on

Candidate key: {EMP_ID}

Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.

Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends
on EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are therefore transitively
dependent on the super key (EMP_ID). This violates the rule of third normal form.

That's why we need to move EMP_CITY and EMP_STATE to the new
EMPLOYEE_ZIP table, with EMP_ZIP as the primary key.

EMPLOYEE table:

EMP_ID EMP_NAME EMP_ZIP

222 Harry 201010

333 Stephan 02228

444 Lan 60007

555 Katharine 06389

666 John 462007

EMPLOYEE_ZIP table:

EMP_ZIP EMP_STATE EMP_CITY

201010 UP Noida

02228 US Boston

60007 US Chicago

06389 UK Norwich

462007 MP Bhopal
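The two 3NF conditions can be tested mechanically once the functional dependencies and candidate keys are known. The sketch below assumes the dependencies EMP_ID → {EMP_NAME, EMP_ZIP} and EMP_ZIP → {EMP_STATE, EMP_CITY} for the example; `violates_3nf` is an illustrative name, and condition 2 is applied to the attributes of Y that are not already in X.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def violates_3nf(all_attrs, fds, candidate_keys):
    """Return the dependencies X -> Y that meet neither 3NF condition."""
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        nontrivial = rhs - lhs                          # attributes of Y not in X
        is_superkey = closure(lhs, fds) >= all_attrs    # condition 1
        all_prime = nontrivial <= prime                 # condition 2
        if nontrivial and not is_superkey and not all_prime:
            bad.append((lhs, rhs))
    return bad

# EMPLOYEE_DETAIL before decomposition.
detail_attrs = {"EMP_ID", "EMP_NAME", "EMP_ZIP", "EMP_STATE", "EMP_CITY"}
detail_fds = [
    ({"EMP_ID"}, {"EMP_NAME", "EMP_ZIP"}),
    ({"EMP_ZIP"}, {"EMP_STATE", "EMP_CITY"}),
]
detail_bad = violates_3nf(detail_attrs, detail_fds, [{"EMP_ID"}])
print(detail_bad)  # the EMP_ZIP dependency is reported

# EMPLOYEE and EMPLOYEE_ZIP after decomposition.
employee_bad = violates_3nf(
    {"EMP_ID", "EMP_NAME", "EMP_ZIP"},
    [({"EMP_ID"}, {"EMP_NAME", "EMP_ZIP"})],
    [{"EMP_ID"}],
)
zip_bad = violates_3nf(
    {"EMP_ZIP", "EMP_STATE", "EMP_CITY"},
    [({"EMP_ZIP"}, {"EMP_STATE", "EMP_CITY"})],
    [{"EMP_ZIP"}],
)
print(employee_bad, zip_bad)  # [] []
```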

Boyce Codd normal form (BCNF)

BCNF is the advanced version of 3NF. It is stricter than 3NF.

A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a super key of the table.

In other words, the table should be in 3NF, and the left-hand side of every functional dependency must be a super key.

Example: Let's assume there is a company where employees work in more than one
department.

EMPLOYEE table:

EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO

264 India Designing D394 283

264 India Testing D394 300

364 UK Stores D283 232

364 UK Developing D283 549

In the above table, the functional dependencies are as follows:

EMP_ID → EMP_COUNTRY

EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate key: {EMP_ID, EMP_DEPT}

The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.

To convert the given table into BCNF, we decompose it into three tables:

EMP_COUNTRY table:

EMP_ID EMP_COUNTRY

264 India

364 UK

EMP_DEPT table:

EMP_DEPT DEPT_TYPE EMP_DEPT_NO

Designing D394 283

Testing D394 300

Stores D283 232

Developing D283 549

EMP_DEPT_MAPPING table:

EMP_ID EMP_DEPT

264 Designing

264 Testing

364 Stores

364 Developing

Functional dependencies:

EMP_ID → EMP_COUNTRY

EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate keys:

For the first table: EMP_ID

For the second table: EMP_DEPT

For the third table: {EMP_ID, EMP_DEPT}

Now, this is in BCNF because the left-hand side of both functional dependencies is a key.
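As a sanity check on the decomposition, the sketch below projects the original EMPLOYEE rows onto the three sets of columns and then rebuilds the original table from the projections. The helper name `project` and the dictionary-based rejoin are choices made for this illustration.

```python
COLUMNS = ("EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO")
employee = [
    (264, "India", "Designing", "D394", 283),
    (264, "India", "Testing", "D394", 300),
    (364, "UK", "Stores", "D283", 232),
    (364, "UK", "Developing", "D283", 549),
]

def project(rows, columns, keep):
    """Project onto the columns in keep, dropping duplicates, keeping order."""
    idx = [columns.index(c) for c in keep]
    return list(dict.fromkeys(tuple(row[i] for i in idx) for row in rows))

emp_country = project(employee, COLUMNS, ["EMP_ID", "EMP_COUNTRY"])
emp_dept = project(employee, COLUMNS, ["EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"])
emp_dept_mapping = project(employee, COLUMNS, ["EMP_ID", "EMP_DEPT"])

# Each projection is keyed by the left-hand side of a dependency, so a
# dictionary lookup stands in for the natural join.
country_of = dict(emp_country)
dept_info = {dept: (dept_type, dept_no) for dept, dept_type, dept_no in emp_dept}
rejoined = [
    (emp_id, country_of[emp_id], dept, *dept_info[dept])
    for emp_id, dept in emp_dept_mapping
]

print(emp_country)           # [(264, 'India'), (364, 'UK')]
print(rejoined == employee)  # True
```

Because the rejoined rows equal the original rows, no information is lost by this decomposition for the sample data.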

Multivalued Dependency

o A multivalued dependency occurs when two attributes in a table are independent
of each other but both depend on a third attribute.
o A multivalued dependency involves at least two attributes that depend
on a third attribute, which is why it always requires at least three attributes.

Example: Suppose there is a bike manufacturing company which produces two
colors (white and black) of each model every year.

BIKE_MODEL MANUF_YEAR COLOR

M2011 2008 White

M2011 2008 Black

M3001 2013 White

M3001 2013 Black

M4006 2017 White

M4006 2017 Black

Here columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and
independent of each other.

In this case, these two columns are said to be multivalued dependent on BIKE_MODEL.
The representation of these dependencies is shown below:

BIKE_MODEL →→ MANUF_YEAR
BIKE_MODEL →→ COLOR

This can be read as "BIKE_MODEL multidetermines MANUF_YEAR" and "BIKE_MODEL
multidetermines COLOR".
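Independence of two attributes given a third can be tested on sample rows: for each value of X, every Y value must appear with every Z value. The sketch below uses the M3001 and M4006 rows of the table; the function name `mvd_holds` and the counterexample rows are hypothetical.

```python
from collections import defaultdict
from itertools import product

def mvd_holds(rows, x, y, z):
    """Check X ->> Y on rows of dicts, where z is the remaining attribute."""
    groups = defaultdict(set)
    for row in rows:
        groups[row[x]].add((row[y], row[z]))
    for pairs in groups.values():
        ys = {pair[0] for pair in pairs}
        zs = {pair[1] for pair in pairs}
        # Y and Z are independent given X only if every combination appears.
        if pairs != set(product(ys, zs)):
            return False
    return True

bikes = [
    {"BIKE_MODEL": "M3001", "MANUF_YEAR": 2013, "COLOR": "White"},
    {"BIKE_MODEL": "M3001", "MANUF_YEAR": 2013, "COLOR": "Black"},
    {"BIKE_MODEL": "M4006", "MANUF_YEAR": 2017, "COLOR": "White"},
    {"BIKE_MODEL": "M4006", "MANUF_YEAR": 2017, "COLOR": "Black"},
]
bikes_ok = mvd_holds(bikes, "BIKE_MODEL", "COLOR", "MANUF_YEAR")
print(bikes_ok)  # True

# Hypothetical counterexample: color now depends on the year, so the
# two attributes are no longer independent for this model.
linked = [
    {"BIKE_MODEL": "M3001", "MANUF_YEAR": 2013, "COLOR": "White"},
    {"BIKE_MODEL": "M3001", "MANUF_YEAR": 2014, "COLOR": "Black"},
]
linked_ok = mvd_holds(linked, "BIKE_MODEL", "COLOR", "MANUF_YEAR")
print(linked_ok)  # False
```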

Fourth normal form (4NF)


o A relation is in 4NF if it is in Boyce Codd normal form and has no multi-valued
dependency.
o For a dependency A →→ B, if multiple values of B exist for a single value of A, then the
dependency is a multi-valued dependency.

Example

STUDENT

STU_ID COURSE HOBBY

21 Computer Dancing

21 Math Singing

34 Chemistry Dancing

74 Biology Cricket

59 Physics Hockey

The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent
entities. Hence, there is no relationship between COURSE and HOBBY.

In the STUDENT relation, the student with STU_ID 21 has two
courses, Computer and Math, and two hobbies, Dancing and Singing. So there is a
multi-valued dependency on STU_ID, which leads to unnecessary repetition of data.

So to bring the above table into 4NF, we can decompose it into two tables:

STUDENT_COURSE

STU_ID COURSE

21 Computer

21 Math

34 Chemistry

74 Biology

59 Physics

STUDENT_HOBBY

STU_ID HOBBY

21 Dancing

21 Singing

34 Dancing

74 Cricket

59 Hockey
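The decomposition can be sketched as two projections of the STUDENT rows, followed by a join on STU_ID. This is an illustration only; the variable names are assumptions. Because COURSE and HOBBY are treated as independent, the join pairs every course of student 21 with every hobby of student 21, so it contains the original rows plus two pairings that the single table did not list.

```python
student = [
    (21, "Computer", "Dancing"),
    (21, "Math", "Singing"),
    (34, "Chemistry", "Dancing"),
    (74, "Biology", "Cricket"),
    (59, "Physics", "Hockey"),
]

# Project onto (STU_ID, COURSE) and (STU_ID, HOBBY).
student_course = sorted({(stu, course) for stu, course, _ in student})
student_hobby = sorted({(stu, hobby) for stu, _, hobby in student})

# Natural join of the two projections on STU_ID.
joined = sorted(
    (stu, course, hobby)
    for stu, course in student_course
    for stu2, hobby in student_hobby
    if stu == stu2
)

extra = set(joined) - set(student)
print(len(joined))    # 7
print(sorted(extra))  # [(21, 'Computer', 'Singing'), (21, 'Math', 'Dancing')]
```

Each fact (a course taken, a hobby held) is now stored once, instead of being repeated for every combination.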

Fifth normal form (5NF)


o A relation is in 5NF if it is in 4NF, does not contain any join dependency, and joining is
lossless.
o 5NF is satisfied when the tables have been decomposed as far as possible without losing
information, in order to avoid redundancy.
o 5NF is also known as Project-join normal form (PJ/NF).

Example

SUBJECT LECTURER SEMESTER

Computer Anshika Semester 1

Computer John Semester 1

Math John Semester 1

Math Akash Semester 2

Chemistry Praveen Semester 1

In the above table, John takes both the Computer and Math classes for Semester 1, but he does not
take the Math class for Semester 2. In this case, the combination of all three fields is required to
identify valid data.

Suppose we add a new semester, Semester 3, but do not yet know the subject or
who will be teaching it, so we would leave Lecturer and Subject as NULL. But all three
columns together act as the primary key, so we cannot leave the other two columns blank.

So to bring the above table into 5NF, we can decompose it into three relations P1, P2, and
P3:

P1

SEMESTER SUBJECT

Semester 1 Computer

Semester 1 Math

Semester 1 Chemistry

Semester 2 Math

P2

SUBJECT LECTURER

Computer Anshika

Computer John

Math John

Math Akash

Chemistry Praveen

P3

SEMESTER LECTURER

Semester 1 Anshika

Semester 1 John

Semester 2 Akash

Semester 1 Praveen
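Whether the three projections rejoin losslessly can be checked directly. The sketch below builds P1, P2, and P3 as projections of the original rows, joins P1 with P2, and then filters by P3; the variable names are illustrative. Joining only two of the projections produces spurious rows, while the three-way join reproduces the original table for this data.

```python
# Original rows as (SUBJECT, LECTURER, SEMESTER).
original = {
    ("Computer", "Anshika", "Semester 1"),
    ("Computer", "John", "Semester 1"),
    ("Math", "John", "Semester 1"),
    ("Math", "Akash", "Semester 2"),
    ("Chemistry", "Praveen", "Semester 1"),
}

p1 = {(sem, sub) for sub, _, sem in original}   # SEMESTER, SUBJECT
p2 = {(sub, lec) for sub, lec, _ in original}   # SUBJECT, LECTURER
p3 = {(sem, lec) for _, lec, sem in original}   # SEMESTER, LECTURER

# Join P1 and P2 on SUBJECT.
two_way = {
    (sub, lec, sem)
    for sem, sub in p1
    for sub2, lec in p2
    if sub == sub2
}
# Keep only rows whose (SEMESTER, LECTURER) pair also appears in P3.
three_way = {row for row in two_way if (row[2], row[1]) in p3}

spurious = two_way - original
print(sorted(spurious))       # rows created by joining P1 and P2 alone
print(three_way == original)  # True
```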

Pitfalls in Relational database Design


Relational database design requires that we find a "good" collection of relational schemas. A bad
design may lead to:

Repetition of information

Inability to represent certain information

Design Goals for Relational Database

1. Avoid redundant data

2. Ensure that relationships among attributes are represented.

3. Facilitate the checking of updates for violation of database integrity constraints

Example

Consider the relational schema

Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount)

Redundancy

--The data for branch-name, branch-city, and assets is repeated for each loan that a branch makes.

--This wastes space and complicates updating.

Null Values

--We cannot store information about a branch if no loan exists.

--We can use null values, but they are difficult to handle.
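One way to avoid both pitfalls is to split the schema into a branch relation and a loan relation. The sketch below is only an illustration: the rows, the branch names, and the choice of this particular split are assumptions, not part of the notes above.

```python
# Hypothetical rows for Lending-schema:
# (branch-name, branch-city, assets, customer-name, loan-number, amount)
lending = [
    ("Downtown", "Brooklyn", 9000000, "Jones", "L-17", 1000),
    ("Downtown", "Brooklyn", 9000000, "Smith", "L-23", 2000),
    ("Perryridge", "Horseneck", 1700000, "Hayes", "L-15", 1500),
]

# Branch facts are stored once per branch instead of once per loan.
branch = sorted({(name, city, assets) for name, city, assets, *_ in lending})

# Loan facts keep only the branch name as a reference to the branch.
loan = [
    (name, customer, number, amount)
    for name, _, _, customer, number, amount in lending
]

# A branch that has made no loans can now be recorded without null values.
branch.append(("Mianus", "Horseneck", 400000))

print(branch)
print(loan)
```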

In the given example, the database design is faulty, which causes the above pitfalls.
So we observe that a poor relational database design leads to faults in the
database.
