
Dbms Unit-1 and 2, 3,4,5 Notes

DBMS, or Database Management System, is a collection of inter-related data and programs that facilitate the storage and retrieval of data efficiently. It addresses the need for optimized data storage and fast retrieval, particularly for large datasets, and is crucial in various applications such as banking, education, and online shopping. The document also discusses the advantages of DBMS over traditional file systems, including reduced data redundancy, improved data consistency, and enhanced security.

Uploaded by

Tejaswini A

DBMS UNIT 1

DBMS stands for Database Management System. The name breaks down as DBMS =
Database + Management System: a database is a collection of data, and a management
system is a set of programs to store and retrieve that data. Based on this, we can define
a DBMS as a collection of inter-related data together with a set of programs to store
and access that data in an easy and effective manner.

What is the need of DBMS?


Database systems are designed primarily for large amounts of data. When dealing with
huge amounts of data, two things require optimization: storage of data and retrieval
of data.

Storage: According to the principles of database systems, data is stored in such a
way that it occupies much less space, because redundant (duplicate) data is removed
before storage. Let’s take a simple example to understand this:
In a banking system, suppose a customer has two accounts: a savings account and a
salary account. Say the bank stores savings account data in one place (these places
are called tables; we will learn about them later) and salary account data in another.
If customer information such as name and address is stored in both places, storage is
wasted (redundancy, or duplication of data). To organize the data better, the customer
information should be stored in one place and both accounts should be linked to it.
This is what we achieve in a DBMS.
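
The banking example above can be sketched in a few lines of Python using the standard sqlite3 module. This is only an illustration: the table names, column names and sample values are assumptions made for the sketch, not part of the notes.

```python
import sqlite3

# Hypothetical schema: customer details are stored once, and each account
# row links back to the customer through customer_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        address     TEXT NOT NULL
    );
    CREATE TABLE account (
        account_id   INTEGER PRIMARY KEY,
        account_type TEXT NOT NULL,
        customer_id  INTEGER NOT NULL REFERENCES customer(customer_id)
    );
    INSERT INTO customer VALUES (1, 'Asha', '12 Lake Road');
    INSERT INTO account VALUES (101, 'savings', 1);
    INSERT INTO account VALUES (102, 'salary', 1);
""")

# The address is changed in exactly one place ...
conn.execute("UPDATE customer SET address = '7 Hill Street' WHERE customer_id = 1")

# ... and both accounts see the new value through the link.
rows = conn.execute("""
    SELECT a.account_type, c.address
    FROM account AS a JOIN customer AS c ON a.customer_id = c.customer_id
    ORDER BY a.account_id
""").fetchall()
print(rows)
```

Because the address lives in a single row, there is no second copy that could be left out of date.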

Fast Retrieval of data: Along with storing the data in an optimized and systematic
manner, it is also important that we retrieve the data quickly when needed. Database
systems ensure that the data is retrieved as quickly as possible.

Purpose of Database Systems

The main purpose of database systems is to manage data. Consider a university that
keeps the data of students, teachers, courses, books etc. To manage this data we need to
store it somewhere where we can add new data, delete unused data, update outdated
data and retrieve data. To perform these operations we need a Database management
system that stores the data in such a way that all these operations can be performed
efficiently.

Database Applications – DBMS

Applications where we use Database Management Systems are:

• Telecom: A database keeps track of information about calls made, network usage,
customer details etc. Without database systems it is hard to maintain such a huge
amount of data, which is updated every millisecond.
• Industry: Whether it is a manufacturing unit, warehouse or distribution centre, each
one needs a database to keep records of what comes in and what goes out. For example,
a distribution centre should keep track of the product units supplied to the centre as
well as the products delivered from it each day; this is where a DBMS comes into the
picture.
• Banking System: For storing customer info, tracking day to day credit and debit
transactions, generating bank statements etc. All this work is done with the help of
Database management systems.
• Sales: To store customer information, production information and invoice details.
• Airlines: To travel through airlines, we make early reservations; this reservation
information along with the flight schedule is stored in a database.
• Education sector: Database systems are frequently used in schools and colleges to
store and retrieve data regarding student details, staff details, course details,
exam details, payroll data, attendance details, fees details etc. There is a very large
amount of inter-related data that needs to be stored and retrieved in an efficient
manner.
• Online shopping: You must be aware of online shopping websites such as
Amazon, Flipkart etc. These sites store the product information, your addresses and
preferences, and credit details, and provide you with a relevant list of products based
on your query. All this involves a Database management system.

Advantages of DBMS over file system

Drawbacks of File system

• Data redundancy: Data redundancy refers to the duplication of data. Let’s say we
are managing the data of a college where a student is enrolled in two courses; the
same student details will then be stored twice, which takes more storage than needed.
Data redundancy often leads to higher storage costs and poor access time.
• Data inconsistency: Data redundancy leads to data inconsistency. Take the same
example as above: a student is enrolled in two courses and the student address is
stored twice. If the student requests a change of address and the address is changed
in one place but not in all the records, the data becomes inconsistent.
• Data isolation: Because data are scattered in various files, and files may be in
different formats, writing new application programs to retrieve the appropriate data
is difficult.
• Dependency on application programs: Changing files would lead to changes in
application programs.
• Atomicity issues: Atomicity of a transaction refers to “all or nothing”, which
means that either all the operations in a transaction execute or none do.

For example: Let’s say Steve transfers $100 to Negan’s account. This transaction
consists of multiple operations: debit $100 from Steve’s account, then credit $100 to
Negan’s account. Like any other device, a computer system can fail. If it fails after
the first operation, Steve’s account has been debited by $100 but the amount has not
been credited to Negan’s account. In such a case the operation should be rolled back
to maintain the atomicity of the transaction. It is difficult to achieve atomicity in
file processing systems.

• Data security: Data should be secured from unauthorised access. For example, a
student in a college should not be able to see the payroll details of the teachers.
Such security constraints are difficult to apply in file processing systems.
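
The Steve and Negan transfer described under atomicity can be sketched with Python's standard sqlite3 module. This is a minimal illustration, not a banking implementation: the table layout, the starting balances and the fail_midway flag (which simulates a crash between the debit and the credit) are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (owner TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("Steve", 500), ("Negan", 200)])
conn.commit()

def transfer(conn, sender, receiver, amount, fail_midway=False):
    """Debit sender and credit receiver as one all-or-nothing transaction."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE owner = ?",
                     (amount, sender))
        if fail_midway:
            # Stand-in for a system failure between the debit and the credit.
            raise RuntimeError("simulated failure after debit")
        conn.execute("UPDATE account SET balance = balance + ? WHERE owner = ?",
                     (amount, receiver))
        conn.commit()
    except Exception:
        conn.rollback()  # undo the debit so that no money disappears
        raise

def balances(conn):
    return dict(conn.execute("SELECT owner, balance FROM account"))

try:
    transfer(conn, "Steve", "Negan", 100, fail_midway=True)
except RuntimeError:
    pass
after_failure = balances(conn)   # unchanged: the debit was rolled back

transfer(conn, "Steve", "Negan", 100)
after_success = balances(conn)   # both operations applied together
```

After the simulated failure the balances are the same as before the transfer; only the successful call changes both accounts.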

Advantage of DBMS over file system

There are several advantages of a Database management system over a file system. A few
of them are as follows:

• No redundant data: Redundancy is removed by data normalization. Avoiding data
duplication saves storage and improves access time.
• Data consistency and integrity: As discussed earlier, the root cause of data
inconsistency is data redundancy. Since data normalization takes care of data
redundancy, data inconsistency is taken care of as part of it.
• Data security: It is easier to apply access constraints in database systems so that
only authorized users are able to access the data. Each user has a different set of
access rights, so data is protected from issues such as identity theft, data leaks and
misuse of data.
• Privacy: Limited access means privacy of data.
• Easy access to data: Database systems manage data in such a way that the data is
easily accessible with fast response times.
• Easy recovery: Since database systems keep backups of data, it is easier to do a
full recovery of data in case of a failure.
• Flexible: Database systems are more flexible than file processing systems.

Disadvantages of DBMS:

• Cost: DBMS implementation cost is high compared to the file system.
• Complexity: Database systems are complex to understand.
• Performance: Database systems are generic, which makes them suitable for various
applications. However, this generality can hurt their performance for some applications.

DBMS Architecture

The architecture of DBMS depends on the computer system on which it runs. For
example, in a client-server DBMS architecture, the database systems at server machine
can run several requests made by client machine. We will understand this communication
with the help of diagrams.

o The DBMS design depends upon its architecture. The basic client/server
architecture is used to deal with a large number of PCs, web servers, database
servers and other components that are connected by networks.
o The client/server architecture consists of many PCs and a workstation which are
connected via the network.
o DBMS architecture depends upon how users are connected to the database to get
their requests done.

Types of DBMS Architecture

There are three types of DBMS architecture:

1. Single tier architecture
2. Two tier architecture
3. Three tier architecture

Database architecture can be seen as single tier or multi-tier. Logically, however,
multi-tier database architecture is of two types: 2-tier architecture and 3-tier
architecture.

1-Tier Architecture
o In this architecture, the database is directly available to the user: the user works
directly on the DBMS.
o Any changes made here are applied directly to the database itself. It doesn't
provide a handy tool for end users.
o The 1-Tier architecture is used for development of local applications, where
programmers can communicate directly with the database for a quick response.

2-Tier Architecture
o The 2-Tier architecture is the same as basic client-server. In the two-tier
architecture, applications on the client end communicate directly with the database at
the server side. For this interaction, APIs such as ODBC and JDBC are used.
o The user interfaces and application programs run on the client side.
o The server side is responsible for providing functionality such as query processing
and transaction management.
o To communicate with the DBMS, the client-side application establishes a connection
with the server side.

3-Tier Architecture
o The 3-Tier architecture contains another layer between the client and the server. In
this architecture, the client can't communicate directly with the server.
o The application on the client end interacts with an application server, which in turn
communicates with the database system.
o The end user has no idea about the existence of the database beyond the application
server. The database also has no idea about any other user beyond the application.
o The 3-Tier architecture is used for large web applications.

In three-tier architecture, another layer is present between the client machine and
the server machine. The client application doesn’t communicate directly with the
database system present at the server machine; rather, the client application
communicates with a server application, and the server application internally
communicates with the database system present at the server.
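
The three tiers can be mimicked inside a single Python process to show who talks to whom. This is only a rough sketch: the ApplicationServer class, its get_price method and the product table are hypothetical names invented for the illustration, and an in-memory SQLite database stands in for a real database server.

```python
import sqlite3

class ApplicationServer:
    """Middle tier (hypothetical): the only component that talks to the database."""

    def __init__(self):
        # Database tier: an in-memory SQLite database stands in for a DB server.
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE product (name TEXT PRIMARY KEY, price REAL NOT NULL)")
        self._db.executemany("INSERT INTO product VALUES (?, ?)",
                             [("pen", 1.5), ("notebook", 4.0)])
        self._db.commit()

    def get_price(self, name):
        """Business-level request; the SQL stays hidden from the client."""
        row = self._db.execute(
            "SELECT price FROM product WHERE name = ?", (name,)).fetchone()
        return None if row is None else row[0]

def client(server, product_name):
    """Client tier: knows only the application server's interface."""
    return server.get_price(product_name)

server = ApplicationServer()
print(client(server, "pen"))
```

The client function never sees a table name or a query, which mirrors the point that the end user has no idea about the database beyond the application server.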

DBMS – Three Level Architecture

DBMS Three Level Architecture Diagram

This architecture has three levels:
1. External level
2. Conceptual level
3. Internal level

1. External level

It is also called the view level. This level is called “view” because several users
can view their desired data from this level; the data is internally fetched from the
database with the help of conceptual and internal level mapping.

The user doesn’t need to know database schema details such as data structures, table
definitions etc. The user is only concerned with the data that is returned to the view
level after it has been fetched from the database (present at the internal level).

External level is the “top level” of the Three Level DBMS Architecture.

2. Conceptual level

It is also called logical level. The whole design of the database such as relationship
among data, schema of data etc. are described in this level.

Database constraints and security are also implemented in this level of architecture. This
level is maintained by DBA (database administrator).

3. Internal level

This level is also known as physical level. This level describes how the data is actually
stored in the storage devices. This level is also responsible for allocating space to the
data. This is the lowest level of the architecture.
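
One common way to see the external and conceptual levels side by side is an SQL view. The sketch below uses Python's sqlite3 module; the employee table, the staff_directory view and the sample rows are assumptions made for the illustration. The internal level (how SQLite lays the rows out on disk) is handled by the engine and does not appear in the code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual (logical) level: the full table definition.
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        dept   TEXT NOT NULL,
        salary INTEGER NOT NULL
    );
    INSERT INTO employee VALUES (1, 'Meera', 'Physics', 52000);
    INSERT INTO employee VALUES (2, 'Ravi', 'Maths', 48000);

    -- External (view) level: one user group's window onto the data,
    -- with the salary column left out.
    CREATE VIEW staff_directory AS
        SELECT emp_id, name, dept FROM employee;
""")

cursor = conn.execute("SELECT * FROM staff_directory ORDER BY emp_id")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```

A user who queries only staff_directory gets the data they need without ever seeing the salary column or the underlying table definition.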

View of Data in DBMS

Abstraction is one of the main features of database systems. Hiding irrelevant details
from users and providing them with an abstract view of data helps in easy and efficient
user-database interaction. Earlier, we discussed the three levels of DBMS architecture.
The top level of that architecture is the “view level”. The view level provides
the “view of data” to the users and hides irrelevant details such as data relationships,
database schema, constraints, security etc. from the user.

To fully understand the view of data, you must have a basic knowledge of data
abstraction and instance & schema.

Data Abstraction in DBMS

Database systems are made-up of complex data structures. To ease the user interaction
with database, the developers hide internal irrelevant details from users. This process of
hiding irrelevant details from user is called data abstraction.

We have three levels of abstraction:
Physical level: This is the lowest level of data abstraction. It describes how data is
actually stored in database. You can get the complex data structure details at this level.

Logical level: This is the middle level of 3-level data abstraction architecture. It
describes what data is stored in database.

View level: Highest level of data abstraction. This level describes the user interaction
with database system.

Example: Let’s say we are storing customer information in a customer table. At physical
level these records can be described as blocks of storage (bytes, gigabytes, terabytes etc.)
in memory. These details are often hidden from the programmers.

At the logical level these records can be described as fields and attributes along with
their data types, their relationship among each other can be logically implemented. The
programmers generally work at this level because they are aware of such things about
database systems.

At the view level, users just interact with the system through a GUI and enter details
on the screen. They are not aware of how the data is stored or what data is stored; such
details are hidden from them.

Instance and schema in DBMS

DBMS Schema

Definition of schema: The design of a database is called the schema. Schema is of three
types: physical schema, logical schema and view schema.

For example: In the following diagram, we have a schema that shows the relationship
between three tables: Course, Student and Section. The diagram only shows the design of
the database, it doesn’t show the data present in those tables. Schema is only a structural
view(design) of a database as shown in the diagram below.

The design of a database at the physical level is called the physical schema; how the
data is stored in blocks of storage is described at this level.

The design of a database at the logical level is called the logical schema. Programmers
and database administrators work at this level. Here data is described as certain types
of data records stored in data structures; however, internal details such as the
implementation of the data structures are hidden at this level (available at the
physical level).

Design of database at view level is called view schema. This generally describes end
user interaction with database systems.

To learn more about these schemas, refer to the three levels of data abstraction
described above.

DBMS Instance

Definition of instance: The data stored in a database at a particular moment in time is
called an instance of the database. The database schema defines the variable declarations
in the tables that belong to a particular database; the values of these variables at a
given moment are called the instance of that database.

For example, let’s say we have a single table student in the database. Today the table
has 100 records, so today the instance of the database has 100 records. If we add
another 100 records to this table by tomorrow, the instance of the database tomorrow
will have 200 records in the table. In short, the data stored in the database at a
particular moment is called the instance, and it changes over time as we add or delete
data.
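
The difference between schema and instance can be checked directly. The sketch below, using Python's sqlite3 module, follows the 100-then-200 records example; the column names and the generated student names are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (stu_id INTEGER PRIMARY KEY, stu_name TEXT)")

def schema(conn):
    # Column names and declared types, read from SQLite's table_info pragma.
    return [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]

def instance_size(conn):
    return conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

schema_before = schema(conn)

# "Today": the instance holds 100 records.
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(i, f"student{i}") for i in range(1, 101)])
size_today = instance_size(conn)

# "Tomorrow": another 100 records are added.
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(i, f"student{i}") for i in range(101, 201)])
size_tomorrow = instance_size(conn)

schema_after = schema(conn)
```

The record count changes from one moment to the next, while the schema read before and after the inserts is identical.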

DBMS languages

Database languages are used to read, update and store data in a database. There are
several such languages that can be used for this purpose; one of them is SQL (Structured
Query Language).

Types of DBMS languages:

Data Definition Language (DDL)

DDL is used for specifying the database schema. It is used for creating tables, schemas,
indexes, constraints etc. in a database. Let’s see the operations that we can perform on
a database using DDL:

• To create database objects such as databases, tables and indexes – CREATE
• To alter the structure of existing database objects – ALTER
• To drop (remove) database objects such as databases and tables – DROP
• To remove all rows from a table while keeping the table itself – TRUNCATE
• To rename database objects – RENAME
• To add comments to the data dictionary – COMMENT

All of these commands either define or update the database schema; that’s why they
come under Data Definition Language.
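
A few of these DDL commands can be tried out with Python's sqlite3 module. The table and column names below are assumptions made for the sketch. Note that SQLite spells renaming as ALTER TABLE ... RENAME TO and has no TRUNCATE statement, so the exact syntax differs from other database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def table_names(conn):
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [r[0] for r in rows]

# CREATE: define a new table.
conn.execute("CREATE TABLE student (stu_id INTEGER PRIMARY KEY, stu_name TEXT)")

# ALTER: change the structure of the existing table.
conn.execute("ALTER TABLE student ADD COLUMN stu_age INTEGER")
columns = [row[1] for row in conn.execute("PRAGMA table_info(student)")]

# RENAME: in SQLite this is a form of ALTER TABLE.
conn.execute("ALTER TABLE student RENAME TO learner")
after_rename = table_names(conn)

# DROP: remove the table and its definition.
conn.execute("DROP TABLE learner")
after_drop = table_names(conn)
```

None of these statements touches individual records; each one changes what the database is allowed to contain.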

Data Manipulation Language (DML)

DML is used for accessing and manipulating data in a database. The following
operations on a database come under DML:

• To read records from table(s) – SELECT
• To insert record(s) into table(s) – INSERT
• To update the data in table(s) – UPDATE
• To delete records from a table (all of them, or only those matching a condition) – DELETE
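
The four DML operations can be seen together in a short sqlite3 session. The student table and its rows are sample data chosen for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (stu_id INTEGER PRIMARY KEY, stu_name TEXT, stu_age INTEGER)")

# INSERT: add records.
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(101, "Steve", 23), (102, "John", 24), (103, "Robert", 28)])

# UPDATE: change data in existing records.
conn.execute("UPDATE student SET stu_age = 25 WHERE stu_id = 102")

# DELETE: the WHERE clause limits which records are removed;
# without it, every record in the table would be deleted.
conn.execute("DELETE FROM student WHERE stu_id = 103")

# SELECT: read records.
remaining = conn.execute(
    "SELECT stu_id, stu_name, stu_age FROM student ORDER BY stu_id").fetchall()
print(remaining)
```

Unlike DDL, these statements leave the table definition alone and change only the records stored in it.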

Data Control language (DCL)

DCL is used for granting and revoking user access on a database:

• To grant access to a user – GRANT
• To revoke access from a user – REVOKE

In practice, the data definition language, data manipulation language and data control
language are not separate languages; rather, they are parts of a single database
language such as SQL.

Transaction Control Language(TCL)

The changes made to the database using DML commands are either made permanent or
rolled back using TCL.

• To persist the changes made by DML commands in the database – COMMIT
• To roll back the changes made to the database – ROLLBACK
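
In Python's sqlite3 module, COMMIT and ROLLBACK are exposed as the commit() and rollback() methods of a connection. The sketch below assumes a small student table invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (stu_id INTEGER PRIMARY KEY, stu_name TEXT)")

def names(conn):
    return [r[0] for r in conn.execute("SELECT stu_name FROM student ORDER BY stu_id")]

conn.execute("INSERT INTO student VALUES (101, 'Steve')")
conn.commit()                      # COMMIT: the first insert is now permanent

conn.execute("INSERT INTO student VALUES (102, 'John')")
before_rollback = names(conn)      # the uncommitted row is visible on this connection
conn.rollback()                    # ROLLBACK: discard everything since the last commit
after_rollback = names(conn)
```

Only the row that was committed survives the rollback.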

Data models in DBMS

A data model is the logical structure of a database. It describes the design of the
database to reflect entities, attributes, relationships among data, constraints etc.

Types of Data Models


There are several types of data models in DBMS. Here we will just see a basic overview
of the types of models.

Object based logical Models – Describe data at the conceptual and view levels.

1. E-R Model
2. Object oriented Model

Record based logical Models – Like Object based model, they also describe data at the
conceptual and view levels. These models specify logical structure of database with
records, fields and attributes.

1. Relational Model
2. Hierarchical Model
3. Network Model – Network Model is same as hierarchical model except that it has
graph-like structure rather than a tree-based structure. Unlike hierarchical model,
this model allows each record to have more than one parent record.

Physical Data Models – These models describe data at the lowest level of abstraction.

Entity Relationship Diagram – ER Diagram in DBMS

An Entity–relationship model (ER model) describes the structure of a database with
the help of a diagram, which is known as an Entity Relationship Diagram (ER Diagram).
An ER model is a design or blueprint of a database that can later be implemented as a
database. The main components of E-R model are: entity set and relationship set.

What is an Entity Relationship Diagram (ER Diagram)?

An ER diagram shows the relationships among entity sets. An entity set is a group of
similar entities, and these entities can have attributes. In terms of DBMS, an entity
set corresponds to a table and its attributes correspond to the columns of that table,
so by showing the relationships among tables and their attributes, an ER diagram shows
the complete logical structure of a database. Let’s have a look at a simple ER diagram
to understand this concept.

A simple ER Diagram:

In the following diagram we have two entities, Student and College, and their
relationship. The relationship between Student and College is many to one, as a college
can have many students but a student cannot study in multiple colleges at the same time.
The Student entity has attributes such as Stu_Id, Stu_Name & Stu_Addr and the College
entity has attributes such as Col_ID & Col_Name.

Here are the geometric shapes and their meaning in an E-R Diagram. We will discuss
these terms in detail in the next section (Components of a ER Diagram), so don’t worry
too much about them now; just go through them once.

Rectangle: Represents Entity sets.
Ellipses: Attributes
Diamonds: Relationship Set
Lines: They link attributes to Entity Sets and Entity sets to Relationship Set
Double Ellipses: Multivalued Attributes
Dashed Ellipses: Derived Attributes
Double Rectangles: Weak Entity Sets
Double Lines: Total participation of an entity in a relationship set

Components of a ER Diagram

As shown in the above diagram, an ER diagram has three main components:
1. Entity
2. Attribute
3. Relationship

1. Entity

An entity is an object or component of data. An entity is represented as a rectangle in
an ER diagram.
For example: In the following ER diagram we have two entities, Student and College, and
these two entities have a many to one relationship, as many students study in a single
college. We will read more about relationships later; for now focus on entities.

Weak Entity:
An entity that cannot be uniquely identified by its own attributes and relies on its
relationship with another entity is called a weak entity. A weak entity is represented
by a double rectangle. For example, a bank account cannot be uniquely identified without
knowing the bank to which the account belongs, so bank account is a weak entity.

2. Attribute

An attribute describes a property of an entity. An attribute is represented as an oval
in an ER diagram. There are four types of attributes:

1. Key attribute
2. Composite attribute
3. Multivalued attribute
4. Derived attribute

1. Key attribute:

A key attribute can uniquely identify an entity from an entity set. For example, student
roll number can uniquely identify a student from a set of students. Key attribute is
represented by oval same as other attributes however the text of key attribute is
underlined.

2. Composite attribute:

An attribute that is a combination of other attributes is known as a composite
attribute. For example, in the student entity, the student address is a composite
attribute, as an address is composed of other attributes such as pin code, state and
country.

3. Multivalued attribute:

An attribute that can hold multiple values is known as a multivalued attribute. It is
represented with double ovals in an ER Diagram. For example, a person can have
more than one phone number, so the phone number attribute is multivalued.

4. Derived attribute:

A derived attribute is one whose value is dynamic and derived from another attribute. It
is represented by dashed oval in an ER Diagram. For example – Person age is a derived
attribute as it changes over time and can be derived from another attribute (Date of birth).
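
A derived attribute is typically computed when needed rather than stored. The sketch below shows this for age; the Person class, its field names and the sample date are assumptions made for the illustration.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Person:
    name: str
    date_of_birth: date  # stored attribute

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        years = today.year - self.date_of_birth.year
        # Subtract one if this year's birthday has not happened yet.
        if (today.month, today.day) < (self.date_of_birth.month, self.date_of_birth.day):
            years -= 1
        return years

p = Person("Lou", date(2000, 6, 15))
print(p.age(date(2024, 6, 14)), p.age(date(2024, 6, 15)))
```

Since only the date of birth is stored, the age can never go out of date; it is simply recalculated for whatever day is asked about.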

E-R diagram with multivalued and derived attributes:

3. Relationship

A relationship is represented by a diamond shape in an ER diagram; it shows the
relationship among entities. There are four types of relationships:
1. One to One
2. One to Many
3. Many to One
4. Many to Many

1. One to One Relationship

When a single instance of an entity is associated with a single instance of another entity
then it is called one to one relationship. For example, a person has only one passport and
a passport is given to one person.

2. One to Many Relationship

When a single instance of an entity is associated with more than one instance of another
entity, it is called a one to many relationship. For example, a customer can place
many orders but an order cannot be placed by many customers.

3. Many to One Relationship

When more than one instance of an entity is associated with a single instance of another
entity, it is called a many to one relationship. For example, many students can study
in a single college but a student cannot study in many colleges at the same time.

4. Many to Many Relationship

When more than one instance of an entity is associated with more than one instance of
another entity, it is called a many to many relationship. For example, a student can be
assigned to many projects and a project can be assigned to many students.
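
When such a design is turned into tables, a many to many relationship is commonly stored in a separate linking table. The sketch below shows one way to do this with Python's sqlite3 module for the student and project example; the table names, column names and sample rows are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (stu_id INTEGER PRIMARY KEY, stu_name TEXT NOT NULL);
    CREATE TABLE project (proj_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- Linking table: one row per (student, project) pairing.
    CREATE TABLE assignment (
        stu_id  INTEGER NOT NULL REFERENCES student(stu_id),
        proj_id INTEGER NOT NULL REFERENCES project(proj_id),
        PRIMARY KEY (stu_id, proj_id)
    );
    INSERT INTO student VALUES (1, 'Ashish'), (2, 'Saurav');
    INSERT INTO project VALUES (10, 'Library app'), (20, 'Quiz site');
    INSERT INTO assignment VALUES (1, 10), (1, 20), (2, 10);
""")

# One student, many projects ...
projects_of_ashish = [r[0] for r in conn.execute("""
    SELECT p.title FROM assignment a JOIN project p ON p.proj_id = a.proj_id
    WHERE a.stu_id = 1 ORDER BY p.proj_id""")]

# ... and one project, many students.
students_on_library = [r[0] for r in conn.execute("""
    SELECT s.stu_name FROM assignment a JOIN student s ON s.stu_id = a.stu_id
    WHERE a.proj_id = 10 ORDER BY s.stu_id""")]
```

A one to many relationship needs no linking table: a single column on the "many" side that refers to the "one" side is enough.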

Total Participation of an Entity set

Total participation of an entity set means that each entity in the entity set must take
part in at least one relationship in the relationship set. For example: In the below
diagram each college must have at least one associated student.

DBMS Generalization

Generalization is a process in which the common attributes of more than one entity
form a new entity. This newly formed entity is called a generalized entity.

Generalization Example
Let’s say we have two entities, Student and Teacher.
Attributes of Entity Student are: Name, Address & Grade
Attributes of Entity Teacher are: Name, Address & Salary

The ER diagram before generalization looks like this:

These two entities have two common attributes, Name and Address, so we can make a
generalized entity with these common attributes. Let’s have a look at the ER model after
generalization.

The ER diagram after generalization:

We have created a new generalized entity, Person, which has the common attributes of
both entities. As you can see in the ER diagram, after the generalization process the
entities Student and Teacher have only the specialized attributes Grade and Salary
respectively, and their common attributes (Name & Address) are now associated with the
new entity Person, which is in a relationship with both entities (Student & Teacher).

Note:
1. Generalization uses bottom-up approach where two or more lower level entities
combine together to form a higher level new entity.
2. The new generalized entity can further combine together with lower level entity to
create a further higher level generalized entity.
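
Readers who know object-oriented programming may find it helpful to compare generalization with class inheritance. The sketch below is only an analogy, not how a DBMS stores the entities: the common attributes of Student and Teacher move up into Person, and each lower level class keeps only its specialized attribute. The sample values are made up.

```python
from dataclasses import dataclass, fields

@dataclass
class Person:            # generalized entity: the common attributes
    name: str
    address: str

@dataclass
class Student(Person):   # keeps only its specialized attribute
    grade: str

@dataclass
class Teacher(Person):   # keeps only its specialized attribute
    salary: int

s = Student(name="Ashish", address="Delhi", grade="A")
t = Teacher(name="Lester", address="Pune", salary=50000)

student_fields = [f.name for f in fields(Student)]
teacher_fields = [f.name for f in fields(Teacher)]
print(student_fields, teacher_fields)
```

Name and Address are declared once in Person, yet every Student and Teacher still has them.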

DBMS Specialization

Specialization is a process in which an entity is divided into sub-entities. You can
think of it as the reverse of generalization, in which two entities combine to form a
new higher level entity. Specialization is a top-down process.

The idea behind specialization is to find the subsets of entities that have a few
distinguishing attributes. For example, consider an entity Employee which can be further
classified into the sub-entities Technician, Engineer & Accountant because these
sub-entities have some distinguishing attributes.

Specialization Example

In the above diagram, we have a higher level entity “Employee” which is divided into the
sub-entities “Technician”, “Engineer” & “Accountant”. All of these are employees of a
company, but their roles are completely different and they have a few different
attributes. For the sake of the example, the Technician handles service requests, the
Engineer works on a project and the Accountant handles the credit & debit details. All
three employee types have a few attributes in common, such as name & salary, which are
left associated with the parent entity “Employee” as shown in the above diagram.

DBMS Aggregation

Aggregation is a process in which a single entity alone does not make sense in a
relationship, so the relationship between two entities is treated as one entity. The
example below makes this clearer.

Aggregation Example

In the real world, a manager not only manages the employees working under them but also
has to manage the project. In such a scenario, if the entity “Manager” has a “manages”
relationship with either the “Employee” or the “Project” entity alone, it does not make
sense, because the manager has to manage both. In these cases the relationship of two
entities acts as one entity. In our example, the relationship “Works-On” between
“Employee” & “Project” acts as one entity that has a relationship “Manages” with the
entity “Manager”.

Relational model in DBMS

In the relational model, data and relationships are represented by a collection of
inter-related tables. Each table is a group of columns and rows, where a column
represents an attribute of an entity and a row represents a record.

Sample relational model: Student table with three columns and four records.

Table: Student

Stu_Id    Stu_Name    Stu_Age
111       Ashish      23
123       Saurav      22
169       Lester      24
234       Lou         26

Table: Course

Stu_Id    Course_Id    Course_Name
111       C01          Science
111       C02          DBMS
169       C22          Java
169       C39          Computer Networks

Here Stu_Id, Stu_Name & Stu_Age are attributes of table Student and Stu_Id, Course_Id
& Course_Name are attributes of table Course. The rows with values are the records
(commonly known as tuples).
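
The two sample tables can be loaded into SQLite and combined through their shared Stu_Id column. The sketch below uses Python's sqlite3 module; the column types are assumptions, since the notes only give the column names and values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (stu_id INTEGER PRIMARY KEY, stu_name TEXT, stu_age INTEGER);
    CREATE TABLE course  (stu_id INTEGER, course_id TEXT, course_name TEXT);
    INSERT INTO student VALUES
        (111, 'Ashish', 23), (123, 'Saurav', 22), (169, 'Lester', 24), (234, 'Lou', 26);
    INSERT INTO course VALUES
        (111, 'C01', 'Science'), (111, 'C02', 'DBMS'),
        (169, 'C22', 'Java'), (169, 'C39', 'Computer Networks');
""")

# The shared stu_id column is what relates a course row to a student row.
enrolments = conn.execute("""
    SELECT s.stu_name, c.course_name
    FROM student s JOIN course c ON c.stu_id = s.stu_id
    ORDER BY s.stu_id, c.course_id
""").fetchall()
print(enrolments)
```

Saurav and Lou do not appear in the result because no row in the Course table carries their Stu_Id.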

Hierarchical model in DBMS

In the hierarchical model, data is organized into a tree-like structure in which each
record has one parent record and can have many children. The main drawback of this
model is that it can have only one to many relationships between nodes.

Note: Hierarchical models are rarely used now.

Sample Hierarchical Model Diagram:

Let’s say we have a few students and a few courses. A course can be assigned to a single
student only, but a student can take any number of courses, so this relationship is
one to many.

Example of hierarchical data represented as relational tables: The above hierarchical
model can be represented as relational tables like this:

Stu_Id Stu_Name Stu_Age

123 Steve 29

367 Chaitanya 27

234 Ajeet 28

Course Table:

Course_Id  Course_Name  Stu_Id
---------  -----------  ------
C01        Cobol        123
C21        Java         367
C22        Perl         367
C33        JQuery       234


Keys in DBMS

A key plays an important role in a relational database: it is used to uniquely identify
rows in a table. It also establishes relationships among tables.

Types of keys in DBMS

Primary key in DBMS

Definition: A primary key is a minimal set of attributes (columns) in a table that
uniquely identifies tuples (rows) in that table.

Primary Key Example in DBMS


Let's take an example to understand the concept of a primary key. The following table has
three attributes: Stu_Id, Stu_Name & Stu_Age. Out of these three attributes, one attribute
or a set of more than one attribute can be a primary key.

Attribute Stu_Name alone cannot be a primary key, as more than one student can have the
same name.

Attribute Stu_Age alone cannot be a primary key, as more than one student can have the
same age.

Attribute Stu_Id alone is a primary key, as each student has a unique id that identifies
the student record in the table.

Note: In some cases a single attribute cannot uniquely identify a record in a table. In
that case we try to find a set of attributes that can uniquely identify a row in the
table. We will see an example of this after the current one.

Table Name: STUDENT

Stu_Id  Stu_Name  Stu_Age
------  --------  -------
101     Steve     23
102     John      24
103     Robert    28
104     Steve     29
105     Carl      29

Points to Note regarding Primary Key

• We usually denote it by underlining the attribute name (column name).
• The value of the primary key must be unique for each row of the table. The column(s)
  that make up the key cannot contain duplicate values.
• The attribute(s) marked as primary key are not allowed to have null values.
• A primary key does not have to be a single attribute (column). It can be a set of
  more than one attribute (columns). For example, {Stu_Id, Stu_Name} collectively
  can identify a tuple in the above table, but we do not choose it as the primary key
  because Stu_Id alone is enough to uniquely identify rows in the table, and we always
  go for the minimal set. That said, we should choose more than one column as the
  primary key only when no single column can uniquely identify the tuples in the table.

Another example of primary key: more than one attribute

Consider the table ORDER, which keeps the daily record of the purchases made by
customers. This table has three attributes: Customer_ID, Product_ID & Order_Quantity.

Customer_ID alone cannot be a primary key, as a single customer can place more than
one order, giving more than one row with the same Customer_ID value. In the following
table, customer id 1011 has placed two orders, with product ids 9023 and 9111.

Product_ID alone cannot be a primary key, as more than one customer can place an order
for the same product, giving more than one row with the same product id. In the following
table, customer ids 1011 & 1122 placed an order for the same product (product id 9023).

Order_Quantity alone cannot be a primary key, as more than one customer can place an
order for the same quantity.

Since none of the attributes alone can be a primary key, let's try to find a set of
attributes that plays that role.



{Customer_ID, Product_ID} together can identify the rows uniquely in the table so this
set is the primary key for this table.

Table Name: ORDER

Customer_ID  Product_ID  Order_Quantity
-----------  ----------  --------------
1011         9023        10
1122         9023        15
1099         9031        20
1177         9031        18
1011         9111        50

Note: While choosing a set of attributes for a primary key, we always choose the
minimal set that has minimum number of attributes. For example, if there are two sets
that can identify row in table, the set that has minimum number of attributes should be
chosen as primary key.

How to define primary key in RDBMS?


In the above example, we already had a table with data and were trying to understand
the purpose and meaning of a primary key. Generally, however, we define the primary key
during table creation. We can define the primary key later as well, but that rarely
happens in real-world scenarios.

Let's say we want to create the table discussed above, with the set of customer id and
product id working as the primary key. We can do that in SQL like this:

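-- ORDER is a reserved word in SQL, so the table name is written as a quoted identifier
-- (standard SQL double quotes; MySQL uses backticks by default).
CREATE TABLE "ORDER"
(
Customer_ID int not null,
Product_ID int not null,
Order_Quantity int not null,
PRIMARY KEY (Customer_ID, Product_ID)
);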
Suppose we didn't define the primary key while creating the table; we can define it later
like this:

ALTER TABLE "ORDER"
ADD CONSTRAINT PK_Order PRIMARY KEY (Customer_ID, Product_ID);

Another way:
When only one attribute serves as the primary key, as in the first example of the
STUDENT table, we can also define the key like this:

CREATE TABLE STUDENT
(
Stu_Id int primary key,
Stu_Name varchar(255) not null,
Stu_Age int not null
);
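To see what a composite primary key actually enforces, here is a small sketch that runs the ORDER example against an in-memory SQLite database through Python's sqlite3 module. Using SQLite is an assumption made for this illustration, and the table name is quoted because ORDER is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE "ORDER" (
        Customer_ID    INT NOT NULL,
        Product_ID     INT NOT NULL,
        Order_Quantity INT NOT NULL,
        PRIMARY KEY (Customer_ID, Product_ID)
    )
""")
conn.execute('INSERT INTO "ORDER" VALUES (1011, 9023, 10)')
# Same customer, different product: the key pair is still unique, so this is allowed.
conn.execute('INSERT INTO "ORDER" VALUES (1011, 9111, 50)')

try:
    # Repeats the (Customer_ID, Product_ID) pair of the first row.
    conn.execute('INSERT INTO "ORDER" VALUES (1011, 9023, 99)')
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute('SELECT COUNT(*) FROM "ORDER"').fetchone()[0]
```

Neither column is unique on its own here; only the combination is checked, which is exactly the behaviour described for {Customer_ID, Product_ID}.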
Super key in DBMS

Definition of Super Key in DBMS: A super key is a set of one or more attributes
(columns), which can uniquely identify a row in a table.

How is a candidate key different from a super key?

The answer is simple: candidate keys are selected from the set of super keys. The only
thing we take care of while selecting a candidate key is that it should not have any
redundant attribute. That is why candidate keys are also called minimal super keys.

Let’s take an example to understand this:


Table: Employee

Emp_SSN    Emp_Number  Emp_Name
---------  ----------  ---------
123456789  226         Steve
999999321  227         Ajeet
888997212  228         Chaitanya
777778888  229         Robert
Super keys: The above table has the following super keys. Each of these sets can
uniquely identify a row of the Employee table.

• {Emp_SSN}
• {Emp_Number}
• {Emp_SSN, Emp_Number}
• {Emp_SSN, Emp_Name}
• {Emp_SSN, Emp_Number, Emp_Name}
• {Emp_Number, Emp_Name}

The following two super keys are chosen from the above sets as candidate keys, because
they contain no redundant attributes.

• {Emp_SSN}
• {Emp_Number}

Only these two sets are candidate keys; all the other sets contain redundant attributes
that are not necessary for unique identification.

Super key vs Candidate Key

1. First, understand that all candidate keys are super keys, because the candidate keys
are chosen out of the super keys.

2. How do we choose candidate keys from the set of super keys? We look for those keys
from which we cannot remove any attribute without losing uniqueness. In the above
example, we have not chosen {Emp_SSN, Emp_Name} as a candidate key because {Emp_SSN}
alone can identify a unique row in the table, so Emp_Name is redundant.
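The selection rule above can be sketched in a few lines of Python. The sketch assumes, as the text does, that Emp_SSN and Emp_Number are the only columns guaranteed to be unique; that guarantee is declared up front in unique_columns rather than inferred from the four sample rows.

```python
from itertools import combinations

attributes = ["Emp_SSN", "Emp_Number", "Emp_Name"]
# Assumption taken from the text: only these columns are guaranteed unique.
unique_columns = {"Emp_SSN", "Emp_Number"}

def all_subsets(attrs):
    """Yield every non-empty subset of the attribute list."""
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            yield frozenset(combo)

# A set of attributes is a super key if it contains at least one unique column.
super_keys = [s for s in all_subsets(attributes) if s & unique_columns]

# A candidate key is a super key with no smaller super key inside it.
candidate_keys = [k for k in super_keys
                  if not any(other < k for other in super_keys)]
```

Running this yields the six super keys listed above and leaves {Emp_SSN} and {Emp_Number} as the only minimal ones.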

Candidate Key in DBMS

Definition of Candidate Key in DBMS: A super key with no redundant attribute is
known as a candidate key. Candidate keys are selected from the set of super keys; the
only thing we take care of while selecting a candidate key is that it should not have
any redundant attributes. That is why candidate keys are also called minimal super keys.

Candidate Key Example

Let's take the example of a table “Employee”. This table has three attributes: Emp_Id,
Emp_Number & Emp_Name. Here Emp_Id & Emp_Number have unique values, while Emp_Name
can have duplicate values, as more than one employee can have the same name.

Emp_Id  Emp_Number  Emp_Name
------  ----------  ---------
E01     2264        Steve
E22     2278        Ajeet
E23     2288        Chaitanya
E45     2290        Robert
Which super keys can the above table have?
1. {Emp_Id}
2. {Emp_Number}
3. {Emp_Id, Emp_Number}
4. {Emp_Id, Emp_Name}
5. {Emp_Id, Emp_Number, Emp_Name}
6. {Emp_Number, Emp_Name}

Let's select the candidate keys from the above set of super keys.

1. {Emp_Id}: no redundant attributes.
2. {Emp_Number}: no redundant attributes.
3. {Emp_Id, Emp_Number}: redundant attribute. Either of these attributes alone is a
minimal super key, as both columns have unique values.
4. {Emp_Id, Emp_Name}: redundant attribute Emp_Name.
5. {Emp_Id, Emp_Number, Emp_Name}: redundant attributes. Emp_Id or Emp_Number alone
is sufficient to uniquely identify a row of the Employee table.
6. {Emp_Number, Emp_Name}: redundant attribute Emp_Name.

The candidate keys we have selected are:

{Emp_Id}
{Emp_Number}

Note: A primary key is selected from the set of candidate keys. That means we can have
either Emp_Id or Emp_Number as the primary key. The decision is made by the DBA
(Database administrator).

Foreign key in DBMS

Definition: Foreign keys are the columns of a table that point to the primary key of
another table. They act as a cross-reference between tables.

For example:
In the example below, the Stu_Id column in the Course_enrollment table is a foreign key,
as it points to the primary key of the Student table.

Course_enrollment table:

Course_Id  Stu_Id
---------  ------
C01        101
C02        102
C03        101
C05        102
C06        103
C07        102

Student table:

Stu_Id  Stu_Name   Stu_Age
------  --------   -------
101     Chaitanya  22
102     Arya       26
103     Bran       25
104     Jon        21

Note: Strictly speaking, a foreign key does not depend on the primary key tag of the
other table. If it points to a unique column (not necessarily the primary key) of another
table, it is still a foreign key. So a more precise definition would be: foreign keys
are the columns of a table that point to a candidate key of another table.
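The cross-reference can be checked with a short sketch using Python's sqlite3 module and an in-memory database. SQLite is an assumption made for this example; note that it enforces foreign keys only after PRAGMA foreign_keys is switched on, which is specific to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable enforcement
conn.executescript("""
    CREATE TABLE Student (
        Stu_Id INTEGER PRIMARY KEY, Stu_Name TEXT, Stu_Age INTEGER);
    CREATE TABLE Course_enrollment (
        Course_Id TEXT,
        Stu_Id INTEGER REFERENCES Student (Stu_Id));
    INSERT INTO Student VALUES (101, 'Chaitanya', 22), (102, 'Arya', 26);
""")

# Stu_Id 101 exists in Student, so this enrollment is accepted.
conn.execute("INSERT INTO Course_enrollment VALUES ('C01', 101)")

try:
    # Stu_Id 999 does not exist in Student, so the reference would dangle.
    conn.execute("INSERT INTO Course_enrollment VALUES ('C09', 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

enrollments = conn.execute("SELECT COUNT(*) FROM Course_enrollment").fetchone()[0]
```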



Composite key in DBMS

Definition of Composite key: A key that has more than one attribute is known as a
composite key. It is also known as a compound key.

Note: Any key, such as a super key, primary key or candidate key, can be called a
composite key if it has more than one attribute.

Composite key Example


Let's consider a table Sales. This table has four columns (attributes): cust_Id, order_Id,
product_code & product_count.

Table: Sales

cust_Id  order_Id  product_code  product_count
-------  --------  ------------  -------------
C01      O001      P007          23
C02      O123      P007          19
C02      O123      P230          82
C01      O001      P890          42
None of these columns alone can play the role of a key in this table.

Column cust_Id alone cannot be a key, as the same customer can place multiple
orders and thus have multiple entries.

Column order_Id alone cannot be a primary key, as the same order can contain multiple
products, so the same order_Id can be present multiple times.

Column product_code alone cannot be a primary key, as more than one customer can place
an order for the same product.

Column product_count alone cannot be a primary key, because two orders can be placed
for the same product count.

Based on this, it is safe to assume that the key should have more than one attribute:
Key in above table: {cust_Id, product_code}

This is a composite key, as it is made up of more than one attribute.

Alternate key in DBMS

As we saw in the candidate key section, a table can have multiple candidate keys.
Among these candidate keys, only one is selected as the primary key; the remaining
keys are known as alternate or secondary keys.


Alternate Key Example

Let's take an example to understand the alternate key concept. Here we have a table
Employee with three attributes: Emp_Id, Emp_Number & Emp_Name.

Table: Employee

Emp_Id  Emp_Number  Emp_Name
------  ----------  ---------
E01     2264        Steve
E22     2278        Ajeet
E23     2288        Chaitanya
E45     2290        Robert

There are two candidate keys in the above table:
{Emp_Id}
{Emp_Number}

The DBA (Database administrator) can choose either of the above keys as the primary key.
Let's say Emp_Id is chosen as the primary key.

Since we have selected Emp_Id as the primary key, the remaining key Emp_Number is
called the alternate or secondary key.


Constraints in DBMS

Constraints enforce limits to the data or type of data that can be inserted/updated/deleted
from a table. The whole purpose of constraints is to maintain the data integrity during
an update/delete/insert into a table. In this tutorial we will learn several types of
constraints that can be created in RDBMS.

Types of constraints

• NOT NULL
• UNIQUE
• DEFAULT
• CHECK
• Key Constraints – PRIMARY KEY, FOREIGN KEY
• Domain constraints
• Mapping constraints

NOT NULL:

The NOT NULL constraint makes sure that a column does not hold a NULL value. When we
don't provide a value for a particular column while inserting a record into a table, it
takes the NULL value by default (unless a default value is defined). By specifying the
NOT NULL constraint, we can be sure that a particular column cannot have NULL values.

Example:

How to specify the NOT NULL constraint while creating a table

Here I am creating a table “STUDENTS”. I have specified the NOT NULL constraint for
columns ROLL_NO, STU_NAME and STU_AGE, which means you must provide a value for these
three fields while inserting or updating records in this table. The constraint prevents
these columns from accepting null values.

CREATE TABLE STUDENTS(
ROLL_NO INT NOT NULL,
STU_NAME VARCHAR (35) NOT NULL,
STU_AGE INT NOT NULL,
STU_ADDRESS VARCHAR (235) ,
PRIMARY KEY (ROLL_NO)
);
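A quick way to watch the NOT NULL constraint at work is to run the same table definition in an in-memory SQLite database via Python's sqlite3 module. SQLite and the sample values are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE STUDENTS (
        ROLL_NO     INT NOT NULL,
        STU_NAME    VARCHAR(35) NOT NULL,
        STU_AGE     INT NOT NULL,
        STU_ADDRESS VARCHAR(235),
        PRIMARY KEY (ROLL_NO)
    )
""")
# STU_ADDRESS is nullable, so leaving it out is fine; it is stored as NULL.
conn.execute("INSERT INTO STUDENTS (ROLL_NO, STU_NAME, STU_AGE) VALUES (1, 'Steve', 23)")

try:
    # STU_NAME is left out, so it would become NULL and violate NOT NULL.
    conn.execute("INSERT INTO STUDENTS (ROLL_NO, STU_AGE) VALUES (2, 24)")
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True

address = conn.execute(
    "SELECT STU_ADDRESS FROM STUDENTS WHERE ROLL_NO = 1").fetchone()[0]
```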
Specify the NOT NULL constraint for an already existing table

In the above section we learnt how to specify the NOT NULL constraint while creating a
table. However, we can also specify this constraint on an already existing table. For
this we need to use the ALTER TABLE statement (the MODIFY form below is MySQL/Oracle
syntax; SQL Server uses ALTER COLUMN instead).

ALTER TABLE STUDENTS
MODIFY STU_ADDRESS VARCHAR (235) NOT NULL;

After this, the STU_ADDRESS column will not accept any null values.

UNIQUE Constraint in SQL

The UNIQUE constraint enforces that a column or set of columns has unique values. If a
column has a UNIQUE constraint, that column cannot have duplicate values in the table.

Set UNIQUE Constraint while creating a table

For SQL Server / MS Access / Oracle:

Syntax:

CREATE TABLE <table_name>
(
<column_name> <data_type> UNIQUE,
<column_name2> <data_type>,
....
....
);
Example:

Here we are setting up the UNIQUE constraint for two columns, STU_NAME &
STU_ADDRESS, which means neither of these columns can have duplicate values.

Note: The STU_NAME column has two constraints set up (both NOT NULL and UNIQUE).

CREATE TABLE STUDENTS(
ROLL_NO INT NOT NULL,
STU_NAME VARCHAR (35) NOT NULL UNIQUE,
STU_AGE INT NOT NULL,
STU_ADDRESS VARCHAR (35) UNIQUE,
PRIMARY KEY (ROLL_NO)
);
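The effect of UNIQUE can be tried out with the sketch below, which runs the table definition in an in-memory SQLite database through Python's sqlite3 module. SQLite and the inserted values are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE STUDENTS (
        ROLL_NO     INT NOT NULL,
        STU_NAME    VARCHAR(35) NOT NULL UNIQUE,
        STU_AGE     INT NOT NULL,
        STU_ADDRESS VARCHAR(35) UNIQUE,
        PRIMARY KEY (ROLL_NO)
    )
""")
conn.execute("INSERT INTO STUDENTS VALUES (1, 'Steve', 23, 'Agra')")

try:
    # A different ROLL_NO and address, but STU_NAME repeats 'Steve'.
    conn.execute("INSERT INTO STUDENTS VALUES (2, 'Steve', 24, 'Delhi')")
    duplicate_name_rejected = False
except sqlite3.IntegrityError:
    duplicate_name_rejected = True

stored = conn.execute("SELECT COUNT(*) FROM STUDENTS").fetchone()[0]
```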
Set UNIQUE Constraint on an already created table

For MySQL / Oracle / SQL Server / MS Access:

For single column and without constraint naming:

Syntax:

ALTER TABLE <table_name>
ADD UNIQUE (<column_name>);

Example:

ALTER TABLE STUDENTS
ADD UNIQUE (STU_NAME);

For multiple columns and with constraint naming:

Syntax:

ALTER TABLE <table_name>
ADD CONSTRAINT <constraint_name> UNIQUE (<column_name1>, <column_name2>,...);

Example:

ALTER TABLE STUDENTS
ADD CONSTRAINT stu_Info UNIQUE (STU_NAME,STU_ADDRESS);
How to drop a UNIQUE Constraint

In MySQL:

Syntax:

ALTER TABLE <table_name>
DROP INDEX <constraint_name>;

Example:

ALTER TABLE STUDENTS
DROP INDEX stu_Info;

In Oracle / SQL Server / MS Access:

Syntax:

ALTER TABLE <table_name>
DROP CONSTRAINT <constraint_name>;

Example:

ALTER TABLE STUDENTS
DROP CONSTRAINT stu_Info;

DEFAULT Constraint in SQL

The DEFAULT constraint provides a default value to a column when there is no value
provided while inserting a record into a table. Lets see how to specify this constraint and
how it works.



Specify DEFAULT constraint while creating a table

Here we are creating a table “STUDENTS”, we have a requirement to set the exam fees
to 10000 if fees is not specified while inserting a record (row) into the STUDENTS table.
We can do so by using DEFAULT constraint. As you can see we have set the default
value of EXAM_FEE column to 10000 using DEFAULT constraint.

CREATE TABLE STUDENTS(
ROLL_NO INT NOT NULL,
STU_NAME VARCHAR (35) NOT NULL,
STU_AGE INT NOT NULL,
EXAM_FEE INT DEFAULT 10000,
STU_ADDRESS VARCHAR (35) ,
PRIMARY KEY (ROLL_NO)
);
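To confirm how the default value is filled in, the following sketch runs the table definition in an in-memory SQLite database via Python's sqlite3 module. SQLite, the student names and the explicit fee of 12000 are assumptions used only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE STUDENTS (
        ROLL_NO     INT NOT NULL,
        STU_NAME    VARCHAR(35) NOT NULL,
        STU_AGE     INT NOT NULL,
        EXAM_FEE    INT DEFAULT 10000,
        STU_ADDRESS VARCHAR(35),
        PRIMARY KEY (ROLL_NO)
    )
""")
# EXAM_FEE is not listed, so the DEFAULT value is stored for this row.
conn.execute("INSERT INTO STUDENTS (ROLL_NO, STU_NAME, STU_AGE) VALUES (1, 'Steve', 23)")
# An explicitly supplied value overrides the default.
conn.execute("INSERT INTO STUDENTS (ROLL_NO, STU_NAME, STU_AGE, EXAM_FEE) "
             "VALUES (2, 'John', 24, 12000)")

fees = conn.execute("SELECT ROLL_NO, EXAM_FEE FROM STUDENTS ORDER BY ROLL_NO").fetchall()
```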
Specify DEFAULT constraint on an already existing table

What if we want to set this constraint on an already existing table? For this we can use
the ALTER TABLE statement like this:

Syntax:

ALTER TABLE <table_name>
MODIFY <column_name> <column_data_type> DEFAULT <default_value>;

Example:

ALTER TABLE STUDENTS
MODIFY EXAM_FEE INT DEFAULT 10000;

This way we can set the constraint on an already created table.

How to drop DEFAULT Constraint

In the above sections, we have learnt how to set the constraint. Here we will see how to
drop (delete) it:

Syntax:

ALTER TABLE <table_name>
ALTER COLUMN <column_name> DROP DEFAULT;

Example:
Let's say we want to drop the DEFAULT constraint from the EXAM_FEE column of the
STUDENTS table created in the above sections. We can do it like this:

ALTER TABLE STUDENTS
ALTER COLUMN EXAM_FEE DROP DEFAULT;


CHECK:

The CHECK constraint restricts the values allowed in a particular column of a table.
When it is set on a column, every value in that column must satisfy the specified
condition, such as falling within a given range.

CREATE TABLE STUDENT(
ROLL_NO INT NOT NULL CHECK(ROLL_NO >1000) ,
STU_NAME VARCHAR (35) NOT NULL,
STU_AGE INT NOT NULL,
EXAM_FEE INT DEFAULT 10000,
STU_ADDRESS VARCHAR (35) ,
PRIMARY KEY (ROLL_NO)
);

In the above example we have set the check constraint on the ROLL_NO column of the
STUDENT table. Now, the ROLL_NO field must have a value greater than 1000.
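The CHECK condition can be exercised with a short sketch against an in-memory SQLite database, using Python's sqlite3 module. SQLite and the sample roll numbers are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE STUDENT (
        ROLL_NO     INT NOT NULL CHECK (ROLL_NO > 1000),
        STU_NAME    VARCHAR(35) NOT NULL,
        STU_AGE     INT NOT NULL,
        EXAM_FEE    INT DEFAULT 10000,
        STU_ADDRESS VARCHAR(35),
        PRIMARY KEY (ROLL_NO)
    )
""")
# 1001 satisfies ROLL_NO > 1000, so the row is accepted.
conn.execute("INSERT INTO STUDENT (ROLL_NO, STU_NAME, STU_AGE) VALUES (1001, 'Steve', 23)")

try:
    # 500 fails the CHECK condition.
    conn.execute("INSERT INTO STUDENT (ROLL_NO, STU_NAME, STU_AGE) VALUES (500, 'John', 24)")
    low_roll_no_rejected = False
except sqlite3.IntegrityError:
    low_roll_no_rejected = True

roll_numbers = [r[0] for r in conn.execute("SELECT ROLL_NO FROM STUDENT")]
```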

Key constraints:

PRIMARY KEY:

A primary key uniquely identifies each record in a table. It must have unique values and
cannot contain nulls. In the example below the ROLL_NO field is marked as the primary
key, which means the ROLL_NO field cannot have duplicate or null values.

CREATE TABLE STUDENT(
ROLL_NO INT NOT NULL,
STU_NAME VARCHAR (35) NOT NULL UNIQUE,
STU_AGE INT NOT NULL,
STU_ADDRESS VARCHAR (35) UNIQUE,
PRIMARY KEY (ROLL_NO)
);

FOREIGN KEY:

Foreign keys are the columns of a table that point to the primary key of another table.
They act as a cross-reference between tables.

Domain constraints:

A table in a DBMS is a set of rows and columns that contain data. Each column in a table
has a unique name and is often referred to as an attribute. A domain is the set of values
permitted for an attribute in a table. For example, a domain of month-of-year can accept
January, February, ..., December as possible values, and a domain of integers can accept
whole numbers that are negative, positive or zero.

Definition: A domain constraint acts as a user-defined data type, and we can define it
like this:

Domain Constraint = data type + Constraints (NOT NULL / UNIQUE / PRIMARY KEY
/ FOREIGN KEY / CHECK / DEFAULT)

Example:
For example, to create a table “student_info” whose “stu_id” field must have a value
greater than 100, I can create a domain and table like this (CREATE DOMAIN is part of
standard SQL and is supported by PostgreSQL, for example, but not by every DBMS):

create domain id_value int
constraint id_test
check(value > 100);

create table student_info (
stu_id id_value PRIMARY KEY,
stu_name varchar(30),
stu_age int
);

Mapping constraints in DBMS

Mapping Cardinality:
One to One: An entity of entity-set A can be associated with at most one entity of entity-
set B and an entity in entity-set B can be associated with at most one entity of entity-set
A.

One to Many: An entity of entity-set A can be associated with any number of entities of
entity-set B and an entity in entity-set B can be associated with at most one entity of
entity-set A.

Many to One: An entity of entity-set A can be associated with at most one entity of
entity-set B and an entity in entity-set B can be associated with any number of entities of
entity-set A.

Many to Many: An entity of entity-set A can be associated with any number of entities
of entity-set B and an entity in entity-set B can be associated with any number of entities
of entity-set A.

We can have these constraints in place while creating tables in database.

Example:

CREATE TABLE Customer (
customer_id int PRIMARY KEY NOT NULL,
first_name varchar(20),
last_name varchar(20)
);

-- ORDER is a reserved word, so the table name is bracket-quoted
-- (SQL Server syntax, matching the dbo schema prefix used below).
CREATE TABLE [Order] (
order_id int PRIMARY KEY NOT NULL,
customer_id int,
order_details varchar(50),
constraint fk_Customers foreign key (customer_id)
references dbo.Customer (customer_id)
);

Assuming that a customer places more than one order, the above relation represents a
one-to-many relationship. Similarly, we can achieve other mapping constraints based on
the requirements.
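One common way to obtain a different mapping cardinality is to add UNIQUE to the foreign key column, which turns one-to-many into one-to-one. The sketch below shows both side by side in an in-memory SQLite database through Python's sqlite3 module. The Orders and Loyalty_Card tables and the sample rows are hypothetical names invented for this illustration; they do not come from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable enforcement
conn.executescript("""
    CREATE TABLE Customer (customer_id INTEGER PRIMARY KEY, first_name TEXT);

    -- One to many: many orders may carry the same customer_id.
    CREATE TABLE Orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES Customer (customer_id));

    -- One to one: UNIQUE lets each customer_id appear at most once.
    CREATE TABLE Loyalty_Card (
        card_id     INTEGER PRIMARY KEY,
        customer_id INTEGER UNIQUE REFERENCES Customer (customer_id));

    INSERT INTO Customer VALUES (1, 'Asha');
    INSERT INTO Orders VALUES (10, 1), (11, 1);
    INSERT INTO Loyalty_Card VALUES (500, 1);
""")

try:
    # A second card for the same customer breaks the one-to-one mapping.
    conn.execute("INSERT INTO Loyalty_Card VALUES (501, 1)")
    second_card_rejected = False
except sqlite3.IntegrityError:
    second_card_rejected = True

order_count = conn.execute(
    "SELECT COUNT(*) FROM Orders WHERE customer_id = 1").fetchone()[0]
```

A many-to-many mapping would normally need a third (junction) table holding pairs of keys, which is not shown here.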



Integrity Constraints
o Integrity constraints are a set of rules used to maintain the quality of
information.
o Integrity constraints ensure that data insertion, updating, and other processes
are performed in such a way that data integrity is not affected.
o Thus, integrity constraints guard against accidental damage to the
database.

Types of Integrity Constraint

1. Domain constraints
o Domain constraints can be defined as the definition of a valid set of values for an
attribute.
o The data type of domain includes string, character, integer, time, date, currency,
etc. The value of the attribute must be available in the corresponding domain.

Example:



2. Entity integrity constraints
o The entity integrity constraint states that primary key value can't be null.
o This is because the primary key value is used to identify individual rows in
relation and if the primary key has a null value, then we can't identify those rows.
o A table can contain a null value other than the primary key field.

Example:

3. Referential Integrity Constraints


o A referential integrity constraint is specified between two tables.
o In the Referential integrity constraints, if a foreign key in Table 1 refers to the
Primary Key of Table 2, then every value of the Foreign Key in Table 1 must be
null or be available in Table 2.

Example:



4. Key constraints
o A key is a set of attributes used to uniquely identify an entity within its entity
set.
o An entity set can have multiple keys, but only one of them is the primary key. A
primary key value must be unique and cannot be null in the relational table.

Example:



DBMS Relational Algebra

What is Relational Algebra in DBMS?

Relational algebra is a procedural query language that works on the relational model.
The purpose of a query language is to retrieve data from a database or perform
operations such as insert, update and delete on the data. Saying that relational algebra
is a procedural query language means that it tells what data is to be retrieved and how
it is to be retrieved.

On the other hand, relational calculus is a non-procedural query language, which means
it tells what data is to be retrieved but doesn't tell how to retrieve it. We will
discuss relational calculus later.

Types of operations in relational algebra

We have divided these operations in two categories:


1. Basic Operations
2. Derived Operations

Basic/Fundamental Operations:

1. Select (σ)
2. Project (∏)
3. Union (∪)
4. Set Difference (-)
5. Cartesian product (X)
6. Rename (ρ)

Derived Operations:

1. Natural Join (⋈)
2. Left, Right, Full outer join (⟕, ⟖, ⟗)
3. Intersection (∩)
4. Division (÷)

Let's discuss these operations one by one with the help of examples.

Select Operator (σ)


Select Operator is denoted by sigma (σ) and it is used to find the tuples (or rows) in a
relation (or table) which satisfy the given condition.

If you understand little bit of SQL then you can think of it as a where clause in SQL,
which is used for the same purpose.

Syntax of Select Operator (σ)

σ Condition/Predicate(Relation/Table name)

Select Operator (σ) Example

Table: CUSTOMER

Customer_Id  Customer_Name  Customer_City
-----------  -------------  -------------
C10100       Steve          Agra
C10111       Raghu          Agra
C10115       Chaitanya      Noida
C10117       Ajeet          Delhi
C10118       Carl           Delhi

Query:

σ Customer_City="Agra" (CUSTOMER)

Output:

Customer_Id  Customer_Name  Customer_City
-----------  -------------  -------------
C10100       Steve          Agra
C10111       Raghu          Agra
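For readers who prefer code to notation, the select operator can be approximated in Python as a filter over a list of rows. This is only a sketch: the select function and the dictionary representation of tuples are choices made for this illustration, not a standard API.

```python
columns = ("Customer_Id", "Customer_Name", "Customer_City")
CUSTOMER = [dict(zip(columns, values)) for values in [
    ("C10100", "Steve", "Agra"),
    ("C10111", "Raghu", "Agra"),
    ("C10115", "Chaitanya", "Noida"),
    ("C10117", "Ajeet", "Delhi"),
    ("C10118", "Carl", "Delhi"),
]]

def select(relation, predicate):
    """σ: keep only the tuples for which the predicate holds."""
    return [row for row in relation if predicate(row)]

# σ Customer_City="Agra" (CUSTOMER)
agra_customers = select(CUSTOMER, lambda row: row["Customer_City"] == "Agra")
agra_ids = [row["Customer_Id"] for row in agra_customers]
```

The predicate plays the same role as the condition written as a subscript of σ, or the WHERE clause in SQL.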
Project Operator (∏)

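Project operator is denoted by the ∏ symbol and it is used to select desired columns (or
attributes) from a table (or relation).

The project operator in relational algebra is similar to the column list of a SELECT
statement in SQL, except that duplicate rows are removed from the result (as with
SELECT DISTINCT).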

Syntax of Project Operator (∏)

∏ column_name1, column_name2, ...., column_nameN(table_name)


Project Operator (∏) Example

In this example, we have a table CUSTOMER with three columns, we want to fetch only
two columns of the table, which we can do with the help of Project Operator ∏.

Table: CUSTOMER

Customer_Id  Customer_Name  Customer_City
-----------  -------------  -------------
C10100       Steve          Agra
C10111       Raghu          Agra
C10115       Chaitanya      Noida
C10117       Ajeet          Delhi
C10118       Carl           Delhi

Query:

∏ Customer_Name, Customer_City (CUSTOMER)

Output:

Customer_Name  Customer_City
-------------  -------------
Steve          Agra
Raghu          Agra
Chaitanya      Noida
Ajeet          Delhi
Carl           Delhi
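The projection above can be sketched in Python as picking the requested columns from every row and dropping repeated results, since a relation is a set of tuples. The project function below is a hypothetical helper written for this illustration.

```python
columns = ("Customer_Id", "Customer_Name", "Customer_City")
CUSTOMER = [dict(zip(columns, values)) for values in [
    ("C10100", "Steve", "Agra"),
    ("C10111", "Raghu", "Agra"),
    ("C10115", "Chaitanya", "Noida"),
    ("C10117", "Ajeet", "Delhi"),
    ("C10118", "Carl", "Delhi"),
]]

def project(relation, wanted):
    """∏: keep the listed columns and drop duplicate result tuples."""
    seen, result = set(), []
    for row in relation:
        picked = tuple(row[name] for name in wanted)
        if picked not in seen:  # a relation never holds the same tuple twice
            seen.add(picked)
            result.append(picked)
    return result

# ∏ Customer_Name, Customer_City (CUSTOMER)
names_and_cities = project(CUSTOMER, ["Customer_Name", "Customer_City"])
# Projecting on the city alone shows the duplicate removal.
cities = project(CUSTOMER, ["Customer_City"])
```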
Union Operator (∪)

Union operator is denoted by ∪ symbol and it is used to select all the rows (tuples) from
two tables (relations).

Lets discuss union operator a bit more. Lets say we have two relations R1 and R2 both
have same columns and we want to select all the tuples(rows) from these relations then
we can apply the union operator on these relations.

Note: The rows (tuples) that are present in both the tables will only appear once in the
union set. In short you can say that there are no duplicates present after the union
operation.

Syntax of Union Operator (∪)

table_name1 ∪ table_name2
Union Operator (∪) Example

Table 1: COURSE

Course_Id  Student_Name  Student_Id
---------  ------------  ----------
C101       Aditya        S901
C104       Aditya        S901
C106       Steve         S911
C109       Paul          S921
C115       Lucy          S931

Table 2: STUDENT

Student_Id  Student_Name  Student_Age
----------  ------------  -----------
S901        Aditya        19
S911        Steve         18
S921        Paul          19
S931        Lucy          17
S941        Carl          16
S951        Rick          18

Query:

∏ Student_Name (COURSE) ∪ ∏ Student_Name (STUDENT)

Output:

Student_Name
------------
Aditya
Carl
Paul
Lucy
Rick
Steve
Note: As you can see there are no duplicate names present in the output even though we
had few common names in both the tables, also in the COURSE table we had the
duplicate name itself.
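Because relations behave like sets, the union query can be mirrored with Python's built-in set type. This is a sketch under that assumption; the tuple layout simply follows the column order of the two tables.

```python
COURSE = [("C101", "Aditya", "S901"), ("C104", "Aditya", "S901"),
          ("C106", "Steve", "S911"), ("C109", "Paul", "S921"),
          ("C115", "Lucy", "S931")]
STUDENT = [("S901", "Aditya", 19), ("S911", "Steve", 18), ("S921", "Paul", 19),
           ("S931", "Lucy", 17), ("S941", "Carl", 16), ("S951", "Rick", 18)]

# ∏ Student_Name (COURSE): the set drops the repeated "Aditya".
course_names = {name for (_course_id, name, _student_id) in COURSE}
# ∏ Student_Name (STUDENT)
student_names = {name for (_student_id, name, _age) in STUDENT}

# ∪: every name that appears in either relation, each listed once.
all_names = sorted(course_names | student_names)
```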

Intersection Operator (∩)

Intersection operator is denoted by ∩ symbol and it is used to select common rows
(tuples) from two tables (relations).

Lets say we have two relations R1 and R2 both have same columns and we want to select
all those tuples(rows) that are present in both the relations, then in that case we can apply
intersection operation on these two relations R1 ∩ R2.

Note: Only those rows that are present in both the tables will appear in the result set.

Syntax of Intersection Operator (∩)

table_name1 ∩ table_name2
Intersection Operator (∩) Example

Lets take the same example that we have taken above.


Table 1: COURSE

Course_Id  Student_Name  Student_Id
---------  ------------  ----------
C101       Aditya        S901
C104       Aditya        S901
C106       Steve         S911
C109       Paul          S921
C115       Lucy          S931

Table 2: STUDENT

Student_Id  Student_Name  Student_Age
----------  ------------  -----------
S901        Aditya        19
S911        Steve         18
S921        Paul          19
S931        Lucy          17
S941        Carl          16
S951        Rick          18

Query:

∏ Student_Name (COURSE) ∩ ∏ Student_Name (STUDENT)

Output:

Student_Name
------------
Aditya
Steve
Paul
Lucy
Set Difference (-)
Set Difference is denoted by – symbol. Lets say we have two relations R1 and R2 and we
want to select all those tuples(rows) that are present in Relation R1 but not present in
Relation R2, this can be done using Set difference R1 – R2.

Syntax of Set Difference (-)

table_name1 - table_name2
Set Difference (-) Example

Lets take the same tables COURSE and STUDENT that we have seen above.

Query:
Lets write a query to select those student names that are present in STUDENT table but
not present in COURSE table.

∏ Student_Name (STUDENT) - ∏ Student_Name (COURSE)


Output:

Student_Name
------------
Carl
Rick
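Intersection and set difference map directly onto Python's set operators & and -. The sketch below repeats the two queries on the COURSE and STUDENT data; treating each projected column as a Python set is an assumption made for this illustration.

```python
COURSE = [("C101", "Aditya", "S901"), ("C104", "Aditya", "S901"),
          ("C106", "Steve", "S911"), ("C109", "Paul", "S921"),
          ("C115", "Lucy", "S931")]
STUDENT = [("S901", "Aditya", 19), ("S911", "Steve", 18), ("S921", "Paul", 19),
           ("S931", "Lucy", 17), ("S941", "Carl", 16), ("S951", "Rick", 18)]

course_names = {name for (_course_id, name, _student_id) in COURSE}
student_names = {name for (_student_id, name, _age) in STUDENT}

# ∏ Student_Name (COURSE) ∩ ∏ Student_Name (STUDENT)
in_both = sorted(course_names & student_names)

# ∏ Student_Name (STUDENT) - ∏ Student_Name (COURSE)
only_in_student = sorted(student_names - course_names)

# Difference is not symmetric: no course name is missing from STUDENT.
only_in_course = sorted(course_names - student_names)
```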
Cartesian product (X)

The Cartesian product is denoted by the X symbol. Let's say we have two relations R1 and
R2; the Cartesian product of these two relations (R1 X R2) combines each tuple of the
first relation R1 with each tuple of the second relation R2. The example that follows
makes this concrete.
Syntax of Cartesian product (X)

R1 X R2
Cartesian product (X) Example

Table 1: R

Col_A Col_B
----- ------
AA 100
BB 200
CC 300
Table 2: S

Col_X Col_Y
----- -----
XX 99
YY 11
ZZ 101
Query:
Let's find the Cartesian product of tables R and S.

R X S

Output:

Col_A  Col_B  Col_X  Col_Y
-----  -----  -----  -----
AA     100    XX     99
AA     100    YY     11
AA     100    ZZ     101
BB     200    XX     99
BB     200    YY     11
BB     200    ZZ     101
CC     300    XX     99
CC     300    YY     11
CC     300    ZZ     101

Note: The number of rows in the output is always the product of the number of rows in
each table. In our example table 1 has 3 rows and table 2 has 3 rows, so the output
has 3×3 = 9 rows.
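The same pairing can be reproduced with itertools.product from Python's standard library, which yields every combination of one tuple from R with one tuple from S. Representing each row as a plain tuple is an assumption made for this sketch.

```python
from itertools import product

R = [("AA", 100), ("BB", 200), ("CC", 300)]   # columns: Col_A, Col_B
S = [("XX", 99), ("YY", 11), ("ZZ", 101)]     # columns: Col_X, Col_Y

# R X S: concatenate each tuple of R with each tuple of S.
r_cross_s = [r_row + s_row for r_row, s_row in product(R, S)]

row_count = len(r_cross_s)  # equals len(R) * len(S)
```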

Rename (ρ)

Rename (ρ) operation can be used to rename a relation or an attribute of a relation.

Rename (ρ) Syntax:
ρ(new_relation_name, old_relation_name)


Rename (ρ) Example

Let's say we have a table CUSTOMER. We are fetching customer names and renaming
the resulting relation to CUST_NAMES.

Table: CUSTOMER

Customer_Id  Customer_Name  Customer_City
-----------  -------------  -------------
C10100       Steve          Agra
C10111       Raghu          Agra
C10115       Chaitanya      Noida
C10117       Ajeet          Delhi
C10118       Carl           Delhi
Query:

ρ(CUST_NAMES, ∏(Customer_Name)(CUSTOMER))
Output:

CUST_NAMES
----------
Steve
Raghu
Chaitanya
Ajeet
Carl

DBMS Relational Calculus

What is Relational Calculus?

Relational calculus is a non-procedural query language that tells the system what data to
be retrieved but doesn’t tell how to retrieve it.

Types of Relational Calculus



1. Tuple Relational Calculus (TRC)

Tuple relational calculus is used for selecting those tuples that satisfy the given
condition.

Table: Student

First_Name  Last_Name  Age
----------  ---------  ---
Ajeet       Singh      30
Chaitanya   Singh      31
Rajeev      Bhatia     27
Carl        Pratap     28

Let's write relational calculus queries.

Query to display the last name of those students whose age is greater than 30

{ t.Last_Name | Student(t) AND t.Age > 30 }


In the above query you can see two parts separated by | symbol. The second part is where
we define the condition and in the first part we specify the fields which we want to
display for the selected tuples.

The result of the above query would be:

Last_Name
---------
Singh
Query to display all the details of students where Last name is ‘Singh’

{ t | Student(t) AND t.Last_Name = 'Singh' }

Output:

First_Name  Last_Name  Age
----------  ---------  ---
Ajeet       Singh      30
Chaitanya   Singh      31
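A Python list comprehension reads almost like a tuple relational calculus expression: the part before "for" says what to return and the part after "if" states the condition. The sketch below restates the two queries that way; the namedtuple representation is an assumption made for this illustration.

```python
from collections import namedtuple

Student = namedtuple("Student", ["First_Name", "Last_Name", "Age"])
STUDENT = [
    Student("Ajeet", "Singh", 30),
    Student("Chaitanya", "Singh", 31),
    Student("Rajeev", "Bhatia", 27),
    Student("Carl", "Pratap", 28),
]

# { t.Last_Name | Student(t) AND t.Age > 30 }
last_names_over_30 = [t.Last_Name for t in STUDENT if t.Age > 30]

# { t | Student(t) AND t.Last_Name = 'Singh' }
singh_tuples = [t for t in STUDENT if t.Last_Name == "Singh"]
```

Note that Ajeet, aged exactly 30, does not satisfy Age > 30, which is why the first query returns a single row.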
2. Domain Relational Calculus (DRC)

In domain relational calculus, the query variables range over attribute values
(domains) rather than over whole tuples.
Again we take the same table to understand how DRC works.

Table: Student

First_Name  Last_Name  Age
----------  ---------  ---
Ajeet       Singh      30
Chaitanya   Singh      31
Rajeev      Bhatia     27
Carl        Pratap     28

Query to find the first name and age of students where student age is greater than 27

{< First_Name, Age > | < First_Name, Last_Name, Age > ∈ Student ∧ Age > 27}

Note:
The symbols used for logical operators are: ∧ for AND, ∨ for OR and ¬ for NOT.

Output:

First_Name Age
---------- ----
Ajeet 30
Chaitanya 31
Carl 28
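In Python terms, the difference from the tuple version is that each attribute gets its own variable. The sketch below unpacks every row into first_name, last_name and age, which stand in for the domain variables of the query; the variable names are chosen for this illustration.

```python
STUDENT = [
    ("Ajeet", "Singh", 30),
    ("Chaitanya", "Singh", 31),
    ("Rajeev", "Bhatia", 27),
    ("Carl", "Pratap", 28),
]

# {< First_Name, Age > | < First_Name, Last_Name, Age > ∈ Student ∧ Age > 27}
older_than_27 = [(first_name, age)
                 for (first_name, last_name, age) in STUDENT
                 if age > 27]
```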

Join Operations:

A Join operation combines related tuples from different relations, if and only if a given
join condition is satisfied. It is denoted by ⋈.



Example:

EMPLOYEE

EMP_CODE  EMP_NAME
--------  --------
101       Stephan
102       Jack
103       Harry

SALARY

EMP_CODE  SALARY
--------  ------
101       50000
102       30000
103       25000

1. Operation: (EMPLOYEE ⋈ SALARY)

Result:

EMP_CODE  EMP_NAME  SALARY
--------  --------  ------
101       Stephan   50000
102       Jack      30000
103       Harry     25000

Types of Join operations:

1. Natural Join:
o A natural join is the set of tuples of all combinations in R and S that are equal on
their common attribute names.
o It is denoted by ⋈.

Example: Let's use the above EMPLOYEE table and SALARY table:

Input:

∏ EMP_NAME, SALARY (EMPLOYEE ⋈ SALARY)

Output:

EMP_NAME  SALARY
--------  ------
Stephan   50000
Jack      30000
Harry     25000
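
A natural join followed by a projection can be sketched in a few lines of Python. The function `natural_join` is a hypothetical helper written for this illustration, not part of any library; relations are represented as lists of dictionaries.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    if not r or not s:
        return []
    # Attribute names that appear in both relations.
    common = sorted(set(r[0]) & set(s[0]))
    joined = []
    for tr in r:
        for ts in s:
            # Keep the pair only if it agrees on every common attribute.
            if all(tr[a] == ts[a] for a in common):
                joined.append({**tr, **ts})
    return joined


employee = [
    {"EMP_CODE": 101, "EMP_NAME": "Stephan"},
    {"EMP_CODE": 102, "EMP_NAME": "Jack"},
    {"EMP_CODE": 103, "EMP_NAME": "Harry"},
]
salary = [
    {"EMP_CODE": 101, "SALARY": 50000},
    {"EMP_CODE": 102, "SALARY": 30000},
    {"EMP_CODE": 103, "SALARY": 25000},
]

joined = natural_join(employee, salary)

# Projection on EMP_NAME and SALARY.
projected = [(t["EMP_NAME"], t["SALARY"]) for t in joined]
print(projected)
```

The nested loop compares every pair of tuples, which is enough for a small example; a real DBMS chooses a more efficient join strategy.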

2. Outer Join:

The outer join operation is an extension of the join operation. It is used to deal with
missing information.

Example:

EMPLOYEE

EMP_NAME  STREET       CITY
--------  -----------  ---------
Ram       Civil line   Mumbai
Shyam     Park street  Kolkata
Ravi      M.G. Street  Delhi
Hari      Nehru nagar  Hyderabad

FACT_WORKERS

EMP_NAME  BRANCH   SALARY
--------  -------  ------
Ram       Infosys  10000
Shyam     Wipro    20000
Kuber     HCL      30000
Hari      TCS      50000

Input:

(EMPLOYEE ⋈ FACT_WORKERS)

Output:

EMP_NAME  STREET       CITY       BRANCH   SALARY
--------  -----------  ---------  -------  ------
Ram       Civil line   Mumbai     Infosys  10000
Shyam     Park street  Kolkata    Wipro    20000
Hari      Nehru nagar  Hyderabad  TCS      50000

An outer join is of three types:

a. Left outer join
b. Right outer join
c. Full outer join

a. Left outer join:

o Left outer join contains the set of tuples of all combinations in R and S that are
equal on their common attribute names.
o In addition, it keeps the tuples in R that have no matching tuples in S; the
attributes that come from S are filled with NULL for these tuples.
o It is denoted by ⟕.

Example: Using the above EMPLOYEE table and FACT_WORKERS table

Input:

EMPLOYEE ⟕ FACT_WORKERS

Output:

EMP_NAME  STREET       CITY       BRANCH   SALARY
--------  -----------  ---------  -------  ------
Ram       Civil line   Mumbai     Infosys  10000
Shyam     Park street  Kolkata    Wipro    20000
Hari      Nehru nagar  Hyderabad  TCS      50000
Ravi      M.G. Street  Delhi      NULL     NULL

b. Right outer join:

o Right outer join contains the set of tuples of all combinations in R and S that are
equal on their common attribute names.
o In addition, it keeps the tuples in S that have no matching tuples in R; the
attributes that come from R are filled with NULL for these tuples.
o It is denoted by ⟖.

Example: Using the above EMPLOYEE table and FACT_WORKERS table

Input:

EMPLOYEE ⟖ FACT_WORKERS

Output:

EMP_NAME  BRANCH   SALARY  STREET       CITY
--------  -------  ------  -----------  ---------
Ram       Infosys  10000   Civil line   Mumbai
Shyam     Wipro    20000   Park street  Kolkata
Hari      TCS      50000   Nehru nagar  Hyderabad
Kuber     HCL      30000   NULL         NULL

c. Full outer join:

o Full outer join is like a left or right join except that it contains all rows from both
tables.
o In full outer join, the result also contains the tuples in R that have no matching
tuples in S and the tuples in S that have no matching tuples in R on their common
attribute names; the missing attributes are filled with NULL.
o It is denoted by ⟗.

Example: Using the above EMPLOYEE table and FACT_WORKERS table

Input:

EMPLOYEE ⟗ FACT_WORKERS

Output:

EMP_NAME  STREET       CITY       BRANCH   SALARY
--------  -----------  ---------  -------  ------
Ram       Civil line   Mumbai     Infosys  10000
Shyam     Park street  Kolkata    Wipro    20000
Hari      Nehru nagar  Hyderabad  TCS      50000
Ravi      M.G. Street  Delhi      NULL     NULL
Kuber     NULL         NULL       HCL      30000
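
The three outer joins differ only in which unmatched tuples they keep. The Python sketch below makes that explicit; `outer_join` and its parameters are hypothetical names chosen for this illustration, and Python's `None` stands in for NULL.

```python
employee = [
    {"EMP_NAME": "Ram", "STREET": "Civil line", "CITY": "Mumbai"},
    {"EMP_NAME": "Shyam", "STREET": "Park street", "CITY": "Kolkata"},
    {"EMP_NAME": "Ravi", "STREET": "M.G. Street", "CITY": "Delhi"},
    {"EMP_NAME": "Hari", "STREET": "Nehru nagar", "CITY": "Hyderabad"},
]
fact_workers = [
    {"EMP_NAME": "Ram", "BRANCH": "Infosys", "SALARY": 10000},
    {"EMP_NAME": "Shyam", "BRANCH": "Wipro", "SALARY": 20000},
    {"EMP_NAME": "Kuber", "BRANCH": "HCL", "SALARY": 30000},
    {"EMP_NAME": "Hari", "BRANCH": "TCS", "SALARY": 50000},
]


def outer_join(r, s, key, keep_left, keep_right):
    """Join r and s on `key`, optionally keeping unmatched tuples padded with None."""
    r_attrs = [a for a in r[0] if a != key]
    s_attrs = [a for a in s[0] if a != key]
    rows = []
    matched_s = set()
    for tr in r:
        partners = [ts for ts in s if ts[key] == tr[key]]
        for ts in partners:
            rows.append({**tr, **ts})
            matched_s.add(ts[key])
        if not partners and keep_left:
            # Unmatched tuple of R: attributes of S become NULL.
            rows.append({**tr, **{a: None for a in s_attrs}})
    if keep_right:
        for ts in s:
            if ts[key] not in matched_s:
                # Unmatched tuple of S: attributes of R become NULL.
                rows.append({key: ts[key], **{a: None for a in r_attrs}, **ts})
    return rows


left = outer_join(employee, fact_workers, "EMP_NAME", True, False)
right = outer_join(employee, fact_workers, "EMP_NAME", False, True)
full = outer_join(employee, fact_workers, "EMP_NAME", True, True)

for row in full:
    print(row)
```

The rows may come out in a different order than in the tables above; since a relation is a set of tuples, the order does not matter.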

3. Equi join:

An equi join is an inner join whose join condition uses only the equality comparison
operator (=). It is the most common join. It keeps the pairs of tuples that match as per
the equality condition.

Example:

CUSTOMER

CLASS_ID  NAME
--------  -------
1         John
2         Harry
3         Jackson

PRODUCT

PRODUCT_ID  CITY
----------  ------
1           Delhi
2           Mumbai
3           Noida

Input:

CUSTOMER ⋈ (CUSTOMER.CLASS_ID = PRODUCT.PRODUCT_ID) PRODUCT

Output:

CLASS_ID  NAME     PRODUCT_ID  CITY
--------  -------  ----------  ------
1         John     1           Delhi
2         Harry    2           Mumbai
3         Jackson  3           Noida
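
Because the two relations in this example share no attribute name, the equality condition has to be written out explicitly. A minimal Python sketch, assuming the join condition is CLASS_ID = PRODUCT_ID as the output table suggests:

```python
customer = [(1, "John"), (2, "Harry"), (3, "Jackson")]      # (CLASS_ID, NAME)
product = [(1, "Delhi"), (2, "Mumbai"), (3, "Noida")]       # (PRODUCT_ID, CITY)

# Pair every customer tuple with every product tuple and keep the pairs
# that satisfy the equality condition CLASS_ID = PRODUCT_ID.
equi = [
    (class_id, name, product_id, city)
    for (class_id, name) in customer
    for (product_id, city) in product
    if class_id == product_id
]

print(equi)
```

Unlike the natural join, the equi join keeps both compared columns in the result.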

DBMS UNIT 3 - SQL

SQL (Structured Query Language) is used to perform operations on the records stored in
the database such as updating records, deleting records, creating and modifying tables,
views, etc.

SQL is just a query language; it is not a database. To perform SQL queries, you need to
install a relational database system, for example, Oracle, MySQL, PostgreSQL, SQL
Server, DB2, etc.

What is SQL

o SQL stands for Structured Query Language.
o It is designed for managing data in a relational database management system
(RDBMS).
o It is pronounced as S-Q-L or sometimes "sequel".
o SQL is a database language; it is used for database creation, deletion, fetching
rows, modifying rows, etc.
o SQL is based on relational algebra and tuple relational calculus.

Relational DBMSs such as MySQL, Oracle, MS Access, Sybase, Informix, PostgreSQL, and SQL
Server use SQL as their standard database language.

Why SQL is required

SQL is required:

o To create new databases, tables and views
o To insert records in a database
o To update records in a database
o To delete records from a database
o To retrieve data from a database

What SQL does

o With SQL, we can query our database in several ways, using English-like
statements.
o With SQL, a user can access data from a relational database management system.
o It allows the user to describe the data.
o It allows the user to define the data in the database and manipulate it when needed.
o It allows the user to create and drop databases and tables.
o It allows the user to create views, stored procedures, and functions in a database.
o It allows the user to set permissions on tables, procedures, and views.

SQL Syntax

SQL follows a set of rules and guidelines called syntax. Here, we are
providing the basic SQL syntax.

o SQL keywords are not case sensitive. Generally SQL keywords are written in uppercase.
o SQL statements are not tied to text lines. We can place a single SQL statement
on one or multiple text lines.
o You can perform most of the actions in a database with SQL statements.
o SQL depends on relational algebra and tuple relational calculus.

SQL statement

SQL statements are started with any of the SQL commands/keywords like SELECT,
INSERT, UPDATE, DELETE, ALTER, DROP etc. and the statement ends with a
semicolon (;).

Example of SQL statement:

SELECT "column_name" FROM "table_name";

Why semicolon is used after SQL statements:

A semicolon is used to separate SQL statements. It is the standard way to separate SQL
statements in database systems that allow more than one SQL statement to be executed in
the same call.

In this tutorial, we will use semicolon at the end of each SQL statement.
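
The points about keyword case, line breaks and semicolons can be tried out with Python's built-in `sqlite3` module. SQLite is used here only because it needs no installation; the table `t` and its column `x` are made-up names for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two statements in one call, separated by semicolons.
# Keywords are written in mixed case and one statement spans two lines.
conn.executescript("""
    CREATE TABLE t (x INTEGER);
    insert into t (x)
        values (1);
""")

rows = conn.execute("select x from t").fetchall()
print(rows)
conn.close()
```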

SQL Commands

These are some important SQL commands:

o SELECT: it extracts data from a database.
o UPDATE: it updates data in a database.
o DELETE: it deletes data from a database.
o CREATE TABLE: it creates a new table.
o ALTER TABLE: it is used to modify a table.
o DROP TABLE: it deletes a table.
o CREATE DATABASE: it creates a new database.
o ALTER DATABASE: it is used to modify a database.
o INSERT INTO: it inserts new data into a database.
o CREATE INDEX: it is used to create an index (search key).
o DROP INDEX: it deletes an index.

SQL Data Types

Data types are used to represent the nature of the data that can be stored in a database
table. For example, if we want to store string data in a particular column of a table,
then we have to declare a string data type for this column.

Data types are mainly classified into three categories for every database.

o String Data types
o Numeric Data types
o Date and time Data types

Data Types in MySQL, SQL Server and Oracle Databases

MySQL Data Types

The following is a list of data types used in the MySQL database. It is based on MySQL 8.0.

MySQL String Data Types

CHAR(size): It is used to specify a fixed length string that can contain numbers, letters,
and special characters. Its size can be 0 to 255 characters. Default is 1.

VARCHAR(size): It is used to specify a variable length string that can contain numbers,
letters, and special characters. Its size can be from 0 to 65535 characters.

BINARY(size): It is equal to CHAR() but stores binary byte strings. Its size parameter
specifies the column length in bytes. Default is 1.

VARBINARY(size): It is equal to VARCHAR() but stores binary byte strings. Its size
parameter specifies the maximum column length in bytes.

TEXT(size): It holds a string with a maximum length of 65,535 bytes.

TINYTEXT: It holds a string with a maximum length of 255 characters.

MEDIUMTEXT: It holds a string with a maximum length of 16,777,215 characters.

LONGTEXT: It holds a string with a maximum length of 4,294,967,295 characters.

ENUM(val1, val2, val3, ...): It is used for a string object that can have only one value,
chosen from a list of possible values. An ENUM list can contain up to 65535 values. If you
insert a value that is not in the list, a blank value will be inserted.

SET(val1, val2, val3, ...): It is used to specify a string that can have 0 or more values,
chosen from a list of possible values. You can list up to 64 values in a SET list.

BLOB(size): It is used for BLOBs (Binary Large Objects). It can hold up to 65,535 bytes.

MySQL Numeric Data Types

BIT(size): It is used for a bit-value type. The number of bits per value is specified in
size. Its size can be 1 to 64. The default value is 1.

INT(size): It is used for the integer value. Its signed range varies from -2147483648 to
2147483647 and its unsigned range varies from 0 to 4294967295. The size parameter
specifies the display width; the maximum display width is 255.

INTEGER(size): It is equal to INT(size).

FLOAT(size, d): It is used to specify a floating point number. Its size parameter
specifies the total number of digits. The number of digits after the decimal point is
specified by the d parameter.

FLOAT(p): It is used to specify a floating point number. MySQL uses the p parameter to
determine whether to use FLOAT or DOUBLE. If p is between 0 and 24, the data type becomes
FLOAT(). If p is from 25 to 53, the data type becomes DOUBLE().

DOUBLE(size, d): It is a normal size floating point number. Its size parameter specifies
the total number of digits. The number of digits after the decimal point is specified by
the d parameter.

DECIMAL(size, d): It is used to specify a fixed point number. Its size parameter specifies
the total number of digits. The number of digits after the decimal point is specified by
the d parameter. The maximum value for size is 65, and the default value is 10. The
maximum value for d is 30, and the default value is 0.

DEC(size, d): It is equal to DECIMAL(size, d).

BOOL: It is used to specify Boolean values true and false. Zero is considered as false,
and nonzero values are considered as true.

MySQL Date and Time Data Types

DATE: It is used to specify the date format YYYY-MM-DD. Its supported range is from
'1000-01-01' to '9999-12-31'.

DATETIME(fsp): It is used to specify a date and time combination. Its format is
YYYY-MM-DD hh:mm:ss. Its supported range is from '1000-01-01 00:00:00' to
'9999-12-31 23:59:59'.

TIMESTAMP(fsp): It is used to specify the timestamp. Its value is stored as the number of
seconds since the Unix epoch ('1970-01-01 00:00:00' UTC). Its format is
YYYY-MM-DD hh:mm:ss. Its supported range is from '1970-01-01 00:00:01' UTC to
'2038-01-19 03:14:07' UTC.

TIME(fsp): It is used to specify the time format. Its format is hh:mm:ss. Its supported
range is from '-838:59:59' to '838:59:59'.

YEAR: It is used to specify a year in four-digit format. Values allowed in four-digit
format are from 1901 to 2155, and 0000.

SQL Server Data Types

SQL Server String Data Type

char(n): It is a fixed width character string data type. Its size can be up to 8000
characters.

varchar(n): It is a variable width character string data type. Its size can be up to 8000
characters.

varchar(max): It is a variable width character string data type. Its storage size can be
up to 2^31 - 1 bytes (2 GB).

text: It is a variable width character string data type. Its size can be up to 2GB of
text data.

nchar: It is a fixed width Unicode string data type. Its size can be up to 4000
characters.

nvarchar: It is a variable width Unicode string data type. Its size can be up to 4000
characters.

ntext: It is a variable width Unicode string data type. Its size can be up to 2GB of text
data.

binary(n): It is a fixed width binary string data type. Its size can be up to 8000 bytes.

varbinary: It is a variable width binary string data type. Its size can be up to 8000
bytes.

image: It is also a variable width binary string data type. Its size can be up to 2GB.

SQL Server Numeric Data Types

bit: It is an integer that can be 0, 1 or null.

tinyint: It allows whole numbers from 0 to 255.

smallint: It allows whole numbers between -32,768 and 32,767.

int: It allows whole numbers between -2,147,483,648 and 2,147,483,647.

bigint: It allows whole numbers between -9,223,372,036,854,775,808 and
9,223,372,036,854,775,807.

float(n): It is used to specify floating precision number data from -1.79E+308 to
1.79E+308. The n parameter indicates whether the field should hold 4 or 8 bytes. The
default value of n is 53.

real: It is a floating precision number data type from -3.40E+38 to 3.40E+38.

money: It is used to specify monetary data from -922,337,203,685,477.5808 to
922,337,203,685,477.5807.

SQL Server Date and Time Data Type

datetime: It is used to specify a date and time combination. It supports the range from
January 1, 1753, to December 31, 9999 with an accuracy of 3.33 milliseconds.

datetime2: It is used to specify a date and time combination. It supports the range from
January 1, 0001 to December 31, 9999 with an accuracy of 100 nanoseconds.

date: It is used to store a date only. It supports the range from January 1, 0001 to
December 31, 9999.

time: It stores a time only, to an accuracy of 100 nanoseconds.

timestamp: It stores a unique number when a new row gets created or modified. The
timestamp value is based upon an internal clock and does not correspond to real time.
Each table may contain only one timestamp column.

SQL Server Other Data Types

sql_variant: It is used for various data types except for text, timestamp, and ntext. It
stores up to 8000 bytes of data.

XML: It stores XML formatted data. Maximum 2GB.

cursor: It stores a reference to a cursor used for database operations.

table: It stores a result set for later processing.

uniqueidentifier: It stores a GUID (Globally unique identifier).

Oracle Data Types

Oracle String data types

CHAR(size): It is used to store character data within the predefined length. It can store
up to 2000 bytes.

NCHAR(size): It is used to store national character data within the predefined length. It
can store up to 2000 bytes.

VARCHAR2(size): It is used to store variable string data within the predefined length. It
can store up to 4000 bytes.

VARCHAR(size): It is the same as VARCHAR2(size). You can also use VARCHAR(size), but it is
suggested to use VARCHAR2(size).

NVARCHAR2(size): It is used to store Unicode string data within the predefined length. We
must specify the size of the NVARCHAR2 data type. It can store up to 4000 bytes.

Oracle Numeric Data Types

NUMBER(p, s): It contains precision p and scale s. The precision p can range from 1 to
38, and the scale s can range from -84 to 127.

FLOAT(p): It is a subtype of the NUMBER data type. The precision p can range from 1 to
126.

BINARY_FLOAT: It is used for binary precision (32-bit). It requires 5 bytes, including
the length byte.

BINARY_DOUBLE: It is used for double binary precision (64-bit). It requires 9 bytes,
including the length byte.

Oracle Date and Time Data Types

DATE: It is used to store a valid date-time format with a fixed length. Its range varies
from January 1, 4712 BC to December 31, 9999 AD.

TIMESTAMP: It is used to store the valid date in YYYY-MM-DD with time hh:mm:ss format.

Oracle Large Object Data Types (LOB Types)

BLOB: It is used to specify unstructured binary data. Its range goes up to 2^32 - 1 bytes
or 4 GB.

BFILE: It is used to store binary data in an external file. Its range goes up to
2^32 - 1 bytes or 4 GB.

CLOB: It is used for single-byte character data. Its range goes up to 2^32 - 1 bytes or
4 GB.

NCLOB: It is used to specify single byte or fixed length multibyte national character set
(NCHAR) data. Its range is up to 2^32 - 1 bytes or 4 GB.

RAW(size): It is used to specify variable length raw binary data. Its range is up to 2000
bytes per row. Its maximum size must be specified.

LONG RAW: It is used to specify variable length raw binary data. Its range is up to
2^31 - 1 bytes or 2 GB, per row.

What is an Operator in SQL?

An operator is a reserved word or a character used primarily in an SQL statement's
WHERE clause to perform operation(s), such as comparisons and arithmetic operations.
These operators are used to specify conditions in an SQL statement and to serve as
conjunctions for multiple conditions in a statement.

o Arithmetic operators
o Comparison operators
o Logical operators
o Operators used to negate conditions

SQL Arithmetic Operators

Assume 'variable a' holds 10 and 'variable b' holds 20, then −

Operator            Description                                            Example
------------------  -----------------------------------------------------  -------------------
+ (Addition)        Adds values on either side of the operator.            a + b will give 30
- (Subtraction)     Subtracts right hand operand from left hand operand.   a - b will give -10
* (Multiplication)  Multiplies values on either side of the operator.      a * b will give 200
/ (Division)        Divides left hand operand by right hand operand.       b / a will give 2
% (Modulus)         Divides left hand operand by right hand operand and    b % a will give 0
                    returns remainder.
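
The examples in the table can be checked with a single SELECT statement. The sketch below uses Python's built-in `sqlite3` module as a stand-in database (an assumption of this example; other systems may differ in details such as integer division), with a = 10 and b = 20 passed as named parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# a = 10 and b = 20, as assumed in the table above.
arithmetic = conn.execute(
    "SELECT :a + :b, :a - :b, :a * :b, :b / :a, :b % :a",
    {"a": 10, "b": 20},
).fetchone()

print(arithmetic)
conn.close()
```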

SQL Comparison Operators

Assume 'variable a' holds 10 and 'variable b' holds 20, then −

Operator  Description                                                       Example
--------  ----------------------------------------------------------------  --------------------
=         Checks if the values of two operands are equal or not, if yes     (a = b) is not true.
          then condition becomes true.
!=        Checks if the values of two operands are equal or not, if         (a != b) is true.
          values are not equal then condition becomes true.
<>        Checks if the values of two operands are equal or not, if         (a <> b) is true.
          values are not equal then condition becomes true.
>         Checks if the value of left operand is greater than the value     (a > b) is not true.
          of right operand, if yes then condition becomes true.
<         Checks if the value of left operand is less than the value of     (a < b) is true.
          right operand, if yes then condition becomes true.
>=        Checks if the value of left operand is greater than or equal to   (a >= b) is not true.
          the value of right operand, if yes then condition becomes true.
<=        Checks if the value of left operand is less than or equal to      (a <= b) is true.
          the value of right operand, if yes then condition becomes true.
!<        Checks if the value of left operand is not less than the value    (a !< b) is false.
          of right operand, if yes then condition becomes true.
!>        Checks if the value of left operand is not greater than the       (a !> b) is true.
          value of right operand, if yes then condition becomes true.
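
Most of the comparisons in the table can be evaluated directly in a SELECT statement. This sketch assumes SQLite (through Python's `sqlite3` module), where a comparison returns 1 for true and 0 for false. The operators !< and !> are left out because they are not part of standard SQL and SQLite does not accept them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# a = 10 and b = 20, as assumed in the table above.
comparisons = conn.execute(
    "SELECT :a = :b, :a != :b, :a <> :b, :a > :b, :a < :b, :a >= :b, :a <= :b",
    {"a": 10, "b": 20},
).fetchone()

# Order of results: =, !=, <>, >, <, >=, <=
print(comparisons)
conn.close()
```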

SQL Logical Operators

Here is a list of all the logical operators available in SQL.

Sr.No.  Operator & Description

1   ALL
    The ALL operator is used to compare a value to all values in another value set.

2   AND
    The AND operator allows the existence of multiple conditions in an SQL statement's
    WHERE clause.

3   ANY
    The ANY operator is used to compare a value to any applicable value in the list as
    per the condition.

4   BETWEEN
    The BETWEEN operator is used to search for values that are within a set of values,
    given the minimum value and the maximum value.

5   EXISTS
    The EXISTS operator is used to search for the presence of a row in a specified table
    that meets a certain criterion.

6   IN
    The IN operator is used to compare a value to a list of literal values that have
    been specified.

7   LIKE
    The LIKE operator is used to compare a value to similar values using wildcard
    operators.

8   NOT
    The NOT operator reverses the meaning of the logical operator with which it is used.
    Eg: NOT EXISTS, NOT BETWEEN, NOT IN, etc. This is a negate operator.

9   OR
    The OR operator is used to combine multiple conditions in an SQL statement's WHERE
    clause.

10  IS NULL
    The NULL operator is used to compare a value with a NULL value.

11  UNIQUE
    The UNIQUE operator searches every row of a specified table for uniqueness (no
    duplicates).
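
Several of these operators can be tried out on the CUSTOMERS data used throughout these notes. The sketch below assumes SQLite through Python's `sqlite3` module; `names` is a hypothetical helper written for this example. ALL, ANY and UNIQUE are left out because SQLite does not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INT PRIMARY KEY, NAME TEXT, AGE INT, ADDRESS TEXT, SALARY REAL)"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ramesh", 32, "Ahmedabad", 2000.00),
        (2, "Khilan", 25, "Delhi", 1500.00),
        (3, "kaushik", 23, "Kota", 2000.00),
        (4, "Chaitali", 25, "Mumbai", 6500.00),
        (5, "Hardik", 27, "Bhopal", 8500.00),
        (6, "Komal", 22, "MP", 4500.00),
        (7, "Muffy", 24, "Indore", 10000.00),
    ],
)


def names(condition):
    """Return the customer names that satisfy a WHERE condition, ordered by ID."""
    sql = "SELECT NAME FROM CUSTOMERS WHERE " + condition + " ORDER BY ID"
    return [row[0] for row in conn.execute(sql)]


with_and = names("AGE > 24 AND SALARY > 2000")
with_or = names("AGE < 23 OR SALARY >= 10000")
with_between = names("AGE BETWEEN 25 AND 27")
with_in = names("ADDRESS IN ('Delhi', 'Mumbai')")
with_like = names("NAME LIKE 'Ch%'")
with_not = names("AGE NOT BETWEEN 23 AND 27")
with_exists = names("EXISTS (SELECT 1 FROM CUSTOMERS c WHERE c.SALARY > 9000)")

print(with_and, with_or, with_between, with_in, with_like, with_not, sep="\n")
```

The condition is concatenated into the SQL text only because it is a fixed string in this example; values that come from users should be passed as parameters instead.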

What is RDBMS?

RDBMS stands for Relational Database Management System. RDBMS is the basis for
SQL, and for all modern database systems like MS SQL Server, IBM DB2, Oracle,
MySQL, and Microsoft Access.
A Relational database management system (RDBMS) is a database management system
(DBMS) that is based on the relational model as introduced by E. F. Codd.

What is a table?

The data in an RDBMS is stored in database objects called tables. A table is basically a
collection of related data entries and it consists of numerous columns and rows.
Remember, a table is the most common and simplest form of data storage in a relational
database. The following is an example of a CUSTOMERS table −
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+

What is a field?

Every table is broken up into smaller entities called fields. The fields in the
CUSTOMERS table consist of ID, NAME, AGE, ADDRESS and SALARY.
A field is a column in a table that is designed to maintain specific information about
every record in the table.

What is a Record or a Row?

A record, also called a row of data, is each individual entry that exists in a table. For
example, there are 7 records in the above CUSTOMERS table. Following is a single
row of data or record in the CUSTOMERS table −
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
+----+----------+-----+-----------+----------+
A record is a horizontal entity in a table.

What is a column?

A column is a vertical entity in a table that contains all information associated with a
specific field in a table.
For example, a column in the CUSTOMERS table is ADDRESS, which represents
location description and would be as shown below −
+-----------+
| ADDRESS |
+-----------+
| Ahmedabad |
| Delhi |
| Kota |
| Mumbai |
| Bhopal |
| MP |
| Indore |
+-----------+

What is a NULL value?

A NULL value in a table is a value in a field that appears to be blank, which means a
field with a NULL value is a field with no value.
It is very important to understand that a NULL value is different than a zero value or a
field that contains spaces. A field with a NULL value is the one that has been left blank
during a record creation.
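
The difference between NULL, an empty string, spaces and zero can be seen in a small experiment. The sketch assumes SQLite through Python's `sqlite3` module; the table `t` and its columns are made-up names, and Python's `None` is stored as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, note TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, None), (2, ""), (3, "  "), (4, "0")],  # NULL, empty string, spaces, zero
)

# Only the row that was really left without a value is NULL.
is_null = [row[0] for row in conn.execute("SELECT id FROM t WHERE note IS NULL")]

# Comparing with '=' never matches NULL, because NULL = NULL is not true.
eq_null = [row[0] for row in conn.execute("SELECT id FROM t WHERE note = NULL")]

print(is_null, eq_null)
conn.close()
```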

SQL Constraints

Constraints are the rules enforced on the data columns of a table. These are used to limit
the type of data that can go into a table. This ensures the accuracy and reliability of
the data in the database.
Constraints can either be column level or table level. Column level constraints are
applied only to one column whereas table level constraints are applied to the entire
table.
Following are some of the most commonly used constraints available in SQL −
o NOT NULL Constraint − Ensures that a column cannot have a NULL value.
o DEFAULT Constraint − Provides a default value for a column when none is
specified.
o UNIQUE Constraint − Ensures that all the values in a column are different.
o PRIMARY Key − Uniquely identifies each row/record in a database table.
o FOREIGN Key − Uniquely identifies a row/record in another database table.
o CHECK Constraint − The CHECK constraint ensures that all values in a column
satisfy certain conditions.
o INDEX − Used to create and retrieve data from the database very quickly.
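
The effect of the constraints listed above can be observed by trying to insert rows that break them. This is a sketch under assumptions: it uses SQLite through Python's `sqlite3` module, the EMAIL column and the helper `rejected` are hypothetical additions for the example, and the limits (age at least 18, default salary 5000.00) are arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CUSTOMERS (
        ID      INTEGER PRIMARY KEY,
        NAME    TEXT NOT NULL,
        EMAIL   TEXT UNIQUE,
        AGE     INTEGER CHECK (AGE >= 18),
        SALARY  REAL DEFAULT 5000.00
    )
""")
conn.execute(
    "INSERT INTO CUSTOMERS (ID, NAME, EMAIL, AGE) VALUES (1, 'Ramesh', 'r@example.com', 32)"
)


def rejected(sql):
    """Return True if the database refuses the statement because of a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


violations = {
    "NOT NULL": rejected("INSERT INTO CUSTOMERS (ID, NAME, AGE) VALUES (2, NULL, 25)"),
    "UNIQUE": rejected(
        "INSERT INTO CUSTOMERS (ID, NAME, EMAIL, AGE) VALUES (3, 'Khilan', 'r@example.com', 25)"
    ),
    "CHECK": rejected("INSERT INTO CUSTOMERS (ID, NAME, AGE) VALUES (4, 'Komal', 12)"),
    "PRIMARY KEY": rejected("INSERT INTO CUSTOMERS (ID, NAME, AGE) VALUES (1, 'Muffy', 24)"),
}

# SALARY was not given for Ramesh, so the DEFAULT value is used.
default_salary = conn.execute("SELECT SALARY FROM CUSTOMERS WHERE ID = 1").fetchone()[0]

print(violations, default_salary)
```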

Data Integrity

The following categories of data integrity exist with each RDBMS −

o Entity Integrity − There are no duplicate rows in a table.
o Domain Integrity − Enforces valid entries for a given column by restricting the
type, the format, or the range of values.
o Referential Integrity − Rows that are used (referenced) by other records cannot be
deleted.
o User-Defined Integrity − Enforces some specific business rules that do not fall
into entity, domain or referential integrity.
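
Referential integrity can be demonstrated with a foreign key. In this sketch the ORDERS table is a hypothetical addition, and SQLite (through Python's `sqlite3` module) is assumed; note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.execute("CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE ORDERS (
        OID          INTEGER PRIMARY KEY,
        CUSTOMER_ID  INTEGER NOT NULL REFERENCES CUSTOMERS(ID)
    )
""")
conn.execute("INSERT INTO CUSTOMERS VALUES (1, 'Ramesh')")
conn.execute("INSERT INTO ORDERS VALUES (100, 1)")

# The customer row is used by an order, so it cannot be deleted.
try:
    conn.execute("DELETE FROM CUSTOMERS WHERE ID = 1")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True

print(delete_blocked)
```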

Database Normalization

Database normalization is the process of efficiently organizing data in a database. There
are two reasons for this normalization process −
o Eliminating redundant data, for example, storing the same data in more than one
table.
o Ensuring data dependencies make sense.
Both of these are worthy goals, as they reduce the amount of space a database
consumes and ensure that data is logically stored. Normalization consists of a series of
guidelines that help you create a good database structure.
Normalization guidelines are divided into normal forms; think of a form as the format or
the way a database structure is laid out. The aim of normal forms is to organize the
database structure, so that it complies with the rules of first normal form, then second
normal form and finally the third normal form.

The SQL CREATE DATABASE statement is used to create a new SQL database.

Syntax

The basic syntax of this CREATE DATABASE statement is as follows −

CREATE DATABASE DatabaseName;
The database name should always be unique within the RDBMS.

Example

If you want to create a new database <testDB>, then the CREATE DATABASE
statement would be as shown below −
SQL> CREATE DATABASE testDB;
Make sure you have the admin privilege before creating any database. Once a database
is created, you can check it in the list of databases as follows −
SQL> SHOW DATABASES;
+--------------------+
| Database |
+--------------------+
| information_schema |
| AMROOD |
| TUTORIALSPOINT |
| mysql |
| orig |
| test |
| testDB |
+--------------------+
7 rows in set (0.00 sec)

The SQL DROP DATABASE statement is used to drop an existing database in SQL
schema.

Syntax

The basic syntax of DROP DATABASE statement is as follows −

DROP DATABASE DatabaseName;
DatabaseName should be the name of an existing database in the RDBMS.

Example

If you want to delete an existing database <testDB>, then the DROP DATABASE
statement would be as shown below −
SQL> DROP DATABASE testDB;
NOTE − Be careful before using this operation, because deleting an existing database
results in the loss of all the information stored in the database.
Make sure you have the admin privilege before dropping any database. Once a database
is dropped, you can check the list of the databases as shown below −
SQL> SHOW DATABASES;
+--------------------+
| Database |
+--------------------+
| information_schema |
| AMROOD |
| TUTORIALSPOINT |
| mysql |
| orig |
| test |
+--------------------+
6 rows in set (0.00 sec)

When you have multiple databases in your SQL Schema, then before starting your
operation, you would need to select a database where all the operations would be
performed.
The SQL USE statement is used to select any existing database in the SQL schema.

Syntax

The basic syntax of the USE statement is as shown below −
USE DatabaseName;
DatabaseName should be the name of an existing database in the RDBMS.

Example

You can check the available databases as shown below −


SQL> SHOW DATABASES;
+--------------------+
| Database |
+--------------------+
| information_schema |
| AMROOD |
| TUTORIALSPOINT |
| mysql |
| orig |
| test |
+--------------------+
6 rows in set (0.00 sec)
Now, if you want to work with the AMROOD database, then you can execute the
following SQL command and start working with the AMROOD database.
SQL> USE AMROOD;

Creating a basic table involves naming the table and defining its columns and each
column's data type.
The SQL CREATE TABLE statement is used to create a new table.

Syntax

The basic syntax of the CREATE TABLE statement is as follows −


CREATE TABLE table_name(
column1 datatype,
column2 datatype,
column3 datatype,
.....
columnN datatype,
PRIMARY KEY( one or more columns )
);
CREATE TABLE is the keyword telling the database system what you want to do. In
this case, you want to create a new table. The unique name or identifier for the table
follows the CREATE TABLE statement.
Then, in parentheses, comes the list defining each column in the table and what sort of
data type it is. The syntax becomes clearer with the following example.

A copy of an existing table can be created using a combination of the CREATE TABLE
statement and the SELECT statement.

Example

The following code block is an example that creates a CUSTOMERS table with ID as the
primary key. The NOT NULL constraints show that these fields cannot be NULL when
records are created in this table −
SQL> CREATE TABLE CUSTOMERS(
ID INT NOT NULL,
NAME VARCHAR (20) NOT NULL,
AGE INT NOT NULL,
ADDRESS CHAR (25) ,
SALARY DECIMAL (18, 2),
PRIMARY KEY (ID)
);
You can verify if your table has been created successfully by looking at the message
displayed by the SQL server, otherwise you can use the DESC command as follows −
SQL> DESC CUSTOMERS;
+---------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+---------------+------+-----+---------+-------+
| ID | int(11) | NO | PRI | | |
| NAME | varchar(20) | NO | | | |
| AGE | int(11) | NO | | | |
| ADDRESS | char(25) | YES | | NULL | |
| SALARY | decimal(18,2) | YES | | NULL | |
+---------+---------------+------+-----+---------+-------+
5 rows in set (0.00 sec)
Now, you have CUSTOMERS table available in your database which you can use to
store the required information related to customers.
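
The same CREATE TABLE statement can also be run from a program. The sketch below assumes SQLite through Python's `sqlite3` module; SQLite has no DESC command, so `PRAGMA table_info` is used here as a rough counterpart to inspect the columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CUSTOMERS(
        ID      INT           NOT NULL,
        NAME    VARCHAR (20)  NOT NULL,
        AGE     INT           NOT NULL,
        ADDRESS CHAR (25),
        SALARY  DECIMAL (18, 2),
        PRIMARY KEY (ID)
    )
""")

# Each row of table_info is (cid, name, type, notnull, default value, pk).
columns = [
    (name, notnull, pk)
    for (_cid, name, _type, notnull, _default, pk)
    in conn.execute("PRAGMA table_info(CUSTOMERS)")
]

for column in columns:
    print(column)
```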

The SQL DROP TABLE statement is used to remove a table definition and all the data,
indexes, triggers, constraints and permission specifications for that table.
NOTE − You should be very careful while using this command because once a table is
deleted then all the information available in that table will also be lost forever.

Syntax

The basic syntax of this DROP TABLE statement is as follows −


DROP TABLE table_name;

Example

Let us first verify the CUSTOMERS table and then we will delete it from the database
as shown below −
SQL> DESC CUSTOMERS;
+---------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+---------------+------+-----+---------+-------+
| ID | int(11) | NO | PRI | | |
| NAME | varchar(20) | NO | | | |
| AGE | int(11) | NO | | | |
| ADDRESS | char(25) | YES | | NULL | |
| SALARY | decimal(18,2) | YES | | NULL | |
+---------+---------------+------+-----+---------+-------+
5 rows in set (0.00 sec)
This means that the CUSTOMERS table is available in the database, so let us now drop
it as shown below.
SQL> DROP TABLE CUSTOMERS;
Query OK, 0 rows affected (0.01 sec)
Now, if you try the DESC command, you will get the following error −
SQL> DESC CUSTOMERS;
ERROR 1146 (42S02): Table 'TEST.CUSTOMERS' doesn't exist
Here, TEST is the database name which we are using for our examples.
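
The same behaviour can be reproduced in a program. The sketch assumes SQLite through Python's `sqlite3` module, so the error text differs from the MySQL message shown above; the shortened CUSTOMERS definition is only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INT NOT NULL, NAME VARCHAR (20) NOT NULL, PRIMARY KEY (ID))"
)
conn.execute("DROP TABLE CUSTOMERS")

# Any later use of the dropped table fails.
try:
    conn.execute("SELECT * FROM CUSTOMERS")
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

# IF EXISTS avoids an error when the table is already gone.
conn.execute("DROP TABLE IF EXISTS CUSTOMERS")

print(error_message)
```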

The SQL INSERT INTO statement is used to add new rows of data to a table in the
database.

Syntax

There are two basic syntaxes of the INSERT INTO statement which are shown below.
INSERT INTO TABLE_NAME (column1, column2, column3,...columnN)
VALUES (value1, value2, value3,...valueN);
Here, column1, column2, column3,...columnN are the names of the columns in the table
into which you want to insert the data.
You do not need to specify the column names in the SQL query if you are adding
values for all the columns of the table. But make sure the order of the values is the
same as the order of the columns in the table.
The SQL INSERT INTO syntax will be as follows −
INSERT INTO TABLE_NAME VALUES (value1,value2,value3,...valueN);
Example
The following statements would create six records in the CUSTOMERS table.
INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)
VALUES (1, 'Ramesh', 32, 'Ahmedabad', 2000.00 );

INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)
VALUES (2, 'Khilan', 25, 'Delhi', 1500.00 );

INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)
VALUES (3, 'kaushik', 23, 'Kota', 2000.00 );

INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)
VALUES (4, 'Chaitali', 25, 'Mumbai', 6500.00 );

INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)
VALUES (5, 'Hardik', 27, 'Bhopal', 8500.00 );

INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)
VALUES (6, 'Komal', 22, 'MP', 4500.00 );
You can create a record in the CUSTOMERS table by using the second syntax as shown
below.
INSERT INTO CUSTOMERS
VALUES (7, 'Muffy', 24, 'Indore', 10000.00 );
All the above statements would produce the following records in the CUSTOMERS
table as shown below.
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
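The two INSERT syntaxes can be compared in a short sketch. It assumes Python's built-in sqlite3 module in place of MySQL, and the customer 'Asha' with ID 8 is a hypothetical row that is not part of the notes. It also shows that, with an explicit column list, the columns may appear in any order and nullable columns (ADDRESS and SALARY here) may be left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT NOT NULL, "
    "AGE INTEGER NOT NULL, ADDRESS TEXT, SALARY REAL)"
)

# First syntax: the column names are listed explicitly.
conn.execute(
    "INSERT INTO CUSTOMERS (ID, NAME, AGE, ADDRESS, SALARY) "
    "VALUES (1, 'Ramesh', 32, 'Ahmedabad', 2000.00)"
)

# Second syntax: no column list, so the values must follow the table's column order.
conn.execute("INSERT INTO CUSTOMERS VALUES (7, 'Muffy', 24, 'Indore', 10000.00)")

# Hypothetical row: columns in a different order, nullable columns omitted.
conn.execute("INSERT INTO CUSTOMERS (NAME, AGE, ID) VALUES ('Asha', 30, 8)")

rows = conn.execute("SELECT * FROM CUSTOMERS ORDER BY ID").fetchall()
for row in rows:
    print(row)
```

The omitted ADDRESS and SALARY values come back as NULL (None in Python), which matches the Default column shown by DESC CUSTOMERS.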

Populate one table using another table

You can populate the data into a table through the select statement over another table;
provided the other table has a set of fields, which are required to populate the first table.
Here is the syntax −
INSERT INTO first_table_name [(column1, column2, ... columnN)]
SELECT column1, column2, ...columnN
FROM second_table_name
[WHERE condition];
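As an illustration of this syntax, the sketch below copies some rows into a second table. It assumes Python's built-in sqlite3 module in place of MySQL, and CUSTOMERS_BKP is a hypothetical table name chosen only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT NOT NULL, "
    "AGE INTEGER NOT NULL, ADDRESS TEXT, SALARY REAL)"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ramesh", 32, "Ahmedabad", 2000.00),
        (2, "Khilan", 25, "Delhi", 1500.00),
        (3, "kaushik", 23, "Kota", 2000.00),
        (4, "Chaitali", 25, "Mumbai", 6500.00),
        (5, "Hardik", 27, "Bhopal", 8500.00),
        (6, "Komal", 22, "MP", 4500.00),
        (7, "Muffy", 24, "Indore", 10000.00),
    ],
)

# Hypothetical target table holding only three of the five columns.
conn.execute(
    "CREATE TABLE CUSTOMERS_BKP (ID INTEGER PRIMARY KEY, NAME TEXT, SALARY REAL)"
)

# Populate the target table from a SELECT over the source table.
conn.execute(
    "INSERT INTO CUSTOMERS_BKP (ID, NAME, SALARY) "
    "SELECT ID, NAME, SALARY FROM CUSTOMERS WHERE SALARY > 2000"
)

copied = conn.execute(
    "SELECT ID, NAME, SALARY FROM CUSTOMERS_BKP ORDER BY ID"
).fetchall()
print(copied)
```

The SELECT list must supply one value for each column named in the INSERT column list, in the same order.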
The SQL SELECT statement is used to fetch the data from a database table which
returns this data in the form of a result table. These result tables are called result-sets.

Syntax
The basic syntax of the SELECT statement is as follows −
SELECT column1, column2, columnN FROM table_name;
Here, column1, column2... are the fields of a table whose values you want to fetch. If
you want to fetch all the fields available in the table, then you can use the following
syntax.
SELECT * FROM table_name;

Example

Consider the CUSTOMERS table having the following records −


+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
The following code is an example, which would fetch the ID, Name and Salary fields of
the customers available in CUSTOMERS table.
SQL> SELECT ID, NAME, SALARY FROM CUSTOMERS;
This would produce the following result −
+----+----------+----------+
| ID | NAME | SALARY |
+----+----------+----------+
| 1 | Ramesh | 2000.00 |
| 2 | Khilan | 1500.00 |
| 3 | kaushik | 2000.00 |
| 4 | Chaitali | 6500.00 |
| 5 | Hardik | 8500.00 |
| 6 | Komal | 4500.00 |
| 7 | Muffy | 10000.00 |
+----+----------+----------+
If you want to fetch all the fields of the CUSTOMERS table, then you should use the
following query.
SQL> SELECT * FROM CUSTOMERS;
This would produce the result as shown below.
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+

The SQL WHERE clause is used to specify a condition while fetching the data from a
single table or by joining with multiple tables. Only the rows that satisfy the given
condition are returned. You should use the WHERE clause to filter the records and
fetch only the necessary ones.
The WHERE clause is not only used in the SELECT statement, but it is also used in the
UPDATE and DELETE statements, etc., which we would examine later.

Syntax

The basic syntax of the SELECT statement with the WHERE clause is as shown below.
SELECT column1, column2, columnN
FROM table_name
WHERE [condition]
You can specify a condition using the comparison or logical operators like >, <,
=, LIKE, NOT, etc. The following examples would make this concept clear.

Example

Consider the CUSTOMERS table having the following records −


+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
The following code is an example which would fetch the ID, Name and Salary fields
from the CUSTOMERS table, where the salary is greater than 2000 −
SQL> SELECT ID, NAME, SALARY
FROM CUSTOMERS
WHERE SALARY > 2000;
This would produce the following result −
+----+----------+----------+
| ID | NAME | SALARY |
+----+----------+----------+
| 4 | Chaitali | 6500.00 |
| 5 | Hardik | 8500.00 |
| 6 | Komal | 4500.00 |
| 7 | Muffy | 10000.00 |
+----+----------+----------+
The following query is an example, which would fetch the ID, Name and Salary fields
from the CUSTOMERS table for a customer with the name Hardik.
Here, it is important to note that all strings should be given inside single quotes (''),
whereas numeric values should be given without any quotes, as in the above example.
SQL> SELECT ID, NAME, SALARY
FROM CUSTOMERS
WHERE NAME = 'Hardik';
This would produce the following result −
+----+----------+----------+
| ID | NAME | SALARY |
+----+----------+----------+
| 5 | Hardik | 8500.00 |
+----+----------+----------+
The SQL AND & OR operators are used to combine multiple conditions to narrow data
in an SQL statement. These two operators are called the conjunctive operators.
These operators provide a means to make multiple comparisons with different operators
in the same SQL statement.

The AND Operator

The AND operator allows the existence of multiple conditions in an SQL statement's
WHERE clause.
Syntax
The basic syntax of the AND operator with a WHERE clause is as follows −
SELECT column1, column2, columnN
FROM table_name
WHERE [condition1] AND [condition2]...AND [conditionN];
You can combine N number of conditions using the AND operator. For an action to be
taken by the SQL statement, whether it be a transaction or a query, all conditions
separated by the AND must be TRUE.

Example
Consider the CUSTOMERS table having the following records −
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Following is an example, which would fetch the ID, Name and Salary fields from the
CUSTOMERS table, where the salary is greater than 2000 and the age is less than 25
years −
SQL> SELECT ID, NAME, SALARY
FROM CUSTOMERS
WHERE SALARY > 2000 AND age < 25;
This would produce the following result −
+----+-------+----------+
| ID | NAME | SALARY |
+----+-------+----------+
| 6 | Komal | 4500.00 |
| 7 | Muffy | 10000.00 |
+----+-------+----------+

The OR Operator

The OR operator is used to combine multiple conditions in an SQL statement's WHERE
clause.
Syntax
The basic syntax of the OR operator with a WHERE clause is as follows −
SELECT column1, column2, columnN
FROM table_name
WHERE [condition1] OR [condition2]...OR [conditionN]
You can combine N number of conditions using the OR operator. For an action to be
taken by the SQL statement, whether it be a transaction or query, only ONE of
the conditions separated by the OR must be TRUE.
Example
Consider the CUSTOMERS table having the following records −
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
The following code block has a query, which would fetch the ID, Name and Salary fields
from the CUSTOMERS table, where the salary is greater than 2000 OR the age is less
than 25 years.
SQL> SELECT ID, NAME, SALARY
FROM CUSTOMERS
WHERE SALARY > 2000 OR age < 25;
This would produce the following result −
+----+----------+----------+
| ID | NAME | SALARY |
+----+----------+----------+
| 3 | kaushik | 2000.00 |
| 4 | Chaitali | 6500.00 |
| 5 | Hardik | 8500.00 |
| 6 | Komal | 4500.00 |
| 7 | Muffy | 10000.00 |
+----+----------+----------+
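When AND and OR appear in the same WHERE clause, AND is evaluated before OR, so parentheses are needed to group the conditions differently. The sketch below illustrates this; it assumes Python's built-in sqlite3 module in place of MySQL, and the helper ids() is a hypothetical convenience function written for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT NOT NULL, "
    "AGE INTEGER NOT NULL, ADDRESS TEXT, SALARY REAL)"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ramesh", 32, "Ahmedabad", 2000.00),
        (2, "Khilan", 25, "Delhi", 1500.00),
        (3, "kaushik", 23, "Kota", 2000.00),
        (4, "Chaitali", 25, "Mumbai", 6500.00),
        (5, "Hardik", 27, "Bhopal", 8500.00),
        (6, "Komal", 22, "MP", 4500.00),
        (7, "Muffy", 24, "Indore", 10000.00),
    ],
)


def ids(where):
    """Return the IDs of the rows that satisfy the given WHERE condition."""
    sql = "SELECT ID FROM CUSTOMERS WHERE " + where + " ORDER BY ID"
    return [row[0] for row in conn.execute(sql)]


# Both conditions must hold / at least one condition must hold.
both = ids("SALARY > 2000 AND AGE < 25")
either = ids("SALARY > 2000 OR AGE < 25")

# Without parentheses, AND binds tighter than OR.
unbracketed = ids("ADDRESS = 'Delhi' OR SALARY > 5000 AND AGE < 25")
# Parentheses force the OR to be evaluated first.
bracketed = ids("(ADDRESS = 'Delhi' OR SALARY > 5000) AND AGE < 25")

print(both, either, unbracketed, bracketed)
```

The unbracketed query keeps Khilan from Delhi regardless of age, whereas the bracketed query applies the age condition to every row.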
The SQL UPDATE Query is used to modify the existing records in a table. You can use
the WHERE clause with the UPDATE query to update the selected rows, otherwise all
the rows would be affected.

Syntax

The basic syntax of the UPDATE query with a WHERE clause is as follows −
UPDATE table_name
SET column1 = value1, column2 = value2...., columnN = valueN
WHERE [condition];
You can combine N number of conditions using the AND or the OR operators.

Example

Consider the CUSTOMERS table having the following records −


+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
The following query will update the ADDRESS for a customer whose ID number is 6 in
the table.
SQL> UPDATE CUSTOMERS
SET ADDRESS = 'Pune'
WHERE ID = 6;
Now, the CUSTOMERS table would have the following records −
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | Pune | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
If you want to modify all the ADDRESS and the SALARY column values in the
CUSTOMERS table, you do not need to use the WHERE clause as the UPDATE query
would be enough as shown in the following code block.
SQL> UPDATE CUSTOMERS
SET ADDRESS = 'Pune', SALARY = 1000.00;
Now, CUSTOMERS table would have the following records −
+----+----------+-----+---------+---------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+---------+---------+
| 1 | Ramesh | 32 | Pune | 1000.00 |
| 2 | Khilan | 25 | Pune | 1000.00 |
| 3 | kaushik | 23 | Pune | 1000.00 |
| 4 | Chaitali | 25 | Pune | 1000.00 |
| 5 | Hardik | 27 | Pune | 1000.00 |
| 6 | Komal | 22 | Pune | 1000.00 |
| 7 | Muffy | 24 | Pune | 1000.00 |
+----+----------+-----+---------+---------+
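The difference between an UPDATE with and without a WHERE clause can be seen from the number of rows each statement changes. The sketch below assumes Python's built-in sqlite3 module in place of MySQL; the rowcount attribute belongs to the Python cursor and is not part of SQL itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT NOT NULL, "
    "AGE INTEGER NOT NULL, ADDRESS TEXT, SALARY REAL)"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ramesh", 32, "Ahmedabad", 2000.00),
        (2, "Khilan", 25, "Delhi", 1500.00),
        (3, "kaushik", 23, "Kota", 2000.00),
        (4, "Chaitali", 25, "Mumbai", 6500.00),
        (5, "Hardik", 27, "Bhopal", 8500.00),
        (6, "Komal", 22, "MP", 4500.00),
        (7, "Muffy", 24, "Indore", 10000.00),
    ],
)

# With a WHERE clause only the matching row is modified.
cur = conn.execute("UPDATE CUSTOMERS SET ADDRESS = 'Pune' WHERE ID = 6")
updated_one = cur.rowcount

# Without a WHERE clause every row in the table is modified.
cur = conn.execute("UPDATE CUSTOMERS SET ADDRESS = 'Pune', SALARY = 1000.00")
updated_all = cur.rowcount

addresses = {row[0] for row in conn.execute("SELECT ADDRESS FROM CUSTOMERS")}
print(updated_one, updated_all, addresses)
```

Checking the affected row count after an UPDATE is a simple way to notice a forgotten WHERE clause.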

The SQL DELETE Query is used to delete the existing records from a table.

You can use the WHERE clause with a DELETE query to delete the selected rows,
otherwise all the records would be deleted.

Syntax

The basic syntax of the DELETE query with the WHERE clause is as follows −
DELETE FROM table_name
WHERE [condition];
You can combine N number of conditions using AND or OR operators.

Example

Consider the CUSTOMERS table having the following records −


+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
The following code has a query, which will DELETE a customer, whose ID is 6.
SQL> DELETE FROM CUSTOMERS
WHERE ID = 6;
Now, the CUSTOMERS table would have the following records.
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
If you want to DELETE all the records from the CUSTOMERS table, you do not need
to use the WHERE clause and the DELETE query would be as follows −
SQL> DELETE FROM CUSTOMERS;
Now, the CUSTOMERS table would not have any record.
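The sketch below repeats both DELETE statements and then confirms that the table itself still exists after all rows are removed, unlike after DROP TABLE. It assumes Python's built-in sqlite3 module in place of MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT NOT NULL, "
    "AGE INTEGER NOT NULL, ADDRESS TEXT, SALARY REAL)"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ramesh", 32, "Ahmedabad", 2000.00),
        (2, "Khilan", 25, "Delhi", 1500.00),
        (3, "kaushik", 23, "Kota", 2000.00),
        (4, "Chaitali", 25, "Mumbai", 6500.00),
        (5, "Hardik", 27, "Bhopal", 8500.00),
        (6, "Komal", 22, "MP", 4500.00),
        (7, "Muffy", 24, "Indore", 10000.00),
    ],
)

# Delete a single customer selected by the WHERE clause.
cur = conn.execute("DELETE FROM CUSTOMERS WHERE ID = 6")
deleted_one = cur.rowcount
remaining_ids = [row[0] for row in conn.execute("SELECT ID FROM CUSTOMERS ORDER BY ID")]

# Delete every remaining row; the empty table can still be queried.
conn.execute("DELETE FROM CUSTOMERS")
row_count = conn.execute("SELECT COUNT(*) FROM CUSTOMERS").fetchone()[0]

print(deleted_one, remaining_ids, row_count)
```

The final SELECT succeeds and reports zero rows, which shows that DELETE removes data but keeps the table definition.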

The SQL LIKE clause is used to compare a value to similar values using wildcard
operators. There are two wildcards used in conjunction with the LIKE operator.

• The percent sign (%)
• The underscore (_)
The percent sign represents zero, one or multiple characters. The underscore represents
a single number or character. These symbols can be used in combinations.

Syntax

The basic syntax of % and _ is as follows −

SELECT column-list FROM table_name
WHERE column LIKE 'XXXX%'

or

SELECT column-list FROM table_name
WHERE column LIKE '%XXXX%'

or

SELECT column-list FROM table_name
WHERE column LIKE 'XXXX_'

or

SELECT column-list FROM table_name
WHERE column LIKE '_XXXX'

or

SELECT column-list FROM table_name
WHERE column LIKE '_XXXX_'
You can combine N number of conditions using AND or OR operators. Here, XXXX
could be any numeric or string value.

Example

The following table has a few examples showing the WHERE part having different
LIKE clauses with the '%' and '_' wildcards −

Sr.No.  Statement & Description

1       WHERE SALARY LIKE '200%'
        Finds any values that start with 200.

2       WHERE SALARY LIKE '%200%'
        Finds any values that have 200 in any position.

3       WHERE SALARY LIKE '_00%'
        Finds any values that have 00 in the second and third positions.

4       WHERE SALARY LIKE '2_%_%'
        Finds any values that start with 2 and are at least 3 characters in length.

5       WHERE SALARY LIKE '%2'
        Finds any values that end with 2.

6       WHERE SALARY LIKE '_2%3'
        Finds any values that have a 2 in the second position and end with a 3.

7       WHERE SALARY LIKE '2___3'
        Finds any values in a five-digit number that start with 2 and end with 3.


Let us take a real example. Consider the CUSTOMERS table having the records shown
below.
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Following is an example, which would display all the records from the CUSTOMERS
table, where the SALARY starts with 200.
SQL> SELECT * FROM CUSTOMERS
WHERE SALARY LIKE '200%';
This would produce the following result −
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
+----+----------+-----+-----------+----------+
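The wildcards are most often applied to text columns. The sketch below tries a few patterns on the NAME and ADDRESS columns; it assumes Python's built-in sqlite3 module in place of MySQL, and ids() is a hypothetical helper written for this example. In SQLite, as in MySQL's default collation, LIKE ignores the case of ASCII letters, so 'K%' also matches 'kaushik'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT NOT NULL, "
    "AGE INTEGER NOT NULL, ADDRESS TEXT, SALARY REAL)"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ramesh", 32, "Ahmedabad", 2000.00),
        (2, "Khilan", 25, "Delhi", 1500.00),
        (3, "kaushik", 23, "Kota", 2000.00),
        (4, "Chaitali", 25, "Mumbai", 6500.00),
        (5, "Hardik", 27, "Bhopal", 8500.00),
        (6, "Komal", 22, "MP", 4500.00),
        (7, "Muffy", 24, "Indore", 10000.00),
    ],
)


def ids(column, pattern):
    """Return the IDs of the rows whose column matches the LIKE pattern."""
    sql = "SELECT ID FROM CUSTOMERS WHERE " + column + " LIKE ? ORDER BY ID"
    return [row[0] for row in conn.execute(sql, (pattern,))]


starts_with_k = ids("NAME", "K%")        # names beginning with K or k
ends_with_ik = ids("NAME", "%ik")        # names ending in ik
second_letter_a = ids("NAME", "_a%")     # any first letter, then a
five_letters = ids("ADDRESS", "_____")   # addresses of exactly five characters

print(starts_with_k, ends_with_ik, second_letter_a, five_letters)
```

Each underscore stands for exactly one character, so the last pattern matches only 'Delhi'.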
The SQL TOP clause is used to fetch the top N rows or the top X percent of rows from
a table.
Note − Not all databases support the TOP clause. For example, MySQL supports
the LIMIT clause to fetch a limited number of records, while Oracle uses
the ROWNUM pseudocolumn to fetch a limited number of records.

Syntax

The basic syntax of the TOP clause with a SELECT statement would be as follows.
SELECT TOP number|percent column_name(s)
FROM table_name
WHERE [condition]

Example

Consider the CUSTOMERS table having the following records −


+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
The following query is an example on SQL Server, which would fetch the top 3
records from the CUSTOMERS table.
SQL> SELECT TOP 3 * FROM CUSTOMERS;
This would produce the following result −
+----+---------+-----+-----------+---------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+---------+-----+-----------+---------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
+----+---------+-----+-----------+---------+
If you are using MySQL server, then here is an equivalent example −
SQL> SELECT * FROM CUSTOMERS
LIMIT 3;
This would produce the following result −
+----+---------+-----+-----------+---------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+---------+-----+-----------+---------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
+----+---------+-----+-----------+---------+
If you are using an Oracle server, then the following code block has an equivalent
example.
SQL> SELECT * FROM CUSTOMERS
WHERE ROWNUM <= 3;
This would produce the following result −
+----+---------+-----+-----------+---------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+---------+-----+-----------+---------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
+----+---------+-----+-----------+---------+
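Without an ORDER BY clause, the database is free to return rows in any order, so "the first three rows" is only well defined when the rows are sorted first. The sketch below fetches the three highest salaries; it assumes Python's built-in sqlite3 module, which supports the same LIMIT clause as MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT NOT NULL, "
    "AGE INTEGER NOT NULL, ADDRESS TEXT, SALARY REAL)"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ramesh", 32, "Ahmedabad", 2000.00),
        (2, "Khilan", 25, "Delhi", 1500.00),
        (3, "kaushik", 23, "Kota", 2000.00),
        (4, "Chaitali", 25, "Mumbai", 6500.00),
        (5, "Hardik", 27, "Bhopal", 8500.00),
        (6, "Komal", 22, "MP", 4500.00),
        (7, "Muffy", 24, "Indore", 10000.00),
    ],
)

# Sort by salary, highest first, and keep only the first three rows.
top_earners = conn.execute(
    "SELECT NAME, SALARY FROM CUSTOMERS ORDER BY SALARY DESC LIMIT 3"
).fetchall()
print(top_earners)
```

On SQL Server the equivalent query would combine TOP 3 with the same ORDER BY clause.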

The SQL ORDER BY clause is used to sort the data in ascending or descending order,
based on one or more columns. If neither ASC nor DESC is specified, the results are
sorted in ascending order by default.

Syntax

The basic syntax of the ORDER BY clause is as follows −


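SELECT column-list
FROM table_name
[WHERE condition]
[ORDER BY column1 [ASC | DESC], column2 [ASC | DESC], .. columnN [ASC | DESC]];
You can use more than one column in the ORDER BY clause, and each column can carry
its own ASC or DESC keyword. Make sure that whatever column you use for sorting is
available in the queried table; it is good practice, though not required by most
databases, to include it in the column-list as well.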

Example

Consider the CUSTOMERS table having the following records −


+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
The following code block has an example, which would sort the result in an ascending
order by the NAME and the SALARY −
SQL> SELECT * FROM CUSTOMERS
ORDER BY NAME, SALARY;
This would produce the following result −
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
+----+----------+-----+-----------+----------+
The following code block has an example, which would sort the result in the descending
order by NAME.
SQL> SELECT * FROM CUSTOMERS
ORDER BY NAME DESC;
This would produce the following result −
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
+----+----------+-----+-----------+----------+
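Sort directions can also be mixed, with one column descending and another ascending. The sketch below sorts by AGE from oldest to youngest and breaks ties by NAME; it assumes Python's built-in sqlite3 module in place of MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT NOT NULL, "
    "AGE INTEGER NOT NULL, ADDRESS TEXT, SALARY REAL)"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ramesh", 32, "Ahmedabad", 2000.00),
        (2, "Khilan", 25, "Delhi", 1500.00),
        (3, "kaushik", 23, "Kota", 2000.00),
        (4, "Chaitali", 25, "Mumbai", 6500.00),
        (5, "Hardik", 27, "Bhopal", 8500.00),
        (6, "Komal", 22, "MP", 4500.00),
        (7, "Muffy", 24, "Indore", 10000.00),
    ],
)

# AGE is the primary sort key (descending); NAME orders customers of equal age.
ordered = conn.execute(
    "SELECT NAME, AGE FROM CUSTOMERS ORDER BY AGE DESC, NAME ASC"
).fetchall()
for name, age in ordered:
    print(age, name)
```

Chaitali and Khilan are both 25, so the second sort key decides that Chaitali is listed first.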

The SQL GROUP BY clause is used in collaboration with the SELECT statement to
arrange identical data into groups. This GROUP BY clause follows the WHERE clause
in a SELECT statement and precedes the ORDER BY clause.

Syntax

The basic syntax of a GROUP BY clause is shown in the following code block. The
GROUP BY clause must follow the conditions in the WHERE clause and must precede
the ORDER BY clause if one is used.
SELECT column1, column2
FROM table_name
WHERE [ conditions ]
GROUP BY column1, column2
ORDER BY column1, column2

Example

Consider the CUSTOMERS table having the following records −


+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
If you want to know the total salary for each customer, then the GROUP
BY query would be as follows.
SQL> SELECT NAME, SUM(SALARY) FROM CUSTOMERS
GROUP BY NAME;
This would produce the following result −
+----------+-------------+
| NAME | SUM(SALARY) |
+----------+-------------+
| Chaitali | 6500.00 |
| Hardik | 8500.00 |
| kaushik | 2000.00 |
| Khilan | 1500.00 |
| Komal | 4500.00 |
| Muffy | 10000.00 |
| Ramesh | 2000.00 |
+----------+-------------+
Now, let us look at a case where the CUSTOMERS table has the following records with
duplicate names −
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Ramesh | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | kaushik | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Now again, if you want to know the total salary for each customer name, then the
GROUP BY query would be as follows −
SQL> SELECT NAME, SUM(SALARY) FROM CUSTOMERS
GROUP BY NAME;
This would produce the following result −
+---------+-------------+
| NAME | SUM(SALARY) |
+---------+-------------+
| Hardik | 8500.00 |
| kaushik | 8500.00 |
| Komal | 4500.00 |
| Muffy | 10000.00 |
| Ramesh | 3500.00 |
+---------+-------------+
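Several aggregate functions can be computed for each group at once. The sketch below uses the table with duplicate names and returns both the number of rows and the total salary per name; it assumes Python's built-in sqlite3 module in place of MySQL, and the result is collected into a dictionary so that it does not depend on the order in which groups are returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT NOT NULL, "
    "AGE INTEGER NOT NULL, ADDRESS TEXT, SALARY REAL)"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ramesh", 32, "Ahmedabad", 2000.00),
        (2, "Ramesh", 25, "Delhi", 1500.00),
        (3, "kaushik", 23, "Kota", 2000.00),
        (4, "kaushik", 25, "Mumbai", 6500.00),
        (5, "Hardik", 27, "Bhopal", 8500.00),
        (6, "Komal", 22, "MP", 4500.00),
        (7, "Muffy", 24, "Indore", 10000.00),
    ],
)

# One output row per distinct NAME, with a row count and a salary total.
totals = {
    name: (count, total)
    for name, count, total in conn.execute(
        "SELECT NAME, COUNT(*), SUM(SALARY) FROM CUSTOMERS GROUP BY NAME"
    )
}
print(totals)
```

The two Ramesh rows collapse into one group with a total of 3500.00, and the two kaushik rows into one group with a total of 8500.00.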

The SQL DISTINCT keyword is used in conjunction with the SELECT statement to
eliminate all the duplicate records and fetch only unique records.
There may be a situation when you have multiple duplicate records in a table. While
fetching such records, it makes more sense to fetch only the unique records instead of
fetching duplicates.

Syntax

The basic syntax of the DISTINCT keyword to eliminate the duplicate records is as follows −

SELECT DISTINCT column1, column2,.....columnN
FROM table_name
WHERE [condition]

Example

Consider the CUSTOMERS table having the following records −


+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
First, let us see how the following SELECT query returns the duplicate salary records.
SQL> SELECT SALARY FROM CUSTOMERS
ORDER BY SALARY;
This would produce the following result, where the salary 2000 appears twice because
two records in the original table have that salary.
+----------+
| SALARY |
+----------+
| 1500.00 |
| 2000.00 |
| 2000.00 |
| 4500.00 |
| 6500.00 |
| 8500.00 |
| 10000.00 |
+----------+
Now, let us use the DISTINCT keyword with the above SELECT query and then see the
result.
SQL> SELECT DISTINCT SALARY FROM CUSTOMERS
ORDER BY SALARY;
This would produce the following result where we do not have any duplicate entry.
+----------+
| SALARY |
+----------+
| 1500.00 |
| 2000.00 |
| 4500.00 |
| 6500.00 |
| 8500.00 |
| 10000.00 |
+----------+
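DISTINCT can also be used inside an aggregate function to count unique values. The sketch below compares the total number of salary values with the number of distinct ones; it assumes Python's built-in sqlite3 module in place of MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT NOT NULL, "
    "AGE INTEGER NOT NULL, ADDRESS TEXT, SALARY REAL)"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ramesh", 32, "Ahmedabad", 2000.00),
        (2, "Khilan", 25, "Delhi", 1500.00),
        (3, "kaushik", 23, "Kota", 2000.00),
        (4, "Chaitali", 25, "Mumbai", 6500.00),
        (5, "Hardik", 27, "Bhopal", 8500.00),
        (6, "Komal", 22, "MP", 4500.00),
        (7, "Muffy", 24, "Indore", 10000.00),
    ],
)

# COUNT(SALARY) counts every row; COUNT(DISTINCT SALARY) counts unique salaries.
all_salaries, unique_salaries = conn.execute(
    "SELECT COUNT(SALARY), COUNT(DISTINCT SALARY) FROM CUSTOMERS"
).fetchone()

# DISTINCT applied to another column: each age is listed once.
ages = [row[0] for row in conn.execute("SELECT DISTINCT AGE FROM CUSTOMERS ORDER BY AGE")]

print(all_salaries, unique_salaries, ages)
```

The salary 2000.00 and the age 25 each occur twice in the table but are counted or listed only once.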

The ORDER BY clause can also sort rows in an order of your own preference. To fetch
the CUSTOMERS rows in a preferred order of ADDRESS, the SELECT query would be as
follows −
SQL> SELECT * FROM CUSTOMERS
ORDER BY (CASE ADDRESS
WHEN 'DELHI' THEN 1
WHEN 'BHOPAL' THEN 2
WHEN 'KOTA' THEN 3
WHEN 'AHMEDABAD' THEN 4
WHEN 'MP' THEN 5
ELSE 100 END) ASC, ADDRESS DESC;
This would produce the following result −
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
This sorts the customers by ADDRESS in your own order of preference first. The
upper-case values in the CASE expression match the stored addresses because string
comparison is case-insensitive under MySQL's default collation. The remaining
addresses, which are not listed in the CASE expression (Mumbai and Indore), all receive
the value 100 and are therefore sorted among themselves in reverse alphabetical order
by the second sort key, ADDRESS DESC.
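The custom ordering can be reproduced with the sketch below. It assumes Python's built-in sqlite3 module in place of MySQL. Because SQLite compares strings case-sensitively by default, the CASE values are written here with the same capitalisation as the stored addresses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT NOT NULL, "
    "AGE INTEGER NOT NULL, ADDRESS TEXT, SALARY REAL)"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ramesh", 32, "Ahmedabad", 2000.00),
        (2, "Khilan", 25, "Delhi", 1500.00),
        (3, "kaushik", 23, "Kota", 2000.00),
        (4, "Chaitali", 25, "Mumbai", 6500.00),
        (5, "Hardik", 27, "Bhopal", 8500.00),
        (6, "Komal", 22, "MP", 4500.00),
        (7, "Muffy", 24, "Indore", 10000.00),
    ],
)

# The CASE expression maps each preferred address to a rank; all other
# addresses share rank 100 and are ordered by the second key, ADDRESS DESC.
query = """
SELECT ADDRESS FROM CUSTOMERS
ORDER BY (CASE ADDRESS
    WHEN 'Delhi' THEN 1
    WHEN 'Bhopal' THEN 2
    WHEN 'Kota' THEN 3
    WHEN 'Ahmedabad' THEN 4
    WHEN 'MP' THEN 5
    ELSE 100 END) ASC, ADDRESS DESC
"""
addresses = [row[0] for row in conn.execute(query)]
print(addresses)
```

Mumbai and Indore are not named in the CASE expression, so they come last, with Mumbai before Indore because of the descending second key.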
Constraints are the rules enforced on the data columns of a table. These are used to limit
the type of data that can go into a table. This ensures the accuracy and reliability of the
data in the database.
Constraints could be either on a column level or a table level. The column level
constraints are applied only to one column, whereas the table level constraints are
applied to the whole table.

Following are some of the most commonly used constraints available in SQL. These
constraints have already been discussed in the SQL - RDBMS Concepts chapter, but it is
worth revising them at this point.
• NOT NULL Constraint − Ensures that a column cannot have a NULL value.
• DEFAULT Constraint − Provides a default value for a column when none is
specified.
• UNIQUE Constraint − Ensures that all values in a column are different.
• PRIMARY Key − Uniquely identifies each row/record in a database table.
• FOREIGN Key − Refers to a row/record in another database table, usually through
that table's primary key.
• CHECK Constraint − Ensures that all the values in a column satisfy certain
conditions.
• INDEX − Used to speed up the retrieval of data from the database.
Constraints can be specified when a table is created with the CREATE TABLE
statement or you can use the ALTER TABLE statement to create constraints even after
the table is created.
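The constraints listed above can be seen at work in the sketch below, which declares them in CREATE TABLE and then attempts inserts that break each rule. It assumes Python's built-in sqlite3 module in place of MySQL. The ORDERS table, the rule AGE >= 18, the default salary of 5000.00 and the helper violates() are hypothetical choices made only for this example. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch

conn.execute("""
CREATE TABLE CUSTOMERS (
    ID      INTEGER PRIMARY KEY,
    NAME    TEXT    NOT NULL UNIQUE,
    AGE     INTEGER NOT NULL CHECK (AGE >= 18),
    ADDRESS TEXT,
    SALARY  REAL    DEFAULT 5000.00
)""")
conn.execute("""
CREATE TABLE ORDERS (
    ID          INTEGER PRIMARY KEY,
    CUSTOMER_ID INTEGER NOT NULL REFERENCES CUSTOMERS (ID),
    AMOUNT      REAL
)""")

# A valid row; SALARY is omitted, so the DEFAULT value is stored.
conn.execute("INSERT INTO CUSTOMERS (ID, NAME, AGE) VALUES (1, 'Ramesh', 32)")
default_salary = conn.execute(
    "SELECT SALARY FROM CUSTOMERS WHERE ID = 1"
).fetchone()[0]


def violates(sql):
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


results = {
    "NOT NULL": violates("INSERT INTO CUSTOMERS (ID, AGE) VALUES (2, 25)"),
    "UNIQUE": violates("INSERT INTO CUSTOMERS (ID, NAME, AGE) VALUES (2, 'Ramesh', 25)"),
    "PRIMARY KEY": violates("INSERT INTO CUSTOMERS (ID, NAME, AGE) VALUES (1, 'Khilan', 25)"),
    "CHECK": violates("INSERT INTO CUSTOMERS (ID, NAME, AGE) VALUES (2, 'Khilan', 15)"),
    "FOREIGN KEY": violates("INSERT INTO ORDERS (ID, CUSTOMER_ID, AMOUNT) VALUES (100, 99, 10.0)"),
}
print(default_salary, results)
```

Every rejected statement leaves the table unchanged, so CUSTOMERS still holds only the single valid row at the end.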

Dropping Constraints

Any constraint that you have defined can be dropped using the ALTER TABLE
command with the DROP CONSTRAINT option.
For example, to drop the primary key constraint in the EMPLOYEES table, you can use
the following command.
ALTER TABLE EMPLOYEES DROP CONSTRAINT EMPLOYEES_PK;
Some implementations may provide shortcuts for dropping certain constraints. For
example, to drop the primary key constraint for a table in Oracle, you can use the
following command.
ALTER TABLE EMPLOYEES DROP PRIMARY KEY;
Some implementations allow you to disable constraints. Instead of permanently
dropping a constraint from the database, you may want to temporarily disable the
constraint and then enable it later.

Integrity Constraints

Integrity constraints are used to ensure accuracy and consistency of the data in a
relational database. Data integrity is handled in a relational database through the concept
of referential integrity.
There are many types of integrity constraints that play a role in Referential Integrity
(RI). These constraints include Primary Key, Foreign Key, Unique Constraints and
other constraints which are mentioned above.

DATABASE MANAGEMENT SYSTEM Page


101
Constraints are the rules enforced on the data columns of a table. These are used to limit
the type of data that can go into a table. This ensures the accuracy and reliability of the
data in the database.
Constraints could be either on a column level or a table level. The column level
constraints are applied only to one column, whereas the table level constraints are
applied to the whole table.
Following are some of the most commonly used constraints available in SQL. These
constraints have already been discussed in the SQL - RDBMS Concepts chapter, but it is
worth revising them at this point.
• NOT NULL Constraint − Ensures that a column cannot have a NULL value.
• DEFAULT Constraint − Provides a default value for a column when none is
specified.
• UNIQUE Constraint − Ensures that all values in a column are different.
• PRIMARY Key − Uniquely identifies each row/record in a database table.
• FOREIGN Key − Links a row/record to a row in another table by referencing
that table's primary (or unique) key.
• CHECK Constraint − Ensures that all the values in a column satisfy certain
conditions.
• INDEX − Used to retrieve data from the database very quickly.
Constraints can be specified when a table is created with the CREATE TABLE
statement, or you can use the ALTER TABLE statement to create constraints even after
the table is created.
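The constraints listed above can be tried out with a minimal sketch. It uses Python's built-in sqlite3 module purely as a convenient engine, which is an assumption of this example and not something the notes prescribe; the EMAIL column, the age limit of 18 and the default salary are hypothetical values chosen for illustration. Other database systems may report violations differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces FOREIGN KEY constraints only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE CUSTOMERS (
        ID      INTEGER PRIMARY KEY,
        NAME    VARCHAR(20) NOT NULL,
        AGE     INT NOT NULL CHECK (AGE >= 18),   -- hypothetical rule
        EMAIL   VARCHAR(50) UNIQUE,               -- hypothetical column
        SALARY  DECIMAL(18, 2) DEFAULT 5000.00    -- hypothetical default
    )
""")
conn.execute("""
    CREATE TABLE ORDERS (
        OID         INTEGER PRIMARY KEY,
        CUSTOMER_ID INT NOT NULL REFERENCES CUSTOMERS (ID),
        AMOUNT      INT
    )
""")
conn.execute(
    "INSERT INTO CUSTOMERS (ID, NAME, AGE, EMAIL) "
    "VALUES (1, 'Ramesh', 32, 'r@example.com')"
)


def rejected(sql):
    """Return True if the statement is refused because of a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# DEFAULT: SALARY was not supplied, so the default value is stored.
default_salary = conn.execute(
    "SELECT SALARY FROM CUSTOMERS WHERE ID = 1"
).fetchone()[0]

null_name = rejected("INSERT INTO CUSTOMERS (ID, NAME, AGE) VALUES (2, NULL, 25)")
dup_email = rejected(
    "INSERT INTO CUSTOMERS (ID, NAME, AGE, EMAIL) "
    "VALUES (3, 'Khilan', 25, 'r@example.com')"
)
bad_age = rejected("INSERT INTO CUSTOMERS (ID, NAME, AGE) VALUES (4, 'Komal', 12)")
dup_key = rejected("INSERT INTO CUSTOMERS (ID, NAME, AGE) VALUES (1, 'Muffy', 24)")
orphan = rejected("INSERT INTO ORDERS (OID, CUSTOMER_ID, AMOUNT) VALUES (100, 99, 1500)")

print(default_salary, null_name, dup_email, bad_age, dup_key, orphan)
```

Each rejected statement corresponds to one bullet: NOT NULL, UNIQUE, CHECK, PRIMARY KEY and FOREIGN KEY, in that order.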

Dropping Constraints

Any constraint that you have defined can be dropped using the ALTER TABLE
command with the DROP CONSTRAINT option.
For example, to drop the primary key constraint in the EMPLOYEES table, you can use
the following command.
ALTER TABLE EMPLOYEES DROP CONSTRAINT EMPLOYEES_PK;
Some implementations may provide shortcuts for dropping certain constraints. For
example, to drop the primary key constraint for a table in Oracle, you can use the
following command.
ALTER TABLE EMPLOYEES DROP PRIMARY KEY;
Some implementations allow you to disable constraints. Instead of permanently
dropping a constraint from the database, you may want to temporarily disable the
constraint and then enable it later.

Integrity Constraints

Integrity constraints are used to ensure accuracy and consistency of the data in a
relational database. Data integrity is handled in a relational database through the concept
of referential integrity.
There are many types of integrity constraints that play a role in Referential Integrity
(RI). These constraints include Primary Key, Foreign Key, Unique Constraints and
other constraints which are mentioned above.

The SQL Joins clause is used to combine records from two or more tables in a database.
A JOIN is a means for combining fields from two tables by using values common to
each.
Consider the following two tables −
Table 1 − CUSTOMERS Table
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Table 2 − ORDERS Table
+-----+---------------------+-------------+--------+
|OID | DATE | CUSTOMER_ID | AMOUNT |
+-----+---------------------+-------------+--------+
| 102 | 2009-10-08 00:00:00 | 3 | 3000 |
| 100 | 2009-10-08 00:00:00 | 3 | 1500 |
| 101 | 2009-11-20 00:00:00 | 2 | 1560 |
| 103 | 2008-05-20 00:00:00 | 4 | 2060 |
+-----+---------------------+-------------+--------+
Now, let us join these two tables in our SELECT statement as shown below.
SQL> SELECT ID, NAME, AGE, AMOUNT
FROM CUSTOMERS, ORDERS
WHERE CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
This would produce the following result.
+----+----------+-----+--------+
| ID | NAME | AGE | AMOUNT |
+----+----------+-----+--------+
| 3 | kaushik | 23 | 3000 |
| 3 | kaushik | 23 | 1500 |
| 2 | Khilan | 25 | 1560 |
| 4 | Chaitali | 25 | 2060 |
+----+----------+-----+--------+
Here, it is noticeable that the join is performed in the WHERE clause. Several operators
can be used to join tables, such as =, <, >, <>, <=, >=, !=, BETWEEN, LIKE, and NOT.
However, the most common operator is the equal to symbol.
There are different types of joins available in SQL −
• INNER JOIN − returns rows when there is a match in both tables.
• LEFT JOIN − returns all rows from the left table, even if there are no matches in
the right table.
• RIGHT JOIN − returns all rows from the right table, even if there are no matches
in the left table.
• FULL JOIN − returns all rows from both tables, combining rows where there is a
match and filling the missing side with NULL values where there is none.
• SELF JOIN − is used to join a table to itself as if the table were two tables,
temporarily renaming at least one table in the SQL statement.
• CARTESIAN JOIN − returns the Cartesian product of the sets of records from the
two or more joined tables.
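The difference between these join types can be seen in a small sketch. It runs the SQL through Python's built-in sqlite3 module (an assumption made only so the example is runnable) on a reduced copy of the CUSTOMERS and ORDERS tables with four customers and fewer columns. RIGHT JOIN and FULL JOIN are left out because older SQLite versions do not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMERS (ID INT PRIMARY KEY, NAME VARCHAR(20), AGE INT);
    CREATE TABLE ORDERS (OID INT PRIMARY KEY, CUSTOMER_ID INT, AMOUNT INT);
    INSERT INTO CUSTOMERS VALUES
        (1, 'Ramesh', 32), (2, 'Khilan', 25), (3, 'kaushik', 23), (4, 'Chaitali', 25);
    INSERT INTO ORDERS VALUES
        (102, 3, 3000), (100, 3, 1500), (101, 2, 1560), (103, 4, 2060);
""")

# INNER JOIN: only customers that have at least one order.
inner = conn.execute("""
    SELECT C.ID, C.NAME, O.AMOUNT
    FROM CUSTOMERS C INNER JOIN ORDERS O ON C.ID = O.CUSTOMER_ID
    ORDER BY C.ID, O.AMOUNT
""").fetchall()

# LEFT JOIN: every customer; AMOUNT is NULL (None) when there is no order.
left = conn.execute("""
    SELECT C.ID, C.NAME, O.AMOUNT
    FROM CUSTOMERS C LEFT JOIN ORDERS O ON C.ID = O.CUSTOMER_ID
    ORDER BY C.ID, O.AMOUNT
""").fetchall()

# CARTESIAN (CROSS) JOIN: every customer paired with every order.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM CUSTOMERS CROSS JOIN ORDERS"
).fetchone()[0]

# SELF JOIN: pairs of different customers that share the same age.
same_age = conn.execute("""
    SELECT A.NAME, B.NAME
    FROM CUSTOMERS A JOIN CUSTOMERS B ON A.AGE = B.AGE AND A.ID < B.ID
""").fetchall()

print(inner)
print(left)
print(cross_count, same_age)
```

Ramesh has no orders, so he appears only in the LEFT JOIN result, with a NULL amount.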
The SQL UNION clause/operator is used to combine the results of two or more
SELECT statements without returning any duplicate rows.
To use the UNION clause, the SELECT statements must have
• The same number of columns or column expressions selected
• Compatible data types for corresponding columns
• The columns in the same order
The corresponding columns need not be of the same length, however.

Syntax

The basic syntax of a UNION clause is as follows −
SELECT column1 [, column2 ]
FROM table1 [, table2 ]
[WHERE condition]
UNION
SELECT column1 [, column2 ]
FROM table1 [, table2 ]
[WHERE condition]
Here, the given condition could be any expression based on your requirement.

Example

Consider the following two tables.


Table 1 − CUSTOMERS Table is as follows.
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Table 2 − ORDERS Table is as follows.
+-----+---------------------+-------------+--------+
|OID | DATE | CUSTOMER_ID | AMOUNT |
+-----+---------------------+-------------+--------+
| 102 | 2009-10-08 00:00:00 | 3 | 3000 |
| 100 | 2009-10-08 00:00:00 | 3 | 1500 |
| 101 | 2009-11-20 00:00:00 | 2 | 1560 |
| 103 | 2008-05-20 00:00:00 | 4 | 2060 |
+-----+---------------------+-------------+--------+
Now, let us join these two tables in our SELECT statement as follows −
SQL> SELECT ID, NAME, AMOUNT, DATE
FROM CUSTOMERS
LEFT JOIN ORDERS
ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID
UNION
SELECT ID, NAME, AMOUNT, DATE
FROM CUSTOMERS
RIGHT JOIN ORDERS
ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
This would produce the following result −
+------+----------+--------+---------------------+
| ID | NAME | AMOUNT | DATE |
+------+----------+--------+---------------------+
| 1 | Ramesh | NULL | NULL |
| 2 | Khilan | 1560 | 2009-11-20 00:00:00 |
| 3 | kaushik | 3000 | 2009-10-08 00:00:00 |
| 3 | kaushik | 1500 | 2009-10-08 00:00:00 |
| 4 | Chaitali | 2060 | 2008-05-20 00:00:00 |
| 5 | Hardik | NULL | NULL |
| 6 | Komal | NULL | NULL |
| 7 | Muffy | NULL | NULL |
+------+----------+--------+---------------------+

The UNION ALL Clause

The UNION ALL operator is used to combine the results of two SELECT statements
including duplicate rows.
The same rules that apply to the UNION clause will apply to the UNION ALL operator.
Syntax
The basic syntax of the UNION ALL is as follows.
SELECT column1 [, column2 ]
FROM table1 [, table2 ]
[WHERE condition]
UNION ALL
SELECT column1 [, column2 ]
FROM table1 [, table2 ]
[WHERE condition]
Here, the given condition could be any expression based on your requirement.
Example
Consider the following two tables,
Table 1 − CUSTOMERS Table is as follows.
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Table 2 − ORDERS table is as follows.
+-----+---------------------+-------------+--------+
|OID | DATE | CUSTOMER_ID | AMOUNT |
+-----+---------------------+-------------+--------+
| 102 | 2009-10-08 00:00:00 | 3 | 3000 |
| 100 | 2009-10-08 00:00:00 | 3 | 1500 |
| 101 | 2009-11-20 00:00:00 | 2 | 1560 |
| 103 | 2008-05-20 00:00:00 | 4 | 2060 |
+-----+---------------------+-------------+--------+
Now, let us join these two tables in our SELECT statement as follows −
SQL> SELECT ID, NAME, AMOUNT, DATE
FROM CUSTOMERS
LEFT JOIN ORDERS
ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID
UNION ALL
SELECT ID, NAME, AMOUNT, DATE
FROM CUSTOMERS
RIGHT JOIN ORDERS
ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
This would produce the following result −
+------+----------+--------+---------------------+
| ID | NAME | AMOUNT | DATE |
+------+----------+--------+---------------------+
| 1 | Ramesh | NULL | NULL |
| 2 | Khilan | 1560 | 2009-11-20 00:00:00 |
| 3 | kaushik | 3000 | 2009-10-08 00:00:00 |
| 3 | kaushik | 1500 | 2009-10-08 00:00:00 |
| 4 | Chaitali | 2060 | 2008-05-20 00:00:00 |
| 5 | Hardik | NULL | NULL |
| 6 | Komal | NULL | NULL |
| 7 | Muffy | NULL | NULL |
| 3 | kaushik | 3000 | 2009-10-08 00:00:00 |
| 3 | kaushik | 1500 | 2009-10-08 00:00:00 |
| 2 | Khilan | 1560 | 2009-11-20 00:00:00 |
| 4 | Chaitali | 2060 | 2008-05-20 00:00:00 |
+------+----------+--------+---------------------+
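The row counts of the two results above (8 rows for UNION, 12 for UNION ALL) can be checked with a short sketch. It assumes Python's built-in sqlite3 module as the engine and drops the DATE column for brevity. Because older SQLite versions lack RIGHT JOIN, the second SELECT is written as a LEFT JOIN with the tables swapped, which returns the same rows here; that substitution is a choice of this sketch, not the query used in the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMERS (ID INT PRIMARY KEY, NAME VARCHAR(20));
    CREATE TABLE ORDERS (OID INT PRIMARY KEY, CUSTOMER_ID INT, AMOUNT INT);
    INSERT INTO CUSTOMERS VALUES
        (1, 'Ramesh'), (2, 'Khilan'), (3, 'kaushik'), (4, 'Chaitali'),
        (5, 'Hardik'), (6, 'Komal'), (7, 'Muffy');
    INSERT INTO ORDERS VALUES
        (102, 3, 3000), (100, 3, 1500), (101, 2, 1560), (103, 4, 2060);
""")

# All customers, with their orders if any.
left_join = (
    "SELECT C.ID, C.NAME, O.AMOUNT "
    "FROM CUSTOMERS C LEFT JOIN ORDERS O ON C.ID = O.CUSTOMER_ID"
)
# All orders, with their customer if any (stands in for the RIGHT JOIN).
right_join = (
    "SELECT C.ID, C.NAME, O.AMOUNT "
    "FROM ORDERS O LEFT JOIN CUSTOMERS C ON C.ID = O.CUSTOMER_ID"
)

# UNION removes rows that appear in both results.
union_rows = conn.execute(f"{left_join} UNION {right_join}").fetchall()
# UNION ALL keeps every row from both results, duplicates included.
union_all_rows = conn.execute(f"{left_join} UNION ALL {right_join}").fetchall()

print(len(union_rows), len(union_all_rows))
```

The four order rows are returned by both SELECT statements, which is why UNION ALL yields four rows more than UNION.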
There are two other clauses (i.e., operators), which are like the UNION clause.
• SQL INTERSECT Clause − This is used to combine two SELECT statements, but
returns only the rows from the first SELECT statement that are identical to a row in
the second SELECT statement.
• SQL EXCEPT Clause − This combines two SELECT statements and returns rows
from the first SELECT statement that are not returned by the second SELECT
statement.
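As a minimal sketch of these two operators, the following uses Python's built-in sqlite3 module (an assumption made for runnability) to find which customer IDs do and do not appear in the ORDERS table. The tables are reduced to the columns needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMERS (ID INT PRIMARY KEY, NAME VARCHAR(20));
    CREATE TABLE ORDERS (OID INT PRIMARY KEY, CUSTOMER_ID INT, AMOUNT INT);
    INSERT INTO CUSTOMERS VALUES
        (1, 'Ramesh'), (2, 'Khilan'), (3, 'kaushik'), (4, 'Chaitali'),
        (5, 'Hardik'), (6, 'Komal'), (7, 'Muffy');
    INSERT INTO ORDERS VALUES
        (102, 3, 3000), (100, 3, 1500), (101, 2, 1560), (103, 4, 2060);
""")

# INTERSECT: IDs present in both SELECT results (customers with orders).
with_orders = [row[0] for row in conn.execute(
    "SELECT ID FROM CUSTOMERS INTERSECT SELECT CUSTOMER_ID FROM ORDERS ORDER BY 1"
)]

# EXCEPT: IDs in the first result that are missing from the second
# (customers without any order).
without_orders = [row[0] for row in conn.execute(
    "SELECT ID FROM CUSTOMERS EXCEPT SELECT CUSTOMER_ID FROM ORDERS ORDER BY 1"
)]

print(with_orders, without_orders)
```

Both operators remove duplicates, so customer 3 appears once even though it has two orders.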

The SQL NULL is the term used to represent a missing value. A NULL value in a
table is a value in a field that appears to be blank.
A field with a NULL value is a field with no value. It is very important to
understand that a NULL value is different from a zero value or a field that contains
spaces.
Syntax
The basic syntax of NULL while creating a table.
SQL> CREATE TABLE CUSTOMERS(
ID INT NOT NULL,
NAME VARCHAR (20) NOT NULL,
AGE INT NOT NULL,
ADDRESS CHAR (25) ,
SALARY DECIMAL (18, 2),
PRIMARY KEY (ID)
);
Here, NOT NULL signifies that the column should always accept an explicit value of
the given data type. There are two columns where we did not use NOT NULL,
which means these columns could be NULL.
A field with a NULL value is the one that has been left blank during the record
creation.
Example
The NULL value can cause problems when selecting data, because comparing an
unknown value to any other value always gives an unknown result, so the row is
not included in the results. You must use the IS NULL or IS NOT NULL operators
to check for a NULL value.
Consider the following CUSTOMERS table having the records as shown below.
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | |
| 7 | Muffy | 24 | Indore | |
+----+----------+-----+-----------+----------+
Now, following is the usage of the IS NOT NULL operator.
SQL> SELECT ID, NAME, AGE, ADDRESS, SALARY
FROM CUSTOMERS
WHERE SALARY IS NOT NULL;
This would produce the following result −
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
+----+----------+-----+-----------+----------+
Now, following is the usage of the IS NULL operator.
SQL> SELECT ID, NAME, AGE, ADDRESS, SALARY
FROM CUSTOMERS
WHERE SALARY IS NULL;
This would produce the following result −
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 6 | Komal | 22 | MP | |
| 7 | Muffy | 24 | Indore | |
+----+----------+-----+-----------+----------+

You can rename a table or a column temporarily by giving another name known
as Alias. The use of table aliases is to rename a table in a specific SQL statement. The
renaming is a temporary change and the actual table name does not change in the
database. The column aliases are used to rename a table's columns for the purpose of a
particular SQL query.

Syntax

The basic syntax of a table alias is as follows.


SELECT column1, column2....
FROM table_name AS alias_name
WHERE [condition];
The basic syntax of a column alias is as follows.
SELECT column_name AS alias_name
FROM table_name
WHERE [condition];

Example

Consider the following two tables.


Table 1 − CUSTOMERS Table is as follows.
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Table 2 − ORDERS Table is as follows.
+-----+---------------------+-------------+--------+
|OID | DATE | CUSTOMER_ID | AMOUNT |
+-----+---------------------+-------------+--------+
| 102 | 2009-10-08 00:00:00 | 3 | 3000 |

| 100 | 2009-10-08 00:00:00 | 3 | 1500 |
| 101 | 2009-11-20 00:00:00 | 2 | 1560 |
| 103 | 2008-05-20 00:00:00 | 4 | 2060 |
+-----+---------------------+-------------+--------+
Now, the following code block shows the usage of a table alias.
SQL> SELECT C.ID, C.NAME, C.AGE, O.AMOUNT
FROM CUSTOMERS AS C, ORDERS AS O
WHERE C.ID = O.CUSTOMER_ID;
This would produce the following result.
+----+----------+-----+--------+
| ID | NAME | AGE | AMOUNT |
+----+----------+-----+--------+
| 3 | kaushik | 23 | 3000 |
| 3 | kaushik | 23 | 1500 |
| 2 | Khilan | 25 | 1560 |
| 4 | Chaitali | 25 | 2060 |
+----+----------+-----+--------+
Following is the usage of a column alias.
SQL> SELECT ID AS CUSTOMER_ID, NAME AS CUSTOMER_NAME
FROM CUSTOMERS
WHERE SALARY IS NOT NULL;
This would produce the following result.
+-------------+---------------+
| CUSTOMER_ID | CUSTOMER_NAME |
+-------------+---------------+
| 1 | Ramesh |
| 2 | Khilan |
| 3 | kaushik |
| 4 | Chaitali |
| 5 | Hardik |
| 6 | Komal |
| 7 | Muffy |
+-------------+---------------+
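Table and column aliases can be combined in one query. The following sketch assumes Python's built-in sqlite3 module as the engine; the alias ORDER_AMOUNT is a hypothetical name added for illustration. It reads the column names of the result back from the cursor to show that only the output is renamed, not the tables themselves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMERS (ID INT PRIMARY KEY, NAME VARCHAR(20));
    CREATE TABLE ORDERS (OID INT PRIMARY KEY, CUSTOMER_ID INT, AMOUNT INT);
    INSERT INTO CUSTOMERS VALUES (2, 'Khilan'), (3, 'kaushik'), (4, 'Chaitali');
    INSERT INTO ORDERS VALUES
        (102, 3, 3000), (100, 3, 1500), (101, 2, 1560), (103, 4, 2060);
""")

# C and O are table aliases; the AS clauses in the SELECT list are column aliases.
cur = conn.execute("""
    SELECT C.ID AS CUSTOMER_ID, C.NAME AS CUSTOMER_NAME, O.AMOUNT AS ORDER_AMOUNT
    FROM CUSTOMERS AS C, ORDERS AS O
    WHERE C.ID = O.CUSTOMER_ID
    ORDER BY O.OID
""")
columns = [description[0] for description in cur.description]
rows = cur.fetchall()

# The stored table names are unchanged by the aliases.
tables = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
))

print(columns)
print(rows)
print(tables)
```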

Indexes are special lookup tables that the database search engine can use to speed up
data retrieval. Simply put, an index is a pointer to data in a table. An index in a database
is very similar to an index in the back of a book.
For example, if you want to find all pages in a book that discuss a certain topic,
you first refer to the index, which lists all the topics alphabetically, and are then referred
to one or more specific page numbers.

An index helps to speed up SELECT queries and WHERE clauses, but it slows down
data input, with the UPDATE and the INSERT statements. Indexes can be created or
dropped with no effect on the data.
Creating an index involves the CREATE INDEX statement, which allows you to name
the index, to specify the table and which column or columns to index, and to indicate
whether the index is in an ascending or descending order.
Indexes can also be unique, like the UNIQUE constraint, in that the index prevents
duplicate entries in the column or combination of columns on which there is an index.

The CREATE INDEX Command

The basic syntax of a CREATE INDEX is as follows.
CREATE INDEX index_name ON table_name (column1 [, column2 ]);
Single-Column Indexes
A single-column index is created based on only one table column. The basic syntax is as
follows.
CREATE INDEX index_name
ON table_name (column_name);
Unique Indexes
Unique indexes are used not only for performance, but also for data integrity. A unique
index does not allow any duplicate values to be inserted into the table. The basic syntax
is as follows.
CREATE UNIQUE INDEX index_name
ON table_name (column_name);
Composite Indexes
A composite index is an index on two or more columns of a table. Its basic syntax is as
follows.
CREATE INDEX index_name
ON table_name (column1, column2);
When deciding whether to create a single-column index or a composite index, take into
consideration the column(s) that you use very frequently in a query's WHERE clause as
filter conditions.
If only one column is used, a single-column index should be the choice.
If two or more columns are frequently used in the WHERE clause as
filters, a composite index would be the best choice.
Implicit Indexes
Implicit indexes are indexes that are automatically created by the database server when
an object is created. Indexes are automatically created for primary key constraints and
unique constraints.
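The index kinds described above can be created side by side in a short sketch. It assumes Python's built-in sqlite3 module as the engine; the index names are hypothetical, and the name SQLite gives to its automatically created index is specific to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS "
    "(ID INT PRIMARY KEY, NAME VARCHAR(20), AGE INT, ADDRESS CHAR(25))"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?)",
    [(1, "Ramesh", 32, "Ahmedabad"), (2, "Khilan", 25, "Delhi"), (3, "kaushik", 23, "Kota")],
)

conn.execute("CREATE INDEX idx_age ON CUSTOMERS (AGE)")                   # single-column
conn.execute("CREATE UNIQUE INDEX idx_name ON CUSTOMERS (NAME)")          # unique
conn.execute("CREATE INDEX idx_age_address ON CUSTOMERS (AGE, ADDRESS)")  # composite

# List the indexes on the table; the second field of each row is the index name.
# The list also contains an implicit index created for the PRIMARY KEY.
index_names = [row[1] for row in conn.execute("PRAGMA index_list('CUSTOMERS')")]

# The unique index refuses a second customer named 'Ramesh'.
try:
    conn.execute("INSERT INTO CUSTOMERS VALUES (4, 'Ramesh', 40, 'Pune')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Dropping an index removes only the lookup structure, never the table data.
conn.execute("DROP INDEX idx_age")
row_count = conn.execute("SELECT COUNT(*) FROM CUSTOMERS").fetchone()[0]

print(index_names, duplicate_rejected, row_count)
```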

The DROP INDEX Command


An index can be dropped using SQL DROP command. Care should be taken when
dropping an index because the performance may either slow down or improve.
The basic syntax is as follows −
DROP INDEX index_name;
You can check the INDEX Constraint chapter to see some actual examples on Indexes.
When should indexes be avoided?
Although indexes are intended to enhance a database's performance, there are times
when they should be avoided.
The following guidelines indicate when the use of an index should be reconsidered.
• Indexes should not be used on small tables.
• Indexes should not be used on tables that have frequent, large batch update or
insert operations.
• Indexes should not be used on columns that contain a high number of NULL
values.
• Columns that are frequently manipulated should not be indexed.

The SQL ALTER TABLE command is used to add, delete or modify columns in an
existing table. You can also use the ALTER TABLE command to add and drop
various constraints on an existing table.

Syntax

The basic syntax of an ALTER TABLE command to add a New Column in an existing
table is as follows.
ALTER TABLE table_name ADD column_name datatype;
The basic syntax of an ALTER TABLE command to DROP COLUMN in an existing
table is as follows.
ALTER TABLE table_name DROP COLUMN column_name;
The basic syntax of an ALTER TABLE command to change the DATA TYPE of a
column in a table is as follows.
ALTER TABLE table_name MODIFY COLUMN column_name datatype;
The basic syntax of an ALTER TABLE command to add a NOT NULL constraint to a
column in a table is as follows.
ALTER TABLE table_name MODIFY column_name datatype NOT NULL;
The basic syntax of ALTER TABLE to ADD UNIQUE CONSTRAINT to a table is as
follows.
ALTER TABLE table_name
ADD CONSTRAINT MyUniqueConstraint UNIQUE(column1, column2...);
The basic syntax of an ALTER TABLE command to ADD CHECK CONSTRAINT to
a table is as follows.
ALTER TABLE table_name
ADD CONSTRAINT MyCheckConstraint CHECK (CONDITION);
The basic syntax of an ALTER TABLE command to ADD PRIMARY KEY constraint
to a table is as follows.
ALTER TABLE table_name
ADD CONSTRAINT MyPrimaryKey PRIMARY KEY (column1, column2...);
The basic syntax of an ALTER TABLE command to DROP CONSTRAINT from a
table is as follows.
ALTER TABLE table_name
DROP CONSTRAINT MyUniqueConstraint;
If you're using MySQL, the code is as follows −
ALTER TABLE table_name
DROP INDEX MyUniqueConstraint;
The basic syntax of an ALTER TABLE command to DROP PRIMARY
KEY constraint from a table is as follows.
ALTER TABLE table_name
DROP CONSTRAINT MyPrimaryKey;
If you're using MySQL, the code is as follows −
ALTER TABLE table_name
DROP PRIMARY KEY;

Example

Consider the CUSTOMERS table having the following records −


+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Following is the example to ADD a New Column to an existing table −
ALTER TABLE CUSTOMERS ADD SEX char(1);
Now, the CUSTOMERS table is changed and following would be output from the
SELECT statement.
+----+----------+-----+-----------+----------+------+
| ID | NAME | AGE | ADDRESS | SALARY | SEX |
+----+----------+-----+-----------+----------+------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 | NULL |
| 2 | Khilan | 25 | Delhi | 1500.00 | NULL |
| 3 | kaushik | 23 | Kota | 2000.00 | NULL |
| 4 | Chaitali | 25 | Mumbai | 6500.00 | NULL |
| 5 | Hardik | 27 | Bhopal | 8500.00 | NULL |
| 6 | Komal | 22 | MP | 4500.00 | NULL |
| 7 | Muffy | 24 | Indore | 10000.00 | NULL |
+----+----------+-----+-----------+----------+------+
Following is the example to DROP the SEX column from the existing table.
ALTER TABLE CUSTOMERS DROP COLUMN SEX;
Now, the CUSTOMERS table is changed and following would be the output from the
SELECT statement.
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
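The ADD and DROP COLUMN steps can be replayed with a minimal sketch, assuming Python's built-in sqlite3 module as the engine and a two-row CUSTOMERS table. SQLite supports DROP COLUMN only from version 3.35 onward, so that step is guarded by a version check; the MODIFY forms shown in the syntax section are not available in SQLite and are not covered here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (ID INT PRIMARY KEY, NAME VARCHAR(20), AGE INT)")
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?)",
    [(1, "Ramesh", 32), (2, "Khilan", 25)],
)


def column_names():
    """Return the current column names of CUSTOMERS, in table order."""
    return [row[1] for row in conn.execute("PRAGMA table_info(CUSTOMERS)")]


# Add a new column; rows that already exist receive NULL in it.
conn.execute("ALTER TABLE CUSTOMERS ADD COLUMN SEX CHAR(1)")
columns_after_add = column_names()
sex_values = [row[0] for row in conn.execute("SELECT SEX FROM CUSTOMERS ORDER BY ID")]

# Remove the column again where the SQLite version allows it.
if sqlite3.sqlite_version_info >= (3, 35, 0):
    conn.execute("ALTER TABLE CUSTOMERS DROP COLUMN SEX")
columns_now = column_names()

print(columns_after_add, sex_values, columns_now)
```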

The SQL TRUNCATE TABLE command is used to delete all the data from an
existing table.
You can also use the DROP TABLE command to delete a complete table, but it would
remove the complete table structure from the database and you would need to re-create
the table if you wish to store data in it again.

Syntax

The basic syntax of a TRUNCATE TABLE command is as follows.


TRUNCATE TABLE table_name;

Example

Consider a CUSTOMERS table having the following records −


+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Following is the example of a Truncate command.
SQL > TRUNCATE TABLE CUSTOMERS;
Now, the CUSTOMERS table is truncated and the output from SELECT statement will
be as shown in the code block below −
SQL> SELECT * FROM CUSTOMERS;
Empty set (0.00 sec)
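The contrast between emptying a table and dropping it can be sketched as follows. Python's built-in sqlite3 module is assumed as the engine, and SQLite does not implement TRUNCATE TABLE, so an unqualified DELETE FROM stands in for it here; on systems such as MySQL the TRUNCATE TABLE statement itself would be used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (ID INT PRIMARY KEY, NAME VARCHAR(20))")
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?)",
    [(1, "Ramesh"), (2, "Khilan"), (3, "kaushik")],
)


def table_exists(name):
    """Return True if a table with this name is defined in the database."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


# Stand-in for TRUNCATE TABLE: remove every row but keep the table structure.
conn.execute("DELETE FROM CUSTOMERS")
rows_left = conn.execute("SELECT COUNT(*) FROM CUSTOMERS").fetchone()[0]
exists_after_delete = table_exists("CUSTOMERS")

# DROP TABLE removes the structure as well; the table must be re-created
# before it can hold data again.
conn.execute("DROP TABLE CUSTOMERS")
exists_after_drop = table_exists("CUSTOMERS")

print(rows_left, exists_after_delete, exists_after_drop)
```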

A view is nothing more than a SQL statement that is stored in the database with an
associated name. A view is actually a composition of a table in the form of a predefined
SQL query.
A view can contain all rows of a table or select rows from a table. A view can be created
from one or many tables which depends on the written SQL query to create a view.
Views, which are a type of virtual table, allow users to do the following −
• Structure data in a way that users or classes of users find natural or intuitive.
• Restrict access to the data in such a way that a user can see and (sometimes)
modify exactly what they need and no more.
• Summarize data from various tables which can be used to generate reports.

Creating Views

Database views are created using the CREATE VIEW statement. Views can be created
from a single table, multiple tables or another view.
To create a view, a user must have the appropriate system privilege according to the
specific implementation.
The basic CREATE VIEW syntax is as follows −
CREATE VIEW view_name AS
SELECT column1, column2.....
FROM table_name
WHERE [condition];
You can include multiple tables in your SELECT statement in a similar way as you use
them in a normal SQL SELECT query.
Example
Consider the CUSTOMERS table having the following records −
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Following is an example to create a view from the CUSTOMERS table. This view
exposes only the customer name and age from the CUSTOMERS table.
SQL > CREATE VIEW CUSTOMERS_VIEW AS
SELECT name, age
FROM CUSTOMERS;
Now, you can query CUSTOMERS_VIEW in a similar way as you query an actual
table. Following is an example for the same.
SQL > SELECT * FROM CUSTOMERS_VIEW;
This would produce the following result.
+----------+-----+
| name | age |
+----------+-----+
| Ramesh | 32 |
| Khilan | 25 |
| kaushik | 23 |
| Chaitali | 25 |
| Hardik | 27 |
| Komal | 22 |
| Muffy | 24 |
+----------+-----+
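That a view is a stored query and not a copy of the data can be checked with a small sketch, assuming Python's built-in sqlite3 module as the engine and a three-row CUSTOMERS table. SQLite views are read-only, so the change is made through the base table here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS "
    "(ID INT PRIMARY KEY, NAME VARCHAR(20), AGE INT, SALARY DECIMAL(18, 2))"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?)",
    [(1, "Ramesh", 32, 2000.00), (2, "Khilan", 25, 1500.00), (3, "kaushik", 23, 2000.00)],
)

# The view exposes only NAME and AGE; ID and SALARY stay hidden from its users.
conn.execute("CREATE VIEW CUSTOMERS_VIEW AS SELECT NAME, AGE FROM CUSTOMERS")

view_columns = [d[0] for d in conn.execute("SELECT * FROM CUSTOMERS_VIEW").description]
before = conn.execute("SELECT NAME, AGE FROM CUSTOMERS_VIEW ORDER BY NAME").fetchall()

# A change to the base table is visible through the view immediately,
# because the view's query runs each time the view is read.
conn.execute("UPDATE CUSTOMERS SET AGE = 35 WHERE NAME = 'Ramesh'")
after = conn.execute("SELECT NAME, AGE FROM CUSTOMERS_VIEW ORDER BY NAME").fetchall()

print(view_columns)
print(before)
print(after)
```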

The WITH CHECK OPTION

The WITH CHECK OPTION is a CREATE VIEW statement option. The purpose of the
WITH CHECK OPTION is to ensure that all UPDATE and INSERT statements satisfy the
condition(s) in the view definition.
If they do not satisfy the condition(s), the UPDATE or INSERT returns an error.
The following code block has an example of creating the same view CUSTOMERS_VIEW
with the WITH CHECK OPTION.
CREATE VIEW CUSTOMERS_VIEW AS
SELECT name, age
FROM CUSTOMERS
WHERE age IS NOT NULL
WITH CHECK OPTION;

The WITH CHECK OPTION in this case should deny the entry of any NULL values in
the view's AGE column, because the view is defined by data that does not have a NULL
value in the AGE column.
Updating a View
A view can be updated under certain conditions which are given below −
• The SELECT clause may not contain the keyword DISTINCT.
• The SELECT clause may not contain summary functions.
• The SELECT clause may not contain set functions.
• The SELECT clause may not contain set operators.
• The SELECT clause may not contain an ORDER BY clause.
• The FROM clause may not contain multiple tables.
• The WHERE clause may not contain subqueries.
• The query may not contain GROUP BY or HAVING.
• Calculated columns may not be updated.
• All NOT NULL columns from the base table must be included in the view in
order for the INSERT query to function.
So, if a view satisfies all the above-mentioned rules then you can update that view. The
following code block has an example to update the age of Ramesh.
SQL > UPDATE CUSTOMERS_VIEW
SET AGE = 35
WHERE name = 'Ramesh';
This would ultimately update the base table CUSTOMERS and the same would reflect
in the view itself. Now, try to query the base table and the SELECT statement would
produce the following result.
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 35 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Inserting Rows into a View
Rows of data can be inserted into a view. The same rules that apply to the UPDATE
command also apply to the INSERT command.

Here, we cannot insert rows into CUSTOMERS_VIEW because the view does not include
all the NOT NULL columns of the base table. Otherwise, you can insert rows into a view
in a similar way as you insert them into a table.
Deleting Rows from a View
Rows of data can be deleted from a view. The same rules that apply to the UPDATE and
INSERT commands apply to the DELETE command.
Following is an example to delete a record having AGE = 22.
SQL > DELETE FROM CUSTOMERS_VIEW
WHERE age = 22;
This would ultimately delete a row from the base table CUSTOMERS and the same
would reflect in the view itself. Now, try to query the base table and the SELECT
statement would produce the following result.
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 35 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Dropping Views
Obviously, where you have a view, you need a way to drop the view if it is no longer
needed. The syntax is very simple and is given below −
DROP VIEW view_name;
Following is an example to drop the CUSTOMERS_VIEW from the CUSTOMERS
table.
DROP VIEW CUSTOMERS_VIEW;

The HAVING Clause enables you to specify conditions that filter which group results
appear in the results.
The WHERE clause places conditions on the selected columns, whereas the HAVING
clause places conditions on groups created by the GROUP BY clause.

Syntax

The following code block shows the position of the HAVING Clause in a query.
SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY
The HAVING clause must follow the GROUP BY clause in a query and must also
precede the ORDER BY clause if used. The following code block has the syntax of the
SELECT statement including the HAVING clause −
SELECT column1, column2
FROM table1, table2
WHERE [ conditions ]
GROUP BY column1, column2
HAVING [ conditions ]
ORDER BY column1, column2

Example

Consider the CUSTOMERS table having the following records.


+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
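Following is an example that displays one record for each age shared by two or more
customers. Note that selecting non-grouped columns such as ID and NAME together with
GROUP BY age is accepted by MySQL only when ONLY_FULL_GROUP_BY is disabled, is
rejected by stricter implementations, and the row shown for each group is arbitrary.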
SQL > SELECT ID, NAME, AGE, ADDRESS, SALARY
FROM CUSTOMERS
GROUP BY age
HAVING COUNT(age) >= 2;
This would produce the following result −
+----+--------+-----+---------+---------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+--------+-----+---------+---------+
| 2 | Khilan | 25 | Delhi | 1500.00 |
+----+--------+-----+---------+---------+
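A portable variant of this query selects only the grouped column and an aggregate. The sketch below assumes Python's built-in sqlite3 module as the engine and also shows how WHERE and HAVING act at different stages: WHERE filters rows before grouping, HAVING filters the groups afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS "
    "(ID INT PRIMARY KEY, NAME VARCHAR(20), AGE INT, SALARY DECIMAL(18, 2))"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?)",
    [
        (1, "Ramesh", 32, 2000.00), (2, "Khilan", 25, 1500.00),
        (3, "kaushik", 23, 2000.00), (4, "Chaitali", 25, 6500.00),
        (5, "Hardik", 27, 8500.00), (6, "Komal", 22, 4500.00),
        (7, "Muffy", 24, 10000.00),
    ],
)

# Ages that occur at least twice: only the grouped column and an aggregate
# are selected, so the result does not depend on an arbitrary row.
shared_ages = conn.execute("""
    SELECT AGE, COUNT(*)
    FROM CUSTOMERS
    GROUP BY AGE
    HAVING COUNT(*) >= 2
""").fetchall()

# WHERE removes Khilan (salary 1500) before the groups are formed,
# so the age 25 group has only one member left and HAVING drops it.
shared_ages_high_salary = conn.execute("""
    SELECT AGE, COUNT(*)
    FROM CUSTOMERS
    WHERE SALARY > 2000
    GROUP BY AGE
    HAVING COUNT(*) >= 2
""").fetchall()

print(shared_ages, shared_ages_high_salary)
```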

We have already discussed the SQL LIKE operator, which is used to compare a
value to similar values using the wildcard operators.
SQL supports two wildcard operators in conjunction with the LIKE operator, which are
explained in detail in the following table.
Sr.No.  Wildcard & Description

1       The percent sign (%)
        Matches zero, one or multiple characters.
        Note − MS Access uses the asterisk (*) wildcard character instead of the
        percent sign (%) wildcard character.

2       The underscore (_)
        Matches exactly one character.
        Note − MS Access uses a question mark (?) instead of the underscore (_) to
        match any one character.

The percent sign represents zero, one or multiple characters. The underscore represents
a single character, which may be a letter, a digit or any other symbol. These symbols
can be used in combinations.

Syntax

The basic syntax of a '%' and a '_' operator is as follows.


SELECT * FROM table_name
WHERE column LIKE 'XXXX%'

or

SELECT * FROM table_name
WHERE column LIKE '%XXXX%'

or

SELECT * FROM table_name
WHERE column LIKE 'XXXX_'

or

SELECT * FROM table_name
WHERE column LIKE '_XXXX'

or

SELECT * FROM table_name
WHERE column LIKE '_XXXX_'
You can combine N number of conditions using the AND or the OR operators. Here,
XXXX could be any numeric or string value.

Example
The following table has a number of examples showing the WHERE part having
different LIKE clauses with '%' and '_' operators.

Sr.No.  Statement & Description

1       WHERE SALARY LIKE '200%'
        Finds any values that start with 200.

2       WHERE SALARY LIKE '%200%'
        Finds any values that have 200 in any position.

3       WHERE SALARY LIKE '_00%'
        Finds any values that have 00 in the second and third positions.

4       WHERE SALARY LIKE '2_%_%'
        Finds any values that start with 2 and are at least 3 characters in length.

5       WHERE SALARY LIKE '%2'
        Finds any values that end with 2.

6       WHERE SALARY LIKE '_2%3'
        Finds any values that have a 2 in the second position and end with a 3.

7       WHERE SALARY LIKE '2___3'
        Finds any values in a five-digit number that start with 2 and end with 3.

Let us take a real example, consider the CUSTOMERS table having the following
records.
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+

The following code block is an example, which would display all the records from the
CUSTOMERS table where the SALARY starts with 200.
SQL> SELECT * FROM CUSTOMERS
WHERE SALARY LIKE '200%';
This would produce the following result.
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
+----+----------+-----+-----------+----------+
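The wildcard rules can also be tried on a text column. The sketch below assumes Python's sqlite3 module and uses only the ID and NAME columns of CUSTOMERS; the helper names_like is hypothetical. Note that SQLite's LIKE is case-insensitive for ASCII letters by default, which may differ from other systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT)")
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?)",
    [(1, "Ramesh"), (2, "Khilan"), (3, "kaushik"), (4, "Chaitali"),
     (5, "Hardik"), (6, "Komal"), (7, "Muffy")],
)

def names_like(pattern):
    """Return the NAME values matching a LIKE pattern, ordered by ID."""
    rows = conn.execute(
        "SELECT NAME FROM CUSTOMERS WHERE NAME LIKE ? ORDER BY ID", (pattern,)
    )
    return [name for (name,) in rows]

second_char_h = names_like("_h%")  # one character, then 'h', then anything
ends_with_k = names_like("%k")     # anything (possibly nothing), then 'k'
five_chars = names_like("Kom__")   # 'Kom' plus exactly two more characters
zero_match = names_like("Muffy%")  # '%' also matches zero characters

print(second_char_h)  # ['Khilan', 'Chaitali']
print(ends_with_k)    # ['kaushik', 'Hardik']
print(five_chars)     # ['Komal']
print(zero_match)     # ['Muffy']
```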

What are Temporary Tables?

Some RDBMSs support temporary tables. Temporary tables are a useful feature that
lets you store and process intermediate results by using the same selection, update,
and join capabilities that you can use with regular tables.
Temporary tables can be very useful when you need to keep temporary data. The
most important thing to know about temporary tables is that they are deleted when
the current client session terminates.
Temporary tables are available in MySQL version 3.23 onwards. If you use an older
version of MySQL than 3.23, you can't use temporary tables, but you can use heap
tables.
As stated earlier, temporary tables will only last as long as the session is alive. If you
run the code in a PHP script, the temporary table will be destroyed automatically when
the script finishes executing. If you are connected to the MySQL database server
through the MySQL client program, then the temporary table will exist until you close
the client or manually destroy the table.
Example
Here is an example showing you the usage of a temporary table.
mysql> CREATE TEMPORARY TABLE SALESSUMMARY (
-> product_name VARCHAR(50) NOT NULL
-> , total_sales DECIMAL(12,2) NOT NULL DEFAULT 0.00
-> , avg_unit_price DECIMAL(7,2) NOT NULL DEFAULT 0.00
-> , total_units_sold INT UNSIGNED NOT NULL DEFAULT 0
);
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO SALESSUMMARY
-> (product_name, total_sales, avg_unit_price, total_units_sold)
-> VALUES
-> ('cucumber', 100.25, 90, 2);

mysql> SELECT * FROM SALESSUMMARY;
+--------------+-------------+----------------+------------------+
| product_name | total_sales | avg_unit_price | total_units_sold |
+--------------+-------------+----------------+------------------+
| cucumber | 100.25 | 90.00 | 2|
+--------------+-------------+----------------+------------------+
1 row in set (0.00 sec)
When you issue a SHOW TABLES command, your temporary table will not appear in the
list. If you log out of the MySQL session, log in again and then issue a SELECT
command against the table, you will get an error, because the temporary table and its
data no longer exist.
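The session-scoped lifetime can be observed from a script. The following is a rough sketch that assumes SQLite (through Python's sqlite3 module) instead of MySQL, so details such as column types differ; the file name demo.db and the two session variables are made up for the illustration.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

# Session A creates and fills a temporary table.
session_a = sqlite3.connect(db_path)
session_a.execute(
    "CREATE TEMPORARY TABLE SALESSUMMARY ("
    "product_name TEXT NOT NULL, total_sales REAL NOT NULL DEFAULT 0)"
)
session_a.execute("INSERT INTO SALESSUMMARY VALUES ('cucumber', 100.25)")
session_a.commit()
rows_in_a = session_a.execute("SELECT * FROM SALESSUMMARY").fetchall()

# Session B uses the same database file but cannot see session A's temp table.
session_b = sqlite3.connect(db_path)
try:
    session_b.execute("SELECT * FROM SALESSUMMARY")
    visible_in_b = True
except sqlite3.OperationalError:
    visible_in_b = False

# Closing session A drops the temporary table; a new session starts without it.
session_a.close()
session_a = sqlite3.connect(db_path)
try:
    session_a.execute("SELECT * FROM SALESSUMMARY")
    survived_reconnect = True
except sqlite3.OperationalError:
    survived_reconnect = False

print(rows_in_a, visible_in_b, survived_reconnect)
```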

Dropping Temporary Tables

By default, all the temporary tables are deleted by MySQL when your database
connection gets terminated. Still if you want to delete them in between, then you can do
so by issuing a DROP TABLE command.
Following is an example on dropping a temporary table.
mysql> CREATE TEMPORARY TABLE SALESSUMMARY (
-> product_name VARCHAR(50) NOT NULL
-> , total_sales DECIMAL(12,2) NOT NULL DEFAULT 0.00
-> , avg_unit_price DECIMAL(7,2) NOT NULL DEFAULT 0.00
-> , total_units_sold INT UNSIGNED NOT NULL DEFAULT 0
);
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO SALESSUMMARY
-> (product_name, total_sales, avg_unit_price, total_units_sold)
-> VALUES
-> ('cucumber', 100.25, 90, 2);

mysql> SELECT * FROM SALESSUMMARY;
+--------------+-------------+----------------+------------------+
| product_name | total_sales | avg_unit_price | total_units_sold |
+--------------+-------------+----------------+------------------+
| cucumber | 100.25 | 90.00 | 2|
+--------------+-------------+----------------+------------------+
1 row in set (0.00 sec)
mysql> DROP TABLE SALESSUMMARY;
mysql> SELECT * FROM SALESSUMMARY;
ERROR 1146: Table 'TUTORIALS.SALESSUMMARY' doesn't exist

A Subquery or Inner query or a Nested query is a query within another SQL query and
embedded within the WHERE clause.

A subquery is used to return data that will be used in the main query as a condition to
further restrict the data to be retrieved.
Subqueries can be used with the SELECT, INSERT, UPDATE, and DELETE
statements along with the operators like =, <, >, >=, <=, IN, BETWEEN, etc.
There are a few rules that subqueries must follow −
• Subqueries must be enclosed within parentheses.
• A subquery can have only one column in the SELECT clause, unless multiple
columns are in the main query for the subquery to compare its selected columns.
• An ORDER BY command cannot be used in a subquery, although the main query
can use an ORDER BY. The GROUP BY command can be used to perform the
same function as the ORDER BY in a subquery.
• Subqueries that return more than one row can only be used with multiple value
operators such as the IN operator.
• The SELECT list cannot include any references to values that evaluate to a
BLOB, ARRAY, CLOB, or NCLOB.
• A subquery cannot be immediately enclosed in a set function.
• The BETWEEN operator cannot be used with a subquery. However, the
BETWEEN operator can be used within the subquery.

Subqueries with the SELECT Statement

Subqueries are most frequently used with the SELECT statement. The basic syntax is as
follows −
SELECT column_name [, column_name ]
FROM table1 [, table2 ]
WHERE column_name OPERATOR
(SELECT column_name [, column_name ]
FROM table1 [, table2 ]
[WHERE])
Example
Consider the CUSTOMERS table having the following records −
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 35 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Now, let us check the following subquery with a SELECT statement.
SQL> SELECT *
FROM CUSTOMERS
WHERE ID IN (SELECT ID
FROM CUSTOMERS
WHERE SALARY > 4500) ;
This would produce the following result.
+----+----------+-----+---------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+---------+----------+
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+---------+----------+

Subqueries with the INSERT Statement

Subqueries also can be used with INSERT statements. The INSERT statement uses the
data returned from the subquery to insert into another table. The selected data in the
subquery can be modified with any of the character, date or number functions.
The basic syntax is as follows.
INSERT INTO table_name [ (column1 [, column2 ]) ]
SELECT [ * | column1 [, column2 ] ]
FROM table1 [, table2 ]
[ WHERE VALUE OPERATOR ]
Example
Consider a table CUSTOMERS_BKP with similar structure as CUSTOMERS table.
Now to copy the complete CUSTOMERS table into the CUSTOMERS_BKP table, you
can use the following syntax.
SQL> INSERT INTO CUSTOMERS_BKP
SELECT * FROM CUSTOMERS
WHERE ID IN (SELECT ID
FROM CUSTOMERS) ;

Subqueries with the UPDATE Statement

The subquery can be used in conjunction with the UPDATE statement. Either single or
multiple columns in a table can be updated when using a subquery with the UPDATE
statement.
The basic syntax is as follows.
UPDATE table_name
SET column_name = new_value
[ WHERE column_name OPERATOR
(SELECT column_name
FROM table_name
[ WHERE condition ]) ]
Example
Assume we have a CUSTOMERS_BKP table available, which is a backup of the
CUSTOMERS table. The following example multiplies SALARY by 0.25 in the
CUSTOMERS table for all the customers whose AGE is greater than or equal to 27.
SQL> UPDATE CUSTOMERS
SET SALARY = SALARY * 0.25
WHERE AGE IN (SELECT AGE FROM CUSTOMERS_BKP
WHERE AGE >= 27 );
This would impact two rows and finally CUSTOMERS table would have the following
records.
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 35 | Ahmedabad | 500.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 2125.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
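The arithmetic of this UPDATE can be verified with a short script. This is a sketch under the assumption that SQLite (via Python's sqlite3 module) behaves like the system in these notes for this query; the ADDRESS column is omitted for brevity and the backup table is filled with an INSERT ... SELECT as described earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT, "
    "AGE INTEGER, SALARY REAL)"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?)",
    [
        (1, "Ramesh", 35, 2000.00), (2, "Khilan", 25, 1500.00),
        (3, "kaushik", 23, 2000.00), (4, "Chaitali", 25, 6500.00),
        (5, "Hardik", 27, 8500.00), (6, "Komal", 22, 4500.00),
        (7, "Muffy", 24, 10000.00),
    ],
)

# Build the backup table with a subquery inside an INSERT statement.
conn.execute(
    "CREATE TABLE CUSTOMERS_BKP (ID INTEGER PRIMARY KEY, NAME TEXT, "
    "AGE INTEGER, SALARY REAL)"
)
conn.execute(
    "INSERT INTO CUSTOMERS_BKP SELECT * FROM CUSTOMERS "
    "WHERE ID IN (SELECT ID FROM CUSTOMERS)"
)

# The subquery returns the ages 35 and 27, so two rows are updated.
cursor = conn.execute(
    "UPDATE CUSTOMERS SET SALARY = SALARY * 0.25 "
    "WHERE AGE IN (SELECT AGE FROM CUSTOMERS_BKP WHERE AGE >= 27)"
)
rows_updated = cursor.rowcount
salaries = dict(conn.execute("SELECT NAME, SALARY FROM CUSTOMERS"))

print(rows_updated)        # 2
print(salaries["Ramesh"])  # 500.0  (2000.00 * 0.25)
print(salaries["Hardik"])  # 2125.0 (8500.00 * 0.25)
```

The backup table is left untouched by the UPDATE, so it still holds the original salaries.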

Subqueries with the DELETE Statement

The subquery can be used in conjunction with the DELETE statement like with any
other statements mentioned above.
The basic syntax is as follows.
DELETE FROM table_name
[ WHERE column_name OPERATOR
(SELECT column_name
FROM table_name
[ WHERE condition ]) ]
Example
Assuming, we have a CUSTOMERS_BKP table available which is a backup of the
CUSTOMERS table. The following example deletes the records from the
CUSTOMERS table for all the customers whose AGE is greater than or equal to 27.
SQL> DELETE FROM CUSTOMERS
WHERE AGE IN (SELECT AGE FROM CUSTOMERS_BKP
WHERE AGE >= 27 );
This would impact two rows and finally the CUSTOMERS table would have the
following records.
+----+----------+-----+---------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+---------+----------+
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+---------+----------+

There may be a situation when you have multiple duplicate records in a table. While
fetching such records, it makes more sense to fetch only unique records instead of
fetching duplicate records.
The SQL DISTINCT keyword, which we have already discussed, is used in conjunction
with the SELECT statement to eliminate all the duplicate records and fetch only
the unique records.

Syntax

The basic syntax of a DISTINCT keyword to eliminate duplicate records is as follows.


SELECT DISTINCT column1, column2,.....columnN
FROM table_name
WHERE [condition]

Example

Consider the CUSTOMERS table having the following records.


+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
First, let us see how the following SELECT query returns duplicate salary records.
SQL> SELECT SALARY FROM CUSTOMERS
ORDER BY SALARY;
This would produce the following result where the salary of 2000 is coming twice which
is a duplicate record from the original table.
+----------+
| SALARY |
+----------+
| 1500.00 |
| 2000.00 |
| 2000.00 |
| 4500.00 |
| 6500.00 |
| 8500.00 |
| 10000.00 |
+----------+
Now, let us use the DISTINCT keyword with the above SELECT query and see the
result.
SQL> SELECT DISTINCT SALARY FROM CUSTOMERS
ORDER BY SALARY;
This would produce the following result where we do not have any duplicate entry.
+----------+
| SALARY |
+----------+
| 1500.00 |
| 2000.00 |
| 4500.00 |
| 6500.00 |
| 8500.00 |
| 10000.00 |
+----------+
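One point worth checking is that DISTINCT applies to the whole selected row, not to a single column. The sketch below assumes Python's sqlite3 module and keeps only the AGE and SALARY columns of CUSTOMERS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, AGE INTEGER, SALARY REAL)")
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?)",
    [(1, 32, 2000.00), (2, 25, 1500.00), (3, 23, 2000.00), (4, 25, 6500.00),
     (5, 27, 8500.00), (6, 22, 4500.00), (7, 24, 10000.00)],
)

all_salaries = [s for (s,) in conn.execute(
    "SELECT SALARY FROM CUSTOMERS ORDER BY SALARY")]
unique_salaries = [s for (s,) in conn.execute(
    "SELECT DISTINCT SALARY FROM CUSTOMERS ORDER BY SALARY")]

# With two columns, a row is a duplicate only if BOTH values repeat.
# (32, 2000.0) and (23, 2000.0) differ in AGE, so both are kept.
unique_pairs = conn.execute(
    "SELECT DISTINCT AGE, SALARY FROM CUSTOMERS").fetchall()

print(len(all_salaries), len(unique_salaries), len(unique_pairs))  # 7 6 7
```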
Functional dependency in DBMS

The attributes of a table are said to be dependent on each other when an attribute of a
table uniquely identifies another attribute of the same table.

For example: Suppose we have a student table with attributes: Stu_Id, Stu_Name,
Stu_Age. Here the Stu_Id attribute uniquely identifies the Stu_Name attribute of the
student table because if we know the student id we can tell the student name associated
with it. This is known as functional dependency and can be written as Stu_Id->Stu_Name,
or in words we can say Stu_Name is functionally dependent on Stu_Id.

Formally:
If column A of a table uniquely identifies column B of the same table, then this can be
represented as A->B (attribute B is functionally dependent on attribute A).
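The definition can be tested against sample data. The function fd_holds below is a hypothetical helper and the student rows are invented for the illustration; keep in mind that a set of rows can only disprove a dependency, while the dependency itself is a rule about all valid data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not contradicted by the given rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The same lhs value must always come with the same rhs value.
        if seen.setdefault(key, value) != value:
            return False
    return True

students = [
    {"Stu_Id": 1, "Stu_Name": "Asha", "Stu_Age": 20},
    {"Stu_Id": 2, "Stu_Name": "Ravi", "Stu_Age": 21},
    {"Stu_Id": 3, "Stu_Name": "Asha", "Stu_Age": 22},
]

id_determines_name = fd_holds(students, ["Stu_Id"], ["Stu_Name"])
name_determines_id = fd_holds(students, ["Stu_Name"], ["Stu_Id"])

print(id_determines_name)  # True
print(name_determines_id)  # False: two students are named Asha
```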

Types of Functional Dependencies

• Trivial functional dependency
• Non-trivial functional dependency
• Multivalued dependency
• Transitive dependency

Trivial functional dependency in DBMS with example

The dependency of an attribute on a set of attributes is known as trivial functional
dependency if the set of attributes includes that attribute.

Symbolically: A ->B is trivial functional dependency if B is a subset of A.

The following dependencies are also trivial: A->A & B->B

For example: Consider a table with two columns Student_id and Student_Name.

{Student_Id, Student_Name} -> Student_Id is a trivial functional dependency as
Student_Id is a subset of {Student_Id, Student_Name}. That makes sense because if we
know the values of Student_Id and Student_Name then the value of Student_Id can be
uniquely determined.

Also, Student_Id -> Student_Id & Student_Name -> Student_Name are trivial
dependencies too.

Non trivial functional dependency in DBMS

If a functional dependency X->Y holds true where Y is not a subset of X then this
dependency is called non trivial Functional dependency.

For example:
An employee table with three attributes: emp_id, emp_name, emp_address.
The following functional dependencies are non-trivial:
emp_id -> emp_name (emp_name is not a subset of emp_id)
emp_id -> emp_address (emp_address is not a subset of emp_id)

On the other hand, the following dependency is trivial:
{emp_id, emp_name} -> emp_name [emp_name is a subset of {emp_id, emp_name}]

Completely non trivial FD:
If an FD X->Y holds true where the intersection of X and Y is empty, then this
dependency is said to be a completely non-trivial functional dependency.
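The three cases reduce to simple set comparisons between the two sides of the dependency. A minimal sketch, with classify_fd as a hypothetical helper name:

```python
def classify_fd(lhs, rhs):
    """Classify the dependency lhs -> rhs by comparing the two attribute sets."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:
        return "trivial"                 # right side is a subset of the left side
    if lhs & rhs:
        return "non-trivial"             # sides overlap, but rhs adds something new
    return "completely non-trivial"      # sides share no attribute

print(classify_fd({"Student_Id", "Student_Name"}, {"Student_Id"}))
# trivial
print(classify_fd({"emp_id"}, {"emp_name"}))
# completely non-trivial
print(classify_fd({"emp_id", "emp_name"}, {"emp_name", "emp_address"}))
# non-trivial
```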

Multivalued dependency in DBMS

Multivalued dependency occurs when there is more than one independent multivalued
attribute in a table.

For example: Consider a bike manufacturing company, which produces two colors (Black
and Red) of each model every year.

bike_model manuf_year color

M1001 2007 Black

M1001 2007 Red

M2012 2008 Black

M2012 2008 Red

M2222 2009 Black

M2222 2009 Red

Here columns manuf_year and color are independent of each other and dependent on
bike_model. In this case these two columns are said to be multivalued dependent on
bike_model. These dependencies can be represented like this:

bike_model ->> manuf_year

bike_model ->> color
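For a table with exactly three columns X, Y and Z, X ->> Y holds when every Y value seen with an X value appears together with every Z value seen with that X value. The sketch below checks this on the bike table; mvd_holds and the "broken" rows are hypothetical and only cover the three-column case.

```python
from itertools import product

def mvd_holds(rows, x, y, z):
    """Check X ->> Y in a three-column table; x, y, z are column indexes."""
    groups = {}
    for row in rows:
        groups.setdefault(row[x], set()).add((row[y], row[z]))
    for pairs in groups.values():
        y_values = {pair[0] for pair in pairs}
        z_values = {pair[1] for pair in pairs}
        # Independence means all (y, z) combinations must be present.
        if pairs != set(product(y_values, z_values)):
            return False
    return True

bikes = [
    ("M1001", 2007, "Black"), ("M1001", 2007, "Red"),
    ("M2012", 2008, "Black"), ("M2012", 2008, "Red"),
    ("M2222", 2009, "Black"), ("M2222", 2009, "Red"),
]
color_mvd = mvd_holds(bikes, 0, 2, 1)  # bike_model ->> color
year_mvd = mvd_holds(bikes, 0, 1, 2)   # bike_model ->> manuf_year

# Invented counter-example: two years and two colors, but only two of the
# four combinations are stored, so year and color are not independent.
broken = [("M9", 2010, "Black"), ("M9", 2011, "Red")]
broken_mvd = mvd_holds(broken, 0, 2, 1)

print(color_mvd, year_mvd, broken_mvd)  # True True False
```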

Transitive dependency in DBMS

A functional dependency is said to be transitive if it is indirectly formed by two
functional dependencies. For example, X -> Z is a transitive dependency if the
following three conditions hold true:

• X->Y
• Y does not ->X
• Y->Z

Note: A transitive dependency can only occur in a relation of three or more attributes.
This dependency helps us normalize the database to 3NF (3rd Normal Form).

Example: Let’s take an example to understand it better:

+--------------------+---------------------+------------+
| Book               | Author              | Author_age |
+--------------------+---------------------+------------+
| Game of Thrones    | George R. R. Martin | 66         |
| Harry Potter       | J. K. Rowling       | 49         |
| Dying of the Light | George R. R. Martin | 66         |
+--------------------+---------------------+------------+

{Book} ->{Author} (if we know the book, we know the author name)

{Author} does not ->{Book}

{Author} -> {Author_age}

Therefore, as per the rule of transitive dependency, {Book} -> {Author_age} should
hold. That makes sense because if we know the book name we can know the author’s age.
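Derived dependencies such as {Book} -> {Author_age} can be computed mechanically with the standard attribute-closure procedure, which is not described in these notes but is a common companion to them. A minimal sketch, with closure as an illustrative function name:

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know the whole left side, we learn the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [
    ({"Book"}, {"Author"}),
    ({"Author"}, {"Author_age"}),
]

book_closure = closure({"Book"}, fds)
author_closure = closure({"Author"}, fds)

print(sorted(book_closure))    # ['Author', 'Author_age', 'Book']
print(sorted(author_closure))  # ['Author', 'Author_age']
```

Author_age appears in the closure of Book only through Author, which is exactly the transitive step, while Book is absent from the closure of Author.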

Normalization in DBMS: 1NF, 2NF, 3NF and BCNF in Database

Normalization is a process of organizing the data in database to avoid data redundancy,
insertion anomaly, update anomaly & deletion anomaly. Let’s discuss anomalies
first; then we will discuss normal forms with examples.

Anomalies in DBMS
There are three types of anomalies that occur when the database is not normalized. These
are – Insertion, update and deletion anomaly. Let’s take an example to understand this.

Example: Suppose a manufacturing company stores the employee details in a table
named employee that has four attributes: emp_id for storing employee’s id, emp_name
for storing employee’s name, emp_address for storing employee’s address and emp_dept
for storing the department details in which the employee works. At some point of time
the table looks like this:

emp_id emp_name emp_address emp_dept

101 Rick Delhi D001

101 Rick Delhi D002

123 Maggie Agra D890

166 Glenn Chennai D900

166 Glenn Chennai D004

The above table is not normalized. We will see the problems that we face when a table is
not normalized.

Update anomaly: In the above table we have two rows for employee Rick as he belongs
to two departments of the company. If we want to update the address of Rick then we
have to update the same in two rows or the data will become inconsistent. If somehow,
the correct address gets updated in one department but not in other then as per the
database, Rick would be having two different addresses, which is not correct and would
lead to inconsistent data.

Insert anomaly: Suppose a new employee joins the company, who is under training and
currently not assigned to any department then we would not be able to insert the data into
the table if emp_dept field doesn’t allow nulls.

Delete anomaly: Suppose the company closes the department D890 at some point.
Deleting the rows that have emp_dept as D890 would also delete the information of
employee Maggie, since she is assigned only to this department.
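All three anomalies can be reproduced on the employee table. This sketch assumes Python's sqlite3 module and declares emp_dept as NOT NULL to match the insert scenario; the new address 'Mumbai' and the trainee row are invented values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER, emp_name TEXT, "
    "emp_address TEXT, emp_dept TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (101, "Rick", "Delhi", "D001"), (101, "Rick", "Delhi", "D002"),
        (123, "Maggie", "Agra", "D890"),
        (166, "Glenn", "Chennai", "D900"), (166, "Glenn", "Chennai", "D004"),
    ],
)

# Update anomaly: changing only one of Rick's rows leaves two addresses.
conn.execute(
    "UPDATE employee SET emp_address = 'Mumbai' "
    "WHERE emp_id = 101 AND emp_dept = 'D001'"
)
rick_addresses = sorted(a for (a,) in conn.execute(
    "SELECT DISTINCT emp_address FROM employee WHERE emp_id = 101"))

# Insert anomaly: an employee without a department cannot be stored.
try:
    conn.execute("INSERT INTO employee VALUES (200, 'Trainee', 'Pune', NULL)")
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# Delete anomaly: removing department D890 also removes Maggie entirely.
conn.execute("DELETE FROM employee WHERE emp_dept = 'D890'")
maggie_rows = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE emp_name = 'Maggie'").fetchone()[0]

print(rick_addresses, insert_rejected, maggie_rows)
# ['Delhi', 'Mumbai'] True 0
```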

To overcome these anomalies we need to normalize the data. In the next section we will
discuss about normalization.

Normalization
Here are the most commonly used normal forms:

• First normal form (1NF)
• Second normal form (2NF)
• Third normal form (3NF)
• Boyce & Codd normal form (BCNF)

First normal form (1NF)


As per the rule of first normal form, an attribute (column) of a table cannot hold multiple
values. It should hold only atomic values.

Example: Suppose a company wants to store the names and contact details of its
employees. It creates a table that looks like this:

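+--------+----------+-------------+------------+
| emp_id | emp_name | emp_address | emp_mobile |
+--------+----------+-------------+------------+
| 101    | Herschel | New Delhi   | 8912312390 |
| 102    | Jon      | Kanpur      | 8812121212 |
|        |          |             | 9900012222 |
| 103    | Ron      | Chennai     | 7778881212 |
| 104    | Lester   | Bangalore   | 9990000123 |
|        |          |             | 8123450987 |
+--------+----------+-------------+------------+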

Two employees (Jon & Lester) have two mobile numbers each, so the company stored
them in the same field, as you can see in the table above.

This table is not in 1NF. The rule says “each attribute of a table must have atomic
(single) values”, and the emp_mobile values for employees Jon & Lester violate that rule.

To make the table comply with 1NF we should have the data like this:

emp_id emp_name emp_address emp_mobile

101 Herschel New Delhi 8912312390

102 Jon Kanpur 8812121212

102 Jon Kanpur 9900012222

103 Ron Chennai 7778881212

104 Lester Bangalore 9990000123

104 Lester Bangalore 8123450987
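The conversion to 1NF amounts to emitting one row per mobile number. A small sketch, where the multi-valued field is modelled as a Python list (an assumption about how the unnormalized data might be held in memory):

```python
# Unnormalized rows: the last field holds a list of mobile numbers.
unnormalized = [
    (101, "Herschel", "New Delhi", ["8912312390"]),
    (102, "Jon", "Kanpur", ["8812121212", "9900012222"]),
    (103, "Ron", "Chennai", ["7778881212"]),
    (104, "Lester", "Bangalore", ["9990000123", "8123450987"]),
]

# 1NF rows: repeat the employee details once for every single mobile number.
atomic_rows = [
    (emp_id, name, address, mobile)
    for emp_id, name, address, mobiles in unnormalized
    for mobile in mobiles
]

for row in atomic_rows:
    print(row)
```

Running it prints six rows, two each for Jon and Lester, matching the 1NF table above.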

Second normal form (2NF)

A table is said to be in 2NF if both the following conditions hold:

• Table is in 1NF (First normal form)
• No non-prime attribute is dependent on the proper subset of any candidate key of
table.

An attribute that is not part of any candidate key is known as non-prime attribute.

Example: Suppose a school wants to store the data of teachers and the subjects they
teach. Since a teacher can teach more than one subject, the table can have multiple
rows for the same teacher. They create a table that looks like this:

teacher_id subject teacher_age

111 Maths 38

111 Physics 38

222 Biology 38

333 Physics 40

333 Chemistry 40

Candidate Keys: {teacher_id, subject}


Non prime attribute: teacher_age

The table is in 1 NF because each attribute has atomic values. However, it is not in 2NF
because non prime attribute teacher_age is dependent on teacher_id alone which is a
proper subset of candidate key. This violates the rule for 2NF as the rule says “no non-
prime attribute is dependent on the proper subset of any candidate key of the table”.

To make the table comply with 2NF we can break it into two tables like this:
teacher_details table:

teacher_id teacher_age

111 38

222 38

333 40

teacher_subject table:

teacher_id subject

111 Maths

111 Physics

222 Biology

333 Physics

333 Chemistry

Now the tables comply with Second normal form (2NF).
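The split can be carried out with two SELECT statements, and joining the results shows that no information was lost. This is a sketch assuming Python's sqlite3 module; the source table is named teacher here only for the illustration.

```python
import sqlite3

rows = [
    (111, "Maths", 38), (111, "Physics", 38), (222, "Biology", 38),
    (333, "Physics", 40), (333, "Chemistry", 40),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teacher (teacher_id INTEGER, subject TEXT, teacher_age INTEGER)")
conn.executemany("INSERT INTO teacher VALUES (?, ?, ?)", rows)

# teacher_age depends on teacher_id alone, so it moves to its own table.
conn.execute(
    "CREATE TABLE teacher_details AS "
    "SELECT DISTINCT teacher_id, teacher_age FROM teacher"
)
conn.execute(
    "CREATE TABLE teacher_subject AS SELECT teacher_id, subject FROM teacher"
)

details = conn.execute(
    "SELECT * FROM teacher_details ORDER BY teacher_id").fetchall()
rejoined = conn.execute(
    "SELECT s.teacher_id, s.subject, d.teacher_age "
    "FROM teacher_subject s JOIN teacher_details d "
    "ON d.teacher_id = s.teacher_id"
).fetchall()

print(details)                           # [(111, 38), (222, 38), (333, 40)]
print(sorted(rejoined) == sorted(rows))  # True
```

Each teacher's age is now stored once, so it can be changed in a single row.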

Third Normal form (3NF)


A table design is said to be in 3NF if both the following conditions hold:

• Table must be in 2NF
• Transitive functional dependency of non-prime attribute on any super key should be
removed.

An attribute that is not part of any candidate key is known as non-prime attribute.

In other words 3NF can be explained like this: A table is in 3NF if it is in 2NF and for
each functional dependency X-> Y at least one of the following conditions hold:

• X is a super key of table
• Y is a prime attribute of table

An attribute that is a part of one of the candidate keys is known as prime attribute.

Example: Suppose a company wants to store the complete address of each employee,
they create a table named employee_details that looks like this:

emp_id emp_name emp_zip emp_state emp_city emp_district

1001 John 282005 UP Agra Dayal Bagh

1002 Ajeet 222008 TN Chennai M-City

1006 Lora 282007 TN Chennai Urrapakkam

1101 Lilly 292008 UK Pauri Bhagwan

1201 Steve 222999 MP Gwalior Ratan

Super keys: {emp_id}, {emp_id, emp_name}, {emp_id, emp_name, emp_zip} and so on
Candidate Keys: {emp_id}
Non-prime attributes: all attributes except emp_id are non-prime as they are not part of
any candidate keys.

Here, emp_state, emp_city & emp_district are dependent on emp_zip, and emp_zip is
dependent on emp_id. That makes the non-prime attributes (emp_state, emp_city &
emp_district) transitively dependent on the super key (emp_id). This violates the rule of
3NF.

To make this table comply with 3NF we have to break the table into two tables to
remove the transitive dependency:

employee table:

emp_id emp_name emp_zip

1001 John 282005

1002 Ajeet 222008

1006 Lora 282007

1101 Lilly 292008

1201 Steve 222999

employee_zip table:

emp_zip emp_state emp_city emp_district

282005 UP Agra Dayal Bagh

222008 TN Chennai M-City

282007 TN Chennai Urrapakkam

292008 UK Pauri Bhagwan

222999 MP Gwalior Ratan
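In a real schema the two tables would usually be linked by a foreign key on emp_zip. The sketch below assumes SQLite through Python's sqlite3 module (where foreign keys must be switched on with a PRAGMA); the column types and the rejected sample row are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute(
    "CREATE TABLE employee_zip (emp_zip TEXT PRIMARY KEY, emp_state TEXT, "
    "emp_city TEXT, emp_district TEXT)"
)
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT, "
    "emp_zip TEXT NOT NULL REFERENCES employee_zip (emp_zip))"
)
conn.executemany(
    "INSERT INTO employee_zip VALUES (?, ?, ?, ?)",
    [
        ("282005", "UP", "Agra", "Dayal Bagh"),
        ("222008", "TN", "Chennai", "M-City"),
        ("282007", "TN", "Chennai", "Urrapakkam"),
        ("292008", "UK", "Pauri", "Bhagwan"),
        ("222999", "MP", "Gwalior", "Ratan"),
    ],
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1001, "John", "282005"), (1002, "Ajeet", "222008"),
     (1006, "Lora", "282007"), (1101, "Lilly", "292008"),
     (1201, "Steve", "222999")],
)

# A join rebuilds the original employee_details rows.
full_rows = conn.execute(
    "SELECT e.emp_id, e.emp_name, e.emp_zip, z.emp_state, z.emp_city, "
    "z.emp_district FROM employee e JOIN employee_zip z "
    "ON z.emp_zip = e.emp_zip ORDER BY e.emp_id"
).fetchall()

# The foreign key rejects an employee whose zip code is unknown.
try:
    conn.execute("INSERT INTO employee VALUES (1300, 'Newhire', '000000')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(full_rows[0])     # (1001, 'John', '282005', 'UP', 'Agra', 'Dayal Bagh')
print(orphan_rejected)  # True
```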

Boyce Codd normal form (BCNF)

It is an advanced version of 3NF, which is why it is also referred to as 3.5NF. BCNF is
stricter than 3NF. A table complies with BCNF if it is in 3NF and, for every non-trivial
functional dependency X->Y, X is a super key of the table.

Example: Suppose there is a company wherein employees work in more than one
department. They store the data like this:

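+--------+-----------------+------------------------------+-----------+----------------+
| emp_id | emp_nationality | emp_dept                     | dept_type | dept_no_of_emp |
+--------+-----------------+------------------------------+-----------+----------------+
| 1001   | Austrian        | Production and planning      | D001      | 200            |
| 1001   | Austrian        | stores                       | D001      | 250            |
| 1002   | American        | design and technical support | D134      | 100            |
| 1002   | American        | Purchasing department        | D134      | 600            |
+--------+-----------------+------------------------------+-----------+----------------+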

Functional dependencies in the table above:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}

Candidate key: {emp_id, emp_dept}

The table is not in BCNF because the left-hand sides of both functional dependencies
(emp_id and emp_dept) are only parts of the candidate key, so neither is a super key on
its own.

To make the table comply with BCNF we can break the table into three tables like this:
emp_nationality table:

emp_id emp_nationality

1001 Austrian

1002 American

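emp_dept table:

+------------------------------+-----------+----------------+
| emp_dept                     | dept_type | dept_no_of_emp |
+------------------------------+-----------+----------------+
| Production and planning      | D001      | 200            |
| stores                       | D001      | 250            |
| design and technical support | D134      | 100            |
| Purchasing department        | D134      | 600            |
+------------------------------+-----------+----------------+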

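emp_dept_mapping table:

+--------+------------------------------+
| emp_id | emp_dept                     |
+--------+------------------------------+
| 1001   | Production and planning      |
| 1001   | stores                       |
| 1002   | design and technical support |
| 1002   | Purchasing department        |
+--------+------------------------------+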

Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}

Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}

This is now in BCNF because in both functional dependencies the left-hand side is a key
of its table.
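The BCNF condition can be checked mechanically: a non-trivial dependency violates BCNF when the closure of its left side does not cover every attribute of the table. The sketch below uses the standard attribute-closure procedure; closure and bcnf_violations are illustrative names, and each decomposed table is checked only against the dependencies that apply to it.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """List the non-trivial FDs whose left side is not a super key."""
    return [
        (lhs, rhs) for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != attributes
    ]

fd_nationality = ({"emp_id"}, {"emp_nationality"})
fd_dept = ({"emp_dept"}, {"dept_type", "dept_no_of_emp"})

original = {"emp_id", "emp_nationality", "emp_dept", "dept_type", "dept_no_of_emp"}
original_violations = bcnf_violations(original, [fd_nationality, fd_dept])

decomposed_violations = (
    bcnf_violations({"emp_id", "emp_nationality"}, [fd_nationality])
    + bcnf_violations({"emp_dept", "dept_type", "dept_no_of_emp"}, [fd_dept])
    + bcnf_violations({"emp_id", "emp_dept"}, [])
)

print(len(original_violations))  # 2
print(decomposed_violations)     # []
```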

Fourth normal form (4NF)


o A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued
dependency.
o For a dependency A → B, if multiple values of B exist for a single value of A, then
the dependency is a multi-valued dependency.

Example

STUDENT

STU_ID COURSE HOBBY

21 Computer Dancing

21 Math Singing

34 Chemistry Dancing

74 Biology Cricket

59 Physics Hockey

The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent
attributes. Hence, there is no relationship between COURSE and HOBBY.

In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and
Math, and two hobbies, Dancing and Singing. So there is a multi-valued dependency
on STU_ID, which leads to unnecessary repetition of data.

So to make the above table into 4NF, we can decompose it into two tables:

STUDENT_COURSE

STU_ID COURSE

21 Computer

21 Math

34 Chemistry

74 Biology

59 Physics

STUDENT_HOBBY

STU_ID HOBBY

21 Dancing

21 Singing

34 Dancing

74 Cricket

59 Hockey

Fifth normal form (5NF)
o A relation is in 5NF if it is in 4NF and contains no join dependency other than those
implied by its candidate keys, so that joining its decomposed tables is lossless.
o 5NF is satisfied when all the tables are broken into as many tables as possible in
order to avoid redundancy.
o 5NF is also known as Project-join normal form (PJ/NF).

Example
SUBJECT LECTURER SEMESTER

Computer Anshika Semester 1

Computer John Semester 1

Math John Semester 1

Math Akash Semester 2

Chemistry Praveen Semester 1

In the above table, John takes both the Computer and the Math class for Semester 1, but
he doesn't take the Math class for Semester 2. In this case, a combination of all three
fields is required to identify valid data.

Suppose we add a new semester, Semester 3, but do not yet know the subject or who
will be teaching it, so we would leave Lecturer and Subject as NULL. But all three
columns together act as the primary key, so we can't leave the other two columns blank.

So to make the above table into 5NF, we can decompose it into three relations P1, P2 &
P3:

P1

SEMESTER SUBJECT

Semester 1 Computer

Semester 1 Math

Semester 1 Chemistry

Semester 2 Math

P2

SUBJECT LECTURER

Computer Anshika

Computer John

Math John

Math Akash

Chemistry Praveen

P3

SEMESTER LECTURER

Semester 1 Anshika

Semester 1 John

Semester 2 Akash

Semester 1 Praveen
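Whether the three projections really reproduce the original table can be checked by joining them back. The sketch below models each relation as a Python set of tuples (so duplicate rows collapse automatically) and is only an illustration of the idea, not a general 5NF test.

```python
original = {
    ("Computer", "Anshika", "Semester 1"),
    ("Computer", "John", "Semester 1"),
    ("Math", "John", "Semester 1"),
    ("Math", "Akash", "Semester 2"),
    ("Chemistry", "Praveen", "Semester 1"),
}

# The three projections P1, P2 and P3.
p1 = {(sem, sub) for sub, _, sem in original}  # SEMESTER, SUBJECT
p2 = {(sub, lec) for sub, lec, _ in original}  # SUBJECT, LECTURER
p3 = {(sem, lec) for _, lec, sem in original}  # SEMESTER, LECTURER

# Joining only P1 and P2 on SUBJECT produces extra (spurious) rows.
p1_join_p2 = {
    (sub, lec, sem)
    for sem, sub in p1
    for sub2, lec in p2
    if sub2 == sub
}

# Joining with P3 as well removes them again.
rejoined = {row for row in p1_join_p2 if (row[2], row[1]) in p3}

print(len(p1), len(p2), len(p3))  # 4 5 4
print(len(p1_join_p2))            # 7
print(rejoined == original)       # True
```

The two spurious rows in the intermediate join are (Math, Akash, Semester 1) and (Math, John, Semester 2), which is why all three projections are needed.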

Relational Decomposition
o When a relation in the relational model is not in appropriate normal form then the
decomposition of a relation is required.
o In a database, it breaks the table into multiple tables.
o If a relation is not decomposed properly, it may lead to problems like loss of
information.
o Decomposition is used to eliminate some of the problems of bad design like
anomalies, inconsistencies, and redundancy.

Types of Decomposition

Lossless Decomposition
o If no information is lost from the relation that is decomposed, then the
decomposition is lossless.
o A lossless decomposition guarantees that the join of the decomposed relations results
in the same relation that was decomposed.
o A decomposition is said to be lossless if the natural join of all the decomposed
relations gives the original relation.

Example:

EMPLOYEE_DEPARTMENT table:

EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME

22 Denim 28 Mumbai 827 Sales

33 Alina 25 Delhi 438 Marketing

46 Stephan 30 Bangalore 869 Finance

52 Katherine 36 Mumbai 575 Production

60 Jack 40 Noida 678 Testing

The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT

EMPLOYEE table:

EMP_ID EMP_NAME EMP_AGE EMP_CITY

22 Denim 28 Mumbai

33 Alina 25 Delhi

46 Stephan 30 Bangalore

52 Katherine 36 Mumbai

60 Jack 40 Noida

DEPARTMENT table

DEPT_ID EMP_ID DEPT_NAME

827 22 Sales

438 33 Marketing

869 46 Finance

575 52 Production

678 60 Testing

Now, when these two relations are joined on the common column "EMP_ID", then the
resultant relation will look like:

Employee ⋈ Department

EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME

22 Denim 28 Mumbai 827 Sales

33 Alina 25 Delhi 438 Marketing

46 Stephan 30 Bangalore 869 Finance

52 Katherine 36 Mumbai 575 Production

60 Jack 40 Noida 678 Testing

Hence, the decomposition is Lossless join decomposition.
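For contrast, it helps to see a decomposition that is not lossless. The sketch below first repeats the EMP_ID-based split and then tries a hypothetical alternative that shares only EMP_CITY, a column that is not unique; relations are modelled as Python sets of tuples.

```python
rows = [
    (22, "Denim", 28, "Mumbai", 827, "Sales"),
    (33, "Alina", 25, "Delhi", 438, "Marketing"),
    (46, "Stephan", 30, "Bangalore", 869, "Finance"),
    (52, "Katherine", 36, "Mumbai", 575, "Production"),
    (60, "Jack", 40, "Noida", 678, "Testing"),
]
original = set(rows)

employee = {(i, n, a, c) for i, n, a, c, _, _ in rows}
department = {(d, i, dn) for i, _, _, _, d, dn in rows}

# Lossless: EMP_ID is unique, so each employee meets exactly one department row.
lossless = {
    (i, n, a, c, d, dn)
    for i, n, a, c in employee
    for d, i2, dn in department
    if i2 == i
}

# Lossy (hypothetical split): the tables share only EMP_CITY, and Mumbai
# occurs twice, so the join pairs each Mumbai employee with both departments.
city_department = {(c, d, dn) for _, _, _, c, d, dn in rows}
lossy = {
    (i, n, a, c, d, dn)
    for i, n, a, c in employee
    for c2, d, dn in city_department
    if c2 == c
}
spurious = lossy - original

print(lossless == original)  # True
print(len(lossy))            # 7
print(sorted(spurious))
```

The lossy join still contains every original row, but the two extra rows make it impossible to tell which department Denim or Katherine actually belongs to.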

Dependency Preserving
o It is an important constraint of the database.
o In a dependency-preserving decomposition, every dependency must be satisfied by
at least one decomposed table (or be derivable, as described next).
o If a relation R is decomposed into relations R1 and R2, then each dependency of R
must either be a part of R1 or R2 or be derivable from the combination of the
functional dependencies of R1 and R2.
o For example, suppose there is a relation R(A, B, C, D) with the functional
dependency set {A->BC}. The relation R is decomposed into R1(ABC) and
R2(AD), which is dependency preserving because the FD A->BC is a part of relation
R1(ABC).
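
As a rough illustration of the rule above, the following Python sketch tests whether each functional dependency can still be derived from what is enforceable inside the decomposed relations. The function names closure and preserves are hypothetical, attributes are single letters, and the second example (A->B, B->C split into AB and AC) is an assumed counter-example that does not appear in these notes.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves(fds, decomposition):
    """True if every FD follows from the FDs projected onto the sub-relations."""
    for lhs, rhs in fds:
        reach = set(lhs)
        changed = True
        while changed:
            changed = False
            for sub in decomposition:
                # Only attributes inside one sub-relation can be used together.
                gained = closure(reach & sub, fds) & sub
                if not gained <= reach:
                    reach |= gained
                    changed = True
        if not rhs <= reach:
            return False
    return True


# Example from the text: R(A, B, C, D), A -> BC, split into R1(ABC) and R2(AD).
text_example = preserves([({"A"}, {"B", "C"})], [set("ABC"), set("AD")])

# Assumed counter-example: B -> C cannot be checked in AB or AC alone.
counter_example = preserves(
    [({"A"}, {"B"}), ({"B"}, {"C"})],
    [set("AB"), set("AC")],
)

print(text_example, counter_example)
```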

Multivalued Dependency
o Multivalued dependency occurs when two attributes in a table are independent of
each other but both depend on a third attribute.
o A multivalued dependency involves at least two attributes that are dependent on
a third attribute, which is why it always requires at least three attributes.

Example: Suppose there is a bike manufacturer company which produces two
colors (white and black) of each model every year.

BIKE_MODEL   MANUF_YEAR   COLOR
M2001        2008         White
M2001        2008         Black
M3001        2013         White
M3001        2013         Black
M4006        2017         White
M4006        2017         Black

Here columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and
independent of each other.
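
The independence claimed here is what a multivalued dependency X →→ Y expresses: for rows that agree on X, every combination of a Y value with the remaining attributes must also be present. The sketch below checks that condition in Python. The function name mvd_holds is hypothetical, the first model is written as M2001 in both of its rows (an assumption), and the M5000 rows are invented to show a violation.

```python
from itertools import product


def mvd_holds(rows, columns, x, y):
    """Check X ->> Y on a relation given as a list of tuples."""
    xi = [columns.index(c) for c in x]
    yi = [columns.index(c) for c in y]
    zi = [i for i in range(len(columns)) if i not in xi and i not in yi]
    present = set(rows)
    groups = {}
    for row in rows:
        groups.setdefault(tuple(row[i] for i in xi), []).append(row)
    for group in groups.values():
        for r1, r2 in product(group, repeat=2):
            # Take X and Y from r1 and the remaining attributes from r2.
            mixed = list(r1)
            for i in zi:
                mixed[i] = r2[i]
            if tuple(mixed) not in present:
                return False
    return True


COLUMNS = ("BIKE_MODEL", "MANUF_YEAR", "COLOR")
BIKES = [
    ("M2001", 2008, "White"),
    ("M2001", 2008, "Black"),
    ("M3001", 2013, "White"),
    ("M3001", 2013, "Black"),
    ("M4006", 2017, "White"),
    ("M4006", 2017, "Black"),
]

color_mvd = mvd_holds(BIKES, COLUMNS, ["BIKE_MODEL"], ["COLOR"])

# Invented rows: colour and year are now tied together for model M5000.
broken = BIKES + [("M5000", 2019, "White"), ("M5000", 2020, "Black")]
broken_mvd = mvd_holds(broken, COLUMNS, ["BIKE_MODEL"], ["COLOR"])

print(color_mvd, broken_mvd)
```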

In this case, these two columns can be called multivalued dependent on
BIKE_MODEL. The representation of these dependencies is shown below:

1. BIKE_MODEL →→ MANUF_YEAR
2. BIKE_MODEL →→ COLOR

This can be read as "BIKE_MODEL multidetermines MANUF_YEAR" and
"BIKE_MODEL multidetermines COLOR".

Join Dependency
o Join dependency is a further generalization of multivalued dependency.
o If the join of R1 and R2 over C is equal to relation R, then we can say that a join
dependency (JD) exists.
o Here R1 and R2 are the decompositions R1(A, B, C) and R2(C, D) of a given
relation R(A, B, C, D).
o Alternatively, R1 and R2 are a lossless decomposition of R.
o A JD ⋈ {R1, R2,..., Rn} is said to hold over a relation R if R1, R2,..., Rn is a
lossless-join decomposition of R.
o *((A, B, C), (C, D)) will be a JD of R if the join of the two parts on their common
attribute is equal to the relation R.
o Here, *(R1, R2, R3) is used to indicate that relations R1, R2, R3 and so on are a JD
of R.

PL/SQL Trigger

A trigger is invoked automatically by the Oracle engine whenever a specified event
occurs. A trigger is stored in the database and is invoked repeatedly, whenever its
specific condition matches.

Triggers are stored programs, which are automatically executed or fired when some
event occurs.

Triggers are written to be executed in response to any of the following events.

o A database manipulation (DML) statement (DELETE, INSERT, or UPDATE).
o A database definition (DDL) statement (CREATE, ALTER, or DROP).
o A database operation (SERVERERROR, LOGON, LOGOFF, STARTUP, or
SHUTDOWN).

Triggers could be defined on the table, view, schema, or database with which the event is
associated.

Advantages of Triggers

Triggers have the following advantages:

o Generating some derived column values automatically
o Enforcing referential integrity
o Event logging and storing information on table access
o Auditing
o Synchronous replication of tables
o Imposing security authorizations
o Preventing invalid transactions

Creating a trigger:

Syntax for creating trigger:

CREATE [OR REPLACE] TRIGGER trigger_name
{BEFORE | AFTER | INSTEAD OF}
{INSERT [OR] | UPDATE [OR] | DELETE}
[OF col_name]
ON table_name
[REFERENCING OLD AS o NEW AS n]
[FOR EACH ROW]
[WHEN (condition)]
DECLARE
   Declaration-statements
BEGIN
   Executable-statements
EXCEPTION
   Exception-handling-statements
END;

Here,

o CREATE [OR REPLACE] TRIGGER trigger_name: It creates a trigger, or replaces an
existing trigger, with the name trigger_name.
o {BEFORE | AFTER | INSTEAD OF}: This specifies when the trigger would be
executed. The INSTEAD OF clause is used for creating a trigger on a view.
o {INSERT [OR] | UPDATE [OR] | DELETE}: This specifies the DML operation.
o [OF col_name]: This specifies the column name that would be updated.
o ON table_name: This specifies the name of the table associated with the trigger.
o [REFERENCING OLD AS o NEW AS n]: This allows you to refer to new and old
values for various DML statements, like INSERT, UPDATE, and DELETE.
o [FOR EACH ROW]: This specifies a row level trigger, i.e., the trigger would be
executed for each row being affected. Otherwise the trigger will execute just once
when the SQL statement is executed, which is called a table level trigger.
o WHEN (condition): This provides a condition for rows for which the trigger would
fire. This clause is valid only for row level triggers.

CREATE OR REPLACE TRIGGER display_salary_changes
BEFORE DELETE OR INSERT OR UPDATE ON customers
FOR EACH ROW
WHEN (NEW.ID > 0)
DECLARE
   sal_diff number;
BEGIN
   -- :OLD values are NULL on INSERT, so sal_diff is NULL in that case
   sal_diff := :NEW.salary - :OLD.salary;
   dbms_output.put_line('Old salary: ' || :OLD.salary);
   dbms_output.put_line('New salary: ' || :NEW.salary);
   dbms_output.put_line('Salary difference: ' || sal_diff);
END;
/

After the execution of the above code at SQL Prompt, it produces the following result.

Trigger created.

Check the salary difference by procedure:

Use the following code to get the old salary, new salary and salary difference after the
trigger is created.

DECLARE
   total_rows number(2);
BEGIN
   -- Every updated row fires the row level trigger once
   UPDATE customers
   SET salary = salary + 5000;
   IF sql%notfound THEN
      dbms_output.put_line('no customers updated');
   ELSIF sql%found THEN
      total_rows := sql%rowcount;
      dbms_output.put_line( total_rows || ' customers updated ');
   END IF;
END;
/

Output:

Old salary: 20000
New salary: 25000
Salary difference: 5000
Old salary: 22000
New salary: 27000
Salary difference: 5000
Old salary: 24000
New salary: 29000
Salary difference: 5000
Old salary: 26000
New salary: 31000
Salary difference: 5000
Old salary: 28000
New salary: 33000
Salary difference: 5000
Old salary: 30000
New salary: 35000
Salary difference: 5000
6 customers updated

Note: Each time you execute this code, both the old and the new salary increase
by 5000, and hence the salary difference is always 5000.

After executing the above code again, you will get the following result.

Old salary: 25000
New salary: 30000
Salary difference: 5000
Old salary: 27000
New salary: 32000
Salary difference: 5000
Old salary: 29000
New salary: 34000
Salary difference: 5000
Old salary: 31000
New salary: 36000
Salary difference: 5000
Old salary: 33000
New salary: 38000
Salary difference: 5000
Old salary: 35000
New salary: 40000
Salary difference: 5000
6 customers updated

Important Points

The following two points are very important and should be noted carefully.

o OLD and NEW references are used for record level triggers; they are not available
for table level triggers.
o If you want to query the table in the same trigger, then you should use the AFTER
keyword, because triggers can query the table or change it again only after the
initial changes are applied and the table is back in a consistent state.

PL/SQL Trigger Example

Let's take a simple example to demonstrate the trigger. In this example, we are using the
following CUSTOMERS table:

Create the table with the following records:

ID   NAME      AGE   ADDRESS     SALARY
1    Ramesh    23    Allahabad   20000
2    Suresh    22    Kanpur      22000
3    Mahesh    24    Ghaziabad   24000
4    Chandan   25    Noida       26000
5    Alex      21    Paris       28000
6    Sunita    20    Delhi       30000

Create trigger:

Let's take a program to create a row level trigger for the CUSTOMERS table that would
fire for INSERT or UPDATE or DELETE operations performed on the CUSTOMERS
table. This trigger, display_salary_changes shown earlier, displays the salary difference
between the old values and new values.

UNIT-IV
TRANSACTION MANAGEMENT

What is a Transaction?
A transaction is an event which occurs on the database. Generally a transaction reads
a value from the database or writes a value to the database. If you are familiar with
Operating Systems, you can think of a transaction as analogous to a process.

Although a transaction can both read and write on the database, there are some
fundamental differences between these two classes of operations. A read operation does
not change the image of the database in any way. But a write operation, whether
performed with the intention of inserting, updating or deleting data from the database,
changes the image of the database. That is, we may say that these transactions bring the
database from an image which existed before the transaction occurred (called the
Before Image or BFIM) to an image which exists after the transaction occurred (called
the After Image or AFIM).

The Four Properties of Transactions


Every transaction, for whatever purpose it is being used, has the following four
properties. Taking the initial letters of these four properties we collectively call them the
ACID Properties. Here we try to describe them and explain them.

Atomicity: This means that either all of the instructions within the transaction will be
reflected in the database, or none of them will be reflected.

Say for example, we have two accounts A and B, each containing Rs 1000/-. We now
start a transaction to transfer Rs 100/- from account A to account B.
Read A;
A = A - 100;
Write A;
Read B;
B = B + 100;
Write B;

Fine, isn't it? The transaction has 6 instructions to extract the amount from A and
submit it to B. The AFIM will show Rs 900/- in A and Rs 1100/- in B.

Now, suppose there is a power failure just after instruction 3 (Write A) has been
completed. What happens now? After the system recovers the AFIM will show Rs 900/-
in A, but the same Rs 1000/- in B. It would be as if Rs 100/- had evaporated into thin
air because of the power failure. Clearly such a situation is not acceptable.

The solution is to keep every value calculated by the instructions of the transaction not
in stable storage (hard disc) but in volatile storage (RAM), until the transaction
completes its last instruction. When we see that there has not been any error we do
something known as a COMMIT operation. Its job is to write every temporarily
calculated value from the volatile storage on to the stable storage. In this way, even if
power fails at instruction 3, the post recovery image of the database will show accounts
A and B both containing Rs 1000/-, as if the failed transaction had never occurred.
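
The idea can be sketched in Python as follows. This is a toy model, not how a real DBMS is built: the dictionary named stable stands in for the hard disc, the dictionary named volatile for RAM, and the PowerFailure exception and the fail_after_step parameter are invented to simulate a crash after a given instruction.

```python
class PowerFailure(Exception):
    """Stands in for a crash in the middle of a transaction."""


def transfer(stable, amount, fail_after_step=None):
    """Move amount from A to B; stable storage changes only at COMMIT."""
    volatile = dict(stable)  # temporary values live here until COMMIT

    def done(step):
        if fail_after_step == step:
            raise PowerFailure(f"power failed after instruction {step}")

    a = volatile["A"]        # 1. Read A
    done(1)
    a = a - amount           # 2. A = A - 100
    done(2)
    volatile["A"] = a        # 3. Write A (still only in volatile storage)
    done(3)
    b = volatile["B"]        # 4. Read B
    done(4)
    b = b + amount           # 5. B = B + 100
    done(5)
    volatile["B"] = b        # 6. Write B
    done(6)
    stable.update(volatile)  # COMMIT: copy everything to stable storage


accounts = {"A": 1000, "B": 1000}
try:
    transfer(accounts, 100, fail_after_step=3)
except PowerFailure:
    pass
after_failure = dict(accounts)   # unchanged, as if the transaction never ran

transfer(accounts, 100)
after_commit = dict(accounts)

print(after_failure, after_commit)
```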

Consistency: If we execute a particular transaction in isolation or together with other
transactions (i.e. presumably in a multi-programming environment), the transaction will
yield the same expected result.

To give better performance, every database management system supports the execution
of multiple transactions at the same time, using CPU Time Sharing. Concurrently
executing transactions may have to deal with the problem of sharable resources, i.e.
resources that multiple transactions are trying to read/write at the same time. For
example, we may have a table or a record which two transactions are trying to read or
write at the same time. Careful mechanisms are created in order to prevent
mismanagement of these sharable resources, so that there is no change in
the way a transaction performs. A transaction which deposits Rs 100/- to account A
must deposit the same amount whether it is acting alone or in conjunction with another
transaction that may be trying to deposit or withdraw some amount at the same time.

Isolation: In case multiple transactions are executing concurrently and trying to access
a sharable resource at the same time, the system should create an ordering in their
execution so that they do not create any anomaly in the value stored at the sharable
resource.

There are several ways to achieve this and the most popular one is using some kind of
locking mechanism. Again, if you are familiar with Operating Systems, you
should remember semaphores: how a process uses one to mark a resource busy
before starting to use it, and how it releases the resource after the usage is
over. Other processes intending to access that same resource must wait during this time.
Locking is very similar. It states that a transaction must first lock the data item that it
wishes to access, and release the lock when the access is no longer required. Once a
transaction locks the data item, other transactions wishing to access the same data item
must wait until the lock is released.

Durability: It states that once a transaction has completed, the changes it has made
should be permanent.

As we have seen in the explanation of the Atomicity property, the transaction, if it
completes successfully, is committed. Once the COMMIT is done, the changes which
the transaction has made to the database are immediately written into permanent
storage. So, after the transaction has been committed successfully, there is no question
of any loss of information even if the power fails. Committing a transaction guarantees
that the AFIM has been reached.

There are several ways Atomicity and Durability can be implemented. One of them is
called Shadow Copy. In this scheme a database pointer is used to point to the BFIM of
the database. During the transaction, all the temporary changes are recorded into a
Shadow Copy, which is an exact copy of the original database plus the changes made
by the transaction, which is the AFIM. Now, if the transaction is required to COMMIT,
then the database pointer is updated to point to the AFIM copy, and the BFIM copy is
discarded. On the other hand, if the transaction is not committed, then the database
pointer is not updated. It keeps pointing to the BFIM, and the AFIM is discarded. This
is a simple scheme, but takes a lot of memory space and time to implement.
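
A minimal Python sketch of the Shadow Copy scheme is given below, under the assumption that the whole database fits in one dictionary. The class name ShadowCopyDB and its run method are invented for illustration; the attribute current plays the role of the database pointer.

```python
import copy


class ShadowCopyDB:
    """Toy database whose pointer is swapped only when a transaction commits."""

    def __init__(self, data):
        self.current = data  # database pointer, initially at the BFIM

    def run(self, transaction):
        # The shadow copy starts as an exact copy of the current database.
        shadow = copy.deepcopy(self.current)
        try:
            transaction(shadow)  # all changes go to the shadow copy
        except Exception:
            return False         # not committed: pointer stays on the BFIM
        self.current = shadow    # COMMIT: pointer now refers to the AFIM
        return True


def failing_transfer(db):
    db["A"] -= 100
    raise RuntimeError("crash before B is updated")


def good_transfer(db):
    db["A"] -= 100
    db["B"] += 100


database = ShadowCopyDB({"A": 1000, "B": 1000})
failed_ok = database.run(failing_transfer)
state_after_failure = dict(database.current)
committed_ok = database.run(good_transfer)
state_after_commit = dict(database.current)

print(failed_ok, state_after_failure, committed_ok, state_after_commit)
```

Copying the whole database for every transaction is what makes the scheme expensive in space and time, as noted above.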

If you study carefully, you can understand that Atomicity and Durability are
essentially the same thing, just as Consistency and Isolation are essentially the same
thing.

Transaction States
There are the following six states in which a transaction may exist:
Active: The initial state when the transaction has just started execution.

Partially Committed: At any given point of time, if the transaction is executing
properly, then it is going towards its COMMIT POINT. The values generated during the
execution are all stored in volatile storage.

Failed: If the transaction fails for some reason. The temporary values are no longer
required, and the transaction is set to ROLLBACK. It means that any change made to
the database by this transaction up to the point of the failure must be undone. If the
failed transaction has withdrawn Rs. 100/- from account A, then the ROLLBACK
operation should add Rs 100/- to account A.

Aborted: When the ROLLBACK operation is over, the database reaches the BFIM.
The transaction is now said to have been aborted.

Committed: If no failure occurs then the transaction reaches the COMMIT POINT. All
the temporary values are written to the stable storage and the transaction is said to have
been committed.

Terminated: Either committed or aborted, the transaction finally reaches this state.

The whole process can be described using the following diagram:

[Diagram: Entry Point -> ACTIVE -> PARTIALLY COMMITTED -> COMMITTED ->
TERMINATED; ACTIVE -> FAILED -> ABORTED -> TERMINATED]

Concurrent Execution
A schedule is a collection of transactions which is executed as a unit.
Depending upon how these transactions are arranged within a schedule, a
schedule can be of two types:
o Serial: The transactions are executed one after another, in a non-preemptive
manner.
o Concurrent: The transactions are executed in a preemptive, time shared
manner.

In a serial schedule, there is no question of sharing a single data item among many
transactions, because not more than a single transaction is executing at any point of
time. However, a serial schedule is inefficient in the sense that the transactions suffer
from longer waiting and response times, as well as low resource
utilization.

In concurrent schedule, CPU time is shared among two or more transactions in order to
run them concurrently. However, this creates the possibility that more than one
transaction may need to access a single data item for read/write purpose and the
database could contain inconsistent value if such accesses are not handled properly. Let
us explain with the help of an example.

Let us consider two transactions T1 and T2, whose instruction sets are given
below. T1 is the same as we have seen earlier, while T2 is a new transaction.

T1
Read A;
A = A - 100;
Write A;
Read B;
B = B + 100;
Write B;

T2
Read A;
Temp = A * 0.1;
Read C;
C = C + Temp;
Write C;

T2 is a new transaction which deposits to account C 10% of the amount in account A.

If we prepare a serial schedule, then either T1 will completely finish before T2 can
begin, or T2 will completely finish before T1 can begin. However, if we want to create
a concurrent schedule, then some Context Switching needs to be done, so that some
portion of T1 will be executed, then some portion of T2 will be executed and so on. For
example, say we have prepared the following concurrent schedule.

T1                  T2

Read A;
A = A - 100;
Write A;
                    Read A;
                    Temp = A * 0.1;
                    Read C;
                    C = C + Temp;
                    Write C;
Read B;
B = B + 100;
Write B;

No problem here. We have made two Context Switches in this schedule: the first one
after executing the third instruction of T1, and the second after executing the last
statement of T2. T1 first deducts Rs 100/- from A and writes the new value of Rs 900/-
into A. T2 reads the value of A, calculates the value of Temp to be Rs 90/- and adds the
value to C. The remaining part of T1 is then executed and Rs 100/- is added to B.

It is clear that a proper Context Switching is very important in order to maintain the
Consistency and Isolation properties of the transactions. But let us take another
example where a wrong Context Switching can bring about disaster. Consider the
following example involving the same T1 and T2

T1                  T2

Read A;
A = A - 100;
                    Read A;
                    Temp = A * 0.1;
                    Read C;
                    C = C + Temp;
                    Write C;
Write A;
Read B;
B = B + 100;
Write B;

This schedule gives a wrong result with respect to the intended order (T1 before T2),
because we have made the switch after the second instruction of T1, before T1 has
written A. If accounts A and B both contain Rs 1000/-, then this schedule should have
left Rs 900/- in A and Rs 1100/- in B, and added Rs 90/- to C (as C should be increased
by 10% of the updated amount in A). But in this schedule, the Context Switching is
performed before the new value of Rs 900/- has been written to A. T2 reads the old value
of A, which is still Rs 1000/-, and deposits Rs 100/- in C. C makes an unjust gain of Rs
10/- out of nowhere.
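
The two schedules can be replayed step by step in Python to confirm these figures. In this sketch the function name execute and the small helper functions are hypothetical, the starting balance of C is assumed to be 0, and 10% is computed with integer division so that the amounts stay whole rupees.

```python
def execute(schedule, database):
    """Run (transaction, action, argument) steps against a copy of database."""
    database = dict(database)
    local = {}  # private working variables of each transaction
    for txn, action, arg in schedule:
        variables = local.setdefault(txn, {})
        if action == "read":
            variables[arg] = database[arg]
        elif action == "write":
            database[arg] = variables[arg]
        else:  # "calc": arg is a function that updates the local variables
            arg(variables)
    return database


def sub_100_from_a(v):
    v["A"] = v["A"] - 100


def add_100_to_b(v):
    v["B"] = v["B"] + 100


def temp_is_10_percent_of_a(v):
    v["Temp"] = v["A"] // 10


def add_temp_to_c(v):
    v["C"] = v["C"] + v["Temp"]


T1_START = [("T1", "read", "A"), ("T1", "calc", sub_100_from_a)]
T1_WRITE_A = [("T1", "write", "A")]
T1_REST = [("T1", "read", "B"), ("T1", "calc", add_100_to_b), ("T1", "write", "B")]
T2 = [
    ("T2", "read", "A"),
    ("T2", "calc", temp_is_10_percent_of_a),
    ("T2", "read", "C"),
    ("T2", "calc", add_temp_to_c),
    ("T2", "write", "C"),
]

start = {"A": 1000, "B": 1000, "C": 0}  # C = 0 is an assumption

# First schedule: switch to T2 after T1 has written A.
first_result = execute(T1_START + T1_WRITE_A + T2 + T1_REST, start)
# Second schedule: switch to T2 before T1 has written A.
second_result = execute(T1_START + T2 + T1_WRITE_A + T1_REST, start)

print(first_result, second_result)
```

In the first run C receives 90, in the second it receives 100, which is the Rs 10/- difference discussed above.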

Serializability
When several concurrent transactions are trying to access the same data item, the
instructions within these concurrent transactions must be ordered in some way so
that there is no problem in accessing and releasing the shared data item. There are two
aspects of serializability which are described here:

Conflict Serializability
Two instructions of two different transactions may want to access the same data
item in order to perform a read/write operation. Conflict Serializability deals with
detecting whether the instructions are conflicting in any way, and specifying the
order in which these two instructions will be executed in case there is any conflict.
A conflict arises if at least one (or both) of the instructions is a write operation. The
following rules are important in Conflict Serializability:
1. If two instructions of the two concurrent transactions are both read
operations, then they are not in conflict, and can be allowed to take place
in any order.
2. If one of the instructions wants to perform a read operation and the other
instruction wants to perform a write operation, then they are in conflict,
hence their ordering is important. If the read instruction is performed first,
then it reads the old value of the data item and after the reading is over, the
new value of the data item is written. If the write instruction is performed
first, then it updates the data item with the new value and the read instruction
reads the newly updated value.
3. If both the instructions are write operations, then they are in conflict
but can be allowed to take place in any order, because the transactions do not
read the value updated by each other. However, the value that persists in the
data item after the schedule is over is the one written by the instruction that
performed the last write.
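
The three rules reduce to a single test, sketched below in Python. The function name in_conflict and the (transaction, action, item) tuple layout are assumptions made for illustration.

```python
def in_conflict(op1, op2):
    """Each op is (transaction, action, item); action is 'read' or 'write'."""
    txn1, action1, item1 = op1
    txn2, action2, item2 = op2
    return (
        txn1 != txn2                       # different transactions
        and item1 == item2                 # same data item
        and "write" in (action1, action2)  # at least one write
    )


read_read = in_conflict(("T1", "read", "A"), ("T2", "read", "A"))
read_write = in_conflict(("T1", "read", "A"), ("T2", "write", "A"))
write_write = in_conflict(("T1", "write", "A"), ("T2", "write", "A"))
different_items = in_conflict(("T1", "write", "A"), ("T2", "read", "B"))

print(read_read, read_write, write_write, different_items)
```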

View Serializability:
This is another type of serializability that can be derived by creating another
schedule out of an existing schedule, involving the same set of transactions. These
two schedules would be called View Equivalent if the following rules are followed
while creating the second schedule out of the first. Let us consider that the
transactions T1 and T2 are being serialized to create two different schedules
S1 and S2, which we want to be View Equivalent, and both T1 and T2 want to
access the same data item.
1. If in S1, T1 reads the initial value of the data item, then in S2 also, T1
should read the initial value of that same data item.
2. If in S1, T1 writes a value in the data item which is read by T2, then in S2
also, T1 should write the value in the data item before T2 reads it.
3. If in S1, T1 performs the final write operation on that data item, then in
S2 also, T1 should perform the final write operation on that data item.

Let us consider a schedule S in which there are two consecutive instructions, I and J,
of transactions Ti and Tj, respectively (i ≠ j). If I and J refer to different data
items, then we can swap I and J without affecting the results of any instruction
in the schedule. However, if I and J refer to the same data item Q, then the order of
the two steps may matter. Since we are dealing with only read and write
instructions, there are four cases that we need to consider:

1. I = read(Q), J = read(Q). The order of I and J does not matter, since the
same value of Q is read by Ti and Tj, regardless of the order.

2. I = read(Q), J = write(Q). If I comes before J, then Ti does not read the value
of Q that is written by Tj in instruction J. If J comes before I, then Ti reads
the value of Q that is written by Tj. Thus, the order of I and J matters.

3. I = write(Q), J = read(Q). The order of I and J matters for reasons similar to
those of the previous case.

4. I = write(Q), J = write(Q). Since both instructions are write operations, the
order of these instructions does not affect either Ti or Tj. However, the value
obtained by the next read(Q) instruction of S is affected, since the result of
only the latter of the two write instructions is preserved in the database. If
there is no other write(Q) instruction after I and J in S, then the order of I and J
directly affects the final value of Q in the database state that results from
schedule S.

Fig: Schedule 3—showing only the read and write instructions.

We say that I and J conflict if they are operations by different transactions on the
same data item, and at least one of these instructions is a write operation. To
illustrate the concept of conflicting instructions, we consider schedule 3 in the figure
above. The write(A) instruction of T1 conflicts with the read(A) instruction of T2.
However, the write(A) instruction of T2 does not conflict with the read(B)
instruction of T1, because the two instructions access different data items.

Transaction Characteristics

Every transaction has three characteristics: access mode, diagnostics size, and
isolation level. The diagnostics size determines the number of error conditions that
can be recorded.

If the access mode is READ ONLY, the transaction is not allowed to modify
the database. Thus, INSERT, DELETE, UPDATE, and CREATE commands cannot
be executed. If we have to execute one of these commands, the access mode should
be set to READ WRITE. For transactions with READ ONLY access mode, only
shared locks need to be obtained, thereby increasing concurrency.

The isolation level controls the extent to which a given transaction is exposed to the
actions of other transactions executing concurrently. By choosing one of four
possible isolation level settings, a user can obtain greater concurrency at the cost of
increasing the transaction's exposure to other transactions' uncommitted changes.

Isolation level choices are READ UNCOMMITTED, READ COMMITTED,
REPEATABLE READ, and SERIALIZABLE. The effect of these levels is
summarized in the figure given below. In this context, dirty read and unrepeatable read
are defined as usual. Phantom is defined to be the possibility that a transaction
retrieves a collection of objects (in SQL terms, a collection of tuples) twice and sees
different results, even though it does not modify any of these tuples itself.

In terms of a lock-based implementation, a SERIALIZABLE transaction obtains
locks before reading or writing objects, including locks on sets of objects that it
requires to be unchanged (see Section 19.3.1), and holds them until the end,
according to Strict 2PL.

REPEATABLE READ ensures that T reads only the changes made by committed
transactions, and that no value read or written by T is changed by any other
transaction until T is complete. However, T could experience the phantom
phenomenon; for example, while T examines all
Sailors records with rating=1, another transaction might add a new such Sailors
record, which is missed by T.

A REPEATABLE READ transaction uses the same locking protocol as a
SERIALIZABLE transaction, except that it does not do index locking, that is, it
locks only individual objects, not sets of objects.

READ COMMITTED ensures that T reads only the changes made by committed
transactions, and that no value written by T is changed by any other transaction
until T is complete. However, a value read by T may well be modified by another
transaction while T is still in progress, and T is, of course, exposed to the phantom
problem.

A READ COMMITTED transaction obtains exclusive locks before writing objects
and holds these locks until the end. It also obtains shared locks before reading
objects, but these locks are released immediately; their only effect is to guarantee
that the transaction that last modified the object is complete. (This guarantee relies
on the fact that every SQL transaction obtains exclusive locks before writing objects
and holds exclusive locks until the end.)

A READ UNCOMMITTED transaction does not obtain shared locks before
reading objects. This mode represents the greatest exposure to uncommitted changes
of other transactions; so much so that SQL prohibits such a transaction from making
any changes itself - a READ UNCOMMITTED transaction is required to have an
access mode of READ ONLY. Since such a transaction obtains no locks for reading
objects, and it is not allowed to write objects (and therefore never requests exclusive
locks), it never makes any lock requests.

The SERIALIZABLE isolation level is generally the safest and is recommended
for most transactions. Some transactions, however, can run with a lower isolation
level, and the smaller number of locks requested can contribute to improved system
performance.
For example, a statistical query that finds the average sailor age can be run at the
READ COMMITTED level, or even the READ UNCOMMITTED level, because a
few incorrect or missing values will not significantly affect the result if the number
of sailors is large. The isolation level and access mode can be set using the SET
TRANSACTION command. For example, the following command declares the
current transaction to be SERIALIZABLE and READ ONLY:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY
When a transaction is started, the default is SERIALIZABLE and READ WRITE.

PRECEDENCE GRAPH

A precedence graph, also named conflict graph and serializability graph, is used in
the context of concurrency control in databases.

The precedence graph for a schedule S contains:

o A node for each committed transaction in S
o An arc from Ti to Tj if an action of Ti precedes and conflicts with one of Tj's actions.

Precedence graph example

A precedence graph of the schedule D, with 3 transactions. As there is a cycle (of
length 2; with two edges) through the committed transactions T1 and T2, this
schedule (history) is not conflict serializable.

The drawing sequence for the precedence graph:

1. For each transaction Ti participating in schedule S, create a node
labelled Ti in the precedence graph. So the precedence graph
contains T1, T2, T3.
2. For each case in S where Ti executes a write_item(X) then Tj
executes a read_item(X), create an edge (Ti --> Tj) in the
precedence graph. This occurs nowhere in the above example, as
there is no read after write.
3. For each case in S where Ti executes a read_item(X) then Tj executes
a write_item(X), create an edge (Ti --> Tj) in the precedence graph.
This results in a directed edge from T1 to T2.
4. For each case in S where Ti executes a write_item(X) then Tj
executes a write_item(X), create an edge (Ti --> Tj) in the
precedence graph. This results in directed edges from T2 to T1, T1 to
T3, and T2 to T3.
5. The schedule S is conflict serializable if the precedence graph has
no cycles. As T1 and T2 constitute a cycle, S is not conflict
serializable; whether S is serializable in some other sense (for
example, view serializable) has to be checked using other methods.

TESTING FOR CONFLICT SERIALIZABILITY


1. A schedule is conflict serializable if and only if its precedence graph is
acyclic.

2. To test for conflict serializability, we need to construct the precedence
graph and to invoke a cycle-detection algorithm. Cycle-detection
algorithms exist which take order n^2 time, where n is the number of
vertices in the graph. (Better algorithms take order n + e, where e is the
number of edges.)

3. If the precedence graph is acyclic, the serializability order can be
obtained by a topological sorting of the graph, that is, by finding a linear
order consistent with the partial order of the precedence graph. For example,
a serializability order for the schedule (a) would be one of either (b) or (c).
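
The test can be sketched in Python as below. The function names precedence_graph and serial_order are hypothetical, every transaction in the schedule is assumed to commit, and the topological sort is a plain version of Kahn's algorithm that favours readability over the n + e bound mentioned above. The first schedule is the good interleaving of T1 and T2 from the Concurrent Execution section; the second is an invented three-step schedule that produces a cycle.

```python
def precedence_graph(schedule):
    """schedule: list of (transaction, action, item) in execution order."""
    nodes = []
    for txn, _, _ in schedule:
        if txn not in nodes:
            nodes.append(txn)
    edges = set()
    for i, (ti, action_i, item_i) in enumerate(schedule):
        for tj, action_j, item_j in schedule[i + 1:]:
            conflicting = item_i == item_j and "write" in (action_i, action_j)
            if ti != tj and conflicting:
                edges.add((ti, tj))  # Ti's action precedes and conflicts with Tj's
    return nodes, edges


def serial_order(nodes, edges):
    """Topological sort; returns None when the graph contains a cycle."""
    indegree = {node: 0 for node in nodes}
    for _, dst in edges:
        indegree[dst] += 1
    ready = [node for node in nodes if indegree[node] == 0]
    order = []
    while ready:
        node = ready.pop(0)
        order.append(node)
        for src, dst in sorted(edges):
            if src == node:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
    return order if len(order) == len(nodes) else None


good_schedule = [
    ("T1", "read", "A"), ("T1", "write", "A"),
    ("T2", "read", "A"), ("T2", "read", "C"), ("T2", "write", "C"),
    ("T1", "read", "B"), ("T1", "write", "B"),
]
cyclic_schedule = [
    ("T1", "read", "X"), ("T2", "write", "X"), ("T1", "write", "X"),
]

good_order = serial_order(*precedence_graph(good_schedule))
cyclic_order = serial_order(*precedence_graph(cyclic_schedule))

print(good_order, cyclic_order)
```

A returned list is an equivalent serial order; None means the precedence graph has a cycle, so the schedule is not conflict serializable.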

RECOVERABLE SCHEDULES
□ Recoverable schedule: if a transaction Tj reads a data item previously written
by a transaction Ti, then the commit operation of Ti must appear before the
commit operation of Tj.
□ The following schedule is not recoverable if T9 commits immediately after the
read(A) operation.

□ If T8 should abort, T9 would have read (and possibly shown to the user) an
inconsistent database state. Hence, the database must ensure that schedules are
recoverable.
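
The definition can be checked mechanically. In the Python sketch below, the function name is_recoverable and the tuple layout are assumptions, aborts are not modelled, and the T8/T9 steps are a guess at the kind of schedule the text describes (T8 writes A, T9 reads it and commits first).

```python
def is_recoverable(schedule):
    """schedule: list of (transaction, action, item); item is None for commit."""
    last_writer = {}    # item -> transaction that wrote it most recently
    reads_from = set()  # (reader, writer) pairs
    committed = set()
    for txn, action, item in schedule:
        if action == "write":
            last_writer[item] = txn
        elif action == "read":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                reads_from.add((txn, writer))
        elif action == "commit":
            # Every transaction that txn read from must have committed already.
            for reader, writer in reads_from:
                if reader == txn and writer not in committed:
                    return False
            committed.add(txn)
    return True


t9_commits_first = [
    ("T8", "read", "A"), ("T8", "write", "A"),
    ("T9", "read", "A"), ("T9", "commit", None),
    ("T8", "read", "B"), ("T8", "commit", None),
]
t8_commits_first = [
    ("T8", "read", "A"), ("T8", "write", "A"),
    ("T9", "read", "A"),
    ("T8", "read", "B"), ("T8", "commit", None),
    ("T9", "commit", None),
]

not_recoverable = is_recoverable(t9_commits_first)
recoverable = is_recoverable(t8_commits_first)

print(not_recoverable, recoverable)
```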

CASCADING ROLLBACKS

□ Cascading rollback: a single transaction failure leads to a series of transaction
rollbacks. Consider the following schedule where none of the transactions has yet
committed (so the schedule is recoverable).

If T10 fails, T11 and T12 must also be rolled back.

□ Can lead to the undoing of a significant amount of work

CASCADELESS SCHEDULES

□ Cascadeless schedules: for each pair of transactions Ti and Tj such
that Tj reads a data item previously written by Ti, the commit
operation of Ti appears before the read operation of Tj.

□ Every cascadeless schedule is also recoverable

□ It is desirable to restrict the schedules to those that are cascadeless

□ Example of a schedule that is NOT cascadeless
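
The definitions of recoverable and cascadeless schedules can be checked mechanically. The following is a minimal sketch under stated assumptions: schedules are lists of (transaction, action, item) tuples with actions "r", "w" and "c" (commit), aborts are not modelled, and all function names are hypothetical.

```python
def reads_from(schedule):
    """Yield (reader, writer, position) for each read of an item whose
    latest write came from a different transaction."""
    last_writer = {}
    for pos, (t, action, item) in enumerate(schedule):
        if action == "w":
            last_writer[item] = t
        elif action == "r" and item in last_writer and last_writer[item] != t:
            yield t, last_writer[item], pos

def commit_positions(schedule):
    """Map each committed transaction to the position of its commit."""
    return {t: pos for pos, (t, a, _) in enumerate(schedule) if a == "c"}

def is_recoverable(schedule):
    commits = commit_positions(schedule)
    end = len(schedule)  # stands for "has not committed yet"
    for reader, writer, _ in reads_from(schedule):
        # The writer Ti must commit before the reader Tj commits.
        if reader in commits and commits.get(writer, end) > commits[reader]:
            return False
    return True

def is_cascadeless(schedule):
    commits = commit_positions(schedule)
    end = len(schedule)
    # The writer Ti must commit before the reader Tj performs the read.
    return all(commits.get(writer, end) < pos
               for _, writer, pos in reads_from(schedule))

# T9 reads A written by T8 and commits while T8 is still active.
not_recoverable = [("T8", "r", "A"), ("T8", "w", "A"), ("T9", "r", "A"),
                   ("T9", "c", None), ("T8", "r", "B")]
# T8 commits before T9 commits, but only after T9 has read A.
recoverable_only = [("T8", "w", "A"), ("T9", "r", "A"),
                    ("T8", "c", None), ("T9", "c", None)]
# T8 commits before T9 reads A.
cascadeless = [("T8", "w", "A"), ("T8", "c", None),
               ("T9", "r", "A"), ("T9", "c", None)]
```

The three sample schedules illustrate the containment stated above: every cascadeless schedule passes the recoverability check, but not the other way round.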

CONCURRENCY CONTROL
□ A database must provide a mechanism that will ensure that all possible
schedules are both:

□ Conflict serializable.

□ Recoverable and preferably cascadeless

□ A policy in which only one transaction can execute at a time generates serial
schedules, but provides a poor degree of concurrency

□ Concurrency-control schemes trade off the amount of concurrency they
allow against the amount of overhead that they incur

□ Testing a schedule for serializability after it has executed is a little too late!

□ Tests for serializability help us understand why a concurrency control
protocol is correct

□ Goal: to develop concurrency control protocols that will assure
serializability.

WEAK LEVELS OF CONSISTENCY

□ Some applications are willing to live with weak levels of consistency,
allowing schedules that are not serializable

□ E.g., a read-only transaction that wants to get an approximate total balance
of all accounts

□ E.g., database statistics computed for query optimization can be
approximate (why?)

□ Such transactions need not be serializable with respect to other transactions

□ Trade off accuracy for performance


LEVELS OF CONSISTENCY IN SQL
□ Serializable: default

□ Repeatable read: only committed records may be read, and repeated reads of the same
record must return the same value. However, a transaction may not be serializable:
it may find some records inserted by a transaction but not find others.

□ Read committed: only committed records can be read, but successive reads of a
record may return different (but committed) values.

□ Read uncommitted: even uncommitted records may be read.

□ Lower degrees of consistency are useful for gathering approximate information
about the database

□ Warning: some database systems do not ensure serializable schedules by
default

□ E.g., Oracle and PostgreSQL by default support a level of consistency called
snapshot isolation (not part of the SQL standard)

TRANSACTION DEFINITION IN SQL
□ Data manipulation language must include a construct for specifying the set of
actions that comprise a transaction.

□ In SQL, a transaction begins implicitly.

□ A transaction in SQL ends by:

□ Commit work commits current transaction and begins a new one.

□ Rollback work causes current transaction to abort.

□ In almost all database systems, by default, every SQL statement also
commits implicitly if it executes successfully

□ Implicit commit can be turned off by a database directive

□ E.g. in JDBC, connection.setAutoCommit(false);
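
As an illustration of explicit commit and rollback, the sketch below uses Python's standard sqlite3 module rather than JDBC. The account table, its contents and the transfer function are assumptions made for this example. With the module's default settings, a transaction is opened implicitly before the first UPDATE and stays open until commit() or rollback() is called.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 20)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction (hypothetical helper)."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        (balance,) = conn.execute(
            "SELECT balance FROM account WHERE name = ?", (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.commit()      # both updates become permanent together
        return True
    except ValueError:
        conn.rollback()    # neither update survives
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))
```

Calling transfer(conn, "A", "B", 50) commits both updates, while transfer(conn, "A", "B", 500) rolls both back and leaves the balances unchanged.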

RECOVERY SYSTEM

Failure Classification:
□ Transaction failure :

□ Logical errors: transaction cannot complete due to some internal error
condition

□ System errors: the database system must terminate an active transaction due to an
error condition (e.g., deadlock)

□ System crash: a power failure or other hardware or software failure causes the
system to crash.

□ Fail-stop assumption: non-volatile storage contents are assumed to not be
corrupted as a result of a system crash

□ Database systems have numerous integrity checks to prevent corruption of
disk data

□ Disk failure: a head crash or similar disk failure destroys all or part of disk
storage

□ Destruction is assumed to be detectable: disk drives use checksums to detect
failures

RECOVERY ALGORITHMS

□ Consider transaction Ti that transfers $50 from account A to account B

□ Two updates: subtract 50 from A and add 50 to B

□ Transaction Ti requires updates to A and B to be output to the database.

□ A failure may occur after one of these modifications has been made but before
both of them are made.

□ Modifying the database without ensuring that the transaction will commit may
leave the database in an inconsistent state

□ Not modifying the database may result in lost updates if failure occurs just
after transaction commits

□ Recovery algorithms have two parts

1. Actions taken during normal transaction processing to ensure enough
information exists to recover from failures

2. Actions taken after a failure to recover the database contents to a state that
ensures atomicity, consistency and durability

STORAGE STRUCTURE

□ Volatile storage:

□ does not survive system crashes

□ examples: main memory, cache memory

□ Nonvolatile storage:

□ survives system crashes

□ examples: disk, tape, flash memory, non-volatile (battery backed up) RAM

□ but may still fail, losing data

□ Stable storage:

□ a mythical form of storage that survives all failures

□ approximated by maintaining multiple copies on distinct nonvolatile media

Stable-Storage Implementation

□ Maintain multiple copies of each block on separate disks

□ copies can be at remote sites to protect against disasters such as fire or
flooding.

□ Failure during data transfer can still result in inconsistent copies.

□ Block transfer can result in

□ Successful completion

□ Partial failure: destination block has incorrect information

□ Total failure: destination block was never updated

□ Protecting storage media from failure during data transfer (one solution):

□ Execute output operation as follows (assuming two copies of each block):

1. Write the information onto the first physical block.

2. When the first write successfully completes, write the same information onto
the second physical block.

3. The output is completed only after the second write successfully completes.

□ Copies of a block may differ due to failure during output operation. To recover
from failure:

1. First find inconsistent blocks:

1. Expensive solution: Compare the two copies of every disk block.

2. Better solution:

□ Record in-progress disk writes on non-volatile storage (non-volatile RAM or
special area of disk).
□ Use this information during recovery to find blocks that may be inconsistent, and
only compare copies of these.

□ Used in hardware RAID systems


2. If either copy of an inconsistent block is detected to have an error (bad checksum),
overwrite it by the other copy. If both have no error, but are different, overwrite
the second block by the first block.
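
The two-copy output operation and its recovery step can be simulated as follows. This is a toy sketch, not a disk driver: the Block class, the use of zlib.crc32 as the checksum and the fail_after parameter for simulating a crash are all assumptions made for this example. The case in which both copies are damaged is not handled.

```python
import zlib

class Block:
    """A block together with the checksum written alongside its data."""
    def __init__(self, data=b""):
        self.data = data
        self.checksum = zlib.crc32(data)

    def ok(self):
        return zlib.crc32(self.data) == self.checksum

def output(copies, data, fail_after=None):
    """Write data to the first copy, then to the second copy.

    fail_after=k simulates a crash after k complete writes.
    Returns True only if both writes completed.
    """
    for i in range(2):
        if fail_after == i:
            return False
        copies[i] = Block(data)
    return True

def recover(copies):
    """Make the two copies of one block consistent again."""
    first, second = copies
    if not first.ok():
        copies[0] = Block(second.data)   # bad checksum: take the other copy
    elif not second.ok() or first.data != second.data:
        copies[1] = Block(first.data)    # first copy wins

copies = [Block(b"old"), Block(b"old")]
output(copies, b"new", fail_after=1)     # crash between the two writes
recover(copies)
```

After the simulated crash the first copy holds the new value and the second the old one; recovery copies the first block over the second, as in step 2 above.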

DATA ACCESS

□ Physical blocks are those blocks residing on the disk.

□ System buffer blocks are the blocks residing temporarily in main memory.

□ Block movements between disk and main memory are initiated through the
following two operations:

□ input(B) transfers the physical block B to main memory.

□ output(B) transfers the buffer block B to the disk, and replaces the appropriate
physical block there.

□ We assume, for simplicity, that each data item fits in, and is stored inside, a
single block.

□ Each transaction Ti has its private work-area in which local copies of all data
items accessed and updated by it are kept.

□ Ti's local copy of a data item X is denoted by xi.

□ BX denotes block containing X

□ Data items are transferred between the system buffer blocks and the transaction's
private work-area by:

□ read(X) assigns the value of data item X to the local variable xi.

□ write(X) assigns the value of local variable xi to data item X in the buffer
block.

□ Transactions

□ Must perform read(X) before accessing X for the first time (subsequent reads
can be from local copy)

□ The write(X) can be executed at any time before the transaction commits

□ Note that output(BX) need not immediately follow write(X). System can
perform the output operation when it seems fit.
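
The relationship between input(B), output(B), read(X) and write(X) can be illustrated with a small in-memory model. The Storage and Transaction classes and the dictionary layout below are assumptions made for this sketch.

```python
class Storage:
    """Disk blocks plus the system buffer in main memory."""
    def __init__(self, disk):
        self.disk = disk        # block name -> {item: value}
        self.buffer = {}        # buffer blocks currently in memory

    def input(self, b):
        self.buffer[b] = dict(self.disk[b])    # physical block -> buffer

    def output(self, b):
        self.disk[b] = dict(self.buffer[b])    # buffer -> physical block

class Transaction:
    def __init__(self, storage, block_of):
        self.storage = storage
        self.block_of = block_of   # item -> name of the block containing it
        self.local = {}            # private work-area (the xi values)

    def read(self, x):
        b = self.block_of[x]
        if b not in self.storage.buffer:
            self.storage.input(b)
        self.local[x] = self.storage.buffer[b][x]
        return self.local[x]

    def write(self, x):
        b = self.block_of[x]
        if b not in self.storage.buffer:
            self.storage.input(b)
        self.storage.buffer[b][x] = self.local[x]

storage = Storage({"B1": {"A": 100}})
t = Transaction(storage, {"A": "B1"})
t.read("A")
t.local["A"] -= 50
t.write("A")    # the buffer block now holds 50, the disk still holds 100
```

Only a later storage.output("B1") makes the change visible on disk, which is the point of the note that output(BX) need not immediately follow write(X).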

Lock-Based Protocols
A lock is a mechanism to control concurrent access to a data item.
Data items can be locked in two modes:
1. exclusive (X) mode. Data item can be both read as well as written. X-lock is
requested using lock-X instruction.
2. shared (S) mode. Data item can only be read. S-lock is requested using
lock-S instruction.
Lock requests are made to concurrency-control manager. Transaction can
proceed only after request is granted.
Lock-compatibility matrix

1) A transaction may be granted a lock on an item if the requested lock is compatible
with locks already held on the item by other transactions.
2) Any number of transactions can hold shared locks on an item,
but if any transaction holds an exclusive lock on the item, no other transaction may hold
any lock on the item.
3) If a lock cannot be granted, the requesting transaction is made to wait till all
incompatible locks held by other transactions have been released. The lock is
then granted.
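
The three rules above can be expressed as a small lock table. This is a simplified sketch: the LockTable class is hypothetical, and a request that cannot be granted simply returns False instead of putting the transaction on a wait queue.

```python
# Compatibility of (mode already held, mode requested), following rule 2:
# only shared locks are compatible with each other.
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

class LockTable:
    def __init__(self):
        self.held = {}    # item -> {transaction: mode}

    def request(self, txn, item, mode):
        """Grant the lock if compatible with locks held by other transactions."""
        holders = self.held.setdefault(item, {})
        for other, other_mode in holders.items():
            if other != txn and not COMPATIBLE[(other_mode, mode)]:
                return False      # the caller would have to wait
        if holders.get(txn) != "X":   # never weaken an X-lock already held
            holders[txn] = mode
        return True

    def release(self, txn, item):
        self.held.get(item, {}).pop(txn, None)

locks = LockTable()
```

For example, two transactions can both obtain an S-lock on the same item, but a third transaction asking for an X-lock is refused until both have released their locks.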

Example of a transaction performing locking:


T2: lock-S(A);
    read(A);
    unlock(A);
    lock-S(B);
    read(B);
    unlock(B);
    display(A+B)
Locking as above is not sufficient to guarantee serializability — if A and B get
updated in-between the read of A and B, the displayed sum would be wrong.
A locking protocol is a set of rules followed by all transactions while requesting
and releasing locks. Locking protocols restrict the set of possible schedules.

Consider the partial schedule


Neither T3 nor T4 can make progress — executing lock-S(B) causes T4 to wait for
T3 to release its lock on B, while executing lock-X(A) causes T3 to wait for T4 to
release its lock on A.
Such a situation is called a deadlock.
1. To handle a deadlock one of T3 or T4 must be rolled back and its locks released.
2. The potential for deadlock exists in most locking protocols. Deadlocks are a
necessary evil.
3. Starvation is also possible if concurrency control manager is badly
designed. For example:
a. A transaction may be waiting for an X-lock on an item, while a
sequence of other transactions request and are granted an S-lock on
the same item.
b. The same transaction is repeatedly rolled back due to deadlocks.
4. Concurrency control manager can be designed to prevent starvation.

THE TWO-PHASE LOCKING PROTOCOL

1. This is a protocol which ensures conflict-serializable schedules.
2. Phase 1: Growing Phase
a. transaction may obtain locks
b. transaction may not release locks
3. Phase 2: Shrinking Phase
a. transaction may release locks
b. transaction may not obtain locks
4. The protocol assures serializability. It can be proved that the transactions
can be serialized in the order of their lock points (i.e. the point where a
transaction acquired its final lock).
5. Two-phase locking does not ensure freedom from deadlocks
6. Cascading roll-back is possible under two-phase locking. To avoid this,
follow a modified protocol called strict two-phase locking. Here a
transaction must hold all its exclusive locks till it commits/aborts.
7. Rigorous two-phase locking is even stricter: here all locks are held till
commit/abort. In this protocol transactions can be serialized in the order in
which they commit.
8. There can be conflict serializable schedules that cannot be obtained
if two-phase locking is used.
9. However, in the absence of extra information (e.g., ordering of access
to data), two-phase locking is needed for conflict serializability in the
following sense:
Given a transaction Ti that does not follow two-phase locking, we can find a
transaction Tj that uses two-phase locking, and a schedule for Ti and Tj that is not
conflict serializable.
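
Whether a single transaction obeys the two-phase rule can be checked by scanning its lock and unlock instructions. The encoding of a transaction as a list of (action, item) pairs and the function names are assumptions made for this sketch.

```python
def is_two_phase(ops):
    """True if no lock is obtained after the first unlock."""
    shrinking = False
    for action, _ in ops:
        if action == "unlock":
            shrinking = True          # the shrinking phase has begun
        elif action.startswith("lock") and shrinking:
            return False              # a lock was obtained after an unlock
    return True

def lock_point(ops):
    """Index of the instruction that acquires the final lock, if any."""
    positions = [i for i, (a, _) in enumerate(ops) if a.startswith("lock")]
    return positions[-1] if positions else None

# The transaction T2 shown earlier: it unlocks A before locking B.
t2 = [("lock-S", "A"), ("read", "A"), ("unlock", "A"),
      ("lock-S", "B"), ("read", "B"), ("unlock", "B")]
# A two-phase variant that delays every unlock until all locks are held.
t2_two_phase = [("lock-S", "A"), ("read", "A"), ("lock-S", "B"),
                ("read", "B"), ("unlock", "A"), ("unlock", "B")]
```

The original T2 fails the check, which matches the earlier observation that its locking is not sufficient to guarantee serializability. The second variant passes, and its lock point is the lock-S(B) instruction.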

TIMESTAMP-BASED PROTOCOLS

1. Each transaction is issued a timestamp when it enters the system. If an old
transaction Ti has time-stamp TS(Ti), a new transaction Tj is assigned
time-stamp TS(Tj) such that TS(Ti) < TS(Tj).
2. The protocol manages concurrent execution such that the time-stamps
determine the serializability order.
3. In order to assure such behavior, the protocol maintains for each data item Q two
timestamp values:
a. W-timestamp(Q) is the largest time-stamp of any transaction that executed
write(Q) successfully.
b. R-timestamp(Q) is the largest time-stamp of any transaction that executed
read(Q) successfully.

4. The timestamp ordering protocol ensures that any conflicting read and write
operations are executed in timestamp order.
5. Suppose a transaction Ti issues a read(Q)
1. If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that
was already overwritten.
□ Hence, the read operation is rejected, and Ti is rolled back.
2. If TS(Ti) ≥ W-timestamp(Q), then the read operation is
executed, and R-timestamp(Q) is set to max(R-timestamp(Q),
TS(Ti)).
6. Suppose that transaction Ti issues write(Q).
1. If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing
was needed previously, and the system assumed that that value
would never be produced.
□ Hence, the write operation is rejected, and Ti is rolled back.
2. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete
value of Q.
□ Hence, this write operation is rejected, and Ti is rolled back.
3. Otherwise, the write operation is executed, and W-timestamp(Q) is set
to TS(Ti).
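
The read and write rules translate almost line by line into code. In this sketch the Item class, the Rollback exception and the use of integer timestamps starting above 0 are assumptions made for illustration.

```python
class Rollback(Exception):
    """Raised when the protocol rejects an operation."""

class Item:
    def __init__(self, value=None):
        self.value = value
        self.r_ts = 0    # R-timestamp(Q)
        self.w_ts = 0    # W-timestamp(Q)

def to_read(ts, q):
    """read(Q) issued by a transaction with timestamp ts."""
    if ts < q.w_ts:
        # The value this transaction should have seen was overwritten.
        raise Rollback("read rejected")
    q.r_ts = max(q.r_ts, ts)
    return q.value

def to_write(ts, q, value):
    """write(Q) issued by a transaction with timestamp ts."""
    if ts < q.r_ts:
        # A younger transaction has already read the old value.
        raise Rollback("write rejected: value was needed earlier")
    if ts < q.w_ts:
        # A younger transaction has already written Q.
        raise Rollback("write rejected: obsolete value")
    q.value = value
    q.w_ts = ts
```

A transaction whose operation raises Rollback would be restarted with a new timestamp; that restart logic is left out of the sketch.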

Thomas’ Write Rule

1. We now present a modification to the timestamp-ordering protocol that allows
greater potential concurrency than the basic timestamp-ordering protocol does.
Let us consider schedule 4 of Figure below, and apply the timestamp-ordering
protocol. Since T27 starts before T28, we shall assume that TS(T27) <
TS(T28). The read(Q) operation of T27 succeeds, as does the write(Q) operation of
T28. When T27 attempts its write(Q) operation, we find that TS(T27) <
W-timestamp(Q), since W-timestamp(Q) = TS(T28). Thus, the write(Q) by T27 is
rejected and transaction T27 must be rolled back.

2. Although the rollback of T27 is required by the timestamp-ordering protocol, it is
unnecessary. Since T28 has already written Q, the value that T27 is attempting to
write is one that will never need to be read. Any transaction Ti with TS(Ti) <
TS(T28) that attempts a read(Q) will be rolled back, since TS(Ti) < W-timestamp(Q).

3. Any transaction Tj with TS(Tj) > TS(T28) must read the value of Q written by
T28, rather than the value that T27 is attempting to write. This observation leads to
a modified version of the timestamp-ordering protocol in which obsolete write
operations can be ignored under certain circumstances. The protocol rules for read
operations remain unchanged. The protocol rules for write operations, however,
are slightly different from the timestamp-ordering protocol.

The modification to the timestamp-ordering protocol, called Thomas’ write rule, is
this: Suppose that transaction Ti issues write(Q).
1. If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was
previously needed, and it had been assumed that the value would never be
produced. Hence, the system rejects the write operation and rolls Ti back.
2. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value
of Q. Hence, this write operation can be ignored.
3. Otherwise, the system executes the write operation and sets
W-timestamp(Q) to TS(Ti).
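
The write rule can be sketched as a single function and replayed on the T27/T28 example. The dictionary representation of Q, the returned status strings and the concrete timestamps 1 and 2 are assumptions made for this example.

```python
def thomas_write(ts, q, value):
    """Apply Thomas' write rule to write(Q) by a transaction with timestamp ts.

    q is a dict with keys "value", "r_ts" and "w_ts" (hypothetical layout).
    """
    if ts < q["r_ts"]:
        return "rollback"     # case 1: the value was needed earlier
    if ts < q["w_ts"]:
        return "ignored"      # case 2: obsolete write, silently skipped
    q["value"] = value        # case 3: perform the write
    q["w_ts"] = ts
    return "written"

# Replay of the example with TS(T27) = 1 and TS(T28) = 2.
Q = {"value": 0, "r_ts": 0, "w_ts": 0}
Q["r_ts"] = max(Q["r_ts"], 1)            # read(Q) by T27 succeeds
first = thomas_write(2, Q, "from T28")   # write(Q) by T28
second = thomas_write(1, Q, "from T27")  # write(Q) by T27
```

Under the basic protocol the last write would force T27 to roll back. Here it is ignored, and Q keeps the value written by T28.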

VALIDATION-BASED PROTOCOLS
Phases in Validation-Based Protocols

1) Read phase. During this phase, the system executes transaction Ti.
It reads the values of the various data items and stores them in variables local to
Ti. It performs all write operations on temporary local variables, without
updates of the actual database.
2) Validation phase. The validation test is applied to transaction Ti. This
determines whether Ti is allowed to proceed to the write phase without causing a
violation of serializability.
If a transaction fails the validation test, the system aborts the transaction.
3) Write phase. If the validation test succeeds for transaction Ti, the
temporary local variables that hold the results of any write operations performed
by Ti are copied to the database. Read-only transactions omit this phase.
MODES IN VALIDATION-BASED PROTOCOLS
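To determine when the phases of a transaction Ti took place, three timestamps are
associated with it:
1. Start(Ti): the time when Ti started its execution.
2. Validation(Ti): the time when Ti finished its read phase and started its
validation phase.
3. Finish(Ti): the time when Ti finished its write phase.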
MULTIPLE GRANULARITY.

Multiple granularity locking (MGL) is a locking method used in database
management systems (DBMS) and relational databases.

In MGL, locks are set on objects that contain other objects. MGL exploits the
hierarchical nature of the contains relationship. For example, a database may have
files, which contain pages, which further contain records. This can be thought of as a
tree of objects, where each node contains its children. A lock on a node, such as a
shared or exclusive lock, locks the targeted node as well as all of its descendants.

Multiple granularity locking is usually used with non-strict two-phase locking to
guarantee serializability. In addition to the shared (S) and exclusive (X) modes, the
multiple-granularity locking protocol uses the intention lock modes IS
(intention-shared), IX (intention-exclusive) and SIX (shared and
intention-exclusive). It requires that a transaction Ti that attempts to
lock a node Q must follow these rules:
□ Transaction Ti must observe the lock-compatibility function of Figure above.
□ Transaction Ti must lock the root of the tree first, and can lock it in any mode.
□ Transaction Ti can lock a node Q in S or IS mode only if Ti currently has the
parent of Q locked in either IX or IS mode.
□ Transaction Ti can lock a node Q in X, SIX, or IX mode only if Ti
currently has the parent of Q locked in either IX or SIX mode.
□ Transaction Ti can lock a node only if Ti has not previously unlocked
any node (that is, Ti is two phase).
□ Transaction Ti can unlock a node Q only if Ti currently has none of the children
of Q locked.
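
The parent-mode rules and the compatibility check can be combined into one predicate. The compatibility sets below are the standard textbook matrix for the five modes, written out here because the figure is not reproduced in these notes. The function name may_lock and its parameters are assumptions made for this sketch.

```python
# For each requested mode: the modes other transactions may already hold
# on the same node without blocking the request.
COMPAT = {
    "IS":  {"IS", "IX", "S", "SIX"},
    "IX":  {"IS", "IX"},
    "S":   {"IS", "S"},
    "SIX": {"IS"},
    "X":   set(),
}

# For each requested mode: the modes in which the requesting transaction
# must already hold the parent node.
REQUIRED_PARENT = {
    "IS":  {"IS", "IX"},
    "S":   {"IS", "IX"},
    "IX":  {"IX", "SIX"},
    "SIX": {"IX", "SIX"},
    "X":   {"IX", "SIX"},
}

def may_lock(mode, parent_mode, others=(), is_root=False):
    """Can a transaction lock a node in `mode`?

    parent_mode: mode it holds on the parent node (None if it holds none).
    others: modes held on this node by other transactions.
    """
    if not is_root and parent_mode not in REQUIRED_PARENT[mode]:
        return False
    return all(o in COMPAT[mode] for o in others)
```

For instance, a transaction holding IX on a file may take an X-lock on one of its records, whereas one holding only IS on the file may not. The two-phase and unlock-order rules are not covered by this predicate.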

UNIT-V
RECOVERY AND ATOMICITY
FAILURE WITH LOSS OF VOLATILE STORAGE
What would happen if volatile storage such as RAM abruptly fails? All
transactions that are being executed are kept in main memory. All active logs, disk
buffers and related data are stored in volatile storage.
When storage like RAM fails, it takes away all the logs and the active copy of the database.
This makes recovery almost impossible, as everything needed to recover is also lost.
The following techniques may be adopted to guard against loss of volatile storage.
□ A mechanism like checkpointing can be adopted, which saves the entire content of
the database periodically.
□ The state of the active database in volatile memory can be dumped onto stable storage
periodically; the dump may also contain logs, active transactions and buffer
blocks.
□ <dump> can be marked in the log file whenever the database contents are dumped
from volatile memory to stable storage.

RECOVERY:
□ When the system recovers from failure, it can restore the latest dump.
□ It can maintain redo-list and undo-list as in checkpoints.
□ It can recover the system by consulting the undo and redo lists to restore the state of all
transactions up to the last checkpoint.

DATABASE BACKUP & RECOVERY FROM CATASTROPHIC FAILURE


Remote backup, described next, is one safeguard against catastrophic failure. Alternatively,
whole-database backups can be taken on magnetic tapes and stored at a safer place.
Such a backup can later be restored onto a freshly installed database, bringing it to the
state it had at the time of the backup.

REMOTE BACKUP
Remote backup provides a sense of security and safety in case the primary location
where the database is located gets destroyed. Remote backup can be offline or real-
time and online. In case it is offline it is maintained manually.

[Image: Remote Data Backup]

DBMS DATA RECOVERY
CRASH RECOVERY
Although we live in a highly technologically advanced era, where hundreds of
satellites monitor the earth and billions of people are connected through
information technology every second, failures are still expected, but they are not
always acceptable.
FAILURE CLASSIFICATION
To see where the problem has occurred, we generalize failures into the following
categories:
Transaction failure: When a transaction fails to execute, or reaches a point
after which it cannot be completed successfully, it has to abort. This is called
transaction failure, and only a few transactions or processes are affected. The reasons for
transaction failure could be:
□ Logical errors: where a transaction cannot complete because it has some code
error or an internal error condition
□ System errors: where the database system itself terminates an active transaction
because the DBMS is not able to execute it, or it has to stop because of some system
condition. For example, in case of deadlock or resource unavailability the system
aborts an active transaction.
SYSTEM CRASH
Some problems that are external to the system may cause it
to stop abruptly and crash, for example an interruption in power
supply or a failure of the underlying hardware or software, such as
operating system errors.
DISK FAILURE: In the early days of technology evolution, it was a common problem
that hard disk drives or storage drives failed frequently. Disk failures include
formation of bad sectors, unreachability of the disk, disk head crash or any other
failure that destroys all or part of disk storage.
STORAGE STRUCTURE
The storage system was described earlier. In brief, the storage structure can be
divided into the following categories:
□ Volatile storage: As the name suggests, this storage does not survive system crashes.
It is mostly placed very close to the CPU, often embedded on the chipset
itself. Examples: main memory, cache memory. It is fast but can store only a
small amount of information.
□ Nonvolatile storage: This storage is made to survive system crashes. It is
huge in capacity but slower to access. Examples
include hard disks, magnetic tapes, flash memory, and non-volatile (battery backed up)
RAM.
RECOVERY AND ATOMICITY
When a system crashes, it may have several transactions being executed and
various files opened for them to modify data items. Transactions
are made of various operations, which are atomic in nature. When the DBMS
recovers from a crash:
□ It should check the states of all transactions that were being executed.
□ A transaction may be in the middle of some operation; the DBMS must ensure the
atomicity of the transaction in this case. It should check whether the transaction
can be completed now or needs to be rolled back.
□ No transaction should be allowed to leave the DBMS in an inconsistent state.
There are two types of techniques that can help a DBMS in recovering as well as
maintaining the atomicity of transactions:
□ Maintaining the logs of each transaction, and writing them onto some stable
storage before actually modifying the database.
□ Maintaining shadow paging, where the changes are done on volatile memory
and later the actual database is updated.
LOG-BASED RECOVERY
A log is a sequence of records that describe the actions performed by a
transaction. It is important that the log records are written prior to the actual
modification and stored on stable storage media, which is failsafe.
Log-based recovery works as follows:
□ The log file is kept on stable storage media.
□ When a transaction enters the system and starts execution, it writes a log record:
<Tn, Start>
□ When the transaction modifies an item X, it writes a log record as follows:
<Tn, X, V1, V2>
It reads: Tn has changed the value of X from V1 to V2.
□ When the transaction finishes, it logs:
<Tn, commit>
The database can be modified using two approaches:
1. Deferred database modification: All logs are written on to the stable storage
and the database is updated when the transaction commits.
2. Immediate database modification: Each log record is followed by the actual database
modification. That is, the database is modified immediately after every operation.
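
Recovery from such a log under immediate modification can be sketched as a redo pass followed by an undo pass. The tuple encoding of log records and the function name recover_from_log are assumptions made for this example. Checkpoints are left out, and the sketch assumes that no two active transactions updated the same item.

```python
def recover_from_log(log, db):
    """Bring db to a consistent state using the log.

    log records (hypothetical encoding):
      (T, "start"), (T, X, old_value, new_value), (T, "commit")
    """
    committed = {rec[0] for rec in log if rec[1:] == ("commit",)}

    # Redo: scan forward, reapplying updates of committed transactions.
    for rec in log:
        if len(rec) == 4 and rec[0] in committed:
            _, x, _, new = rec
            db[x] = new

    # Undo: scan backward, restoring old values written by
    # transactions that never committed.
    for rec in reversed(log):
        if len(rec) == 4 and rec[0] not in committed:
            _, x, old, _ = rec
            db[x] = old
    return db

log = [("T1", "start"), ("T1", "A", 100, 50), ("T1", "B", 20, 70),
       ("T1", "commit"), ("T2", "start"), ("T2", "C", 5, 9)]
# State found on disk after the crash: A was never output, C was.
disk = {"A": 100, "B": 70, "C": 9}
recovered = recover_from_log(log, disk)
```

T1 committed, so its update to A is redone. T2 did not commit, so its update to C is undone using the old value V1 stored in the log record.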
RECOVERY WITH CONCURRENT TRANSACTIONS
When more than one transaction is being executed in parallel, the logs are
interleaved. At the time of recovery it would be hard for the recovery system to
backtrack through all the logs and then start recovering. To ease this situation, most modern
DBMSs use the concept of 'checkpoints'.
Checkpoint: Keeping and maintaining logs in real time and in a real environment may
fill up all the storage space available in the system. As time passes, the log file may become
too big to be handled at all. A checkpoint is a mechanism where all the previous logs
are removed from the system and stored permanently on disk. A checkpoint
declares a point before which the DBMS was in a consistent state and all the
transactions were committed.
Recovery: When a system with concurrent transactions crashes and recovers, it
behaves in the following manner:

BUFFER MANAGEMENT
1. Database buffers are generally implemented in virtual memory in spite of
some drawbacks:
a. When the operating system needs to evict a page that has been modified, the page is
written to swap space on disk.
b. When the database decides to write a buffer page to disk, the buffer page may be in swap
space, and may have to be read from swap space on disk and output to the database
on disk, resulting in extra I/O!
Known as the dual paging problem.
c. Ideally when the OS needs to evict a page from the buffer, it should pass
control to the database, which in turn should
□ Output the page to the database instead of to swap space (making
sure to output log records first), if it is modified
□ Release the page from the buffer, for the OS to use
Dual paging can thus be avoided, but common operating systems do not support
such functionality.

FUZZY CHECKPOINTING
a. To avoid long interruption of normal processing during checkpointing, allow
updates to happen during checkpointing
b. Fuzzy checkpointing is done as follows:
1. Temporarily stop all updates by transactions
2. Write a <checkpoint L> log record and force log to stable storage
3. Note list M of modified buffer blocks
4. Now permit transactions to proceed with their actions
5. Output to disk all modified buffer blocks in list M
□ Blocks should not be updated while being output
□ Follow WAL: all log records pertaining to a block must be output before the
block is output
6. Store a pointer to the checkpoint record in a fixed position
last_checkpoint on disk
7. When recovering using a fuzzy checkpoint, start scan from the
checkpoint record pointed to by last_checkpoint
a. Log records before last_checkpoint have their updates reflected
in database on disk, and need not be redone.


FAILURE WITH LOSS OF NONVOLATILE STORAGE
a. So far we assumed no loss of non-volatile storage
b. Technique similar to checkpointing used to deal with loss of non-volatile storage
1. Periodically dump the entire content of the database to stable storage
2. No transaction may be active during the dump procedure; a procedure
similar to checkpointing must take place
□ Output all log records currently residing in main memory
onto stable storage.
□ Output all buffer blocks onto the disk.
□ Copy the contents of the database to stable storage.
□ Output a record <dump> to log on stable storage.

RECOVERING FROM FAILURE OF NON-VOLATILE STORAGE


a. To recover from disk failure
1. Restore database from most recent dump.
2. Consult the log and redo all transactions that committed after the
dump
b. Can be extended to allow transactions to be active during
dump; known as fuzzy dump or online dump
1. Similar to fuzzy checkpointing
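
Recovery from a disk failure using a dump can be sketched as follows. The log encoding, with ("dump",) standing for the <dump> record, and the function name restore_from_dump are assumptions made for this example. The sketch covers only the simple case in which no transaction is active during the dump.

```python
def restore_from_dump(dump, log):
    """Rebuild the database from the latest dump plus the log.

    log records (hypothetical encoding):
      (T, "start"), (T, X, old_value, new_value), (T, "commit"), ("dump",)
    """
    db = dict(dump)                       # 1. restore the most recent dump
    # Only log records written after the last <dump> record matter.
    start = max((i for i, rec in enumerate(log) if rec == ("dump",)),
                default=-1) + 1
    tail = log[start:]
    committed = {rec[0] for rec in tail if rec[1:] == ("commit",)}
    for rec in tail:                      # 2. redo committed transactions
        if len(rec) == 4 and rec[0] in committed:
            _, x, _, new = rec
            db[x] = new
    return db

dump = {"A": 100, "B": 20}
log = [("T0", "start"), ("T0", "A", 90, 100), ("T0", "commit"),
       ("dump",),
       ("T1", "start"), ("T1", "A", 100, 50), ("T1", "commit"),
       ("T2", "start"), ("T2", "B", 20, 99)]
restored = restore_from_dump(dump, log)
```

T1 committed after the dump, so its update is redone. T2 never committed, so its update is simply not applied to the restored copy.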
ARIES RECOVERY ALGORITHM

a. ARIES is a state of the art recovery method
1. Incorporates numerous optimizations to reduce overheads during
normal processing and to speed up recovery
2. The recovery algorithm we studied earlier is modeled after ARIES,
but greatly simplified by removing optimizations
b. Unlike the recovery algorithm described earlier, ARIES
1. Uses log sequence number (LSN) to identify log records
□ Stores LSNs in pages to identify what updates have already been applied to a
database page
2. Physiological redo
3. Dirty page table to avoid unnecessary redos during recovery
4. Fuzzy checkpointing that only records information about dirty pages,
and does not require dirty pages to be written out at checkpoint time

ARIES Recovery Algorithm


ARIES recovery involves three passes
a. Analysis pass: Determines
□ Which transactions to undo
□ Which pages were dirty (disk version not up to date) at time of crash
□ RedoLSN: LSN from which redo should start
b. Redo pass:
□ Repeats history, redoing all actions from RedoLSN
□ RecLSN and PageLSNs are used to avoid redoing actions already reflected on
page
c. Undo pass:
□ Rolls back all incomplete transactions
□ Transactions whose abort was complete earlier are not undone
Key idea: no need to undo these transactions: earlier
undo actions were logged, and are redone as required
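
How RecLSN and PageLSN let the redo pass skip work can be shown with a small predicate. This is a simplified sketch of that one check, not of ARIES as a whole: the dictionaries standing in for the dirty page table and for the PageLSN stored in each page are assumptions made for this example.

```python
def needs_redo(lsn, page_id, dirty_page_table, page_lsn):
    """Decide whether the update with log sequence number `lsn` must be
    reapplied to page `page_id` during the redo pass.

    dirty_page_table: page id -> RecLSN (assumed representation)
    page_lsn: page id -> PageLSN stored in the page on disk
    """
    rec_lsn = dirty_page_table.get(page_id)
    if rec_lsn is None:
        return False     # page was not dirty at the crash: update is on disk
    if lsn < rec_lsn:
        return False     # update precedes the point where the page got dirty
    # Otherwise the page is read and its PageLSN is compared.
    return page_lsn[page_id] < lsn

dirty_pages = {"P1": 10}
page_lsns = {"P1": 12, "P2": 0}
```

With these sample values, an update with LSN 15 on P1 must be redone, while updates with LSN 8 or 11 on P1, and any update on P2, are skipped.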

DBMS FILE STRUCTURE
Related data and information are stored collectively in file formats. A file is a
sequence of records stored in binary format.
FILE ORGANIZATION
The method of mapping file records to disk blocks defines file organization, i.e. how
the file records are organized. The following are the types of file organization:

□ Heap File Organization: When a file is created using the Heap File Organization
mechanism, the operating system allocates a memory area to that file without any
further accounting details. File records can be placed anywhere in that memory area.
□ Sequential File Organization: Every file record contains a data field (attribute) to
uniquely identify that record. In the sequential file organization mechanism, records
are placed in the file in some sequential order based on the unique key field or
search key. Practically, it is not possible to store all the records sequentially in
physical form.
□ Hash File Organization: This mechanism applies a hash function to
some field of the records. A file is a collection of records that has
to be mapped onto some blocks of the disk space allocated to it; the output of the
hash function determines the disk block in which a record is placed.
□ Clustered File Organization: Clustered file organization is not considered good
for large databases. In this mechanism, related records from one or more relations
are kept in the same disk block, that is, the ordering of records is not based on primary
key or search key.
FILE OPERATIONS
Operations on database files can be classified into two categories broadly.
□ Update Operations
□ Retrieval Operations
Update operations change the data values by insertion, deletion or update. Retrieval
operations on the other hand do not alter the data but retrieve them after optional
conditional filtering. In both types of operations, selection plays significant role.
Other than creation and deletion of a file, there could be several operations, which
can be done on files
. Open: A file can be opened in one of two modes, read mode or write mode. In
read mode, operating system does not allow anyone to alter data it is solely for
reading purpose. Files opened in read mode can be shared among several entities.
The other mode is write mode, in which, data modification is allowed. Files opened
in write mode can be read also but cannot be shared.

DATABASE MANAGEMENT SYSTEMS Page 215


□ Locate: Every file has a file pointer, which gives the current position where data
is to be read or written. This pointer can be adjusted as needed. Using the find
(seek) operation, it can be moved forward or backward.
□ Read: By default, when a file is opened in read mode, the file pointer points to the
beginning of the file. There are options by which the user can tell the operating
system where to place the file pointer when the file is opened. The data immediately
following the file pointer is read.
□ Write: The user can choose to open a file in write mode, which enables them to edit
its content. The edit can be a deletion, insertion, or modification. The file pointer
can be positioned at the time of opening or can be changed dynamically if the
operating system allows it.
□ Close: This is the most important operation from the operating system's point of
view. When a request to close a file is generated, the operating system removes all
the locks (if in shared mode), saves the data (if altered) to the secondary storage
media, and releases all the buffers and file handles associated with the file.
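
The open, locate, read, write, and close operations can be tried out with a short
Python sketch. The file name, the 4-byte "records", and the use of Python's mode
strings are assumptions made for illustration only; a DBMS works with its own
storage manager rather than with these calls.

```python
import os
import tempfile

# Hypothetical scratch file used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "records.dat")

# Write mode: modification is allowed.
f = open(path, "wb")
f.write(b"AAAABBBBCCCC")   # three 4-byte "records"
f.close()                  # flush buffers to storage and release the handle

# Read mode: the pointer starts at the beginning of the file.
f = open(path, "rb")
start = f.tell()           # current pointer position
f.seek(4)                  # Locate: move the pointer to the second record
second = f.read(4)         # Read: returns the data right after the pointer
after = f.tell()           # the pointer has advanced past what was read
f.close()

# Update in place: "r+b" opens for reading and writing without truncating.
f = open(path, "r+b")
f.seek(8)
f.write(b"ZZZZ")           # overwrite the third record
f.close()

with open(path, "rb") as f:   # `with` closes the file automatically
    contents = f.read()
```

Here `seek` plays the role of the locate operation and `tell` reports the current
position of the file pointer.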

DBMS INDEXING
Information in DBMS files is stored in the form of records. Every record has a
key field, which identifies it uniquely.
Indexing is defined based on its indexing attributes. Indexing can be one of the
following types:
□ Primary Index: An index built on the ordering key field of the file. Generally
this is the primary key of the relation.
□ Secondary Index: An index built on a non-ordering field of the file.
□ Clustering Index: An index built on an ordering non-key field of the file.

The ordering field is the field on which the records of the file are ordered. It can
be different from the primary or candidate key of the file.
Ordered Indexing is of two types:
□ Dense Index
□ Sparse Index

Dense Index

In a dense index, there is an index record for every search-key value in the
database. This makes searching faster but requires more space to store the index
records themselves. Each index record contains a search-key value and a pointer to
the actual record on the disk.

Sparse Index
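In a sparse index, index records are not created for every search key. An index
record here contains a search key and a pointer to the data on the disk. To search
for a record, we first follow the index record to reach a location in the data file.
If the record we are looking for is not at that location, the search continues
sequentially from there until the desired record is found.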
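
The difference between the two kinds of index can be sketched in Python. The
block layout (three records per block), the keys, and the function names below are
hypothetical; the sparse index keeps only the first key of each block, so a lookup
follows the index to a block and then scans it sequentially.

```python
from bisect import bisect_right

# Hypothetical data file: records sorted by key, grouped into blocks of 3.
blocks = [
    [(10, "a"), (20, "b"), (30, "c")],
    [(40, "d"), (50, "e"), (60, "f")],
    [(70, "g"), (80, "h"), (90, "i")],
]

# Dense index: one entry per search-key value -> (block number, slot).
dense = {key: (b, s)
         for b, block in enumerate(blocks)
         for s, (key, _) in enumerate(block)}

# Sparse index: one entry per block, holding the first key of that block.
sparse_keys = [block[0][0] for block in blocks]

def dense_lookup(key):
    pos = dense.get(key)
    if pos is None:
        return None
    b, s = pos
    return blocks[b][s][1]

def sparse_lookup(key):
    # Find the last index entry whose key is <= the search key ...
    b = bisect_right(sparse_keys, key) - 1
    if b < 0:
        return None
    # ... then scan that block sequentially.
    for k, value in blocks[b]:
        if k == key:
            return value
    return None
```

The dense index holds nine entries and the sparse index only three, which is the
space-for-speed trade-off described above.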

Multilevel Index
Index records consist of search-key values and data pointers. The index itself
is stored on the disk along with the actual database files. As the size of the
database grows, so does the size of the indices.

A multi-level index breaks the index down into several smaller indices so that
the outermost level becomes small enough to fit in a single disk block, which can
easily be accommodated anywhere in main memory.
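
How many levels are needed can be estimated with a little arithmetic. The sketch
below assumes that every index block holds the same number of entries and that
each block of one level needs exactly one entry in the level above it; the function
name and the sample numbers are illustrative, not taken from the notes.

```python
import math

def index_levels(num_entries, entries_per_block):
    """Return the number of index blocks at each level, innermost first."""
    levels = []
    blocks = math.ceil(num_entries / entries_per_block)
    levels.append(blocks)
    while blocks > 1:
        # Each block of the level below needs one entry in the level above.
        blocks = math.ceil(blocks / entries_per_block)
        levels.append(blocks)
    return levels
```

For example, under these assumptions one million index entries at 100 entries per
block need 10,000 blocks at the innermost level, 100 at the next level, and a single
block at the outermost level.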
B+ TREE
A B+ tree is a multi-level index format organized as a balanced multiway search
tree (each node can have more than two children). As mentioned earlier, a
single-level index becomes large as the database grows, which degrades
performance. All leaf nodes of a B+ tree hold the actual data pointers. A B+ tree
ensures that all leaf nodes remain at the same height and is therefore balanced.
Additionally, the leaf nodes are linked in a linked list, so a B+ tree supports
random access as well as sequential access.
STRUCTURE OF B+ TREE
Every leaf node is at equal distance from the root node. A B+ tree is of order n
where n is fixed for every B+ tree.

Internal nodes:
□ Internal (non-leaf) nodes contain at least ⌈n/2⌉ pointers, except the root node.
□ At most, internal nodes contain n pointers.
Leaf nodes:
□ Leaf nodes contain at least ⌈n/2⌉ record pointers and ⌈n/2⌉ key values
□ At most, leaf nodes contain n record pointers and n key values
□ Every leaf node contains one block pointer P to point to next leaf node and forms
a linked list.
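
The structure above can be illustrated with a small hand-built tree in Python. The
class names, keys, and values are hypothetical, and the separator convention (keys
less than or equal to a separator go to the child on its left) is an assumption,
chosen to agree with an insertion rule in which the i-th key of a split leaf is
duplicated in the parent. The sketch covers only searching, not insertion or
deletion.

```python
from bisect import bisect_left

class Leaf:
    def __init__(self, keys, values):
        self.keys = keys          # sorted key values
        self.values = values      # stand-ins for record pointers
        self.next = None          # block pointer P to the next leaf

class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # separator keys
        self.children = children  # len(children) == len(keys) + 1

def find_leaf(node, key):
    # Descend from the root; child i holds the keys <= keys[i], and the
    # last child holds everything greater than keys[-1].
    while isinstance(node, Internal):
        node = node.children[bisect_left(node.keys, key)]
    return node

def search(root, key):
    leaf = find_leaf(root, key)
    if key in leaf.keys:
        return leaf.values[leaf.keys.index(key)]
    return None

def range_scan(root, low, high):
    # Random access to the first leaf, then sequential access along the list.
    leaf = find_leaf(root, low)
    out = []
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > high:
                return out
            if k >= low:
                out.append(v)
        leaf = leaf.next
    return out

# Hand-built example tree (hypothetical keys and values).
l1 = Leaf([5, 10], ["r5", "r10"])
l2 = Leaf([15, 20], ["r15", "r20"])
l3 = Leaf([25, 30], ["r25", "r30"])
l1.next, l2.next = l2, l3
root = Internal([10, 20], [l1, l2, l3])
```

`search` shows the random access path from the root to one leaf, while
`range_scan` uses the leaf-to-leaf pointers for sequential access.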
B+ tree insertion
□ B+ trees are filled from the bottom, and each entry is inserted at a leaf node.
□ If a leaf node overflows:
o Split the node into two parts
o Partition at i = ⌊(m+1)/2⌋
o The first i entries are stored in one node
o The rest of the entries (i+1 onwards) are moved to a new node
o The i-th key is duplicated in the parent of the leaf
□ If a non-leaf node overflows:
o Split the node into two parts
o Partition the node at i = ⌈(m+1)/2⌉
o Entries up to i are kept in one node
o The rest of the entries are moved to a new node
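
The leaf-split rule can be sketched in Python. This is only an illustration: it
assumes that m is the number of entries in the overflowing leaf (the notes do not
define m), and the function name is hypothetical.

```python
def split_leaf(entries):
    """Split an overflowing leaf's sorted entries as described in the notes.

    Assumption: m is the number of entries in the overflowing node.
    """
    m = len(entries)
    i = (m + 1) // 2             # i = floor((m + 1) / 2)
    left = entries[:i]           # the first i entries stay in the old node
    right = entries[i:]          # entries i+1 onwards move to a new node
    copied_up = left[-1]         # the i-th key is duplicated in the parent
    return left, right, copied_up
```

With the entries 5, 10, 15, 20, 25 the split point is i = 3, so 5, 10, 15 stay in
the old leaf, 20 and 25 move to the new leaf, and 15 is copied to the parent. Note
that many textbooks copy the first key of the new node to the parent instead; the
sketch follows the rule as stated here.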

B+ tree deletion
□ B+ tree entries are deleted at the leaf nodes.
□ The target entry is searched for and deleted.
o If it is in an internal node, delete it and replace it with the entry from the
left position.
□ After deletion, underflow is tested.
o If underflow occurs
  □ Distribute entries from the node to its left.
o If distribution from the left is not possible
  □ Distribute entries from the node to its right.
o If distribution from the left and the right is not possible
  □ Merge the node with the nodes to its left and right.
DBMS HASHING
For a huge database, it is sometimes not feasible to search the index through all
its levels and then reach the destination data block to retrieve the desired data.
Hashing is an effective technique for calculating the direct location of a data
record on the disk without using an index structure.
Hash Organization
□ Bucket: A hash file stores data in buckets. A bucket is considered a unit of
storage. A bucket typically stores one complete disk block, which in turn can store
one or more records.

□ Hash Function: A hash function h is a mapping function that maps the set of all
search keys K to the addresses where the actual records are placed. It is a function
from search keys to bucket addresses.
Static Hashing
In static hashing, when a search-key value is provided, the hash function always
computes the same address.

Operation:
□ Insertion: When a record is to be inserted using static hashing, the hash
function h computes the bucket address for search key K, where the record will be
stored.

Bucket address = h(K)


□ Search: When a record needs to be retrieved the same hash function can be used
to retrieve the address of bucket where the data is stored.
□ Delete: This is simply search followed by deletion operation.
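
The three operations can be sketched in Python. The sketch assumes integer search
keys, a hypothetical modulo hash function, and buckets of unlimited capacity, so
bucket overflow is not handled here.

```python
NUM_BUCKETS = 4   # fixed for the lifetime of the file (static hashing)

def h(key):
    # Hypothetical hash function: the same key always maps to the same bucket.
    return key % NUM_BUCKETS

buckets = [[] for _ in range(NUM_BUCKETS)]

def insert(key, record):
    # Bucket address = h(K)
    buckets[h(key)].append((key, record))

def search(key):
    # The same hash function gives the bucket that has to be scanned.
    for k, record in buckets[h(key)]:
        if k == key:
            return record
    return None

def delete(key):
    bucket = buckets[h(key)]
    for pos, (k, _) in enumerate(bucket):
        if k == key:
            del bucket[pos]     # search followed by deletion
            return True
    return False
```

Keys 7 and 11 both hash to bucket 3 under this function, so a search for either key
scans only that one bucket.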

Bucket Overflow:
The condition of bucket overflow is known as a collision. This is a fatal state for
any static hash function. In this case, overflow chaining can be used.
□ Overflow Chaining: When a bucket is full, a new bucket is allocated for the same
hash result and is linked after the previous one. This mechanism is called Closed
Hashing.

□ Linear Probing: When the hash function generates an address at which data is
already stored, the next free bucket is allocated to it. This mechanism is called
Open Hashing.
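
Both ways of handling overflow can be compared in a short Python sketch. The bucket
capacity, the number of buckets, the modulo hash function, and the function names
are all assumptions made for illustration; "next free bucket" is interpreted here
as the next bucket that still has room.

```python
CAPACITY = 2      # records per bucket (hypothetical)
NUM_BUCKETS = 4

def h(key):
    return key % NUM_BUCKETS

# Overflow chaining: each bucket address owns a chain of fixed-size buckets.
chains = [[[]] for _ in range(NUM_BUCKETS)]

def insert_chained(key):
    chain = chains[h(key)]
    if len(chain[-1]) == CAPACITY:
        chain.append([])        # allocate and link a new overflow bucket
    chain[-1].append(key)

# Linear probing: on overflow, use the next bucket that still has room.
table = [[] for _ in range(NUM_BUCKETS)]

def insert_probing(key):
    start = h(key)
    for step in range(NUM_BUCKETS):
        addr = (start + step) % NUM_BUCKETS
        if len(table[addr]) < CAPACITY:
            table[addr].append(key)
            return addr
    raise RuntimeError("all buckets are full")

for k in (1, 5, 9):             # all three keys hash to bucket 1
    insert_chained(k)
probe_addrs = [insert_probing(k) for k in (1, 5, 9)]
```

With chaining, key 9 ends up in an overflow bucket linked to bucket 1. With linear
probing, it is placed in bucket 2, so a later search for 9 has to continue past
bucket 1.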

For a hash function to work efficiently and effectively, the following must hold:
□ The distribution of records across buckets should be uniform
□ The distribution should be random rather than following any ordering

Dynamic Hashing
The problem with static hashing is that it does not expand or shrink dynamically as
the size of the database grows or shrinks. Dynamic hashing provides a mechanism in
which data buckets are added and removed dynamically and on demand. Dynamic hashing
is also known as extended (extendible) hashing.

Operation
□ Querying: Look at the depth value of the hash index and use that many bits of the
hash value to compute the bucket address.
□ Update: Perform a query as above and update the data.
□ Deletion: Perform a query to locate the desired data and delete it.
□ Insertion: Compute the address of the bucket.
o If the bucket is already full
  □ Add more buckets
  □ Add an additional bit to the hash value
  □ Re-compute the hash function
o Else
  □ Add the data to the bucket
o If all buckets are full, perform the remedies of static hashing.
Hashing is not favorable when the data is organized in some ordering and queries
require a range of data. When data is discrete and random, hashing performs best.
Hashing algorithms and their implementations are more complex than indexing. Hash
operations take constant time on average.
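
The dynamic hashing operations listed above can be sketched as a small extendible
hash table in Python. This is a minimal illustration under several assumptions:
the class and method names are hypothetical, the low-order bits of Python's
built-in `hash` are used as the bucket address, keys are distinct, and the
overflow-chaining fallback mentioned above is left out.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth      # local depth: hash bits this bucket uses
        self.keys = []

class ExtendibleHash:
    def __init__(self, capacity=2):
        self.capacity = capacity
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]

    def _index(self, key):
        # Querying: use the low-order `global_depth` bits of the hash value.
        return hash(key) & ((1 << self.global_depth) - 1)

    def contains(self, key):
        return key in self.directory[self._index(key)].keys

    def delete(self, key):
        bucket = self.directory[self._index(key)]
        if key in bucket.keys:
            bucket.keys.remove(key)
            return True
        return False

    def insert(self, key):
        # Assumes distinct keys; identical hash values would never separate.
        while True:
            bucket = self.directory[self._index(key)]
            if len(bucket.keys) < self.capacity:
                bucket.keys.append(key)
                return
            self._split(bucket)

    def _split(self, bucket):
        if bucket.depth == self.global_depth:
            # Add one more bit to the hash value: double the directory.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.depth += 1
        new_bucket = Bucket(bucket.depth)
        high_bit = 1 << (bucket.depth - 1)
        # Directory slots whose new bit is 1 now point to the new bucket.
        for i, b in enumerate(self.directory):
            if b is bucket and i & high_bit:
                self.directory[i] = new_bucket
        # Re-compute the bucket address of every key already stored.
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            self.directory[self._index(k)].keys.append(k)
```

Inserting the keys 0 to 7 with a capacity of two keys per bucket grows the
directory from two slots to four; each full bucket is split only when an insertion
actually reaches it.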
QUERY OPTIMIZATION
Query Optimization works in a similar way:

There can be many different ways to get an answer to a given query. The
result is the same in all cases.
A DBMS strives to process the query in the most efficient way (in terms of time) to
produce the answer.
Cost = Time needed to get all answers
Query optimization is the process of selecting the most efficient query-evaluation
plan from among the many strategies usually possible for processing a given query,
especially if the query is complex.
□ One aspect of optimization occurs at the relational-algebra level, where the system
attempts to find an expression that is equivalent to the given expression but more
efficient to execute.
□ Another aspect is selecting a detailed strategy for processing the query, such as
choosing the algorithm to use for executing an operation, choosing the specific
indices to use, and so on.
Cost estimation should be accurate and easy to compute. It also needs to be logically
consistent, so that the plan with the least estimated cost is consistently the
cheapest one to execute.
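
The idea of equivalent expressions with different costs can be illustrated with a
toy example in Python. The relations, attribute names, and helper functions below
are hypothetical, and counting nested-loop comparisons is only a crude stand-in for
a real cost estimate. Both plans compute the same answer, but applying the
selection before the join leaves fewer tuples to join.

```python
# Hypothetical relations, represented as lists of dicts.
students = [
    {"sid": 1, "name": "Asha", "dept": "CS"},
    {"sid": 2, "name": "Ravi", "dept": "EE"},
    {"sid": 3, "name": "Meena", "dept": "CS"},
]
enrolled = [
    {"sid": 1, "course": "DBMS"},
    {"sid": 2, "course": "DBMS"},
    {"sid": 3, "course": "OS"},
    {"sid": 1, "course": "OS"},
]

def join(r, s, attr):
    # Nested-loop equi-join on a shared attribute.
    return [{**a, **b} for a in r for b in s if a[attr] == b[attr]]

def select(r, pred):
    return [t for t in r if pred(t)]

def is_cs(t):
    return t["dept"] == "CS"

# Plan A: join everything, then filter.
joined = join(students, enrolled, "sid")
plan_a = select(joined, is_cs)

# Plan B: filter first, then join the smaller input.
cs_students = select(students, is_cs)
plan_b = join(cs_students, enrolled, "sid")

# Tuple comparisons made by the nested-loop join in each plan.
comparisons_a = len(students) * len(enrolled)
comparisons_b = len(cs_students) * len(enrolled)
```

On this data both plans return the same three rows, while plan B makes 8 join
comparisons instead of 12.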
Steps in cost-based query optimization
1. Parsing
2. Transformation
3. Implementation
4. Plan selection based on cost estimates

QUERY FLOW

□ Query Parser – Verifies the validity of the SQL statement and translates the query
into an internal structure using relational calculus.
□ Query Optimizer – Finds the best expression among the various equivalent algebraic
expressions. The criterion used is ‘cheapness’ (lowest estimated cost).
□ Code Generator/Interpreter – Makes calls to the Query Processor as a result of the
work done by the optimizer.
□ Query Processor – Executes the calls obtained from the code generator.

MEASURES OF QUERY COST
There are multiple possible evaluation plans for a query, and it is important to be
able to compare the alternatives in terms of their (estimated) cost, and choose the
best plan. To do so, we must estimate the cost of individual operations, and
combine them to get the cost of a query evaluation plan. Thus, as we study
evaluation algorithms for each operation later in this chapter, we also outline how
to estimate the cost of the operation.

We use the number of block transfers from disk and the number of disk seeks to
estimate the cost of a query-evaluation plan. If the disk subsystem takes an
average of t_T seconds to transfer a block of data, and has an average block-access
time (disk seek time plus rotational latency) of t_S seconds, then an
operation that transfers b blocks and performs S seeks would take
b × t_T + S × t_S seconds. The values of t_T and t_S must be calibrated for the disk
system used, but typical values for high-end disks today would be t_S = 4
milliseconds and t_T = 0.1 milliseconds, assuming a 4-kilobyte block size and a
transfer rate of 40 megabytes per second.

t_T – time to transfer one block
t_S – time for one seek
Cost for b block transfers plus S seeks: b × t_T + S × t_S
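
The cost formula can be evaluated directly. The sketch below uses the typical values
quoted above as defaults; the function name and the two sample access patterns
(1000 blocks read after a single seek, and 1000 blocks each needing its own seek)
are illustrative assumptions.

```python
def plan_cost(b, S, t_T=0.0001, t_S=0.004):
    """Estimated time in seconds for b block transfers and S seeks.

    Defaults are the typical values quoted in the text:
    t_T = 0.1 ms per block transfer, t_S = 4 ms per seek.
    """
    return b * t_T + S * t_S

sequential_scan = plan_cost(b=1000, S=1)      # one seek, then 1000 transfers
random_access = plan_cost(b=1000, S=1000)     # a seek before every block
```

Under these values the sequential scan costs about 0.104 seconds and the
random-access pattern about 4.1 seconds, which shows why the number of seeks often
dominates the estimate.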
