Dbms Unit-1 and 2, 3,4,5 Notes
DBMS stands for Database Management System. We can break the term down as DBMS =
Database + Management System. A database is a collection of data, and a management
system is a set of programs to store and retrieve that data. Based on this, we can define
a DBMS as a collection of inter-related data and a set of programs to store and access
that data in an easy and effective manner.
Storage: According to the principles of database systems, data is stored in such a way
that it takes up much less space, because redundant (duplicate) data is removed before
storage. Let’s take a simple example to understand this:
In a banking system, suppose a customer has two accounts: a savings account and a salary
account. Say the bank stores savings account data in one place (these places are called
tables; we will learn about them later) and salary account data in another. If customer
information such as name and address is stored in both places, storage is wasted
(redundancy, or duplication of data). To organize the data better, the information should
be stored in one place and both accounts should be linked to it. This is what we achieve
in a DBMS.
Fast Retrieval of data: Along with storing the data in an optimized and systematic
manner, it is also important that we retrieve the data quickly when needed. Database
systems ensure that the data is retrieved as quickly as possible.
The main purpose of a database system is to manage data. Consider a university that
keeps data about students, teachers, courses, books, etc. To manage this data we need to
store it somewhere we can add new data, delete unused data, update outdated data and
retrieve data. To perform these operations on data we need a database
DATABASE MANAGEMENT SYSTEM Page 1
management system that allows us to store the data in such a way so that all these
operations can be performed on the data efficiently.
For example: Let’s say Steve transfers $100 to Negan’s account. This transaction
consists of multiple operations, such as debiting $100 from Steve’s account and
crediting $100 to Negan’s account. Like any other device, a computer system can fail.
If it fails after the first operation, Steve’s account will have been debited by $100
but the amount will not have been credited to Negan’s account. In such a case the
operation should be rolled back to maintain the atomicity of the transaction. It is
difficult to achieve atomicity in file processing systems.
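The transfer scenario above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the account table, the starting balances and the fail_midway flag are hypothetical and are not part of these notes. The sketch simulates a failure after the debit and shows that the rollback leaves both balances unchanged.

```python
import sqlite3

def transfer(conn, sender, receiver, amount, fail_midway=False):
    """Move `amount` between two accounts as one atomic transaction."""
    try:
        # `with conn` commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
            if fail_midway:
                # Hypothetical failure between the debit and the credit.
                raise RuntimeError("simulated crash after the debit")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
        return True
    except RuntimeError:
        return False

def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM account"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
# Starting balances are made up for the example.
conn.executemany("INSERT INTO account VALUES (?, ?)", [("Steve", 500), ("Negan", 200)])
conn.commit()

failed = transfer(conn, "Steve", "Negan", 100, fail_midway=True)
after_failure = balances(conn)   # debit was rolled back
succeeded = transfer(conn, "Steve", "Negan", 100)
after_success = balances(conn)   # both updates were committed together
```

Either both updates take effect or neither does, which is the atomicity property described above.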
Data Security: Data should be secured from unauthorised access. For example, a
student in a college should not be able to see the payroll details of the teachers. Such
security constraints are difficult to apply in file processing systems.
There are several advantages of a database management system over a file system. A few
of them are as follows:
No redundant data: Redundancy is removed by data normalization. Avoiding data
duplication saves storage and improves access time.
Data Consistency and Integrity: As discussed earlier, the root cause of data
inconsistency is data redundancy. Since data normalization takes care of data
redundancy, data inconsistency is also taken care of as part of it.
Data Security: It is easier to apply access constraints in database systems so that
only authorized users are able to access the data. Each user has a different set of
access rights, so data is protected from issues such as identity theft, data leaks and
misuse of data.
Privacy: Limited access means privacy of data.
Easy access to data: Database systems manage data in such a way that it is easily
accessible with fast response times.
Easy recovery: Since database systems keep backups of data, it is easier to do a
full recovery of data in case of a failure.
Flexible: Database systems are more flexible than file processing systems.
Disadvantages of DBMS:
DBMS Architecture
The architecture of a DBMS depends on the computer system on which it runs. For
example, in a client-server DBMS architecture, the database system on the server machine
can serve several requests made by client machines. We will understand this communication
with the help of diagrams.
o The DBMS design depends upon its architecture. The basic client/server
architecture is used to deal with a large number of PCs, web servers, database
servers and other components that are connected by networks.
Database architecture can be seen as single-tier or multi-tier. Logically, database
architecture is of two types: 2-tier architecture and 3-tier architecture.
1-Tier Architecture
o In this architecture, the database is directly available to the user. It means the user
works directly on the DBMS itself.
o Any changes made here are applied directly to the database itself. It doesn't
provide a handy tool for end users.
o The 1-Tier architecture is used for development of local applications, where
programmers can communicate directly with the database for a quick response.
2-Tier Architecture
o The 2-Tier architecture is the same as the basic client-server model. In the two-tier
architecture, applications on the client end can communicate directly with the database
on the server side. For this interaction, APIs such as ODBC and JDBC are used.
o The user interfaces and application programs run on the client side.
o The server side is responsible for providing functionality such as query processing
and transaction management.
3-Tier Architecture
o The 3-Tier architecture contains another layer between the client and the server. In
this architecture, the client can't communicate directly with the server.
o The application on the client end interacts with an application server, which in turn
communicates with the database system.
o The end user has no idea about the existence of the database beyond the application
server. The database also has no idea about any user beyond the application.
o The 3-Tier architecture is used for large web applications.
1. External level
It is also called the view level. This level is called “view” because several users
can view their desired data from this level, which is fetched internally from the database
with the help of conceptual and internal level mapping.
The user doesn’t need to know database schema details such as data structures, table
definitions, etc. The user is only concerned with the data that is returned to the view
level after it has been fetched from the database (present at the internal level).
External level is the “top level” of the Three Level DBMS Architecture.
2. Conceptual level
It is also called the logical level. The whole design of the database, such as the
relationships among data and the schema of the data, is described at this level.
Database constraints and security are also implemented at this level of the architecture.
This level is maintained by the DBA (database administrator).
3. Internal level
This level is also known as physical level. This level describes how the data is actually
stored in the storage devices. This level is also responsible for allocating space to the
data. This is the lowest level of the architecture.
Abstraction is one of the main features of database systems. Hiding irrelevant details
from users and providing an abstract view of data helps make user-database interaction
easy and efficient. In the three-level DBMS architecture discussed above, the top level
is the “view level”. The view level provides the “view of data” to the users and hides
irrelevant details such as data relationships, database schema, constraints, security,
etc. from the user.
To fully understand the view of data, you must have a basic knowledge of data
abstraction and of instance and schema.
Database systems are made up of complex data structures. To ease user interaction
with the database, developers hide irrelevant internal details from users. This process
of hiding irrelevant details from the user is called data abstraction.
Logical level: This is the middle level of 3-level data abstraction architecture. It
describes what data is stored in database.
View level: Highest level of data abstraction. This level describes the user interaction
with database system.
Example: Let’s say we are storing customer information in a customer table. At the
physical level these records can be described as blocks of storage (bytes, gigabytes,
terabytes, etc.) in memory. These details are often hidden from programmers.
At the logical level these records can be described as fields and attributes along with
their data types, and the relationships among them can be logically implemented.
Programmers generally work at this level because they are aware of such things about
database systems.
At the view level, users just interact with the system through a GUI and enter details
on the screen. They are not aware of how the data is stored or what data is stored; such
details are hidden from them.
DBMS Schema
For example: In the following diagram, we have a schema that shows the relationship
between three tables: Course, Student and Section. The diagram only shows the design of
the database, it doesn’t show the data present in those tables. Schema is only a structural
view(design) of a database as shown in the diagram below.
The design of a database at the physical level is called the physical schema; how the
data is stored in blocks of storage is described at this level.
The design of a database at the logical level is called the logical schema. Programmers
and database administrators work at this level. Here data can be described as certain
types of data records stored in data structures; however, internal details such as the
implementation of the data structures are hidden at this level (they are available at the
physical level).
The design of a database at the view level is called the view schema. This generally
describes end user interaction with database systems.
To learn more about these schemas, refer to the three-level data abstraction architecture.
DBMS Instance
For example, let’s say we have a single table student in the database. Today the table
has 100 records, so today the instance of the database has 100 records. Say we add
another 100 records to this table by tomorrow; the instance of the database tomorrow
will then have 200 records in the table. In short, the data stored in the database at a
particular moment is called the instance, and it changes over time as we add or delete
data.
DBMS languages
Database languages are used to read, update and store data in a database. There are
several such languages that can be used for this purpose; one of them is SQL (Structured
Query Language).
DDL is used for specifying the database schema. It is used for creating tables, schemas,
indexes, constraints, etc. in a database. Let’s see the operations that we can perform on
a database using DDL:
All of these commands either define or update the database schema, which is why they
come under Data Definition Language.
DML is used for accessing and manipulating data in a database. The following
operations on a database come under DML:
In practice, the data definition language, data manipulation language and data control
language are not separate languages; rather, they are parts of a single database
language such as SQL.
The changes that we make to the database using DML commands are either committed or
rolled back using TCL.
Object based logical Models – Describe data at the conceptual and view levels.
1. E-R Model
2. Object oriented Model
Record based logical Models – Like Object based model, they also describe data at the
conceptual and view levels. These models specify logical structure of database with
records, fields and attributes.
1. Relational Model
2. Hierarchical Model
3. Network Model – Network Model is same as hierarchical model except that it has
graph-like structure rather than a tree-based structure. Unlike hierarchical model,
this model allows each record to have more than one parent record.
Physical Data Models – These models describe data at the lowest level of abstraction.
An ER diagram shows the relationships among entity sets. An entity set is a group of
similar entities, and these entities can have attributes. In terms of a DBMS, an entity is
a table or an attribute of a table in the database, so by showing the relationships among
tables and their attributes, an ER diagram shows the complete logical structure of a
database. Let’s have a look at a simple ER diagram to understand this concept.
A simple ER Diagram:
In the following diagram we have two entities, Student and College, and their
relationship. The relationship between Student and College is many to one, as a college
can have many students but a student cannot study in multiple colleges at the same time.
The Student entity has attributes such as Stu_Id, Stu_Name & Stu_Addr, and the College
entity has attributes such as Col_ID & Col_Name.
Here are the geometric shapes and their meaning in an E-R Diagram. We will discuss
these terms in detail in the next section (Components of an ER Diagram) of this guide, so
don’t worry too much about these terms now; just go through them once.
Components of an ER Diagram
1. Entity
Weak Entity:
An entity that cannot be uniquely identified by its own attributes and relies on a
relationship with another entity is called a weak entity. A weak entity is represented by
a double rectangle. For example, a bank account cannot be uniquely identified without
knowing the bank to which the account belongs, so bank account is a weak entity.
2. Attribute
1. Key attribute
2. Composite attribute
3. Multivalued attribute
4. Derived attribute
1. Key attribute:
A key attribute can uniquely identify an entity within an entity set. For example, a
student roll number can uniquely identify a student from a set of students. A key
attribute is represented by an oval, the same as other attributes, but the text of a key
attribute is underlined.
3. Multivalued attribute:
4. Derived attribute:
A derived attribute is one whose value is dynamic and derived from another attribute. It
is represented by a dashed oval in an ER Diagram. For example, a person’s age is a
derived attribute, as it changes over time and can be derived from another attribute
(date of birth).
When a single instance of an entity is associated with a single instance of another
entity, it is called a one to one relationship. For example, a person has only one
passport and a passport is given to one person.
When a single instance of an entity is associated with more than one instance of another
entity, it is called a one to many relationship. For example, a customer can place
many orders but an order cannot be placed by many customers.
When more than one instance of an entity is associated with a single instance of another
entity, it is called a many to one relationship. For example, many students can study
in a single college but a student cannot study in many colleges at the same time.
When more than one instance of an entity is associated with more than one instance of
another entity, it is called a many to many relationship. For example, a student can be
assigned to many projects and a project can be assigned to many students.
Total participation of an entity set means that each entity in the entity set must have
at least one relationship in a relationship set. For example, in the diagram below each
college must have at least one associated student.
Generalization is a process in which the common attributes of two or more entities
form a new entity. This newly formed entity is called a generalized entity.
Generalization Example
Let’s say we have two entities, Student and Teacher.
Attributes of entity Student are: Name, Address & Grade
Attributes of entity Teacher are: Name, Address & Salary
These two entities have two common attributes, Name and Address, so we can make a
generalized entity with these common attributes. Let’s have a look at the ER model after
generalization.
DBMS Specialization
Specialization is a process in which an entity is divided into sub-entities. You can think
of it as the reverse of generalization: in generalization, two entities combine to form a
new higher-level entity, which makes it a bottom-up process, whereas specialization is a
top-down process.
The idea behind specialization is to find the subsets of entities that have a few
distinguishing attributes. For example, consider an entity Employee, which can be further
classified into the sub-entities Technician, Engineer & Accountant because these
sub-entities have some distinguishing attributes.
Specialization Example
DBMS Aggregation
Aggregation is a process in which a single entity alone does not make sense in a
relationship, so the relationship between two entities is treated as one entity. This may
sound confusing, but the example below should make it clear.
Aggregation Example
In the relational model, data and relationships are represented by a collection of
inter-related tables. Each table is a group of columns and rows, where columns represent
attributes of an entity and rows represent records.
Sample relational model: Student table with 3 columns and four records.
Table: Student
Stu_Id Stu_Name Stu_Age
123 Saurav 22
169 Lester 24
234 Lou 26
Table: Course
Here Stu_Id, Stu_Name & Stu_Age are attributes of table Student and Stu_Id, Course_Id
& Course_Name are attributes of table Course. The rows with values are the records
(commonly known as tuples).
In the hierarchical model, data is organized into a tree-like structure in which each
record has one parent record and can have many children. The main drawback of this model
is that it can represent only one to many relationships between nodes.
123 Steve 29
367 Chaitanya 27
234 Ajeet 28
Course Table:
Keys play an important role in a relational database; they are used for uniquely
identifying rows in a table. They also establish relationships among tables.
Attribute Stu_Name alone cannot be a primary key, as more than one student can have the
same name.
Attribute Stu_Age alone cannot be a primary key, as more than one student can have the
same age.
Attribute Stu_Id alone is a primary key, as each student has a unique id that can identify
the student record in the table.
Note: In some cases a single attribute cannot uniquely identify a record in a table. In
that case we try to find a set of attributes that can uniquely identify a row in the
table. We will see an example of this after the current one.
101 Steve 23
102 John 24
104 Steve 29
105 Carl 29
Consider the table ORDER, which keeps the daily record of purchases made by customers.
This table has three attributes: Customer_ID, Product_ID & Order_Quantity.
Customer_ID alone cannot be a primary key, as a single customer can place more than
one order, so more than one row can have the same Customer_ID value. As we see in the
following example, customer id 1011 has placed two orders, with product ids 9023 and
9111.
Product_ID alone cannot be a primary key, as more than one customer can place an order
for the same product, so more than one row can have the same product id. In the following
table, customer ids 1011 & 1122 placed an order for the same product (product id 9023).
Order_Quantity alone cannot be a primary key, as more than one customer can place an
order for the same quantity.
Since none of the attributes alone can be a primary key, let’s try to make a set of
attributes that plays that role.
Customer_ID Product_ID Order_Quantity
1011 9023 10
1122 9023 15
1099 9031 20
1177 9031 18
1011 9111 50
Note: While choosing a set of attributes for a primary key, we always choose the
minimal set, the one with the minimum number of attributes. For example, if there are two
sets that can identify a row in the table, the set with fewer attributes should be
chosen as the primary key.
Let’s say we want to create the table discussed above with the set of customer id and
product id working as the primary key. We can do that in SQL like this:
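As one possible illustration, the sketch below declares the composite primary key through Python's built-in sqlite3 module rather than a particular database server. The column types are assumptions, and the table name is quoted because ORDER is a reserved word in SQL. The rows are the ones listed in the ORDER table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE "ORDER" (
        Customer_ID    INTEGER NOT NULL,
        Product_ID     INTEGER NOT NULL,
        Order_Quantity INTEGER NOT NULL,
        -- The pair of columns together forms the primary key.
        PRIMARY KEY (Customer_ID, Product_ID)
    )
""")

rows = [
    (1011, 9023, 10),
    (1122, 9023, 15),
    (1099, 9031, 20),
    (1177, 9031, 18),
    (1011, 9111, 50),
]
conn.executemany('INSERT INTO "ORDER" VALUES (?, ?, ?)', rows)

# A second row for the same (customer, product) pair violates the key.
try:
    conn.execute('INSERT INTO "ORDER" VALUES (1011, 9023, 99)')
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute('SELECT COUNT(*) FROM "ORDER"').fetchone()[0]
```

Repeated customer ids and repeated product ids are accepted on their own; only a repeated pair is rejected.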
Definition of Super Key in DBMS: A super key is a set of one or more attributes
(columns) that can uniquely identify a row in a table.
The answer is simple: candidate keys are selected from the set of super keys. The only
thing we take care of while selecting a candidate key is that it should not have any
redundant attribute. That’s the reason they are also termed minimal super keys.
{Emp_SSN}
{Emp_Number}
{Emp_SSN, Emp_Number}
The following two super keys are chosen from the above sets, as they contain no
redundant attributes.
{Emp_SSN}
{Emp_Number}
Only these two sets are candidate keys, as all other sets have redundant attributes
that are not necessary for unique identification.
2. How do we choose candidate keys from the set of super keys? We look for those keys
from which we cannot remove any fields. In the above example, we have not chosen
{Emp_SSN, Emp_Name} as a candidate key because {Emp_SSN} alone can identify a
unique row in the table and Emp_Name is redundant.
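The selection rule above (keep only the super keys from which no attribute can be removed) can be sketched in Python. The sample rows are hypothetical and exist only to make the example runnable. Note that checking uniqueness on sample data is a simplification: in a real design, keys follow from the meaning of the attributes, not from whichever rows happen to be stored.

```python
from itertools import combinations

def super_keys(rows, attributes):
    """Return every attribute set whose values are unique across `rows`."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            projected = [tuple(row[a] for a in combo) for row in rows]
            if len(set(projected)) == len(rows):
                keys.append(frozenset(combo))
    return keys

def candidate_keys(keys):
    """Keep only the minimal super keys: those with no smaller super key inside."""
    return [k for k in keys if not any(other < k for other in keys)]

# Hypothetical sample data: two employees share the same name.
employees = [
    {"Emp_SSN": "111", "Emp_Number": 1, "Emp_Name": "Steve"},
    {"Emp_SSN": "222", "Emp_Number": 2, "Emp_Name": "John"},
    {"Emp_SSN": "333", "Emp_Number": 3, "Emp_Name": "Steve"},
]

all_super_keys = super_keys(employees, ["Emp_SSN", "Emp_Number", "Emp_Name"])
minimal_keys = candidate_keys(all_super_keys)
```

With this data, {Emp_SSN, Emp_Name} is a super key but not a candidate key, because {Emp_SSN} is already contained in it.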
Let’s take an example of a table “Employee”. This table has three attributes: Emp_Id,
Emp_Number & Emp_Name. Here Emp_Id & Emp_Number have unique values, and Emp_Name can
have duplicate values, as more than one employee can have the same name.
Let’s select the candidate keys from the above set of super keys.
Note: A primary key is selected from the set of candidate keys. That means we can have
either Emp_Id or Emp_Number as the primary key. The decision is made by the DBA
(database administrator).
Definition: Foreign keys are the columns of a table that point to the primary key of
another table. They act as a cross-reference between tables.
For example:
In the example below, the Stu_Id column in the Course_enrollment table is a foreign key,
as it points to the primary key of the Student table.
Course_enrollment table:
Course_Id Stu_Id
C01 101
C02 102
C05 102
C06 103
C07 102
Student table:
101 Chaitanya 22
102 Arya 26
103 Bran 25
104 Jon 21
Note: Practically, a foreign key has nothing to do with the primary key tag of another
table; if it points to a unique column (not necessarily a primary key) of another table,
it is still a foreign key. So a more accurate definition would be: foreign keys are the
columns of a table that point to a candidate key of another table.
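A minimal sketch of this cross-reference, using Python's built-in sqlite3 module and the rows from the two tables above. The Student column names (Stu_Name, Stu_Age), the column types and the choice of Course_Id as primary key are assumptions. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, so the sketch enables it first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite ignores foreign key constraints unless this is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE Student (Stu_Id INTEGER PRIMARY KEY, Stu_Name TEXT, Stu_Age INTEGER)"
)
conn.execute("""
    CREATE TABLE Course_enrollment (
        Course_Id TEXT PRIMARY KEY,
        -- Foreign key: every Stu_Id here must exist in Student.
        Stu_Id    INTEGER NOT NULL REFERENCES Student (Stu_Id)
    )
""")

conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [(101, "Chaitanya", 22), (102, "Arya", 26), (103, "Bran", 25), (104, "Jon", 21)],
)
conn.executemany(
    "INSERT INTO Course_enrollment VALUES (?, ?)",
    [("C01", 101), ("C02", 102), ("C05", 102), ("C06", 103), ("C07", 102)],
)

# Enrolling a student id that does not exist in Student is rejected.
try:
    conn.execute("INSERT INTO Course_enrollment VALUES ('C09', 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

enrollment_count = conn.execute("SELECT COUNT(*) FROM Course_enrollment").fetchone()[0]
```

The same Stu_Id (102) may appear several times in Course_enrollment; the constraint only requires that each value exists in Student.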
Definition of Composite key: A key that has more than one attribute is known as a
composite key. It is also known as a compound key.
Note: Any key, such as a super key, primary key or candidate key, can be called a
composite key if it has more than one attribute.
Table – Sales
Column cust_Id alone cannot be a key, as the same customer can place multiple
orders, so the same customer can have multiple entries.
Column order_Id alone cannot be a primary key, as the same order can contain multiple
products, so the same order_Id can be present multiple times.
Column product_code alone cannot be a primary key, as more than one customer can place
an order for the same product.
Column product_count alone cannot be a primary key, because two orders can be placed
for the same product count.
Based on this, it is safe to assume that the key should have more than one attribute:
Key in above table: {cust_id, product_code}
As we have seen in the candidate key guide, a table can have multiple candidate keys.
Among these candidate keys, only one is selected as the primary key; the remaining
keys are known as alternate or secondary keys.
Let’s take an example to understand the alternate key concept. Here we have a table
Employee with three attributes: Emp_Id, Emp_Number & Emp_Name.
Table: Employee
The DBA (database administrator) can choose any of the above keys as the primary key.
Let’s say Emp_Id is chosen as the primary key.
Since we have selected Emp_Id as the primary key, the remaining key Emp_Number would
be called the alternate or secondary key.
Constraints enforce limits on the data, or type of data, that can be inserted into,
updated in, or deleted from a table. The whole purpose of constraints is to maintain data
integrity during an update, delete or insert on a table. In this tutorial we will learn
several types of constraints that can be created in an RDBMS.
Types of constraints
NOT NULL
UNIQUE
DEFAULT
CHECK
Key Constraints – PRIMARY KEY, FOREIGN KEY
Domain constraints
Mapping constraints
NOT NULL:
The NOT NULL constraint makes sure that a column does not hold a NULL value. When we
don’t provide a value for a particular column while inserting a record into a table, it
takes the NULL value by default. By specifying the NOT NULL constraint, we can be sure
that a particular column cannot have NULL values.
Example:
Here I am creating a table “STUDENTS”. I have specified the NOT NULL constraint for
columns ROLL_NO, STU_NAME and STU_AGE, which means you must provide a value for these
three fields while inserting or updating records in this table. It forces these
columns not to accept null values.
In the above section we learnt how to specify the NOT NULL constraint while creating a
table. However, we can also specify this constraint on an already existing table. For
this we need to use the ALTER TABLE statement.
Syntax:
Here we are setting up the UNIQUE constraint for two columns, STU_NAME &
STU_ADDRESS, which means these two columns cannot have duplicate values.
Note: The STU_NAME column has two constraints set up (both NOT NULL and UNIQUE).
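The NOT NULL and UNIQUE rules described above can be tried out with Python's built-in sqlite3 module. This is a sketch, not the original listing: the column types, the sample rows and the is_rejected helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE STUDENTS (
        ROLL_NO     INTEGER     NOT NULL,
        STU_NAME    VARCHAR(35) NOT NULL UNIQUE,  -- two constraints on one column
        STU_AGE     INTEGER     NOT NULL,
        STU_ADDRESS VARCHAR(35) UNIQUE
    )
""")
conn.execute("INSERT INTO STUDENTS VALUES (1, 'Steve', 23, 'Agra')")

def is_rejected(sql):
    """Run one statement and report whether a constraint blocked it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# STU_AGE is NOT NULL, so a missing age is refused.
null_age_rejected = is_rejected(
    "INSERT INTO STUDENTS VALUES (2, 'John', NULL, 'Delhi')"
)
# STU_NAME is UNIQUE, so a second 'Steve' is refused.
duplicate_name_rejected = is_rejected(
    "INSERT INTO STUDENTS VALUES (3, 'Steve', 24, 'Noida')"
)
stored_rows = conn.execute("SELECT COUNT(*) FROM STUDENTS").fetchone()[0]
```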
Syntax:
Syntax:
IN MySQL:
syntax:
Syntax:
The DEFAULT constraint provides a default value for a column when no value is
provided while inserting a record into a table. Let’s see how to specify this constraint
and how it works.
Here we are creating a table “STUDENTS”. We have a requirement to set the exam fee
to 10000 if the fee is not specified while inserting a record (row) into the STUDENTS
table. We can do so by using the DEFAULT constraint. As you can see, we have set the
default value of the EXAM_FEE column to 10000 using the DEFAULT constraint.
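A small sketch of the EXAM_FEE default, again using Python's built-in sqlite3 module. Apart from EXAM_FEE and the value 10000, the column types and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE STUDENTS (
        ROLL_NO  INTEGER     NOT NULL,
        STU_NAME VARCHAR(35) NOT NULL,
        EXAM_FEE INTEGER     DEFAULT 10000
    )
""")

# No fee given: the column falls back to its default.
conn.execute("INSERT INTO STUDENTS (ROLL_NO, STU_NAME) VALUES (1, 'Steve')")
# Fee given explicitly: the default is not used.
conn.execute("INSERT INTO STUDENTS (ROLL_NO, STU_NAME, EXAM_FEE) VALUES (2, 'John', 7500)")

fees = dict(conn.execute("SELECT STU_NAME, EXAM_FEE FROM STUDENTS"))
```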
What if we want to set this constraint on an already existing table? For this we can use
the ALTER TABLE statement like this:
Syntax:
In the above sections, we have learnt the ways to set Constraint. Here we will see how to
drop (delete) a Constraint:
Syntax:
The CHECK constraint is used for specifying a range of values for a particular column of
a table. When this constraint is set on a column, it ensures that the column’s value
falls within the specified range.
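In SQL, a range restriction of this kind is written as a CHECK constraint. The sketch below uses Python's built-in sqlite3 module; the age range of 17 to 60 and the sample rows are hypothetical and were chosen only to show one accepted and one rejected value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE STUDENTS (
        ROLL_NO  INTEGER     NOT NULL,
        STU_NAME VARCHAR(35) NOT NULL,
        -- Hypothetical range: ages outside 17..60 are refused.
        STU_AGE  INTEGER     NOT NULL CHECK (STU_AGE BETWEEN 17 AND 60)
    )
""")

conn.execute("INSERT INTO STUDENTS VALUES (1, 'Steve', 23)")  # inside the range

try:
    conn.execute("INSERT INTO STUDENTS VALUES (2, 'John', 12)")  # outside the range
    out_of_range_rejected = False
except sqlite3.IntegrityError:
    out_of_range_rejected = True

ages = [row[0] for row in conn.execute("SELECT STU_AGE FROM STUDENTS")]
```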
Key constraints:
PRIMARY KEY:
A primary key uniquely identifies each record in a table. It must have unique values and
cannot contain nulls. In the example below the ROLL_NO field is marked as the primary
key, which means the ROLL_NO field cannot have duplicate or null values.
Foreign keys are the columns of a table that point to the primary key of another table.
They act as a cross-reference between tables.
Domain constraints:
A table in a DBMS is a set of rows and columns that contain data. Columns in a table
have unique names and are often referred to as attributes in DBMS. A domain is the set of
values permitted for an attribute in a table. For example, a domain of month-of-year can
accept January, February, …, December as possible values, and a domain of integers can
accept whole numbers that are negative, positive or zero.
Definition: Domain constraints are user-defined data types, and we can define them like
this:
Example:
For example, to create a table “student_info” with a “stu_id” field having a value
greater than 100, I can create a domain and table like this:
Mapping Cardinality:
One to One: An entity of entity-set A can be associated with at most one entity of entity-
set B and an entity in entity-set B can be associated with at most one entity of entity-set
A.
One to Many: An entity of entity-set A can be associated with any number of entities of
entity-set B and an entity in entity-set B can be associated with at most one entity of
entity-set A.
Many to One: An entity of entity-set A can be associated with at most one entity of
entity-set B and an entity in entity-set B can be associated with any number of entities of
entity-set A.
Many to Many: An entity of entity-set A can be associated with any number of entities
of entity-set B and an entity in entity-set B can be associated with any number of entities
of entity-set A.
Example:
1. Domain constraints
o Domain constraints can be defined as the definition of a valid set of values for an
attribute.
o The data type of domain includes string, character, integer, time, date, currency,
etc. The value of the attribute must be available in the corresponding domain.
Example:
Example:
Example:
Example:
Relational algebra is a procedural query language that works on the relational model. The
purpose of a query language is to retrieve data from a database or perform operations
such as insert, update and delete on the data. Saying that relational algebra is a
procedural query language means that it specifies both what data is to be retrieved and
how it is to be retrieved.
Relational calculus, on the other hand, is a non-procedural query language, which means
it specifies what data is to be retrieved but not how to retrieve it. We will discuss
relational calculus later.
Basic/Fundamental Operations:
1. Select (σ)
2. Project (∏)
3. Union (∪)
4. Set Difference (-)
5. Cartesian product (X)
6. Rename (ρ)
Derived Operations:
Let’s discuss these operations one by one with the help of examples.
If you know a little SQL, you can think of the Select operator (σ) as the WHERE clause
in SQL, which is used for the same purpose.
Table: CUSTOMER
---------------
σ Customer_City="Agra" (CUSTOMER)
Output:
Project operator is denoted by ∏ symbol and it is used to select desired columns (or
attributes) from a table (or relation).
In this example, we have a table CUSTOMER with three columns, we want to fetch only
two columns of the table, which we can do with the help of Project Operator ∏.
Table: CUSTOMER
Customer_Name Customer_City
------------- -------------
Steve Agra
Raghu Agra
Chaitanya Noida
Ajeet Delhi
Carl Delhi
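The Select (σ) and Project (∏) operators can be sketched in plain Python by treating a relation as a list of dictionaries. The select and project helper names and the Customer_Id values are assumptions; the names and cities are the CUSTOMER rows shown above.

```python
# CUSTOMER relation; the Customer_Id values are made up for the example.
customer = [
    {"Customer_Id": "C1", "Customer_Name": "Steve", "Customer_City": "Agra"},
    {"Customer_Id": "C2", "Customer_Name": "Raghu", "Customer_City": "Agra"},
    {"Customer_Id": "C3", "Customer_Name": "Chaitanya", "Customer_City": "Noida"},
    {"Customer_Id": "C4", "Customer_Name": "Ajeet", "Customer_City": "Delhi"},
    {"Customer_Id": "C5", "Customer_Name": "Carl", "Customer_City": "Delhi"},
]

def select(relation, predicate):
    """Sigma: keep the rows (tuples) that satisfy the condition."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """Pi: keep only the named columns and drop duplicate rows."""
    seen, result = set(), []
    for row in relation:
        values = tuple(row[a] for a in attributes)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attributes, values)))
    return result

# sigma Customer_City="Agra" (CUSTOMER)
agra_customers = select(customer, lambda row: row["Customer_City"] == "Agra")
# pi Customer_Name, Customer_City (CUSTOMER)
names_and_cities = project(customer, ["Customer_Name", "Customer_City"])
# Projecting a single column shows the duplicate removal.
cities = project(customer, ["Customer_City"])
```

Select filters rows and keeps every column, while Project keeps every row's chosen columns and removes duplicates, since a relation is a set of tuples.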
Union Operator (∪)
The union operator is denoted by the ∪ symbol and is used to select all the rows (tuples)
from two tables (relations).
Let’s discuss the union operator a bit more. Say we have two relations R1 and R2 that
have the same columns, and we want to select all the tuples (rows) from these relations;
we can then apply the union operator to them.
Note: Rows (tuples) that are present in both tables appear only once in the union. In
short, there are no duplicates after the union operation.
table_name1 ∪ table_name2
Union Operator (∪) Example
Table 1: COURSE
Student_Name
------------
Aditya
Carl
Paul
Lucy
Rick
Steve
Note: As you can see, there are no duplicate names in the output even though we had a
few common names in both tables, and the COURSE table itself contained a duplicate
name.
Let’s say we have two relations R1 and R2 that have the same columns, and we want to
select all those tuples (rows) that are present in both relations. In that case we can
apply the intersection operation to these two relations: R1 ∩ R2.
Note: Only those rows that are present in both tables appear in the result set.
table_name1 ∩ table_name2
Intersection Operator (∩) Example
Student_Name
------------
Aditya
Steve
Paul
Lucy
Set Difference (-)
Set difference is denoted by the – symbol. Say we have two relations R1 and R2 and we
want to select all those tuples (rows) that are present in relation R1 but not in
relation R2; this can be done using the set difference R1 – R2.
table_name1 - table_name2
Set Difference (-) Example
Let’s take the same tables COURSE and STUDENT that we have seen above.
Query:
Let’s write a query to select those student names that are present in the STUDENT table
but not in the COURSE table.
Student_Name
------------
Carl
Rick
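Union, intersection and set difference map directly onto Python's set operators. The full COURSE and STUDENT tables are not reproduced in these notes, so the name lists below are an assumption: they were chosen to be consistent with the three outputs shown above, including one duplicate name in COURSE.

```python
# Assumed Student_Name columns, inferred from the outputs shown in the text.
course = ["Aditya", "Steve", "Paul", "Lucy", "Steve"]   # contains a duplicate
student = ["Aditya", "Carl", "Paul", "Lucy", "Rick", "Steve"]

# Converting to sets removes duplicates, as relations hold no duplicate tuples.
course_names = set(course)
student_names = set(student)

union = course_names | student_names          # COURSE ∪ STUDENT
intersection = course_names & student_names   # COURSE ∩ STUDENT
difference = student_names - course_names     # STUDENT - COURSE
```

Set difference is not symmetric: with these assumed lists, COURSE - STUDENT would be empty, while STUDENT - COURSE contains two names.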
Cartesian product (X)
The Cartesian product is denoted by the X symbol. Say we have two relations R1 and R2;
the Cartesian product of these two relations (R1 X R2) combines each tuple of the first
relation R1 with each tuple of the second relation R2. This may sound confusing, but the
example below should make it clear.
Syntax of Cartesian product (X)
R1 X R2
Cartesian product (X) Example
Table 1: R
Col_A Col_B
----- ------
AA 100
BB 200
CC 300
Table 2: S
Col_X Col_Y
----- -----
XX 99
YY 11
ZZ 101
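Query:
Let's find the Cartesian product of tables R and S.
R X S
Output:
Col_A Col_B Col_X Col_Y
----- ------ ----- -----
AA 100 XX 99
AA 100 YY 11
AA 100 ZZ 101
BB 200 XX 99
BB 200 YY 11
BB 200 ZZ 101
CC 300 XX 99
CC 300 YY 11
CC 300 ZZ 101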
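As a rough sketch, the product of R and S can be computed with a CROSS JOIN, which is the SQL form of the Cartesian product. The example uses Python's sqlite3 module purely as a convenient engine; with 3 rows in each table the result has 3 × 3 = 9 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE R (Col_A TEXT, Col_B INTEGER);
CREATE TABLE S (Col_X TEXT, Col_Y INTEGER);
INSERT INTO R VALUES ('AA', 100), ('BB', 200), ('CC', 300);
INSERT INTO S VALUES ('XX', 99), ('YY', 11), ('ZZ', 101);
""")

# Every row of R is paired with every row of S; no join condition is involved.
product_rows = conn.execute(
    "SELECT Col_A, Col_B, Col_X, Col_Y FROM R CROSS JOIN S ORDER BY Col_A, Col_X"
).fetchall()

for row in product_rows:
    print(row)
```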
Rename (ρ)
Let's say we have a table CUSTOMER; we fetch the customer names and rename the
resulting relation to CUST_NAMES.
Table: CUSTOMER
ρ(CUST_NAMES, ∏(Customer_Name)(CUSTOMER))
Output:
CUST_NAMES
----------
Steve
Raghu
Chaitanya
Ajeet
Carl
Relational calculus is a non-procedural query language: it tells the system what data to
retrieve but not how to retrieve it.
Tuple relational calculus is used for selecting those tuples that satisfy the given
condition.
Table: Student
Query to display the last name of those students where age is greater than 30
Last_Name
---------
Singh
Query to display all the details of students where Last name is ‘Singh’
In domain relational calculus the records are filtered based on the domains.
Again we take the same table to understand how DRC works.
Table: Student
Output:
First_Name Age
---------- ----
Ajeet 30
Chaitanya 31
Carl 28
Join Operations:
A Join operation combines related tuples from different relations, if and only if a given
join condition is satisfied. It is denoted by ⋈.
EMPLOYEE
EMP_CODE EMP_NAME
101 Stephan
102 Jack
103 Harry
SALARY
EMP_CODE SALARY
101 50000
102 30000
103 25000
Result:
Example: Let's use the above EMPLOYEE table and SALARY table:
Input:
Output:
EMP_NAME SALARY
Stephan 50000
Jack 30000
Harry 25000
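The natural join above can be tried out with a short sketch using Python's sqlite3 module. NATURAL JOIN matches the two tables on their common column (EMP_CODE), and the SELECT list plays the role of the projection onto EMP_NAME and SALARY. The alias s is an illustrative choice, added only because the SALARY table and its SALARY column share a name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (EMP_CODE INTEGER, EMP_NAME TEXT);
CREATE TABLE SALARY   (EMP_CODE INTEGER, SALARY INTEGER);
INSERT INTO EMPLOYEE VALUES (101, 'Stephan'), (102, 'Jack'), (103, 'Harry');
INSERT INTO SALARY   VALUES (101, 50000), (102, 30000), (103, 25000);
""")

# Rows are combined wherever EMPLOYEE.EMP_CODE equals SALARY.EMP_CODE.
joined = conn.execute("""
    SELECT EMP_NAME, s.SALARY
    FROM EMPLOYEE NATURAL JOIN SALARY AS s
    ORDER BY EMPLOYEE.EMP_CODE
""").fetchall()

print(joined)
```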
The outer join operation is an extension of the join operation. It is used to deal with
missing information.
Example:
EMPLOYEE
FACT_WORKERS
Input:
(EMPLOYEE ⋈ FACT_WORKERS)
Output:
Input:
EMPLOYEE ⟕ FACT_WORKERS
EMP_NAME STREET CITY BRANCH SALARY
Input:
EMPLOYEE ⟖ FACT_WORKERS
Output:
Input:
EMPLOYEE ⟗ FACT_WORKERS
Output:
3. Equi join:
An equi join is the most common form of inner join. It matches rows from the two relations
using an equality condition, that is, the comparison operator (=).
Example:
CUSTOMER RELATION
CLASS_ID NAME
1 John
2 Harry
3 Jackson
PRODUCT
PRODUCT_ID CITY
1 Delhi
2 Mumbai
3 Noida
Input:
CUSTOMER ⋈ PRODUCT
Output:
CLASS_ID NAME PRODUCT_ID CITY
1 John 1 Delhi
2 Harry 2 Mumbai
3 Jackson 3 Noida
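A minimal sketch of this equi join, using Python's sqlite3 module. The join condition CLASS_ID = PRODUCT_ID is an assumption: the notes do not state it explicitly, but it is the equality that pairs the rows as in the output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMER (CLASS_ID INTEGER, NAME TEXT);
CREATE TABLE PRODUCT  (PRODUCT_ID INTEGER, CITY TEXT);
INSERT INTO CUSTOMER VALUES (1, 'John'), (2, 'Harry'), (3, 'Jackson');
INSERT INTO PRODUCT  VALUES (1, 'Delhi'), (2, 'Mumbai'), (3, 'Noida');
""")

# Equi join: rows are paired only where the two columns are equal.
equi_rows = conn.execute("""
    SELECT CLASS_ID, NAME, PRODUCT_ID, CITY
    FROM CUSTOMER JOIN PRODUCT ON CUSTOMER.CLASS_ID = PRODUCT.PRODUCT_ID
    ORDER BY CLASS_ID
""").fetchall()

print(equi_rows)
```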
Unit -3 -SQL
SQL (Structured Query Language) is used to perform operations on the records stored in
the database such as updating records, deleting records, creating and modifying tables,
views, etc.
SQL is just a query language; it is not a database. To run SQL queries, you need to
install a relational database system, for example Oracle, MySQL, PostgreSQL, SQL
Server or DB2.
What is SQL
All relational DBMSs such as MySQL, Oracle, MS Access, Sybase, Informix, PostgreSQL, and SQL
Server use SQL as their standard database language.
o With SQL, we can query our database in several ways, using English-like
statements.
o With SQL, a user can access data from a relational database management system.
o It allows the user to describe the data.
o It allows the user to define the data in the database and manipulate it when needed.
o It allows the user to create and drop databases and tables.
o It allows the user to create views, stored procedures and functions in a database.
o It allows the user to set permissions on tables, procedures, and views.
SQL Syntax
SQL follows a set of rules and guidelines called syntax. Here, we list the basic
SQL syntax rules.
o SQL keywords are not case sensitive. Generally SQL keywords are written in uppercase.
o SQL statements are independent of text lines. We can place a single SQL statement
on one or multiple text lines.
o You can perform most actions in a database with SQL statements.
o SQL is based on relational algebra and tuple relational calculus.
SQL statement
SQL statements start with one of the SQL commands/keywords such as SELECT,
INSERT, UPDATE, DELETE, ALTER, DROP etc., and each statement ends with a
semicolon (;).
In this tutorial, we will use semicolon at the end of each SQL statement.
SQL Commands
Data types are used to represent the nature of the data that can be stored in the database
table. For example, in a particular column of a table, if we want to store a string type of
data then we will have to declare a string data type of this column.
Data types are mainly classified into three categories for every database: string data
types, numeric data types, and date and time data types.
A list of data types used in MySQL database. This is based on MySQL 8.0.
BINARY(Size)                It is equal to CHAR() but stores binary byte strings. Its size
                            parameter specifies the column length in bytes. Default is 1.
ENUM(val1, val2, val3,...)  It is used for a string object that can have only one value, chosen
                            from a list of possible values. An ENUM list can contain up to
                            65535 values. If you insert a value that is not in the list, a
                            blank value will be inserted.
INT(size)                   It is used for integer values. Its signed range is from
                            -2147483648 to 2147483647 and its unsigned range is from 0 to
                            4294967295. The size parameter specifies the maximum display
                            width, which is 255.
char(n)        It is a fixed width character string data type. Its size can be up to
               8000 characters.
varchar(n)     It is a variable width character string data type. Its size can be up to
               8000 characters.
varchar(max)   It is a variable width character string data type. Its size can be up
               to 1,073,741,824 characters.
text           It is a variable width character string data type. Its size can be up to
               2GB of text data.
nchar          It is a fixed width Unicode string data type. Its size can be up to
               4000 characters.
nvarchar       It is a variable width Unicode string data type. Its size can be up to
               4000 characters.
binary(n)      It is a fixed width Binary string data type. Its size can be up to 8000
               bytes.
varbinary      It is a variable width Binary string data type. Its size can be up to
               8000 bytes.
image          It is also a variable width Binary string data type. Its size can be up
               to 2GB.
datetime       It is used to specify date and time combination. It supports range from
               January 1, 1753 to December 31, 9999 with an accuracy of 3.33
               milliseconds.
datetime2      It is used to specify date and time combination. It supports range from
               January 1, 0001 to December 31, 9999 with an accuracy of 100
               nanoseconds.
date           It is used to store date only. It supports range from January 1, 0001 to
               December 31, 9999.
timestamp      It stores a unique number when a new row gets created or modified.
               The timestamp value is based upon an internal clock and does not
               correspond to real time. Each table may contain only one timestamp
               column.
Sql_variant    It is used for various data types except for text, timestamp, and
               ntext. It stores up to 8000 bytes of data.
DATE           It is used to store a valid date-time format with a fixed length. Its
               range varies from January 1, 4712 BC to December 31, 9999 AD.
BFILE          It is used to store binary data in an external file. Its range goes up to
               2^32-1 bytes or 4 GB.
CLOB           It is used for single-byte character data. Its range goes up to 2^32-1
               bytes or 4 GB.
RAW(size)      It is used to specify variable length raw binary data. Its range is up to
               2000 bytes per row. Its maximum size must be specified.
LONG RAW       It is used to specify variable length raw binary data. Its range is up to
               2^31-1 bytes or 2 GB, per row.
Arithmetic operators
Comparison operators
Logical operators
Operators used to negate conditions
Assume 'variable a' holds 10 and 'variable b' holds 20, then −
>    Checks if the value of left operand is greater than the value of right operand;
     if yes, then the condition becomes true. (a > b) is not true.
<    Checks if the value of left operand is less than the value of right operand;
     if yes, then the condition becomes true. (a < b) is true.
<=   Checks if the value of left operand is less than or equal to the value of right
     operand; if yes, then the condition becomes true. (a <= b) is true.
!<   Checks if the value of left operand is not less than the value of right operand;
     if yes, then the condition becomes true. (a !< b) is false.
!>   Checks if the value of left operand is not greater than the value of right
     operand; if yes, then the condition becomes true. (a !> b) is true.
Sr.No. Operator & Description
1 ALL
The ALL operator is used to compare a value to all values in another value set.
2 AND
The AND operator allows the existence of multiple conditions in an SQL
statement's WHERE clause.
3 ANY
The ANY operator is used to compare a value to any applicable value in the
list as per the condition.
4 BETWEEN
The BETWEEN operator is used to search for values that are within a set of
values, given the minimum value and the maximum value.
5 EXISTS
The EXISTS operator is used to search for the presence of a row in a
specified table that meets a certain criterion.
6 IN
The IN operator is used to compare a value to a list of literal values that
have been specified.
7 LIKE
The LIKE operator is used to compare a value to similar values using
wildcard operators.
8 NOT
The NOT operator reverses the meaning of the logical operator with which
it is used. Eg: NOT EXISTS, NOT BETWEEN, NOT IN, etc. This is a
negate operator.
9 OR
The OR operator is used to combine multiple conditions in an SQL
statement's WHERE clause.
10 IS NULL
The IS NULL operator is used to compare a value with a NULL value.
11 UNIQUE
The UNIQUE operator searches every row of a specified table for
uniqueness (no duplicates).
What is RDBMS?
What is a table?
The data in an RDBMS is stored in database objects called tables. A table is
basically a collection of related data entries and it consists of numerous columns
and rows.
Remember, a table is the most common and simplest form of data storage in a relational
database. The following is an example of a CUSTOMERS table −
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
What is a field?
Every table is broken up into smaller entities called fields. The fields in the
CUSTOMERS table consist of ID, NAME, AGE, ADDRESS and SALARY.
A field is a column in a table that is designed to maintain specific information about
every record in the table.
A record, also called a row of data, is each individual entry that exists in a table. For
example, there are 7 records in the above CUSTOMERS table. Following is a single
row of data or record in the CUSTOMERS table −
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
+----+----------+-----+-----------+----------+
A record is a horizontal entity in a table.
What is a column?
A column is a vertical entity in a table that contains all information associated with a
specific field in a table.
For example, a column in the CUSTOMERS table is ADDRESS, which represents
location description and would be as shown below −
+-----------+
| ADDRESS |
+-----------+
| Ahmedabad |
| Delhi |
| Kota |
| Mumbai |
| Bhopal |
| MP |
| Indore |
+-----------+
A NULL value in a table is a value in a field that appears to be blank, which means a
field with a NULL value is a field with no value.
It is very important to understand that a NULL value is different from a zero value or a
field that contains spaces. A field with a NULL value is one that has been left blank
during record creation.
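The difference between NULL, zero and spaces can be checked with a small sketch using Python's sqlite3 module. The DEMO table and its rows are hypothetical, made up only for this illustration. Note that a comparison with = NULL never matches; IS NULL has to be used instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DEMO (ID INTEGER, SALARY REAL, ADDRESS TEXT)")
conn.executemany("INSERT INTO DEMO VALUES (?, ?, ?)", [
    (1, None, None),       # both fields left blank: stored as NULL
    (2, 0.0, "   "),       # zero and spaces are real values, not NULL
    (3, 2000.0, "Kota"),
])

def ids(sql):
    """Return the IDs selected by the given query, in ID order."""
    return [row[0] for row in conn.execute(sql + " ORDER BY ID")]

null_salary = ids("SELECT ID FROM DEMO WHERE SALARY IS NULL")
zero_salary = ids("SELECT ID FROM DEMO WHERE SALARY = 0")
equals_null = ids("SELECT ID FROM DEMO WHERE SALARY = NULL")   # never true
has_address = ids("SELECT ID FROM DEMO WHERE ADDRESS IS NOT NULL")

print(null_salary, zero_salary, equals_null, has_address)
```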
SQL Constraints
Constraints are the rules enforced on data columns on a table. These are used to limit the
type of data that can go into a table. This ensures the accuracy and reliability of the data
in the database.
Constraints can either be column level or table level. Column level constraints are
applied only to one column whereas, table level constraints are applied to the entire
table.
Following are some of the most commonly used constraints available in SQL −
NOT NULL Constraint − Ensures that a column cannot have a NULL value.
DEFAULT Constraint − Provides a default value for a column when none is
specified.
UNIQUE Constraint − Ensures that all the values in a column are different.
PRIMARY Key − Uniquely identifies each row/record in a database table.
FOREIGN Key − Uniquely identifies a row/record in another database table.
CHECK Constraint − The CHECK constraint ensures that all values in a column
satisfy certain conditions.
INDEX − Used to create and retrieve data from the database very quickly.
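To see these constraints reject bad data, here is a minimal sketch using Python's sqlite3 module. The EMAIL column, the age limit of 18 and the default salary are hypothetical choices made for the illustration, and SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, which other database systems do not need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite-specific switch
conn.executescript("""
CREATE TABLE CUSTOMERS (
    ID     INTEGER PRIMARY KEY,
    NAME   TEXT NOT NULL,
    EMAIL  TEXT UNIQUE,                    -- hypothetical column
    AGE    INTEGER CHECK (AGE >= 18),      -- hypothetical rule
    SALARY REAL DEFAULT 5000.00            -- hypothetical default
);
CREATE TABLE ORDERS (
    OID         INTEGER PRIMARY KEY,
    CUSTOMER_ID INTEGER REFERENCES CUSTOMERS(ID)
);
INSERT INTO CUSTOMERS (ID, NAME, EMAIL, AGE) VALUES (1, 'Ramesh', 'r@example.com', 32);
""")

def violates(sql):
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

rejected = {
    "NOT NULL":    violates("INSERT INTO CUSTOMERS (ID, NAME) VALUES (2, NULL)"),
    "UNIQUE":      violates("INSERT INTO CUSTOMERS (ID, NAME, EMAIL) "
                            "VALUES (3, 'Khilan', 'r@example.com')"),
    "CHECK":       violates("INSERT INTO CUSTOMERS (ID, NAME, AGE) VALUES (4, 'Komal', 12)"),
    "PRIMARY KEY": violates("INSERT INTO CUSTOMERS (ID, NAME) VALUES (1, 'Muffy')"),
    "FOREIGN KEY": violates("INSERT INTO ORDERS VALUES (100, 99)"),
}
# SALARY was omitted in the first insert, so the DEFAULT value was stored.
default_salary = conn.execute("SELECT SALARY FROM CUSTOMERS WHERE ID = 1").fetchone()[0]

print(rejected, default_salary)
```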
Data Integrity
Database Normalization
Syntax
Example
If you want to create a new database <testDB>, then the CREATE DATABASE
statement would be as shown below −
SQL> CREATE DATABASE testDB;
Make sure you have the admin privilege before creating any database. Once a database
is created, you can check it in the list of databases as follows −
SQL> SHOW DATABASES;
+--------------------+
| Database |
+--------------------+
| information_schema |
| AMROOD |
| TUTORIALSPOINT |
| mysql |
| orig |
| test |
| testDB |
+--------------------+
7 rows in set (0.00 sec)
The SQL DROP DATABASE statement is used to drop an existing database in SQL
schema.
Syntax
Example
If you want to delete an existing database <testDB>, then the DROP DATABASE
statement would be as shown below −
SQL> DROP DATABASE testDB;
NOTE − Be careful before using this operation, because deleting an existing database
results in the loss of all the information stored in that database.
Make sure you have the admin privilege before dropping any database. Once a database
is dropped, you can check it in the list of the databases as shown below −
SQL> SHOW DATABASES;
+--------------------+
| Database |
+--------------------+
| information_schema |
| AMROOD |
| TUTORIALSPOINT |
| mysql |
| orig |
| test |
+--------------------+
6 rows in set (0.00 sec)
When you have multiple databases in your SQL Schema, then before starting your
operation, you would need to select a database where all the operations would be
performed.
The SQL USE statement is used to select any existing database in the SQL schema.
Syntax
Example
Creating a basic table involves naming the table and defining its columns and each
column's data type.
The SQL CREATE TABLE statement is used to create a new table.
Syntax
Example
The following code block is an example that creates a CUSTOMERS table with ID as the
primary key. The NOT NULL constraints show that these fields cannot be NULL when
records are created in this table −
SQL> CREATE TABLE CUSTOMERS(
ID INT NOT NULL,
NAME VARCHAR (20) NOT NULL,
AGE INT NOT NULL,
ADDRESS CHAR (25) ,
SALARY DECIMAL (18, 2),
PRIMARY KEY (ID)
);
You can verify if your table has been created successfully by looking at the message
displayed by the SQL server, otherwise you can use the DESC command as follows −
SQL> DESC CUSTOMERS;
+---------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+---------------+------+-----+---------+-------+
| ID | int(11) | NO | PRI | | |
| NAME | varchar(20) | NO | | | |
| AGE | int(11) | NO | | | |
| ADDRESS | char(25) | YES | | NULL | |
| SALARY | decimal(18,2) | YES | | NULL | |
+---------+---------------+------+-----+---------+-------+
5 rows in set (0.00 sec)
Now, you have CUSTOMERS table available in your database which you can use to
store the required information related to customers.
The SQL DROP TABLE statement is used to remove a table definition and all the data,
indexes, triggers, constraints and permission specifications for that table.
NOTE − You should be very careful while using this command because once a table is
deleted then all the information available in that table will also be lost forever.
Syntax
Example
You can populate the data into a table through the select statement over another table;
provided the other table has a set of fields, which are required to populate the first table.
Here is the syntax −
INSERT INTO first_table_name [(column1, column2, ... columnN)]
SELECT column1, column2, ...columnN
FROM second_table_name
[WHERE condition];
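As an illustration of the syntax above, the sketch below copies selected rows from CUSTOMERS into a second table using Python's sqlite3 module. The table name CUSTOMERS_BKP and the SALARY > 5000 filter are hypothetical choices made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT, "
             "AGE INTEGER, ADDRESS TEXT, SALARY REAL)")
conn.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)", [
    (1, "Ramesh", 32, "Ahmedabad", 2000.00),
    (2, "Khilan", 25, "Delhi", 1500.00),
    (3, "kaushik", 23, "Kota", 2000.00),
    (4, "Chaitali", 25, "Mumbai", 6500.00),
    (5, "Hardik", 27, "Bhopal", 8500.00),
    (6, "Komal", 22, "MP", 4500.00),
    (7, "Muffy", 24, "Indore", 10000.00),
])

# The target table only needs the columns that the SELECT supplies.
conn.execute("CREATE TABLE CUSTOMERS_BKP (ID INTEGER, NAME TEXT, SALARY REAL)")
conn.execute("""
    INSERT INTO CUSTOMERS_BKP (ID, NAME, SALARY)
    SELECT ID, NAME, SALARY
    FROM CUSTOMERS
    WHERE SALARY > 5000
""")

copied = conn.execute("SELECT ID, NAME, SALARY FROM CUSTOMERS_BKP ORDER BY ID").fetchall()
print(copied)
```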
The SQL SELECT statement is used to fetch the data from a database table which
returns this data in the form of a result table. These result tables are called result-sets.
Syntax
The basic syntax of the SELECT statement is as follows −
SELECT column1, column2, columnN FROM table_name;
Here, column1, column2... are the fields of a table whose values you want to fetch. If
you want to fetch all the fields available in the table, then you can use the following
syntax.
SELECT * FROM table_name;
Example
The SQL WHERE clause is used to specify a condition while fetching the data from a
single table or by joining with multiple tables. A row is returned only if it
satisfies the given condition. You should use the WHERE clause to
filter the records and fetch only the necessary ones.
The WHERE clause is not only used in the SELECT statement, but it is also used in the
UPDATE, DELETE statement, etc., which we would examine in the subsequent
chapters.
Syntax
The basic syntax of the SELECT statement with the WHERE clause is as shown below.
SELECT column1, column2, columnN
FROM table_name
WHERE [condition]
You can specify a condition using the comparison or logical operators like >, <,
=, LIKE, NOT, etc. The following examples would make this concept clear.
Example
The AND operator allows the existence of multiple conditions in an SQL statement's
WHERE clause.
Syntax
The basic syntax of the AND operator with a WHERE clause is as follows −
SELECT column1, column2, columnN
FROM table_name
WHERE [condition1] AND [condition2]...AND [conditionN];
You can combine N number of conditions using the AND operator. For an action to be
taken by the SQL statement, whether it be a transaction or a query, all conditions
separated by the AND must be TRUE.
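The following sketch, using Python's sqlite3 module and the CUSTOMERS rows from these notes, shows a plain WHERE condition and then the same condition combined with AND and with OR. The particular conditions (SALARY > 2000, AGE < 25) are illustrative choices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT, "
             "AGE INTEGER, ADDRESS TEXT, SALARY REAL)")
conn.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)", [
    (1, "Ramesh", 32, "Ahmedabad", 2000.00),
    (2, "Khilan", 25, "Delhi", 1500.00),
    (3, "kaushik", 23, "Kota", 2000.00),
    (4, "Chaitali", 25, "Mumbai", 6500.00),
    (5, "Hardik", 27, "Bhopal", 8500.00),
    (6, "Komal", 22, "MP", 4500.00),
    (7, "Muffy", 24, "Indore", 10000.00),
])

def ids(where):
    """Return the IDs of the customers that satisfy the WHERE condition."""
    sql = "SELECT ID FROM CUSTOMERS WHERE " + where + " ORDER BY ID"
    return [row[0] for row in conn.execute(sql)]

high_salary = ids("SALARY > 2000")                  # single condition
both_true   = ids("SALARY > 2000 AND AGE < 25")     # every condition must hold
either_true = ids("SALARY > 2000 OR AGE < 25")      # one condition is enough

print(high_salary, both_true, either_true)
```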
The OR Operator
Syntax
The basic syntax of the UPDATE query with a WHERE clause is as follows −
UPDATE table_name
SET column1 = value1, column2 = value2...., columnN = valueN
WHERE [condition];
You can combine N number of conditions using the AND or the OR operators.
Example
The SQL DELETE Query is used to delete the existing records from a table.
Syntax
The basic syntax of the DELETE query with the WHERE clause is as follows −
DELETE FROM table_name
WHERE [condition];
You can combine N number of conditions using AND or OR operators.
Example
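A minimal sketch of UPDATE and DELETE with a WHERE clause, using Python's sqlite3 module on the CUSTOMERS rows from these notes. The new address 'Pune' and the chosen conditions are illustrative. Without the WHERE clause, both statements would affect every row in the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT, "
             "AGE INTEGER, ADDRESS TEXT, SALARY REAL)")
conn.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)", [
    (1, "Ramesh", 32, "Ahmedabad", 2000.00),
    (2, "Khilan", 25, "Delhi", 1500.00),
    (3, "kaushik", 23, "Kota", 2000.00),
    (4, "Chaitali", 25, "Mumbai", 6500.00),
    (5, "Hardik", 27, "Bhopal", 8500.00),
    (6, "Komal", 22, "MP", 4500.00),
    (7, "Muffy", 24, "Indore", 10000.00),
])

# UPDATE: change the address of customer 2 only.
updated = conn.execute(
    "UPDATE CUSTOMERS SET ADDRESS = ? WHERE ID = ?", ("Pune", 2)).rowcount
new_address = conn.execute(
    "SELECT ADDRESS FROM CUSTOMERS WHERE ID = 2").fetchone()[0]

# DELETE: remove the customers that match both conditions.
deleted = conn.execute(
    "DELETE FROM CUSTOMERS WHERE AGE < 25 AND SALARY < 5000").rowcount
remaining_ids = [row[0] for row in conn.execute("SELECT ID FROM CUSTOMERS ORDER BY ID")]

print(updated, new_address, deleted, remaining_ids)
```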
Syntax
Example
The following table has a few examples showing the WHERE part having different
LIKE clause with '%' and '_' operators −
Let us take a real example, consider the CUSTOMERS table having the records as
shown below.
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Following is an example, which would display all the records from the CUSTOMERS
table, where the SALARY starts with 200.
SQL> SELECT * FROM CUSTOMERS
WHERE SALARY LIKE '200%';
This would produce the following result −
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
+----+----------+-----+-----------+----------+
The SQL TOP clause is used to fetch the top N records, or the top X percent of records,
from a table.
Note − Not all databases support the TOP clause. For example, MySQL supports
the LIMIT clause to fetch a limited number of records, while Oracle uses
the ROWNUM pseudocolumn to fetch a limited number of records.
Syntax
The basic syntax of the TOP clause with a SELECT statement would be as follows.
SELECT TOP number|percent column_name(s)
FROM table_name
WHERE [condition]
Example
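The sketch below uses Python's sqlite3 module, and SQLite (like MySQL) uses LIMIT rather than TOP, so the queries here are the LIMIT form of the idea. An ORDER BY is added because, without one, which rows count as the "first" ones is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT, "
             "AGE INTEGER, ADDRESS TEXT, SALARY REAL)")
conn.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)", [
    (1, "Ramesh", 32, "Ahmedabad", 2000.00),
    (2, "Khilan", 25, "Delhi", 1500.00),
    (3, "kaushik", 23, "Kota", 2000.00),
    (4, "Chaitali", 25, "Mumbai", 6500.00),
    (5, "Hardik", 27, "Bhopal", 8500.00),
    (6, "Komal", 22, "MP", 4500.00),
    (7, "Muffy", 24, "Indore", 10000.00),
])

# First three customers by ID (LIMIT plays the role of TOP 3).
first_three = [row[0] for row in
               conn.execute("SELECT ID FROM CUSTOMERS ORDER BY ID LIMIT 3")]

# Three highest salaries: sort descending, then keep three rows.
top_paid = conn.execute(
    "SELECT NAME, SALARY FROM CUSTOMERS ORDER BY SALARY DESC LIMIT 3").fetchall()

print(first_three)
print(top_paid)
```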
The SQL ORDER BY clause is used to sort the data in ascending or descending order,
based on one or more columns. Some databases sort the query results in an ascending
order by default.
Syntax
Example
The SQL GROUP BY clause is used in collaboration with the SELECT statement to
arrange identical data into groups. This GROUP BY clause follows the WHERE clause
in a SELECT statement and precedes the ORDER BY clause.
Syntax
Example
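As a sketch of GROUP BY, the query below groups the CUSTOMERS rows from these notes by AGE and computes a count and a salary total per group, using Python's sqlite3 module. Grouping by AGE is an illustrative choice; any column with repeated values would do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT, "
             "AGE INTEGER, ADDRESS TEXT, SALARY REAL)")
conn.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)", [
    (1, "Ramesh", 32, "Ahmedabad", 2000.00),
    (2, "Khilan", 25, "Delhi", 1500.00),
    (3, "kaushik", 23, "Kota", 2000.00),
    (4, "Chaitali", 25, "Mumbai", 6500.00),
    (5, "Hardik", 27, "Bhopal", 8500.00),
    (6, "Komal", 22, "MP", 4500.00),
    (7, "Muffy", 24, "Indore", 10000.00),
])

# One output row per distinct AGE; the aggregates are computed within each group.
per_age = conn.execute("""
    SELECT AGE, COUNT(*), SUM(SALARY)
    FROM CUSTOMERS
    GROUP BY AGE
    ORDER BY AGE
""").fetchall()

for row in per_age:
    print(row)
```

Two customers share the age 25, so that group has a count of 2 and a total of 1500 + 6500 = 8000.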
The SQL DISTINCT keyword is used in conjunction with the SELECT statement to
eliminate all the duplicate records and fetch only unique records.
There may be a situation when you have multiple duplicate records in a table. While
fetching such records, it makes more sense to fetch only the unique records instead of
fetching duplicates.
Syntax
The basic syntax of the DISTINCT keyword to eliminate duplicate records is as follows −
SELECT DISTINCT column1, column2,.....columnN
FROM table_name
WHERE [condition]
Example
Dropping Constraints
Any constraint that you have defined can be dropped using the ALTER TABLE
command with the DROP CONSTRAINT option.
For example, to drop the primary key constraint in the EMPLOYEES table, you can use
the following command.
ALTER TABLE EMPLOYEES DROP CONSTRAINT EMPLOYEES_PK;
Some implementations may provide shortcuts for dropping certain constraints. For
example, to drop the primary key constraint for a table in Oracle, you can use the
following command.
ALTER TABLE EMPLOYEES DROP PRIMARY KEY;
Some implementations allow you to disable constraints. Instead of permanently
dropping a constraint from the database, you may want to temporarily disable the
constraint and then enable it later.
Integrity Constraints
Integrity constraints are used to ensure accuracy and consistency of the data in a
relational database. Data integrity is handled in a relational database through the concept
of referential integrity.
There are many types of integrity constraints that play a role in Referential Integrity
(RI). These constraints include Primary Key, Foreign Key, Unique Constraints and
other constraints which are mentioned above.
The SQL Joins clause is used to combine records from two or more tables in a database.
A JOIN is a means for combining fields from two tables by using values common to
each.
Consider the following two tables −
Table 1 − CUSTOMERS Table
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Table 2 − ORDERS Table
+-----+---------------------+-------------+--------+
|OID | DATE | CUSTOMER_ID | AMOUNT |
+-----+---------------------+-------------+--------+
| 102 | 2009-10-08 00:00:00 | 3 | 3000 |
| 100 | 2009-10-08 00:00:00 | 3 | 1500 |
| 101 | 2009-11-20 00:00:00 | 2 | 1560 |
| 103 | 2008-05-20 00:00:00 | 4 | 2060 |
+-----+---------------------+-------------+--------+
Now, let us join these two tables in our SELECT statement as shown below.
SQL> SELECT ID, NAME, AGE, AMOUNT
FROM CUSTOMERS, ORDERS
WHERE CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
This would produce the following result.
+----+----------+-----+--------+
| ID | NAME | AGE | AMOUNT |
+----+----------+-----+--------+
| 3 | kaushik | 23 | 3000 |
| 3 | kaushik | 23 | 1500 |
| 2 | Khilan | 25 | 1560 |
| 4 | Chaitali | 25 | 2060 |
+----+----------+-----+--------+
Here, it is noticeable that the join is performed in the WHERE clause. Several operators
can be used to join tables, such as =, <, >, <>, <=, >=, !=, BETWEEN, LIKE, and NOT.
However, the most common operator is the equal to symbol.
There are different types of joins available in SQL −
INNER JOIN − returns rows when there is a match in both tables.
LEFT JOIN − returns all rows from the left table, even if there are no matches in
the right table.
RIGHT JOIN − returns all rows from the right table, even if there are no matches
in the left table.
FULL JOIN − returns all rows from both tables, with NULLs where one side has no match.
SELF JOIN − is used to join a table to itself as if the table were two tables,
temporarily renaming at least one table in the SQL statement.
CARTESIAN JOIN − returns the Cartesian product of the sets of records from the
two or more joined tables.
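The difference between INNER JOIN and LEFT JOIN can be sketched on the CUSTOMERS and ORDERS tables above with Python's sqlite3 module. Only the columns needed for the join are created here, and RIGHT and FULL JOIN are left out because older SQLite versions do not support them; both simplifications are choices made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE ORDERS (OID INTEGER, CUSTOMER_ID INTEGER, AMOUNT INTEGER);
INSERT INTO CUSTOMERS VALUES (1, 'Ramesh'), (2, 'Khilan'), (3, 'kaushik'),
    (4, 'Chaitali'), (5, 'Hardik'), (6, 'Komal'), (7, 'Muffy');
INSERT INTO ORDERS VALUES (102, 3, 3000), (100, 3, 1500), (101, 2, 1560), (103, 4, 2060);
""")

# INNER JOIN: only customers that have at least one order.
inner_rows = conn.execute("""
    SELECT c.ID, c.NAME, o.AMOUNT
    FROM CUSTOMERS c INNER JOIN ORDERS o ON c.ID = o.CUSTOMER_ID
    ORDER BY c.ID, o.OID
""").fetchall()

# LEFT JOIN: every customer; AMOUNT is NULL (None) where no order matches.
left_rows = conn.execute("""
    SELECT c.ID, c.NAME, o.AMOUNT
    FROM CUSTOMERS c LEFT JOIN ORDERS o ON c.ID = o.CUSTOMER_ID
    ORDER BY c.ID, o.OID
""").fetchall()
without_orders = [row[0] for row in left_rows if row[2] is None]

print(inner_rows)
print(without_orders)
```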
The SQL UNION clause/operator is used to combine the results of two or more
SELECT statements without returning any duplicate rows.
To use the UNION clause, each SELECT statement must have the same number of columns,
with compatible data types, in the same order.
Syntax
UNION
The UNION ALL operator is used to combine the results of two SELECT statements
including duplicate rows.
The same rules that apply to the UNION clause will apply to the UNION ALL operator.
Syntax
The basic syntax of the UNION ALL is as follows.
SELECT column1 [, column2 ]
FROM table1 [, table2 ]
[WHERE condition]
UNION ALL
SELECT column1 [, column2 ]
FROM table1 [, table2 ]
[WHERE condition]
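To make the difference between UNION and UNION ALL concrete, here is a small sketch with Python's sqlite3 module. It combines the customer IDs from CUSTOMERS with the CUSTOMER_ID values from ORDERS; the tables are trimmed to the columns that matter, which is a simplification made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE ORDERS (OID INTEGER, CUSTOMER_ID INTEGER);
INSERT INTO CUSTOMERS VALUES (1, 'Ramesh'), (2, 'Khilan'), (3, 'kaushik'),
    (4, 'Chaitali'), (5, 'Hardik'), (6, 'Komal'), (7, 'Muffy');
INSERT INTO ORDERS VALUES (102, 3), (100, 3), (101, 2), (103, 4);
""")

# UNION removes duplicate rows from the combined result.
union_ids = [row[0] for row in conn.execute(
    "SELECT ID FROM CUSTOMERS UNION SELECT CUSTOMER_ID FROM ORDERS ORDER BY 1")]

# UNION ALL keeps every row from both SELECT statements.
union_all_ids = [row[0] for row in conn.execute(
    "SELECT ID FROM CUSTOMERS UNION ALL SELECT CUSTOMER_ID FROM ORDERS ORDER BY 1")]

print(union_ids)
print(union_all_ids)
```

UNION ALL returns 7 + 4 = 11 rows here, while UNION collapses them to the 7 distinct IDs.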
You can rename a table or a column temporarily by giving another name known
as Alias. The use of table aliases is to rename a table in a specific SQL statement. The
renaming is a temporary change and the actual table name does not change in the
database. The column aliases are used to rename a table's columns for the purpose of a
particular SQL query.
Syntax
Example
Indexes are special lookup tables that the database search engine can use to speed up
data retrieval. Simply put, an index is a pointer to data in a table. An index in a database
is very similar to an index in the back of a book.
For example, if you want to find all the pages in a book that discuss a certain topic,
you first refer to the index, which lists all the topics alphabetically, and you are then
referred to one or more specific page numbers.
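As a rough illustration, the sketch below creates an index on the NAME column with Python's sqlite3 module and asks SQLite how it would run a lookup by name. The index name idx_customers_name is a hypothetical choice, and EXPLAIN QUERY PLAN is SQLite-specific; other systems expose query plans through their own commands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT, "
             "AGE INTEGER, ADDRESS TEXT, SALARY REAL)")
conn.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)", [
    (1, "Ramesh", 32, "Ahmedabad", 2000.00),
    (2, "Khilan", 25, "Delhi", 1500.00),
    (3, "kaushik", 23, "Kota", 2000.00),
    (4, "Chaitali", 25, "Mumbai", 6500.00),
    (5, "Hardik", 27, "Bhopal", 8500.00),
    (6, "Komal", 22, "MP", 4500.00),
    (7, "Muffy", 24, "Indore", 10000.00),
])

# The index is a separate lookup structure keyed on NAME.
conn.execute("CREATE INDEX idx_customers_name ON CUSTOMERS (NAME)")

index_names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'CUSTOMERS'")]

# The plan text shows whether the lookup goes through the index.
plan_text = " ".join(str(row[-1]) for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM CUSTOMERS WHERE NAME = 'Komal'"))

# The query result is the same with or without the index; only the access path changes.
komal_id = conn.execute("SELECT ID FROM CUSTOMERS WHERE NAME = 'Komal'").fetchone()[0]

print(index_names)
print(plan_text)
```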
The SQL ALTER TABLE command is used to add, delete or modify columns in an
existing table. You can also use the ALTER TABLE command to add and drop
various constraints on an existing table.
Syntax
The basic syntax of an ALTER TABLE command to add a New Column in an existing
table is as follows.
ALTER TABLE table_name ADD column_name datatype;
The basic syntax of an ALTER TABLE command to DROP COLUMN in an existing
table is as follows.
ALTER TABLE table_name DROP COLUMN column_name;
The basic syntax of an ALTER TABLE command to change the DATA TYPE of a
column in a table is as follows.
ALTER TABLE table_name MODIFY COLUMN column_name datatype;
The basic syntax of an ALTER TABLE command to add a NOT NULL constraint to a
column in a table is as follows.
ALTER TABLE table_name MODIFY column_name datatype NOT NULL;
The basic syntax of ALTER TABLE to ADD UNIQUE CONSTRAINT to a table is as
follows.
ALTER TABLE table_name
ADD CONSTRAINT MyUniqueConstraint UNIQUE(column1, column2...);
The basic syntax of an ALTER TABLE command to ADD CHECK CONSTRAINT to
a table is as follows.
ALTER TABLE table_name
ADD CONSTRAINT MyUniqueConstraint CHECK (CONDITION);
The basic syntax of an ALTER TABLE command to ADD PRIMARY KEY constraint
to a table is as follows.
ALTER TABLE table_name
ADD CONSTRAINT MyPrimaryKey PRIMARY KEY (column1, column2...);
The basic syntax of an ALTER TABLE command to DROP CONSTRAINT from a
table is as follows.
ALTER TABLE table_name
DROP CONSTRAINT MyUniqueConstraint;
If you're using MySQL, the code is as follows −
ALTER TABLE table_name
DROP INDEX MyUniqueConstraint;
The basic syntax of an ALTER TABLE command to DROP PRIMARY
KEY constraint from a table is as follows.
ALTER TABLE table_name
DROP CONSTRAINT MyPrimaryKey;
If you're using MySQL, the code is as follows −
ALTER TABLE table_name
DROP PRIMARY KEY;
Example
The SQL TRUNCATE TABLE command is used to delete all the data from an
existing table.
You can also use the DROP TABLE command to delete a complete table, but it removes
the complete table structure from the database, and you would need to re-create the table
if you wish to store some data in it again.
Syntax
Example
A view is nothing more than a SQL statement that is stored in the database with an
associated name. A view is actually a composition of a table in the form of a predefined
SQL query.
A view can contain all rows of a table or select rows from a table. A view can be created
from one or many tables which depends on the written SQL query to create a view.
Views, which are a type of virtual table, allow users to do the following −
Structure data in a way that users or classes of users find natural or intuitive.
Restrict access to the data in such a way that a user can see and (sometimes)
modify exactly what they need and no more.
Summarize data from various tables which can be used to generate reports.
Creating Views
Database views are created using the CREATE VIEW statement. Views can be created
from a single table, multiple tables or another view.
To create a view, a user must have the appropriate system privilege according to the
specific implementation.
The basic CREATE VIEW syntax is as follows −
CREATE VIEW view_name AS
SELECT column1, column2.....
FROM table_name
WHERE [condition];
You can include multiple tables in your SELECT statement in a similar way as you use
them in a normal SQL SELECT query.
Example
Consider the CUSTOMERS table having the following records −
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Following is an example to create a view from the CUSTOMERS table. This view
would be used to have customer name and age from the CUSTOMERS table.
SQL > CREATE VIEW CUSTOMERS_VIEW AS
SELECT name, age
FROM CUSTOMERS;
Now, you can query CUSTOMERS_VIEW in a similar way as you query an actual
table. Following is an example for the same.
SQL > SELECT * FROM CUSTOMERS_VIEW;
This would produce the following result.
+----------+-----+
| name | age |
+----------+-----+
| Ramesh | 32 |
| Khilan | 25 |
| kaushik | 23 |
| Chaitali | 25 |
| Hardik | 27 |
| Komal | 22 |
| Muffy | 24 |
+----------+-----+
The WITH CHECK OPTION is a CREATE VIEW statement option. The purpose of the
WITH CHECK OPTION is to ensure that all UPDATEs and INSERTs satisfy the
condition(s) in the view definition.
If they do not satisfy the condition(s), the UPDATE or INSERT returns an error.
The following code block has an example of creating same view CUSTOMERS_VIEW
with the WITH CHECK OPTION.
CREATE VIEW CUSTOMERS_VIEW AS
SELECT name, age
FROM CUSTOMERS
WHERE age IS NOT NULL
WITH CHECK OPTION;
The HAVING Clause enables you to specify conditions that filter which group results
appear in the results.
The WHERE clause places conditions on the selected columns, whereas the HAVING
clause places conditions on groups created by the GROUP BY clause.
Syntax
The following code block shows the position of the HAVING Clause in a query.
SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY
The HAVING clause must follow the GROUP BY clause in a query and must also
precede the ORDER BY clause if used. The following code block has the syntax of the
SELECT statement including the HAVING clause −
SELECT column1, column2
FROM table1, table2
WHERE [ conditions ]
GROUP BY column1, column2
HAVING [ conditions ]
ORDER BY column1, column2
Example
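As a hedged illustration of the difference between WHERE and HAVING, the sketch below runs two grouped queries on the CUSTOMERS rows shown earlier in these notes. It uses Python's sqlite3 module with an in-memory database, which is an assumption for illustration; the two queries themselves are examples made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INTEGER, NAME TEXT, AGE INTEGER, "
    "ADDRESS TEXT, SALARY REAL)"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ramesh", 32, "Ahmedabad", 2000.00),
        (2, "Khilan", 25, "Delhi", 1500.00),
        (3, "kaushik", 23, "Kota", 2000.00),
        (4, "Chaitali", 25, "Mumbai", 6500.00),
        (5, "Hardik", 27, "Bhopal", 8500.00),
        (6, "Komal", 22, "MP", 4500.00),
        (7, "Muffy", 24, "Indore", 10000.00),
    ],
)

# HAVING filters the groups built by GROUP BY:
# keep only the ages shared by two or more customers.
shared_ages = conn.execute(
    "SELECT AGE, COUNT(*) FROM CUSTOMERS "
    "GROUP BY AGE HAVING COUNT(*) >= 2 ORDER BY AGE"
).fetchall()

# WHERE filters single rows before grouping. Khilan (salary 1500) is removed
# first, so the age-25 group has one member and fails the HAVING test.
shared_ages_high_salary = conn.execute(
    "SELECT AGE, COUNT(*) FROM CUSTOMERS WHERE SALARY >= 2000 "
    "GROUP BY AGE HAVING COUNT(*) >= 2 ORDER BY AGE"
).fetchall()
```

The first query returns the single group for age 25; the second returns no groups at all.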
We have already discussed the SQL LIKE operator, which is used to compare a
value to similar values using the wildcard operators.
SQL supports two wildcard operators in conjunction with the LIKE operator, which are
explained below.
The percent sign (%) represents zero, one or multiple characters. The underscore (_)
represents a single number or character. These symbols can be used in combination.
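Syntax
SELECT column1, column2 FROM table_name WHERE column LIKE 'XXXX%'
or
SELECT column1, column2 FROM table_name WHERE column LIKE '%XXXX%'
or
SELECT column1, column2 FROM table_name WHERE column LIKE 'XXXX_'
or
SELECT column1, column2 FROM table_name WHERE column LIKE '_XXXX'
or
SELECT column1, column2 FROM table_name WHERE column LIKE '_XXXX_'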
Example
The following table has a number of examples showing the WHERE part having
different LIKE clauses with '%' and '_' operators.
Let us take a real example, consider the CUSTOMERS table having the following
records.
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
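A few LIKE patterns can be run against this table with the sketch below. It assumes SQLite through Python's sqlite3 module; the three patterns are illustrative choices, not taken from the notes. Note that SQLite's LIKE ignores case for ASCII letters by default, which other database systems may not do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INTEGER, NAME TEXT, AGE INTEGER, "
    "ADDRESS TEXT, SALARY REAL)"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ramesh", 32, "Ahmedabad", 2000.00),
        (2, "Khilan", 25, "Delhi", 1500.00),
        (3, "kaushik", 23, "Kota", 2000.00),
        (4, "Chaitali", 25, "Mumbai", 6500.00),
        (5, "Hardik", 27, "Bhopal", 8500.00),
        (6, "Komal", 22, "MP", 4500.00),
        (7, "Muffy", 24, "Indore", 10000.00),
    ],
)


def names_where(column, pattern):
    """Return customer names whose given column matches the LIKE pattern."""
    # The column name is interpolated only because it comes from this script;
    # the pattern is passed as a bound parameter.
    query = f"SELECT NAME FROM CUSTOMERS WHERE {column} LIKE ? ORDER BY ID"
    return [row[0] for row in conn.execute(query, (pattern,))]


second_letter_h = names_where("NAME", "_h%")   # any one character, then 'h'
ends_with_k = names_where("NAME", "%k")        # anything, ending in 'k'
address_has_o = names_where("ADDRESS", "%o%")  # 'o' anywhere in the address
```

The underscore in the first pattern stands for exactly one character, while the percent sign in the other two absorbs any number of characters, including none.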
Some RDBMSs support temporary tables. Temporary tables let you store and process
intermediate results by using the same selection, update, and join capabilities that
you can use with ordinary tables.
Temporary tables can be very useful for keeping short-lived data. The most important
thing to know about temporary tables is that they are deleted when the current client
session terminates.
Temporary tables are available in MySQL version 3.23 onwards. If you use an older
version of MySQL than 3.23, you can't use temporary tables, but you can use heap
tables.
As stated earlier, temporary tables will only last as long as the session is alive. If you
run the code in a PHP script, the temporary table will be destroyed automatically when
the script finishes executing. If you are connected to the MySQL database server
through the MySQL client program, then the temporary table will exist until you close
the client or manually destroy the table.
Example
Here is an example showing you the usage of a temporary table.
mysql> CREATE TEMPORARY TABLE SALESSUMMARY (
-> product_name VARCHAR(50) NOT NULL
-> , total_sales DECIMAL(12,2) NOT NULL DEFAULT 0.00
-> , avg_unit_price DECIMAL(7,2) NOT NULL DEFAULT 0.00
-> , total_units_sold INT UNSIGNED NOT NULL DEFAULT 0
);
Query OK, 0 rows affected (0.00 sec)
By default, all the temporary tables are deleted by MySQL when your database
connection gets terminated. Still if you want to delete them in between, then you can do
so by issuing a DROP TABLE command.
Following is an example on dropping a temporary table.
mysql> DROP TABLE SALESSUMMARY;
Query OK, 0 rows affected (0.00 sec)
A Subquery or Inner query or a Nested query is a query within another SQL query and
embedded within the WHERE clause.
Subqueries are most frequently used with the SELECT statement. The basic syntax is as
follows −
SELECT column_name [, column_name ]
FROM table1 [, table2 ]
WHERE column_name OPERATOR
(SELECT column_name [, column_name ]
FROM table1 [, table2 ]
[WHERE])
Example
Consider the CUSTOMERS table having the following records −
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 35 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Now, let us check the following subquery with a SELECT statement.
SQL> SELECT *
FROM CUSTOMERS
WHERE ID IN (SELECT ID
FROM CUSTOMERS
WHERE SALARY > 4500) ;
This would produce the following result.
+----+----------+-----+---------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+---------+----------+
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+---------+----------+
Subqueries also can be used with INSERT statements. The INSERT statement uses the
data returned from the subquery to insert into another table. The selected data in the
subquery can be modified with any of the character, date or number functions.
The basic syntax is as follows.
INSERT INTO table_name [ (column1 [, column2 ]) ]
SELECT [ *|column1 [, column2 ] ]
FROM table1 [, table2 ]
[ WHERE VALUE OPERATOR ]
Example
Consider a table CUSTOMERS_BKP with similar structure as CUSTOMERS table.
Now to copy the complete CUSTOMERS table into the CUSTOMERS_BKP table, you
can use the following syntax.
SQL> INSERT INTO CUSTOMERS_BKP
SELECT * FROM CUSTOMERS
WHERE ID IN (SELECT ID
FROM CUSTOMERS) ;
The subquery can be used in conjunction with the UPDATE statement. Either single or
multiple columns in a table can be updated when using a subquery with the UPDATE
statement.
The basic syntax is as follows.
UPDATE table
SET column_name = new_value
[ WHERE OPERATOR [ VALUE ]
(SELECT COLUMN_NAME
FROM TABLE_NAME
[ WHERE condition ]) ]
Example
Assume we have a CUSTOMERS_BKP table available which is a backup of the
CUSTOMERS table. The following example multiplies SALARY by 0.25 in the
CUSTOMERS table for all the customers whose AGE is greater than or equal to 27.
SQL> UPDATE CUSTOMERS
SET SALARY = SALARY * 0.25
WHERE AGE IN (SELECT AGE FROM CUSTOMERS_BKP
WHERE AGE >= 27 );
This would impact two rows and finally CUSTOMERS table would have the following
records.
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 35 | Ahmedabad | 500.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 2125.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
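The UPDATE with a subquery can be checked with the following sketch, which assumes SQLite via Python's sqlite3 module and builds CUSTOMERS_BKP as a plain copy of CUSTOMERS. Multiplying by 0.25 turns Ramesh's 2000.00 into 500.00 and Hardik's 8500.00 into 2125.00; all other rows keep their salary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INTEGER, NAME TEXT, AGE INTEGER, "
    "ADDRESS TEXT, SALARY REAL)"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ramesh", 35, "Ahmedabad", 2000.00),
        (2, "Khilan", 25, "Delhi", 1500.00),
        (3, "kaushik", 23, "Kota", 2000.00),
        (4, "Chaitali", 25, "Mumbai", 6500.00),
        (5, "Hardik", 27, "Bhopal", 8500.00),
        (6, "Komal", 22, "MP", 4500.00),
        (7, "Muffy", 24, "Indore", 10000.00),
    ],
)

# The backup table has the same structure and rows as CUSTOMERS.
conn.execute("CREATE TABLE CUSTOMERS_BKP AS SELECT * FROM CUSTOMERS")

# The subquery returns the ages 35 and 27; the outer UPDATE touches
# every CUSTOMERS row whose AGE is in that list.
cursor = conn.execute(
    "UPDATE CUSTOMERS SET SALARY = SALARY * 0.25 "
    "WHERE AGE IN (SELECT AGE FROM CUSTOMERS_BKP WHERE AGE >= 27)"
)
rows_changed = cursor.rowcount
new_salaries = dict(conn.execute("SELECT NAME, SALARY FROM CUSTOMERS"))
```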
The subquery can be used in conjunction with the DELETE statement like with any
other statements mentioned above.
The basic syntax is as follows.
DELETE FROM TABLE_NAME
[ WHERE OPERATOR [ VALUE ]
(SELECT COLUMN_NAME
FROM TABLE_NAME
[ WHERE condition ]) ]
Example
Assuming, we have a CUSTOMERS_BKP table available which is a backup of the
CUSTOMERS table. The following example deletes the records from the
CUSTOMERS table for all the customers whose AGE is greater than or equal to 27.
SQL> DELETE FROM CUSTOMERS
WHERE AGE IN (SELECT AGE FROM CUSTOMERS_BKP
WHERE AGE >= 27 );
This would impact two rows and finally the CUSTOMERS table would have the
following records.
+----+----------+-----+---------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+---------+----------+
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+---------+----------+
There may be a situation when you have multiple duplicate records in a table. While
fetching such records, it makes more sense to fetch only unique records instead of
fetching duplicate records.
The SQL DISTINCT keyword, which we have already discussed, is used in conjunction
with the SELECT statement to eliminate all the duplicate records and fetch only
the unique records.
Syntax
Example
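A minimal sketch of DISTINCT, assuming SQLite through Python's sqlite3 module and the CUSTOMERS rows used elsewhere in these notes. Two customers share the salary 2000.00, so the plain query returns seven values and the DISTINCT query returns six.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (ID INTEGER, NAME TEXT, AGE INTEGER, "
    "ADDRESS TEXT, SALARY REAL)"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ramesh", 32, "Ahmedabad", 2000.00),
        (2, "Khilan", 25, "Delhi", 1500.00),
        (3, "kaushik", 23, "Kota", 2000.00),
        (4, "Chaitali", 25, "Mumbai", 6500.00),
        (5, "Hardik", 27, "Bhopal", 8500.00),
        (6, "Komal", 22, "MP", 4500.00),
        (7, "Muffy", 24, "Indore", 10000.00),
    ],
)

# Without DISTINCT every row contributes a value, duplicates included.
all_salaries = [
    row[0] for row in conn.execute("SELECT SALARY FROM CUSTOMERS ORDER BY SALARY")
]

# With DISTINCT each salary value appears once.
unique_salaries = [
    row[0]
    for row in conn.execute("SELECT DISTINCT SALARY FROM CUSTOMERS ORDER BY SALARY")
]
```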
The attributes of a table are said to be dependent on each other when an attribute of a
table uniquely identifies another attribute of the same table.
For example: Suppose we have a student table with attributes: Stu_Id, Stu_Name,
Stu_Age. Here Stu_Id attribute uniquely identifies the Stu_Name attribute of student
table because if we know the student id we can tell the student name associated with it.
This is known as functional dependency and can be written as Stu_Id->Stu_Name or in
words we can say Stu_Name is functionally dependent on Stu_Id.
Formally:
If column A of a table uniquely identifies column B of the same table, then it can be
represented as A->B (attribute B is functionally dependent on attribute A).
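The definition can be turned into a small check: A->B holds in a set of rows if no two rows agree on A but differ on B. The sketch below is a hedged illustration in Python; the function name fd_holds and the sample student rows are hypothetical. A check like this can only show that a dependency is violated by the data at hand; whether it holds for every possible table content is a design decision.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are lists of attribute names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for a key is remembered; a different value
        # for the same key later on violates the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample data for the student table.
students = [
    {"Stu_Id": 1, "Stu_Name": "Asha", "Stu_Age": 20},
    {"Stu_Id": 2, "Stu_Name": "Ravi", "Stu_Age": 20},
    {"Stu_Id": 3, "Stu_Name": "Asha", "Stu_Age": 21},
]

id_determines_name = fd_holds(students, ["Stu_Id"], ["Stu_Name"])
age_determines_name = fd_holds(students, ["Stu_Age"], ["Stu_Name"])
```

Here Stu_Id -> Stu_Name holds, while Stu_Age -> Stu_Name does not, because two students of age 20 have different names.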
For example: Consider a table with two columns Student_id and Student_Name.
Also, Student_Id -> Student_Id & Student_Name -> Student_Name are trivial
dependencies too.
If a functional dependency X->Y holds true where Y is not a subset of X then this
dependency is called non trivial Functional dependency.
For example:
An employee table with three attributes: emp_id, emp_name, emp_address.
The following functional dependencies are non-trivial:
emp_id -> emp_name (emp_name is not a subset of emp_id)
emp_id -> emp_address (emp_address is not a subset of emp_id)
Multivalued dependency occurs when there is more than one independent multivalued
attribute in a table.
For example: Consider a bike manufacturing company which produces two colors (black
and white) of each model every year.
Here columns manuf_year and color are independent of each other and dependent on
bike_model. In this case these two columns are said to be multivalued dependent on
bike_model. These dependencies can be represented like this:
bike_model → → manuf_year
bike_model → → color
Note: A transitive dependency can only occur in a relation of three or more attributes.
This dependency helps us normalize the database to 3NF (3rd Normal Form).
{Book} -> {Author} (if we know the book, we know the author name)
{Author} -> {Author_age} (if we know the author, we know the author's age)
Therefore, as per the rule of transitive dependency, {Book} -> {Author_age} should
hold. That makes sense because if we know the book name we can know the author’s age.
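Transitive dependencies can be derived mechanically by computing the closure of an attribute set, i.e. every attribute reachable by repeatedly applying the known dependencies. The sketch below is a minimal Python illustration; the function name closure is hypothetical, and the dependency {Author} -> {Author_age} is assumed from the context of the book example.

```python
def closure(attributes, dependencies):
    """Return every attribute determined by the given attribute set.

    dependencies is a list of (lhs, rhs) pairs of attribute sets.
    """
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in dependencies:
            # If we already know all of lhs, we also know all of rhs.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


book_fds = [
    ({"Book"}, {"Author"}),
    ({"Author"}, {"Author_age"}),  # assumed from the example's context
]

book_closure = closure({"Book"}, book_fds)
author_closure = closure({"Author"}, book_fds)
```

Author_age appears in the closure of {Book}, which is exactly the transitive dependency {Book} -> {Author_age}. The same routine also helps when finding candidate keys: a set of attributes is a super key when its closure contains every attribute of the table.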
Anomalies in DBMS
There are three types of anomalies that occur when the database is not normalized. These
are – Insertion, update and deletion anomaly. Let’s take an example to understand this.
The above table is not normalized. We will see the problems that we face when a table is
not normalized.
Update anomaly: In the above table we have two rows for employee Rick as he belongs
to two departments of the company. If we want to update the address of Rick then we
have to update the same in two rows or the data will become inconsistent. If somehow,
the correct address gets updated in one department but not in other then as per the
database, Rick would be having two different addresses, which is not correct and would
lead to inconsistent data.
Insert anomaly: Suppose a new employee joins the company, who is under training and
currently not assigned to any department then we would not be able to insert the data into
the table if emp_dept field doesn’t allow nulls.
To overcome these anomalies we need to normalize the data. In the next section we will
discuss normalization.
Normalization
Here are the most commonly used normal forms:
Example: Suppose a company wants to store the names and contact details of its
employees. It creates a table that looks like this:
102 Jon Kanpur 8812121212, 9900012222
Two employees (Jon & Lester) have two mobile numbers each, so the company stored
them in the same field, as you can see in the table above.
This table is not in 1NF as the rule says “each attribute of a table must have atomic
(single) values”; the emp_mobile values for employees Jon & Lester violate that rule.
To make the table comply with 1NF we should have the data like this:
An attribute that is not part of any candidate key is known as non-prime attribute.
Example: Suppose a school wants to store the data of teachers and the subjects they
teach. They create a table that looks like this. Since a teacher can teach more than one
subject, the table can have multiple rows for the same teacher.
teacher_id subject teacher_age
111 Maths 38
111 Physics 38
222 Biology 38
333 Physics 40
333 Chemistry 40
The table is in 1NF because each attribute has atomic values. However, it is not in 2NF
because the non-prime attribute teacher_age depends on teacher_id alone, which is a
proper subset of the candidate key {teacher_id, subject}. This violates the rule for 2NF,
which says “no non-prime attribute is dependent on a proper subset of any candidate key
of the table”.
To make the table comply with 2NF, we can break it into two tables like this:
teacher_details table:
teacher_id teacher_age
111 38
222 38
333 40
teacher_subject table:
teacher_id subject
111 Maths
111 Physics
222 Biology
333 Physics
333 Chemistry
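The effect of this decomposition can be verified with a short Python sketch. It projects the original teacher rows onto the two smaller tables and joins them back on teacher_id; the variable names are hypothetical. The age of each teacher is stored once after the split, and no information is lost.

```python
# The original (teacher_id, subject, teacher_age) rows from the example.
teacher = {
    (111, "Maths", 38),
    (111, "Physics", 38),
    (222, "Biology", 38),
    (333, "Physics", 40),
    (333, "Chemistry", 40),
}

# Projection onto (teacher_id, teacher_age): one row per teacher.
teacher_age = {(tid, age) for tid, _subject, age in teacher}

# Projection onto (teacher_id, subject): one row per teaching assignment.
teacher_subject = {(tid, subject) for tid, subject, _age in teacher}

# Natural join of the two projections on teacher_id.
rejoined = {
    (tid, subject, age)
    for tid, subject in teacher_subject
    for tid2, age in teacher_age
    if tid == tid2
}
```

Since teacher_id determines teacher_age, the join reproduces the original table exactly, and updating a teacher's age now touches a single row.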
An attribute that is not part of any candidate key is known as non-prime attribute.
In other words 3NF can be explained like this: A table is in 3NF if it is in 2NF and for
each functional dependency X-> Y at least one of the following conditions hold:
An attribute that is a part of one of the candidate keys is known as prime attribute.
Example: Suppose a company wants to store the complete address of each employee,
they create a table named employee_details that looks like this:
To make this table comply with 3NF we have to break the table into two tables to
remove the transitive dependency:
employee table:
employee_zip table:
It is an advanced version of 3NF, which is why it is also referred to as 3.5NF. BCNF is
stricter than 3NF. A table complies with BCNF if it is in 3NF and for every functional
dependency X->Y, X is a super key of the table.
Example: Suppose there is a company wherein employees work in more than one
department. They store the data like this:
The table is not in BCNF as neither emp_id nor emp_dept alone is a key.
To make the table comply with BCNF we can break the table into three tables like this:
emp_nationality table:
emp_id emp_nationality
1001 Austrian
1002 American
emp_dept_mapping table:
emp_id emp_dept
1001 stores
Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}
This is now in BCNF because the left side of both functional dependencies is a key.
Example
STUDENT
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
The given STUDENT table is in 3NF, but COURSE and HOBBY are two
independent entities. Hence, there is no relationship between COURSE and HOBBY.
So to bring the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
Example
SUBJECT LECTURER SEMESTER
In the above table, John takes both Computer and Math classes for Semester 1 but he
doesn't take the Math class for Semester 2. In this case, the combination of all these
fields is required to identify valid data.
Suppose we add a new semester, Semester 3, but do not yet know the subject or who
will be taking it, so we would leave Lecturer and Subject as NULL. But all three
columns together act as the primary key, so we can't leave the other two columns blank.
So to bring the above table into 5NF, we can decompose it into three relations P1, P2 &
P3:
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 2 Math
P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
Types of Decomposition
Lossless Decomposition
o If the information is not lost from the relation that is decomposed, then the
decomposition will be lossless.
o The lossless decomposition guarantees that the join of relations will result in the
same relation as it was decomposed.
o The decomposition is said to be lossless if the natural join of all the
decomposed relations gives the original relation.
Example:
EMPLOYEE_DEPARTMENT table:
The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT
EMPLOYEE table:
22 Denim 28 Mumbai
33 Alina 25 Delhi
46 Stephan 30 Bangalore
52 Katherine 36 Mumbai
60 Jack 40 Noida
DEPARTMENT table
827 22 Sales
438 33 Marketing
869 46 Finance
575 52 Production
678 60 Testing
Employee ⋈ Department
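Whether a decomposition is lossless can be tested by projecting a relation onto the two attribute sets, joining the projections, and comparing the result with the original. The sketch below does this in Python for the rows above. The column names (EMP_ID, EMP_NAME, EMP_AGE, EMP_CITY, DEPT_ID, DEPT_NAME) and the function names are assumptions made for illustration, and the second, lossy split on EMP_CITY is a made-up counterexample.

```python
def project(rows, attrs):
    """Project rows (a list of dicts) onto attrs, dropping duplicate rows."""
    result = []
    for row in rows:
        small = {a: row[a] for a in attrs}
        if small not in result:
            result.append(small)
    return result


def natural_join(left, right):
    """Combine rows that agree on every attribute the two sides share."""
    joined = []
    for l_row in left:
        for r_row in right:
            common = set(l_row) & set(r_row)
            if all(l_row[a] == r_row[a] for a in common):
                joined.append({**l_row, **r_row})
    return joined


def is_lossless(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly the original rows."""
    rejoined = natural_join(project(rows, attrs1), project(rows, attrs2))
    return all(r in rejoined for r in rows) and all(r in rows for r in rejoined)


COLUMNS = ["EMP_ID", "EMP_NAME", "EMP_AGE", "EMP_CITY", "DEPT_ID", "DEPT_NAME"]
employee_department = [
    dict(zip(COLUMNS, values))
    for values in [
        (22, "Denim", 28, "Mumbai", 827, "Sales"),
        (33, "Alina", 25, "Delhi", 438, "Marketing"),
        (46, "Stephan", 30, "Bangalore", 869, "Finance"),
        (52, "Katherine", 36, "Mumbai", 575, "Production"),
        (60, "Jack", 40, "Noida", 678, "Testing"),
    ]
]

EMPLOYEE = ["EMP_ID", "EMP_NAME", "EMP_AGE", "EMP_CITY"]

# Split on the shared key EMP_ID, as in the example above.
split_on_id = is_lossless(
    employee_department, EMPLOYEE, ["DEPT_ID", "EMP_ID", "DEPT_NAME"]
)

# Split on EMP_CITY instead: two employees live in Mumbai, so the join
# pairs each of them with both Mumbai departments and adds spurious rows.
split_on_city = is_lossless(
    employee_department, EMPLOYEE, ["EMP_CITY", "DEPT_ID", "DEPT_NAME"]
)
```

The split on EMP_ID is lossless because EMP_ID identifies one row on each side; the split on EMP_CITY is lossy because the join produces rows that were never in the original relation.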
Dependency Preserving
o It is an important constraint of the database.
o In the dependency preservation, at least one decomposed table must satisfy every
dependency.
o If a relation R is decomposed into relation R1 and R2, then the dependencies of R
either must be a part of R1 or R2 or must be derivable from the combination of
functional dependencies of R1 and R2.
o For example, suppose there is a relation R (A, B, C, D) with functional
dependency set (A->BC). The relation R is decomposed into R1(ABC) and
R2(AD), which is dependency preserving because FD A->BC is a part of relation
R1(ABC).
Multivalued Dependency
o Multivalued dependency occurs when two attributes in a table are independent of
each other but, both depend on a third attribute.
o A multivalued dependency consists of at least two attributes that are dependent on
a third attribute that's why it always requires at least three attributes.
1. BIKE_MODEL → → MANUF_YEAR
2. BIKE_MODEL → → COLOR
Join Dependency
o Join dependency is a further generalization of Multivalued dependencies.
o If the join of R1 and R2 over C is equal to relation R, then we can say that a join
dependency (JD) exists.
o Where R1 and R2 are the decompositions R1(A, B, C) and R2(C, D) of a given
relations R (A, B, C, D).
o Alternatively, R1 and R2 are a lossless decomposition of R.
o A JD ⋈ {R1, R2,..., Rn} is said to hold over a relation R if R1, R2,....., Rn is a
lossless-join decomposition.
o *((A, B, C), (C, D)) will be a JD of R if the join of these projections over their
common attribute is equal to the relation R.
o Here, *(R1, R2, R3) is used to indicate that relations R1, R2, R3 and so on are a JD
of R.
PL/SQL Trigger
Triggers are stored programs, which are automatically executed or fired when some
event occurs.
Triggers could be defined on the table, view, schema, or database with which the event is
associated.
Advantages of Triggers
Creating a trigger:
Here,
After the execution of the above code at SQL Prompt, it produces the following result.
Trigger created.
DECLARE
   total_rows number(2);
BEGIN
   UPDATE customers
   SET salary = salary + 5000;
   IF sql%notfound THEN
      dbms_output.put_line('no customers updated');
   ELSIF sql%found THEN
      total_rows := sql%rowcount;
      dbms_output.put_line( total_rows || ' customers updated ');
   END IF;
END;
/
Output:
Note: Each time you execute this code, both the old and the new salary are incremented
by 5000, and hence the salary difference is always 5000.
After the execution of above code again, you will get the following result.
Important Points
The following two points are very important and should be noted carefully.
o OLD and NEW references are used for record level triggers; they are not available
for table level triggers.
o If you want to query the table in the same trigger, then you should use the AFTER
keyword, because triggers can query the table or change it again only after the
initial changes are applied and the table is back in a consistent state.
Let's take a simple example to demonstrate the trigger. In this example, we are using the
following CUSTOMERS table:
Create trigger:
Let's take a program to create a row level trigger for the CUSTOMERS table that would
fire for INSERT or UPDATE or DELETE operations performed on the CUSTOMERS
table. This trigger will display the salary difference between the old values and new
values:
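The idea of such a row level trigger can be sketched with SQLite through Python's sqlite3 module. This is an adaptation under stated assumptions, not the PL/SQL program of these notes: SQLite triggers cannot print, so the salary difference is written to a hypothetical SALARY_LOG table, the trigger handles only UPDATE (SQLite needs one trigger per event), and the trigger name and sample rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT, SALARY REAL);
    CREATE TABLE SALARY_LOG (CUSTOMER_ID INTEGER, OLD_SALARY REAL,
                             NEW_SALARY REAL, DIFFERENCE REAL);

    -- Row level trigger: the body runs once for every updated row,
    -- with OLD and NEW holding the row before and after the change.
    CREATE TRIGGER display_salary_changes
    AFTER UPDATE OF SALARY ON CUSTOMERS
    FOR EACH ROW
    BEGIN
        INSERT INTO SALARY_LOG
        VALUES (OLD.ID, OLD.SALARY, NEW.SALARY, NEW.SALARY - OLD.SALARY);
    END;

    INSERT INTO CUSTOMERS VALUES (1, 'Ramesh', 2000), (2, 'Khilan', 1500),
                                 (3, 'kaushik', 2000);
    """
)

# One statement updates three rows, so the trigger fires three times.
conn.execute("UPDATE CUSTOMERS SET SALARY = SALARY + 5000")
differences = [row[0] for row in conn.execute("SELECT DIFFERENCE FROM SALARY_LOG")]
```

Running the UPDATE again adds three more log rows, each with a difference of 5000, because both the old and the new salary move up together.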
UNIT-IV
TRANSACTION MANAGEMENT
What is a Transaction?
A transaction is an event which occurs on the database. Generally a transaction reads
a value from the database or writes a value to the database. If you are familiar with
operating systems, a transaction is analogous to a process.
Although a transaction can both read and write on the database, there are some
fundamental differences between these two classes of operations. A read operation does
not change the image of the database in any way. But a write operation, whether
performed with the intention of inserting, updating or deleting data from the database,
Atomicity: This means that either all of the instructions within the transaction will be
reflected in the database, or none of them will be reflected.
Say, for example, we have two accounts A and B, each containing Rs 1000/-. We now
start a transaction to transfer Rs 100/- from account A to account B.
Read A;
A = A – 100;
Write A;
Read B;
B = B + 100;
Write B;
Now, suppose there is a power failure just after instruction 3 (Write A) has been
completed. What happens now? After the system recovers, the AFIM (after image) will
show Rs 900/- in A, but the same Rs 1000/- in B. Rs 100/- would have vanished into thin
air because of the power failure. Clearly such a situation is not acceptable.
The solution is to keep every value calculated by the instruction of the transaction not in
any stable storage (hard disc) but in a volatile storage (RAM), until the transaction
completes its last instruction. When we see that there has not been any error we do
something known as a COMMIT operation. Its job is to write every temporarily
calculated value from the volatile storage on to the stable storage. In this way, even if
power fails at instruction 3, the post recovery image of the database will show accounts
A and B both containing Rs 1000/-, as if the failed transaction had never occurred.
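The all-or-nothing behaviour can be observed with a small sketch that uses SQLite through Python's sqlite3 module. SQLite's own COMMIT and ROLLBACK stand in here for the deferred-write scheme described above, the power failure is simulated by raising an exception after the first write, and the table and function names are hypothetical.

```python
import sqlite3


def transfer(conn, amount, fail_after_debit=False):
    """Move amount from account A to account B as a single transaction."""
    try:
        conn.execute(
            "UPDATE ACCOUNTS SET BALANCE = BALANCE - ? WHERE NAME = 'A'", (amount,)
        )
        if fail_after_debit:
            # Stands in for the power failure just after "Write A".
            raise RuntimeError("simulated failure")
        conn.execute(
            "UPDATE ACCOUNTS SET BALANCE = BALANCE + ? WHERE NAME = 'B'", (amount,)
        )
        conn.commit()      # both changes become permanent together
    except RuntimeError:
        conn.rollback()    # the debit of A is undone


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT NAME, BALANCE FROM ACCOUNTS"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ACCOUNTS (NAME TEXT PRIMARY KEY, BALANCE INTEGER)")
conn.executemany("INSERT INTO ACCOUNTS VALUES (?, ?)", [("A", 1000), ("B", 1000)])
conn.commit()

transfer(conn, 100, fail_after_debit=True)
after_failure = balances(conn)   # the failed transfer leaves no trace

transfer(conn, 100)
after_success = balances(conn)   # the complete transfer is applied in full
```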
To give better performance, every database management system supports the execution
of multiple transactions at the same time, using CPU Time Sharing. Concurrently
executing transactions may have to deal with the problem of sharable resources, i.e.
resources that multiple transactions are trying to read/write at the same time. For
example, we may have a table or a record on which two transaction are trying to read or
write at the same time. Careful mechanisms are created in order to prevent
mismanagement of these sharable resources, so that there should not be any change in
the way a transaction performs. A transaction which deposits Rs 100/- to account A
must deposit the same amount whether it is acting alone or in conjunction with another
transaction that may be trying to deposit or withdraw some amount at the same time.
Isolation: In case multiple transactions are executing concurrently and trying to access
a sharable resource at the same time, the system should create an ordering in their
execution so that they should not create any anomaly in the value stored at the sharable
resource.
Durability: It states that once a transaction has completed, the changes it has made
should be permanent.
There are several ways Atomicity and Durability can be implemented. One of them is
called Shadow Copy. In this scheme a database pointer is used to point to the BFIM of
the database. During the transaction, all the temporary changes are recorded into a
Shadow Copy, which is an exact copy of the original database plus the changes made
by the transaction, which is the AFIM. Now, if the transaction is required to COMMIT,
then the database pointer is updated to point to the AFIM copy, and the BFIM copy is
discarded. On the other hand, if the transaction is not committed, then the database
pointer is not updated. It keeps pointing to the BFIM, and the AFIM is discarded. This
is a simple scheme, but takes a lot of memory space and time to implement.
If you study carefully, you can understand that Atomicity and Durability are
closely related, just as Consistency and Isolation are closely related.
Transaction States
There are the following six states in which a transaction may exist:
Active: The initial state when the transaction has just started execution.
Failed: If the transaction fails for some reason. The temporary values are no longer
required, and the transaction is set to ROLLBACK. It means that any change made to
the database by this transaction up to the point of the failure must be undone. If the
failed transaction has withdrawn Rs. 100/- from account A, then the ROLLBACK
operation should add Rs 100/- to account A.
Aborted: When the ROLLBACK operation is over, the database reaches the BFIM.
The transaction is now said to have been aborted.
Committed: If no failure occurs then the transaction reaches the COMMIT POINT. All
the temporary values are written to the stable storage and the transaction is said to have
been committed.
Terminated: Either committed or aborted, the transaction finally reaches this state.
[Figure: transaction state diagram showing the states ACTIVE (entry point), PARTIALLY
COMMITTED, COMMITTED, FAILED, ABORTED and TERMINATED]
Concurrent Execution
A schedule is a collection of many transactions which is implemented as a unit.
Depending upon how these transactions are arranged within a schedule, a
schedule can be of two types:
Serial: The transactions are executed one after another, in a non-preemptive
manner.
Concurrent: The transactions are executed in a preemptive, time shared
method.
In Serial schedule, there is no question of sharing a single data item among many
transactions, because not more than a single transaction is executing at any point of
time. However, a serial schedule is inefficient in the sense that the transactions suffer
for having a longer waiting time and response time, as well as low amount of resource
utilization.
In concurrent schedule, CPU time is shared among two or more transactions in order to
run them concurrently. However, this creates the possibility that more than one
transaction may need to access a single data item for read/write purpose and the
database could contain inconsistent value if such accesses are not handled properly. Let
us explain with the help of an example.
Let us consider there are two transactions T1 and T2, whose instruction sets are given
as following. T1 is the same as we have seen earlier, while T2 is a new transaction.
T1
Read A;
A = A – 100;
Write A;
Read B;
B = B + 100;
Write B;
T2
Read A;
Temp = A * 0.1;
Read C;
C = C + Temp;
Write C;
If we prepare a serial schedule, then either T1 will completely finish before T2 can
begin, or T2 will completely finish before T1 can begin. However, if we want to create
a concurrent schedule, then some Context Switching needs to be done, so that some
portion of T1 will be executed, then some portion of T2 will be executed and so on. For
example, say we have prepared the following concurrent schedule.
T1                  T2
Read A;
A = A – 100;
Write A;
                    Read A;
                    Temp = A * 0.1;
                    Read C;
                    C = C + Temp;
                    Write C;
Read B;
B = B + 100;
Write B;
No problem here. We have made some Context Switching in this Schedule, the first one
after executing the third instruction of T1, and after executing the last statement of T2.
T1 first deducts Rs 100/- from A and writes the new value of Rs 900/- into A. T2 reads
the value of A, calculates the value of Temp to be Rs 90/- and adds the value to C. The
remaining part of T1 is executed and Rs 100/- is added to B.
It is clear that a proper Context Switching is very important in order to maintain the
Consistency and Isolation properties of the transactions. But let us take another
example where a wrong Context Switching can bring about disaster. Consider the
following example involving the same T1 and T2
T1                  T2
Read A;
A = A – 100;
                    Read A;
                    Temp = A * 0.1;
                    Read C;
                    C = C + Temp;
                    Write C;
Write A;
Read B;
B = B + 100;
Write B;
This schedule does not give the result we expect from running T1 and then T2,
because the switch is made after the second instruction of T1. If accounts A and B
each contain Rs 1000/-, that order should leave Rs 900/- in A and Rs 1100/- in B, and
add Rs 90/- to C (as C should be increased by 10% of the amount in A). But in this
schedule, the Context Switching is performed before the new value of Rs 900/- has
been written to A. T2 reads the old value of A, which is still Rs 1000/-, and deposits
Rs 100/- in C. C gains Rs 10/- more than it would have if T2 had seen the update made
by T1.
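Both interleavings can be replayed with a small Python simulation. Each transaction reads into a private workspace, computes there, and writes back to a shared dictionary that plays the role of the database. The function names are hypothetical, and the starting balance of 0 for C is an assumption, since the notes do not state it.

```python
def run_schedule(schedule, db):
    """Run (transaction, action, target) steps in order against the dict db."""
    workspace = {}                      # private variables of each transaction
    for txn, action, target in schedule:
        local = workspace.setdefault(txn, {})
        if action == "read":            # copy the stored value into the workspace
            local[target] = db[target]
        elif action == "write":         # copy the workspace value into the database
            db[target] = local[target]
        else:                           # "compute": target is a function on the workspace
            target(local)
    return db


def debit_a(local):
    local["A"] = local["A"] - 100


def credit_b(local):
    local["B"] = local["B"] + 100


def ten_percent_of_a(local):
    local["Temp"] = local["A"] * 0.1


def add_temp_to_c(local):
    local["C"] = local["C"] + local["Temp"]


T1 = [
    ("T1", "read", "A"), ("T1", "compute", debit_a), ("T1", "write", "A"),
    ("T1", "read", "B"), ("T1", "compute", credit_b), ("T1", "write", "B"),
]
T2 = [
    ("T2", "read", "A"), ("T2", "compute", ten_percent_of_a),
    ("T2", "read", "C"), ("T2", "compute", add_temp_to_c), ("T2", "write", "C"),
]

initial = {"A": 1000, "B": 1000, "C": 0}   # C = 0 is an assumed starting value

# First schedule: switch to T2 after T1 has written A.
first_result = run_schedule(T1[:3] + T2 + T1[3:], dict(initial))

# Second schedule: switch to T2 before T1 has written A.
second_result = run_schedule(T1[:2] + T2 + T1[2:], dict(initial))
```

A and B end at 900 and 1100 in both runs; only C differs, ending near 90 in the first schedule and near 100 in the second, because T2 read A before T1's write reached the database.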
Serializability
When several concurrent transactions are trying to access the same data item, the
View Serializability:
This is another type of serializability that can be derived by creating another
schedule out of an existing schedule, involving the same set of transactions. These
two schedules would be called View Serializable if the following rules are followed
while creating the second schedule out of the first. Let us consider that the
transactions T1 and T2 are being serialized to create two different schedules
Let us consider a schedule S in which there are two consecutive instructions, I and J,
of transactions Ti and Tj, respectively (i ≠ j). If I and J refer to different data
items, then we can swap I and J without affecting the results of any instruction
in the schedule. However, if I and J refer to the same data item Q, then the order of
the two steps may matter. Since we are dealing with only read and write
instructions, there are four cases that we need to consider:
□ I = read(Q), J = read(Q). The order of I and J does not matter, since the
same value of Q is read by Ti and Tj , regardless of the order.
□ I = read(Q), J = write(Q). If I comes before J , then Ti does not read the value
of Q that is written by Tj in instruction J . If J comes before I, then Ti reads
the value of Q that is written by Tj. Thus, the order of I and J matters.
Transaction Characteristics
Every transaction has three characteristics: access mode, diagnostics size, and
isolation level. The diagnostics size determines the number of error conditions that
can be recorded.
If the access mode is READ ONLY, the transaction is not allowed to modify
the database. Thus, INSERT, DELETE, UPDATE, and CREATE commands cannot
be executed. If we have to execute one of these commands, the access mode should
be set to READ WRITE. For transactions with READ ONLY access mode, only
shared locks need to be obtained, thereby increasing concurrency.
The isolation level controls the extent to which a given transaction is exposed to the
actions of other transactions executing concurrently. By choosing one of four
possible isolation level settings, a user can obtain greater concurrency at the cost of
increasing the transaction's exposure to other transactions' uncommitted changes.
REPEATABLE READ ensures that T reads only the changes made by committed
transactions, and that no value read or written by T is changed by any other
transaction until T is complete. However, T could experience the phantom
phenomenon; for example, while T examines all
READ COMMITTED ensures that T reads only the changes made by committed
transactions, and that no value written by T is changed by any other transaction
until T is complete. However, a value read by T may well be modified by another
transaction while T is still in progress, and T is, of course, exposed to the phantom
problem.
PRECEDENCE GRAPH
[Figure: precedence graph example]
For example, a serializability order for the schedule (a) would be one of either (b) or
(c)
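A schedule is conflict serializable exactly when its precedence graph has no cycle. The sketch below builds the graph from a schedule and checks it for cycles with a depth-first search. The schedule format (a list of (transaction, operation, item) tuples) and the function names are assumptions made for illustration, not something defined in these notes.

```python
from collections import defaultdict


def precedence_graph(schedule):
    """Build edges Ti -> Tj for every conflicting pair where Ti's step is first.

    schedule: list of (txn, kind, item) tuples in execution order,
    with kind being "read" or "write" (assumed format).
    """
    edges = defaultdict(set)
    for pos, (ti, kind_i, item_i) in enumerate(schedule):
        for tj, kind_j, item_j in schedule[pos + 1:]:
            if ti != tj and item_i == item_j and "write" in (kind_i, kind_j):
                edges[ti].add(tj)
    return edges


def has_cycle(edges):
    """Depth-first search with three colours to detect a directed cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)

    def visit(node):
        colour[node] = GREY
        for nxt in edges.get(node, ()):
            if colour[nxt] == GREY:
                return True  # back edge: cycle found
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in list(edges))


def is_conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))


ok = [("T1", "read", "A"), ("T1", "write", "A"),
      ("T2", "read", "A"), ("T2", "write", "A")]
bad = [("T1", "read", "A"), ("T2", "write", "A"), ("T1", "write", "A")]
print(is_conflict_serializable(ok))   # True
print(is_conflict_serializable(bad))  # False
```

In the second schedule, T1 reads A before T2 writes it (edge T1 -> T2) and T2 writes A before T1 writes it (edge T2 -> T1), so the graph contains a cycle.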
RECOVERABLE SCHEDULES
□ Recoverable schedule: if a transaction Tj reads a data item previously written
by a transaction Ti, then the commit operation of Ti must appear before the
commit operation of Tj.
□ The following schedule is not recoverable if T9 commits immediately after the
read(A) operation.
□ If T8 should abort, T9 would have read (and possibly shown to the user) an
inconsistent database state. Hence, the database must ensure that schedules are
recoverable.
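The definition above can be checked mechanically. The sketch below is one possible check under stated assumptions: the schedule is a list of (transaction, action, item) tuples, aborts are not modelled, and the example schedule is a guess at the missing figure (T8 writes A, T9 reads A, T9 commits first), not a copy of it.

```python
def is_recoverable(schedule):
    """Check that every reader commits only after the writers it read from.

    schedule: list of (txn, action, item) tuples in execution order, where
    action is "read", "write" or "commit" (item is None for commits).
    """
    last_writer = {}   # item -> transaction that wrote it most recently
    reads_from = {}    # reader -> set of transactions it has read from
    committed = set()
    for txn, action, item in schedule:
        if action == "write":
            last_writer[item] = txn
        elif action == "read":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                reads_from.setdefault(txn, set()).add(writer)
        elif action == "commit":
            # Every transaction this one read from must already be committed.
            if not reads_from.get(txn, set()) <= committed:
                return False
            committed.add(txn)
    return True


# Assumed reconstruction of the example: T9 commits right after read(A).
not_ok = [("T8", "read", "A"), ("T8", "write", "A"),
          ("T9", "read", "A"), ("T9", "commit", None),
          ("T8", "commit", None)]
ok = [("T8", "read", "A"), ("T8", "write", "A"),
      ("T9", "read", "A"), ("T8", "commit", None),
      ("T9", "commit", None)]
print(is_recoverable(not_ok))  # False
print(is_recoverable(ok))      # True
```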
CASCADING ROLLBACKS
CASCADELESS SCHEDULES
CONCURRENCY SCHEDULE
□ A database must provide a mechanism that will ensure that all possible
schedules are both:
□ Conflict serializable.
□ A policy in which only one transaction can execute at a time generates serial
schedules, but provides a poor degree of concurrency
□ Testing a schedule for serializability after it has executed is a little too late!
□ Read committed: only committed records can be read, but successive reads of
a record may return different (but committed) values.
RECOVERY SYSTEM
Failure Classification:
□ Transaction failure :
□ System errors: the database system must terminate an active transaction due to an
error condition (e.g., deadlock)
□ System crash: a power failure or other hardware or software failure causes the
system to crash.
□ Disk failure: a head crash or similar disk failure destroys all or part of disk
storage
RECOVERY ALGORITHMS
□ A failure may occur after one of these modifications has been made but before
both of them are made.
□ Modifying the database without ensuring that the transaction will commit may
leave the database in an inconsistent state
□ Not modifying the database may result in lost updates if failure occurs just
after transaction commits
2. Actions taken after a failure to recover the database contents to a state that
ensures atomicity, consistency and durability
STORAGE STRUCTURE
□ Volatile storage:
□ Nonvolatile storage:
□ Stable storage:
□ Successful completion
□ Protecting storage media from failure during data transfer (one solution):
2. When the first write successfully completes, write the same information onto
the second physical block.
3. The output is completed only after the second write successfully completes.
□ Copies of a block may differ due to failure during output operation. To recover
from failure:
2. Better solution:
DATA ACCESS
□ System buffer blocks are the blocks residing temporarily in main memory.
□ Block movements between disk and main memory are initiated through the
following two operations:
□ input(B) transfers the physical block B to main memory.
□ output(B) transfers the buffer block B to the disk, and replaces the appropriate
physical block there.
□ We assume, for simplicity, that each data item fits in, and is stored inside, a
single block.
□ Each transaction Ti has its private work-area in which local copies of all data
items accessed and updated by it are kept.
□ Data items are transferred between the system buffer blocks and the
transaction's private work-area by:
□ read(X) assigns the value of data item X to the local variable xi.
□ write(X) assigns the value of local variable xi to data item X in the buffer
block.
□ Transactions
□ Must perform read(X) before accessing X for the first time (subsequent reads
can be from local copy)
□ Note that output(BX) need not immediately follow write(X). System can
perform the output operation when it seems fit.
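The relationship between disk blocks, buffer blocks and the private work-area can be sketched as follows. This is a toy model, assuming each data item lives in exactly one block; the class names and the methods input_block and output_block (standing in for input(B) and output(B)) are illustrative.

```python
class Storage:
    """Toy model of disk blocks and the system buffer in main memory."""

    def __init__(self, disk):
        self.disk = disk      # block id -> {item: value}
        self.buffer = {}      # block id -> {item: value}, blocks in memory
        # Each data item is assumed to live in exactly one block.
        self.block_of = {item: b for b, items in disk.items() for item in items}

    def input_block(self, block):
        """input(B): copy physical block B from disk into the buffer."""
        self.buffer[block] = dict(self.disk[block])

    def output_block(self, block):
        """output(B): copy buffer block B back to disk."""
        self.disk[block] = dict(self.buffer[block])


class Transaction:
    def __init__(self, storage):
        self.storage = storage
        self.local = {}       # private work-area: local copies xi

    def read(self, x):
        """read(X): bring X's block into the buffer if needed, copy X locally."""
        block = self.storage.block_of[x]
        if block not in self.storage.buffer:
            self.storage.input_block(block)
        self.local[x] = self.storage.buffer[block][x]
        return self.local[x]

    def write(self, x):
        """write(X): copy the local value into the buffer block (not to disk)."""
        block = self.storage.block_of[x]
        if block not in self.storage.buffer:
            self.storage.input_block(block)
        self.storage.buffer[block][x] = self.local[x]


storage = Storage({"B1": {"X": 100}})
t = Transaction(storage)
t.read("X")
t.local["X"] -= 30
t.write("X")
print(storage.buffer["B1"]["X"], storage.disk["B1"]["X"])  # 70 100
storage.output_block("B1")
print(storage.disk["B1"]["X"])                             # 70
```

The disk still holds the old value after write(X); it changes only when the system decides to perform the output operation.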
TIMESTAMP-BASED PROTOCOLS
3. Any transaction Tj with TS(Tj) > TS(T28) must read the value of Q written by
T28, rather than the value that T27 is attempting to write. This observation leads to
a modified version of the timestamp-ordering protocol in which obsolete write
operations can be ignored under certain circumstances. The protocol rules for read
operations remain unchanged. The protocol rules for write operations, however,
are slightly different from the timestamp-ordering protocol.
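The modified write rule described above is commonly known as Thomas' write rule. The sketch below shows one way to express the read and write checks; the Item class, the Rollback exception and the integer timestamps are assumptions made for illustration.

```python
class Rollback(Exception):
    """Raised when the protocol rejects an operation and the transaction restarts."""


class Item:
    def __init__(self, value=None):
        self.value = value
        self.r_ts = 0   # largest timestamp of any successful read(Q)
        self.w_ts = 0   # largest timestamp of any successful write(Q)


def read(item, ts):
    """Read rule (unchanged from basic timestamp ordering)."""
    if ts < item.w_ts:
        raise Rollback("value was already overwritten by a younger transaction")
    item.r_ts = max(item.r_ts, ts)
    return item.value


def write(item, ts, value):
    """Write rule with obsolete writes ignored. Returns True if the write ran."""
    if ts < item.r_ts:
        raise Rollback("a younger transaction already read the old value")
    if ts < item.w_ts:
        return False    # obsolete write: silently ignored, no rollback
    item.value = value
    item.w_ts = ts
    return True


q = Item(value=0)
read(q, ts=27)                        # T27 reads Q
write(q, ts=28, value="from T28")     # T28 writes Q
applied = write(q, ts=27, value="from T27")
print(applied, q.value)               # False from T28
```

Under the basic timestamp-ordering protocol the last call would roll T27 back; here the write is skipped because no later transaction could ever read it.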
VALIDATION-BASED PROTOCOLS
Phases in Validation-Based Protocols
In multiple granularity locking (MGL), locks are set on objects that contain other
objects. MGL exploits the hierarchical nature of the contains relationship. For
example, a database may have files, which contain pages, which further contain
records. This can be thought of as a tree of objects, where each node contains its
children. A lock on a node, such as a shared or exclusive lock, locks the targeted
node as well as all of its descendants.
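The idea that a lock on a node covers all of its descendants can be sketched with a small tree. This is only an illustration of implicit locking: the Node class and its fields are hypothetical, and a real MGL implementation also uses intention locks, which this section does not cover.

```python
class Node:
    """One object in the containment tree: database, file, page or record."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.lock = None   # None, "S" (shared) or "X" (exclusive), set explicitly

    def effective_lock(self):
        """Return the lock covering this node: its own or the nearest ancestor's."""
        node = self
        while node is not None:
            if node.lock is not None:
                return node.lock
            node = node.parent
        return None


database = Node("database")
file_a = Node("file_a", parent=database)
page_1 = Node("page_1", parent=file_a)
record_7 = Node("record_7", parent=page_1)

file_a.lock = "X"                   # one explicit lock on the file
print(record_7.effective_lock())    # X  (covered through its ancestor)
print(database.effective_lock())    # None (ancestors are not covered)
```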
RECOVERY:
□ When the system recovers from failure, it can restore the latest dump.
□ It can maintain redo-list and undo-list as in checkpoints.
□ It can recover the system by consulting the undo-redo lists to restore the state
of all transactions up to the last checkpoint.
REMOTE BACKUP
Remote backup provides a sense of security and safety in case the primary location
where the database is located gets destroyed. Remote backup can be offline or real-
time and online. In case it is offline it is maintained manually.
FUZZY CHECKPOINTING
a. To avoid long interruption of normal processing during checkpointing, allow
updates to happen during checkpointing.
b. Fuzzy checkpointing is done as follows:
1. Temporarily stop all updates by transactions
2. Write a <checkpoint L> log record and force log to stable storage
3. Note list M of modified buffer blocks
4. Now permit transactions to proceed with their actions
5. Output to disk all modified buffer blocks in list M
□ Blocks should not be updated while being output
□ Follow WAL (write-ahead logging): all log records pertaining to a block must
be output before the block is output
6. Store a pointer to the checkpoint record in a fixed position
last_checkpoint on disk
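The six steps can be traced in a small single-threaded simulation. It is a sketch only: the log, buffer and disk are plain Python containers, pausing and resuming transactions is shown as comments, and every name here is illustrative.

```python
def fuzzy_checkpoint(log, active_txns, dirty, buffer, disk, meta):
    """Run one fuzzy checkpoint over in-memory stand-ins for log, buffer, disk.

    log:   list acting as the stable log (appending stands in for forcing it)
    dirty: set of ids of modified buffer blocks
    meta:  dict standing in for the fixed on-disk position last_checkpoint
    """
    # Step 1: updates are assumed to be paused by the caller at this point.
    log.append(("checkpoint", sorted(active_txns)))    # step 2: <checkpoint L>
    checkpoint_pos = len(log) - 1
    m = sorted(dirty)                                  # step 3: note list M
    # Step 4: transactions may resume here; the rest runs alongside them.
    for block in m:                                    # step 5: output list M
        # WAL holds trivially: the log list is already "stable" in this model.
        disk[block] = buffer[block]
        dirty.discard(block)
    meta["last_checkpoint"] = checkpoint_pos           # step 6
    return m


log = [("start", "T1")]
buffer = {"B1": "new1", "B2": "new2"}
disk = {"B1": "old1", "B2": "old2"}
dirty = {"B1", "B2"}
meta = {}
flushed = fuzzy_checkpoint(log, {"T1"}, dirty, buffer, disk, meta)
print(flushed, meta["last_checkpoint"])   # ['B1', 'B2'] 1
```

Because last_checkpoint is updated only in step 6, a crash during step 5 leaves the pointer at the previous, complete checkpoint.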
□ Heap File Organization: When a file is created using the heap file organization
mechanism, the operating system allocates a memory area to that file without any
further accounting details. File records can be placed anywhere in that memory area.
□ Sequential File Organization: Every file record contains a data field (attribute) to
uniquely identify that record. In the sequential file organization mechanism, records
are placed in the file in some sequential order based on the unique key field or
search key. Practically, it is not possible to store all the records sequentially in
physical form.
DBMS INDEXING
We know that information in the DBMS files is stored in form of records.
Every record is equipped with some key field, which helps it to be recognized
uniquely.
Indexing is defined based on its indexing attributes. Indexing can be one of the
following types:
□ Primary Index: If the index is built on the ordering key field of the file, it is
called a Primary Index. Generally it is the primary key of the relation.
□ Secondary Index: If the index is built on a non-ordering field of the file, it is
called a Secondary Index.
□ Clustering Index: If the index is built on an ordering non-key field of the file, it
is called a Clustering Index.
The ordering field is the field on which the records of the file are ordered. It can be
different from the primary or candidate key of a file.
Ordered Indexing is of two types:
□ Dense Index
□ Sparse Index
Dense Index
In dense index, there is an index record for every search key value in the database.
Sparse Index
In sparse index, index records are not created for every search key. An index
record here contains a search key and an actual pointer to the data on the disk. To
search for a record, we first follow the index record to reach the actual location of
the data.
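A sparse index lookup can be sketched with a binary search over the index followed by a scan inside the block it points to. The layout below (one index entry per block, holding the first key of that block) is an assumption for illustration, as are the names used.

```python
import bisect


def sparse_lookup(index_keys, index_blocks, blocks, key):
    """Find the record for key using a sparse index.

    index_keys:   sorted list holding the first search key of each block
    index_blocks: block number that each index entry points to
    blocks:       list of blocks, each a sorted list of (key, record) pairs
    """
    # Rightmost index entry whose key is <= the key we are looking for.
    pos = bisect.bisect_right(index_keys, key) - 1
    if pos < 0:
        return None                       # key is smaller than every indexed key
    for k, record in blocks[index_blocks[pos]]:
        if k == key:                      # sequential scan inside the block
            return record
    return None


blocks = [
    [(10, "a"), (20, "b")],
    [(30, "c"), (40, "d")],
    [(50, "e")],
]
index_keys = [10, 30, 50]
index_blocks = [0, 1, 2]
print(sparse_lookup(index_keys, index_blocks, blocks, 40))  # d
print(sparse_lookup(index_keys, index_blocks, blocks, 35))  # None
```

A dense index would hold one entry per search key instead, so the scan inside the block would not be needed.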
A multi-level index breaks the index down into several smaller indices so that
the outermost level becomes small enough to fit in a single disk block, which can
easily be kept in main memory.
B+ TREE
A B+ tree is a multi-level index format in the form of a balanced multiway search
tree (it is not a binary tree, since each node can hold many keys). As mentioned
earlier, single-level index records become large as the database size grows, which
also degrades performance. All leaf nodes of a B+ tree hold the actual data pointers.
A B+ tree ensures that all leaf nodes remain at the same height, and is thus
balanced. Additionally, all leaf nodes are linked using a linked list, which lets a
B+ tree support random access as well as sequential access.
STRUCTURE OF B+ TREE
Every leaf node is at equal distance from the root node. A B+ tree is of order n
where n is fixed for every B+ tree.
B+ tree deletion
□ B+ tree entries are deleted at the leaf nodes.
□ The target entry is searched and deleted.
o If it is in an internal node, delete and replace with the entry from the left position.
□ After deletion, underflow is tested.
o If underflow occurs
□ Distribute entries from the node to its left.
o If distribution from the left is not possible
□ Distribute from the node to its right.
o If distribution from left and right is not possible
□ Merge the node with the nodes to its left and right.
DBMS HASHING
For a huge database structure, it is sometimes not feasible to search the index
through all its levels and then reach the destination data block to retrieve the
desired data. Hashing is an effective technique to calculate the direct location of a
data record on the disk without using an index structure.
Hash Organization
□ Bucket: Hash file stores data in bucket format. Bucket is considered a unit of
storage. Bucket typically stores one complete disk block, which in turn can store
one or more records.
Operation:
□ Insertion: When a record is required to be entered using static hash, the hash
function h, computes the bucket address for search key K, where the record will be
stored.
Bucket Overflow:
The condition of bucket-overflow is known as collision. This is a fatal state for
any static hash function. In this case overflow chaining can be used.
□ Overflow Chaining: When buckets are full, a new bucket is allocated for the same
hash result and is linked after the previous one. This mechanism is called Closed
Hashing.
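Overflow chaining can be sketched as follows. This is a minimal model, assuming integer search keys, a fixed number of primary buckets and the hash function h(K) = K mod (number of buckets); the class names are illustrative.

```python
class Bucket:
    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []      # (key, record) pairs stored in this bucket
        self.overflow = None   # next bucket in the overflow chain, if any


class StaticHashFile:
    def __init__(self, n_buckets=4, capacity=2):
        self.capacity = capacity
        self.buckets = [Bucket(capacity) for _ in range(n_buckets)]

    def _h(self, key):
        """Static hash function: the number of primary buckets never changes."""
        return key % len(self.buckets)

    def insert(self, key, record):
        bucket = self.buckets[self._h(key)]
        # Walk the chain until a bucket with free space is found.
        while len(bucket.records) >= bucket.capacity:
            if bucket.overflow is None:
                bucket.overflow = Bucket(self.capacity)   # allocate overflow bucket
            bucket = bucket.overflow
        bucket.records.append((key, record))

    def search(self, key):
        bucket = self.buckets[self._h(key)]
        while bucket is not None:
            for k, record in bucket.records:
                if k == key:
                    return record
            bucket = bucket.overflow
        return None


f = StaticHashFile(n_buckets=4, capacity=2)
for key in (1, 5, 9):                 # all three hash to bucket 1
    f.insert(key, f"r{key}")
print(f.search(9))                    # r9 (found in the overflow bucket)
print(f.buckets[1].overflow is not None)  # True
```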
Dynamic Hashing
The problem with static hashing is that it does not expand or shrink dynamically as
the size of the database grows or shrinks. Dynamic hashing provides a mechanism in
which data buckets are added and removed dynamically and on demand. Dynamic
hashing is also known as extendible hashing.
Operation
□ Querying: Look at the depth value of hash index and use those bits to compute
the bucket address.
□ Update: Perform a query as above and update data.
□ Deletion: Perform a query to locate desired data and delete data.
□ Insertion: Compute the address of the bucket.
o If the bucket is already full
□ Add more buckets
□ Add an additional bit to the hash value
□ Re-compute the hash function
o Else
□ Add data to the bucket
o If all buckets are full, perform the remedies of static hashing.
Hashing is not favorable when the data is organized in some ordering and queries
require a range of data. When data is discrete and random, hashing performs the
best. Hashing algorithms and their implementations have higher complexity than
indexing. Hash operations run in constant time on average, as long as overflow
chains stay short.
QUERY OPTIMIZATION
Query Optimization works in a similar way:
QUERY FLOW
□ Query Parser – Verify validity of the SQL statement. Translate query into an
internal structure using relational calculus.
□ Query Optimizer – Find the best expression among the different equivalent
algebraic expressions. The criterion used is 'cheapness' (lowest estimated cost).
□ Code Generator/Interpreter – Make calls for the Query processor as a result of the
work done by the optimizer.
□ Query Processor – Execute the calls obtained from the code generator.
We use the number of block transfers from disk and the number of disk seeks to
estimate the cost of a query-evaluation plan. If the disk subsystem takes an
average of tT seconds to transfer a block of data, and has an average block-access
time (disk seek time plus rotational latency) of tS seconds, then an
operation that transfers b blocks and performs S seeks would take
b ∗ tT + S ∗ tS seconds. The values of tT and tS must be calibrated for the disk
system used, but typical values for high-end disks today would be tS = 4
milliseconds and tT = 0.1 milliseconds, assuming a 4-kilobyte block size and a
transfer rate of 40 megabytes per second.
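The cost formula can be evaluated directly. The sketch below uses the typical values quoted above as defaults; the function name and the sample workload (100 blocks, 1 seek) are made up for illustration.

```python
def plan_cost_seconds(blocks, seeks, t_transfer=0.1e-3, t_seek=4e-3):
    """Estimated cost b * tT + S * tS, in seconds.

    Defaults are the typical values from the text:
    tT = 0.1 ms per block transfer, tS = 4 ms per seek.
    """
    return blocks * t_transfer + seeks * t_seek


# Hypothetical operation: read 100 consecutive blocks after a single seek.
cost = plan_cost_seconds(blocks=100, seeks=1)
print(f"{cost * 1000:.1f} ms")   # 14.0 ms
```

With these values one seek costs as much as transferring 40 blocks, which is why plans that avoid random access tend to be cheaper.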