DBMS Lecture Notes

CS630203 - DATABASE MANAGEMENT SYSTEMS

Course Category: Programme Core
Course Type: Theory
L T P C: 3 0 0 3
COURSE OBJECTIVES:
• To study the basic organization of a Database Management System.
• To study the structure of a Relational Database Management System.
• To study the design issues of a database.
• To study transaction management in a database.
• To study various implementation techniques.
UNIT 1: INTRODUCTION 9
Purpose of Database System - Files versus database systems – View of Data – Database
Language - Database Architecture – Database users and administrators – History of
Database System - E-R model – Constraints- E-R Diagram
UNIT 2: RELATIONAL MODEL 9
Relational Model – Structure of Relational Databases – Relational Algebra Operations –
Null Values – Modification of Relational Databases- SQL – Advanced SQL- Integrity
Constraints – Authorization – Embedded SQL – Dynamic SQL- The Tuple Relational
Calculus – The Domain Relational Calculus - QBE – Triggers.
UNIT 3: DATABASE DESIGN 9
Functional Dependencies – Non-loss Decomposition – Functional Dependencies – First,
Second, Third Normal Forms, Dependency Preservation – Boyce/Codd Normal Form-
Multi-valued Dependencies and Fourth Normal Form – Join Dependencies and Fifth
Normal Form
UNIT 4: TRANSACTIONS 9
Transaction Concepts - Transaction Recovery – ACID Properties – System Recovery –
Media Recovery – Two Phase Commit - Save Points – SQL Facilities for recovery –
Concurrency – Need for Concurrency – Locking Protocols – Two Phase Locking – Intent
Locking – Deadlock- Serializability – Recovery Isolation Levels – SQL Facilities for
Concurrency.
UNIT 5: IMPLEMENTATION TECHNIQUES 9
Physical Storage Media – Magnetic Disks – RAID – Tertiary storage – File Organization –
Organization of Records in Files – Indexing and Hashing – Ordered Indices – B+ tree Index
Files – B tree Index Files – Static and Dynamic Hashing – Query Processing Overview –
Catalog Information for Cost Estimation – Selection Operation – Sorting – Join Operation
– Web Technology and DBMS – Web as a Database Application Platform

TOTAL: 45 PERIODS

COURSE OUTCOMES: At the end of the course, the student will be able to,
CO1: Understand the major objectives of database technology
CO2: Understand the relational model for databases
CO3: Identify the design issues of a database
CO4: Identify the problems in transaction management
CO5: Analyze the issues involved in implementation techniques
CO-PO MAPPING
CO1 1 2 2 1 2
CO2 2 2 2
CO3 1 2 1 1 1
CO4 2 1 2 2
CO5 1 2 2 2 1
1- low, 2 - medium, 3 - high, ‘-' no correlation
TEXT BOOKS:
1. Abraham Silberschatz, Henry F. Korth, S. Sudarshan, "Database System Concepts",
Sixth Edition, Tata McGraw Hill, 2011 (Units I and V).
2. C.J. Date, A. Kannan, S. Swamynathan, "An Introduction to Database Systems",
Eighth Edition, Pearson Education, 2006 (Units II, III and IV).
3. Raghu Ramakrishnan, Johannes Gehrke, "Database Management Systems", Third
Edition, Tata McGraw Hill.
REFERENCE BOOKS:
1. Ramez Elmasri, Shamkant B. Navathe, "Fundamentals of Database Systems",
Fourth Edition, Pearson / Addison-Wesley, 2007.
2. Raghu Ramakrishnan, "Database Management Systems", Third Edition, McGraw Hill,
2003.
WEB RESOURCES:
1. https://www.inmotionhosting.com/blog/what-is-a-database-management-system/
2. https://www.techtarget.com/searchdatamanagement/definition/database-management-system
INDEX

SNO   NAME OF THE TOPIC
1     UNIT I
2     UNIT II
3     UNIT III
4     UNIT IV
5     UNIT V
Unit 1
Introduction

Introduction to DBMS
• DBMS stands for Database Management System.
• DBMS = Database + Management System.
• A database is a collection of data, and a management system is a set of programs
to store and retrieve that data.
• A DBMS is a collection of inter-related data and a set of programs to store and
access that data in an easy and effective manner.
DBMS:-
• A DBMS is software that is used to manage data. Some popular DBMS
products are MySQL, IBM Db2, and Oracle.
• A DBMS provides an interface to the user so that operations on the database can
be performed through the interface.
• A DBMS secures the data; that is the main advantage of a DBMS over a file system.
• A DBMS also secures the data against unauthorized access as well as corrupt data
insertions. It allows multiple users to access data simultaneously while
maintaining data consistency and data integrity.
DBMS allows following operations to the authorized users of the database:

• Data Definition: Creation of tables, table schema creation, removal of table
definitions etc. come under data definition. It is basically a layout of the tables and
their relations with the other tables in the database. This allows the data to be
properly structured, so that data that is related or dependent on other data in the real
world can be represented the same way in the database.

• Data Modification: A DBMS allows users to insert, update and delete the data
in the tables. These tables contain rows and columns, where a row represents a
record of data while a column represents an attribute of the records. You can also
bulk-update several records in a DBMS with a single command.

• Data Retrieval: A DBMS allows users to fetch data from the database. Searching
and retrieval of data is fast in a DBMS. The size of the database doesn't impact this
operation; in a file system, on the other hand, the size of the data can hugely impact
the efficiency of the search operation.

• User administration: A DBMS also allows user management, such as organizing
users in different groups with different access levels, granting users access to certain
tables in the database, revoking access from certain users, etc. This allows the admin
of the database to efficiently manage access to the database and prevent unauthorized
access to the databases.

Need of DBMS

Database systems are basically developed for large amounts of data. When
dealing with huge amounts of data, there are two things that require
optimization: storage of data and retrieval of data.

• Storage: According to the principles of database systems, the data is stored in
such a way that it occupies far less space, because redundant data (duplicate data)
is removed before storage. Let's take a layman's example to understand this:
In a banking system, suppose a customer has two accounts, a savings
account and a salary account. Say the bank stores the savings account data at one
place (these places are called tables; we will learn about them later) and the salary
account data at another place. If the customer information such as customer name
and address is stored at both places, this is just a waste of storage
(redundancy/duplication of data). To organize the data in a better way, the
information should be stored at one place and both accounts should be linked to that
information somehow. This is exactly what we achieve in a DBMS.

• Fast Retrieval of data: Along with storing the data in an optimized and
systematic manner, it is also important that we retrieve the data quickly when needed.
Database systems ensure that the data is retrieved as quickly as possible.

Purpose of Database Systems

The main purpose of database systems is to manage data. Consider a
university that keeps the data of students, teachers, courses, books etc. To manage this
data we need to store it somewhere where we can add new data, delete unused
data, update outdated data and retrieve data. To perform these operations we need
a database management system that allows us to store the data in such a way that
all these operations can be performed on it efficiently.

DBMS applications

Applications where we use Database Management Systems are:

• Telecom: There is a database to keep track of information regarding calls
made, network usage, customer details etc. Without database systems it is
hard to maintain the huge amount of data that keeps updating every
millisecond.
• Industry: Whether it is a manufacturing unit, a warehouse or a distribution centre,
each one needs a database to keep records of the ins and outs. For example, a
distribution centre should keep track of the product units that are supplied into the
centre as well as the products that are delivered out from the distribution centre
each day; this is where a DBMS comes into the picture.

• Banking System: For storing customer information, tracking day-to-day credit and
debit transactions, generating bank statements etc. All this work is done with
the help of database management systems. Also, a banking system needs security
of data as the data is sensitive; this is efficiently taken care of by DBMS
systems.
• Sales: To store customer information, production information and invoice
details. Using a DBMS, you can track, manage and generate historical data to
analyse the sales data.
• Airlines: To travel through airlines, we make early reservations; this reservation
information along with the flight schedule is stored in a database. This is where the
real-time update of data is necessary, as a flight seat reserved for one
passenger should not be allocated to another passenger; this is easily handled
by DBMS systems as the data updates are in real time and fast.
• Education sector: Database systems are frequently used in schools and colleges
to store and retrieve data regarding student details, staff details, course
details, exam details, payroll data, attendance details, fees details etc. There is a
large amount of inter-related data that needs to be stored and retrieved in an
efficient manner.
• Online shopping: You must be aware of online shopping websites such as
Amazon, Flipkart etc. These sites store the product information, your addresses
and preferences, and credit details, and provide you the relevant list of products
based on your query. All this involves a database management system. Along with
managing the vast catalogue of items, there is a need to secure the user's
private information such as bank and card details. All this is taken care of by
database management systems.

Advantages and Disadvantages of DBMS:


DBMS vs file System

Drawbacks of File system

• Data redundancy:
o Data redundancy refers to the duplication of data.
o It needs more storage.
o Data redundancy often leads to higher storage costs.
o It leads to poor access time.

• Data inconsistency:
o Data redundancy leads to data inconsistency. Suppose a student is
enrolled in two courses and we have the student's address stored twice.
Now say the student requests to change his address; if the address is
changed at one place and not on all the records, this leads to data
inconsistency.

• Data Isolation:
o Because data are scattered in various files, and files may be in different
formats, writing new application programs to retrieve the appropriate
data is difficult.
• Dependency on application programs:
o Changing files would lead to changes in application programs.
• Atomicity issues:
o Atomicity of a transaction refers to "all or nothing", which means either
all the operations in a transaction execute or none do.
o It is difficult to achieve atomicity in file processing systems.
• Data Security:
o Data should be secured from unauthorised access.
o For example, a student in a college should not be able to see the payroll
details of the teachers; such security constraints are difficult to
apply in file processing systems.

Advantage of DBMS over file system


There are several advantages of a database management system over a file system. A
few of them are as follows:

• No redundant data:
o Redundancy is removed by data normalization. No data duplication saves
storage and improves access time.
• Data Consistency and Integrity:
o As we discussed earlier, the root cause of data inconsistency is data
redundancy. Since data normalization takes care of data redundancy,
data inconsistency is also taken care of as part of it.
• Data Security:
o It is easier to apply access constraints in database systems so that only
authorized users are able to access the data.
o Each user has a different set of access rights, thus data is secured from
issues such as identity theft, data leaks and misuse of data.
• Privacy:
o Limited access means privacy of data. A DBMS can grant and revoke
access to the database at the user level, which controls who is accessing
which data. It also helps users to manage constraints on the database; this
controls which type of data can be entered into a table.
• Easy access to data:
o Database systems manage data in such a way that the data is easily
accessible with fast response times. Even if the database size is huge, the
DBMS can still provide fast access and updating of data.
• Easy recovery:
o Since database systems keep backups of data, it is easier to do a full
recovery of data in case of a failure. This is very useful for
almost all organizations, as the data maintained over time should not
be lost during a system crash or failure.
• Flexible:
o Database systems are more flexible than file processing systems. DBMS
systems are scalable:
o the database size can be increased and decreased based on the amount
of storage required;
o they also allow addition of new tables as well as removal of existing
tables without disturbing the consistency of data.

Disadvantages of DBMS

• DBMS implementation cost is high compared to a file system.
• Complexity: Database systems are complex to understand.
• Performance: Database systems are generic, making them suitable for various
applications. However, this genericity affects their performance for some
applications.

View of Data
View of data in DBMS
Abstraction is one of the main features of database systems.
Hiding irrelevant details from users and providing an abstract view of data
helps in easy and efficient user-database interaction.
• The top level of that architecture is the "view level".
• The view level provides the "view of data" to the users and hides irrelevant
details such as data relationships, database schema, constraints, security etc.
from the user.
To fully understand the view of data, you must have a basic knowledge of data
abstraction and instance & schema.

Data abstraction: Database systems are made up of complex data structures. To ease
the user interaction with the database, the developers hide internal irrelevant details
from users. This process of hiding irrelevant details from users is called data abstraction.

1. Instance and schema:

• The design of a database is called the schema.
• A schema is of three types: physical schema, logical schema and view
schema.
• The data stored in the database at a particular moment of time is called an
instance of the database.
• The database schema defines the variable declarations in the tables that belong to
a particular database; the value of these variables at a moment of time is
called the instance of that database.

Three levels of abstraction

Physical level: This is the lowest level of data abstraction. It describes how data is
actually stored in the database. You can get the complex data structure details at this level.
Logical level: This is the middle level of the 3-level data abstraction architecture. It
describes what data is stored in the database.
View level: This is the highest level of data abstraction. It describes the user's
interaction with the database system.

Example: Let's say we are storing customer information in a customer table.

At the physical level these records can be described as blocks of storage (bytes, gigabytes,
terabytes etc.) in memory. These details are often hidden from the programmers.
At the logical level these records can be described as fields and attributes along with
their data types, and their relationships among each other can be logically implemented.
Programmers generally work at this level because they are aware of such things
about database systems.
At the view level, users just interact with the system with the help of a GUI and enter
details on the screen; they are not aware of how the data is stored or what data is stored.
Such details are hidden from them.

Instance and schema in DBMS


DBMS Schema
Definition of schema: The design of a database is called the schema. For example,
an employee table in a database exists with the following attributes:
EMP_NAME EMP_ID EMP_ADDRESS EMP_CONTACT

This is the schema of the employee table. The schema defines the attributes of the tables
in the database. A schema is of three types: physical schema, logical schema and view
schema.
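In SQL terms, a schema is declared with data definition statements. Below is a minimal, hedged sketch of how the employee table above might be declared; the column types are assumptions, since the notes list only the attribute names:

CREATE TABLE employee (
    EMP_ID      INT PRIMARY KEY,     -- assumed to be the key attribute
    EMP_NAME    VARCHAR(50),
    EMP_ADDRESS VARCHAR(100),
    EMP_CONTACT VARCHAR(15)
);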
• A schema represents the logical view of the database. It helps you understand
what data needs to go where.
• A schema can be represented by a diagram.
• A schema helps the database users to understand the relationships between
data. This helps in efficiently performing operations on the database such as insert,
update, delete, search etc.
Such a diagram might show a schema with the relationships between three
tables: Course, Student and Section. The diagram only shows the design of the
database; it doesn't show the data present in those tables. A schema is only a structural
view (design) of a database.

The design of a database at the physical level is called the physical schema; how the data
is stored in blocks of storage is described at this level.
The design of a database at the logical level is called the logical schema; programmers
and database administrators work at this level. At this level data can be described as
certain types of data records stored in data structures; however, internal details such as
the implementation of the data structures are hidden at this level (available at the
physical level).
The design of a database at the view level is called the view schema. This generally
describes the end user's interaction with the database system.

DBMS Instance
Definition of instance: The data stored in a database at a particular moment of
time is called an instance of the database. The database schema defines the attributes in
the tables that belong to a particular database. The value of these attributes at a moment
of time is called the instance of that database.
For example, we have seen the schema of the table "employee" above. Let's see
the table with the data now. At this moment the table contains two rows (records).

This is the current instance of the table "employee", because this is the data that is
stored in this table at this particular moment of time.

EMP_NAME   EMP_ID   EMP_ADDRESS   EMP_CONTACT
Chaitanya  101      Noida         95********
Ajeet      102      Delhi         99********
Let’s take another example: Let’s say we have a single table student in the database,
today the table has 100 records, so today the instance of the database has 100 records.
We are going to add another 100 records in this table by tomorrow so the instance of
database tomorrow will have 200 records in table. In short, at a particular moment the
data stored in database is called the instance, this changes over time as and when we
add, delete or update data in the database.

DBMS languages
Database languages are used to read, update and store data in a database. There
are several such languages that can be used for this purpose; one of them is SQL
(Structured Query Language).

Types of DBMS languages:


• DDL – Data Definition Language
• DCL – Data Control Language
• DML – Data Manipulation Language
• TCL – Transaction Control Language

Data Definition Language (DDL)


DDL is used for specifying the database schema. It is used for creating tables,
schemas, indexes, constraints etc. in the database. Let's see the operations that we can
perform on a database using DDL:

• To create databases and tables – CREATE
• To alter the structure of the database – ALTER
• To delete all rows from a table while keeping its definition – TRUNCATE
• To rename database objects – RENAME
• To drop objects from the database, such as tables – DROP
• To add comments to the data dictionary – COMMENT

All of these commands either define or update the database schema; that's why they
come under Data Definition Language.
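As a quick illustration of these commands, here is a hedged sketch of a typical DDL sequence; the student table and its columns are invented for this example:

-- CREATE defines a new table (part of the schema)
CREATE TABLE student (
    roll_no INT PRIMARY KEY,
    name    VARCHAR(50),
    address VARCHAR(100)
);

-- ALTER changes the structure of an existing table
ALTER TABLE student ADD COLUMN age INT;

-- TRUNCATE removes all rows but keeps the table definition
TRUNCATE TABLE student;

-- DROP removes the table definition itself
DROP TABLE student;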

Data Manipulation Language (DML)


DML is used for accessing and manipulating data in a database. The following
operations on a database come under DML:

• To read records from table(s) – SELECT
• To insert record(s) into the table(s) – INSERT
• To update the data in table(s) – UPDATE
• To delete record(s) from the table – DELETE
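A minimal sketch of these four operations, reusing the hypothetical student table from the DDL example above:

INSERT INTO student (roll_no, name, address) VALUES (1, 'RAM', 'DELHI');
UPDATE student SET address = 'ROHTAK' WHERE roll_no = 1;
SELECT roll_no, name FROM student WHERE address = 'ROHTAK';
DELETE FROM student WHERE roll_no = 1;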

Data Control language (DCL)


DCL is used for granting and revoking user access on a database:

• To grant access to a user – GRANT
• To revoke access from a user – REVOKE

In practice, data definition language, data manipulation language and data control
language are not separate languages; rather, they are parts of a single database language
such as SQL.

Transaction Control Language(TCL)


The changes made to the database using DML commands are either persisted
or rolled back using TCL.

• To persist the changes made by DML commands in the database – COMMIT
• To roll back the changes made to the database – ROLLBACK
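A short, hedged sketch combining DCL and TCL on the same hypothetical student table; the user name report_user is invented for illustration:

-- DCL: grant and revoke access
GRANT SELECT ON student TO report_user;
REVOKE SELECT ON student FROM report_user;

-- TCL: persist or undo DML changes
INSERT INTO student (roll_no, name, address) VALUES (2, 'RAMESH', 'GURGAON');
COMMIT;    -- the insert is now permanent

DELETE FROM student WHERE roll_no = 2;
ROLLBACK;  -- the delete is undone; roll_no 2 is still present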

DBMS Architecture
The architecture of a DBMS depends on the computer system on which it runs.
For example, in a client-server DBMS architecture, the database system at the server
machine can serve requests made by client machines.

Types of DBMS Architecture


There are three types of DBMS architecture:
• Single-tier architecture
• Two-tier architecture
• Three-tier architecture

1. Single tier architecture


In this type of architecture, the database is readily available on the client
machine; any request made by the client doesn't require a network connection to perform
an action on the database.
For example, say you want to fetch the records of employees from the
database and the database is available on your computer system. The request to
fetch employee details will be made by your computer, and the records will be fetched
from the database by your computer as well. This type of system is generally referred to
as a local database system.

2. Two tier architecture

In two-tier architecture, the database system is present at the server machine and the
DBMS application is present at the client machine; these two machines are connected
to each other through a reliable network.
Whenever the client machine makes a request to access the database present at the server
using a query language like SQL, the server performs the request on the database and
returns the result back to the client. Application connection interfaces such as
JDBC and ODBC are used for the interaction between server and client.

3. Three tier architecture

In three-tier architecture, another layer is present between the client machine and the
server machine. In this architecture, the client application doesn't communicate
directly with the database system present at the server machine; rather, the client
application communicates with a server application, and the server application internally
communicates with the database system present at the server.

Data models in DBMS

Types of Data Models


There are several types of data models in DBMS. In this
guide, we will just see a basic overview of the types of models.
Object-based logical models – Describe data at the conceptual and view levels.

1. E-R Model
2. Object oriented Model

Record-based logical models – Like object-based models, they also describe data at
the conceptual and view levels. These models specify the logical structure of the
database with records, fields and attributes.

1. Relational Model
2. Hierarchical Model
3. Network Model – The network model is the same as the hierarchical model except
that it has a graph-like structure rather than a tree-based structure. Unlike the
hierarchical model, this model allows each record to have more than one parent record.

Physical Data Models – These models describe data at the lowest level of abstraction.

Entity Relationship Diagram – ER Diagram in DBMS


An entity-relationship model (ER model) describes the structure of a database
with the help of a diagram, which is known as an Entity Relationship Diagram (ER
Diagram). An ER model is a design or blueprint of a database that can later be
implemented as a database. The main components of the E-R model are entity sets and
relationship sets.

Entity Relationship Diagram (ER Diagram)


An ER diagram shows the relationships among entity sets. An entity set is a
group of similar entities, and these entities can have attributes. In terms of DBMS, an
entity is a table or an attribute of a table in the database, so by showing the relationships
among tables and their attributes, an ER diagram shows the complete logical structure of
a database. Let's have a look at a simple ER diagram to understand this concept.

A simple ER Diagram:

In the following diagram we have two entities, Student and College, and their
relationship. The relationship between Student and College is many-to-one, as a college
can have many students but a student cannot study in multiple colleges at the
same time. The Student entity has attributes such as Stu_Id, Stu_Name and Stu_Addr,
and the College entity has attributes such as Col_ID and Col_Name.
Here are the geometric shapes and their meanings in an E-R diagram. We will
discuss these terms in detail in the next section (Components of an ER Diagram) of this
guide, so don't worry too much about these terms now; just go through them once.

Rectangle: Represents entity sets.
Ellipses: Attributes
Diamonds: Relationship sets
Lines: They link attributes to entity sets and entity sets to relationship sets
Double Ellipses: Multivalued attributes
Dashed Ellipses: Derived attributes
Double Rectangles: Weak entity sets
Double Lines: Total participation of an entity in a relationship set

Components of an ER Diagram

An ER diagram has three main components:


1. Entity
2. Attribute
3. Relationship

1. Entity
An entity is an object or component of data. An entity is represented as a rectangle in an
ER diagram.
For example: In the following ER diagram we have two entities, Student and College,
and these two entities have a many-to-one relationship, as many students study in a
single college. We will read more about relationships later; for now focus on entities.

Weak Entity:

An entity that cannot be uniquely identified by its own attributes and relies on a
relationship with another entity is called a weak entity. A weak entity is represented by a
double rectangle. For example, a bank account cannot be uniquely identified without
knowing the bank to which the account belongs, so a bank account is a weak entity.

2. Attribute
An attribute describes a property of an entity. An attribute is represented as an oval in
an ER diagram. There are four types of attributes:
1. Key attribute
2. Composite attribute
3. Multivalued attribute
4. Derived attribute

1. Key attribute:

A key attribute can uniquely identify an entity from an entity set. For example, a student
roll number can uniquely identify a student from a set of students. A key attribute is
represented by an oval, the same as other attributes; however, the text of a key attribute
is underlined.

2. Composite attribute:

An attribute that is a combination of other attributes is known as a composite attribute.
For example, in a Student entity, the student address is a composite attribute, as an
address is composed of other attributes such as pin code, state and country.

3. Multivalued attribute:
An attribute that can hold multiple values is known as a multivalued attribute. It is
represented with double ovals in an ER diagram. For example, a person can have
more than one phone number, so the phone number attribute is multivalued.

4. Derived attribute:
A derived attribute is one whose value is dynamic and derived from another attribute.
It is represented by a dashed oval in an ER diagram. For example, a person's age is a
derived attribute, as it changes over time and can be derived from another attribute
(date of birth).

E-R diagram with multivalued and derived attributes:

3. Relationship
A relationship is represented by a diamond shape in an ER diagram; it shows the
relationship among entities. There are four types of relationships:

1. One to One
2. One to Many
3. Many to One
4. Many to Many

1. One to One Relationship

When a single instance of an entity is associated with a single instance of another
entity, it is called a one-to-one relationship. For example, a person has only one
passport and a passport is given to one person.

2. One to Many Relationship

When a single instance of an entity is associated with more than one instance of
another entity, it is called a one-to-many relationship. For example, a customer can
place many orders but an order cannot be placed by many customers.

3. Many to One Relationship

When more than one instance of an entity is associated with a single instance of
another entity, it is called a many-to-one relationship. For example, many students
can study in a single college but a student cannot study in many colleges at the same
time.

4. Many to Many Relationship

When more than one instance of an entity is associated with more than one instance
of another entity, it is called a many-to-many relationship. For example, a student can be
assigned to many projects and a project can be assigned to many students.

Total Participation of an Entity set


Total participation of an entity set means that each entity in the entity set must have at
least one relationship in a relationship set. It is also called mandatory
participation. For example: in the following diagram each college must have at least
one associated student. Total participation is represented using a double line between
the entity set and the relationship set.

Partial participation of an Entity Set


Partial participation of an entity set means that each entity in the entity set
may or may not participate in a relationship instance in that relationship set. It is
also called optional participation.
Partial participation is represented using a single line between the entity set and the
relationship set.
Example: Consider an IT company. There are many employees
working for the company. Let's take the example of the relationship
between employee and the role software engineer. Every software engineer is an
employee, but not every employee is a software engineer, as there are employees in
other roles as well, such as housekeeping, managers, CEO etc. So we can say that the
participation of the employee entity set in the software engineer relationship is partial.

DBMS – ER Design Issues

1. Choosing Entity Set vs. Attributes

Here we will discuss how choosing an entity set vs. an attribute can change the whole
ER design semantics. To understand this, let's say we have an
entity set Student with attributes such as student-name and student-id. Now we can
say that the student-id itself can be an entity with attributes like student-class and
student-section.
If we compare the two cases discussed above, in the first case we can say that
a student can have only one student id; however, in the second case, when we chose
student id as an entity, it implied that a student can have more than one student id.

2. Choosing Entity Set vs. Relationship Sets


It is hard to decide whether an object is best represented by an entity set or a
relationship set. To decide between these two
(entity vs. relationship), the user needs to understand whether the entity would need a
new relationship if a requirement arises in the future; if this is the case, then it is better to
choose an entity set rather than a relationship set.
Let's take an example to understand this better: a person takes a loan from a bank. Here
we have two entities, person and bank, and their relationship is loan. This is fine until
there is a need to disburse a joint loan; in such a case a new relationship needs to be
created to define the relationship between the two individuals who have taken the joint
loan. In this scenario, it is better to choose loan as an entity set rather than a
relationship set.

3. Choosing Binary vs n-ary Relationship Sets


In most cases, the relationships described in an ER diagram are binary. n-ary
relationships are those where there are more than two entity sets; if there are
only two entity sets, their relationship can be termed a binary relationship.
n-ary relationships can make ER design complex; however, the good news is that
we can convert and represent any n-ary relationship using multiple binary
relationships.
This may sound confusing, so let's take an example to understand how we can convert
an n-ary relationship to multiple binary relationships. Say we have to
describe a relationship between four family members: father, mother, son and
daughter. This can easily be represented in the form of multiple binary relationships:
the father-mother relationship as "spouse", the son and daughter relationship as
"siblings", and the father's and mother's relationship with their child as "child".

4. Placing Relationship Attributes


The cardinality ratio in DBMS can help us determine in which scenarios we need to
place relationship attributes. It is recommended to represent the attributes of one-to-one
or one-to-many relationship sets with a participating entity set rather than with the
relationship set.
If an entity cannot be represented as a separate entity and is instead
represented by the combination of participating entity sets, it is better to
associate its attributes with a many-to-many relationship set.

ER Diagram for Library

Hospital Management System

Unit II
RELATIONAL MODEL

Relational Model
The Relational Model (RM) represents the database as a collection of relations. A
relation is nothing but a table of values. Every row in the table represents a collection of
related data values. These rows in the table denote a real-world entity or relationship.
The table name and column names help to interpret the meaning of the values in
each row. The data are represented as a set of relations. In the relational model, data are
stored as tables. However, the physical storage of the data is independent of the way the
data are logically organized.

Some popular Relational Database management systems are:

• DB2 and Informix Dynamic Server – IBM
• Oracle and RDB – Oracle
• SQL Server and Access – Microsoft

Relational Model Concepts in DBMS

1. Attribute: Each column in a table. Attributes are the properties which define a
relation, e.g., Student_Rollno, NAME, etc.
2. Tables – In the relational model, relations are saved in table format. A table has two
properties, rows and columns. Rows represent records and columns represent attributes.
3. Tuple – It is nothing but a single row of a table, which contains a single record.
4. Relation Schema: A relation schema represents the name of the relation with its
attributes.
5. Degree: The total number of attributes in the relation is called the degree of the
relation.
6. Cardinality: The total number of rows present in the table.
7. Column: A column represents the set of values for a specific attribute.
8. Relation instance – A relation instance is a finite set of tuples in the RDBMS system.
Relation instances never have duplicate tuples.
9. Relation key – Every row has one, two or more attributes that identify it, which are
called the relation key.
10. Attribute domain – Every attribute has a pre-defined set of values and scope, which
is known as the attribute domain.

Structure of Relational Database

Relational Integrity Constraints


Relational integrity constraints in DBMS refer to conditions which must hold for
a valid relation. These relational constraints in DBMS are derived from the rules in the
mini-world that the database represents.

There are many types of integrity constraints in DBMS. Constraints on a relational
database management system are mostly divided into three main categories:

1. Domain Constraints
2. Key Constraints
3. Referential Integrity Constraints

Domain Constraints
Domain constraints are violated if an attribute value does not appear in the
corresponding domain or is not of the appropriate data type.

Domain constraints specify that within each tuple, the value of each attribute must be an
atomic value from the attribute's domain. Domains are specified as data types, which
include standard data types: integers, real numbers, characters, Booleans, variable-length
strings, etc.

Example:

CREATE DOMAIN CustomerName AS VARCHAR(50)
CHECK (VALUE IS NOT NULL);

The example shown demonstrates creating a domain constraint such that CustomerName
is not NULL. (The data type VARCHAR(50) is assumed here; CREATE DOMAIN
requires one.)

Key Constraints
An attribute that can uniquely identify a tuple in a relation is called a key of the table.
The value of the attribute has to be unique for different tuples in the relation.

Example:

In the given table, CustomerID is a key attribute of the Customer table. Each customer
has a single key value; CustomerID = 1 belongs only to CustomerName = "Google".

CustomerID CustomerName Status

1 Google Active

2 Amazon Active

3 Apple Inactive

Referential Integrity Constraints


Referential integrity constraints in DBMS are based on the concept of foreign keys. A
foreign key is an attribute of a relation that is referred to in other
relations. A referential integrity constraint arises when a relation refers to a key
attribute of a different (or the same) relation; that key value must exist in the referenced
table.

Example:

In the above example, we have 2 relations, Customer and Billing.

The tuple for CustomerID = 1 is referenced twice in the relation Billing, so we know that
CustomerName = "Google" has a billing amount of $300.
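As a hedged sketch, the Customer-Billing link described here might be declared in SQL as below; the Billing columns are assumptions, since the notes describe that table only informally:

CREATE TABLE Customer (
    CustomerID   INT PRIMARY KEY,
    CustomerName VARCHAR(50),
    Status       VARCHAR(10)
);

CREATE TABLE Billing (
    BillID     INT PRIMARY KEY,                      -- assumed column
    CustomerID INT REFERENCES Customer(CustomerID),  -- the foreign key
    Amount     DECIMAL(10, 2)                        -- assumed column
);

With this declaration, an insert into Billing with a CustomerID that does not exist in Customer would be rejected by the DBMS.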

Best Practices for creating a Relational Model

• Data needs to be represented as a collection of relations
• Each relation should be depicted clearly in the table
• Rows should contain data about instances of an entity
• Columns must contain data about attributes of the entity
• Cells of the table should hold a single value
• Each column should be given a unique name
• No two rows can be identical
• The values of an attribute should be from the same domain

Advantages of Relational Database Model

• Simplicity: A relational data model in DBMS is simpler than the hierarchical and
network models.
• Structural Independence: The relational database is only concerned with data and
not with structure. This can improve the performance of the model.
• Easy to use: The relational model in DBMS is easy, as tables consisting of rows and
columns are quite natural and simple to understand.
• Query capability: It makes it possible for a high-level query language like SQL to
avoid complex database navigation.
• Data independence: The structure of a relational database can be changed without
having to change any application.
• Scalable: Regarding the number of records (rows) and the number of fields, a
database can be enlarged to enhance its usability.

Disadvantages of Relational Model

• A few relational databases have limits on field lengths which can't be exceeded.
• Relational databases can sometimes become complex as the amount of data grows
and the relations between pieces of data become more complicated.
• Complex relational database systems may lead to isolated databases where
information cannot be shared from one system to another.

Relational database systems are expected to be equipped with a query language that can assist
its users to query the database instances. There are two kinds of query languages − relational
algebra and relational calculus.

Relational Algebra
Relational algebra is a procedural query language, which takes instances of relations as input
and yields instances of relations as output. It uses operators to perform queries. An operator
can be either unary or binary. They accept relations as their input and yield relations as their
output. Relational algebra is performed recursively on a relation and intermediate results are
also considered relations.
The fundamental operations of relational algebra are as follows −

• Select
• Project
• Union
• Set difference
• Cartesian product
• Rename
We will discuss all these operations in the following sections.

Select Operation (σ)

It selects tuples that satisfy the given predicate from a relation.


Notation − σp(r)
Where σ stands for the selection predicate and r stands for the relation. p is a
propositional logic formula which may use connectives like and, or, and not. These terms
may use relational operators like =, ≠, ≥, <, >, ≤.
For example −
σsubject = "database"(Books)
Output − Selects tuples from books where subject is 'database'.
σsubject = "database" and price = "450"(Books)
Output − Selects tuples from books where subject is 'database' and 'price' is 450.
σsubject = "database" and price = "450" or year > "2010"(Books)
Output − Selects tuples from books where subject is 'database' and 'price' is 450 or those
books published after 2010.

Project Operation (∏)

It projects column(s) that satisfy a given predicate.


Notation − ∏A1, A2, ..., An (r)
Where A1, A2, ..., An are attribute names of relation r.
Duplicate rows are automatically eliminated, as relation is a set.
For example −
∏subject, author (Books)
Selects and projects columns named as subject and author from the relation Books.

Union Operation (∪)

It performs binary union between two given relations and is defined as −


r ∪ s = { t | t ∈ r or t ∈ s}
Notation − r U s
Where r and s are either database relations or relation result set (temporary relation).
For a union operation to be valid, the following conditions must hold −

• r and s must have the same number of attributes.
• Attribute domains must be compatible.
• Duplicate tuples are automatically eliminated.
∏ author (Books) ∪ ∏ author (Articles)
Output − Projects the names of the authors who have either written a book or an article or
both.

Set Difference (−)

The result of set difference operation is tuples, which are present in one relation but are not in
the second relation.
Notation − r − s
Finds all the tuples that are present in r but not in s.
∏ author (Books) − ∏ author (Articles)
Output − Provides the name of authors who have written books but not articles.

Cartesian Product (Χ)

Combines information of two different relations into one.


Notation − r Χ s
Where r and s are relations and their output will be defined as −
r Χ s = { q t | q ∈ r and t ∈ s}
σauthor = 'tutorialspoint'(Books Χ Articles)
Output − Yields a relation, which shows all the books and articles written by tutorialspoint.

Rename Operation (ρ)

The results of relational algebra are also relations, but without any name. The rename
operation allows us to rename the output relation. The 'rename' operation is denoted by
the lowercase Greek letter rho (ρ).
Notation − ρ x (E)
Where the result of expression E is saved with the name x.
Additional operations are −

• Set intersection
• Assignment
• Natural join
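Since SQL is covered later in this unit, it may help to see how these algebra operations map onto SQL. This is an illustrative, hedged sketch over the Books and Articles relations used above; the column names are the ones assumed in those examples:

-- Select: σ subject = "database" (Books)
SELECT * FROM Books WHERE subject = 'database';

-- Project: ∏ subject, author (Books); DISTINCT mirrors set semantics
SELECT DISTINCT subject, author FROM Books;

-- Union: ∏ author (Books) ∪ ∏ author (Articles)
SELECT author FROM Books UNION SELECT author FROM Articles;

-- Set difference: ∏ author (Books) − ∏ author (Articles)
SELECT author FROM Books EXCEPT SELECT author FROM Articles;

-- Cartesian product: Books Χ Articles
SELECT * FROM Books CROSS JOIN Articles;

(EXCEPT is the standard keyword; some implementations, such as Oracle, use MINUS instead.)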

Relational Calculus

In contrast to relational algebra, relational calculus is a non-procedural query language;
that is, it tells what to do but never explains how to do it.

Relational calculus exists in two forms −
Tuple Relational Calculus (TRC)
Filtering variable ranges over tuples
Notation − {T | Condition}
Returns all tuples T that satisfies a condition.
For example −
{ T.name | Author(T) AND T.article = 'database' }
Output − Returns tuples with 'name' from Author who has written article on 'database'.
TRC can be quantified. We can use Existential (∃) and Universal Quantifiers (∀).
For example −
{ R| ∃T ∈ Authors(T.article='database' AND R.name=T.name)}
Output − The above query will yield the same result as the previous one.
Domain Relational Calculus (DRC)
In DRC, the filtering variable uses the domain of attributes instead of entire tuple values (as
done in TRC, mentioned above).
Notation −
{ a1, a2, a3, ..., an | P (a1, a2, a3, ... ,an)}
Where a1, a2 are attributes and P stands for formulae built by inner attributes.
For example −
{< article, page, subject > | < article, page, subject > ∈ TutorialsPoint ∧ subject = 'database'}
Output − Yields Article, Page, and Subject from the relation TutorialsPoint, where subject is
database.
Just like TRC, DRC can also be written using existential and universal quantifiers. DRC also
involves relational operators.
The expressive power of tuple relational calculus and domain relational calculus is
equivalent to that of relational algebra.
SQL NULL Values
In SQL there may be some records in a table that do not have values or data for
every field. This is possible because at the time of data entry the information may not be
available. So SQL supports a special value known as NULL which is used to represent
the values of attributes that may be unknown or may not apply to a tuple. SQL places a
NULL value in a field in the absence of a user-defined value. For example, the
Apartment_number attribute of an address applies only to addresses that are in apartment
buildings and not to other types of residences.
Importance of NULL value:
• It is important to understand that a NULL value is different from a zero value.
• A NULL value is used to represent a missing value, but it usually has one of three
different interpretations:
   - The value is unknown (a value exists but is not known)
   - The value is not available (it exists but is purposely withheld)
   - The attribute is not applicable (undefined for this tuple)
• It is often not possible to determine which of the meanings is intended. Hence, SQL
does not distinguish between the different meanings of NULL.

Principles of NULL values:

• Setting a NULL value is appropriate when the actual value is unknown, or when a
value would not be meaningful.
• A NULL value is not equivalent to a value of zero if the data type is a number, and is
not equivalent to spaces if the data type is character.
• A NULL value can be inserted into columns of any data type.
• A NULL value will evaluate to NULL in any expression.
• If a column has a NULL value, UNIQUE, FOREIGN KEY and CHECK
constraints ignore that row.
In general, each NULL value is considered to be different from every other NULL in the
database. When a NULL is involved in a comparison operation, the result is considered
to be UNKNOWN. Hence, SQL uses a three-valued logic with values True, False,
and Unknown. It is, therefore, necessary to define the results of three-valued logical
expressions when the logical connectives AND, OR, and NOT are used.
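A small sketch of why this matters in practice, using the Employee table from the example below (assumed to have a nullable Super_ssn column):

-- A comparison with NULL evaluates to UNKNOWN, never TRUE, so this
-- returns no rows, even for employees whose Super_ssn is NULL:
SELECT Fname FROM Employee WHERE Super_ssn = NULL;

-- The correct test uses IS NULL:
SELECT Fname FROM Employee WHERE Super_ssn IS NULL;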

How to test for NULL Values?


SQL allows queries that check whether an attribute value is NULL. Rather than using =
or <> to compare an attribute value to NULL, SQL uses IS and IS NOT. This is because
SQL considers each NULL value as being distinct from every other NULL value, so an
equality comparison is not appropriate.
Now, consider the following Employee Table,

Suppose we want the Fname and Lname of the employees having no Super_ssn; then the
query will be:
Query
SELECT Fname, Lname FROM Employee WHERE Super_ssn IS NULL;
Output:

Now we find the count of the number of employees having a Super_ssn.


Query:

SELECT COUNT(*) AS Count FROM Employee WHERE Super_ssn IS NOT NULL;


Output:

Modification of Relational Database


Four basic operations performed on the relational database model are
insert, update, delete and select.

• Insert is used to insert data into the relation.
• Delete is used to delete tuples from the table.
• Update (modify) allows you to change the values of some attributes in existing tuples.
• Select allows you to choose a specific range of data.

Whenever one of these operations is applied, the integrity constraints specified on the
relational database schema must never be violated.

Insert Operation
The insert operation gives values of the attribute for a new tuple which should be inserted
into a relation.

Update Operation
You can see that in the below-given relation table CustomerName= ‘Apple’ is updated from
Inactive to Active.

Delete Operation
To specify deletion, a condition on the attributes of the relation selects the tuple to be deleted.

In the above-given example, CustomerName= “Apple” is deleted from the table.

The delete operation could violate referential integrity if the deleted tuple is
referenced by foreign keys from other tuples in the same database.

Select Operation

In the above-given example, CustomerName=”Amazon” is selected
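A hedged SQL sketch of these four operations against the Customer table used in these examples; the new row inserted below is invented for illustration:

INSERT INTO Customer (CustomerID, CustomerName, Status)
VALUES (4, 'Facebook', 'Active');                      -- insert a new tuple

UPDATE Customer SET Status = 'Active'
WHERE CustomerName = 'Apple';                          -- modify existing tuples

DELETE FROM Customer WHERE CustomerName = 'Apple';     -- delete tuples

SELECT * FROM Customer WHERE CustomerName = 'Amazon';  -- select tuples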

Structured Query Language (SQL)
Structured Query Language is a standard database language which is used to create,
maintain and retrieve relational databases. Following are some interesting facts about
SQL.
• SQL is case insensitive. But it is a recommended practice to use keywords (like
SELECT, UPDATE, CREATE, etc.) in capital letters and user-defined things (like
table names, column names, etc.) in small letters.
• We can write comments in SQL using "--" (double hyphen) at the beginning of any
line.
• SQL is the programming language for relational databases (explained below) like
MySQL, Oracle, Sybase, SQL Server, PostgreSQL, etc. Other, non-relational databases
(also called NoSQL databases) like MongoDB, DynamoDB, etc. do not use SQL.
• Although there is an ISO standard for SQL, most of the implementations vary slightly
in syntax. So we may encounter queries that work in SQL Server but do not work in
MySQL.
What is Relational Database?
Relational database means the data is stored as well as retrieved in the form of relations
(tables). Table 1 shows the relational database with only one relation
called STUDENT which stores ROLL_NO, NAME, ADDRESS, PHONE and AGE of
students.
STUDENT

ROLL_NO NAME ADDRESS PHONE AGE

1 RAM DELHI 9455123451 18

2 RAMESH GURGAON 9652431543 18

3 SUJIT ROHTAK 9156253131 20

4 SURESH DELHI 9156768971 18

TABLE 1

These are some important terminologies that are used in terms of relation.
Attribute: Attributes are the properties that define a relation, e.g., ROLL_NO, NAME etc.
Tuple: Each row in the relation is known as a tuple. The above relation contains 4 tuples,
one of which is shown as:
1 RAM DELHI 9455123451 18

Degree: The number of attributes in the relation is known as degree of the relation.
The STUDENT relation defined above has degree 5.
Cardinality: The number of tuples in a relation is known as cardinality.
The STUDENT relation defined above has cardinality 4.

Column: Column represents the set of values for a particular attribute. The
column ROLL_NO is extracted from relation STUDENT.

ROLL_NO
1
2
3
4

The queries to deal with a relational database can be categorized as:

Data Definition Language: It is used to define the structure of the database, e.g.,
CREATE TABLE, ADD COLUMN, DROP COLUMN and so on.
Data Manipulation Language: It is used to manipulate data in the relations, e.g.,
INSERT, DELETE, UPDATE and so on.
Data Query Language: It is used to extract data from the relations, e.g., SELECT.
So first we will consider the Data Query Language. A generic query to retrieve from a
relational database is:
1. SELECT [DISTINCT] Attribute_List FROM R1, R2, ..., RM
2. [WHERE condition]
3. [GROUP BY (Attributes)[HAVING condition]]
4. [ORDER BY(Attributes)[DESC]];

Part of the query represented by statement 1 is compulsory if you want to retrieve from a
relational database. The statements written inside [] are optional. We will look at the
possible query combination on relation shown in Table 1.
Case 1: If we want to retrieve attributes ROLL_NO and NAME of all students, the query
will be:

SELECT ROLL_NO, NAME FROM STUDENT;

ROLL_NO NAME

1 RAM

2 RAMESH

3 SUJIT

4 SURESH

Case 2: If we want to retrieve ROLL_NO and NAME of the students


whose ROLL_NO is greater than 2, the query will be:
SELECT ROLL_NO, NAME FROM STUDENT WHERE ROLL_NO>2;

ROLL_NO NAME

3 SUJIT

4 SURESH

CASE 3: If we want to retrieve all attributes of students, we can write * in place of writing
all attributes as:
SELECT * FROM STUDENT WHERE ROLL_NO>2;

ROLL_NO NAME ADDRESS PHONE AGE

3 SUJIT ROHTAK 9156253131 20

4 SURESH DELHI 9156768971 18

CASE 4: If we want to represent the relation in ascending order by AGE, we can use
ORDER BY clause as:
SELECT * FROM STUDENT ORDER BY AGE;

ROLL_NO NAME ADDRESS PHONE AGE

1 RAM DELHI 9455123451 18

2 RAMESH GURGAON 9652431543 18

4 SURESH DELHI 9156768971 18

3 SUJIT ROHTAK 9156253131 20

Note: ORDER BY AGE is equivalent to ORDER BY AGE ASC. If we want to retrieve the
results in descending order of AGE, we can use ORDER BY AGE DESC.

CASE 5: If we want to retrieve distinct values of an attribute or group of attribute,
DISTINCT is used as in:
SELECT DISTINCT ADDRESS FROM STUDENT;

ADDRESS

DELHI

GURGAON

ROHTAK

If DISTINCT is not used, DELHI will be repeated twice in the result set. Before
understanding GROUP BY and HAVING, we need to understand aggregation functions
in SQL.
AGGREGATION FUNCTIONS: Aggregation functions are used to perform
mathematical operations on the data values of a relation. Some of the common
aggregation functions used in SQL are:
• COUNT: The COUNT function is used to count the number of rows in a relation. e.g.:
SELECT COUNT (PHONE) FROM STUDENT;

COUNT(PHONE)

4

• SUM: The SUM function is used to add the values of an attribute in a relation. e.g.:
SELECT SUM (AGE) FROM STUDENT;

SUM(AGE)

74

In the same way, MIN, MAX and AVG can be used. As we have seen above, all
aggregation functions return only 1 row.
AVERAGE: It gives the average value of the tuples. It is also defined as sum divided by
count.
Syntax: AVG(attributename)
or
Syntax: SUM(attributename)/COUNT(attributename)
The above-mentioned syntax also retrieves the average value of the tuples.
MAXIMUM: It extracts the maximum value among the set of tuples.
Syntax: MAX(attributename)
MINIMUM: It extracts the minimum value among the set of tuples.
Syntax: MIN(attributename)

GROUP BY: GROUP BY is used to group the tuples of a relation based on an attribute
or group of attributes. It is always combined with an aggregation function which is
computed per group. e.g.:

SELECT ADDRESS, SUM(AGE) FROM STUDENT


GROUP BY (ADDRESS);

In this query, SUM(AGE) will be computed, not for the entire table but for each address,
i.e., the sum of AGE for address DELHI (18+18=36), and similarly for the other
addresses. The output is:

ADDRESS SUM(AGE)

DELHI 36

GURGAON 18

ROHTAK 20

If we try to execute the query given below, it will result in an error, because although we
have computed SUM(AGE) for each address, there is more than one ROLL_NO for each
address we have grouped, so they can't be displayed in the result set. Whenever we use
GROUP BY, the columns after the SELECT statement must either be grouping columns
or be wrapped in aggregate functions to make sense of the resulting set.
SELECT ROLL_NO, ADDRESS, SUM(AGE) FROM STUDENT
GROUP BY (ADDRESS);
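HAVING, which appeared in the generic query template earlier, filters the groups produced by GROUP BY. A small sketch on the same STUDENT relation:

-- List only the addresses shared by more than one student.
-- In Table 1, only DELHI (RAM and SURESH) qualifies.
SELECT ADDRESS, COUNT(*) FROM STUDENT
GROUP BY ADDRESS
HAVING COUNT(*) > 1;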

Advanced SQL
Accessing SQL From a Programming Language
■ API (application-program interface) for a program to interact with a database server
■ Application makes calls to
● Connect with the database server
● Send SQL commands to the database server
● Fetch tuples of result one-by-one into program variables
■ Various tools:
● ODBC (Open Database Connectivity) works with C, C++, C#, and Visual
Basic. Other APIs such as ADO.NET sit on top of ODBC
● JDBC (Java Database Connectivity) works with Java
● Embedded SQL

Integrity Constraints
• The set of rules which is used to maintain the quality of information is known
as integrity constraints.
• Integrity constraints ensure that data insertion, updates and other operations do not
damage data quality.
• Integrity constraints can be understood as a guard against unintentional damage to
the database.
For any stored data, if we want to preserve consistency and correctness, a relational
DBMS typically imposes one or more data integrity constraints. These constraints restrict
the data values which can be inserted into the database or created by a database update.

Data Integrity Constraints

There are different types of data integrity constraints that are commonly found in
relational databases, including the following:
• Required data − Some columns in a database must contain a valid data value in every
row; they are not allowed to contain NULL values. In the sample database, every order
has an associated customer who placed the order. The DBMS can be asked to prevent
NULL values in this column.
• Validity checking − Every column in a database has a domain, a set of data values
which are legal for that column. The DBMS can be asked to prevent other data values in
these columns.
• Entity integrity − The primary key of a table contains a unique value in each row that
is different from the values in all other rows. Duplicate values are illegal, because they
would not allow the database to differentiate one entity from another. The DBMS can
be asked to enforce this unique-values constraint.
• Referential integrity − A foreign key in a relational database links each row in the
child table containing the foreign key to the row of the parent table containing the
matching primary key value. The DBMS can be asked to enforce this foreign
key/primary key constraint.
• Other data relationships − The real-world situation which is modeled by a database
often has additional constraints which govern the legal data values that may appear in
the database. The DBMS can be asked to check modifications to the tables to make sure
that their values are constrained in this way.
• Business rules − Updates to a database can be constrained by business rules
governing the real-world transactions which are represented by the updates.
• Consistency − Many real-world transactions cause multiple updates to a database.
The DBMS can be asked to enforce this type of consistency rule or to support
applications that implement such rules.
Different types of Integrity Constraints
Domain Constraint
 The definition of an applicable set of values for an attribute is known as a domain constraint.
 Data types such as string, character, time, integer, currency, and date are examples used in domain constraints.
Example

ID NAME SEMESTER AGE
100 Jai 1st 27
101 BKadam 4th 34
102 Rajeev 3rd 31
103 Asmita 6th 29
104 Mahesh 2nd Twenty two

'Twenty two' is not allowed for ID 104 because the attribute AGE is of integer type.
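In SQL, such a domain constraint is expressed through the column's data type, optionally tightened with a CHECK clause (a minimal sketch; the table name STUDENT_INFO is hypothetical):
CREATE TABLE STUDENT_INFO (
    ID INT,
    NAME VARCHAR(30),
    SEMESTER VARCHAR(5),
    AGE INT CHECK (AGE > 0)
);
With AGE declared as INT, an INSERT that supplies the string 'Twenty two' for AGE is rejected.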
Entity Integrity Constraint

 The entity integrity constraint states that the primary key value cannot be null, because the primary key value is used to identify individual rows in a relation; if the primary key value were null, those rows could not be identified.
 Apart from the primary key field, the table may contain null values.
Example

Emp_ID Emp_Name Salary
11 Manish 30000
12 Vikram 20000
13 Sudhir 10000
(null) Rajeev 40000
Null is not allowed in Emp_ID as it is a Primary key and cannot have a NULL value.
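A PRIMARY KEY declaration enforces this automatically, making the column both unique and NOT NULL (a sketch using the example's column names; the table name is hypothetical):
CREATE TABLE EMPLOYEE (
    Emp_ID INT PRIMARY KEY,
    Emp_Name VARCHAR(30),
    Salary INT
);
An INSERT that leaves Emp_ID NULL, like the Rajeev row above, is rejected by the DBMS.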
Referential Integrity Constraint

1. A referential integrity constraint is specified between two tables.
2. A foreign key in the first table refers to the primary key of the second table; in this case, each value of the foreign key in the first table must either be null or be present in the second table.
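In SQL this is declared with a FOREIGN KEY clause (a minimal sketch with hypothetical DEPT and EMP tables):
CREATE TABLE DEPT (
    Dept_ID INT PRIMARY KEY,
    Dept_Name VARCHAR(30)
);
CREATE TABLE EMP (
    Emp_ID INT PRIMARY KEY,
    Dept_ID INT,
    FOREIGN KEY (Dept_ID) REFERENCES DEPT(Dept_ID)
);
Every non-null EMP.Dept_ID value must now match an existing row in DEPT.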
Key Constraints
 An entity within its entity set is identified uniquely by a key of that entity set.
 There can be a number of keys in an entity set, but only one of them will be the primary key. In a relational table, a primary key must contain unique values and cannot contain null values.
Example

ID NAME SEMESTER AGE
100 Naren 4 27
101 Lalit 6 28
102 Shivanshu 3 22
103 Navdeep 5 29
102 Karthik 7 25

All ID values must be unique; hence the duplicate ID 102 is not allowed.

Database authorization
Authorization is the process where the database manager gets information about the
authenticated user. Part of that information is determining which database operations the user
can perform and which data objects a user can access.

A privilege is a type of permission for an authorization name, or a permission to perform an
action or a task. The privilege allows a user to create or access database resources. Privileges
are stored in the database catalogs. Authorized users can pass on privileges on their own
objects to other users by using the GRANT statement. Privileges can be granted to individual
users, to groups, or to PUBLIC. PUBLIC is a special group that consists of all users,
including future users. Users that are members of a group indirectly take advantage of
the privileges granted to the group, where groups are supported.
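For example, privileges on a table can be granted to a single user or to PUBLIC (a sketch; STUDENT is the sample table used earlier and RAVI a hypothetical user name):
GRANT SELECT, INSERT ON STUDENT TO RAVI;
GRANT SELECT ON STUDENT TO PUBLIC;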

39
A role is a database object that groups one or more privileges. Roles can be assigned to users
or groups or other roles by using the GRANT statement. Users that are members of roles
have the privileges that are defined for the role with which to access data.
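Roles are created and granted in a similar way (a sketch with a hypothetical role name; exact syntax varies slightly by DBMS):
CREATE ROLE CLERK;
GRANT SELECT ON STUDENT TO CLERK;
GRANT CLERK TO RAVI;
RAVI now holds, through the CLERK role, the SELECT privilege defined for it.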

The forms of authorization, such as administrative authority, privileges, and row and
column access control (RCAC), are discussed below. In addition, ownership of objects
brings with it a degree of authorization on the objects created.

 Administrative authority includes system-level authorization and database-level authorization:

System-level authorization
SYSADM (system administrator) authority
The SYSADM (system administrator) authority provides control over all the resources
created and maintained by the database manager. The system administrator possesses all the
authorities of SYSCTRL, SYSMAINT, and SYSMON authority. The user who has
SYSADM authority is responsible both for controlling the database manager, and for
ensuring the safety and integrity of the data.
SYSCTRL authority
The SYSCTRL authority provides control over operations that affect system resources. For
example, a user with SYSCTRL authority can create, update, start, stop, or drop a database.
This user can also start or stop an instance, but cannot access table data. Users with
SYSCTRL authority also have SYSMON authority.
SYSMAINT authority
The SYSMAINT authority provides the authority required to perform maintenance
operations on all databases that are associated with an instance. A user with SYSMAINT
authority can update the database configuration, backup a database or table space, restore
an existing database, and monitor a database. Like SYSCTRL, SYSMAINT does not
provide access to table data. Users with SYSMAINT authority also have SYSMON
authority.
SYSMON (system monitor) authority
The SYSMON (system monitor) authority provides the authority required to use the
database system monitor.
Database-level authorization
DBADM (database administrator)
The DBADM authority level provides administrative authority over a single database. This
database administrator possesses the privileges required to create objects and issue database
commands. The DBADM authority can be granted only by a user with SECADM authority.
The DBADM authority cannot be granted to PUBLIC.
SECADM (security administrator)
The SECADM authority level provides administrative authority for security over a single
database. The security administrator authority possesses the ability to manage database
security objects (database roles, audit policies, trusted contexts, security label components,
and security labels) and grant and revoke all database privileges and authorities. A user
with SECADM authority can transfer the ownership of objects that they do not own. They
can also use the AUDIT statement to associate an audit policy with a particular database or
database object at the server. The SECADM authority has no inherent privilege to access
data stored in tables. It can only be granted by a user with SECADM authority. The
SECADM authority cannot be granted to PUBLIC.
SQLADM (SQL administrator)
The SQLADM authority level provides administrative authority to monitor and tune SQL
statements within a single database. It can be granted by a user with ACCESSCTRL or
SECADM authority.
WLMADM (workload management administrator)
The WLMADM authority provides administrative authority to manage workload
management objects, such as service classes, work action sets, work class sets, and
workloads. It can be granted by a user with ACCESSCTRL or SECADM authority.
EXPLAIN (explain authority)
The EXPLAIN authority level provides administrative authority to explain query plans
without gaining access to data. It can only be granted by a user with ACCESSCTRL or
SECADM authority.
ACCESSCTRL (access control authority)
ACCESSCTRL authority can only be granted by a user with SECADM authority. The
ACCESSCTRL authority cannot be granted to PUBLIC. The ACCESSCTRL authority
level provides administrative authority to issue the following GRANT (and REVOKE)
statements:

o GRANT (Database Authorities)
o GRANT (Global Variable Privileges)
o GRANT (Index Privileges)
o GRANT (Module Privileges)
o GRANT (Package Privileges)
o GRANT (Routine Privileges)
o GRANT (Schema Privileges)
o GRANT (Sequence Privileges)
o GRANT (Server Privileges)
o GRANT (Table, View, or Nickname Privileges)
o GRANT (Table Space Privileges)
o GRANT (Workload Privileges)
o GRANT (XSR Object Privileges)

DATAACCESS (data access authority)

DATAACCESS authority can be granted only by a user who holds SECADM authority. It
cannot be granted to PUBLIC. The DATAACCESS authority level provides the following
privileges and authorities:

 LOAD authority
 SELECT, INSERT, UPDATE, DELETE privilege on tables, views, nicknames, and
materialized query tables
 EXECUTE privilege on packages
 EXECUTE privilege on modules
 EXECUTE privilege on routines, except on the audit routines.
 USAGE privilege on all sequences

Database authorities (non-administrative)
To perform activities such as creating a table or a routine, or for loading data into a table,
specific database authorities are required. For example, the LOAD database authority is
required for use of the load utility to load data into tables (a user must also have INSERT
privilege on the table).
 Privileges
CONTROL privilege
If you possess the CONTROL privilege on an object, you can access that database object,
and grant and revoke privileges to or from other users on that object. The CONTROL
privilege applies only to tables, views, nicknames, indexes, and packages.

If a different user requires the CONTROL privilege on that object, a user with SECADM or
ACCESSCTRL authority can grant the CONTROL privilege on that object. The CONTROL
privilege cannot be revoked from the object owner; however, the object owner can be
changed by using the TRANSFER OWNERSHIP statement.

Individual privileges
Individual privileges can be granted to allow a user to carry out specific tasks on specific
objects. Users with the administrative authorities ACCESSCTRL or SECADM, or with the
CONTROL privilege, can grant and revoke privileges to and from users.
Revoking privileges
The REVOKE statement is used to revoke previously granted privileges. The revoking of a
privilege from an authorization name revokes the privilege granted by all authorization
names.
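For example, continuing the earlier GRANT sketch:
REVOKE INSERT ON STUDENT FROM RAVI;
After this statement, RAVI retains only the SELECT privilege granted earlier.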

42
Authorization ID privileges: SETSESSIONUSER
Authorization ID privileges involve actions on authorization IDs. There is currently only
one such privilege: the SETSESSIONUSER privilege.
Schema privileges
Schema privileges are in the object privilege category.
Table and view privileges
Table and view privileges involve actions on tables or views in a database.
Package privileges
A package is a database object that contains the information needed by the database
manager to access data in the most efficient way for a particular application program.
Package privileges enable a user to create and manipulate packages.
Sequence privileges
The creator of a sequence automatically receives the USAGE and ALTER privileges on the
sequence. The USAGE privilege is needed to use NEXT VALUE and PREVIOUS VALUE
expressions for the sequence.
Routine privileges
Execute privileges involve actions on all types of routines, such as functions, procedures,
and methods within a database. Once a user has the EXECUTE privilege on a routine, the
user can invoke that routine, create a function that is sourced from that routine (applies to
functions only), and reference the routine in any DDL statement such as CREATE VIEW
or CREATE TRIGGER.
Usage privilege on workloads
To enable use of a workload, a user who holds ACCESSCTRL, SECADM, or WLMADM
authority can grant USAGE privilege on that workload to a user, a group, or a role using
the GRANT USAGE ON WORKLOAD statement.

Introduction to embedded SQL

Embedded SQL applications connect to databases and execute embedded SQL statements.
The embedded SQL statements are contained in a package that must be bound to the target
database server.

You can develop embedded SQL applications for the Db2® database in the following host
programming languages: C, C++, and COBOL.

Building embedded SQL applications involves two prerequisite steps before application
compilation and linking.

 Preparing the source files containing embedded SQL statements using the Db2 precompiler.
The PREP (PRECOMPILE) command is used to invoke the Db2 precompiler, which
reads your source code, parses and converts the embedded SQL statements
to Db2 run-time services API calls, and finally writes the output to a new modified
source file. The precompiler produces access plans for the SQL statements, which are
stored together as a package within the database.

 Binding the statements in the application to the target database.
Binding is done by default during precompilation (the PREP command). If binding is to be deferred (for example, running the BIND command later), then the BINDFILE option needs to be specified at PREP time in order for a bind file to be generated.

Once you have precompiled and bound your embedded SQL application, it is ready to be
compiled and linked using the host language-specific development tools.

To aid in the development of embedded SQL applications, you can refer to the embedded
SQL template in C. Examples of working embedded SQL sample applications can also be
found in the %DB2PATH%\SQLLIB\samples directory.
Note: %DB2PATH% refers to the Db2 installation directory.
Static and dynamic SQL

SQL statements can be executed in one of two ways: statically or dynamically.

Statically executed SQL statements
For statically executed SQL statements, the syntax is fully known at precompile
time. The structure of an SQL statement must be completely specified for a
statement to be considered static. For example, the names for the columns and tables
referenced in a statement must be fully known at precompile time. The only
information that can be specified at run time are values for any host variables
referenced by the statement. However, host variable information, such as data types,
must still be precompiled. You precompile, bind, and compile statically executed
SQL statements before you run your application. Static SQL is best used on
databases whose statistics do not change a great deal.
Dynamically executed SQL statements
Dynamically executed SQL statements are built and executed by an application at
run-time. An interactive application that prompts the end user for key parts of an
SQL statement, such as the names of the tables and columns to be searched, is a
good example of a situation suited for dynamic SQL.

 Embedding SQL statements in a host language
Structured Query Language (SQL) is a standardized language that you can use to manipulate database objects and the data that they contain. Despite differences between host languages, embedded SQL applications are made up of three main elements that are required to set up and issue an SQL statement.
 Supported development software for embedded SQL applications
Before you begin writing embedded SQL applications, you must determine if your
development software is supported. The operating system that you are developing for
determines which compilers, interpreters, and development software you must use.
 Setting up the embedded SQL development environment
Before you can start building embedded SQL applications, install the supported
compiler for the host language you will be using to develop your applications and set
up the embedded SQL environment.
 Designing embedded SQL applications
When designing embedded SQL applications, you must choose between statically and dynamically executed SQL statements.
 Programming embedded SQL applications
Programming embedded SQL applications involves the same steps required to
assemble an application in your host programming language.
 Building embedded SQL applications
After you have created the source code for your embedded SQL application, you must
follow additional steps to build the application. You should consider building 64-bit
executable files when developing new embedded SQL database applications. Along
with compiling and linking your program, you must precompile and bind it.
 Deploying and running embedded SQL applications
Embedded SQL applications are portable and can be placed in remote database
components. You can compile the application in one location and run the package on
a different component.
 Compatibility features for migration
The Db2 database manager provides features that facilitate the migration of embedded
SQL C applications from other database systems.

Dynamic SQL

Dynamic SQL enables you to write programs that reference SQL statements whose full text is
not known until runtime. Before discussing dynamic SQL in detail, a clear definition of static
SQL may provide a good starting point for understanding dynamic SQL. Static SQL
statements do not change from execution to execution. The full text of a static SQL statement
is known at compilation, which provides the following benefits:

 Successful compilation verifies that the SQL statements reference valid database
objects.
 Successful compilation verifies that the necessary privileges are in place to access the
database objects.
 Performance of static SQL is generally better than dynamic SQL.

Because of these advantages, you should use dynamic SQL only if you cannot use static SQL
to accomplish your goals, or if using static SQL is cumbersome compared to dynamic SQL.
However, static SQL has limitations that can be overcome with dynamic SQL. You may not
always know the full text of the SQL statements that must be executed in a PL/SQL
procedure. Your program may accept user input that defines the SQL statements to execute,
or your program may need to complete some processing work to determine the correct course
of action. In such cases, you should use dynamic SQL.
For example, consider a reporting application that performs standard queries on tables in a
data warehouse environment where the exact table name is unknown until runtime. To
accommodate the large amount of data in the data warehouse efficiently, you create a new
table every quarter to store the invoice information for the quarter. These tables all have
exactly the same definition and are named according to the starting month and year of the
quarter, for
example INV_01_1997, INV_04_1997, INV_07_1997, INV_10_1997, INV_01_1998, etc. In
such a case, you can use dynamic SQL in your reporting application to specify the table name
at runtime.
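A minimal PL/SQL sketch of this pattern using native dynamic SQL (the variable names and the amount column are hypothetical):
DECLARE
    tab_name VARCHAR2(30) := 'INV_01_1998';
    total NUMBER;
BEGIN
    -- The statement text is assembled at run time, then executed
    EXECUTE IMMEDIATE 'SELECT SUM(amount) FROM ' || tab_name INTO total;
END;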

With static SQL, all of the data definition information, such as table definitions, referenced
by the SQL statements in your program must be known at compilation. If the data definition
changes, you must change and recompile the program. Dynamic SQL programs can handle
changes in data definition information, because the SQL statements can change "on the fly"
at runtime. Therefore, dynamic SQL is much more flexible than static SQL. Dynamic SQL
enables you to write application code that is reusable because the code defines a process that
is independent of the specific SQL statements used.

In addition, dynamic SQL lets you execute SQL statements that are not supported in static
SQL programs, such as data definition language (DDL) statements. Support for these
statements allows you to accomplish more with your PL/SQL programs.

Tuple Relational Calculus
Tuple Relational Calculus is a non-procedural query language unlike relational algebra.
Tuple Calculus provides only the description of the query but it does not provide the
methods to solve it. Thus, it explains what to do but not how to do.
In Tuple Calculus, a query is expressed as
{t| P(t)}
where t = resulting tuples,
P(t) = known as Predicate and these are the conditions that are used to fetch t
Thus, it generates set of all tuples t, such that Predicate P(t) is true for t.
P(t) may have various conditions logically combined with OR (∨), AND (∧), NOT(¬).
It also uses quantifiers:
∃ t ∈ r (Q(t)) = ”there exists” a tuple in t in relation r such that predicate Q(t) is true.
∀ t ∈ r (Q(t)) = Q(t) is true “for all” tuples in relation r.
Example:
Table-1: Customer
Customer name Street City
Saurabh A7 Patiala
Mehak B6 Jalandhar
Sumiti D9 Ludhiana
Ria A5 Patiala

Table-2: Branch
Branch name Branch city
ABC Patiala
DEF Ludhiana
GHI Jalandhar
Table-3: Account
Account number Branch name Balance
1111 ABC 50000
1112 DEF 10000
1113 GHI 9000
1114 ABC 7000
Table-4: Loan
Loan number Branch name Amount
L33 ABC 10000
L35 DEF 15000
L49 GHI 9000
L98 DEF 65000
Table-5: Borrower
Customer name Loan number
Saurabh L33
Mehak L49
Ria L98
Table-6: Depositor
Customer name Account number
Saurabh 1111
Mehak 1113
Sumiti 1114
Queries-1: Find the loan number, branch, and amount of loans greater than or equal to 10000.
{t | t ∈ loan ∧ t[amount] >= 10000}
Resulting relation:

Loan number Branch name Amount
L33 ABC 10000
L35 DEF 15000
L98 DEF 65000

In the above query, t is a tuple variable, and t[amount] refers to the value of its amount attribute.
Queries-2: Find the loan number for each loan of an amount greater than or equal to 10000.
{t | ∃ s ∈ loan (t[loan number] = s[loan number] ∧ s[amount] >= 10000)}
Resulting relation:

Loan number
L33
L35
L98

Queries-3: Find the names of all customers who have a loan and an account at the bank.
{t | ∃ s ∈ borrower( t[customer-name] = s[customer-name])
∧ ∃ u ∈ depositor( t[customer-name] = u[customer-name])}
Resulting relation:

Customer name
Saurabh
Mehak
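For comparison, an equivalent SQL formulation of this tuple calculus query might be (a sketch, assuming a customer_name column in both tables):
SELECT DISTINCT b.customer_name
FROM borrower b, depositor d
WHERE b.customer_name = d.customer_name;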
Queries-4: Find the names of all customers having a loan at the “ABC” branch.
{t | ∃ s ∈ borrower(t[customer-name] = s[customer-name]
∧ ∃ u ∈ loan(u[branch-name] = “ABC” ∧ u[loan-number] = s[loan-number]))}
Resulting relation:

Customer name
Saurabh
Domain Relational Calculus
Domain Relational Calculus is a non-procedural query language equivalent in power to
Tuple Relational Calculus. Domain Relational Calculus provides only the description of the
query but it does not provide the methods to solve it. In Domain Relational Calculus, a
query is expressed as,
{ < x1, x2, x3, ..., xn > | P (x1, x2, x3, ..., xn ) }
where, < x1, x2, x3, …, xn > represents resulting domains variables and P (x1, x2, x3, …, xn )
represents the condition or formula equivalent to the Predicate calculus.

Predicate Calculus Formula:
1. Set of all comparison operators
2. Set of connectives like and, or, not
3. Set of quantifiers
Example:
Table-1: Customer
Customer name Street City
Debomit Kadamtala Alipurduar
Sayantan Udaypur Balurghat
Soumya Nutanchati Bankura
Ritu Juhu Mumbai
Table-2: Loan
Loan number Branch name Amount
L01 Main 200
L03 Main 150
L10 Sub 90
L08 Main 60
Table-3: Borrower
Customer name Loan number
Ritu L01
Debomit L08
Soumya L03
Query-1: Find the loan number, branch, and amount of loans greater than or equal to 100.
{≺l, b, a≻ | ≺l, b, a≻ ∈ loan ∧ (a ≥ 100)}
Resulting relation:

Loan number Branch name Amount
L01 Main 200
L03 Main 150
Query-2: Find the loan number for each loan of an amount greater than or equal to 150.
{≺l≻ | ∃ b, a (≺l, b, a≻ ∈ loan ∧ (a ≥ 150))}
Resulting relation:

Loan number
L01
L03

Query-3: Find the names of all customers having a loan at the “Main” branch, and find the loan amount.
{≺c, a≻ | ∃ l (≺c, l≻ ∈ borrower ∧ ∃ b (≺l, b, a≻ ∈ loan ∧ (b = “Main”)))}
Resulting relation:

Customer Name Amount
Ritu 200
Debomit 60
Soumya 150

Query By Example (QBE)

The queries we fire on a database must be correct and well structured, i.e., they must follow the proper syntax; if the syntax or the query is wrong, we get an error, and the application or calculation stops. QBE was introduced to overcome this problem. QBE stands for Query By Example, and it was developed in 1970 by Moshe Zloof at IBM.
It is a graphical query language: we are given a user interface and fill in some required fields to get the proper result.
In SQL we get an error if the query is not correct, but in the case of QBE, if the query is wrong, either we get a wrong answer or the query will not execute; we never get an error.

Note:
In QBE we don't write complete queries as in SQL or other database languages; it comes with blanks that we simply fill in to get the required result.

Example
Consider an example where a table 'SAC' is present in the database with Name, Phone_Number, and Branch fields, and we want to get the name of the SAC representative who belongs to the MCA branch. In SQL we would write:
SELECT NAME
FROM SAC
WHERE BRANCH = 'MCA';
And we will certainly get the correct result. In the case of QBE, there may simply be a field present on the form; we fill it with "MCA", click the SEARCH button, and get the required result.
Points about QBE:
 Supported by most of the database programs.
 It is a graphical query language.
 Created in parallel to SQL development.

SQL Trigger
Trigger: A trigger is a stored procedure in a database that is automatically invoked whenever
a special event occurs in the database. For example, a trigger can be invoked when a row is
inserted into a specified table or when certain table columns are updated.

Syntax:
create trigger [trigger_name]
[before | after]
{insert | update | delete}
on [table_name]
[for each row]
[trigger_body]

Explanation of syntax:
1. create trigger [trigger_name]: Creates or replaces an existing trigger with the
trigger_name.
2. [before | after]: This specifies when the trigger will be executed.
3. {insert | update | delete}: This specifies the DML operation.
4. on [table_name]: This specifies the name of the table associated with the trigger.
5. [for each row]: This specifies a row-level trigger, i.e., the trigger will be executed for
each row being affected.
6. [trigger_body]: This provides the operation to be performed when the trigger is fired.

BEFORE and AFTER of Trigger:

BEFORE triggers run the trigger action before the triggering statement is run.
AFTER triggers run the trigger action after the triggering statement is run.

Example:
Given a Student Report database in which students' marks assessments are recorded, create
a trigger so that the total and the percentage of the specified marks are automatically
inserted whenever a record is inserted.
Here, as the trigger must act before the record is inserted, the BEFORE tag is used.

52
Suppose the database schema is −
mysql> desc Student;
+-------+-------------+------+-----+---------+----------------+
| Field | Type        | Null | Key | Default | Extra          |
+-------+-------------+------+-----+---------+----------------+
| tid   | int(4)      | NO   | PRI | NULL    | auto_increment |
| name  | varchar(30) | YES  |     | NULL    |                |
| subj1 | int(2)      | YES  |     | NULL    |                |
| subj2 | int(2)      | YES  |     | NULL    |                |
| subj3 | int(2)      | YES  |     | NULL    |                |
| total | int(3)      | YES  |     | NULL    |                |
| per   | int(3)      | YES  |     | NULL    |                |
+-------+-------------+------+-----+---------+----------------+
7 rows in set (0.00 sec)
The SQL trigger for the problem statement (in a MySQL trigger body, the row being inserted is referenced through NEW):
create trigger stud_marks
before INSERT on Student
for each row
set NEW.total = NEW.subj1 + NEW.subj2 + NEW.subj3,
    NEW.per = (NEW.subj1 + NEW.subj2 + NEW.subj3) * 60 / 100;
The above statement creates a trigger on the Student table: whenever subject marks are
entered, the trigger computes the two derived values before the row is inserted and stores
them along with the entered values, i.e.,
mysql> insert into Student values(0, "ABCDE", 20, 20, 20, 0, 0);
Query OK, 1 row affected (0.09 sec)
mysql> select * from Student;
+-----+-------+-------+-------+-------+-------+------+
| tid | name  | subj1 | subj2 | subj3 | total | per  |
+-----+-------+-------+-------+-------+-------+------+
| 100 | ABCDE |    20 |    20 |    20 |    60 |   36 |
+-----+-------+-------+-------+-------+-------+------+
1 row in set (0.00 sec)
Unit 3
Database Design

Functional dependencies in DBMS

A functional dependency is a constraint that specifies the relationship between two
sets of attributes, where one set can accurately determine the value of the other set. It is
denoted as X → Y, where X is a set of attributes capable of determining the value of
Y. The attribute set on the left side of the arrow, X, is called the determinant, while the
right side, Y, is called the dependent.
Functional dependencies are used to mathematically express relations among
database entities and are very important for understanding advanced concepts in relational
database systems.
Example:
Roll_No Name Dept_Name Dept_Building
42 abc CO A4
43 pqr IT A3
44 xyz CO A4
45 xyz IT A3
46 mno EC B2
47 jkl ME B2
From the above table we can conclude some valid functional dependencies:
 roll_no → {name, dept_name, dept_building}: Here, roll_no can determine the values of the fields name, dept_name, and dept_building, hence a valid functional dependency.
 roll_no → dept_name: Since roll_no can determine the whole set {name, dept_name, dept_building}, it can also determine its subset dept_name.
 dept_name → dept_building: dept_name can identify dept_building accurately, since departments with different dept_name values will also have a different dept_building.
 More valid functional dependencies: roll_no → name, {roll_no, name} → {dept_name, dept_building}, etc.
Here are some invalid functional dependencies:
 name → dept_name: Students with the same name can have different dept_name values, hence this is not a valid functional dependency.
 dept_building → dept_name: There can be multiple departments in the same building; for example, in the above table, departments ME and EC are in the same building B2, hence dept_building → dept_name is an invalid functional dependency.
 More invalid functional dependencies: name → roll_no, {name, dept_name} → roll_no, dept_building → roll_no, etc.

Armstrong’s axioms/properties of functional dependencies:
1. Reflexivity: If Y is a subset of X, then X → Y holds by the reflexivity rule.
For example, {roll_no, name} → name is valid.
2. Augmentation: If X → Y is a valid dependency, then XZ → YZ is also valid by the augmentation rule.
For example, if {roll_no, name} → dept_building is valid, then {roll_no, name, dept_name} → {dept_building, dept_name} is also valid.
3. Transitivity: If X → Y and Y → Z are both valid dependencies, then X → Z is also valid by the transitivity rule.
For example, roll_no → dept_name and dept_name → dept_building imply that roll_no → dept_building is also valid.

Types of functional dependencies in DBMS:
1. Trivial functional dependency
2. Non-trivial functional dependency
3. Multivalued functional dependency
4. Transitive functional dependency

1. Trivial Functional Dependency
In a trivial functional dependency, the dependent is always a subset of the determinant, i.e., if X → Y and Y is a subset of X, then it is called a trivial functional dependency.
For example,
roll_no name age
42 abc 17
43 pqr 18
44 xyz 18

Here, {roll_no, name} → name is a trivial functional dependency, since the dependent name is a subset of the determinant set {roll_no, name}.
Similarly, roll_no → roll_no is also an example of a trivial functional dependency.

2. Non-trivial Functional Dependency
In a non-trivial functional dependency, the dependent is strictly not a subset of the determinant, i.e., if X → Y and Y is not a subset of X, then it is called a non-trivial functional dependency.
For example,
roll_no name age
42 abc 17
43 pqr 18
44 xyz 18

Here, roll_no → name is a non-trivial functional dependency, since the dependent name is not a subset of the determinant roll_no.
Similarly, {roll_no, name} → age is also a non-trivial functional dependency, since age is not a subset of {roll_no, name}.

3. Multivalued Functional Dependency
In a multivalued functional dependency, the entities of the dependent set are not dependent on each other, i.e., if a → {b, c} and there exists no functional dependency between b and c, then it is called a multivalued functional dependency.
For example,
roll_no name age
42 abc 17
43 pqr 18
44 xyz 18
45 abc 19

Here, roll_no → {name, age} is a multivalued functional dependency, since the dependents name and age are not dependent on each other (i.e., neither name → age nor age → name exists).

4. Transitive Functional Dependency
In a transitive functional dependency, the dependent is indirectly dependent on the determinant, i.e., if a → b and b → c, then by the axiom of transitivity, a → c. This is a transitive functional dependency.
For example,
enrol_no name dept building_no
42 abc CO 4
43 pqr EC 2
44 xyz IT 1
45 abc EC 2

Here, enrol_no → dept and dept → building_no.
Hence, by the axiom of transitivity, enrol_no → building_no is a valid functional dependency. This is an indirect functional dependency, hence called a transitive functional dependency.

Lossless (Non-loss) Decomposition in DBMS

Lossless join decomposition is a decomposition of a relation R into relations R1 and R2
such that, if we perform a natural join of R1 and R2, we get back the original relation R.
This is effective in removing redundancy from databases while preserving the original data.
In other words, with lossless decomposition it becomes feasible to reconstruct the
relation R from the decomposed tables R1 and R2 by using joins.
In lossless decomposition, we select a common attribute, and the criterion for
selecting the common attribute is that it must be a candidate key or super key in R1, in R2,
or in both.

A decomposition of a relation R into R1 and R2 is a lossless-join decomposition if at
least one of the following functional dependencies is in F+ (the closure of the functional
dependencies):
R1 ∩ R2 → R1
OR
R1 ∩ R2 → R2
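As a concrete sketch (with hypothetical tables), suppose EMP(emp_id, emp_name, dept_id) is decomposed into EMP1(emp_id, emp_name) and EMP2(emp_id, dept_id). The common attribute emp_id is a key of both projections, so the decomposition is lossless and the original relation can be rebuilt with a join:
SELECT E1.emp_id, E1.emp_name, E2.dept_id
FROM EMP1 E1 JOIN EMP2 E2 ON E1.emp_id = E2.emp_id;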

Normal Forms in DBMS
Normalization is the process of minimizing redundancy in a relation or set of
relations. Redundancy in a relation may cause insertion, deletion, and update anomalies, so
normalization helps to minimize the redundancy in relations. Normal forms are used to
eliminate or reduce redundancy in database tables.

1. First Normal Form –
If a relation contains a composite or multi-valued attribute, it violates first normal form; conversely, a relation is in first normal form if it does not contain any composite or multi-valued attribute, i.e., if every attribute in the relation is a single-valued attribute.
 Example 1 – Relation STUDENT in table 1 is not in 1NF because of the multi-valued attribute STUD_PHONE. Its decomposition into 1NF is shown in table 2.

 Example 2 –
ID Name Courses
1 A c1, c2
2 E c3
3 M c2, c3

In the above table, Courses is a multi-valued attribute, so the relation is not in 1NF.
The table below is in 1NF, as there is no multi-valued attribute:
ID Name Course
1 A c1
1 A c2
2 E c3
3 M c2
3 M c3

2. Second Normal Form –

To be in second normal form, a relation must be in first normal form and relation
must not contain any partial dependency. A relation is in 2NF if it has No Partial
Dependency, i.e., no non-prime attribute (attributes which are not part of any candidate
key) is dependent on any proper subset of any candidate key of the table.
Partial Dependency – If the proper subset of candidate key determines non-prime
attribute, it is called partial dependency.
 Example 1 – Consider table-3 as following below.
STUD_NO COURSE_NO COURSE_FEE
1 C1 1000
2 C2 1500
1 C4 2000
4 C3 1000
4 C1 1000
2 C5 2000
{Note that, there are many courses having the same course fee. }
Here,
COURSE_FEE alone cannot determine the value of COURSE_NO or STUD_NO;
COURSE_FEE together with STUD_NO cannot determine the value of COURSE_NO;
COURSE_FEE together with COURSE_NO cannot determine the value of STUD_NO;
Hence,
COURSE_FEE is a non-prime attribute, as it does not belong to the only candidate key {STUD_NO, COURSE_NO};
But COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is dependent on COURSE_NO, which is a proper subset of the candidate key. A non-prime attribute (COURSE_FEE) depending on a proper subset of the candidate key is a partial dependency, so this relation is not in 2NF.

59
To convert the above relation to 2NF, we need to split the table into two tables:
Table 1: STUD_NO, COURSE_NO
Table 2: COURSE_NO, COURSE_FEE

Table 1
STUD_NO COURSE_NO
1 C1
2 C2
1 C4
4 C3
4 C1
2 C5

Table 2
COURSE_NO COURSE_FEE
C1 1000
C2 1500
C3 1000
C4 2000
C5 2000
NOTE: 2NF tries to reduce the redundant data getting stored in memory. For instance,
if there are 100 students taking C1 course, we don’t need to store its Fee as 1000 for all
the 100 records, instead, once we can store it in the second table as the course fee for
C1 is 1000.
 Example 2 – Consider the following functional dependencies in relation R(A, B, C, D):
AB -> C [A and B together determine C]
BC -> D [B and C together determine D]
In the above relation, AB is the only candidate key and there is no partial dependency, i.e., no proper subset of AB determines any non-prime attribute.

3. Third Normal Form –
A relation is in third normal form if it is in second normal form and there is no transitive dependency for non-prime attributes.
A relation is in 3NF if at least one of the following conditions holds for every non-trivial functional dependency X -> Y:
1. X is a super key.
2. Y is a prime attribute (each element of Y is part of some candidate key).

Transitive dependency – If A -> B and B -> C are two FDs, then A -> C is called a transitive dependency.
 Example 1 – In relation STUDENT given in table 4,
FD set: {STUD_NO -> STUD_NAME, STUD_NO -> STUD_STATE, STUD_STATE -> STUD_COUNTRY, STUD_NO -> STUD_AGE}
Candidate key: {STUD_NO}
For this relation, STUD_NO -> STUD_STATE and STUD_STATE -> STUD_COUNTRY hold, so STUD_COUNTRY is transitively dependent on STUD_NO. This violates the third normal form. To convert it to third normal form, we decompose the relation STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY, STUD_AGE) as:
STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_AGE)
STATE_COUNTRY (STATE, COUNTRY)
 Example 2 – Consider relation R(A, B, C, D, E) with FDs:
A -> BC,
CD -> E,
B -> D,
E -> A
All possible candidate keys of the above relation are {A, E, CD, BC}. All attributes on the right-hand sides of the functional dependencies are prime, so the relation is in 3NF.

4. Boyce-Codd Normal Form (BCNF) –

A relation R is in BCNF if R is in Third Normal Form and for every FD, LHS is
super key. A relation is in BCNF iff in every non-trivial functional dependency X –
> Y, X is a super key.
 Example 1 – Find the highest normal form of a relation R(A,B,C,D,E) with FD
set as {BC->D, AC->BE, B->E} 
Step 1. As we can see, (AC)+ ={A,C,B,E,D} but none of its subset can
determine all attribute of relation, So AC will be candidate key. A or C can’t be
derived from any other attribute of the relation, so there will be only 1 candidate
key {AC}.

 Step 2. Prime attributes are those attributes that are part of candidate key {A, C}
in this example and others will be non-prime {B, D, E} in this example. 

 Step 3. The relation R is in 1st normal form as a relational DBMS does not
allow multi-valued or composite attribute. 
The relation is in 2nd normal form because BC->D is in 2nd normal form (BC
is not a proper subset of candidate key AC) and AC->BE is in 2nd normal form
(AC is candidate key) and B->E is in 2nd normal form (B is not a proper subset
of candidate key AC).

 The relation is not in 3rd normal form because in BC->D (neither BC is a super key nor D is a prime attribute) and in B->E (neither B is a super key nor E is a prime attribute); to satisfy 3rd normal form, either the LHS of an FD should be a super key or the RHS should be a prime attribute.

 So the highest normal form of the relation is 2nd normal form.


 Example 2 – Consider relation R(A, B, C) with FDs:
A -> BC,
B -> A
A and B are both super keys, so the above relation is in BCNF.

61
Key Points –
 BCNF is free from redundancy.
 If a relation is in BCNF, then 3NF is also satisfied.
 If all attributes of relation are prime attribute, then the relation is always in 3NF.
 A relation in a Relational Database is always and at least in 1NF form.
 Every Binary Relation ( a Relation with only 2 attributes ) is always in BCNF.
 If a Relation has only singleton candidate keys( i.e. every candidate key consists of
only 1 attribute), then the Relation is always in 2NF( because no Partial functional
dependency possible).
 Sometimes going for BCNF form may not preserve functional dependency. In that
case go for BCNF only if the lost FD(s) is not required, else normalize till 3NF only.
 There are many more Normal forms that exist after BCNF, like 4NF and more. But
in real world database systems it’s generally not required to go beyond BCNF.

Exercise 1: Find the highest normal form in R (A, B, C, D, E) under following functional
dependencies.

ABC --> D
CD --> AE
Important Points for solving above type of question.

1) It is always a good idea to start checking from BCNF, then 3 NF, and so on.

2) If any functional dependency satisfied a normal form then there is no need to check for
lower normal form. For example, ABC –> D is in BCNF (Note that ABC is a superkey), so
no need to check this dependency for lower normal forms.
Candidate keys in the given relation are {ABC, BCD}
BCNF: ABC -> D is in BCNF. Let us check CD -> AE, CD is not a super
key so this dependency is not in BCNF. So, R is not in BCNF.
3NF: ABC -> D we don’t need to check for this dependency as it already
satisfied BCNF. Let us consider CD -> AE. Since E is not a prime attribute, so the
relation is not in 3NF.
2NF: In 2NF, we need to check for partial dependency. CD is a proper
subset of a candidate key and it determines E, which is non-prime attribute. So,
given relation is also not in 2 NF. So, the highest normal form is 1 NF.

Multivalued dependency (MVD)

A multivalued dependency (MVD) exists when the presence of one or more rows in a
table implies the presence of one or more other rows in that same table. A multivalued
dependency prevents fourth normal form and involves at least three attributes of a table.
It is represented with the symbol "->->" in DBMS.
X -> Y relates one value of X to one value of Y.
X ->-> Y (read as "X multidetermines Y") relates one value of X to many values of Y.

A non-trivial MVD occurs when X ->-> Y and X ->-> Z, where Y and Z are independent of each other. Non-trivial MVDs produce redundancy.
We use multivalued dependencies in two different ways −
 To test relations to decide whether they are legal under a given set of functional and multivalued dependencies.
 To specify constraints on the set of legal relations; we then concern ourselves only with relations that satisfy the given set of functional and multivalued dependencies.

MVD transitive rule

If A -> B holds and B -> C holds, then A -> C holds.
Example
The given FD set is as follows −
ISBN --> TITLE, PUBLISHER
ISBN, NO --> AUTHOR
PUBLISHER --> PU_URL
We need to prove the rule. Consider A = ISBN, B = PUBLISHER, C = PU_URL. To show that the transitive rule is implied, compute the attribute closure A+:
 Start with x = {ISBN}.
 The FD ISBN --> TITLE, PUBLISHER has an LHS completely contained in the current attribute set x.
 Extend x by the FD's RHS attribute set, giving x = {ISBN, TITLE, PUBLISHER}.
 Now the FD PUBLISHER --> PU_URL is applicable.
 Add the RHS attribute set of this FD to the current set, giving x = {ISBN, TITLE, PUBLISHER, PU_URL}.
Hence we can conclude that ISBN --> PU_URL.

Multivalued Dependencies
The fourth normal form is concerned with multivalued dependencies: if a relation in Boyce-Codd normal form contains multivalued dependencies, they have to be removed to reach 4NF.
Explanation − A multivalued dependency means that one value in a table has multiple independent values dependent on it.
Let us consider the example given below. Consider the following table −

id department shift
1 coding day
2 Hr day
2 Network night

In the above table, id 2 has two departments, Hr and Network, and shift timings day and night.
When we select the details with id 2, the result is the following −

id department shift
2 Hr day
2 Network night
2 Hr night
2 Network day

This means multivalued dependencies exist: there is no direct relationship between department
and shift, yet each is associated with the same id independently.
This can be rectified by removing the multivalued dependency, splitting the data into two
tables as below −
Table 1
id department
1 coding
2 Hr
2 network

Table 2
id shift
1 day
2 day
2 night

The 4th normal form is applied to remove such multivalued dependencies from a data table; the fourth normal form is thus defined in terms of multivalued dependencies.

If two or more independent relations are kept in a single relation, a multivalued dependency
occurs: the presence of one or more rows in a table implies the presence of one or more other
rows in that same table. Put another way, two attributes (or columns) in a table are
independent of one another, but both depend on a third attribute.

A multivalued dependency always requires at least three attributes, because it consists of
at least two attributes that are dependent on a third.

For a dependency A -> B, if for a single value of A multiple values of B exist, then the
table may have a multivalued dependency. The table should have at least 3 attributes, and B
and C should be independent of each other for the multivalued dependency A ->> B to hold.
For example,

Person Mobile Food_Likes
Mahesh 9893/9424 Burger / pizza
Ramesh 9191 Pizza

Person ->-> Mobile,
Person ->-> Food_Likes
This is read as “person multidetermines mobile” and “person multidetermines food_likes.”
Note that a functional dependency is a special case of multivalued dependency: in a
functional dependency X -> Y, every x determines exactly one y, never more than one.

Fourth normal form (4NF):

Fourth normal form (4NF) is a level of database normalization where there are no non-trivial multivalued dependencies other than those on a candidate key. It builds on the first three normal forms (1NF, 2NF and 3NF) and the Boyce-Codd Normal Form (BCNF). It states that, in addition to meeting the requirements of BCNF, a relation must not contain more than one multivalued dependency.
Properties – A relation R is in 4NF if and only if the following conditions are satisfied:
1. It should be in the Boyce-Codd Normal Form (BCNF).
2. The table should not have any multi-valued dependency.

A table with a multivalued dependency violates the normalization standard of fourth normal form (4NF) because it creates unnecessary redundancies and can contribute to inconsistent data. To bring such a table up to 4NF, it is necessary to break the information into two tables.
Example – Consider the database of a class that has two relations: R1 contains student ID (SID) and student name (SNAME), and R2 contains course ID (CID) and course name (CNAME).
Table – R1(SID, SNAME)
SID SNAME
S1 A
S2 B

Table – R2(CID, CNAME)
CID CNAME
C1 C
C2 D

When their cross product is taken, it results in multivalued dependencies:
Table – R1 X R2
SID SNAME CID CNAME
S1 A C1 C
S1 A C2 D
S2 B C1 C
S2 B C2 D

The multivalued dependencies (MVDs) are:
SID ->-> CID; SID ->-> CNAME; SNAME ->-> CNAME


Join dependency

Join dependency is a further generalization of multivalued dependency. If the join of
R1 and R2 over C is equal to relation R, then we can say that a join dependency (JD)
exists, where R1 and R2 are the decompositions R1(A, B, C) and R2(C, D) of a given
relation R(A, B, C, D). Alternatively, R1 and R2 form a lossless decomposition of R. A JD
⋈ {R1, R2, ..., Rn} is said to hold over a relation R if R1, R2, ..., Rn is a lossless-join
decomposition of R. Here, *(R1, R2, R3, ...) is used to indicate that the relations R1, R2,
R3, and so on are a JD of R.
Let R be a relation schema and R1, R2, R3, ..., Rn be a decomposition of R. A relation
r(R) is said to satisfy the join dependency *(R1, R2, ..., Rn) if and only if joining the
projections of r onto R1, R2, ..., Rn yields exactly r.

Example –
Table – R1
Company Product
C1 pendrive
C1 mic
C2 speaker

Company ->-> Product
Table – R2
Agent Company
Aman C1
Aman C2
Mohan C1

Agent ->-> Company
Table – R3
Agent Product
Aman pendrive
Aman mic
Aman speaker
Mohan speaker

Agent ->-> Product
Table – R1 ⋈ R2 ⋈ R3
Company Product Agent
C1 pendrive Aman
C1 mic Aman
C2 speaker Aman

Fifth Normal Form / Project-Join Normal Form (5NF):

A relation R is in 5NF if and only if every join dependency in R is implied by the candidate
keys of R. A relation decomposed into two relations must have the lossless-join property,
which ensures that no spurious or extra tuples are generated when the relations are reunited
through a natural join.

68
Properties – A relation R is in 5NF if and only if it satisfies following conditions:

1. R should be already in 4NF.

2. It cannot be further non loss decomposed (join dependency)

Example – Consider the above schema, with a case as “if a company makes a product and
an agent is an agent for that company, then he always sells that product for the company”.
Under these circumstances, the ACP table is shown as:
Table – ACP
Agent Company Product
A1 PQR Nut
A1 PQR Bolt
A1 XYZ Nut
A1 XYZ Bolt
A2 PQR Nut
The relation ACP is again decomposed into 3 relations. The natural join of all three relations is shown below:
Table – R1
Agent Company
A1 PQR
A1 XYZ
A2 PQR

Table – R2
Agent Product
A1 Nut
A1 Bolt
A2 Nut

Table – R3
Company Product
PQR Nut
PQR Bolt
XYZ Nut
XYZ Bolt

The result of the natural join of R1 and R3 over ‘Company’, followed by the natural join of
R13 and R2 over ‘Agent’ and ‘Product’, will be the table ACP.
Hence, in this example, all the redundancies are eliminated, and the decomposition of ACP
is a lossless-join decomposition. Therefore, the relation is in 5NF, as it does not violate the
property of lossless join.
Unit 4

TRANSACTIONS

A transaction is a set of logically related operations; it contains a group of tasks. A
transaction is an action or series of actions performed by a single user to carry out
operations for accessing the contents of the database.

Example: Suppose an employee of a bank transfers Rs 800 from X's account to Y's account.
This small transaction involves several low-level tasks:

X's Account
1. Open_Account(X)
2. Old_Balance = X.balance
3. New_Balance = Old_Balance - 800
4. X.balance = New_Balance
5. Close_Account(X)

Y's Account
1. Open_Account(Y)
2. Old_Balance = Y.balance
3. New_Balance = Old_Balance + 800
4. Y.balance = New_Balance
5. Close_Account(Y)
Operations of Transaction:
Read(X): Read operation is used to read the value of X from the database and stores it in a
buffer in main memory.

Write(X): Write operation is used to write the value back to the database from the buffer.

Let's take the example of a debit transaction on an account, which consists of the following
operations:

1. R(X);
2. X = X - 500;
3. W(X);
Let's assume the value of X before starting of the transaction is 4000.

o The first operation reads X's value from database and stores it in a buffer.
o The second operation will decrease the value of X by 500. So buffer will contain
3500.
o The third operation will write the buffer's value to the database. So X's final value will
be 3500.

But it may be possible that, because of a hardware, software, or power failure, the transaction
fails before finishing all the operations in the set.
For example: if in the above transaction the debit transaction fails after executing operation
2, then X's value will remain 4000 in the database, which is not acceptable to the bank.

To solve this problem, we have two important operations:

Commit: It is used to save the work done permanently.

Rollback: It is used to undo the work done.
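As a minimal SQL sketch of the transfer above (the ACCOUNT table and column names are hypothetical, and the statement that starts a transaction varies by DBMS):
BEGIN TRANSACTION;
UPDATE ACCOUNT SET balance = balance - 800 WHERE acc_no = 'X';
UPDATE ACCOUNT SET balance = balance + 800 WHERE acc_no = 'Y';
COMMIT;
If any step fails, issuing ROLLBACK instead of COMMIT undoes the partial work.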

TRANSACTION RECOVERY

UNDO and REDO are lists of transactions, built by scanning the log:
UNDO = all transactions running at the last checkpoint
REDO = empty
For each entry in the log, starting at the last checkpoint:
    If a BEGIN TRANSACTION entry is found for T, add T to UNDO
    If a COMMIT entry is found for T, move T from UNDO to REDO

 Types of Transaction Recovery
Recovery information is divided into two types:
 Undo (or rollback) operations
 Redo (or cache restore) operations
Ingres performs both online and offline recovery, as described under Recovery Modes.
 Undo Operation
Undo, or transaction backout, recovery is performed by the DBMS server. For example, when a transaction is aborted, transaction log file information is used to roll back all related updates. The DBMS server writes Compensation Log Records (CLRs) to record a history of the actions taken during undo operations.
 Redo Operation
A redo recovery operation is database-oriented. Redo recovery is performed after a server or an installation fails. Its main purpose is to recover the contents of the DMF cached data pages that are lost when a fast-commit server fails. Redo recovery is performed by the recovery process, and it precedes undo recovery.
 Redo Operation in a Cluster Environment
In an Ingres cluster environment where all nodes are active, the local recovery server performs transaction redo/undo for a failed DBMS server on its node, just as in the non-cluster case. The difference in a cluster installation is that if the recovery process (RCP) dies on one node, either because of an Ingres failure or a general hardware failure, an RCP on another node takes responsibility for cleaning up transactions for the failed node.

ACID PROPERTIES

A transaction is a very small unit of a program, and it may contain several low-level tasks.
A transaction in a database system must maintain Atomicity, Consistency, Isolation,
and Durability − commonly known as the ACID properties − in order to ensure accuracy,
completeness, and data integrity.
 Atomicity − This property states that a transaction must be treated as an atomic unit,
that is, either all of its operations are executed or none. There must be no state in the
database where a transaction is left partially completed. States should be defined either
before the execution of the transaction or after the execution/abortion/failure of the
transaction.
 Consistency − The database must remain in a consistent state after any transaction.
No transaction should have any adverse effect on the data residing in the database. If
the database was in a consistent state before the execution of a transaction, it must
remain consistent after the execution of the transaction as well. 
 Durability − The database should be durable enough to hold all its latest updates even
if the system fails or restarts. If a transaction updates a chunk of data in a database and
commits, then the database will hold the modified data.
 If a transaction commits but the system fails before the data could be written on to the
disk, then that data will be updated once the system springs back into action. 
 Isolation − In a database system where more than one transaction are being executed
simultaneously and in parallel, the property of isolation states that all the transactions
will be carried out and executed as if it is the only transaction in the system. No
transaction will affect the existence of any other transaction. 

SYSTEM RECOVERY
 Any transaction that was running at the time of failure needs to be undone and restarted.
 Any transactions that committed since the last checkpoint need to be redone.
 Transactions of type T1 need no recovery.
 Transactions of type T3 or T5 need to be undone and restarted.
 Transactions of type T2 or T4 need to be redone.
Media Failures
• System failures are not too severe: only information since the last checkpoint is affected, and this can be recovered from the transaction log.
• Media failures (disk crashes, etc.) are more serious: the data stored on disk is damaged, and the transaction log itself may be damaged.
Recovery from Media Failure
• Restore the database from the last backup.
• Use the transaction log to redo any changes made since the last backup.
• If the transaction log is damaged, you cannot do step 2.
• Store the log on a separate physical device from the database; the risk of losing both is then reduced.

MEDIA RECOVERY

If you restore the archived redo log files and data files, then you must perform media
recovery before you can open the database. Any database transactions in the archived redo
log files not reflected in the data files are applied to the data files, bringing them to a
transaction-consistent state before the database is opened.

Media recovery requires a control file, data files (typically restored from backup), and
online and archived redo log files containing changes since the time the data files were
backed up. Media recovery is most often used to recover from media failure, such as the loss
of a file or disk, or a user error, such as the deletion of the contents of a table.

Media recovery can be a complete recovery or a point-in-time recovery. Complete
recovery can apply to individual datafiles, tablespaces, or the entire database. Point-in-time
recovery applies to the whole database (and sometimes to individual tablespaces, with
automation help from Oracle Recovery Manager (RMAN)).
In a complete recovery, you restore backup data files and apply all changes from the
archived and online redo log files to the data files. The database is returned to its state at the
time of failure and can be opened with no loss of data.

In a point-in-time recovery, you return a database to its contents at a user-selected
time in the past. You restore a backup of data files created before the target time and a
complete set of archived redo log files from backup creation through the target time.
Recovery applies changes between the backup time and the target time to the data files. All
changes after the target time are discarded.

RMAN enables you to perform both a complete and a point-in-time recovery of your
database. However, this documentation focuses on complete recovery.

TWO-PHASE COMMIT (2PC)


A two-phase commit is a standardized protocol that ensures that a database commit is
implemented correctly in the situation where a commit operation must be broken into two
separate parts.

In database management, saving data changes is known as a commit and undoing
changes is known as a rollback. Both can be achieved easily using transaction logging when a
single server is involved, but when the data is spread across geographically diverse servers in
distributed computing (i.e., each server being an independent entity with separate log
records), the process becomes trickier.

The two-phase commit protocol is a type of distributed commit protocol. In a local
database system, committing a transaction involves only one site, so the transaction manager
can simply make and record the commit decision. In a distributed system, however, the
transaction manager must coordinate the commit decision across all the servers at the various
sites involved. When each server completes its processing at its site, the transaction reaches a
partially committed state at that site, but it must wait until the transaction reaches that state
everywhere. Once the transaction has reached the partially committed state at all the servers,
the transaction manager can commit the transaction; it is necessary that all the sites commit
the transaction.

Two-phase commit is one of a family of distributed commit protocols, which also includes
the simpler one-phase commit protocol and the three-phase commit protocol; each has its
own advantages and disadvantages.
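
MySQL exposes the two phases of this protocol directly through its XA statements, which a
coordinator can drive against each participating server. The following is a minimal sketch of
one participant's view; the transaction identifier 'xfer1' and the accounts table are
hypothetical:

XA START 'xfer1';
UPDATE accounts SET balance = balance - 100 WHERE acc_no = 'A';
XA END 'xfer1';

-- Phase 1 (prepare): after this succeeds, the server guarantees it can commit.
XA PREPARE 'xfer1';

-- Phase 2 (commit): issued by the coordinator only after every participant has
-- prepared successfully; otherwise the coordinator issues XA ROLLBACK 'xfer1';
XA COMMIT 'xfer1';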

SAVE POINTS

A save point is a way of implementing subtransactions (also known as nested
transactions) within a relational database management system by indicating a point within a
transaction that can be "rolled back to" without affecting any work done in the transaction
before the savepoint was created.

SAVEPOINT command

SAVEPOINT command is used to temporarily save a transaction so that you can rollback to
that point whenever required.

The syntax of the SAVEPOINT command is as follows:

SAVEPOINT savepoint_name;

In short, using this command we can name the different states of our data in any table and
then roll back to a named state using the ROLLBACK command whenever required.

savepoint_name

Specifies the name of the savepoint to be created.

Savepoint names must be distinct within a given transaction. If you create a second savepoint
with the same identifier as an earlier savepoint, then the earlier savepoint is erased. After a
savepoint has been created, you can either continue processing, commit your work, roll back
the entire transaction, or roll back to the savepoint.

Example

Creating Savepoints: Example To update the salary for Banda and Greene in the sample
table hr.employees, check that the total department salary does not exceed 314,000, then
reenter the salary for Greene:

UPDATE employees
SET salary = 7000
WHERE last_name = 'Banda';

SAVEPOINT banda_sal;

UPDATE employees
SET salary = 12000
WHERE last_name = 'Greene';

SAVEPOINT greene_sal;

SELECT SUM(salary) FROM employees;

ROLLBACK TO SAVEPOINT banda_sal;

UPDATE employees
SET salary = 11000
WHERE last_name = 'Greene';

COMMIT;

Recovery Facilities

The checkpoint facility allows updates to the database to be made permanent at defined
points, so that only the work done after the most recent checkpoint is at risk on a failure. The
recovery manager allows the database system to restore the database to a reliable and
consistent state after any failure occurs.

SQL BASIC FACILITIES

In addition to the advanced facilities noted above, SQL is rich in the type of ease-of-use
capabilities that are necessary to support relational databases from the simple to the complex.

Table Facility: First and foremost, SQL provides a table facility that enables a prompted,
intuitive interface for the following functions:
 Defining databases
 Populating databases with rows
 Manipulating databases

Table Editor: SQL also provides a table editor that makes it easy for you to perform the
following functions against rows in table data that is structured in row-and-column format:
 Access
 Insert
 Update
 Delete

Query Facility: With the Query facility, SQL permits you to interactively define queries and
have results displayed in a variety of report formats, including:
 Tabular
 Matrix
 Free format

For those readers who have a System i5 background, you will notice that SQL brings with it
its own naming scheme that is significantly different from corresponding native objects. See
table 4-1 for specifics.

CONCURRENCY
Database concurrency is the ability of a database to allow multiple users to affect
multiple transactions. This is one of the main properties that separates a database from other
forms of data storage, like spreadsheets.

The ability to offer concurrency is unique to databases. Spreadsheets or other flat file
means of storage are often compared to databases, but they differ in this one important
regard.

Spreadsheets cannot offer several users the ability to view and work on the different
data in the same file, because once the first user opens the file it is locked to other users.
Other users can read the file, but may not edit data.

NEED FOR CONCURRENCY

Concurrent execution of transactions is needed for several reasons:

 Improved throughput − while one transaction waits for disk I/O, the CPU can process
another transaction, so more transactions complete per unit of time.
 Better resource utilization − the CPU and the disks are kept busy in parallel rather than
sitting idle.
 Reduced waiting time − short transactions need not wait behind long ones, so the
average response time improves.

LOCKING PROTOCOLS

Lock Based Protocol in DBMS


The database management system (DBMS) stores data items that relate to one another and
that can be altered at any point. There are instances where more than one user may attempt
to access the same data item simultaneously, resulting in concurrent access. As a result, there
is a requirement to handle concurrency in order to manage the concurrent processing of
transactions across the database. Lock based protocols in DBMS are an example of such an
approach.

Introduction to Lock Based Protocol

We can define a lock based protocol in DBMS as a mechanism that prevents a
transaction from reading or writing data until the necessary lock is obtained. The
concurrency problem can be solved by securing or locking a data item for a specific
transaction. A lock is a variable that specifies which operations are allowed on a certain data
item.

Types of Locks in DBMS

In DBMS lock-based protocols, there are two modes for locking and unlocking data
items: Shared Lock (lock-S) and Exclusive Lock (lock-X). Let's go through the two types of
locks in detail:

Shared Lock
 Shared Locks, which are often denoted as lock-S(), are defined as locks that provide
Read-Only access to the information associated with them. Whenever a shared lock is
used on a database, it can be read by several users, but these users who are reading the
information or the data items will not have the permission to edit it or make any
changes to the data items.
 To put it another way, we can say that shared locks don't provide the access to write.
Because numerous users can read the data items simultaneously, multiple shared locks
can be installed on them at the same time, but the data item must not have any other
locks connected with it.
 A shared lock, also known as a read lock, is solely used to read data objects. Read
integrity is supported via shared locks.
 Shared locks can also be used to prevent records from being updated.
 S-lock is requested via the Lock-S instruction.

Exclusive Lock
 An exclusive lock allows the data item to be both read and written. This lock cannot be
held on the same data item by two transactions at once. To obtain an X-lock, the user
issues the lock-X instruction. After finishing the 'write' step, the transaction can unlock
the data item.
 By imposing an X lock on a transaction that needs to update a person's account
balance, for example, you can allow it to proceed; as a result of the exclusive lock,
a second transaction is unable to read or write that item.
 The other name for an exclusive lock is write lock.
 At any given time, an exclusive lock can be held by only one transaction.

Example of exclusive locks: Consider the instance where the value of a data item X is equal
to 50 and a transaction needs to deduct 20 from X. We can make this possible by putting an
X lock on the data item for that transaction. As a result, the exclusive lock prevents any other
transaction from reading or writing X until the update completes.
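
Many SQL dialects expose these two lock modes directly in queries. The sketch below uses
MySQL/InnoDB syntax on a hypothetical accounts table; other systems have locking hints of
their own:

START TRANSACTION;

-- Shared (S) lock: other transactions may still read the row, but not modify it.
-- (LOCK IN SHARE MODE is written FOR SHARE in MySQL 8.0.)
SELECT balance FROM accounts WHERE acc_no = 'X' LOCK IN SHARE MODE;

-- Exclusive (X) lock: no other transaction may lock or modify the row.
SELECT balance FROM accounts WHERE acc_no = 'X' FOR UPDATE;

UPDATE accounts SET balance = balance - 20 WHERE acc_no = 'X';

COMMIT;  -- all locks held by the transaction are released here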

Types of Lock-Based Protocols

There are basically four lock based protocols in dbms namely Simplistic Lock
Protocol, Pre-claiming Lock Protocol, Two-phase Locking Protocol, and Strict Two-Phase
Locking Protocol. Let's go through each of these lock-based protocols in detail.

Simplistic Lock Protocol

The simplistic method is the most fundamental way of securing data during a transaction.
Simplistic lock-based protocols require every transaction to lock the data before inserting,
deleting, or updating it; after the transaction is completed, the data item is unlocked.

Pre-Claiming Lock Protocol

Pre-claiming lock protocols first analyze a transaction to determine which data items
require locks. Before the transaction actually starts, it asks the database management
system for locks on all of those data items. If all of the locks are granted, the protocol
permits the transaction to commence, and the locks are released when the transaction is
finished. If all of the locks are not granted, the transaction rolls back and waits until all of
the locks can be granted.

TWO-PHASE LOCKING PROTOCOL


A transaction is considered to follow the two-phase locking protocol if all its locking and
unlocking operations can be divided into two phases, known as the growing phase and the
shrinking phase.

1. Growing Phase: In this phase, the transaction can acquire new locks on data items, but
none of these locks can be released.
2. Shrinking Phase: In this phase, the existing locks can be released, but no new locks
can be obtained.

Two-phase locking guarantees serializability of a schedule at the cost of reduced
concurrency, and just like the two sides of a coin, it has a few cons too. The protocol raises
transaction-processing costs and may have unintended consequences; one bad consequence is
the likelihood of deadlocks.

Strict Two-Phase Locking Protocol

In DBMS, cascading rollbacks are avoided with the Strict Two-Phase Locking
protocol. This protocol requires not only two-phase locking but also the retention of all
exclusive locks until the transaction commits or aborts. Like basic two-phase locking, it is
still subject to deadlock.

It ensures that if one transaction modifies data, no other transaction will be able to read
that data until the first transaction commits. The majority of database systems use the strict
two-phase locking protocol.

Starvation
When a transaction must wait an unlimited period for a lock, it is referred to as starvation.
The following are the causes of starvation:

1. The waiting scheme for locked items is not correctly controlled.
2. A resource leak occurs.
3. The same transaction is repeatedly chosen as a victim.

Starvation can be prevented as follows. Random process selection for resource or processor
allocation should be avoided, since it encourages starvation. The resource-allocation priority
scheme should include ideas such as aging, in which a process's priority rises the longer it
waits; this prevents starvation.

Deadlock − In a circular chain, a deadlock situation occurs when two or more processes
are each waiting for another to release a resource, or when more than two processes are
waiting for resources in a circular chain.

Two-Phase Locking –

A transaction is said to follow the two-phase locking protocol if locking and
unlocking can be done in two phases.
1. Growing Phase: New locks on data items may be acquired but none can be released.
2. Shrinking Phase: Existing locks may be released but no new locks can be acquired.
Note – If lock conversion is allowed, then upgrading a lock (from S(a) to X(a)) is allowed
in the growing phase, and downgrading a lock (from X(a) to S(a)) must be done in the
shrinking phase.
Let's see a transaction implementing 2-PL.

Step   T1           T2
1      lock-S(A)
2                   lock-S(A)
3      lock-X(B)
4      .......      .......
5      unlock(A)
6                   lock-X(C)
7      unlock(B)
8                   unlock(A)
9                   unlock(C)
10     .......      .......

This is just a skeleton transaction that shows how unlocking and locking work with 2-PL.
Note for:

Transaction T1:
 The growing phase is from steps 1-3.
 The shrinking phase is from steps 5-7.
 The lock point is at step 3.
Transaction T2:
 The growing phase is from steps 2-6.
 The shrinking phase is from steps 8-9.
 The lock point is at step 6.

DEADLOCK

In a database, a deadlock is an unwanted situation in which two or more transactions
are waiting indefinitely for one another to give up locks. Deadlock is said to be one of the
most feared complications in DBMS, as it brings the whole system to a halt.
Example – let us understand the concept of Deadlock with an example :
Suppose, Transaction T1 holds a lock on some rows in the Students table and needs to
update some rows in the Grades table. Simultaneously, Transaction T2 holds locks on
those very rows (Which T1 needs to update) in the Grades table but needs to update the
rows in the Student table held by Transaction T1.
Now, the main problem arises. Transaction T1 will wait for transaction T2 to give up the
lock, and similarly, transaction T2 will wait for transaction T1 to give up the lock. As a
consequence, All activity comes to a halt and remains at a standstill forever unless the
DBMS detects the deadlock and aborts one of the transactions.
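
This scenario can be reproduced with two interleaved sessions. The following sketch assumes
hypothetical Students and Grades tables keyed on student_id:

-- Step 1, Session 1:
BEGIN TRANSACTION;
UPDATE Students SET name = 'X' WHERE student_id = 1;  -- T1 locks the Students row

-- Step 2, Session 2:
BEGIN TRANSACTION;
UPDATE Grades SET grade = 'A' WHERE student_id = 1;   -- T2 locks the Grades row

-- Step 3, Session 1:
UPDATE Grades SET grade = 'B' WHERE student_id = 1;   -- T1 blocks, waiting for T2

-- Step 4, Session 2:
UPDATE Students SET name = 'Y' WHERE student_id = 1;  -- T2 blocks, waiting for T1
-- Deadlock: neither can proceed until the DBMS aborts one of them.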

Deadlock Avoidance –

When a database is stuck in a deadlock, It is always better to avoid the deadlock rather than
restarting or aborting the database. The deadlock avoidance method is suitable for smaller
databases whereas the deadlock prevention method is suitable for larger databases.
One method of avoiding deadlock is using application-consistent logic. In the above-given
example, Transactions that access Students and Grades should always access the tables in
the same order. In this way, in the scenario described above, Transaction T1 simply waits
for transaction T2 to release the lock on Grades before it begins. When transaction T2
releases the lock, Transaction T1 can proceed freely.
Another method for avoiding deadlock is to apply both a row-level locking mechanism and
the READ COMMITTED isolation level. However, this does not guarantee that deadlocks
are removed completely.

Deadlock Detection –

When a transaction waits indefinitely to obtain a lock, The database management system
should detect whether the transaction is involved in a deadlock or not.
The wait-for graph is one of the methods for detecting a deadlock situation. This method is
suitable for smaller databases. In this method, a graph is drawn based on the transactions and
their locks on resources: an edge is drawn from Ti to Tj if Ti is waiting for a lock held by Tj.
If the graph contains a closed loop or cycle, then there is a deadlock.
For the above-mentioned scenario, the wait-for graph contains the edges T1 → T2 and
T2 → T1, which form a cycle, so the two transactions are deadlocked.

Deadlock prevention –

For a large database, the deadlock prevention method is suitable. A deadlock can be
prevented if the resources are allocated in such a way that deadlock never occurs. The
DBMS analyzes the operations to determine whether they can create a deadlock situation;
if they can, that transaction is never allowed to be executed.
Deadlock prevention mechanism proposes two schemes :
 Wait-Die Scheme –
In this scheme, if a transaction requests a resource that is locked by another transaction,
the DBMS checks the timestamps of both transactions and allows only the older
transaction to wait for the resource.
Suppose there are two transactions T1 and T2, and let the timestamp of any transaction
T be TS(T). If T2 holds a lock on some resource and T1 requests that resource, the
DBMS performs the following check:
If TS(T1) < TS(T2), i.e. T1 is the older transaction, then T1 is allowed to wait until the
resource is available. That is, if a younger transaction has locked a resource and an
older transaction is waiting for it, the older transaction is allowed to wait. If instead the
requesting transaction is the younger one and the resource is held by an older
transaction, the younger transaction is killed ("dies") and restarted later, after a very
small random delay, but with the same timestamp.
This scheme allows the older transaction to wait but kills the younger one.

 Wound-Wait Scheme –

In this scheme, if an older transaction requests a resource held by a younger
transaction, the older transaction "wounds" the younger one: the younger transaction is
killed, releasing the resource, and is restarted after a small delay but with the same
timestamp. If a younger transaction requests a resource held by an older one, the
younger transaction is made to wait until the older one releases it.

SERIALIZABILITY
A schedule is serializable if it is equivalent to a serial schedule. A concurrent schedule must
ensure it has the same effect as if the transactions were executed serially, i.e. one after
another. Serializability refers to the sequence of actions (read, write, abort, commit) being
equivalent to some serial order.
Example

Let's take two transactions, T1 and T2. If both transactions are performed without
interfering with each other, the schedule is called a serial schedule. It can be represented as
follows −

T1            T2
READ1(A)
WRITE1(A)
READ1(B)
C1
              READ2(B)
              WRITE2(B)
              READ2(B)
              C2

Non-serial schedule − the operations of transactions T1 and T2 are interleaved (overlapped).
Example

Consider the following example −

T1            T2
READ1(A)
WRITE1(A)
              READ2(B)
              WRITE2(B)
READ1(B)
WRITE1(B)
READ1(B)

Types of serializability

There are two types of serializability −


View serializability

A schedule is view-serializable if it is view-equivalent to a serial schedule. Two schedules
S1 and S2 are view-equivalent if they satisfy the following rules −
 If a transaction T1 reads the initial value of A in S1, then T1 also reads the initial value
of A in S2.
 If T1 reads a value of A written by T2 in S1, then T1 also reads the value of A written
by T2 in S2.
 If T1 performs the final write of A in S1, then T1 also performs the final write of A
in S2.

Conflict serializability

A schedule is conflict-serializable if it orders any conflicting operations in the same way as
some serial execution. A pair of operations is said to conflict if they operate on the same data
item and at least one of them is a write operation.
That means:

read_i(x), read_j(x) − non-conflicting (read-read) operations
read_i(x), write_j(x) − conflicting (read-write) operations
write_i(x), read_j(x) − conflicting (write-read) operations
write_i(x), write_j(x) − conflicting (write-write) operations
RECOVERY ISOLATION LEVELS

In the context of transactions, the term ACID denotes the important properties that a
transaction must follow. ACID stands for Atomicity, Consistency, Isolation and Durability,
and these properties are collectively called the ACID properties.
Properties of transaction

Database system ensures ACID property −


 Atomicity − Either all or none of the transaction's operations are done.
 Consistency − A transaction transfers the database from one consistent (correct) state to
another consistent state.
 Isolation − A transaction is isolated from other transactions, i.e. a transaction is not
affected by another transaction. Although multiple transactions execute concurrently, it
must appear as if the transactions are running serially (one after the other).
 Durability − The results of transactions are permanent, i.e. the result will never be lost
on a subsequent failure; durability refers to long-lasting, i.e. permanent, effects.

Isolation

Isolation determines when the changes made by one transaction become visible to other
transactions. A lower isolation level allows more users to access the same data at the same
time, but it carries a higher risk of concurrency anomalies and inconsistent reads. A higher
isolation level reduces the concurrency anomalies over the data, but it requires more
resources and is slower than lower isolation levels.
The isolation protocols help safeguard the data from unwanted transactions. They maintain
the integrity of the data by defining how and when the changes made by one operation
become visible to others.
Levels of isolation

There are four levels of isolation, which are explained below −


 Read Uncommitted − The lowest level of isolation. At this level, dirty reads are
allowed: one transaction can read uncommitted changes made by another.
 Read Committed − It allows no dirty reads; a transaction reads only data that has
already been committed.
 Repeatable Read − A more restrictive level of isolation. The transaction holds read
locks on all the rows it references and write locks on all the rows it
updates/inserts/deletes, so there is no chance of non-repeatable reads.
 Serializable − The highest level of isolation. It requires all concurrent transactions to
appear to execute serially.
Example

Consider an example of isolation.


What is the isolation level of transaction E?
session begins
SET GLOBAL TRANSACTION
ISOLATION LEVEL SERIALIZABLE;
session ends
session begins

SET SESSION TRANSACTION
ISOLATION LEVEL REPEATABLE READ;
transaction A
transaction B
SET TRANSACTION
ISOLATION LEVEL READ UNCOMMITTED;
transaction C
SET TRANSACTION
ISOLATION LEVEL READ COMMITTED;
transaction D
transaction E
session ends
Check which option −
A- Serializable
B- Repeatable read
C- Read uncommitted
Solution

Repeatable Read is the right answer.


Reason & Explanation

 Step 1 − In the above program, the first session starts and ends without performing any
transaction.
 Step 2 − The second session begins with the session-level isolation level "Repeatable
Read". Transactions A and B are executed with this setting.
 Step 3 − Once again a new setting, isolation level "Read Uncommitted", is applied.
This setting is used only for transaction C, since "SET TRANSACTION" alone is
mentioned. If "SET TRANSACTION" is used without the GLOBAL or SESSION
keywords, the setting works only for the next single transaction.
 Step 4 − Likewise, "SET TRANSACTION" with isolation level Read Committed
works only for transaction D (refer to step 3 for the reason).
 Step 5 − Transaction E continues at "Repeatable Read", since the session started at
step 2 has not yet ended. The transaction-level isolation settings of steps 3 and 4 vanish
once a single transaction has executed, so transaction E automatically reverts to the
prior session-level setting.

Concurrency Control in SQL Server


A “Transaction” in SQL Server
The standard definition of a transaction states that "every query that runs in SQL Server is
in a transaction," which means any query you run on a SQL Server is considered as being in
a transaction. It could be a simple SELECT query or any UPDATE or ALTER query.
 If you run a query without mentioning the BEGIN TRAN keyword, it is considered an
implicit transaction.
 If you run a query that starts with BEGIN TRAN and ends with COMMIT or
ROLLBACK, it is considered an explicit transaction.
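
A minimal sketch of an explicit transaction in T-SQL, with error handling so that a failure in
either statement rolls back both; the accounts table is hypothetical:

BEGIN TRY
    BEGIN TRAN;
    UPDATE accounts SET balance = balance - 100 WHERE acc_no = 'A';
    UPDATE accounts SET balance = balance + 100 WHERE acc_no = 'B';
    COMMIT TRAN;      -- the explicit transaction ends here
END TRY
BEGIN CATCH
    ROLLBACK TRAN;    -- undo all work done since BEGIN TRAN
END CATCH;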

Transaction Properties
A database management system (DBMS) is considered a relational database management
system (RDBMS) if it follows the transactional properties, ACID.

 A: Atomicity
 C: Consistency
 I: Isolation
 D: Durability

The SQL Server takes care of the Atomicity, Consistency, and Durability of the system, and
the user has to care about the Isolation property of the transaction. The meaning of each of
these properties is described below, as it applies to a transaction.

Atomicity
Transaction work should be atomic, which means all the work is one unit. If the user
performs a transaction, either the transaction should complete and perform all the requested
operations, or it should fail and do nothing. Atomicity deals with the transaction process, and
an RDBMS transaction does not leave the work incomplete.

Consistency
After the transaction is completed, the database should not be left in an inconsistent state,
which means the data on which transaction is applied must be logically correct, according to
the rules of the system.

Isolation
If two transactions operate on the same data, they should be isolated from each other, and
each user should see a consistent result. It can also be defined as follows: a transaction
should see the data only before or after a concurrent transaction's process is completed,
which means if one transaction's process is in progress, the other transaction's process
should wait until the first transaction is completed.

For instance, if A performs a transaction process on data d1, and before A's transaction
process completes, B also performs a transaction process on the same data d1, the isolation
property will keep the transaction processes of A and B apart: the transaction process of B
will only start after the transaction process of A is completed.

Durability
Even if the system fails, the work of committed transactions should persist. If the system
fails during a transaction process before the commit, the transaction should be dropped
without affecting the data.

SQL FACILITIES FOR CONCURRENCY

Concurrency is a situation that arises in a database due to the transaction process.
Concurrency occurs when two or more than two users are trying to access the same data or
information. DBMS concurrency is considered a problem because accessing data
simultaneously by two different users can lead to inconsistent results or invalid behaviour.

Concurrency Problem Types


The concurrency problem mostly arises when both the users try to write the same data, or
when one is writing and the other is reading. Apart from this logic, there are some common
types of concurrency problems:

 Dirty Reads
 Lost Updates
 Non-repeatable Reads
 Phantom Reads

Dirty Read
This problem occurs when another process reads changed but uncommitted data. For
instance, if one process has changed data but not committed it yet, another process is able to
read the same data. This leads to an inconsistent state for the reader.
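
A dirty read can be demonstrated with two sessions; the sketch below uses SQL Server
syntax and a hypothetical accounts table:

-- Session 1: changes data but does not commit yet
BEGIN TRAN;
UPDATE accounts SET balance = 500 WHERE acc_no = 'A';

-- Session 2: reads the uncommitted value (a dirty read)
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT balance FROM accounts WHERE acc_no = 'A';  -- sees 500

-- Session 1: rolls back, so the value Session 2 read never officially existed
ROLLBACK;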

Lost Updates
This problem occurs when two processes try to manipulate the same data simultaneously.
This problem can lead to data loss, or the second process might overwrite the first process's
change.

Non-repeatable Reads
This problem occurs when one process is reading the data while another process is writing it.
In a non-repeatable read, the first process may get two different values when it reads the
same data twice, because the second process changes the data between the two reads.

Phantom Reads
If two identical queries return different sets of rows, that is a phantom read problem. For
instance, if user A runs a query to read some data while user B inserts some new rows at the
same time, user A sees only the old rows on the first attempt; when user A re-runs the same
query, he/she gets a different set of rows.

Solve Concurrency Problems


SQL Server provides 5 different levels of transaction isolation to overcome these
Concurrency problems. These 5 isolation levels work on two major concurrency models:

1. Pessimistic model - In the pessimistic model of managing concurrent data access, the
readers can block writers, and the writers can block readers.
2. Optimistic model - In the optimistic model of managing concurrent data access, the
readers cannot block writers, and the writers cannot block readers, but the writer can
block another writer.

Note that readers are users performing SELECT operations, while writers are users
performing INSERT, UPDATE, DELETE, or ALTER operations.

Isolation Level
When we connect to a SQL server database, the application can submit queries to the
database with one of five different isolation levels. These levels are:

 Read Uncommitted
 Read Committed
 Repeatable Read
 Serializable
 Snapshot

Out of these five isolation levels, Read Uncommitted, Read Committed, Repeatable Read,
and Serializable come under the pessimistic concurrency model. Snapshot comes under the
optimistic concurrency model. These levels are ordered in terms of the separation of work by
two different processes, from minimal separation to maximal.

Let's look at each of these isolation levels and how they affect concurrency of operations.

Read Uncommitted
This is the first level of isolation, and it comes under the pessimistic model of concurrency. In
Read Uncommitted, one transaction is allowed to read data that has been changed, but not yet
committed, by another process. Read Uncommitted therefore allows the dirty read problem.

Read Committed
This is the second level of isolation and also falls under the pessimistic model of
concurrency. In the Read Committed isolation level, we are only allowed to read data that is
committed, which means this level eliminates the dirty read problem. At this level, if you are
reading data, then concurrent transactions that want to delete or write that data are blocked:
some work must wait until other work is complete.

Repeatable Read
The Repeatable Read isolation level is similar to the Read Committed level and eliminates
the Non-Repeatable Read problem. In this level, the transaction has to wait till another
transaction's update or read query is complete. But if there is an insert transaction, it does not
wait for anyone. This can lead to the Phantom Read problem.

Serializable
This is the highest level of isolation in the pessimistic model. By implementing this level of
isolation, we can prevent the Phantom Read problem. In this level of isolation, we can ask
any transaction to wait until the current transaction completes.

Snapshot
Snapshot follows the optimistic model of concurrency. This level of isolation takes a
snapshot of the current data and uses it as a copy for the different transactions. Each
transaction has its own copy of the data, so if a user tries to perform a transaction such as an
update or insert, the system re-verifies that the data has not been changed by another
transaction before the operation is allowed to execute.

UNIT 5

Implementation Techniques

PHYSICAL STORAGE MEDIA

As discussed above, the data in database management system (DBMS) is stored on physical
storage devices such as main memory and secondary (external) storage. Thus, it is important
that the physical database (or storage) is properly designed to increase data processing
efficiency and minimise the time required by users to interact with the information system.

Fig. 3.1. System of physically accessing the database

When required, a record is fetched from the disk to main memory for further processing. File
manager is the software that manages the allocation of storage locations and data structure.

Cache
 The fastest and most costly form of storage.
 Volatile.
 Managed by the computer system hardware.

Main memory
 Fast access (10s to 100s of nanoseconds; 1 nanosecond = 10^-9 seconds).
 Generally too small (or too expensive) to store the entire database.
 Capacities of up to a few gigabytes are widely used currently; capacities have gone up
and per-byte costs have decreased steadily and rapidly.
 Volatile − contents of main memory are usually lost if a power failure or system crash
occurs.

Flash memory
 Data survives power failure.
 Data can be written at a location only once, but the location can be erased and written
to again; only a limited number of write/erase cycles are supported.
 Erasing of memory has to be done to an entire bank of memory.

MAGNETIC DISK
A magnetic disk is a storage device that uses a magnetization process to write, rewrite
and access data. It is covered with a magnetic coating and stores data in the form of tracks,
spots and sectors. Hard disks, zip disks and floppy disks are common examples of magnetic
disks.

A magnetic disk primarily consists of a rotating magnetic surface (called platter) and
a mechanical arm that moves over it. Together, they form a “comb”. The mechanical arm is
used to read from and write to the disk. The data on a magnetic disk is read and written using
a magnetization process.

The platter keeps spinning at high speed while the head of the arm moves across its
surface. Since the whole device is hermetically sealed, the head floats on a thin film of air.
When a small current is applied to the head, tiny spots on the disk surface are magnetized and
data is stored. Vice-versa, a small current could be applied to those tiny spots on the platter
when the head needs to read the data.

Data is organized on the disk in the form of tracks and sectors, where tracks are the
circular divisions of the disk. Tracks are further divided into sectors that contain blocks of
data. All read and write operations on the magnetic disk are performed on the sectors. The
floating heads require very precise control to read/write data due to the proximity of the
tracks.

Early devices lacked the precision of modern ones and allowed for just a certain
number of tracks to be placed in each disk. Greater precision of the heads allowed for a much
greater number of tracks to be closely packed together in subsequent devices. Together with
the invention of RAID (redundant array of inexpensive disks), a technology that combines
multiple disk drives, the storage capacity of later devices increased year after year.

Magnetic disks have traditionally been used as secondary storage devices in
computers, and represented the mainstream technology for decades. With the advent of
solid-state drives (SSDs), magnetic disks are no longer considered the only option, but are
still commonly used.

The first magnetic hard drive, built by IBM in 1956, was a large machine consisting of
50 21-inch (53-cm) disks. Despite its size, it could store just 5 megabytes of data. Since then,
magnetic disks have increased their storage capacities many times over, while their size has
decreased comparably.

The size of modern hard disks is just about 3.5 inches (approx. 9 cm), with their
capacity easily reaching one or more terabytes. A similar fate happened to floppy disks,
which shrunk from the original 8 inches of the late 60s to the much smaller 3.5 inches of the
early 90s. However, floppy disks eventually became obsolete after the introduction of
CD-ROMs in the late 1990s and have now all but completely disappeared.

RAID
RAID works by placing data on multiple disks and allowing input/output (I/O)
operations to overlap in a balanced way, improving performance. Because using multiple
disks increases the mean time between failures, storing data redundantly also increases fault
tolerance.
RAID arrays appear to the operating system (OS) as a single logical drive.

RAID employs the techniques of disk mirroring or disk striping. Mirroring copies identical
data onto more than one drive. Striping partitions data across multiple disk drives: each
drive's storage space is divided into units ranging from a sector of 512 bytes up to several
megabytes, and the stripes of all the disks are interleaved and addressed in order. Disk
mirroring and disk striping can also be combined in a RAID array.

In a single-user system where large records are stored, the stripes are typically set up to be
small (512 bytes, for example) so that a single record spans all the disks and can be accessed
quickly by reading all the disks at the same time.

In a multiuser system, better performance requires a stripe wide enough to hold the typical or
maximum size record, enabling overlapped disk I/O across drives.
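
As a worked example of stripe addressing (the figures are illustrative): with four disks and
one-block stripe units, logical block n is placed on disk n mod 4. Blocks 0, 1, 2 and 3
therefore land on disks 0 through 3 and can be read in parallel, while block 4 wraps around
to disk 0 again.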

RAID controller
A RAID controller is a device used to manage hard disk drives in a storage array. It
can be used as a level of abstraction between the OS and the physical disks, presenting
groups of disks as logical units. Using a RAID controller can improve performance and help
protect data in case of a crash.
A RAID controller may be hardware- or software-based. In a hardware-based
RAID product, a physical controller manages the entire array. The controller can also be
designed to support drive formats such as Serial Advanced Technology Attachment and
Small Computer System Interface. A physical RAID controller can also be built into a
server's motherboard.
With software-based RAID, the controller uses the resources of the hardware system,
such as the central processor and memory. While it performs the same functions as a
hardware-based RAID controller, software-based RAID controllers may not enable as much
of a performance boost and can affect the performance of other applications on the server.
If a software-based RAID implementation is not compatible with a system's boot-up
process and hardware-based RAID controllers are too costly, firmware, or driver-based
RAID, is a potential option.
Firmware-based RAID controller chips are located on the motherboard, and all
operations are performed by the central processing unit (CPU), similar to software-based
RAID. However, with firmware, the RAID system is only implemented at the beginning of
the boot process. Once the OS has loaded, the controller driver takes over RAID
functionality. A firmware RAID controller is not as pricey as a hardware option, but it puts
more strain on the computer's CPU. Firmware-based RAID is also called hardware-assisted
software RAID, hybrid model RAID and fake RAID.
RAID levels
RAID devices use different versions, called levels. The original paper that coined the
term and developed the RAID setup concept defined six levels of RAID -- 0 through 5. This
numbered system enabled those in IT to differentiate RAID versions. The number of levels
has since expanded and has been broken into three categories: standard, nested and
nonstandard RAID levels.
Standard RAID levels
RAID 0. This configuration has striping but no redundancy of data. It offers the best
performance, but it does not provide fault tolerance.

Fig.: Visualization of RAID 0.

RAID 1. Also known as disk mirroring, this configuration consists of at least two drives that
duplicate the storage of data. There is no striping. Read performance is improved, since either
disk can be read at the same time. Write performance is the same as for single disk storage.

RAID 2. This configuration uses striping across disks, with some disks storing error checking
and correcting (ECC) information. RAID 2 also uses a dedicated Hamming code parity, a
linear form of ECC. RAID 2 has no advantage over RAID 3 and is no longer used.

RAID 3. This technique uses striping and dedicates one drive to storing parity information.
The embedded ECC information is used to detect errors. Data recovery is accomplished by
calculating the exclusive OR (XOR) of the information recorded on the other drives. Because
an I/O operation addresses all the drives at the same time, RAID 3 cannot overlap I/O. For
this reason, RAID 3 is best for single-user systems with long record applications.

RAID 4. This level uses large stripes, which means a user can read records from any single
drive. Overlapped I/O can then be used for read operations. Because all write operations are
required to update the parity drive, no I/O overlapping is possible.

RAID 5. This level is based on parity block-level striping. The parity information is striped
across each drive, enabling the array to function, even if one drive were to fail. The array's
architecture enables read and write operations to span multiple drives. This results in
performance better than that of a single drive, but not as high as a RAID 0 array. RAID 5
requires at least three disks, but it is often recommended to use at least five disks for
performance reasons.

RAID 5 arrays are generally considered to be a poor choice for use on write-intensive
systems because of the performance impact associated with writing parity data. When a disk
fails, it can take a long time to rebuild a RAID 5 array.
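
The parity used by RAID levels 3 through 6 is a bitwise XOR across the corresponding
blocks of the data disks, which is what makes reconstruction possible. A small worked
example with three data disks:

D1 = 10110100
D2 = 01101001
D3 = 11100010
P  = D1 XOR D2 XOR D3 = 00111111

If disk 2 fails, its block is rebuilt from the survivors: D2 = P XOR D1 XOR D3 = 01101001.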

RAID 6. This technique is similar to RAID 5, but it includes a second parity scheme
distributed across the drives in the array. The use of additional parity enables the array to
continue functioning, even if two disks fail simultaneously. However, this extra protection
comes at a cost. RAID 6 arrays often have slower write performance than RAID 5 arrays.

TERTIARY STORAGE

Tertiary storage comprises high-capacity data archives designed to incorporate vast
numbers of removable media, such as tapes or optical discs. The removable media are
normally not kept in drives but held in specially arranged retention slots, shelves, or
carousels in an offline state. A tertiary storage platform may be perceived as a specialized
type of NAS that uses additional robotic mechanisms to transfer media between their
long-term storage locations and available drives without human intervention.

To fulfill a client access request, a separate database that maintains the catalogue of archive
contents must be consulted. As the tape library or optical jukebox cannot handle a large
number of concurrent requests (there is only a limited number of tape or optical drives, each
operating at its nominal data rate), the archive contents are typically copied to a data cache,
for example a regular NAS server. Clients may then access the data at high speed and
possibly in parallel. The retrieved content is retained in the cache for as long as it is needed,
or until it is retired by the application of relevant data-retention policies.

Tertiary storage also performs periodic (or other policy-managed) scans of stored media to
detect signs of content decay and possibly activate recovery procedures. A tape library and
an optical jukebox are two examples of high-capacity tertiary storage systems.

FILE ORGANIZATION

A database consists of a huge amount of data. The data is grouped within tables in an
RDBMS, and each table has related records. A user sees the data in the form of tables, but
in actuality this huge amount of data is stored in physical storage in the form of files.
File – A file is a named collection of related information that is recorded on secondary
storage such as magnetic disks, magnetic tapes and optical disks.

What is File Organization?

File Organization refers to the logical relationships among the various records that constitute
the file, particularly with respect to the means of identification and access to any specific
record. In simple terms, storing the records in a certain order is called file organization. File
Structure refers to the format of the label and data blocks and of any logical control record.

ORGANIZATIONS OF RECORDS IN FILES


Various methods have been introduced to organize files. These methods have advantages
and disadvantages on the basis of access or selection; thus it is up to the programmer to
decide the best-suited file organization method according to the requirements.

Some types of File Organizations are :

 Sequential File Organization


 Heap File Organization
 Hash File Organization
 B+ Tree File Organization
 Clustered File Organization

We will be discussing each of the file Organizations in further sets of this article along
with differences and advantages/ disadvantages of each file Organization methods.

Sequential File Organization –

The easiest method of file organization is the sequential method. In this method the
records are stored one after another in a sequential manner. There are two ways to implement
this method:
 Pile File Method – This method is quite simple: we store the records in a sequence,
i.e. one after another, in the order in which they are inserted into the tables.

1. Insertion of a new record –

Let R1, R3, R5 and R4 be four records in the sequence, stored in the order in which they
were inserted (a record here is simply a row in a table). Suppose a new record R2 has to be
inserted in the sequence; it is simply placed at the end of the file.

 Sorted File Method – In this method, as the name suggests, whenever a new record
has to be inserted, it is always inserted in a sorted (ascending or descending) manner.
Sorting of records may be based on the primary key or on any other key.

1. Insertion of a new record –

Let us assume that there is a preexisting sorted sequence of records R1, R3, and so on up
to R7 and R8. Suppose a new record R2 has to be inserted in the sequence; it will be
inserted at the end of the file, and then the sequence will be sorted again.

Pros and Cons of Sequential File Organization –

Pros –
 Fast and efficient method for huge amount of data.
 Simple design.
 Files can be easily stored in magnetic tapes i.e cheaper storage mechanism.
Cons –
 Time wastage, as we cannot jump directly to a particular record that is required but
have to move through the file sequentially, which takes time.
 The sorted file method is inefficient, as it takes extra time and space to keep the
records sorted.
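
As a rough worked example of the access cost (the figures are illustrative assumptions): for a
file of 10,000 records stored 100 per block, the file occupies 100 blocks. An equality search
on an unordered field must scan about half the file on average, roughly 50 block reads,
whereas the sorted file method combined with binary search would need only about
log2(100) ≈ 7 block reads.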

Heap File Organization –

Heap File Organization works with data blocks. In this method records are inserted
at the end of the file, into the data blocks. No Sorting or Ordering is required in this
method. If a data block is full, the new record is stored in some other block, Here the other
data block need not be the very next data block, but it can be any block in the memory. It is
the responsibility of DBMS to store and manage the new records.

Insertion of new record –

Suppose we have records R1, R5, R6, R4 and R3 in the heap, and a new record R2
has to be inserted. Since the last data block, data block 3, is full, R2 will be inserted
into any of the data blocks selected by the DBMS − let's say data block 1.

If we want to search, delete or update data in a heap file organization, we will
traverse the data from the beginning of the file till we get the requested record. Thus if the
database is very huge, searching, deleting or updating the record will take a lot of time.
Pros and Cons of Heap File Organization –
Pros –
 Fetching and retrieving records is faster than sequential record but only in case of small
databases.
 When there is a huge number of data needs to be loaded into the database at a time, then
this method of file Organization is best suited.

INDEXING AND HASHING


Data is stored in the form of records, and every record has a key field which helps it to be
recognized uniquely. Indexing is a data structure technique to efficiently retrieve records
from the database on the attributes on which the indexing has been done. Indexing in a
database is similar to the index we see in books.

Indexing in DBMS:
 Indexing is used to optimize the performance of a database by minimizing the number
of disk accesses required when a query is processed.
 An index is a type of data structure. It is used to locate and access the data in a
database table quickly.
 It is defined based on the indexing attribute.

Index structure − indexes can be created using some database columns:
 The first column of the index is the search key, which contains a copy of the primary
key or candidate key of the table. The values of the primary key are stored in sorted
order so that the corresponding data can be accessed easily.
 The second column of the index is the data reference. It contains a set of pointers
holding the address of the disk block where the value of the particular key can be found.

Ordered indices − The indices are usually sorted to make searching faster. The indices which
are sorted are known as ordered indices.

Example: Suppose we have an employee table with thousands of records, each of which is 10
bytes long. If the IDs start with 1, 2, 3, ... and so on, and we have to search for the employee
with ID 543:
 In the case of a database with no index, we have to scan the disk blocks from the start
until we reach 543; the DBMS will find the record after reading 543*10 = 5430 bytes.
 In the case of an index, we search using the index, and the DBMS will find the record
after reading 542*2 = 1084 bytes, which is far less than in the previous case.
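
In SQL, such an index is created with a single statement; a minimal sketch on a hypothetical
employees table:

-- Build an index on the ID column; equality searches on ID can now use the
-- index instead of scanning the whole table.
CREATE INDEX idx_employees_id ON employees (id);

SELECT * FROM employees WHERE id = 543;  -- resolved through the index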
Indexing Methods

Primary Index
 If the index is created on the basis of the primary key of the table, it is known as
primary indexing. These primary keys are unique to each record and have a 1:1 relation
with the records.
 As primary keys are stored in sorted order, the performance of the searching operation
is quite efficient.
 The primary index can be classified into two types: dense index and sparse index.

Dense index
 The dense index contains an index record for every search-key value in the data file.
This makes searching faster.
 In this, the number of records in the index table is the same as the number of records in
the main table.
 It needs more space to store the index records themselves. The index records hold the
search key and a pointer to the actual record on the disk.
Clustering Index
 A clustered index can be defined as an ordered data file. Sometimes the index is created
on non-primary-key columns, which may not be unique for each record.
 In this case, to identify the records faster, we group two or more columns to obtain a
unique value and create an index out of them. This method is called a clustering index.
 Records which have similar characteristics are grouped, and indexes are created for
these groups.

Example: Suppose a company has several employees in each department. If we use a
clustering index where all employees belonging to the same Dept_ID are considered to be
within a single cluster, the index pointers point to the cluster as a whole. Here Dept_ID is a
non-unique key.

The previous scheme is a little confusing when one disk block is shared by records belonging
to different clusters. If we use a separate disk block for each cluster, it is considered a better
technique.
Secondary Index
In sparse indexing, as the size of the table grows, the size of the mapping also grows. These
mappings are usually kept in primary memory so that address fetches are faster; the
secondary memory is then searched for the actual data based on the address obtained from
the mapping. If the mapping size grows, fetching the address itself becomes slower, and the
sparse index is no longer efficient. To overcome this problem, secondary indexing is
introduced.

In secondary indexing, another level of indexing is introduced to reduce the size of the
mapping. A large range for the columns is selected initially, so that the mapping size of the
first level remains small; each range is then further divided into smaller ranges. The mapping
of the first level is stored in primary memory, so that address fetches are fast. The mapping of
the second level, and the actual data, are stored in secondary memory (the hard disk).
Hashing
Hashing is an effective technique to calculate the direct location of a data record on the disk
without using index structure.
Hashing uses hash functions with search keys as parameters to generate the address of a data
record.

Hash Organization

 Bucket − A hash file stores data in bucket format. Bucket is considered a unit of
storage. A bucket typically stores one complete disk block, which in turn can store one
or more records.
 Hash Function − A hash function, h, is a mapping function that maps all the set of
search-keys K to the address where actual records are placed. It is a function from
search keys to bucket addresses.

Static Hashing

In static hashing, when a search-key value is provided, the hash function always computes
the same address. For example, if the mod-4 hash function is used, it generates only 4 values
(0 to 3); the output address is always the same for a given key. The number of buckets
provided remains unchanged at all times.

Operation
 Insertion − When a record is required to be entered using static hash, the hash
function h computes the bucket address for search key K, where the record will be
stored.
Bucket address = h(K)
 Search − When a record needs to be retrieved, the same hash function can be used to
retrieve the address of the bucket where the data is stored.
 Delete − This is simply a search followed by a deletion operation.
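
As a small worked example, assume the hash function h(K) = K mod 4 over buckets 0 to 3.
A record with search key 543 is inserted into bucket h(543) = 543 mod 4 = 3; a later search
for key 543 recomputes h(543) = 3 and examines only bucket 3, and a delete does the same
search followed by removal of the record.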

Bucket Overflow

The condition of bucket-overflow is known as collision. This is a fatal state for any static
hash function. In this case, overflow chaining can be used.
 Overflow Chaining − When buckets are full, a new bucket is allocated for the same
hash result and is linked after the previous one. This mechanism is called Closed
Hashing.

 Linear Probing − When a hash function generates an address at which data is already
stored, the next free bucket is allocated to it. This mechanism is called Open Hashing.

Dynamic Hashing

The problem with static hashing is that it does not expand or shrink dynamically as the size of
the database grows or shrinks. Dynamic hashing provides a mechanism in which data buckets
are added and removed dynamically and on-demand. Dynamic hashing is also known
as extended hashing.
Hash function, in dynamic hashing, is made to produce a large number of values and only a
few are used initially.

Types of Ordered Indices

There are three types of ordered indices:


 Dense Index
 Sparse Index
 Multi-Level Indexing

Dense Indexing
In a dense index, for every search-key in the file, an index entry is present.
In a dense-clustering index, the index record contains the search-key value and a
pointer to the first data record with that search-key value.

The remaining number of records with similar search-key value would be stored in a
sequence after the first record.
Sparse Indexing
In a sparse index, an index entry appears for only some of the search-key values.
These indices can be used only if the relation is stored in sorted order of the search-key
value.
To locate a record, we find the index entry with the largest search-key value that is less than
or equal to the search-key value for which we are looking. We start at the record pointed to
by that index entry and then follow the pointers in the file until we find the desired record.
Multi-Level Indexing
When a data file has a very large number of records, multi-level indexing comes into use: an
outer index is built on top of the inner index. As the size of the database grows, the size of
the indices also grows, since a multilevel index is stored on disk along with the actual
database files.

Example of a 2-Level Sparse Index

B+ Tree Index Files

A B+ Tree Index is a multilevel index.

A B+ Tree is a rooted tree satisfying the following properties :

1. All paths from the root to a leaf are of the same length.


2. A node that is not a root or a leaf has between ⌈n/2⌉ and n children.
3. A leaf node has between ⌈(n−1)/2⌉ and n−1 search-key values.
The structure of any node of this tree is [P1, K1, P2, K2, …, Kn−1, Pn] − up to n pointers
alternating with up to n−1 search-key values. In a leaf node, pointer Pi points to the record(s)
with search key Ki, and the last pointer Pn links the leaf to the next leaf in key order.
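
A compact sketch of this node layout and of search in Python (illustrative only; it omits insertion, deletion and rebalancing):

import bisect

class Node:
    def __init__(self, leaf=False):
        self.leaf = leaf
        self.keys = []
        self.children = []     # child nodes (internal) or records (leaf)
        self.next = None       # leaf chain, used for range scans

def search(node, key):
    while not node.leaf:
        # Follow the child whose key range contains the search key.
        i = bisect.bisect_right(node.keys, key)
        node = node.children[i]
    if key in node.keys:
        return node.children[node.keys.index(key)]
    return None

# Hand-built 2-level tree for keys {10, 20, 30, 40} (illustrative data).
leaf1 = Node(leaf=True); leaf1.keys = [10, 20]; leaf1.children = ["r10", "r20"]
leaf2 = Node(leaf=True); leaf2.keys = [30, 40]; leaf2.children = ["r30", "r40"]
leaf1.next = leaf2
root = Node(); root.keys = [30]; root.children = [leaf1, leaf2]
print(search(root, 40))    # 'r40'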

Example-1: Construct a B+ Tree for the following search key values,


{10, 20, 30, 40 }
where n = 3 (n is the number of pointers)

Example-2: Construct a B+ Tree for the following search key values, where n = 4.
{10, 30, 40, 50, 60, 70, 90 }

Now, let's insert and delete some elements in this tree.
Insert 25, 75

To insert an element, we locate the leaf node in which it belongs and add it there in
sorted order; if the leaf overflows, the leaf is split and the smallest key of the new
right node is copied up into the parent.

Delete 70

When an element is deleted from a leaf, a leaf that falls below its minimum occupancy
borrows from or merges with a sibling; if the deleted key also appears in an internal
node, it is replaced there by the next key on the right (its successor at the leaf level).

B-Tree Index Files


A B-Tree index is a multilevel index.
A B-Tree is a rooted tree satisfying the following properties :

1. All paths from the root to a leaf are of the same length.
2. A node that is not a root or a leaf has between ⌈n/2⌉ and n children.
3. A leaf node has between ⌈(n−1)/2⌉ and n−1 values.
Unlike a B+ tree, a B-tree stores each search-key value only once, so nonleaf nodes
also carry pointers to records for the keys they contain. The structures of the leaf and
non-leaf nodes of this tree are :

Example-1: Construct a B-Tree for the following search key values, where n = 3.
{10, 20, 30, 40, 50}

Let's take another example, and insert and delete elements from the tree.

Example-2: Construct a B-Tree for the following search key values, where n = 3 (n
is the number of pointers).
{10, 20, 30, 40, 50, 60, 70, 80, 90}

a) Delete 20 from the above tree.

b) Insert 65 into the above tree.

Static Hashing

Let K denote all the search-key values.


Let B represent the set of all bucket addresses.
A bucket is a unit of storage that contains some records.

Here, h is a 'hash function' from K to B.


A hash function is used to avoid the need for a separate index structure.

Bucket Overflow :
Bucket overflow can occur in two ways.
1. Insufficient buckets.
2. Skew in the distribution of records: some buckets are assigned more records than others, so
a bucket can overflow even while other buckets still have space. This situation is called
'bucket skew'.

Overflow Chaining :
The overflows of a given bucket are chained together in a linked list. This is called
‘Closed Hashing’.

In 'Open Hashing', the set of buckets is fixed and there are no overflow chains.
Here, if a bucket is full, the system inserts the record in some other bucket of the
initial set of buckets.

A hash index organizes the search keys, with their associated record pointers, into a
hash file structure: a hash function is applied to a search key to identify a bucket,
and the key and its associated pointers are stored in that bucket.
Example of Static Hashing
Example-10: Hash file organization of DEPT file using DName as key, where there
are eight departments.

Note: An ideal hash function satisfies two properties:


1. The distribution is uniform − the hash function assigns each bucket the same number of
search-key values from the set of all possible search-key values.

2. The distribution is random − in the average case, each bucket has nearly the same
number of values assigned to it, regardless of the actual distribution of search-key values.

DYNAMIC HASHING
The ‘Dynamic Hashing’ technique allows the hash function to be modified
dynamically to accommodate the growth or shrinkage of the database. The ‘dynamic
hashing’ technique we use is called ‘Extendible Hashing’.

This technique is used to find the address of the required record, given its key
value.

The binary equivalent of the key is considered to map the key value to the address of
the record.

A bucket concept is used here as well, where a bucket is a unit of storage that


accommodates a certain number of records.

How to search a key:


 First, compute the hash value of the key.
 Next, check how many bits of the hash value the directory currently uses; this number of
bits is called i.
 Then, take the least significant i bits of the hash value. This gives the index into the
directory.
 Now use this index to go to the directory and find the address of the bucket where the
record may reside, as the sketch below illustrates.
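
A minimal sketch of this lookup in Python, assuming a global depth of i = 2 and an identity hash function for readability (all names and values illustrative):

# Extendible-hash directory lookup: the directory has 2**i entries and is
# indexed by the i least significant bits of the hash value.
global_depth = 2                      # i: bits of the hash currently used
bucket_A = [4, 8]                     # hash values ending in ...00
bucket_B = [5, 13]                    # ...01
bucket_C = [6]                        # ...10
bucket_D = [7, 11]                    # ...11
directory = [bucket_A, bucket_B, bucket_C, bucket_D]

def h(key):
    return key                        # identity hash, for illustration

def find_bucket(key):
    index = h(key) & ((1 << global_depth) - 1)   # least significant i bits
    return directory[index]

print(11 in find_bucket(11))          # True: 11 = 0b1011 -> index 0b11

Growth on demand works by doubling the directory (i becomes i + 1), which only copies pointers; individual buckets are split one at a time as they overflow.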
QUERY PROCESSING

Query processing is the activity of extracting data from the database. It involves several
steps for fetching the data. The steps involved
are:

1. Parsing and translation


2. Optimization
3. Evaluation

The query processing works in the following way:

Parsing and Translation

Query processing includes a number of activities for data retrieval. User queries are written
in a high-level database language such as SQL, and must be translated into expressions that
can be used at the physical level of the file system; the actual evaluation of the query,
together with a variety of query-optimizing transformations, takes place on this internal form.
SQL (Structured Query Language) is well suited for humans, but it is not suitable as the
internal representation of the query inside the system; relational algebra is well suited for
that internal representation. The translation process works like the parser of a compiler:
when a user executes a query, the parser checks the syntax of the query, verifies the names
of the relations, tuples, and attributes referenced in it, and builds a tree of the query, known
as the 'parse tree'. This tree is then translated into relational algebra, with any views used in
the query replaced by their definitions.

Thus, we can understand the working of query processing from the diagram below:

Suppose a user executes a query. As we have learned, there are various methods of
extracting data from the database. Suppose the user wants to fetch the records of the
employees whose salary is greater than 10000. In SQL, the following query is issued:

select emp_name from Employee where salary>10000;

To make the system understand the user query, it is translated into relational algebra:

o πemp_name (σsalary>10000 (Employee))

The optimizer may also consider other algebraically equivalent expressions for the same
query. After translating the given query, each relational algebra operation can be executed
using one of several different algorithms. So, in this way, query processing begins its work.

Evaluation

In addition to translating the query into relational algebra, the translated expression must be
annotated with the instructions that specify how each operation is to be evaluated. Thus,
after translating the user query, the system executes a query
evaluation plan.

Query Evaluation Plan

o In order to fully evaluate a query, the system needs to construct a query evaluation
plan.
o The annotations in the evaluation plan may refer to the algorithms to be used for the
particular index or the specific operations.
o Such relational algebra with annotations is referred to as Evaluation Primitives. The
evaluation primitives carry the instructions needed for the evaluation of the operation.

o Thus, a query evaluation plan defines a sequence of primitive operations used for
evaluating a query. The query evaluation plan is also referred to as the query
execution plan.
o A query execution engine is responsible for generating the output of the given query.
It takes the query execution plan, executes it, and finally makes the output for the user
query.

Optimization
o The cost of query evaluation can vary for different evaluation plans of the same
query. Although the system is responsible for constructing the evaluation plan, the
user need not write the query in an efficient form.
o Usually, a database system generates an efficient query evaluation plan, which
minimizes its cost. This type of task, performed by the database system, is known
as Query Optimization.
o For optimizing a query, the query optimizer must have an estimated cost analysis of
each operation, because the overall operation cost depends on the memory
allocated to the various operations, execution costs, and so on.

Finally, after selecting an evaluation plan, the system evaluates the query and produces the
output of the query.

Catalog Information Used in Cost Functions


To estimate the costs of various execution strategies, we must keep track of any information
that is needed for the cost functions. This information may be stored in the DBMS catalog,
where it is accessed by the query optimizer. First, we must know the size of each file. For a
file whose records are all of the same type, the number of records (tuples) (r), the
(average) record size (R), and the number of file blocks (b) (or close estimates of them) are
needed. The blocking factor (bfr) for the file may also be needed. We must also keep track
of the primary file organization for each file. The primary file organization records may
be unordered, ordered by an attribute with or without a primary or clustering index,
or hashed (static hashing or one of the dynamic hashing methods) on a key attribute.
Information is also kept on all primary, secondary, or clustering indexes and their indexing
attributes. The number of levels (x) of each multilevel index (primary, secondary, or
clustering) is needed for cost functions that estimate the number of block accesses that occur
during query execution. In some cost functions the number of first-level index blocks (bI1)
is needed.
Another important parameter is the number of distinct values (d) of an attribute and the
attribute selectivity (sl), which is the fraction of records satisfying an equality condition on
the attribute. This allows estimation of the selection cardinality (s = sl * r) of an attribute,
which is the average number of records that will satisfy an equality selection condition on
that attribute. For a key attribute, d = r, sl = 1/r and s = 1. For a nonkey attribute, by making
the assumption that the d distinct values are uniformly distributed among the records, we
estimate sl = 1/d and so s = r/d.
Information such as the number of index levels is easy to maintain because it does not
change very often. However, other information may change frequently; for example, the
number of records r in a file changes every time a record is inserted or deleted. The query
optimizer needs reasonably close, but not necessarily completely up-to-the-minute, values
of these parameters for use in estimating the cost of various execution strategies.
For a nonkey attribute with d distinct values, it is often the case that the records are
not uniformly distributed among these values. For example, suppose that a company has 5
departments numbered 1 through 5, and 200 employees who are distributed among the
departments as follows: (1, 5), (2, 25), (3, 70), (4, 40), (5, 60). In such cases, the optimizer
can store a histogram that reflects the distribution of employee records over different
departments in a table with the two attributes (Dno, Selectivity), which would contain the
following values for our example: (1, 0.025), (2, 0.125), (3, 0.35), (4, 0.2), (5, 0.3). The
selectivity values stored in the histogram can also be estimates if the employee table changes
frequently.
In the next two sections we examine how some of these parameters are used in cost
functions for a cost-based query optimizer.
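
A small sketch of these estimates in Python, using the department example above (r = 200 records, d = 5 distinct Dno values, and the histogram values given in the text):

# Catalog-based estimates for an equality selection on a nonkey attribute.
r = 200          # number of records in the file
d = 5            # distinct values of the (nonkey) attribute Dno

# Uniform assumption: sl = 1/d, selection cardinality s = sl * r = r/d.
sl_uniform = 1 / d
s_uniform = sl_uniform * r
print(s_uniform)                # 40.0 records expected per department

# Histogram: per-value selectivities replace the uniform assumption.
histogram = {1: 0.025, 2: 0.125, 3: 0.35, 4: 0.2, 5: 0.3}
s_dept3 = histogram[3] * r
print(s_dept3)                  # 70.0 records estimated for Dno = 3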

SELECT OPERATION

A query is a question or a request for information. A query language is a language used to


retrieve information from a database.
Query language is divided into two types −
 Procedural language
 Non-procedural language

Procedural language
Information is retrieved from the database by specifying the sequence of operations to be
performed.
For Example − Relational algebra.
Structured Query Language (SQL) is based on relational algebra.
Relational algebra consists of a set of operations that take one or two relations as an input
and produces a new relation as output.
Types of Relational Algebra operations
The different types of relational algebra operations are as follows −
 Select operation
 Project operation
 Rename operation
 Union operation

 Intersection operation
 Difference operation
 Cartesian product operation
 Join operation
 Division operation
Select, project, and rename are unary operations (they operate on one relation).
Select operation
It displays the records that satisfy a condition. It is denoted by sigma (σ) and is a horizontal
subset of the original relation.
Syntax
Its syntax is as follows −
σcondition(table name)
Example
Consider the student table given below −

Regno Branch Section

1 CSE A

2 ECE B

3 CIVIL B

4 IT A

Now, to display all the records of the student table, we will use the following command −
σ(student)
In addition to this, when we have to display all the records of the CSE branch in the student
table, we will use the following command −
σbranch=cse(student)
Hence, the result will be as follows −

RegNo Branch Section

1 CSE A

To display all the records in the student table whose RegNo>2, we will use the command
below −
σRegNo>2(student)
The output will be as follows −

RegNo Branch Section

3 CIVIL B

4 IT A

To display the record of ECE branch, section B students, use the given command −
σbranch=ECE ∧ section=B(student)
To display the records of section B students of the CSE and IT branches, use the following
command −
σsection=B ∧ (branch=CSE ∨ branch=IT)(student)
Consider the EMPLOYEE table as another example of the selection operation.
Retrieve information about those employees whose salary is greater than 20,000.
 If one condition is specified, we can use the following command −
σ salary > 20,000 (emp)
 If more than one condition is specified in the query, logical and relational operators
(AND: ∧, OR: ∨, NOT: ¬, and the comparisons =, >, <, >=, <=)
are used to combine the multiple conditions into a single statement.
Example − To retrieve the information of those employees whose salary > 20,000, whose
location is HOD, and whose department number is 20, we can use the following command
(a code sketch of the select operation follows) −
σ salary > 20,000 ∧ LOC=HOD ∧ Deptno=20(emp)
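
As a sketch, the select operation can be mimicked in Python over the student relation shown above (the representation of the relation as a list of dictionaries is an assumption for illustration):

# sigma returns the horizontal subset of tuples satisfying a predicate.
student = [
    {"RegNo": 1, "Branch": "CSE",   "Section": "A"},
    {"RegNo": 2, "Branch": "ECE",   "Section": "B"},
    {"RegNo": 3, "Branch": "CIVIL", "Section": "B"},
    {"RegNo": 4, "Branch": "IT",    "Section": "A"},
]

def sigma(predicate, relation):
    return [row for row in relation if predicate(row)]

print(sigma(lambda t: t["Branch"] == "CSE", student))   # sigma branch=CSE
print(sigma(lambda t: t["RegNo"] > 2, student))         # sigma RegNo>2
# Compound condition: section B AND (branch CSE OR branch IT)
print(sigma(lambda t: t["Section"] == "B" and
            t["Branch"] in ("CSE", "IT"), student))     # [] for this data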
Sorting
It is the technique of arranging records in ascending or descending order of one or more
columns. It is useful because some queries ask for sorted results, and operations such as
joins are more efficient on sorted records. By default, records are sorted on the primary key
column; in addition, we can specify sorting on other columns, as required. Two types of
sorting methods are mainly used.

Quick Sort
If a table is small enough to be accommodated in the available memory, quick sort can be
used. As the name suggests, it is a simple method of sorting. In this method a pivot element
is chosen among the values of the column; values less than the pivot are moved to its left,
and values greater than the pivot are moved to its right. It needs very little additional space
(O(log n) for recursion). It takes O(n log n) time in the best and average cases and O(n²)
time in the worst case. The method is not stable, as it can alter the relative order of records
with equal keys while sorting.

Merge Sort
For larger tables that cannot be accommodated in memory, this type of sorting (external
merge sort) is used. It performs better than quick sort in this setting. Let us see how this sort
can be done.
Suppose each block can hold two records and memory can hold up to 2 blocks. That means
memory cannot hold all the records of a large table; it can hold up to 4 records at a time.
Suppose the initial table has 12 records, with two records in each block. When merge sort is
applied, the records are first read in memory-sized groups, sorted, and written back as sorted
runs; in each subsequent pass the runs are merged pairwise, until the final pass merges the
remaining runs to give the fully sorted result. This is how a merge sort works; a sketch
follows.
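
A compact sketch of this two-phase external merge sort in Python, under the same assumptions (memory holds 4 records; the file holds 12); heapq.merge stands in for the block-at-a-time merge, since it streams one record per run at a time:

import heapq

MEMORY_RECORDS = 4

def external_merge_sort(records):
    # Pass 0: read memory-sized chunks, sort each, write sorted "runs".
    runs = [sorted(records[i:i + MEMORY_RECORDS])
            for i in range(0, len(records), MEMORY_RECORDS)]
    # Merge passes: repeatedly merge pairs of runs until one run remains.
    while len(runs) > 1:
        merged = []
        for i in range(0, len(runs), 2):
            pair = runs[i:i + 2]
            merged.append(list(heapq.merge(*pair)))
        runs = merged
    return runs[0] if runs else []

print(external_merge_sort([9, 3, 12, 7, 1, 11, 5, 2, 10, 4, 8, 6]))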

Join Operations:

A Join operation combines related tuples from different relations, if and only if a given join
condition is satisfied. It is denoted by ⋈.

Example:

EMPLOYEE

EMP_CODE EMP_NAME

101 Stephan

102 Jack

103 Harry

SALARY

EMP_CODE SALARY

101 50000

102 30000

103 25000

Operation: (EMPLOYEE ⋈ SALARY)

EMP_CODE EMP_NAME SALARY

101 Stephan 50000

102 Jack 30000

103 Harry 25000

Types of Join operations:

1. Natural Join:

o A natural join is the set of tuples of all combinations in R and S that are equal on their
common attribute names.
o It is denoted by ⋈.

Example: Let's use the above EMPLOYEE table and SALARY table:

Input:

∏EMP_NAME, SALARY (EMPLOYEE ⋈ SALARY)

Output:

EMP_NAME SALARY

Stephan 50000

Jack 30000

Harry 25000
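
A minimal sketch of this natural join in Python over the two relations above (a nested-loop formulation, chosen for clarity rather than efficiency):

# Natural join of EMPLOYEE and SALARY on their common attribute EMP_CODE.
employee = [(101, "Stephan"), (102, "Jack"), (103, "Harry")]
salary   = [(101, 50000), (102, 30000), (103, 25000)]

def natural_join(r, s):
    # Combine every pair of tuples that agree on the join attribute.
    return [(code, name, sal)
            for code, name in r
            for scode, sal in s
            if code == scode]

for row in natural_join(employee, salary):
    print(row)          # (101, 'Stephan', 50000) ...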

2. Outer Join:

The outer join operation is an extension of the join operation. It is used to deal with missing
information.

Example:

EMPLOYEE

EMP_NAME STREET CITY

Ram Civil line Mumbai

Shyam Park street Kolkata

Ravi M.G. Street Delhi

Hari Nehru nagar Hyderabad

FACT_WORKERS

EMP_NAME BRANCH SALARY

Ram Infosys 10000

Shyam Wipro 20000

Kuber HCL 30000

Hari TCS 50000

With these relations, a left outer join (EMPLOYEE ⟕ FACT_WORKERS) keeps every
EMPLOYEE tuple, padding the BRANCH and SALARY of employees with no matching
FACT_WORKERS tuple (here, Ravi) with nulls; a right outer join keeps every
FACT_WORKERS tuple (padding Kuber's STREET and CITY with nulls), and a full outer
join keeps both.

Web Technologies And DBMS

A database is a collection of many types of occurrences of logical records, containing
relationships between records and elementary data aggregates. A database management
system (DBMS) is a set of programs for creating and operating a database.
In theory, any relational DBMS can be used to store the data needed by a Web server.
In practice, however, simple DBMSs such as FoxPro or Access are not suitable
for Web sites that are used intensively; large-scale Web applications need high-performance
DBMSs able to run multiple applications simultaneously. HyperText Markup
Language (HTML) is used to create hypertext documents for Web pages. The purpose of
HTML is the presentation of information − paragraphs, fonts, tables − rather than the
semantic description of the document.

Keywords: internet, information, web, data
Online databases. Placing complex collections of information on the Internet involves
storing them in a database, which users can then access online. The term database
can easily be deceiving, because in reality the system that makes this database visible on the
Internet is far more complex. Any database that provides information to users of Internet
services must be stored on a server that is visible on the Internet and must use a scripting
technology. The information in the database is extracted according to the specific needs of a
user and then formatted so that it can be properly displayed. For example, when someone
types the word "Romania" into the Google search engine, the system will take the search
request, search its database for items containing the word "Romania", and then format the
results so that they can be displayed by a browser such as Internet Explorer. A general view
of the server architecture is given in the following figure. (Fig. 1: Architecture of a Web
server supporting scripting.) As seen in the figure, the system architecture is structured in
several levels.
When the user wishes to access information located on the server, an Internet browser is
used to connect to it; the server is accessed via a URL. The main elements of the server
architecture are: the Web server, the parser for server-side scripts, the drivers for access to
the database, the database itself, and collections of files. The Web server is a complex
application responsible for communication with external Web browsers. Basically, the Web
server listens on the HTTP port (by default, port 80) of the machine on which it is installed.
When a request arrives on this port, the Web server interprets it to see what information has
been requested. The information requested from the server consists of files residing on its
hard disk; the Web server wraps these files so that they can be sent onward. The requested
files can be divided into two categories: • files that contain static information, which are
sent on to browsers without any kind of change − static files are usually images, HTML
files, movies, documents offered for download, Flash animations, etc.; • scripts − small
programs that are run by an interpreter, with only the result of their execution being sent to
the Web server.
The main role of these scripts is to generate documents dynamically. The
technique of dynamically generating HTML documents is what makes it possible to access
databases over the Internet. The role of the server-side script parser was described above.
When a script needs records from a database, it interacts with the database through a driver:
it runs an SQL statement against the database, and the execution of this statement returns a
cursor. From this cursor, the script generates HTML code which, once it reaches a browser,
produces the display of the desired data. The drivers for database access mediate the
interaction between the script interpreter and the database itself. They are very specialised
software tools that are usually visible neither to the programmer nor to the user.
The drivers are important because a flawed choice of driver can significantly affect
system performance. The main DBMSs used in Web applications are: MySQL, SQL Server
and Oracle. The collections of files contain the static information that is sent to users on
demand. It is important to note that ASP scripts are designed to produce HTML pages that
are sent to Web browsers for display. The major benefit of ASP scripts is that they permit
the production of dynamic HTML code according to concrete needs. For example, one can
easily fetch the records of a table from a database data source and wrap them in HTML so
that they can be displayed in a browser. Although ASP was conceived as a general Web
application technology, the overwhelming majority of ASP scripts are related to working
with on-line databases. In order to write ASP scripts, the following are needed: • a
computer on which a Web server is set up (for example, Internet Information Server or
Personal Web Server); any Windows system can easily be configured to support ASP scripts;
• a text editor − Notepad or a specialized editor such as FrontPage or Macromedia
Dreamweaver; • a DBMS for creating and updating the database used by the scripts; • a
Web browser to see the result of script execution. Considering that ASP scripts are usually
written to work with databases, a database is also needed to run the script. It should be on
the same computer as the script, preferably in the same directory.
Database access
ActiveX Data Objects (ADO) is a technology that allows accessing databases from
Web pages. Basically, ADO can be used to write compact scripts for connecting to OLE
DB-compatible data sources from Web pages; such data sources include databases, tabular
spreadsheets, sequential data files, and e-mail directories. OLE DB is a system-level
programmatic interface that provides a standard set of COM components for managing
databases. The COM components are accessed through the ADO object model, so VBScript
or JScript scripts can access the databases of Web applications. ADO can also be used to
open ODBC-compatible (Open DataBase Connectivity) databases. To create an application
with access to a database, ADO requires an identification of the data source. This is done
with a connection string, consisting of arguments separated by ";" −
for example, the name of the data source provider and the
location of the data source. ADO uses the connection string to identify the OLE DB
provider. The provider is a component that represents the data source and makes
information about the format of the data available to the application. For compatibility, the
OLE DB provider for ODBC supports the ODBC connection-string syntax. A connection
string that refers to a database on a remote computer can contain
security information (user name, password). To restrict access to data sources, Windows
accounts are created for the computers that will access the data sources, with the appropriate
NTFS permissions on the files. For establishing and handling the connections
between the application and OLE DB-compatible data sources or ODBC-compatible
databases, ADO provides the Connection object. Its properties and methods allow
opening and closing connections to databases, and formulating queries to update
the data.
To establish a connection to a database, an instance of the Connection object is created and
its Open method is called with the connection string; the connection string should not
contain any spaces before or after the equal sign (=). Security is enforced by the security
subsystem of the DBMS, which checks that all accesses by applications satisfy the
constraints of security (or authorities) stored in the system catalog. Each authority in a
discretionary scheme has a name, a set of privileges (RETRIEVE, INSERT, etc.), a
corresponding relation variable (i.e., the data to which the authority applies), and a set of
users. These authorities can be used to provide controls that are value-dependent,
value-independent, statistical-summary-based, or context-dependent. An audit trail can be
used to record attempted violations of security.

Web technologies: HTML, ASP, PHP
HTML is a form of markup text oriented toward the presentation of documents,
rendered by specialized software called an HTML user agent, the best example of
which is the Web browser. HTML provides the means by which the contents of a
document can be annotated with various types of metadata and rendering indications.
The rendering indications can range from minor text decorations, such as specifying that a
specific word should be emphasized or that an image should be inserted, up to sophisticated
scripts, image maps and forms. The metadata may include information about the title and
author of the document, structural information about how the document is divided into
different segments, paragraphs, lists, headings, etc., and the crucial information that enables
the document to be linked to other documents, forming hyperlinks (and thus the Web).
HTML is a text format designed to be read and edited using a simple text editor.
However, writing and modifying pages in this way requires solid knowledge of HTML and
is time-consuming. Graphical (WYSIWYG) editors such as Macromedia Dreamweaver,
Adobe GoLive or Microsoft FrontPage allow Web pages to be treated like Word documents.
HTML can also be generated directly using server-side technologies such as PHP, JSP
or ASP. Many applications, such as content management systems, wikis and Web forums,
generate HTML pages.
HTML is also used in e-mail. Most e-mail applications use a built-in HTML editor
for composing e-mails and a rendering engine for displaying them. The use of HTML in
e-mail is a controversial topic, and many mailing lists intentionally block it. Active Server
Pages (ASP), also known as Classic ASP, was Microsoft's first server-side
programming language for generating dynamic Web pages. It was originally
released as an add-on for IIS in the Windows NT 4.0 Option Pack, after which it was
included as a free component of Windows Server, starting with Windows 2000
Server. It has since been superseded by ASP.NET. ASP.NET is a Microsoft technology
for creating Web applications and Web services; it is the successor of ASP (Active
Server Pages) and benefits from the power of the .NET development platform and the set of
tools offered by the Visual Studio .NET development environment.
Some of the advantages of ASP.NET are:
• ASP.NET has a broad set of XML-based components, providing an
object-oriented programming (OOP) model.
• ASP.NET runs compiled code, which increases the performance of the Web application.
Source code can be separated into two files, one for the executable code, and another one for
the content of the page (HTML code and the text of the page).
• .NET is compatible with over 20 different languages, the most used being C# and
Visual Basic.
PHP is a programming language. Its name is a recursive acronym: PHP:
Hypertext Preprocessor. Originally used to produce dynamic Web pages, it is widely used in
the development of Web pages and Web applications. It is mainly embedded in HTML
code, but starting from version 4.3.0 it can also be used from the command line (CLI),
allowing the creation of standalone applications. It is one of the most important
open-source, server-side Web programming languages, with versions available for most Web
servers and for all operating systems. According to statistics, it is installed on over 20
million websites and 1 million Web servers.
Conclusions. A database, sometimes called a "data bank", is a way of
storing information and data on external media (storage devices), with the possibility of fast
lookup and retrieval. Typically a database is stored in one or more files. Databases are
handled by database management systems.

Web as a Database Application platform


When developers speak about databases, you will usually hear buzzwords like

robust, efficient, scalable, etc., and discussion will focus on the strength of the DBMS and

how it integrates with different technologies. For a beginner choosing a database for a Web

application, however, other things matter more: the cost of getting started, the tools, the

user interface, and the availability of help.
The choice usually depends on the following factors.
1. What the project (Web site) is about.

2. The targeted type of user the application is going to serve.

3. The programming language you choose.

1. MySQL

MySQL is used in almost all open-source Internet projects that require a database in the

back end. MySQL is part of the popular LAMP stack, along with Linux, Apache, and PHP.

It was originally created by a company called MySQL AB, which was acquired by Sun

Microsystems, which in turn was acquired by Oracle. Since it was unclear what Oracle

would do with the MySQL database, the open-source community has created a number of

forks of MySQL, including Drizzle and MariaDB.
Following are a few key features:

· MyISAM storage uses B-tree disk tables with index compression for high performance.

· Support for partitioning and replication.

· Written in C and C++.

· Support for stored procedures, triggers, views, etc.

· Support for XPath and full-text search.


2. PostgreSQL

PostgreSQL is an open-source object-relational database system. It runs on most *nix

flavors, Windows and Mac OS. It has full support for views, joins, triggers, and procedures.

Following are a few key features:

1. Support for tablespaces

2. MVCC (Multi-Version Concurrency Control)

3. Hot backups and point-in-time recovery

4. Asynchronous replication

5. Highly scalable
3. Oracle

Oracle is a leading database for mission-critical commercial applications. Oracle offers

four different editions of the database: 1) Enterprise Edition 2) Standard

Edition 3) Standard Edition One 4) Express Edition.

The following are a few key features of the Oracle database:

1. Data Guard for standby databases

2. Virtual Private Database

3. Real Application Clusters

4. Automatic Memory and Storage Management

5. OLAP, Partitioning, Data Mining

4. SQLite

SQLite does not use the usual client-server model with a standalone server process.

Instead, it is a self-contained, serverless SQL database engine.

Main features of SQLite:

1. No external dependencies

2. Zero configuration, with no setup or admin tasks

3. The entire database is stored in a single disk file

4. Supports databases several terabytes in size

5. WinCE is supported out of the box
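
A short usage example with Python's built-in sqlite3 module, which illustrates the zero-configuration, single-file (here, in-memory) design; the table and values are illustrative:

import sqlite3

# SQLite's self-contained design in practice: the Python standard library
# ships a driver, and the whole database lives in one file (or in memory).
conn = sqlite3.connect(":memory:")      # or a path such as "app.db"
cur = conn.cursor()
cur.execute("CREATE TABLE dept (dno INTEGER PRIMARY KEY, dname TEXT)")
cur.execute("INSERT INTO dept VALUES (?, ?)", (1, "CSE"))
conn.commit()
print(cur.execute("SELECT * FROM dept").fetchall())   # [(1, 'CSE')]
conn.close()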


5. Microsoft SQL Server

This is Microsoft's flagship database product. If you work in a company that heavily uses

Microsoft products, you may well end up working on MS SQL Server.

Microsoft SQL Server is a relational database management system developed by

Microsoft. As a database server, it is a software product whose fundamental function is

storing and retrieving data as requested by other software applications, which

may run either on the same computer or on another computer across a network.
