4th Sem CSE Database Management System
LECTURE NOTES
ON
Compiled by
Swagatika Dalai
Lecturer, Department of Computer Science and Engineering,
KIIT Polytechnic, Bhubaneswar
KIIT POLYTECHNIC
CONTENTS
Chapter-1
BASIC CONCEPTS OF DBMS
DATA: Data is a raw fact or figure about an object which, taken by itself, is not enough to identify that object.
For example: the Emp_No, Name or Salary of an employee is each, individually, one item of data.
DATABASE: An organized collection of related data files or information is called a database.
OR
A database is an integrated collection of logically related records and files. OR A database can
be defined as a collection of coherent, meaningful data.
For example: Qr_No, Road name, Area name, Dist name, State name and pin code combined
together create a postal address. Multiple addresses kept together in one place, such as an
address book, form a database, and each postal address in the book is the data that fills the database.
DBMS: It is a collection of programs that enables users to create and maintain a database. In
other words, it is general-purpose software that provides users with the processes of defining,
constructing and manipulating the database for various applications.
DATABASE SYSTEM: The database and the DBMS software together are called a database
system.
NOTE: Although data is a raw fact through which an object cannot be identified, it is the bare
minimum requirement to build a database.
Applications of DBMS:
Databases are widely used. Some applications are as follows.
Banking-> For customer information, accounts, loans and banking transactions.
Airlines-> For reservation and schedule information.
Universities-> For student, course and grade information.
Credit card transactions-> For purchases on credit cards and generation of monthly
statements.
Although there are many advantages of DBMS, the DBMS may also have some minor
disadvantages. These are:
1. Cost of Hardware & Software: A processor with high speed of data processing and
memory of large size is required to run the DBMS software. It means that you have to
upgrade the hardware used for file-based system. Similarly, DBMS software is also very
costly.
DBA
Database Administrator (DBA): This is the chief administrator, who oversees and manages the
database system (including the data and software). Duties include authorizing users to access the
database, coordinating/monitoring its use, acquiring hardware/software for upgrades, etc. In large
organizations, the DBA might have a support staff.
ROLE OF A DBA
Defining the Schema:
The DBA defines the schema, which contains the structure of the data in the application. The
DBA determines what data needs to be present in the system and how this data has to be represented
and organized.
Defining Security & Integrity Checks:
The DBA finds out which access restrictions are needed and defines security checks
accordingly. Data integrity checks are also defined by the DBA.
Defining Backup / Recovery Procedures:
The DBA also defines procedures for backup and recovery. Defining backup procedures includes
specifying what data is to be backed up, how often backups are taken, and the medium and
storage place for the backup data.
Monitoring Performance:
The DBA has to continuously monitor the performance of the queries and take measures to
optimize all the queries in the application.
The major purpose of a database system is to provide users with an abstract view of data (i.e.
the system hides certain details of how data are stored and maintained).
The logical architecture describes how data in the database is perceived by users. It is not
concerned with how data is handled and processed by the DBMS, but only with how it looks.
This architecture is called the ANSI/SPARC model and is divided into three levels of
abstraction: the internal or physical level, the conceptual or logical level, and the external or view
level.
Schemas:
The overall design of the database is called the database schema or database scheme.
In other words, it is the logical structure of the database.
It is analogous to the type information (i.e. data type) of a variable in a program.
A database system has several schemas, partitioned according to the level of abstraction. They
are:
1. Physical schema: The physical schema describes the database design at the physical level,
which is the lowest level of abstraction, describing how the data are actually stored.
2. Logical schema: The logical schema describes the database design at the logical level, which
describes what data are stored in the database and what relationships exist among those data.
3. Subschemas: The several schemas present at the view level, which describe different views of the
database.
Instances:
The collection of information stored in a database at a particular point in time is called an
instance of the database.
It is analogous to the value of a variable.
DATA DICTIONARY
A relational database system needs to maintain data about its relations. This information is
called the data dictionary.
In other words, a data dictionary is a set of metadata (i.e. data about data) which contains
the definition and representation of data elements.
There are different types of database system users differentiated by the way they expect
to interact with the system.
Naive Users:-
These are end users who interact with the system by invoking permanent application
programs that have been written previously. They need not know how the application
program works; they only use it.
Ex:- ATM users
Application Programmer:-
They are the computer professionals who develop application programs. The
application programs could be written in a general purpose programming language such
as PASCAL, COBOL, C, C++.
Sophisticated User:-
These users interact with the system without writing programs. They form their requests
by writing queries in a database query language.
Analysts who submit queries to explore data in the database fall in this category.
Specialized User:-
Users who are responsible for writing specialized database applications that do not fit into
the conventional data processing framework. For ex. Computer Aided Design (CAD),
Artificial Intelligence systems etc.
Chapter-2
DATA MODELS
DATA INDEPENDENCE: The ability to modify a schema definition in one level without
affecting a schema definition in the next higher level is called data independence.
OR
Data and programs are independent of each other, so a change in one has no or minimum effect on
the other. Data and its structure are stored in the database, whereas the application programs
manipulating this data are stored separately, so a change in one does not necessarily affect the
other.
[Figure: classification of data models into the object based model, the record based model and the physical data model; the entity relationship model is shown under the object based model.]
This data model uses objects as the key data representation components.
For ex: Entity-Relationship model
ENTITY-RELATIONSHIP MODEL
It is a collection of real world objects called entities and their relationships.
It is mainly represented in graphical form using E-R diagrams.
This model is very useful in database design.
An E-R diagram consists of the following components.
Rectangle :- Represents an entity set
Ellipse :- Represents attributes
Line :- Represents a link from an attribute to an entity set, or from an entity set to a relationship set.
Entity:
An entity is an object of concern that represents a thing in the real world and is distinguishable from
other objects by certain properties, e.g. car, table, book etc.
*An entity need not be a physical entity; it can also represent a concept in the real world, e.g.
project, loan etc.
It is denoted by
2. Strong entity set: An entity type that contains a key attribute is called a strong entity set or regular
entity set.
It is denoted by
e.g.- The ‘STUDENT’ entity has a key attribute ROLL NO which uniquely identifies it.
Attribute:
An attribute is a property used to describe the specific feature of entity. So to describe an entity
entirely, a set of attributes is used.
Types Of Attribute:
An attribute that can have more than one value for a particular entity is called a multivalued attribute,
e.g. a book entity has the attributes name, author, price and no of pages. Here author is a multivalued
attribute, as a single book can have more than one author, and the others are single valued.
[Figure: Book entity with attributes Name, Price, No of pages and Author; Name is labelled as a simple attribute and Author as a multivalued attribute.]
*Composite attributes are attributes that can be further divided into smaller units, where each
individual unit has a specific meaning, e.g. the ‘NAME’ attribute of an employee entity
can be subdivided into ‘First name’, ‘Middle name’ and ‘Last name’.
[Figure: composite attribute Name divided into F Name, M Name and L Name.]
Stored attribute
An attribute whose value is derived from a stored attribute is known as a derived attribute.
Example: age, whose value is derived from the stored attribute Date of Birth.
Derived attribute
RELATIONSHIP
e.g. In a college database, the association between the student and course entities, i.e. “student opts
course”, is an example of a relationship.
One college can have at most one principal and one principal can be assigned to only one
college.
one---to---many
[Figure: Department (1) - Work in - Faculty (N)]
One department can appoint any number of faculty members, but a faculty member is assigned to one department.
[Figure: HUB (1) - Connect - COMPUTER (N)]
One HUB can connect to any number of COMPUTERs, but a computer can connect to only one HUB.
many---to---one
Many instances of entity A can be associated with one instance of entity B, but each instance of A is associated with only one instance of B.
E.g. the relationship between course and instructor.
An instructor can teach various courses, but a course can be taught by only one instructor.
many---to---many
Instances of entities A & B are associated with any number of instances from each other.
[Figure: BOOK (M) - Writes - AUTHOR (N)]
One author can write many books and one book can be written by more than one author.
Total participation: The participation of an entity set E in a relationship set R is said to be total
if every entity in E participates in at least one relationship in R.
[Figure: E1 - R - E2, showing total participation of E2 in R.]
Example:
A record based data model is used to specify the overall logical structure of the database.
Each record type defines a fixed no. of fields having a fixed length.
This model describes data at the conceptual level or view level and provides higher level of
description of the implementation.
This data model uses records as key data representation components.
ADVANTAGES
Adding and deleting records is easy.
Fast data retrieval through higher level records.
Efficient for one-to-many relationships that are fixed over time.
DISADVANTAGES
Management difficulties caused by parent segment deletion, in that all child segments must also
be deleted.
Pointers require a large amount of computer storage.
NETWORK MODEL
In the network model, the entities are organized in a graph, in which some entities can be
accessed through several paths.
The data are represented by collections of records and the relationships among the data are represented by
links.
The network model is an improvement on the hierarchical model; here multiple parent-child
relationships are used.
In the network model, the relationships as well as the navigation through the database are predefined at
database creation time.
ADVANTAGES
Data independence.
It can handle more relationships than hierarchical e.g. a child can have multiple parents.
DISADVANTAGES
Detailed structural knowledge is required.
Lack of structural independence.
RELATIONAL MODEL
In the relational model, data is organized in two-dimensional tables called relations. The tables or
relations are related to each other.
Constraints are stored in a meta-data table.
This is a very simple model and most widely used data base model.
The relational model is based on the relational algebra.
DISADVANTAGES
New relations can require considerable processing.
Sequential access is slow.
Method of storage on disk impacts processing time.
KEY
A key is an attribute or a set of attributes in a relation that uniquely identifies a tuple in that relation.
Types of keys:
1. Super Key: A super key is an attribute or any set of attributes that uniquely identifies a
row in a relation.
2. Composite Key: A composite key is a key that contains more than one attribute.
3. Candidate Key: A candidate key is an attribute or set of attributes that uniquely identifies
a row in a relation and has no redundant attribute. In other words, a minimal super key is called a candidate key.
4. Primary Key: A primary key is the candidate key chosen to uniquely identify a record or
tuple. Each table can have only one primary key. The primary key should be selected in
such a manner that it is unique and not null.
5. Alternate Key: The alternate keys of a table are those candidate keys which are not
selected as the primary key.
AK(Alternate Key)=CK(Candidate Key)-PK(Primary Key)
6. Overlapping Key: Keys having common attributes are called overlapping keys.
7. Foreign Key: A foreign key is a referential key whose values must match a primary key of another
table. This way we can link two tables and retrieve their data jointly. We can only insert those
values which are present in the base (referenced) table. If we delete the base table, the
foreign key and primary key relationship is automatically broken.
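The primary key and foreign key rules above can be tried out with a small sketch. The example below uses Python's built-in sqlite3 module rather than any particular DBMS discussed in these notes; the tables department and student, their columns and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical base (referenced) table and referencing table.
conn.execute(
    "CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT NOT NULL)"
)
conn.execute(
    """CREATE TABLE student (
           roll_no INTEGER PRIMARY KEY,
           name    TEXT NOT NULL,
           dept_id INTEGER REFERENCES department(dept_id)
       )"""
)
conn.execute("INSERT INTO department VALUES (1, 'CSE')")
conn.execute("INSERT INTO student VALUES (101, 'Sita', 1)")


def try_insert(row):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert((102, "Ravi", 1))         # dept_id 1 exists in department
rejected_fk = try_insert((103, "Gita", 99))     # dept_id 99 is not in the base table
rejected_pk = try_insert((101, "Duplicate", 1)) # roll_no 101 is already used
```

Only the first insert succeeds: the second violates the foreign key and the third violates the uniqueness of the primary key.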
Chapter-3
RELATIONAL DATABASE
Relational Terminology
RELATION
Roughly, a table in a relational database is called a relation.
A relation consists of
1. Relational scheme
2. Relational instances.
Attribute
The name of each column in a relation, which is used to interpret its meaning, is called an attribute.
Domain
Each attribute is defined over a set of values known as its domain.
Tuple
A tuple is a row in a relation/table.
Degree/Arity
The number of attributes in the relation scheme is called its degree or arity.
Cardinality
The number of tuples present in a relation instance is called the cardinality of the relation.
Union Compatible
Two relations P and Q are said to be union compatible if both P & Q are of the same degree
and the domains of the corresponding attributes are equal,
i.e. if P = {P1, P2, …, Pn} & Q = {Q1, Q2, …, Qn} then DOM(Pi) = DOM(Qi) for
i = 1, 2, 3, …, n,
where DOM(Pi) represents the domain of attribute Pi.
RELATIONAL ALGEBRA
Relational algebra is a collection of operations used to manipulate the data in the relational model.
NOTE: Result of each relation operation is also a relation.
The operation can be classified in two categories.
1. Basic set operation
a) Union
b) Intersection
c) Set difference
d) Cartesian product
2. Relational operation
a) Selection
b) Projection
c) Join
d) Division.
BASIC SET OPERATION
These are binary operations (i.e. each is applied to two relations)
These two relations should be union compatible, except in the case of the Cartesian product.
UNION
If R1 & R2 are two union compatible relations, then the union of R1 & R2 (R3 = R1 U R2) is the
relation containing the tuples that are in R1, in R2, or in both of them.
Duplicate tuples are eliminated. R3 will have tuples such that
R3 = {t | t ∈ R1 ∨ t ∈ R2} and max(│R1│,│R2│) ≤ │R3│ ≤ (│R1│+│R2│)
The cardinality of the resultant relation depends on the duplication of tuples between R1 & R2.
If the tuples in R1 & R2 are disjoint, then │R3│=│R1│+│R2│
NOTE
Union is a commutative operation i.e.R1 U R2 = R2 U R1
Union is an associative operation i.e. R1 U ( R2 U R3 ) = (R1 U R2 ) U R3
INTERSECTION
If R1 & R2 are two union compatible relations, then R = R1 ∩ R2 is the relation that includes
all tuples that are in both relations.
SET DIFFERENCE
If R1 & R2 are two union compatible relations, the result of R = R1 - R2 is the relation that
includes only those tuples that are in R1 but not in R2.
In other words, it removes the common tuples from the first relation.
R will have tuples such that
R = {t | t ∈ R1 ∧ t ∉ R2} and 0 ≤ │R│ ≤ │R1│
Note: 1) The difference operation is not commutative,
i.e., R1 – R2 ≠ R2 – R1
2) The difference operation is not associative,
i.e., R1 – (R2 – R3) ≠ (R1 – R2) – R3
CARTESIAN PRODUCT
The Cartesian product of two relations is the concatenation of the tuples belonging to the two
relations.
A new resultant relation scheme is created consisting of all possible combinations of
the tuples in both relations.
R3 = R1 × R2, where the tuples are given by:
R3 = {t1 || t2 | t1 ∈ R1 ∧ t2 ∈ R2}.
R3 is obtained by concatenating each tuple in relation R1 with each tuple in relation R2;
here || represents the concatenation operation.
Degree(R3) = Degree(R1) + Degree(R2)
The cardinality of the resultant relation is given by │R3│ = │R1│ × │R2│
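The four basic set operations can be illustrated with a short Python sketch in which a relation is modelled as a set of tuples. The relations R1 and R2 and their contents are made up for this illustration; both are assumed to share the scheme (roll_no, name), so they are union compatible.

```python
# Two union compatible relations over the assumed scheme (roll_no, name).
R1 = {(1, "Sita"), (2, "Gita"), (3, "Ravi")}
R2 = {(3, "Ravi"), (4, "Mira")}

union = R1 | R2          # tuples in R1, in R2, or in both (duplicates collapse)
intersection = R1 & R2   # tuples present in both relations
difference = R1 - R2     # tuples in R1 that are not in R2

# Cartesian product: concatenate every tuple of R1 with every tuple of R2.
product = {t1 + t2 for t1 in R1 for t2 in R2}

# Cardinality bounds stated in the text.
union_bounds_hold = max(len(R1), len(R2)) <= len(union) <= len(R1) + len(R2)
product_size_holds = len(product) == len(R1) * len(R2)
```

Here the union has 4 tuples because (3, "Ravi") occurs in both relations, and the product has 3 × 2 = 6 tuples of degree 2 + 2 = 4.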
Relational Operations
Select Operation
The select operation is used to select some specific records from the database based on
some criteria.
This is a unary operation mathematically denoted as σ
Syntax:
σ <Selection condition> (Relation)
The Boolean expression specified in <Selection condition> is made up of a number of clauses of the
form:
<attribute name><comparison operator><constant value> or
<attribute name><comparison operator><attribute name>
Comparison operators in the set { ≤, ≥, ≠, =, <, > } apply to attributes whose domains are
ordered values, like integers.
Example :
Consider the relation PERSON. If you want to display the details of persons having age less than or
equal to 30, then the select operation will be used as follows:
σ AGE <=30 (PERSON)
The resultant relation will be as follows:
Note:
1) The select operation is commutative; i.e.,
σ <condition1> (σ <condition2> (R)) = σ <condition2> (σ <condition1> (R))
Hence, a sequence of select operations can be applied in any order.
2) More than one condition can be applied using Boolean operators such as AND and OR.
PROJECT Operation
The project operation is used to select specified attributes (columns) of a relation while discarding the
others.
This is denoted as Π.
Π <list of attributes to project> (Relation)
Example :
Consider the relation PERSON. If you want to display only the names of persons then the project
operation will be used as follows:
Π Name (PERSON)
The resultant relation will be as follows:
Selection + Projection
For example:
Find the names of those students who live in Bhopal from the student relation.
Π sname (σ saddress = "Bhopal" (student))
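The select and project operations, and their combination, can be sketched in Python as two small functions. The function names select and project, the extra age attribute and the sample rows are assumptions made for this illustration; only the attribute names sname and saddress come from the example above.

```python
def select(relation, predicate):
    """Sigma: keep only the tuples for which the predicate is true."""
    return [t for t in relation if predicate(t)]


def project(relation, attributes):
    """Pi: keep only the listed attributes and drop duplicate rows."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result


# Hypothetical instance of the student relation.
student = [
    {"sname": "Asha", "saddress": "Bhopal", "age": 20},
    {"sname": "Ravi", "saddress": "Indore", "age": 31},
    {"sname": "Mira", "saddress": "Bhopal", "age": 35},
]

# Pi sname (sigma saddress = "Bhopal" (student))
bhopal_names = project(select(student, lambda t: t["saddress"] == "Bhopal"), ["sname"])

# Selection is commutative: the two conditions can be applied in either order.
in_bhopal = lambda t: t["saddress"] == "Bhopal"
at_most_30 = lambda t: t["age"] <= 30
order_a = select(select(student, in_bhopal), at_most_30)
order_b = select(select(student, at_most_30), in_bhopal)
```

Both orders of selection return the same single tuple (Asha), which matches note 1 on the select operation.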
JOIN:-
The join operator combines two or more relations to form another relation.
The join operator joins two relations on the basis of some comparison operator in a meaningful
way.
Syntax:- R1 ⋈ (x Θ y) R2
Where R1, R2 → Two relations
x → Attributes of R1
y → Attributes of R2
⋈ → Join operator
Θ → Comparison operator
Types of join: Equi join, Non equi join, Outer join, Self join, Cross join, Natural join
1. Equi join:-
When there is a common attribute in two relations, they can be joined by equating the
common column values.
When the comparison operator is the equality operator, the join is called an equi
join.
STUDENT
If we join the relation student with the relation hostel, the result will be as follows.
Student ⋈ (Student.Name = Hostel.Name) Hostel
2. Non-Equi join:-
A non-equi join is a join where the comparison operator is something other than the equality operator.
A non-equi join is typically used when there is no common attribute present in both relations.
Student
Roll no Name Mark
1 XXX 55
2 YYY 35
3 ZZZ 69
4 PPP 25
Grade
Lower Upper Division
60 100 1st
50 60 2nd
30 50 3rd
0 30 fail
If we want to retrieve the grade of each student, then we have to join the
two relations as follows:
Student ⋈ (Mark between Lower and Upper) Grade
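The non-equi join of Student and Grade can be worked through in Python using the rows from the two tables above. The sketch assumes the condition Lower <= Mark < Upper; this half-open range is an assumption added here so that a boundary mark such as 60 matches exactly one grade row, and under it a mark of exactly 100 would match no row.

```python
# Rows copied from the Student and Grade tables in the text.
student = [(1, "XXX", 55), (2, "YYY", 35), (3, "ZZZ", 69), (4, "PPP", 25)]
grade = [(60, 100, "1st"), (50, 60, "2nd"), (30, 50, "3rd"), (0, 30, "fail")]

# Non-equi join: pair each student with the grade row whose range holds the mark.
# Assumption: lower <= mark < upper (half-open range, see the note above).
result = [
    (roll_no, name, mark, division)
    for (roll_no, name, mark) in student
    for (lower, upper, division) in grade
    if lower <= mark < upper
]

for row in result:
    print(row)
```

With this data the join gives 2nd division for XXX, 3rd for YYY, 1st for ZZZ and fail for PPP.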
3. Outer Join:-
Sometimes we do not want to lose any information in the join operation; a join that makes this
possible is called an outer join. An outer join also includes in the result table those tuples that do not
have matching values in the common columns. It places
null values in the columns where there is no match between the relations.
Student
If we join the relation student with hostel through a left outer join, then the result will be as
follows:
Student ⟕ (Student.Name = Hostel.Name) Hostel
Name Roll no Class Hostel Room no
Sita 1 3rd CS LH 1
Student Hostel
Student.Name = Hostel.Name
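A left outer join can be tried with Python's built-in sqlite3 module. In the sketch below the row for Sita is taken from the result table above, while the student Gita, who has no hostel record, is a hypothetical row added to show the null padding; the table and column names are also assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (name TEXT, roll_no INTEGER, class TEXT);
    CREATE TABLE hostel  (name TEXT, hostel TEXT, room_no INTEGER);
    INSERT INTO student VALUES ('Sita', 1, '3rd CS');
    INSERT INTO student VALUES ('Gita', 2, '3rd CS');  -- hypothetical, no hostel row
    INSERT INTO hostel  VALUES ('Sita', 'LH', 1);
    """
)

# Every student row is kept; hostel columns are NULL where no match exists.
rows = conn.execute(
    """
    SELECT s.name, s.roll_no, s.class, h.hostel, h.room_no
    FROM student AS s
    LEFT OUTER JOIN hostel AS h ON s.name = h.name
    ORDER BY s.roll_no
    """
).fetchall()
```

The unmatched student still appears in the result, with None (SQL NULL) in the Hostel and Room no columns.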
4. Cross Join -:
This type of join is rarely used as it does not have a join condition, so every row of table 1 is
joined to every row of table 2. For example, if both tables contain 100 rows the result will be
10,000 rows.
A cross join is also known as a Cartesian product. Its size is the number of rows in the first table multiplied by the number
of rows in the second table. It can be
specified in either one of the following ways:
5. Natural Join: It is the same as an equi join except that one of the duplicate columns is eliminated
in the result table. The natural join is the most commonly used form of join operation.
Courses
CID Course Dept
EE01 Electronics EE
HoD
CS Alex
ME Maya
EE Mira
Courses ⋈ HoD
6. Self-Join: It is a join operation where a table is joined with itself. Consider the following
sample partial data of EMP table:
1 Nirmal 4
2 Kailash 4
3 Veena 1
4 Boss NULL
….. ….. …
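A self-join on this data can be sketched with Python's built-in sqlite3 module. The column headings of the EMP table are missing above, so the names emp_id, name and manager_id are assumptions, as is the reading of the third column as the number of the employee's manager.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Column names are assumed; the source table lists only the values.
    CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO emp VALUES (1, 'Nirmal', 4);
    INSERT INTO emp VALUES (2, 'Kailash', 4);
    INSERT INTO emp VALUES (3, 'Veena', 1);
    INSERT INTO emp VALUES (4, 'Boss', NULL);
    """
)

# The same table is used twice under two aliases: e (employee) and m (manager).
rows = conn.execute(
    """
    SELECT e.name, m.name
    FROM emp AS e
    JOIN emp AS m ON e.manager_id = m.emp_id
    ORDER BY e.emp_id
    """
).fetchall()
```

Under these assumptions Boss does not appear on the left side of the result, because a NULL manager_id matches no row in an inner join.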
Chapter-4
NORMALIZATION IN RELATIONAL SYSTEM
Database anomalies
The real world is modelled in a database through a number of relations. Since the real world
changes over time, the tuples change as well.
Tuples may be added, deleted or updated, and these frequent updates can lead to some
anomalies (problems). These anomalies arise due to bad design.
The anomalies are:-
Update anomalies:-
Multiple copies of the same data may lead to update anomalies or inconsistency
when an update is made and only some of the multiple copies are updated.
Insertion anomalies:-
Sometimes one insertion depends upon another. If values are inserted into a dependent relation without first
being inserted into the relation on which it depends, the result is an insertion anomaly, which indicates bad database design.
Deletion anomalies:-
If relations are dependent on each other, then the deletion of tuples in one relation must also be
reflected in the other relation; otherwise it leads to deletion anomalies.
Functional dependency
A functional dependency is a particular relationship between two attributes.
Given a relation R, a set of attributes X in R is said to functionally determine another attribute
Y, also in R, (written X → Y) if and only if each X value is associated with at most one Y value.
Let R be a relation and let x & y be two non-empty sets of attributes in R. A relation instance satisfies
the functional dependency x→y if the following condition holds for every pair of
tuples t1 & t2 in R:
If t1[x]=t2[x] then t1[y]=t2[y]
x→y denotes: x functionally determines y, or y is functionally dependent on x.
Functional dependency does not imply mathematical dependence, i.e. it does not mean that the value of one
attribute can be computed from the value of another attribute.
Functional dependence of y on x means that there can be only one value of y for each value of
x.
An attribute may be functionally dependent on two or more attributes rather than on a single
attribute, or vice versa.
Functional dependencies are consequences of the interrelationships among the attributes of an
entity represented by a relation, or of a relationship between entities that is also
represented by a relation.
EXAMPLE:-
Emp-id→Ename
Emp-id→Dept, addr
Let x = {roll no}
y= {name, branch}
x→y
NOTE
The values of all attributes are functionally dependent on primary key of a relation.
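The tuple-pair condition "if t1[x] = t2[x] then t1[y] = t2[y]" can be checked mechanically on a relation instance. The Python sketch below does this; the function name holds and the sample rows are hypothetical, and the check only tells us whether a dependency is violated by this particular instance, not whether it is a rule of the real world.

```python
def holds(relation, x, y):
    """Return True if the functional dependency x -> y holds in this instance."""
    seen = {}  # maps each x-value to the y-value first seen with it
    for t in relation:
        x_val = tuple(t[a] for a in x)
        y_val = tuple(t[a] for a in y)
        # setdefault stores y_val the first time and returns the stored value afterwards
        if seen.setdefault(x_val, y_val) != y_val:
            return False  # same x-value paired with two different y-values
    return True


# Hypothetical instance with attributes roll_no, name, branch.
students = [
    {"roll_no": 1, "name": "Asha", "branch": "CSE"},
    {"roll_no": 2, "name": "Ravi", "branch": "CSE"},
    {"roll_no": 3, "name": "Asha", "branch": "ETC"},
]

roll_determines_rest = holds(students, ["roll_no"], ["name", "branch"])
name_determines_branch = holds(students, ["name"], ["branch"])
```

Here roll_no → {name, branch} holds, while name → branch does not, because the name Asha appears with two different branches.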
INFERENCE RULE OF FDS
The set of all dependencies that includes F as well as the dependencies that can be inferred from
F is called the closure of F and is denoted as F+.
EXAMPLE:-
RULES (ARMSTRONG'S AXIOMS)
1) REFLEXIVE
6) PSEUDOTRANSITIVE
If x→y & yz→p then xz→p
Let x = Rollno, y = Name, z = Addr., p = Ph.no
Rollno→Name and {Name, Addr.}→Ph.no
Then {Addr., Rollno}→Ph.no
NOTE
The first three rules (reflexive, augmentation, transitive) are the fundamental axioms and are
sufficient to generate all possible FDs.
Example
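Let R(A,B,C,D,E,F) & the set of FDs be given below:
F = [{A→BC}, {B→E}, {CD→EF}]
Ans:
(1) A→BC
A→B, A→C (Decomposition)
(2) A→B & B→E
A→E (Transitive)
(3) CD→EF
CD→E, CD→F (Decomposition)
(4) A→BC & A→E
A→BCE (Union law)
(5) A→C & A→E
A→CE (Union law)
F+ = [{A→BC}, {B→E}, {CD→EF}, {A→B}, {A→C}, {A→E}, {CD→E}, {CD→F}, {A→BCE}, {A→CE}]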
EXAMPLE
Let R = (A,B,C,D) & F = {A→B, A→C, BC→D}. Find F+.
Answer
(1) A→B & A→C
A→BC (Union)
(2) A→BC & BC→D
A→D (Transitive rule)
(3) A→B & A→D
A→BD (Union)
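Instead of listing every derived dependency by hand, one common approach is to compute the closure of a set of attributes: X→Y can be inferred from F exactly when Y is contained in the closure of X. The Python sketch below applies this to the example above; the function name closure and the string encoding of attribute sets are choices made for this illustration.

```python
def closure(attributes, fds):
    """Return the set of attributes functionally determined by `attributes` under `fds`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already derived, the right side follows.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# F = {A->B, A->C, BC->D}; each FD is a (left side, right side) pair of strings.
F = [("A", "B"), ("A", "C"), ("BC", "D")]

a_plus = closure("A", F)    # attributes determined by A
b_plus = closure("B", F)    # attributes determined by B alone
bc_plus = closure("BC", F)  # attributes determined by B and C together
```

The closure of A is {A, B, C, D}, which agrees with the derived dependencies A→BC, A→D and A→BD, and also shows that A is a key of R.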
PARTIAL DEPENDANCY:
If the primary key is a composite key and some attributes depend on only a part of the primary key,
then those attributes are said to be partially dependent on the primary key.
EXAMPLE
Stud{S-id, Name, DOB, Addr, Course, Date of completion}
The FDs are:-
S-id → Name, DOB, Addr
S-id, Course → Date of completion
The primary key of the relation is the composite key {S-id, Course}. The non-key
attributes Name, DOB and Addr are functionally dependent on S-id alone, not on Course. Hence,
these are partially dependent.
NORMALIZATION
Normalization is a systematic approach of decomposing tables to eliminate data
redundancy and undesirable characteristics like insertion, deletion and modification anomalies.
A poor database design gives rise to many anomalies. These anomalies can be avoided
by a process called decomposition or “Normalization”.
The guidelines for constructing relation schemes that have desirable properties and avoid
anomalies are called normal forms.
The table is not in first normal form because multiple items are present in the color field.
To bring the table into 1NF we should have the data like this.
Example :
item colors price tax
Now the table is in 1NF because each field contains a single value. Here we have taken {item,
color} as the primary key. However, the table is not in 2NF because the non-key attributes price and
tax depend only on item, and not on color, which is part of the primary key. This violates the
rule for 2NF, which says that no non-key attribute may be functionally dependent on only a part of the
key.
To make the table comply with 2NF we can break it into two tables like this:
item color
T-shirt red
T-shirt blue
polo red
polo yellow
sweatshirt blue
sweatshirt black
item price tax
T-shirt 12.00 0.60
polo 12.00 0.60
sweatshirt 25.00 1.25
In the item, price, tax table, item is the primary key. The non-key attribute tax depends on price, not on item.
Here item → price and price → tax, so item → tax. That means a transitive dependency exists here.
Therefore the table is not in 3NF.
To make this table comply with 3NF we have to break it into two tables to remove the
transitive dependency.
item price
T-shirt 12.00
sweatshirt 25.00
polo 12.00
price tax
12.00 0.60
25.00 1.25
Now the tables are in 3NF.
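A quick way to see that this decomposition loses no information is to join the two smaller tables back on price and compare the result with the original table. The Python sketch below does this with the rows used above; modelling each table as a set of tuples is a simplification chosen for the illustration.

```python
# Original table before the 3NF decomposition: (item, price, tax).
item_price_tax = {
    ("T-shirt", 12.00, 0.60),
    ("sweatshirt", 25.00, 1.25),
    ("polo", 12.00, 0.60),
}

# Decomposition: project onto (item, price) and onto (price, tax).
item_price = {(item, price) for (item, price, tax) in item_price_tax}
price_tax = {(price, tax) for (item, price, tax) in item_price_tax}

# Natural join of the two projections on the common attribute price.
rejoined = {
    (item, price, tax)
    for (item, price) in item_price
    for (p, tax) in price_tax
    if p == price
}

lossless = rejoined == item_price_tax
```

The price, tax table needs only two rows because the tax for 12.00 is stored once instead of once per item, and the join still rebuilds all three original rows.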
To make the table comply with BCNF we can break the table in three tables like this:
emp_nationality table:
emp_id emp_nationality
1001 Austrian
1002 American
emp_dept table:
emp_dept_mapping table:
emp_id emp_dept
1001 Stores
Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}
This is now in BCNF because in both functional dependencies the left-hand side is a key.
Chapter-5
STRUCTURED QUERY LANGUAGE
QUERY LANGUAGE:-
A query language is a language in which a user requests information from the database.
These languages are usually on a level higher than that of a standard programming
language.
Query languages can be categorized as either procedural or non-procedural. In a
procedural language the user instructs the system to perform a sequence of operations on
the database to compute the desired result. In a non-procedural language the user
describes the desired information without giving a specific procedure for obtaining that
information.
SQL:
Structured Query Language (SQL) is a standard query language.
It is commonly used with relational databases for data definition and manipulation.
SQL is called a non-procedural language as it just specifies what is to be done rather
than how it is to be done.
Since SQL is a higher-level query language, it is closer to a language like English.
Therefore, it is very user friendly.
The American National Standards Institute (ANSI) has defined standard versions of
SQL.
Some of the important features of SQL are:
It is a non procedural language.
It is an English-like language.
It can process a single record as well as sets of records at a time.
It is different from a third generation language (such as C or COBOL). All SQL statements define
what is to be done rather than how it is to be done.
SQL is a data sub-language consisting of three built-in languages:
Data definition language (DDL), Data manipulation language (DML) and Data
control language (DCL).
It insulates the user from the underlying structure and algorithm.
SQL has facilities for defining database views, security, integrity constraints,
transaction controls, etc.
Example:-
CREATE TABLE student (rollno VARCHAR2(20), name VARCHAR2(30), address VARCHAR2(50), semester VARCHAR2(10));
To view the structure of a table:-
Syntax:-
SQL> DESC <table name>;
Example:
SELECT Rollno, name FROM Student;
DELETE Command
This command is responsible for deleting the tuples of a relation that satisfy the given
condition.
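The CREATE TABLE, SELECT and DELETE commands can be tried end to end with Python's built-in sqlite3 module. This is only a sketch: SQLite is not the DBMS used in these notes, so TEXT is used in place of VARCHAR2, and the inserted rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: same columns as the student table above (TEXT stands in for VARCHAR2).
conn.execute(
    "CREATE TABLE student (rollno TEXT, name TEXT, address TEXT, semester TEXT)"
)

# DML: insert hypothetical rows.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        ("101", "Asha", "Bhubaneswar", "4th"),
        ("102", "Ravi", "Cuttack", "4th"),
        ("103", "Mira", "Puri", "2nd"),
    ],
)

before = conn.execute("SELECT rollno, name FROM student ORDER BY rollno").fetchall()

# DELETE removes only the tuples that satisfy the WHERE condition.
deleted = conn.execute("DELETE FROM student WHERE semester = ?", ("2nd",)).rowcount

after = conn.execute("SELECT rollno, name FROM student ORDER BY rollno").fetchall()
```

One row satisfies the condition, so one row is deleted and the other two remain. Without a WHERE clause, DELETE would remove every row of the table.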
The data control basically refers to commands that allow system and data privileges to be passed
to various users. These commands are normally available to database administrator. Let us look
into some data control language commands:
Create a new user:
CREATE USER <user name> IDENTIFIED BY <password>;
Example:
CREATE USER MCA12 IDENTIFIED BY W123
1. Grant: This command grants users various privileges on tables. It is used to provide database
access permissions to users, i.e. to allow a specified user to perform specified tasks. Privileges
can be a combination of select, update, delete, alter, etc. Permissions are of two types:
(1) System level permission (2) Object level permission.
System level permission
Syntax
GRANT CREATE SESSION TO USER;
Example:
GRANT CREATE SESSION TO CSE;
Syntax
GRANT <privilege_list> ON <table_name> TO <user>;
Example:
GRANT select On student to CSE;
Example:
REVOKE ALL ON EMP FROM CSE;
(All permissions will be cancelled)
We can also revoke only some of the permissions.
Chapter-6
TRANSACTION PROCESSING CONCEPTS
TRANSACTION
A transaction is a program unit of work whose execution may change the database.
It consists of a set of operations including reading, writing & modifying the database
objects.
Thus a transaction is a program unit whose execution preserves the consistency of the
database: if the database is in a consistent state before a transaction is executed, then the database is
also in a consistent state after its execution.
A read operation brings a database object into main memory from the disk where the
database resides & then the value is copied to a program variable defined within the
transaction.
A write operation identifies a copy of the database object in main memory, updates
its value & then writes this object to the disk.
Example:
States of Transactions:
A transaction in a database can be in one of the following states −
Active − In this state, the transaction is being executed. This is the initial state of every
transaction.
Failed − A transaction is said to be in a failed state if any of the checks made by the
database recovery system fails. A failed transaction can no longer proceed further.
Aborted − If any of the checks fails and the transaction has reached a failed state, then
the recovery manager rolls back all its write operations on the database to bring the
database back to its original state where it was prior to the execution of the transaction.
Transactions in this state are called aborted. The database recovery module can select one
of the two operations after a transaction aborts −
i)Re-start the transaction
ii)Kill the transaction
1. Serial schedule: When the actions of different transactions are not interleaved (i.e. each transaction
is executed from start to finish before the next begins), the schedule is called a serial schedule.
It always ensures a consistent state.
2. Non-Serial schedule: A schedule in which the operations from a set of concurrent
transactions are interleaved is called as a non-serial schedule.
A non-serial schedule is acceptable if the final state produced by a non-serial schedule is
equivalent to some state produced by a serial schedule.
3. Equivalent schedule: Two schedules are said to be equivalent schedules if they produce the
same result on any database state.
Equivalent schedules always produce identical results.
4. Serializable schedule: A non-serial schedule that is equivalent to some serial execution of
transactions is known as serializable schedule.
5. Complete schedule: A schedule that contains either an abort or commit for each transaction
whose actions are listed in it is called as a complete schedule. A complete schedule must contain
all the actions of every transactions that appears in it.
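The definitions above do not say how to test whether a schedule is serializable. One common test,
used here as an assumption rather than as the method of these notes, is conflict serializability:
build a precedence graph with an edge Ti -> Tj whenever an operation of Ti conflicts with a later
operation of Tj on the same item (at least one of the two is a write), and accept the schedule if
the graph has no cycle. The function names and the tuple format (transaction, operation, item) are
hypothetical.

```python
def precedence_edges(schedule):
    """Return edges (Ti, Tj) for each pair of conflicting operations, Ti first."""
    edges = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            # Two operations conflict if they belong to different transactions,
            # touch the same item, and at least one of them is a write.
            if ti != tj and item_i == item_j and "W" in (op_i, op_j):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    """Detect a cycle in a directed graph using depth-first search."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    state = {node: 0 for node in graph}   # 0 = unvisited, 1 = in progress, 2 = done

    def visit(node):
        state[node] = 1
        for nxt in graph[node]:
            if state[nxt] == 1 or (state[nxt] == 0 and visit(nxt)):
                return True
        state[node] = 2
        return False

    return any(state[n] == 0 and visit(n) for n in graph)

def is_conflict_serializable(schedule):
    return not has_cycle(precedence_edges(schedule))

serial = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A")]
interleaved_ok = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"),
                  ("T1", "R", "B"), ("T2", "W", "A"), ("T1", "W", "B")]
interleaved_bad = [("T1", "R", "A"), ("T2", "R", "A"), ("T1", "W", "A"), ("T2", "W", "A")]
```

Here interleaved_ok is non-serial but equivalent to the serial order T1, T2, while interleaved_bad
produces edges in both directions between T1 and T2 and is therefore rejected.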
RECOVERABILITY
There are 2 main types of damage that can happen to a database.
Type 1: A catastrophic error occurs in the database, e.g. the hard disk crashes.
The only way to recreate the data is to restore it from a copy stored as backup.
During backup, the entire database and the database transaction logs are copied to an
alternative storage medium.
Type 2: A transaction can fail for several reasons, which may lead to errors in the database.
These are called non-catastrophic errors.
There are 2 mechanisms for recovering from such errors:
Deferred update method
Immediate update method
Deferred update method
In this method, the physical data in the database is not changed until the entire transaction
has completed successfully.
The successful completion of the transaction is called the COMMIT point.
During commit, all updates are stored in a file called the database log file.
After committing, the contents of the log are copied to the database.
If the transaction fails before reaching the commit point, there is no need to UNDO any
data in the database.
Therefore, this method is called the NO UNDO/REDO method.
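The deferred update steps above can be sketched as follows. This is a simplified model under
stated assumptions: the class name DeferredUpdateDB, its methods and the two failure flags are
hypothetical, and a Python list stands in for the log file.

```python
class DeferredUpdateDB:
    def __init__(self, data):
        self.data = dict(data)   # the physical database
        self.log = []            # committed updates not yet copied to the database

    def run(self, updates, fail_before_commit=False, crash_after_commit=False):
        pending = list(updates)          # updates are only buffered while running
        if fail_before_commit:
            return "aborted"             # database untouched: nothing to UNDO
        self.log.extend(pending)         # commit point: updates go to the log
        if crash_after_commit:
            return "committed, not yet applied"
        self.redo()
        return "committed"

    def redo(self):
        """Copy the logged updates to the database (REDO)."""
        for item, value in self.log:
            self.data[item] = value
        self.log.clear()

db = DeferredUpdateDB({"A": 100})
first = db.run([("A", 50)], fail_before_commit=True)    # fails, A stays 100
second = db.run([("A", 70)], crash_after_commit=True)   # committed but not applied
value_before_redo = db.data["A"]
db.redo()                                                # recovery replays the log
value_after_redo = db.data["A"]
```

A failure before the commit point needs no UNDO because the database was never modified; a failure
after the commit point only needs the logged updates to be redone.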
Immediate update method
In a DBMS that uses this method, changes can be made to the database while a
transaction is still being processed.
However, before any change is made to the database, the change is recorded in the log file.
Only after the change has been recorded in the log file is the database modified, at each step.
In this case, if the transaction fails at some intermediate point, then all updates performed
before the point of failure must be rolled back, i.e. we need to UNDO these changes.
Next, to complete the transaction we must REDO the entire transaction.
Therefore this method is called the UNDO/REDO method.
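A minimal sketch of the immediate update method, assuming a log that stores the old and new value
of each item (the class ImmediateUpdateDB and its method names are hypothetical):

```python
class ImmediateUpdateDB:
    def __init__(self, data):
        self.data = dict(data)   # the physical database
        self.log = []            # entries: (item, old_value, new_value)

    def write(self, item, new_value):
        # Record the change in the log first, then modify the database.
        self.log.append((item, self.data[item], new_value))
        self.data[item] = new_value

    def undo(self):
        """Roll back logged changes in reverse order, restoring the old values."""
        for item, old_value, _new_value in reversed(self.log):
            self.data[item] = old_value
        self.log.clear()

db = ImmediateUpdateDB({"A": 500, "B": 200})

# The transaction transfers 50 from A to B but fails after the first write.
db.write("A", 450)
db.undo()                          # UNDO the partial update
state_after_undo = dict(db.data)

# REDO: the whole transaction is executed again from the start.
db.write("A", 450)
db.write("B", 250)
state_after_redo = dict(db.data)
```

Because the database is modified before commit, the old values in the log are what make the
rollback possible.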
Chapter-7
CONCURRENCY CONTROL CONCEPTS
If only one transaction is executed at a time, in serial order, then performance will be quite
poor.
To improve performance we can execute transactions concurrently. But with concurrent
execution, problems such as the lost update problem, the incorrect summary anomaly and
dirty reads can occur.
To avoid these problems we have to control the concurrency.
If concurrency is controlled properly, then transaction throughput can be maximized
while avoiding corruption of the database.
Concurrency can be controlled by a technique called locking.
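To illustrate the lost update problem mentioned above, the following sketch simulates one
uncontrolled interleaving of two transactions by hand and compares it with serial execution. The
balance and the amounts are made-up values.

```python
balance = 100

# Interleaved execution without concurrency control:
# both transactions read the balance before either one writes it.
t1_local = balance            # T1 reads 100
t2_local = balance            # T2 reads 100
balance = t1_local + 50       # T1 writes 150
balance = t2_local - 30       # T2 writes 70, overwriting T1's update
lost_update_result = balance

# Serial execution of the same two transactions.
balance = 100
balance = balance + 50        # T1 runs to completion
balance = balance - 30        # then T2 runs to completion
serial_result = balance
```

The interleaved run ends with 70 instead of 120: the deposit made by T1 is lost.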
LOCKING:
A lock is a variable associated with a data item that describes the status of the item
with respect to possible operations that can be applied to it.
Locking is a mechanism to ensure data integrity while allowing maximum concurrent
access to data. It is used to implement concurrency control when multiple users access
a table to manipulate its data at the same time.
The simplest form of lock is a binary lock, which has only 2 possible states:
1. LOCK
2. UNLOCK
Depending on the operations they permit, locks can be classified into two types.
Shared Lock (S): A transaction may acquire a shared lock on a data item in order to read its
content. The lock is shared in the sense that any other transaction can acquire a shared lock on
that same data item for reading purposes.
Exclusive Lock (X): A transaction may acquire an exclusive lock on a data item in order to both
read and write it. The lock is exclusive in the sense that no other transaction can acquire any kind
of lock (either shared or exclusive) on that same data item.
The relationship between Shared and Exclusive Locks can be represented by the following table,
which is known as the Lock Matrix.
             Shared    Exclusive
Shared       TRUE      FALSE
Exclusive    FALSE     FALSE
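The lock matrix can be expressed directly as a lookup table. In this sketch the names COMPATIBLE
and can_grant are hypothetical; a request is granted only if it is compatible with every lock
already held on the item.

```python
# (lock already held, lock requested) -> can both exist at the same time?
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

def can_grant(requested, held_modes):
    """Grant the request only if it is compatible with every lock already held."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)

readers_share = can_grant("S", ["S", "S"])    # several readers may share an item
writer_blocked = can_grant("X", ["S"])        # a writer must wait for the readers
reader_blocked = can_grant("S", ["X"])        # a reader must wait for the writer
free_item = can_grant("X", [])                # no locks held: any request succeeds
```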
Two Phase Locking Protocol:-
The Two Phase Locking (2PL) Protocol defines the rules for how a transaction acquires locks on
data items and how it releases them.
The protocol assumes that a transaction can only be in one of two phases: a growing phase, in
which it acquires locks but releases none, and a shrinking phase, in which it releases locks but
acquires none.
Basic 2PL allows a lock to be released at any time after all the locks have been acquired.
Disadvantage: Once a lock on a data item has been released, the item can be modified by another
transaction before the first transaction commits or aborts.
Strict 2PL:
To avoid such a situation we use strict 2PL. A transaction T does not release any of its exclusive (X)
locks until it commits or aborts.
In this way no other transaction can access an item written by T until T commits or aborts.
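The two rules can be checked mechanically for the sequence of actions of one transaction. This
is a sketch with a hypothetical action format: each action is a pair (kind, item), where kind is
one of "lock_s", "lock_x", "unlock", "commit" or "abort".

```python
def follows_2pl(actions):
    """Basic 2PL: no lock may be acquired after any lock has been released."""
    released = False
    for kind, _item in actions:
        if kind == "unlock":
            released = True
        elif kind in ("lock_s", "lock_x") and released:
            return False
    return True

def follows_strict_2pl(actions):
    """Strict 2PL: basic 2PL, and exclusive locks are held until commit or abort."""
    if not follows_2pl(actions):
        return False
    exclusive = set()
    finished = False
    for kind, item in actions:
        if kind == "lock_x":
            exclusive.add(item)
        elif kind in ("commit", "abort"):
            finished = True
        elif kind == "unlock" and item in exclusive and not finished:
            return False
    return True

basic = [("lock_s", "A"), ("lock_x", "B"), ("unlock", "B"), ("unlock", "A"), ("commit", None)]
strict = [("lock_s", "A"), ("lock_x", "B"), ("unlock", "A"), ("commit", None), ("unlock", "B")]
not_2pl = [("lock_s", "A"), ("unlock", "A"), ("lock_x", "B"), ("unlock", "B"), ("commit", None)]
```

The sequence basic obeys 2PL but not strict 2PL, because the exclusive lock on B is released
before the commit.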
A deadlock situation occurs when two transactions wait indefinitely for each other to unlock data.
Deadlock occurs when 2 transactions T1 & T2 exist in the following mode.
T1 = Access data items X and Y
T2 = Access data items Y and X
If T1 has not unlocked data item Y, T2 cannot begin. If T2 has not unlocked data item X, T1
cannot continue. Consequently T1 and T2 wait indefinitely, each waiting for the other to unlock the
required data item. Such a situation is called a deadlock.
NECESSARY CONDITION
There are four necessary conditions for a deadlock to occur:
1. Mutual Exclusion
2. Non-Preemptive Locking
3. Partial Allocation
4. Circular Wait
1. Mutual Exclusion: A resource can be locked in exclusive mode by only one transaction at a
time.
2. Non-Preemptive Locking: A data item can only be unlocked by the transaction that locked it.
No other transaction can unlock it.
3. Partial Allocation: A transaction can acquire locks on the database in a piecemeal
fashion.
4. Circular Wait: Transactions lock part of the data resources needed and then wait
indefinitely to lock the resources currently locked by other transactions.
DEADLOCK PREVENTION
In order to prevent deadlock, one has to ensure that at least one of the above conditions does not
occur.
Better prevention algorithms have evolved whose basic logic is not to allow a circular wait to
occur.
There are 2 schemes to prevent deadlock:
1. Wait die scheme
2. Wound wait scheme
2. WOUND WAIT
It is based on the preemptive technique.
It is based on the simple rule:
If Ti requests a database resource that is held by Tj,
then if Ti has a larger timestamp than Tj (i.e. Ti is younger), it is allowed to wait;
else Tj is wounded (rolled back) by Ti.
FOR EXAMPLE:
Assume that 3 transactions T1, T2, T3 are generated in that sequence. If T1 requests a data
item which is currently held by transaction T2, then T2 is rolled back and the data item is allotted to
T1. However, if T3 requests a data item which is currently held by T2, then T3 is allowed to
wait.
Situation                                                 Wait/Die               Wound/Wait
Older process needs a resource held by younger process    Older process waits    Younger process dies
Younger process needs a resource held by older process    Younger process dies   Younger process waits
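The table can be written as two small decision functions. This sketch assumes that a smaller
timestamp means an older transaction; the function names and the returned strings are
hypothetical.

```python
def wait_die(requester_ts, holder_ts):
    """Wait/Die: an older requester waits, a younger requester dies (is rolled back)."""
    return "requester waits" if requester_ts < holder_ts else "requester dies"

def wound_wait(requester_ts, holder_ts):
    """Wound/Wait: an older requester wounds the holder, a younger requester waits."""
    return "holder is rolled back" if requester_ts < holder_ts else "requester waits"

# T1, T2, T3 are generated in that sequence, so their timestamps are 1, 2, 3.
T1, T2, T3 = 1, 2, 3
t1_requests_from_t2 = wound_wait(T1, T2)   # T1 is older: T2 is rolled back
t3_requests_from_t2 = wound_wait(T3, T2)   # T3 is younger: T3 waits
```

In both schemes the transaction that is rolled back is always the younger one, so an older
transaction is never sacrificed for a younger one.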
• When Ti requests a data item held by Tj, the edge Ti → Tj is inserted in the wait-for graph.
– This edge is removed only when Tj is no longer holding a data item needed by Ti.
• The system is in a deadlock state if and only if the wait-for graph has a cycle.
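Deadlock detection on a wait-for graph therefore reduces to finding a cycle. A minimal sketch,
assuming the graph is stored as a dictionary that maps each transaction to the set of
transactions it is waiting for (the name has_deadlock is hypothetical):

```python
def has_deadlock(wait_for):
    """Return True if the wait-for graph contains a cycle."""
    visiting, done = set(), set()

    def visit(t):
        if t in done:
            return False
        if t in visiting:
            return True              # t is already on the current path: cycle found
        visiting.add(t)
        found = any(visit(u) for u in wait_for.get(t, ()))
        visiting.discard(t)
        done.add(t)
        return found

    return any(visit(t) for t in list(wait_for))

deadlocked = has_deadlock({"T1": {"T2"}, "T2": {"T1"}})     # T1 -> T2 -> T1
no_deadlock = has_deadlock({"T1": {"T2"}, "T2": {"T3"}})    # a chain, no cycle
```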
DEADLOCK RECOVERY:
1. Through preemption
2. Roll back
Keep checkpoints periodically.
When a deadlock is detected, see which resource is needed.
Take away the resource from the process currently holding it.
Later on, you can restart this process from a checkpointed state, where it may need to
re-acquire the resource.
LIVELOCK
A livelock is similar to a deadlock, except that it occurs when two or more
processes continuously change their state in response to changes in the other processes.
The result is that none of the processes completes its transaction.
Livelock is a special case of resource starvation; the general definition only states that a
specific process is not progressing.
A real-world example of livelock occurs when two people meet in a narrow corridor
and each tries to be polite by moving aside to let the other pass, but they end up swaying
from side to side without making any progress because they both repeatedly move the
same way at the same time.
If more than one process takes action, the deadlock detection algorithm can be repeatedly
triggered.
This can be avoided by ensuring that only one process takes action.
Serializability
Serializability is a property of a transaction schedule (history). It relates to the isolation property
of a database transaction.
Chapter-8
SECURITY AND INTEGRITY
DATA ENCRYPTION
Sometimes users try to bypass the system by physically removing part
of the database or by tapping into a communication line.
The most effective countermeasure against such threats is data encryption, which
means storing and transmitting data in an encrypted form.
The original data is called the plaintext. The plaintext is encrypted by subjecting it to
an encryption algorithm, whose inputs are the plaintext and an encryption key.
The output of this algorithm is the encrypted form of the plaintext, called
the ciphertext.
The details of the encryption algorithm can be made public, but the encryption key is
kept secret.
The ciphertext is stored in the database and transmitted down the communication
line.
Data encryption can be done by two means:-
1. Substitution:-
In this type of encryption, each character of the plaintext is substituted by another character
determined by the encryption key, which gives the ciphertext.
2. Rearrangement:-
In this type of encryption, the plaintext characters are simply rearranged into a different
sequence determined by the encryption key, which gives the ciphertext.
Neither approach is particularly secure in itself, but algorithms that combine the two can
provide a fairly high degree of security.
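The two techniques can be illustrated with toy ciphers. This is a teaching sketch only and must
not be used to protect real data. It assumes an uppercase alphabet, a shift cipher for
substitution, and a simple column rearrangement in which the key is the number of columns; all
function names are hypothetical.

```python
import string

ALPHABET = string.ascii_uppercase

def substitute(plaintext, key):
    """Substitution: replace each letter by the letter 'key' positions later."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + key) % 26] if ch in ALPHABET else ch
        for ch in plaintext
    )

def rearrange(plaintext, key):
    """Rearrangement: write the text in rows of 'key' columns, read it by columns."""
    return "".join(plaintext[col::key] for col in range(key))

def encrypt(plaintext, shift_key, column_key):
    """Combine both techniques: substitute first, then rearrange."""
    return rearrange(substitute(plaintext, shift_key), column_key)

substituted = substitute("DATABASE", 3)      # "GDWDEDVH"
rearranged = rearrange("DATABASE", 3)        # "DASABETA"
combined = encrypt("DATABASE", 3, 3)         # "GDVDEHWD"
recovered = substitute(substituted, -3)      # shifting back recovers the plaintext
```

Rearrangement alone keeps every original letter and substitution alone keeps every original
position, which is why each is weak by itself.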
AUTHORIZATION:-
The person who is in charge of specifying the authorization is usually called the
authorizer. The authorizer is not the DBA but the person who owns the data.
The authorization is usually maintained in a table called the access matrix. The access
matrix contains rows called subjects and columns called objects.
The entry in the matrix at the position corresponding to the intersection of a row and a column
indicates the type of access that the subject has with respect to the object.
Objects:-
An object in the database environment is a unit of data that needs to be protected.
Thus an object can be a data field, a record, a file or a view.
Subjects:-
Access Type:-
The access allowed to the user or program could be for data manipulation or control.
The manipulation operations are read, insert, update. The control operations are add,
drop, alter.
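An access matrix can be sketched as a nested dictionary: subjects are the rows, objects are the
columns, and each entry holds the set of allowed access types. The subjects, objects and
privileges below are made-up examples, and is_allowed is a hypothetical helper.

```python
# Rows are subjects, columns are objects, entries are the allowed access types.
access_matrix = {
    "clerk":   {"student": {"read"}, "marks": {"read", "insert"}},
    "teacher": {"student": {"read"}, "marks": {"read", "insert", "update"}},
}

def is_allowed(subject, obj, access_type):
    """Look up the entry at the intersection of the subject row and object column."""
    return access_type in access_matrix.get(subject, {}).get(obj, set())

teacher_can_update = is_allowed("teacher", "marks", "update")
clerk_can_update = is_allowed("clerk", "marks", "update")
unknown_subject = is_allowed("visitor", "marks", "read")
```

A missing row or column is treated as "no access", so anything not explicitly granted is denied.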
VIEWS:-
The relations that really exist in the database are referred to as base relations.
Sometimes, for security purposes, it is undesirable to let all users see the
entire relation. Any relation that is not part of the physical database, i.e. a virtual
relation, is called a view (or subschema).
A view can be created from base relations according to the requirement, using an
SQL query.
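As an illustration of a view used for security, the sketch below creates a base relation and a
view that hides the salary column. It uses SQLite through Python's sqlite3 module, so the data
types differ from the Oracle-style types used elsewhere in these notes; the table, view and
sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Base relation: physically stored in the database.
conn.execute("CREATE TABLE emp (emp_no INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(1, "Asha", 30000), (2, "Ravi", 45000)],
)

# View: a virtual relation defined by a query; the salary column is not exposed.
conn.execute("CREATE VIEW emp_public AS SELECT emp_no, name FROM emp")

cursor = conn.execute("SELECT * FROM emp_public ORDER BY emp_no")
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()
```

Users who are given access only to emp_public can see employee numbers and names but not
salaries.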
Types of View
There are two types of view,
Simple View
Complex View
SYNTAX:-
Modern DBMSs support either or both of 2 approaches to data security. They are
as follows:-
1) Discretionary Control
2) Mandatory Control
Most systems support discretionary control, and some systems support
mandatory control.
Discretionary Control:-
Here a given user will typically have different access rights (privileges) on different
objects, and different users will have different rights on the same object.
This scheme is very flexible.
SYNTAX:-
CREATE SECURITY RULE rule name GRANT
A constraint can be imposed at the time of table creation or after the
table has been created.
A constraint can be defined at the column level or at the table level.
All constraints are stored in the data dictionary.
The syntax when a constraint is defined at the column level is as
follows:-
PRIMARY KEY:-
A primary key constraint creates a primary key for the table, by which each row can be
identified uniquely.
Only one primary key can be created for each table.
The primary key is the combination of unique & not null.
SYNTAX:-
CREATE TABLE table name (Column name 1 data type, Column name 2 data type
primary key, Column name 3 data type);
EXAMPLE:-
CREATE TABLE std (roll number(10) CONSTRAINT pk_roll primary key, name
varchar2(10) unique, branch varchar2(20), dob date not null);
FOREIGN KEY:-
The foreign key or referential integrity constraint designates a column as a foreign
key and establishes a relationship between that column and a primary key in the same
table or a different table.
The table in which the foreign key is present is called the dependent or child table; the table
containing the primary key that is referred to is called the referenced or parent table.
SYNTAX:-
CREATE TABLE table name 1 (Column name 1 data type primary key, Column name 2
data type references table name 2 (Column name));
EXAMPLE:-
CREATE TABLE std (roll number(10) CONSTRAINT pk_roll primary key, name
varchar2(10) unique, branch varchar2(10) CONSTRAINT fk_branch references ...);
CHECK:-
Check defines the condition that each row must satisfy. This constraint is used for
specifying range of values for a particular column of a table. When this constraint is being set on
a column, it ensures that the specified column must have the value falling in the specified range.
SYNTAX:-
CREATE TABLE table name 1 (Column name 1 data type check (condition), Column
name 2 data type);
EXAMPLE:-
CREATE TABLE sal (salary number(10) check (salary ...), name varchar2(10));
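The three constraints can be tried out together in a small runnable sketch. It uses SQLite
through Python's sqlite3 module instead of Oracle, so the data types (INTEGER, TEXT) differ from
number and varchar2. The branch table, the marks column, the range 0 to 100 and the sample rows
are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite checks foreign keys only when enabled

# Parent (referenced) table.
conn.execute("CREATE TABLE branch (code TEXT PRIMARY KEY)")

# Child (dependent) table with PRIMARY KEY, UNIQUE, FOREIGN KEY and CHECK constraints.
conn.execute("""
    CREATE TABLE std (
        roll   INTEGER PRIMARY KEY,
        name   TEXT UNIQUE,
        branch TEXT REFERENCES branch (code),
        marks  INTEGER CHECK (marks BETWEEN 0 AND 100)
    )
""")
conn.execute("INSERT INTO branch VALUES ('CSE')")
conn.execute("INSERT INTO std VALUES (1, 'Asha', 'CSE', 82)")

def rejected(sql):
    """Return True if the statement is refused because it violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

duplicate_key = rejected("INSERT INTO std VALUES (1, 'Ravi', 'CSE', 70)")    # primary key
unknown_branch = rejected("INSERT INTO std VALUES (2, 'Ravi', 'ECE', 70)")   # foreign key
out_of_range = rejected("INSERT INTO std VALUES (3, 'Mina', 'CSE', 140)")    # check
row_count = conn.execute("SELECT COUNT(*) FROM std").fetchone()[0]
```

Each rejected INSERT leaves the table unchanged, so only the one valid row remains.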
TRANSACTION FILE & MASTER FILE:-
SECURITY CONSTRAINTS:-
Security in a database involves both policies and mechanisms to protect the data and ensure that
it is not accessed, altered or deleted without proper authorization.
Four levels of defense, or security constraints, are generally recognized for database
security: human factors, physical security, administrative controls, and DBMS and operating
system mechanisms.
1. Human Factors:-
An organization usually performs some type of clearance procedure for personnel who are
going to be dealing with sensitive information, including that contained in a database.
This clearance procedure can be a very informal one, in the form of the reliability and trust
that an employee has earned in the eyes of management.
The authorizer is responsible for granting proper database access authorization to the user
community.
2. Physical Security:-
Physical security mechanisms include appropriate locks and keys to computing facility
and terminals.
Security of physical storage devices (magnetic disk packs etc.) must be maintained, both
within the organization and when they are being transported from one location to another.
Authorized terminals from which database access is allowed have to be physically
secure, otherwise unauthorized persons may be able to access the data.
User identification and passwords have to be kept confidential.
3. Administrative Controls:- Administrative controls are the security and access control policies
that determine what information will be accessible to what class of users and the type of access
that will be allowed to this class.
4. DBMS and Operating System Mechanisms:
Proper mechanisms are needed for the identification and verification of users.
Each user is assigned an account number and a password. The operating system ensures
that access to the system is denied unless the number and password are valid.
In addition, the DBMS could also require a number and password before allowing the
user to perform any database operations.