1 - Introductory Concepts of DBMS: Define The Following Terms
1 – Introductory Concepts of DBMS
This leaves the database in an inconsistent state. But it is difficult to ensure atomicity in a file
processing system.
Concurrent Access Anomalies
Multiple users are allowed to access data simultaneously (concurrently). This is for the
sake of better performance and faster response.
Consider an operation to debit (withdrawal) an account. The program reads the old
balance, calculates the new balance, and writes new balance back to database. Suppose
an account has a balance of Rs. 5000. Now, a concurrent withdrawal of Rs. 1000 and Rs.
2000 may leave the balance Rs. 4000 or Rs. 3000 depending upon their completion time
rather than the correct value of Rs. 2000.
Here, concurrent data access should be allowed under some supervision.
But, due to lack of co-ordination among different application programs, this is not
possible in file processing systems.
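The lost-update arithmetic above can be sketched in a few lines (a toy simulation; the interleaving is forced by hand rather than produced by real threads):

```python
# Lost-update anomaly: two concurrent withdrawals both read the old balance.
balance = 5000

read_a = balance          # withdrawal of Rs. 1000 reads 5000
read_b = balance          # withdrawal of Rs. 2000 also reads 5000

balance = read_a - 1000   # first write: balance becomes 4000
balance = read_b - 2000   # second write overwrites it: balance becomes 3000

print(balance)            # 3000, not the correct 2000
```

Whichever write finishes last wins, so the final balance is 4000 or 3000 instead of 2000.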
Security Problems
Database should be accessible to users in a limited way.
Each user should be allowed to access data concerning his application only.
For example, a customer can check balance only for his/her own account. He/She
should not have access for information about other accounts.
But, in file processing system, application programs are added in an ad hoc manner by
different programmers. So, it is difficult to enforce such kind of security constraints.
Data Access
DBMS utilizes a variety of techniques to retrieve data.
Required data can be retrieved by providing appropriate query to the DBMS.
Thus, data can be accessed in convenient and efficient manner.
Data Integrity
Data in database must be correct and consistent.
So, data stored in database must satisfy certain types of constraints (rules).
DBMS provides different ways to implement such type of constraints (rules).
This improves data integrity in a database.
Data Security
Database should be accessible to users in a limited way.
DBMS provides ways to control access to data for different users according to their
requirements.
It prevents unauthorized access to data.
Thus, security can be improved.
Concurrent Access
Multiple users are allowed to access data simultaneously.
Concurrent access to centralized data can be allowed under some supervision.
This results in better performance of system and faster response.
Guaranteed Atomicity
Any operation on database must be atomic. This means, operation must be executed
either 100% or 0%.
This type of atomicity is guaranteed in DBMS.
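As a sketch of this all-or-nothing behaviour, the following uses SQLite transactions (the table, account numbers and amounts are made up for illustration):

```python
import sqlite3

# Atomic transfer sketch: either both the debit and the credit happen, or neither.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (ano TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A1', 5000), ('A2', 1000)")
conn.commit()

try:
    # Debit succeeds, then the transfer fails midway (simulated failure).
    conn.execute("UPDATE account SET balance = balance - 500 WHERE ano = 'A1'")
    raise ValueError("credit step failed")
except ValueError:
    conn.rollback()  # the debit is undone as well: 100% or 0%

bal = conn.execute("SELECT balance FROM account WHERE ano = 'A1'").fetchone()[0]
print(bal)  # 5000 - the partial debit was rolled back
```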
[Figure: Three-level ANSI/SPARC database system architecture, showing the external view at the external level, the conceptual view at the conceptual level, and the internal view at the internal level.]
Internal Level
This is the lowest level of the data abstraction.
It describes how the data are actually stored on storage devices.
It is also known as a physical level.
The internal view is described by internal schema.
Internal schema consists of definition of stored record, method of representing the data
field and access method used.
Conceptual Level
This is the next higher level of the data abstraction.
It describes what data are stored in the database and what relationships exist among
those data.
It is also known as a logical level.
Conceptual view is defined by conceptual schema. It describes all records and
relationship.
External Level
This is the highest level of data abstraction.
It is also known as view level.
It describes only part of the entire database that a particular end user requires.
External view is described by external schema.
External schema consists of definition of logical records, relationship in the external
view and method of deriving the objects from the conceptual view.
This object includes entities, attributes and relationship.
Explain Mapping. OR
Explain external and internal mapping. OR
What is mapping? Describe type of mapping.
Mapping
The process of transforming requests and results between the three levels is called
mapping.
Types of Mapping
Conceptual/Internal Mapping
External/Conceptual Mapping
Conceptual/Internal Mapping
It relates conceptual schema with internal schema.
It defines correspondence between the conceptual schema and the database stored in
physical devices.
It specifies how conceptual records and fields are presented at the internal level.
If the structure of stored database is changed, then conceptual/internal mapping must
be changed accordingly and conceptual schema can remain invariant.
There can be only one mapping between the conceptual and internal levels.
External/Conceptual Mapping
It relates each external schema with conceptual schema.
It defines correspondence between a particular external view and conceptual schema.
If the structure of conceptual schema is changed, then external/conceptual mapping
must be changed accordingly and external schema can remain invariant.
There could be several mappings between external and conceptual levels.
Naive users interact with the system by invoking one of the application programs that have been
written previously.
Examples: people accessing the database over the web, bank tellers, clerical staff, etc.
DA is a business-focused person, but he/she should understand more about database technology.
DBA is a technically focused person, but he/she should understand more about the business to
administer the databases effectively.
Authorization Manager — Checks the authority of users to access data.
Integrity Manager — Checks for the satisfaction of the integrity constraints.
Transaction Manager — Preserves atomicity and controls concurrency.
File Manager — Manages allocation of space on disk storage.
Buffer Manager — Fetches data from disk storage to memory for being used.
In addition to these functional units, several data structures are required to implement physical storage system.
These are described below:
Data Files — To store user data.
Data Dictionary and System Catalog — To store metadata. It is used heavily, almost for each and
every data manipulation operation. So, it should be accessed efficiently.
Indices — To provide faster access to data items.
Statistical Data — To store statistical information about the data in the database. This information is
used by the query processor to select efficient ways to execute a query.
[Figure: Overall structure of a database system. Naive users (tellers, agents, web users) use application interfaces; application programmers write application programs; sophisticated users (analysts) use query tools; the database administrator uses administration tools. DDL statements pass through the DDL interpreter, and DML queries through the DML compiler and organizer, to the query evaluation engine; together these form the query processor, which works on top of the storage manager.]
Explain database system 3-tier architecture with clear diagram
in detail.
The most widely used architecture is the 3-tier architecture.
The 3-tier architecture separates its tiers from each other on the basis of users.
[Figure: 3-tier architecture. An application client (presentation tier) communicates over a network with an application server (application tier), which in turn accesses the database system (database tier).]
Database (Data) Tier
At this tier, only database resides.
Database along with its query processing languages sits in layer-3 of 3-tier architecture.
It also contains all relations and their constraints.
Application (Middle) Tier
At this tier reside the application server and the programs that access the database.
For a user, this application tier presents an abstracted view of the database.
Users are unaware of any existence of the database beyond the application.
For the database tier, the application tier is its user.
The database tier is not aware of any user beyond the application tier.
This tier works as a mediator between the two.
User (Presentation) Tier
End users sit at this tier.
From a user's perspective, this tier is everything.
He/she doesn't know about any existence or form of the database beyond this layer.
At this layer, multiple views of the database can be provided by the application.
All views are generated by the application, which resides in the application tier.
2 – Relational Model
Explain keys.
Super key
A super key is a set of one or more attributes that allow us to identify each tuple
uniquely in a relation.
For example, the enrollment_no, roll_no with department_name of a student is
sufficient to distinguish one student tuple from another. So {enrollment_no} and
{roll_no, department_name} both are super key.
Candidate key
Candidate key is a super key for which no proper subset is a super key.
For example, combination of roll_no and department_name is sufficient to distinguish
one student tuple from another. But either roll_no or department_name alone is not
sufficient to distinguish one student tuple from another. So {roll_no, department_name}
is candidate key.
Combination of enrollment_no and department_name is sufficient to distinguish one
student tuple from another. But enrollment_no alone is also sufficient to distinguish one
student tuple from another. So {enrollment_no, department_name} is not candidate
key.
Primary key
A Primary key is a candidate key that is chosen by database designer to identify tuples
uniquely in a relation.
Alternate key
An Alternate key is a candidate key that is not chosen by database designer to identify
tuples uniquely in a relation.
Foreign key
A foreign key is a set of one or more attributes whose values are derived from the
primary key attribute of another relation.
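A minimal sketch of these keys in SQL, using SQLite and the illustrative column names from the examples above (enrollment_no as the chosen primary key, {roll_no, dept_name} as an alternate candidate key, dept_name as a foreign key):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE department (dept_name TEXT PRIMARY KEY)")
conn.execute("""CREATE TABLE student (
    enrollment_no TEXT PRIMARY KEY,                  -- primary key
    roll_no TEXT,
    dept_name TEXT REFERENCES department(dept_name), -- foreign key
    UNIQUE (roll_no, dept_name))                     -- alternate (candidate) key
""")
conn.execute("INSERT INTO department VALUES ('CE')")
conn.execute("INSERT INTO student VALUES ('E101', '1', 'CE')")

# A duplicate enrollment_no violates the primary key.
try:
    conn.execute("INSERT INTO student VALUES ('E101', '2', 'CE')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
print(duplicate_allowed)  # False
```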
What is relational algebra? Explain relational algebraic
operation.
Relational algebra is a language for expressing relational database queries.
Relational algebra is a procedural query language.
Relational algebraic operations are as follows:
Selection:-
Operation: Selects tuples from a relation that satisfy a given condition.
It is used to select particular tuples from a relation.
It selects particular tuples but all attribute from a relation.
Symbol: σ (Sigma)
Notation: σ(condition) <Relation>
Operators: The following operators can be used in a condition.
=, ≠, <, >, ≤, ≥, ∧ (AND), ∨ (OR)
Consider following table
Student
Rno Name Dept CPI
101 Ramesh CE 8
108 Mahesh EC 6
109 Amit CE 7
125 Chetan CI 8
138 Mukesh ME 7
128 Reeta EC 6
133 Anita CE 9
Example: Find out all students of CE department.
σDept=“CE” (Student)
Output of above query is as follows
Student
Rno Name Dept CPI
101 Ramesh CE 8
109 Amit CE 7
133 Anita CE 9
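The selection above can be sketched as a tuple filter (a toy in-memory model of σ, not a real DBMS):

```python
# Student relation as a list of tuples (Rno, Name, Dept, CPI).
student = [
    (101, "Ramesh", "CE", 8), (108, "Mahesh", "EC", 6), (109, "Amit", "CE", 7),
    (125, "Chetan", "CI", 8), (138, "Mukesh", "ME", 7), (128, "Reeta", "EC", 6),
    (133, "Anita", "CE", 9),
]

def select(predicate, relation):
    # sigma keeps whole tuples (all attributes) that satisfy the condition
    return [t for t in relation if predicate(t)]

ce_students = select(lambda t: t[2] == "CE", student)
print(ce_students)  # the three CE tuples
```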
Projection:-
Operation: Selects specified attributes of a relation.
It selects particular attributes but all tuples from a relation.
Symbol: ∏ (Pi)
Notation: ∏ (attribute set) <Relation>
Consider following table
Student
Rno Name Dept CPI
101 Ramesh CE 8
108 Mahesh EC 6
109 Amit CE 7
125 Chetan CI 8
138 Mukesh ME 7
128 Reeta EC 6
133 Anita CE 9
Example: List out all students with their roll no, name and department name.
∏Rno, Name, Dept (Student)
Output: The above query returns all tuples with three attributes roll no, name and
department name.
Output of above query is as follows
Student
Rno Name Dept
101 Ramesh CE
108 Mahesh EC
109 Amit CE
125 Chetan CI
138 Mukesh ME
128 Reeta EC
133 Anita CE
Example: List out all students of CE department with their roll no, name and
department name.
∏Rno, Name, Dept (σDept=“CE” (Student))
Output: The above query returns tuples which contain CE as department with three
attributes roll no, name and department name.
Output of above query is as follows
Student
Rno Name Dept
101 Ramesh CE
109 Amit CE
133 Anita CE
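The same query can be sketched as projection composed with selection over in-memory tuples (a toy model of ∏ and σ):

```python
# Student relation as (Rno, Name, Dept, CPI) tuples.
student = [
    (101, "Ramesh", "CE", 8), (108, "Mahesh", "EC", 6), (109, "Amit", "CE", 7),
    (125, "Chetan", "CI", 8), (138, "Mukesh", "ME", 7), (128, "Reeta", "EC", 6),
    (133, "Anita", "CE", 9),
]

def project(indexes, relation):
    # pi keeps only the chosen attributes of every tuple
    return [tuple(t[i] for i in indexes) for t in relation]

# pi Rno,Name,Dept ( sigma Dept="CE" (Student) )
result = project([0, 1, 2], [t for t in student if t[2] == "CE"])
print(result)
```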
Division:-
Operation: The division is a binary operation that is written as R ÷ S.
The result consists of the attributes of R that are not in the header of S; a tuple of these
attributes appears in the result only if it is paired in R with every tuple of S.
Symbol: ÷
Notation: R ÷ S
Consider following table
Work                        Project
Student    Task             Task
Shah       Database1        Database1
Shah       Database2        Database2
Shah       Compiler1
Vyas       Database1
Vyas       Compiler1
Patel      Database1
Patel      Database2
Example: Find out all students having both tasks Database1 as well as Database2.
Work ÷ ∏Task (Project)
Output: It gives name of all students whose task is both
Database1 as well as Database2. Output of above query is
as follows
Student
Shah
Patel
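A sketch of division over the Work and Project tables above (toy in-memory implementation):

```python
# Work(Student, Task) and the Task column of Project.
work = [("Shah", "Database1"), ("Shah", "Database2"), ("Shah", "Compiler1"),
        ("Vyas", "Database1"), ("Vyas", "Compiler1"),
        ("Patel", "Database1"), ("Patel", "Database2")]
project_tasks = {"Database1", "Database2"}

def divide(r, s):
    # keep each student whose set of tasks contains every task in s
    students = {a for a, b in r}
    return sorted(a for a in students if s <= {b for x, b in r if x == a})

print(divide(work, project_tasks))  # ['Patel', 'Shah']
```

Vyas is dropped because Vyas is not paired with Database2 in Work.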
Cartesian product:-
Operation: Combines information of two relations.
It will multiply each tuples of first relation to each tuples of second relation.
It is also known as Cross product operation and similar to mathematical
Cartesian product operation.
Symbol: X (Cross)
Notation: Relation1 X Relation2
Resultant Relation :
If relation1 and relation2 have n1 and n2 attributes respectively, then resultant
relation will have n1 + n2 attributes, taken from both the input relations.
If both relations have some attribute having the same name, it can be distinguished
by writing relation-name.attribute-name.
If relation1 and relation2 have n1 and n2 tuples respectively, then resultant
relation will have n1 * n2 tuples, combining each possible pair of tuples from
both the input relations.
R              R × S
A  1           A  1  A  1
B  2           A  1  D  2
D  3           A  1  E  3
               B  2  A  1
S              B  2  D  2
A  1           B  2  E  3
D  2           D  3  A  1
E  3           D  3  D  2
               D  3  E  3
Consider following table
Emp Dept
Empid Empname Deptname Deptname Manager
S01 Manisha Finance Finance Arun
S02 Anisha Sales Sales Rohit
S03 Nisha Finance Production Kishan
Example:
Emp × Dept
Empid Empname Emp.Deptname Dept.Deptname Manager
S01 Manisha Finance Finance Arun
S01 Manisha Finance Sales Rohit
S01 Manisha Finance Production Kishan
S02 Anisha Sales Finance Arun
S02 Anisha Sales Sales Rohit
S02 Anisha Sales Production Kishan
S03 Nisha Finance Finance Arun
S03 Nisha Finance Sales Rohit
S03 Nisha Finance Production Kishan
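The Emp × Dept result above can be reproduced as a sketch with itertools.product:

```python
from itertools import product

emp = [("S01", "Manisha", "Finance"), ("S02", "Anisha", "Sales"),
       ("S03", "Nisha", "Finance")]
dept = [("Finance", "Arun"), ("Sales", "Rohit"), ("Production", "Kishan")]

# Each Emp tuple is paired with each Dept tuple; attributes are concatenated.
cross = [e + d for e, d in product(emp, dept)]
print(len(cross))  # 3 * 3 = 9 tuples, each with 3 + 2 = 5 attributes
```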
Join:-
Natural Join Operation (⋈)
Operation: Natural join will retrieve information from multiple relations. It works in
three steps.
1. It performs Cartesian product
2. Then it finds consistent tuples and inconsistent tuples are deleted
3. Then it deletes duplicate attributes
Symbol: ⋈
Notation: Relation1 ⋈ Relation2
Consider following table
Emp Dept
Empid Empname Deptname Deptname Manager
S01 Manisha Finance Finance Arun
S02 Anisha Sales Sales Rohit
S03 Nisha Finance Production Kishan
Example:
Emp ⋈ Dept
Empid Empname Deptname Manager
S01 Manisha Finance Arun
S02 Anisha Sales Rohit
S03 Nisha Finance Arun
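The three steps of natural join can be sketched over the Emp and Dept tuples above (cross product, keep consistent pairs, drop the duplicate Deptname):

```python
emp = [("S01", "Manisha", "Finance"), ("S02", "Anisha", "Sales"),
       ("S03", "Nisha", "Finance")]
dept = [("Finance", "Arun"), ("Sales", "Rohit"), ("Production", "Kishan")]

# Keep only pairs that agree on the common attribute Deptname,
# and emit Deptname once (the duplicate attribute is deleted).
joined = [(eid, ename, dname, mgr)
          for (eid, ename, dname) in emp
          for (dname2, mgr) in dept
          if dname == dname2]
print(joined)
```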
In natural join some records are missing; if we want those missing records then we have to
use outer join.
The outer join operation can be divided into three different forms:
1. Left outer join (⟕)
2. Right outer join (⟖)
3. Full outer join (⟗)
Consider following tables
College Hostel
Name Id Department Name Hostel_name Room_no
Manisha S01 Computer Anisha Kaveri hostel K01
Anisha S02 Computer Nisha Godavari hostel G07
Nisha S03 I.T. Isha Kaveri hostel K02
Left outer join (⟕)
The left outer join retains all the tuples of the left relation even though there is no
matching tuple in the right relation.
For such tuples having no match, the attributes of the right relation will be
padded with null in the resultant relation.
Example : College ⟕ Hostel
College Hostel
Name Id Department Hostel_name Room_no
Manisha S01 Computer Null Null
Anisha S02 Computer Kaveri hostel K01
Nisha S03 I.T. Godavari hostel G07
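The left outer join above can be sketched as a lookup that pads missing matches with None (Python's null):

```python
college = [("Manisha", "S01", "Computer"), ("Anisha", "S02", "Computer"),
           ("Nisha", "S03", "I.T.")]
hostel = [("Anisha", "Kaveri hostel", "K01"), ("Nisha", "Godavari hostel", "G07"),
          ("Isha", "Kaveri hostel", "K02")]

# Index the right relation by the join attribute Name.
hmap = {name: (hname, room) for name, hname, room in hostel}

# Every College tuple is kept; unmatched ones are padded with (None, None).
left_join = [(n, i, d) + hmap.get(n, (None, None)) for n, i, d in college]
print(left_join)
```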
Full outer join (⟗)
The full outer join retains all tuples of both relations; tuples with no match on either side
are padded with null in the resultant relation.
Example : College ⟗ Hostel
College Hostel
Name Id Department Hostel_name Room_no
Manisha S01 Computer Null Null
Anisha S02 Computer Kaveri hostel K01
Nisha S03 I.T. Godavari hostel G07
Isha Null Null Kaveri Hostel K02
Union
Operation: Selects tuples that are in either or both of the relations.
Symbol : U(Union)
Notation : Relation1 U Relation2
Requirement: Union must be taken between compatible
relations. Relations R and S are compatible, if
Both have same arity, i.e. total numbers of attributes, and
Domains of ith attribute of R and S are same type.
Example :
R          S          R U S
A  1       A  1       A  1
B  2       C  2       B  2
D  3       D  3       C  2
F  4       E  4       D  3
E  5                  F  4
                      E  5
                      E  4
Intersection
Operation: Selects tuples that are in both relations.
Symbol : ∩ (Intersection)
Notation : Relation1 ∩ Relation2
Requirement: Set-intersection must be taken between compatible relations.
Relations R and S are compatible, if
Both have same arity, i.e. total numbers of attributes, and
Domains of ith attributes of R and S are same type.
Example
R          S          R ∩ S
A  1       A  1       A  1
B  2       C  2       D  3
D  3       D  3
F  4       E  4
E  5
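Treating compatible relations as sets of tuples, union and intersection are direct set operations; a sketch with the R and S used above:

```python
# Compatible relations (same arity, same domains) modelled as sets of tuples.
r = {("A", 1), ("B", 2), ("D", 3), ("F", 4), ("E", 5)}
s = {("A", 1), ("C", 2), ("D", 3), ("E", 4)}

union = r | s         # tuples in either or both relations (no duplicates)
intersection = r & s  # tuples in both relations

print(sorted(intersection))  # [('A', 1), ('D', 3)]
```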
Set difference (−) example: Cust − Emp gives tuples that are in Cust but not in Emp.
Id    Name
4     Isha
Rename:-
Operation: It is used to rename a relation or attributes.
Symbol: ρ (Rho)
Notation: ρA(B) Rename relation B to A.
ρA(X1,X2….Xn)(B) Rename relation B to A and its attributes to X1, X2, …., Xn.
Consider following table
Student
Rno Name Dept CPI
101 Ramesh CE 8
108 Mahesh EC 6
109 Amit CE 7
125 Chetan CI 8
138 Mukesh ME 7
128 Reeta EC 6
133 Anita CE 9
Example: Find out highest CPI from student table.
∏CPI (Student) − ∏A.CPI (σA.CPI < B.CPI (ρA (Student) × ρB (Student)))
Output: The above query returns highest CPI.
Output of above query is as follows
CPI
9
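The self-join trick behind this query (subtract every CPI that is smaller than some other CPI) can be sketched directly:

```python
# Distinct CPI values of the Student table.
cpis = {8, 6, 7, 9}

# A.CPI values for which some B.CPI is larger, i.e. every non-maximum value.
smaller = {a for a in cpis for b in cpis if a < b}

highest = cpis - smaller  # only the maximum survives the subtraction
print(highest)  # {9}
```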
Aggregate Function:-
Operation: It takes a collection of values as input and returns a single value as output
(result).
Symbol: G
Notation: G function (attribute) (relation)
Aggregate functions: Sum, Count, Max, Min, Avg.
Consider following table
Student
Rno Name Dept CPI
101 Ramesh CE 8
108 Mahesh EC 6
109 Amit CE 7
125 Chetan CI 8
138 Mukesh ME 7
128 Reeta EC 6
133 Anita CE 9
Example: Find out sum of all students CPI.
G sum (CPI) (Student)
Output: The above query returns sum of CPI.
Output of above query is as follows
sum
51
Example: Find out max and min CPI.
G max (CPI), min (CPI) (Student)
Output: The above query returns the maximum and minimum CPI.
Output of above query is as follows
max min
9 6
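The three aggregate results above can be checked directly over the CPI column:

```python
# CPI column of the Student table, in table order.
cpis = [8, 6, 7, 8, 7, 6, 9]

print(sum(cpis), max(cpis), min(cpis))  # 51 9 6
```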
SQL Concepts
(ii) Find out all customers who have an account in ‘Ahmedabad’ city and
balance greater than 10,000.
∏customer_name (σBranch.branch_city=“Ahmedabad” ∧ Account.balance>10000 (Branch ⋈ Account ⋈
Depositor))
(iii) Find out a list of all branch names with their maximum balance.
branch_name G max(balance) (Account)
What is constraint? Explain types of constraints.
A table cannot have more than one primary key.
If multiple columns need to be defined as primary key column, then only table level
definition is applicable.
Maximum 16 columns can be combined as a composite primary key in a table.
Foreign Key Constraint
A foreign key constraint, also called a referential integrity constraint, is specified
between two tables.
This constraint is used to ensure consistency among records of the two tables.
The table, in which a foreign key is defined, is called a foreign table, detail table or child
table.
The table, of which primary key or unique key is defined, is called a primary table,
master table or parent table.
Restriction on child table :
Child table contains a foreign key. And, it is related to master table.
Insert or update operation involving value of foreign key is not allowed, if
corresponding value does not exist in a master table.
Restriction on master table :
Master table contains a primary key, which is referred by foreign key in child
table.
Delete or update operation on records in master table are not allowed, if
corresponding records are present in child table.
Syntax : ColumnName datatype (size) REFERENCES TableName (ColumnName)
Example :
create table Account (ano char(3) ,
Balance number(9),
Branch varchar2(10) REFERENCES branch_detail(branch));
The master table must exist before creating the child table.
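A runnable sketch of the child-table restriction, using SQLite (the branch name is made up; SQLite needs foreign keys switched on explicitly):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE branch_detail (branch TEXT PRIMARY KEY)")
conn.execute("""CREATE TABLE account (
    ano TEXT,
    balance INTEGER,
    branch TEXT REFERENCES branch_detail(branch))""")

conn.execute("INSERT INTO branch_detail VALUES ('Karelibaug')")       # master row
conn.execute("INSERT INTO account VALUES ('A01', 5000, 'Karelibaug')")  # allowed

# Inserting a foreign-key value with no matching master row is rejected.
try:
    conn.execute("INSERT INTO account VALUES ('A02', 900, 'NoSuchBranch')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```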
Explain DDL, DML, DCL and DQL. OR
Describe component of SQL.
DDL (Data Definition Language)
It is a set of SQL commands used to create, modify and delete database objects such as
tables, views, indices, etc.
It is normally used by DBA and database designers.
It provides commands like:
CREATE: to create objects in a database.
ALTER: to alter the schema, or logical structure, of the database.
DROP: to delete objects from the database.
TRUNCATE: to remove all records from the table.
DML (Data Manipulation Language)
It is a set of SQL commands used to insert, modify and delete data in a database.
It is normally used by general users who are accessing database via pre-developed
applications.
It provides commands like:
INSERT: to insert data into a table.
UPDATE: to modify existing data in a table.
DELETE: to delete records from a table.
LOCK: to lock tables to provide concurrency control among multiple users.
DQL (Data Query Language)
It is a component of SQL that allows data retrieval from the database.
It provides command like SELECT. This command is a heart of SQL, and allows
data retrieval in different ways.
DCL (Data Control Language)
It is a set of SQL commands used to control access to data and the database. Occasionally DCL
commands are grouped with DML commands.
It provides commands like:
COMMIT: to save work permanently.
ROLLBACK: to undo work and restore database to previous state.
SAVEPOINT: to identify a point in a transaction to which work can be undone.
GRANT: to give access privileges to users on the database.
REVOKE: to withdraw access privileges given to users on the database.
Describe the following SQL functions.
SQL Function — Description — SQL Query Example

Numeric functions
Abs(n): Returns the absolute value of n.
    Select Abs(-15) from dual;  O/P : 15
Power(m,n): Returns m raised to the nth power.
    Select power(3,2) from dual;  O/P : 9
Round(n,m): Returns n rounded to m places to the right of the decimal point.
    Select round(15.91,1) from dual;  O/P : 15.9
Sqrt(n): Returns the square root of n.
    Select sqrt(25) from dual;  O/P : 5
Exp(n): Returns e raised to the nth power, e = 2.71828183.
    Select exp(1) from dual;  O/P : 2.71828183
Greatest(): Returns the greatest value in a list of values.
    Select greatest(10, 20, 30) from dual;  O/P : 30
Least(): Returns the least value in a list of values.
    Select least(10,20,30) from dual;  O/P : 10
Mod(n,m): Returns the remainder of n divided by m.
    Select mod(10,2) from dual;  O/P : 0
Ceil(n): Returns the smallest integer value that is greater than or equal to a number.
    Select ceil(24.8) from dual;  O/P : 25

String functions
ASCII(x): Returns the ASCII value of a character.
    Select ascii(‘A’) from dual;  O/P : 65
Concat(): Concatenates two strings.
    Select concat(‘great’,’DIET’) from dual;  O/P : greatDIET
Initcap(): Changes the first letter of a word to capital.
    Select initcap(‘diet’) from dual;  O/P : Diet
Instr(): Returns the location within the string where the search pattern begins.
    Select instr(‘this is test’,’is’) from dual;  O/P : 3
Length(): Returns the number of characters in x.
    Select length(‘DIET’) from dual;  O/P : 4
Lower(): Converts the string to lower case.
    Select lower(‘DIET’) from dual;  O/P : diet
Upper(): Converts the string to upper case.
    Select upper(‘diet’) from dual;  O/P : DIET
Lpad(): Pads x on the left to bring the total length of the string up to the given width.
    Select lpad(‘abc’,9,’>’) from dual;  O/P : >>>>>>abc
Rpad(): Pads x on the right to bring the total length of the string up to the given width.
    Select rpad(‘abc’,9,’>’) from dual;  O/P : abc>>>>>>
Ltrim(): Trims characters from the left of x.
    Select ltrim(‘sumita’,’usae’) from dual;  O/P : mita
Rtrim(): Trims characters from the right of x.
    Select rtrim(‘sumita’,’tab’) from dual;  O/P : sumi
Replace(): Looks for the string and replaces it every time it occurs.
    Select replace(‘this is college’,’is’,’may be’) from dual;  O/P : thmay be may be college
Substr(): Returns part of the string.
    Select substr(‘this is my college’,6,7) from dual;  O/P : is my c
Vsize(): Returns the storage size of the string in Oracle.
    Select vsize(‘abc’) from dual;  O/P : 3

Miscellaneous functions
Uid: Returns the integer value corresponding to the UserId.
    Select uid from dual;  O/P : 618
User: Returns the name of the user.
    Select user from dual;  O/P : admin
Userenv: Returns information about the current Oracle session.
    Select userenv(‘language’) from dual;  O/P : AMERICAN_AMERICA.WE8MSWIN1252

Aggregate functions
Avg(x): Returns the average value.  Select avg(salary) from employee;
Count(x): Returns the number of rows returned by a query.  Select count(deptno) from employee;
Max(x): Returns the maximum value of x.  Select max(salary) from employee;
Min(x): Returns the minimum value of x.  Select min(salary) from employee;
Median(x): Returns the median value of x.  Select median(salary) from employee;
Sum(x): Returns the sum of x.  Select sum(salary) from employee;
Stddev(x): Returns the standard deviation of x.  Select stddev(salary) from employee;
Variance(x): Returns the variance of x.  Select variance(salary) from employee;
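Several of the outputs above can be checked against their Python equivalents (Lpad/Rpad correspond to rjust/ljust, ascii corresponds to ord):

```python
# Python counterparts of a few SQL functions from the table above.
assert abs(-15) == 15                        # Abs(-15)
assert 3 ** 2 == 9                           # Power(3,2)
assert round(15.91, 1) == 15.9               # Round(15.91,1)
assert "great" + "DIET" == "greatDIET"       # Concat
assert "diet".upper() == "DIET"              # Upper
assert len("DIET") == 4                      # Length
assert ord("A") == 65                        # ASCII('A')
assert "abc".rjust(9, ">") == ">>>>>>abc"    # Lpad('abc',9,'>')
assert "abc".ljust(9, ">") == "abc>>>>>>"    # Rpad('abc',9,'>')
print("all checks passed")
```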
Date functions
Months_between: Gets the number of months between x and y (x minus y).
    Select months_between(’01-jan-2005’,’01-feb-2005’) from dual;  O/P : -1
Next_day: Returns the date of the next given weekday following x.
    Select next_day(’01-jan-2005’,’saturday’) from dual;  O/P : 08-jan-2005
Example: Consider following table
Student
Rollno Name CGPA
1 Raju 8
2 Hari 9
3 Mahesh 7
1. GRANT: The GRANT command is used to provide access or privileges on the
database objects to the users.
Syntax:
GRANT privilege_name
ON object_name
TO {user_name | PUBLIC }
[WITH GRANT OPTION];
Explanation:
privilege_name is the access right or privilege granted to the user. Some
of the access rights are ALL, EXECUTE, and SELECT.
object_name is the name of an database object like TABLE, VIEW,
STORED PROC and SEQUENCE.
user_name is the name of the user to whom an access right is being
granted.
PUBLIC is used to grant access rights to all users.
WITH GRANT OPTION - allows a user to grant (give) access rights to other
users.
2. REVOKE: The REVOKE is a command used to take back access or privileges or
rights on the database objects from the users.
Syntax:
REVOKE privilege_name
ON object_name
FROM {user_name |PUBLIC }
Explanation:
privilege_name is the access right or privilege want to take back from the
user. Some of the access rights are ALL, EXECUTE, and SELECT.
object_name is the name of an database object like TABLE, VIEW,
STORED PROC and SEQUENCE.
user_name is the name of the user from whom an access right is being
taken back.
PUBLIC is used to take back access rights from all users.
Explain on delete cascade with example.
A foreign key with cascade delete means that if a record in the Parent (Master) table is
deleted, then the corresponding records in the Child (Foreign) table will automatically
be deleted. This is called a cascade delete in Oracle.
A foreign key with a cascade delete can be defined in either a CREATE TABLE statement
or an ALTER TABLE statement.
Syntax (Create table)
CREATE TABLE table_name
(
column1 datatype null/not null,
column2 datatype null/not null,
...
CONSTRAINT fk_column
FOREIGN KEY (column1, column2, ... column_n)
REFERENCES parent_table (column1, column2, ...
column_n)
ON DELETE CASCADE
);
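A runnable sketch of cascade delete using SQLite (generic parent/child table names for illustration; SQLite supports the same ON DELETE CASCADE clause once foreign keys are enabled):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE child (
    id INTEGER PRIMARY KEY,
    pid INTEGER REFERENCES parent(id) ON DELETE CASCADE)""")

conn.execute("INSERT INTO parent VALUES (1)")
conn.execute("INSERT INTO child VALUES (10, 1), (11, 1)")

# Deleting the parent row cascades to its child rows.
conn.execute("DELETE FROM parent WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM child").fetchone()[0]
print(remaining)  # 0
```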
PL/SQL Concepts
Explanation
Create:-It will create a procedure.
Replace:- It will re-create a procedure if it already exists.
We can pass parameters to the procedures in three ways.
1. IN-parameters: - These types of parameters are used to send values to stored
procedures.
2. OUT-parameters: These types of parameters are used to get values from stored
procedures. This is similar to a return type in functions, but a procedure can return
values for more than one parameter.
3. IN OUT-parameters: - This type of parameter allows us to pass values into a procedure
and get output values from the procedure.
IS indicates the beginning of the body of the procedure. The code between IS and BEGIN forms
the Declaration section.
Begin:-It contains the executable statement.
Exception:- It contains exception handling part. This section is optional.
End:- It will end the procedure.
The syntax within the brackets [ ] indicates that they are optional.
By using CREATE OR REPLACE together the procedure is created if it does not exist and if it
exists then it is replaced with the current code.
Syntax of Trigger
CREATE [OR REPLACE] TRIGGER trigger_name
[BEFORE / AFTER]
[INSERT / UPDATE / DELETE [of columnname]]
ON table_name
[REFERENCING [OLD AS old, NEW AS new]]
[FOR EACH ROW [WHEN condition]]
DECLARE
Declaration section
BEGIN
Executable statements
END;
CREATE [OR REPLACE ] TRIGGER trigger_name:- This clause creates a trigger with the given
name or overwrites an existing trigger.
[BEFORE | AFTER]:- This clause indicates at what time the trigger should be fired. Before
updating the table or after updating a table.
[INSERT / UPDATE / DELETE]:- This clause determines on which kind of statement the trigger
should be fired. Either on insert or update or delete or combination of any or all. More than
one statement can be used together separated by OR keyword. The trigger gets fired at all the
specified triggering event.
[OF columnname]:- This clause is used when you want to trigger an event only when a specific
column is updated. This clause is mostly used with update triggers.
[ON table_name]:- This clause identifies the name of the table or view to which the trigger is
related.
[REFERENCING OLD AS old NEW AS new]:- This clause is used to reference the old and new
values of the data being changed. By default, you reference the values as old. column_name or
new.column_name. The reference names can also be changed from old or new to any other
user-defined name. You cannot reference old values while inserting a record, or new values
while deleting a record, because they do not exist.
[FOR EACH ROW]:- This clause is used to determine whether a trigger must fire when each row
gets affected ( i.e. a Row Level Trigger) or just once when the entire SQL statement is
executed(i.e.statement level Trigger).
WHEN (condition):- The trigger is fired only for rows that satisfy the condition specified. This
clause is valid only for row level triggers.
Example 1
We are creating a trigger that displays a message if we insert a negative value, or update a
value to negative, in the bal column of the account table.
CREATE OR REPLACE TRIGGER balnegative
BEFORE insert OR update
On account
FOR EACH ROW
BEGIN
IF :NEW.bal<0 THEN
dbms_output.put_line (‘balance is negative’);
END IF;
END;
OUTPUT:- Trigger is created.
Now when you perform insert operation on account table.
SQL:> Insert into account (bal) values (-2000); OR
Update account set bal=-5000 where bal=1000;
It displays following message before executing insert or update statement.
Output:- Balance is negative.
We get message that balance is negative it indicates that trigger has executed before
the insertion or update operation.
Example 2
CREATE TRIGGER trig1
AFTER insert ON T4
REFERENCING NEW AS newRow
FOR EACH ROW
WHEN (newRow.a <= 10)
BEGIN
INSERT INTO T5 VALUES (:newRow.a, :newRow.b);
END;
OUTPUT:- Trigger is created.
Explanation:- We have following two tables
CREATE TABLE T4 (a INTEGER, b CHAR(10));
CREATE TABLE T5 (c INTEGER, d CHAR(10));
We have created a trigger that may insert a tuple into T5 when a tuple is inserted into
T4. It checks whether the value of column a in the record inserted into T4 is 10 or less;
if so, the same record is inserted into table T5.
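Example 2 can be reproduced almost verbatim in SQLite, which supports the same AFTER INSERT ... WHEN form (a sketch, not Oracle syntax):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T4 (a INTEGER, b TEXT);
CREATE TABLE T5 (c INTEGER, d TEXT);
CREATE TRIGGER trig1 AFTER INSERT ON T4
WHEN NEW.a <= 10
BEGIN
    INSERT INTO T5 VALUES (NEW.a, NEW.b);
END;
""")

conn.execute("INSERT INTO T4 VALUES (5, 'small')")   # fires the trigger
conn.execute("INSERT INTO T4 VALUES (50, 'large')")  # condition fails, no copy

rows = conn.execute("SELECT * FROM T5").fetchall()
print(rows)  # [(5, 'small')]
```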
Example 3
CREATE TRIGGER Person_Check_Age
AFTER insert OR update OF age ON Person
FOR EACH ROW
BEGIN
IF (:new.age < 0) THEN
dbms_output.put_line('no negative age allowed');
END IF;
END;
OUTPUT:- Trigger is created.
Explanation:- In the above trigger, if we insert or update a negative value in the age column
then it displays the message ‘no negative age allowed’. If we insert a negative value in any
other column then no message is given.
What is cursor? Explain with example. OR
Write short note on cursors and its types.
Cursors are database objects used to traverse the results of a select SQL query.
It is a temporary work area created in the system memory when a select SQL statement
is executed.
This temporary work area is used to store the data retrieved from the database, and
manipulate this data.
It points to a certain location within a record set and allows the operator to move
forward (and sometimes backward, depending upon the cursor type).
We can process only one record at a time.
The set of rows the cursor holds is called the active set (active data set).
Cursors are often criticized for their high overhead.
There are two types of cursors in PL/SQL:
1. Implicit cursors:
These are created by default by ORACLE itself when DML statements like,
insert, update, and delete statements are executed.
They are also created when a SELECT statement that returns just one row
is executed.
We cannot use implicit cursors for user defined work.
2. Explicit cursors:
Explicit cursors are user defined cursors written by the developer.
They can be created when a SELECT statement that returns more than
one row is executed.
Even though the cursor stores multiple records, only one record can be
processed at a time, which is called the current row.
When you fetch a row, the current row position moves to the next row.
Attributes of Cursor:-
%FOUND: Returns TRUE if the DML statement (INSERT, DELETE, UPDATE) or SELECT
affected at least one row, else returns FALSE. Example: SQL%FOUND
%NOTFOUND: Returns FALSE if the DML statement (INSERT, DELETE, UPDATE) or SELECT
affected at least one row, else returns TRUE. Example: SQL%NOTFOUND
%ROWCOUNT: Returns the number of rows affected by the DML operation (INSERT,
DELETE, UPDATE) or SELECT. Example: SQL%ROWCOUNT
%ISOPEN: Returns TRUE if the cursor is open, else returns FALSE. Example: SQL%ISOPEN
3) Fetching data:-
We cannot process a selected row directly. We have to fetch the column values of a row
into memory variables. This is done by the FETCH statement.
Syntax:-
FETCH cursorname INTO variable1, variable2……….
4) Processing data:-
This step involves actual processing of current row.
5) Closing cursor:-
A cursor should be closed after the processing of data completes. Once you close the
cursor it will release memory allocated for that cursor.
Syntax:-
CLOSE cursorname;
Example:-
DECLARE
CURSOR emp_cur IS SELECT first_name, last_name FROM emp_tbl WHERE salary > 10000;
emp_rec emp_cur%ROWTYPE;
BEGIN
OPEN emp_cur;
LOOP
FETCH emp_cur INTO emp_rec;
EXIT WHEN emp_cur%NOTFOUND;
dbms_output.put_line (emp_rec.first_name || ' ' || emp_rec.last_name);
END LOOP;
CLOSE emp_cur;
END;
Explanation:-
In the above example, first we are declaring a cursor named emp_cur with a select
query.
In the select query we have used the where condition salary > 10000. So the active data
set contains only those records whose salary is greater than 10000.
Then, we are opening the cursor in the execution section.
Then we are fetching the records from the cursor into the variable named emp_rec.
Then we are displaying the first_name and last_name of the records fetched into the variable.
At last, as soon as our work is completed, we are closing the cursor.
Write a PL/SQL block to print the sum of Numbers from 1 to 50.
declare
total number := 0;
begin
for i in 1..50
loop
total := total + i;
end loop;
dbms_output.put_line('The sum of first 50 natural nos is = ' || total);
end;
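The loop result can be cross-checked against the closed form n(n+1)/2:

```python
# Cross-check of the loop above: the sum of 1..50 equals n(n+1)/2.
n = 50
loop_sum = sum(range(1, n + 1))   # what the PL/SQL loop computes
formula = n * (n + 1) // 2        # closed form
print(loop_sum, formula)  # 1275 1275
```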
3 – Entity-Relationship Model
(Fig.: one-to-one mapping between entity sets A {A1..A4} and B {B1..B4}, and E-R diagram of customer (customer-name, customer-address) linked to loan by borrower.)
A customer is connected with only one loan using the relationship borrower and a loan
is connected with only one customer using borrower.
One-to-many relationship
(Fig.: one-to-many mapping between entity sets A {A1..A4} and B {B1..B4}.)
An entity in A is associated with any number (zero or more) of entities in B and an entity
in B is associated with at most one (only) entity in A.
(Fig.: E-R diagram of customer (customer-id, customer-name, customer-address, customer-city) linked to loan (loan-no, amount) by borrower.)
In the one-to-many relationship a loan is connected with only one customer using
borrower and a customer is connected with more than one loans using borrower.
Many-to-one relationship
(Fig.: many-to-one mapping between entity sets A {A1..A4} and B {B1..B4}.)
An entity in A is associated with at most (only) one entity in B and an entity in B is
associated with any number (zero or more) of entities in A.
(Fig.: E-R diagram of customer (customer-name, customer-address) linked to loan by borrower.)
In a many-to-one relationship a loan is connected with more than one customer using
borrower and a customer is connected with only one loan using borrower.
Many-to-many relationship
(Fig.: many-to-many mapping between entity sets A {A1..A4} and B {B1..B4}.)
An entity in A is associated with any number (zero or more) of entities in B and an entity
in B is associated with any number (zero or more) of entities in A.
(Fig.: E-R diagram of customer (customer-name, customer-address) linked to loan by borrower.)
A customer is connected with more than one loan using borrower and a loan is
connected with more than one customer using borrower.
(Fig.: E-R diagram of customer (customer-name, customer-address) linked to loan by borrower.)
The primary key of a weak entity set is created by combining the primary key of the
strong entity set on which the weak entity set is existence dependent and the weak
entity set’s discriminator.
We underline the discriminator attribute of a weak entity set with a dashed line.
E.g. in below fig. there are two entities loan and payment in which loan is strong entity
set and payment is weak entity set.
Payment entity has payment-no which is discriminator.
Loan entity has loan-no as primary key.
So primary key for payment is (loan-no, payment-no).
(Fig.: loan, a strong entity set, linked to payment, a weak entity set with attributes payment-no and payment-date, by the loan-payment relationship.)
(Fig.: the account entity set as a superclass with its subclasses.)
For example, customer entities may be described further by the attribute customer-id,
credit-rating and employee entities may be described further by the attributes
employee-id and salary.
The process of designating sub groupings within an entity set is called specialization. The
specialization of person allows us to distinguish among persons according to whether
they are employees or customers.
Now again, employees may be further classified as one of the following:
officer
teller
secretary
Each of these employee types is described by a set of attributes that includes all the
attributes of entity set employee plus additional attributes.
For example, officer entities may be described further by the attribute office-number,
teller entities by the attributes station-number and hours-per-week, and secretary
entities by the attribute hours-per-week.
In terms of an E-R diagram, specialization is depicted by a triangle component labeled
ISA.
The label ISA stands for “is a” and represents, for example, that a customer “is a”
person.
The ISA relationship may also be referred to as a superclass subclass relationship.
Generalization
A bottom-up design process that combines a number of entity sets sharing the same
features into a higher-level entity set.
The design process proceeds in a bottom-up manner, in which multiple entity sets are
synthesized into a higher-level entity set on the basis of common features.
The database designer may have to first identify a customer entity set with the
attributes name, street, city, and customer-id, and an employee entity set with the
attributes name, street, city, employee-id, and salary.
But the customer entity set and the employee entity set have some attributes in common.
This commonality can be expressed by generalization, which is a containment relationship
that exists between a higher-level entity set and one or more lower-level entity sets.
In our example, person is the higher level entity set and customer and employee are
lower level entity sets.
Higher level entity set is called superclass and lower level entity set is called subclass. So
the person entity set is the superclass of two subclasses customer and employee.
Differences in the two approaches may be characterized by their starting point and
overall goal.
(Fig.: specialization hierarchy: person ISA employee (salary) or customer (credit-rating); employee ISA officer (officer-number), teller (station-number, hours-worked), or secretary (hours-worked).)
Explain types of constraints on specialization and
Generalization.
There are mainly two types of constraints that apply to a specialization/generalization:
Disjointness Constraint
Describes relationship between members of the superclass and subclass and indicates
whether member of a superclass can be a member of one, or more than one subclass.
It may be disjoint or non-disjoint (overlapping).
Disjoint Constraint
It specifies that the subclasses of the specialization must be disjointed (an entity can be
a member of only one of the subclasses of the specialization).
Specified by ‘d’ in EER diagram or by writing disjoint.
Non-disjoint (Overlapping)
It specifies that the same entity may be a member of more than one subclass of the
specialization.
Specified by ‘o’ in EER diagram or by writing overlapping.
Participation Constraint
Determines whether every member in super class must participate as a member of a
subclass or not.
It may be total (mandatory) or partial (optional).
Total (Mandatory)
Total specifies that every entity in the superclass must be a member of some subclass in
the specialization.
Specified by a double line in EER diagram.
Partial (Optional)
Partial specifies that an entity in the superclass need not belong to any subclass of the
specialization.
Specified by a single line in EER diagram.
Based on these two different kinds of constraints, a specialization or generalization can
be one of four types
Total, Disjoint
Total, Overlapping
Partial, Disjoint
Partial, Overlapping.
Explain aggregation in E-R diagram.
The E-R model cannot express relationships among relationships.
When we need to express such a relationship, aggregation is used.
Consider a database with information about employees who work on a particular
project and use a number of machines doing that work.
(Fig. A and Fig. B: employee (name, id) and project (number) related by work (hours); work related to machinery (id) by uses.)
Relationship sets work and uses could be combined into a single set. We can
combine them by using aggregation.
Aggregation is an abstraction through which relationships are treated as higher-level
entities.
For our example, we treat the relationship set work and the entity sets employee and
project as a higher-level entity set called work.
Transforming an E-R diagram with aggregation into tabular form is easy. We create
a table for each entity and relationship set as before.
The table for relationship set uses contains a column for each attribute in the primary
key of machinery and work.
(Fig.: Person entity with attributes PersonID, Name, Address, Email, and multi-valued Phone.)
The initial relational schema is expressed in the following format, writing the table
name with the attribute list inside parentheses, as shown below:
Persons( personid, name, address, email )
Persons and Phones are tables, and personid, name, address and email are columns
(attributes).
personid is the primary key for the table Persons.
Step 2: Multi-Valued Attributes
A multi-valued attribute is usually represented with a double-line oval.
(Fig.: Person entity with its multi-valued attribute Phone shown as a double-line oval.)
If you have a multi-valued attribute, take that multi-valued attribute and turn it into a
new entity or table of its own.
Then make a 1:N relationship between the new entity and the existing one.
In simple words.
1. Create a table for that multi-valued attribute.
2. Add the primary (id) column of the parent entity as a foreign key within
the new table as shown below:
First table is Persons ( personid, name, lastname, email )
Second table is Phones ( phoneid , personid, phone )
personid within the table Phones is a foreign key referring to the personid of Persons
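The two-table mapping above can be sketched concretely with SQLite (an illustrative choice of DBMS; the sample values are made up):

```python
import sqlite3

# Sketch: mapping the multi-valued Phone attribute to its own table.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Persons (personid INTEGER PRIMARY KEY, name TEXT,"
             " lastname TEXT, email TEXT)")
conn.execute("""CREATE TABLE Phones (
    phoneid INTEGER PRIMARY KEY,
    personid INTEGER REFERENCES Persons(personid),
    phone TEXT)""")
conn.execute("INSERT INTO Persons VALUES (1, 'Riya', 'Shah', 'riya@example.com')")
# one person, many phone numbers: one row per number in Phones
conn.executemany("INSERT INTO Phones VALUES (?, ?, ?)",
                 [(1, 1, '9879898798'), (2, 1, '55416')])
count = conn.execute("SELECT COUNT(*) FROM Phones WHERE personid = 1").fetchone()[0]
print(count)  # 2
```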
Step 3: 1:1 Relationship
(Fig.: Person (PersonID, Name, Address, Email, Phone) in a 1:1 Have relationship with Wife (WifeID, Name).)
Let us consider the case where the Person has one wife. You can place the primary key
of the Wife table, wifeid, in the table Persons, which in this case we call a foreign key, as
shown below.
Persons( personid, name, lastname, email , wifeid )
Wife ( wifeid , name )
Or vice versa to put the personid as a foreign key within the wife table as shown below:
Persons( personid, name, lastname, email )
Wife ( wifeid , name , personid)
For cases when the Person is not married, i.e. has no wifeid, the attribute can be set to
NULL.
Step 4: 1:N Relationships
(Fig.: Person (PersonID, Name, Address, Email, Phone) in a 1:N Has relationship with House (HouseID, Name, Address).)
For instance, the Person can have a House from zero to many, but a House can have
only one Person.
In such a relationship, place the primary key attribute of the table on the '1' side into
the table on the 'N' (many) side as a foreign key.
To represent this relationship, the personid of the parent table (Person) must be placed
within the child table (House) as a foreign key.
It should convert to :
Persons( personid, name, address, email )
House ( houseid, name , address, personid)
Step 5: N:N Relationships
(Fig.: Person (PersonID, Name, Address, Email, Phone) in an N:N Has relationship with Country (CountryID, Name).)
For instance, The Person can live or work in many countries. Also, a country can have
many people. To express this relationship within a relational schema we use a separate
table as shown below:
It should convert into :
Persons( personid, name, lastname, email )
Countries ( countryid, name, code)
HasRelat ( hasrelatid, personid , countryid)
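The three-table N:N mapping above can be sketched with SQLite (illustrative DBMS and sample data):

```python
import sqlite3

# Sketch: N:N mapping via the separate relationship table HasRelat.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Persons (personid INTEGER PRIMARY KEY, name TEXT,"
             " lastname TEXT, email TEXT)")
conn.execute("CREATE TABLE Countries (countryid INTEGER PRIMARY KEY, name TEXT, code TEXT)")
conn.execute("""CREATE TABLE HasRelat (
    hasrelatid INTEGER PRIMARY KEY,
    personid INTEGER REFERENCES Persons(personid),
    countryid INTEGER REFERENCES Countries(countryid))""")
conn.execute("INSERT INTO Persons VALUES (1, 'Riya', 'Shah', 'riya@example.com')")
conn.execute("INSERT INTO Countries VALUES (10, 'India', 'IN'), (20, 'Japan', 'JP')")
# one person linked to many countries through the relationship table
conn.execute("INSERT INTO HasRelat VALUES (100, 1, 10), (101, 1, 20)")
rows = conn.execute("""SELECT c.name FROM Persons p
    JOIN HasRelat h ON p.personid = h.personid
    JOIN Countries c ON h.countryid = c.countryid
    WHERE p.name = 'Riya' ORDER BY c.name""").fetchall()
print([r[0] for r in rows])  # ['India', 'Japan']
```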
What is E-R (Entity-Relationship) model (diagram)? Also
draw various symbols used in an E-R diagram.
E-R model
Entity-relationship (ER) diagram is a graphical representation of entities and their
relationships to each other with their attributes.
4 – Relational Database Design
Example
Consider the relation Account(ano, balance, bname). In this relation, ano determines
balance and bname. So, there is a functional dependency from ano to balance and
bname.
This can be denoted by ano → {balance, bname}.
Account:
ano balance bname
Types of Functional Dependencies
Full Dependency
In a relation, the attribute B is fully functional dependent on A if B is functionally
dependent on A, but not on any proper subset of A.
Eg. {Roll_No, Department_Name}→SPI
Partial Dependency
In a relation, the attribute B is partially functionally dependent on A if B is functionally
dependent on A as well as on some proper subset of A.
That is, some attribute can be removed from A and the dependency still holds.
Eg. {Enrollment_No, Department_Name}→SPI
Transitive Dependency
In a relation, if attribute(s) A→B and B→C, then C is transitively depends on A via B
(provided that A is not functionally dependent on B or C).
Eg. Staff_No→Branch_No and Branch_No→Branch_Address
Trivial FD:
X→Y is a trivial FD if Y is a subset of X
Eg.{Roll_No, Department_Name}→Roll_No
Nontrivial FD
X→Y is a nontrivial FD if Y is not a subset of X
Eg. {Roll_No, Department_Name}→Student_Name
Use of functional dependency in database design
To test relations to see if they are legal under a given set of functional dependencies. If a
relation r is legal under a set F of functional dependencies, we say that r satisfies F.
To specify constraints on the set of legal relations.
List and explain Armstrong's axioms (inference rules).
Let A, B, and C be subsets of the attributes of the relation R.
Reflexivity
If B is a subset of A then A → B
Augmentation
If A → B then AC → BC
Transitivity
If A → B and B → C then A → C
Self-determination
A→A
Decomposition
If A → BC then A → B and A → C
Union
If A → B and A → C then A → BC
Composition
If A → B and C → D then A,C → BD
What is redundant functional dependency? Write an algorithm
to find out redundant functional dependency.
Redundant functional dependency
A FD in the set is redundant, if it can be derived from the other FDs in the set.
Algorithm
Input: Let F be a set of FDs for relation R.
Let f : A → B be a FD to be examined for redundancy.
Steps:
1. F' = F – f # find out new set of FDs by removing f from F
2. T = A # set T = determinant of A → B
3. For each FD: X → Y in F' Do
If X ⊆ T Then # if X is contained in T
T = T ∪ Y # add Y to T
End If
End For
4. If B ⊆ T Then # if B is contained in T
f : A → B is redundant. # given FD f : A → B is redundant.
End If
Output: Decision whether a given FD f : A → B is redundant or not.
Example
Suppose a relation R is given with attributes A, B, C, D and E.
Also, a set of functional dependencies F is given with following FDs.
F = {A →B, C →D, BD → E, AC → E}
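The redundancy test above can be sketched in Python (the helper names closure and is_redundant are my own, not part of any DBMS; attributes are single letters):

```python
# Illustrative helpers implementing the algorithm above.
def closure(attrs, fds):
    """Closure of an attribute set under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_redundant(fd, fds):
    rest = [g for g in fds if g != fd]      # step 1: F' = F - f
    lhs, rhs = fd
    return set(rhs) <= closure(lhs, rest)   # steps 2-4: B contained in closure of A under F'

F = [("A", "B"), ("C", "D"), ("BD", "E"), ("AC", "E")]
print(is_redundant(("AC", "E"), F))  # True: AC → E follows from A → B, C → D, BD → E
print(is_redundant(("A", "B"), F))   # False
```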
Account_Branch
Ano Balance Bname Baddress
A01 5000 Vvn Mota bazaar, VVNagar
A02 6000 Ksad Chhota bazaar, Karamsad
A03 7000 Anand Nana bazaar, Anand
A04 8000 Ksad Chhota bazaar, Karamsad
A05 6000 Vvn Mota bazaar, VVNagar
This relation can be divided into two different relations
1. Account (Ano, Balance, Bname)
2. Branch (Bname, Baddress)
These two relations are shown in below figure
Account
Ano Balance Bname
A01 5000 Vvn
A02 6000 Ksad
A03 7000 Anand
A04 8000 Ksad
A05 6000 Vvn

Branch
Bname Baddress
Vvn Mota bazaar, VVNagar
Ksad Chhota bazaar, Karamsad
Anand Nana bazaar, Anand
A decomposition of relation can be either lossy decomposition or lossless
decomposition.
There are two types of decomposition
1. lossy decomposition
2. lossless decomposition (non-loss decomposition)
Lossy Decomposition
The decomposition of relation R into R1 and R2 is lossy when the join of R1 and R2
does not yield the same relation as in R.
This is also referred as lossy-join decomposition.
The disadvantage of such kind of decomposition is that some information is lost during
retrieval of original relation. And so, such kind of decomposition is referred as lossy
decomposition.
From practical point of view, decomposition should not be lossy decomposition.
Example
A figure shows a relation Account. This relation is decomposed into two relations
Acc_Bal and Bal_Branch.
Now, when these two relations are joined on the common attribute Balance, the
resultant relation will look like Acct_Joined. This Acct_Joined relation contains rows in
addition to those in original relation Account.
Here, it is not possible to specify to which branch account A01 or A02 belongs.
So, information has been lost by this decomposition and then join operation.
Account
Ano Balance Bname
A01 5000 Vvn
A02 5000 Ksad

Acct_Bal
Ano Balance
A01 5000
A02 5000

Bal_Branch
Balance Bname
5000 Vvn
5000 Ksad

Acct_Joined (not the same as Account)
Ano Balance Bname
A01 5000 Vvn
A01 5000 Ksad
A02 5000 Vvn
A02 5000 Ksad
In other words, a decomposition of R into R1 and R2 is lossy if, when we combine (join)
R1 and R2 over the common attribute X, we cannot get back the original table R, where
R is the original relation, R1 and R2 are the decomposed relations, and X is a common
attribute between these two relations.
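The lossy join above can be reproduced with plain Python sets (a toy sketch of projection and natural join):

```python
# Toy reproduction of the lossy decomposition above.
account = {("A01", 5000, "Vvn"), ("A02", 5000, "Ksad")}   # original Account relation
acct_bal = {(ano, bal) for ano, bal, _ in account}        # projection on (Ano, Balance)
bal_branch = {(bal, bn) for _, bal, bn in account}        # projection on (Balance, Bname)

# Natural join on the common attribute Balance.
acct_joined = {(ano, bal, bn) for ano, bal in acct_bal
               for bal2, bn in bal_branch if bal == bal2}
print(len(account), len(acct_joined))  # 2 4: two spurious rows, so the decomposition is lossy
```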
What is an anomaly in database design? How it can be solved.
Anomaly in database design:
Anomalies are problems that can occur in a poorly planned, un-normalized database
where all the data are stored in one table.
There are three types of anomalies that can arise in the database because of
redundancy:
Insert anomalies
Delete anomalies
Update / Modification anomalies
Consider a relation emp_dept (E#, Ename, Address, D#, Dname, Dmgr#) with E# as a
primary key.
Insert anomaly:
Let us assume that a new department has been started by the organization but initially
there is no employee appointed for that department, then the tuple for this department
cannot be inserted in to this table as the E# will have NULL value, which is not allowed
because E# is primary key.
This kind of problem in the relation where some tuple cannot be inserted is known as
insert anomaly.
Delete anomaly:
Now consider there is only one employee in some department and that employee
leaves the organization, then the tuple of that employee has to be deleted from the
table, but in addition to that information about the department also will be deleted.
This kind of problem in the relation where deletion of some tuples can lead to loss of
some other data not intended to be removed is known as delete anomaly.
Update / Modification anomaly:
Suppose the manager of a department has changed, this requires that the Dmgr# in all
the tuples corresponding to that department must be changed to reflect the new status.
If we fail to update all the tuples of the given department, then two different records of
employees working in the same department might show different Dmgr#, leading to
inconsistency in the database.
This kind of problem is known as update or modification anomaly.
How anomalies in database design can be solved:
Such type of anomalies in database design can be solved by using normalization.
What is normalization? What is the need of it? OR
What is normalization? Why normalization process is needed?
Normalization
Database normalization is the process of removing redundant data from the given tables
to improve storage efficiency, data integrity, and scalability.
In the relational model, methods exist for quantifying how efficient a database is. These
classifications are called normal forms (or NF), and there are algorithms for converting a
given database between them.
Normalization generally involves splitting existing tables into multiple ones, which must
be re-joined or linked each time a query is issued.
Need of Normalization
Eliminates redundant data
Reduces chances of data errors
Reduces disk space
Improve data integrity, scalability and data consistency.
Explain different types of normal forms with example. OR
Explain 1NF, 2NF, 3NF, BCNF, 4NF and 5NF with example.
1NF
A relation R is in first normal form (1NF) if and only if all underlying domains contain
atomic values only. OR
A relation R is in first normal form (1NF) if and only if it does not contain any composite
or multi valued attributes or their combinations.
Example
Cid Name Address (Society, City) Contact_no
C01 Riya SaralSoc, Anand 9879898798, 55416
C02 Jiya Birla Gruh, Rajkot 9825098254
The above relation has four attributes Cid, Name, Address, Contact_no. Here Address is a
composite attribute which is further divided into the sub-attributes Society and City.
Another attribute, Contact_no, is a multi-valued attribute which can store more than one
value. So the above relation is not in 1NF.
Problem
Suppose we want to find all customers of some particular city; this is difficult to
retrieve, because the city name is combined with the society name and stored as a
whole in Address.
Solution
Insert a separate attribute for each sub-attribute of the composite attribute.
Insert a separate attribute for each value of the multi-valued attribute, storing one
value in one attribute and the other in the other attribute.
So the above table can be created as follows:
Customer (Cid, Name, Society, City, Contact_no1, Contact_no2)
2NF
A relation R is in second normal form (2NF) if and only if it is in 1NF and every non-
prime attribute is fully functionally dependent on the primary key. OR
A relation R is in second normal form (2NF) if and only if it is in 1NF and no non-prime
attribute is partially dependent on the primary key.
Example
cid ano acess_date balance bname
The above relation has five attributes cid, ano, acess_date, balance, bname and two FDs:
FD1: {cid, ano} → {acess_date, balance, bname} and
FD2: ano → {balance, bname}
We have {cid, ano} as the primary key. As per FD2, balance and bname depend only on
ano, not cid. In the above table balance and bname are not fully dependent on the
primary key; these attributes are partially dependent on the primary key. So the above
relation is not in 2NF.
Problem
For example, in case of a joint account multiple customers share a common account. If
some account, say 'A02', is held jointly by two customers, say 'C02' and 'C04', then the
values of the attributes balance and bname will be duplicated in the two different
tuples of customers 'C02' and 'C04'.
Solution
Decompose the relation in such a way that the resultant relations do not have any partial FD.
For this purpose, remove the partially dependent attributes that violate 2NF from the relation.
Place them in a separate new relation along with the prime attribute on which they are
fully dependent.
The primary key of the new relation will be the attribute on which they are fully dependent.
Keep the other attributes the same as in that table with the same primary key.
So the above table can be decomposed as follows:
(cid, ano, acess_date) and (ano, balance, bname)
3NF
A relation R is in third normal form (3NF) if and only if it is in 2NF and every non-key
attribute is non-transitively dependent on the primary key.
An attribute C is transitively dependent on attribute A if there exists an attribute B such
that A → B and B → C.
Example
ano balance bname baddress
The above relation has four attributes ano, balance, bname, baddress and two FDs:
FD1: ano → {balance, bname, baddress} and
FD2: bname → baddress
So from FD1 and FD2, using the transitivity rule, we get ano → baddress.
So there is a transitive dependency from ano to baddress via bname, in which
baddress is a non-prime attribute.
So there is a non-prime attribute, baddress, which is transitively dependent on the primary
key ano.
So the above relation is not in 3NF.
Problem
Transitive dependency results in data redundancy.
In this relation the branch address will be stored repeatedly for each account of the same
branch, which occupies more space.
Solution
Decompose relation in such a way that resultant relation does not have any non-prime
attribute that are transitively dependent on primary key.
For this purpose remove transitively dependent attribute that violets 3NF from relation.
Place them in separate new relation along with the non-prime attribute due to which
transitive dependency occurred. The primary key of new relation will be this non-prime
attribute.
Keep other attributes same as in that table with same primary key.
So above table can be decomposed as per following.
bname baddress
BCNF
A relation R is in BCNF if and only if it is in 3NF and no prime attribute is transitively
dependent on the primary key.
An attribute C is transitively dependent on attribute A if there exists an attribute B such
that A → B and B → C.
Example
Student_Project
Student Language Guide
Mita JAVA Patel
Nita VB Shah
Sita JAVA Jadeja
Gita VB Dave
Rita VB Shah
Nita JAVA Patel
Mita VB Dave
Rita JAVA Jadeja
student language guide
The above relation has three attributes student, language, guide and two FDs:
FD1: {student, language} → guide and
FD2: guide → language
Here the prime attribute language depends on guide, which is not a candidate key; that
is, language is transitively dependent on the primary key {student, language} via guide.
So there is a prime attribute, language, which is transitively dependent on the primary
key.
So the above relation is not in BCNF.
Problem
Transitive dependency results in data redundancy.
In this relation the language of each guide is stored repeatedly, once for every student
of that guide, which occupies more space.
Solution
Decompose the relation in such a way that the resultant relations do not have any prime
attribute transitively dependent on the primary key.
For this purpose, remove the transitively dependent prime attribute that violates BCNF
from the relation. Place it in a separate new relation along with the attribute on which it
depends, which becomes the primary key of the new relation.
So the above table can be decomposed as follows:
(student, guide) and (guide, language)
4NF
Student_Info
Student_Id Subject Activity
100 Music Swimming
100 Accounting Swimming
100 Music Tennis
100 Accounting Tennis
150 Math Jogging
Note that all three attributes make up the Primary Key.
Note that Student_Id can be associated with many subjects as well as many activities
(a multi-valued dependency).
Suppose student 100 signs up for skiing. Then we would insert (100, Music, Skiing). This
row implies that student 100 skis as a Music student but not as an Accounting student, so
in order to keep the data consistent we must add one more row (100, Accounting,
Skiing). This is an insertion anomaly.
Suppose we have a relation R with a multivalued dependency X →→ Y. The MVD can
be removed by decomposing R into R1(R − Y) and R2(X ∪ Y).
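The decomposition rule can be checked on the Student_Info rows above with plain Python sets (a toy sketch):

```python
# Removing the MVD Student_Id ->> Subject by decomposition, then checking losslessness.
student_info = {
    (100, "Music", "Swimming"), (100, "Accounting", "Swimming"),
    (100, "Music", "Tennis"),   (100, "Accounting", "Tennis"),
    (150, "Math", "Jogging"),
}
subjects = {(sid, sub) for sid, sub, _ in student_info}     # R2 = X U Y
activities = {(sid, act) for sid, _, act in student_info}   # R1 = R - Y
rejoined = {(sid, sub, act) for sid, sub in subjects
            for sid2, act in activities if sid == sid2}
print(rejoined == student_info)  # True: the decomposition is lossless
```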
Here are the normalized tables:

P1
Agent Company
Suneet ABC
Suneet CDE
Raj ABC

P2
Agent Product
Suneet Nut
Suneet Bolt
Raj Bolt
Raj Nut

P3
Company Product
ABC Nut
ABC Bolt
CDE Bolt
Now if we perform a natural join between any two of the above relations (tables), i.e.
P1 ⋈ P2, P2 ⋈ P3 or P1 ⋈ P3, then spurious (extra) rows are added, so this decomposition
is called a lossy decomposition.
But if we perform natural join between the above three relations then no spurious
(extra) rows are added so this decomposition is called lossless decomposition.
So above three tables P1, P2 and P3 are in 5 NF.
P1 ⋈ P2
Agent Company Product
Suneet ABC Nut
Suneet ABC Bolt
Suneet CDE Nut
Suneet CDE Bolt
Raj ABC Bolt
Raj ABC Nut

P1 ⋈ P2 ⋈ P3
Agent Company Product
Suneet ABC Nut
Suneet ABC Bolt
Suneet CDE Bolt
Raj ABC Nut
Raj ABC Bolt
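The join dependency can be verified mechanically with Python sets (a toy sketch of the natural joins above):

```python
# A toy check of the join dependency among P1, P2, P3 from the tables above.
p1 = {("Suneet", "ABC"), ("Suneet", "CDE"), ("Raj", "ABC")}                    # (Agent, Company)
p2 = {("Suneet", "Nut"), ("Suneet", "Bolt"), ("Raj", "Bolt"), ("Raj", "Nut")}  # (Agent, Product)
p3 = {("ABC", "Nut"), ("ABC", "Bolt"), ("CDE", "Bolt")}                        # (Company, Product)

# Natural join P1 with P2 on Agent, then with P3 on (Company, Product).
p1_p2 = {(a, c, p) for a, c in p1 for a2, p in p2 if a == a2}
all_three = {(a, c, p) for a, c, p in p1_p2 if (c, p) in p3}

# The pairwise join adds a spurious row, e.g. (Suneet, CDE, Nut);
# the three-way join eliminates it.
print(len(p1_p2), len(all_three))
```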
Normalize (decompose) following relation into lower to higher
normal form. (From 1NF to 4 NF)
OR
PLANT MANAGER MACHINE SUPPLIER_NAME SUPPLIER_CITY
Plant-A Ravi Lathe Jay industry Ahmedabad
Plant-A Ravi Boiler Abb aplliance Surat
Plant-B Meena Cutter Raj machinery Vadodara
Plant-B Meena Boiler Daksh industry Rajkot
Plant-B Meena CNC Jay industry Ahmedabad
Explain with suitable example, the process of normalization
converting from 1NF to 3NF.
1 Normal Form (1NF)
PLANT_ID PLANT_NAME MANAGER_ID MANAGER_NAME MACHINE_ID MACHINE_NAME SUPPLIER_ID SUPPLIER_NAME SUPPLIER_CITY
P1 Plant-A E1 Ravi M1 Lathe S1 Jay industry Ahmedabad
P1 Plant-A E1 Ravi M2 Boiler S2 Abb aplliance Surat
P2 Plant-B E2 Meena M3 Cutter S3 Raj machinery Vadodara
P2 Plant-B E2 Meena M2 Boiler S4 Daksh industry Rajkot
P2 Plant-B E2 Meena M4 CNC S1 Jay industry Ahmedabad
2 Normal Form (2NF)
Table-1
PLANT_ID PLANT_NAME MANAGER_ID MANAGER_NAME
P1 Plant-A E1 Ravi
P2 Plant-B E2 Meena
Table-2
PLANT_ID MACHINE_ID MACHINE_NAME
P1 M1 Lathe
P1 M2 Boiler
P2 M3 Cutter
P2 M2 Boiler
P2 M4 CNC
Table-3
MANAGER_ID SUPPLIER_ID SUPPLIER_NAME SUPPLIER_CITY
E1 S1 Jay industry Ahmedabad
E1 S2 Abb aplliance Surat
E2 S3 Raj machinery Vadodara
E2 S4 Daksh industry Rajkot
E2 S1 Jay industry Ahmedabad
3 Normal Form or BCNF (3NF or BCNF)
Table-1
PLANT_ID PLANT_NAME
P1 Plant-A
P2 Plant-B
Table-2
MANAGER_ID MANAGER_NAME
E1 Ravi
E2 Meena
Table-3
PLANT_ID MANAGER_ID
P1 E1
P2 E2
Table-4
MACHINE_ID MACHINE_NAME
M1 Lathe
M2 Boiler
M3 Cutter
M4 CNC
Table-5
PLANT_ID MACHINE_ID
P1 M1
P1 M2
P2 M3
P2 M2
P2 M4
Table-6
SUPPLIER_ID SUPPLIER_NAME SUPPLIER_CITY
S1 Jay industry Ahmedabad
S2 Abb aplliance Surat
S3 Raj machinery Vadodara
S4 Daksh industry Rajkot
Table-7
MANAGER_ID SUPPLIER_ID
E1 S1
E1 S2
E2 S3
E2 S4
E2 S1
4 Normal Form (4NF)
Table-1
PLANT_ID PLANT_NAME
P1 Plant-A
P2 Plant-B
Table-2
MANAGER_ID MANAGER_NAME
E1 Ravi
E2 Meena
Table-3
MACHINE_ID MACHINE_NAME
M1 Lathe
M2 Boiler
M3 Cutter
M4 CNC
Table-4
PLANT_MACHINE_ID PLANT_ID MACHINE_ID
PM1 P1 M1
PM2 P1 M2
PM3 P2 M3
PM4 P2 M2
PM5 P2 M4
Table-5
PLANT_MACHINE_ID MANAGER_ID
PM1 E1
PM2 E1
PM3 E2
PM4 E2
PM5 E2
Table-6
SUPPLIER_ID SUPPLIER_NAME SUPPLIER_CITY
S1 Jay industry Ahmedabad
S2 Abb aplliance Surat
S3 Raj machinery Vadodara
S4 Daksh industry Rajkot
Table-7
MANAGER_ID SUPPLIER_ID
E1 S1
E1 S2
E2 S3
E2 S4
E2 S1
result before step 2 is AB and after step 2 is ABCE, which is different, so repeat step 2.
Step-3: Second loop
result = ABCE # for A → BC, A ⊆ result so result = result ∪ BC
result = ABCEF # for E → CF, E ⊆ result so result = result ∪ CF
result = ABCEF # for B → E, B ⊆ result so result = result ∪ E
result = ABCEF # for CD → EF, CD ⊄ result so result is unchanged
result before step 3 is ABCE and after step 3 is ABCEF, which is different, so repeat step 3.
Step-4: Third loop
result = ABCEF # for A → BC, A ⊆ result so result = result ∪ BC
result = ABCEF # for E → CF, E ⊆ result so result = result ∪ CF
result = ABCEF # for B → E, B ⊆ result so result = result ∪ E
result = ABCEF # for CD → EF, CD ⊄ result so result is unchanged
result before step 4 is ABCEF and after step 4 is ABCEF, which is the same, so stop.
So the closure {A, B}+ is {A, B, C, E, F}.
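The whole closure computation can be sketched as a small Python function (an illustrative helper, not DBMS code; attributes are single letters):

```python
# Illustrative helper: closure of an attribute set under a list of (lhs, rhs) FDs.
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:               # repeat until a full pass adds nothing (steps 2-4 above)
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

fds = [("A", "BC"), ("E", "CF"), ("B", "E"), ("CD", "EF")]
print(sorted(closure("AB", fds)))  # ['A', 'B', 'C', 'E', 'F']
```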
State and explain Heath’s Theorem.
Heath’s theorem
A relation R(X, Y, Z) that satisfies a functional dependency X → Y can always be non-loss
decomposed into its projections R1=∏XY(R) and R2=∏XZ(R).
Proof.
First we show that R ⊆ ∏XY(R) ⋈X ∏XZ(R). This actually holds for any relation; R does
not have to satisfy X → Y.
Assume r ∈R. We need to show r ∈∏XY(R) ⋈X∏XZ(R).
Since r ∈R, r(X,Y) ∈∏XY(R) and r(X,Z) ∈∏XZ(R).
Since r(X,Y) and r(X,Z) have the same value for X, their join r(X,Y,Z) = r is in
∏XY(R)⋈X∏XZ(R).
5 – Query Processing & Optimization
(Fig.: the execution plan is passed to the query code generator, which produces the code to execute the query and yields the result of the query.)
If a runtime error results, an error message is generated by the runtime database
processor.
Query code generator will generate code for query.
Runtime database processor will select optimal plan and execute query and gives result.
Explain different search algorithm for selection operation. OR
Explain linear search and binary search algorithm for selection
operation.
There are two scan algorithms to implement the selection operation:
1. Linear search
2. Binary search
Linear search
In a linear search, the system scans each file block and tests all records to see whether
they satisfy the selection condition.
For a selection on a key attribute, the system can terminate the scan as soon as the
required record is found, without looking at the other records of the relation.
The cost of linear search in terms of the number of I/O operations is br, where br is the
number of blocks in the file.
Selection on a key attribute has an average cost of br/2.
Linear search may be slower than other algorithms.
However, it can be applied to any file, regardless of the ordering of the file, the
availability of indices, or the nature of the selection operation.
Binary search
If the file is ordered on an attribute and the selection condition is an equality
comparison on that attribute, we can use a binary search to locate the records that
satisfy the condition.
The number of blocks that need to be examined to find a block containing the required
record is ⌈log2(br)⌉.
If the selection is on a non-key attribute, more than one block may contain required
records, and the cost of reading the extra blocks has to be added to the cost estimate.
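The cost formulas above can be written out directly; br = 400 below is a hypothetical file size, not one from the text:

```python
import math

def linear_search_cost(br):
    """Full scan: every block of the file is read."""
    return br

def linear_search_cost_key(br):
    """Key attribute: the scan stops at the match, so on average br/2 blocks."""
    return br / 2

def binary_search_cost(br):
    """File sorted on the selection attribute: ceil(log2(br)) block reads."""
    return math.ceil(math.log2(br))

br = 400  # hypothetical file of 400 blocks
print(linear_search_cost(br), linear_search_cost_key(br), binary_search_cost(br))
# 400 200.0 9
```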
Explain various steps involved in query evaluation.
[Figure: operator tree for the query — ∏customer-name at the root, below it a join of customer with σbalance<2500(account)]
In our example, there is only one such operation, selection operation on account.
The inputs to the lowest level operation are relations in the database.
We execute these operations and we store the results in temporary relations.
We can use these temporary relations to execute the operation at the next level up in
the tree, where the inputs now are either temporary relations or relations stored in the
database.
In our example the inputs to join are the customer relation and the temporary relation
created by the selection on account.
The join can now be evaluated, creating another temporary relation.
By repeating the process, we will finally evaluate the operation at the root of the tree,
giving the final result of the expression.
In our example, we get the final result by executing the projection operation at the root
of the tree, using as input the temporary relation created by the join. Evaluation just
described is called materialized evaluation, since the results of each intermediate
operation are created and then are used for evaluation of the next level operations.
The cost of a materialized evaluation is not simply the sum of the costs of the operations
involved. To compute the cost of evaluating an expression, we add the costs of all the
operations as well as the cost of writing intermediate results to disk.
The disadvantage of this method is that it will create temporary relation (table) and that
relation is stored on disk which consumes space on disk.
It evaluates one operation at a time, starting at the lowest level.
Pipelining
We can reduce the number of temporary files that are produced by combining several
relational operations into a pipeline of operations, in which the result of one operation is
passed along to the next operation in the pipeline. Combining operations into a pipeline
eliminates the cost of reading and writing temporary relations.
In this method, several operations are evaluated simultaneously in a pipeline, with the
result of one operation passed on to the next without being stored in a temporary relation.
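A minimal sketch of pipelined evaluation using Python generators, with made-up account and depositor tuples; each operator yields tuples to the one above it instead of materializing a temporary relation:

```python
account = [("A-101", "downtown", 500), ("A-102", "uptown", 3000), ("A-103", "midtown", 1200)]
depositor = [("jones", "A-101"), ("smith", "A-102"), ("hayes", "A-103")]

def select_balance_lt(rows, limit):
    for acc_no, branch, balance in rows:
        if balance < limit:
            yield (acc_no, branch, balance)   # pass each qualifying tuple upward

def join_on_account(accounts, deps):
    deps = list(deps)                         # inner input is materialized once
    for acc in accounts:                      # outer input is consumed lazily
        for name, acc_no in deps:
            if acc_no == acc[0]:
                yield (name,) + acc

def project_name(rows):
    for row in rows:
        yield row[0]

# selection, join and projection run as one pipeline; no temporary relation
result = list(project_name(join_on_account(select_balance_lt(account, 2500), depositor)))
print(result)  # ['jones', 'hayes']
```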
[Figure: two equivalent operator trees over account and depositor, each applying the selection σbranch-city='pune']
To choose from the different query evaluation plans, the optimizer has to estimate the
cost of each evaluation plan.
The optimizer uses statistical information about the relations, such as relation size and
index depth, to make a good estimate of the cost of a plan.
Explain transformation of relational expression to equivalent
relational expression.
Two relational algebra expressions are said to be equivalent (same) if, on every legal
database instance, the two expressions give the same set of tuples (records).
The order of the records may differ, but the number of records must be the same.
Equivalence rules
An equivalence rule says that expressions of two forms are the same.
We can replace an expression of the first form by an expression of the second form.
The optimizer uses equivalence rules to transform expressions into other logically
equivalent expressions.
We use
θ1, θ2, θ3 and so on to denote condition
L1, L2, L3 and so on to denote list of attributes (columns)
E1, E2, E3 and so on to denote relational algebra expression.
Rule 1
A conjunctive selection operation can be divided into a sequence of individual selections.
This transformation is called a cascade of σ.
σθ1∧θ2(E) = σθ1(σθ2(E))
Rule 2
Selection operations are commutative.
σθ1(σθ2(E)) = σθ2(σθ1(E))
Rule 3
If more than one projection operation is used in an expression, only the outermost
projection operation is required; all the inner projection operations can be skipped.
∏L1 (∏L2 (… (∏Ln (E))…)) = ∏L1 (E)
Rule 4
A selection operation can be combined with a Cartesian product to form a theta join.
σθ (E1 × E2) = E1 ⋈θ E2
σθ1 (E1 ⋈θ2 E2) = E1 ⋈θ1∧θ2 E2
Rule 5
Theta join operations are commutative.
E1 ⋈θ E2 = E2 ⋈θ E1
Rule 6
Natural join operations are associative.
(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)
Theta join operations are associative in the following manner:
(E1 ⋈θ1 E2) ⋈θ2∧θ3 E3 = E1 ⋈θ1∧θ3 (E2 ⋈θ2 E3), where θ2 involves attributes from only E2 and E3.
Merge the runs (N-way merge). We assume (for now) that N < M.
1) Use N blocks of memory to buffer input runs, and 1 block to
buffer output. Read the first block of each run into its buffer page
2) repeat
I. Select the first record (in sort order) among all buffer
pages
II. Write the record to the output buffer. If the output buffer
is full write it to disk.
III. Delete the record from its input buffer page.
If the buffer page becomes empty then read the next block
(if any) of the run into the buffer.
3) until all input buffer pages are empty
If N ≥ M, several merge passes are required.
In each pass, contiguous groups of M - 1 runs are merged.
A pass reduces the number of runs by a factor of M -1, and creates runs longer
by the same factor.
E.g. If M=11, and there are 90 runs, one pass reduces the number of runs
to 9, each 10 times the size of the initial runs
Repeated passes are performed till all runs have been merged into one.
The output of the merge stage is the sorted relation.
The output file is buffered to reduce the number of disk operations.
If the relation is much larger than memory there may be M or more runs generated in
the first stage and it is not possible to allocate a page frame for each run during the
merge stage.
In this case merge operation proceed in multiple passes.
Since there is enough memory for M-1 input buffer pages each merge can take M-1 runs
as input.
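The merge loop above is exactly what Python's heapq.merge performs; each sorted list below stands in for an on-disk run, and the front of each list plays the role of its input buffer page:

```python
import heapq

# Three sorted runs produced by the run-generation stage of external sort-merge.
runs = [[1, 4, 9], [2, 3, 8], [5, 6, 7]]

# N-way merge: repeatedly pick the smallest record among the run fronts
# and append it to the output, until every run is exhausted.
merged = list(heapq.merge(*runs))
print(merged)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```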
Explain the measures of query cost, selection operation and
join.
OR
Explain the measures of finding out the cost of a query in query
processing.
Measures of query cost
The cost of query evaluation can be measured in terms of a number of different
resources, including disk accesses, CPU time to execute a query, and, in a distributed or
parallel database system, the cost of communication.
The response time for a query evaluation plan, i.e. the time required to execute the plan
(assuming no other activity is going on) on the computer, would account for all these
activities.
In large database systems, however, disk accesses are usually the most important cost,
since disk accesses are slow compared to in-memory operations.
Moreover, CPU speeds have been improving much faster than disk speeds.
Therefore, it is likely that the time spent on disk activity will continue to dominate the
total time to execute a query.
Estimating the CPU time is relatively hard, compared to estimating the disk-access cost.
Therefore, disk-access cost is a reasonable measure of the cost of a query evaluation plan.
Disk access is the predominant cost (in terms of time) and is relatively easy to estimate;
therefore, the number of block transfers from/to disk is typically used as the measure.
We use the number of block transfers from disk as a measure of actual cost.
To simplify our computation, we assume that all transfers of blocks have the same cost.
To get more precise numbers, we need to distinguish between sequential I/O, where the
blocks read are contiguous on disk, and random I/O, where the blocks are non-contiguous
and an extra seek cost must be paid for each disk I/O operation.
We also need to distinguish between reads and writes of blocks, since it takes more time
to write a block to disk than to read a block from disk.
Selection Operation
There are two scan algorithms to implement the selection operation:
1. Linear search
2. Binary search
Linear search
In a linear search, the system scans each file block and tests all records to see whether
they satisfy the selection condition.
For a selection on a key attribute, the system can terminate the scan as soon as the
required record is found, without looking at the other records of the relation.
The cost of linear search in terms of the number of I/O operations is br, where br is the
number of blocks in the file.
Selection on a key attribute has an average cost of br/2.
Linear search may be slower than other algorithms.
However, it can be applied to any file, regardless of the ordering of the file, the
availability of indices, or the nature of the selection operation.
Binary search
If the file is ordered on an attribute and the selection condition is an equality
comparison on that attribute, we can use a binary search to locate the records that
satisfy the condition.
The number of blocks that need to be examined to find a block containing the required
record is ⌈log2(br)⌉.
If the selection is on a non-key attribute, more than one block may contain required
records, and the cost of reading the extra blocks has to be added to the cost estimate.
Join
Like selection, the join operation can be implemented in a variety of ways.
In terms of disk access, the join operation can be very expensive, so implementing and
utilizing efficient join algorithms is critical in minimizing a query’s execution time.
For example, consider
depositor ⋈ customer
We assume the following information about the two relations above:
I. Number of records of customer ncustomer = 10,000
II. Number of blocks of customer bcustomer = 400
III. Number of records of depositor ndepositor = 5,000
IV. Number of blocks of depositor bdepositor = 100
There are four types of algorithms for join operations.
1. Nested loop join
2. Indexed nested loop join
3. Merge join
4. Hash join
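Using the statistics above, the standard nested-loop cost formulas (with depositor as the outer relation) give the following block-transfer estimates; the formulas assume the worst case, where neither relation fits entirely in memory:

```python
n_customer, b_customer = 10_000, 400   # records / blocks of customer
n_depositor, b_depositor = 5_000, 100  # records / blocks of depositor

# Nested-loop join, depositor as outer: n_r * b_s + b_r block transfers
nested = n_depositor * b_customer + b_depositor

# Block nested-loop join, depositor as outer: b_r * b_s + b_r block transfers
block_nested = b_depositor * b_customer + b_depositor

print(nested, block_nested)  # 2000100 40100
```

Reading whole blocks of the outer relation at a time cuts the estimate from about 2 million transfers to about 40 thousand, which is why the choice of join algorithm matters so much.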
6 – Transaction Management
Durability
After a transaction completes successfully, the changes it has made to the database
persist, even if there are system failures.
Once a transaction has completed (up to step 6 in the earlier example), its result must be
stored permanently. It should not be lost if the system fails.
Explain different states in transaction processing in database.
OR
Explain State Transition Diagram (Transaction State Diagram).
Because a transaction may fail, its execution is broken up into states to
handle the various situations that can arise.
Following are the different states in transaction processing in a database:
Active
Partially committed
Failed
Aborted
Committed
Fig. State Transition Diagram (Active → Partially Committed → Committed; Active → Failed → Aborted)
Active
This is the initial state. The transaction stays in this state while it is executing.
Partially Committed
This is the state after the final statement of the transaction is executed.
At this point, failure is still possible, since the changes may have been made only in main
memory; a hardware failure could still occur.
The DBMS needs to write out enough information to disk so that, in case of a failure, the
system could re-create the updates performed by the transaction once the system is
brought back up.
After it has written out all the necessary information, it is committed.
Failed
This is the state after the discovery that normal execution can no longer proceed.
Once a transaction cannot be completed, any changes that it made must be undone by
rolling it back.
Aborted
The state after the transaction has been rolled back and the database has been restored
to its state prior to the start of the transaction.
Committed
The transaction enters in this state after successful completion of the transaction.
We cannot abort or rollback a committed transaction.
Explain following terms.
Schedule
A schedule is the chronological (sequential) order in which instructions are executed in a
system.
A schedule for a set of transactions must consist of all the instructions of those
transactions and must preserve the order in which the instructions appear in each
individual transaction.
Example of schedule (Schedule 1)
T1                      T2
read(A)
A := A - 50
write(A)
read(B)
B := B + 50
write(B)
                        read(A)
                        temp := A * 0.1
                        A := A - temp
                        write(A)
                        read(B)
                        B := B + temp
                        write(B)
Serial schedule
Schedule that does not interleave the actions of different transactions.
In schedule 1, all the instructions of T1 are grouped and run together, then all the
instructions of T2 are grouped and run together.
That is, T2 does not start until all the instructions of T1 are complete. This
type of schedule is called a serial schedule.
Interleaved schedule
A schedule that interleaves the actions of different transactions.
That is, T2 starts before all the instructions of T1 are completed. This
type of schedule is called an interleaved schedule.
T1                      T2
read(A)
A := A - 50
write(A)
                        read(A)
                        temp := A * 0.1
                        A := A - temp
                        write(A)
read(B)
B := B + 50
write(B)
                        read(B)
                        B := B + temp
                        write(B)
Equivalent schedules
Two schedules are equivalent schedule if the effect of executing the first schedule is
identical (same) to the effect of executing the second schedule.
We can also say that two schedules are equivalent if the output of executing
the first schedule is identical (same) to the output of executing the second schedule.
Serializable schedule
A schedule that is equivalent (in its outcome) to a serial schedule has the serializability
property.
Example of serializable schedule
Schedule 1                  Schedule 2
T1: read(X)                 T1: read(X)
T1: write(X)                T2: read(Y)
T2: read(Y)                 T3: read(Z)
T2: write(Y)                T1: write(X)
T3: read(Z)                 T2: write(Y)
T3: write(Z)                T3: write(Z)
In the above example there are two schedules, schedule 1 and schedule 2.
In schedule 1 and schedule 2, the order in which the instructions of the transactions are
executed is not the same, but whatever result we get is the same. This is known as
serializability of transactions.
Explain serializability of transaction. OR
Explain both the forms of serializability with example. Also
explain relation between two forms. OR
Explain conflict serializability and view serializability with
example.
Conflict serializability
Instructions li and lj of transactions Ti and Tj respectively, conflict if and only if there
exists some item Q accessed by both li and lj, and at least one of these instructions
wrote Q.
1. If li and lj access different data items, then li and lj don’t conflict.
2. li = read(Q), lj = read(Q). li and lj don’t conflict.
3. li = read(Q), lj = write(Q). li and lj conflict.
4. li = write(Q), lj = read(Q). li and lj conflict.
5. li = write(Q), lj = write(Q). li and lj conflict.
Intuitively, a conflict between li and lj forces a (logical) temporal order between them.
If a schedule S can be transformed into a schedule S´ by a series of swaps of non-
conflicting instructions, we say that S and S´ are conflict equivalent.
We say that a schedule S is conflict serializable if it is conflict equivalent to a serial
schedule.
Example
Schedule S can be transformed into Schedule S’ by swapping of non-conflicting series of
instructions. Therefore Schedule S is conflict serializable.
Schedule S                  Schedule S’
T1: read(A)                 T1: read(A)
T1: write(A)                T1: write(A)
T2: read(A)                 T1: read(B)
T2: write(A)                T1: write(B)
T1: read(B)                 T2: read(A)
T1: write(B)                T2: write(A)
T2: read(B)                 T2: read(B)
T2: write(B)                T2: write(B)
Instruction Ii of transaction T1 and instruction Ij of transaction T2 conflict if both
instructions access the same data item A and at least one of the two instructions performs
a write operation on that data item (A).
In the above example, the write(A) instruction of transaction T1 conflicts with the read(A)
instruction of transaction T2, because both instructions access the same data item A. But
the write(A) instruction of transaction T2 does not conflict with the read(B) instruction of
transaction T1, because the two instructions access different data items: transaction T2
performs a write operation on A, while transaction T1 is reading B.
So in above example in schedule S two instructions read(A) and write(A) of transaction
T2 and two instructions read(B) and write(B) of transaction T1 are interchanged and we
get schedule S’.
Therefore Schedule S is conflict serializable.
Schedule S’’
T3 T4
read(Q)
write(Q)
write(Q)
We are unable to swap instructions in the above schedule S’’ to obtain either the serial
schedule < T3, T4 >, or the serial schedule < T4, T3 >.
So above schedule S’’ is not conflict serializable.
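Conflict serializability is usually tested with a precedence graph: draw an edge Ti → Tj for every conflicting pair in which Ti's operation comes first, then check for cycles (acyclic means conflict serializable). A small sketch, using our own tuple encoding of a schedule:

```python
def conflict_serializable(schedule):
    """schedule: list of (txn, op, item) with op in {'r', 'w'}, in execution order."""
    edges = set()
    for i, (ti, opi, qi) in enumerate(schedule):
        for tj, opj, qj in schedule[i + 1:]:
            # conflict: same item, different transactions, at least one write
            if ti != tj and qi == qj and "w" in (opi, opj):
                edges.add((ti, tj))  # Ti's operation precedes Tj's conflicting one
    # acyclic iff we can repeatedly peel off nodes with no incoming edge
    nodes = {t for t, _, _ in schedule}
    while nodes:
        sources = {n for n in nodes
                   if not any(a in nodes and b == n for a, b in edges)}
        if not sources:
            return False  # a cycle remains
        nodes -= sources
    return True

# Schedule S from above: serializable (all conflicts point T1 -> T2)
s = [("T1","r","A"),("T1","w","A"),("T2","r","A"),("T2","w","A"),
     ("T1","r","B"),("T1","w","B"),("T2","r","B"),("T2","w","B")]
print(conflict_serializable(s))   # True

# Schedule S'' from above: T3 -> T4 and T4 -> T3 form a cycle
s2 = [("T3","r","Q"),("T4","w","Q"),("T3","w","Q")]
print(conflict_serializable(s2))  # False
```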
View serializability
Let S and S´ be two schedules with the same set of transactions. S and S´ are view
equivalent if the following three conditions are met, for each data item Q,
1. If in schedule S, transaction Ti reads the initial value of Q, then in schedule S’ also
transaction Ti must read the initial value of Q.
2. If in schedule S transaction Ti executes read(Q), and that value was produced by
transaction Tj (if any), then in schedule S’ also transaction Ti must read the value
of Q that was produced by the same write(Q) operation of transaction Tj.
3. If transaction Ti (if any) performs the final write(Q) operation in schedule S,
then in schedule S’ also the final write(Q) operation must be performed by Ti.
A schedule S is view serializable if it is view equivalent to a serial schedule.
Every conflict serializable schedule is also view serializable, but not every view
serializable schedule is conflict serializable.
Below is a schedule which is view serializable but not conflict serializable.
Schedule S
T3          T4          T6
read(Q)
            write(Q)
write(Q)
                        write(Q)
The above schedule is view serializable but not conflict serializable, because all the
transactions access the same data item (Q) and every pair of operations conflicts (at
least one operation in each pair is a write on Q); therefore we cannot interchange any
pair of operations of the transactions.
Explain two phase commit protocol. OR
Explain working of two phase commit protocol.
Two phase commit protocol
• The two phase commit protocol provides an automatic recovery mechanism in case a
system or media failure occurs during execution of the transaction.
• The two phase commit protocol ensures that all participants perform the same action
(either to commit or to roll back a transaction).
• The two phase commit strategy is designed to ensure that either all the databases are
updated or none of them, so that the databases remain synchronized.
• In the two phase commit protocol there is one node which acts as the coordinator, and all
other participating nodes are known as cohorts or participants.
• Coordinator – the component that coordinates with all the participants.
• Cohorts (Participants) – each individual node except the coordinator is a participant.
• As the name suggests, the two phase commit protocol involves two phases.
1. The first phase is Commit Request phase OR phase 1
2. The second phase is Commit phase OR phase 2
Commit Request Phase (Obtaining Decision)
To commit the transaction, the coordinator sends a request asking for “ready for
commit” to each cohort.
The coordinator waits until it has received a reply from all cohorts to “vote” on the
request.
Each participant votes by sending a message back to the coordinator as follows:
It votes YES if it is prepared to commit
It may vote NO for any reason if it cannot prepare the transaction due to a local
failure.
It may delay in voting because cohort was busy with other work.
Commit Phase (Performing Decision)
If the coordinator receives YES response from all cohorts, it decides to commit. The
transaction is now officially committed. Otherwise, it either receives a NO response or
gives up waiting for some cohort, so it decides to abort.
The coordinator sends its decision to all participants (i.e. COMMIT or ABORT).
Participants acknowledge receipt of the commit or abort message by replying DONE.
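The two phases can be sketched as plain function calls; a real system exchanges these messages over a network and logs each step, and the Participant class here is purely illustrative:

```python
class Participant:
    def __init__(self, name, can_commit):
        self.name, self.can_commit = name, can_commit
        self.state = "READY"

    def vote(self):
        # Phase 1: reply YES if prepared to commit, NO on any local failure
        return "YES" if self.can_commit else "NO"

    def finish(self, decision):
        # Phase 2: apply the coordinator's decision
        self.state = "COMMITTED" if decision == "COMMIT" else "ABORTED"

def two_phase_commit(participants):
    votes = [p.vote() for p in participants]             # commit-request phase
    decision = "COMMIT" if all(v == "YES" for v in votes) else "ABORT"
    for p in participants:                               # commit phase
        p.finish(decision)
    return decision

cohorts = [Participant("db1", True), Participant("db2", True)]
print(two_phase_commit(cohorts))                         # COMMIT
cohorts.append(Participant("db3", False))                # one NO vote ...
print(two_phase_commit(cohorts))                         # ABORT
```

A single NO vote (or a timeout while waiting for a vote) forces every cohort to abort, which is what keeps the databases synchronized.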
What is system recovery?
Database recovery is the process of restoring a database to the correct state in the
event of a failure.
Database recovery is a service that is provided by the DBMS to ensure that the database
is reliable and remain in consistent state in case of a failure.
Restoring a physical backup means reconstructing it and making it available to the
database server.
To recover a restored backup, data is updated using redo command after the backup
was taken.
A database server such as SQL Server or Oracle performs crash recovery and
instance recovery automatically after an instance failure.
In case of media failure, a database administrator (DBA) must initiate a recovery
operation.
Recovering a backup involves two distinct operations: rolling the backup forward to a
more recent time by applying redo data and rolling back all changes made in
uncommitted transactions to their original state.
In general, recovery refers to the various operations involved in restoring, rolling
forward and rolling back a backup.
Backup and recovery refers to the various strategies and operations involved in
protecting the database against data loss and reconstructing the database.
Explain Log based recovery method.
Log based recovery
The most widely used structure for recording database modification is the log.
The log is a sequence of log records, recording all the update activities in the database.
In short Transaction log is a journal or simply a data file, which contains history of all
transaction performed and maintained on stable storage.
Since the log contains a complete record of all database activity, the volume of data
stored in the log may become unreasonably large.
For log records to be useful for recovery from system and disk failures, the log must
reside on stable storage.
Log contains
1. Start of transaction
2. Transaction-id
3. Record-id
4. Type of operation (insert, update, delete)
5. Old value, new value
6. End of transaction that is committed or aborted.
All such files are maintained by DBMS itself. Normally these are sequential files.
Recovery has two factors Rollback (Undo) and Roll forward (Redo).
When transaction Ti starts, it registers itself by writing a <Ti start>log record
Before Ti executes write(X), a log record <Ti, X, V1, V2> is written, where V1 is the value
of X before the write, and V2 is the value to be written to X.
Log record notes that Ti has performed a write on data item Xj
Xj had value V1 before the write, and will have value V2 after the write.
When Ti finishes its last statement, the log record <Ti commit> is written.
Two approaches are used in log based recovery
1. Deferred database modification
2. Immediate database modification
Log based Recovery Techniques
Once a failure occurs, DBMS retrieves the database using the back-up of database and
transaction log. Various log based recovery techniques used by DBMS are as per below:
1. Deferred Database Modification
2. Immediate Database Modification
Both of the techniques use transaction logs. These techniques are explained in the
following sub-sections.
Explain Deferred Database Modification log based recovery
method.
Concept
Updates (changes) to the database are deferred (or postponed) until the
transaction commits.
During the execution of transaction, updates are recorded only in the transaction log
and in buffers. After the transaction commits, these updates are recorded in the
database.
When failure occurs
If transaction has not committed, then it has not affected the database. And so, no
need to do any undoing operations. Just restart the transaction.
If transaction has committed, then, still, it may not have modified the database. And
so, redo the updates of the transaction.
Transaction Log
In this technique, transaction log is used in following ways:
Transaction T starts by writing <T start> to the log.
Any update is recorded as <T, X, V>, where V indicates the new value for data item X. Here,
there is no need to preserve the old value of the changed data item. Also, V is not yet
written to X in the database; the write is deferred.
Transaction T commits by writing <T commit> to the log. Once this is entered in log,
actual updates are recorded to the database.
If a transaction T aborts, the transaction log records are ignored, and no updates are
applied to the database.
Example
Consider the following two transactions, T0 and T1, given in the figure, where T0 executes
before T1. Also consider that the initial values for A, B and C are 500, 600 and 700
respectively.
Transaction – T0 Transaction – T1
Read (A) Read (C)
A =A -100 C=C-200
Write (A) Write (C)
Read (B)
B =B+ 100
Write (B)
The following figure shows the transaction log for above
two transactions at three different instances of time.
Time Instance (a)   Time Instance (b)   Time Instance (c)
<T0 start>          <T0 start>          <T0 start>
<T0, A, 400>        <T0, A, 400>        <T0, A, 400>
<T0, B, 700>        <T0, B, 700>        <T0, B, 700>
                    <T0 commit>         <T0 commit>
                    <T1 start>          <T1 start>
                    <T1, C, 500>        <T1, C, 500>
                                        <T1 commit>
If failure occurs in case of -
1. Time instance (a): no REDO actions are required, since T0 has not committed.
2. Time instance (b): as transaction T0 has already committed, it must be redone.
3. Time instance (c): as transactions T0 and T1 have already committed, they must be redone.
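The recovery rule above — redo committed transactions, ignore the rest — can be sketched over a list-of-tuples log (our own encoding of the log records):

```python
def recover_deferred(log):
    """Deferred modification: only transactions with a commit record are redone."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    db = {}
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, txn, item, new_value = rec
            db[item] = new_value   # REDO the deferred update
    return db

# Log at time instance (b): T0 committed, T1 still active
log = [("start", "T0"), ("write", "T0", "A", 400), ("write", "T0", "B", 700),
       ("commit", "T0"), ("start", "T1"), ("write", "T1", "C", 500)]
print(recover_deferred(log))  # {'A': 400, 'B': 700} -- T1's write to C is ignored
```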
Explain Immediate Database Modification log based recovery
method.
Concept
Updates (changes) to the database are applied immediately as they occur without
waiting to reach to the commit point.
Also, these updates are recorded in the transaction log.
It is possible here that updates of the uncommitted transaction are also written to the
database. And, other transactions can access these updated values.
When failure occurs
If transaction has not committed, then it may have modified the database. And so, undo
the updates of the transaction.
If transaction has committed, then still it may not have modified the database. And so,
redo the updates of the transaction.
Transaction Log
In this technique, transaction log is used in following ways:
Transaction T starts by writing <T start> to the log.
Any update is recorded as <T, X, Vold, Vnew > where Vold indicates the original value of data
item X and Vnew indicates new value for X. Here, as undo operation is required, it
requires preserving old value of the changed data item.
Transaction T commits by writing <T commit> to the log.
If a transaction T aborts, the transaction log record is consulted, and required undo
operations are performed.
Example
Again, consider the two transactions, T0 and T1, given in figure, where T0 executes
before T1.
Also consider that initial values for A, B and C are 500, 600 and 700 respectively.
The following figure shows the transaction log for above two transactions at three
different instances of time. Note that, here, transaction log contains original values also
along with new updated values for data items.
If failure occurs in case of -
1. Time instance (a): undo transaction T0, as it has not committed; restore A and B to
500 and 600 respectively.
2. Time instance (b): undo transaction T1, restoring C to 700; and redo transaction T0,
setting A and B to 400 and 700 respectively.
3. Time instance (c): redo transactions T0 and T1, setting A and B to 400 and 700
respectively, and C to 500.
Time Instance (a)       Time Instance (b)       Time Instance (c)
<T0 start>              <T0 start>              <T0 start>
<T0, A, 500, 400>       <T0, A, 500, 400>       <T0, A, 500, 400>
<T0, B, 600, 700>       <T0, B, 600, 700>       <T0, B, 600, 700>
                        <T0 commit>             <T0 commit>
                        <T1 start>              <T1 start>
                        <T1, C, 700, 500>       <T1, C, 700, 500>
                                                <T1 commit>
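With old values in the log, recovery becomes an undo pass (newest first, for uncommitted transactions) followed by a redo pass (oldest first, for committed ones). A sketch using the same made-up log encoding:

```python
def recover_immediate(db, log):
    """Immediate modification: undo uncommitted writes, then redo committed ones."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in reversed(log):                    # UNDO pass, newest first
        if rec[0] == "write" and rec[1] not in committed:
            _, txn, item, old, new = rec
            db[item] = old
    for rec in log:                              # REDO pass, oldest first
        if rec[0] == "write" and rec[1] in committed:
            _, txn, item, old, new = rec
            db[item] = new
    return db

# Failure at time instance (b): T0 committed, T1 active; C=500 already on disk
db = {"A": 400, "B": 700, "C": 500}
log = [("start", "T0"), ("write", "T0", "A", 500, 400), ("write", "T0", "B", 600, 700),
       ("commit", "T0"), ("start", "T1"), ("write", "T1", "C", 700, 500)]
print(recover_immediate(db, log))  # {'A': 400, 'B': 700, 'C': 700}
```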
Explain system recovery procedure with Checkpoint record
concept.
Problems with Deferred & Immediate Updates
Searching the entire log is time-consuming.
It is possible to unnecessarily redo transactions that have already written their updates
to the database.
Checkpoint
A point of synchronization between database and transaction log file.
Specifies that any operations executed before this point are done correctly and stored
safely.
At this point, all the buffers are forcibly written to secondary storage.
Checkpoints are scheduled at predetermined time intervals.
Used to limit -
1. The size of transaction log file
2. Amount of searching, and
3. Subsequent processing that is required to carry out on the transaction log file.
When failure occurs
Find out the nearest checkpoint.
If transaction has already committed before this checkpoint, ignore it.
If transaction is active at this point or after this point and has committed before failure,
redo that transaction.
If transaction is active at this point or after this point and has not committed, undo that
transaction.
Example
Consider the transactions given in following figure. Here, Tc indicates checkpoint, while
Tf indicates failure time.
Here, at failure time -
1. Ignore the transaction T1 as it has already been committed before checkpoint.
2. Redo transaction T2 and T3 as they are active at/after checkpoint, but have
committed before failure.
3. Undo transaction T4 as it is active after checkpoint and has not committed.
[Figure: timeline with checkpoint Tc and failure Tf — T1 commits before Tc; T2 and T3 are active at Tc and commit before Tf; T4 is active at Tf and has not committed]
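The three rules can be sketched as a classification function; the start/commit times below are made-up and stand in for positions in the log relative to the checkpoint Tc and failure Tf:

```python
def recovery_actions(txns, checkpoint, failure):
    """txns: {name: (start_time, commit_time or None)} -> {name: action}."""
    actions = {}
    for name, (start, commit) in txns.items():
        if commit is not None and commit <= checkpoint:
            actions[name] = "ignore"  # finished before the checkpoint
        elif commit is not None and commit <= failure:
            actions[name] = "redo"    # active at/after checkpoint, committed before failure
        else:
            actions[name] = "undo"    # never committed
    return actions

txns = {"T1": (1, 3), "T2": (2, 6), "T3": (5, 7), "T4": (6, None)}
print(recovery_actions(txns, checkpoint=4, failure=8))
# {'T1': 'ignore', 'T2': 'redo', 'T3': 'redo', 'T4': 'undo'}
```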
[Figure: shadow paging — a current page table and a shadow page table, both initially pointing to pages 1–6; after updates, the current page table points to new copies of pages 2 and 5, while the shadow page table still points to the old pages 2 and 5]
Advantages
No overhead of maintaining transaction log.
Recovery is quite faster, as there is no any redo or undo operations required.
Disadvantages
Copying the entire page table is very expensive.
Data become scattered or fragmented.
After each transaction, free pages need to be collected by a garbage collector.
It is difficult to extend this technique to allow concurrent transactions.
What is concurrency? What are the methods to control
concurrency?
Concurrency
Concurrency is the ability of a database to allow multiple (more than one) users to
access data at the same time.
Methods to control concurrency (Mechanisms)
Optimistic - Delay the checking of whether a transaction meets the isolation and other
integrity rules (e.g., serializability and recoverability) until it ends, without blocking any
of its (read, write) operations, and then abort the transaction if committing it would
violate these rules. An aborted transaction is immediately restarted and re-executed,
which incurs an obvious overhead. If not too many transactions are aborted, then being
optimistic is usually a good strategy.
Pessimistic - Block an operation of a transaction if it may cause a violation of the rules,
until the possibility of violation disappears. Blocking operations typically involves a
performance reduction.
Semi-optimistic - Block operations in some situations, if they may cause violation of
some rules, and do not block in other situations while delaying rules checking (if
needed) to transaction's end, as done with optimistic.
Methods to control concurrency (Methods)
Locking (Two-phase locking - 2PL) - Controlling access to data by locks assigned to the
data. Access of a transaction to a data item (database object) locked by another
transaction may be blocked (depending on lock type and access operation type) until
lock release.
Serialization graph checking (also called Serializability, or Conflict, or Precedence
graph checking) - Checking for cycles in the schedule's graph and breaking them by
aborts.
Timestamp ordering (TO) - Assigning timestamps to transactions, and controlling or
checking access to data by timestamp order.
Commitment ordering (Commit ordering or CO) - Controlling or checking transactions'
chronological order of commit events to be compatible with their respective precedence
order.
What are the three problems due to concurrency? How the
problems can be avoided.
Three problems due to concurrency
1. The lost update problem: If two transactions T1 and T2 both read the same data
item and then update it, the effect of the first update is overwritten by the
second update.
T1          Time    T2
---         T0      ---
Read X      T1      ---
---         T2      Read X
Update X    T3      ---
---         T4      Update X
---         T5      ---
How to avoid: In the above example, transaction T2 must not update data item X
until transaction T1 has committed its update of X.
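The lost-update schedule above can be replayed in a few lines. This sketch interleaves the two withdrawals deterministically (no real threads) on the Rs. 5000 account from the earlier example.

```python
# Sketch of the lost-update schedule: both transactions read the balance
# before either writes, so the first update is overwritten.
balance = 5000

# Interleaved schedule: both reads happen before either write.
t1_local = balance          # T1 reads X  (5000)
t2_local = balance          # T2 reads X  (5000)
balance = t1_local - 1000   # T1 updates X -> 4000
balance = t2_local - 2000   # T2 updates X -> 3000, T1's update is lost
print(balance)              # 3000, not the correct 2000

# Serial schedule (what locking enforces): T2 starts after T1 commits.
balance = 5000
balance = balance - 1000    # T1 runs to completion
balance = balance - 2000    # T2 runs after T1 commits
print(balance)              # 2000, the correct value
```

The second half shows the effect of the avoidance rule: serializing the two updates yields the correct balance of Rs. 2000.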
2. The dirty read problem: A dirty read arises when one transaction updates some
item and then fails due to some reason. The updated item is read by another
transaction before it is changed back to its original value.
T1          Time    T2
---         T0      ---
---         T1      Update X
Read X      T2      ---
---         T3      Rollback
---         T4      ---
How to avoid: In the above example, transaction T1 must not read data item X
until transaction T2 has committed its update of X.
3. The incorrect retrieval problem: The inconsistent retrieval problem arises when one
transaction retrieves data to use in some operation, but before it can use this data
another transaction updates that data and commits. This change is hidden from the
first transaction, which continues to use the previously retrieved data. This problem
is also known as the inconsistent analysis problem.
Balance (A = 200, B = 250, C = 150)
T1                        Time    T2
---                       T0      ---
Read (A)                  T1      ---
Sum = 200
Read (B)                  T2      ---
Sum = Sum + 250 = 450
---                       T3      Read (C)
---                       T4      Update (C)
                                  150 - 50 = 100
---                       T5      Read (A)
---                       T6      Update (A)
                                  200 + 50 = 250
---                       T7      COMMIT
Read (C)                  T8      ---
Sum = Sum + 100 = 550
How to avoid: In the above example, transaction T2 must not update data items that
T1 is reading until transaction T1 has committed.
What is concurrency control? Why is concurrency control
needed?
Concurrency control
The technique used to protect data when multiple users are accessing (using) the same
data concurrently (at the same time) is called concurrency control.
Why concurrency control is needed
If transactions are executed serially, i.e., sequentially with no overlap in time, no
transaction concurrency exists. However, if concurrent transactions with interleaving
operations are allowed in an uncontrolled manner, some unexpected, undesirable result
may occur. Here are some typical examples:
1. The lost update problem: If two transactions T1 and T2 both read the same data
and then update it, the effect of the first update is overwritten by the second
update.
2. The dirty read problem: A dirty read arises when one transaction updates some
item and then fails due to some reason. The updated item is read by another
transaction before it is changed back to its original value.
3. The incorrect retrieval problem: The inconsistent retrieval problem arises when one
transaction retrieves data to use in some operation, but before it can use this data
another transaction updates that data and commits. This change is hidden from the
first transaction, which continues to use the previously retrieved data. This problem
is also known as the inconsistent analysis problem.
Most high-performance transactional systems need to run transactions concurrently to
meet their performance requirements. Thus, without concurrency control such systems
can neither provide correct results nor keep their databases consistent.
Define lock. Define locking. Explain lock based protocol.
Lock
A lock is a variable associated with data item to control concurrent access to that data
item.
Lock requests are made to concurrency-control manager.
Transaction can proceed only after request is granted.
Locking
One major problem in databases is concurrency.
Concurrency problems arise when multiple users try to update or insert data into a
database table at the same time. Such concurrent updates can cause data to become
corrupt or inconsistent.
Locking is a strategy that is used to prevent such concurrent updates to data.
Lock based protocol
A lock is a mechanism to control concurrent access to a data item.
Data items can be locked in two modes:
1. Exclusive (X) mode. Data item can be both read as well as written. X-lock is
requested using lock-X instruction.
2. Shared (S) mode. Data item can only be read. S-lock is requested using lock-S
instruction.
Lock requests are made to concurrency-control manager.
Transaction can proceed only after request is granted.
Lock-compatibility matrix
      S       X
S     TRUE    FALSE
X     FALSE   FALSE
A transaction may be granted a lock on an item if the requested lock is compatible with
locks already held on the item by other transactions
Any number of transactions can hold shared locks on an item, but if any transaction
holds an exclusive lock on the item, no other transaction may hold any lock on the item.
If a lock cannot be granted, the requesting transaction is made to wait till all
incompatible locks held by other transactions have been released. The lock is then
granted.
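The grant/wait rule above can be sketched as a small lock manager that applies the compatibility matrix. This is a minimal illustration; the class and method names are hypothetical, not from any real DBMS.

```python
# Sketch of a lock manager that grants S/X locks per the compatibility
# matrix: any number of S locks may coexist, X is exclusive.
class LockManager:
    def __init__(self):
        self.locks = {}  # item -> (mode, set of holding transactions)

    def request(self, txn, item, mode):
        """Return True if the lock is granted, False if txn must wait."""
        if item not in self.locks:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = self.locks[item]
        if mode == "S" and held_mode == "S":   # S is compatible with S
            holders.add(txn)
            return True
        return False                           # every other pair conflicts

    def release(self, txn, item):
        mode, holders = self.locks[item]
        holders.discard(txn)
        if not holders:                        # last holder gone: lock freed
            del self.locks[item]

lm = LockManager()
print(lm.request("T1", "B", "X"))  # True: first lock on B
print(lm.request("T2", "B", "S"))  # False: S conflicts with the held X
lm.release("T1", "B")
print(lm.request("T2", "B", "S"))  # True: the X lock was released
```

In a full system the rejected request would be queued and granted once the incompatible locks are released, as the text describes.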
T1              T2              Concurrency-Control Manager
Lock-X(B)
                                Grant-X(B, T1)
Read(B)
B = B - 50
Write(B)
Unlock(B)
                Lock-S(A)
                                Grant-S(A, T2)
                Read(A)
                Unlock(A)
                Lock-S(B)
                                Grant-S(B, T2)
                Read(B)
                Unlock(B)
                Display(A + B)
Lock-X(A)
                                Grant-X(A, T1)
Read(A)
A = A + 50
Write(A)
Unlock(A)
This locking protocol divides a transaction's execution into three parts.
1. In the first part, when the transaction starts executing, it seeks grants for the locks
it needs as it executes.
2. In the second part, the transaction has acquired all its locks and no further lock is
required. The transaction keeps executing its operations.
3. As soon as the transaction releases its first lock, the third phase starts. In this phase
the transaction cannot demand any new lock; it can only release the locks it has
acquired.
[Figure: number of locks held over time, rising during the lock-acquisition (growing)
phase and falling during the lock-releasing (shrinking) phase]
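The three parts above reduce to one rule: once a transaction has released any lock, it may not acquire another. A minimal sketch of that rule (the class name is hypothetical):

```python
# Sketch of the two-phase rule: a transaction records whether it has
# entered its shrinking phase; locking after any unlock is an error.
class TwoPhaseTxn:
    def __init__(self):
        self.held = set()
        self.shrinking = False   # becomes True at the first unlock

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock after first unlock")
        self.held.add(item)

    def unlock(self, item):
        self.shrinking = True    # first release ends the growing phase
        self.held.discard(item)

t = TwoPhaseTxn()
t.lock("A"); t.lock("B")   # growing phase: acquire all needed locks
t.unlock("A")              # shrinking phase begins
try:
    t.lock("C")            # illegal under 2PL
except RuntimeError as e:
    print(e)               # 2PL violation: lock after first unlock
```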
[Figure: two wait-for graphs over processes P1, P2, P3 and resources R1-R4, with
"held by" and "wait for" edges. In the first graph no cycle (circular chain) is created,
so there is no deadlock. In the second graph a cycle (P2, R3, P3, R4, P2) is created,
so there is a deadlock.]
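Deadlock detection on a wait-for graph reduces to cycle detection. A minimal sketch using depth-first search, with the graph as a dict of adjacency lists (names are illustrative):

```python
# Sketch of deadlock detection: an edge T -> U means transaction/process
# T waits for a resource held by U; any cycle means deadlock.
def has_cycle(wait_for):
    visited, on_path = set(), set()

    def dfs(node):
        visited.add(node)
        on_path.add(node)
        for nxt in wait_for.get(node, []):
            if nxt in on_path:                  # back edge closes a cycle
                return True
            if nxt not in visited and dfs(nxt):
                return True
        on_path.discard(node)
        return False

    return any(dfs(n) for n in wait_for if n not in visited)

# Matches the second graph above: P2 waits for P3 (via R3) and P3 waits
# for P2 (via R4), so a cycle exists.
print(has_cycle({"P1": ["P2"], "P2": ["P3"], "P3": ["P2"]}))  # True: deadlock
print(has_cycle({"P1": ["P2"], "P2": ["P3"]}))                # False: no deadlock
```

When a cycle is found, the system breaks it by aborting one of the transactions in the cycle, as the serialization-graph-checking method describes.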
                                        Wait/Die    Wound/Wait
O needs a resource held by Y            O waits     Y dies
Y needs a resource held by O            Y dies      Y waits
(Here O is the older transaction and Y is the younger one.)
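The two deadlock-prevention rules in the table can be sketched as pure decision functions, assuming the usual convention that a smaller timestamp means an older transaction (function names are illustrative):

```python
# Sketch of the two timestamp-based deadlock-prevention rules; a smaller
# timestamp means an older transaction (O = older, Y = younger above).
def wait_die(requester_ts, holder_ts):
    """Older requesters wait; younger requesters die (are aborted)."""
    return "wait" if requester_ts < holder_ts else "die"

def wound_wait(requester_ts, holder_ts):
    """Older requesters wound (abort) the holder; younger ones wait."""
    return "wound holder" if requester_ts < holder_ts else "wait"

O, Y = 1, 2  # O started first, so it is older
print(wait_die(O, Y))     # wait: O waits for Y
print(wait_die(Y, O))     # die: Y is aborted
print(wound_wait(O, Y))   # wound holder: Y is aborted
print(wound_wait(Y, O))   # wait: Y waits for O
```

In both schemes only the younger transaction is ever aborted, so the schemes cannot form a waiting cycle and deadlock is prevented.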
7 – Security
Data encryption
Encryption is a technique of encoding data so that only an authorized user can
understand (read) it.
The data encryption technique converts readable data into unreadable data using some
technique so that an unauthorized person cannot read it.
In the figure above, the data that the sender wants to send is known as plaintext.
In the first step, the sender encrypts the data (plaintext) using an encryption algorithm
and some key.
After encryption, the plaintext becomes ciphertext.
This ciphertext cannot be read by any unauthorized person.
This ciphertext is sent to the receiver.
The sender sends the key to the receiver separately.
Once the receiver receives this ciphertext, he/she decrypts it using the key sent by the
sender and a decryption algorithm.
After applying the decryption algorithm and the key, the receiver gets the original data
(plaintext) sent by the sender.
This technique is used to protect data when there is a chance of data theft.
In such a situation, if the encrypted data is stolen, it cannot be used (read) directly
without knowing the encryption technique and the key.
There are two different methods of data encryption:
1. Symmetric encryption
2. Public key encryption
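The symmetric flow above (same key for encryption and decryption) can be illustrated with a deliberately toy cipher. XOR is used here only to show the plaintext/ciphertext round trip; it is not secure, and real systems use ciphers such as AES.

```python
# Toy sketch of symmetric encryption: the same key both encrypts and
# decrypts. XOR is for illustration only and is NOT secure.
def xor_cipher(data: bytes, key: bytes) -> bytes:
    # XOR each byte with the repeating key; applying it twice restores data
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

plaintext = b"transfer Rs. 5000"
key = b"secret-key"          # shared separately with the receiver

ciphertext = xor_cipher(plaintext, key)   # sender encrypts
recovered = xor_cipher(ciphertext, key)   # receiver decrypts with same key

print(ciphertext != plaintext)  # True: unreadable without the key
print(recovered == plaintext)   # True: original plaintext recovered
```

Public key encryption differs in that the encryption and decryption keys form a pair, so the decryption key never needs to be sent.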
Mandatory access control
Components
The commonly used mandatory access control technique for multi-level security uses
four components, as given below.
1. Subjects: such as users, accounts, programs, etc.
2. Objects: such as relations (tables), tuples (records), attributes (columns), views, etc.
3. Clearance level: such as top secret (TS), secret (S), confidential (C), unclassified (U).
Each subject is classified into one of these four classes.
4. Security level: such as top secret (TS), secret (S), confidential (C), unclassified (U).
Each object is classified into one of these four classes.
In this system TS > S > C > U, where TS > S means class TS data is more sensitive than
class S data.
Rules
A user can access data according to the following two rules.
1. Security property: Subject S can read object O only if class(S) >= class(O).
2. Star (*) security property: Subject S can write object O only if class(S) <= class(O).
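The two rules can be sketched as simple checks over the four-level ordering TS > S > C > U; the function names are illustrative.

```python
# Sketch of the mandatory-access-control rules with the four levels
# ordered TS > S > C > U.
LEVELS = {"U": 0, "C": 1, "S": 2, "TS": 3}

def can_read(subject_clearance, object_level):
    # Security property: read only at or below your own clearance
    return LEVELS[subject_clearance] >= LEVELS[object_level]

def can_write(subject_clearance, object_level):
    # Star (*) property: write only at or above your own clearance
    return LEVELS[subject_clearance] <= LEVELS[object_level]

print(can_read("S", "C"))    # True: secret user reads confidential data
print(can_read("C", "TS"))   # False: no reading above your clearance
print(can_write("C", "S"))   # True: writing upward is allowed
print(can_write("TS", "C"))  # False: no writing down (prevents leaks)
```

Together the rules mean information can only flow upward in sensitivity, which is what prevents a highly cleared subject from leaking secret data into a less sensitive object.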
What is authorization and authentication? OR
What is difference between authorization and authentication?
Authorization
1. Authorization is protecting the data to ensure privacy and access control of data;
it means giving access to authorized users.
2. Authorization is the process of verifying what you are allowed to do or not to do.
3. Accessing a file from a hard disk is authorization, because the permissions given to
you for that file are what allow you to access it.
Authentication
1. Authentication is providing integrity control and security to the data.
2. Authentication is the process of verifying who you are.
3. Logging on to a PC with a username and password is authentication.