CS3492 DBMS Notes 2025-1
24. What are the different types of data models? (Nov 2024) (Apr/May 2019)
• Relational Model
• Network Model
• Hierarchical Model
• Entity Relationship Model
• Object Based Model
• Semi-structured Model
25. List the difference between File Processing and database
System. (or) Explain the purpose of Database system. (Nov/Dec
2008) (Apr/May 2015) (Nov/Dec 2016, 2019) (Apr/May 2024)
File Processing System | Database System
Data independence is not provided | Data independence is provided
It is difficult to access the data | It is easy to access the data
Data integrity & security are less | Data integrity & security are more
It produces concurrency anomalies | Data can be accessed concurrently
26. Define atomicity in transaction management. (MAY/JUNE 2013)
➢ Either all operations of the transaction are reflected properly in the
database or none are. This is also known as the "all or nothing" property.
27. Define data model. (Nov/Dec 2011) (Apr/May 2019) (Nov 2024)
➢ A data model is a collection of conceptual tools for describing
data, data relationships, data semantics and consistency
constraints.
➢ Data Model is a Structure below the Database.
28. Explain the basic structure of a relational database with example.
➢ A relational database is one which consists of rows & columns; a relational
model uses a collection of tables to represent both data and the relationships
among those data.
31. Define the two levels of data independence. (Nov/ Dec 2010)
➢ Physical Data Independence: Modification in physical level
should not affect the logical level.
➢ Logical Data Independence: Modification in logical level should not affect
the view level.
ii. Right outer join – Returns all the rows from the right table
and the rows from the left table that match with the right
table.
iii. Full outer join – Returns all the rows from both right and left table.
38. Define relational algebra.
➢ The relational algebra is a procedural query language.
➢ It consists of a set of operations that take one or two relations as
input and produce a new relation as output.
39. What is a SELECT operation?
➢ The select operation selects tuples that satisfy a given predicate.
➢ Allow duplicates.
3. Intersect:
➢ Includes all tuples that are in both relation r1 and r2.
➢ Avoid duplicates.
4. Intersect all:
➢ Includes all tuples that are in both relation r1and r2.
➢ Allow duplicates.
5. Except:
➢ Includes tuples that are in relation r1 and not in r2.
44. What are aggregate functions? and list the aggregate functions
supported by SQL. (Nov/Dec 2024)
➢ Aggregate functions are functions that take a collection of values
as input and return a single value.
➢ Aggregate functions supported by SQL are
• Average: avg
• Minimum: min
• Maximum: max
• Total: sum
• Count: count
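For instance, all five functions can be applied in one query; a minimal sketch,
assuming an emp(dept, basic_pay) table like the one used in the BASIC_PAY question
later in this unit:
select avg(basic_pay), min(basic_pay), max(basic_pay),
       sum(basic_pay), count(*)
from emp;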
45. What is the use of group by clause?
➢ Group by clause is used to apply aggregate functions to a set
of tuples.
➢ The attributes given in the group by clause are used to form groups.
➢ Tuples with the same value on all attributes in the group by
clause are placed in one group.
➢ Example: Select count(*) from student group by department.
<insert / update / delete > on < table name >for < each row > where<
condition >;
54. Define foreign key. Explain with an example. APR/MAY-23
➢ A value that appears in one relation for a given set of attributes
also appears for a certain set of attributes in another relation.
➢ The definition of a table has a declaration
“Foreign key(attribute) references <table name>”
➢ Example:
1. Create table account(accno number primary key, custname varchar(50));
2. Create table branch(accno number, bname varchar(50),
foreign key (accno) references account(accno));
Where, accno is a primary key in table account and a foreign key in
table branch.
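A sketch of how the constraint behaves, assuming the two tables above have been
created: a branch row whose accno does not exist in account is rejected by the DBMS.
insert into account values (1001, 'John');      -- parent row in account
insert into branch values (1001, 'Chennai');    -- accepted: 1001 exists in account
insert into branch values (2002, 'Madurai');    -- rejected: violates the foreign key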
55. Write the purpose of triggers. (May/June 2013)
➢ Triggers are useful mechanisms for alerting humans or for starting
certain tasks automatically when certain conditions are met.
56. List the requirements needed to design a trigger.
The requirements are
✓ Specifying when a trigger is to be executed.
✓ Specify the actions to be taken when the trigger executes.
Develop an SQL query that will find and display the average BASIC_PAY in
each DEPT. (DEC 2011)
Answer:
Select dept, avg(basic_pay) from emp group by dept;
69. What is embedded SQL? What are its advantages? (MAY 2011)
➢ Embedded SQL is a method of combining the computing power of a
programming language with the database manipulation capability of SQL.
➢ A language in which SQL queries are embedded is referred to as a host language.
71. What are the disadvantages of File processing System? Apr 2016
The file processing system has the following major disadvantages:
• Data redundancy and inconsistency.
• Integrity Problems.
• Security Problems
• Difficulty in accessing data.
• Data isolation.
PART-B
1. Explain the purpose of Database system. (APRIL/MAY 2010)
The database system arose in response to early methods of file
processing system.
File processing system:
• It is supported by conventional OS. It stores permanent records in
various files if needs different application programs to extract
records from and add records to appropriate file.
• The file processing system has number of disadvantages. To avoid
these disadvantages, we need database system.
1. Data Redundancy and Inconsistency
REDUNDANCY:
• Duplication of data in several files.
• Leads to higher storage and access cost.
INCONSISTENCY:
• Changes made to one copy of the data are not reflected in all the copies
of the same data.
2. Difficulty in Accessing Data:
• The needed data cannot be retrieved in a convenient
& efficient manner.
3. Data Isolation:
• Since data are scattered & files are in different format
writing new application programs to retrieve appropriate
data is difficult.
4. Integrity Problem:
• The data values stored in the database must satisfy
certain type of consistency constraints.
• The constraints are enforced by adding appropriate code
in programs.
• When new constraints are added it is difficult to change
programs to enforce them.
5. Atomicity Problem:
• If any failure occurs the data is to be restored to the
consistent state that existed prior to failure.
• It must be atomic: it must happen entirely or not at all.
6. Concurrent Anomalies:
• Interaction of multiple users to update the data
simultaneously is possible and may result in
inconsistent data.
7. Security Problems:
• Not every user of database system is allowed to access data.
• Enforcing security constraint is difficult in file processing system.
2. Explain the different views of Data. (MAY 2016) (or) Explain the
difference between physical level, conceptual level & view level of
data abstraction. (MAY 2011) (Nov/Dec 2019)
• Introduction
▪ Data abstraction
• Physical level
• Logical level
• View level
▪ Instances &schema
• Physical schema
• Logical schema
• Subschema
Introduction
The major purpose of a database system is to provide users with an
abstract view of data (i.e.) the system hides the details of how the
data is stored and maintained.
Data Abstraction:
• Figure 1.1 describes how developers hide the complexity of the database
from users, to simplify users’ interaction with the system.
• Three levels of abstraction are
✓ Physical level
✓ Logical level
✓ view level
Data Independence: -
➢ Changing the schema at one level, without affecting the schema at higher
levels.
Physical Data Independence: -
✓ Changing the schema at physical level, without
affecting the schema at logical level.
Logical Data Independence: -
✓ Changing the schema at logical level, without affecting
the schema at view level.
TYPES:
• Network data model
• Hierarchical data model
• Relational model
• Entity relationship model
• Object based data model
• Semi structured data model
Ex:
Customer record is defined as
Type customer =record
Customer_name :string;
Customer_street :string;
Customer_city :string;
End
Type account=record
Acc_number : string;
Balance :integer;
End
In the network model the records are represented as a collection of records
connected to one another through links (Figure 1.2 shows customer records such
as "Johnson, Church Road, Chennai" linked to account records such as A-102 with
balance 4000 and A-101 with balance 5000).
The sample database shows that Johnson has account A-102, Michael has accounts
A-101 & A-201, and Peter has account A-203.
Hierarchical Model:
➢ In Figure 1.3 describes Hierarchical model consists of a
collection of records that are connected to each other through
links.
➢ Records are organized as a collection of trees.
Ex: (Figure 1.3 – account records such as A-102/4000, A-203/6500, A-101/5000 and
A-201/9000 organized as a tree under a root record.)
• Introduction
▪ Functional Components of database system
1. Storage Manager
• Authentication & Integrity Manager
• File Manager
• Buffer Manager
• Transaction Manager
2. Query Processor
• DDL Interpreter
• DML Compiler
• Query Evaluation Engine
3. Architecture of Database System
• Two Tier Architecture
• Three Tier Architecture
4. Database System Structure Diagram
5. Types of Database users and
Administrators
Introduction
➢ Database system is partitioned into modules that deal with each of
the responsibilities of the overall system.
➢ Figure 1.6 shows that the functional components of the database
system are,
1. Storage Manager
2. Query Processor
1. Storage Manager:
It is a component of database system that provides the interface
between the low-level data stored in the database and the
application programs and queries submitted to the system.
✓ DML Compiler:
Translates DML statements into an evaluation plan consisting of
low-level instruction that the query evaluation engine understands.
Two-tier architecture:
Application resides at the client machine where it invokes
database system functionality at the server machine through
query language statements.
E.g. client programs using ODBC/JDBC to communicate with a
database.
Three-tier architecture:
Client machine acts as merely a front end and does not contain
any direct database calls. Figure 1.7
The client end communicates with an application server through
forms interface and the application server in turn communicates
with the database system to access data.
E.g. web-based applications & applications built using “middleware”.
2. Application Programmers
➢ They are computer professionals who write application programs.
➢ Rapid application development tools are tools that enable
an application programmer to construct forms and reports with
minimal programming effort
3. Sophisticated Users
➢ Interact with the system without writing programs.
➢ They form their request using database query language.
4. Specialized Users:
➢ These are sophisticated users who write specialized database
applications that do not fit into traditional data processing
framework
➢ EX: CAD design, audio, video.
5. Database Administrator
➢ A person who has central control of both the data and the
program that accesses those data is called as database
administrator.
➢ The functions of database Administrator
1. Schema Definition:
✓ The Database Administrator creates original database schema
by executing a set of data definition statements in the DDL.
2. Storage structure and Access method definition
3. Schema And Physical Organization Modification
✓ The Database Administrator performs changes to the schema
and physical organization to reflect the changing need of the
organization.
4. Granting of authorization for data access
✓ It grants different types of authorization, so that the
database administrator can regulate which parts of the
database various users can access.
5. Routine Maintenance
(i) Periodically backing up the database.
(ii) Ensuring that enough free disk space is available for normal
operations.
(iii) Monitoring the jobs running and ensuring that
performance is not degraded.
5. Write Short notes on Introduction to Relational Database.
• A relational database is one which consists of rows & columns.
• A relational model uses a collection of tables to represent both
data & relationship among those data.
• Each table has multiple columns & each column has unique name.
• In the record-based model, the database is structured into fixed-format
records of a particular type.
• Each record type defines a fixed number of fields or attributes.
• It is the primary data model for commercial data processing application
• The data is stored in two-dimensional tables (rows and columns).
Refer Figure 1.8
• The data is manipulated based on the relational theory of
mathematics.
Name Balance
900 55
556 100000
647 105366
801 10533
Table 1.2 Account
SELECT OPERATION:
Select from Department where budget >8M Refer Table 1.5
PROJECT OPERATION
Project Department over dept_no, budget Refer Table 1.6
Dept_no Budget
D1 10M
D2 20M
D3 5M
JOIN OPERATION:
Join department and emp over dept_no (Refer Table 1.7)
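The same three operations can be sketched in SQL; this is only an illustrative
sketch, assuming Department(dept_no, budget) and Emp(eno, dept_no) tables as in
Tables 1.5–1.7, with budget stored as a plain number of millions:
-- SELECT (restriction): rows of Department whose budget exceeds 8 (million)
select * from department where budget > 8;
-- PROJECT: only the dept_no and budget columns of Department
select dept_no, budget from department;
-- JOIN: combine Department and Emp rows that share the same dept_no
select * from department natural join emp;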
Data should be entered in the foreign key column with great care,
as wrongly entered data can invalidate the relationship between
the two tables.
R1 – referencing relation
R2 – referenced relation
5. Referential integrity:
It requires that the values appearing in specified attribute of any
tuple in referencing relation also appear in specified attributes of at
least one tuple in referenced relation.
▪ Theta Join ( Θ)
▪ Outer Join
• Left Outer Join
• Right Outer Join
• Full Outer Join
Definition
➢ The relational algebra is a collection of operators that take one
or more relations as their input or operands and returns a
single relation as their output or result.
L_no Amt
L-11 1000
L-12 1500
L-13 900
L-14 800
L-15 1100
Table 1.10 Loan
Composition of Relational Operations: (Refer table 1.11)
L_no
∏L_no(σ b_name="B"(loan))
L-12
L-15
Rename operation (ρ)
C_name
A
B
C
D
E
Table 1.14 From Borrower and Depositor
The values selected are distinct; duplicate values are eliminated.
C_name
B
D
Table 1.15 Difference
Two conditions must hold:
✓ The relations r and s must have the same number of attributes.
✓ The domains of corresponding attributes must be compatible.
Loan (A_no, Amt): (L_20, 1000), (L_21, 500), (L_22, 1500)
Borrower (C_name, A_no): (A, L_20), (B, L_21)
Example (Cartesian product): r = borrower × loan
Set intersection can be expressed using set difference: r ∩ s = r − (r − s)
Example: ∏A_no(borrower) ∩ ∏A_no(loan)
A_no
L_20
L_21
Table 1.19 Set Intersection
Natural Join Operation
The natural join forms a Cartesian product of its two arguments,
performs a selection forcing equality on those attributes that appear in
both relation schemas and finally
removes duplicate attributes.
Natural join is a binary operator.
Natural join between two or more relations results in the set of all
combinations of tuples that have equal values on the common attributes.
Example: Borrower loan (Refer Table 1.20)
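A sketch of the same natural join in SQL, assuming Borrower(c_name, l_no) and
Loan(l_no, amt) tables (the attribute names are assumptions):
-- Natural join pairs each borrower with the loan that has the same l_no value
select * from borrower natural join loan;
-- Equivalent explicit join showing the equality condition on the common attribute
select b.c_name, b.l_no, l.amt
from borrower b join loan l on b.l_no = l.l_no;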
• Parts of SQL
1. Data Definition Language (DDL)
o Definition
o Commands
▪ CreateTable
▪ AlterTable
▪ DropTable
2. Data Manipulation Language (DML)
o Definition
o Commands
▪ Select
▪ Insert
▪ Delete
▪ Update
3. Data Control Language (DCL)
o Definition
o Commands
▪ Grant
▪ Revoke
1. CREATE TABLE
It is used to create a table.
Syntax
Create table <tablename> (column1 datatype, column2 datatype, ..., constraints);
Example:
Create table account (accno varchar (10), Bname char(15),balance
number, primary key(accno));
2. DROP TABLE
This command is used to delete a table.
Syntax:
drop table <tablename>;
Example:
drop table account;
3. ALTER TABLE –
It is used to add, modify or remove attributes from the table.
• Add
o Syntax:
Alter table <tablename> add (attributes);
• Modify
o Syntax:
Alter table <tablename> modify (attributes);
o Example: Alter table account modify (dept varchar(20));
• Drop
o Syntax:
Alter table<tablename> drop (attributes);
TRUNCATE COMMAND
➢ This command is used to delete the records but retain the structure.
Syntax
Truncate table <tablename>;
Example
Truncate table account;
DESC COMMAND
➢ It is used to View the table structure.
Syntax
desc <tablename>;
Example
desc account;
RENAME COMMAND
➢ It is used to Rename a table
Syntax
Rename (old_table_name) to (new_table_name);
Example
Rename account to acc;
INSERT TABLE:
An SQL INSERT statement adds one or more records to any
single table in a relational database.
Syntax1:
insert into <table_name> values (value1 [, value2, ...]);
Example: insert into phone_book values ('john doe', '555-1212');
UPDATE TABLE
An SQLUPDATE statement changes the data of one or more
records in a table.
Either all the rows can be updated, or a subset may be chosen using a
condition.
Syntax:
Update table_name set column_name = value [, column_name =value ...]
[where condition]
Example:
Update account set balance = balance + 500 where accno = 101;
SELECT TABLE:
A SELECT statement retrieves zero or more rows from one or
more database tables or database views.
PUBLISHER_NAME
PHI
Technical
Nirali
Technicalo
SciTech
PUBLISHER_NAME
PHI
Technical
Nirali
SciTech
TITLE
Oracle
DBMS
ADBMS
Select title from book where unit_price between 300 and 400;
(Or)
Select title from book where unit_price >=300 and unit_price <=400;
Output
TITLE
Oracle
DBMS
Unix
Order by:
The ORDER BY clause identifies which columns are used to sort the
resulting data, and in which direction they should be sorted (options
are ascending or descending). (Refer Table 1.32)
Output
TITLE UNIT_PRICE
ADBMS 450
DBMS 400
DOS 250
Oracle 399
Unix 300
Table 1.32 Order By
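The query that produced Table 1.32 is not shown above; a sketch of one that
would, assuming the book(title, unit_price) table used in the other examples:
select title, unit_price from book order by title;            -- ascending (default) order
select title, unit_price from book order by unit_price desc;  -- descending by price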
Group By:
The GROUP BY clause is used to group the rows based on
certain criteria.
Example:
select publisher_name, sum(unit_price) "TOTAL BOOK AMOUNT"
from book group by publisher_name; (Refer Table 1.33)
Output
Having:
➢ Having is similar to the where clause, but it specifies the search condition
for the groups formed by the group by clause (where filters individual rows,
having filters groups).
➢ Example: (Refer Table 1.34)
select publisher_name, sum(unit_price) "TOTAL BOOK AMOUNT"
from book group by publisher_name having sum(unit_price) > 500;
String operation:
SQL specifies strings by enclosing them in single quotes.
Example:
Select author_name from book where author_name like 'Ba%';
Output
AUTHOR_NAME
Basu
Table 1.35 String Operation
Aggregate functions:
It takes collection of values as input and provides a Single value
as output.
Minimum
➢ MIN function is used to find the minimum value of a certain column.
➢ This function determines the smallest value of all selected values of a
column.
Syntax
MIN(column) (or) MIN([ALL|DISTINCT] expression)
Example (Refer Table 1.37)
select min(unit_price) "Minimum Price" from book;
Output
MINIMUM
PRICE
250
Table 1.37 Minimum
Maximum
➢ MAX function is used to find the maximum value of a certain column.
➢ This function determines the largest value of all selected values of a
column.
Syntax
MAX(column) (or) MAX([ALL|DISTINCT] expression)
Example (Refer Table 1.37)
select max(unit_price)”Maximum Price” from book;
Output
Maximum
Price
450
Total (Sum)
➢ SUM function is used to find the total of the values in a numeric column.
Example (Refer Table 1.38)
select sum(unit_price) "Total" from book;
Output
Total
1799
Table 1.38 Total
Count
➢ COUNT function is used to Count the number of rows in a database
table.
➢ It can work on both numeric and non-numeric data types.
Syntax
COUNT(*) (or ) COUNT( [ALL|DISTINCT] expression )
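For example, using the same book table (the result of 5 assumes the five titles
listed in Table 1.32):
select count(*) "Number of Books" from book;       -- counts all rows, result: 5
select count(distinct publisher_name) from book;   -- counts only distinct publishers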
Union operation:
Union clause merges the output of two or more queries into a
single set of rows and columns.
Example (Refer Table 1.42)
(select customer_name from depositor) union (select customer_name from
Borrower)
Output
CUSTOMER_NAME
John
Sita
Vishal
Ram
Rohit
Tonny
Table 1.42 Union operation
Intersect operation
The intersect clause outputs only rows produced by both the queries
intersected i.e the intersect operation returns common records from
output of both queries.
Except operation:
The except also called as Minus outputs rows that are in first table but not in
second table. (Refer Table 1.44)
Example
(select customer_name from depositor) except (select customer_name from
Borrower)
Output
CUSTOMER_NAME
John
Sita
Table 1.44 Except operation
DELETE TABLE:
The DELETE statement removes one or more records from a table.
Syntax:
delete from table_name [where condition];
Any rows that match the WHERE condition will be removed from
the table.
If the WHERE clause is omitted, all rows in the table are removed
Example:
• delete from trees where height <80;
• delete from book;
1. GRANT:
It is used to grant privileges to users or roles.
Syntax:
grant <privilege list> on <object> to <user id> [with grant option];
Description:
• Privilege List – SELECT, INSERT, DELETE, UPDATE, EXECUTE etc.
• Objects – TYPE < Type name >, TABLE < Table name >
• User ID – User ID or Role Name. IT can be replaced by
PUBLIC – meaning all user known to the system.
• WITH GRANT OPTION - Can grant those privileges on
that object to some other users.
Example:
• Grant insert, delete on employee to user1, user2
• Grant select on employee, department to a3 with grant
option;
2. REVOKE:
It is used to revoke privileges from users or roles.
Syntax:
revoke <privilege list> on <object> from <user id>;
Description:
• Objects – TYPE < Type name >, TABLE < Table name >
• User ID – User ID or Role Name. IT can be replaced by PUBLIC
– meaning all user known to the system.
Example:
• REVOKE INSERT, DELETE ON EMPLOYEE, DEPARTMENT FROM A2;
Syntax:
commit
ROLLBACK
➢ The rollback command is a transaction control command
used to undo all changes of a transaction that have not already
been saved to the database.
Syntax:
rollback; (or) rollback to <savepoint-name>;
SAVEPOINT
➢ To divide the transaction into smaller sections.
Syntax:
savepoint savepoint-name;
Example
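A small illustrative transaction combining the three commands (the account table
and the values are assumptions used only for this sketch):
update account set balance = balance - 500 where accno = 101;
savepoint s1;                      -- mark the state after the first update
update account set balance = balance + 500 where accno = 999;
rollback to s1;                    -- undo only the work done after savepoint s1
commit;                            -- make the remaining (first) update permanent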
Triggers:
➢ Trigger is a statement that is executed automatically by the
system as a side effect of a modification to the database.
Triggering Events:
➢ Triggering event can be insert, delete or update.
➢ Triggers on update can be restricted to specific attributes.
➢ Values of attributes before and after an update can be referenced.
o Referencing old row as delete and updates
o Referencing new row as inserts and updates
➢ Triggers can be activated before an event, which can serve as extra
constraints.
Syntax:
Example:
No row selected.
Trigger created.
SQL> update product set price = 800 where pid=12;
1 row updated.
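The create-trigger statement behind the transcript above is not shown; the
following is a minimal Oracle-style sketch, assuming a product(pid, price) table
and an audit table price_log created only for the example:
create table price_log (pid number, old_price number, new_price number);

create or replace trigger log_price_change
after update of price on product       -- triggering event: update of the price column
for each row                           -- row-level trigger
begin
  insert into price_log values (:old.pid, :old.price, :new.price);
end;
/
With such a trigger in place, the update shown above would also insert one row
into price_log recording the old and new price.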
EMBEDDED SQL
Embedded SQL statements are enclosed between EXEC SQL and END_EXEC in the host
language program.
DYNAMIC SQL
Dynamic SQL is a programming technique which Allows
programs to construct and submit SQL queries at run time.
Dynamic SQL statements are not embedded in the source
program but stored as strings of characters that are
manipulated during a program's runtime.
The simplest way to execute a dynamic SQL statement is with
an EXECUTE IMMEDIATE statement.
This statement passes the SQL Statement to the DBMS For
compilation and execution.
Phases:
✓ Prepare
✓ Execute
Example
char* sqlprog = "update account
                 set balance = balance * 1.05
                 where account_number = ?";
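The fragment above only builds the statement string; a sketch of the prepare and
execute steps that would typically follow it (the host-variable names are
illustrative):
EXEC SQL prepare dynprog from :sqlprog;    /* compile the SQL string at run time */
char account[10] = "A-101";
EXEC SQL execute dynprog using :account;   /* supply a value for the ? placeholder */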
What is DBMS?
One of the most important tasks for DBMS is to create a database for complex data
and manage the data. It gives relief to the user by creating a structure for the
complex data sets so that users can access it and manipulate them very easily.
Modern database systems not only provide storage for the data but they store and
manage the metadata (data of data) like data procedural rules, validation rules etc.
DBMS also provides performance tuning, which makes accessing data faster and
easier.
Security Management
To keep the data safe and ensure the integrity, the database system provides the
features for backup and recovery management. If the system fails due to some
reason then it recovers the data and keeps the data safe.
DBMS provides a database access language which is also called a query language.
Query languages are non-procedural languages used to access the database and
manipulate the data. SQL is an example of a query language. The majority of DBMS
vendors provide the support of various query languages to access the data.
For example, if a user asks for a date from the database, he may receive it as
14 December 2022 even though it is stored in separate day, month and year
columns; the DBMS presents the data in the form the user expects.
Multi User Access control is another feature which is provided by the modern
Database Systems. So, more than one user can access the database at the same
time without any problem. This feature makes sure the integrity of the data present
in the database. It also follows the ACID property, so the database will be consistent
while multiple users are accessing it concurrently. It is very useful for the database
of organizations where multiple database engineers are working concurrently.
When a user requests data from the database, it uses some environments like
browsers (Chrome or Firefox etc.) to get the data.
1. Domain constraints
o Domain constraints can be defined as the definition of a valid set of values for
an attribute.
o The data type of domain includes string, character, integer, time, date,
currency, etc. The value of the attribute must be available in the corresponding
domain in given Table 1.46.
Example:
2. Entity integrity constraints
o The entity integrity constraint states that primary key value can't be null.
o This is because the primary key value is used to identify individual rows in
relation and if the primary key has a null value, then we can't identify those
rows.
o Table 1.47 can contain a null value in any field other than the primary key field.
Example:
4. Key constraints
o Keys are the entity set that is used to identify an entity within its entity set
uniquely.
o An entity set can have multiple keys, but out of which one key will be the
primary key. A primary key must contain a unique value and cannot contain a
null value in the relational table 1.49.
Example:
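A sketch of how key constraints are declared in SQL (the table and column names
here are illustrative):
create table student (
  regid  number primary key,        -- primary key: must be unique and not null
  rollno number unique not null,    -- another candidate key, declared unique
  sname  varchar(50)
);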
APRIL-MAY 23
PART-A
PART-B
1.Explain the database management system architecture with a neat sketch
Q4. Pg 22
2.Outline select and project operations in relational algebra with an
example Q8. Pg 31
3.What is embedded SQL? Explain with an example. Q13.Pg 55
NOV-DEC 23
PART-A
1. Relate the terms database, database management system. Q1,2. Pg 1
2. What is the difference between logical data independence and
physical data independence? Q19.Pg 4
PART-B
1. Discuss the main categories of data models. What are the differences
between the relational model, the object model, and the XML model? Q3. Pg 16
APRIL-MAY 24
PART-A
1. Differentiate File processing system and Database processing system.
Q25.Pg 16
2. List some relational algebra operations. Q65.Pg 14
PART-B
1. What is data model? List its different types. Explain with example. Q3. Pg 20
NOV-DEC 24
PART-A
1. What is a data model? List the types of data model used. Q24.Pg 6
PART-B
1. Discuss in detail, the functions of DBMS. Q14.Pg 66
2. Explain the various functions of advanced SQL features and embedded SQL
with examples. Q13.Pg 63
UNIT II
DATABASE DESIGN
Entity-Relationship model – E-R Diagrams – Enhanced-ER Model – ER-to-Relational
Mapping – Functional Dependencies – Non-loss Decomposition – First, Second,
Third Normal Forms, Dependency Preservation – Boyce/Codd Normal Form – Multi-
valued Dependencies and Fourth Normal Form – Join Dependencies and Fifth
Normal Form.
1. What is meant by normalization of data? (Or) What is normalization?
(MAY 2010, 2014)
➢ It is a process of analyzing the given relation schemas based on their
Functional Dependencies (FDs) and primary key to achieve the properties
➢ Minimizing redundancy
➢ Minimizing insertion, deletion and updating anomalies.
2. Define canonical cover or Minimal Cover.
A canonical cover Fc for F is a set of dependencies such that F logically implies
all dependencies in FC and Fc logically implies all dependencies in F. Fc must have
the following properties
3. List the properties of canonical cover.
Fc must have the following properties.
• No functional dependency in Fc contains an extraneous attribute.
• Each left side of a functional dependency in Fc is unique.
4. Explain the desirable properties of decomposition. APR-2019
➢ Lossless-join or non loss decomposition
➢ Dependency preservation
➢ Repetition of information
5. What is first normal form?
➢ It is the process of eliminating duplicate forms from same data and creating
the separate tables for each group of related data and also to identify each
row with a unique column or set of columns.
➢ The domain of attribute must include only atomic (simple, indivisible) values.
• X->Y
• Y does not ->X
• Y->Z
Boyce-Codd normal form (BCNF) was proposed as a simpler form of 3NF, but it
was found to be stricter than 3NF. That is, every relation in BCNF is also in 3NF;
however, a relation in 3NF is not necessarily in BCNF.
50. Define single valued and multi valued attributes. (Nov/Dec 2024)
Single valued & multi valued attribute:
1. Single valued attribute:
➢ The attribute that has single value for a particular entity is called single
valued attribute.
2. Multivalued attributes:
➢ The attribute that has a set of values for a specific entity is called
multivalued attributes.
➢ Ex: phone_number.
PART-B
1. Explain the Entity Relationship Model. (APRIL/MAY 2010) (Nov/Dec
2010) (May/June 2012)
• Introduction
▪ Basic Concepts of ER Model
o Entity
o Entity Sets
o Relationship Set
o Attributes
▪ Constraints
o Mapping cardinalities
o Participation constraints
o Keys
▪ Entity Relationship Diagram –Notations
▪ Extended E-R Features
o Specialization
o Generalization
o Constraints on generalization
o Attribute inheritance
Introduction
➢ The E-R model is a high-level data model it distinguishes between basic
object called entities and relationship among those objects.
Basic Concepts of ER Model
➢ Entity
➢ Entity sets
➢ Relationship sets
➢ Attributes
Entity
➢ Entity: It is a thing or object in the real world that is distinguishable
from other objects.
➢ An entity has a set of properties and the values for some set of
properties may uniquely identify an entity.
➢ Ex: Person is an entity and person id– uniquely identifies that person.
Entity Sets:
➢ Entity set: An entity set is a set of entity of same type that share same
properties or attributes in Table 2.1.
➢ Ex: Student - represents the set of all students in the university.
Roll no S_name
101 John
102 Peter
301 Saran
405 Michael
Table 2.1 Entity Set
Types of Entity Set
1. Strong Entity Set
➢ An entity set that has a primary key is called strong entity set
2. Weak Entity Set
➢ An entity set that does not have a primary key is called weak entity set
3. Identifying or Owner Entity set
➢ Weak entity set must be associated with another entity set called
identifying or owner entity set.
Relationship Set
➢ Relationship: Association among several entities.
✓Derived
✓Descriptive
Simple & composite attributes:
1. Simple attributes:
➢ Attributes that cannot be divided into subparts are called simple
attributes
➢ Ex: S.no is a simple attribute.
2. Composite attribute:
➢ Attribute that can be divided into subparts are called as composite
attributes.
➢ Ex: name -first _name, last_name
4. Multivalued attributes:
➢ The attribute that has a set of values for a specific entity is called
multivalued attributes.
➢ Ex: phone_number.
Derived attributes:
➢ The value of this type of attribute can be derived from the values of other
related attributes or entities.
Example: DOB is a base or stored attribute; age is a derived attribute computed
from DOB and the current date.
Mapping cardinalities:
➢ A mapping cardinality is a data constraint that specifies how many
entities an entity can be related to in a relationship set.
✓ one to one
✓ one to many
✓ many to one
✓ many to many
➢ Consider a binary relationship set R on entity sets A and B.
➢ Example: there is one project manager who manages only one project in
Fig 2.3.
(Figure: Students *—Takes—1 Course, e.g. Computer Science – many students take
one course.)
4. Many-To-Many
➢ An entity in A is related to any number of entities in B, but an entity in B is
related to any number of entities in A.
➢ Example: Many teachers can teach many students in Fig 2.6
(Fig 2.6: Teachers —teaches— Students.)
Participation Constraints
1. Total Participation:
➢ The participation of an entity set E in a relationship set R is said to be
total, if every entity in E participates in at least one relationship
in R
➢ Example:
2. Partial Participation:
➢ If only some entities in E participate in relationship in R, then the participation
of entity E in relationship R is said to be partial in Fig 2.8 & Fig 2.9
(Fig 2.8 & Fig 2.9: Customer —Borrower— Loan ER fragment, with attributes such
as C_Street on Customer.)
Generalization:
• Super class: An Entity type that represents a general concept at high level is
called Super class.
• Sub Class: An Entity type that represents a specific concept at lower levels is
called sub class.
➢ Entities that satisfy account_type = "saving" belong to the lower-level
entity set saving account.
➢ Entities that satisfy account_type = "checking" belong to the lower-level
entity set checking account.
➢ Since the lower-level entities are evaluated on the basics of same attribute
it is called as attribute defined.
User defined:
➢ Not constrained by a membership condition.
Constraint 2:
▪ Disjoint
▪ Overlapping
Disjoint:
➢ A disjointness constraint requires that an entity belong to no more than one
lower-level entity set.
➢ Ex: Account entities satisfy only one condition either a saving account or
checking account, but cannot be both.
Overlapping:
➢ The same entity may belong to more than one lower-level entity set within
a single generalization.
➢ Ex: Generalization applied to customer & employee leads to higher level
entity set person.
Constraint 3:
Attribute Inheritance:
• The attributes of higher-level entity are said to be inherited by lower-level
entity set.
Aggregation
• Aggregation is a process when relation between two entities is treated as
a single entity.
(Fig 2.12: redundant relationship – Manager, Manages, Job.)
EXAMPLE: ER diagram with Aggregation
(Figure: the Manager —Manages— Job relationship drawn with aggregation.)
Functional dependencies:
Multivalued dependency
➢ Multivalued dependency occurs when two attributes in a table are
independent of each other but, both depend on a third attribute.
➢ A multivalued dependency consists of at least two attributes that are
dependent on a third attribute that's why it always requires at least three
attributes.
➢ Example: Suppose there is a bike manufacturer company which produces
two colors(white and black) of each model every year in Table 2.2.
Emp_id Emp_name
AS555 Harry
AS811 George
AS999 Kevin
Table 2.3 Trivial functional dependency
Consider this table with two columns Emp_id and Emp_name.
{Emp_id, Emp_name} -> Emp_id is a trivial functional dependency as Emp_id is a
subset of {Emp_id,Emp_name}.
Transitive dependency:
A transitive is a type of functional dependency which happens when t is indirectly
formed by two functional dependencies in Table 2.5.
Example:
Alibaba Jack Ma 54
Decomposition (Apr/May-24)
➢ Decomposition is the process of breaking down in parts or elements.
➢ It replaces a relation with a collection of smaller relations.
➢ It breaks the table into multiple tables in a database.
➢ It should always be lossless, because it confirms that the information in the
original relation can be accurately reconstructed based on the decomposed
relations.
➢ If there is no proper decomposition of the relation, then it may lead to
problems like loss of information.
3. Common attribute must be a key for at least one relation (R1 or R2)
Att(R1) ∩ Att(R2) -> Att(R1) or Att(R1) ∩ Att(R2) -> Att(R2)
Example-
Consider the following relation R( A , B , C )- in Table 2.6
A B C
1 2 1
2 5 3
3 3 3
Consider this relation is decomposed into two sub relations R1(A, B) and R2(B, C)-
The two sub relations are- Table 2.7 & Table 2.8
A B
1 2
2 5
3 3
B C
2 1
5 3
3 3
A B C
1 2 1
2 5 3
3 3 3
Table 2.9 R1 ⋈ R2 = R
➢ This relation is same as the original relation R. Thus, we conclude that the
above decomposition is lossless join decomposition.
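The check can also be sketched in SQL: create the two decomposed relations,
populate them with the rows of Tables 2.7 and 2.8, and verify that their join
returns exactly the rows of R (the names follow the example above):
create table r1 (a int, b int);
create table r2 (b int, c int);
insert into r1 values (1, 2); insert into r1 values (2, 5); insert into r1 values (3, 3);
insert into r2 values (2, 1); insert into r2 values (5, 3); insert into r2 values (3, 3);
-- The join on the common attribute B returns (1,2,1), (2,5,3), (3,3,3) = the original R
select r1.a, r1.b, r2.c from r1 join r2 on r1.b = r2.b;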
2. Dependency Preservation
➢ Dependency is an important constraint on the database.
➢ Every dependency must be satisfied by at least one decomposed table.
➢ If {A → B} holds, then A and B are functionally dependent. Checking the
dependency is easier if both attribute sets are in the same relation.
➢ This decomposition property can only be done by maintaining the functional
dependency.
➢ In this property, it allows to check the updates without computing the natural
join of the database structure.
3. Lack of Data Redundancy
➢ Lack of Data Redundancy is also known as a Repetition of Information.
➢ The proper decomposition should not suffer from any data redundancy.
➢ The careless decomposition may cause a problem with the data.
➢ The lack of data redundancy property may be achieved by Normalization
process.
Normalization:
➢ It is the process of eliminating duplicate forms from same data and creating
the separate tables for each group of related data and also to identify each
row with a unique column or set of columns.
➢ The table has repeating groups; it is called an unnormalized table in Table
2.10.
➢ Relational databases require that each row only has a single value per
attribute, and so a repeating group in a row is not allowed.
➢ A relation is in first normal form if it meets the definition of a relation:
1. Each attribute (column) value must be a single value only.
2. All values for a given attribute (column) must be of the same type.
3. Each attribute (column) name must be unique.
4. No two tuples (rows) in a relation can be identical.
o Example:
Summary: 1NF
• A relation is in 1NF if it contains no repeating groups
• To convert an unnormalised relation to 1NF either
• Flatten the table and change the primary key or
• Decompose the relation into smaller relations, one for the repeating groups
and one for the non-repeating groups.
• Remember to put the primary key from the original relation into both new
relations.
• This option is liable to give the best results.
➢ A relation is in 2NF if and only if, it is in 1NF and every non-key attribute is
fully functionally dependent on the whole key (primary key). (i.e) All
attributes are fully dependent on primary key.
➢ A relation is in second normal form (2NF) if its non-prime attributes are fully
functionally dependent on primary key.
TEACHER table (Table 2.13):
TEACHER_ID SUBJECT TEACHER_AGE
25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38
➢ To convert the given table 2.13 & table 2.14 into 2NF, we decompose it into
two tables:
TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
Table 2.14 TEACHER_SUBJECT
Summary: 2NF
➢ A relation is in 2NF if it contains no repeating groups and no partial key
functional dependencies.
➢ Rule: A relation in 1NF with a single key field must be in 2NF
RegID RollNo Sname
101 1 AAA
102 2 BBB
103 3 CCC
104 4 DDD
Table 2.15 Transitive Dependency
Super keys
• {RegID}
• {RegID, RollNo}
• {RegID, Sname}
• {RollNo, Sname}
• {RegID, RollNo, Sname}
Candidate Keys
• {RegID}
• {RollNo}
Third Normal Form
➢ A table is said to be in the Third Normal Form when,
i) It is in the Second Normal form. (i.e. it does not have partial functional
dependency
ii) It doesn't have transitive dependency.
Or in other words 3NF can be defined as:
➢ A table is in 3NF if it is in 2NF and for each functional dependency x-> y at
least one of the following conditions holds:
i) X is a super key of table
ii) Y is a prime attribute of table in Table 2.16
➢ For example: Consider following table Student _details as follows -
• Rule: A relation in 2NF with only one non-key attribute must be in 3NF
• In a normalized relation a non-key field must provide a fact about the key,
the whole key and nothing but the key.
• Relations in 3NF are sufficient for most practical database design problems.
However, 3NF does not guarantee that all anomalies have been removed.
• There can be multiple teachers teaching one course; for example, course C can
be taught by both the teachers, namely Ankita and Archana.
• The candidate key for the above table can be (sid, course), because using
these columns we can uniquely identify each row.
• The above table holds following dependencies
(sid, course)->Teacher
Teacher->course
• The above table is not in BCNF because of the dependency teacher → course:
teacher is not a super key, or in other words, teacher is a non-prime attribute
and course is a prime attribute, and a non-prime attribute derives a prime
attribute.
• To convert the above table to BCNF we must decompose it into Student
and Course tables.
Student
sid Teacher
1 Ankita
1 Poonam
2 Ankita
3 Supriya
4 Archana
Student
Here sid =1 leads to multiple values for courses and skill. Following table
2.21 shows this
sid Course Skill
1 C English
1 C++ German
1 C German
1 C++ English
2 Java English
2 Java French
Here sid and course are dependent but the Course and Skill are
independent. The multivalued dependency is denoted as:
sid →>Course
sid →> Skill
Fourth Normal Form
Definition: For a table to satisfy the Fourth Normal Form, it should satisfy
the following two conditions:
1) It should be in the Boyce-Codd Normal Form (BCNF).
2) And, the table should not have any multi-valued dependency.
For example: Consider the following student relation, which is not in 4NF as it
contains a multivalued dependency.
Student Table
sid Course
1 C
1 C++
2 Java
Table 2.23 Student course table
Student skill table:
Key Skill
:
1 English
(sid,
1 German
skill
2
)id English
2 French
12. Explain Join Dependency and Fifth Normal Form (5 NF). (MAY 2013)
The fifth normal form is also called as project join normal form.
For example - Consider following table 2.25
Seller _Company
Seller Company
Rupali Godrej
Sharda Dabur
Sunil Amul
Sunil Britania
Table 2.26 Seller _Company
Seller product
Seller Product
Rupali Cinthol
Sharda Honey
Sharda HairOil
Sharda RoseWater
Sunil Icecream
Sunil Biscuits
dependency. But it is not in 5th normal form because, if we join the above two
tables, we may get Table 2.28.
Company_Product
Company Product
Godrej Cinthol
Dabur Honey
Dabur HairOil
Dabur RoseWater
Amul Icecream
Britania Biscuit
Table 2.30 Company_Product
Mapping Relationship
A relationship is an association among entities in Fig 2.21.
Mapping Process
• Add the primary keys of all participating Entities as fields of table with their
respective data types.
• If relationship has any attribute, add each attribute as field of table.
• Declare a primary key composing all the primary keys of participating entities.
• Declare all foreign key constraints.
Mapping Process
• Create tables for all higher-level entities.
• Create tables for lower-level entities.
• Add primary keys of higher-level entities in the table of lower-level entities.
• In lower-level tables, add all other attributes of lower-level entities.
• Declare primary key of higher-level table and the primary key for lower-level
table.
• Declare foreign key constraints.
14. Consider the following relational schemes for a library database: Book
(Title, Author, Catalog_no, Publisher, Year, Price) Collection (Title,
Author, Catalog_no) the following are functional dependencies:
(i) Title Author → Catalog_no
(ii) Catalog_no→ Title Author Publisher Year
(iii) Publisher Title Year → Price
(iv) Assume (Author Title) is the key for both schemes. Apply the
appropriate normal form for Book and Cancellation? (Apr/may 2024)
Answer:
Table Collection is in BCNF as there is only one functional dependency “Title Author
–> Catalog_no” and {Author, Title} is key for collection. Book is not in BCNF
because Catalog_no is not a key and there is a functional Dependency “Catalog_no
–> Title Author Publisher Year”. Book is not in 3NF because non-prime attributes
(Publisher Year) are transitively dependent on key [Title, Author]. Book is
in 2NF because every non-prime attribute of the table is either dependent on the
key [Title, Author], or on another non-prime attribute.
APRIL/MAY-2023
PART-A
1.What is a derived attribute? Give examples. Q52 Pg 12
2.What is functional dependency? Give examples. Q6 Pg 2
PART-B
1.Elaborate on first normal form, second normal form and third normal form with
examples. Q7,8,9 Pg40
2.Explain Boyce Codd normal form, fourth normal form and fifth normal form with
Examples. Q10,11,12 Pg52
PART C
1. Consider the following relations: Q10, Pg52
EMPLOYEE
ENO NAME DOB GENDER DCODE
12345 HAMEN 24-MAR-2001 M 201
12346 VINI 12-MAR-2001 F 202
12347 ANI 11-JAN-1999 F
12348 PETER 14-FEB-2001 M
DEPARTMENT
DCODE DNAME
201 COMPUTER SC
202 INFN SC
203 CIVIL
204 MECHANICAL
The Primary Key of each relation is underlined. Outline Cartesian product, equi join,
left outer join, right outer join and full outer join operations in relational algebra.
Illustrate the above relational algebra operation with the EMPLOYEE and
DEPARTMENT relations.
has to be kept track. The company also keeps track of the dependents of each
employee. The attributes of dependent include dependent name, date of birth,
gender and relationship with the employee.
(i) Model an Entity Relationship diagram for the above scenario (7)
(ii) Map the entity relationship diagram you have modeled to relations (8) Q4c,
Pg29
NOV/DEC-2023
PART-A
1. Define functional dependency. Q6 Pg2
2. Transform the following ER diagram into a relational schema diagram.
(Partial Participation Diagram)
PART-B
1. Consider the following schemas. The primary key for each relation is denoted by
the underlined attribute Q7, Pg40
LIVES (Person- name, street, city)
WORKS (Person-name company name, salary)
LOCATED-IN (Company-name, city)
MANAGES (Person-name, manager-name)
Write relational algebra expressions for the following queries:
(i) Find the name of all employees (ie, persons) who work for the City Bank
company (which is a specific
company in the database).
(ii) Find the name and city of all employees who work for City Bank.
(iii) Find the name, Street and city of all employees who work for City Bank and
earn more than $10,000.
(iv) Find all employees who live in the same city as the company they work for
(v) Find all persons who do not work for City Bank.
(vi) Find the second largest salary earned by the employee.
APRIL/MAY-2024
PART-A
1. Define Entity, Relationship and attributes in ER model. Q41 Pg 8
2. Why BCNF is preferred over 3NF? Q15, Q17 Pg 4
PART-B
1. What is normalizations? List its benefits and explain briefly about 3NF, 4NF and
BCNF with suitable example. Q6, 9, 10, 11 Pg 35
2. Illustrate functional dependency with an example. (6) Q4 Pg 27
3. Discuss about dependency preservation. Q5 Pg 34
PART-C
1. Consider the following relational schemes for a library database: Book (Title,
Author, Catalog_no, Publisher, Year, Price) Collection (Title, Author, Catalog_no)
the following are functional dependencies:
(i) Title Author Catalog_no
(ii) Catalog_no→ Title Author Publisher Year
(iii) Publisher Title Year → Price
(iv) Assume (Author Title) is the key for both schemes. Apply the appropriate
normal form for Book and Cancellation? Q14 Pg 53
NOV/DEC-2024
PART-A
1. Define single valued and multi valued attributes. Q50 Pg 10
2. What is BCNF? Q17 Pg 4
PART-B
1. Construct an ER diagram for Election Commission for online voting system as per
the information in the system: Voters' details, candidates and the political party
they belong or independent type, vote casted, winner of any particular election-
area. Q15 Pg 53
PART-C
1. Consider the following relational schemas: Employee (empno, name, office, age)
Books (ISBN, title, author, publisher) Borrow (empno, ISBN, issuedate) Write
relational algebra queries for the following:
(i) Find the names of employees who have borrowed a book published by
“Springer”.
(ii) Find the names of employees who have borrowed more than five different
books published by “PHI”.
(iii) For each publisher, find the names of employees who have borrowed more
than five books of that publisher. Q16 Pg 54
2. Consider a relation with schema R(A, B, C, D) and FD's AB -> C, C -> D, and
D -> A.
a) What are all the nontrivial FD's that follow from the given FD's? You should
restrict yourself to FD's with single attributes on the right side.
b) What are all the keys of R?
c) What are all the superkeys for R that are not keys? Q17 Pg 54
1. What are the ACID properties? (Nov/Dec 2020 & Apr/May 2021)
A - Atomicity
C - Consistency
I - Isolation
D - Durability
It is a set of properties that guarantee database transactions are processed reliably.
2. What are two pitfalls of lock-based protocols? (APRIL/MAY-2011)
● Deadlock
● Starvation
• R-timestamp(Q) denotes the largest timestamp of any transaction that executed
read(Q) successfully.
38. What are the facilities available in SQL for Database recovery?
(MAY/JUNE 2010)
● SQL supports transactions, and hence transaction-based recovery.
39. What are the benefits and disadvantages do strict two-phase locking
provide?
Benefits:
• Transaction only read values of committed transaction
• No cascaded aborts.
Disadvantages:
• Limited concurrency
• Deadlocks
40. List the two commonly used Concurrency Control techniques.
• Two phased Locking
• Time stamp-based ordering
41. List the SQL statements used for transaction control.
• Commit, Roll back, Savepoint.
42. What is shadow paging?
• An alternative to log-based crash recovery technique is shadow paging.
• This technique needs fewer disk accesses than the log-based methods. Here the
database is divided into pages that can be stored anywhere on the disk. In order
to identify the location of a given page we use a page table.
43. List the properties that must be satisfied by the transaction.
• Atomicity
• Consistency preservation
• Isolation
• Durability or permanency
44. What are the types of transparencies that a distributed database must
support?
1. Location transparency
2. Fragmentation transparency
3. Replication transparency
45. What are the three kinds of intent locks?
• Intent Exclusive
• Intent shared
• Shared-Intent Exclusive
46. Define serializability. (Apr/May 2023)
Serializability is a property of a system describing how different processes
operate on shared data. A system is serializable if its result is the same as if
the operations were executed in some sequential order, meaning there is no
overlap in execution.
47. How will you handle deadlock during two transactions in database?
(Apr/May 2024)
To handle a deadlock between two database transactions, the best approach is to
design the transactions to minimize the possibility of deadlocks: consistently
access resources in the same order, keep transactions short, use appropriate
isolation levels, and implement retry logic in the application code to
automatically handle deadlocks when they occur. If a deadlock is detected, the
system analyzes the situation to identify the involved transactions and rolls
one of them back so that the other can proceed.
PART – B
Collections of operations (processes) that form a single logical unit of work are
called transactions. (or) Transaction is an executing program that forms a logical
unit of database processing.
⮚ A transaction includes one or more database access operations. These can
include insertion, deletion, modification or retrieval operations.
⮚ The database operation can be embedded within an application or can be
specified by high level query language.
⮚ These operations executed between the begin transaction and end
transaction.
In a real database system, the write operation does not automatically result in the
immediate update of the data on the disk;
⮚ The write operation may be temporarily stored in memory and executed on
the disk later. For now, however, we shall assume that the write operation
updates the database immediately.
⮚ A transaction must be in one of the following states is shown in Fig 3.1.
✔ Active
✔ Partially committed
✔ Failed
✔ Aborted
✔ Committed
Active
✔ In this state, the transaction is being executed.
✔ This is the initial state of every transaction.
Partially Committed
When a transaction executes its final operation, it is said to be in a partially
committed state.
Aborted
If any of the checks fails and the transaction has reached a failed state, then the
recovery manager rolls back all its write operations on the database to bring the
database back to its original state where it was prior to the execution of the
transaction. Transactions in this state are called aborted.
The database recovery module can select one of the two operations after a
transaction aborts.
✔ Re-start the transaction
✔ Kill the transaction
Committed
If a transaction executes all its operations successfully, it is said to be
committed.
Example:
Let Ti be a transaction that transfer money from account A (5000) to account B. The
transaction can be defined as
Ti:
read (A)
A: = A – 5000 (withdraw from A)
write (A); (update A)
read (B) (deposit B)
B: = B + 5000
write (B) (update B)
ATOMICITY:
Either all operation of the transaction is reflected properly in the database, or
none are.
Example:
Consider before execution of transaction Ti, A=1000 and B=2000.
Suppose during the execution of transaction Ti , a failure occurs after the write(A)
operation but before write (B) operation the values of A & B reflected in the
database are A=950, B=2000 is an inconsistent state.
● To ensure atomicity, the database keeps track of the old values of any
data. This information is written to file called log.
● If the transaction does not complete its execution, the database system
restores the old values from the log.
● Responsibility
● Ensuring the atomicity is the responsibility of the database system
particularly recovery system.
CONSISTENCY:
Execution of the transaction in isolation (i.e.) with no other transaction
executing in parallel preserves the consistency.
Example:
The consistency requirement is the sum of A and B be unchanged by the execution
of transaction.
ISOLATION:
Even though multiple transactions may execute concurrently, the system
guarantees that for every pair of transaction Ti and Tj it appears to Ti that either Tj
finished execution before Ti started execution or after Ti finished.
● Each transaction in unaware of other transaction executing
concurrently in the system.
Example:
Consider that the database is temporarily inconsistent while the transaction to
transfer funds from A to B is executing, with the amount already deducted from A
but not yet added to B.
DURABILITY:
After a transaction completes successfully, the changes made to the database
persists, even if there are system failures.
● Durability ensures that Updates carried out by the transaction have been written
to disk before the transactions completes.
o Responsibility:
● Ensuring Durability is the responsibility of a component called the recovery
management.
The serializability of schedules is used to find non-serial schedules that allow the
transaction to execute concurrently without interfering with one another.
Example: 1
Example: 1
Non serial schedule S1
T1 T2
read(x)
write(x)
read(x)
write(x)
read(y)
write(y)
read(y)
write(y)
Serial schedule S3
T1 T2
read(x)
write(x)
read(y)
write(y)
read(x)
write(x)
read(y)
write(y)
Conflict serializability:
Conflict instruction:
The write (x) instruction of T1 conflicts with the read (x) instruction of T2.
❖ However, the write (x) instruction of T2 does not conflict with the read (y)
instruction of T1, because the two instructions access different data items.
❖ Swap the write (y) instruction of T1 with the write (x) instruction of T2.
❖ Swap the write (y) instruction of T1 with the read (x) instruction of T2.
T1 T2
read(x)
write(x)
read(y)
write(y)
read(x)
write(x)
read(y)
write(y)
The concept of conflict equivalence leads to the concept of conflict serializability.
● A schedule S is conflict serializable, if it is conflict equivalents to a serial
schedule.
● Thus, schedule S2 is conflict serializable, since it is conflict equivalent to the
serial schedule S1.
View serializability:
A schedule S is view serializable if it is view equivalent to a serial schedule.The
schedule S and S’ are said to be view equivalent if the following conditions met:
1. For each data item x, if transactions T1 reads the initial value of x in schedule
S, then transaction T1 must, in schedule S’, also read the initial value of x.
2. For each data item x, if transaction T1 executes read (x) in schedule S, and if
that value was produced by a write(x) operation executed by transaction T2,
then the read(x) operation of transaction T1 must, in schedule S’, also read the
value of x that was produced by the same write (x) operation of transaction
T2.
3. For each data item, x, the transaction that performs the final write(x)
operation in schedule S must perform the final write(x) operation in schedule
S’.
Example: Schedule1
T1 T2
read(x)
write(x)
read(y)
write(y)
read(x)
write(x)
read(y)
write(y)
Schedule2
T1 T2
read(x)
write(x)
read(x)
read(y) write(x)
write(y)
read(y)
write(y)
⮚ The concept of the view equivalent leads to the concept of view serializability.
An edge Ti → Tj exists in the precedence graph if and only if one of these three
conditions holds:
1. Ti executes write(Q) before Tj executes read(Q).
2. Ti executes read(Q) before Tj executes write(Q).
3. Ti executes write(Q) before Tj executes write(Q).
If the precedence graph has no cycles, it is conflict serializable is shown in fig 3.2
If the precedence graph has cycles, it is not conflict serializable is shown in fig 3.3
Example 1:
Example 2:
T1 T2
Read(A)
A:=A-50
Read(A)
Temp=A*0.1
A:=A-temp
Write(A)
Read(B)
Write(A) B:=B+temp
Read(B) Write(B)
B:=B+50
Write(B)
CONCURRENCY:
Concurrency allows many transactions to access the same database at the same
time. Process of managing simultaneous execution of transactions in a shared
database, to ensure the serializability of transactions, is known as concurrency
control.
● Process of managing simultaneous operations on the database without having
them interferes with one another.
● Prevents interference when two or more users are accessing database
simultaneously and at least one is updating data.
● Although two transactions may be correct in themselves, interleaving of
operations may produce an incorrect result.
Need for Concurrency Control:
1. Lost update problem
This problem occurs when two transactions that access the same database items
have their operations interleaved in a way that makes the value of some database
item incorrect. Successfully completed update is overridden by another user.
❖ Transaction A updates the tuple at time t3 and transaction B updates the same
tuple at time t4.
❖ So, Transaction A’s update is lost at time t4, because transaction B overwrites
it without even looking at it.
2. Uncommitted dependency (dirty read) problem:
This problem occurs when one transaction updates a database item and then the
transaction fails for some reason. The updated item is accessed by another
transaction before it is changed back to its original value.
● Occurs when one transaction can see intermediate results of another
transaction before it has committed.
● The uncommitted dependency problem arises if one transaction is allowed
to retrieve and update a tuple that has been updated by another
transaction but not yet committed by that other transaction.
● There is possibility that it never will be committed but will be rolled back, so
first transaction will have seen some data that no longer exists.
A Closer Look:
The operations in concurrency point of view are database retrievals and database
updates. The transaction is consisting of sequence of such operations (BEGIN
TRANSACTION and COMMIT or ROLLBACK).
A and B are concurrent transaction. The Problems can occur if A and B want to read
or write the same database object, say tuple t.
There are four possibilities:
● RR – A and B both want to read the same tuple t. Reads cannot interfere with each other, so
there is no problem in this case.
● RW – A reads t and then B wants to write t. If B is allowed to perform its write,
the inconsistent analysis problem can arise; inconsistent analyses are caused by RW conflicts.
If B does perform its write and A then reads t again, A will see a value different
from what it saw before; this state of affairs is referred to as a non-repeatable read, and it
is also caused by an RW conflict.
▪ WR: A writes t and then B wants to read it. If B is allowed to perform its
read, then the uncommitted dependency problem can arise. Uncommitted
dependencies are caused by WR conflicts.
▪ WW: A writes t and then B wants to write t. If B is allowed to perform its write,
then the lost update problem can arise; lost updates are caused by WW
conflicts. B's write, if it is allowed, is said to be a dirty write.
● Locking protocols restrict the number of possible schedules. The set of all
such schedules is a proper subset of all possible serializable schedules.
1) Shared Mode
⮚ It is denoted by S. If a transaction Ti holds a shared-mode lock on an item Q, then Ti can read, but cannot write, Q.
2) Exclusive Mode
⮚ It is denoted by X. If a transaction Ti holds an exclusive-mode lock on an item Q, then Ti can both read and write Q.
● To access a data item, transaction Ti must first lock that item. If the data
item is already locked by another transaction in an incompatible mode, the
concurrency-control manager will not grant the lock until all incompatible
locks held by other transactions have been released.
⮚ The compatibility relation between the two modes of locking appears in the
matrix in Figure 3.8. An element comp(A, B) of the matrix has the value true if
and only if mode A is compatible with mode B.
S X
S True False
X False False
Consider again the banking example. Let A and B be two accounts that are accessed
by transactions T1 and T2.
T1: lock-X(B);
read(B);
B: =B −50;
write(B);
unlock(B);
lock-X(A);
read(A);
A: =A + 50;
write(A);
unlock(A).
T2: lock-S(A);
read(A);
unlock(A);
lock-S(B);
read(B);
unlock(B);
display(A+B).
Suppose now that unlocking is delayed to the end of the transaction. Transaction
T3 corresponds to T1 with unlocking delayed. Transaction T4 corresponds to T2 with
unlocking delayed.
This two-phase locking protocol requires not only that locking be two phases
(Growing phase and Shrinking phase), but also that all exclusive-mode locks taken
by a transaction be held until that transaction commits. This is called strict two-
phase locking protocol.
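A lock manager that enforces the S/X compatibility matrix of Figure 3.8 together with the strict 2PL rule (all locks released only at commit or abort) can be sketched in a few lines. This is a minimal Python illustration, not a production design; the class and method names are assumptions made for the example.

COMPATIBLE = {('S', 'S'): True, ('S', 'X'): False,
              ('X', 'S'): False, ('X', 'X'): False}

class LockManager:
    def __init__(self):
        self.locks = {}          # item -> list of (txn, mode)

    def request(self, txn, item, mode):
        """Grant the lock only if it is compatible with every lock held by
        other transactions on the same item; otherwise the caller must wait."""
        for holder, held_mode in self.locks.get(item, []):
            if holder != txn and not COMPATIBLE[(held_mode, mode)]:
                return False                     # incompatible -> wait
        self.locks.setdefault(item, []).append((txn, mode))
        return True

    def release_all(self, txn):
        """Strict 2PL: a transaction releases all of its locks only at
        commit or abort time."""
        for item in list(self.locks):
            self.locks[item] = [(t, m) for t, m in self.locks[item] if t != txn]

lm = LockManager()
print(lm.request('T1', 'B', 'X'))   # True
print(lm.request('T2', 'B', 'S'))   # False: T2 must wait for T1's X lock
lm.release_all('T1')                # T1 commits, releasing its locks
print(lm.request('T2', 'B', 'S'))   # True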
A schedule in which the transactions participate is then serializable, and the only
equivalent serial schedule permitted has the transactions in order of their timestamp
values. This is called timestamp ordering (TO).
The algorithm associates with each database item X two timestamp (TS) values:
1. read_TS(X). The read timestamp of item X is the largest timestamp among all
the timestamps of transactions that have successfully read item X—that is,
read_TS(X) = TS(T), where T is the youngest transaction that has read X
successfully.
2. write_TS(X). The write timestamp of item X is the largest of all the timestamps
of transactions that have successfully written item X—that is, write_TS(X) = TS(T),
where T is the youngest transaction that has written X successfully.
The concurrency control algorithm must check whether conflicting operations violate
the timestamp ordering in the following two cases:
a. If read_TS(X) > TS(T) or if write_TS(X) > TS(T), then abort and roll back T and
reject the operation. This should be done because some younger transaction with a
timestamp greater than TS(T)—and hence after T in the timestamp ordering—has
already read or written the value of item X before T had a chance to write X, thus
violating the timestamp ordering.
b. If the condition in part (a) does not occur, then execute the write_item(X)
operation of T and set write_TS(X) to TS(T).
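Cases (a) and (b) above translate directly into code. The following is a minimal Python sketch, assuming TS, read_TS and write_TS are plain dictionaries; the function name and the sample values are illustrative only.

def write_item(T, X, TS, read_TS, write_TS):
    """Basic timestamp-ordering rule for a write_item(X) issued by T."""
    if read_TS.get(X, 0) > TS[T] or write_TS.get(X, 0) > TS[T]:
        return 'abort and roll back T'       # a younger transaction already used X
    write_TS[X] = TS[T]                      # otherwise execute the write
    return 'write executed'

TS = {'T1': 5, 'T2': 10}
read_TS, write_TS = {'X': 10}, {'X': 0}      # T2 has already read X
print(write_item('T1', 'X', TS, read_TS, write_TS))   # abort: read_TS(X) > TS(T1)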
These protocols for concurrency control keep copies of the old values of a data item
when the item is updated (written); they are known as multiversion concurrency
control because several versions (values) of an item are kept by the system.
● When a transaction requests to read an item, the appropriate version is
chosen to maintain the serializability of the currently executing schedule.
● One reason for keeping multiple versions is that some read operations that
would be rejected in other techniques can still be accepted by reading an older
version of the item to maintain serializability.
● When a transaction writes an item, it writes a new version and the old
version(s) of the item is retained. Some multiversion concurrency control
algorithms use the concept of view serializability rather than conflict
serializability.
In this method, several versions X1, X2, … , Xk of each data item X are maintained.
For each version, the value of version Xi and the following two timestamps
associated with version Xi are kept:
1. read_TS(Xi). The read timestamp of Xi is the largest of all the timestamps of
transactions that have successfully read version Xi.
2. write_TS(Xi). The write timestamp of Xi is the timestamp of the transaction
that wrote the value of version Xi.
In this multiple-mode locking scheme, there are three locking modes for an item—
read, write, and certify—instead of just the two modes (read, write).
● Hence, the state of LOCK(X) for an item X can be one of read-locked, write-
locked, certify-locked, or unlocked. In the standard locking scheme, with only
read and write locks, a write lock is an exclusive lock. The relationship
between read and write locks in the standard scheme is shown in fig 3.9.
In this scheme, updates in the transaction are not applied directly to the database
items on disk until the transaction reaches its end and is validated.
● During transaction execution, all updates are applied to local copies of the
data items that are kept for the transaction. At the end of transaction
execution, a validation phase checks whether any of the transaction’s
updates violate serializability.
1. Read phase. A transaction can read values of committed data items from the
database. However, updates are applied only to local copies (versions) of the data
items kept in the transaction workspace.
2. Validation phase. Checking is performed to ensure that serializability will not
be violated if the transaction updates are applied to the database.
3. Write phase. If the validation phase is successful, the transaction updates are
applied to the database; otherwise, the updates are discarded and the transaction is
restarted.
The optimistic protocol we describe uses transaction timestamps and also requires
that the write_sets and read_sets of the transactions be kept by the system.
Additionally, start and end times for the three phases need to be kept for each
transaction.
● The write_set of a transaction is the set of items it writes, and the read_set is
the set of items it reads.
● In the validation phase for transaction Ti, the protocol checks that Ti does not
interfere with any recently committed transactions or with any other
concurrent transactions that have started their validation phase.
The validation phase for Ti checks that, for each such transaction Tj that is either
recently committed or is in its validation phase, one of the following conditions
holds:
1. Transaction Tj completes its write phase before Ti starts its read phase.
2. Ti starts its write phase after Tj completes its write phase, and the read_set of Ti
has no items in common with the write_set of Tj.
3. Both the read_set and write_set of Ti have no items in common with the
write_set of Tj, and Tj completes its read phase before Ti completes its read phase.
When validating transaction Ti against each one of the transactions Tj, the first
condition is checked first since (1) is the simplest condition to check.
● If any one of these three conditions holds with each transaction Tj, there is no
interference and Ti is validated successfully.
● If none of these three conditions holds for any one Tj, the validation of
transaction Ti fails (because Ti and Tj may violate serializability) and so Ti is
aborted and restarted later because interference with Tj may have occurred.
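The three validation conditions can be checked mechanically. The following is a minimal Python sketch, assuming each transaction is represented by a dictionary holding its read_set, write_set and the start/end times of its phases; all field names are assumptions made for the example.

def validate(Ti, Tj):
    """Return True if Ti does not interfere with the committed or validating
    transaction Tj under one of the three conditions above."""
    # Condition 1: Tj finished its write phase before Ti started its read phase
    if Tj['write_end'] < Ti['read_start']:
        return True
    # Condition 2: Ti writes only after Tj's writes, and no read-write overlap
    if Ti['write_start'] > Tj['write_end'] and not (Ti['read_set'] & Tj['write_set']):
        return True
    # Condition 3: no read-write or write-write overlap, and Tj's reads finish first
    if not (Ti['read_set'] & Tj['write_set']) and \
       not (Ti['write_set'] & Tj['write_set']) and \
       Tj['read_end'] < Ti['read_end']:
        return True
    return False

Ti = {'read_set': {'x'}, 'write_set': {'y'},
      'read_start': 10, 'read_end': 14, 'write_start': 16}
Tj = {'read_set': {'x'}, 'write_set': {'x'},
      'read_end': 3, 'write_end': 5}
print(validate(Ti, Tj))   # True: Tj finished writing before Ti started reading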
The basic definition of snapshot isolation is that a transaction sees the data items
that it reads based on the committed values of the items in the database snapshot
(or database state) when the transaction starts.
● Snapshot isolation will ensure that the phantom record problem does not
occur, since the database transaction, or, in some cases, the database
statement, will only see the records that were committed in the database at
the time the transaction started.
● Any insertions, deletions, or updates that occur after the transaction starts will
not be seen by the transaction.
● In addition, snapshot isolation does not allow the problems of dirty read and
non-repeatable read to occur.
● In this scheme, read operations do not require read locks to be applied to the
items, thus reducing the overhead associated with two-phase locking.
● However, write operations do require write locks. Thus, for transactions that
have many reads, the performance is much better than 2PL.
When writes do occur, the system will have to keep track of older versions of the
updated items in a temporary version store, with the timestamps of when the
version was created.
● This is necessary so that a transaction that started before the item was written
can still read the value (version) of the item that was in the database snapshot
when the transaction started.
● To keep track of versions, items that have been updated will have pointers to
a list of recent versions of the item in the tempstore, so that the correct item
can be read for each transaction.
● Variations of this method have been used in several commercial and open-
source DBMSs, including Oracle and PostgreSQL.
All concurrency control techniques assume that the database is formed of a number
of named data items. A database item could be chosen to be one of the following:
■ A database record
■ A field value of a database record
■ A disk block
■ A whole file
■ The whole database
Granularity Level Considerations for Locking
The size of data items is often called the data item granularity. Fine granularity
refers to small item sizes, whereas coarse granularity refers to large item sizes.
Several tradeoffs must be considered in choosing the data item size.
● First, notice that the larger the data item size is, the lower the degree of
concurrency permitted. For example, if the data item size is a disk block, a
transaction T that needs to lock a single record B must lock the whole disk
block X that contains B, because a lock is associated with the whole data item
(block). If another transaction S wants to lock a different record C that resides
in the same block X, it is forced to wait.
● If the data item size were a single record instead of a disk block, transaction S
would be able to proceed, because it would be locking a different data item
(record).
● On the other hand, the smaller the data item size is, the more the number of
items in the database. Because every item is associated with a lock, the
system will have a larger number of active locks to be handled by the lock
manager.
● More lock and unlock operations will be performed, causing a higher overhead.
In addition, more storage space will be required for the lock table.
● For timestamps, storage is required for the read_TS and write_TS for each
data item, and there will be similar overhead for handling a large number of
items.
Since the best granularity size depends on the given transaction, it seems
appropriate that a database system should support multiple levels of granularity,
where the granularity level can be adjusted dynamically for various mixes of
transactions.
Figure 3.10 shows a simple granularity hierarchy with a database containing two
files, each file containing several disk pages, and each page containing several
records.
• This can be used to illustrate a multiple granularity level 2PL protocol, with
shared/exclusive locking modes, where a lock can be requested at any level.
Consider the following scenario, which refers to the example in Figure 3.10.
● Suppose transaction T1 wants to update all the records in file f1, and T1
requests and is granted an exclusive lock for f1. Then all of f1’s pages (p11
through p1n)—and the records contained on those pages—are locked in
exclusive mode.
● This is beneficial for T1 because setting a single file-level lock is more efficient
than setting n page level locks or having to lock each record individually.
● Now suppose another transaction T2 only wants to read record r1nj from page
p1n of file f1; then T2 would request a shared record-level lock on r1nj.
● However, the database system (that is, the transaction manager or, more
specifically, the lock manager) must verify the compatibility of the requested
lock with already held locks.
● One way to verify this is to traverse the tree from the leaf r1nj to p1n to f1 to
db. If at any time a conflicting lock is held on any of those items, then the lock
request for r1nj is denied and T2 is blocked and must wait. This traversal
would be fairly efficient.
● The idea behind intention locks is for a transaction to indicate, along the path
from the root to the desired node, what type of lock (shared or exclusive) it
will require from one of the node’s descendants.
1. Intention-shared (IS) indicates that one or more shared locks will be requested
on some descendant node(s).
2. Intention-exclusive (IX) indicates that one or more exclusive locks will be
requested on some descendant node(s).
3. Shared-intention-exclusive (SIX) indicates that the current node is locked in
shared mode but that one or more exclusive locks will be requested on some
descendant node(s).
The compatibility table of the three intention locks, and the actual shared and
exclusive locks, is shown in Figure 3.11.
The multiple granularity locking (MGL) protocol consists of the following rules:
1. The lock compatibility (Figure 3.11) must be adhered to.
2. The root of the tree must be locked first, and may be locked in any mode.
3. A node N can be locked by a transaction T in S or IS mode only if the parent
of N is already locked by T in either IS or IX mode.
4. A node N can be locked by T in X, IX, or SIX mode only if the parent of N is
already locked by T in either IX or SIX mode.
5. T can lock a node only if it has not unlocked any node (to enforce 2PL).
6. T can unlock a node N only if none of the children of N are currently locked by T.
● Rule 1 simply states that conflicting locks cannot be granted. Rules 2, 3, and 4
state the conditions when a transaction may lock a given node in any of the
lock modes.
● Rules 5 and 6 of the MGL protocol enforce 2PL rules to produce serializable
schedules.
Basically, the locking starts from the root and goes down the tree until the node that
needs to be locked is encountered, whereas unlocking starts from the locked node
and goes up the tree until the root itself is unlocked.
Figure 3.12 shows a possible serializable schedule for these three transactions.
Only the lock and unlock operations are shown.
● The notation <lock_type>(<item>) is used to display the locking operations in
the schedule. The multiple granularity level protocol is especially suited when
processing a mix of transactions that include (1) short transactions that access
only a few items (records or fields) and (2) long transactions that access
entire files.
● In this environment, less transaction blocking and less locking overhead are
incurred by such a protocol when compared to a single-level granularity
locking approach.
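Rule 1 of the MGL protocol (lock compatibility) can be illustrated with the standard IS/IX/S/SIX/X matrix, which is what Figure 3.11 normally shows. The sketch below is a minimal Python illustration; the function name and the scenario values are assumptions made for the example.

# Rows: lock already held on a node; columns: lock being requested.
COMPAT = {
    'IS':  {'IS': True,  'IX': True,  'S': True,  'SIX': True,  'X': False},
    'IX':  {'IS': True,  'IX': True,  'S': False, 'SIX': False, 'X': False},
    'S':   {'IS': True,  'IX': False, 'S': True,  'SIX': False, 'X': False},
    'SIX': {'IS': True,  'IX': False, 'S': False, 'SIX': False, 'X': False},
    'X':   {'IS': False, 'IX': False, 'S': False, 'SIX': False, 'X': False},
}

def can_grant(held_modes, requested):
    """A requested lock on a node is granted only if it is compatible with
    every lock currently held on that node by other transactions."""
    return all(COMPAT[h][requested] for h in held_modes)

# T1 holds X on file f1; T2 asks for IS on f1 while descending towards record r1nj
print(can_grant(['X'], 'IS'))    # False -> T2 is blocked, as in the scenario above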
Deadlock
⮚ The below figure 3.13 shows a deadlock involving two transactions, but
deadlocks involving three, four or more transactions are also possible.
⮚ In practice, however, deadlocks rarely involve more than two transactions.
The victim has failed and been rolled back through no fault of its own. Some
systems automatically restart such a transaction from the beginning.
Deadlock Handling:
✔ Timeouts
✔ Deadlock prevention
✔ Deadlock detection and recovery
Timeouts:
A transaction that requests a lock will wait for only a system defined period of time.
If the lock has not been granted within this period, the lock request times out. In
this case, the DBMS assumes the transaction may be deadlocked, even though it
may not be, and it aborts and automatically restarts the transaction.
Deadlock prevention:
Scheme 1:
1. Requires each transaction locks all its data items before it begins
execution.
2. Either all are locked in one step or none are locked.
Disadvantage of Scheme 1:
1. It is often hard to predict, before the transaction begins, what data
items need to be locked.
2. Data item utilization may be very low, since many of the data items may
be locked but unused for a long time.
Scheme 2:
Impose an ordering of all data items and to require that a transaction lock data
items only in a sequence consistent with the ordering.
Wait-die:
⮚ It is a non-preemptive technique.
⮚ When a transaction Ti requests a data item currently held by Tj, Ti is allowed to
wait only if it has a timestamp smaller than that of Tj (i.e., Ti is older than Tj).
Otherwise, Ti is rolled back (Ti dies).
Example:
⮚ Suppose that transaction T22, T23, T24 have timestamps 5, 10 and 15
respectively.
⮚ If T22 requests a data item held by T23, then T22 will wait.
⮚ If T24 requests a data item held by T23, then T24 will be rolled back.
Wound-wait:
⮚ It is a preemptive technique.
⮚ When a transaction Ti requests a data item currently held by Tj, Ti is allowed to
wait only if it has a timestamp larger than that of Tj (i.e., Ti is younger
than Tj). Otherwise, Tj is rolled back (Tj is wounded by Ti).
Example:
⮚ If T22 requests a data item held by T23, then the data item will be preempted from
T23 and T23 will be rolled back.
⮚ If T24 requests a data item held by T23, then T24 will wait.
Table 3.1 shows the Difference between Wait -Die and Wound - Wait
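The two prevention rules reduce to a simple timestamp comparison. A minimal Python sketch follows, using the T22/T23/T24 timestamps (5, 10, 15) from the examples above; the function names are illustrative.

def wait_die(ts_requester, ts_holder):
    """Non-preemptive: an older requester waits, a younger requester dies."""
    return 'wait' if ts_requester < ts_holder else 'roll back requester'

def wound_wait(ts_requester, ts_holder):
    """Preemptive: an older requester wounds (rolls back) the holder,
    a younger requester waits."""
    return 'roll back holder' if ts_requester < ts_holder else 'wait'

# T22, T23, T24 have timestamps 5, 10, 15 (smaller timestamp = older)
print(wait_die(5, 10))     # T22 requests from T23 -> 'wait'
print(wait_die(15, 10))    # T24 requests from T23 -> 'roll back requester'
print(wound_wait(5, 10))   # T22 requests from T23 -> 'roll back holder' (T23)
print(wound_wait(15, 10))  # T24 requests from T23 -> 'wait'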
Deadlock detection:
⮚ It can be described precisely in terms of a directed graph called a wait for
graph.
⮚ Graph consists of a pair G (V, E) where V is a set of vertices and E is set of
edges. Each element in set E of edges is an ordered pair Ti->Tj.
⮚ If Ti->Tj is in E, then there is a directed edge from transaction Ti to Tj, implying
that transaction Ti is waiting for transaction Tj to release a data item that it
needs.
⮚ When transaction Ti requests a data item currently being held by transaction
Tj, then edge Ti->Tj is inserted in the wait for graph.
⮚ This edge is removed only when transaction Tj is no longer holding a data item
needed by transaction Ti.
⮚ A deadlock exists in the system if and only if the wait-for graph contains a
cycle.
⮚ Each transaction involved in the cycle is said to be deadlocked.
If deadlocks occur frequently, then the detection algorithm should be invoked more
frequently than usual.
2. Deadlock recovery:
When a detection algorithm determines that a deadlock exists, the system
must recover from deadlock.
Actions need to be taken:
• Select a victim:
● Given a set of deadlocked transactions, determine which transaction to roll
back to break the deadlock.
● We should roll back the transaction that will incur the minimum cost.
Many factors determine the cost of rollback:
● Transaction execution time.
● No. of data items used by the transaction.
● More no. of data items needed by the transaction to complete.
● No. of transactions involved in rollback.
Starvation: Starvation occurs if the same transaction is always chosen as the victim,
so that it never completes its designated task.
12. Explain in detail about recovery and its types. (Nov/Dec 2019)
RECOVERY:
TRANSACTION RECOVERY:
Failures may be caused by disk failure, physical problems, and catastrophes: e.g.,
power failure, fire, or an overwritten disk.
For recovery purpose, the system needs to keep track of when the transaction
✔ starts,
✔ terminates and
✔ commits or aborts.
1. The transaction has committed only if it has entered the committed state.
2. The transaction has aborted only if it has entered the aborted state.
3. The transaction is said to have terminated if it has either committed or
aborted.
✔ The system maintains a log to keep track of all transactions that affect the
database.
✔ The log is kept on disk.
✔ It is affected only by a disk failure or catastrophic failure.
✔ The system keeps log records.
Log records:
T is transaction ID
✔ (Start_transaction, T) start transaction
✔ (Write_item, T, X, old_value, new_value) transaction write/update item x
✔ (Read_item, T, X) transaction read item X
✔ (Commit, T) complete operation of T
✔ (Abort, T) terminate transaction T
✔ The log file keeps track of every operation.
✔ If a system failure occurs, then on restart the system recovers by:
■ Redoing transactions that have already committed.
■ Undoing transactions that have not committed.
SYSTEM RECOVERY:
⮚ The system must be prepared to recover from local failures such as overflow
exception and also from global failure such as power outage.
⮚ A local failure affects only the transaction which the failure has occurred, but
global failure affects all the transactions in progress at the time of the failure.
⮚ The particular state of any transaction that was in progress at the time of the
failure is therefore no longer known;
o Transactions that had not successfully completed must be undone – i.e.,
rolled back – when the system restarts.
o Transactions that had successfully completed but whose updates had not yet
reached the database must be redone at restart time.
It should be clear that when the system is restarted, transactions of types T3 and
T5 must be undone, and transactions of types T2 and T4 must be redone.
Transactions of type T1 do not enter into the restart process at all, because their
updates were forced to the database at time tc as part of the checkpoint process.
At restart time, the system first goes through the following procedure:
1. Start with two lists of transactions, the UNDO list and the REDO list.
2. Set the UNDO list equal to the list of all transactions given in the most recent
checkpoint record, and set the REDO list to empty.
3. If a BEGIN TRANSACTION log record is found for transaction T, add T to the
UNDO list.
4. If a COMMIT log record is found for transaction T, move T from the UNDO list to
the REDO list.
5. When the end of the log is reached, the UNDO and REDO lists identify the
transactions that must be undone and redone, respectively.
Transactions T3 and T5 must be undone; transactions T2 and T4 must be redone.
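Steps 1 to 5 of this restart procedure can be expressed as a short log scan. The following is a minimal Python sketch; the log representation and the example records are assumptions chosen to reproduce the UNDO = {T3, T5}, REDO = {T2, T4} outcome described above.

def restart_lists(log, checkpoint_active):
    """Build the UNDO and REDO lists by scanning the log forward from the
    most recent checkpoint. `checkpoint_active` lists the transactions
    recorded as active at the checkpoint; `log` is the portion of the log
    after the checkpoint, as (record_type, txn) pairs."""
    undo = set(checkpoint_active)          # step 2
    redo = set()
    for record_type, txn in log:           # steps 3 and 4
        if record_type == 'BEGIN':
            undo.add(txn)
        elif record_type == 'COMMIT':
            undo.discard(txn)
            redo.add(txn)
    return undo, redo                      # step 5

# T2 and T4 commit after the checkpoint; T3 and T5 are still in progress at failure
log = [('BEGIN', 'T4'), ('BEGIN', 'T5'), ('COMMIT', 'T2'), ('COMMIT', 'T4')]
print(restart_lists(log, ['T2', 'T3']))    # ({'T3', 'T5'}, {'T2', 'T4'})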
MEDIA RECOVERY:
A media failure is a failure such as a disk head crash or a disk controller failure, in
which some portion of the database has been physically destroyed.
⮚ Recovery from such a failure basically involves reloading the database from a
backup copy and then using the log to redo all transactions that completed
since that backup copy was taken.
⮚ There is no need to undo transactions that were still in progress at the time of
the failure, since by definition all updates of such transactions have been
undone anyway.
• Provisions must be made for undoing the effect of update operations that have
been applied to the database by a failed transaction. This is accomplished by
rolling back the transaction and undoing the effect of the transaction’s
write_item operations is shown in fig 3.18.
• Therefore, the UNDO-type log entries, which include the old value of the
item, must be stored in the log. Because UNDO can be needed during
recovery, these methods follow a steal strategy for deciding when updated
main memory buffers can be written back to disk.
1. If the recovery technique ensures that all updates of a transaction are recorded in
the database on disk before the transaction commits, there is never a need to REDO
any operations of committed transactions. This is called the UNDO/NO-REDO
recovery algorithm.
In this method, all updates by a transaction must be recorded on disk before the
transaction commits, so that REDO is never needed. Hence, this method must utilize
the steal/force strategy for deciding when updated main memory buffers are
written back to disk.
2. If the transaction is allowed to commit before all its changes are written to the
database, we have the most general case, known as the UNDO/REDO recovery
algorithm. In this case, the steal/no-force strategy is applied.
This is also the most complex technique, but the most commonly used in practice.
● Assume that the log includes checkpoints and that the concurrency control
protocol produces strict schedules—as, for example, the strict two-phase
locking protocol does.
● Recall that a strict schedule does not allow a transaction to read or write an item
unless the transaction that wrote the item has committed. However, deadlocks
can occur in strict two-phase locking, thus requiring abort and UNDO of
transactions.
Shadow paging considers the database to be made up of a number of fixed size disk
pages (or disk blocks)—say, n—for recovery purposes. A directory with n entries is
constructed, where the ith entry points to the ith database page on disk.
● The directory is kept in main memory if it is not too large, and all references—
reads or writes—to database pages on disk go through it.
● When a transaction begins executing, the current directory—whose entries
point to the most recent or current database pages on disk—is copied into a
shadow directory.
● The shadow directory is then saved on disk while the current directory is used
by the transaction. During transaction execution, the shadow directory is
never modified.
● When a write_item operation is performed, a new copy of the modified
database page is created, but the old copy of that page is not overwritten.
Instead, the new page is written
elsewhere on some previously unused disk block.
● The current directory entry is modified to point to the new disk block, whereas
the shadow directory is not modified and continues to point to the old
unmodified disk block.
Fig 3.19 illustrates the concepts of shadow and current directories. For pages
updated by the transaction, two versions are kept. The old version is referenced by
the shadow directory and the new version by the current directory.
To recover from a failure during transaction execution, it suffices to discard the
modified database pages and the current directory. The state of the database before
transaction execution is available through the shadow directory, and that state is
recovered by reinstating the shadow directory.
● The database thus is returned to its state prior to the transaction that was
executing when the crash occurred, and any modified pages are discarded.
Committing a transaction corresponds to discarding the previous shadow
directory.
● Since recovery involves neither undoing nor redoing data items, this technique
can be categorized as a NO-UNDO/NO-REDO technique for recovery.
● One disadvantage of shadow paging is that the location of updated database
pages changes on disk. This makes it difficult to keep related database pages
close together on disk without complex storage-management strategies.
Furthermore, if the directory is large, the overhead of writing shadow directories
to disk as transactions commit is significant. A further complication is how to
handle garbage collection when a transaction commits.
● The old pages referenced by the shadow directory that have been updated
must be released and added to a list of free pages for future use.
● These pages are no longer needed after the transaction commits. Another
issue is that the operation to migrate between current and shadow directories
must be implemented as an atomic operation.
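The NO-UNDO/NO-REDO behaviour of shadow paging can be illustrated with a few lines of code. This is a minimal Python sketch of the idea (copy-on-write pages, shadow versus current directory); the class and method names are assumptions, not an actual DBMS interface.

class ShadowPagedDB:
    """Minimal sketch of shadow paging: the shadow directory keeps pointing
    at the old pages, while the current directory is updated copy-on-write."""
    def __init__(self, pages):
        self.disk = list(pages)                 # simulated disk blocks
        self.current = list(range(len(pages)))  # current directory
        self.shadow = None

    def begin(self):
        self.shadow = list(self.current)        # save the directory as the shadow

    def write_page(self, i, value):
        self.disk.append(value)                 # new copy on an unused block
        self.current[i] = len(self.disk) - 1    # current directory points to it

    def commit(self):
        self.shadow = None                      # commit = discard the shadow directory

    def abort(self):
        self.current = self.shadow              # reinstate the shadow directory
        self.shadow = None

db = ShadowPagedDB(['p0', 'p1'])
db.begin()
db.write_page(1, 'p1-new')
db.abort()                                      # NO-UNDO/NO-REDO: old state restored
print([db.disk[b] for b in db.current])         # ['p0', 'p1']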
1. ARIES is based on three concepts: write-ahead logging, repeating history during
redo, and logging changes during undo. The first concept, write-ahead logging,
requires that the log record describing a change be written to stable storage before
the change itself is applied to the database.
2. The second concept, repeating history, means that ARIES will retrace all actions
of the database system prior to the crash to reconstruct the database state when
the crash occurred. Transactions that were uncommitted at the time of the crash
(active transactions) are undone.
3. The third concept, logging during undo, will prevent ARIES from repeating the
completed undo operations if a failure occurs during recovery, which causes a
restart of the recovery process.
The ARIES recovery procedure consists of three main steps:
● Analysis
● REDO and
● UNDO.
The Analysis step identifies the dirty (updated) pages in the buffer and the set of
transactions active at the time of the crash. The appropriate point in the log where
the REDO operation should start is also determined.
The REDO phase actually reapplies updates from the log to the database.
Generally, the REDO operation is applied only to committed transactions. However,
this is not the case in ARIES.
The actual buffers may be lost during a crash, since they are in main memory.
Additional tables stored in the log during checkpointing (Dirty Page Table,
Transaction Table) allow ARIES to identify this information.
Finally, during the UNDO phase, the log is scanned backward and the operations of
transactions that were active at the time of the crash are undone in reverse order.
The information needed for ARIES to accomplish its recovery procedure includes the
log, the Transaction Table, and the Dirty Page Table. Additionally, checkpointing is
used. These tables are maintained by the transaction manager and written to the log
during checkpointing.
● In ARIES, every log record has an associated log sequence number (LSN)
that is monotonically increasing and indicates the address of the log record on
disk. Each LSN corresponds to a specific change (action) of some transaction.
In addition to the log, two tables are needed for efficient recovery:
The Transaction Table and the Dirty Page Table, which are maintained by the
transaction manager.
● The Transaction Table contains an entry for each active transaction, with
information such as the transaction ID, transaction status, and the LSN of the
most recent log record for the transaction.
● The Dirty Page Table contains an entry for each dirty page in the DBMS
cache, which includes the page ID and the LSN corresponding to the earliest
update to that page.
Checkpointing in ARIES consists of the following:
● This special file is accessed during recovery to locate the last checkpoint
information.
● With the end_checkpoint record, the contents of both the Transaction Table
and Dirty Page Table are appended to the end of the log.
Consider the recovery example shown in Figure 3.20. There are three
transactions:
● T1, T2, and T3. T1 updates page C, T2 updates pages B and C, and T3
updates page A. Figure 3.20(a) shows the partial contents of the log, and
Figure 3.20(b) shows the contents of the Transaction Table and Dirty Page
Table.
● Now, suppose that a crash occurs at this point. Since a checkpoint has
occurred, the address of the associated begin_checkpoint record is retrieved,
which is location 4. The analysis phase starts from location 4 until it reaches
the end.
● The end_checkpoint record contains the Transaction Table and Dirty Page
Table in Figure 3.20(b), and the analysis phase will further reconstruct these
tables. When the analysis phase encounters log record 6, a new entry for
transaction T3 is made in the Transaction Table and a new entry for page A is
made in the Dirty Page Table.
● Figure 3.20(c) shows the two tables after the analysis phase.
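A very simplified sketch of the analysis phase is given below in Python. The log record layout, field names and the extra record for T2 are assumptions made for the illustration; only log record 6 (T3 updating page A after the checkpoint at location 4) follows the example above.

def aries_analysis(log, checkpoint_lsn, txn_table, dirty_page_table):
    """Very simplified sketch of the ARIES analysis phase: scan the log
    forward from the last begin_checkpoint and extend the Transaction Table
    and Dirty Page Table (normally read from the end_checkpoint record).
    Log records are (lsn, txn, rec_type, page) tuples; this layout is
    illustrative, not the real ARIES record format."""
    for lsn, txn, rec_type, page in log:
        if lsn <= checkpoint_lsn:
            continue                                  # start after the checkpoint
        if rec_type == 'update':
            txn_table[txn] = {'status': 'active', 'last_lsn': lsn}
            dirty_page_table.setdefault(page, lsn)    # keep the earliest update LSN
        elif rec_type == 'commit':
            txn_table.setdefault(txn, {})['status'] = 'committed'
            txn_table[txn]['last_lsn'] = lsn
    return txn_table, dirty_page_table

# Log record 6 (T3 updating page A) follows the example; record 7 is assumed.
log = [(6, 'T3', 'update', 'A'), (7, 'T2', 'update', 'C')]
print(aries_analysis(log, 4, {}, {}))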
TRANSACTION SUPPORT IN SQL:
Every transaction has certain characteristics attributed to it, which are specified by a
SET TRANSACTION statement in SQL. The characteristics are the access mode, the
diagnostic area size, and the isolation level.
• The access mode can be specified as READ ONLY or READ WRITE. The
default is READ WRITE, unless the isolation level of READ UNCOMMITTED is
specified in which case READ ONLY is assumed. A mode of READ WRITE
allows select, update, insert, delete, and create commands to be executed. A
mode of READ ONLY, as the name implies, is simply for data retrieval.
• The isolation level option is specified using the statement ISOLATION LEVEL
<isolation>, where the value for <isolation> can be READ UNCOMMITTED,
READ COMMITTED, REPEATABLE READ, or SERIALIZABLE. The default
isolation level is SERIALIZABLE, although some systems use READ
COMMITTED as their default. The use of the term SERIALIZABLE here is based
on not allowing violations that cause dirty read, unrepeatable read, and
phantoms.
If the isolation level is lower than SERIALIZABLE, then one or more of the following three violations may occur:
1. Dirty read. A transaction T1 may read the update of a transaction T2, which has
not yet committed. If T2 fails and is aborted, then T1 would have read a value that
does not exist and is incorrect.
2. Nonrepeatable read. A transaction T1 may read a given value from a table. If
another transaction T2 later updates that value and T1 reads that value again, T1
will see a different value.
3. Phantoms. A transaction T1 may read a set of rows from a table, perhaps based
on some condition specified in the SQL WHERE-clause. Now suppose that a
transaction T2 inserts a new row r that also satisfies the WHERE-clause condition
used in T1, into the table used by T1.
• The record r is called a phantom record because it was not there when T1
starts but is there when T1 ends. T1 may or may not see the phantom, a row
that previously did not exist. If the equivalent serial order is T1 followed by
T2, then the record r should not be seen;
• But if it is T2 followed by T1, then the phantom record should be in the result
given to T1. If the system cannot ensure the correct behavior, then it does not
deal with the phantom record problem.
Fig 3.21 summarizes the possible violations for the different isolation levels. An
entry of Yes indicates that a violation is possible and an entry of No indicates that it
is not possible. READ UNCOMMITTED is the most forgiving, and SERIALIZABLE is the
most restrictive in that it avoids all three of the problems mentioned above.
A sample SQL transaction might look like the following:
EXEC SQL WHENEVER SQLERROR GOTO UNDO;
EXEC SQL SET TRANSACTION
READ WRITE
DIAGNOSTIC SIZE 5
ISOLATION LEVEL SERIALIZABLE;
EXEC SQL INSERT INTO EMPLOYEE (Fname, Lname, Ssn, Dno, Salary)
VALUES ('Robert', 'Smith', '991004321', 2, 35000);
EXEC SQL UPDATE EMPLOYEE
SET Salary = Salary * 1.1 WHERE Dno = 2;
EXEC SQL COMMIT;
GOTO THE_END;
UNDO: EXEC SQL ROLLBACK;
THE_END: ...;
The above transaction consists of first inserting a new row in the EMPLOYEE table
and then updating the salary of all employees who work in department 2. If an error
occurs on any of the SQL statements, the entire transaction is rolled back. This
implies that any updated salary (by this transaction) would be restored to its
previous value and that the newly inserted row would be removed.
APR/MAY-23
PART-A
1. Define serializability. Q46 pg.8
2. Name the four conditions for deadlock. Q27 pg.5
PART-B
1. What is a transaction? list and explain ACID properties with an example.
Q1,2 pg.9
2. Outline the two-phase locking protocol with an example. Q7 pg.28
3. What is recovery? Outline the steps in the algorithm for recovery and isolation
Exploiting semantics (ARIES) algorithm with an example Q15 pg.55
NOV/DEC-23
PART-A
1. State ACID Properties. Q1 pg.1
PART-B
1. Explain the concepts of serial, non-serial and conflict-serialızable schedules with
examples. Q4 pg.14
APR/MAY 2024
PART-A
1. List the properties of transactions. Q5 pg.1
2. How will you handle deadlock during two transactions in database? Q47 pg.7
PART-B
1. Demonstrate conflict serializability and view serializability. Q4 pg.14
NOV/DEC 2024
PART-A
1. State the need for concurrency control. Q11 pg.2
PART-B
1. Write briefly about the states and desirable properties on transactions in a
database. Q2 pg.10
The time for repositioning the arm is called the seek time, and it
increases with the distance that the arm must move.
The average seek time is the average of the seek times, measured over
a sequence of random requests.
The average latency time of the disk is one-half the time for a full
rotation of the disk.
The data-transfer rate is the rate at which data can be retrieved from
or stored to the disk.
The mean time to failure is the amount of time that the system could
run continuously without failure.
File systems that support log disks are called journaling file systems.
• If one of the disks fails the data can be read from the other. Data will
be lost if the second disk fails before the first failed disk is repaired.
The mean time to repair is the time it takes to replace a failed disk
and to restore the data on it.
Data striping consists of splitting the bits of each byte across multiple
disks. This is called bit-level striping.
Block level striping stripes blocks across multiple disks. It treats the
array of disks as a large disk, and gives blocks logical numbers.
25. What are the factors to be taken into account when choosing a
RAID level?
• Monetary cost of extra disk storage requirements.
• Performance requirements in terms of number of I/O operations
• Performance when a disk has failed.
• Performances during rebuild.
RAID level 1 is the RAID level of choice for many applications with
moderate storage requirements and high I/O requirements. RAID 1
follows mirroring and provides best write performance.
30. What are the ways in which the variable-length records arise in
database Systems?
• Storage of multiple record types in a file.
• Record types that allow variable lengths for one or more fields.
• Record types that allow repeating fields.
33. What are the two types of blocks in the fixed-length
representation? Define them.
• Anchor block: contains the first records of a chain.
• Overflow block: contains records other than those that are the first records of a chain.
In the heap file organization, any record can be placed anywhere in the
file where there is space for the record. There is no ordering of records.
There is a single file for each relation.
39. What are the two types of ordered indices? DEC 2009
• Primary (clustering) index
• Secondary (non-clustering) index
41. What are the techniques to be evaluated for both ordered indexing
and hashing? (Nov/Dec-20)
• Access types
• Access time
• Insertion time
• Deletion time
• Space overhead.
The files that are ordered sequentially with a primary index on the
search key are called index-sequential files.
A B+ tree index takes the form of a balanced tree in which every path
from the root of the tree to a leaf of the tree is of the
same length.
Static hashing
Static hashing uses a hash function in which the set of bucket adders is
fixed. Such hash functions cannot easily accommodate databases that
grow larger over time.
Dynamic hashing
Dynamic hashing allows the hash function to be modified dynamically to
accommodate the growth or shrinkage of the database.
A hash index organizes the search keys, with their associated pointers,
into a hash file structure.
The set of buckets is fixed, and there are no overflow chains. Instead, if a
bucket is full, the system inserts records in some other bucket in the initial
set of buckets.
The query execution engine takes a query evaluation plan, executes that
plan, and returns the answers to the query.
• Selection operation
• Join operations.
• Sorting.
• Projection
• Set operations
• Aggregation.
Block nested loop join is the variant of the nested loop join where every
block of the inner relation is paired with every block of the outer
relation. Within each pair of blocks, every tuple in one block is paired with
every tuple in the other block to generate all pairs of tuples.
The system repeats the splitting of the input until each partition of the
build input fits in the memory. Such partitioning is called recursive
partitioning.
Secondary index:
• An index whose search key specifies an order different from the
sequential order of the file. Also called non-clustering index.
Replication:
The system maintains several identical copies of the relation and stores
each replica at a different site.
77. What are structured data types? What are collection types, in
particular? DEC-2008
A B+ tree considers all the keys in nodes except the leaves as dummies. All
keys are duplicated in the leaves. This has the advantage that, as all the
leaves are linked together sequentially, the entire tree may be scanned
without visiting the higher nodes at all.
82. How does a B tree differ from B+ tree? Why is a B+ tree usually
preferred as an access structure to a data file? (NOV/DEC-2024)
✓ A B tree of order m (maximum number of children for each node) is a
tree which satisfies:
✓ Every node has at most m children.
✓ Every node (except root) has at least m⁄2 children.
✓ The root has at least two children if it is not a leaf node.
✓ All leaves appear in the same level, and carry information.
✓ non-leaf node with k children contains k−1 keys.
✓ In a B+ tree, in contrast to a B-tree, all records are stored at the leaf
level of the tree, only keys are stored in interior nodes.
✓ An advantage of B+ trees over B trees is that they allow you to pack more
pointers to other nodes by removing pointers to data, thus increasing
the fanout and thus decreasing the depth of the tree.
✓ In a B tree we may have to traverse the whole tree structure to get a result,
while it is easier to search in a B+ tree as the leaves form a linked list.
✓ Since the internal nodes in a B+ tree contain only pointers (indexes) to the
key values, it is very useful in implementing indexed sequential files.
✓ We can traverse easily in a B+ tree using the indices to reach the
required key.
PART-B
1. Briefly explain RAID and RAID levels.
• RAID stands for Redundant array of independent disks.
• It is a way of storing the same data in different places on
multiple hard disks to protect data in the case of a drive failure.
• RAID storage uses multiple disks in order to provide fault
tolerance, to improve overall performance, and to increase
storage capacity in a system.
• RAID allows you to store the same data redundantly (in
multiple paces) in a balanced way to improve overall
performance.
Synopsis:
1. RAID 0 - Striped disk array without fault tolerance
2. RAID 1: (Mirroring and duplexing)
3. RAID 2: (Error-correcting coding)
4. RAID 3: (Bit-interleaved parity)
5. RAID 4: (Dedicated parity drive)
6. RAID 5: (Block interleaved distributed parity)
7. RAID 6: (Independent data disks with double parity)
• record and then sort the file, and lastly, the updated record
is placed in the right place.
Advantage:
• Fast and efficient when dealing with large volume of data.
• In the file, every record has a unique id, and every page in a
file is of the same size as in below Fig 4.12.
• Good for group or batch transactions.
• Simple to implement
• Good for report generation, statistical computation and
inventory control.
Disadvantages:
• All new transaction should be sorted for a sequential file
processing
• Not good for interactive transactions
• In this method, records are stored in the file using the primary key.
An index value is generated for each primary key and mapped with the record.
This index contains the address of the record in the file Fig 4.16.
2. Hash Clusters:
• It is similar to the indexed cluster. In hash cluster, instead of storing the
records based on the cluster key, we generate the value of the hash key
for the cluster key and store the records with the same hash key value.
Index structure:
The first column of the database is the search key that contains a
copy of the primary key or candidate key of the table.
Indexing Methods:
From the Fig 4.18, The indices are usually sorted to make
searching faster. The indices which are sorted are known as
ordered indices.
Primary Index
➢ If the index is created on the basis of the primary key of the
table, then it is known as primary indexing.
➢ These primary keys are unique to each record and contain 1:1
relation between the records.
➢ As primary keys are stored in sorted order, the performance
of the searching operation is quite efficient.
➢ The primary index can be classified into two types:
i. Dense index
ii. Sparse index.
i. Dense index
o The dense index contains an index record for every search key
value in the data file. It makes searching faster.
o The number of records in the index table is same as the
number of records in the main table.
o It needs more space to store index record itself.
o The index records have the search key and a pointer to the
actual record on the disk as in Table 4.5.
Sparse index
o In the data file, index record appears only for a few items. Each item
points to a block as in Table 4.6.
If you want to find the record of roll 111 in the diagram, then it will
search the highest entry which is smaller than or equal to 111 in the
first level index. It will get 100 at this level.
Then in the second index level, again it does max (111) <= 111
and gets 110. Now using the address 110, it goes to the data
block and starts searching each record till it gets 111 as in Table
4.7.
This is how a search is performed in this method. Inserting,
updating, or deleting is also done in the same manner.
Types of Hashing:
Operations:
Insertion – When a new record is inserted into the table, the hash
function h generates a bucket address for the new record
based on its hash key K: Bucket address = h(K).
➢ Searching – When a record needs to be searched, the
same hash function is used to retrieve the bucket address for the
record.
➢ Updation – The data record that needs to be updated is
first searched using hash function, and then the data record is
updated.
➢ Deletion – If we want to delete a record, using the hash
function we will first fetch the record which is supposed to be
deleted.
i. Open Hashing
ii. Close Hashing
i. Open Hashing
When a hash function generates an address at which data is
already stored, then the next bucket will be allocated to it. This
mechanism is called as Linear Probing.
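The probing behaviour described above (if the generated address is already occupied, the next bucket is used) can be sketched as follows in Python; the function name and the bucket representation are assumptions made for the example.

def insert_linear_probing(buckets, key, record):
    """If the hash address is occupied, probe the following buckets until a
    free one is found (the mechanism the notes call linear probing)."""
    n = len(buckets)
    address = hash(key) % n
    for step in range(n):
        slot = (address + step) % n
        if buckets[slot] is None:
            buckets[slot] = (key, record)
            return slot
    raise RuntimeError('hash table full')

buckets = [None] * 8
insert_linear_probing(buckets, 101, 'rec-101')
insert_linear_probing(buckets, 109, 'rec-109')   # same address as 101, so the next bucket is used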
2. Dynamic Hashing.
For example:
The last two bits of 2 and 4 are 00. So it will go into bucket B0. The last two bits of 5
and 6 are 01, so it will go into bucket B1. The last two bits of 1 and 3 are 10, so it will
go into bucket B2. The last two bits of 7 are 11, so it will go into B3 as in Fig 4.26
Insert key 9 with hash address 10001 into the above structure:
o Since key 9 has hash address 10001, it must go into the first
bucket. But bucket B1 is full, so it will get split as in fig 4.27.
Dynamic hashing
o In this method, the performance does not decrease as the data
grows in the system. It simply increases the size of memory to
accommodate the data.
o In this method, memory is well utilized as it grows and shrinks
with the data. There will not be any unused memory lying.
o This method is good for the dynamic database where data
grows and shrinks frequently.
6. Briefly explain about B+ tree index and B Tree file with example.
(Apr/May-23) (Apr/May-24)
• B+ Tree is an advanced method of Indexed Sequential Access
Method (ISAM) file organization.
• It uses the same concept of key-index, but in a treelike structure.
• B+ tree is similar to binary search tree, but it can have more than two
leaf nodes.
• It stores all the records only at the leaf node.
• Intermediary nodes will have pointers to the leaf nodes.
• They do not contain any data/records.
The main goal of B+ tree is:
• Sorted Intermediary and leaf nodes
• Fast traversal and Quick Search
• No overflow pages
Searching a record in B+ Tree
• Suppose we want to search 65 in the below Fig 4.28(a) B+ tree structure.
• First, we will fetch for the intermediary node which will direct
to the leaf node that can contain record for 65.
• So, we follow the branch between the 50 and 75 nodes in the intermediary node.
• Then we will be redirected to the third leaf node at the end.
• Here DBMS will perform sequential search to find 65.
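The search walk just described (descend through the intermediary nodes, then scan the leaf sequentially) can be sketched in Python as below. The node representation (plain dictionaries) and the key values are assumptions chosen to mirror the 50/75 example; this is not how a DBMS actually stores B+ tree pages.

def bplus_search(node, key):
    """Follow the intermediary nodes down to a leaf, then scan the leaf.
    Internal nodes have 'keys' and 'children'; leaf nodes have 'keys' and
    'records' (illustrative structure)."""
    while 'children' in node:                    # descend to the correct leaf
        i = 0
        while i < len(node['keys']) and key >= node['keys'][i]:
            i += 1
        node = node['children'][i]
    for k, rec in zip(node['keys'], node['records']):   # sequential scan in the leaf
        if k == key:
            return rec
    return None

# Root with keys 50 and 75: searching 65 follows the branch between 50 and 75
leaf1 = {'keys': [10, 20], 'records': ['r10', 'r20']}
leaf2 = {'keys': [55, 65], 'records': ['r55', 'r65']}
leaf3 = {'keys': [80, 90], 'records': ['r80', 'r90']}
root = {'keys': [50, 75], 'children': [leaf1, leaf2, leaf3]}
print(bplus_search(root, 65))    # 'r65'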
Insertion in B+ tree:
• Suppose we have to insert a record 60 in the following example.
Delete in B+ tree:
• Suppose we have to delete 60 from the above example.
• What will happen in this case?
• We have to remove 60 from 4th leaf node as well as from the
intermediary node too.
• So we need to modify it have a balanced tree.
• After deleting 60 from above B+ tree and re-arranging nodes, it will
appear as below Fig 4.30.
• B tree index file is similar to B+ tree index files, but it uses binary search
concepts.
• In this method, each root will branch to only two nodes and
each intermediary node will also have the data.
• And leaf node will have lowest level of data.
• However, in this method also, records will be sorted.
• Since all intermediary nodes also have records, it reduces the
traversing till leaf node for the data.
• A simple B tree can be represented as below Fig 4.31
See the difference between this tree structure and the B+ tree for the same
example above. Here there are no pointer-only nodes:
• All the records are stored in all the nodes.
• If we need to insert any record, it will be done as B+ tree
index files, but it will make sure that each node will branch only
to two nodes.
• If there is not enough space in any of the node, it will
split the node and store the records as in Fig 4.32.
Example:
Select student name whose CGPA is greater than 9 in the following Student
table in Table 4.10
The major steps involved in query processing are depicted in the figure 4.33
below
SQL QUERY:
Parsing:
• In this step, the parser of the query processor module checks
the syntax of the query, the user’s privileges to execute the
query, the table names and attribute names, etc.
Scan each file block and test all records to see whether they satisfy the selection condition.
Cost = br block transfers + 1 seek, where br denotes the number of blocks containing records from relation r.
If the selection is on a key attribute, the scan can stop on finding the record:
Cost = (br/2) block transfers + 1 seek.
• This algorithm requires no indices and can be used with any kind of join
condition.
• It is expensive since it examines every pair of tuples in the two relations.
Merge Join
• In this operation, both the relations are sorted on their join attributes.
Then merged these sorted relations to join them.
• Only equijoin and natural joins are used.
• The cost of merge join is br + bs block transfers + ⌈br/bb⌉ + ⌈bs/bb⌉
seeks, plus the cost of sorting if the relations are unsorted.
Hash Join
• In this operation, the hash function h is used to partition tuples of both
the relations.
• h maps A values to {0, 1, ..., nh}, where A denotes the attributes of r
and s used in the join. The cost of hash join is:
• 3(br + bs) + 4·nh block transfers + 2(⌈br/bb⌉ + ⌈bs/bb⌉) seeks
• If the entire build input can be kept in main memory, no partitioning is
required and the cost estimate goes down to br + bs.
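Plugging assumed values into the cost formulas above gives a quick comparison. The numbers (br, bs, bb, nh) in the Python sketch below are illustrative assumptions, not figures from the notes, and the sorting cost for merge join is left out.

from math import ceil

br, bs = 400, 100       # blocks of relations r and s (assumed)
bb = 10                 # blocks read per seek (assumed)
nh = 5                  # number of hash partitions (assumed)

merge_join_transfers = br + bs
merge_join_seeks = ceil(br / bb) + ceil(bs / bb)
hash_join_transfers = 3 * (br + bs) + 4 * nh
hash_join_seeks = 2 * (ceil(br / bb) + ceil(bs / bb))

print(merge_join_transfers, merge_join_seeks)    # 500 transfers, 50 seeks
print(hash_join_transfers, hash_join_seeks)      # 1520 transfers, 100 seeks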
APR-MAY 2023
Part-A
Part-B
NOV-DEC 2023
Part A
1. Define B+ trees. Q4, Pg 3
2. Write the problems of executing two concurrent transactions.
Part B
1. Construct a B+ - tree for the following set of key values: Q6, Pg 47
(2,3,5,7,11, 17, 19, 23, 29, 31)
2. Assume that the tree is initially empty and values are added in ascending
order, Construct B+ - trees for the cases where the number of pointers
that will fit in one node is as follows
(1) Four (2) Six (3) Eight
3. Show the form of the tree after each of the following series of operations
(6)
(1) Insert 9 (2) Insert 10 (3) Delete 23 (4) Delete 19
APR-MAY 2024
Part A
1. What is hash based indexing? Q86, Pg 14
2. List three components of Query processor. Q59, Pg 9
Part B
1. Explain B+ trees. Discuss about this Dynamic Index Structure. Q6, Pg 39
2. Compare I/O costs for all File Organizations. Q2, Pg 18
Part C
1. Consider a B+-tree in which the maximum number of keys in a node is 5.
Calculate the minimum number of keys in any non-root node. Q6, Pg 39
Answer:
Given that the maximum number of keys in a node is 5, the number of pointers per node is 6.
There are 2 cases:
1. Non-root leaf node: it must contain at least ⌈(6−1)/2⌉ = 3 keys.
2. Non-root internal node: it must have at least ⌈6/2⌉ = 3 children, i.e., at least 2 keys.
NOV-DEC 2024
Part-A
1. What are the advantages of B+ tree indexing? Q82, Pg 13
2. Mention the use of query optimization. Q61, Pg 9
Part B
PART – A
1. What is DDB? (APR/ MAY 2024)
Distributed database (DDB) as a collection of multiple logically
interrelated databases distributed over a computer network, and a
distributed data base management system (DDBMS) as a software
system that manages a distributed database while making the
distribution transparent to the user.
23. Which security issue access control based on granting and revoking
privileges?
o In a discretionary access control system, each resource is assigned a
unique owner account, which is responsible for managing access to that
resource.
o The owner has the ability to grant or revoke access privileges, such as
30. What are the challenges in using public key encryption? (APR/MAY 2024)
The challenges associated with public key cryptography are:
It has been susceptible to attacks through spoofed or compromised
certification authorities.
Public key Encryption is vulnerable to Brute-force attack.
This algorithm also fails when the user lost his private key, then the
DAC                                              MAC
DAC stands for Discretionary Access Control.    MAC stands for Mandatory Access Control.
DAC is easier to implement.                     MAC is difficult to implement.
DAC is less secure to use.                      MAC is more secure to use.
DAC has complete trust in users.                MAC has trust only in administrators.
Information flow is impossible to control.      Information flow can be easily controlled.
DAC is supported by commercial DBMSs.           MAC is not supported by commercial DBMSs.
PART-B
1. Explain the concept of distributed database Management
system. Discuss in detail about the Distributed databases.
(APR-2019) (APR/MAY 2024)
A distributed database is a collection of multiple interconnected
databases, which are spread physically across various locations that
communicate via a computer network.
Features
• Databases in the collection are logically interrelated with each
other. Often, they represent a single logical database.
• Data is physically stored across multiple sites. Data in each site
can be managed by a DBMS independent of the other sites.
• The processors in the sites are connected via a network. They
do not have any multiprocessor configuration.
• A distributed database is not a loosely connected file system.
• A distributed database incorporates transaction processing, but it
is not synonymous with a transaction processing system.
Architectural Models
Some of the common architectural models are −
❖ Client - Server Architecture for DDBMS
❖ Peer - to - Peer Architecture for DDBMS
❖ Multi - DBMS Architecture
Disadvantages of no replication
• Poor availability of data.
• Slows down the query execution process, as multiple clients
are accessing the same server.
3. Partial replication
Partial replication means only some fragments are replicated from
the database. – as in fig 5.6
2. Vertical Fragmentation
Vertical fragmentation divides a relation(table) vertically into
groups of columns to create subsets of tables.
For example, let us consider that a University database keeps
records of all registered students in a Student table having the
following schema.
STUDENT
Regd_No Name Course Address Semester Fees Marks
Now, the fees details are maintained in the accounts section. In
this case, the designer will fragment the database as follows −
CREATE TABLE STD_FEES AS SELECT Regd_No, Fees FROM
STUDENT;
3. Hybrid Fragmentation
Hybrid fragmentation can be achieved by performing horizontal and
vertical partition together. Mixed fragmentation is group of rows and
columns in relation.
Advantages of Fragmentation
• Since data is stored close to the site of usage, efficiency of
the database system is increased.
• Local query optimization techniques are sufficient for most
queries since data is locally available.
• Since irrelevant data is not available at the sites, security and
privacy of the database system can be maintained.
Disadvantages of Fragmentation
• When data from different fragments are required, the access
speeds may be very low.
• In case of recursive fragmentations, the job of
reconstruction will need expensive techniques.
• Lack of back-up copies of data in different sites
may render the database ineffective in case of failure of a
site.
Types of Schedules
There are two types of schedules −
Step-1:
Parser:
During parse call, the database performs the following checks- Syntax check,
Semantic check and Shared pool check, after converting the query into relational
algebra.
1. Syntax check – concludes SQL syntactic validity.
Example: SELECT * FORM employee
Here error of wrong spelling of FROM is given by this check.
2. Semantic check – determines whether the statement is
meaningful or not. Example: query contains a table name which
does not exist is checked by this check.
3. Shared Pool check – Every query possesses a hash code during its
execution. This check determines whether the hash code already exists in the
shared pool; if it does, then the database will not take the additional steps
for optimization and execution.
Hard Parse and Soft Parse:
If there is a fresh query and its hash code does not exist in the shared pool, then
that query has to pass through the additional steps known as hard
parsing; otherwise, if the hash code exists, the query does not pass through the
additional steps.
It just passes directly to the execution engine (refer to the detailed diagram).
This is known as soft parsing.
Hard Parse includes following steps – Optimizer and Row source generation.
Step-2:
Optimizer:
During the optimization stage, the database must perform a hard parse at least once
for every unique DML statement and performs optimization during this parse.
The database never optimizes DDL unless it includes a DML component,
such as a sub-query, that requires optimization.
Step-3:
Execution Engine:
Finally, the execution engine runs the query and displays the required result.
Suppose a user executes a query. As we have learned, there are various methods of extracting the data from the database.
In SQL, suppose a user wants to fetch the records of the employees whose salary is greater than 10000.
For doing this, the following query is used:
SELECT EMP_NAME FROM EMPLOYEE WHERE
SALARY>10000;
Thus, to make the system understand the user query, it needs to be translated into the form of relational algebra. We can bring this query into relational algebra form as:
o πEmp_Name(σsalary>10000 (Employee))
o πEmp_Name(σsalary>10000 (πEmp_Name, Salary(Employee)))
Note that any inner projection must retain the Salary attribute, since the selection condition refers to it.
After translating the given query, we can execute each relational algebra
operation by using different algorithms. So, in this way, a query processing begins
its working.
Evaluation
For this, in addition to the relational algebra translation, it is required to annotate the translated relational algebra expression with instructions that specify how each operation is to be evaluated. Thus, after translating the user query, the system executes a query evaluation plan.
Optimization
The cost of the query evaluation can vary for different types of queries.
Although the system is responsible for constructing the evaluation plan, the user does not need to write the query efficiently.
Usually, a database system generates an efficient query evaluation plan that minimizes its cost. This task, performed by the database system, is known as Query Optimization.
For optimizing a query, the query optimizer should have an estimated cost
analysis of each operation. It is because the overall operation cost depends on
the memory allocations to several operations, execution costs, and so on.
Finally, after selecting an evaluation plan, the system evaluates the query and
produces the output of the query.
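Many relational systems expose the chosen evaluation plan through an EXPLAIN statement. The exact output is system specific; the statement below is only a sketch assuming a MySQL or PostgreSQL style dialect:
-- Shows the evaluation plan chosen by the optimizer for the query
EXPLAIN SELECT EMP_NAME FROM EMPLOYEE WHERE SALARY > 10000;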
Example:
SELECT LNAME, FNAME FROM EMPLOYEE WHERE SALARY > (SELECT MAX
(SALARY) FROM EMPLOYEE WHERE DNO=5);
It provides a mechanism for storage and retrieval of data other than the tabular relations model used in relational databases.
A NoSQL database does not use tables for storing data. It is generally used to store big data and to support real-time web applications.
➢ Document-Based Database:
• The document-based database is a nonrelational database. Instead of storing
the data in rows and columns (tables), it uses the documents to store the
data in the database. A document database stores data in JSON, BSON, or
XML documents.
• Documents can be stored and retrieved in a form that is much closer to the
data objects used in applications which means less translation is required to
use these data in the applications. In the Document database, the particular
elements can be accessed by using the index value that is assigned for faster
querying.
• Collections are groups of documents that store documents with similar contents. The documents in a collection do not all need to share the same schema, because document databases have a flexible schema.
• Document-oriented NoSQL database solutions include MongoDB,
CouchDB, Riak, Amazon SimpleDB, and Lotus Notes.
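Document stores such as MongoDB have their own APIs rather than SQL, but the idea of storing and querying a JSON document can be sketched even in a relational system that supports a JSON column type. The example below assumes PostgreSQL-style JSONB syntax; the table and field names are illustrative only:
CREATE TABLE student_docs (id SERIAL PRIMARY KEY, doc JSONB);
INSERT INTO student_docs (doc) VALUES ('{"name": "Aditi", "course": "B-Tech", "semester": 8}');
-- Retrieve one element of the document by its key
SELECT doc->>'name' FROM student_docs WHERE (doc->>'semester')::int = 8;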
➢ Key-Value Stores:
• A key-value store is a nonrelational database. The simplest form of a NoSQL
database is a key- value store. Every data element in the database is stored
in key-value pairs.
• The data can be retrieved by using a unique key allotted to each element
in the database. The values can be simple data types like strings and
numbers or complex objects.
• A key-value store is like a relational database with only two columns which
is the key and the value.
Example: Key-value NoSQL solutions include Dynamo, Redis, Riak, Tokyo
Cabinet/Tyrant, Voldemort, Amazon SimpleDB, and Oracle BDB.
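Building on the two-column analogy above, a key-value store can be sketched roughly as a relational table with only a key column and a value column (the table and key names below are assumptions):
CREATE TABLE kv_store (k VARCHAR(100) PRIMARY KEY, v TEXT);
INSERT INTO kv_store VALUES ('session:1001', '{"user": "aditi", "expires": "2025-01-01"}');
-- Retrieval is always by the unique key
SELECT v FROM kv_store WHERE k = 'session:1001';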
➢ Graph-Based databases:
• Graph-based databases focus on the relationship between the elements.
• It stores the data in the form of nodes in the database. The connections
between the nodes are called links or relationships.
• Graph-based NoSQL database solutions include Neo4J, Infinite Graph, and
FlockDB.
Availability:
Availability means that each read or write request for a data item will either
be processed successfully or will receive a message that the operation cannot
be completed.
Every non-failing node returns a response for all the read and write
requests in a reasonable amount of time.
The key word here is “every”. In simple terms, every node (on either side of
a network partition) must be able to respond in a reasonable amount of time.
Partition Tolerance:
Partition tolerance means that the system can continue operating even if the
network connecting the nodes has a fault that results in two or more
partitions, where the nodes in each partition can only communicate among themselves.
That means, the system continues to function and upholds its consistency
guarantees in spite of network partitions.
Network partitions are a fact of life. Distributed systems guaranteeing
partition tolerance can gracefully recover from partitions once the partition
heals.
The CAP theorem states that distributed databases can have at most two of
the three properties: consistency, availability, and partition tolerance.
As a result, database systems prioritize only two properties at a time.
Features:
Document Type Model: As we all know data is stored in documents rather
than tables or graphs, so it becomes easy to map things in many
programming languages.
Flexible Schema: The overall schema is very flexible; not all documents in a collection need to have the same fields.
Distributed and Resilient: Document data models are very much
dispersed which is the reason behind horizontal scaling and distribution of
data.
Manageable Query Language: In these data models the query language allows developers to perform CRUD (Create, Read, Update, Delete) operations on the data model.
Advantages:
Schema-less: These databases are very good at holding existing data at massive volumes because there are no restrictions on the format and structure of data storage.
Faster creation of documents and maintenance: It is very simple to create a document and, apart from this, maintenance requires almost nothing.
Open formats: It has a very simple build process that uses XML, JSON, and other such formats.
Built-in versioning: It has built-in versioning, which means that as documents grow in size they may also grow in complexity; versioning decreases conflicts.
Disadvantages:
Weak Atomicity: It lacks support for multi-document ACID transactions. A change in the document data model involving two collections requires running two separate queries, one for each collection; this is where it breaks the atomicity requirement.
Consistency Check Limitations: One can search collections and documents that are not connected to an author collection, but doing this might degrade database performance.
Security: Nowadays many web applications lack security, which in turn results in the leakage of sensitive data. So it becomes a point of concern; one must pay attention to web application vulnerabilities.
Features:
One of the simplest kinds of NoSQL data models.
For storing, retrieving, and removing data, key-value databases use simple functions (put, get, delete).
A query language is not present in key-value databases.
Built-in redundancy makes this database more reliable.
Advantages:
It is very easy to use. Due to the simplicity of the database, the values can be of any kind, or even of different kinds when required.
Its response time is fast due to its simplicity, provided that the surrounding environment is well built and optimized.
Key-value store databases are scalable vertically as well as horizontally.
Built-in redundancy makes this database more reliable.
Disadvantages:
As a query language is not present in key-value databases, queries cannot be ported from one database to another.
The key-value store database is not suited for refined querying. You cannot query the database without a key.
Row-Oriented Table:
A row-oriented table stores data record by record; for example, the record (04, Aditi, B-Tech, E & TC, 8) is stored together as one row. – as in Table 5.1
Column-Oriented Table:
A column-oriented table stores the values of each column together.
The picture given below represents nodes with properties and relationships represented by edges. – as in fig 5.15
14. Describe about Database Security with common threats and counter
measures in detail.
Security of databases refers to the array of controls, tools, and procedures
designed to ensure and safeguard confidentiality, integrity, and accessibility.
It is a technique for protecting and securing a database from intentional or
accidental threats.
Database security encompasses hardware parts, software parts, human
resources, and data.
To use security efficiently, appropriate controls are required; each control is designed with a specific goal and purpose for the system.
Insider Threats
Insider dangers are among the most frequent sources of security breaches to databases. They often occur when too many employees are granted access to privileged user credentials.
Human Error
The unintentional mistakes, weak passwords or sharing passwords, and other
negligent or uninformed behaviors of users remain the root causes of almost half
(49 percent) of all data security breaches.
Software Vulnerabilities
Hackers make their living by identifying and exploiting vulnerabilities in software such as database management software. The major database software vendors and open-source database management platforms release regular security patches to fix these weaknesses. However, failing to apply the patches on time can increase the risk of being hacked.
SQL/NoSQL Injection Attacks
A threat specific to databases is the insertion of arbitrary SQL or non-SQL attack strings into database queries delivered by web-based applications and HTTP headers. Companies that do not follow safe coding practices for web applications and do not conduct regular vulnerability tests are susceptible to these attacks.
Buffer Overflow
Buffer overflow happens when a program tries to copy more data into a memory block of fixed length than it can accommodate. Attackers may use the extra data, which is stored in adjacent memory addresses, as a foundation from which to launch attacks.
DDoS (DoS/DDoS) Attacks
In a denial-of-service (DoS) attack, the attacker overwhelms the targeted server (in this case, the database server) with such a large volume of requests that the server can no longer fulfill legitimate requests made by actual users. In most cases, the server becomes unstable or fails to function.
Malware
Malware is software designed to exploit vulnerabilities or cause harm to databases. Malware can enter through any device that connects to the database's network.
Attacks on Backups
Companies that do not protect backup data using the same rigorous controls
employed to protect databases themselves are at risk of cyber attacks on backups.
Control Measures for the Security of Data in Databases:
Authentication
Authentication is the process of confirming whether a user logs in only with
the rights granted to him to undertake database operations. A certain user
can only log in up to his privilege level, but he cannot access any other
sensitive data.
The ability to access sensitive data is restricted by the use of authentication.
For example, a mobile phone performs authentication by requesting a PIN,
fingerprint, or by face recognition. Similarly, a computer verifies a username
by requesting the appropriate password.
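As an illustrative sketch of creating an authenticated database user (the syntax follows the MySQL style and the user name and password are assumptions):
-- The user must present this password before being allowed to perform operations
CREATE USER 'ravi'@'localhost' IDENTIFIED BY 'Str0ngP@ss';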
Access Control
Database access control is a means of restricting access to sensitive company data to only those people (database users) who are authorized to access such data, while denying access to unauthorized persons. It is a key security concept that reduces risk to the business or organization.
Physical and logical access control are the two types of access control. Access
to campuses, buildings, rooms, and physical IT assets is restricted through
physical access control. Connections to computer networks, system files, and
data are restricted through logical access control.
The most well-known Database Access Control examples are:
− Discretionary Access Control (DAC)
− Mandatory Access Control (MAC)
− Role-Based Access Control (RBAC)
− Attribute-Based Access Control (ABAC)
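For example, Discretionary Access Control listed above is typically exercised in SQL through GRANT and REVOKE. A minimal sketch (the table and user names are assumptions):
-- The owner grants selected privileges at his or her discretion
GRANT SELECT, UPDATE ON STUDENT TO ravi;
-- The owner can withdraw a privilege later
REVOKE UPDATE ON STUDENT FROM ravi;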
Inference Control
Inference control in databases, also known as Statistical Disclosure Control
(SDC), is a discipline that aims to secure data so that it can be published
without disclosing sensitive information associated with specific individuals
among those to whom the data corresponds.
It prevents the user from completing any inference channel, thereby preventing sensitive information from being disclosed indirectly.
There are two kinds of inferences:
i. Identity disclosure
ii. Attribute disclosure
Flow Control
Distributed systems involve a large amount of data flow from one site to
another as well as within a site.
Flow control prevents data from being transferred in such a way that unauthorized agents can access it. A flow policy specifies the channels through which data is allowed to flow. It also defines security classes for data as well as for transactions.
Applying Statistical Method
Statistical database security focuses on the protection of sensitive individual values stored in so-called statistical databases, which are used for statistical purposes such as retrieving summaries of values based on categories. Such databases do not allow the retrieval of individual information.
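As a rough illustration, a statistical database permits summary queries over categories but rejects queries that would expose an individual value (the table and attribute names below are assumptions):
-- Allowed: returns only an aggregate over a category
SELECT AVG(Salary) FROM EMPLOYEE WHERE Dept = 'CSE';
-- Not allowed in a statistical database: would reveal one individual's value
SELECT Salary FROM EMPLOYEE WHERE Name = 'John';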
Account Level Privileges apply to the capabilities provided to the account itself
and can include:
− The CREATE SCHEMA or CREATE TABLE privilege, to create a schema or base
relation.
− The CREATE VIEW privilege.
− The ALTER privilege, to apply schema changes such as adding or
removing attributes from relations.
− The DROP privilege, to delete relations or views.
− The MODIFY privilege, to insert, delete, or update tuples.
− The SELECT privilege, to retrieve information from the database by using a SELECT query.
These privileges give the account holder varying degrees of control over the database and its contents. The DBA must carefully consider the level of access granted to each account to ensure the security and protection of sensitive information.
Relation Level Privileges are the second level of privileges, applied at the relation level. This includes base relations (tables) and virtual relations known as views.
A user who creates a database object such as a table or a view gets all privileges on that object.
This user is the holder of the owner account that is created for each relation, and this account also has the right to pass the privileges on to other users by GRANTing privileges to their accounts. – as in fig 5.16
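A minimal sketch of this privilege propagation (the relation and user names are assumptions; the syntax follows standard SQL):
-- The owner grants SELECT together with the right to pass it on
GRANT SELECT ON EMPLOYEE TO user1 WITH GRANT OPTION;
-- user1 may now grant the same privilege to another account
GRANT SELECT ON EMPLOYEE TO user2;
-- The owner can later take the privilege back
REVOKE SELECT ON EMPLOYEE FROM user1;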
The granting and revoking of discretionary privileges can be done by using a model known as the access matrix model. This model specifies the rights of each subject on each object.
For example, if Mary has only the read privilege on file1, she cannot modify file1. We can represent the privileges of each subject on each object in a matrix M whose rows represent subjects (users, accounts) and whose columns represent objects (relations, views). Each position M(i,j) in the matrix holds the types of privileges (read, write, update) that subject i holds on object j.
The RBAC methodology is based on a group of three primary rules that govern access to secured systems:
1. Role assignment: a subject can exercise a permission only if the subject has selected or been assigned a role.
2. Role authorization: a subject's active role must be authorized for that subject.
3. Permission authorization: a subject can exercise a permission only if the permission is authorized for the subject's active role.
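Most SQL systems support roles directly. A hedged sketch (the role and account names are assumptions; the syntax follows the Oracle/PostgreSQL style):
CREATE ROLE clerk;
-- Permissions are attached to the role, not to individual users
GRANT SELECT, INSERT ON account TO clerk;
-- Users acquire permissions only by being assigned the role
GRANT clerk TO mary;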
Student id: The student enters an input such as 12222222 OR 1=1 in the input field.
Now, this 1=1 condition holds true for every row, so all records are returned and essentially all the student data is compromised. The malicious user can also delete the student records in a similar fashion. Consider the following SQL query.
Query:
SELECT * FROM USER WHERE
USERNAME = '' AND PASSWORD = '';
Now the malicious user can use the always-true condition 1=1 in a clever manner to retrieve private and secure user information. So instead of the above-mentioned query, the following query, when executed, retrieves protected data that is not intended to be shown to users.
Query:
SELECT * FROM USER WHERE
(USERNAME = '' OR 1=1)
AND (PASSWORD = '' OR 1=1);
Since 1=1 always holds true, user data is compromised.
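The standard countermeasure is to keep user input out of the query text by using parameterized (prepared) statements, so that the input is bound as data and never parsed as SQL. A minimal sketch in MySQL-style SQL (the statement and variable names are assumptions):
PREPARE login_stmt FROM 'SELECT * FROM USER WHERE USERNAME = ? AND PASSWORD = ?';
SET @u = 'mary';
SET @p = 'secret';
-- The bound values cannot change the structure of the query
EXECUTE login_stmt USING @u, @p;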
Types of Encryptions
1. Symmetric Encryption– Data is encrypted using a key and the decryption is
also done using the same key.
2. Asymmetric Encryption-Asymmetric Cryptography is also known as public-
key cryptography. It uses public and private keys to encrypt and decrypt data.
One key in the pair which can be shared with everyone is called the public key.
The other key in the pair which is kept secret and is only known by the owner is
called the private key. Either of the keys can be used to encrypt a message; the
opposite key from the one used to encrypt the message is used for decryption.
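As an illustrative sketch of symmetric encryption applied inside SQL (AES_ENCRYPT and AES_DECRYPT are MySQL-specific functions; the key string is an assumption):
-- The same key is used to encrypt and to decrypt the value
SELECT AES_ENCRYPT('card-1234', 'my_secret_key');
SELECT AES_DECRYPT(AES_ENCRYPT('card-1234', 'my_secret_key'), 'my_secret_key');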
Digital Signatures
• A Digital Signature (DS) is an authentication technique based on public key
cryptography used in e-commerce applications.
• It associates a unique mark to an individual within the body of his message.
This helps others to authenticate valid senders of messages.
• Typically, a user’s digital signature varies from message to message in order
to provide security against counterfeiting.
Digital Certificate
• Digital certificate is issued by a trusted third party which proves sender’s
identity to the receiver and receiver’s identity to the sender.
• A digital certificate is a certificate issued by a Certificate Authority (CA) to
verify the identity of the certificate holder.
• The CA issues an encrypted digital certificate containing the applicant’s
public key and a variety of other identification information.
• Digital certificate is used to attach public key with a particular individual or an
entity.
Access Control: Access control involves regulating the access to data within
the database. It can be challenging to implement access control mechanisms
that allow authorized users to access the data they need while preventing
unauthorized users from accessing it.
Auditing and Logging: DBMS must maintain an audit trail of all activities in
the database. This includes monitoring who accesses the database, what data
is accessed, and when it is accessed. This can be a challenge to implement
and manage, especially in large databases.
Database Design: The design of the database can also impact security. A
poorly designed database can lead to security vulnerabilities, such as SQL
injection attacks, which can compromise the confidentiality, integrity, and
availability of data.
Malicious attacks: Cyber-attacks such as hacking, malware, and phishing
pose a significant threat to the security of databases. DBMS must have robust
security measures in place to prevent and detect such attacks.
Physical Security: Physical security of the database is also important, as
unauthorized physical access to the server can lead to data breaches.
APR-MAY 23
Part A:
1. What is a distributed database management system? Q1, Pg 1
2. Define encryption with an example. Q32, Pg 5
Part B:
1. What is a distributed transaction? Explain distributed query processing with a diagram. Q5,6, Pg 16
2. What is NoSQL? Outline the features of NoSQL databases. Q7, Pg 22
3.Discuss role-based access control with an example. Q15, Pg 39
NOV-DEC 23
Part A:
1. State CAP Theorem. Q4, Pg 3
2. List the different types of SQL injection attacks? Q25, Pg 4
Part B:
APR-MAY 24
Part A
1. Define Distributed Database. Q1, Pg 1
2. What are the challenges faced when using an encrypted system? Q30, Pg 5
Part B
2. Explain in detail about key value stores and role-based access control in
advanced database management systems. Q11 Pg 36, Q15, Pg 45
NOV-DEC 24
Part A
1. Write the approaches for storing the relation in distributed data storage? Q6,Pg 2
2. What is statistical database security? Q26, Pg 5
Part B