DBMS DC Unit 4

Please read this disclaimer before proceeding:

This document is confidential and intended solely for the educational purpose of
RMK Group of Educational Institutions. If you have received this document
through email in error, please notify the system manager. This document
contains proprietary information and is intended only for the respective group /
learning community. If you are not the addressee, you should not disseminate,
distribute or copy it through e-mail. Please notify the sender immediately by
e-mail if you have received this document by mistake and delete it from your
system. If you are not the intended recipient, you are notified that disclosing,
copying, distributing or taking any action in reliance on the contents of this
information is strictly prohibited.
22IT202
DATABASE MANAGEMENT SYSTEMS

Created by: Dr.J.JenoJasmine, Mr.D.Kirubakaran
Date: 01.06.2024
1.TABLE OF CONTENTS

1. Contents
2. Course Objectives

3. Pre Requisites

4. Syllabus

5. Course outcomes

6. CO- PO/PSO Mapping

7. Lecture Plan

8. Activity based learning

9. Lecture Notes

10. Assignments

11. Part A Question & Answer

12. Part B Question & Answer

13. Supportive online Certification courses

14. Real time Applications in day to day life and to Industry

15. Contents beyond the Syllabus

16. Assessment Schedule

17. Prescribed Text Books & Reference Books

18. Mini Project suggestions


2. COURSE OBJECTIVES

• To understand the basic concepts of data modeling and database systems.

• To understand SQL and effective relational database design concepts.

• To learn relational algebra, calculus and normalization.

• To know the fundamental concepts of transaction processing, concurrency control techniques, recovery procedures and data storage techniques.

• To understand query processing, efficient data querying and advanced databases.
3. PRE REQUISITES

• 22CS101 Problem Solving and C++ Programming

• 20CS102 Software Development Practices
4. SYLLABUS

DATABASE MANAGEMENT SYSTEMS

UNIT I DATABASE CONCEPTS 9+6

Concept of Database and Overview of DBMS - Characteristics of Databases - Data
Models, Schemas and Instances - Three-Schema Architecture - Database Languages
and Interfaces - Introduction to data model types - ER Model - ER Diagrams –
Enhanced ER Model - Reducing ER to Tables. Applications: ER model of University
Database Application – Relational Database Design by ER- and EER-to-Relational
Mapping.

List of Exercise/Experiments
Case Study using real life database applications: any one from the following list
a) Inventory Management for an EMart Grocery Shop
b) Society Financial Management
c) Cop Friendly App – Eseva
d) Property Management – eMall
e) Star Small and Medium Banking and Finance
● Build Entity Model diagram. The diagram should align with the
business and functional goals stated in the application.

UNIT II STRUCTURED QUERY LANGUAGE 9+6

SQL Data Definition and Data Types – Constraints – Queries – INSERT, UPDATE,
and DELETE in SQL - Views - Integrity – Procedures, Functions, Cursors and Triggers -
Embedded SQL - Dynamic SQL.

List of Exercise/Experiments
Case Study using real life database applications: any one from the following list
and do the following exercises.
a) Inventory Management for an EMart Grocery Shop
b) Society Financial Management
c) Cop Friendly App – Eseva
d) Property Management – eMall
e) Star Small and Medium Banking and Finance

1. Data Definition Commands, Data Manipulation Commands for inserting, deleting,
updating and retrieving Tables and Transaction Control statements
2. Database Querying – Simple queries, Nested queries, Sub queries and Joins
3. Views, Sequences, Synonyms
4. Database Programming: Implicit and Explicit Cursors
5. Procedures and Functions
6. Triggers
7. Exception Handling
UNIT III RELATIONAL ALGEBRA, CALCULUS AND NORMALIZATION 9+6

Relational Algebra – Operations - Domain Relational Calculus - Tuple Relational Calculus -
Fundamental Operations. Relational Database Design - Functional Dependency –
Normalization (1NF, 2NF, 3NF and BCNF) – Multivalued Dependency and 4NF – Join
Dependencies and 5NF - De-normalization.

List of Exercise/Experiments
1. Case Study using real life database applications: any one from the following list
 Inventory Management for an EMart Grocery Shop
 Society Financial Management
 Cop Friendly App – Eseva
 Property Management – eMall
 Star Small and Medium Banking and Finance.
 Apply Normalization rules in designing the tables in scope.
UNIT IV TRANSACTIONS, CONCURRENCY CONTROL AND DATA STORAGE 9+6
Transaction Concepts – ACID Properties – Schedules based on Recoverability,
Serializability – Concurrency Control – Need for Concurrency – Locking Protocols – Two
Phase Locking – Transaction Recovery – Concepts – Deferred Update – Immediate
Update. Organization of Records in Files – Unordered, Ordered – Hashing Techniques –
RAID – Ordered Indexes – Multilevel Indexes - B+ Tree Index Files – B Tree Index Files.

List of Exercise/Experiments
Case Study using real life database applications: any one from the following list
a) Inventory Management for an EMart Grocery Shop
b) Society Financial Management
c) Cop Friendly App – Eseva
d) Property Management – eMall
e) Star Small and Medium Banking and Finance
Showcase the ACID properties with sample queries and appropriate settings
for the above scenario.
UNIT V QUERY OPTIMIZATION AND ADVANCED DATABASES 9+6
Query Processing Overview – Algorithms for SELECT and JOIN operations – Query
optimization using Heuristics. Distributed Database Concepts – Design – Concurrency Control
and Recovery – NOSQL Systems – Document-Based NOSQL Systems and MongoDB.

List of Exercise/Experiments
Case Study using real life database applications: any one from the following list
a) Inventory Management for an EMart Grocery Shop
b) Society Financial Management
c) Cop Friendly App – Eseva
d) Property Management – eMall
e) Star Small and Medium Banking and Finance

 Build PL/SQL / Stored Procedures for complex functionalities, e.g. EOD batch
processing for calculating the EMI for a Gold Loan for each eligible customer.
TOTAL: 45+30 = 75 PERIODS
5. COURSE OUTCOMES
CO1: Map ER model to Relational model to perform database design effectively.

CO2: Implement SQL and effective relational database design concepts.

CO3: Apply relational algebra, calculus and normalization techniques in database design.

CO4: Understand the concepts of transaction processing, concurrency control, recovery procedure and data storage techniques.

CO5: Apply query optimization techniques and understand advanced databases.

6. CO- PO/PSO MAPPING

PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12
CO1 2 1 1 1 1 1 1 2 2 2 2 2
CO2 3 2 2 1 1 1 1 2 2 2 2 2
CO3 2 1 1 1 1 1 1 2 2 2 2 2
CO4 2 1 1 1 1 1 1 2 2 2 2 2
CO5 2 1 1 1 1 1 1 2 2 2 2 2
CO6 2 1 1 1 1 1 1 2 2 2 2 2

PSO1 PSO2 PSO3


CO1 2 2 2
CO2 2 3 2
CO3 2 2 2
CO4 2 2 2
CO5 2 2 2
CO6 2 2 2
5. COURSE OUTCOME

Course Outcome Statements in Cognitive Domain

Course Code | Course Outcome Statement | Cognitive Level | Expected Level of Attainment
C212.1 | Map ER model to Relational model to perform database design effectively. | Analyse (K4) | 60%
C212.2 | Implement SQL and effective relational database design concepts. | Apply (K3) | 60%
C212.3 | Apply relational algebra, calculus and normalization techniques in database design. | Analyse (K4) | 60%
C212.4 | Understand the concepts of transaction processing, concurrency control, recovery procedure and data storage techniques. | Understand (K1) | 60%
C212.5 | Apply query optimization techniques and understand advanced databases. | Apply (K3) | 60%

Course Outcome Statements in Affective Domain

Course Code | Course Outcome Statement | Affective Level | Expected Level of Attainment
C212.7 | Attend the classes regularly. | Respond (A2) | 95%
C212.8 | Submit the Assignments regularly. | Respond (A2) | 95%
C212.9 | Participation in Seminar/Quiz/Group Discussion/Collaborative learning and content beyond syllabus. | Valuing (A3) | 95%
6. CO-PO/PSO MAPPING

Correlation Matrix of the Course Outcomes to Programme Outcomes and Programme Specific Outcomes Including Course Enrichment Activities

Programme Outcomes (POs): PO1–PO12; Programme Specific Outcomes (PSOs): PSO1–PSO3

Course Outcomes (COs):
C212.1 (K4): 3 3 2 2 3 3 3
C212.2 (K3): 3 2 1 1 3 3 3
C212.3 (K4): 3 3 2 2 3 3 3
C212.4 (K4): 3 3 2 2 3 3 3
C212.5 (K4): 3 3 2 2 3 3 3
C212.6 (K4): 3 3 2 2 3 3 3
C212.7 (A2): 3
C212.8 (A2): 2 2 2 3
C212.9 (A3): 3 3 3 3 3
C305: 3 3 2 2 3 3 3
7. LECTURE PLAN

S.No | Topic | No. of Periods | Proposed Date | Actual Lecture Date | CO | Taxonomy Level | Mode of Delivery
1 | Transaction Concepts - ACID Properties | 1 | – | – | CO4 | K2 | PPT
2 | Schedules | 1 | – | – | CO4 | K2 | PPT
3 | Serializability | 1 | – | – | CO4 | K2 | PPT
4 | Concurrency Control | 1 | – | – | CO4 | K2 | PPT
5 | Need for Concurrency | 1 | – | – | CO4 | K2 | PPT
6 | Locking Protocols - Two Phase Locking | 1 | – | – | CO4 | K2 | PPT
7 | Transaction Recovery | 1 | – | – | CO4 | K2 | PPT
8 | Organization of Records | 1 | – | – | CO4 | K2 | PPT
9 | Hashing Techniques | 1 | – | – | CO4 | K2 | PPT
10 | RAID | 1 | – | – | CO4 | K2 | PPT
11 | Ordered, Multilevel Indexes | 1 | – | – | CO4 | K2 | PPT
12 | B+ Tree and B Tree Index Files | 1 | – | – | CO4 | K2 | PPT
8. Activity Based Learning

Crossword Puzzle
Transactions
8. Activity Based Learning

Brainstorming Session

Compare the following pairs of concepts/techniques:

1. Interleaved vs simultaneous concurrency
2. Serial vs serializable schedule
3. Shared vs exclusive lock
4. Basic vs conservative 2PL
5. Wait-die vs wound-wait deadlock prevention protocol
6. Deadlock vs livelock
9. LECTURE NOTES
UNIT - IV

Transaction Concepts - ACID Properties

Transaction Concept:
The term transaction refers to a collection of operations that form a single logical
unit of work.

A transaction is a unit of program execution that accesses and possibly


updates various data items.

For Example, transfer of money from one account to another is a transaction


consisting of two updates, one to each account.

A transaction is delimited by statements (or function calls) of the form begin


transaction and end transaction.

The transaction consists of all operations executed between the begin transaction
and end transaction.

Since a transaction is indivisible, it either executes in its entirety or not at all.

A Simple Transactional Model:


Transactions access data using two operations:
read(X)
Transfers the data item X from the database to a variable X, in the main
memory buffer of the transaction that executed the read operation.

write(X)
Transfers the value in the variable X in the main-memory buffer of the
transaction that executed the write to the data item X in the database.
States of a Transaction:

A transaction must be in one of the following states:

Active - the initial state; the transaction stays in this state while it is executing.

Partially committed - after the final statement has been executed.

Failed - after the discovery that normal execution can no longer proceed.
Aborted – after the transaction has been rolled back and the database has been
restored to its state prior to the start of the transaction.

Committed - after successful completion.

Fig : States of a Transaction


Active State:

A transaction is in active state when it is executing the instructions of the


transaction.

Partially committed state:


When a transaction finishes its last statement, it enters the partially committed
state.

At this point, there are two possibilities:


The transaction may complete its execution successfully.

The transaction may fail and be aborted.

Committed state:

After completion of the transaction, consistency check is made.


If the consistency check is successful, the transaction enters the committed
state.

Once the transaction is committed, the updates of the transaction are made
permanent to the database.

Failed state:

If the consistency check fails, the transaction is aborted and rolled back.
The transaction is rolled back to undo the effect of its write operations on the
database.

Terminated state:

The terminated state corresponds to the transaction leaving the system.


ACID Properties

(A-Atomicity, C-Consistency, I-Isolation, D-Durability)

Atomicity
“Either all operations of the transaction are reflected properly in the
database, or none are.”

Assume that, before the execution of transaction Ti, the values of accounts A and
B are $1000 and $2000, respectively.

Now suppose that, during the execution of transaction Ti, a failure happened after
the write(A) operation but before the write(B) operation.

In this case, the values of accounts A and B reflected in the database are $950
and $2000. The system destroyed $50 as a result of this failure.

The sum of A+B before and after the execution of transaction is not same and the
database is now in inconsistent state.

We must ensure that such inconsistencies are not visible in a database system.

The basic idea behind ensuring atomicity is this:


The database system keeps track (on disk) of the old values of any data on which
a transaction performs a write.

This information is written to a file called the log.


If the transaction does not complete its execution, the database system restores
the old values from the log to make it appear as though the transaction never
executed.

Ensuring Atomicity is the responsibility of the recovery system of the database.
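
As a rough illustration of this idea, the following Python sketch keeps old values in an in-memory undo log and restores them when a simulated failure interrupts the transfer; the dictionary db, the undo_log list and the failure point are assumptions made for illustration, not part of any real recovery system.

db = {"A": 1000, "B": 2000}        # assumed in-memory "database"
undo_log = []                       # (item, old_value) recorded before each write

def write(item, new_value):
    undo_log.append((item, db[item]))   # log the old value first
    db[item] = new_value                # then apply the update

def rollback():
    while undo_log:                     # restore old values, newest first
        item, old_value = undo_log.pop()
        db[item] = old_value

try:
    write("A", db["A"] - 50)
    raise RuntimeError("failure before write(B)")   # simulated crash
    write("B", db["B"] + 50)                         # never reached
except RuntimeError:
    rollback()                                       # undo the partial transfer

print(db)   # {'A': 1000, 'B': 2000}: as though the transaction never executed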


Transaction Concepts – ACID Properties

Consistency:
“The consistency requirement here is that the sum of A and B be
unchanged by the execution of the transaction.”

Without the consistency requirement, money could be created or destroyed by


the transaction.

The consistency requirement verifies that, if the database is consistent before an


execution of the transaction, the database remains consistent after the execution
of the transaction.

Ensuring consistency for an individual transaction is the responsibility of the


Application programmer who codes the transaction.

Isolation:
“If several transactions are executed concurrently, their operations may

interleave in some undesirable way, resulting in an inconsistent state.”

For example:
Suppose the database is temporarily inconsistent, with the deducted total written to A and the increased total yet to be written to B (A = $950, B = $2000).

If a second concurrently running transaction reads A and B at this intermediate point and computes A+B, it will observe an inconsistent value.

A way to avoid the problem of concurrently executing transactions is to

execute transactions serially—that is, one after the other.


Transaction Concepts – ACID Properties
Concurrent execution of transactions provides significant performance benefits such as:

 Improved throughput and Resource utilization

 Reduced waiting time


The isolation property of a transaction ensures that the concurrent execution of
transactions results in a state that is equivalent to a state that could have been
obtained had the transactions executed one at a time in some order.

Ensuring Isolation property is the responsibility of Concurrency-control component


of the database system.

Durability:
“The durability property guarantees that, once a transaction completes

successfully, all the updates on the database persist, even if there is a

system failure after the transaction completes execution.”

We assume for now that a failure of the computer system may result in loss of
data in main memory, but data written to disk are never lost.

Durability is guaranteed by ensuring that either:


1.The transaction updates have been written to disk before the transaction
completes.

2.Information about the transaction updates and the data written to disk is
sufficient to enable the database to reconstruct the updates when the database
system is restarted after the failure.

Ensuring durability is the responsibility of the recovery system of the database.


Example of a Transaction:

We illustrate the transaction concept using a simple bank application consisting of


several accounts and a set of transactions that access and update those accounts.

Transactions access data using two operations:

 read(X) - transfers the data item X from the database to a variable, also called
X, in a buffer in main memory belonging to the transaction that executed the
read operation.

 write(X) - which transfers the value in the variable X in the main-memory


buffer of the transaction that executed the write to the data item X in the
database.

Let Ti be a transaction that transfers $50 from account A to account B. This


transaction can be defined as:

Ti :

1. read(A)

2. A := A – 50

3. write(A)

4. read(B)

5. B := B + 50

6. write(B)
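
A minimal Python sketch of the same six steps, using the read(X)/write(X) model described above; the dictionaries database and local_buffer stand in for the stored data items and the transaction's main-memory buffer, and are assumptions made for illustration.

database = {"A": 1000, "B": 2000}   # assumed stored data items
local_buffer = {}                    # main-memory buffer of transaction Ti

def read(x):
    local_buffer[x] = database[x]    # database item -> buffer variable

def write(x):
    database[x] = local_buffer[x]    # buffer variable -> database item

read("A")                   # 1. read(A)
local_buffer["A"] -= 50     # 2. A := A - 50
write("A")                  # 3. write(A)
read("B")                   # 4. read(B)
local_buffer["B"] += 50     # 5. B := B + 50
write("B")                  # 6. write(B)

print(database)             # {'A': 950, 'B': 2050}; the sum A + B is unchanged
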
Atomicity requirement

If the transaction fails after step 3 and before step 6, money will be “lost” leading
to an inconsistent database state. Failure could be due to software or hardware.
The system should ensure that updates of a partially executed transaction are not
reflected in the database.

Ensuring atomicity is the responsibility of the database system component called


recovery system.

Consistency requirement

The sum of A and B is unchanged by the execution of the transaction. In


general, consistency requirements include

• Explicitly specified integrity constraints such as primary keys and foreign keys

• Implicit integrity constraints

e.g. sum of balances of all accounts, minus sum of loan amounts must equal value
of cash-in-hand

A transaction must see a consistent database. During transaction execution the


database may be temporarily inconsistent. When the transaction completes
successfully the database must be consistent. Erroneous transaction logic can lead
to inconsistency.

The preservation of consistency is generally considered to be the responsibility of
the programmers who write the database programs or of the DBMS module that
enforces integrity constraints.
 Durability requirement
Once the user has been notified that the transaction has completed (i.e., the
transfer of the $50 has taken place), the updates to the database by the transaction
must persist even if there are software or hardware failures.

The recovery system of the database is responsible for ensuring durability.

 Isolation requirement

If between steps 3 and 6, another transaction Tj is allowed to access the partially


updated database, it will see an inconsistent database (the sum A + B will be less
than it should be).

Ti                              Tj
1. read(A)
2. A := A – 50
3. write(A)
                                read(A), read(B), print(A+B)
4. read(B)
5. B := B + 50
6. write(B)

Isolation can be ensured trivially by running transactions serially that is, one after
the other. However, executing multiple transactions concurrently has significant
benefits.
Ensuring the isolation property is the responsibility of a component of the
database system called the concurrency-control system.
Schedules

When several transactions run concurrently, the isolation property may be


violated, resulting in database consistency being destroyed despite the
correctness of each individual transaction.

The database system must control the interaction among the concurrent
transactions to prevent them from destroying the consistency of the database. It
does so through a variety of mechanisms called concurrency-control schemes.

The concept of a schedule helps identify those executions that are guaranteed to
ensure the isolation property and thus database consistency.

Schedules represent the chronological order in which instructions are executed in


the system.

A schedule can have many transactions in it, each comprising a number of
instructions.

A transaction that successfully completes its execution will have a commit


instruction as the last statement.

A transaction that fails to successfully complete its execution will have an


abort instruction as the last statement.

Types of Schedules:
1. Serial Schedule

2. Non-serial Schedule

3. Recoverable Schedule

4. Non-recoverable Schedule

5. Cascadeless Schedule

6. Strict Schedule
Schedules
Serial Schedule:
A schedule S is serial if the transactions in the schedule are executed one after
the other (not interleaved).
Example:
Consider the schedule S with two transactions T1 and T2.
Schedule1 : T1 is followed by T2 Schedule2 : T2 is followed by T1

Non-Serial Schedule:
A schedule S is non-serial if the operations of the transactions in the schedule
are interleaved.
Example:
Consider the schedule S with two transactions T1 and T2.

•We can ensure consistency of the database under concurrent execution by


making sure that any concurrent schedule that is executed has the same effect as
a schedule that could have occurred in serial manner.
Recoverable Schedule
A schedule S is recoverable if each transaction commits only after all
transactions from which it has read data have committed.

Example:

Consider the schedule S with two transactions T1 and T2.

In this schedule,
• T2 reads the value of A updated by T1. (T2 is dependent on T1)
• T1 is committed before T2 gets committed.
• So, T2 is safe from rollback due to failure of T1.
• Thus, this is a recoverable schedule.

Non-Recoverable Schedule

A schedule S is non-recoverable if a transaction commits before a
transaction from which it has read data has committed.
Example:
Consider the schedule S with two transactions T1 and T2.

In this schedule,

• T2 reads the value of A updated by T1. (T2 is dependent on T1)

• T2 is committed before T1 gets committed.


• So, T2 may suffer from inconsistency due to failure of T1 after a point where T2
commits.

• Thus, this is a non-recoverable schedule.
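
The following Python sketch tests recoverability on a simplified schedule written as a list of (transaction, action, item) steps; this encoding and the helper name is_recoverable are assumptions made for illustration.

def is_recoverable(schedule):
    last_writer = {}    # item -> transaction that last wrote it
    reads_from = {}     # reader -> set of writers it read from
    committed = set()
    for txn, action, item in schedule:
        if action == "write":
            last_writer[item] = txn
        elif action == "read":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                reads_from.setdefault(txn, set()).add(writer)
        elif action == "commit":
            # every transaction this one read from must already have committed
            if any(w not in committed for w in reads_from.get(txn, set())):
                return False
            committed.add(txn)
    return True

# T2 reads A written by T1; T1 commits first -> recoverable
s1 = [("T1", "write", "A"), ("T2", "read", "A"),
      ("T1", "commit", None), ("T2", "commit", None)]
# T2 commits before T1 -> non-recoverable
s2 = [("T1", "write", "A"), ("T2", "read", "A"),
      ("T2", "commit", None), ("T1", "commit", None)]
print(is_recoverable(s1), is_recoverable(s2))   # True False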

Cascadeless Schedule
Cascading Rollback:
Even if a schedule is recoverable, to recover correctly from the failure of a
transaction Ti, we may have to roll back several other transactions that read the
value produced by Ti.

This condition is called Cascading Rollback.


Example:
Consider the schedule S with three transactions T1,T2 and T3 for understanding
Cascading Rollback.

In this schedule,

• Transaction T1 writes a value of A that is read by transaction T2.

• Transaction T2 writes a value of A that is read by transaction T3.

• Suppose that, at this point, T1 fails. T1 must be rolled back.

• Since T2 is dependent on T1, T2 must be rolled back.

• Since T3 is dependent on T2, T3 must be rolled back.

• This phenomenon, in which a single transaction failure leads to a series of


transaction rollbacks, is called cascading rollback.

Cascadeless Schedule:
A schedule S is cascadeless if every transaction in the schedule reads only
data items that were written by committed transactions.

Cascading rollbacks will not occur in a cascadeless schedule since it reads committed
data items.

Example:
Consider the schedule S with three transactions T1,T2 and T3.
In this schedule, the transaction reads the value A of a committed transaction.
There is no possibility of cascading rollback.

Thus, this is a cascadeless schedule.

Strict Schedule
A schedule S is strict if the transactions in the schedule can neither read nor
write an item A until the last transaction that wrote A has committed.

Example:

Consider the schedule S with two transactions T1 and T2.

• In this schedule, the transaction T2 reads and writes the value of A written by the
committed transaction T1.

• Thus, this is a strict schedule.


Serializability
The database system must control concurrent execution of transactions, to
ensure that the database state remains consistent. The concept of
serializability examines how the database system can carry out this task.
For this it is necessary to understand which schedules will ensure
consistency, and which schedules will not.

Since transactions are programs, it is computationally difficult to determine


exactly what operations a transaction performs and how operations of
various transactions interact. The two main operations are: read and write.

The two forms of serializability are conflict serializability and view serializability

CONFLICT SERIALIZABILITY
Let us consider a schedule S in which there are two consecutive instructions, I and
J, of transactions Ti and Tj, respectively ( i != j ).

If the two consecutive instructions I and J refer to different data items, then we can
swap I and J without affecting the results of any instruction in the schedule.

If the two consecutive instructions I and J refer to the same data item Q, then the
order of the two steps may matter.

Since we are dealing with only read and write instructions, there are four cases that
we need to consider:

1. I = read(Q), J = read(Q). The order of I and J does not matter, since


the same value of Q is read by Ti and Tj, regardless of the order.

2. I=write(Q), J=read(Q). The order of I and J matters.

3. I =read(Q), J =write(Q). The order of I and J matters.

4. I=write(Q), J=write(Q). The order of I and J matters.


I and J conflict (i.e. order of I and J matters) if they are operations by different
transactions on the same data item, and at least one of these instructions is a write
operation.
Conflict Equivalence:
• If a schedule S can be transformed into a schedule S’ by a series of
swaps of non-conflicting instructions, we say that S and S’ are
conflict equivalent

Conflict Serializability:
• A schedule S is conflict serializable if it is conflict equivalent to a serial
schedule.

Example for Conflict Serializability

Consider two schedules S (Concurrent Schedule) and S’ (Serial Schedule)

Schedule S can be transformed into Serial schedule S’, by a series of swaps of non-
conflicting instructions such as:

1. Read(B) of T1 and Write(A) of T2 are non-conflicting and swapped


2. Read(B) of T1 and Read(A) of T2 are non-conflicting and swapped

3. Write(B) to T1 and Write(A) of T2 are non-conflicting and swapped

4. Write(B) of T1 and Read(A) of T2 are non-conflicting and swapped

After swapping the non-conflicting instructions, the Schedule S is conflict equivalent


to Serial schedule S’.

Therefore, Schedule S is conflict serializable.


Testing Conflict Serializability : Precedence Graph
Let us consider a schedule S in which there are two consecutive instructions, I
and J, of transactions Ti and Tj, respectively ( i != j ).

If the two consecutive instructions I and J refer to different data items, then we
can swap I and J without affecting the results of any instruction in the schedule.

Consider a schedule S. We construct a directed graph, called a precedence


graph, from S.

This graph consists of a pair G = (V, E), where V is a set of vertices and E is a set
of edges.

The set of vertices consists of all the transactions participating in the schedule.
The set of edges consists of all edges Ti →Tj for which one of three conditions
holds:

Ti executes write(Q) before Tj executes read(Q).


Ti executes read(Q) before Tj executes write(Q).
Ti executes write(Q) before Tj executes write(Q).

If an edge Ti →Tj exists in the precedence graph, then, in any serial schedule S’
equivalent to S, Ti must appear before Tj.

If the precedence graph for S has a cycle, then schedule S is not conflict
serializable.

If the graph contains no cycles, then the schedule S is conflict serializable.

A serializability order of the transactions can be obtained by the process called


topological sorting.

The topological sort of a directed acyclic graph is a linear ordering of its
vertices: for every edge U->V of the graph, vertex U comes before vertex V in
the ordering. Topological sort starts with a node that has zero indegree (i.e.,
no incoming edges).
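
As a sketch of this test, the following Python code builds a precedence graph from a schedule given as a list of (transaction, action, item) steps and derives a serial order by topological sorting when no cycle exists; the schedule encoding and the function names are illustrative assumptions.

def precedence_graph(schedule):
    txns = {t for t, _, _ in schedule}
    edges = set()
    for i, (ti, ai, qi) in enumerate(schedule):
        for tj, aj, qj in schedule[i + 1:]:
            # conflicting operations: same item, different transactions,
            # and at least one write -> edge Ti -> Tj
            if qi == qj and ti != tj and "write" in (ai, aj):
                edges.add((ti, tj))
    return txns, edges

def topological_order(txns, edges):
    indegree = {t: 0 for t in txns}
    for _, v in edges:
        indegree[v] += 1
    order, ready = [], [t for t in txns if indegree[t] == 0]
    while ready:
        u = ready.pop()
        order.append(u)
        for a, b in edges:
            if a == u:
                indegree[b] -= 1
                if indegree[b] == 0:
                    ready.append(b)
    # if some vertex never reaches indegree 0, the graph contains a cycle
    return order if len(order) == len(txns) else None

s = [("T2", "read", "A"), ("T1", "write", "A"), ("T1", "write", "B")]
txns, edges = precedence_graph(s)
print(edges)                            # {('T2', 'T1')}
print(topological_order(txns, edges))   # ['T2', 'T1']: conflict serializable
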
Serializability

Example 1:

Consider a Schedule S with three Transactions T1, T2 and T3 as follows:

Schedule S Precedence Graph for S

The Precedence Graph for the Schedule S contains the following edges:

• T3 ->T1 because, T3 executes Read(x) before T1 executes write(x)

• T2 ->T3 because, T2 executes Read(y) before T3 executes Write(y) and T2


executes Read(z) before T1 executes Write(z).

• T2 ->T1 because, T2 executes Write(z) before T1 executes Write(z)

If an edge Ti →Tj exists in the precedence graph, then, in any serial schedule S’
equivalent to S, Ti must appear before Tj.

The precedence graph for S does not contain a cycle, so schedule S is conflict
serializable.

Using Topological Sorting, the serializability order of the Schedule is identified as


T2 ->T3->T1 (i.e) The schedule S is equivalent to a Serial Schedule in which T2
followed by T3 followed by T1.
Serializability
Example 2:
Consider a Schedule S with two Transactions T1 and T2 as follows:

Schedule S Precedence Graph for S

The Precedence Graph for the Schedule S contains the following edges:

T1 ->T2 because,

• T1 executes Read(A) before T2 executes write(A)

• T1 executes write(A) before T2 executes write(A)

• T1 executes Read(B) before T2 executes write(B)

• T1 executes write(B) before T2 executes write(B)

The precedence graph for S does not contain a cycle, so schedule S is conflict
serializable.
Using Topological Sorting, the serializability order of the Schedule is identified as
T1 ->T2 (i.e) The schedule S is equivalent to a Serial Schedule in which T1 followed
by T2.

Example 3:
Consider a Schedule S with two Transactions T1 and T2 as follows:
Schedule S Precedence Graph for S
Serializability
The Precedence Graph for the Schedule S contains the following edges:

T1 ->T2 because,

• T1 executes Read(A) before T2 executes Write(A)

• T1 executes Read(B) before T2 executes Write(B)

• T1 executes Write(B) before T2 executes Write(B)

T2 ->T1 because,

• T2 executes Read(A) before T1 executes Write(A)

• T1 executes Read(B) before T2 executes Write(B)

The precedence graph for S contains a cycle, so schedule S is NOT conflict
serializable.

VIEW SERIALIZABILITY

View equivalence:

Let S and S´ be two schedules with the same set of transactions. S and S´ are
view equivalent if the following three conditions are met, for each data item Q,

1. If a transaction Ti reads the initial value of Q in schedule S, then in schedule


S’ also transaction Ti must read the initial value of Q.
2. If a transaction Ti executes read(Q), and that value was produced by
write(Q) of transaction Tj in schedule S, then in schedule S’ also transaction
Ti must read the value of Q that was produced by the same write(Q)
operation of transaction Tj .

3. The transaction (if any) that performs the final write(Q) operation in
schedule S must also perform the final write(Q) operation in schedule S’.

Conditions 1 and 2 ensure that each transaction reads the same values in both
schedules and, therefore, performs the same computation.

Condition 3, coupled with conditions 1 and 2, ensures that both schedules result
in the same final system state.
A schedule S is view serializable if it is view equivalent to a serial schedule.
Example:

Schedule S is view equivalent to the serial schedule S’ since:


Read(Q) reads the initial value of Q in both the schedules.

T3 performs the final write of Q in both the schedules.

The above schedule S with three transactions T1, T2, T3 is view-serializable but not
conflict serializable because swapping of non-conflicting operations does not result in
conflict equivalence to the serial schedule.

BLIND WRITES:
A blind write occurs when a transaction performs a write operation on a data item
without having performed a read operation on it.

Example:

Blind writes appear in schedule S because write operations are performed without a
preceding read operation.
Serializability

Conflict Serializable vs. View Serializable Schedule

Fig: Conflict serializable schedules form a subset of view serializable schedules.

Every conflict serializable schedule is view serializable.

But every view serializable schedule may not be conflict serializable.

Any view serializable schedule that is not conflict serializable must contain a blind write.

It is easy to test conflict serializability, but it is expensive to test view serializability.

Most of the concurrency control schemes used in practice are based on conflict serializability.
Concurrency Control

Concurrency Control in Database Management System is a procedure of


managing simultaneous operations without conflicting with each other. It ensures
that Database transactions are performed concurrently and accurately to produce
correct results without violating data integrity of the respective Database.
Concurrency control techniques are used to ensure the noninterference or
isolation property of concurrently executing transactions. Most of these techniques
ensure serializability of schedules by using concurrency control protocols (sets of
rules) that guarantee serializability.

Concurrent access is quite easy if all users are just reading data; there is
no way they can interfere with one another. However, any practical database has a
mix of READ and WRITE operations, and hence concurrency is a challenge.

DBMS Concurrency Control is used to address such conflicts, which mostly


occur with a multi-user system. Therefore, Concurrency Control is the most
important element for proper functioning of a Database Management System where
two or more database transactions are executed simultaneously, which require
access to the same data.

Potential problems of Concurrency

Lost Updates occur when multiple transactions select the same row and update
the row based on the value selected

Uncommitted dependency issues occur when the second transaction selects a


row which is updated by another transaction (dirty read)

Non-Repeatable Read occurs when a second transaction is trying to access the


same row several times and reads different data each time.

Incorrect Summary issues occur when one transaction computes a summary over the
values of all the instances of a repeated data item while a second transaction updates a
few instances of that data item. In that situation, the resulting summary does not
reflect a correct result.
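
A small deterministic Python simulation of the lost update problem listed above; the hard-coded interleaving and the starting balance are assumptions made purely for illustration.

database = {"X": 100}        # assumed shared data item

t1_local = database["X"]     # T1: read(X)  -> 100
t2_local = database["X"]     # T2: read(X)  -> 100, before T1 writes back

t1_local += 10               # T1 intends X = 110
t2_local += 20               # T2 intends X = 120

database["X"] = t1_local     # T1: write(X) -> 110
database["X"] = t2_local     # T2: write(X) -> 120, overwriting T1's update

print(database["X"])         # 120: T1's update of +10 has been lost
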
9.5 Need for Concurrency Control

There are two good reasons for allowing concurrency.

1. Improved throughput and resource utilization:


• A transaction consists of many steps. Some involve I/O activity; others involve CPU
activity. The CPU and the disks in a computer system can operate in parallel.
Therefore, I/O activity can be done in parallel with processing at the CPU.

• The parallelism of the CPU and the I/O system can therefore be exploited to run
multiple transactions in parallel.

• All of this increases the throughput of the system – that is, the number of
transactions executed in a given amount of time.

• Correspondingly, processor and disk utilization also increase; in other words, the
processor and disk spend less time idle, not performing any useful work.

2. Reduced waiting time:

• There may be a mix of transactions running on a system, some short and some long.

• If transactions run serially, a short transaction may have to wait for a preceding long
transaction to complete, which can lead to unpredictable delays in running a
transaction.

• If the transactions are operating on different parts of the database, it is better to let
them run concurrently, sharing the CPU cycles and disk accesses among them.
Concurrent execution reduces the unpredictable delays in running transactions.

• Moreover, it also reduces the average response time: the average time for a
transaction to be completed after it has been submitted.
9.5 Need for Concurrency Control

Some more reasons for using concurrency control methods in a DBMS:

To apply Isolation through mutual exclusion between conflicting transactions

To resolve read-write and write-write conflict issues

To preserve database consistency through consistency-preserving execution of
transactions

The system needs to control the interaction among the concurrent transactions. This
control is achieved using concurrent-control schemes.

Concurrency control helps to ensure serializability

Types of concurrency control protocols:

1. Lock based protocols

2. Two phase locking protocols

3. Timestamp protocols

4. Multiversion concurrency control protocols

5. Multiple granularity concurrency control protocol


9.6 Locking Protocols - Two Phase Locking

Lock based protocols help to overcome the issues related to accessing the DBMS
concurrently by locking a data item for only one user (transaction) at a time. The
assumption, or rather the requirement, for implementing a lock based protocol is
that all the data items involved are accessed in a mutually exclusive manner, i.e.,
while one transaction holds a lock on a data item, no other transaction is allowed to
update or modify that item at the same time. As the name suggests, lock based
protocols require a transaction to acquire a lock before accessing a data item and to
release the lock when the transaction is completed.

Types of Locks

Several types of locks are used in concurrency control

• Binary locks

• Shared/exclusive locks also known as read/write locks

Binary Locks
A binary lock can have two states or values: locked and unlocked (or 1 and 0, for
simplicity)
A distinct lock is associated with each database item X.
If the value of the lock on X is 1, item X cannot be accessed by a database
operation that requests the item.
If the value of the lock on X is 0, the item can be accessed when requested, and
the lock value is changed to 1. We refer to the current value (or state) of the
lock associated with item X as lock(X).

Two operations are used with binary locking


• lock_item
• unlock_item
That is, A transaction requests access to an item X by first issuing a lock_item(X)
operation.
• If LOCK(X) = 1, the transaction is forced to wait.
• If LOCK(X) = 0, it is set to 1 (the transaction locks the item) and the transaction is
allowed to access item X.
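
A toy Python sketch of a binary lock table with lock_item/unlock_item and a wait queue per item; a real lock manager would block the waiting transaction rather than return a status, so the structure below is only illustrative.

lock_table = {}   # item -> {"holder": transaction, "queue": [waiting transactions]}

def lock_item(txn, item):
    entry = lock_table.get(item)
    if entry is None:                        # LOCK(X) = 0: grant the lock
        lock_table[item] = {"holder": txn, "queue": []}
        return "granted"
    entry["queue"].append(txn)               # LOCK(X) = 1: transaction must wait
    return "wait"

def unlock_item(txn, item):
    entry = lock_table[item]
    assert entry["holder"] == txn            # only the holder may unlock
    if entry["queue"]:
        entry["holder"] = entry["queue"].pop(0)   # hand the lock to the next waiter
    else:
        del lock_table[item]                 # item becomes unlocked

print(lock_item("T1", "X"))        # granted
print(lock_item("T2", "X"))        # wait (T2 queues behind T1)
unlock_item("T1", "X")             # the lock passes to T2
print(lock_table["X"]["holder"])   # T2
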
9.6 Locking Protocols - Two Phase Locking

Lock and unlock operations for binary locks

It is quite simple to implement a binary lock; all that is needed is a binary-valued


variable, LOCK, associated with each data item X in the database.

In its simplest form, each lock can be a record with three fields:
<Data_item_name, LOCK, Locking_Transaction> plus a queue for transactions
that are waiting to access the item.
9.6 Locking Protocols - Two Phase Locking

System Lock Tables


The system needs to maintain only these records for the items that are currently
locked in a lock table, which could be organized as a hash file on the item name. Items
not in the lock table are considered to be unlocked. The DBMS has a lock manager
subsystem to keep track of and control access to locks. Every transaction must obey
the following rules:

1. A transaction T must issue the operation lock_item(X) before any read_item(X) or


write_item(X) operations are performed in T.

2. A transaction T must issue the operation unlock_item(X) after all read_item(X) and
write_item(X) operations are completed in T.

3. A transaction T will not issue a lock_item(X) operation if it already holds the lock on
item X.

4. A transaction T will not issue an unlock_item(X) operation unless it already holds the
lock on item X.
9.6 Locking Protocols - Two Phase Locking

Shared/Exclusive locks
There are various modes in which a data item may be locked. Two modes are given
below:

1. Shared Mode

If a transaction Ti has obtained a shared-mode lock (denoted by S) on item Q, then Ti


can read, but cannot write, Q.

2. Exclusive Mode
If a transaction Ti has obtained an exclusive-mode lock (denoted by X) on item Q, then
Ti can both read and write Q.

Lock-compatibility

When we use the shared/exclusive locking scheme, the system must enforce the
following rules:

1. A transaction T must issue the operation read_lock (X) or write_lock (X) before any
read_item(X) operation is performed in T.

2. A transaction T must issue the operation write_lock (X) before any write_item (X)
operation is performed in T.

3. A transaction T must issue the operation unlock(X) after all read_item(X) and
write_item(X) operations are completed in T.

4. A transaction T will not issue a read_lock (X) operation if it already holds a read
(shared) lock or a write (exclusive) lock on item X.

5. A transaction T will not issue a write_lock(X) operation if it already holds a read


(shared) lock or write (exclusive) lock on item X.
9.6 Locking Protocols - Two Phase Locking

Banking Example
Consider again the banking example. Let A and B be two accounts that are accessed
by transactions T1 and T2. Transaction T1 transfers $50 from account B to account A
Transaction T2 displays the total amount of money in accounts A and B—that is, the
sum A + B. This scenario is shown below.

Suppose that the values of accounts A and B are $100 and $200, respectively. If
these two transactions are executed serially, either in the order T1, T2 or the order
T2, T1, then transaction T2 will display the value $300. If, however, these
transactions are executed concurrently, then schedule 1, shown in the figure below,
is possible.
T1                          T2                          Concurrency-control manager
lock-X(B)
                                                        grant-X(B, T1)
read(B)
B := B - 50
write(B)
unlock(B)
                            lock-S(A)
                                                        grant-S(A, T2)
                            read(A)
                            unlock(A)
                            lock-S(B)
                                                        grant-S(B, T2)
                            read(B)
                            unlock(B)
                            display(A+B)
lock-X(A)
                                                        grant-X(A, T1)
read(A)
A := A + 50
write(A)
unlock(A)
In this case, transaction T2 displays $250, which is incorrect. The reason for this
mistake is that the transaction T1 unlocked data item B too early, as a result of
which T2 saw an inconsistent state.

The schedule shows the actions executed by the transaction, as well as the points at
which the concurrency-control manager grants the locks. The transaction making a
lock request cannot execute its next action until the concurrency-control manager
grants the lock. Hence, the lock must be granted in the interval of time between the
lock-request operation and the following action of the transaction. Sometimes
locking can lead to an undesirable situation, as in the next figure. In the figure, since T3
is holding an X-lock on B and T4 is requesting an S-lock on B, T4 is waiting for T3 to
unlock B. Similarly, since T4 is holding an S-lock on A and T3 is requesting an X-lock
on A, T3 is waiting for T4 to unlock A. Thus, the system is in a deadlock state. The only
solution is to roll back one of the two transactions. Once a transaction has been
rolled back, the data items that were locked by that transaction are unlocked.
Locking Protocols - Two Phase Locking

Two Phase Locking Protocol


One protocol that ensures serializability is the two-phase locking protocol. This
protocol requires that each transaction issue lock and unlock requests in two phases:

1. Growing phase:

A transaction may obtain locks, but may not release any lock.

2. Shrinking phase:

A transaction may release locks, but may not obtain any new locks.
Initially, a transaction is in the growing phase. The transaction acquires locks as
needed. Once the transaction releases a lock, it enters the shrinking phase, and it can
issue no more lock requests. Consider Transactions T1,T2,T3 and T4 given below:

For example, transactions T3 and T4 are two phase. On the other hand,
transactions T1 and T2 are not two phase.

The point in the schedule where the transaction has obtained its final lock (the end
of its growing phase) is called the lock point of the transaction.
Guaranteeing Serializability by Two-Phase Locking

A transaction is said to follow the two-phase locking protocol if all locking operations
(read_lock, write_lock) precede the first unlock operation in the transaction.

Such a transaction can be divided into two phases:

Expanding or growing (first) phase:

During the expanding phase, new locks on items can be acquired but none can
be released.

Shrinking (second) phase:

During the shrinking phase, existing locks can be released but no new locks can be
acquired. If lock conversion is allowed, then upgrading of locks (from read-locked to
write-locked) must be done during the expanding phase, and downgrading of locks
(from write-locked to read-locked) must be done in the shrinking phase. Hence, a
read_lock(X) operation that downgrades an already held write lock on X can appear
only in the shrinking phase.
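
The sketch below checks, in Python, whether a transaction obeys two-phase locking by verifying that every locking operation precedes the first unlock; the list-of-strings encoding of a transaction is an assumption for illustration, patterned after the T1 and T1' transactions discussed in the following example.

def is_two_phase(operations):
    unlocking_started = False           # False = growing phase, True = shrinking phase
    for op in operations:
        if op.startswith("unlock"):
            unlocking_started = True
        elif op.startswith(("read_lock", "write_lock")):
            if unlocking_started:        # a lock request after an unlock violates 2PL
                return False
    return True

t1 = ["read_lock(Y)", "read(Y)", "unlock(Y)",
      "write_lock(X)", "read(X)", "write(X)", "unlock(X)"]        # not two-phase
t1_prime = ["read_lock(Y)", "read(Y)", "write_lock(X)",
            "unlock(Y)", "read(X)", "write(X)", "unlock(X)"]      # two-phase
print(is_two_phase(t1), is_two_phase(t1_prime))   # False True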

Transactions that do not obey two-phase locking.


(a) Two transactions T1 and T2. (b) Results of possible serial schedules of T1 and
T2.
• Transactions T1 and T2 do not follow the two-phase locking protocol because the
write_lock(X) operation follows the unlock(Y) operation in T1, and similarly the
write_lock(Y) operation follows the unlock(X) operation in T2.

• The transactions can be rewritten as T1’ and T2’. Transactions T1’ and T2’, which are
the same as T1 and T2 but follow the two-phase locking protocol.

Note that the above transaction can produce a Deadlock.


It can be proved that, if every transaction in a schedule follows the two-phase locking
protocol,

• The schedule is guaranteed to be serializable, obviating the need to test for


serializability of schedules.

• The locking protocol, by enforcing two-phase locking rules, also enforces serializability.
Two-phase locking may limit the amount of concurrency that can occur in a schedule
because a transaction T may not be able to release an item X after it is through using
it if T must lock an additional item Y later; or conversely, T must lock the additional
item Y before it needs it so that it can release X.

• Hence, X must remain locked by T until all items that the transaction needs to read or
write have been locked; only then can X be released by T. Meanwhile, another
transaction seeking to access X may be forced to wait, even though T is done with X;
conversely, if Y is locked earlier than it is needed, another transaction seeking to
access Y is forced to wait even though T is not using Y yet.

• This is the price for guaranteeing serializability of all schedules without having to
check the schedules themselves.

• Although the two-phase locking protocol guarantees serializability (that is, every
schedule that is permitted is serializable), it does not permit all possible serializable
schedules (that is, some serializable schedules will be prohibited by the protocol).
9.6 Locking Protocols - Two Phase Locking
Types of Two-Phase Locking:
• Basic
• Conservative
• Strict
• Rigorous

Basic Two-Phase Locking :

There are a number of variations of two-phase locking (2PL). The technique just
described is known as basic 2PL.

Conservative 2PL :
A variation known as conservative 2PL (or static 2PL) requires a transaction to lock all
the items it accesses before the transaction begins execution, by predeclaring its read-
set and write-set. The read-set of a transaction is the set of all items that the
transaction reads, and the write-set is the set of all items that it writes. If any of the
predeclared items needed cannot be locked, the transaction does not lock any item;
instead, it waits until all the items are available for locking. Conservative 2PL is a
deadlock-free protocol

Strict two-phase locking protocol:


This protocol requires not only that locking be two phase, but also that all exclusive-
mode locks taken by a transaction be held until that transaction commits. This
requirement ensures that any data written by an uncommitted transaction are locked
in exclusive mode until the transaction commits, preventing any other transaction from
reading the data.

Strict 2PL is not deadlock-free.

Rigorous two-phase locking protocol:


A more restrictive variation of strict 2PL is rigorous 2PL, which also guarantees strict
schedules. In this variation, a transaction T does not release any of its locks (exclusive
or shared) until after it commits or aborts, and so it is easier to implement than strict
2PL.

It requires that all locks be held until the transaction commits.


Lock conversions
A transaction that already holds a lock on item X is allowed under certain conditions to
convert the lock from one locked state to another.

Types

1. Upgrade

2. Downgrade

• A mechanism can be provided for upgrading a shared lock to an exclusive lock, and

downgrading an exclusive lock to a shared lock.

• We denote conversion from shared to exclusive modes by upgrade, and from exclusive
to shared by downgrade.

• Lock conversion cannot be allowed arbitrarily. Rather, upgrading can take place in only
the growing phase, whereas downgrading can take place in only the shrinking phase.
9.7 Transaction Recovery , Save Points & Isolation Levels
Transaction Recovery
Recovery techniques are heavily dependent upon the existence of a special file known as
a system log. It contains information about the start and end of each transaction and
any updates which occur in the transaction. The log keeps track of all transaction
operations that affect the values of database items. This information is needed to
recover from transaction failure.

The log is kept on disk.

•start_transaction(T): This log entry records that transaction T starts its execution.

•read_item(T, X): This log entry records that transaction T reads the value of database
item X.

•write_item(T, X, old_value, new_value): This log entry records that transaction T


changes the value of the database item X from old_value to new_value. The old value is
sometimes known as a before an image of X, and the new value is known as an
afterimage of X.

•commit(T): This log entry records that transaction T has completed all accesses to the
database successfully and its effect can be committed (recorded permanently) to the
database.

• abort(T): This records that transaction T has been aborted.

•checkpoint: A checkpoint is a mechanism where all the previous logs are removed from
the system and stored permanently on a storage disk. A checkpoint declares a point before
which the DBMS was in a consistent state and all the transactions were committed.

A transaction T reaches its commit point when all its operations that access the database
have been executed successfully, i.e., the transaction has reached the point at which it
will not abort (terminate without completing). Once committed, the transaction is
permanently recorded in the database. Commitment always involves writing a commit
entry to the log and writing the log to disk. At the time of a system crash, the log is
searched backwards for all transactions T that have written a start_transaction(T)
entry but have not yet written a commit(T) entry; these transactions may
have to be rolled back to undo their effect on the database during the recovery process.
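
A condensed Python sketch of the recovery idea just described: scan the log for transactions that started but never committed, and undo their writes using the logged old values; the record tuples follow the log entries listed above, while the data structures and the crash state are assumptions made for illustration.

db = {"A": 950, "B": 2000}    # assumed state on disk after a crash

log = [
    ("start_transaction", "T1"),
    ("write_item", "T1", "A", 1000, 950),   # (T, X, old_value, new_value)
    # crash: no commit(T1) entry was ever written
]

def recover(log, db):
    started, committed = set(), set()
    for rec in log:
        if rec[0] == "start_transaction":
            started.add(rec[1])
        elif rec[0] == "commit":
            committed.add(rec[1])
    losers = started - committed              # transactions to roll back
    for rec in reversed(log):                 # undo their writes, newest first
        if rec[0] == "write_item" and rec[1] in losers:
            _, _, item, old_value, _ = rec
            db[item] = old_value
    return db

print(recover(log, db))   # {'A': 1000, 'B': 2000}: T1's partial update is undone
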
9.7 Transaction Recovery , Save Points & Isolation Levels

Save Points in Transaction Recovery


We can declare intermediate markers called savepoints within the context of a
transaction. Savepoints divide a long transaction into smaller parts.

A savepoint is a way of implementing transactions within a relational database


management system by indicating a point within a transaction that can be "rolled
back to" without affecting any work done by the transaction before the savepoint
was created.

Multiple savepoints can exist within a single transaction.


Savepoints are useful for implementing complex error recovery in database
applications. If an error occurs in the midst of a multiple-statement transaction,
the application may be able to recover from the error (by rolling back to a
savepoint) without needing to abort the entire transaction.

A savepoint can be declared by issuing a SAVEPOINT name statement.

All changes made after a savepoint has been declared can be undone by issuing a
ROLLBACK TO SAVEPOINT name command.

Issuing RELEASE SAVEPOINT name will cause the named savepoint to be


discarded, but will not otherwise affect anything.

Issuing the commands ROLLBACK or COMMIT will also discard any savepoints
created since the start of the main transaction.

Savepoints are similarly useful in application programs. If a procedure contains


several functions, then you can create a savepoint before each function begins.
Then, if a function fails, it is easy to return the data to its state before the
function began and re-run the function with revised parameters or perform a
recovery action.
After a rollback to a savepoint, Oracle releases the data locks obtained by
rolled back statements. Other transactions that were waiting for the previously
locked resources can proceed. Other transactions that want to update previously
locked rows can do so.

When a transaction is rolled back to a savepoint, the following occurs:


Oracle rolls back only the statements run after the savepoint creation.
Oracle preserves the specified savepoint, but all savepoints that were
established after the specified one are lost.

Oracle releases all table and row locks acquired since that savepoint but
retains all data locks acquired previous to the savepoint.

• Even then, the transaction remains active and can be continued.

The SAVEPOINT statement identifies a point in a transaction to which you can
later roll back.

Example : Creating Savepoints

UPDATE employees SET salary = 7000 WHERE last_name = 'Adam';

SAVEPOINT Adam_sal;
UPDATE employees SET salary = 12000 WHERE last_name = 'Mike';
SAVEPOINT Mike_sal;

SELECT SUM(salary) FROM employees;


ROLLBACK TO SAVEPOINT Adam_sal;
UPDATE employees SET salary = 11000 WHERE last_name = 'Mike';
COMMIT;
TRANSACTION ISOLATION LEVELS:

Serializability allows programmers to ignore issues related to concurrency


when they code transactions. If every transaction has the property that it
maintains database consistency if executed alone, then serializability ensures that
concurrent executions maintain consistency. However, the protocols required to
ensure serializability may allow too little concurrency for certain applications. In
these cases, weaker levels of consistency are used. The use of weaker levels of
consistency places additional burdens on programmers for ensuring database
correctness.

The SQL standard also allows a transaction to specify that it may be


executed in such a way that it becomes nonserializable with respect to other
transactions. For instance, a transaction may operate at the isolation level of read
uncommitted, which permits the transaction to read a data item even if it was
written by a transaction that has not been committed. SQL provides such features
for the benefit of long transactions whose results do not need to be precise. If
these transactions were to execute in a serializable fashion, they could interfere
with other transactions, causing the others’ execution to be delayed. The isolation
levels specified by the SQL standard are as follows:

Serializable usually ensures serializable execution. However, some


database systems implement this isolation level in a manner that may, in certain
cases, allow nonserializable executions.
 Repeatable read allows only committed data to be read and further requires
that, between two reads of a data item by a transaction, no other transaction is
allowed to update it. However, the transaction may not be serializable with
respect to other transactions. For instance, when it is searching for data
satisfying some conditions, a transaction may find some of the data inserted by
a committed transaction, but may not find other data inserted by the same
transaction.

 Read committed allows only committed data to be read, but does not require
repeatable reads. For instance, between two reads of a data item by the
transaction, another transaction may have updated the data item and
committed.

 Read uncommitted allows uncommitted data to be read. It is the lowest


isolation level allowed by SQL. All the isolation levels above additionally disallow
dirty writes, that is, they disallow writes to a data item that has already been
written by another transaction that has not yet committed or aborted.

Many database systems run, by default, at the read-committed isolation level.

In SQL, it is possible to set the isolation level explicitly, rather than accepting the
system’s default setting. For example, the statement “set transaction isolation
level serializable;” sets the isolation level to serializable; any of the other
isolation levels may be specified instead.

Changing the isolation level must be done as the first statement of a
transaction.
9.8 FILE ORGANIZATION

A file is organized as a sequence of records. These records are mapped onto disk blocks.

The record organization methods are the following:

Fixed Length Records:

Consider a file of account records for the bank database. Each record of the file is defined as :

type deposit = record
    account-number: char(10);
    branch-name: char(22);
    balance: real;
end
If it is assumed that each character occupies 1 byte and that a real occupies 8
bytes, then account record is 40 bytes long. A simple approach is to use the first
40 bytes for the first record, the next 40 bytes for the second record and so on.
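As an added illustration (not part of the original notes), the following Python sketch packs such 40-byte account records with the struct module and reads the i-th record by computing its byte offset; the field sizes are the ones assumed above.

import struct

# account-number: char(10), branch-name: char(22), balance: real (8-byte double)
RECORD = struct.Struct("<10s22sd")      # "<" = no padding, so RECORD.size == 40

def pack_record(account_number, branch_name, balance):
    # '10s'/'22s' zero-pad (or truncate) the encoded strings to the fixed width
    return RECORD.pack(account_number.encode(), branch_name.encode(), balance)

def read_record(file_bytes, i):
    # fixed-length records: the i-th record starts at byte offset i * 40
    offset = i * RECORD.size
    acc, branch, bal = RECORD.unpack_from(file_bytes, offset)
    return acc.rstrip(b"\0").decode(), branch.rstrip(b"\0").decode(), bal

# toy "file" holding two records
data = pack_record("A-102", "Perryridge", 400.0) + pack_record("A-305", "Round Hill", 350.0)
print(read_record(data, 1))   # ('A-305', 'Round Hill', 350.0)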

The figure below shows how fixed-length records are stored in the file.

However, there are two problems with this simple approach:


It is difficult to delete a record from this structure. The space occupied
by the record to be deleted must be filled with some other record of the file,
or we must have a way of marking deleted records so that they can be
ignored.

Unless the block size happens to be a multiple of 40 bytes, some records
will cross block boundaries. That is, part of the record will be stored in
one block and part in another. It would thus require two block accesses
to read or write such a record.

The deletion can be performed in several ways:

Method 1: Without Pointers


The first approach: when a record is deleted, all the records after it are
moved up so that the space freed by the deletion is filled. The figure below
shows this approach.
In the figure, record 2 was deleted and all the records after it were
moved up. This approach requires moving a large number of records.

Instead of this approach, it might be easier simply to move the final record of
the file into the space occupied by the deleted record. This approach is
illustrated in the below figure.

Another approach is to reuse the space of the deleted record by inserting a
new record in its place. This approach avoids the movement of records. Since it
is hard to find the available space, it is desirable to use some additional structure.

Method 2 : Using Pointers


At the beginning of the file, a certain number of bytes are allocated as a file
header. The header will contain a variety of information about the file. In
addition to all the information it maintains the address of the first record whose
contents are deleted. And this first record is used to store the address of the
second available record and so on

On insertion of a new record, we can use the record pointed to by the header. The header
pointer is changed to point to the next available record after insertion. If no
space is available, the insertion is done at the end of the file.

Insertion and deletion of fixed-length records are simple to implement, because the space
made available by a deleted record is exactly the space needed to insert a record.
Variable-length records do not have this advantage.
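The header-and-free-list idea above can be sketched with a small added Python model (an illustration with assumed names, not the notes' code): deleted slots are chained from a header pointer and reused on insertion; if no free slot exists, the record is appended at the end of the file.

class FixedLengthFile:
    """Toy in-memory model of a file of fixed-length record slots."""
    def __init__(self):
        self.slots = []          # each slot holds a record, or the next free slot number
        self.first_free = -1     # file-header pointer to the first deleted slot (-1: none)

    def delete(self, i):
        self.slots[i] = self.first_free   # the deleted slot stores the next free slot
        self.first_free = i

    def insert(self, record):
        if self.first_free != -1:         # reuse the slot pointed to by the header
            i = self.first_free
            self.first_free = self.slots[i]
            self.slots[i] = record
            return i
        self.slots.append(record)         # no free slot: insert at the end of the file
        return len(self.slots) - 1

A real file would keep these pointers inside the record slots themselves and mark deleted slots so that scans can skip them.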

Variable Length Records:

Variable-length records arise in database systems in several ways:

Storage of multiple record types in a file.

Record types that allow variable lengths for one or more fields.

Record types that allow repeating fields (used in some older data models).

The format of the record is:

type account-list = record
    branch-name: char(22);
    account-info: array [1 .. ∞] of record
        account-number: char(10);
        balance: real;
    end
end

The account-info is defined as an array with an arbitrary number of elements. That
is, the type definition does not limit the number of elements in the array, although
any actual record will have a specific number of elements in its array. There is no
limit on how large a record can be (up to, of course, the size of the disk storage).

Byte String Representation:

A simple method for implementing variable-length records is to attach a special
end-of-record (^) symbol to the end of each record. Then each record is stored
as a string of consecutive bytes. The figure below shows such an organization
for representing the account file as variable-length records. An alternative version of
the byte-string representation stores the record length at the beginning of each
record, instead of using end-of-record symbols.

The byte-string representation as described has some disadvantages:


• It is not easy to reuse space occupied formerly by a deleted record.
• There is no space, in general, for records to grow longer.
• If a variable-length record becomes longer, it must be moved. Movement is costly
if pointers to the record are stored elsewhere in the database (e.g., in indices or
in other records), since the pointers must be located and updated.

Thus, the basic byte-string representation described here is not usually used for
implementing variable-length records. However, a modified form of the byte-string
representation, called the slotted-page structure, is commonly used for
organizing records within a single block. The slotted-page structure appears in
the figure below. There is a header at the beginning of each block,
containing the following information:
The number of record entries in the header.
The end of free space in the block

An array whose entries contain the location and size of each record
The actual records are allocated contiguously in the block, starting from the
end of the block. The free space in the block is contiguous, between the
final entry in the header array, and the first record. If a record is inserted,
space is allocated for it at the end of free space, and an entry containing its size
and location is added to the header.
If a record is deleted, the space that it occupies is freed, and its entry is set to
deleted (its size is set to -1, for example). Further, the records in the block before
the deleted record are moved, so that the free space created by the deletion gets
occupied and all free space is again between the final entry in the header array
and the first record. The end-of-free-space pointer in the header is appropriately
updated as well. Records can be grown or shrunk by similar techniques, as long
as there is space in the block.

The slotted-page structure requires that there be no pointers that point directly
to records. Instead, pointers must point to the entry in the header that contains
the actual location of the record. This level of indirection allows records to be
moved to prevent fragmentation of space inside a block, while supporting indirect

pointers to the record.
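To make the header, free-space and slot-array layout concrete, the following Python sketch is added here as an illustration (a simplified in-memory model with assumed names, not the notes' own code): records are allocated from the end of the block, the header's slot array records each record's location and size, and deletion compacts the remaining records so that free space stays contiguous.

class SlottedPage:
    def __init__(self, block_size=4096):
        self.block = bytearray(block_size)
        self.free_end = block_size        # end of free space; records grow down from here
        self.slots = []                   # header array of (offset, size); size -1 means deleted

    def insert(self, record):
        self.free_end -= len(record)                        # allocate at the end of free space
        self.block[self.free_end:self.free_end + len(record)] = record
        self.slots.append((self.free_end, len(record)))     # add an entry to the header array
        return len(self.slots) - 1                          # the slot number is the stable record id

    def read(self, slot):
        off, size = self.slots[slot]
        return bytes(self.block[off:off + size])

    def delete(self, slot):
        off, size = self.slots[slot]
        # shift the records stored before the deleted one toward the end of the block,
        # so that all free space is again contiguous next to the header
        self.block[self.free_end + size:off + size] = self.block[self.free_end:off]
        self.slots[slot] = (0, -1)                          # mark the header entry as deleted
        for j, (o, s) in enumerate(self.slots):
            if s != -1 and o < off:
                self.slots[j] = (o + size, s)               # these records moved up by 'size' bytes
        self.free_end += size

page = SlottedPage()
a = page.insert(b"A-102|Perryridge|400")
b = page.insert(b"A-305|Round Hill|350")
page.delete(a)
assert page.read(b) == b"A-305|Round Hill|350"   # slot numbers stay valid after compaction

Because outside pointers hold only the slot number, records can be moved inside the block without those pointers having to change, which is exactly the level of indirection described above.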

Fixed-length Representation
Another way to implement variable-length records efficiently in a file system is to
use one or more fixed-length records to represent one variable-length record.

There are two ways of doing this:


Reserved space: If there is a maximum record length that is never exceeded, then
fixed-length records of that length are used. Unused space (for records shorter than
the maximum) is filled with a special null, or end-of-record, symbol.
List representation: Variable-length records can be represented by lists of fixed-length
records, chained together by pointers.
If the reserved-space method is applied to the account example, we need to
select a maximum record length. The figure below shows how the account file
would be represented if a maximum of three accounts per branch is allowed.

Figure: Fixed-Length Representation

A record in this file is of the account-list type, but with the array containing
exactly three elements. Those branches with fewer than three accounts (for
example, Round Hill) have records with null fields. The symbol (^) is used to
represent this situation in the figure. The reserved-space method is useful when most
of the records have a length close to the maximum. Otherwise, a significant
amount of space may be wasted.
In the bank example, some branches may have many more accounts than others.
This situation leads us to consider the linked-list method. To represent the file by the
linked-list method, a pointer field should be added. The resulting structure appears
in the figure below.

Figure: Pointer Method


A disadvantage of this structure is that space is wasted in all records except the
first in a chain. The first record needs to have the branch-name value, but
subsequent records do not; yet a field for branch-name must still be included
in all records. This wasted space is significant.

To deal with this problem, two kinds of blocks are allowed in a file:
Anchor block, which contains the first record of a chain
Overflow block, which contains records other than those that are the first record
of a chain
Thus, all records within a block have the same length, even though not all
records in the file have the same length. The figure below shows this file structure.

Figure: Pointer Method Using Anchor Block and Overflow Block

Organization of Records in Files:


An instance of a relation is a set of records. Given a set of records, the next
question is how to organize them in a file. Several of the possible ways of
organizing records in files are:

Heap File Organization (heap files):


In this simplest and most basic type of organization, records are placed in the file
in the order in which they are inserted, so new records are inserted at the end of
the file. Such an organization is called a heap or pile file.

Sequential File Organization (sorted files):

Records are stored in sequential order according to the value of a "search key" of
each record.

Hashing File Organization:

A hash function is computed on some attribute of each record. The result of the
hash function specifies in which block of the file the record should be placed.

Clustering File Organization:


Generally, a separate file is used to store the records of each relation. However, in
a clustering file organization, records of several different relations are stored in
the same file; Further, related records of the different relations are stored on the
same block, so that one I/O operation fetches related records from all the
relations. For example, records of the two relations can be considered to be
related if they would match in a join of the two relations.

Heap File Organization (heap files or unordered files):
In this simplest and most basic type of organization, records are placed in the
file in the order in which they are inserted, so new records are inserted at
the end of the file. Such an organization is called a heap or pile file. This
organization is often used with additional access paths, such as the secondary
indexes. It is also used to collect and store records for future use.

Inserting a new record is efficient. The last disk block of the file is copied into a
buffer. The new record is added and then the block is then rewritten back to the
disk. The address of the last file block is kept in the file header. However,
searching for a record using any search condition involves a linear search
through the file by block, which is an expensive procedure. If only one record
satisfies the search condition, then, on the average, a program will read into
memory and search half the file blocks before it finds the record. For a file of b
blocks, this requires searching b/2 blocks on average. If no records satisfy the search
condition, the program must read and search all b blocks in the file.

To delete a record, a program must first find its block, copy the block into the
buffer, and finally rewrite the block back to the disk. This leaves unused space in the
disk block. Deleting, a large number of records in this way results in wasted
storage space. Another technique used for record deletion is to have an extra
byte or bit, called a deletion marker, stored with each record. A record is deleted
by setting the deletion marker to a certain value. A different value of the marker
indicates a valid (not deleted) record. Search programs consider only valid
records in a block when conducting their search. Both of these deletion techniques
require periodic reorganization of the file to reclaim the unused space of
deleted records. During reorganization, the file blocks are accessed consecutively,
and records are packed by removing deleted records.

After such a reorganization, the blocks are filled to capacity once more. Another
possibility is to reuse the space of deleted records when inserting new records,
although this requires extra bookkeeping to keep track of empty locations.
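As a small added illustration (an assumed in-memory model, not the notes' own code), a heap file with a deletion marker can be sketched as follows; find() performs the linear search described above and, when exactly one record matches, scans about half the entries on average.

class HeapFile:
    def __init__(self):
        self.records = []                       # entries: [deletion_marker, record]

    def insert(self, record):
        self.records.append([False, record])    # new records always go at the end of the file

    def find(self, predicate):
        for marker, record in self.records:     # linear search through the whole file
            if not marker and predicate(record):
                return record
        return None

    def delete(self, predicate):
        for entry in self.records:
            if not entry[0] and predicate(entry[1]):
                entry[0] = True                 # set the deletion marker; the space is
                return True                     # reclaimed later by a periodic reorganization
        return False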

Sequential File Organization (sorted files or ordered files):


A sequential file organization is designed for efficient processing of records in
sorted order based on some search key. A search key is any attribute or
set of attributes. It need not be the primary key, or even a super key. To
permit fast retrieval of records in search-key order, the records are chained
together by pointers. The pointer in each record points to the next record in
search-key order. Furthermore, to minimize the number of block accesses in
sequential file processing, the records are stored physically in search-
key order, or as close to search-key order as possible.

Below Figure: show a sequential file of account records taken from the banking
example. In that example, the records are stored in search-key order, using branch-
name as the search key.

Figure: Sequential File for account Records

The sequential file organization allows records to be read in sorted order; that can
be useful for display purposes, as well as for certain query-processing algorithms. It
is difficult, however, to maintain physical sequential order as records are
inserted and deleted, since it is costly to move many records as a result of a single
insertion or deletion. Deletion can be managed by using pointer chains. For insertion,
the following rules are applied (a small sketch of the procedure follows the list):
• Locate the record in the file that comes before the record to be inserted in search-key order.
• If there is a free record (that is, space left after a deletion) within the same block
as this record, insert the new record there. Otherwise, insert the new record in an
overflow block. In either case, adjust the pointers so as to chain together the
records in search-key order.
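Below is a small illustrative Python model of these insertion rules, added here as a sketch (the names and the in-memory representation are assumptions, not the notes' code): records occupy arbitrary physical slots, a chain of pointers keeps them in search-key order, a slot freed by a deletion is reused when available, and otherwise the new record is appended at the end, the analogue of an overflow block.

class SequentialFile:
    def __init__(self):
        self.records = []      # physical slots: (key, data, next_slot) or None if free
        self.head = -1         # first record in search-key order
        self.free = []         # slots freed by deletions

    def insert(self, key, data):
        # reuse a free slot if one exists, otherwise append (an "overflow" slot)
        slot = self.free.pop() if self.free else len(self.records)
        if slot == len(self.records):
            self.records.append(None)
        # locate the record that precedes the new key in search-key order
        prev, cur = -1, self.head
        while cur != -1 and self.records[cur][0] <= key:
            prev, cur = cur, self.records[cur][2]
        self.records[slot] = (key, data, cur)          # new record points to its successor
        if prev == -1:
            self.head = slot                           # new record becomes the first one
        else:
            k, d, _ = self.records[prev]
            self.records[prev] = (k, d, slot)          # predecessor now points to the new record

    def delete(self, key):
        prev, cur = -1, self.head
        while cur != -1 and self.records[cur][0] != key:
            prev, cur = cur, self.records[cur][2]
        if cur == -1:
            return
        nxt = self.records[cur][2]
        if prev == -1:
            self.head = nxt
        else:
            k, d, _ = self.records[prev]
            self.records[prev] = (k, d, nxt)           # unlink the record via the pointer chain
        self.records[cur] = None
        self.free.append(cur)

    def scan(self):                                    # yields records in search-key order,
        cur = self.head                                # which may differ from physical order
        while cur != -1:
            key, data, cur = self.records[cur]
            yield key, data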

The below figure: show the file of account, after the insertion of the record (North
Town, A-888, 800). The structure in the figure allows fast insertion of new records,
but forces sequential file-processing applications to process records in an order that
does not match the physical order of the records. If relatively few records need to
be stored in overflow blocks, this approach works well. Eventually, however, the
correspondence between search-key order and physical order may be totally lost, in
which case sequential processing will become much less efficient. At this point, the
file should be reorganized so that it is once again physically in sequential order. Such
reorganizations are costly; and must be done during times when the system load is
low. The frequency with which reorganizations are needed depends on the
frequency of insertion of new records. In the extreme case in which insertions rarely
occur, it is possible always to keep the file in physically sorted order.

Figure: Sequential File Organization after an Insertion

Clustering File Organization:
Many relational-database systems store each relation in a separate file, so that
they can take full advantage of the file system that the operating system
provides. This simple approach to relational-database implementation becomes
less satisfactory as the size of the database increases. There are performance
advantages to be gained from careful assignment of records to blocks, and from
careful organization of the blocks themselves. A more complicated file structure
may be beneficial, even if the strategy of storing each relation in a separate file is
used.
However, many large-scale database systems do not rely directly on the underly-
ing operating system for file management. Instead, one large operating-system
file is allocated to the database system. The database system stores all relations
in this one file, and manages the file itself. To see the advantage of storing many
relations in one file, consider the following SQL query for the bank database:

select account_number, customer_name, customer_street, customer_city
from depositor, customer
where depositor.customer_name = customer.customer_name
This query computes a join of the depositor and customer relations. Thus, for
each tuple of depositor, the system must locate the customer tuples with the
same value for customer-name. Regardless of how these records are located,
however, they need to be transferred from disk into main memory. In the worst
case, each record will reside on a different block, forcing us to do one block read
for each record required by the query. As an example, consider the depositor and
customer relations of given below

Figure: Depositor Relation

Figure: Customer Relation

The figure below shows a file structure designed for efficient execution of queries
involving depositor ⋈ customer.

Figure: Multiple clustering file structure


The depositor tuples for each customer-name are stored near the customer tuple for
the corresponding customer name. This structure mixes together tuples of two
relations, but allows for efficient processing of the join. When a tuple of the
customer relation is read, the entire block containing that tuple is copied from disk
into main memory. Since the corresponding depositor tuples are stored on the disk
near the customer tuple, it is also copied. If a customer has so many accounts that
the depositor records do not fit in one block, the remaining records appear on
nearby blocks.

A clustering file organization is a file organization, such as that illustrated in the
figure above, that stores related records of two or more relations in each block.
Such a file organization allows us to read records that would satisfy the join
condition by using one block read.

The use of clustering has enhanced processing of a particular join (depositor ⋈
customer), but it results in slower processing of other types of query. For example,

select * from customer

requires more block accesses than it did in the scheme under which each relation is
stored in a separate file. Instead of several customer records appearing in one block,
each record is located in a distinct block. Indeed, simply finding all the
customer records is not possible without some additional structure. To locate all
tuples of the customer relation in this clustered structure, we need to
chain together all records of that relation using pointers, as in the figure below.
The usage of clustering depends on the types of query that the database designer
believes to be most frequent. Careful use of clustering can produce significant
performance gains in query processing.

Figure: Multiple Clustering File Structure with Pointer Chains


VARIOUS OPERATIONS ON FILES:


Operations on files are usually grouped into retrieval operations and update
operations. Retrieval operations do not change any data in the file, but only
locate certain records so that their field values can be examined and processed.

Update operations change the file by insertion or deletion of records or


modification of field values.

The general operations are:

Open:
Prepares the file for reading or writing. Allocates appropriate buffers to hold file
blocks from disk, and retrieves the file header. Sets the file pointer to the
beginning of the file.

Reset:

Sets the file pointer of an open file to the beginning of the file.

Find:
Searches for the first record that satisfies the search condition. Transfers the
block containing that record into a main memory buffer. The file pointer points to
the record in the buffer and it becomes the current record.

Read:
Copies the current record from the buffer to a program variable in the user
program. This command may also advance the current record pointer to the next
record in the file, which may necessitate reading the next file block from the disk.

Findnext:
Searches for the next record in the file that satisfies the search condition.
Transfers the block containing that record into a main memory buffer. The record
is located in the buffer and becomes the current record.

Delete:

Deletes the current record and updates the file on disk to reflect the deletion.

Modify:

Modifies some field values for the current record and updates the file on disk to
reflect the modification.

Insert:
Inserts a new record in the file by locating the block where the record is to be
inserted, transferring that block into a main memory buffer, writing the record into
the buffer, and also writes the buffer to disk to reflect the insertion.

Close:

Completes the file access by releasing the buffers and performing any other
needed cleanup operations.

Scan:

If the file has just been opened or reset, scan returns the first record; otherwise it
returns the next record.

Findall:

Locates all the records in the file that satisfy a search condition.

Find Ordered:

Retrieves all the records in the file in some specified order

9.9 HASHING TECHNIQUES:
Hashing is a type of primary file organization, which provides very fast access to
records on certain search conditions. This organization is called as hash file.

The idea behind hashing is to provide a function h, called a hash function or
randomizing function, that is applied to the hash field value of a record and yields
the address of the disk block in which the record is stored. A search for the record
within the block can be carried out in a main memory buffer.
The hash function is given by:

H(k) = k mod M

where M is the number of buckets, H(k) is the hash function, and k is the search-key value.

A bucket is a unit of storage containing one or more records (a bucket is typically


a disk block).

In a hash file organization we obtain the bucket of a record directly from its
search-key value using a hash function.

Hash function h is a function from the set of all search-key values K to the set of
all bucket addresses B.

Hash function is used to locate records for access, insertion as well as deletion.
Records with different search-key values may be mapped to the same bucket;
thus entire bucket has to be searched sequentially to locate a record.

DISTRIBUTION OF HASH FUNCTIONS:


An ideal hash function is uniform, i.e., each bucket is assigned the same number
of search-key values from the set of all possible values.

Ideal hash function is random, so each bucket will have the same number of
records assigned to it irrespective of the actual distribution of search-key values in
the file.
Handling of Bucket Overflows

Bucket overflow can occur because of:

• Insufficient buckets.

• Skew in the distribution of records. This can occur for two reasons: multiple
records have the same search-key value, or the chosen hash function produces a
non-uniform distribution of key values.

Although the probability of bucket overflow can be reduced, it cannot be


eliminated; it is handled by using overflow buckets.

Overflow chaining – the overflow buckets of a given bucket are chained together
in a linked list. Above scheme is called closed hashing.

An alternative, called open hashing, which does not use overflow buckets, is not
suitable for database applications.
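The following toy Python sketch is added as an illustration (the class name and parameters are assumptions): it models static hashing with H(k) = k mod M and closed hashing, where each of the M buckets starts with one primary page of fixed capacity and further records for that bucket go into chained overflow pages.

class StaticHashFile:
    def __init__(self, M=7, page_capacity=2):
        self.M = M
        self.capacity = page_capacity
        # each bucket is a chain of pages; the first page is the primary page
        self.buckets = [[[]] for _ in range(M)]

    def insert(self, key, record):
        chain = self.buckets[key % self.M]        # H(k) = k mod M
        for page in chain:
            if len(page) < self.capacity:
                page.append((key, record))
                return
        chain.append([(key, record)])             # all pages full: add an overflow page

    def lookup(self, key):
        # the whole bucket (primary page plus overflow pages) is searched sequentially
        chain = self.buckets[key % self.M]
        return [rec for page in chain for k, rec in page if k == key]

f = StaticHashFile(M=7)
for k in (3, 5, 7, 10, 17, 24):
    f.insert(k, f"record-{k}")
print(f.lookup(3), f.lookup(17))   # 3, 10, 17 and 24 all hash to bucket 3, so it overflows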

DEFICIENCIES OF STATIC HASHING


In static hashing, function h maps search-key values to a fixed set B of bucket
addresses. Databases grow or shrink with time.

If initial number of buckets is too small, and file grows, performance will degrade
due to too much overflows.

If space is allocated for anticipated growth, a significant amount of space will be


wasted initially (and buckets will be under full).

If database shrinks, again space will be wasted.


One solution: periodic re-organization of the file with a new hash function
Expensive, disrupts normal operations

Better solution: allow the number of buckets to be modified dynamically.
DYNAMIC HASHING:
USE OF EXTENDABLE HASH STRUCTURE

Each bucket j stores a value ij

All the entries that point to the same bucket have the same values on the first ij
bits.

To locate the bucket containing search-key Kj:

Compute h(Kj) = X

Use the first i high order bits of X as a displacement into bucket address table, and
follow the pointer to appropriate bucket

To insert a record with search-key value Kj

follow same procedure as look-up and locate the bucket, say j.

If there is room in the bucket j insert record in the bucket.

Else the bucket must be split and insertion re-attempted.

Hash structure after insertion of one Brighton and two Downtown records

The main advantage of extendible hashing is that the performance of the file does
not degrade as the file grows, as opposed to static external hashing, where
collisions increase and the corresponding chaining causes additional accesses.
In addition, no space is allocated in extendible hashing for future growth, but
additional buckets can be allocated dynamically as needed.
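The sketch below is an illustrative Python model added here (not part of the original notes): it assumes integer keys, a toy 8-bit hash h(k) = k mod 256, and a bucket capacity of two records. It shows the core mechanics described above: the directory is indexed by the first i high-order bits of h(K), a full bucket is split, and the directory is doubled only when the splitting bucket's local depth ij equals the global depth i.

HASH_BITS = 8                                    # assumed width of the toy hash value

def h(key):
    return key % (1 << HASH_BITS)                # toy hash function for integer keys

class Bucket:
    def __init__(self, local_depth, capacity=2):
        self.local_depth = local_depth           # the value ij stored with the bucket
        self.capacity = capacity
        self.items = []                          # (key, record) pairs

class ExtendableHash:
    def __init__(self, capacity=2):
        self.global_depth = 1                    # the value i for the bucket address table
        self.capacity = capacity
        self.directory = [Bucket(1, capacity), Bucket(1, capacity)]

    def _index(self, key):
        # the first global_depth high-order bits of h(key) index the directory
        return h(key) >> (HASH_BITS - self.global_depth)

    def lookup(self, key):
        return [r for k, r in self.directory[self._index(key)].items if k == key]

    def insert(self, key, record):
        bucket = self.directory[self._index(key)]
        if len(bucket.items) < bucket.capacity:
            bucket.items.append((key, record))
            return
        # bucket full: double the directory only if its local depth equals the global depth
        if bucket.local_depth == self.global_depth:
            self.directory = [b for b in self.directory for _ in (0, 1)]
            self.global_depth += 1
        # split the full bucket into two buckets with local depth + 1
        d = bucket.local_depth
        low, high = Bucket(d + 1, self.capacity), Bucket(d + 1, self.capacity)
        for j, b in enumerate(self.directory):
            if b is bucket:                      # repoint the directory entries of the old bucket
                bit = (j >> (self.global_depth - d - 1)) & 1
                self.directory[j] = high if bit else low
        for k, r in bucket.items:                # redistribute the old entries
            self.insert(k, r)
        self.insert(key, record)                 # and retry the insertion that caused the split

The sketch omits overflow pages, so many records whose hash values are identical would keep splitting; a real implementation adds overflow buckets for that case.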

Other hashing techniques are given below:

Folding involves applying an arithmetic function such as addition, or a logical
function such as exclusive-or, to different portions of the hash field value to
calculate the hash address. Another technique involves picking some digits of
the hash field value – for example, the third, fifth and eighth digits – to form the
hash address.

The problem with most hashing functions is that they do not guarantee that
distinct values will hash to distinct addresses, because the hash field space (the number
of possible values a hash field can take) is usually much larger than the address
space (the number of available addresses for records).

A collision occurs when the hash field value of a record that is being inserted
hashes to an address that already contains a different record. In this situation the
new record must be inserted in some other position, since its hash address is
occupied. The process of finding another position is called collision resolution.

There are numerous methods for collision resolution as given below:

Open addressing: proceeding from the occupied position specified by the hash
address, the program checks the subsequent positions in order until an unused
(empty) position is found. The algorithm below may be used for this purpose.

Algorithm:

i ← hash_address(k);
a ← i;
if location i is occupied
then begin
    i ← (i + 1) mod M;
    while (i <> a) and location i is occupied do
        i ← (i + 1) mod M;
    if (i = a) then all positions are full
    else new_hash_address ← i;
end;
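A runnable Python version of the same linear-probing idea, added here as a hedged illustration (the function name is mine, not the notes'):

def insert_open_addressing(table, key, record):
    """table: a list of M positions, None meaning empty."""
    M = len(table)
    i = key % M                       # hash address h(k) = k mod M
    a = i
    while table[i] is not None:       # position occupied: check the next one, wrapping around
        i = (i + 1) % M
        if i == a:
            raise RuntimeError("all positions are full")
    table[i] = (key, record)
    return i                          # the new hash address

table = [None] * 7
for k in (3, 10, 5):                  # 3 and 10 collide (both hash to 3); 10 lands at position 4
    insert_open_addressing(table, k, f"record-{k}")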

9.10 RAID

Redundant Arrays of Independent Disks:

RAID provides disk organization techniques that manage a large number of
disks, providing a view of a single disk, and aims at:
• high reliability, by storing data redundantly so that data can be recovered
even if a disk fails, and
• high capacity and high speed, by using multiple disks in parallel.

Improvement of reliability via Redundancy:


If only one copy of the data is stored, then each disk failure will result in loss of a
significant amount of data. Such a high rate of data loss is unacceptable. The
solution to the problem of reliability is to introduce redundancy; this
technique is called mirroring (or, sometimes shadowing).

Improvement in Performance via Parallelism:


Consider the benefit of parallel access to multiple disks. With disk mirroring, the
rate at which read requests can be handled is doubled, since read requests can be
sent to either disk. The transfer rate of each read is the same as in a single-disk
system but the number of reads per unit time has doubled.

Stripping of Data:

With multiple disks, the transfer rate can be improved as well by striping data
across multiple disks.

Bit-level striping:
Data striping consists of splitting the bits of each byte across multiple disks. Such
striping is called bit-level striping.

Block-level striping:

stripes blocks across multiple disks. It treats the array of disks as a single large
disk, and it gives blocks logical numbers. It is assumed that the block numbers
start from 0.

RAID LEVELS
Mirroring provides high reliability, but it is expensive. Striping provides high data
transfer rates, but does not improve reliability. Various alternative schemes aim to
provide redundancy at lower cost by combining disk striping with "parity" bits.
These schemes have different cost- performance trade-offs. The schemes are
classified into RAID levels.

RAID level 0 refers to disk arrays with striping at the level of blocks, but
without any redundancy.

The below figure a, shows an array of size 4.

RAID Level 1 refers to disk mirroring with block striping. The figure b above
shows a mirrored organization that holds four disks' worth of data.
RAID Level 2 known as memory-style error-correcting-code (ECC)
organization, employs parity bits. Memory systems have long used parity bits for
error detection and correction.

Usually each byte in a memory system may have a parity bit associated
with it that records whether the number of bits in the byte that are set to 1 is
even (parity = 0) or odd (parity = 1). If one of the bits in the byte gets damaged
(either a 1 becomes a 0, or a 0 becomes a 1), the parity of the byte changes and
thus will not match the stored parity.

The idea of error-correcting codes can be used directly in disk arrays by striping
bytes across disks.

The below figure c shows the level 2 scheme. The disks labeled P store the
error correction bits. If one of the disks fails, the remaining bits of the byte
and the associated error- correction bits can be read from other disks, and can be
used to reconstruct the damaged data.

The RAID level 2 requires only three disks overhead for four disks of data,
unlike RAID level 1, which required four disks overhead.

RAID level 3, bit-interleaved parity organization, improves on level 2 by


exploiting the fact that disk controllers, can detect whether a sector has been
read correctly. So, single parity bit can be used for error correction, as well as for
detection. If the parity of the remaining bits is equal to the stored parity, the
missing bit is 0. Otherwise, it is 1. The above figure d shows the RAID level 3.
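To make the parity idea concrete, here is a small added Python illustration (not from the notes): the parity block is the bytewise exclusive-or of the data blocks, and XOR-ing the parity with the surviving blocks reconstructs the failed disk's block.

def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

d0, d1, d2 = b"ABCD", b"EFGH", b"IJKL"       # blocks on three data disks
parity = xor_blocks([d0, d1, d2])            # stored on the parity disk
# disk 1 fails: its block is the XOR of the parity with the remaining blocks
assert xor_blocks([d0, d2, parity]) == d1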

RAID level 3 is as good as level 2, but is less expensive in the number of extra
disks (it has only a one-disk overhead), so level 2 is not used in practice.

RAID level 3 has two benefits over level 1. It needs only one parity disk for
several regular disks, whereas level 1 needs one mirror disk for every disk, and
thus level 3 reduces the storage overhead.

RAID level 4, block-interleaved parity organization, uses block level


striping, like RAID 0, and in addition keeps a parity block on a separate
disk for corresponding blocks from N other disks. This scheme is shown pictorially
in the Figure e. If one of the disks fails, the parity block can be used with the
corresponding blocks from the other disks to restore the blocks of the failed disk.

RAID level 5, block-interleaved distributed parity, improves on level


4 by partitioning data and parity among all N + 1 disks, instead of storing
data in N disks and parity in one disk. In level 5, all disks can participate in
satisfying read requests, unlike RAID level 4, where the parity disk cannot
participate, so level 5 increases the total number of requests that can be met in a
given amount of time. For each set of N logical blocks, one of the disks stores the
parity, and the other N disks store the blocks. The Figure f shows the setup. The
P's are distributed across all the disks.

RAID level 6, the P + Q redundancy scheme is much like RAID level 5.

It stores extra redundant information to guard against multiple disk failures. Instead of
using parity, level 6 uses error-correcting codes such as the Reed–Solomon codes.

The scheme is shown in the figure below. Two bits of redundant data are stored
for every four bits of data (unlike one parity bit in level 5), and the system can tolerate
two disk failures.

Choice of RAID Level
The factors to be taken into account when choosing a RAID level are
Monetary cost of extra disk storage requirements

Performance requirements in terms of number of I/O operations


Performance when a disk has failed

Performance during rebuild


HARDWARE ISSUES

SOFTWARE RAID:

Another issue in the choice of RAID implementations is at the level of hardware.


RAID can be implemented with no change at the hardware level, using only
software modification. Such RAID implementations are called software RAID.

HARDWARE RAID:

Systems with special hardware support are called hardware RAID systems.

HOT SWAPPING:
Some hardware RAID implementations permit hot swapping; that is, faulty disks
can be removed and replaced by new ones without turning power off. Hot
swapping reduces the mean time to repair.

9.11 INDEXING TECHNIQUES:

Introduction:
Database system indices play the same role as book indices or card catalogs in
the libraries. For example, to retrieve an account record given the account
number, the database system would look up an index to find on which disk block
the corresponding record resides, and then fetch the disk block, to get the
account record.

There are two basic kinds of indices:

Ordered indices:

Based on a sorted ordering of the values.

Hash indices:

Based on a uniform distribution of values across a range of buckets. The bucket


to which a value is assigned is determined by a function called a hash function.

Several techniques exist for both ordered indexing and hashing. No one technique is
the best. Rather, each technique is best suited to particular database applications.

Each technique must be evaluated on the basis of these factors:

Access types:

Access types can include finding records with a specified attribute value and
finding records, whose attribute values fall in a specified range.

Access time:

The time it takes to find a particular data item, or set of items using the technique in
question.

Insertion time:

The time it takes to insert a new data item.

Deletion time:

The time it takes to delete a data item.

Space overhead:

additional space occupied by an index structure.

Search Key :

Attribute or set of attributes used to look up records in a file. An index file
consists of records (called index entries) of the form:

search-key | pointer

ORDERED INDICES
To gain fast random access to records in a file, an index structure is used. Each
index structure is associated with a particular search key. Just like the index of a
book or a library catalog an ordered index stores the values of the search keys in
sorted order, and associates with each search key the records that contain it.
Ordered indices can be categorized as primary index and secondary index.

If the file containing the records is sequentially ordered, a primary index is an


index whose search key also defines the sequential order of the file. (The term
primary index is sometimes used to mean an index on a primary key).

Primary indices are also called clustering indices. The search key of a primary
index is usually the primary key, although that is not necessarily so.

Indices whose search key specifies an order different from the sequential order of
the file are called secondary indices, or non clustering indices.

PRIMARY INDEX
In this index, it is assumed that all files are ordered sequentially on some search
key. Such files, with a primary index on the search key, are called index-sequential
files. They represent one of the oldest index schemes used in database systems.
They are designed for applications that require both sequential processing of the
entire file and random access to individual records.

The figure below shows a sequential file of account records taken from the banking
example. In the figure, the records are stored in search-key order, with
branch-name used as the search key.

Figure: 4.24 – Sequential file for account records

Dense and Sparse Indices

An index record, or index entry, consists of a search-key value, and pointers to


one or more records with that value as their search-key value. The pointer to a
record consists of the identifier of a disk block and an offset within the disk block to
identify the record within the block.

There are two types of ordered indices that can be used:

Two types of ordered indices: Dense and Sparse Indices
DENSE INDEX
Dense index: an index record appears for every search-key value in the file. In a
dense primary index, the index record contains the search-key value and a pointer
to the first data record with that search-key value.

Implementations may store a list of pointers to all records with the same search-
key value; doing so is not essential for primary indices. The below figure, show
the dense index for the account file.

Figure: Dense Index.

SPARSE INDEX:
An index record appears for only some of the search-key values. To locate a
record we find the index entry with the largest search-key value that is less than
or equal to the search key value for which we are looking. We start at the record
pointed to by that index entry, and follow the pointers in the file until we find the
desired record.

The figure below shows the sparse index for the account file.

Figure: Sparse Index

Suppose that we are looking up records for the Perryridge branch. Using the dense
index, we follow the pointer directly to the first Perryridge record. We process this
record and follow the pointer in that record to locate the next record in search-key
(branch-name) order. We continue processing records until we encounter a record
for a branch other than Perryridge. If we are using the sparse index, we do not find
an index entry for "Perryridge". Since the last entry (in alphabetic order) before
"Perryridge" is "Mianus", we follow that pointer. We then read the account file in
sequential order until we find the first Perryridge record, and begin processing at
that point.

Thus, it is generally faster to locate a record in a dense index, rather than a sparse
index. However, sparse indices have advantages over dense indices in that they
require less space and they impose less maintenance overhead for insertions and
deletions.

There is a trade-off that the system designer must make between access time and
space overhead. Although the decision regarding this trade-off depends on the
specific application, a good compromise is to have a sparse index with one index entry
per block. The reason this design is a good trade-off is that the dominant cost in
processing a database request is the time that it takes to bring a block from disk into
main memory. Once we have brought in the block, the time to scan the entire block
is negligible. Using this sparse index, we locate the block containing the record that
we are seeking. Thus, unless the record is on an overflow block, we minimize block
accesses while keeping the size of the index as small as possible.
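The following Python sketch is added as an illustration (it assumes one sparse index entry per block, as suggested above, and is not the notes' own code): it finds the index entry with the largest search-key value less than or equal to the key sought and then scans sequentially from the block that entry points to.

from bisect import bisect_right

def sparse_lookup(index, blocks, key):
    """index: sorted list of (first_search_key, block_no), one entry per block.
    blocks: block_no -> list of (search_key, record) in search-key order."""
    first_keys = [k for k, _ in index]
    pos = bisect_right(first_keys, key) - 1   # largest index entry <= key
    if pos < 0:
        return None                           # key precedes every indexed block
    for k, record in blocks[index[pos][1]]:   # sequential scan within that block
        if k == key:
            return record
    return None                               # (a real scan would continue into the next block)

index = [("Brighton", 0), ("Mianus", 1), ("Redwood", 2)]
blocks = {0: [("Brighton", "A-217"), ("Downtown", "A-101")],
          1: [("Mianus", "A-215"), ("Perryridge", "A-102")],
          2: [("Redwood", "A-222"), ("Round Hill", "A-305")]}
print(sparse_lookup(index, blocks, "Perryridge"))   # -> 'A-102', found via the 'Mianus' entry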

9.12 MULTI LEVEL INDICES


Even if the sparse index is used, the index itself may become too large for efficient
processing. It is not unreasonable, in practice, to have a file with 100,000 records,
with 10 records stored in each block. If we have one index record per block, the
index has 10,000 records. Index records are smaller than data records, so let us
assume that 100 index records fit on a block. Thus, our index occupies 100 blocks.
Such large indices are stored as sequential files on disk.

If an index is sufficiently small to be kept in main memory, the search time to find
an entry is low. However, if the index is so large that it must be kept on disk, a
search for an entry requires several disk block reads. Binary search can be used on
the index file to locate an entry, but the search still has a large cost. If overflow
blocks have been used, binary search will not be possible. In that case, a sequential
search is typically used, and that requires b block reads, which will take even longer.
Thus, the process of searching a large index may be costly.

To deal with this problem, we treat the index just as we would treat any other
sequential file, and construct a sparse index on the primary index, as in the figure
below. To locate a record, we first use binary search on the outer index to find the
record for the largest search-key value less than or equal to the one that we desire.
The pointer points to a block of the inner index. We scan this block until we find the
record that has the largest search-key value less than or equal to the one that we
desire. The pointer in this record points to the block of the file that contains the
record for which we are looking.
Figure: Two-level Sparse Index

Using the two levels of indexing, we have read only one index block, rather than
the seven we read with binary search, if we assume that the outer index is
already in main memory. If our file is extremely large, even the outer index may
grow too large to fit in main memory. In such a case, we can create yet another
level of index. Indices with two or more levels are called multilevel indices.
Searching for records with a multilevel index requires significantly fewer I/O
operations than does searching for records by binary search.
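A tiny added Python calculation (not from the notes) of how many blocks each index level occupies for the example above, 100,000 records, 10 records per block and 100 index entries per block:

from math import ceil

def index_levels(n_records, records_per_block, entries_per_index_block):
    blocks = ceil(n_records / records_per_block)      # data blocks to be indexed
    levels = []
    while blocks > 1:                                 # keep building levels until one block remains
        blocks = ceil(blocks / entries_per_index_block)
        levels.append(blocks)
    return levels

print(index_levels(100_000, 10, 100))   # [100, 1]: inner index of 100 blocks, outer index of 1 block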

A typical dictionary is an example of a multilevel index in the non database world.


The header of each page lists the first word alphabetically on that page. Such a book
index is a multilevel index: The words at the top of each page of the book index
form a sparse index on the contents of the dictionary pages.

INDEX UPDATE
Regardless of what form of index is used, every index must be updated whenever
a record is either inserted into or deleted from the file. These are the algorithms
used for updating single level indices.

INSERTION:
First, the system performs a lookup using the search-key value that appears in
the record to be inserted. Again, the actions the system takes next depend on
whether the index is dense or sparse:

DENSE INDICES:

If the search-key value does not appear in the index, the system inserts an index
record with the search-key value in the index at the appropriate position.

Otherwise the following actions are taken:

If the index record stores pointers to all records with the same search- key value,
the system adds a pointer to the new record to the index record.

Otherwise, the index record stores a pointer to only the first record with the
search-key value. The system then places the record being inserted after the
other records with the same search-key values.

SPARSE INDICES:
We assume that the index stores an entry for each block. If the system creates a
new block, it inserts the first search-key value (in search-key order) appearing in
the new block into the index. On the other hand, if the new record has the least
search-key value in its block, the system updates the index entry pointing to the
block; if not, the system makes no change to the index.

DELETION.

To delete a record, the system first looks up the record to be deleted. The actions
the system takes next depend on whether the index is dense or sparse.

DENSE INDICES:
1. If the deleted record was the only record with its particular search-key
value, then the system deletes the corresponding index record from the
index.

2. Otherwise the following actions are taken:
If the index record stores pointers to all records with the same search-
key value, the system deletes the pointer to the deleted record from
the index record.

Otherwise, the index record stores a pointer to only the first record with
the search-key value; if the deleted record was that first record, the system
updates the index record to point to the next record with the same search-key value.

SPARSE INDICES:
1. If the index does not contain an index record with the search-key value of
the deleted record, nothing needs to be done to the index.

2. Otherwise the system takes the following actions:


If the deleted record was the only record with its search key, the
system replaces the corresponding index record with an index record
for the next search-key value (in search-key order).

Otherwise, if the index record for the search-key value points to record
being deleted, the system updates the index record to point to the next
record with the same search-key value.

SECONDARY INDICES
Secondary indices must be dense, with an index entry for every search-key value,
and, a pointer to every record in the file. A primary index may be sparse, storing
only some of the search-key values, since it is always possible to find records with
intermediate, search-key values by a sequential access to a part of the file. If a
secondary index stores only some of the search-key values, records with
intermediate search-key values may be anywhere in the file and, in general, we
cannot find them without searching the entire file.

The pointers in such a secondary index do not point directly to the file. Instead,
each points to a bucket that contains pointers to the file. The below figure
shows the structure of a secondary index that uses an extra level of indirection on
the account file, on the search key balance.

SQL on INDEX:

Create an index

create index <index-name> on <relation-name> (<attribute-list>)


E.g.: create index b-index on branch(branchname)

Dropping of index: drop index <index-name>

9.13 B+ TREE AND B TREE
B+ Trees
The main disadvantage of the index-sequential file is that performance degrades as the
file grows, and frequent reorganizations are undesirable.

B+ trees are the most widely used index structure that maintains efficiency.
A B+ tree is a balanced tree: all leaves are at the same level.

Fig: B+ tree is a balanced tree

ADVANTAGE OF B+-TREE INDEX FILES:


Automatically reorganizes itself with small, local, changes, in the face of insertions
and deletions.

Reorganization of entire file is not required to maintain performance.

(MINOR) DISADVANTAGE OF B+ TREES:

Extra insertion and deletion overhead, space overhead

PROPERTIES OF B+ TREE:

All paths from root to leaf are of the same length

Each node that is not a root or a leaf has between n/2 and n children. A leaf
node has between 2 to m values

A B+-tree is a rooted tree satisfying the following properties.

TWO TYPES OF NODES:
Leaf nodes: Store keys and pointers to data
Index nodes: Store keys and pointers to other nodes Leaf nodes are linked to
each other.
Keys may be duplicated: Every key to the right of a particular key is >= to that
key.
Typical structure of the Node

Ki are the search-key values


Pi are pointers to children (for non-leaf nodes) or pointers to records or buckets
of records (for leaf nodes).
The search-keys in a node are ordered
K1 <K2 <K3 <. . .<Kn–1

PROPERTIES OF LEAF NODE:

For i = 1, 2, . . ., n–1, pointer Pi either points to a file record with search -key
value Ki, or to a bucket of pointers to file records, each record having search-key
value Ki.

If Li, Lj are leaf nodes and i <j, Li’s search-key values are less than Lj’s search-
key values

Pn points to next leaf node in search-key order The search-keys in a leaf node
are ordered

K1 <K2 <K3 <. . .<Kn–1
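To show how a lookup uses this node structure, the following simplified Python sketch is added (class and function names are assumptions, not the notes' code): it descends from the root to the leaf whose key range covers the search key and then scans that leaf's (key, pointer) entries.

from bisect import bisect_right

class Internal:
    def __init__(self, keys, children):
        self.keys, self.children = keys, children    # len(children) == len(keys) + 1

class Leaf:
    def __init__(self, keys, pointers, next_leaf=None):
        self.keys, self.pointers, self.next_leaf = keys, pointers, next_leaf

def bplus_search(root, key):
    node = root
    while isinstance(node, Internal):
        # follow the child whose key range covers 'key'
        node = node.children[bisect_right(node.keys, key)]
    for k, p in zip(node.keys, node.pointers):       # scan the leaf
        if k == key:
            return p
    return None

leaf1 = Leaf(["Brighton", "Downtown"], ["b1", "b2"])
leaf2 = Leaf(["Mianus", "Perryridge"], ["b3", "b4"])
leaf1.next_leaf = leaf2                              # leaves are chained for sequential scans
root = Internal(["Mianus"], [leaf1, leaf2])
print(bplus_search(root, "Perryridge"))              # -> 'b4'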

Example For B+ Tree:

UPDATES ON B+TREE

1. Find the leaf node in which the search-key value would appear
2. If the search-key value is already present in the leaf node

Add record to the file

If necessary add a pointer to the bucket.

3. If the search-key value is not present, then.

add the record to the main file (and create a bucket if necessary)

If there is room in the leaf node, insert (key-value, pointer) pair in the
leaf node

Otherwise, split the node (along with the new (key-value, pointer) entry).

4. Splitting A Leaf Node:

Take the n (search-key value, pointer) pairs (including the one being inserted) in
sorted order. Place the first ⌈n/2⌉ in the original node, and the rest in a new node.

let the new node be p, and let k be the least key value in p. Insert (k,p) in the
parent of the node being split.

If the parent is full, split it and propagate the split further up.

5. Splitting of nodes proceeds upwards till a node that is not full is found.
In the worst case the root node may be split increasing the height of the tree
by 1.

Fig: Splitting A Leaf Node

Result of splitting node containing Brighton and Downtown on inserting Clear view

Next step: insert entry with (Downtown, pointer-to-new-node) into parent
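A short added Python sketch of step 4 (an illustration, not the notes' code): the n (key, pointer) pairs, including the new one, are sorted, the first ⌈n/2⌉ stay in the original node, the rest move to a new node p, and (k, p), with k the least key value in p, is inserted into the parent. With n = 3 this reproduces the Brighton/Downtown/Clearview example above.

from math import ceil

def split_leaf(pairs, n):
    """pairs: the n (search-key, pointer) pairs, including the one being inserted."""
    pairs = sorted(pairs)
    keep = ceil(n / 2)
    original, new = pairs[:keep], pairs[keep:]
    k = new[0][0]                      # least key value in the new node
    return original, new, k            # (k, pointer-to-new-node) goes into the parent

orig, new, k = split_leaf([("Brighton", "b1"), ("Downtown", "b2"), ("Clearview", "b3")], 3)
# orig = [Brighton, Clearview], new = [Downtown], k = 'Downtown'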

UPDATION OF B+TREE: INSERTION

Fig: B+Tree before and after insertion of “Clear view”

UPDATION OF B+TREE: DELETION

Find the record to be deleted, and remove it from the main file and from the
bucket (if present)

Remove (search-key value, pointer) from the leaf node if there is no bucket or if
the bucket has become empty

If the node has too few entries due to the removal, and the entries in the node
and a sibling fit into a single node, then merge siblings:

Insert all the search-key values in the two nodes into a single node (the one on
the left), and delete the other node.

Delete the pair (Ki–1, Pi), where Pi is the pointer to the deleted node, from its
parent, recursively using the above procedure.

Fig: Before and after deleting “Downtown”

9.14 B TREE:
Similar to B+-tree, but B-tree allows search-key values to appear only once; eliminates
redundant storage of search keys.

Search keys in nonleaf nodes appear nowhere else in the B-tree; an additional pointer
field for each search key in a nonleaf node must be included.

GENERALIZED B-TREE LEAF NODE

Fig: Generalized B-tree Leaf Node


Non leaf node – pointers Bi are the bucket or file record pointers.
B Tree Example,

Fig: B Tree

Advantages of B-Tree indices:

May use less tree nodes than a corresponding B+-Tree.

Sometimes possible to find search-key value before reaching leaf node.

Disadvantages of B-Tree indices:

Only small fraction of all search-key values are found early

Non-leaf nodes are larger, so fan-out is reduced. Thus, B-Trees typically have greater
depth than corresponding B+-Tree

Insertion and deletion more complicated than in B+-Trees

Implementation is harder than B+-Trees.

10. ASSIGNMENTS
1. Consider the following schedules. The actions are listed in the order they are
scheduled, and prefixed with the transaction name.

S1: T2:R(Z), T2:R(Y), T2:W(Y), T3:R(Y), T3:R(Z), T1:R(X), T1:W(X), T3:W(Y),

T3:W(Z), T2:R(X), T1:R(Y) , T1:W(Y), T2:W(X)

S2: T3:R(Y), T3:R(Z), T1:R(X), T1:W(X), T3:W(Y), T3:W(Z), T2:R(Z), T1:R(Y),

T1:W(Y), T2:R(Y), T2:W(Y), T2:R(X), T2:W(X)

For each of the schedules, answer the following questions:

i) What is the precedence graph for the schedule?


ii)Is the schedule conflict-serializable? If so, what are all the conflict equivalent serial
schedules?

iii)Is the schedule view-serializable? If so, what are all the view equivalent serial
schedules?

Solution:
The actions listed for Schedules S1 and S2 can be written as:

Schedule S1
T1        T2        T3
          R(z)
          R(y)
          W(y)
                    R(y)
                    R(z)
R(x)
W(x)
                    W(y)
                    W(z)
          R(x)
R(y)
W(y)
          W(x)

Schedule S2
T1        T2        T3
                    R(y)
                    R(z)
R(x)
W(x)
                    W(y)
                    W(z)
          R(z)
R(y)
W(y)
          R(y)
          W(y)
          R(x)
          W(x)

2. Consider the following schedules. The actions are listed in the order they are

Scheduled

S1: R1(X), R3(X), W1(X), R2(X), W3(X)

S2: R3(X), R2(X), W3(X), R1(X), W1(X)

For each of the schedules, answer the following questions:

i) What is the precedence graph for the schedule?

ii) Which of the following are conflict serializable schedule , Find

the equivalent serial schedule

Solution:
The actions listed for Schedules S1 and S2 can be written as:

Schedule S1
T1        T2        T3
R(x)
                    R(x)
W(x)
          R(x)
                    W(x)

Schedule S2
T1        T2        T3
                    R(x)
          R(x)
                    W(x)
R(x)
W(x)
(i) PRECEDENCE GRAPH
Schedule S1
The Precedence graph for Schedule S1 consists of the following edges
T2 ->T3 because,
T2 executes R(z) before T3 executes W(z)
T2 executes R(y) before T3 executes W(y)
T2 executes W(y) before T3 executes W(y)

T2->T1 because,
T2 executes R(y) before T1 executes W(y)
T3->T1 because,
T3 executes W(y) before T1 executes R(y)
T3 executes R(y) before T1 executes W(y)
T1->T2 because,
T1 executes R(x) before T2 executes W(x)
T1 executes W(x) before T2 executes W(x)

Schedule S2
The Precedence graph for Schedule S2 consists of the following edges

T3 ->T1 because,
T3 executes R(y) before T1 executes W(y)
T3 executes W(y) before T1 executes W(y)

T3->T2 because,
T3 executes R(y) before T2 executes W(y)
T3 executes W(y) before T2 executes W(y)
T3 executes W(z) before T2 executes R(z)

T1->T2 because,
T1 executes R(x) before T2 executes W(x)
T1 executes W(x) before T2 executes W(x)
T1 executes W(x) before T2 executes R(x)
T1 executes R(y) before T2 executes R(y)
T1 executes W(y) before T2 executes W(y)

ii) TEST FOR CONFLICT SERIALIZABILITY

 If the precedence graph for a schedule contains a cycle, the schedule is not
Conflict Serializable.
 If the precedence graph for a schedule does not contain a cycle, the schedule is
Conflict Serializable.

Schedule S1
The precedence graph for Schedule S1 contains cycles.
So, the Schedule S1 is not Conflict Serializable.

Schedule S2
The precedence graph for Schedule S2 does not contain cycles.
So, the Schedule S2 is Conflict Serializable.

The Conflict Equivalent Serial Schedule is T3 -> T1 -> T2 (i.e. T3 followed by T1


followed by T2)
This is determined using topological sorting (start with vertex with indegree=0)

iii) TEST FOR VIEW SERIALIZABILITY

 If a schedule is conflict serializable, then it will be view serializable


 If a schedule is not conflict serializable and contains blind writes, then it will be
view serializable

Schedule S1
The Schedule S1 is not conflict serializable and does not contain blind writes,
so it is not View Serializable.

Schedule S2

The Schedule S2 is Conflict Serializable. So, Schedule S2 is View Serializable.

The View Equivalent Serial Schedule is T3->T1->T2



(i) PRECEDENCE GRAPH

Schedule S1
The Precedence graph for Schedule S1 consists of the following edges
T1 ->T3 because,

T1 executes R(x) before T3 executes W(x)


T1 executes W(x) before T3 executes W(x)

T3->T1 because,
T3 executes R(x) before T1 executes W(x)
T1->T2 because,
T1 executes W(x) before T2 executes R(x)
T2->T3 because,
T2 executes R(x) before T3 executes W(x)

Schedule S2
The Precedence graph for Schedule S2 consists of the following edges

T3 ->T1 because,
T3 executes R(x) before T1 executes W(x)

T2->T1 because,
T2 executes R(x) before T1 executes W(x)

T2->T3 because,
T2 executes R(x) before T3 executes W(x)

(ii) TEST FOR CONFLICT SERIALIZABILITY

 If the precedence graph for a schedule contains a cycle, the schedule is not
Conflict Serializable.

 If the precedence graph for a schedule does not contain cycle, the schedule is
Conflict Serializable.

Schedule S1

The precedence graph for Schedule S1 contains cycles.

So, the Schedule S1 is not Conflict Serializable.

Schedule S2

The precedence graph for Schedule S2 does not contain cycles.

So, the Schedule S2 is Conflict Serializable.

The Conflict Equivalent Serial Schedule is T2 -> T3 -> T1 (i.e. T2 followed by T3


followed by T1)

This is determined using topological sorting (start with vertex with indegree=0)

3. Which of the following schedules is conflict serializable? For each serializable


schedule, determine the equivalent serial schedules:

(a) r1(X), r3(X), w1(X), r2(X), w3(X).


(b) r1(X), r3(X), w3(X), w1(X), r2(X).
(c) r3(X), r2(X), w3(X), r1(X), w1(X).
(d) r3(X), r2(X), r1(X), w3(X), w1(X).

Solution:
(a)r1(X), r3(X), w1(X), r2(X), w3(X).
There are two cycles. It is not conflict serializable.

(b) r1(X), r3(X), w3(X), w1(X), r2(X).


There is one cycle. It is not conflict serializable.
(c) r3(X), r2(X), w3(X), r1(X), w1(X).
There are NO cycles. This schedule is serializable

The schedule is equivalent to:


r2(X), r3(X), w3(X), r1(X), w1(X).
(T2 -> T3 -> T1)

(d) r3(X), r2(X), r1(X), w3(X), w1(X).


There is one cycle. It is not conflict serializable.
10. ASSIGNMENTS
1. Describe the structure of B+ tree. Construct a B+ tree to insert the following
numbers (order of the tree is 4) 3, 2, 5, 7, 6, 23, 24, 35, 67, 44, 43, 42, 17,
18,19. (CO4, K3)

2. Explain the structure of B Tree. Construct a B tree to insert the following (order
of the tree is 3) 25, 27, 28, 3, 4, 8, 9, 46, 48, 50, 2, 6.

(CO4, K2)

3. Construct a B tree and a B+ tree to insert the following key values (the order of the tree is three): 32, 11, 15, 13, 7, 22, 15, 44, 67, 4. (CO4, K2)

4. Suppose that we are using extendible hashing on a file that contains records with the following search key values: 3, 5, 7, 11, 17, 19, 23, 29, 31. Show the extendible hash structure for this file if the hash function is h(x) = x mod 7 and a bucket can hold three records. (CO4, K2)

5. The following key values are organized in an extendible hashing technique: 1, 3, 5, 8, 9, 12, 17, 28. Show the extendible hash structure for this file if the hash function is h(x) = x mod 8 and buckets can hold three records. Show how the extendible hash structure changes as the result of each of the following steps: (CO4, K2)
INSERT 2
INSERT 2
DELETE 5
DELETE 12
11. Part A Question & Answer

S.No Question and Answers CO K


1 Define Transactions CO4 K1

Collections of operations that form a single logical unit of work are


called transactions. A database system must ensure proper
execution of transactions despite failures—either the entire
transaction executes, or none of it does.
2 What are various states of a transaction? CO4 K1

• Active, the initial state; the transaction stays in this state while it
is executing
• Partially committed, after the final statement has been
executed
• Failed, after the discovery that normal execution can no longer
proceed
• Aborted, after the transaction has been rolled back and the
database has been restored to its state prior to the start of the
transaction
• Committed, after successful completion
3 List the Desirable Properties of Transactions CO4 K1

• Atomicity: Either all operations of the transaction are reflected


properly in the database or none are.
• Consistency: Execution of transaction in isolation (that is, with
no other transaction executing concurrently) preserves the
consistency of the database.
• Isolation: Even though multiple transactions may execute
concurrently, the system guarantees that, for every pair of
transactions Ti and Tj, it appears to Ti that either Tj finished
execution before Ti started, or Tj started execution after Ti
finished.
• Durability: After a transaction completes successfully, the
changes it has made to the database persist, even if there are
system failures.
These properties are often called the ACID properties; the acronym
is derived from the first letter of each of the four properties.

4 What is commit point? CO4 K1

A transaction T reaches its commit point when all its operations that
access the database have been executed successfully and the effect
of all the transaction operations on the database have been
recorded in the log.
11. Part A Question & Answer

S.No Question and Answers CO K


5 What are the two types of transaction? CO4 K1

The two types of transactions are


(i) global transaction
(ii) local transaction
The global transactions are those that access and update data in
several local databases. The local transactions are those that access
and update data in only one local database.
6 Define Log? CO4 K1

The log is a sequential, append-only file that is kept on disk, so it is


not affected by any type of failure except for disk or catastrophic
failure. The system maintains a log to keep track of all transaction
operations that affect the values of database items.
1. [start_transaction, T]. Indicates that transaction T has started
execution.
2. [write_item, T, X, old_value, new_value]. Indicates that
transaction T has changed the value of database item X from
old_value to new_value.
3. [read_item, T, X]. Indicates that transaction T has read the value
of database item X.
4. [commit, T]. Indicates that transaction T has completed
successfully, and affirms that its effect can be committed
(recorded permanently) to the database.
5. [abort, T]. Indicates that transaction T has been aborted.

7 What is meant by schedule in transactions ? CO4 K1

A schedule (or history) S of n transactions T1, T2, ..., Tn is an


ordering of the operations of the transactions. Operations from
different transactions can be interleaved in the schedule S. However,
for each transaction Ti that participates in the schedule S, the
operations of Ti in S must appear in the same order in which they
occur in Ti.

8 Define serializable schedule and view serializable? CO4 K1

The definition of serializable schedule is as follows: A schedule S of


n transactions is serializable if it is equivalent to some serial
schedule of the same n transactions. A schedule S is said to be view
serializable if it is view equivalent to a serial schedule.
11. Part A Question & Answer

S.No Question and Answers CO K


9 When do you say that two schedules are equivalent? CO4 K1

Two schedules are said to be equivalent if their results are


equivalent and if they produce the same final state of the database.
10 Define Cascadeless schedule and strict schedule: CO4 K1

A schedule is said to be cascadeless, or to avoid cascading


rollback, if every transaction in the schedule reads only items that
were written by committed transactions.
A schedule is said to be a strict schedule if transactions can neither read nor write an item X until the last transaction that wrote X has committed (or aborted).
11 Classify the types of equivalence and serializability CO4 K1

Two definitions of equivalence of schedules are generally used:


• Conflict equivalence and conflict serializability
• View equivalence and view serializability
12 What is meant by concurrency control? CO4 K1

Concurrency control is a mechanism to ensure that several users trying to update the database do so in a controlled manner, so that the result of the updates is correct.
13 List the types of concurrency control mechanism used to CO4 K1
maintain concurrency.

1. Lock based protocols


2. Two phase locking protocols
3. Timestamp protocols
4. Multi-version concurrency control protocols
5. Multiple granularity concurrency control protocol

14 Define two-phase locking protocol. CO4 K1

A transaction is said to follow the two-phase locking protocol if all


locking operations precede the first unlock operation in the
transaction. Such a transaction can be divided into two phases.
i) an expanding or growing phase, during which new locks on items can be acquired but none can be released.
ii) a shrinking phase, during which existing locks can be released but no new locks can be acquired (see the sketch below).
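As a rough illustration of the two phases, here is a minimal Python sketch (the class and method names are assumptions for this example, using exclusive locks only and a single shared lock table): lock requests are allowed only while the transaction is still growing, and the first unlock moves it permanently into the shrinking phase.

class TwoPhaseLockingError(Exception):
    pass

class Transaction2PL:
    """Tracks the growing and shrinking phases of the two-phase locking protocol."""

    def __init__(self, name, lock_table):
        self.name = name
        self.lock_table = lock_table   # shared dict: data item -> name of owning transaction
        self.held = set()
        self.shrinking = False         # becomes True after the first unlock

    def lock(self, item):
        if self.shrinking:
            raise TwoPhaseLockingError(
                f"{self.name}: cannot acquire a lock on {item} in the shrinking phase")
        owner = self.lock_table.get(item)
        if owner is not None and owner != self.name:
            raise TwoPhaseLockingError(f"{item} is already locked by {owner}")
        self.lock_table[item] = self.name
        self.held.add(item)

    def unlock(self, item):
        self.shrinking = True          # first unlock ends the growing phase
        if self.lock_table.get(item) == self.name:
            del self.lock_table[item]
        self.held.discard(item)

# Usage sketch
table = {}
t1 = Transaction2PL("T1", table)
t1.lock("X")
t1.lock("Y")      # growing phase: further locks are allowed
t1.unlock("X")    # shrinking phase begins here
# t1.lock("Z")    # would raise TwoPhaseLockingError: no new locks after the first unlock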
11. Part A Question & Answer

S.No Question and Answers CO K


15 Define – Lock. CO4 K1

A lock is a variable associated with a data item that describes the


status of the item with respect to possible operations that can be
applied to it.
16 Define Binary Locks CO4 K1

A binary lock can have two states or values: locked and unlocked (or
1 and 0, for simplicity). A distinct lock is associated with each
database item X. If the value of the lock on X is 1, item X cannot be
accessed by a database operation that requests the item. If the
value of the lock on X is 0, the item can be accessed when
requested, and the lock value is changed to 1.We refer to the
current value (or state) of the lock associated with item X as
lock(X).
17 Explain the modes in which data item may be locked. CO4 K1

There are two modes in which data item can be locked, they are:
(i) shared –if a transaction Ti obtains this lock on an item, then it
can read the item but not write
(ii) exclusive – if a transaction obtains this lock on an item, then it
can read as well as write item.
18 Define Deadlock CO4 K1

Deadlock occurs when each transaction T in a set of two or more


transactions is waiting for some item that is locked by some other
transaction T′ in the set. Hence, each transaction in the set is in a
waiting queue, waiting for one of the other transactions in the set to
release the lock on an item. But because the other transaction is
also waiting, it will never release the lock.
19 Name the schemes that prevent deadlock CO4 K1

The two schemes that use timestamps to prevent deadlock are:
(i) Wait-die
(ii) Wound-wait (see the sketch after this answer).
Another group of protocols that prevent deadlock but do not require timestamps are:
(i) No waiting algorithms
(ii) Cautious waiting algorithms.
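A small sketch of the wait-die and wound-wait decisions named in the answer above (the function names and return strings are illustrative assumptions): in both schemes the transaction with the smaller timestamp is the older one.

def wait_die(ts_requester, ts_holder):
    """Requester asks for an item locked by holder; a smaller timestamp means an older transaction."""
    if ts_requester < ts_holder:
        return "requester waits"              # older transaction is allowed to wait
    return "requester aborts (dies)"          # younger transaction is rolled back and restarted later

def wound_wait(ts_requester, ts_holder):
    if ts_requester < ts_holder:
        return "holder aborts (is wounded)"   # older requester preempts the younger holder
    return "requester waits"                  # younger requester waits for the older holder

print(wait_die(5, 10))     # older requester   -> requester waits
print(wound_wait(5, 10))   # older requester   -> holder aborts (is wounded)
print(wait_die(10, 5))     # younger requester -> requester aborts (dies)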
20 When does the problem of starvation occur in lock? CO4 K1

The problem of starvation occurs when a transaction cannot


proceed for an indefinite period of time while other transactions in
the system continue normally.
11. Part A Question & Answer

S.No Question and Answers CO K

21 What are the measures of the quality of disks? CO4 K1
The main measures of the quality of a disk are capacity, access time, data transfer rate and reliability.
Access time is the time from when a read or write request is issued to when data transfer begins.
The time for repositioning the arm is called the seek time.

22 Define rotational latency time and average latency time? CO4 K1
Once the seek has started, the time spent waiting for the sector to be accessed to appear under the head is called the rotational latency time.
The average latency time of the disk is one-half the time for a full rotation of the disk.

23 Define the data-transfer rate and mean time to failure? CO4 K1
The data-transfer rate is the rate at which data can be retrieved from or stored to the disk.
The mean time to failure of a disk is the amount of time that, on average, we can expect the system to run continuously without any failure.

24 How can we measure the number of block transfers? CO4 K1
The number of block transfers is measured using: the number of seek operations performed, the number of blocks read and the number of blocks written.

25 Write notes on magnetic disks CO4 K1
Magnetic disks provide the bulk of secondary storage for computer systems. Disks are relatively simple. Each disk platter has a flat circular shape. The disk surface is divided into tracks, which are subdivided into sectors.

26 What is meant by RAID? CO4 K1
A variety of disk-organization techniques, collectively called redundant arrays of independent disks (RAID), is used to achieve improved performance and reliability.

27 Define: i) Bit-level striping ii) Block-level striping. CO4 K1
The process of splitting the bits of each byte across multiple disks is called bit-level striping.
Block-level striping is the process of splitting blocks across multiple disks.
11. Part A Question & Answer
S.No Question and Answers CO K
28 What factors should be taken into account when choosing a RAID level? CO4 K1
The factors to be taken into account when choosing a RAID level are:
Monetary cost of extra disk storage requirements
Performance requirements in terms of number of I/O operations
Performance when a disk has failed
Performance during rebuild.

29 Define the terms i) records ii) files iii) types of records CO4 K1
Data is usually stored in the form of records, each consisting of a collection of related data values.
A file is a sequence of records. The two types of records are:
Fixed-length records – in which all records have the same size.
Variable-length records – in which different records have different sizes.

30 List the possible ways of organizing records in files. CO4 K1
The possible ways of organizing records in files are:
Heap file organization
Sequential file organization
Hashing file organization
Clustering file organization

31 Explain i) heap file organization ii) sequential file organization. CO4 K1
In heap file organization, any record can be placed anywhere in the file where there is space for the record. There is no ordering of records. There is a single file for each relation.
In sequential file organization, records are stored in sequential order according to the value of a "search key" of each record.

32 What are the two types of indices? CO4 K1
The two basic kinds of indices are:
Ordered indices – based on a sorted ordering of the values.
Hash indices – based on a uniform distribution of values across a range of buckets. The bucket to which a value is assigned is determined by a function, called a hash function.
11. Part A Question & Answer
S.No Question and Answers CO K
33 List the factors used to evaluate the indexing and hashing techniques. CO4 K1
The factors that must be evaluated for indexing and hashing techniques are: access types, access time, insertion time, deletion time and space overhead.

34 What are index-sequential files? What are the two types of ordered indices? CO4 K1
Files that are ordered sequentially with a primary index on the search key are called index-sequential files.
The two types of ordered indices are:
Dense index – an index record appears for every search-key value in the file.
Sparse index – an index record appears for only some of the search-key values.

35 Explain multilevel indices CO4 K2
Indices with two or more levels are called multilevel indices. Searching for records with a multilevel index requires significantly fewer I/O operations than searching for records by binary search.

36 What is a B+-tree index? CO4 K1
A B+-tree index is a balanced tree in which every path from the root of the tree to a leaf of the tree is of the same length.

37 What is the advantage of B-tree index files over B+-tree index files? CO4 K1
B-trees eliminate the redundant storage of search-key values, as they allow each search-key value to appear only once, whereas B+-trees have redundant storage of search-key values.

38 Write notes on hashing. CO4 K1
Hashing provides very fast access to records on search conditions. The search condition must be an equality condition on a single field, called the hash field of the file. The hash field is also a key field of the file, in which case it is called the hash key. Hashing is used to search within a program whenever a group of records are accessed exclusively by using the value of one field.

39 Name the two types of hashing and any two hashing functions. CO4 K1
The two types of hashing are: internal hashing and external hashing.
Hashing functions include: folding, and picking some digits of the hash field value.
11. Part A Question & Answer
S.No Question and Answers CO K
40 What is a Multilevel Index? CO4 K1
If the primary index does not fit in memory, access becomes expensive. To reduce the number of disk accesses to index records, treat the primary index kept on disk as a sequential file and construct a sparse index on it:
– outer index – a sparse index of the primary index
– inner index – the primary index file
If even the outer index is too large to fit in main memory, yet another level of index can be created.

41 What are the types of failure? CO4 K1
Transaction failure
System failure
Disk failure

42 What is known as heap file organization? CO4 K1
In the heap file organization, any record can be placed anywhere in the file where there is space for the record. There is no ordering of records. There is a single file for each relation.

43 Draw the structure of a B+-Tree node. CO4 K1
Typical node – Ki are the search-key values – Pi are pointers to children (for non-leaf nodes) or pointers to records or buckets of records (for leaf nodes). The search-keys in a node are ordered K1 < K2 < K3 < . . . < Kn–1.

44 What is External Sorting? CO4 K1
It refers to sorting algorithms that are suitable for large files of records on disk that do not fit entirely in main memory, such as most database files.
11. Part A Question & Answer

S.No Question and Answers CO K

45 What are the advantages and disadvantages of an indexed sequential file? CO4 K1
Advantage: quick accessing of records.
Disadvantage: an insertion requires rewriting at least everything after the insertion point, which makes inserts very expensive unless they are done at the end of the file.

46 Define Bit-Interleaved Parity CO4 K1
When writing data, corresponding parity bits must also be computed and written to a parity-bit disk. To recover data in a damaged disk, compute the XOR of bits from the other disks.

47 What are the disadvantages of B-Tree over B+-Tree? CO4 K1
Only a small fraction of all search-key values are found early.
Non-leaf nodes are larger, so B-Trees typically have greater depth than the corresponding B+-Tree.
Insertion and deletion are more complicated than in B+-Trees.
Implementation is harder than for B+-Trees.

48 How is data in a damaged block recovered in Block-Interleaved Parity? CO4 K1
When writing a data block, the corresponding block of parity bits must also be computed and written to the parity disk. To find the value of a damaged block, compute the XOR of bits from the corresponding blocks (including the parity block) on the other disks (see the sketch below).
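A short Python sketch of the XOR recovery described above (the block contents are made-up two-byte examples): the parity block is the bytewise XOR of the data blocks, and a lost block is rebuilt by XOR-ing the surviving blocks with the parity block.

def xor_blocks(*blocks):
    """Bytewise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

# Three data blocks striped across three disks, with the parity block on a fourth disk.
d0, d1, d2 = b"\x0f\x0f", b"\xf0\x01", b"\x33\x44"
parity = xor_blocks(d0, d1, d2)

# Suppose the disk holding d1 fails: rebuild it from the surviving blocks and the parity block.
recovered_d1 = xor_blocks(d0, d2, parity)
assert recovered_d1 == d1
print(recovered_d1.hex())   # f001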

49 What is meant by software and hardware RAID systems? CO4 K1
RAID can be implemented with no change at the hardware level, using only software modification. Such RAID implementations are called software RAID systems, and the systems with special hardware support are called hardware RAID systems.

50 Differentiate static and dynamic hashing. CO4 K2
In static hashing, the number of buckets is fixed when the file is created, so the hash function maps keys to a fixed set of bucket addresses. In dynamic hashing (for example, extendible hashing), the number of buckets grows or shrinks as the file grows or shrinks.

51 What are the ways in which variable-length records arise in database systems? CO4 K1
Storage of multiple record types in a file.
Record types that allow variable lengths for one or more fields.
Record types that allow repeating fields.

52 What are the two types of blocks in the fixed-length representation? Define them. CO4 K1
Anchor block: contains the first record of a chain.
Overflow block: contains the records other than those that are the first record of a chain.
12. PART - B
S.No Question and Answers CO K
1 Explain the ACID properties of a transaction with examples CO4 K1

2 Explain the different states of a transaction. CO4 K1

3 State and explain the three concurrency problems. CO4 K1


4 What are schedules in transactions? What are the different schedules CO4 K1
available? Illustrate with a neat example.
5 Explain conflict and view serializability. CO4 K1
6 Explain two-phase Locking protocol. CO4 K1
7 What is deadlock? How does it occur? How can transactions be written CO4 K1
for (i) Deadlock Prevention (ii) Deadlock Detection? Illustrate with
suitable examples.

8 Explain Transaction Recovery CO4 K1

9 Explain Save Points – Isolation Levels – SQL Facilities for CO4 K1


Concurrency and Recovery.

10 Consider the following schedules. The actions are listed in the order CO4 K1
they are scheduled, and prefixed with the transaction name.
S1: T2:R(Z), T2:R(Y), T2:W(Y), T3:R(Y), T3:R(Z), T1:R(X),
T1:W(X), T3:W(Y), T3:W(Z), T2:R(X), T1:R(Y) , T1:W(Y), T2:W(X)
S2: T3:R(Y), T3:R(Z), T1:R(X), T1:W(X), T3:W(Y), T3:W(Z),
T2:R(Z), T1:R(Y), T1:W(Y), T2:R(Y), T2:W(Y), T2:R(X), T2:W(X)
For each of the schedules, answer the following questions:
i) What is the precedence graph for the schedule?
ii) Is the schedule conflict-serializable? If so, what are all the
conflict equivalent serial schedules?
iii) Is the schedule view-serializable? If so, what are all the view
equivalent serial schedules?

11 Consider the following schedules. The actions are listed in the order CO4 K1
they are scheduled
S1: R1(X), R3(X), W1(X), R2(X), W3(X)
S2: R3(X), R2(X), W3(X), R1(X), W1(X)
For each of the schedules, answer the following questions:
What is the precedence graph for the schedule?
i)Which of the following are conflict serializable schedule , Find the
equivalent serial schedule
S. No. PART B CO K
12 List the different levels in RAID technology and explain its features. CO4 K1
13 Describe the different methods of implementing variable-length records. CO4 K1
14 Explain different properties of indexes in detail. Explain the structure of file indices. CO4 K1
15 Explain the various indexing schemes used in a database environment. CO4 K1

16 Discuss about primary file storage system CO4 K1

17 Explain static and dynamic Hashing Techniques? CO4 K1

18 Briefly describe about B+ tree index file structure. CO4 K1

19 Explain in detail about B tree index files. CO4 K1

20 Discuss selection operation techniques CO4 K1


13. SUPPORTIVE ONLINE CERTIFICATION COURSES

Sl.No. | Name of the Institute | Name of the Course | Website Link
1. Coursera | Database Management Essentials | https://www.coursera.org/learn/database-management
2. Coursera | Database Systems Specialization | https://www.coursera.org/specializations/database-systems
3. Udemy | Introduction to Database Engineering | https://www.udemy.com/course/database-engines-crash-course/
4. Udemy | Relational Database Design | https://www.udemy.com/course/relational-database-design/
5. Udemy | Database Design | https://www.udemy.com/course/database-design/
6. Udemy | Database Design Introduction | https://www.udemy.com/course/cwdatabase-design-introduction/
7. Udemy | The Complete Database Design & Modeling Beginners Tutorial | https://www.udemy.com/course/the-complete-database-modeling-and-design-beginners-tutorial/
8. Udemy | Database Design and MySQL | https://www.udemy.com/course/calebthevideomaker2-database-and-mysql-classes/
9. NPTEL | Data Base Management System | https://onlinecourses.nptel.ac.in/noc21_cs04/preview
14.REAL TIME APPLICATIONS IN DAY TO DAY LIFE AND
TO INDUSTRY
1. Finances
From the stock market to your local bank, databases are abundant across the financial
world. Tracking the vast amount of information behind the world’s daily transactions
requires extremely powerful databases. This includes financial models that analyze
that data to predict future activity.
Everyday Transactions
Most of us use databases daily and may not realize it. By understanding real-world,
daily activities, you will gain a better understanding of database usage and will be
able to better apply these concepts to understand the corporate world.
A couple of everyday transactions with which everyone is familiar are
Getting a prescription filled
Using a bank machine
Getting a Prescription Filled
The example walks you through a process everyone does at some point—having a
prescription filled.
As you are standing in your local pharmacy waiting for your prescription, you may not
think that this transaction is database intensive. But, if you were to take a closer look,
hundreds of gigabytes of data, maybe terabytes, are involved in this routine
transaction.
When you evaluate a transaction like this one, it is helpful to look at one layer of the
transaction at a time, peeling away successive layers, just like peeling an
onion. Figure shows the "layers" of this transaction that you will review.

Fig: Transaction layers for a prescription purchase.
14.REAL TIME APPLICATIONS IN DAY TO DAY LIFE AND
TO INDUSTRY

The Doctor's Office Databases


You can start by looking back to the beginning of this transaction—at the doctor's
office. A database must be kept of each patient, including data elements such as
name, address, phone number, insurance carrier, closest relative, medical history, and
family history. Billing records, payment receipts, appointment schedules, and
insurance filing information make up other databases that are used often.
The Pharmacy Databases
The actual pharmacy has database information that it maintains on a broad range of
subjects. The most obvious one is you—the customer. Every pharmacy will have,
minimally, a database of customer name and address information, customer date of
birth, prescribing physician and phone number, insurance carrier, and past prescription
history of each customer.
The insurance company, HMO, or PPO provides each pharmacy with another database
that must be used extensively. This database is called the formulary database. It
contains an approved list of brand drugs, approved generic substitutions, dosages,
and National Drug Code (NDC) information. Your insurance prescription card more
than likely has a specific formulary that the pharmacy must reference to ensure that
the drug used to fill your prescription, or an allowable substitution, will be covered by
your insurance. This type of database may also be used by the physician's office to
ensure that the drug they want to prescribe can be filled, without any substitution or
dosage change, when you get to the pharmacy.
Another database that is used prior to filling the prescription is the drug interaction
database. The prescribed drug may have serious interaction issues if used in
conjunction with medications currently in use. The database identifies potential
interaction problems, provides a detailed explanation of the interaction consequence,
and possibly suggests alternatives.
After the formulary and interaction databases have been utilized, the prescription is
ready to fill. The pharmacy inventory database is checked to determine whether the
NDC being requested is in stock and at what shelf location it can be found. The
inventory database must track expiration dates of each drug to prevent outdated
medication from being dispensed. After the prescription is filled, the available on-hand
inventory for your medication must be reduced by the quantity or volume of your
prescription.
If the medication is not available in stock, the pharmacy must search through its
wholesaler database to determine the best source for the drug. The wholesaler
database identifies each of the wholesalers from whom the needed drug can be
acquired, the cost from the wholesaler, and possibly the inventory availability at the
wholesaler.
14.REAL TIME APPLICATIONS IN DAY TO DAY LIFE AND
TO INDUSTRY

The final database to be reviewed at the pharmacy is the order database. This
database contains all the outstanding orders that have been placed with all the
wholesalers. They may represent simple inventory replenishment orders or special
orders for drugs not normally stocked. If the orders are for narcotic items, or
Schedule 2 drugs, special order and tracking requirements must be met to satisfy the
requirements of the Drug Enforcement Administration (DEA). Figure shows the
database entities involved at the pharmacy layer.

Fig: Pharmacy layer databases.


14.REAL TIME APPLICATIONS IN DAY TO DAY LIFE AND
TO INDUSTRY
The Wholesaler's Database
As you continue to peel away layers in the transaction, you see that the next layer
represents the drug wholesaler. The pharmaceutical supply chain in the United States
is typically serviced by a wholesaler that operates between the manufacturer and the
retail or hospital pharmacy.
The wholesaler layer begins with an extensive customer database. Name, address,
billing address, and accounts receivable information are maintained. A customer who
represents a large chain, such as K-Mart, may have Ship To destinations all over the
country. Each of the Ship To locations must be maintained separately for order
tracking and for sales analysis purposes.
A separate database that identifies every drug and strength is used by the wholesaler
to ensure that the customer's intent on which drug they want to purchase from the
wholesaler is clear and, in turn, to make sure the drug the wholesaler purchases from
the manufacturer is explicitly identified. This database, by NDC number, changes
frequently with new manufacturers, generic suppliers, dosage changes, and the
constant introduction of new drugs. The management of this type of database has
spawned new businesses in the pharmaceutical industry because of its complexity and
volatility. Many wholesalers purchase this database with quarterly updates to ensure
that they have an up-to-date NDC database.
Most national and regional wholesalers have multiple distribution centers, or DCs,
located throughout the country or throughout the geographical area they serve. Each
of these DCs has an inventory database that is a focal point of their operations. The
inventory database must identify each item that the DC stocks, and for each item,
many inventory management attributes are tracked. Something as simple as the
picking location can include a forward picking location, a backup picking location, a
bulk location, a backup bulk location, packaging quantities, packaging dimensions,
packaging weights, expiration dates, and lot-tracking information along with inventory
quantities at each location. With thousands of items at a DC, you can begin to see the
complexity and size of the inventory database required for each DC within a
wholesaler's operation.
The wholesaler's database will always include, among other items, a shipping and
invoicing database and an accounts receivable database. Figure shows the database
entities involved at the wholesaler layer.
14.REAL TIME APPLICATIONS IN DAY TO DAY LIFE AND
TO INDUSTRY
The Manufacturer's Database
A manufacturer has many databases that are very similar to those of the wholesalers.
The supply-chain relationship between pharmacy and wholesaler is very similar to the
relationship between wholesaler and manufacturer. There are a few additional
databases, though, that are unique to the manufacturing process.
The first unique database you will look at in this layer of the transaction is the product
database. The product must contain a list of raw materials, or recipe ingredients, that
make up the product. Careful attention must also be given to the manufacturing
process. The FDA (Food and Drug Administration) expects every manufacturer to
stringently adhere to the recommended process to produce the drug. The database
has extensive instructions and quality control data for each step within the routing or
operational steps in the manufacturing process.
The other database that is unique to the manufacturer in this transaction is one that
tracks the capacity of each manufacturing process. Whether it is a material movement
operation, a mixing operation, an application of heat, or a packaging operation, each
operation has a finite limitation in terms of hours available and capacity. A complex
database is required to look at all the scheduled shop orders, retrieving the lot size of
each order, multiplying the shop order quantity by the routing quantity, and then
determining the work center or material-handling equipment necessary to complete
the operation. Each of these extended routing operations are aggregated by work
center and compared to the finite limitations noted earlier. The required due dates of
the shop orders are used as a starting point, and the entire process is backward
scheduled to determine when the shop order would need to begin to meet the
required completion date. When you factor in scheduled and unscheduled
maintenance, breakdowns, and required setups and quality inspections, the database
necessary to adequately evaluate capacity requirements and material requirements
planning is significant. Figure shows the database entities involved at the
manufacturer layer.
14.REAL TIME APPLICATIONS IN DAY TO DAY LIFE AND
TO INDUSTRY
Using Your Bank Machine (ATM)
The next example involves a transaction that takes only a few minutes to complete—
you are going to look at the databases that are used when you visit a bank and use
the ATM.
When you insert your ATM card into the bank machine, the first thing that must be
completed is to identify the account number of the user. The user may be using a
local bank or may be using another bank that is part of a participating network. The
bank must search its account databases and try to find a match within their system. If
this fails, the account number can be searched against a database that represents
participating banks.
After an account record is identified, the user is prompted for a PIN, or personal
identification number. The PIN is verified against a database entry. The transaction,
for obvious reasons, is canceled if the user does not supply a matching PIN.
After the PIN verification is completed, the account details are retrieved from the
database regarding the types of accounts that you have—checking, savings, or both.
The ATM prompts you for the type of transaction you are interested in completing.
Typical transactions are deposits, withdrawals, transfer savings to checking, and check
account balances. For this transaction, you are going to withdraw cash from your
checking account.
Now it's time for you to indicate how much cash you need. A couple of databases
must be accessed at this point. The first is easy—determine how much money you
currently have in your account. The second is more involved because most ATM
systems limit your withdrawal amounts in a 24-hour period. The system issues a SQL
select statement against the database to add the transaction amounts of all
withdrawal transactions for your account within the most recent 24 hours. If the daily
limit minus the returned summed amount is greater than your transaction amount,
you get the cash. If not, you will be limited as to how much can be withdrawn, or you
may not qualify to receive any cash.
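A hedged sketch of that 24-hour withdrawal check, using SQLite from Python; the table name, column names and the limit value are assumptions invented for this illustration, not the schema of any real banking system.

import sqlite3
from datetime import datetime, timedelta

DAILY_LIMIT = 500.00   # assumed per-card limit for this example

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE atm_txn (
                    account_no TEXT, txn_type TEXT, amount REAL, txn_time TEXT)""")
conn.execute("INSERT INTO atm_txn VALUES ('ACC1', 'WITHDRAWAL', 200.0, ?)",
             ((datetime.now() - timedelta(hours=3)).isoformat(),))

def can_withdraw(account_no, requested):
    """Sum the withdrawals of the last 24 hours and compare against the daily limit."""
    since = (datetime.now() - timedelta(hours=24)).isoformat()
    (already,) = conn.execute(
        """SELECT COALESCE(SUM(amount), 0) FROM atm_txn
           WHERE account_no = ? AND txn_type = 'WITHDRAWAL' AND txn_time >= ?""",
        (account_no, since)).fetchone()
    return already + requested <= DAILY_LIMIT

print(can_withdraw("ACC1", 250.0))   # True  (200 + 250 <= 500)
print(can_withdraw("ACC1", 350.0))   # False (200 + 350 > 500)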
Beyond the portion of the transaction that you see, a couple more databases are
being used. The transaction must be recorded, with account number, account type
(savings or checking), date, time of day, type of transaction, and amount. This is used
to post the transaction to your account and compute your current balance. This is the
database of transactions that you see on your statement at the end of your banking
period. The transaction is also used to post to the general ledger database for the
bank. If any ATM charges apply, these annoying charges are recorded in the
transaction database described previously.
The final database is composed of ACH (Automated Clearinghouse) transactions that
are forwarded to the Federal Reserve System so that banks can clear transactions
across the country. Each of these transactions are logged to a transaction database for
reconciliation purposes in the event of a failure within banking computer systems.
15. CONTENT BEYOND THE SYLLABUS
Timestamp based Protocol for Concurrency Control
A timestamp-based protocol determines the serializability order by selecting an ordering among the transactions in advance.

The most common method for doing so is to use a timestamp-ordering scheme.

Timestamps
With each transaction Ti in the system, we associate a unique fixed timestamp, denoted
by TS(Ti ).

This timestamp is assigned by the database system before the transaction Ti starts
execution. If a transaction Ti has been assigned timestamp TS(Ti ), and a new
transaction Tj enters the system, then TS(Ti ) < TS(Tj ).

There are two simple methods for implementing this scheme:


1. Use the value of the system clock as the timestamp
2.Use a logical counter that is incremented after a new timestamp has
been assigned;

To implement this scheme,we associate with each data item Q two timestamp
values:

•W-timestamp(Q) denotes the largest timestamp of any transaction that executed


write(Q) successfully.

• R-timestamp(Q) denotes the largest timestamp of any transaction that executed


read(Q) successfully.
These timestamps are updated whenever a new read(Q) or write(Q) instruction
is executed.

The Timestamp-Ordering Protocol


The timestamp-ordering protocol ensures that any conflicting read and write operations
are executed in timestamp order. This protocol operates as follows:

1. Suppose that transaction Ti issues read(Q).

a.If TS(Ti ) < W-timestamp(Q), then Ti needs to read a value of Q that was already
overwritten. Hence, the read operation is rejected, and Ti is rolled back.

b.If TS(Ti ) ≥ W-timestamp(Q), then the read operation is executed, and R-timestamp(Q) is
set to the maximum of R-timestamp(Q) and TS(Ti ).
15. CONTENT BEYOND THE SYLLABUS
Timestamp based Protocol for Concurrency Control
2. Suppose that transaction Ti issues write(Q).
a.If TS(Ti ) < R-timestamp(Q), then the value of Q that Ti is producing was needed
previously, and the system assumed that that value would never be produced.
Hence, the system rejects the write operation and rolls Ti back.

b. If TS(Ti ) < W-timestamp(Q), then Ti is attempting to write an obsolete value of


Q. Hence, the system rejects this write operation and rolls Ti back.
c. Otherwise, the system executes the write operation and sets W-timestamp(Q) to
TS(Ti ).

If a transaction Ti is rolled back by the concurrency-control scheme as a result of the issuance of either a read or a write operation, the system assigns it a new timestamp and restarts it. A sketch of these read and write rules is given below.
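A minimal sketch of the read and write rules above, assuming an in-memory table of R- and W-timestamps per item and using an exception to stand in for rolling the transaction back (all names are illustrative):

class RollbackTransaction(Exception):
    pass

# Per-item timestamps: item -> [R-timestamp, W-timestamp]
ts_table = {}

def _entry(item):
    return ts_table.setdefault(item, [0, 0])

def read(ts_ti, item):
    r_ts, w_ts = _entry(item)
    if ts_ti < w_ts:                        # the value was already overwritten by a younger transaction
        raise RollbackTransaction(f"read({item}) rejected; roll back T with TS={ts_ti}")
    _entry(item)[0] = max(r_ts, ts_ti)      # execute the read and update R-timestamp(Q)
    return f"T(TS={ts_ti}) reads {item}"

def write(ts_ti, item):
    r_ts, w_ts = _entry(item)
    if ts_ti < r_ts or ts_ti < w_ts:        # a younger transaction already read or wrote the item
        raise RollbackTransaction(f"write({item}) rejected; roll back T with TS={ts_ti}")
    _entry(item)[1] = ts_ti                 # execute the write and update W-timestamp(Q)
    return f"T(TS={ts_ti}) writes {item}"

print(read(5, "Q"))    # allowed; R-timestamp(Q) becomes 5
print(write(7, "Q"))   # allowed; W-timestamp(Q) becomes 7
# write(6, "Q") would raise RollbackTransaction, since 6 < W-timestamp(Q) = 7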

The protocol can generate schedules that are not recoverable. However,
it can be extended to make the schedules recoverable, in one of several
ways:

Recoverability and cascadelessness can be ensured by performing all writes


together at the end of the transaction. The writes must be atomic in the
following sense: While the writes are in progress, no transaction is permitted
to access any of the data items that have been written.

Recoverability and cascadelessness can also be guaranteed by using a


limited form of locking, whereby reads of uncommitted items are postponed
until the transaction that updated the item commits

Recoverability alone can be ensured by tracking uncommitted writes, and


allowing a transaction Ti to commit only after the commit of any transaction
that wrote a value that Ti read.
15. CONTENT BEYOND THE SYLLABUS
Timestamp based Protocol for Concurrency Control
Thomas Write Rule
The modification to the timestamp-ordering protocol, called Thomas’ write rule, is
this:

Suppose that transaction Ti issues write(Q).


1.If TS(Ti ) < R-timestamp(Q), then the value of Q that Ti is producing was
previously needed, and it had been assumed that the value would never be
produced. Hence, the system rejects the write operation and rolls Ti back.

2. If TS(Ti ) < W-timestamp(Q), then Ti is attempting to write an obsolete


value of Q. Hence, this write operation can be ignored.

3. Otherwise, the system executes the write operation and sets W-timestamp(Q) to TS(Ti).

Thomas' write rule improves on the basic timestamp-ordering algorithm: by ignoring obsolete writes instead of rolling the transaction back, it allows greater potential concurrency while still producing view serializable schedules. (A sketch of the modified write check follows.)
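A standalone Python sketch of the modified write check (the timestamp values assumed for item Q are made up for illustration): an obsolete write is silently ignored rather than causing a rollback.

class RollbackTransaction(Exception):
    pass

ts = {"Q": {"R": 4, "W": 7}}   # assumed current R- and W-timestamps for item Q

def write_thomas(ts_ti, item):
    if ts_ti < ts[item]["R"]:              # a younger transaction has already read the item
        raise RollbackTransaction(f"roll back transaction with TS={ts_ti}")
    if ts_ti < ts[item]["W"]:              # obsolete write: ignore it instead of rolling back
        return f"write({item}) by TS={ts_ti} ignored"
    ts[item]["W"] = ts_ti                  # otherwise perform the write
    return f"write({item}) by TS={ts_ti} executed"

print(write_thomas(6, "Q"))    # ignored   (4 <= 6 < 7)
print(write_thomas(9, "Q"))    # executed; W-timestamp(Q) becomes 9
# write_thomas(3, "Q") would raise RollbackTransaction, since 3 < R-timestamp(Q) = 4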
14.REAL TIME APPLICATIONS IN DAY TO DAY LIFE AND
TO INDUSTRY
Application and Uses of Database Management System (DBMS)

Railway Reservation System.


Library Management System
Banking System

Universities and colleges Management Systems


Credit card transactions.
Social Media Sites
Telecommunications
Finance Applications
15.CONTENT BEYOND SYLLABUS

Introduction to Hierarchical Database Model


Hierarchical Database Model, as the name suggests, is a database model in which the data is arranged in a hierarchical tree structure. As it is arranged based on the hierarchy, every record of the data tree should have at least one parent, except for the child records in the last level, and each parent should have one or more child records. The data can be accessed by following the hierarchical structure, always starting from the root, or first parent. Hence this model is named the Hierarchical Database Model.

What is Hierarchical Database Model

It is a data model in which data is represented in the tree-like structure. In this


model, data is stored in the form of records which are the collection of fields. The
records are connected through links and the type of record tells which field is
contained by the record. Each field can contain only one value.

It must have only one parent for each child node but parent nodes can have more
than one child. Multiple parents are not allowed. This is the major difference
between the hierarchical and network database model. The first node of the tree is
called the root node. When data needs to be retrieved then the whole tree is
traversed starting from the root node. This model represents one- to- many
relationships.

Let us see one example: Let us assume that we have a main directory which
contains other subdirectories. Each subdirectory contains more files and directories.
Each directory or file can be in one directory only i.e. it has only one parent.
16. ASSESSMENT SCHEDULE

Assessment Type Proposed Date


Assessment 1 May 2023

Assessment 2 June 2023

Model Exam July 2023


17.PRESCRIBED TEXT BOOKS &REFERENCE BOOKS

TEXT BOOKS:

1. Abraham Silberschatz, Henry F. Korth, S. Sudharshan, "Database System Concepts", Sixth Edition, Tata McGraw Hill, 2011.

2. Ramez Elmasri, Shamkant B. Navathe, "Fundamentals of Database Systems", Sixth Edition, Pearson Education, 2011.

REFERENCES:

1. C.J. Date, A. Kannan, S. Swamynathan, "An Introduction to Database Systems", Eighth Edition, Pearson Education, 2006.

2. Raghu Ramakrishnan, "Database Management Systems", Fourth Edition, McGraw-Hill College Publications, 2015.

3. G.K. Gupta, "Database Management Systems", Tata McGraw Hill, 2011.


18. MINI PROJECT SUGGESTIONS

Design E-R model for the following and also apply normalization
1) Blood bank management system
Hospitals register to request the blood they need, and donors sign up to this blood bank to donate blood. The donors are available to donate in particular areas according to the registered data. When a hospital requests blood, the blood bank provides the details of the donors near that hospital. The blood bank also shows the availability of blood groups to the hospitals. We can also maintain the data of the blood donated to the hospitals.

2) School management system


Staff details are stored in the system with an ID and can be retrieved at any time using that ID. Student information is also stored in the system, and student marks can be stored as well. Salary management can also be done in this system for the staff members of the school. Fees of the students can also be maintained in the system. Another feature will contain section information and the section class teacher.

3) Payroll management system


Create a system where the admin is the manager. The manager logs in with his ID, adds all the details about the employees and can add any new employees who join the organization. Add a feature to calculate the salaries of the employees based on their designation and attendance. Add a feature to display the details of all the employees in the organization, and also display the details and salaries of the employees calculated in the current month.

4) Railway system
Users can book train tickets to reach their destination. This option includes details such as the present station, the destination station and the train they want to travel in. Provide the user an option to check the details of a train using the train ID; it must also show the train's arrival time, the platform at which the train arrives and the departure time of the train. Also add an option that allows the user to book a meal while travelling on the train, and an option that shows the price range of different classes of booking such as AC, second class, sleeper and others. Also try to think of further options to add yourself.
5) Hospital Data Management
Assign unique IDs to the patients and store the relevant information under the same. Add the patient's name, personal details, contact number, disease name and the treatment the patient is going through. Mention under which hospital department the patient is (such as cardiac, gastro, etc.). Add information about the hospital's doctors. A doctor can treat multiple patients, and he/she would have a unique ID as well. Doctors would also be classified into different departments. Add the information of ward boys and nurses working in the hospital and assigned to different rooms. Patients get admitted into rooms, so add that information to your database too.
Thank you

Disclaimer:

This document is confidential and intended solely for the educational purpose of RMK Group of
Educational Institutions. If you have received this document through email in error, please notify the
system manager. This document contains proprietary information and is intended only to the
respective group / learning community as intended. If you are not the addressee you should not
disseminate, distribute or copy through e-mail. Please notify the sender immediately by e-mail if you
have received this document by mistake and delete this document from your system. If you are not
the intended recipient you are notified that disclosing, copying, distributing or taking any action in
reliance on the contents of this information is strictly prohibited.
