DBMS DC Unit 4
This document is confidential and intended solely for the educational purposes of
RMK Group of Educational Institutions. If you have received this document
through email in error, please notify the system manager. This document
contains proprietary information and is intended only for the respective group /
learning community. If you are not the addressee, you should not disseminate,
distribute, or copy it through e-mail. Please notify the sender immediately by
e-mail if you have received this document by mistake, and delete it from your
system. If you are not the intended recipient, you are notified that disclosing,
copying, distributing, or taking any action in reliance on the contents of this
information is strictly prohibited.
22IT202
DATABASE MANAGEMENT SYSTEMS
1. Contents
2. Course Objectives
3. Pre Requisites
4. Syllabus
5. Course Outcomes
6. CO-PO/PSO Mapping
7. Lecture Plan
8. Activity Based Learning
9. Lecture Notes
10. Assignments
3. PRE REQUISITES
4. SYLLABUS
List of Exercise/Experiments
Case Study using real-life database applications, any one from the following list:
a) Inventory Management for an EMart Grocery Shop
b) Society Financial Management
c) Cop Friendly App – Eseva
d) Property Management – eMall
e) Star Small and Medium Banking and Finance
● Build an Entity Model diagram. The diagram should align with the business and
functional goals stated in the application.
List of Exercise/Experiments
Case Study using real-life database applications, any one from the following list,
and do the following exercises.
a) Inventory Management for an EMart Grocery Shop
b) Society Financial Management
c) Cop Friendly App – Eseva
d) Property Management – eMall
e) Star Small and Medium Banking and Finance
List of Exercise/Experiments
1. Case Study using real-life database applications, any one from the following list:
Inventory Management for an EMart Grocery Shop
Society Financial Management
Cop Friendly App – Eseva
Property Management – eMall
Star Small and Medium Banking and Finance
Apply Normalization rules in designing the tables in scope.
UNIT IV TRANSACTIONS, CONCURRENCY CONTROL AND DATA STORAGE 9+6
Transaction Concepts – ACID Properties – Schedules based on Recoverability,
Serializability – Concurrency Control – Need for Concurrency – Locking Protocols –
Two Phase Locking – Transaction Recovery – Concepts – Deferred Update –
Immediate Update. Organization of Records in Files – Unordered, Ordered – Hashing
Techniques – RAID – Ordered Indexes – Multilevel Indexes – B+ tree Index Files –
B tree Index Files.
List of Exercise/Experiments
Case Study using real-life database applications, any one from the following list:
a) Inventory Management for an EMart Grocery Shop
b) Society Financial Management
c) Cop Friendly App – Eseva
d) Property Management – eMall
e) Star Small and Medium Banking and Finance
Showcase the ACID properties with sample queries, with appropriate settings, for
the above scenario.
UNIT V QUERY OPTIMIZATION AND ADVANCED DATABASES 9+6
Query Processing Overview – Algorithms for SELECT and JOIN operations – Query
Optimization using Heuristics. Distributed Database Concepts – Design – Concurrency
Control and Recovery – NOSQL Systems – Document-Based NOSQL Systems and
MongoDB.
List of Exercise/Experiments
Case Study using real-life database applications, any one from the following list:
a) Inventory Management for an EMart Grocery Shop
b) Society Financial Management
c) Cop Friendly App – Eseva
d) Property Management – eMall
e) Star Small and Medium Banking and Finance
5. COURSE OUTCOMES

     PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12
CO1 2 1 1 1 1 1 1 2 2 2 2 2
CO2 3 2 2 1 1 1 1 2 2 2 2 2
CO3 2 1 1 1 1 1 1 2 2 2 2 2
CO4 2 1 1 1 1 1 1 2 2 2 2 2
CO5 2 1 1 1 1 1 1 2 2 2 2 2
CO6 2 1 1 1 1 1 1 2 2 2 2 2
Course Code | Course Outcome Statement | Cognitive/Affective Level of the Course Outcome | Expected Level of Attainment
6. CO-PO/PSO MAPPING

Course Outcomes (COs) vs. Programme Outcomes (PO1-PO12) and Programme Specific Outcomes (PSO1-PSO3)
K-level of PO/PSO: K3, K4, K5, K5, K3/K5, A2, A3, A3, A3, A3, A3, A2 (PO1-PO12); K3, K3, K3 (PSO1-PSO3)

C212.1 (K4): 3 3 2 2 3 3 3
C212.2 (K3): 3 2 1 1 3 3 3
C212.3 (K4): 3 3 2 2 3 3 3
C212.4 (K4): 3 3 2 2 3 3 3
C212.5 (K4): 3 3 2 2 3 3 3
C212.6 (K4): 3 3 2 2 3 3 3
C212.7 (A2): 3
C212.8 (A2): 2 2 2 3
C212.9 (A3): 3 3 3 3 3
C305:        3 3 2 2 3 3 3
7. LECTURE PLAN

S.No | Topic                                  | No. of Periods | Proposed Date | Actual Lecture Date | CO  | Taxonomy Level | Mode of Delivery
1    | Transaction Concepts – ACID Properties | 1              |               |                     | CO4 | K2             | PPT
2    | Schedules                              | 1              |               |                     | CO4 | K2             | PPT
3    | Serializability                        | 1              |               |                     | CO4 | K2             | PPT
4    | Concurrency Control                    | 1              |               |                     | CO4 | K2             | PPT
5    | Need for Concurrency                   | 1              |               |                     | CO4 | K2             | PPT
8. ACTIVITY BASED LEARNING
Crossword Puzzle – Transactions
Brainstorming Session

9. LECTURE NOTES
Transaction Concept:
The term transaction refers to a collection of operations that form a single logical
unit of work.
The transaction consists of all operations executed between the begin transaction
and end transaction.
write(X)
Transfers the value in the variable X in the main-memory buffer of the
transaction that executed the write to the data item X in the database.
States of a Transaction:
Active – the initial state; the transaction stays in this state while it is executing.
Partially committed – after the final statement has been executed.
Failed – after the discovery that normal execution can no longer proceed.
Aborted – after the transaction has been rolled back and the database has been
restored to its state prior to the start of the transaction.
Committed – after successful completion.
Committed state:
Once the transaction is committed, the updates of the transaction are made
permanent in the database.
Failed state:
If the consistency check fails, the transaction is aborted and rolled back. The
transaction is rolled back to undo the effect of its write operations on the database.
Terminated state:
The transaction has either committed or aborted and has left the system.
Atomicity
“Either all operations of the transaction are reflected properly in the
database, or none are.”
Assume that, before the execution of transaction Ti, the values of accounts A and
B are $1000 and $2000, respectively. Now suppose that, during the execution of
transaction Ti, a failure occurred after the write(A) operation but before the
write(B) operation. In this case, the values of accounts A and B reflected in the
database are $950 and $2000: the system destroyed $50 as a result of this failure.
The sum A + B before and after the execution of the transaction is not the same,
and the database is now in an inconsistent state. We must ensure that such
inconsistencies are not visible in a database system.
Consistency:
“The consistency requirement here is that the sum of A and B be
unchanged by the execution of the transaction.”
Isolation:
“If several transactions are executed concurrently, their operations may
interleave in some undesirable way, resulting in an inconsistent state.”
For example: if the database is temporarily inconsistent, with the deducted total
written to A but not yet added to B, a concurrently running transaction that reads
A and B at this point would see an inconsistent state.
Durability:
“The durability property guarantees that, once a transaction completes
successfully, all the updates that it carried out on the database persist, even if
there is a system failure after the transaction completes.”
We assume for now that a failure of the computer system may result in loss of
data in main memory, but data written to disk are never lost. Durability can be
guaranteed by ensuring that:
1. The updates carried out by the transaction have been written to disk before
the transaction completes.
2. Information about the transaction updates and the data written to disk is
sufficient to enable the database to reconstruct the updates when the database
system is restarted after the failure.
read(X) - transfers the data item X from the database to a variable, also called
X, in a buffer in main memory belonging to the transaction that executed the
read operation.
Ti :
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
Atomicity requirement
If the transaction fails after step 3 and before step 6, money will be “lost”, leading
to an inconsistent database state. The failure could be due to software or hardware.
The system should ensure that updates of a partially executed transaction are not
reflected in the database.
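The atomicity requirement can be demonstrated in a short program. The following is a minimal sketch using Python's built-in sqlite3 module; the account table, its contents, and the transfer helper are illustrative assumptions, not part of the banking example beyond accounts A and B.

import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transaction control
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 1000), ('B', 2000)")

def transfer(amount):
    # Move `amount` from A to B atomically: both writes commit, or neither does.
    try:
        conn.execute("BEGIN")
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,))
        # A failure here would leave A debited but B not credited...
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'B'", (amount,))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # ...so the partial update is undone
        raise

transfer(50)
print(conn.execute("SELECT SUM(balance) FROM account").fetchone())  # (3000,)

The sum A + B is 3000 both before and after the transfer, whether or not a failure occurs inside the transaction.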
Consistency requirement
• Explicitly specified integrity constraints, such as primary keys and foreign keys.
• Implicit integrity constraints – e.g., the sum of balances of all accounts, minus the
sum of loan amounts, must equal the value of cash-in-hand.
Isolation requirement
Ti                          Tj
1. read(A)
2. A := A – 50
3. write(A)
                            read(A), read(B), print(A+B)
4. read(B)
5. B := B + 50
6. write(B)
Isolation can be ensured trivially by running transactions serially, that is, one after
the other. However, executing multiple transactions concurrently has significant
benefits.
Ensuring the isolation property is the responsibility of a component of the
database system called the concurrency-control system.
Schedules
The database system must control the interaction among the concurrent
transactions to prevent them from destroying the consistency of the database. It
does so through a variety of mechanisms called concurrency-control schemes.
The concept of schedules helps identify those executions that are guaranteed to
ensure the isolation property and thus database consistency.
Types of Schedules:
1. Serial Schedule
2. Non-serial Schedule
3. Recoverable Schedule
4. Non-recoverable Schedule
5. Cascadeless Schedule
6. Strict Schedule
Serial Schedule:
A schedule S is serial if the transactions in the schedule are executed one after
the other (not interleaved).
Example:
Consider the schedule S with two transactions T1 and T2.
Schedule 1: T1 is followed by T2. Schedule 2: T2 is followed by T1.
Non-Serial Schedule:
A schedule S is non-serial if, the operations of the transactions in the schedule
are interleaved.
Example:
Consider the schedule S with two transactions T1 and T2.
Recoverable Schedule:
A schedule S is recoverable if each transaction in the schedule commits only after
all transactions whose written values it has read have committed.
Example:
In this schedule,
• T2 reads the value of A updated by T1. (T2 is dependent on T1)
• T1 is committed before T2 gets committed.
• So, T2 is safe from rollback due to failure of T1.
• Thus, this is a recoverable schedule.
Non-Recoverable Schedule
A schedule is non-recoverable if a transaction Tj reads a data item written by a
transaction Ti and commits before Ti commits; if Ti then fails, Tj cannot be rolled
back. In this schedule, the reading transaction commits before the transaction
whose value it read, so the schedule is non-recoverable.
Cascading Rollback:
Even if a schedule is recoverable, to recover correctly from the failure of a
transaction Ti, we may have to roll back several other transactions that read
values produced by Ti.
Cascadeless Schedule:
A schedule S is cascadeless if every transaction in the schedule reads only the
data items that were written by committed transactions. Cascading rollbacks
cannot occur in a cascadeless schedule, since every read is of committed data.
Example:
Consider the schedule S with three transactions T1, T2, and T3. In this schedule,
a transaction reads the value of A only after the transaction that wrote it has
committed, so there is no possibility of a cascading rollback.
Strict Schedule
A schedule S is strict if, the transactions in the schedule can neither read nor
write an item A until the last transaction that wrote A has committed.
Example:
• In this schedule, transaction T2 reads and writes the value of A only after
transaction T1, which wrote A, has committed.
The two forms of serializability are conflict serializability and view serializability
CONFLICT SERIALIZABILITY
Let us consider a schedule S in which there are two consecutive instructions, I and
J, of transactions Ti and Tj, respectively ( i != j ).
If the two consecutive instructions I and J refer to different data items, then we can
swap I and J without affecting the results of any instruction in the schedule.
If the two consecutive instructions I and J refer to the same data item Q, then the
order of the two steps may matter.
Since we are dealing with only read and write instructions, there are four cases that
we need to consider:
1. I = read(Q), J = read(Q): the order of I and J does not matter.
2. I = read(Q), J = write(Q): the order matters.
3. I = write(Q), J = read(Q): the order matters.
4. I = write(Q), J = write(Q): the order matters, since the value written by the
later instruction is the one that survives.
Thus, I and J conflict if they are operations by different transactions on the same
data item, and at least one of them is a write instruction.
Conflict Serializability:
• A schedule S is conflict serializable if it is conflict equivalent to a serial
schedule.
Schedule S can be transformed into Serial schedule S’, by a series of swaps of non-
conflicting instructions such as:
If the two consecutive instructions I and J refer to different data items, then we
can swap I and J without affecting the results of any instruction in the schedule.
This graph consists of a pair G = (V, E), where V is a set of vertices and E is a set
of edges.
The set of vertices consists of all the transactions participating in the schedule.
The set of edges consists of all edges Ti → Tj for which one of three conditions
holds:
1. Ti executes write(Q) before Tj executes read(Q).
2. Ti executes read(Q) before Tj executes write(Q).
3. Ti executes write(Q) before Tj executes write(Q).
If an edge Ti →Tj exists in the precedence graph, then, in any serial schedule S’
equivalent to S, Ti must appear before Tj.
If the precedence graph for S has a cycle, then schedule S is not conflict
serializable.
The topological sorting of a directed acyclic graph is a linear ordering of its
vertices such that for every edge U → V, vertex U comes before vertex V in the
ordering. Topological sort starts with a node that has zero indegree, i.e., no
incoming edges.
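Testing conflict serializability therefore amounts to building the precedence graph and attempting a topological sort. A minimal sketch in Python; the edge list is assumed to have been derived already from the conflicting operation pairs.

from collections import defaultdict, deque

def serial_order(transactions, edges):
    # Kahn's topological sort: returns an equivalent serial order,
    # or None if the precedence graph has a cycle (not conflict serializable).
    graph = defaultdict(list)
    indegree = {t: 0 for t in transactions}
    for u, v in edges:  # edge u -> v means u must appear before v
        graph[u].append(v)
        indegree[v] += 1
    queue = deque(t for t in transactions if indegree[t] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in graph[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return order if len(order) == len(transactions) else None

print(serial_order(["T1", "T2"], [("T1", "T2")]))                 # ['T1', 'T2']
print(serial_order(["T1", "T2"], [("T1", "T2"), ("T2", "T1")]))   # None (cycle)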
Serializability
Example 1:
The Precedence Graph for the Schedule S contains the following edges:
If an edge Ti →Tj exists in the precedence graph, then, in any serial schedule S’
equivalent to S, Ti must appear before Tj.
Since the precedence graph for S does not contain a cycle, schedule S is conflict
serializable.
Example 2:
The Precedence Graph for the Schedule S contains the edge T1 → T2. Since the
precedence graph for S does not contain a cycle, schedule S is conflict
serializable. Using topological sorting, the serializability order of the schedule is
identified as T1 → T2, i.e., the schedule S is equivalent to a serial schedule in
which T1 is followed by T2.
Example 3:
Consider a Schedule S with two Transactions T1 and T2 as follows:
Schedule S Precedence Graph for S
The Precedence Graph for the Schedule S contains both the edges T1 → T2 and
T2 → T1. Since the precedence graph for S contains a cycle, schedule S is NOT
conflict serializable.
VIEW SERIALIZABILITY
View equivalence:
Let S and S´ be two schedules with the same set of transactions. S and S´ are
view equivalent if the following three conditions are met for each data item Q:
1. If transaction Ti reads the initial value of Q in schedule S, then Ti must also
read the initial value of Q in schedule S´.
2. If transaction Ti reads a value of Q that was produced by transaction Tj in
schedule S, then Ti must also read the value of Q that was produced by Tj in
schedule S´.
3. The transaction (if any) that performs the final write(Q) operation in
schedule S must also perform the final write(Q) operation in schedule S´.
Conditions 1 and 2 ensure that each transaction reads the same values in both
schedules and, therefore, performs the same computation.
Condition 3, coupled with conditions 1 and 2, ensures that both schedules result
in the same final system state.
A schedule S is view serializable if it is view equivalent to a serial schedule.
Example:
The above schedule S with three transactions T1, T2, T3 is view-serializable but not
conflict serializable because swapping of non-conflicting operations does not result in
conflict equivalence to the serial schedule.
BLIND WRITES:
A blind write occurs when a transaction performs a write operation on a data item
without having performed a read operation on it.
Example:
Blind writes appear in schedule S because some transactions perform write
operations without having performed a read operation. Blind writes appear in
every view-serializable schedule that is not conflict serializable.
Figure: Relationship between the schedule classes – every conflict-serializable
schedule is view serializable, but not vice versa.
Concurrent access is quite easy if all users are just reading data; there is no way
they can interfere with one another. Any practical database, however, has a mix
of READ and WRITE operations, and hence concurrency is a challenge.
Lost Updates occur when multiple transactions select the same row and update
the row based on the value selected.
An Incorrect Summary occurs when one transaction computes a summary over
all instances of a repeated data item while a second transaction updates a few
instances of that data item; the resulting summary does not reflect a correct
result.
9.5 Need for Concurrency Control
• While a read or write on behalf of one transaction is in progress on one disk,
another transaction can be running in the CPU. The parallelism of the CPU and the
I/O system can therefore be exploited to run multiple transactions in parallel.
• All of this increases the throughput of the system – that is, the number of
transactions executed in a given amount of time.
• Correspondingly, the processor and disk utilization also increase; in other words,
the processor and disk spend less time idle, not performing any useful work.
• There may be a mix of transactions running on a system, some short and some
long.
• If transactions run serially, a short transaction may have to wait for a preceding long
transaction to complete, which can lead to unpredictable delays in running a
transaction.
• If the transactions are operating on different parts of the database, it is better to let
them run concurrently, sharing the CPU cycles and disk accesses among them.
Concurrent execution reduces the unpredictable delays in running transactions.
• Moreover, it also reduces the average response time: the average time for a
transaction to be completed after it has been submitted.
The system needs to control the interaction among the concurrent transactions.
This control is achieved using concurrency-control schemes, such as:
1. Lock-based protocols
2. Two-phase locking
3. Timestamp protocols
Lock-based protocols help overcome the issues that arise when a DBMS is accessed
concurrently, by locking the data item used by the current transaction. The
underlying requirement for implementing a lock-based protocol is that all the data
items involved are accessed in a mutually exclusive manner, i.e., while one
transaction holds a lock on a data item, no other transaction is allowed to update or
modify that item at the same time. As the name suggests, lock-based protocols
require a transaction to acquire a lock before accessing a data item and to release
the lock when it is finished with that item.
Types of Locks
• Binary locks
• Shared/Exclusive locks
Binary Locks
A binary lock can have two states or values: locked and unlocked (or 1 and 0, for
simplicity)
A distinct lock is associated with each database item X.
If the value of the lock on X is 1, item X cannot be accessed by a database
operation that requests the item.
If the value of the lock on X is 0, the item can be accessed when requested, and
the lock value is changed to 1. We refer to the current value (or state) of the
lock associated with item X as lock(X).
In its simplest form, each lock can be a record with three fields:
<Data_item_name, LOCK, Locking_Transaction> plus a queue for transactions
that are waiting to access the item.
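A minimal sketch of such a lock record in Python; the class layout and method names are illustrative assumptions.

from collections import deque
from typing import Optional

class LockRecord:
    # <Data_item_name, LOCK, Locking_Transaction> plus a queue of waiting transactions.
    def __init__(self, data_item_name: str):
        self.data_item_name = data_item_name
        self.lock = 0                                 # 0 = unlocked, 1 = locked
        self.locking_transaction: Optional[str] = None
        self.waiting: deque = deque()

    def lock_item(self, txn: str) -> bool:
        if self.lock == 0:                            # item is free: grant the lock
            self.lock = 1
            self.locking_transaction = txn
            return True
        self.waiting.append(txn)                      # otherwise the transaction waits
        return False

    def unlock_item(self) -> None:
        self.lock = 0
        self.locking_transaction = None
        # a real lock manager would now grant the lock to the first waiting transaction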
9.6 Locking Protocols - Two Phase Locking
If the simple binary locking scheme is used, every transaction must obey the
following rules:
1. A transaction T must issue the operation lock_item(X) before any read_item(X)
or write_item(X) operations are performed in T.
2. A transaction T must issue the operation unlock_item(X) after all read_item(X)
and write_item(X) operations are completed in T.
3. A transaction T will not issue a lock_item(X) operation if it already holds the
lock on item X.
4. A transaction T will not issue an unlock_item(X) operation unless it already
holds the lock on item X.
Shared/Exclusive locks
There are various modes in which a data item may be locked. Two modes are given
below:
1. Shared Mode
2. Exclusive Mode
If a transaction Ti has obtained a shared-mode lock (denoted by S) on item Q, then
Ti can read, but cannot write, Q.
If a transaction Ti has obtained an exclusive-mode lock (denoted by X) on item Q,
then Ti can both read and write Q.
Lock-compatibility: shared locks are compatible with other shared locks, but an
exclusive lock is not compatible with any other lock.
When we use the shared/exclusive locking scheme, the system must enforce the
following rules:
1. A transaction T must issue the operation read_lock (X) or write_lock (X) before any
read_item(X) operation is performed in T.
2. A transaction T must issue the operation write_lock (X) before any write_item (X)
operation is performed in T.
3. A transaction T must issue the operation unlock(X) after all read_item(X) and
write_item(X) operations are completed in T.
4. A transaction T will not issue a read_lock (X) operation if it already holds a read
(shared) lock or a write (exclusive) lock on item X.
Banking Example
Consider again the banking example. Let A and B be two accounts that are accessed
by transactions T1 and T2. Transaction T1 transfers $50 from account B to account A.
Transaction T2 displays the total amount of money in accounts A and B – that is, the
sum A + B. This scenario is shown below.
Suppose that the values of accounts A and B are $100 and $200, respectively. If
these two transactions are executed serially, either in the order T1, T2 or the order
T2, T1, then transaction T2 will display the value $300. If, however, these
transactions are executed concurrently, then schedule 1, in the figure below, is
possible.
T1                      T2                      Concurrency-control manager
lock-X(B)
                                                grant-X(B, T1)
read(B)
B := B - 50
write(B)
unlock(B)
                        lock-S(A)
                                                grant-S(A, T2)
                        read(A)
                        unlock(A)
                        lock-S(B)
                                                grant-S(B, T2)
                        read(B)
                        unlock(B)
                        display(A + B)
lock-X(A)
                                                grant-X(A, T1)
read(A)
A := A + 50
write(A)
unlock(A)
In this case, transaction T2 displays $250, which is incorrect. The reason for this
mistake is that the transaction T1 unlocked data item B too early, as a result of
which T2 saw an inconsistent state.
The schedule shows the actions executed by the transactions, as well as the points at
which the concurrency-control manager grants the locks. A transaction making a
lock request cannot execute its next action until the concurrency-control manager
grants the lock. Hence, the lock must be granted in the interval of time between the
lock-request operation and the following action of the transaction. Sometimes
locking can lead to an undesirable situation, as in the next figure. In the figure, since
T3 is holding an X-lock on B and T4 is requesting an S-lock on B, T4 is waiting for T3
to unlock B. Similarly, since T4 is holding an S-lock on A and T3 is requesting an
X-lock on A, T3 is waiting for T4 to unlock A. Thus the system is in a deadlock state.
The only solution is to roll back one of the two transactions. Once a transaction has
been rolled back, the data items that were locked by that transaction are unlocked.
Locking Protocols - Two Phase Locking
The two-phase locking protocol requires that each transaction issue lock and unlock
requests in two phases:
1. Growing phase:
A transaction may obtain locks, but may not release any lock.
2. Shrinking phase:
A transaction may release locks, but may not obtain any new locks.
Initially, a transaction is in the growing phase. The transaction acquires locks as
needed. Once the transaction releases a lock, it enters the shrinking phase, and it can
issue no more lock requests. Consider Transactions T1,T2,T3 and T4 given below:
For example, transactions T3 and T4 are two phase. On the other hand,
transactions T1 and T2 are not two phase.
The point in the schedule where the transaction has obtained its final lock (the end
of its growing phase) is called the lock point of the transaction.
Guaranteeing Serializability by Two-Phase Locking
A transaction is said to follow the two-phase locking protocol if all locking operations
(read_lock, write_lock) precede the first unlock operation in the transaction.
• The transactions can be rewritten as T1’ and T2’, which are the same as T1 and
T2 but follow the two-phase locking protocol.
• The two-phase locking protocol, by enforcing the two-phase rules, also enforces
serializability.
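The rule that every lock must precede the first unlock is easy to check mechanically. A minimal sketch, assuming a transaction's operations are given as ('lock', item) / ('unlock', item) pairs (an illustrative encoding):

def is_two_phase(ops):
    # True if all lock operations precede the first unlock (growing phase
    # strictly before shrinking phase), i.e. the transaction obeys 2PL.
    shrinking = False
    for op, _item in ops:
        if op == "unlock":
            shrinking = True
        elif op == "lock" and shrinking:
            return False  # a lock requested after a release violates 2PL
    return True

# All locks first, then unlocks: two phase.
print(is_two_phase([("lock", "B"), ("lock", "A"), ("unlock", "B"), ("unlock", "A")]))  # True
# Unlocks B before locking A: not two phase.
print(is_two_phase([("lock", "B"), ("unlock", "B"), ("lock", "A"), ("unlock", "A")]))  # False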
Two-phase locking may limit the amount of concurrency that can occur in a schedule
because a transaction T may not be able to release an item X after it is through using
it if T must lock an additional item Y later; or conversely, T must lock the additional
item Y before it needs it so that it can release X.
• Hence, X must remain locked by T until all items that the transaction needs to read or
write have been locked; only then can X be released by T. Meanwhile, another
transaction seeking to access X may be forced to wait, even though T is done with X;
conversely, if Y is locked earlier than it is needed, another transaction seeking to
access Y is forced to wait even though T is not using Y yet.
• This is the price for guaranteeing serializability of all schedules without having to
check the schedules themselves.
• Although the two-phase locking protocol guarantees serializability (that is, every
schedule that is permitted is serializable), it does not permit all possible serializable
schedules (that is, some serializable schedules will be prohibited by the protocol).
Types of Two-Phase Locking:
• Basic
• Conservative
• Strict
• Rigorous
There are a number of variations of two-phase locking (2PL). The technique just
described is known as basic 2PL.
Conservative 2PL:
A variation known as conservative 2PL (or static 2PL) requires a transaction to lock
all the items it accesses before the transaction begins execution, by predeclaring its
read-set and write-set. The read-set of a transaction is the set of all items that the
transaction reads, and the write-set is the set of all items that it writes. If any of the
predeclared items cannot be locked, the transaction does not lock any item; instead,
it waits until all the items are available for locking. Conservative 2PL is a
deadlock-free protocol.
Strict 2PL:
A transaction does not release any of its exclusive (write) locks until after it commits
or aborts.
Rigorous 2PL:
A transaction does not release any of its locks (shared or exclusive) until after it
commits or aborts.
Lock Conversion:
1. Upgrade
2. Downgrade
• A mechanism can be provided for upgrading a shared lock to an exclusive lock,
and for downgrading an exclusive lock to a shared lock.
• We denote conversion from shared to exclusive mode by upgrade, and from
exclusive to shared mode by downgrade.
• Lock conversion cannot be allowed arbitrarily. Rather, upgrading can take place
only in the growing phase, whereas downgrading can take place only in the
shrinking phase.
9.7 Transaction Recovery , Save Points & Isolation Levels
Transaction Recovery
Recovery techniques are heavily dependent upon the existence of a special file known as
a system log. It contains information about the start and end of each transaction and
any updates which occur in the transaction. The log keeps track of all transaction
operations that affect the values of database items. This information is needed to
recover from transaction failure.
The log is kept on disk. The main types of log entries are:
• start_transaction(T): This log entry records that transaction T starts execution.
• read_item(T, X): This log entry records that transaction T reads the value of
database item X.
• write_item(T, X, old_value, new_value): This log entry records that transaction T
changes the value of database item X from old_value to new_value.
• commit(T): This log entry records that transaction T has completed all accesses to
the database successfully and its effect can be committed (recorded permanently) to
the database.
• checkpoint: A checkpoint is a mechanism by which all the previous logs are
removed from the system and stored permanently on a storage disk. A checkpoint
declares a point before which the DBMS was in a consistent state and all the
transactions were committed.
A transaction T reaches its commit point when all its operations that access the
database have been executed successfully, i.e., the transaction has reached the
point at which it will not abort (terminate without completing). Once committed, the
transaction is permanently recorded in the database. Commitment always involves
writing a commit entry to the log and writing the log to disk. At the time of a system
crash, the log is searched back for all transactions T that have written a
start_transaction(T) entry but have not written a commit(T) entry yet; these
transactions may have to be rolled back to undo their effect on the database during
the recovery process.
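A minimal sketch of recovery from such a log under the deferred-update approach, assuming an in-memory log of simple tuples; the entry format is an illustrative assumption.

def redo_committed(log, database):
    # Deferred update (NO-UNDO/REDO): writes are applied to the database only
    # for transactions whose commit entry appears in the log. Uncommitted
    # transactions need no undo, because they never touched the database.
    committed = {t for kind, t, *rest in log if kind == "commit"}
    for kind, t, *rest in log:
        if kind == "write" and t in committed:
            item, new_value = rest
            database[item] = new_value  # REDO the committed write
    return database

log = [
    ("start", "T1"), ("write", "T1", "A", 950), ("write", "T1", "B", 2050),
    ("commit", "T1"),
    ("start", "T2"), ("write", "T2", "A", 900),  # T2 never committed
]
print(redo_committed(log, {"A": 1000, "B": 2000}))  # {'A': 950, 'B': 2050}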
Save Points:
All changes made after a savepoint has been declared can be undone by issuing a
ROLLBACK TO SAVEPOINT name command.
Issuing the commands ROLLBACK or COMMIT will also discard any savepoints
created since the start of the main transaction. Oracle releases all table and row
locks acquired since that savepoint, but retains all data locks acquired prior to the
savepoint.
SAVEPOINT Adam_sal;
UPDATE employees SET salary = 12000 WHERE last_name = 'Mike';
SAVEPOINT Mike_sal;
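SQLite supports the same SAVEPOINT / ROLLBACK TO syntax, so the behaviour can be tried with Python's built-in sqlite3 module; the table and values below are illustrative.

import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transactions
conn.execute("CREATE TABLE employees (last_name TEXT, salary INTEGER)")
conn.execute("INSERT INTO employees VALUES ('Mike', 10000)")

conn.execute("BEGIN")
conn.execute("SAVEPOINT Mike_sal")
conn.execute("UPDATE employees SET salary = 12000 WHERE last_name = 'Mike'")
# Undo everything after the savepoint, but stay inside the main transaction:
conn.execute("ROLLBACK TO SAVEPOINT Mike_sal")
conn.execute("COMMIT")

print(conn.execute("SELECT salary FROM employees").fetchone())  # (10000,)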
Isolation Levels:
Read committed allows only committed data to be read, but does not require
repeatable reads. For instance, between two reads of a data item by a
transaction, another transaction may have updated the data item and
committed.
In SQL, it is possible to set the isolation level explicitly, rather than accepting the
system’s default setting. For example, the statement “set transaction isolation
level serializable;” sets the isolation level to serializable; any of the other
isolation levels may be specified instead.
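A minimal sketch of setting the isolation level from application code, assuming a PostgreSQL server and the psycopg2 driver; the connection parameters and table are hypothetical.

import psycopg2

conn = psycopg2.connect(dbname="bank", user="app", password="secret")
cur = conn.cursor()
# Must be issued before any other statement of the transaction:
cur.execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE")
cur.execute("SELECT SUM(balance) FROM account")
print(cur.fetchone())
conn.commit()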
9.8 Organization of Records in Files
A file is organized as a sequence of records, and these records are mapped onto disk
blocks. Consider a file of account records for the bank database, with each record of
the file defined as a fixed-length record. The figure below shows how fixed-length
records are stored in the file.
Unless the block size happens to be a multiple of the record size, some records will
cross block boundaries; that is, part of such a record will be stored in one block and
part in another, and reading it would thus require two block accesses. When a record
is deleted, we could move every record that follows it, but such an approach requires
moving many records; instead, it might be easier simply to move the final record of
the file into the space occupied by the deleted record. This approach is illustrated in
the figure below.
On insertion of a new record, we use the record pointed to by the header. The
header pointer is changed to point to the next available record after the insertion.
If no space is available, the insertion is done at the end of the file.
Insertion and deletion of fixed-length records are simple to implement, because the
space made available by a deleted record is exactly the space needed to insert a
record. Variable-length records do not retain this advantage.
Variable-length records arise in database systems in several ways:
• Record types that allow variable lengths for one or more fields.
• Record types that allow repeating fields (used in some older data models).
One example is the following account-list record type with a repeating field:

type account-list = record
    branch-name: char(22);
    account-info: array [1 .. ∞] of record
        account-number: char(10);
        balance: real;
    end
end
The account-info is defined as an array with an arbitrary number of elements. That
is, the type definition does not limit the number of elements in the array, although
any actual record will have a specific number of elements in its array. There is no
limit on how large a record can be (up to, of course, the size of the disk storage).
Thus, the basic byte-string representation described here is not usually used for
implementing variable-length records. However, a modified form of the byte-string
representation, called the slotted-page structure, is commonly used for
organizing records within a single block. The slotted-page structure appears in the
figure below. There is a header at the beginning of each block, containing:
• The number of record entries in the header
• The end of free space in the block
• An array whose entries contain the location and size of each record
The actual records are allocated contiguously in the block, starting from the
end of the block. The free space in the block is contiguous, between the
final entry in the header array and the first record. If a record is inserted,
space is allocated for it at the end of the free space, and an entry containing its
size and location is added to the header.
If a record is deleted, the space that it occupies is freed, and its entry is set to
deleted (its size is set to -1, for example). Further, the records in the block before
the deleted record are moved, so that the free space created by the deletion gets
occupied and all free space is again between the final entry in the header array
and the first record. The end-of-free-space pointer in the header is appropriately
updated as well. Records can be grown or shrunk by similar techniques, as long
as there is space in the block.
The slotted-page structure requires that there be no pointers that point directly
to records. Instead, pointers must point to the entry in the header that contains
the actual location of the record. This level of indirection allows records to be
moved to prevent fragmentation of space inside a block, while still supporting
pointers to the record.
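A minimal sketch of a slotted page in Python; the header layout and sizes are illustrative assumptions, and compaction after deletion is omitted.

class SlottedPage:
    # Header (slot array) grows from the front; records grow from the back.
    def __init__(self, size=4096):
        self.size = size
        self.slots = []            # array of (offset, length); length -1 marks a deleted record
        self.free_end = size       # records are allocated from the end of the block
        self.data = bytearray(size)

    def insert(self, record: bytes) -> int:
        header_bytes = 8 + 8 * (len(self.slots) + 1)   # rough header size
        if self.free_end - len(record) < header_bytes:
            raise ValueError("block full")
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.slots.append((self.free_end, len(record)))
        return len(self.slots) - 1  # the slot number, not the offset, identifies the record

    def delete(self, slot: int) -> None:
        offset, _ = self.slots[slot]
        self.slots[slot] = (offset, -1)  # real systems also compact the remaining records

page = SlottedPage()
s = page.insert(b"A-102|Perryridge|400")
page.delete(s)

Because external pointers refer to slot numbers rather than byte offsets, records can be moved within the block without invalidating any pointer.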
Fixed-length Representation
Another way to implement variable-length records efficiently in a file system is to
use one or more fixed-length records to represent one variable-length record.
A record in this file is of the account-list type, but with the array containing
exactly three elements. Those branches with fewer than three accounts (for
example, Round Hill) have records with null fields. The symbol (^) is used to
represent this situation in Figure. The reserved-space method is useful when most
of the records have a length close to the maximum. Otherwise, a significant
amount of space may be wasted.
In the bank example, some branches may have many more accounts than others.
This situation leads us to consider the linked-list method. To represent the file by
the linked-list method, a pointer field is added to each record. The resulting
structure appears in the figure below. The pointer method has the disadvantage
that space is wasted in all records except the first in a chain, since those records
must still reserve space for the fields that are stored only once per chain.
To deal with this problem, two kinds of blocks are allowed in a file:
• Anchor block, which contains the first record of a chain
• Overflow block, which contains records other than those that are the first
record of a chain
Thus, all records within a block have the same length, even though not all
records in the file have the same length. Figure 4.17 shows this file structure.
Figure: Pointer method – using anchor block and overflow block
Sequential file organization: Records are stored in sequential order, according to
the value of a “search key” of each record.
Hashing file organization: A hash function is computed on some attribute of each
record. The result of the hash function specifies in which block of the file the
record should be placed.
Heap File Organization (heap files or unordered files):
In this simplest and most basic type of organization, records are placed in the
file in the order in which they are inserted, so new records are inserted at
the end of the file. Such an organization is called a heap or pile file. This
organization is often used with additional access paths, such as the secondary
indexes. It is also used to collect and store records for future use.
Inserting a new record is efficient: the last disk block of the file is copied into a
buffer, the new record is added, and the block is then rewritten to the disk. The
address of the last file block is kept in the file header. However, searching for a
record using any search condition involves a linear search through the file, block
by block, which is an expensive procedure. If only one record satisfies the search
condition, then, on average, a program will read into memory and search half the
file blocks before it finds the record; for a file of b blocks, this requires searching
b/2 blocks on average. If no records satisfy the search condition, the program
must read and search all b blocks in the file.
To delete a record, a program must first find its block, copy the block into the
buffer, and finally rewrite the block back to the disk. This leaves unused space in
the disk block. Deleting a large number of records in this way results in wasted
storage space. Another technique used for record deletion is to have an extra
byte or bit, called a deletion marker, stored with each record. A record is deleted
by setting the deletion marker to a certain value; a different value of the marker
indicates a valid (not deleted) record. Search programs consider only valid
records in a block when conducting their search. Both of these deletion techniques
require periodic reorganization of the file to reclaim the unused space of
deleted records. During reorganization, the file blocks are accessed consecutively,
and records are packed by removing deleted records.
After such reorganization, the blocks are filled to capacity once more. Another
possibility is to reuse the space of deleted records when inserting new records,
although this requires extra bookkeeping to keep track of empty locations.
Sequential File Organization:
The figure below shows a sequential file of account records taken from the banking
example. In that example, the records are stored in search-key order, using
branch-name as the search key.
The sequential file organization allows records to be read in sorted order; that can
be useful for display purposes, as well as for certain query-processing algorithms.
It is difficult, however, to maintain physical sequential order as records are
inserted and deleted, since it is costly to move many records as a result of a single
insertion or deletion. Deletion can be managed by using pointer chains. For
insertion, the following rules are applied:
• Locate the record in the file that comes before the record to be inserted in search-
key order.
• If there is a free record (that is, space left after a deletion) within the same block
as this record, insert the new record there. Otherwise, insert the new record in an
overflow block. In either case, adjust the pointers so as to chain together the
records in search-key order.
The figure below shows the account file after the insertion of the record (North
Town, A-888, 800). The structure in the figure allows fast insertion of new records,
but forces sequential file-processing applications to process records in an order that
does not match the physical order of the records. If relatively few records need to
be stored in overflow blocks, this approach works well. Eventually, however, the
correspondence between search-key order and physical order may be totally lost, in
which case sequential processing will become much less efficient. At this point, the
file should be reorganized so that it is once again physically in sequential order.
Such reorganizations are costly and must be done during times when the system
load is low. The frequency with which reorganizations are needed depends on the
frequency of insertion of new records. In the extreme case, in which insertions
rarely occur, it is possible to keep the file in physically sorted order at all times.
Clustering File Organization:
Many relational-database systems store each relation in a separate file, so that
they can take full advantage of the file system that the operating system
provides. This simple approach to relational-database implementation becomes
less satisfactory as the size of the database increases. There are performance
advantages to be gained from careful assignment of records to blocks, and from
careful organization of the blocks themselves. A more complicated file structure
may be beneficial, even if the strategy of storing each relation in a separate file is
used.
However, many large-scale database systems do not rely directly on the underlying
operating system for file management. Instead, one large operating-system file is
allocated to the database system. The database system stores all relations in this
one file and manages the file itself. To see the advantage of storing many relations
in one file, consider an SQL query for the bank database that, for each depositor,
finds the corresponding customer record (a join of the depositor and customer
relations).
Figure: Depositor Relation
The figure below shows a file structure designed for efficient execution of queries
involving a join of depositor and customer.
A clustering file organization is a file organization, such as that illustrated in the
Figure: 4.22 that stores related records of two or more relations in each block.
Such a file organization allows us to read records that would satisfy the join
condition by using one block read.
However, a query that involves only the customer relation now requires more block
accesses than it did in the scheme under which each relation is stored in a separate
file. Instead of several customer records appearing in one block, each record is
located in a distinct block. Indeed, simply finding all the customer records is not
possible without some additional structure. To locate all tuples of the customer
relation in the structure of Figure 4.22, we need to chain together all records of
that relation using pointers, as in Figure 4.23.
The usage of clustering depends on the types of query that the database designer
believes to be most frequent. Careful use of clustering can produce significant
performance gains in query processing.
Operations on Files:
Open:
Prepares the file for reading or writing. Allocates appropriate buffers to hold file
blocks from disk, and retrieves the file header. Sets the file pointer to the
beginning of the file.
Reset:
Sets the file pointer of an open file to the beginning of the file.
Find:
Searches for the first record that satisfies the search condition. Transfers the
block containing that record into a main memory buffer. The file pointer points to
the record in the buffer and it becomes the current record.
Read:
Copies the current record from the buffer to a program variable in the user
program. This command may also advance the current record pointer to the next
record in the file, which may necessitate reading the next file block from the disk.
Findnext:
Searches for the next record in the file that satisfies the search condition.
Transfers the block containing that record into a main memory buffer. The record
is located in the buffer and becomes the current record.
Delete:
Deletes the current record and updates the file on disk to reflect the deletion.
Modify:
Modifies some field values for the current record and updates the file on disk to
reflect the modification.
Insert:
Inserts a new record in the file by locating the block where the record is to be
inserted, transferring that block into a main-memory buffer, writing the record into
the buffer, and (eventually) writing the buffer back to disk to reflect the insertion.
Close:
Completes the file access by releasing the buffers and performing any other
needed cleanup operations.
Scan:
If the file has just been opened or reset, scan returns the first record; otherwise it
returns the next record.
Findall:
Locates all the records in the file that satisfy a search condition.
Find Ordered:
Retrieves all the records in the file in some specified order of the values of a field.
9.9 HASHING TECHNIQUES:
Hashing is a type of primary file organization which provides very fast access to
records under certain search conditions. This organization is called a hash file.
The idea behind hashing is to provide a function h, called a hash function or
randomizing function, that is applied to the hash field value of a record and yields
the address of the disk block in which the record is stored. A search for the record
within the block can be carried out in a main-memory buffer.
A common hash function is:
h(K) = K mod M
where K is the hash field value and M is the number of buckets.
In a hash file organization we obtain the bucket of a record directly from its
search-key value using a hash function.
Hash function h is a function from the set of all search-key values K to the set of
all bucket addresses B.
Hash function is used to locate records for access, insertion as well as deletion.
Records with different search-key values may be mapped to the same bucket;
thus the entire bucket has to be searched sequentially to locate a record. An ideal
hash function is random, so each bucket will have the same number of records
assigned to it, irrespective of the actual distribution of search-key values in
the file.
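A minimal sketch of a hash file with h(K) = K mod M and per-bucket chaining; the record strings are illustrative.

M = 8                              # number of buckets
buckets = [[] for _ in range(M)]   # each bucket holds (key, record) pairs

def h(key: int) -> int:
    return key % M                 # the hash function from the text

def insert(key, record):
    buckets[h(key)].append((key, record))

def search(key):
    # the whole bucket must be scanned: different keys can share a bucket
    for k, record in buckets[h(key)]:
        if k == key:
            return record
    return None

insert(101, "A-101 Downtown 500")
insert(109, "A-109 Brighton 700")  # 101 and 109 collide: both map to bucket 5
print(search(109))                 # 'A-109 Brighton 700'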
Handling of Bucket Overflows
Overflow chaining – the overflow buckets of a given bucket are chained together
in a linked list. This scheme is called closed hashing. An alternative, called open
hashing, which does not use overflow buckets, is not suitable for database
applications. If the initial number of buckets is too small and the file grows,
performance will degrade due to too many overflows.
Extendible Hashing:
In extendible hashing, a bucket address table of bucket pointers is maintained,
indexed by the first i bits of the hash values. All the entries that point to the same
bucket have the same values in the first ij bits. To locate the bucket for a
search-key value Kj:
• Compute h(Kj) = X.
• Use the first i high-order bits of X as a displacement into the bucket address
table, and follow the pointer to the appropriate bucket.
The figure below shows the hash structure after insertion of one Brighton and two
Downtown records.
The main advantage of extendible hashing is that performance does not degrade
as the file grows, as opposed to static external hashing, where collisions increase
and the corresponding chaining causes additional block accesses.
Other hash functions can also be used to calculate the hash address. One
technique involves picking some digits of the hash field value – for example, the
third, fifth, and eighth digits – to form the hash address.
The problem with most hashing functions is that they do not guarantee that
distinct values will hash to distinct addresses, because the hash field space (the
number of possible values a hash field can take) is usually much larger than the
address space.
A collision occurs when the hash field value of a record that is being inserted
hashes to an address that already contains a different record. In this situation, the
new record must be inserted in some other position, since its hash address is
occupied. The process of finding another position is called collision resolution.
Open addressing: proceeding from the occupied position specified by
the hash address, the program checks the subsequent positions in order
until an unused (empty) position is found. The algorithm below (shown as
runnable Python) may be used:

def insert_open_addressing(table, k):
    M = len(table)
    i = k % M                      # i <- hash_address(k)
    start = i
    while table[i] is not None:    # while location i is occupied
        i = (i + 1) % M            # check the next position
        if i == start:
            raise RuntimeError("table is full")
    table[i] = k                   # new_hash_address <- i
    return i
9.10 RAID
Striping of Data:
With multiple disks, the transfer rate can be improved by striping data across
multiple disks.
Bit-level striping:
Data striping in its simplest form consists of splitting the bits of each byte across
multiple disks. Such striping is called bit-level striping.
Block-level striping:
Block-level striping stripes blocks across multiple disks. It treats the array of disks
as a single large disk and gives blocks logical numbers; it is assumed that the
block numbers start from 0. With n disks, logical block i of the array is stored on
disk (i mod n) + 1.
RAID LEVELS
Mirroring provides high reliability, but it is expensive. Striping provides high data
transfer rates, but does not improve reliability. Various alternative schemes aim to
provide redundancy at lower cost by combining disk striping with "parity" bits.
These schemes have different cost- performance trade-offs. The schemes are
classified into RAID levels.
RAID level 0 refers to disk arrays with striping at the level of blocks, but
without any redundancy.
RAID level 1 refers to disk mirroring with block striping. Figure (b) below
shows a mirrored organization that holds four disks' worth of data.
RAID level 2, known as memory-style error-correcting-code (ECC)
organization, employs parity bits. Memory systems have long used parity bits for
error detection and correction.
Usually each byte in a memory system may have a parity bit associated
with it that records whether the number of bits in the byte that are set to 1 is
even (parity = 0) or odd (parity = 1). If one of the bits in the byte gets damaged
(either a 1 becomes a 0, or a 0 becomes a 1), the parity of the byte changes and
thus will not match the stored parity.
The idea of error-correcting codes can be used directly in disk arrays by striping
bytes across disks.
Figure (c) below shows the level 2 scheme. The disks labeled P store the
error-correction bits. If one of the disks fails, the remaining bits of the byte
and the associated error-correction bits can be read from the other disks and
used to reconstruct the damaged data.
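The reconstruction is a bytewise XOR. A minimal sketch of this parity mechanism (which also underlies levels 3 to 5); the disk contents are illustrative.

from functools import reduce

def parity(blocks):
    # XOR parity over corresponding bytes of each block.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

disks = [b"\x12\x34", b"\xab\xcd", b"\x0f\xf0"]  # three data disks
p = parity(disks)                                # stored on the parity disk

# Disk 1 fails: rebuild its contents by XOR-ing the surviving disks with the parity.
rebuilt = parity([disks[0], disks[2], p])
assert rebuilt == disks[1]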
RAID level 2 requires an overhead of only three disks for four disks of data,
unlike RAID level 1, which requires an overhead of four disks.
RAID level 3 is as good as level 2 but is less expensive in the number of extra
disks (it has only a one-disk overhead), so level 2 is not used in practice.
RAID level 3 has two benefits over level 1: it needs only one parity disk for
several regular disks, whereas level 1 needs one mirror disk for every disk, and
it thus reduces the storage overhead.
RAID level 6, the P + Q redundancy scheme, is much like RAID level 5 but
stores extra redundant information to guard against multiple disk failures.
Instead of using parity alone, level 6 uses error-correcting codes such as the
Reed-Solomon codes. In the scheme shown in the figure below, 2 bits of
redundant data are stored for every 4 bits of data (unlike 1 parity bit in level 5),
and the system can tolerate two disk failures.
Choice of RAID Level
The factors to be taken into account when choosing a RAID level are:
• Monetary cost of extra disk-storage requirements
• Performance requirements in terms of number of I/O operations
• Performance when a disk has failed
• Performance during rebuild
SOFTWARE RAID:
RAID implementations done entirely in software, with no special hardware
support, are called software RAID.
HARDWARE RAID:
Systems with special hardware support are called hardware RAID systems.
HOT SWAPPING:
Some hardware RAID implementations permit hot swapping; that is, faulty disks
can be removed and replaced by new ones without turning the power off. Hot
swapping reduces the mean time to repair.
9.11 INDEXING TECHNIQUES:
Introduction:
Database system indices play the same role as book indices or card catalogs in
the libraries. For example, to retrieve an account record given the account
number, the database system would look up an index to find on which disk block
the corresponding record resides, and then fetch the disk block, to get the
account record.
There are two basic kinds of indices:
Ordered indices: based on a sorted ordering of the search-key values.
Hash indices: based on a uniform distribution of values across a range of buckets,
where the bucket to which a value is assigned is determined by a hash function.
Several techniques exist for both ordered indexing and hashing. No one technique
is the best; rather, each technique is best suited to particular database applications.
Access types:
Access types can include finding records with a specified attribute value and
finding records, whose attribute values fall in a specified range.
Access time:
The time it takes to find a particular data item, or set of items using the technique in
question.
Insertion time:
The time it takes to insert a new data item, including the time to update the index
structure.
Deletion time:
The time it takes to delete a data item, including the time to update the index
structure.
Space overhead:
The additional space occupied by the index structure.
Search key:
An attribute or set of attributes used to look up records in a file. An index file
consists of entries of the form:
search-key | pointer
ORDERED INDICES
To gain fast random access to records in a file, an index structure is used. Each
index structure is associated with a particular search key. Just like the index of a
book or a library catalog, an ordered index stores the values of the search keys in
sorted order and associates with each search key the records that contain it.
Ordered indices can be categorized as primary indices and secondary indices.
Primary indices are also called clustering indices. The search key of a primary
index is usually the primary key, although that is not necessarily so.
Indices whose search key specifies an order different from the sequential order of
the file are called secondary indices, or nonclustering indices.
PRIMARY INDEX
In this index, it is assumed that all files are ordered sequentially on some search
key. Such files, with a primary index on the search key, are called index-sequential
files. They represent one of the oldest index schemes used in database systems.
They are designed for applications that require both sequential processing of the
entire file and random access to individual records.
Figure 4.24 shows a sequential file of account records taken from the banking
example. In the figure, the records are stored in search-key order, with
branch-name used as the search key.
There are two types of ordered indices: dense and sparse.
DENSE INDEX
Dense index: an index record appears for every search-key value in the file. In a
dense primary index, the index record contains the search-key value and a pointer
to the first data record with that search-key value.
Implementations may store a list of pointers to all records with the same
search-key value; doing so is not essential for primary indices. The figure below
shows the dense index for the account file.
SPARSE INDEX:
An index record appears for only some of the search-key values. To locate a
record we find the index entry with the largest search-key value that is less than
or equal to the search key value for which we are looking. We start at the record
pointed to by that index entry, and follow the pointers in the file until we find the
desired record.
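A minimal sketch of a sparse-index lookup, assuming one index entry per block; the keys and block contents are illustrative.

import bisect

index_keys = ["Brighton", "Mianus", "Redwood"]  # first search key in each block
blocks = [
    ["Brighton", "Downtown"],
    ["Mianus", "Perryridge"],
    ["Redwood", "Round Hill"],
]

def lookup(search_key):
    # find the largest index entry whose key is <= search_key
    i = bisect.bisect_right(index_keys, search_key) - 1
    if i < 0:
        return None
    # then scan sequentially from the block that entry points to
    for record in blocks[i]:
        if record == search_key:
            return record
    return None

print(lookup("Perryridge"))  # found via the 'Mianus' index entry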
The figure below shows the sparse index for the account file.
Figure: Sparse Index
Suppose that we are looking up records for the Perryridge branch. Using the dense
index, we follow the pointer directly to the first Perryridge record. We process this
record and follow the pointer in that record to locate the next record in search-key
order; we continue processing records until we encounter a record for a branch
other than Perryridge. If we are using the sparse index, we do not find an index
entry for “Perryridge”. Since the last entry (in alphabetic order) before “Perryridge”
is “Mianus”, we follow that pointer. We then read the account file in sequential
order until we find the first Perryridge record, and begin processing at that point.
Thus, it is generally faster to locate a record with a dense index than with a sparse
index. However, sparse indices have advantages over dense indices in that they
require less space and impose less maintenance overhead for insertions and
deletions.
There is a trade-off that the system designer must make between access time and
space overhead. Although the decision regarding this trade-off depends on the spe-
cific application, a good compromise is to have a sparse index with one index entry
per block. The reason this design is a good trade-off is that the dominant cost in
processing a database request is the time that it takes to bring a block from disk into
main memory. Once we have brought in the block, the time to scan the entire block
is negligible. Using this sparse index, we locate the block containing the record that
we are seeking. Thus, unless the record is on an overflow block, we minimize block
accesses while keeping the size of the index (and thus our space overhead) as
small as possible.
MULTILEVEL INDICES:
For example, consider a file with 100,000 records, with 10 records stored in each
block. If we have one index record per block, the index has 10,000 records. Index
records are smaller than data records, so let us assume that 100 index records fit
on a block. Thus, our index occupies 100 blocks.
If an index is sufficiently small to be kept in main memory, the search time to find
an entry is low. However, if the index is so large that it must be kept on disk, a
search for an entry requires several disk block reads. Binary search can be used on
the index file to locate an entry, but the search still has a large cost. If overflow
blocks have been used, binary search will not be possible. In that case, a sequential
search is typically used, and that requires b block reads, which will take even longer.
To deal with this problem, we treat the index just as we would treat any other
sequential file, and construct a sparse index on the primary index, as in the below
figure To locate a record, we first use binary search on the outer index to find the
record for the largest search-key value less than or equal to the one that we desire.
The pointer points to a block of the inner index. We scan this block until we find the
record that has the largest search-key value less than or equal to the one that we
desire. The pointer in this record points to the block of the file that contains the
record that we are seeking.
Figure: Two-level Sparse Index
Using the two levels of indexing, we have read only one index block, rather than
the seven we read with binary search, if we assume that the outer index is
already in main memory. If our file is extremely large, even the outer index may
grow too large to fit in main memory. In such a case, we can create yet another
level of index. Indices with two or more levels are called multilevel indices.
Searching for records with a multilevel index requires significantly fewer I/O
operations than does searching for records by binary search.
INDEX UPDATE
Regardless of what form of index is used, every index must be updated whenever
a record is either inserted into or deleted from the file. These are the algorithms
used for updating single level indices.
INSERTION:
First, the system performs a lookup using the search-key value that appears in
the record to be inserted. Again, the actions the system takes next depend on
whether the index is dense or sparse:
DENSE INDICES:
If the search-key value does not appear in the index, the system inserts an index
record with the search-key value in the index at the appropriate position.
If the index record stores pointers to all records with the same search- key value,
the system adds a pointer to the new record to the index record.
Otherwise, the index record stores a pointer to only the first record with the
search-key value. The system then places the record being inserted after the
other records with the same search-key values.
SPARSE INDICES:
We assume that the index stores an entry for each block. If the system creates a
new block, it inserts the first search-key value (in search-key order) appearing in
the new block into the index. On the other hand, if the new record has the least
search-key value in its block, the system updates the index entry pointing to the
block; if not, the system makes no change to the index.
DELETION.
To delete a record, the system first looks up the record to be deleted. The actions
the system takes next depend on whether the index is dense or sparse.
DENSE INDICES:
1. If the deleted record was the only record with its particular search-key
value, then the system deletes the corresponding index record from the
index.
2. Otherwise, the following actions are taken:
• If the index record stores pointers to all records with the same search-key
value, the system deletes the pointer to the deleted record from the index record.
• Otherwise, the index record stores a pointer to only the first record with the
search-key value. In this case, if the deleted record was the first record with the
search-key value, the system updates the index record to point to the next record.
SPARSE INDICES:
1. If the index does not contain an index record with the search-key value of
the deleted record, nothing needs to be done to the index.
2. Otherwise, if the index record for the search-key value points to the record
being deleted, the system updates the index record to point to the next
record with the same search-key value.
SECONDARY INDICES
Secondary indices must be dense, with an index entry for every search-key value
and a pointer to every record in the file. A primary index may be sparse, storing
only some of the search-key values, since it is always possible to find records with
intermediate search-key values by a sequential access to a part of the file. If a
secondary index stores only some of the search-key values, records with
intermediate search-key values may be anywhere in the file and, in general, we
cannot find them without searching the entire file.
The pointers in such a secondary index do not point directly to the file. Instead,
each points to a bucket that contains pointers to the file. The below figure
shows the structure of a secondary index that uses an extra level of indirection on
the account file, on the search key balance.
SQL on INDEX:
Create an index
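A minimal sketch of the corresponding SQL, using the account(balance) example above. The statements are run through Python's built-in sqlite3 module so the example is self-contained; the table definition is an assumption, not part of the notes.

# Illustrative only: basic index DDL in SQL, executed via sqlite3.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, branch_name TEXT, balance REAL)")

# CREATE INDEX <index-name> ON <table> (<search-key columns>);
conn.execute("CREATE INDEX balance_index ON account (balance)")

# A UNIQUE index additionally enforces that the search key is a candidate key.
conn.execute("CREATE UNIQUE INDEX account_index ON account (account_number)")

# DROP INDEX <index-name>;
conn.execute("DROP INDEX balance_index")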
9.13 B+ TREE AND B TREE
B+ Trees
The main disadvantage of an index-sequential file is that performance degrades as
the file grows; frequent reorganizations are undesirable.
The B+ tree is the most widely used index structure that maintains its efficiency
despite insertion and deletion of data.
A B+ tree is a balanced tree: all leaves are at the same level.
PROPERTIES OF B+ TREE:
Each node that is not a root or a leaf has between ⌈n/2⌉ and n children. A leaf
node holds between ⌈(n-1)/2⌉ and n-1 values.
TWO TYPES OF NODES:
Leaf nodes: Store keys and pointers to data
Index nodes: Store keys and pointers to other nodes Leaf nodes are linked to
each other.
Keys may be duplicated: every key to the right of a particular key is ≥ that key.
Typical structure of the node:
For i = 1, 2, ..., n-1, pointer Pi either points to a file record with search-key
value Ki, or to a bucket of pointers to file records, each record having search-key
value Ki.
If Li and Lj are leaf nodes and i < j, Li's search-key values are less than Lj's
search-key values.
Pn points to the next leaf node in search-key order. The search-keys in a leaf
node are ordered.
Example For B+ Tree:
UPDATES ON B+ TREE: INSERTION
1. Find the leaf node in which the search-key value would appear.
2. If the search-key value is already present in the leaf node, add the record to
the main file (and create a bucket if necessary).
3. If the search-key value is not present and there is room in the leaf node,
insert the (key-value, pointer) pair in the leaf node.
4. Otherwise, split the node (along with the new (key-value, pointer) entry):
Take the n (search-key value, pointer) pairs (including the one being inserted) in
sorted order. Place the first ⌈n/2⌉ in the original node, and the rest in a new node.
Let the new node be p, and let k be the least key value in p. Insert (k, p) in the
parent of the node being split.
If the parent is full, split it and propagate the split further up.
5. Splitting of nodes proceeds upwards till a node that is not full is found.
In the worst case the root node may be split, increasing the height of the tree
by 1. (A minimal code sketch of this procedure follows.)
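The sketch below is illustrative only (keys without record pointers or buckets; the class and names are not from the notes) and implements the split-and-propagate steps above for order n = 4.

# A minimal B+ tree insertion sketch (order N = 4): leaf splits copy the
# least key of the new node up; internal splits move the middle key up.
class Node:
    def __init__(self, leaf=True):
        self.leaf = leaf
        self.keys = []
        self.children = []   # child nodes (internal nodes only)
        self.next = None     # leaf-chain pointer (leaf nodes only)

N = 4  # order: the maximum number of children of an internal node

def insert(root, key):
    # Insert key and return the (possibly new) root.
    split = _insert(root, key)
    if split is None:
        return root
    k, right = split                 # the root itself was split:
    new_root = Node(leaf=False)      # tree height grows by 1 (worst case)
    new_root.keys = [k]
    new_root.children = [root, right]
    return new_root

def _insert(node, key):
    if node.leaf:
        i = 0
        while i < len(node.keys) and node.keys[i] < key:
            i += 1
        node.keys.insert(i, key)     # place key in sorted order
        if len(node.keys) < N:       # room in the leaf: done
            return None
        mid = (len(node.keys) + 1) // 2          # first ceil(n/2) stay here
        right = Node(leaf=True)
        right.keys, node.keys = node.keys[mid:], node.keys[:mid]
        right.next, node.next = node.next, right
        return right.keys[0], right  # (k, p) to insert into the parent
    i = 0                            # internal node: find the child to descend into
    while i < len(node.keys) and key >= node.keys[i]:
        i += 1
    split = _insert(node.children[i], key)
    if split is None:
        return None
    k, right = split                 # child was split: insert (k, p) here
    node.keys.insert(i, k)
    node.children.insert(i + 1, right)
    if len(node.children) <= N:
        return None
    mid = len(node.keys) // 2        # internal split: middle key moves up
    up_key = node.keys[mid]
    new = Node(leaf=False)
    new.keys = node.keys[mid + 1:]
    new.children = node.children[mid + 1:]
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return up_key, new

root = Node()
for k in [25, 27, 28, 3, 4, 8]:      # first keys from the assignment below
    root = insert(root, k)
print(root.keys)                     # [8, 27] for this input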
Fig: Splitting a leaf node: result of splitting the node containing Brighton and
Downtown on inserting Clearview.
UPDATES ON B+ TREE: DELETION
1. Find the record to be deleted, and remove it from the main file and from the
bucket (if present).
2. Remove the (search-key value, pointer) pair from the leaf node if there is no
bucket or if the bucket has become empty.
3. If the node has too few entries due to the removal, and the entries in the node
and a sibling fit into a single node, then merge the siblings:
Insert all the search-key values in the two nodes into a single node (the one on
the left), and delete the other node.
Delete the pair (Ki-1, Pi), where Pi is the pointer to the deleted node, from its
parent, recursively using the above procedure. (A simplified code sketch of the
merge case follows.)
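A simplified sketch of the merge case, reusing the Node class and order N from the insertion sketch above; borrowing from a sibling and recursive underflow in the parent are deliberately omitted, so this is not a complete deletion routine.

def delete_from_leaf(parent, i, key):
    # Remove key from the leaf parent.children[i]; if the leaf becomes too
    # small and its entries fit into the left sibling, merge the two nodes.
    leaf = parent.children[i]
    leaf.keys.remove(key)
    if len(leaf.keys) >= N // 2 or i == 0:       # still >= ceil((n-1)/2) entries
        return
    left = parent.children[i - 1]
    if len(left.keys) + len(leaf.keys) <= N - 1: # entries fit in a single node
        left.keys.extend(leaf.keys)              # move everything to the left node
        left.next = leaf.next                    # unlink the merged node from the chain
        del parent.keys[i - 1]                   # delete the pair (K(i-1), P(i))
        del parent.children[i]                   # ... and the pointer to the node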
9.14 B TREE:
A B tree is similar to a B+ tree, but it allows search-key values to appear only once;
this eliminates redundant storage of search keys.
Search keys in nonleaf nodes appear nowhere else in the B tree; an additional pointer
field for each search key in a nonleaf node must be included.
Fig: B Tree
Non-leaf nodes are larger, so fan-out is reduced. Thus, B trees typically have greater
depth than the corresponding B+ tree.
10. ASSIGNMENTS
1. Consider the following schedules. The actions are listed in the order they are
scheduled, and prefixed with the transaction name.
S1: T2:R(Z), T2:R(Y), T2:W(Y), T3:R(Y), T3:R(Z), T1:R(X), T1:W(X), T3:W(Y),
T3:W(Z), T2:R(X), T1:R(Y), T1:W(Y), T2:W(X)
S2: T3:R(Y), T3:R(Z), T1:R(X), T1:W(X), T3:W(Y), T3:W(Z), T2:R(Z), T1:R(Y),
T1:W(Y), T2:R(Y), T2:W(Y), T2:R(X), T2:W(X)
For each of the schedules, answer the following questions:
i) What is the precedence graph for the schedule?
ii) Is the schedule conflict-serializable? If so, what are all the conflict equivalent
serial schedules?
iii) Is the schedule view-serializable? If so, what are all the view equivalent serial
schedules?
Solution:
The actions listed for Schedule S1 and S2 can be written as:
Schedule S1
T1        T2        T3
          R(Z)
          R(Y)
          W(Y)
                    R(Y)
                    R(Z)
R(X)
W(X)
                    W(Y)
                    W(Z)
          R(X)
R(Y)
W(Y)
          W(X)

Schedule S2
T1        T2        T3
                    R(Y)
                    R(Z)
R(X)
W(X)
                    W(Y)
                    W(Z)
          R(Z)
R(Y)
W(Y)
          R(Y)
          W(Y)
          R(X)
          W(X)
2. Consider the following schedules. The actions are listed in the order they are
scheduled.
S1: R1(X), R3(X), W1(X), R2(X), W3(X)
S2: R3(X), R2(X), W3(X), R1(X), W1(X)
For each of the schedules, draw the precedence graph and determine whether the
schedule is conflict serializable; if so, find the equivalent serial schedule.
Solution:
The actions listed for Schedule S1 and S2 can be written as:
Schedule S1
T1        T2        T3
R(X)
                    R(X)
W(X)
          R(X)
                    W(X)

Schedule S2
T1        T2        T3
                    R(X)
          R(X)
                    W(X)
R(X)
W(X)
Solution to Question 1:
(i) PRECEDENCE GRAPH
Schedule S1
The Precedence graph for Schedule S1 consists of the following edges
T2 ->T3 because,
T2 executes R(z) before T3 executes W(z)
T2 executes R(y) before T3 executes W(y)
T2 executes W(y) before T3 executes W(y)
T2->T1 because,
T2 executes R(y) before T1 executes W(y)
T3->T1 because,
T3 executes W(y) before T1 executes R(y)
T3 executes R(y) before T1 executes W(y)
T1->T2 because,
T1 executes R(x) before T2 executes W(x)
T1 executes W(x) before T2 executes W(x)
Schedule S2
The Precedence graph for Schedule S2 consists of the following edges
T3 ->T1 because,
T3 executes R(y) before T1 executes W(y)
T3 executes W(y) before T1 executes W(y)
T3->T2 because,
T3 executes R(y) before T2 executes W(y)
T3 executes W(y) before T2 executes W(y)
T3 executes W(z) before T2 executes R(z)
T1->T2 because,
T1 executes R(x) before T2 executes W(x)
T1 executes W(x) before T2 executes W(x)
T1 executes W(x) before T2 executes R(x)
T1 executes R(y) before T2 executes R(y)
T1 executes W(y) before T2 executes W(y)
If the precedence graph for a schedule contains a cycle, the schedule is not
Conflict Serializable.
If the precedence graph for a schedule does not contain a cycle, the schedule is
Conflict Serializable.
Schedule S1
The precedence graph for Schedule S1 contains cycles.
So, the Schedule S1 is not Conflict Serializable.
Schedule S2
The precedence graph for Schedule S2 does not contain cycles.
So, the Schedule S2 is Conflict Serializable.
Schedule S1
The Schedule S1 is not conflict serializable and does not contain Blind Writes,
so it is not View Serializable.
Schedule S2
The Schedule S2 is conflict serializable, and every conflict-serializable schedule is
also view serializable. Topologically sorting its precedence graph (T3 -> T1,
T3 -> T2, T1 -> T2) gives the equivalent serial schedule T3, T1, T2.
Solution to Question 2:
(i) PRECEDENCE GRAPH
Schedule S1
The Precedence graph for Schedule S1 consists of the following edges
T1 ->T3 because,
T1 executes R(x) before T3 executes W(x)
T1 executes W(x) before T3 executes W(x)
T3->T1 because,
T3 executes R(x) before T1 executes W(x)
T1->T2 because,
T1 executes W(x) before T2 executes R(x)
T2->T3 because,
T2 executes R(x) before T3 executes W(x)
Schedule S2
The Precedence graph for Schedule S2 consists of the following edges
T3 ->T1 because,
T3 executes R(x) before T1 executes W(x)
T2->T1 because,
T2 executes R(x) before T1 executes W(x)
T2->T3 because,
T2 executes R(x) before T3 executes W(x)
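The edges above can also be generated mechanically. The sketch below (illustrative, not part of the assignment solution) builds the precedence graph of a schedule and tests conflict serializability by cycle detection.

# Build the precedence graph of a schedule and detect cycles.
from collections import defaultdict

def precedence_graph(schedule):
    # schedule: list of (txn, op, item), op in {'R', 'W'}, in execution order.
    edges = defaultdict(set)
    for i, (ti, opi, x) in enumerate(schedule):
        for tj, opj, y in schedule[i + 1:]:
            # conflicting operations: same item, different txns, at least one write
            if x == y and ti != tj and (opi == 'W' or opj == 'W'):
                edges[ti].add(tj)
    return edges

def has_cycle(edges):
    # DFS-based cycle detection; a cycle means "not conflict serializable".
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(int)
    def dfs(u):
        color[u] = GRAY
        for v in edges[u]:
            if color[v] == GRAY or (color[v] == WHITE and dfs(v)):
                return True
        color[u] = BLACK
        return False
    return any(color[u] == WHITE and dfs(u) for u in list(edges))

# Schedule S1 of Question 2: R1(X), R3(X), W1(X), R2(X), W3(X)
s1 = [('T1','R','X'), ('T3','R','X'), ('T1','W','X'), ('T2','R','X'), ('T3','W','X')]
g = precedence_graph(s1)
print(dict(g))        # e.g. {'T1': {'T3', 'T2'}, 'T3': {'T1'}, 'T2': {'T3'}}
print(has_cycle(g))   # True -> S1 is not conflict serializable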
If the precedence graph for a schedule contains a cycle, the schedule is not
Conflict Serializable. If the precedence graph for a schedule does not contain a
cycle, the schedule is Conflict Serializable, and the equivalent serial schedule is
determined using topological sorting (start with a vertex with indegree = 0).
Solution:
Schedule S1: r1(X), r3(X), w1(X), r2(X), w3(X)
The precedence graph contains two cycles (T1 -> T3 -> T1 and
T1 -> T2 -> T3 -> T1), so Schedule S1 is not conflict serializable.
Schedule S2: r3(X), r2(X), w3(X), r1(X), w1(X)
The precedence graph contains no cycles, so Schedule S2 is conflict serializable;
topological sorting gives the equivalent serial schedule T2, T3, T1.
2. Explain the structure of B Tree. Construct a B tree to insert the following (order
of the tree is 3) 25, 27, 28, 3, 4, 8, 9, 46, 48, 50, 2, 6.
(CO4, K2)
3. Construct a B tree and a B+ tree to insert the following key values (the order of
the tree is three): 32, 11, 15, 13, 7, 22, 15, 44, 67, 4. (CO4, K2)
4. Suppose that we are using extendible hashing on a file that contains records with
the following search-key values: 3, 5, 7, 11, 17, 19, 23, 29, 31. Show the
extendible hash structure for this file if the hash function is h(x) = x mod 7 and
each bucket can hold three records. (CO4, K2)
5. For the B+ tree constructed above, show the tree after each of the following
operations: INSERT 2, DELETE 5, DELETE 12. (CO4, K2)
11. Part A Question & Answer
2 List the states of a transaction CO4 K1
• Active, the initial state; the transaction stays in this state while it
is executing
• Partially committed, after the final statement has been
executed
• Failed, after the discovery that normal execution can no longer
proceed
• Aborted, after the transaction has been rolled back and the
database has been restored to its state prior to the start of the
transaction
• Committed, after successful completion
3 List the Desirable Properties of Transactions CO4 K1
The desirable properties of a transaction are the ACID properties:
Atomicity, Consistency, Isolation, and Durability.
4 Define the commit point of a transaction CO4 K1
A transaction T reaches its commit point when all its operations that
access the database have been executed successfully and the effect
of all the transaction operations on the database has been
recorded in the log.
16 Define a binary lock CO4 K1
A binary lock can have two states or values: locked and unlocked (or
1 and 0, for simplicity). A distinct lock is associated with each
database item X. If the value of the lock on X is 1, item X cannot be
accessed by a database operation that requests the item. If the
value of the lock on X is 0, the item can be accessed when
requested, and the lock value is changed to 1.We refer to the
current value (or state) of the lock associated with item X as
lock(X).
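A toy sketch of the lock(X) state described above (illustrative; a real lock manager also keeps a wait queue, which is omitted here so a refused request simply returns False):

locks = {}  # data item X -> 0 (unlocked) or 1 (locked)

def lock_item(x):
    if locks.get(x, 0) == 1:
        return False     # lock(X) = 1: the requesting transaction must wait
    locks[x] = 1         # grant the lock: lock(X) becomes 1
    return True

def unlock_item(x):
    locks[x] = 0         # lock(X) back to 0; a waiting transaction may proceed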
17 Explain the modes in which a data item may be locked. CO4 K1
There are two modes in which a data item can be locked:
(i) Shared: if a transaction Ti obtains this lock on an item, then it
can read the item but not write it.
(ii) Exclusive: if a transaction obtains this lock on an item, then it
can both read and write the item.
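The resulting lock-compatibility rule can be stated in one line (a sketch restating the two modes): shared is compatible only with shared, exclusive with nothing.

def compatible(held, requested):
    # 'S' = shared, 'X' = exclusive; only S/S requests are compatible.
    return held == 'S' and requested == 'S'

assert compatible('S', 'S')
assert not compatible('S', 'X')
assert not compatible('X', 'S')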
18 Define Deadlock CO4 K1
A system is in a deadlock state if there exists a set of transactions
such that every transaction in the set is waiting for another
transaction in the set; none of the transactions can proceed.
27 Define: i) Bit-level striping ii) Block-level striping CO4 K1
The process of splitting the bits of each byte across multiple disks is
called bit-level striping. Block-level striping is the process of splitting
blocks across multiple disks.
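As a small illustration (not from the notes), the placement rule for block-level striping across n disks is plain arithmetic: logical block i is stored on disk i mod n, as block i div n of that disk.

def locate(i, n_disks):
    # Block-level striping: logical block i of the file maps to
    # (disk number, block number within that disk).
    return (i % n_disks, i // n_disks)

print(locate(11, 4))   # block 11 on 4 disks -> disk 3, block 2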
28 What factors should be taken into account when choosing a RAID
level? CO4 K1
The factors to be taken into account when choosing a RAID level are:
Monetary cost of extra disk-storage requirements
Performance requirements in terms of number of I/O operations
Performance when a disk has failed
Performance during rebuild
29 Define the terms: i) records ii) files iii) types of records CO4 K1
Data is usually stored in the form of records, each consisting of a
collection of related data values. A file is a sequence of records.
The two types of records are:
Fixed-length records, in which all records have the same size.
Variable-length records, in which different records have different
sizes.
30 List the possible ways of organizing records in files. CO4 K1
The possible ways of organizing records in files are:
Heap file organization
Sequential file organization
Hashing file organization
Clustering file organization
50 Explain i) heap file organization ii) sequential file organization. CO4 K2
In heap file organization, any record can be placed anywhere in the
file where there is space for it. In sequential file organization, records
are stored in sequential order according to the value of a search key.
10 Consider the following schedules. The actions are listed in the order
they are scheduled, and prefixed with the transaction name. CO4 K1
S1: T2:R(Z), T2:R(Y), T2:W(Y), T3:R(Y), T3:R(Z), T1:R(X),
T1:W(X), T3:W(Y), T3:W(Z), T2:R(X), T1:R(Y), T1:W(Y), T2:W(X)
S2: T3:R(Y), T3:R(Z), T1:R(X), T1:W(X), T3:W(Y), T3:W(Z),
T2:R(Z), T1:R(Y), T1:W(Y), T2:R(Y), T2:W(Y), T2:R(X), T2:W(X)
For each of the schedules, answer the following questions:
i) What is the precedence graph for the schedule?
ii) Is the schedule conflict-serializable? If so, what are all the
conflict equivalent serial schedules?
iii) Is the schedule view-serializable? If so, what are all the view
equivalent serial schedules?
11 Consider the following schedules. The actions are listed in the order
they are scheduled. CO4 K1
S1: R1(X), R3(X), W1(X), R2(X), W3(X)
S2: R3(X), R2(X), W3(X), R1(X), W1(X)
For each of the schedules, answer the following questions:
i) What is the precedence graph for the schedule?
ii) Which of the schedules are conflict serializable? Find the
equivalent serial schedule.
S. No. PART B CO K
12 List the different levels in RAID technology and explain its
features. CO4 K1
13 Describe the different methods of implementing variable-length
records. CO4 K1
14. REAL TIME APPLICATIONS IN DAY TO DAY LIFE AND TO INDUSTRY
Fig: Transaction layers for a prescription purchase.
The final database to be reviewed at the pharmacy is the order database. This
database contains all the outstanding orders that have been placed with all the
wholesalers. They may represent simple inventory replenishment orders or special
orders for drugs not normally stocked. If the orders are for narcotic items, or
Schedule 2 drugs, special order and tracking requirements must be met to satisfy the
requirements of the Drug Enforcement Administration (DEA). Figure shows the
database entities involved at the pharmacy layer.
15. CONTENT BEYOND THE SYLLABUS
Timestamp based Protocol for Concurrency Control
Timestamps
With each transaction Ti in the system, we associate a unique fixed timestamp, denoted
by TS(Ti ).
This timestamp is assigned by the database system before the transaction Ti starts
execution. If a transaction Ti has been assigned timestamp TS(Ti ), and a new
transaction Tj enters the system, then TS(Ti ) < TS(Tj ).
To implement this scheme, we associate with each data item Q two timestamp
values:
W-timestamp(Q) denotes the largest timestamp of any transaction that executed
write(Q) successfully.
R-timestamp(Q) denotes the largest timestamp of any transaction that executed
read(Q) successfully.
1. Suppose that transaction Ti issues read(Q).
a. If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was already
overwritten. Hence, the read operation is rejected, and Ti is rolled back.
b. If TS(Ti) ≥ W-timestamp(Q), then the read operation is executed, and
R-timestamp(Q) is set to the maximum of R-timestamp(Q) and TS(Ti).
2. Suppose that transaction Ti issues write(Q).
a. If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed
previously, and the system assumed that that value would never be produced.
Hence, the system rejects the write operation and rolls Ti back.
b. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of
Q. Hence, the system rejects this write operation and rolls Ti back.
c. Otherwise, the system executes the write operation and sets W-timestamp(Q)
to TS(Ti).
If a transaction Ti is rolled back by the concurrency-control scheme as a result of
issuance of either a read or write operation, the system assigns it a new timestamp
and restarts it.
The protocol can generate schedules that are not recoverable. However,
it can be extended to make the schedules recoverable, in one of several
ways: by performing all writes together at the end of the transaction, by using a
limited form of locking so that reads of uncommitted items are postponed, or by
tracking commit dependencies so that a transaction commits only after the
commit of all transactions whose writes it has read.
Thomas Write Rule provides the guarantee of serializability order for the protocol.
It improves on the basic timestamp-ordering algorithm: when
TS(Ti) < W-timestamp(Q), the write that Ti is attempting is obsolete and can
simply be ignored, rather than rolling Ti back.
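A minimal sketch (not from the notes) of the basic timestamp-ordering checks, with the Thomas Write Rule applied to writes; the class and function names are illustrative.

class DataItem:
    def __init__(self):
        self.r_ts = 0     # R-timestamp(Q)
        self.w_ts = 0     # W-timestamp(Q)
        self.value = None

def read(item, ts):
    # read(Q) by a transaction with timestamp ts; False means "roll back".
    if ts < item.w_ts:
        return False                  # Q was already overwritten
    item.r_ts = max(item.r_ts, ts)
    return True

def write(item, ts, value, thomas=True):
    # write(Q); False means "roll back". Under the Thomas rule an obsolete
    # write (ts < W-timestamp) is allowed but silently ignored.
    if ts < item.r_ts:
        return False                  # a later transaction already read Q
    if ts < item.w_ts:
        return thomas                 # obsolete write: ignore (or reject)
    item.w_ts = ts
    item.value = value
    return True

q = DataItem()
write(q, 2, 10)        # succeeds; W-timestamp(Q) = 2
print(read(q, 1))      # False: TS(T1) < W-timestamp(Q), so T1 is rolled back
print(write(q, 3, 7))  # True: executed, W-timestamp(Q) = 3
print(write(q, 2, 5))  # True under the Thomas rule, but the write is ignored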
14. REAL TIME APPLICATIONS IN DAY TO DAY LIFE AND TO INDUSTRY
Application and Uses of Database Management System (DBMS)
One classical example is the hierarchical database model, in which data is
organized as a tree. Each child node must have only one parent, but parent nodes
can have more than one child; multiple parents are not allowed. This is the major
difference between the hierarchical and network database models. The first node
of the tree is called the root node. When data needs to be retrieved, the whole
tree is traversed starting from the root node. This model represents one-to-many
relationships.
For example, assume that we have a main directory which contains other
subdirectories. Each subdirectory contains more files and directories. Each
directory or file can be in one directory only, i.e., it has only one parent.
Design E-R model for the following and also apply normalization
1) Blood bank management system
Hospitals register to request the blood they want, and donors sign up to the blood
bank to donate blood. Donors are available to donate in particular areas according to
the registered data. When a hospital requests blood, the blood bank provides the
details of donors near the hospital. The blood bank also shows the availability of
blood groups to the hospitals. We can also maintain the data of blood donated to
the hospitals.
4) Railway system
Users can book train tickets to reach their destination. This includes details like the
present station, the destination station, and the train in which they want to travel.
Provide the user an option to check the details of a train by using the train id; it
must also show the train's arrival time, the platform at which the train arrives, and
the departure timings. Also add an option that allows the user to book a meal while
traveling on the train, and an option that shows the price range of the different
booking classes such as AC, second class, sleeper, and others. Try to think of
further options to add yourself.
5) Hospital Data Management
Assign unique IDs to the patients and store the relevant information under the
same. Add the patient's name, personal details, contact number, disease name, and
the treatment the patient is going through. Mention under which hospital
department the patient is (such as cardiac, gastro, etc.). Add information about the
hospital's doctors: a doctor can treat multiple patients, and he/she would have a
unique ID as well. Doctors are also classified into different departments. Add the
information of ward boys and nurses working in the hospital and assigned to
different rooms. Patients get admitted into rooms, so add that information in your
database too.
Thank you