Block 3
UNIT 10 TRANSACTIONS AND CONCURRENCY MANAGEMENT
Structure
10.0 Introduction
10.1 Objectives
10.2 The Transactions
10.2.1 Properties of a Transaction
10.2.2 States of a Transaction
10.3 Concurrent Transactions
10.3.1 Transaction Schedule
10.3.2 Problems of Concurrent Transactions
10.4 The Locking Protocol
10.4.1 Serialisable Schedule
10.4.2 Locks
10.4.3 Two-Phase Locking (2PL)
10.5 Deadlock Handling and its Prevention
10.6 Optimistic Concurrency Control
10.7 Timestamp-Based Protocols
10.7.1 Timestamp Based Concurrency Control
10.7.2 Multi-version Technique
10.8 Weak Level of Consistency and SQL commands for Transactions
10.9 Summary
10.10 Solutions/ Answers
10.0 INTRODUCTION
One of the main advantages of storing data in an integrated repository or a database is
that it allows sharing of data amongst multiple users. Several users access the database or
perform transactions at the same time. What happens if a user's transaction tries to access a data
item that is being used/modified by another transaction? This unit attempts to provide
details on how concurrent transactions are executed under the control of the DBMS.
However, in order to explain concurrent transactions, this unit first describes the
term transaction. Concurrent execution of user programs is essential for better
performance of a DBMS, as running several user programs concurrently keeps the CPU
utilised efficiently, since disk accesses, which are frequent in a DBMS, are relatively slow.
This unit not only explains the issues of concurrent transactions but also explains
algorithms to control those problems.
10.1 OBJECTIVES
After going through this unit, you should be able to:
• define the term database transactions and their properties;
• describe the issues of concurrent transactions;
• explain the mechanism to prevent issues arising due to concurrently
executing transactions;
• describe the principles of locking and serialisability; and
• describe concepts of deadlock and its prevention.
Another similar example may be the transfer of money from account number X to
account number Y. This transaction may be written as:
Write the value of X to the data block
A block of data on secondary storage may contain many data items, as its size may be in
MBs. Further, a transaction processing system may modify several data items
simultaneously, therefore, an interesting question for a DBMS is: when to write a data
block back to secondary storage? This process is called Buffer management. You may
refer to further readings for details on this topic.
Transactions have certain desirable properties. Let us look into the properties of a
transaction.
10.2.1 Properties of a Transaction
A transaction has four basic properties. These are:
• Atomicity
• Consistency
• Isolation
• Durability
These are also called the ACID properties of transactions.
Atomicity: It defines a transaction to be a single unit of processing. In other words,
either a transaction is done completely or not at all. For example, the transaction
of Example 2 reads and writes more than one data item. The
atomicity property requires that the operations on both the data items be performed
completely or not at all.
Consistency: This property ensures that a complete transaction execution takes the
database from one consistent state to another consistent state. Even if a transaction fails,
the database should come back to a consistent state, i.e., either to the database state
that existed before the start of the transaction or to the database state that would exist
after the successful completion of the transaction.
Isolation or Independence: The isolation property states that the updates of a transaction
should not be visible to other transactions till they are committed. Isolation guarantees that the
progress of one transaction does not affect the outcome of another transaction. For example, if
another transaction, say a withdrawal transaction that withdraws an amount of Rs. 5000
from account X, is in progress, then the failure or success of the transaction of Example 2
should not affect the outcome of the withdrawal transaction. Only the state of the data that has
been read by the withdrawal transaction should determine its outcome.
Durability: This property necessitates that once a transaction has committed, the
changes made by it are never lost because of subsequent failures. Thus, a transaction is
also a basic unit of recovery. The details of transaction-based recovery are discussed in
the next unit.
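As a small illustration of atomicity and durability (our own sketch using Python's built-in sqlite3 module, not part of the unit text), a transfer either commits as a whole or is rolled back as a whole:

# Minimal sketch: the transfer of Rs. 100 from account X to Y is one atomic unit.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account(name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("X", 1000), ("Y", 500)])
conn.commit()

try:
    conn.execute("UPDATE account SET balance = balance - 100 WHERE name = 'X'")
    conn.execute("UPDATE account SET balance = balance + 100 WHERE name = 'Y'")
    conn.commit()        # both updates become durable together
except sqlite3.Error:
    conn.rollback()      # on any failure, neither update survives (atomicity)

print(dict(conn.execute("SELECT name, balance FROM account")))  # {'X': 900, 'Y': 600}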
10.2.2 States of a Transaction

A transaction can be in any one of several states during its execution. These states are
displayed in Figure 2.
Figure 2: States of a transaction (Start → Execute → Commit, or Abort/Rollback)
Transaction T1:
A: Read X
   Subtract 100
   Write X
B: Read Y
   Add 100
   Write Y
Let us suppose an auditor wants to know the total assets of Mr. Sharma. S/he executes
the following transaction:
Transaction T2:
Read X
Read Y
Display X+Y
Suppose both of these transactions are issued simultaneously, then the execution of
these instructions can be mixed in many ways. This is also called the Schedule. Let us
define this term in more detail.
Consider that a database system has n active transactions, namely T1, T2, …, Tn. Each
of these transactions can be represented using a transaction program consisting of
operations, as shown in Example 1 and Example 2. A schedule, say S, is defined as the
sequential ordering of the operations of the ‘n’ interleaved transactions, where the
sequence or order of operations of an individual transaction is maintained.
For example, the two transactions TA and TB, as given below, if executed in parallel,
may produce a schedule:
TA TB
READ X READ X
WRITE X WRITE X
(Fragment of the figure: … Read Y — Y = 100000; Add 100 — 100100; Write Y — Y = 100100)
a) If the complete execution of T2 occurs before T1 starts, then the sum X+Y will show the
correct assets.
Thus, there can be a possibility of anomalies when the transactions T1 and T2 are
allowed to execute in parallel. Let us define the anomalies of concurrent execution of
transactions more precisely.
Let us assume the following transactions (assuming there will be no errors in the data while
executing the transactions).

1. Lost Update Anomaly: Consider transactions T3 and T4: T3 reads the balance of account X
and subtracts a withdrawal amount of Rs. 5000, whereas T4 reads the balance of account X and
adds an amount of Rs. 3000.
T3 T4
READ X READ X
SUB 5000 ADD 3000
WRITE X WRITE X
T3              T4              Value of X
READ X                          10000
                READ X          10000
SUB 5000                        5000
                ADD 3000        13000
WRITE X                         5000
                WRITE X         13000
After the execution of both transactions, the value of X is 13000, while the
semantically correct value should be 8000. The problem occurred because the update
made by T3 was overwritten by T4. The root cause of the problem is that
both transactions read the value of X as 10000. Thus, one of
the two updates has been lost, and we say that a lost update has occurred.
There is one more way in which the lost updates can arise. Consider the
following part of some transactions T5 and T6; each of which increases the
value of X by 1000.
T5              T6              Value of X
Update X                        3000
                Update X        4000
ROLLBACK                        2000
Here, transactions T5 and T6 update the same item X (please note that an
update includes the read, modify and write operations). Thereafter, T5
decides to undo its action and rolls back, causing the value of X to go back to
the original value 2000. In this case, the update performed by T6 is lost,
and a lost update is said to have occurred.
2. Unrepeatable read Anomaly: Suppose transaction T7 reads X twice during
its execution. If it did not update X itself, it could be very disturbing to see a
different value of X in its next read. But this could occur if, between the two
read operations, another transaction modifies X.
T7              T8              Value of X
READ X                          2000
                Update X        3000
READ X                          3000
Thus, inconsistent values are read, and the results of the transaction may be in error.
3. Dirty Read Anomaly: T10 reads a value which has been updated by T9.
This update has not been committed and T9 aborts.
T9              T10             Value of X
Update X                        500
                READ X          500
ROLLBACK                        200 ?
Here, T10 reads a value that has been updated by transaction T9, which is then
aborted. Thus, T10 has read a value that never really existed in the database, and
hence the problem.
Please note that all three problems that we have discussed so far are primarily due to
the violation of the Isolation property of the concurrently executing transactions that
use the same data items.
One of the most common techniques used to prevent such problems is to restrict access to data
items that are being read or written by one transaction and are also being written by
another transaction. This technique is called locking. Let us discuss locking in more
detail in the next section.
2)
What are the anomalies of concurrent transactions? Can these problems
occur in transactions which do not read the same data values?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
3) What is a Commit state? Can you rollback after the transaction commits?
……………………………………………………………………………………
……………………………………………………………………………………
10.4 THE LOCKING PROTOCOL

10.4.1 Serialisable Schedule

If the operations of two transactions conflict with each other, how do we determine that no
concurrency-related problems have occurred during their execution? For this
purpose, let us define the terms schedule, serial schedule, interleaved schedule and
serialisable schedule.
A serial schedule is one in which all the actions/operations of one transaction are performed
first; this is followed by the actions/operations of the next transaction, and so on.
For example, Schedule A and Schedule B of Figure 5 are serial schedules, as in neither
schedule do the operations of transactions T1 and T2 interleave with each other.
Schedule C: An Interleaved Schedule

Schedule            T1                  T2
READ X              READ X
SUBTRACT 100        SUBTRACT 100
READ X                                  READ X
WRITE X             WRITE X
READ Y              READ Y
READ Y                                  READ Y
ADD 100             ADD 100
DISPLAY X+Y                             DISPLAY X+Y
WRITE Y             WRITE Y

Figure 6 (a): An Interleaved Schedule
You may observe that the Schedule C in Figure 6 is an interleaved schedule. It has
conflicting operations, like reading and writing X and Y. There can be many interleaved
schedules of transactions T1 and T2. How do you identify which two schedules are
equivalent? Well, for that, let us define the term Conflict Equivalence.
Conflict Equivalence: Two schedules are defined as conflict equivalent if any two
conflicting operations in the two schedules are in the same order. For example, Schedule
D, as given below, is not a conflict equivalent schedule of Schedule C, as the sequence
of WRITE X by T1 and READ X by T2 are not in the same order in the two interleaved
schedules.
Any schedule that produces the same results as a serial schedule is called a serialisable
schedule. The basis of serialisability is taken from the notion of a serial schedule.
Considering there are two concurrent transactions, T1 and T2, how many different serial
schedules are possible for these two transactions? The possible serial schedules for two
transactions, T1 and T2, are:
T1 followed by T2 OR T2 followed by T1.
Likewise, the number of possible serial schedules with three concurrent transactions
is given by the number of possible permutations of the three transactions, i.e., 3! = 6.
But how can a schedule be determined to be serialisable or not? Is there any algorithmic
way of determining whether a schedule is serialisable or not?
Using the notion of a precedence graph, an algorithm can be devised to determine whether
an interleaved schedule is serialisable or not. In this graph, the transactions of the
schedule are represented as nodes. The graph also has directed edges. An edge from
the node of transaction Ti to the node of transaction Tj means that there exists a conflicting
operation between Ti and Tj, and Ti precedes Tj in that conflicting operation. The
basic principle for determining a serialisable schedule is that the precedence graph,
which captures the order of the conflicting operations of the transactions, does not contain a cycle.
If the precedence graph has no cycles, the schedule is equivalent to a serial schedule.
The steps of constructing a precedence graph are:
1. Create a node for every transaction in the schedule.
2. Find the precedence relationships in conflicting operations. Conflicting
operations are (read-write) or (write-read) or (write–write) on the same data
item in two different transactions. But how to find them?
2.1 For a transaction Ti, which reads an item A, find a transaction Tj that writes
A later in the schedule. If such a transaction is found, draw an edge from Ti
to Tj.
2.2 For a transaction Ti, which has written an item A, find a transaction Tj later
in the schedule that reads A. If such a transaction is found, draw an edge
from Ti to Tj.
2.3 For a transaction Ti which has written an item A, find a transaction Tj that
writes A later than Ti. If such a transaction is found, draw an edge from Ti
to Tj.
3. If there is any cycle in the graph, the schedule is not serialisable, otherwise, find
the equivalent serial schedule of the transaction by traversing the transaction
nodes starting with the node that has no input edge.
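To make the steps above concrete, here is a minimal Python sketch of the precedence-graph test; the schedule representation and the function names (build_precedence_graph, has_cycle) are our own illustrative choices, not part of the unit.

# A schedule is a list of (transaction, action, item) tuples, with action 'R' or 'W'.

def build_precedence_graph(schedule):
    # Add an edge Ti -> Tj for every conflicting pair (R-W, W-R, W-W)
    # on the same data item in which Ti's operation comes first.
    nodes = {t for t, _, _ in schedule}
    edges = {t: set() for t in nodes}
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and (ai == 'W' or aj == 'W'):
                edges[ti].add(tj)
    return edges

def has_cycle(edges):
    # Depth-first search for a cycle in the directed graph.
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {n: WHITE for n in edges}
    def visit(n):
        colour[n] = GREY
        for m in edges[n]:
            if colour[m] == GREY or (colour[m] == WHITE and visit(m)):
                return True
        colour[n] = BLACK
        return False
    return any(colour[n] == WHITE and visit(n) for n in edges)

# Schedule C of Figure 6(a): serialisable, equivalent to T2 followed by T1.
schedule_c = [('T1', 'R', 'X'), ('T2', 'R', 'X'), ('T1', 'W', 'X'),
              ('T1', 'R', 'Y'), ('T2', 'R', 'Y'), ('T1', 'W', 'Y')]
print(has_cycle(build_precedence_graph(schedule_c)))   # False -> serialisable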
Let us explain the algorithm with the help of the following example.
Example:
Let us use this algorithm to check whether the schedule given in Figure 6(a) is
Serialisable. Figure 7 shows the required graph. Please note as per step 1, we draw
the two nodes for T1 and T2. In the schedule given in Figure 6, please note that
the transaction T2 reads data item X, which is subsequently written by T1, thus
there is an edge from T2 to T1 (clause 2.1). Also, T2 reads data item Y, which is
subsequently written by T1, thus there is an edge from T2 to T1 (clause 2.1).
However, that edge already exists, so we do not need to draw it again. Please note that
there are no cycles in the graph; thus, the schedule given in Figure 6(a) is
serialisable. The equivalent serial schedule (as per step 3) would be T2 followed
by T1, which is the serial Schedule A of Figure 5.
Figure 7: Precedence graph for the schedule of Figure 6(a) (a single edge from T2 to T1)
Please note that the schedule given in Figure 6(b) is not serialisable, because in that
schedule, the two edges that exist between nodes T1 and T2 are:
• T1 writes X, which is later read by T2 (clause 2.2), so there exists an edge from
T1 to T2.
• T2 reads Y, which is later written by T1 (clause 2.1), so there exists an edge
from T2 to T1.
Figure 8: Precedence graph for the schedule of Figure 6(b) (edges from T1 to T2 and from T2 to T1, forming a cycle)
Please note that the graph above has a cycle T1-T2-T1, therefore it is not serialisable.
10.4.2 Locks
Serialisability is just a test of whether a given interleaved schedule is serialisable,
i.e., free of concurrency-related problems; however, the test by itself does not ensure
that interleaved concurrent transactions avoid such problems. That can be ensured
by using locks. So let us discuss the different types of locks, and then how
locking ensures serialisability of executing transactions.
Types of Locks
There are two basic types of locks:
• Binary lock: This locking mechanism has two states for a data item: locked or
unlocked.
• Multiple-mode locks: In this locking type, each data item can be in one of three states:
read locked (shared lock), write locked (exclusive lock), or unlocked.
Let us first take an example for binary locking and explain how it solves the concurrency
related problems. Let us reconsider the transactions T1 and T2 and schedule given in
Figure 4 (c) for this purpose; however, we will add required binary locks to them.
Schedule            T1                  T2
LOCK X              LOCK X
READ X              READ X
SUBTRACT 100        SUBTRACT 100
WRITE X             WRITE X
UNLOCK X            UNLOCK X
LOCK X                                  LOCK X
LOCK Y                                  LOCK Y
READ X                                  READ X
READ Y                                  READ Y
DISPLAY X+Y                             DISPLAY X+Y
UNLOCK X                                UNLOCK X
UNLOCK Y                                UNLOCK Y
LOCK Y              LOCK Y
READ Y              READ Y
ADD 100             ADD 100
WRITE Y             WRITE Y
UNLOCK Y            UNLOCK Y

Figure 9: An incorrect locking implementation
Does the locking as done above solve the problem of concurrent transactions? No, the
same problems remain. Try working with the old values given in Figure 4(c). Thus,
locking should be done with some logic to make sure that locking results in no
concurrency-related problem. One such solution is given below:
Schedule                    T1              T2
LOCK X                      LOCK X
LOCK Y                      LOCK Y
READ X                      READ X
SUBTRACT 100                SUBTRACT 100
WRITE X                     WRITE X
LOCK X (issued by T2)                       LOCK X: denied, as T1 holds the lock; T2 waits and T1 continues
READ Y                      READ Y
ADD 100                     ADD 100
WRITE Y                     WRITE Y
UNLOCK X                    UNLOCK X        (the waiting LOCK X request of T2 is now granted and T2 resumes)
UNLOCK Y                    UNLOCK Y
LOCK Y                                      LOCK Y
READ X                                      READ X
READ Y                                      READ Y
DISPLAY X+Y                                 DISPLAY X+Y
UNLOCK X                                    UNLOCK X
UNLOCK Y                                    UNLOCK Y

Figure 10: A correct locking implementation
As shown in Figure 10, when you obtain all the locks at the beginning of the transaction
and release them at the end, it ensures that transactions are executed with no
concurrency-related problems. However, such a scheme limits the concurrency. We will
discuss a two-phase locking method in the next subsection that provides sufficient
concurrency. However, let us first discuss multiple-mode locks.
Multiple-mode locks: This scheme offers two types of locks: shared locks and exclusive locks.
But why do we need these two locks? There are many transactions in a database system that
never update any data values. These transactions can coexist with other transactions that
update the database. In such a situation, multiple reads are allowed on a data item, so
multiple transactions can lock a data item in shared or read mode. On the other hand,
if a transaction is an updating transaction, that is, it updates data items, it must ensure
that no other transaction can access (read or write) those data items that it wants to
update. In this case, the transaction places an exclusive lock on those data items. Thus, a
higher level of concurrency can be achieved compared to the binary locking scheme.
The properties of shared and exclusive locks are summarised below:
a) Shared lock
• It is requested by a transaction that wants to just read the value of the data item.
• A shared lock on a data item does not allow an exclusive lock to be placed but
permits any number of shared locks to be placed on that item.
b) Exclusive lock
• It is requested by a transaction that wants to update (write) the value of the data item.
• An exclusive lock on a data item does not permit any other lock, shared or exclusive, to be placed on that item.
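The compatibility between the two lock modes can be summarised in a small matrix; the sketch below is our own illustration (not from the unit) of how a lock manager might use it when deciding whether to grant a request.

# 'S' = shared lock, 'X' = exclusive lock. held_modes is the set of lock modes
# currently held on the data item by other transactions.

COMPATIBLE = {
    ('S', 'S'): True,   # any number of shared locks may coexist
    ('S', 'X'): False,  # a shared lock blocks an exclusive request
    ('X', 'S'): False,  # an exclusive lock blocks everything
    ('X', 'X'): False,
}

def can_grant(requested_mode, held_modes):
    # Grant the request only if it is compatible with every lock already held.
    return all(COMPATIBLE[(held, requested_mode)] for held in held_modes)

print(can_grant('S', {'S'}))   # True  - multiple readers allowed
print(can_grant('X', {'S'}))   # False - writer must wait for readers
print(can_grant('S', set()))   # True  - unlocked item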
We explain these locks with the help of an example. However, you may refer to further
readings for more information on multiple-mode locking. We will once again consider
the transactions T1 and T2, but in addition, a transaction T11 that finds the total of
accounts Y and Z.
Schedule            T1              T2              T11
S_LOCK X                            S_LOCK X
S_LOCK Y                            S_LOCK Y
READ X                              READ X
S_LOCK Y                                            S_LOCK Y
S_LOCK Z                                            S_LOCK Z
READ Y                                              READ Y
READ Z                                              READ Z
X_LOCK X            X_LOCK X: denied, as T2 holds a shared lock on X; T1 waits
READ Y                              READ Y
DISPLAY X+Y                         DISPLAY X+Y
UNLOCK X                            UNLOCK X
X_LOCK Y            X_LOCK Y: the earlier X_LOCK X request is now granted, as X is unlocked, but the new X_LOCK Y request is denied, as Y is locked by T2 and T11 in shared mode; T1 waits till both T2 and T11 release their shared locks on Y
DISPLAY Y+Z                                         DISPLAY Y+Z
UNLOCK Y                            UNLOCK Y
UNLOCK Y                                            UNLOCK Y
UNLOCK Z                                            UNLOCK Z
READ X              READ X
SUBTRACT 100        SUBTRACT 100
WRITE X             WRITE X
READ Y              READ Y
ADD 100             ADD 100
WRITE Y             WRITE Y
UNLOCK X            UNLOCK X
UNLOCK Y            UNLOCK Y

Figure 11: A schedule using shared and exclusive locks
As shown in Figure 11, locking can result in a serialisable schedule. Next, we discuss
the concept of granularity of locking.
Granularity of Locking
In general, locks may be allowed on the following database items:
1) The complete database itself
2) A file of the database
3) A page or disk block of data
4) A record in a table
5) A data item of an attribute
The granularity of locking refers to the size of the database item being locked. Coarse-granularity
locking means locking a larger data item, e.g., the complete database, a file
or a page. Fine-granularity locking means locking database items of a smaller
size, e.g., record locking or data-item locking.
In the next section, we discuss the locking protocol that ensures correct execution of
concurrent transactions.
10.4.3 Two-Phase Locking (2PL)

In this section, we try to answer the question: can you release locks a bit early and still
have no concurrency-related problem? Yes, we can, if we use the two-phase locking
protocol. The two-phase locking protocol consists of two phases:
Phase 1: The lock acquisition phase: If a transaction T wants to read an object, it
needs to obtain an S (shared) lock. If transaction T wants to modify an object,
it needs to obtain an X (exclusive) lock. No conflicting locks are granted to a
transaction. New locks on items can be acquired, but no lock can be
released, till all the locks required by the transaction have been obtained.

Phase 2: The lock release phase: The existing locks can be released in any order, but no
new lock can be acquired after a lock has been released. The locks are held
only till they are required.
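As a small illustration (our own sketch, not from the unit), the rule "no new lock after the first release" can be checked per transaction as follows; the class and method names are illustrative.

class TwoPhaseViolation(Exception):
    pass

class TransactionLocks:
    def __init__(self, name):
        self.name = name
        self.held = set()        # items currently locked by this transaction
        self.shrinking = False   # True once the first lock has been released

    def lock(self, item):
        if self.shrinking:
            raise TwoPhaseViolation(
                f"{self.name}: cannot lock {item} after releasing a lock")
        self.held.add(item)

    def unlock(self, item):
        self.held.discard(item)
        self.shrinking = True    # transaction enters the lock release phase

t1 = TransactionLocks("T1")
t1.lock("X"); t1.lock("Y")       # lock acquisition phase
t1.unlock("X")                   # lock release phase begins
try:
    t1.lock("Z")                 # violates 2PL
except TwoPhaseViolation as err:
    print(err)                   # T1: cannot lock Z after releasing a lock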
Normally, any legal schedule of transactions that follows two-phase locking protocol is
guaranteed to be serialisable. The two-phase locking protocol has been proven for its
correctness. However, the proof of this protocol is beyond the scope of this Unit. You
can refer to further readings for more details on this protocol.
There are two types of 2PL:
(1) Conservative 2PL
(2) Strict 2PL
The conservative 2PL allows the release of a lock at any time after all the locks have
been acquired. For example, in the schedule of Figure 11, you can release the shared locks
of transaction T11 after the values of Y and Z have been read, even before the sum is
displayed. This will enhance the concurrency level. The conservative 2PL is shown
graphically in Figure 12.
Figure 12: Conservative Two-Phase Locking (number of locks held plotted against time)
However, conservative 2PL suffers from the problem that it can result in the loss of the
atomicity or isolation property of a transaction because, theoretically speaking, once a lock on
a data item is released, the item can be modified by another transaction before the first transaction
commits or aborts.
To avoid such a situation, you use strict 2PL. The strict 2PL is graphically
depicted in Figure 13. However, the basic disadvantage of strict 2PL is that it
restricts concurrency, as it holds locks on items beyond the time they are needed by a
transaction.
Figure 13: Strict Two-Phase Locking (number of locks held plotted against time)
Does 2PL solve all the problems of concurrent transactions? No. The strict 2PL solves
the problems of concurrency and atomicity; however, it introduces another problem:
deadlock. Let us discuss this problem in the next section.
……………………………………………………………………………………
……………………………………………………………………………………
2) Consider the following two transactions on two bank accounts having a balance
A and B.
Transaction T1: Transfer Rs. 100 from A to B
10.5 DEADLOCK HANDLING AND ITS PREVENTION

As seen earlier, though the 2PL protocol handles the problem of serialisability, it also
causes some problems. For example, consider the following two transactions and a
schedule involving these transactions:
TA TB
X_LOCK A X_LOCK A
X_LOCK B X_LOCK B
: :
: :
UNLOCK A UNLOCK A
UNLOCK B UNLOCK B
Schedule
TA: X_LOCK A
TB: X_LOCK B
TA: X_LOCK B
TB: X_LOCK A
As is clearly seen, the schedule causes a problem. After TA has locked A, TB locks B,
and then TA tries to lock B but is unable to do so and waits for TB to unlock B. Similarly,
TB tries to lock A but finds that it is held by TA, which has not yet unlocked it, and thus
waits for TA to unlock A. At this stage, neither TA nor TB can proceed, since both
transactions are waiting for the other to unlock the locked resource (refer to Figure 14).
Clearly, the schedule comes to a halt in its execution. The important thing to be seen
here is that both TA and TB follow the 2PL protocol, which guarantees serialisability. Whenever
such a situation arises, i.e., all the transactions are waiting for a condition that will never occur,
we say that a deadlock has occurred.
The deadlock can be described in terms of a directed graph called a "wait-for" graph,
which is maintained by the lock manager of the DBMS. This graph G is defined by the
pair (V, E). It consists of a set of vertices/nodes V and a set of edges/arcs E. Each
transaction is represented by a node, and there is an arc from Ti → Tj if Tj holds a lock on a data
item that Ti is waiting for. When transaction Ti requests a data item currently being
held by transaction Tj, the edge Ti → Tj is inserted in the wait-for graph. This
edge is removed only when transaction Tj is no longer holding the data item needed by
transaction Ti.
A deadlock in the system of transactions occurs, if and only if the wait-for graph
contains a cycle. Each transaction involved in the cycle is said to be deadlocked.
To detect deadlocks, a periodic check for cycles in “wait-for” graph can be done. For
example, the “wait-for” for the schedule of transactions TA and TB, as given earlier in
this section, is shown in Figure 14.
Figure 14: Wait-for graph for transactions TA and TB (edges TA → TB and TB → TA form a cycle)
In Figure 14, TA and TB are the two transactions. Two edges are present between
nodes TA and TB, since each is waiting for the other to unlock a resource held by the
other, forming a cycle and causing a deadlock. The above case shows a direct
cycle; however, in actual situations, a cycle may involve more than two nodes.
A deadlock is a situation that can be created because of locks. It causes transactions to
wait forever, hence the name deadlock. A deadlock occurs when all of the following
conditions hold:
• Mutual exclusion: a data item can be locked in a conflicting mode by only one transaction at a time.
• Hold and wait: a transaction holds the locks it has acquired while waiting for additional locks.
• No preemption: a lock cannot be forcibly taken away from the transaction holding it.
• Circular wait: a closed chain of transactions exists in which each transaction waits for a lock held by the next transaction in the chain.
Deadlock Prevention
One of the simplest approaches for avoiding a deadlock would be to acquire all the
locks at the start of the transaction. However, this approach restricts concurrency
greatly, also you may lock some of the items that are not updated by that transaction.
A deadlock prevention algorithm prevents a deadlock. It uses the approach: do not
allow circular wait. This approach rolls back some of the transactions instead of
letting them wait.
There exist two such schemes. These are:
• Wait-die scheme: If the transaction requesting a lock is older than the transaction holding it, the requester is allowed to wait; if it is younger, the requester is rolled back (it "dies").
• Wound-wait scheme: If the transaction requesting a lock is older than the transaction holding it, the holder is rolled back (it is "wounded"); if the requester is younger, it waits.

Figure: Wait-die scheme for the request sequence given below — T1 waits, while T3 dies.
For example, consider three transactions T1, T2 and T3 with TSP(T1) < TSP(T2) <
TSP(T3), i.e. T1 is the oldest and T3 is the youngest. Consider the following sequence of
requests for a data item X under the wound-wait scheme:

T2 requests X    Action: T2 locks the resource.
T1 requests X    Action: T2 is wounded (rolled back), as T1 is older.
T3 requests X    Action: T3 waits, as it is younger.

Figure: Wound-wait scheme for the same sequence — T1 wounds T2, while T3 waits.
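The two decisions can be sketched as simple functions (our own illustration, not from the unit); smaller timestamp means older transaction.

def wait_die(ts_requester, ts_holder):
    # Older requester waits; younger requester is rolled back (dies).
    return "requester waits" if ts_requester < ts_holder else "requester dies"

def wound_wait(ts_requester, ts_holder):
    # Older requester wounds (rolls back) the younger holder; younger requester waits.
    return "holder is wounded" if ts_requester < ts_holder else "requester waits"

# The example above: TSP(T1)=1 < TSP(T2)=2 < TSP(T3)=3, and T2 holds X.
print(wait_die(1, 2))     # requester waits      (T1 requests X under wait-die)
print(wait_die(3, 2))     # requester dies       (T3 requests X under wait-die)
print(wound_wait(1, 2))   # holder is wounded    (T1 requests X under wound-wait)
print(wound_wait(3, 2))   # requester waits      (T3 requests X under wound-wait)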
It is important to ensure that no transaction gets rolled back repeatedly such that
it is never allowed to make progress; this is referred to as starvation of a transaction.
Both the wait-die and wound-wait schemes avoid starvation, as a rolled-back transaction
is restarted with its original timestamp. The number of aborts and rollbacks tends to be
higher in the wait-die scheme than in the wound-wait scheme. One major problem with both
schemes, however, is that they may result in unnecessary rollbacks. You can refer to further
readings for more details on deadlock-related schemes.
10.6 OPTIMISTIC CONCURRENCY CONTROL

Is locking the only way to prevent concurrency-related problems? No, there exist other
methods too. One such method is called optimistic concurrency control. Let us discuss it
in more detail in this section. Under this scheme, a transaction passes through the following
three phases:
a) READ Phase: In this phase, a transaction makes a local copy of the data items
needed by it in its own memory space. The transaction then makes changes in
the local copies of data items.
b) VALIDATE Phase: In this phase, the transaction validates the values of data
items read by the transaction during the READ phase. This validation ensures
that no other transaction has changed the values of the data items, while this
transaction was modifying the local copy of the data items. In case, the validation
is unsuccessful, then this transaction is aborted, and the local updates of data
items are discarded. However, in case of successful validation, the Write phase
is performed.
c) WRITE Phase: This phase is performed if the Validate Phase is successful. In
this phase, the transaction commits and all the data item updates made by the
transaction in the local copies, are applied to the database.
To explain the optimistic concurrency control, the following terms are used:
• Write set (WS(T)): Given a transaction T, WS(T) is the set of data items that
would be written by it.
• Read set (RS(T)): Given a transaction T, RS(T) is the set of data items that
would be read by it.
More details on this scheme are available in the further readings, but let us show the
scheme here with the help of the following examples. Consider the read and write sets of
transactions T1 and T2:
T1                                                      T2
Phase      Operation                                    Phase      Operation
-          -                                            Read       Reads RS(T2), say variables X and Y,
                                                                   and performs updating of the local values
Read       Reads RS(T1), say variables A and B,         -          -
           and performs updating of the local values
Validate   Validates the values of RS(T1)               -          -
-          -                                            Validate   Validates the values of RS(T2)
Write      Transaction commits and writes the           -          -
           updated values WS(T1) in the database
-          -                                            Write      Transaction commits and writes the
                                                                   updated values WS(T2) in the database
In this example, both T1 and T2 commit. Please note that the read sets RS(T1) and
RS(T2) are disjoint, and the write sets are also disjoint; thus, no concurrency-related
problem occurs.
In a second example, where the read and write sets of the transactions overlap, both T1 and
T2 get aborted, as they fail during the validation phase, while only T3 is committed.
Optimistic concurrency control performs its checking at the transaction commit point, in the
validation phase. The serialisation order is determined by the time of the transaction
validation phase.
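A common way to implement the validation step is to compare the read set of the committing transaction with the write sets of transactions that committed while it was running; the sketch below is our own simplified illustration of that idea, not the unit's algorithm.

def validate(read_set, overlapping_committed_write_sets):
    # Return True if the transaction may commit, False if it must abort.
    for ws in overlapping_committed_write_sets:
        if read_set & ws:          # some item read here was overwritten meanwhile
            return False
    return True

# T1 read {A, B}; T2 read {X, Y}. Their read and write sets are disjoint,
# so both validations succeed, as in the first example above.
print(validate({'A', 'B'}, [{'X', 'Y'}]))   # True  -> T1 commits
print(validate({'X', 'Y'}, [{'A', 'B'}]))   # True  -> T2 commits
print(validate({'X'}, [{'X'}]))             # False -> abort (conflict on X)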
10.7 TIMESTAMP-BASED PROTOCOLS

A timestamp is used to identify the sequence of transactions. This section explains the
timestamp-based concurrent execution of transactions and the multi-version technique.

10.7.1 Timestamp-Based Concurrency Control

In this scheme, every transaction is assigned a unique timestamp when it starts, and two
timestamps are maintained for each data item:
1. Read timestamp: Read timestamp of a data item, say X, is the highest timestamp
of the transaction that has read that data item.
2. Write timestamp: Write timestamp of a data item is the highest timestamp of
the transaction that has written that data item.
Assume that a data item Di has a read timestamp rDSi and a write timestamp wDSi, and
a transaction with timestamp trSi makes a request on Di; then the basic timestamp-ordering
rules are:
• READ Di: If trSi < wDSi, the data item has already been overwritten by a younger
transaction, so the read is rejected and the requesting transaction is rolled back;
otherwise, the read is allowed and rDSi is set to the maximum of rDSi and trSi.
• WRITE Di: If trSi < rDSi or trSi < wDSi, a younger transaction has already read or
written the data item, so the write is rejected and the requesting transaction is rolled
back; otherwise, the write is allowed and wDSi is set to trSi.
For example, consider the transactions T1 with timestamp 1, T2 with timestamp 2 and
T3 with timestamp 3. A data item D1 is read by these transactions in the sequence T2,
T1 and T3. In addition, they request to write the data item D1 in the sequence T3, T1,
T2. Then how are these sequences allowed or rejected? The outcomes follow from the
rules given above; you can observe that most of the rules get applied in this example, as
traced in the sketch that follows.
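Below is a small sketch (our own, applying the standard basic timestamp-ordering rules stated above) that traces the example sequence for data item D1; the allowed/rejected outcomes in the comments follow from those rules rather than from a figure in the unit.

class DataItem:
    def __init__(self):
        self.r_ts = 0   # highest timestamp that has read the item
        self.w_ts = 0   # highest timestamp that has written the item

    def read(self, ts):
        if ts < self.w_ts:                     # item overwritten by a younger transaction
            return "rejected - roll back"
        self.r_ts = max(self.r_ts, ts)
        return "allowed"

    def write(self, ts):
        if ts < self.r_ts or ts < self.w_ts:   # a younger transaction read or wrote it
            return "rejected - roll back"
        self.w_ts = ts
        return "allowed"

d1 = DataItem()
print(d1.read(2), d1.read(1), d1.read(3))    # allowed allowed allowed
print(d1.write(3))                           # allowed (w_ts becomes 3)
print(d1.write(1))                           # rejected - roll back (T1 is too old)
print(d1.write(2))                           # rejected - roll back (T2 is too old)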
10.7.2 Multi-version Technique

The multi-version technique, as the name suggests, allows previous versions of data to
be stored; thus, this scheme avoids the rollback of transactions in many cases. This scheme also
uses a read timestamp and a write timestamp for each version of a data item. The technique
can be described as follows:
Consider a data item Di and its versions Di(Ver1), Di(Ver2), …, Di(Vern). For each
version Di(Veri), a read timestamp rDSi(Veri) and a write timestamp wDSi(Veri) are stored.
The read timestamp of a version is the highest timestamp of the transactions that have read
that version, whereas the write timestamp is the timestamp of the transaction that created
that version.
In this technique, a read operation reads the version whose write timestamp is the largest
one that is less than or equal to the timestamp of the reading transaction. For a write
operation by a transaction with timestamp trSi on a data item Di that has k versions, let
Di(Verj) be the version whose write timestamp is the largest one less than or equal to trSi;
the following three cases are possible:
• If trSi < rDSi(Verj), a younger transaction has already read that version, so the write is rejected and the transaction is rolled back.
• If trSi = wDSi(Verj), the transaction is overwriting its own version, so the contents of that version are simply updated.
• Otherwise, a new version of Di is created, with its read and write timestamps set to trSi.
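The sketch below is our own illustration of these standard multi-version timestamp-ordering cases; the dictionary-based version records are an assumed representation, not a real DBMS structure.

def pick_version(versions, ts):
    # Return the version with the largest write timestamp <= ts.
    candidates = [v for v in versions if v["w_ts"] <= ts]
    return max(candidates, key=lambda v: v["w_ts"])

def mv_write(versions, ts, value):
    v = pick_version(versions, ts)
    if ts < v["r_ts"]:
        return "rejected - roll back"          # a younger transaction read this version
    if ts == v["w_ts"]:
        v["value"] = value                     # overwrite the transaction's own version
        return "overwritten"
    versions.append({"w_ts": ts, "r_ts": ts, "value": value})   # create a new version
    return "new version created"

versions = [{"w_ts": 0, "r_ts": 2, "value": 10000}]   # existing version, read by transaction 2
print(pick_version(versions, 1)["value"])   # 10000 - transaction 1 reads the old version
print(mv_write(versions, 1, 9000))          # rejected - roll back (r_ts = 2 > 1)
print(mv_write(versions, 3, 9000))          # new version created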
10.8 WEAK LEVEL OF CONSISTENCY AND SQL COMMANDS FOR TRANSACTIONS

SQL allows a transaction to be run at one of four isolation levels, which trade strictness of
consistency for concurrency:

SERIALISABLE: In this isolation level, the transactions follow the ACID properties,
thereby eliminating the problems due to concurrent execution of transactions, but
restricting concurrency.
REPEATABLE READ: This isolation level allows the reading of only committed data items,
and repeated reads of the same data item by a transaction return the same value. However,
a concurrently committed transaction may still add or remove rows, so repeated queries may
see such newly added or removed (phantom) rows.
READ COMMITTED: This isolation level also allows the reading of data items only
after the related transaction has been committed. However, if a transaction repeats a
read operation of a specific data item, then it is not guaranteed to obtain the same value
for that data item.
READ UNCOMMITTED: This is the weakest isolation level and does not put any
restriction on the concurrent execution of transactions. However, it may result in every
concurrency-related problem.
To set a particular isolation level for transaction execution, you may use the standard SQL
command, for example:

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

(READ COMMITTED may be replaced by SERIALIZABLE, REPEATABLE READ or READ
UNCOMMITTED, as required). Some DBMSs also allow the isolation level of a session to be
changed using an ALTER SESSION command. You can commit a transaction by using COMMIT
or roll it back by using ROLLBACK within the transaction. You can refer to further readings for
a detailed discussion on weak consistency levels and the SQL commands related to transaction
control.
T1: S_LOCK A -- --
-- T2: X_LOCK B --
-- T2: S_LOCK A --
-- -- T3: X_LOCK C
-- T2: S_LOCK C --
T1: S_LOCK B -- --
T1: S_LOCK A -- --
-- -- T3: S_LOCK A
All the unlocking requests start from here
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
…………………………………………………………………………………
…………………………………………………………………………………
3) What are the different types of timestamps for a data item? Why are they used?
…………………………………………………………………………………
…………………………………………………………………………………
10.9 SUMMARY
In this unit you have gone through the concepts of transaction and Concurrency
Management. A transaction is a sequence of many actions. Concurrency control deals
with ensuring that two or more transactions do not get into each other’s way, i.e.,
updates of data items made by one transaction do not affect the updates made by the
other transactions on the same data items.
Concurrency control is usually done via locking. If a transaction tries to lock a resource
already locked by some other transaction, it cannot do so and waits for the resource to be
unlocked. This unit also presents the two-phase locking (2PL) protocol, which ensures
that transactions do not encounter concurrency-related problems. A system is in a
deadlock state if there exists a set of transactions in which each transaction is waiting for
another transaction in the set to complete. We can use a deadlock prevention protocol to
ensure that the system never enters a deadlock state.
10.10 SOLUTIONS/ANSWERS
A transaction can update more than one data value. Some transactions can write data
without reading any data value.

A simple transaction example may be: updating the stock inventory of an item
that has been issued.
1) There are six possible results, corresponding to six possible serial schedules:
2)
Schedule            T1                  T2
READ A              READ A
A = A - 100         A = A - 100
WRITE A             WRITE A
READ A                                  READ A
READ B              READ B
READ B                                  READ B
RESULT = A * B                          RESULT = A * B
DISPLAY RESULT                          DISPLAY RESULT
B = B + 100         B = B + 100
WRITE B             WRITE B
Please make the precedence graph and find out that the schedule is not serialisable.
3) This is a schedule using conservative 2PL.

Schedule            T1                  T2
Lock A              Lock A
Lock B              Lock B
Read A              Read A
A = A - 100         A = A - 100
Write A             Write A
Unlock A            Unlock A
Lock A                                  Lock A: granted
Lock B                                  Lock B: waits
Read B              Read B
B = B + 100         B = B + 100
Write B             Write B
Unlock B            Unlock B            (the waiting Lock B request of T2 is now granted)
Read A                                  Read A
Read B                                  Read B
Result = A * B                          Result = A * B
Display Result                          Display Result
Unlock A                                Unlock A
Unlock B                                Unlock B
You should also construct this schedule using shared and exclusive locks, as well as a
schedule under strict 2PL.
1) Transaction T1 gets the shared lock on A; T2 gets the exclusive lock on B and a
shared lock on A; transaction T3 gets the exclusive lock on C. T1 then waits for T2
(which holds the exclusive lock on B), and T2 waits for T3 (which holds the exclusive
lock on C). The wait-for graph for the given schedule is therefore T1 → T2 → T3;
since it contains no cycle, there is no deadlock.
2) The basic philosophy for optimistic concurrency control is the optimism that
nothing will go wrong, so let the transaction interleave in any fashion, but to
avoid any concurrency-related problem, you just validate your assumption
before you make changes permanent. This is a good model for situations
having a low rate of transactions.
3) Two basic timestamps are associated with a data item: a read timestamp and a
write timestamp. They are primarily used to ensure that a transaction reading a
data item is not so old that some younger transaction has already written it, and
that a transaction wanting to write a data item is not older than the transactions
that have already read or written it.
UNIT 11 DATABASE RECOVERY AND
SECURITY
11.0 Introduction
11.1 Objectives
11.2 What Is Recovery?
11.2.1 Kinds of Failures
11.2.2 Storage Structures for Recovery
11.2.3 Recovery and Atomicity
11.2.4 Transactions and Recovery
11.2.5 Recovery in Small Databases
11.3 Transaction Recovery
11.3.1 Log-Based Recovery
11.3.2 Checkpoints in Recovery
11.3.3 Recovery Algorithms
11.3.4 Recovery with Concurrent Transactions
11.3.5 Buffer Management
11.3.6 Remote Backup Systems
11.4 Security in Commercial Databases
11.4.1 Common Database Security Failures
11.4.2 Database Security Levels
11.4.3 Relationship Between Security and Integrity
11.4.4 Difference Between Operating System And Database Security
11.5 Access Control
11.5.1 Authorisation of Data Items
11.5.2 A Basic Model of Database Access Control
11.5.3 SQL Support for Security and Recovery
11.6 Audit Trails in Databases
11.7 Summary
11.8 Solutions/Answers
11.0 INTRODUCTION
In the previous unit of this block, you have gone through the details of transactions, their
properties, and the management of concurrent transactions. In this unit, you will be
introduced to two important issues relating to database management systems – how to
deal with database failures and how to handle database security. A computer system
suffers from different types of failures. A DBMS controls very critical data of an
organisation and, therefore, must be reliable. However, the reliability of the database
system is also linked to the reliability of the computer system on which it runs. The
types of failures that the computer system is likely to be subjected to include failures of
components or subsystems, software failures, power outages, accidents, unforeseen
situations and natural or man-made disasters. Database recovery techniques are
methods of restoring the database to the most recent possible consistent state. Thus,
the basic objective of the recovery system is to bring the database system back to a consistent
state close to the point of failure, with almost no loss of information. Further, the recovery cost should be
justifiable. In this unit, we will discuss various types of failures and some of the
approaches to database recovery.
The second main issue that is being discussed in this unit is Database security.
“Database security” is the protection of the information contained in the database against
unauthorised access, modification, or destruction. The first condition for security is to
have Database integrity. “Database integrity” is the mechanism that is applied to ensure
that the data in the database is consistent. In addition, the unit discusses various access
control mechanisms for database access. Finally, the unit introduces the use of audit
trails in a database system.
11.1 OBJECTIVES
11.2 WHAT IS RECOVERY?

The recovery mechanisms must ensure that a consistent state of the database can be
restored under all circumstances. In case of transaction abort or deadlock, the system
remains in control and can deal with the failure, but in case of a system failure, the
system loses control because the computer itself has failed. Will the results of such
failure be catastrophic? A database contains a huge amount of useful information, and
any system failure should be recognised on the system restart. The DBMS should
recover from any such failures. Let us first discuss the kinds of failure for ascertaining
the approach of recovery.
11.2.1 Kinds of Failures

A DBMS may encounter failures. These failures may be of the following types:
2. System crash: This kind of failure includes hardware/software failure of a
computer system. In addition, a sudden power failure can also result in a system crash,
which may result in the following:
• Loss or corruption of non-volatile storage contents.
• Loss of contents of the entire disk or parts of the disk. However, such loss is
assumed to be detectable; for example, the checksums used on disk
drives can detect this failure.
11.2.2 Storage Structures for Recovery
All these failures result in the inconsistent state of a database. Thus, we need a
recovery scheme in a database system, but before we discuss recovery, let us briefly
define the storage structure from the recovery point of view.
There are various ways for storing information for database system recovery. These
are:
Volatile storage: Volatile storage does not survive system crashes. Examples of
volatile storage are - the main memory or cache memory of a database server.
Non-volatile storage: The non-volatile storage survives the system crashes if it does
not involve disk failure. Examples of non-volatile storage are - magnetic disk,
magnetic tape, flash memory, and non-volatile (battery-backed) RAM.
Stable storage: This is a mythical form of storage structure that is assumed to
survive all failures. This storage structure is assumed to maintain multiple copies on
distinct non-volatile media, which may be independent disks. Further, data loss in
case of disaster can be protected by keeping multiple copies of data at remote sites. In
practice, software failures are more common than hardware failures. Fortunately,
recovery from software failures is much quicker.
11.2.3 Recovery and Atomicity
The concept of recovery relates to the atomic nature of a transaction. Atomicity is the
property of a transaction, which states that a transaction is a complete unit. Thus, the
execution of a part transaction can lead to an inconsistent state of the database, which
may require database recovery. Let us explain this with the help of an example:
Assume that a transaction transfers Rs.2000/- from A’s account to B’s account. For
simplicity, we are not showing any error checking in the transaction. The transaction
may be written as:
Transaction T1:
READ A
A = A – 2000
WRITE A
Failure
READ B
B = B + 2000
WRITE B
COMMIT
What would happen if the transaction failed after account A has been written back to the
database? As far as the holder of account A is concerned s/he has transferred the money
but that has never been received by account holder B.
Why did this problem occur? Because, although a transaction is atomic, it has a life
cycle during which the database passes through inconsistent states, and the failure occurred
at such a stage.
What is the solution? In this case, where the transaction has not yet committed the
changes made by it, the partial updates need to be undone.
11.2.4 Transactions and Recovery

The basic unit of recovery is a transaction. But how are transactions handled during
recovery? Consider the following two cases:
i. Consider that some transactions are deadlocked, then at least one of these
transactions must be restarted to break the deadlock, and thus, the partial
updates made by this restarted transaction program are to be undone to keep the
database in a consistent state. In other words, you may ROLLBACK the effect
of a transaction.
ii. A transaction has committed, but the changes made by the transaction have not
been communicated to the physical database on the hard disk. A software
failure now occurs, and the contents of the CPU/ RAM are lost. This leaves the
database in an inconsistent state. Such failure requires that on restarting the
system the database be brought to a consistent state using redo operation. The
redo operation performs the changes made by the transaction again to bring the
system to a consistent state. The database system can then be made available to
the users. The point to be noted here is that such a situation has occurred as
database updates are performed in the buffer memory. Figure 1 shows cases of
undo and redo.
Figure 1: Cases of UNDO and REDO (registers: RA = 4000, RB = 10000; database buffer values: TA = ?, TB = ?; database values: A = 6000, B = 8000)

Case 1: … the values are made irrelevant. You cannot determine whether TA and TB have been
written back to A and B. You must UNDO the transaction.
Case 2: A = 4000, B = 8000; TA = 4000, TB = 8000. The value of A in the physical database
has been updated due to buffer management. RB has not been written back to the buffer (TB).
The transaction has not committed so far, and now the transaction T1 aborts.
You must UNDO the transaction.

Case 3: A = 6000, B = 8000; TA = 4000, TB = 10000; COMMIT. The value of B in the physical
database has not been updated due to buffer management, but T1 has raised the COMMIT flag.
The changes of the transaction must be performed again.
You must REDO the transaction. How? (This is discussed in later sections.)
11.2.5 Recovery in Small Databases

Failures can be handled using different recovery techniques that are discussed later in
the unit. But the first question is: Do you really need recovery techniques as a failure
control mechanism? The recovery techniques are somewhat expensive both in terms of
time and memory space for small systems. In such a case, it is beneficial to avoid
failures by some checks instead of deploying recovery techniques to make the database
consistent. Also, recovery from failure involves manpower that can be used in other
productive work if failures can be avoided. It is, therefore, important to find some
general precautions that help control failures. Some of these precautions may be:
• to regulate the power supply.
• to use a failsafe secondary storage system such as RAID.
• to take periodic database backups and keep track of transactions after each
recorded state.
• to properly test transaction programs prior to use.
• to set important integrity checks in the databases as well as user interfaces.
However, it may be noted that if the database system is critical to an organisation, it
must use a DBMS that is suitably equipped with recovery procedures.
11.3 TRANSACTION RECOVERY

11.3.1 Log-Based Recovery

Let us first define the term transaction log in the context of a DBMS. A transaction log,
in a DBMS, records information about every transaction that modifies any data values in
the database. A log contains the following information about a transaction:
• A transaction BEGIN marker.
• Transaction identification - transaction ID, terminal ID, user ID, etc.
• The operations being performed by the transaction such as UPDATE, DELETE,
INSERT.
• The data items or objects affected by the transaction - may include the table's
name, row number and column number.
• Before or previous values (also called UNDO values) and after or changed
values (also called REDO values) of the data items that have been updated.
• A pointer to the next transaction log record, if needed.
• The COMMIT marker of the transaction.
But how do we recover using a log? Let us demonstrate this with the help of an example
having three concurrent transactions that are active on ACCOUNTS relation:
Assume that these transactions have the following log file (hypothetical structure):
Now assume that a failure occurs at this point of time; how will the recovery of the
database be done on restart?
The selection of REDO or UNDO for a transaction for the recovery is done based on
the state of the transactions. This state is determined in two steps:
• Look into the log file and find all the transactions that have started. For example,
in Figure 3, transactions T1, T2 and T3 are candidates for recovery.
• Find those transactions that have COMMITTED. REDO these transactions. All
other transactions have not COMMITTED, so they should be rolled back, so
UNDO them. For example, in Figure 3, UNDO will be performed on T1 and
T2, and REDO will be performed on T3.
Please note that in Figure 4, some of the values may not have yet been communicated
to the database, yet we need to perform UNDO as we are not sure what values have been
written back to the database. Similarly, you must perform REDO operations on
committed transactions, such as Transaction T3 in Figure 3 and Figure 4.
But how will the system recover? Once the recovery operation has been specified, the
system just determines the required REDO or UNDO values from the transaction log
and changes the inconsistent state of the database to a consistent state. (Please refer to
Figure 3 and Figure 4).
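The idea can be sketched as follows (our own simplified illustration, not the full algorithm of any particular DBMS), using a simple dictionary representation of log records with undo (old) and redo (new) values:

# Simplified restart recovery: transactions with a COMMIT record in the log are
# redone; transactions that started but did not commit are undone.

def recover(log_records, database):
    committed = {r["txn"] for r in log_records if r["op"] == "COMMIT"}
    started = {r["txn"] for r in log_records if r["op"] == "BEGIN"}

    # UNDO uncommitted transactions, scanning the log backwards.
    for r in reversed(log_records):
        if r["op"] == "UPDATE" and r["txn"] in started - committed:
            database[r["item"]] = r["undo"]

    # REDO committed transactions, scanning the log forwards.
    for r in log_records:
        if r["op"] == "UPDATE" and r["txn"] in committed:
            database[r["item"]] = r["redo"]
    return database

log = [
    {"txn": "T1", "op": "BEGIN"},
    {"txn": "T1", "op": "UPDATE", "item": "X", "undo": 10000, "redo": 9000},
    {"txn": "T2", "op": "BEGIN"},
    {"txn": "T2", "op": "UPDATE", "item": "Z", "undo": 20000, "redo": 19000},
    {"txn": "T1", "op": "COMMIT"},      # T1 committed, T2 did not
]
print(recover(log, {"X": 10000, "Z": 19000}))   # {'X': 9000, 'Z': 20000}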
11.3.2 Checkpoints in Recovery

Let us consider several transactions, which are shown on a timeline, with their
respective START and COMMIT times (see Figure 5).
Figure 5: Transactions T1, T2, T3 and T4 on a timeline, with a failure at time tf
Figure 6: Transactions T1, T2, T3 and T4 on a timeline, with a checkpoint at time t1 and a failure at time t2
A checkpoint is taken at time t1, and a failure occurs at time t2. Checkpoint transfers
all the committed changes to the database and all the system logs to stable storage (the
storage that would not be lost). On the restart of the system after the failure, the stable
checkpointed state is restored. Thus, we need to REDO or UNDO only those
transactions that have been completed or started after the checkpoint has been taken. A
disadvantage of this scheme is that the database would not be available when the
checkpoint is being taken. In addition, some of the uncommitted data values may be put
in the physical database. To overcome the first problem, the checkpoints should be
taken at times when the system load is low. To avoid the second problem, the system
may allow the ongoing transactions to be complete while not starting any new
transactions.
In the case of Figure 6, the recovery from failure at time t2 will be as follows:
• The transaction T1 will not be considered for recovery, as the changes made by
it have already been committed and transferred to the physical database at
checkpoint t1.
• The transaction T2 has not committed till checkpoint t1 but has committed
before t2 will be REDONE.
• T3 must be UNDONE as the changes made by it before the checkpoint (we do
not know for sure if any such changes were made prior to the checkpoint) must
have been communicated to the physical database. T3 must be restarted with a
new name.
• T4 started after the checkpoint, and if we strictly follow the scheme in which
the buffers are written back only on the checkpoint, then nothing needs to be
done except restart the transaction T4 with a new name.
The restart of a transaction requires the log to keep information on the new name of the
transaction. This new transaction may be given higher priority.
But one question that remains unanswered is - during a failure, we lose database
information in RAM buffers; we may also lose the content of the log as it is also stored
in RAM buffers, so how does the log ensure recovery?
The answer to this question lies in the fact that for storing the log of the transaction, we
follow a Write Ahead Log Protocol. As per this protocol, the transaction logs are
written to stable storage as follows:
• UNDO portion of the log is written to stable storage prior to any updates.
and
• REDO portion of the log is written to stable storage prior to the commit.
Log-based recovery scheme can be used for any kind of failure provided you have stored
the most recent checkpoint state and most recent log as per write-ahead log protocol
into the stable storage. Stable storage from the viewpoint of external failure requires
more than one copy of such data at more than one location. You can refer to further
readings for more details on recovery and its techniques.
1) What is the need for recovery? What is the basic unit of recovery?
……………………………………………………………………………………
……………………………………………………………………………………
11.3.3 Recovery Algorithms

As discussed earlier, a database failure may bring the database into an inconsistent
state, as many ongoing transactions will simply abort. In order to bring such an
inconsistent database back to a consistent state, you are required to use recovery algorithms.
Database recovery algorithms require basic data related to failed transactions. Thus, a
recovery algorithm requires the following actions:
1) Actions are taken to collect the required information for recovery while the
transactions are being processed prior to the failure.
2) Actions are taken after a failure to recover the database contents to a state
that ensures atomicity, consistency and durability.
In the context of point 1 above, it may be noted that the information about the changes
made by a transaction is recorded in a log file. In general, the sequence of logging
during the transaction execution is as follows:
• A transaction, say Ti, announces its start by putting a log record consisting of
<Ti start>.
• Before Ti executes the write(X) (see Figure 7), put a log record <Ti, X, V1,
V2>, where V1 represented the older value of X, i.e., the value of X before the
write (undo value), and V2 is the updated value of X (redo value).
• On successful completion of the last statement of the transaction Ti, a log
record consisting of <Ti commit> is put in the log. It may be noted that all
these log records are put on stable storage media, that is, they are not merely
buffered in the main memory.
Two approaches for recovery using logs are:
• Deferred database modification.
• Immediate database modification.
Deferred Database Modification
This scheme logs all the changes made by a transaction into a log file, which is stored
on stable storage. Further, these changes are not made in the database till the
transaction commits. Let us assume that transactions execute serially to simplify the
discussion. Consider a transaction Ti, it performs the following sequence of log file
actions:
Event                      Record for log file     Comments
Start of transaction       <Ti start>
write(X)                   <Ti, X, V>              Transaction writes a new value V for data item X in the database.
Transaction Ti commits     <Ti commit>             The transaction has been committed, and now the deferred updates can be written to the database.
• Please note that in this approach, the old values of the data items are not
saved at all, as changes are being recorded in the log file and the database is
not changed till a transaction commits. For example, the write(X), shown
above, does not write the value of X in the database till the transaction
commits. However, the record <Ti, X, V > is written to the log. All these
updates are written to the database after the transaction commits.
How is the recovery performed for the deferred database modification scheme?
For the database recovery after the failure, the log file is checked. The transactions
for which the log file contains the <Ti start> and the <Ti commit> records, the
REDO operation is performed using log record <Ti, X, V >. Why? Because in
the deferred update scheme we do not know if the changes made by this
committed transaction have been carried out in the physical database or not.
The Redo operation can be performed many times without the loss of information, so
it will be applicable even if the crash occurs during recovery. The transactions for
which <Ti start> is in the log file but not the <Ti commit>, the transaction UNDO is
not required, as it is expected that the values are not yet written to the database.
Let us explain this scheme with the help of the transactions given in Figure 7.
Figure 8 shows the state of a sample log file for three possible failure instances, namely
(a), (b) and (c). (Assuming that the initial balance in X is 10,000/-, Y is 5,000/- and Z
is 20,000/-.)
Figure 8: Log Records for Deferred Database Modification for Transactions of Figure 7
Why do you store only the Redo value in this scheme? The reason is that No UNDO is
required as updates are communicated to stable storage only after COMMIT or even
later. The following REDO operations would be required if the log on stable storage
at the time of the crash is as shown in Figure 8(a) 8(b) and 8(c).
Immediate Database Modification

In the immediate database modification scheme, updates may be written to the database before
the transaction commits; therefore, both the undo (old) and redo (new) values are recorded in
the log, and recovery may require both UNDO and REDO operations.

Example: Consider the log as it appears at three instances of time.
For each of the failures as shown in Figure 10 (a), (b) and (c), the following recovery
actions would be needed:
(a) UNDO (T1):
    X ← undo value of X, i.e. 10000;
    Y ← undo value of Y, i.e. 5000.
(b) UNDO (T2):
    Z ← undo value of Z, i.e. 20000;
    REDO (T1):
    X ← redo value of X, i.e. 9000;
    Y ← redo value of Y, i.e. 6000.
(c) REDO (T1, T2) by moving forward through the log:
    X ← redo value of X, i.e. 9000;
    Y ← redo value of Y, i.e. 6000;
    Z ← redo value of Z, i.e. 19000.
• Transaction log sequence numbers for log records may be assigned to log
entries. This will help in identifying the related database log pages.
• Instead of deleting the records from the physical database, record the deletion
in the log.
• Keep track of pages that have been updated in memory but have not been
written back to the physical database. Perform Redo operations for only such
pages. Also, keep track of updated pages during checkpointing.
You may refer to further readings for more details on newer methods and algorithms
of recovery.
11.3.4 Recovery with Concurrent Transactions
In general, a commercial database management system, such as a banking system,
has many users. These users can perform multiple transactions. Therefore, a
centralised database system executes many concurrent transactions. As discussed in
Unit 10, these transactions may experience concurrency-related issues and, thus, are
required to maintain serialisability. In general, these transactions share a large buffer
area and log files. Thus, a recovery scheme that allows better control of disk buffers
and large log files should be employed for such systems. One such technique is
called checkpointing. This changes the extent to which REDO or UNDO operations
are to be performed in a database system. The concept of the checkpoint has already
been discussed in section 11.3.2. How checkpoints can help in the process of
recovery is explained below:
When you use the checkpoint mechanism, you also add records of checkpoints in the
log file. A checkpoint record is of the form: <checkpoint TL>. In this case, TL is
the list of transactions, which were started, but not yet committed at the time when the
checkpoint was created. Further, we assume that at the time of creating a checkpoint
record, no transaction was allowed to proceed.
On the restart of the database system after failure, the following steps are performed
for the database recovery:
• Create two lists: UNDOLIST and REDOLIST. Initialise both lists to empty (NULL).
• Scan the log file record by record, starting from the end of the file and moving
backwards, till you locate the most recent checkpoint record <checkpoint TL>.
For each of the log records found in this step, perform the following actions:
    o If the log record is <Ti COMMIT>, then add Ti to REDOLIST.
    o If the log record is <Ti START> and Ti is NOT in REDOLIST, then add Ti to
      UNDOLIST, as it has not been committed.
• For every Ti in TL that is NOT in REDOLIST, add Ti to UNDOLIST, as this
  transaction was active at the time of the checkpoint and has not been committed yet.
This will make the UNDOLIST and REDOLIST of the transactions. Now, you can
perform the UNDO operations followed by REDO operations using the log file, as
given in section 11.3.3.
As discussed in the previous sections, the database transactions are executed in the
memory, which contains a copy of the physical database. These database buffers are
written back to stable storage from time to time. In general, it is the memory
management service of the operating system that manages the buffers of a database
system. However, several database management systems require their own buffer
management policies and, hence, implement their own buffer manager. Some of these
strategies are:
Log Record Buffering
The recovery process requires database transactions to write logs in stable storage.
This is a very time-consuming process, as for a single transaction several log records
are to be written to stable storage. Therefore, several commercial database
management systems perform the buffering of the log file itself, which means that log
records are kept in the blocks of the main memory allocated for this purpose. The logs
are then written to the stable storage once the buffer becomes full or when a
transaction commits. This log file buffering helps in reducing disk accesses, which are
very expensive in terms of time of operation.
Database recovery requires that log records be stored in stable storage.
Therefore, log records should be transferred from the memory buffers to stable storage
as per the following scheme, called write-ahead logging (WAL).
• Log records should be transferred to stable storage in the same order in which
they were created in the memory buffer.
• A transaction should be moved to COMMIT state only if the <Ti commit>
log record is written to stable storage.
• Prior to writing a database buffer to stable storage, related log records in the
buffer should be moved to stable storage.
Database Buffering
The database updates are performed after moving database blocks on secondary
storage to memory buffers. However, due to limited memory capacity, only a few
database blocks can be kept in the memory buffers. Therefore, database buffer
management consists of policies for deciding which blocks should be kept in database
buffers and what blocks should be removed from the database buffers back to
secondary storage. Removing a database block from the buffer requires that it is re-
written to the secondary storage. In addition, the log is written to stable storage, as per
the write-ahead logging.
• The failure of the primary database site must be detected. It may be noted that
this detection should not mistake a communication failure for a failure of the
primary database. Thus, it may use fail-safe communication, with
alternative communication links to the primary database site.
• The backup site should be capable enough to work as the primary database
site at the time of failure of the primary site. In addition to recovery of
ongoing transactions, once the primary site recovers it should get all the
updates, which were performed while the primary site was down.
You may refer to the further readings for more details on this topic.
…………………………………………………………………………………
2) How does the log of the deferred database modification technique differ from the
log of the immediate database modification technique?
……………………………………………………………………………………
……………………………………………………………………………………
3) Define buffer management and remote backup system in the context of
recovery.
……………………………………………………………………………………
……………………………………………………………………………………
Weak User Account Settings: Many database user accounts do not enforce the account
settings that are commonly found in operating system environments. For example,
default account names and passwords, which are widely known, are often not disabled
or changed to prevent unauthorised access.
Insufficient Segregation of Duties: Several organisations have no established security
administrator role. This results in database administrators (DBAs) performing both the
functions of the administrator (for users' accounts), as well as the performance and
operations expert. This may result in management inefficiencies.
Inadequate Audit Trails: The auditing capabilities of a DBMS are often turned off to
gain performance or save disk space, since auditing requires keeping track of additional
information. Inadequate auditing results in reduced accountability. It also reduces the
effectiveness of data history analysis. The audit trails record information about the
actions taken on certain critical data. They log events directly associated with the data;
thus, they are essential for monitoring the access and the activities on a database system.
Unused DBMS Security Features: The security of an individual application is usually
independent of the security of the DBMS. Please note that security measures that are
built into an application apply to users of the client software only. The DBMS itself
and many other tools or utilities that can connect to the database directly through ODBC
or any other protocol may bypass this application-level security completely. Thus, you
must try to use security restrictions that are reliable, for instance, try using the security
mechanisms that are defined within the DBMS.
11.4.2 Database Security Levels
Basically, database security can be broken down into the following levels:
• Server Security
• Database Connections
• Table Access Control
Server Security: Server security is the process of controlling access to the database
server. This is the most important aspect of security and should be carefully planned.
The basic idea here is “You cannot access what you do not see”. For security
purposes, you should never let your database server be visible to the world. If a
database server is supplying information to a web server, then it should be
configured in such a manner that it allows connections from that web server only.
Such a connection would require a trusted IP address.
Trusted IP Addresses: To connect to a server through a client machine, you would
need to configure the server to allow access to only trusted IP addresses. You should
know exactly who should be allowed to access your database server. For example, if
the database server is the backend of a local application that is running on the internal
network, then it should only talk to addresses from within the internal network.
Database Connections: With the ever-increasing number of dynamic applications,
an application may allow immediate, unauthenticated updates to a database. If
you are going to allow users to make updates to a database via a web page, ensure
that you validate all such updates, so that all updates are desirable and safe. For
example, you may remove any possible SQL code from user-supplied input (a basic
safeguard against SQL injection) if a normal user is not expected to enter SQL code.
Table Access Control: Table access control is probably one of the most overlooked
but one of the very strong forms of database security because of the difficulty in
applying it. Using a table access control properly would require the collaboration of
both the system administrator as well as the database developer. In practice, however,
such “collaboration” is relatively difficult to find.
By now, we have defined some of the basic issues of database security, let us now
consider specifics of server security from the point of view of network access of the
system. Internet-based databases have been the most recent targets of security attacks.
All web-enabled applications listen on a number of ports. Cyber criminals often
perform a simple “port scan” to look for open ports among the popular default
ports used by database systems. How can we address this problem? We can simply
change the default ports on which a database service listens. Thus, this is a very
simple way to protect the DBMS from such criminals.
11.4.3 Relationship between Security and Integrity
Database security usually refers to the avoidance of unauthorised access and
modification of data of the database, whereas database integrity refers to the avoidance
of accidental loss of consistency of data. You may please note that data security deals
not only with data modification but also access to the data, whereas data integrity, which
is normally implemented with the help of constraints, essentially deals with data
modifications. Thus, enforcement of data security, in a way, starts with data integrity.
For example, any modification of data, whether unauthorised or authorised must ensure
data integrity constraints. Thus, a very basic level of security may begin with data
integrity but will require many more data controls. For example, SQL write and
UPDATE operations on specific data items or tables would be possible only if they do
not violate the integrity constraints. Further, the data controls would allow only
authorised write and UPDATE operations on these data items.
Security within the operating system can be implemented at several levels ranging from
passwords for access to the operating system to the isolation of concurrently executing
processes within the operating system. However, there are a few differences between
the security measures taken at the operating system level and those of a database
system. These are:
• Database system protects more objects, as the data is persistent in nature. Also,
database security is concerned with different levels of granularity such as files,
tuples, attribute values or indexes. Operating system security is primarily concerned
with the management and use of resources.
• Database system objects can be complex logical structures such as views, a number
of which can map to the same physical data objects. Moreover, different
architectural levels viz. internal, conceptual and external levels, have different
security requirements. Thus, database security is concerned with the semantics –
meaning of data, as well as with its physical representation. The operating system
can provide security by not allowing any operation to be performed on the database
unless the user is authorised for the operation concerned.
After this brief introduction to different aspects of database security, let us discuss one
of the important levels of database security, access control, in the next section.
All relational database management systems provide some sort of intrinsic security
mechanisms that are designed to minimise security threats, as stated in the previous
sections. These mechanisms range from the simple password protection offered in
Microsoft Access to the complex user/role structure supported by advanced relational
databases like Oracle, MySQL, Microsoft SQL Server, IBM Db2 etc. But can we define
access control for all these DBMS using a single mechanism? SQL provides that
interface for access control. Let us discuss the security mechanisms common to all
databases using the Structured Query Language (SQL).
An excellent practice is to create individual user accounts for each database user. If
users are allowed to share accounts, then it becomes very difficult to fix individual
responsibilities. Thus, it is important that we provide separate user accounts for separate
users. Does this mechanism have any drawbacks? If the expected number of database
users is small, then it is all right to give them individual usernames and passwords and
all the database access privileges that they need to have on the database items.
However, consider a situation where there are a large number of users. Specification of
access rights to all these users individually will take a long time. That is still manageable
as it may be a one-time effort; however, the problem will be compounded if we need to
change the access rights for a particular user. Such an activity would require huge
maintenance costs. This cost can be minimised if we use a specific concept called
“Roles”. A database may have hundreds of users, but their access rights may be
categorised in specific roles, for example, teacher and student in a university database.
Such roles would require the specification of access rights only once for each role. The
users can then be assigned usernames, passwords, and specific roles. Thus, the
maintenance of user accounts becomes easier as now we have limited roles to be
maintained. You may study these mechanisms in the context of specific DBMS. A role
can be defined using a set of data item/object authorisations. In the next sections, we
define some of the authorisations in the context of SQL.
Authorisation is a set of rules that can be used to determine which user has what type
of access to which portion of the database. The following forms of authorisation are
permitted on database items:
• Read authorisation, which allows reading, but not modification, of data;
• Insert authorisation, which allows insertion of new data, but not modification of existing data;
• Update authorisation, which allows modification, but not deletion, of data; and
• Delete authorisation, which allows deletion of data.
Example
Consider the relation:
Employee (Empno, Name, Address, Deptno, Salary, Assessment)
Assume there are two types of users: The personnel manager and the general user. What
access rights may be granted to each user? One extreme possibility is to grant
unconstrained access or to have limited access. One of the most influential protection
models was developed by Lampson and extended by Graham and Denning. This model
has 3 components:
1) A set of object entities to which access must be controlled.
2) A set of subject entities that request access to objects.
3) A set of access rules in the form of an authorisation matrix, as given in Figure
11 for the relation of the example.
                       Object
Subject                Empno   Name   Address   Deptno   Salary           Assessment
Personnel Manager      Read    Read   All       All      All              All
General User           Read    Read   Read      Read     Not accessible   Not accessible

Figure 11: Authorisation Matrix for Employee relation.
As the above matrix shows, Personnel Manager and General User are the two subjects.
Objects of the database are Empno, Name, Address, Deptno, Salary and Assessment.
As per the access matrix, the personnel manager can perform any operation on the
database of an employee except for updating the Empno and Name, which may be
created once and can never be changed. The general user can only read the data but
cannot update, delete or insert the data into the database. Also, the information about
the salary and assessment of the employee is not accessible to the general user.
In summary, it can be said that the basic access matrix is the representation of basic
access rules. These rules can be written using SQL statements, which are given in the
next subsection.
11.5.3 SQL Support for Security and Recovery
You would need to create the users or roles before you grant them various permissions.
The permissions then can be granted to a created user or role. This can be done with the
use of the SQL GRANT statement.
The syntax of this statement is:
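A general sketch of this statement, consistent with the description that follows (the exact privilege keywords and object types vary slightly from one DBMS product to another), is:

GRANT <privilege list>                     -- e.g. SELECT, INSERT, UPDATE, DELETE or ALL
ON <table, view or other database object>
TO <user name(s) or role name(s)>
[WITH GRANT OPTION];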
The third line specifies the user(s) or role(s) that is/are being granted permissions.
Finally, the fourth line, WITH GRANT OPTION, is optional. If this line is included in
the statement, the user is also permitted to grant the same permissions that s/he has
received to other users. Please note that the WITH GRANT OPTION cannot be
specified when permissions are assigned to a role.
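The following sketch shows how a simplified version of the authorisation matrix of Figure 11 might be expressed using GRANT and REVOKE. The role name general_user and the user name pm_gopal are hypothetical, and column-level privileges, though part of the SQL standard, are not supported by every DBMS:

-- Ordinary users: read-only access, excluding the confidential columns.
CREATE ROLE general_user;
GRANT SELECT (Empno, Name, Address, Deptno)
ON Employee
TO general_user;

-- The personnel manager is a user here, since WITH GRANT OPTION cannot be
-- specified for a role: read access on all columns, update rights on the
-- columns that may be changed.
GRANT SELECT ON Employee TO pm_gopal;
GRANT UPDATE (Address, Deptno, Salary, Assessment)
ON Employee
TO pm_gopal
WITH GRANT OPTION;

-- A granted right, or just its grant option, can later be withdrawn:
REVOKE GRANT OPTION FOR UPDATE
ON Employee
FROM pm_gopal;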
SQL does not have very specific commands for recovery, but it allows explicit
COMMIT, ROLLBACK and other related transaction-control commands.
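As a small illustration of these transaction-control commands (the Account table and the account numbers used here are hypothetical; SAVEPOINT is also part of the SQL standard, though its support varies slightly across products):

START TRANSACTION;
UPDATE Account SET balance = balance - 1000 WHERE accno = 'X';
SAVEPOINT after_debit;
UPDATE Account SET balance = balance + 1000 WHERE accno = 'Y';
-- If something goes wrong after the debit, part of the work can be undone:
--     ROLLBACK TO SAVEPOINT after_debit;
-- Otherwise, make the changes permanent (and recoverable through the log):
COMMIT;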
Check Your Progress 3
1) On what systems does the security of a Database Management System
depend?
……………………………………………………………………………………
……………………………………………………………………………………
2) Write the syntax for granting permission to alter the database.
……………………………………………………………………………………
……………………………………………………………………………………
3) Write the syntax for ‘Revoke Statement’ that revokes the grant option.
……………………………………………………………………………………
……………………………………………………………………………………
4) What is the main difference between data security and data integrity?
…………………………………………………………………………………
……..…………………………………………………………………………….
11.7 SUMMARY
In this unit, we have discussed the recovery of the data contained in a database system
after failure. Database recovery techniques are methods of making the database fault
tolerant. The aim of the recovery scheme is to allow database operations to be resumed
after a failure with no loss of information and at an economically justifiable cost. The
basic technique to implement database recovery is to use data redundancy in the form
of logs and archival copies of the database. Checkpoint helps the process of recovery.
Security and integrity concepts are crucial. The DBMS security mechanism restricts
users to only those pieces of data that are required for the functions they perform.
Security mechanisms restrict the type of actions that these users can perform on the data
that is accessible to them. The data must be protected from accidental or intentional
(malicious) corruption or destruction.
Security constraints guard against accidental or malicious tampering with data; integrity
constraints ensure that any properly authorised access, alteration, deletion, or insertion
of the data in the database does not change the consistency and validity of the data.
Database integrity involves the correctness of data, and this correctness has to be
preserved in the presence of concurrent operations. The unit also discussed the use of
audit trails.
11.8 SOLUTIONS/ANSWERS
Check Your Progress 1
1) Recovery is needed to take care of the failures that may be due to software,
hardware and external causes. The aim of the recovery scheme is to allow
database operations to be resumed after a failure with the minimum loss of
information and at an economically justifiable cost. One of the common
techniques is log-based recovery. The transaction is the basic unit of recovery.
2) All recovery processes require redundancy. Log-based recovery process
records the consistent state of the database and all the modifications made
by a transaction into a log on the stable storage. In case of any failure, the
stable log and the database states are used to create a consistent database
state.
3) A checkpoint is a point when all the database updates and logs are written to
stable storage. A checkpoint ensures that not all the transactions need to be
REDONE or UNDONE. Thus, it helps in faster recovery from failure. The
checkpoint helps in recovery, as in case of a failure all the committed
transactions prior to the checkpoint are NOT to be redone. Only non-
committed transactions at the time of checkpoint or transactions that started
after the checkpoint are required to be REDONE or UNDONE based on the
log.
UNIT 12 QUERY PROCESSING AND
EVALUATION
Structure Page Nos.
12.0 Introduction
12.1 Objectives
12.2 Query Processing: An Introduction
12.2.1 Role of Relational Algebra in Query Optimisation
12.2.2 Using Statistics and Stored Size for Cost Estimation.
12.3 Cost of Selection Operation
12.3.1 File scan
12.3.2 Index scan
12.3.3 Implementation of Complex Selections
12.4 Cost of Sorting
12.5 Cost of Join Operation
12.5.1 Block Nested-Loop Join
12.5.2 Merge-Join
12.5.3 Hash-Join
12.6 Other Operations
12.7 Representation and Evaluation of Query Expressions
12.7.1 Evaluating a Query Tree.
12.7.2 Evaluating Complex Joins
12.8 Creation of Query Evaluation Plans
12.8.1 Transformation of Relational Expressions
12.8.2 Query Evaluation Plans
12.8.3 Choosing an Optimal Evaluation Plan
12.8.4 Cost and Storage-Based Query Optimisation
12.9 View and Query Processing
12.9.1 Materialised View
12.9.2 Materialised Views and Query Optimisation
12.10 Summary
12.11 Solutions/Answers
12.0 INTRODUCTION
The Query Language – SQL is one of the main reasons for the success of RDBMS. A
user just needs to write the query in SQL, which is close to the English language, and does
not need to say how such a query is to be evaluated. However, a query needs to be
evaluated efficiently by the DBMS. But how is a query evaluated efficiently? This unit
attempts to answer this question. The unit covers the basic principles of query
evaluation, the cost of query evaluation, the evaluation of join queries, etc. in detail. It
also provides information about query evaluation plans and the role of storage in query
evaluation and optimisation. This unit introduces you to the complexity of query
evaluation in DBMS.
12.1 OBJECTIVES
After going through this unit, you should be able to:
• explain the measures of query cost;
• define algorithms for individual relational algebra operations;
• create and modify query expressions;
• define evaluation plan choices; and
• define query processing using views.
12.2 QUERY PROCESSING: AN INTRODUCTION
Before defining the measures of query cost, let us begin by defining query processing.
Figure 1 shows the steps of query processing.
Figure 1: Steps of query processing (SQL query → scanning, parsing, validation and translation → query optimisation → code generator → runtime database processor → result of the query)
In the first step, a query is scanned, parsed, validated and translated into an internal
form. This internal form may be the relational algebra (an intermediate query form).
The parser checks the syntax of the query and verifies the relations. Next, the query is
optimised, and a query execution plan is generated, which is then compiled into a code
that can be executed by the database runtime processor. The query processing involves
the study of the following concepts:
• measures to find query cost.
• algorithms for evaluating relational algebraic operations.
• evaluating a complete expression using algorithms on individual operations.
12.2.1 Role of Relational Algebra in Query Optimisation
In order to optimise the evaluation of a query, first, you must define the query using
relational algebra. A relational algebra expression may have many equivalent
expressions. For example, the relational algebraic expression σ salary < 5000 (π salary (EMP))
is equivalent to π salary (σ salary < 5000 (EMP)). This may result in generating many alternative
ways of evaluating the query.
Further, a relational algebraic expression can be evaluated in many different ways. A
detailed evaluation strategy for an expression is known as an evaluation plan. For
example, you can use an index on salary to find employees with salary < 5000, or you
can perform a complete relation scan and discard employees with salary ≥ 5000. Both
of these are separate evaluation plans. The basis of the selection of the best evaluation
plan is the cost of these evaluation plans.
Query Optimisation: The query optimisation selects the query evaluation plan with
the lowest cost among the equivalent query evaluation plans. Cost is estimated using
the statistical information obtained from the database catalogue, viz., the number of
tuples in each relation, the size of tuples, different values of an attribute, etc. The cost
estimation is made on the basis of heuristic rules.
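As a small illustration of such statistics-based estimation (a sketch using the usual textbook notation, which is not defined in this unit: n_r is the number of tuples of relation r and V(A, r) is the number of distinct values of attribute A in r), the size of a simple equality selection is commonly estimated as:

\[
\lvert \sigma_{A=v}(r) \rvert \;\approx\; \frac{n_r}{V(A,r)}
\]

For example, with hypothetical figures of 20,000 tuples and 100 distinct values of A, an equality selection on A would be estimated to return about 200 tuples.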
What is the basis for measuring the query cost? The next section addresses this question.
12.3 COST OF SELECTION OPERATION
The selection operation can be performed in several ways. Let us discuss the algorithms
and the related cost of performing selection operation.
12.3.1 File scan
File scan algorithms locate and retrieve records that fulfil a selection condition in a file.
The following are the two basic file scan algorithms for selection operation:
1) Linear search: This algorithm scans each file block and tests all records to see
whether their attributes match the selection condition.
The cost of this algorithm (in terms of block transfer): This algorithm would
require reading all the blocks of the file, as it must test all the records for the
specific condition.
Cost (to find the records that match a given criterion)
        = size of the file in terms of number of blocks
        = N_b

Cost (to find a specific value of a key attribute)
        = average number of block transfers for locating the value
          (on an average, half of the file needs to be traversed), so the cost is
        = N_b / 2

Cost (binary search)
        = cost of the block-based binary search to find the first tuple that matches the search criterion
        + cost of fetching the contiguous blocks till the records keep matching the criterion
(b) Hash key: It retrieves a single block directly; thus, the cost in the hash key
organisation is given as:
        = block transfers needed for finding the hash target + 1
2) Primary index-scan for comparison: Assuming that the relation is sorted on the
attribute(s) being compared (<, >, etc.), we need to locate the first record satisfying
the condition, after which the records are scanned forward or backwards, as the
condition requires, and all the matching records are displayed. The cost in this
case, together with a rough estimate of the number of blocks satisfying the
condition, can be computed as sketched below.
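A commonly used estimate is the following sketch; here h_i denotes the height of the index (the number of index blocks read to reach the first matching record), b the number of contiguous file blocks holding the matching records, and N_b the total number of blocks of the file, none of which are symbols defined elsewhere in this unit:

\[
\text{Cost}_{\text{primary index, comparison}} \;\approx\; h_i + b,
\qquad
b \;\approx\; \frac{\text{expected number of matching records}}{\text{number of records per block}}
\]

If no statistics are available, about half the records are often assumed to match, giving b ≈ N_b / 2.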
(b) Secondary index, comparison: For queries that use a comparison of the form
attribute > searched value on a secondary index, the index is used to find the first
index entry that is greater than the searched value; thereafter, the index is scanned
sequentially till the end, collecting the pointers to the matching records. For a
query of the ≤ type, the leaf pages of the index are scanned from the beginning,
collecting pointers to records, as long as the entries satisfy the condition, i.e.,
until the first entry that exceeds the searched value is found.
You may please note that after you have found the pointers to the records using the
index scan, retrieving those pointed records may require one block transfer for each
record. Please note that linear file scans may be cheaper if many records are to be
fetched.
Conjunctive selection using one index: In such a case, use any of the algorithms given
earlier on one or more of the conditions, and then test the remaining conditions on the
selected tuples after fetching them into the memory buffer.
3) Techniques like quicksort can be used for relations that fit in the memory.
4) External sort-merge is a good choice for relations that do not fit in the memory.
Once you decide on the sorting technique, you can find the cost of these algorithms to
find the sorted file. You may refer to further readings for more details.
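For reference, the usual textbook estimate of the block-transfer cost of external sort-merge is sketched below, where b_r is the number of blocks of the relation and M the number of available memory buffer blocks (symbols not defined in this unit):

\[
\text{Cost}_{\text{external sort-merge}} \;\approx\; b_r \left( 2 \left\lceil \log_{M-1} \left( \frac{b_r}{M} \right) \right\rceil + 1 \right) \text{ block transfers.}
\]

For instance, sorting a 500-block relation, such as the MARKS relation used in the next section, with M = 25 buffer blocks needs only one merge pass, giving about 500 × (2 × 1 + 1) = 1500 block transfers.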
…………………………………………………………………………………
…………………………………………………………………………………
3) What are the various methods adopted for performing selection operation?
…………………………………………………………………………………
…………………………………………………………………………………
The choice of join algorithm is based on the cost estimates. We will elaborate on only
a few of these algorithms in this section. The following relations and related statistics
will be used to elaborate those algorithms.
MARKS (enrolno, subjectcode, marks): 20000 rows, 500 blocks
STUDENT (enrolno, name, dob): 5000 rows, 200 blocks.
12.5.1 Block Nested-Loop Join
In this join approach, each complete block of the outer relation is joined with each
complete block of the inner relation. We have chosen STUDENT as the outer relation,
as it is smaller in size and, therefore, will result in a smaller number of overall block
transfers.
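As a rough worked example with the statistics given above (a sketch; the worst case assumes that only one block of each relation can be held in memory, and b_STUDENT and b_MARKS denote the numbers of blocks of the two relations):

\[
\text{Cost}_{\text{worst case}} \;=\; b_{\text{STUDENT}} + b_{\text{STUDENT}} \times b_{\text{MARKS}}
\;=\; 200 + 200 \times 500 \;=\; 100{,}200 \text{ block transfers}
\]

\[
\text{Cost}_{\text{best case (STUDENT fits in memory)}} \;=\; b_{\text{STUDENT}} + b_{\text{MARKS}} \;=\; 200 + 500 \;=\; 700 \text{ block transfers}
\]

Choosing MARKS as the outer relation instead would cost 500 + 500 × 200 = 100,500 block transfers in the worst case, which is why the smaller relation is chosen as the outer relation.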
12.5.2 Merge-Join
The merge-join is applicable to equijoin and natural join operations only. It has the
following process:
1) Sort both relations on the joining attribute (if not already sorted).
2) Merge the sorted relations to Join them. In this step, every pair with the same
value on the joining attribute must be matched.
For example, consider the instances of STUDENT and MARKS relations, as given in
Figure 2.
STUDENT                                    MARKS
enrolno   Name     -----                   enrolno   subjectcode   Marks
1001      Ajay     ……                      1001      MCS-211       55
1002      Aman     ……                      1001      MCS-212       75
1005      Rakesh   ……                      1002      MCS-212       90
1100      Raman    ……                      1005      MCS-215       75
Block 1 of STUDENT relation                Block 1 of MARKS relation

Figure 2: Sample Relations for Computing Join
Computation of the number of block accesses:
Join operation on enrolment number would be as follows:
i) The enrolment number 1001 of STUDENT relation will join with two tuples of
MARKS relation as:
1001 …. 1001 MCS-211
1001 …. 1001 MCS-212
ii) Joining on the key 1001 will be over as soon as 1002 is found in the MARKS
relation, as the MARKS relation is also sorted on enrolno. The join then continues
as a merge operation, and after the first two tuples for 1001, the following tuples
will be output:
1002 ….. 1002 MCS-212
1005 ….. 1005 MCS-215
You may observe that the cost of the Merge-Join operation can be computed based on
the assumption that a block that is part of the merge is read only once. The basic
underlying assumption is that the main memory is sufficiently large to accommodate
all the tuples with a specific join attribute value. Therefore, the number of block accesses
for Merge-Join is:
=Blocks of STUDENT + Blocks of MARKS + the cost of sorting on enrolno
(if relations are unsorted)
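With the statistics given earlier, and assuming that both relations are already sorted on enrolno, this works out to (a sketch):

\[
\text{Cost}_{\text{merge-join}} \;=\; b_{\text{STUDENT}} + b_{\text{MARKS}} \;=\; 200 + 500 \;=\; 700 \text{ block transfers.}
\]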
12.5.3 Hash-Join
Hash-join can be performed for both the equijoin and natural join operations. A hash
function h is applied on joining attributes to partition tuples of both relations. In the case
of STUDENT and MARKS relations, the hash function h maps joining attribute
(enrolno in our example case) values to {0, 1, ..., n-1}.
The join attribute is hashed to the join-hash partitions. In the example of Figure 3, we
have used the mod 5 function for hashing; therefore, n = 5.
Figure 3: Hash partitions of the STUDENT and MARKS relations on enrolno, created using the mod 5 hash function; tuples of the two relations with the same hash value of enrolno fall into correspondingly numbered partitions (0 to 4)
Once the partition tables of STUDENT and MARKS are made on the enrolno, then only
the corresponding partitions will participate in the join as:
A STUDENT tuple and a MARKS tuple that satisfy the Join condition will have
the same value for the join attributes. Therefore, they will be hashed to the same
partition and, thus, can be joined easily. For example, STUDENT
partition 1 consists of tuples of STUDENT with enrolno 1001, 1006; whereas
MARKS partition 1 consists of marks of the student enrolno 1001 in two
different subjects. The join can now simply be performed for enrolno 1001.
There is no joining MARKS tuple for STUDENT 1006.
• Partition the relations r and s using the same hash function h. (When partitioning
a relation, one block of memory is reserved as the output buffer for each
partition.) It may be noted that the tuples in the ith partition of r may join only with
the tuples in the ith partition of s.
• For each partition si of s, load the partition into memory and build an in-memory
hash index on the joining attribute. Why is this hash index needed? Because
several joining attribute values may get mapped into the same partition, for
example, partition 0 and partition 1 in Figure 3.
• Read partition ri of r into memory, block by block. For each tuple in ri, use the
in-memory hash index of si to find the tuples of si that join with that tuple of ri,
and output the joined results.
The value n (the number of partitions) and the hash function h are chosen in such a
manner that each si should fit into the memory. Typically, n is chosen as:
        ⌈(number of blocks of s) / (number of memory buffers)⌉ × f
where f is typically around 1.2.
The partition of r can be large, as ri need not fit in memory.
You may refer to the further readings for more details on hash join.
Let us explain the hash join and its cost for the natural join STUDENT ⋈ MARKS.
Assume a memory size of 25 blocks, i.e., M = 25.
Select s as STUDENT, as it has the smaller number of blocks (200 blocks), and r as
MARKS (500 blocks).
Number of partitions to be created for STUDENT = (blocks of STUDENT/M) × 1.2
= (200/25) × 1.2 = 9.6 ≈ 10
Thus, the STUDENT relation will be partitioned into 10 partitions of 20 blocks each.
MARKS will also have 10 partitions of 50 blocks each. The 25 buffers will be used as:
20 blocks for one complete partition of STUDENT,
01 block will be used for input of MARKS partitions, and
The remaining 4 blocks may be used for storing the results.
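Under these assumptions, the usual estimate of the hash-join cost is sketched below; it assumes that no recursive partitioning is needed, and the term 4 n_h (with n_h = 10 partitions, as computed above) is a small overhead for partially filled partition blocks:

\[
\text{Cost}_{\text{hash-join}} \;\approx\; 3\,(b_{\text{STUDENT}} + b_{\text{MARKS}}) + 4\,n_h
\;=\; 3 \times (200 + 500) + 4 \times 10 \;=\; 2140 \text{ block transfers,}
\]

since each relation is read once and its partitions are written out during partitioning, and each partition is read back once during the build and probe phases, giving the factor of 3.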
Set operations (such as ∪ and ∩) can either use a variant of merge-join after sorting or
a variant of hash-join.
Using the Hashing:
1) Partition both relations using the same hash function, thereby creating partitions
consisting of tuples of r and s, as:
        r0, r1, …, rn–1 and s0, s1, …, sn–1
Check Your Progress 2
1) Define the algorithm for Block Nested-Loop Join for the worst-case scenario.
……………………………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………………………
3) What are the other operations that may be part of query evaluation?
……………………………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………………………
Figure 4: A sample query tree: the selection σ phone='1129250025' is applied to the STUDENT relation, the result is joined with the MARKS relation, and a projection is applied at the root
12.7.1 Evaluating a Query Tree.
In this section, let us examine the methods for evaluating a query expression that is
expressed as a query tree. In general, we use two methods:
• Materialisation
• Pipelining.
Materialisation: Evaluates a relational algebraic expression in a bottom-up approach
by explicitly generating and storing the results of each operation of expression. For
example, for Figure 4, first selection operation on the STUDENT relation would be
performed. Next, the result of the selection operation would be joined with the MARKS
relation. Finally, the projection operation will be performed. Materialised evaluation is
always possible even though the cost of writing/reading results to/from disk can be quite
high.
The pipelining execution method may involve a buffer, which is being filled by the
result tuples of a lower-level operation, while records may be picked up from the buffer
by a higher-level operation.
When an expression involves three relations, then you have more than one strategy for
the evaluation of the expression. For example, join of relations such as STUDENT ⋈
MARKS ⋈ SUBJECTS may involve the following three strategies:
Strategy 1: Compute STUDENT ⋈ MARKS, and join the result with SUBJECTS.
Strategy 2: Compute MARKS ⋈ SUBJECTS first, and then join the result with
STUDENT.
Strategy 3: Perform the pair of joins at the same time. This can be done by building
an index of enrolno in STUDENT and on subjectcode in SUBJECTS. For
each tuple m in MARKS, look up the corresponding tuples in STUDENT
and the corresponding tuples in SUBJECTS. Each tuple of MARKS will
be examined only once. Strategy 3 combines two operations into one
special-purpose operation that may be more efficient than implementing
the joins of two relations.
1) Generate logically equivalent expressions using the equivalence rules.
2) Generate alternative query evaluation plans.
3) Choose the cheapest plan based on the estimated cost.
The overall process is called cost-based optimisation. The cost difference between a
good and a bad method of evaluating a query would be enormous. We need to
estimate the cost of operations and statistical information about relations. For
example, the number of tuples, the number of distinct values for an attribute, etc.,
helps estimate the size of the intermediate results, and information like available
indices may help in estimating the cost of complex expressions. Let us discuss all the
steps in query-evaluation plan development in more detail.
(3) Commute the selection and projection operations. This commutation may
sometimes reduce the query cost.
(4) Use the associative or commutative rules for the Cartesian product or join
operation to find various alternative paths for query evaluation.
(5) Move the selection and projection (the projection may be expanded to include
the join condition) before the join operations. The selection and projection reduce
the number of tuples and, therefore, may reduce the cost of joining.
(6) Commute the projection and selection with the Cartesian product or union.
Let us explain the use of some of these rules with the help of an example. Consider the
query for the relations:
STUDENT (enrolno, name, phone)
MARKS (enrolno, subjectcode, grade)
SUBJECT (subjectcode, sname)
Example 1: Consider the query: Find the enrolment number, name, and grade of those
students who have secured an A grade in the subject DBMS. One of the possible
solutions to this query may be:
π enrolno, name, grade (σ (sname = ‘DBMS’ ∧ grade = ‘A’) ((STUDENT ⋈ MARKS) ⋈ SUBJECT))
The query tree for this would be:
Figure 5: Query tree for the above expression, with the projection π enrolno, name, grade at the root, the selection σ sname='DBMS' ∧ grade='A' below it, and (STUDENT ⋈ MARKS) ⋈ SUBJECT beneath
As per the suggested rules, the selection condition may be moved before the join
operation. The selection condition given in Figure 5 above is: sname = ‘DBMS’ and
grade = ‘A’. Both of these conditions belong to different tables, as sname is available
only in the SUBJECT table and grade in the MARKS table. Thus, the selection
conditions will be mapped accordingly, as shown in Figure 6. Thus, the equivalent
expression will be:
Figure 6: Query tree with the selections pushed down: σ grade='A' is applied to MARKS and σ sname='DBMS' to SUBJECT before the join operations, with the projection π enrolno, name, grade at the root
Further, the expected size of SUBJECT and MARKS after selection will be small, so
it may be a good idea to join MARKS with SUBJECT first, and thereafter, the
resultant relation is joined with the STUDENT relation. Hence, the associative law of
JOIN may be applied.
Figure 7: Modified query tree using associativity of join: (σ grade='A' MARKS) is joined with (σ sname='DBMS' SUBJECT) first, and the result is then joined with STUDENT
Even moving the projection before possible join, wherever possible, may optimise
the query processing, as projection may also reduce the size of the intermediate
result. Thus, you can move projection of outer join (refer to Figure 7) to inner join
(Refer Figure 8).
Figure 8: Query tree after moving the projections before the join, based on the tree of Figure 7
The final equivalent relational algebraic expression for the query of Example 1 is:
        (π enrolno, name (STUDENT)) ⋈
        (π enrolno, grade ((σ grade='A' (MARKS)) ⋈ (σ sname='DBMS' (SUBJECT))))
Figure 9: A sample query evaluation plan for the expression above, annotating each operation with its algorithm: σ grade='A' on MARKS uses a clustering index on grade, σ sname='DBMS' on SUBJECT uses a linear scan, their pipelined results are combined using a block nested-loop join, and this result, sorted on enrolno, is merge-joined with STUDENT, which is accessed through its primary index; intermediate results are pipelined between the operations
12.8.3 Choosing an Optimal Evaluation Plan
The output of the previous step of query evaluation is a number of alternative query
evaluation plans. Which of these plans should be chosen for the final query evaluation?
This decision is normally based on the database statistics. In addition, choosing the best
plan for each individual operation does not necessarily produce the best overall query
evaluation plan. For example, to join two relations, hash join is usually computationally
less expensive than merge join; however, merge join produces sorted output.
Thus, for the cases where sorted output may be useful for further processing of part
results, merge join may be preferred. Similarly, the use of the nested loop method of
joining may allow pipelining of the results of the join operation for further operations.
In general, for choosing an optimal evaluation plan, you may perform cost-based
optimisation to choose an optimal query evaluation plan.
12.8.4 Cost and Storage-Based Query Optimisation
Cost-based optimisation is performed on the basis of the cost of various individual
operations that are to be performed as per the query evaluation plan. The cost is
calculated as we have explained in section 12.3 with respect to the method and operation
(JOIN, SELECT, etc.).
Further, the task of keeping a materialised view up to date with the underlying data is
known as materialised view maintenance. Materialised views can be maintained by re-
computation on every update. A better option is to use incremental view maintenance,
i.e., where only the affected part of the view is modified. View maintenance, in general,
can be performed using triggers, which can be written for any data manipulation
operation on the relations that are part of a view definition.
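As an illustrative sketch (the statements shown use PostgreSQL-style syntax, and other systems provide similar but not identical statements; b and c are the same generic table names used in the discussion below):

-- Define and populate the materialised view.
CREATE MATERIALIZED VIEW a AS
SELECT *
FROM b NATURAL JOIN c;

-- A simple, re-computation based way of maintaining it after updates to b or c.
REFRESH MATERIALIZED VIEW a;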
Suppose the materialised view ‘a’ is defined as a = b NATURAL JOIN c, as sketched
above. Any query that uses a natural join on b and c can then use this materialised
view ‘a’. Consider that you are evaluating the query:
        z = r NATURAL JOIN b NATURAL JOIN c
Then this query would be rewritten using the materialised view ‘a’ as:
        z = r NATURAL JOIN a
Do you need to perform materialisation? It depends on cost estimates for the two
alternatives viz., use of a materialised view by view definition, or simple evaluation.
The query optimiser should be extended to consider all the alternatives of view evaluation
and choose the best overall plan. This decision must be made on the basis of the system
workload. Indices in such decision-making may be considered as specialised views.
Some database systems provide tools to help the database administrator with index and
materialised view selection.
Check Your Progress 3
1) List the methods used for the evaluation of expressions.
…………………………………………………………………………………
…………………………………………………………………………………
…………………………………………………………………………………
…………………………………………………………………………………
…………………………………………………………………………………
12.10 SUMMARY
This Unit introduces you to the basic concepts of query processing and evaluation. A
query is to be represented in a standard form before it can be processed. In general, a
query is evaluated after representing it using relational algebra. Thereafter, several
query evaluation plans are generated using query transformations, and an optimal plan
is chosen for query evaluation. To find the cost of a query evaluation plan, you may
use database statistics. The unit also defines various algorithms and the cost of these
algorithms for evaluating various operations like select, project, join etc. You may refer
to database textbooks for more details on query evaluation.
12.11 SOLUTIONS/ANSWERS
Check Your Progress 1
1) The steps of query evaluation are as follows:
a. In the first step, the query is scanned, parsed and validated.
b. Next, translate the query into a relational algebraic expression.
c. Next, the syntax is checked along with the names of the relations.
d. Finally, the optimal query evaluation plan is executed and the answers to the
query are returned.
• Duplicate elimination
• Projection
• Aggregate functions.
• Set operations.
3) The evaluation plan defines exactly what algorithms are to be used for each
operation and the manner in which the operations are coordinated.
“APPENDIX-A”
Equivalence Rules
3) Only the last of a sequence of projection operations is needed; the others can
be omitted.
        π attriblist1 (π attriblist2 (π attriblist3 ( … (E) … ))) = π attriblist1 (E)
4) The selection operations can be combined with Cartesian products and theta
join operations.
σ θ1 (E1 × E2) = E1 ⋈θ1 E2
and
σ θ2 (E1 ⋈θ1 E2) = E1 ⋈θ2 ∧ θ1 E2
9) The set operations of union and intersection are commutative. But set
difference is not commutative.
E1 ∪ E2 = E2 ∪ E1, and similarly for the intersection.
10) Set union and intersection operations are also associative.