Unit 5
Lost Updates occur when multiple transactions select the same row and update the row based on the value selected.
Uncommitted dependency issues occur when one transaction selects a row that is being updated by another transaction which has not yet committed (a dirty read).
Non-Repeatable Read occurs when a transaction accesses the same row several times and reads different data each time.
Incorrect Summary issues occur when one transaction computes a summary over all the instances of a repeated data item while a second transaction updates a few instances of that data item. In that situation, the resulting summary does not reflect a correct result.
Different concurrency control protocols offer different trade-offs between the amount of concurrency
they allow and the amount of overhead that they impose. Following are the Concurrency Control
techniques in DBMS:
Lock-Based Protocols
Two Phase Locking Protocol
Timestamp-Based Protocols
Validation-Based Protocols
Lock-based Protocols
Lock-Based Protocols in DBMS are a mechanism in which a transaction cannot read or write a data item until it acquires an appropriate lock on it. Lock-based protocols help to eliminate the concurrency problems of simultaneous transactions by making conflicting accesses to a data item wait until its lock is released.
A lock is a data variable which is associated with a data item. The lock signifies which operations can be performed on the data item. Locks in DBMS help synchronize access to the database items by concurrent transactions.
All lock requests are made to the concurrency-control manager. Transactions proceed only once the
lock request is granted.
Binary Locks: A binary lock on a data item can be in either the locked or the unlocked state.
Shared/exclusive: This type of locking mechanism separates the locks in DBMS based on their uses.
If a lock is acquired on a data item to perform a write operation, it is called an exclusive lock.
A shared lock is also called a Read-only lock. With a shared lock, the data item can be shared between transactions, because a transaction holding only a shared lock never has permission to update the data item.
For example, consider a case where two transactions are reading the account balance of a person. The database will let them read by placing a shared lock. However, if another transaction wants to update that account's balance, the shared lock prevents it until the reading process is over.
With the Exclusive Lock, a data item can be both read and written. The lock is exclusive and can't be held concurrently on the same data item. An X-lock is requested using the lock-X instruction. Transactions may unlock the data item after finishing the 'write' operation.
For example, a transaction may need to update the account balance of a person. You can allow this transaction by placing an X lock on the data item. Therefore, when a second transaction wants to read or write it, the exclusive lock prevents that operation.
The simplistic lock-based protocol allows a transaction to obtain a lock on every object before beginning its operation. Transactions may unlock the data item after finishing the 'write' operation.
Pre-claiming Locking
The pre-claiming lock protocol evaluates a transaction's operations and creates a list of the data items required for its execution. The transaction requests locks on all of these items and executes only when all the locks are granted. All locks are released once all of its operations are over.
Starvation
Starvation is the situation when a transaction needs to wait for an indefinite period to acquire a lock.
Deadlock refers to a specific situation where two or more processes are waiting for each other to release a resource, or where several processes are waiting for resources in a circular chain.
Two Phase Locking Protocol, also known as the 2PL protocol, is a method of concurrency control in DBMS that ensures serializability by applying locks to the transaction's data, which blocks other transactions from accessing the same data simultaneously. The Two Phase Locking protocol helps to eliminate the concurrency problems in DBMS.
This locking protocol divides the execution phase of a transaction into three different parts.
In the first phase, when the transaction begins to execute, it requires permission for the locks it
needs.
The second part is where the transaction obtains all the locks. When a transaction releases its
first lock, the third phase starts.
In this third phase, the transaction cannot demand any new locks. Instead, it only releases the
acquired locks.
The Two-Phase Locking protocol allows each transaction to make a lock or unlock request in two
steps:
Growing Phase: In this phase transaction may obtain locks but may not release any locks.
Shrinking Phase: In this phase, a transaction may release locks but not obtain any new lock
It is true that the 2PL protocol offers serializability. However, it does not ensure that deadlocks do not
happen.
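To make the growing and shrinking phases concrete, here is a small illustrative sketch in Python (the class and method names are invented for this unit and do not come from any particular DBMS); it simply enforces the rule that no new lock may be acquired once any lock has been released:

# Illustrative sketch of the two-phase locking rule (hypothetical API).
class TwoPhaseTransaction:
    def __init__(self, name):
        self.name = name
        self.held_locks = set()
        self.shrinking = False          # becomes True after the first unlock

    def lock(self, item):
        # Growing phase: new locks are allowed only before any unlock.
        if self.shrinking:
            raise RuntimeError(self.name + ": cannot lock " + item + " in the shrinking phase")
        self.held_locks.add(item)

    def unlock(self, item):
        # The first unlock moves the transaction into its shrinking phase.
        self.held_locks.discard(item)
        self.shrinking = True

t1 = TwoPhaseTransaction("T1")
t1.lock("A"); t1.lock("B")              # growing phase
t1.unlock("A")                          # shrinking phase begins
try:
    t1.lock("C")                        # violates 2PL, so it is rejected
except RuntimeError as e:
    print(e)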
Strict Two-Phase Locking Method
Strict two-phase locking is almost similar to 2PL. The only difference is that Strict-2PL does not release a lock right after using it; it holds all the locks until the commit point and releases them all at once when the process is over.
Centralized 2PL
In Centralized 2PL, a single site is responsible for the lock management process. It has only one lock manager for the entire DBMS.
Primary Copy 2PL
In the primary copy 2PL mechanism, lock managers are distributed to different sites, and each lock manager is responsible for managing the locks for a set of data items. When the primary copy has been updated, the change is propagated to the slaves.
Distributed 2PL
In this kind of two-phase locking mechanism, Lock managers are distributed to all sites. They are
responsible for managing locks for data at that site. If no data is replicated, it is equivalent to primary
copy 2PL. Communication costs of distributed 2PL are considerably higher than those of primary copy 2PL.
Timestamp-based Protocol in DBMS is an algorithm which uses the system time or a logical counter as a timestamp to serialize the execution of concurrent transactions. The timestamp-based protocol ensures that every conflicting read and write operation is executed in timestamp order.
The older transaction is always given priority in this method. It uses the system time to determine the timestamp of the transaction. This is the most commonly used concurrency protocol.
Lock-based protocols manage the order between conflicting transactions when they execute, whereas timestamp-based protocols determine the order as soon as a transaction is created.
The timestamps of the transactions determine the serializability order. Thus, if TS(Ti) < TS(Tj ), then
the system must ensure that the produced schedule is equivalent to a serial schedule in which
transaction Ti appears before transaction Tj . To implement this scheme, we associate with each data
item Q two timestamp values:
• W-timestamp(Q) denotes the largest timestamp of any transaction that executed write(Q)
successfully.
• R-timestamp(Q) denotes the largest timestamp of any transaction that executed read(Q) successfully.
Suppose that transaction Ti issues read(Q):
a. If TS(Ti) < W-timestamp(Q), then Ti needs to read a value of Q that was already overwritten. Hence, the read operation is rejected, and Ti is rolled back.
b. If TS(Ti) ≥ W-timestamp(Q), then the read operation is executed, and R-timestamp(Q) is set to the maximum of R-timestamp(Q) and TS(Ti).
Suppose that transaction Ti issues write(Q):
a. If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was needed previously, and the system assumed that that value would never be produced. Hence, the system rejects the write operation and rolls Ti back.
b. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, the system rejects this write operation and rolls Ti back.
c. Otherwise, the system executes the write operation and sets W-timestamp(Q) to TS(Ti).
If a transaction Ti is rolled back by the concurrency-control scheme as a result of the issuance of either a read or a write operation, the system assigns it a new timestamp and restarts it.
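The read and write rules above can be summarised in a short Python sketch. This is a simplified illustration only; the names DataItem, RollBack, read and write are invented for this example, and a real scheduler would additionally restart a rolled-back transaction with a new timestamp.

# Simplified sketch of the basic timestamp-ordering rules (illustrative only).
class RollBack(Exception):
    pass

class DataItem:
    def __init__(self, value):
        self.value = value
        self.r_ts = 0    # R-timestamp(Q): largest TS of any successful read
        self.w_ts = 0    # W-timestamp(Q): largest TS of any successful write

def read(ts, q):
    if ts < q.w_ts:                 # needed value was already overwritten
        raise RollBack("read rejected: roll back Ti and restart it with a new timestamp")
    q.r_ts = max(q.r_ts, ts)
    return q.value

def write(ts, q, value):
    if ts < q.r_ts:                 # the old value was already read by a younger transaction
        raise RollBack("write rejected: the value Ti produces was needed previously")
    if ts < q.w_ts:                 # obsolete write
        raise RollBack("write rejected: Ti attempts to write an obsolete value")
    q.value = value
    q.w_ts = ts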
The modification to the timestamp-ordering protocol, called Thomas’ write rule, is this: Suppose that
transaction Ti issues write(Q).
1. If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was previously needed, and it
had been assumed that the value would never be produced. Hence, the system rejects the write
operation and rolls Ti back.
2. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete value of Q. Hence, this write
operation can be ignored.
3. Otherwise, the system executes the write operation and sets W-timestamp(Q) to TS(Ti).
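Only the write rule changes under Thomas' write rule: an obsolete write is silently ignored instead of forcing a rollback. The sketch below reuses the DataItem and RollBack definitions from the previous illustration and is, again, only an example.

# Thomas' write rule: the obsolete-write case no longer rolls Ti back.
def write_thomas(ts, q, value):
    if ts < q.r_ts:                 # the value was needed previously: still roll back
        raise RollBack("write rejected")
    if ts < q.w_ts:                 # obsolete write: simply ignore it
        return
    q.value = value
    q.w_ts = ts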
Validation-based Protocol in DBMS, also known as the Optimistic Concurrency Control Technique, is a method that avoids locking while a transaction executes. In this protocol, the local copies of the transaction data are updated rather than the data itself, which results in less interference during execution of the transaction.
1. Read Phase
2. Validation Phase
3. Write Phase
Read Phase
In the Read Phase, the data values from the database can be read by a transaction but the write
operation or updates are only applied to the local data copies, not the actual database.
Validation Phase
In Validation Phase, the data is checked to ensure that there is no violation of serializability while
applying the transaction updates to the database.
Write Phase
In the Write Phase, the updates are applied to the database if the validation is successful; otherwise, the updates are not applied, and the transaction is rolled back.
To perform the validation test, we need to know when the various phases of transaction Ti took place.
We shall, therefore, associate three different timestamps with transaction Ti:
1. Start(Ti), the time when Ti started its execution.
2. Validation(Ti), the time when Ti finished its read phase and started its validation phase.
3. Finish(Ti), the time when Ti finished its write phase.
We determine the serializability order by the timestamp-ordering technique, using the value of the
timestamp Validation(Ti).
The validation test for transaction Tj requires that, for all transactions Ti with TS(Ti) < TS(Tj ), one of
the following two conditions must hold:
1. Finish(Ti) < Start(Tj ). Since Ti completes its execution before Tj started, the serializability order is
indeed maintained.
2. The set of data items written by Ti does not intersect with the set of data items read by Tj , and Ti
completes its write phase before Tj starts its validation phase (Start(Tj ) < Finish(Ti) < Validation(Tj )).
This condition ensures that the writes of Ti and Tj do not overlap. Since the writes of Ti do not affect
the read of Tj , and since Tj cannot affect the read of Ti, the serializability order is indeed maintained.
This validation scheme is called the optimistic concurrency control scheme since transactions execute
optimistically, assuming they will be able to finish execution and validate at the end
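The validation test itself can be written down compactly. The Python sketch below is illustrative only; TxnInfo, the read and write sets, and the validate function are invented structures that check a transaction Tj against every earlier transaction Ti using the two conditions above.

# Illustrative sketch of the validation (optimistic) test.
class TxnInfo:
    def __init__(self, start, validation, finish, read_set, write_set):
        self.start = start              # Start(Ti)
        self.validation = validation    # Validation(Ti)
        self.finish = finish            # Finish(Ti); None if the write phase is not done
        self.read_set = set(read_set)
        self.write_set = set(write_set)

def validate(tj, earlier_transactions):
    """Return True if Tj passes validation against all Ti with TS(Ti) < TS(Tj)."""
    for ti in earlier_transactions:
        if ti.finish is not None and ti.finish < tj.start:
            continue                    # condition 1: Ti finished before Tj started
        if (not (ti.write_set & tj.read_set)
                and ti.finish is not None
                and tj.start < ti.finish < tj.validation):
            continue                    # condition 2: no read/write overlap, writes ordered
        return False                    # neither condition holds, so Tj must be rolled back
    return True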
In the concurrency-control schemes described thus far, we have used each individual data item as the
unit on which synchronization is performed.
There are circumstances, however, where it would be advantageous to group several data items, and to
treat them as one individual synchronization unit. For example, if a transaction Ti needs to access the
entire database, and a locking protocol is used, then Ti must lock each item in the database. Clearly,
executing these locks is time consuming. It would be better if Ti could issue a single lock request to lock the entire database.
What is needed is a mechanism to allow the system to define multiple levels of granularity. We can
make one by allowing data items to be of various sizes and defining a hierarchy of data granularities,
where the small granularities are nested within larger ones. Such a hierarchy can be represented
graphically as a tree. A nonleaf node of the multiple-granularity tree represents the data associated with
its descendants. In the tree protocol, each node is an independent data item.
As an illustration, consider the tree in the figure above, which consists of four levels of nodes. The highest
level represents the entire database. Below it is nodes of type area; the database consists of exactly
these areas. Each area in turn has nodes of type file as its children. Each area contains exactly those
files that are its child nodes. No file is in more than one area. Finally, each file has nodes of
type record. As before, the file consists of exactly those records that are its child nodes, and no record
can be present in more than one file.
Each node in the tree can be locked individually. As we did in the two-phase locking protocol, we shall
use shared and exclusive lock modes. When a transaction locks a node, in either shared or exclusive
mode, the transaction also has implicitly locked all the descendants of that node in the same lock mode.
For example, if transaction Ti gets an explicit lock on file Fb of the figure above, in exclusive mode, then it has an implicit lock in exclusive mode on all the records belonging to that file. It does not need to lock the individual records of Fb explicitly.
Suppose that transaction Tj wishes to lock record rb6 of file Fb. Since Ti has locked Fb explicitly, it
follows that rb6 is also locked (implicitly). But, when Tj issues a lock request for rb6 , rb6 is not
explicitly locked! How does the system determine whether Tj can lock rb6 ? Tj must traverse the tree
from the root to record rb6 . If any node in that path is locked in an incompatible mode, then Tj must
be delayed.
Figure 5.2: Compatibility Matrix
Suppose now that transaction Tk wishes to lock the entire database. To do so, it simply must lock the
root of the hierarchy. Note, however, that Tk should not succeed in locking the root node, since Ti is
currently holding a lock on part of the tree (specifically, on file Fb). But how does the system
determine if the root node can be locked? One possibility is for it to search the entire tree. This
solution, however, defeats the whole purpose of the multiple-granularity locking scheme. A more efficient way to gain this knowledge is to introduce a new class of lock modes, called intention lock
modes. If a node is locked in an intention mode, explicit locking is being done at a lower level of the
tree (that is, at a finer granularity). Intention locks are put on all the ancestors of a node before that
node is locked explicitly. Thus, a transaction does not need to search the entire tree to determine
whether it can lock a node successfully. A transaction wishing to lock a node—say, Q—must traverse a
path in the tree from the root to Q. While traversing the tree, the transaction locks the various nodes in
an intention mode.
There is an intention mode associated with shared mode, and there is one with exclusive mode. If a
node is locked in intention-shared (IS) mode, explicit locking is being done at a lower level of the
tree, but with only shared-mode locks. Similarly, if a node is locked in intention-exclusive (IX) mode,
then explicit locking is being done at a lower level, with exclusive-mode or shared-mode locks.
Finally, if a node is locked in shared and intention-exclusive (SIX) mode, the subtree rooted by that
node is locked explicitly in shared mode, and that explicit locking is being done at a lower level with
exclusive-mode locks. The compatibility function for these lock modes is in above figure.
The multiple-granularity locking protocol ensures serializability through the following rules. Each transaction Ti can lock a node Q by observing these rules:
1. It must observe the lock-compatibility function of Figure 5.2.
2. It must lock the root of the tree first, and it can lock it in any mode.
3. It can lock a node Q in S or IS mode only if it currently has the parent of Q locked in either IX or IS mode.
4. It can lock a node Q in X, SIX, or IX mode only if it currently has the parent of Q locked in either
IX or SIX mode.
5. It can lock a node only if it has not previously unlocked any node (that is, Ti is two phase).
6. It can unlock a node Q only if it currently has none of the children of Q locked.
Observe that the multiple-granularity protocol requires that locks be acquired in top- down (root-to-
leaf) order, whereas locks must be released in bottom-up (leaf-to-root) order.
As an illustration of the protocol, consider the granularity tree described above and these transactions:
• Suppose that transaction T18 reads record ra2 in file Fa. Then, T18 needs to lock the database,
area A1, and Fa in IS mode (and in that order), and finally to lock ra2 in S mode.
• Suppose that transaction T19 modifies record ra9 in file Fa. Then, T19 needs to lock the database,
area A1, and file Fa in IX mode, and finally to lock ra9 in X mode.
• Suppose that transaction T20 reads all the records in file Fa. Then, T20 needs to lock the database and
area A1 (in that order) in IS mode, and finally to lock Fa in S mode.
• Suppose that transaction T21 reads the entire database. It can do so after locking the database in S
mode.
We note that transactions T18, T20, and T21 can access the database concurrently. Transaction T19
can execute concurrently with T18, but not with either T20 or T21.
This protocol enhances concurrency and reduces lock overhead. It is particularly useful in applications that include a mix of short transactions that access only a few data items and long transactions that produce reports from an entire file or set of files.
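The compatibility function of Figure 5.2 and the top-down locking order can also be sketched in code. The following Python fragment is illustrative only; the matrix is the standard IS/IX/S/SIX/X compatibility matrix, and the helper names can_grant and lock_path are invented for this example.

# Standard multiple-granularity lock compatibility matrix (True = compatible).
COMPATIBLE = {
    "IS":  {"IS": True,  "IX": True,  "S": True,  "SIX": True,  "X": False},
    "IX":  {"IS": True,  "IX": True,  "S": False, "SIX": False, "X": False},
    "S":   {"IS": True,  "IX": False, "S": True,  "SIX": False, "X": False},
    "SIX": {"IS": True,  "IX": False, "S": False, "SIX": False, "X": False},
    "X":   {"IS": False, "IX": False, "S": False, "SIX": False, "X": False},
}

def can_grant(requested_mode, modes_held_by_others):
    # A lock is granted only if it is compatible with every lock held by other transactions.
    return all(COMPATIBLE[held][requested_mode] for held in modes_held_by_others)

def lock_path(path_from_root, leaf_mode):
    # Lock every ancestor in the matching intention mode, then the target node (top-down order).
    intention = "IS" if leaf_mode in ("S", "IS") else "IX"
    return [(node, intention) for node in path_from_root[:-1]] + [(path_from_root[-1], leaf_mode)]

# T18 reading record ra2: database, A1 and Fa are locked in IS mode, ra2 in S mode.
print(lock_path(["database", "A1", "Fa", "ra2"], "S"))
print(can_grant("X", ["IS"]))   # False: an IS lock held elsewhere blocks an X lock on the node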
The concurrency-control schemes discussed thus far ensure serializability by either delaying an
operation or aborting the transaction that issued the operation. For ex- ample, a read operation may be
delayed because the appropriate value has not been written yet; or it may be rejected (that is, the
issuing transaction must be aborted) because the value that it was supposed to read has already been
overwritten. These difficulties could be avoided if old copies of each data item were kept in the system.
In multiversion concurrency control schemes, each write(Q) operation creates a new version of Q.
When a transaction issues a read(Q) operation, the concurrency-control manager selects one of the
versions of Q to be read. The concurrency-control
scheme must ensure that the version to be read is selected in a manner that ensures serializability. It is
also crucial, for performance reasons, that a transaction be able to determine easily and quickly which
version of the data item should be read.
The most common transaction ordering technique used by multiversion schemes is timestamping. With
each transaction Ti in the system, we associate a unique static timestamp, denoted by TS(Ti). The
database system assigns this timestamp before the transaction starts execution, as described earlier for the timestamp-ordering protocol.
With each data item Q, a sequence of versions <Q1, Q2, ..., Qm> is associated. Each version Qk contains three data fields:
• Content is the value of version Qk.
• W-timestamp(Qk) is the timestamp of the transaction that created version Qk.
• R-timestamp(Qk) is the largest timestamp of any transaction that successfully read version Qk.
A transaction—say, Ti—creates a new version Qk of data item Q by issuing a write(Q) operation. The
content field of the version holds the value written by Ti. The system initializes the W-timestamp and
R-timestamp to TS(Ti). It updates the R-timestamp value of Qk whenever a transaction Tj reads the
content of Qk , and R-timestamp(Qk ) < TS(Tj ).
The multiversion timestamp-ordering scheme presented next ensures serializability. The scheme
operates as follows. Suppose that transaction Ti issues a read(Q) or write(Q) operation. Let Qk denote
the version of Q whose write timestamp is the largest write timestamp less than or equal to TS(Ti).
1. If transaction Ti issues a read(Q), then the value returned is the content of version Qk .
2. If transaction Ti issues write(Q), and if TS(Ti) < R-timestamp(Qk), then the system rolls back transaction Ti. On the other hand, if TS(Ti) = W-timestamp(Qk), the system overwrites the contents of Qk; otherwise it creates a new version of Q.
The justification for rule 1 is clear. A transaction reads the most recent version that comes before it in
time. The second rule forces a transaction to abort if it is “too late” in doing a write. More precisely,
if Ti attempts to write a version that some other transaction would have read, then we cannot allow that
write to succeed.
Versions that are no longer needed are removed according to the following rule.
Suppose that there are two versions, Qk and Qj , of a data item, and that both versions have a W-
timestamp less than the timestamp of the oldest transaction in the system.
Then, the older of the two versions Qk and Qj will not be used again, and can be deleted.
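A minimal Python sketch of these multiversion timestamp-ordering rules follows. It is illustrative only: Version, find_version and the small usage lines are invented names, and the deletion of old versions is omitted.

# Illustrative sketch of multiversion timestamp ordering.
class RollBack(Exception):
    pass

class Version:
    def __init__(self, content, w_ts):
        self.content = content
        self.w_ts = w_ts      # timestamp of the transaction that created this version
        self.r_ts = w_ts      # largest timestamp that has read this version

def find_version(versions, ts):
    # The version whose write timestamp is the largest one <= ts.
    return max((v for v in versions if v.w_ts <= ts), key=lambda v: v.w_ts)

def mv_read(ts, versions):
    qk = find_version(versions, ts)
    qk.r_ts = max(qk.r_ts, ts)
    return qk.content

def mv_write(ts, versions, value):
    qk = find_version(versions, ts)
    if ts < qk.r_ts:
        raise RollBack("Ti is too late: a younger transaction already read the older version")
    if ts == qk.w_ts:
        qk.content = value                    # Ti overwrites its own version
    else:
        versions.append(Version(value, ts))   # otherwise a new version is created

versions_of_q = [Version(100, 0)]
print(mv_read(5, versions_of_q))              # reads the version written at timestamp 0
mv_write(5, versions_of_q, 120)               # creates a new version with timestamp 5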
The multiversion timestamp-ordering scheme has the desirable property that a read request never fails
and is never made to wait. In typical database systems, where reading is a more frequent operation than
is writing, this advantage may be of major practical significance.
The scheme, however, suffers from two undesirable properties. First, the reading of a data item also
requires the updating of the R-timestamp field, resulting in two potential disk accesses, rather than one.
Second, the conflicts between transactions are resolved through rollbacks, rather than through waits.
This alternative may be expensive. The multiversion two-phase locking scheme described below alleviates this problem.
This multiversion timestamp-ordering scheme does not ensure recoverability and cascadelessness. It
can be extended in the same manner as the basic timestamp-ordering scheme, to make it recoverable
and cascadeless.
The multiversion two-phase locking protocol attempts to combine the advantages of multiversion
concurrency control with the advantages of two-phase locking. This protocol differentiates
between read-only transactions and update transactions.
Update transactions perform rigorous two-phase locking; that is, they hold all locks up to the end of the
transaction. Thus, they can be serialized according to their commit order. Each version of a data item
has a single timestamp. The timestamp in this case is not a real clock-based timestamp, but rather is a
counter, which we will call the ts-counter, that is incremented during commit processing.
Read-only transactions are assigned a timestamp by reading the current value of ts-counter before they
start execution; they follow the multiversion timestamp- ordering protocol for performing reads. Thus,
when a read-only transaction Ti issues a read(Q), the value returned is the contents of the version
whose timestamp is the largest timestamp less than TS(Ti).
When an update transaction reads an item, it gets a shared lock on the item, and reads the latest version
of that item. When an update transaction wants to write an item, it first gets an exclusive lock on the
item, and then creates a new version of the data item. The write is performed on the new version, and
the timestamp of the new version is initially set to a value ∞, a value greater than that of any possible
timestamp.
When the update transaction Ti completes its actions, it carries out commit processing: First, Ti sets the
timestamp on every version it has created to 1 more than the value of ts-counter; then, Ti increments ts-
counter by 1. Only one update transaction is allowed to perform commit processing at a time.
As a result, read-only transactions that start after Ti increments ts-counter will see the values updated
by Ti, whereas those that start before Ti increments ts-counter will see the value before the updates
by Ti. In either case, read-only transactions never need to wait for locks. Multiversion two-phase
locking also ensures that schedules are recoverable and cascadeless.
Suppose there are two versions, Qk and Qj , of a data item, and that both versions have a timestamp
less than the timestamp of the oldest read-only transaction in the system. Then, the older of the two
versions Qk and Qj will not be used again and can be deleted.
Multiversion two-phase locking or variations of it are used in some commercial database systems.
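Commit processing with the ts-counter can be sketched as follows. This is an illustrative Python fragment in which the actual locking is omitted; only the timestamp handling described above is shown, and all names are invented for the example.

# Illustrative sketch of ts-counter handling in multiversion two-phase locking.
import threading

INFINITY = float("inf")
ts_counter = 0
commit_mutex = threading.Lock()       # only one update transaction commits at a time

def start_read_only_transaction():
    # Read-only transactions take the current ts-counter as their timestamp.
    return ts_counter

def create_version(versions, value):
    # New versions written by an update transaction start with timestamp infinity.
    versions.append({"content": value, "ts": INFINITY})

def commit_update_transaction(created_versions):
    global ts_counter
    with commit_mutex:
        for v in created_versions:
            v["ts"] = ts_counter + 1  # stamp every created version with ts-counter + 1
        ts_counter += 1               # then increment ts-counter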
Recovery with concurrent transactions can be done in the following four ways.
1. Interaction with concurrency control
2. Transaction rollback
3. Checkpoints
4. Restart recovery
Transaction rollback:
In this scheme, we roll back a failed transaction by using the log.
The system scans the log backward for the failed transaction; for every log record found, the system restores the corresponding data item to its old value.
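A small Python sketch of this backward scan is given below; the log record layout used here (transaction id, data item, old value, new value) is an assumption made only for the illustration.

# Illustrative undo of a failed transaction by scanning the log backward.
log = [
    ("T1", "A", 1000, 950),    # (txn_id, data_item, old_value, new_value)
    ("T2", "B", 2000, 2050),
    ("T1", "B", 2050, 2100),
]
database = {"A": 950, "B": 2100}

def rollback(failed_txn, log, database):
    for txn_id, item, old_value, _new_value in reversed(log):
        if txn_id == failed_txn:
            database[item] = old_value     # restore the data item to its old value
    return database

print(rollback("T1", log, database))       # {'A': 1000, 'B': 2050}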
Checkpoints:
Checkpointing is the process of saving a snapshot of the application's state so that it can restart from that point in case of failure.
A checkpoint is a point of time at which a record is written onto the database from the buffers.
A checkpoint shortens the recovery process.
When a checkpoint is reached, the transactions up to that point are reflected in the database, and the log records before that point can be removed from the log file. The log file is then updated with the new transaction steps until the next checkpoint, and so on.
The checkpoint is used to declare a point before which the DBMS was in a consistent state and all the transactions were committed.
Restart recovery:
Operational Data Analytics
Insurance firms succeed through their ability to identify and quantify risks
facing their clients. They are under constant and increasing pressure to rapidly consider every available
quantifiable factor to develop profiles of insurance risk. To this end, insurers collect a vast amount of
operational data about policy holders and insured objects. While extremely valuable, this operational
data must often wait to be coaxed into a traditional data warehouse format, for even later assessment by
an analyst. This case study discusses how Mobiliar restructured their multi-database data warehouse
environment to streamline risk analysis using operational data.
The ultimate goal for organizations, like Mobiliar, that want to streamline their analytics is being able
to analyze their data in real time—without having a negative impact on OLTP [online transaction
processing] performance and without having to wait for the classic ETL [extract, transform, and load]
process to load transaction data into the data warehouse.
Astonishing Proof of Concept (PoC) Results
Oracle Database In-Memory has a unique dual format (rows and columns) that maintains the
transactional data in both row and columnar format in memory, enabling real-time analytics to be
performed immediately across all transactions, thereby eliminating delays and reliance on transforming
transactions into a data mart, data warehouse or other analytic store for examination. To prove that
Oracle Database In-Memory could truly allow Mobiliar to use their operational data for real-time
analytics, the database team at Mobiliar set up a proof of concept to test different analytical scenarios.
Key to the proof of concept for Mobiliar was choosing scenarios that represented typical business cases
for the insurance company.
About Oracle Database In-Memory Oracle Database In-Memory transparently accelerates analytic
queries by orders of magnitude, enabling real-time business decisions. It dramatically accelerates data
warehouses and mixed workload OLTP environments. The unique "dual-format" approach
automatically maintains data in both the existing Oracle row format for OLTP operations, and in a new
purely in-memory column format optimized for analytical processing. Both formats are simultaneously
active and transactionally consistent. Embedding the column store into Oracle Database ensures it is
fully compatible with ALL existing features, and requires absolutely no changes in the application
layer.
Important Questions
Q.1: a) Describe major problems associated with concurrent processing with examples.
b) What is the role of locks in avoiding these problems?
c) What is the phantom phenomenon?
Q.2: What do you mean by multiple granularities? Explain in detail.
Q.3: Define deadlock. Explain deadlock recovery and prevention techniques
Q.4: Explain multiversion concurrency control in detail.
Q.5: Explain the working of various time stamping protocols for concurrency control
Q.6: Explain the difference between two phase commit protocol and three phase commit protocol.
Q.7: What is meant by the concurrent execution of database transactions in a multiuser system?
Discuss why concurrency control is needed, and give informal examples.
Q.8: What are the problems encountered in distributed DBMS while considering concurrency control
and recovery?
1.If a transaction has obtained a __________ lock, it can read but cannot write on the item
a) Shared mode
b) Exclusive mode
c) Read only mode
d) Write only mode
2.If a transaction has obtained a ________ lock, it can both read and write on the item
a) Shared mode
b) Exclusive mode
c) Read only mode
d) Write only mode
3.A transaction can proceed only after the concurrency control manager ________ the lock to the
transaction
a) Grants
b) Requests
c) Allocates
d) None of the mentioned
4.If a transaction can be granted a lock on an item immediately in spite of the presence of another
mode, then the two modes are said to be ________
a) Concurrent
b) Equivalent
c) Compatible
d) Executable
5.A transaction is made to wait until all ________ locks held on the item are released
a) Compatible
b) Incompatible
c) Concurrent
d) Equivalent
6.State true or false: It is not necessarily desirable for a transaction to unlock a data item immediately
after its final access
a) True
b) False
7.The situation where no transaction can proceed with normal execution is known as ________
a) Road block
b) Deadlock
c) Execution halt
d) Abortion
8.The protocol that indicates when a transaction may lock and unlock each of the data items is called as
__________
a) Locking protocol
b) Unlocking protocol
c) Granting protocol
d) Conflict protocol
9.If a transaction Ti may never make progress, then the transaction is said to be ____________
a) Deadlocked
b) Starved
c) Committed
d) Rolled back
10.The two phase locking protocol consists which of the following phases?
a) Growing phase
b) Shrinking phase
c) More than one of the mentioned
d) None of the mentioned
11.If a transaction may obtain locks but may not release any locks then it is in _______ phase
a) Growing phase
b) Shrinking phase
c) Deadlock phase
d) Starved phase
12.If a transaction may release locks but may not obtain any locks, it is said to be in ______ phase
a) Growing phase
b) Shrinking phase
c) Deadlock phase
d) Starved phase
13.Which of the following cannot be used to implement a timestamp
a) System clock
b) Logical counter
c) External time counter
d) None of the mentioned
14.A logical counter is _________ after a new timestamp has been assigned
a) Incremented
b) Decremented
c) Doubled
d) Remains the same
15.W-timestamp(Q) denotes?
a) The largest timestamp of any transaction that can execute write(Q) successfully
b) The largest timestamp of any transaction that can execute read(Q) successfully
c) The smallest timestamp of any transaction that can execute write(Q) successfully
d) The smallest timestamp of any transaction that can execute read(Q) successfully
16.R-timestamp(Q) denotes?
a) The largest timestamp of any transaction that can execute write(Q) successfully
b) The largest timestamp of any transaction that can execute read(Q) successfully
c) The smallest timestamp of any transaction that can execute write(Q) successfully
d) The smallest timestamp of any transaction that can execute read(Q) successfully
17.A ________ ensures that any conflicting read and write operations are executed in timestamp order
a) Organizational protocol
b) Timestamp ordering protocol
c) Timestamp execution protocol
d) 802-11 protocol
18.The default timestamp ordering protocol generates schedules that are
a) Recoverable
b) Non-recoverable
c) Starving
d) None of the mentioned
19.Which of the following timestamp based protocols generates serializable schedules?
a) Thomas write rule
b) Timestamp ordering protocol
c) Validation protocol
d) None of the mentioned
20.State true or false: The Thomas write rule has a greater potential concurrency than the timestamp
ordering protocol
a) True
b) False
21.In timestamp ordering protocol, suppose that the transaction Ti issues read(Q) and TS(Ti)<W-
timestamp(Q), then
a) Read operation is executed
b) Read operation is rejected
c) Write operation is executed
d) Write operation is rejected
22.In timestamp ordering protocol, suppose that the transaction Ti issues write(Q) and TS(Ti)<W-
timestamp(Q), then
a) Read operation is executed
b) Read operation is rejected
c) Write operation is executed
d) Write operation is rejected
23.The _________ requires each transaction executes in two or three different phases in its lifetime
a) Validation protocol
b) Timestamp protocol
c) Deadlock protocol
d) View protocol
24.During __________ phase, the system reads data and stores them in variables local to the
transaction.
a) Read phase
b) Validation phase
c) Write phase
d) None of the mentioned
25.During the _________ phase the validation test is applied to the transaction
a) Read phase
b) Validation phase
c) Write phase
d) None of the mentioned
26.During the _______ phase, the local variables that hold the write operations are copied to the
database
a) Read phase
b) Validation phase
c) Write phase
d) None of the mentioned
27.Read only operations omit the _______ phase
a) Read phase
b) Validation phase
c) Write phase
d) None of the mentioned
28.Which of the following timestamp is used to record the time at which the transaction started
execution?
a) Start(i)
b) Validation(i)
c) Finish(i)
d) Write(i)
29.Which of the following timestamps is used to record the time when a transaction has finished its
read phase?
a) Start(i)
b) Validation(i)
c) Finish(i)
d) Write(i)
30.Which of the following timestamps is used to record the time when a database has completed its
write operation?
a) Start(i)
b) Validation(i)
c) Finish(i)
d) Write(i)
31.State true or false: Locking and timestamp ordering force a wait or rollback whenever a conflict is
detected.
a) True
b) False
32.State true or false: We determine the serializability order of validation protocol by the validation
ordering technique
a) True
b) False
33.In a granularity hierarchy the highest level represents the
a) Entire database
b) Area
c) File
d) Record
34.If a node is locked in an intention mode, explicit locking is done at a lower level of the tree. This is
called
a) Intention lock modes
b) Explicit lock
c) Implicit lock
d) Exclusive lock
35.If a node is locked in ____________ then explicit locking is being done at a lower level, with
exclusive-mode or shared-mode locks.
a) Intention lock modes
b) Intention-shared-exclusive mode
c) Intention-exclusive (IX) mode
d) Intention-shared (IS) mode
36.This validation scheme is called the _________ scheme since transactions execute optimistically,
assuming they will be able to finish execution and validate at the end.
a) Validation protocol
b) Validation-based protocol
c) Timestamp protocol
d) Optimistic concurrency-control
37.The file organization which allows us to read records that would satisfy the join condition by using
one block read is
a) Heap file organization
b) Sequential file organization
c) Clustering file organization
d) Hash files organization
38.DBMS periodically suspends all processing and synchronizes its files and journals through the use
of
a) Checkpoint facility
b) Backup facility
c) Recovery manager
d) Database change log
39.The extent of the database resource that is included with each lock is called the level of
a) Impact
b) Granularity
c) Management
d) DBMS control
40.A condition that occurs when two transactions wait for each other to unlock data is known as a(n)
a) Shared lock
b) Exclusive lock
c) Binary lock
d) Deadlock
Answer Key
1 A 11 A 21 B 31 A
2 B 12 B 22 D 32 B
3 A 13 C 23 A 33 A
4 C 14 A 24 A 34 A
5 A 15 A 25 B 35 C
6 A 16 B 26 C 36 A
7 B 17 B 27 C 37 C
8 A 18 B 28 A 38 A
9 B 19 A 29 B 39 B
10 C 20 A 30 C 40 D
Relational algebra is a procedural query language, which takes instances of relations as input and
yields instances of relations as output. It uses operators to perform queries. An operator can be
either unary or binary. They accept relations as their input and yield relations as their output.
Relational algebra is performed recursively on a relation and intermediate results are also considered
relations.
Select
Project
Union
Set difference
Cartesian product
Rename
Normalization
o Normalization is used to minimize the redundancy from a relation or set of relations. It is also used to eliminate undesirable characteristics like insertion, update and deletion anomalies.
o Normalization divides the larger table into smaller tables and links them using relationships.
o Normal forms are used to reduce redundancy in the database tables.
Normal Form: Description
1NF: A relation is in 1NF if every attribute contains only atomic values.
2NF: A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key.
3NF: A relation is in 3NF if it is in 2NF and no transitive dependency exists.
4NF: A relation is in 4NF if it is in Boyce-Codd normal form and has no multi-valued dependency.
5NF: A relation is in 5NF if it is in 4NF, contains no join dependency, and joining is lossless.
In aggregation, the relation between two entities is treated as a single entity. In aggregation,
relationship with its corresponding entities is aggregated into a higher level entity.
For example: A Center entity offering a Course entity acts as a single entity in the relationship with another entity, Visitor. In the real world, if a visitor visits a coaching center, then he will never enquire about the Course alone or just about the Center; instead, he will enquire about both.
d) Define super key, candidate key, primary key and foreign key.
Keys
o Keys play an important role in the relational database.
o Keys are used to uniquely identify any record or row of data in a table. They are also used to establish and identify relationships between tables.
For example: In Student table, ID is used as a key because it is unique for each student. In PERSON
table, passport_number, license_number, SSN are keys since they are unique for each person.
Types of key:
1. Primary key
o It is the key used to identify one and only one instance of an entity uniquely. An entity can contain multiple keys, as we saw in the PERSON table. The key which is most suitable from that list becomes the primary key.
o In the EMPLOYEE table, ID can be primary key since it is unique for each employee. In the
EMPLOYEE table, we can even select License_Number and Passport_Number as primary key
since they are also unique.
o For each entity, selection of the primary key is based on requirement and developers.
2. Candidate key
o A candidate key is an attribute or a set of attributes which can uniquely identify a tuple.
o The remaining attributes except for primary key are considered as a candidate key. The
candidate keys are as strong as the primary key.
For example: In the EMPLOYEE table, id is best suited for the primary key. Rest of the attributes like
SSN, Passport_Number, and License_Number, etc. are considered as a candidate key.
3. Super Key
A super key is a set of attributes which can uniquely identify a tuple. A super key is a superset of a candidate key.
For example: In the above EMPLOYEE table, for (EMPLOYEE_ID, EMPLOYEE_NAME), the names of two employees can be the same, but their EMPLOYEE_ID can't be the same. Hence, this combination can also be a key.
4. Foreign key
o A foreign key is a column of a table which points to the primary key of another table.
o In a company, every employee works in a specific department, and employee and department
are two different entities. So we can't store the information of the department in the employee
table. That's why we link these two tables through the primary key of one table.
o We add the primary key of the DEPARTMENT table, Department_Id as a new attribute in the
EMPLOYEE table.
o Now in the EMPLOYEE table, Department_Id is the foreign key, and both the tables are
related.
Strong Entity
A strong entity is not dependent on any other entity in the schema. A strong entity will always
have a primary key. Strong entities are represented by a single rectangle. The relationship of two
strong entities is represented by a single diamond.
Various strong entities, when combined together, create a strong entity set.
Weak Entity
A weak entity is dependent on a strong entity to ensure its existence. Unlike a strong entity, a
weak entity does not have any primary key. It instead has a partial discriminator key. A weak
entity is represented by a double rectangle.
The relation between one strong and one weak entity is represented by a double diamond.
Difference between Strong and Weak Entity:
1. A strong entity always has a primary key, whereas a weak entity has no primary key; it has a partial discriminator key instead.
2. A strong entity is not dependent on any other entity, whereas a weak entity depends on a strong entity.
3. A strong entity is represented by a single rectangle, whereas a weak entity is represented by a double rectangle.
4. The relationship between two strong entities is represented by a single diamond, while the relation between one strong and one weak entity is represented by a double diamond.
5. A strong entity has either total or partial participation, while a weak entity always has total participation.
b) What do you mean by conflict serializable schedule?
Conflicting Operations
Two operations are said to be conflicting if they belong to different transactions, operate on the same data item, and at least one of them is a write operation.
Conflict Equivalent
In the conflict equivalent, one can be transformed to another by swapping non-conflicting operations.
In the given example, S2 is conflict equivalent to S1 (S1 can be converted to S2 by swapping non-
conflicting operations).
Example:
Schedule S2 is a serial schedule because, in this, all operations of T1 are performed before starting any
operation of T2. Schedule S1 can be transformed into a serial schedule by swapping non-conflicting operations
of S1.
T1                T2
Read(A)
Write(A)
Read(B)
Write(B)
                  Read(A)
                  Write(A)
                  Read(B)
                  Write(B)
Concurrency Control
o In concurrency control, multiple transactions can be executed simultaneously.
o This may affect the transaction results, so it is highly important to maintain the order of execution of those transactions.
Several problems can occur when concurrent transactions are executed in an uncontrolled manner.
Following are the three problems in concurrency control.
1. Lost updates
2. Dirty read
3. Unrepeatable read
1. Lost Update Problem
Example:
Here,
o At time t4, Transaction-X writes A's value on the basis of the value seen at time t2.
o At time t5, Transaction-Y writes A's value on the basis of the value seen at time t3.
o So at time t5, the update of Transaction-X is lost because Transaction-Y overwrites it without looking at its current value.
o Such a problem is known as the Lost Update Problem, as the update made by one transaction is lost here.
2. Dirty Read
o The dirty read occurs in the case when one transaction updates an item of the database, and then
the transaction fails for some reason. The updated database item is accessed by another
transaction before it is changed back to the original value.
o A transaction T1 updates a record which is read by T2. If T1 aborts then T2 now has values
which have never formed part of the stable database.
Example:
o At time t2, Transaction-Y writes A's value.
o At time t4, Transaction-Y rolls back. So, it changes A's value back to what it was prior to t1.
o So, Transaction-X now contains a value which has never become part of the stable database.
o Such a problem is known as the Dirty Read Problem, as one transaction reads a dirty value which has not been committed.
If a database system is not multi-layered, then it becomes difficult to make any changes in the
database system. Database systems are designed in multi-layers as we learnt earlier.
Data Independence
A database system normally contains a lot of data in addition to users’ data. For example, it
stores data about data, known as metadata, to locate and retrieve data easily. It is rather
difficult to modify or update a set of metadata once it is stored in the database. But as a DBMS
expands, it needs to change over time to satisfy the requirements of the users. If the entire data
is dependent, it would become a tedious and highly complex job.
Metadata itself follows a layered architecture, so that when we change data at one layer, it does not
affect the data at another level. This data is independent but mapped to each other.
All the schemas are logical, and the actual data is stored in bit format on the disk. Physical data
independence is the power to change the physical data without impacting the schema or logical
data. For example, in case we want to change or upgrade the storage system itself − suppose
we want to replace hard-disks with SSD − it should not have any impact on the logical data or
schemas.
Mapping Constraints
o A mapping constraint is a data constraint that expresses the number of entities to which another
entity can be related via a relationship set.
o It is most useful in describing the relationship sets that involve more than two entity sets.
o For binary relationship set R on an entity set A and B, there are four possible mapping
cardinalities. These are as follows:
1. One to one (1:1)
2. One to many (1:M)
3. Many to one (M:1)
4. Many to many (M:M)
One-to-one
In one-to-one mapping, an entity in E1 is associated with at most one entity in E2, and an entity in E2
is associated with at most one entity in E1.
One-to-many
In one-to-many mapping, an entity in E1 is associated with any number of entities in E2, and an entity
in E2 is associated with at most one entity in E1.
Many-to-one
In many-to-one mapping, an entity in E1 is associated with at most one entity in E2, and an entity in E2
is associated with any number of entities in E1.
Many-to-many
In many-to-many mapping, an entity in E1 is associated with any number of entities in E2, and an
entity in E2 is associated with any number of entities in E1.
c) Define keys. Explain various types of keys
Keys
o Keys play an important role in the relational database.
o Keys are used to uniquely identify any record or row of data in a table. They are also used to establish and identify relationships between tables.
For example: In Student table, ID is used as a key because it is unique for each student. In PERSON
table, passport_number, license_number, SSN are keys since they are unique for each person.
Types of key:
1. Primary key
o It is the key used to identify one and only one instance of an entity uniquely. An entity can contain multiple keys, as we saw in the PERSON table. The key which is most suitable from that list becomes the primary key.
o In the EMPLOYEE table, ID can be primary key since it is unique for each employee. In the
EMPLOYEE table, we can even select License_Number and Passport_Number as primary key
since they are also unique.
o For each entity, selection of the primary key is based on requirement and developers.
2. Candidate key
o A candidate key is an attribute or a set of attributes which can uniquely identify a tuple.
o The remaining attributes except for primary key are considered as a candidate key. The
candidate keys are as strong as the primary key.
For example: In the EMPLOYEE table, id is best suited for the primary key. Rest of the attributes like
SSN, Passport_Number, and License_Number, etc. are considered as a candidate key.
3. Super Key
A super key is a set of attributes which can uniquely identify a tuple. A super key is a superset of a candidate key.
For example: In the above EMPLOYEE table, for (EMPLOYEE_ID, EMPLOYEE_NAME), the names of two employees can be the same, but their EMPLOYEE_ID can't be the same. Hence, this combination can also be a key.
4. Foreign key
o A foreign key is a column of a table which points to the primary key of another table.
o In a company, every employee works in a specific department, and employee and department
are two different entities. So we can't store the information of the department in the employee
table. That's why we link these two tables through the primary key of one table.
o We add the primary key of the DEPARTMENT table, Department_Id as a new attribute in the
EMPLOYEE table.
o Now in the EMPLOYEE table, Department_Id is the foreign key, and both the tables are
related.
d) Explain the phantom phenomena. Discuss the timestamp protocol that avoids the
phantom phenomena.
The so-called phantom problem occurs within a transaction when the same query produces different
sets of rows at different times. For example, if a SELECT is executed twice, but returns a row the
second time that was not returned the first time, the row is a “phantom” row.
o If TS(Ti) < W_TS(X), then the operation is rejected and Ti is rolled back; otherwise the operation is executed.
Where TS(Ti) is the timestamp of transaction Ti and W_TS(X) is the write timestamp of data item X.
o The TS protocol ensures freedom from deadlock, which means no transaction ever waits.
o But the schedule may not be recoverable and may not even be cascade-free.
e) What are distributed database? List advantage and disadvantage of data
replication and data fragmentation.
Features
Databases in the collection are logically interrelated with each other. Often they represent a
single logical database.
Data is physically stored across multiple sites. Data in each site can be managed by a DBMS
independent of the other sites.
The processors in the sites are connected via a network. They do not have any multiprocessor
configuration.
A distributed database is not a loosely connected file system.
A distributed database incorporates transaction processing, but it is not synonymous with a
transaction processing system.
Data Replication
Data replication is the process of storing separate copies of the database at two or more sites. It is a
popular fault tolerance technique of distributed databases.
Reliability − In case of failure of any site, the database system continues to work since a copy
is available at another site(s).
Reduction in Network Load − Since local copies of data are available, query processing can
be done with reduced network usage, particularly during prime hours. Data updating can be
done at non-prime hours.
Quicker Response − Availability of local copies of data ensures quick query processing and
consequently quick response time.
Simpler Transactions − Transactions require less number of joins of tables located at different
sites and minimal coordination across the network. Thus, they become simpler in nature.
Disadvantages of Data Replication
Data replication requires more storage and makes updates more costly and complex, since every copy of a data item must be kept consistent.
Commonly used replication techniques include:
Snapshot replication
Near-real-time replication
Pull replication
Fragmentation
Fragmentation is the task of dividing a table into a set of smaller tables. The subsets of the table are
called fragments. Fragmentation can be of three types: horizontal, vertical, and hybrid (combination
of horizontal and vertical). Horizontal fragmentation can further be classified into two techniques:
primary horizontal fragmentation and derived horizontal fragmentation.
Fragmentation should be done in a way so that the original table can be reconstructed from the
fragments. This is needed so that the original table can be reconstructed from the fragments whenever
required. This requirement is called “reconstructiveness.”
Advantages of Fragmentation
Since data is stored close to the site of usage, efficiency of the database system is increased.
Local query optimization techniques are sufficient for most queries since data is locally
available.
Since irrelevant data is not available at the sites, security and privacy of the database system
can be maintained.
Disadvantages of Fragmentation
When data from different fragments are required, the access times may be very high.
In case of recursive fragmentations, the job of reconstruction will need expensive techniques.
Lack of back-up copies of data in different sites may render the database ineffective in case of
failure of a site.
Vertical Fragmentation
In vertical fragmentation, the fields or columns of a table are grouped into fragments. In order to
maintain reconstructiveness, each fragment should contain the primary key field(s) of the table.
Vertical fragmentation can be used to enforce privacy of data.
For example, let us consider that a University database keeps records of all registered students in a
Student table having the following schema.
STUDENT
Now, the fees details are maintained in the accounts section. In this case, the designer will fragment
the database as follows −
Horizontal Fragmentation
Horizontal fragmentation groups the tuples of a table in accordance to values of one or more fields.
Horizontal fragmentation should also conform to the rule of reconstructiveness. Each horizontal
fragment must have all columns of the original base table.
For example, in the student schema, if the details of all students of Computer Science Course needs to
be maintained at the School of Computer Science, then the designer will horizontally fragment the
database as follows −
CREATE TABLE COMP_STD AS
SELECT * FROM STUDENT
WHERE COURSE = 'Computer Science';
Hybrid Fragmentation
In hybrid fragmentation, a combination of horizontal and vertical fragmentation techniques is used.
This is the most flexible fragmentation technique since it generates fragments with minimal
extraneous information. However, reconstruction of the original table is often an expensive task.
At first, generate a set of horizontal fragments; then generate vertical fragments from one or
more of the horizontal fragments.
At first, generate a set of vertical fragments; then generate horizontal fragments from one or
more of the vertical fragments.
SECTION-C
INNER JOIN
LEFT JOIN
RIGHT JOIN
FULL JOIN
Consider the two tables below:
Student
StudentCourse
1. INNER JOIN: This join returns the rows from both tables for which the join condition is satisfied.
Syntax:
SELECT table1.column1,table1.column2,table2.column1,....
FROM table1
INNER JOIN table2
ON table1.matching_column = table2.matching_column;
Note: We can also write JOIN instead of INNER JOIN. JOIN is same as INNER JOIN.
Example Queries (INNER JOIN):
This query will show the names and age of students enrolled in different courses.
SELECT StudentCourse.COURSE_ID, Student.NAME, Student.AGE
FROM Student
INNER JOIN StudentCourse
ON Student.ROLL_NO = StudentCourse.ROLL_NO;
Output:
2. LEFT JOIN: This join returns all the rows of the table on the left side of the join and matching
rows for the table on the right side of join. The rows for which there is no matching row on right
side, the result-set will contain null. LEFT JOIN is also known as LEFT OUTER JOIN.
Syntax:
SELECT table1.column1,table1.column2,table2.column1,....
FROM table1
LEFT JOIN table2
ON table1.matching_column = table2.matching_column;
Note: We can also use LEFT OUTER JOIN instead of LEFT JOIN, both are same.
Example Queries(LEFT JOIN):
SELECT Student.NAME,StudentCourse.COURSE_ID
FROM Student
LEFT JOIN StudentCourse
ON StudentCourse.ROLL_NO = Student.ROLL_NO;
Output:
3. RIGHT JOIN: RIGHT JOIN is similar to LEFT JOIN. This join returns all the rows of the table
on the right side of the join and matching rows for the table on the left side of join. The rows for
which there is no matching row on left side, the result-set will contain null. RIGHT JOIN is also
known as RIGHT OUTER JOIN.
Syntax:
SELECT table1.column1,table1.column2,table2.column1,....
FROM table1
RIGHT JOIN table2
ON table1.matching_column = table2.matching_column;
Example Queries (RIGHT JOIN):
SELECT Student.NAME,StudentCourse.COURSE_ID
FROM Student
RIGHT JOIN StudentCourse
ON StudentCourse.ROLL_NO = Student.ROLL_NO;
Output:
4. FULL JOIN: FULL JOIN creates the result-set by combining result of both LEFT JOIN and
RIGHT JOIN. The result-set will contain all the rows from both the tables. The rows for which
there is no matching, the result-set will contain NULL values.
Syntax:
SELECT table1.column1,table1.column2,table2.column1,....
FROM table1
FULL JOIN table2
ON table1.matching_column = table2.matching_column;
Example Queries (FULL JOIN):
SELECT Student.NAME,StudentCourse.COURSE_ID
FROM Student
FULL JOIN StudentCourse
ON StudentCourse.ROLL_NO = Student.ROLL_NO;
Output:
b) Discuss the following terms (i) DDL command (ii) DML Command
SQL Commands
o SQL commands are instructions used to communicate with the database. They are also used to perform specific tasks, functions, and queries on data.
o SQL can perform various tasks like creating a table, adding data to tables, dropping a table, modifying a table, and setting permissions for users.
o CREATE
o ALTER
o DROP
o TRUNCATE
Syntax:
Example:
b. DROP: It is used to delete both the structure and the records stored in the table.
Syntax
DROP TABLE table_name;
Example
c. ALTER: It is used to alter the structure of the database. This change could be either to modify the
characteristics of an existing attribute or probably to add a new attribute.
Syntax:
EXAMPLE
d. TRUNCATE: It is used to delete all the rows from the table and free the space containing the table.
Syntax:
Example:
o INSERT
o UPDATE
o DELETE
a. INSERT: The INSERT statement is a SQL query. It is used to insert data into the row of a table.
Syntax:
Or
For example:
b. UPDATE: This command is used to update or modify the value of a column in the table.
Syntax:
UPDATE table_name SET [column_name1 = value1, ... column_nameN = valueN] [WHERE CONDITION]
For example:
UPDATE students
SET User_Name = 'Sonoo'
WHERE Student_Id = '3'
c. DELETE: It is used to remove one or more rows from a table.
Syntax:
DELETE FROM table_name [WHERE condition];
For example:
Relational Calculus
o Relational calculus is a non-procedural query language. In a non-procedural query language, the user is not concerned with the details of how to obtain the end results.
o The relational calculus tells what to do but never explains how to do it.
Notation: {T | P(T)}
where T is the set of resulting tuples and P(T) is the condition used to fetch T.
For example:
{ T.name | AUTHOR(T) AND T.article = 'database' }
OUTPUT: This query selects the tuples from the AUTHOR relation. It returns a tuple with 'name'
from Author who has written an article on 'database'.
TRC (tuple relation calculus) can be quantified. In TRC, we can use Existential (∃) and Universal
Quantifiers (∀).
For example:
{ R | ∃T ∈ Authors (T.article = 'database' AND R.name = T.name) }
Output: This query will yield the same result as the previous one.
Domain Relational Calculus (DRC): in DRC, the filtering variable uses the domain of attributes of the
relation instead of entire tuple values.
o It uses Existential (∃) and Universal Quantifiers (∀) to bind the variables.
Notation:
{ a1, a2, ..., an | P(a1, a2, ..., an) }
Where a1, a2, ..., an are attributes of the relation and P is the condition or formula built from them.
For example:
{ <article, page, subject> | ∈ javatpoint ∧ subject = 'database' }
Output: This query will yield the article, page, and subject from the relation javatpoint, where the
subject is 'database'.
Multivalued Dependency
o Multivalued dependency occurs when two attributes in a table are independent of each other
but, both depend on a third attribute.
o A multivalued dependency consists of at least two attributes that are dependent on a third
attribute that's why it always requires at least three attributes.
Example: Suppose there is a bike manufacturer company which produces two colors (white and black)
of each model every year.
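An illustrative instance of such a table (the model names and years here are assumed, not taken from the source) is:
BIKE_MODEL   MANUF_YEAR   COLOR
M1001        2007         White
M1001        2007         Black
M2012        2008         White
M2012        2008         Black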
Here columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and independent of
each other.
In this case, these two columns are said to be multivalued dependent on BIKE_MODEL. The
representation of these dependencies is shown below:
1. BIKE_MODEL → → MANUF_YEAR
2. BIKE_MODEL → → COLOR
Trigger
Triggers are stored programs, which are automatically executed or fired when some events occur.
Triggers are, in fact, written to be executed in response to any of the following events −
• A database manipulation (DML) statement (DELETE, INSERT, or UPDATE)
• A database definition (DDL) statement (CREATE, ALTER, or DROP)
• A database operation (SERVERERROR, LOGON, LOGOFF, STARTUP, or SHUTDOWN)
Benefits of Triggers
Triggers can be written for purposes such as generating derived column values automatically, enforcing
referential integrity, event logging, auditing, imposing security authorizations, and preventing invalid
transactions.
Creating Triggers
Example
To start with, we will be using the CUSTOMERS table we had created and used in the previous
chapters −
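The trigger itself is not reproduced in the text. A sketch of a typical row-level trigger on the CUSTOMERS table that would produce the salary output discussed below (written in PL/SQL, assuming CUSTOMERS has ID and SALARY columns) is:
CREATE OR REPLACE TRIGGER display_salary_changes
BEFORE DELETE OR INSERT OR UPDATE ON customers
FOR EACH ROW
WHEN (NEW.id > 0)
DECLARE
   sal_diff NUMBER;
BEGIN
   -- :OLD and :NEW give the row values before and after the change
   sal_diff := :NEW.salary - :OLD.salary;
   dbms_output.put_line('Old salary: ' || :OLD.salary);
   dbms_output.put_line('New salary: ' || :NEW.salary);
   dbms_output.put_line('Salary difference: ' || sal_diff);
END;
/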
When the above code is executed at the SQL prompt, it produces the following result −
Trigger created.
OLD and NEW references are not available for table-level triggers, rather you can use them for
record-level triggers.
If you want to query the table in the same trigger, then you should use the AFTER keyword,
because triggers can query the table or change it again only after the initial changes are applied
and the table is back in a consistent state.
The above trigger has been written in such a way that it will fire before any DELETE or
INSERT or UPDATE operation on the table, but you can write your trigger on a single or
multiple operations, for example BEFORE DELETE, which will fire whenever a record will
be deleted using the DELETE operation on the table.
Triggering a Trigger
Let us perform some DML operations on the CUSTOMERS table. Here is one INSERT statement,
which will create a new record in the table −
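The statement itself is not shown; a representative INSERT (the column names and values are assumed, with the salary of 7500 chosen to match the trigger output below) would be:
INSERT INTO customers (id, name, age, address, salary)
VALUES (7, 'Kriti', 22, 'HP', 7500.00);
Executing it fires the trigger, producing output like: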
Old salary:
New salary: 7500
Salary difference:
Because this is a new record, old salary is not available and the above result comes as null. Let us now
perform one more DML operation on the CUSTOMERS table. The UPDATE statement will update
an existing record in the table −
UPDATE customers
SET salary = salary + 500
WHERE id = 2;
Ans 5(a): Any transaction must maintain the ACID properties, viz. Atomicity, Consistency, Isolation,
and Durability.
Atomicity − This property states that a transaction is an atomic unit of processing, that is,
either it is performed in its entirety or not performed at all. No partial update should exist.
Consistency − A transaction should take the database from one consistent state to another
consistent state. It should not adversely affect any data item in the database.
Isolation − A transaction should be executed as if it is the only one in the system. There should
not be any interference from the other concurrent transactions that are simultaneously running.
Durability − If a committed transaction brings about a change, that change should be durable
in the database and not lost in case of any failure.
Example: Let Ti be a transaction that transfers $50 from account A to account B. This transaction can
be defined as
Ti: read(A);
A := A − 50;
write(A);
read(B);
B := B + 50;
write(B).
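Rendered in SQL (an illustrative sketch, assuming an account table with columns acc_no and balance), the same transaction is:
-- transfer $50 from account A to account B as one atomic unit
BEGIN;
UPDATE account SET balance = balance - 50 WHERE acc_no = 'A';
UPDATE account SET balance = balance + 50 WHERE acc_no = 'B';
COMMIT;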
Let us now consider each of the ACID requirements. (For ease of presentation, we consider them in an
order different from the order A-C-I-D).
Consistency: The consistency requirement here is that the sum of A and B be unchanged by the
execution of the transaction. Without the consistency requirement, money could be created or
destroyed by the transaction! It can be verified easily that, if the database is consistent before an
execution of the transaction, the database remains consistent after the execution of the transaction.
Ensuring consistency for an individual transaction is the responsibility of the application
programmer who codes the transaction. This task may be facilitated by automatic testing of integrity
constraints.
Atomicity: Suppose that, just before the execution of transaction Ti the values of accounts A and B
are $1000 and $2000, respectively. Now suppose that, during the execution of transaction Ti, a
failure occurs that prevents Ti from completing its execution successfully. Examples of such
failures include power failures, hardware failures, and software errors. Further, suppose that the
failure happened after the write(A) operation but before the write(B) operation. In this case, the
values of accounts A and B reflected in the database are $950 and $2000. The system destroyed $50
as a result of this failure. In particular, we note that the sum A + B is no longer preserved. Thus,
because of the failure, the state of the system no longer reflects a real state of the world that the
database is supposed to capture. We term such a state an inconsistent state. We must ensure that such
inconsistencies are not visible in a database system. Note, however, that the system must at some
point be in an inconsistent state. Even if transaction Ti is executed to completion, there exists a
point at which the value of account A is $950 and the value of account B is $2000, which is clearly
an inconsistent state. This state, however, is eventually replaced by the consistent state where the
value of account A is $950, and the value of account B is $2050. Thus, if the transaction never
started or was guaranteed to complete, such an inconsistent state would not be visible except during
the execution of the transaction. That is the reason for the atomicity requirement: If the atomicity
property is present, all actions of the transaction are reflected in the database, or none are. The basic
idea behind ensuring atomicity is this: The database system keeps track (on disk) of the old values
of any data on which a transaction performs a write, and, if the transaction does not complete its
execution, the database system restores the old values to make it appear as though the transaction
never executed. Ensuring atomicity is the responsibility of the database system itself; specifically, it
is handled by a component called the transaction-management component.
Durability: Once the execution of the transaction completes successfully, and the user who initiated
the transaction has been notified that the transfer of funds has taken place, it must be the case that
no system failure will result in a loss of data corresponding to this transfer of funds. The durability
property guarantees that, once a transaction completes successfully, all the updates that it carried out
on the database persist, even if there is a system failure after the transaction completes execution.
Isolation: Even if the consistency and atomicity properties are ensured for each transaction, if
several transactions are executed concurrently, their operations may interleave in some undesirable
way, resulting in an inconsistent state.
For example, as we saw earlier, the database is temporarily inconsistent while the transaction to
transfer funds from A to B is executing, with the deducted total written to A and the increased total
yet to be written to B. If a second concurrently running transaction reads A and B at this intermediate
point and computes A+B, it will observe an inconsistent value. Furthermore, if this second
transaction then performs updates on A and B based on the inconsistent values that it read, the
database may be left in an inconsistent state even after both transactions have completed.
Deadlock Prevention
To prevent any deadlock situation in the system, the DBMS aggressively inspects all the operations
that transactions are about to execute. The DBMS inspects the operations and analyzes whether they
can create a deadlock situation. If it finds that a deadlock situation might occur, then that transaction
is never allowed to execute.
There are deadlock prevention schemes that use timestamp ordering mechanism of transactions in
order to predetermine a deadlock situation.
Wait-Die Scheme
In this scheme, if a transaction requests to lock a resource (data item), which is already held with a
conflicting lock by another transaction, then one of the two possibilities may occur −
If TS(Ti) < TS(Tj) − that is, Ti, which is requesting a conflicting lock, is older than Tj − then
Ti is allowed to wait until the data item is available.
If TS(Ti) > TS(Tj) − that is, Ti is younger than Tj − then Ti dies. Ti is restarted later with a
random delay but with the same timestamp.
This scheme allows the older transaction to wait but kills the younger one.
Wound-Wait Scheme
In this scheme, if a transaction requests to lock a resource (data item) which is already held with a
conflicting lock by another transaction, one of the two possibilities may occur −
If TS(Ti) < TS(Tj), then Ti forces Tj to be rolled back − that is Ti wounds Tj. Tj is restarted later
with a random delay but with the same timestamp.
If TS(Ti) > TS(Tj), then Ti is forced to wait until the resource is available.
This scheme allows the younger transaction to wait; but when an older transaction requests an item
held by a younger one, the older transaction forces the younger one to abort and release the item.
In both the cases, the transaction that enters the system at a later stage is aborted.
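In summary (Ti is the requesting transaction and Tj holds the conflicting lock):
Scheme        TS(Ti) < TS(Tj) (Ti older)           TS(Ti) > TS(Tj) (Ti younger)
Wait-Die      Ti waits for the data item           Ti dies and is restarted with the same timestamp
Wound-Wait    Ti wounds (rolls back) Tj            Ti waits for the data item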
(b). Consider the following relational database. Give an expression in SQL for each of the
following queries. Underlined attributes are the primary keys. (The schema itself is not reproduced
here; the answers below assume the usual employee(person_name, street, city), works(person_name,
company_name, salary), company(company_name, city), and manages(person_name, manager_name)
relations.)
i) Find the names of all employees who work for the ABC Bank
select employee.person_name
from employee, works
where employee.person_name = works.person_name and works.company_name = 'ABC Bank';
ii) Find the names of all employees who live in the same city and on the same street as do their
managers.
select p.person_name
from employee p, manages m, employee r
where p.person_name = m.person_name and m.manager_name = r.person_name
  and p.street = r.street and p.city = r.city;
iii) Find the names, street address, and cities of residence for all employees who work for 'ABC
Bank' and earn more than 7,000 per annum.
select employee.person_name, employee.street, employee.city
from employee, works
where employee.person_name = works.person_name
  and works.company_name = 'ABC Bank' and works.salary > 7000;
iv) Find the names of all employees in the database who earn more than every employee of
XYZ.
select person_name
from works
where salary > all (select salary from works where company_name = 'XYZ');
vi) Delete all tuples in the works relation for employees of ABC
delete from works
where company_name = 'ABC';
vii) Find the names of all employees in the database who live in the same cities as the companies
for which they work.
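A possible answer under the same schema assumption stated above:
select e.person_name
from employee e, works w, company c
where e.person_name = w.person_name
  and w.company_name = c.company_name
  and e.city = c.city;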
Ques:7:
Directory Systems
In the pre-computerization days, organizations would create physical directories of employees and
distribute them across the organization. In general, a directory is a listing of information about some
class of objects such as persons. Directories can be used to find information about a specific object, or
in the reverse direction to find objects that meet a certain requirement. In the world of physical
telephone directories, directories that satisfy lookups in the forward direction are called white pages,
while directories that satisfy lookups in the reverse direction are called yellow pages.
Several directory access protocols have been developed to provide a standardized way of accessing
data in a directory. The most widely used among them today is the Lightweight Directory Access
Protocol (LDAP).
The reasons for using a specialized protocol for accessing directory information are:
• First, directory access protocols are simplified protocols that cater to a limited type of access to data.
They evolved in parallel with the database access protocols.
• Second, and more important, directory systems provide a simple mechanism to name objects in a
hierarchical fashion, similar to file system directory names, which can be used in a distributed
directory system to specify what information is stored in each of the directory servers.
LDAP: Lightweight Directory Access Protocol
A directory system is implemented as one or more servers, which service multiple clients. Clients use
the application programming interface defined by the directory system to communicate with the directory
servers. Directory access protocols also define a data model and access control.
The X.500 directory access protocol, defined by the International Organization for Standardization
(ISO), is a standard for accessing directory information. However, the protocol is rather complex, and
is not widely used. The Lightweight Directory Access Protocol (LDAP) provides many of the X.500
features, but with less complexity, and is widely used.
(b) Give the following queries in the relational algebra using the relational schema
lecturer( code=cs1500(subject))