DE Module 5: Transaction Processing
The main advantage of storing data in an integrated repository or database is that it allows the data to be shared among multiple users, i.e., several users can access the database or perform transactions at the same time. Running several user programs concurrently also keeps the CPU utilized efficiently.
Transaction
A transaction is defined as the unit of work in a database system: a unit of data processing that involves the manipulation of data values in the database.
A database system that deals with a large number of transactions is known as a Transaction Processing System.
Transaction States
A transaction is an atomic unit of work that is either entirely completed or not done at all. For recovery purposes, the system needs to keep track of when each transaction starts, terminates, and commits or aborts. The basic operations are:
i. BEGIN_TRANSACTION. This marks the beginning of transaction execution.
ii. READ or WRITE. These specify read or write operations on the database items that are executed as
part of a transaction.
iii. END_TRANSACTION. This specifies that READ and WRITE transaction operations have ended
and marks the end of transaction execution.
iv. COMMIT_TRANSACTION. This signals a successful end of the transaction so that any changes (updates) executed by the transaction can be safely committed to the database and will not be undone.
v. ROLLBACK (or ABORT). This signals that the transaction has ended unsuccessfully, so that any
changes or effects that the transaction may have applied to the database must be undone.
When a transaction starts, it enters the active state, in which it issues its read/write operations.
When it ends, it moves to the partially committed state.
It is then checked whether the transaction completed successfully: if so, it enters the committed state; otherwise it goes to the failed state.
The terminated state indicates that the transaction has left the system.
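The state transitions described above can be summarized in a small model. The following Python sketch is purely illustrative: the state names follow the text, while the transition table and function name are assumptions made for this example.

```python
from enum import Enum

class TxnState(Enum):
    ACTIVE = 1
    PARTIALLY_COMMITTED = 2
    COMMITTED = 3
    FAILED = 4
    TERMINATED = 5

# Legal transitions, following the description above (illustrative).
TRANSITIONS = {
    TxnState.ACTIVE: {TxnState.PARTIALLY_COMMITTED, TxnState.FAILED},
    TxnState.PARTIALLY_COMMITTED: {TxnState.COMMITTED, TxnState.FAILED},
    TxnState.COMMITTED: {TxnState.TERMINATED},
    TxnState.FAILED: {TxnState.TERMINATED},  # after rollback
    TxnState.TERMINATED: set(),
}

def move(state, new_state):
    """Advance a transaction to new_state, enforcing the diagram above."""
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state
```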
Properties of a Transaction (ACID)
Transactions should possess several properties, often called the ACID properties; these should be enforced by the concurrency control and recovery methods of the DBMS. The following are the ACID properties:
i. Atomicity:
A transaction is an atomic unit of processing; it should either be performed in its entirety or not
performed at all.
ii. Consistency preservation:
A complete transaction execution should take the database from one consistent state to another consistent state. Even if a transaction fails, the database should return to its previous consistent state.
iii. Isolation or Independence:
The updates of a transaction should not be visible to other transactions until they are committed. Isolation guarantees that the progress of other transactions does not affect the outcome of the current transaction. Thus, each transaction is unaware of the other transactions executing concurrently in the system; a transaction should appear as though it is being executed in isolation, even though many transactions are executing concurrently.
iv. Durability:
The changes applied to the database by a committed transaction must persist in the database.
These changes must not be lost because of any failure.
Schedules
A schedule (or history) S of n transactions T1, T2, ..., Tn is an ordering of the operations of the transactions, subject to the constraint that, for each transaction Ti that participates in S, the operations of Ti must appear in the same order in which they occur in Ti. The order of operations in S is considered to be a total ordering, meaning that for any two operations in the schedule, one must occur before the other. A shorthand notation for describing a schedule uses the symbols b, r, w, e, c, and a for the operations begin_transaction, read_item, write_item, end_transaction, commit, and abort, respectively.
Thus a schedule S is defined as the sequential ordering of the operations of the n interleaved transactions
which maintains the order of operations within the individual transaction.
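For example (an illustrative schedule, not one taken from a figure), two interleaved transactions T1 and T2 that both access item X can be written as:

    Sa: b1; r1(X); b2; r2(X); w1(X); c1; w2(X); c2;

The subscript identifies the transaction, so r2(X) is T2's read of X; the operations of each Ti appear in the same relative order as within Ti itself.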
Conflict:
Two operations conflict if they belong to different transactions, access the same data item, and at least one of them is a write operation.
Two schedules are said to be conflict equivalent if one schedule can be transformed into the other by a series of swaps of non-conflicting operations, i.e., the order of any two conflicting operations is the same in both schedules.
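The conflict test is mechanical enough to state as code. A minimal sketch, assuming operations are represented as (transaction, action, item) tuples — a representation chosen here for illustration, not a standard one:

```python
def conflicts(op1, op2):
    """Two operations conflict iff they come from different transactions,
    access the same data item, and at least one of them is a write."""
    txn1, action1, item1 = op1
    txn2, action2, item2 = op2
    return txn1 != txn2 and item1 == item2 and 'w' in (action1, action2)

# r1(X) and w2(X) conflict; r1(X) and r2(X) do not.
assert conflicts(('T1', 'r', 'X'), ('T2', 'w', 'X'))
assert not conflicts(('T1', 'r', 'X'), ('T2', 'r', 'X'))
```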
Concurrent Transaction:
A DBMS supports a multi-user environment and thus allows multiple transactions to proceed simultaneously. Concurrency problems occur only if two transactions contend for the same data items and at least one of the concurrent transactions wishes to update a data value.
Multiple transactions that proceed simultaneously are known as concurrent transactions.
Advantages:
Improved throughput and resource utilization
Reduced waiting time.
Problems of Concurrent Transactions:
1. Lost Update
It occurs when two transactions that access the same database items have their operations interleaved in such a way that the value of some database item becomes incorrect.
Example (X is initially 100):

Time  T1            T2            Value of X
t1    read(X)                     100
t2                  read(X)       100
t3    X = X + 10                  110 (local to T1)
t4                  X = X + 20    120 (local to T2)
t5    write(X)                    110
t6                  write(X)      120 (overwrites X; T1's update is lost)

The final value of X is 120 instead of the correct 130, because T2 read X before T1 wrote its update.
2. Temporary Update (Dirty Read)
It occurs when one transaction updates a database item and then fails; meanwhile the updated item is read by another transaction before it is changed back to its original value.
Example (X is initially 100):

Time  T1              T2            Value of X
t1    read(X)                       100
t2    X = X + 10                    110
t3    write(X)                      110
t4                    read(X)       110 (temporary value)
t5                    X = X + 20    130
t6    fails and       write(X)      130 (incorrect value)
      rolls back

T1 fails and rolls X back to 100, but T2 has already read the temporary value 110, so the value 130 it writes is incorrect.
3. Unrepeatable Read
It occurs when a transaction reads the same item twice and the item is changed by another transaction between the two reads, so the transaction receives different values for the same item.
Example (X is initially 100):

Time  T1          T2            Value of X
t1    read(X)                   100
t2                read(X)       100
t3                X = X + 10    110
t4                write(X)      110
t5    read(X)                   110 (a different value from the first read)
4. Incorrect Summary
It occurs while calculating aggregates: if one transaction is calculating an aggregate summary function on a number of database items while other transactions are updating some of those items, the aggregate function may use some values before they are updated and others after they are updated.
Example (T1 computes an aggregate; initially A = 100, B = 200, M = 500):

Time  T1               T2            Value
t1    sum = 0                        0
t2    read(A)                        100
t3    sum = sum + A                  100
t4    read(B)                        200 (old value)
t5                     read(M)       500
t6                     M = M + 10    510
t7    read(M)                        510 (new value)
t8    sum = sum + M
t9    read(N)
t10                    read(B)       200
t11                    B = B - 10    190

The summary computed by T1 uses the old value of B (read at t4, before T2's update) but the new value of M (read at t7, after T2's update), so it corresponds to no serial execution of T1 and T2.
Serializability Testing
Using precedence graph notation, an algorithm can be devised to determine whether an interleaved schedule is serializable or not. A precedence graph (or serialization graph) is a directed graph G = (N, E) that consists of a set of nodes N = {T1, T2, ..., Tn} and a set of directed edges E = {e1, e2, ..., em}.
The transactions of the schedule are the nodes. An edge from Ti to Tj means that there is a pair of conflicting operations between Ti and Tj and that Ti's operation precedes Tj's.
Each edge ei in the graph is of the form (Tj → Tk), 1 ≤ j ≤ n, 1 ≤ k ≤ n, where Tj is the starting node and Tk is the ending node of ei.
A serializable schedule is one whose precedence graph contains no cycle. If there is any cycle in the graph, the schedule is not serializable; otherwise, an equivalent serial schedule is found by traversing the nodes topologically, starting from a node that has no incoming edge.
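The cycle test and the topological traversal can be made concrete. A minimal sketch, reusing the illustrative (transaction, action, item) representation of a schedule introduced earlier; this is a sketch under those assumptions, not a standard interface:

```python
from collections import defaultdict

def precedence_graph(schedule):
    """Edges Ti -> Tj for each conflicting pair where Ti's op comes first."""
    edges = defaultdict(set)
    for i, (t1, a1, x1) in enumerate(schedule):
        for t2, a2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and 'w' in (a1, a2):
                edges[t1].add(t2)
    return edges

def serial_order(schedule):
    """An equivalent serial order of the transactions, or None if the
    precedence graph has a cycle (i.e., the schedule is not serializable)."""
    edges = precedence_graph(schedule)
    nodes = {t for t, _, _ in schedule}
    indeg = {t: 0 for t in nodes}
    for t in edges:
        for u in edges[t]:
            indeg[u] += 1
    order = []
    ready = [t for t in nodes if indeg[t] == 0]   # nodes with no input edge
    while ready:
        t = ready.pop()
        order.append(t)
        for u in edges[t]:
            indeg[u] -= 1
            if indeg[u] == 0:
                ready.append(u)
    return order if len(order) == len(nodes) else None
```

On the first example below, serial_order should return None (the graph has a cycle); on the second, the serial order T3, T1, T2.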
Example:

Time  T1          T2          T3
t1                read(Z)
t2                read(Y)
t3                write(Y)
t4                            read(Y)
t5                            read(Z)
t6    read(X)
t7    write(X)
t8                            write(Y)
t9                            write(Z)
t10               read(X)
t11   read(Y)
t12   write(Y)
t13               write(X)
The precedence graph has the edges T1 → T2 (on X), T2 → T1 (on Y), T2 → T3 (on Y and Z), and T3 → T1 (on Y). It contains the cycle T1 → T2 → T1, formed by X(T1 → T2) and Y(T2 → T1), so the schedule is not serializable.
Example:

Time  T1          T2          T3
t1                            read(Y)
t2                            read(Z)
t3    read(X)
t4    write(X)
t5                            write(Y)
t6                            write(Z)
t7                read(Z)
t8    read(Y)
t9    write(Y)
t10               read(Y)
t11               write(Y)
t12               read(X)
t13               write(X)
The precedence graph has the edges T1 → T2 (on X and Y), T3 → T1 (on Y), and T3 → T2 (on Y and Z). It contains no cycle, so the schedule is serializable; the equivalent serial order is T3, then T1, then T2.
CONCURRENCY CONTROL
One of the fundamental properties of a transaction is isolation. When several transactions execute
concurrently in the database, however, the isolation property may no longer be preserved. To ensure that
it is, the system must control the interaction among the concurrent transactions; this control is achieved
through one of a variety of mechanisms called concurrency-control schemes.
Most concurrency-control techniques ensure serializability of schedules by using concurrency control protocols.
One important set of protocols—known as two-phase locking protocols—employs the technique of locking data items to prevent multiple transactions from accessing the items concurrently.
Another set of concurrency control protocols uses timestamps. A timestamp is a unique identifier for each transaction, generated by the system. Timestamp values are generated in the same order as the transaction start times.
Multiversion concurrency control protocols use multiple versions of a data item. One multiversion protocol extends timestamp ordering to multiversion timestamp ordering, and another extends two-phase locking.
Another concurrency control protocol is based on the concept of validation or certification of a
transaction after it executes its operations; these are sometimes called optimistic protocols, and also
assume that multiple versions of a data item can exist.
1. LOCKING
A lock is a variable associated with a data item that describes the status of the item with respect to the possible operations that can be applied to it. Locks can be placed by a transaction on the shared resources that it desires to use.
The lock manager of the DBMS controls and stores lock information. Locking ensures serializability of executing transactions.
Types of Locks
Several types of locks are used in concurrency control.
i. Binary Locks
A binary lock can have two states or values: locked and unlocked.
They are simple, but are also too restrictive for database concurrency control purposes.
If the value of the lock on X is 1, item X cannot be accessed by a database operation that
requests the item. If the value of the lock on X is 0, the item can be accessed when
requested, and the lock value is changed to 1.
At most one transaction can hold the lock on a particular item. Thus no two transactions
can access the same item concurrently.
Obtaining all the locks at the beginning of a transaction and releasing them at the end ensures that transactions are executed with no concurrency problems, but such a scheme limits concurrency.
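As a toy model of a binary lock, consider the following sketch; the lock table and the lock_item/unlock_item names are illustrative assumptions, not a real DBMS interface:

```python
import threading

lock_table = {}                      # item -> transaction holding its lock
_cond = threading.Condition()        # protects the lock table itself

def lock_item(txn, item):
    """Block until LOCK(item) = 0 (unlocked), then set LOCK(item) = 1."""
    with _cond:
        while item in lock_table:    # LOCK(item) = 1: wait
            _cond.wait()
        lock_table[item] = txn       # LOCK(item) = 1, held by txn

def unlock_item(txn, item):
    """Set LOCK(item) = 0 and wake up waiting transactions."""
    with _cond:
        if lock_table.get(item) == txn:
            del lock_table[item]     # LOCK(item) = 0
            _cond.notify_all()
```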
Compatibility matrix:
o It represents the compatibility relation between the two modes of locking: shared (read) mode and exclusive (write) mode.
o An element comp(A, B) of the matrix has the value true if and only if mode A is compatible with mode B.
o For shared (S) and exclusive (X) locks the matrix is:

          S        X
    S     true     false
    X     false    false

Two shared locks on the same item are compatible; an exclusive lock is incompatible with every other lock.
Two-Phase Locking
It is a protocol used to achieve serializability in a schedule. It consists of two phases:
i. Growing / Lock Acquisition Phase
- In this phase, new locks on items can be acquired by the transaction.
- No conflicting locks are granted to the requesting transaction.
- New locks can be acquired, but no lock can be released until all the locks required by the transaction have been obtained.
ii. Shrinking / Lock Release Phase
- Existing locks can be released in any order, but no new lock can be acquired after a lock has been released.
- The locks are held only as long as they are required.
The point in the schedule where the transaction has obtained its final lock (the end of its growing phase)
is called the lock point of the transaction.
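Building on the toy lock table above, the two-phase discipline itself can be sketched as follows; this is an illustration of the rule, not a real lock manager:

```python
class TwoPhaseTxn:
    """Enforces: once any lock is released, no new lock may be acquired."""
    def __init__(self, name):
        self.name = name
        self.shrinking = False       # True once the lock point has passed
        self.held = set()

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock after first unlock")
        lock_item(self.name, item)   # from the binary-lock sketch above
        self.held.add(item)

    def unlock(self, item):
        self.shrinking = True        # shrinking phase begins
        unlock_item(self.name, item)
        self.held.discard(item)
```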
Lock Conversion
It is the process by which a transaction that already holds a lock on item X is allowed, under certain conditions, to convert the lock from one locked state to another.
- Upgrading a lock (from read-locked to write-locked) must be done during the growing (expanding) phase.
- Downgrading a lock (from write-locked to read-locked) must be done in the shrinking phase.
Difference between conservative and rigorous 2PL:
- Conservative 2PL must lock all its items before it starts, so once the transaction starts it is in its
shrinking phase.
- Rigorous 2PL does not unlock any of its items until after it terminates (by committing or aborting),
so the transaction is in its expanding phase until it ends.
Deadlock
Deadlock occurs when each transaction T in a set of two or more transactions is waiting for some item that is locked by some other transaction in the set. It arises because of locking.
Deadlock occurs because of the following conditions:
i. Mutual exclusion
A resource can be exclusively locked by only one transaction.
ii. Non-preemptive locking
A data item can only be unlocked by the transaction that locked it.
iii. Partial allocation
A transaction can acquire locks on data items incrementally, rather than all at once.
iv. Circular waiting
Transactions lock part of the resources they need and wait to lock resources that are locked by other transactions.
To prevent deadlock, one has to ensure that at least one of these conditions does not hold.
Deadlock Prevention
Deadlock prevention protocols are used to prevent deadlock.
i. Advance Locking
It is the simplest approach, requiring every transaction to lock all needed items in advance, but it restricts concurrency.
The other schemes use the concept of a timestamp (TS), a system-generated unique number assigned in increasing order; a smaller timestamp therefore means an older transaction.
ii. Wait-die scheme
If Ti requests an item held by Tj, then:
    if TS(Ti) < TS(Tj) (i.e., Ti is older)
        Ti is allowed to wait
    else
        Ti aborts (dies).
iii. Wound-wait scheme (based on preemption)
If Ti requests an item held by Tj, then:
    if TS(Ti) > TS(Tj) (i.e., Ti is younger)
        Ti is allowed to wait
    else
        Ti wounds Tj and Tj aborts.
Both schemes end up aborting the younger of the two transactions that may be involved in a deadlock; a code sketch of both rules follows the list below.
The other schemes are based on waiting rules:
iv. No waiting
If a transaction is unable to obtain a lock, it aborts immediately.
v. Cautious waiting
If Ti is unable to lock an item because of Tj, then Ti checks whether Tj is itself blocked: if Tj is not blocked, Ti waits; otherwise Ti aborts.
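A compact sketch of the two timestamp-based prevention rules; transaction objects with a ts field and an abort() method are assumptions made for this illustration:

```python
def wait_die(ti, tj):
    """Ti requests an item held by Tj (non-preemptive scheme)."""
    if ti.ts < tj.ts:       # Ti is older: allowed to wait
        return "wait"
    ti.abort()              # Ti is younger: it dies
    return "die"

def wound_wait(ti, tj):
    """Ti requests an item held by Tj (preemptive scheme)."""
    if ti.ts > tj.ts:       # Ti is younger: allowed to wait
        return "wait"
    tj.abort()              # Ti is older: it wounds Tj
    return "wound"
```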
Deadlock Detection
The system checks whether a state of deadlock actually exists. The simplest way is to construct and maintain a wait-for graph, which is maintained by the lock manager of the DBMS. It consists of a set of vertices (nodes) and a set of edges (arcs). Each transaction is represented by a node. An edge from Ti to Tj exists if Tj holds a lock and Ti is waiting for it.
If Ti requests a data item currently held by Tj, the edge Ti → Tj is created; when Tj releases the item, the edge is removed.
Deadlock exists if and only if the wait-for graph contains a cycle. To detect this, the system periodically checks the graph for cycles. To break a deadlock, a victim transaction is chosen (for example, based on a timeout) and rolled back.
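The cycle check on the wait-for graph is a plain depth-first search. A minimal sketch, assuming the graph is a dict mapping each waiting transaction to the set of transactions it waits for:

```python
def has_deadlock(waits_for):
    """True iff the wait-for graph contains a cycle."""
    IN_PROGRESS, DONE = 1, 2
    state = {}

    def dfs(t):
        state[t] = IN_PROGRESS
        for u in waits_for.get(t, ()):
            if state.get(u) == IN_PROGRESS:      # back edge: cycle found
                return True
            if state.get(u) != DONE and dfs(u):
                return True
        state[t] = DONE
        return False

    return any(state.get(t) is None and dfs(t) for t in waits_for)

# T1 waits for T2 and T2 waits for T1: a deadlock.
assert has_deadlock({"T1": {"T2"}, "T2": {"T1"}})
```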
Starvation
Starvation is a problem that can occur with locking: a transaction cannot proceed for an indefinite period of time while other transactions continue normally. It occurs if the waiting scheme gives priority to some transactions over others.
Solutions to starvation include using a first-come, first-served waiting queue, or using priorities and increasing the priority of a transaction the longer it waits.
2. TIMESTAMP
Timestamp ordering is a way of ensuring serializability that does not depend on locks.
A timestamp is a unique identifier created by the DBMS to identify a transaction.
Typically, timestamp values are assigned in the order in which the transactions are submitted to the
system, so a timestamp can be thought of as the transaction start time.
It is denoted by TS.
If a transaction Ti has been assigned timestamp TS(Ti), and a new transaction Tj enters the system, then
TS(Ti) < TS(Tj ).
Implementation of Timestamp:
A timestamp can be implemented in either of two ways:
- as the value of the system clock when the transaction enters the system, or
- as the value of a logical counter that is incremented each time a new timestamp is assigned.
If TS(Ti) < TS(Tj ), then the system must ensure that the produced schedule is equivalent to a serial
schedule in which transaction Ti appears before transaction Tj .
A schedule in which the transactions participate is then serializable, and the only equivalent serial
schedule permitted has the transactions in order of their timestamp values. This is called timestamp
ordering (TO).
To implement this scheme, the protocol maintains with each data item X two timestamp values:
write_TS(X): denotes the largest timestamp of any transaction that executed write(X) successfully.
read_TS(X): denotes the largest timestamp of any transaction that executed read(X) successfully.
The Timestamp-Ordering Protocol
The timestamp-ordering protocol ensures that any conflicting read and write operations are executed in
timestamp order.
If transaction T attempts read(X):
    if TS(T) >= write_TS(X):
        perform the read and set
        read_TS(X) = max(read_TS(X), TS(T))
    else:
        roll back T and give it a new, larger timestamp.
If transaction T attempts write(X):
    if TS(T) >= read_TS(X):
        if TS(T) > write_TS(X):
            perform the write and set
            write_TS(X) = TS(T)
        else:
            ignore the write. This ignoring is known as the Thomas Write Rule.
    else:
        roll back T and give it a new, larger timestamp.
If a transaction Ti is rolled back by the concurrency-control scheme then the system assigns it a new
timestamp and restarts it.
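The read and write rules translate almost line for line into code. A minimal sketch, where the item_rts/item_wts tables and the RollbackTxn exception are assumptions made for this illustration (restarting with a new, larger timestamp is left to the caller):

```python
class RollbackTxn(Exception):
    """The transaction must be aborted and restarted with a larger TS."""

item_rts = {}   # X -> largest TS of any transaction that read X
item_wts = {}   # X -> largest TS of any transaction that wrote X

def to_read(ts, item, db):
    if ts < item_wts.get(item, 0):     # a younger txn already wrote X
        raise RollbackTxn(item)
    item_rts[item] = max(item_rts.get(item, 0), ts)
    return db[item]

def to_write(ts, item, value, db):
    if ts < item_rts.get(item, 0):     # a younger txn already read X
        raise RollbackTxn(item)
    if ts < item_wts.get(item, 0):     # obsolete write: Thomas Write Rule
        return                         # ignore and continue processing
    db[item] = value
    item_wts[item] = ts
```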
The protocol ensures freedom from deadlock, since no transaction ever waits.
There is a possibility of starvation of long transactions if a sequence of conflicting short transactions
causes repeated restarting of the long transaction.
The schedule may not be cascade free and may not even be recoverable.
We can make the schedule recoverable by adopting one of the following mechanisms:
o Recoverability and cascadelessness can be ensured by performing all writes together at the end of
the transaction.
The writes must be atomic in the following sense: While the writes are in progress, no
transaction is permitted to access any of the data items that have been written.
o Recoverability and cascadelessness can also be guaranteed by using a limited form of locking, whereby reads of uncommitted items are postponed until the transaction that updated the item commits.
o Recoverability alone can be ensured by tracking uncommitted writes, and allowing a transaction Ti to commit only after the commit of any transaction that wrote a value that Ti read.
Example:
Let T1, T2, T3, and T4 operate on items A and B, with
TS(T1) = 10, TS(T2) = 11, TS(T3) = 12, TS(T4) = 13,
and assume initially read_TS(A) = write_TS(A) = read_TS(B) = write_TS(B) = 0 (i.e., less than 10).
If T is aborted and rolled back, any transaction T1 that may have used a value written by T must also be rolled back. Similarly, any transaction T2 that may have used a value written by T1 must also be rolled back, and so on. This effect is known as cascading rollback, and it is a problem of the basic Timestamp Ordering (basic TO) protocol.
The rules for a write_item(X) operation of a transaction T can be restated as follows:
i. If read_TS(X) > TS(T), then abort and roll back T and reject the operation.
ii. If write_TS(X) > TS(T), then do not execute the write operation but continue processing. This is the case of an outdated or obsolete write: some transaction with a timestamp greater than TS(T)—and hence later than T in the timestamp ordering—has already written the value of X. Thus, we must ignore the write_item(X) operation of T because it is already outdated and obsolete. Notice that any conflict arising from this situation would be detected by case (i).
iii. If neither the condition in part (i) nor the condition in part (ii) occurs, then and only then execute the write_item(X) operation of T and set write_TS(X) to TS(T).
3. MULTIVERSION
In this method, several versions X1, X2, ..., Xk of each data item X are maintained.
Each version Xi contains three data fields:
i. Content is the value of version Xi.
ii. write_TS(Xi). The write timestamp of Xi is the timestamp of the transaction that created
version Xi.
iii. read_TS(Xi). The read timestamp of Xi is the largest of all the timestamps of transactions that
have successfully read version Xi.
Whenever a transaction T is allowed to execute a write_item(X) operation, a new version Xk+1 of item
X is created, with both the write_TS(Xk+1) and the read_TS(Xk+1) set to TS(T).
Correspondingly, when a transaction T is allowed to read the value of version Xi, the value of
read_TS(Xi) is set to the larger of the current read_TS(Xi) and TS(T).
Algorithm:
Suppose that transaction Ti issues a read_item(X) or write_item(X) operation. Let Xi be the version of X whose write_TS(Xi) is the largest among all versions of X that is also less than or equal to TS(Ti). The rules are as follows:
- If Ti issues read_item(X), then the value returned is the content of version Xi, and read_TS(Xi) is set to the larger of TS(Ti) and the current read_TS(Xi).
- If Ti issues write_item(X) and TS(Ti) < read_TS(Xi), then the system rolls back Ti.
- On the other hand, if TS(Ti) = write_TS(Xi), the system overwrites the contents of Xi; otherwise it creates a new version Xj of X with read_TS(Xj) = write_TS(Xj) = TS(Ti).
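A minimal sketch of these rules, reusing the RollbackTxn exception from the timestamp-ordering sketch; versions are modeled here as [content, write_TS, read_TS] lists, and each item is assumed to start with one initial version whose timestamps are 0:

```python
versions = {'X': [['x0', 0, 0]]}   # item -> list of [content, wts, rts]

def pick_version(item, ts):
    """The version with the largest write_TS that is <= TS(Ti)."""
    return max((v for v in versions[item] if v[1] <= ts),
               key=lambda v: v[1])

def mv_read(ts, item):
    v = pick_version(item, ts)
    v[2] = max(v[2], ts)               # bump read_TS of that version
    return v[0]

def mv_write(ts, item, value):
    v = pick_version(item, ts)
    if ts < v[2]:                      # a younger txn already read Xi
        raise RollbackTxn(item)
    if ts == v[1]:                     # Ti overwrites its own version
        v[0] = value
    else:                              # otherwise create a new version
        versions[item].append([value, ts, ts])
```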
4. VALIDATION OR OPTIMISTIC
In optimistic concurrency control techniques, also known as validation or certification techniques,
no checking is done while the transaction is executing.
In this scheme, updates in the transaction are not applied directly to the database items until the
transaction reaches its end.
During transaction execution, all updates are applied to local copies of the data items that are kept for
the transaction.
At the end of transaction execution, a validation phase checks whether any of the transaction’s updates
violate serializability.
Certain information needed by the validation phase must be kept by the system.
If serializability is not violated, the transaction is committed and the database is updated from the local
copies; otherwise, the transaction is aborted and then restarted later.
There are three phases for this concurrency control protocol:
1. Read phase. A transaction can read values of committed data items from the database. However, updates
are applied only to local copies (versions) of the data items kept in the transaction workspace.
2. Validation phase. Checking is performed to ensure that serializability will not be violated if the
transaction updates are applied to the database.
3. Write phase. If the validation phase is successful, the transaction updates are applied to the database;
otherwise, the updates are discarded and the transaction is restarted.
The idea behind optimistic concurrency control is to do all the checks at once; hence, transaction
execution proceeds with a minimum of overhead until the validation phase is reached.
The optimistic protocol uses transaction timestamps and also requires that the write_sets and read_sets
of the transactions be kept by the system.
Additionally, start and end times for some of the three phases need to be kept for each transaction.
The validation phase for Ti checks that, for each such transaction Tj that is either committed or is in its
validation phase, one of the following conditions holds:
a. Transaction Tj completes its write phase before Ti starts its read phase.
b. Ti starts its write phase after Tj completes its write phase, and the read_set of Ti has no items
in common with the write_set of Tj.
c. Both the read_set and write_set of Ti have no items in common with the write_set of Tj, and
Tj completes its read phase before Ti completes its read phase.
If none of these three conditions holds, the validation of transaction Ti fails and it is aborted and
restarted later.
The validation can also be described as follows:
i. Tj finishes before Ti starts.
ii. Tj finishes writing before Ti starts its validation, and Ti did not read any item written by Tj.
iii. Ti neither reads nor writes any item written by Tj.
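The three conditions translate into a direct check. A minimal sketch, assuming each transaction object records its read_set, write_set, and the start/end times of its read and write phases (all of these field names are assumptions for illustration):

```python
def validate(ti, others):
    """True if Ti passes validation against every transaction Tj that is
    committed or in its validation phase; otherwise Ti must be aborted."""
    for tj in others:
        # (a) Tj finished its write phase before Ti started its read phase.
        if tj.write_end < ti.read_start:
            continue
        # (b) Ti starts its write phase after Tj finishes writing, and
        #     Ti read nothing that Tj wrote.
        if tj.write_end < ti.write_start and not (ti.read_set & tj.write_set):
            continue
        # (c) Ti neither read nor writes anything Tj wrote, and Tj finished
        #     its read phase before Ti finishes its read phase.
        if (not ((ti.read_set | ti.write_set) & tj.write_set)
                and tj.read_end < ti.read_end):
            continue
        return False    # none of the three conditions holds
    return True
```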
DATABASE RECOVERY MANAGEMENT
Although a transaction is considered atomic, it has a life cycle during which the database may pass through inconsistent states; if a failure occurs before the transaction has committed its changes, the partial updates need to be undone.
All information about a transaction is kept in a log file.
Recovery techniques are needed only if a transaction does not complete normally, i.e., it terminates abnormally. Abnormal termination may be due to:
- the user's decision to abort
- deadlock
- system failure
In an abort or a deadlock the system remains in control, but in a system failure the system loses control. Failures are generally classified as follows:
1. A computer failure (system crash). A hardware, software, or network error occurs in the computer system during transaction execution.
2. A transaction or system error. Some operation in the transaction may cause it to fail, such as integer overflow or division by zero. Transaction failure may also occur because of erroneous parameter values or because of a logical programming error. Additionally, the user may interrupt the transaction during its execution.
3. Local errors or exception conditions detected by the transaction. During transaction execution, certain conditions may occur that necessitate cancellation of the transaction. For example, data for the transaction may not be found. An exception condition, such as insufficient account balance in a banking database, may cause a transaction, such as a fund withdrawal, to be canceled. This exception could be programmed in the transaction itself, and in such a case would not be considered a transaction failure.
4. Concurrency control enforcement. The concurrency control method may decide to abort a transaction because it violates serializability, or it may abort one or more transactions to resolve a state of deadlock among several transactions. Transactions aborted because of serializability violations or deadlocks are typically restarted automatically at a later time.
5. Disk failure. Some disk blocks may lose their data because of a read or write malfunction or because of a
disk read/write head crash. This may happen during a read or a write operation of the transaction.
6. Physical problems and catastrophes. This refers to an endless list of problems, including power or air-conditioning failure, fire, theft, sabotage, overwriting disks or tapes by mistake, and the mounting of a wrong tape by the operator.
Database Errors
An error is said to have occurred if the execution of a command to manipulate the database is not successful. Errors are broadly classified as:
i. User errors
These include errors in programs or errors made by online users. They can be avoided by applying check conditions or by limiting access rights.
ii. Consistency errors
These occur due to an inconsistent state of the database, caused by wrong execution of commands or by an abort. To avoid them, we must check the consistency of the data entered.
iii. System errors
These include errors in the database system itself, such as deadlock. They are hard to detect and may require reprogramming the erroneous components.
Storage Types
Volatile storage: information does not survive system crashes. Examples: main memory and cache memory.
Nonvolatile storage: information survives system crashes. Examples: disk and magnetic tape.
Stable storage: information is never lost; it is approximated by replicating the data on several nonvolatile media with independent failure modes.
RECOVERY TECHNIQUES
Recovery can be done by restoring the previous consistent state (backward recovery) or by moving forward to the next consistent state as per the committed transactions (forward recovery).
i. Backward Recovery (UNDO)
The uncommitted changes made by a transaction to the database are undone, and the system is reset to the previous consistent state, free from any errors.
ii. Forward Recovery (REDO)
The committed changes made by a transaction are reapplied to an earlier copy of the database.
1. LOG-BASED RECOVERY
The most widely used structure for recording database modifications is the log: a sequence of log records that records all update activity in the database. The transaction log keeps track of all transactions and of which transaction made which changes.
An update log record describes a single database write. It has these fields:
Transaction identifier is the unique identifier of the transaction that performed the write
operation.
Data-item identifier is the unique identifier of the data item written. Typically, it is the location
on disk of the data item.
Old value is the value of the data item prior to the write.
New value is the value that the data item will have after the write.
Example: <T1, X, 10, 100> records that transaction T1 changed the value of item X from 10 to 100.
Thus the system knows how to separate the changes made by transactions that have committed from the changes of those that have not yet committed.
The selection of REDO or UNDO is made on the basis of the state of the transaction: if it committed, REDO; if it did not commit, UNDO. Recovery takes the old (UNDO) or new (REDO) values from the log and changes the inconsistent state to a consistent state.
An improvement to this is the use of checkpoints. A checkpoint transfers all committed changes to the database and all system logs to stable storage (i.e., storage that will not be lost). At restart after a failure, the stable checkpointed state is restored.
As per the Write-Ahead Log (WAL) protocol, the undo portion of the log is written to stable storage prior to any corresponding update on disk, and the redo portion of the log is written to stable storage prior to commit.
i. Deferred Database Modification (NO-UNDO/REDO)
In deferred update techniques, the database on disk is not physically updated until after the transaction commits, so no UNDO is ever needed; the committed updates are redone from the log.
Disadvantage: in this method, too much buffer space may be needed, because the updates must be held until the transaction commits.
ii. Immediate Database Modification (UNDO/REDO)
This technique may apply changes to the database on disk before the transaction reaches successful completion. Any changes applied to the database are first recorded in the log and force-written to disk, so that they can be undone if necessary.
In the immediate update techniques, the database may be updated by some operations of a
transaction before the transaction reaches its commit point.
However, these operations must also be recorded in the log on disk by force-writing before they
are applied to the database on disk, making recovery still possible.
If a transaction fails after recording some changes in the database on disk but before reaching its
commit point, the effect of its operations on the database must be undone; that is, the transaction
must be rolled back.
In the general case of immediate update, both undo and redo may be required during recovery.
This technique, known as the UNDO/REDO algorithm, requires both operations during
recovery.
The recovery scheme uses two recovery procedures:
o undo(Ti) restores the value of all data items updated by transaction Ti to the old values.
o redo(Ti) sets the value of all data items updated by transaction Ti to the new values.
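Using the log record format shown earlier (<transaction, item, old value, new value>), the two procedures can be sketched as follows; the tuple representation of the log is an assumption for illustration:

```python
def undo(log, db, txn):
    """Restore the old values written by txn, scanning the log backward."""
    for t, item, old, new in reversed(log):
        if t == txn:
            db[item] = old

def redo(log, db, txn):
    """Reapply the new values written by txn, scanning the log forward."""
    for t, item, old, new in log:
        if t == txn:
            db[item] = new

# Using the record from the example above: <T1, X, 10, 100>
log = [("T1", "X", 10, 100)]
db = {"X": 100}
undo(log, db, "T1")
assert db["X"] == 10
```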
Checkpoints:
We need to search the entire log to determine which transactions need to be redone and which need to be undone.
Major difficulties:
o The search process is time-consuming.
o Most of the transactions that need to be redone have already written their updates into the database, so redoing them wastes time.
The difficulties related to large log files can be addressed with checkpoints.
A checkpoint is the methodology of removing all previous transaction logs after storing their effects in permanent storage.
The checkpoint is used to declare a point before which the DBMS was in a consistent state and all transactions had committed.
Upon reaching a checkpoint, the earlier part of the log file can be discarded, since its updates have been saved to the database.
Example:
Suppose transactions T1, T2, T3, and T4 appear in the log after the last checkpoint, with T4 still ongoing.
The recovery system reads the log backward from the end to the last checkpoint, i.e., from T4 back to T1, and keeps track of two lists: an undo list and a redo list.
o Whenever the log contains both <Tn, start> and <Tn, commit>, or only <Tn, commit>, the transaction is put in the redo list. Here T2 and T3 contain both <Tn, start> and <Tn, commit>, whereas T1 has only <Tn, commit> (its start record precedes the checkpoint); so T1, T2, and T3 are in the redo list.
o Whenever a transaction has a start record but no commit or abort record, it is put in the undo list. Here T4 has <Tn, start> but no <Tn, commit>, as it is an ongoing transaction, so T4 is put in the undo list.
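The backward scan of this example can be sketched as follows; the record layout — ('start', T), ('commit', T), and a ('checkpoint',) marker — is an assumption for illustration, and abort records are omitted for brevity:

```python
def classify(log):
    """Scan backward to the last checkpoint; return (redo, undo) sets."""
    redo, seen = set(), set()
    for rec in reversed(log):
        if rec[0] == 'checkpoint':
            break
        kind, txn = rec
        seen.add(txn)
        if kind == 'commit':
            redo.add(txn)
    return redo, seen - redo        # uncommitted transactions go to undo

log = [('checkpoint',),
       ('commit', 'T1'),            # T1 started before the checkpoint
       ('start', 'T2'), ('commit', 'T2'),
       ('start', 'T3'), ('commit', 'T3'),
       ('start', 'T4')]             # T4 is still ongoing
assert classify(log) == ({'T1', 'T2', 'T3'}, {'T4'})
```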
The system periodically performs checkpoints, which require the following sequence of actions to take
place:
o Output onto stable storage all log records currently residing in main memory.
o Output to the disk all modified buffer blocks.
o Output onto stable storage a log record <checkpoint>
2. SHADOW PAGING
The key idea behind the shadow-paging technique is to maintain two page tables during the life of a
transaction:
o the current page table
o the shadow page table.
When the transaction starts, both page tables are identical.
The shadow page table is never changed over the duration of the transaction.
The current page table may be changed when a transaction performs a write operation.
All input and output operations use the current page table to locate database pages on disk.
The shadow page table (directory) is saved on disk while the current page table is used by the transaction.
When the transaction partially commits, the shadow page table is discarded and the current page table becomes the new shadow page table. If the transaction aborts, the current page table is discarded.
Steps for performing a write(X) operation by transaction Tj, where X resides on the ith page:
If the ith page is not already in main memory, the system issues input(X) to bring the page into memory.
If this is the first write performed on the ith page by this transaction, then:
1. An unused page on disk is found.
2. The contents of the ith page are copied to the new free page.
3. The new free page is deleted from the list of free page frames.
4. The current page table is modified so that the ith entry points to the new page.
5. The new value xj is assigned to X in the buffer page.
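A toy model of the copy-on-write step and of commit/abort; here the page tables are plain Python lists of disk page numbers and the disk is a dict, all of which is an illustrative assumption:

```python
def shadow_write(current, shadow, i, free_pages, disk, name, value):
    """Write value to item `name` on logical page i, copy-on-write style."""
    if current[i] == shadow[i]:                    # first write to page i
        new_page = free_pages.pop()                # 1. find an unused page
        disk[new_page] = dict(disk[current[i]])    # 2. copy old contents
        current[i] = new_page                      # 4. repoint the entry
    disk[current[i]][name] = value                 # 5. assign the new value

def commit(current, shadow):
    """Make the current table the new shadow table (one atomic switch)."""
    shadow[:] = current

def abort(current, shadow):
    """Discard the transaction's updates by reverting to the shadow table."""
    current[:] = shadow
```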
The shadow page table is stored in nonvolatile storage.
The state of the database prior to the execution of the transaction can be recovered from the shadow
table in the event of a crash, or transaction abort.
When the transaction commits, the system writes the current page table to nonvolatile storage.
In general, the old value of the data item before updating is called the before image (BFIM), and the new value
after updating is called the after image (AFIM). If shadowing is used, both the BFIM and the AFIM can be
kept on disk.
o Two types of log entry information are included for a write command:
The information needed for UNDO
The information needed for REDO.
o A REDO-type log entry includes the new value (AFIM) of the item written by the operation
since this is needed to redo the effect of the operation from the log.
o The UNDO-type log entries include the old value (BFIM) of the item since this is needed to
undo the effect of the operation from the log.
ADVANCED DATABASE
A DBMS (DataBase Management System) is software for database development and/or management.
A DBMS is a tool used to perform all kinds of operations on the data in a database.
A DBMS also provides protection and security for relational databases.
It also helps maintain data consistency when there are multiple users.
Examples of popular DBMSs include MySQL, Oracle, Sybase, Microsoft Access, and IBM DB2.
Advanced database management systems target a new breed of databases known as NoSQL/NewSQL. Advanced database management systems also support new trends in data management such as:
distributed data and computational resources
absence of central control over distributed resources
dynamic response in real time to external events, etc.
OBJECT ORIENTED DATABASE (OODBMS)
Inheritance: the ability of an object within the class hierarchy to inherit the attributes and methods of the classes above it.
The Object-Oriented Database Manifesto specifically lists the following features as mandatory for a system to support before it can be called an OODBMS:
Complex objects
Object identity
Encapsulation
Types and Classes
Overloading and late binding
Computational completeness
Extensibility
Persistence
Secondary storage management
Concurrency
Recovery
An ad hoc query facility, etc.
Difference between OODBMS and RDBMS:
OODBMS      RDBMS
Object      Tuple
OID         Key
WEB BASED DB
A web database is essentially a database that can be accessed from a local network or the
internet instead of one that has its data stored on a desktop or its attached storage.
Used for both professional and personal use, they are hosted on websites, and are software as service
(SaaS) products, which means that access is provided via a web browser.
The Web-based database management system is one of the essential parts of DBMS and is used to store
web application data.
A web-based Database management system is used to handle those databases that are having data
regarding E-commerce, E-business, blogs, e-mail, and other online applications.
A Web database is a database application designed to be managed and accessed through the Internet.
Website operators can manage this collection of data and present analytical results based on the data in
the Web database application.
Applicable Uses:
Businesses both large and small can use web databases to create website polls, feedback forms, and client, customer, or inventory lists.
Personal Web database use can range from storing personal email accounts to a home inventory to
personal website analytics.
The Web database is entirely customizable to an individual's or business's needs.
DATA WAREHOUSING
A Data Warehouse can be viewed as a data system with the following attributes:
It is a database designed for investigative tasks, using data from various applications.
It supports a relatively small number of clients with relatively long interactions.
It includes current and historical data to provide a historical perspective of information.
Its usage is read-intensive.
It contains a few large tables.
A database is an application-oriented collection of data, whereas a data warehouse is a subject-oriented collection of data.
A database uses Online Transaction Processing (OLTP), whereas a data warehouse uses Online Analytical Processing (OLAP).
Database tables and joins are complicated because they are normalized, whereas data warehouse tables and joins are simpler because they are denormalized.
Databases need to be available 24/7/365, meaning downtime is costly; data warehouses are not as affected by downtime.
Architecture:
Source layer: A data warehouse system uses heterogeneous sources of data. That data is stored initially in corporate relational databases or legacy databases, or it may come from information systems outside the corporate walls.
Data staging layer:
The data stored in the sources should be extracted, cleansed to remove inconsistencies and fill gaps, and integrated to merge heterogeneous sources into one standard schema.
The so-named Extraction, Transformation, and Loading (ETL) tools can combine heterogeneous schemata and can extract, transform, cleanse, validate, filter, and load source data into the data warehouse.
Data warehouse layer:
Information is saved in one logically centralized repository: the data warehouse.
The data warehouse can be accessed directly, but it can also be used as a source for creating data marts, which partially replicate data warehouse contents and are designed for specific enterprise departments.
Analysis layer:
In this layer, integrated data is efficiently and flexibly accessed to produce reports, dynamically analyze information, and simulate hypothetical business scenarios.
DATA MINING
Data mining is one of the most useful techniques for helping entrepreneurs, researchers, and individuals extract valuable information from huge sets of data. Data mining is also called Knowledge Discovery in Databases (KDD).
The process of extracting information from huge sets of data to identify patterns, trends, and useful data that allow a business to make data-driven decisions is called data mining.
Data mining is the act of automatically searching large stores of information to find trends and patterns that go beyond simple analysis procedures.
Data mining utilizes complex mathematical algorithms to segment the data and evaluate the probability of future events.
Data Mining is a process used by organizations to extract specific data from huge databases to solve
business problems.
It primarily turns raw data into useful information.
The process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions is known as data mining.
Using Data Mining, businesses can learn more about their customers and develop more effective
strategies to expand their various business functions and utilize their resources more optimally and
insightfully.
Data mining consists of useful data collection and warehousing as well as computer processing.
Difference between Data Mining and Data Warehousing:
o Data warehousing mainly focuses on extracting data from different sources, cleaning the data, and storing it in the warehouse.
o Data mining, on the other hand, is used to study and explore the data using queries. After data mining, the explored information is used for reporting, planning strategies, finding meaningful patterns, etc.
o Example: a company's data warehouse stores all the relevant information about projects and employees. We can apply data mining queries to this warehouse to get useful records.
Advantages of Data Mining:
Data Mining is used to polish the raw data and make us able to explore, identify, and understand the
patterns hidden within the data.
The Data Mining technique enables organizations to obtain knowledge-based data.
It automates finding predictive information in large databases, thereby helping to identify the
previously hidden patterns promptly.
It assists in faster and better decision making, which later helps businesses take necessary actions to increase revenue and lower operational costs.
Using the Data Mining techniques, the experts can manage applications in various areas such as
Market Analysis, Production Control, Sports, Fraud Detection, Astrology, etc.
The shopping websites use Data Mining to define a shopping pattern and design or select the products
for better revenue generation.
Data Mining also helps in data optimization.
Data Mining can also be used to determine hidden profitability.
Compared with other statistical data applications, data mining is cost-efficient.
It can be induced in the new system as well as the existing platforms.
It is a quick process that makes it easy for new users to analyze enormous amounts of data in a short
time.