18CSC303J Database Management System: Unit-V
T1:  1. read(A)
     2. A := A – 50
     3. write(A)
     4. read(B)
     5. B := B + 50
     6. write(B)
T2:  read(A), read(B), print(A+B)
● Isolation can be ensured trivially by running transactions serially
● That is, one after the other.
● However, executing multiple transactions concurrently has significant benefits,
as we will see later.
ACID Properties
● Atomicity. Either all operations of the transaction are properly reflected in the
database or none are.
● Consistency. Execution of a transaction in isolation preserves the consistency
of the database.
● Isolation. Although multiple transactions may execute concurrently, each
transaction must be unaware of other concurrently executing transactions.
Intermediate transaction results must be hidden from other concurrently
executed transactions.
● That is, for every pair of transactions Ti and Tj, it appears to Ti that either Tj
finished execution before Ti started, or Tj started execution after Ti finished.
● Durability. After a transaction completes successfully, the changes it has made
to the database persist, even if there are system failures.
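To make atomicity and durability concrete, here is a minimal sketch (not from the course material) of a fund-transfer transaction using Python's built-in sqlite3 module; the account table and column names are assumptions for illustration.

import sqlite3

# Hypothetical schema: account(account_no TEXT PRIMARY KEY, balance INTEGER)
conn = sqlite3.connect("bank.db")
conn.execute("CREATE TABLE IF NOT EXISTS account (account_no TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT OR IGNORE INTO account VALUES ('A', 1000), ('B', 2000)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Transfer 'amount' from src to dst; either both updates persist or neither does."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE account_no = ?", (amount, src))
        conn.execute("UPDATE account SET balance = balance + ? WHERE account_no = ?", (amount, dst))
        conn.commit()      # durability: changes survive failures after this point
    except Exception:
        conn.rollback()    # atomicity: undo the partial update
        raise

transfer(conn, 'A', 'B', 50)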
Transaction State
● Active – the initial state; the transaction stays in this state while it is
executing
● Partially committed – after the final statement has been executed.
● Failed -- after the discovery that normal execution can no longer proceed.
● Aborted – after the transaction has been rolled back and the database
restored to its state prior to the start of the transaction. Two options after it
has been aborted:
● Restart the transaction
4 can be done only if there is no internal logical error
● Kill the transaction
● Committed – after successful completion.
Concurrent Executions
● Multiple transactions are allowed to run concurrently in the system.
Advantages are:
● Increased processor and disk utilization, leading to better
transaction throughput
4 E.g. one transaction can be using the CPU while another is
reading from or writing to the disk
● Reduced average response time for transactions: short
transactions need not wait behind long ones.
● Concurrency control schemes – mechanisms to achieve isolation
● That is, to control the interaction among the concurrent
transactions in order to prevent them from destroying the
consistency of the database
Schedules
● Schedule – a sequence of instructions that specifies the chronological
order in which instructions of concurrent transactions are executed
● A schedule for a set of transactions must consist of all instructions of
those transactions
● Must preserve the order in which the instructions appear in each
individual transaction.
● A transaction that successfully completes its execution will have a
commit instruction as the last statement
● By default, a transaction is assumed to execute the commit instruction as its
last step
● A transaction that fails to successfully complete its execution will have
an abort instruction as the last statement
Schedule 1
● Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B.
● An example of a serial schedule in which T1 is followed by T2 :
Schedule 2
● A serial schedule in which T2 is followed by T1 :
Schedule 3
● Let T1 and T2 be the transactions defined previously. The following
schedule is not a serial schedule, but it is equivalent to Schedule 1.
(Figure: Schedule 3 shown alongside Schedule 6.)
Conflict Serializability (Cont.)
● Example of a schedule that is not conflict serializable:
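Since the example schedule is not reproduced here, the sketch below (encoding and helper names are my own, not from the slides) builds the precedence graph for a small schedule in which T1 and T2 both read and write a single item A, and checks the graph for a cycle; a cycle means the schedule is not conflict serializable.

from collections import defaultdict

# A schedule is a list of (transaction, operation, data_item) triples.
schedule = [
    ("T1", "R", "A"), ("T2", "R", "A"),
    ("T1", "W", "A"), ("T2", "W", "A"),
]

def precedence_graph(schedule):
    """Add an edge Ti -> Tj whenever an operation of Ti conflicts with a later
    operation of Tj (same item, different transactions, at least one write)."""
    edges = defaultdict(set)
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            if ti != tj and x == y and "W" in (op_i, op_j):
                edges[ti].add(tj)
    return edges

def has_cycle(edges):
    """Detect a cycle with a depth-first search (grey/black colouring)."""
    state = {}
    def dfs(node):
        state[node] = "grey"
        for nxt in edges[node]:
            if state.get(nxt) == "grey":
                return True
            if state.get(nxt) is None and dfs(nxt):
                return True
        state[node] = "black"
        return False
    return any(state.get(n) is None and dfs(n) for n in list(edges))

edges = precedence_graph(schedule)
print("conflict serializable" if not has_cycle(edges) else "not conflict serializable")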
Recoverable Schedules
● Recoverable schedule – if a transaction Tj reads a data item previously written by a
transaction Ti, then the commit of Ti must appear before the commit of Tj.
● If T8 should abort, T9 would have read (and possibly shown to the user) an
inconsistent database state. Hence, the database must ensure that schedules are
recoverable.
Cascading Rollbacks
● Cascading rollback – a single transaction failure leads to a series of
transaction rollbacks. Consider the following schedule where none of
the transactions has yet committed (so the schedule is recoverable)
View Serializability
● Let S and S' be two schedules with the same set of transactions. S and S' are
view equivalent if the following three conditions are met, for each data item
Q:
1. If in schedule S, transaction Ti reads the initial value of Q, then in
schedule S’ also transaction Ti must read the initial value of Q.
2. If in schedule S transaction Ti executes read(Q), and that value was
produced by transaction Tj (if any), then in schedule S’ also transaction
Ti must read the value of Q that was produced by the same write(Q)
operation of transaction Tj .
3. The transaction (if any) that performs the final write(Q) operation in
schedule S must also perform the final write(Q) operation in schedule
S’.
● As can be seen, view equivalence is based purely on reads and writes.
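As a rough illustration (the encoding and helper names are my own, not from the slides), the three conditions can be checked mechanically by computing, for every read, which write it reads from, together with the final writer of each item, and comparing the two schedules:

def reads_from_and_final_writes(schedule):
    """schedule: list of (txn, op, item) with op in {"R", "W"}.
    Returns (reads-from mapping, {item: final writer})."""
    last_writer = {}     # item -> txn that wrote it most recently (None = initial value)
    reads_from = {}      # (reader_txn, item, occurrence) -> writer txn or None
    seen_reads = {}
    for txn, op, item in schedule:
        if op == "R":
            k = (txn, item, seen_reads.get((txn, item), 0))
            seen_reads[(txn, item)] = k[2] + 1
            reads_from[k] = last_writer.get(item)     # conditions 1 and 2
        else:
            last_writer[item] = txn
    return reads_from, last_writer                    # last_writer = final writes (condition 3)

def view_equivalent(s1, s2):
    return reads_from_and_final_writes(s1) == reads_from_and_final_writes(s2)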
View Serializability (Cont.)
● A schedule S is view serializable if it is view equivalent to a serial schedule.
● Every conflict serializable schedule is also view serializable.
● Below is a schedule which is view-serializable but not conflict serializable.
● If we start with A = 1000 and B = 2000, the final result is 960 and 2040
● Determining such equivalence requires analysis of operations other than read
and write.
Concurrency Control
Lock-Based Protocols
● A lock is a mechanism to control concurrent access to a data item
● Data items can be locked in two modes :
1. exclusive (X) mode. Data item can be both read as well as
written. X-lock is requested using lock-X instruction.
2. shared (S) mode. Data item can only be read. S-lock is
requested using lock-S instruction.
● Lock requests are made to the concurrency-control manager by the
programmer. Transaction can proceed only after request is granted.
Lock-Based Protocols (Cont.)
● Lock-compatibility matrix (a requested lock is granted only if it is compatible
with all locks already held on the item by other transactions):

              S        X
        S     true     false
        X     false    false
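A minimal sketch of how a concurrency-control manager could apply this matrix when deciding whether to grant a request; the class and method names are illustrative, not a real DBMS API.

from collections import defaultdict

# True means the requested mode (first) is compatible with a held mode (second).
COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}

class LockManager:
    def __init__(self):
        self.held = defaultdict(dict)   # item -> {txn: mode}

    def request(self, txn, item, mode):
        """Grant the lock iff 'mode' is compatible with every lock held by other txns."""
        for other, held_mode in self.held[item].items():
            if other != txn and not COMPATIBLE[(mode, held_mode)]:
                return False            # caller must wait (or be rolled back)
        self.held[item][txn] = mode     # grant (an S -> X upgrade simply overwrites)
        return True

    def release_all(self, txn):
        for item in self.held:
            self.held[item].pop(txn, None)

lm = LockManager()
assert lm.request("T1", "A", "S") and lm.request("T2", "A", "S")   # S locks are compatible
assert not lm.request("T3", "A", "X")                              # X conflicts with held S locks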
● Variant: “First-updater-wins”
4 Check for concurrent updates when write occurs by locking item
– But lock should be held till all concurrent transactions have finished
4 (Oracle uses this plus some extra features)
4 Differs only in when abort occurs, otherwise equivalent
Benefits of Snapshot Isolation
● Reading is never blocked, and reads do not block other transactions' activities
● Performance similar to Read Committed
● Avoids the usual anomalies
● No dirty read
● No lost update
● No non-repeatable read
● Predicate based selects are repeatable (no phantoms)
● Problems with SI
● SI does not always give serializable executions
4 Serializable: among two concurrent txns, one sees the effects of the
other
4 In SI: neither sees the effects of the other
● Result: Integrity constraints can be violated
Snapshot Isolation
● Example of a problem with SI
● T1: x := y
● T2: y := x
● Initially x = 3 and y = 17
4 Serial execution: x = 3, y = 3 or x = 17, y = 17 (the two values end up equal)
4 If both transactions start at the same time, with snapshot isolation: x = 17, y = 3
● This anomaly is called write skew (a small simulation follows this list)
● Skew also occurs with inserts
● E.g:
4 Find max order number among all orders
4 Create a new order with order number = previous max + 1
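Here is the promised simulation of the x/y example under snapshot isolation (illustrative only): both transactions read from the same snapshot and, because their write sets are disjoint, the first-committer-wins check never fires, so both commit.

db = {"x": 3, "y": 17}

# Both transactions start concurrently, so both see the same snapshot.
snapshot_t1 = dict(db)
snapshot_t2 = dict(db)

writes_t1 = {"x": snapshot_t1["y"]}   # T1: x := y
writes_t2 = {"y": snapshot_t2["x"]}   # T2: y := x

# First-committer-wins only aborts a txn whose write set overlaps an already-committed one.
assert not (writes_t1.keys() & writes_t2.keys())   # disjoint, so both commit
db.update(writes_t1)
db.update(writes_t2)

print(db)   # {'x': 17, 'y': 3} -- not the result of any serial order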
Snapshot Isolation Anomalies
● SI breaks serializability when txns modify different items, each based on a previous
state of the item the other modified
● Not very common in practice
4 E.g., the TPC-C benchmark runs correctly under SI
4 when txns conflict due to modifying different data, there is usually also a
shared item they both modify too (like a total quantity) so SI will abort one of
them
● But does occur
4 Application developers should be careful about write skew
● SI can also cause a read-only transaction anomaly, where a read-only transaction may see
an inconsistent state even if the updaters are serializable
● We omit details
● Using snapshots to verify primary/foreign key integrity can lead to inconsistency
● Integrity constraint checking usually done outside of snapshot
SI In Oracle and PostgreSQL
● Warning: SI is used when the isolation level is set to serializable by Oracle, and
by PostgreSQL versions prior to 9.1
● PostgreSQL’s implementation of SI (versions prior to 9.1) described in
Section 26.4.1.3
● Oracle implements “first updater wins” rule (variant of “first committer wins”)
4 concurrent writer check is done at time of write, not at commit time
4 Allows transactions to be rolled back earlier
4 Oracle and PostgreSQL < 9.1 do not support true serializable execution
● PostgreSQL 9.1 introduced new protocol called “Serializable Snapshot
Isolation” (SSI)
4 Which guarantees true serializability, including handling predicate reads
(coming up)
SI In Oracle and PostgreSQL (Cont.)
● Can sidestep SI for specific queries by using select .. for update in Oracle and
PostgreSQL
● E.g.,
1. select max(orderno) from orders for update
2. read value into local variable maxorder
3. insert into orders (maxorder+1, …)
● Select for update (SFU) treats all data read by the query as if it were also
updated, preventing concurrent updates
● Does not always ensure serializability since phantom phenomena can occur
(coming up)
● In PostgreSQL versions < 9.1, SFU locks the data item, but releases locks when the
transaction completes, even if other concurrent transactions are active
● Not quite the same as SFU in Oracle, which keeps locks until all concurrent
transactions have completed
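A rough sketch of the order-number pattern above using psycopg2 against a hypothetical orders(orderno) table. PostgreSQL rejects FOR UPDATE combined with an aggregate, so the sketch locks the row holding the current maximum via ORDER BY ... LIMIT 1 instead; as the slides warn, a phantom insert of a larger order number is still possible.

import psycopg2

conn = psycopg2.connect("dbname=shop")   # hypothetical connection string
with conn:                               # commits on success, rolls back on error
    with conn.cursor() as cur:
        # Lock the row holding the current maximum order number.
        cur.execute("SELECT orderno FROM orders ORDER BY orderno DESC LIMIT 1 FOR UPDATE")
        row = cur.fetchone()
        maxorder = row[0] if row else 0
        # Concurrent updates to that row now block, but a concurrent insert of a
        # larger orderno (a phantom) is still possible.
        cur.execute("INSERT INTO orders (orderno) VALUES (%s)", (maxorder + 1,))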
Insert and Delete Operations
● If two-phase locking is used :
● A delete operation may be performed only if the transaction deleting the
tuple has an exclusive lock on the tuple to be deleted.
● A transaction that inserts a new tuple into the database is given an
X-mode lock on the tuple
● Insertions and deletions can lead to the phantom phenomenon.
● A transaction that scans a relation
4 (e.g., find sum of balances of all accounts in Perryridge)
and a transaction that inserts a tuple in the relation
4 (e.g., insert a new account at Perryridge)
(conceptually) conflict in spite of not accessing any tuple in common.
● If only tuple locks are used, non-serializable schedules can result
4 E.g. the scan transaction does not see the new account, but reads
some other tuple written by the update transaction
Insert and Delete Operations (Cont.)
● The transaction scanning the relation is reading information that indicates what
tuples the relation contains, while a transaction inserting a tuple updates the same
information.
● The conflict should be detected, e.g. by locking the information.
● One solution:
● Associate a data item with the relation, to represent the information about what
tuples the relation contains.
● Transactions scanning the relation acquire a shared lock on the data item,
● Transactions inserting or deleting a tuple acquire an exclusive lock on the data
item. (Note: locks on the data item do not conflict with locks on individual
tuples.)
● Above protocol provides very low concurrency for insertions/deletions.
● Index locking protocols provide higher concurrency while
preventing the phantom phenomenon, by requiring locks
on certain index buckets.
Index Locking Protocol
● Index locking protocol:
● Every relation must have at least one index.
● A transaction can access tuples only after finding them through one or more
indices on the relation
● A transaction Ti that performs a lookup must lock all the index leaf nodes that
it accesses, in S-mode
4 Even if the leaf node does not contain any tuple satisfying the index
lookup (e.g. for a range query, no tuple in a leaf is in the range)
● A transaction Ti that inserts, updates or deletes a tuple ti in a relation r
4 must update all indices on r
4 must obtain exclusive locks on all index leaf nodes affected by the
insert/update/delete
● The rules of the two-phase locking protocol must be observed
● Guarantees that phantom phenomenon won’t occur
Next-Key Locking
● Index-locking protocol to prevent phantoms required locking entire leaf
● Can result in poor concurrency if there are many inserts
● Alternative: for an index lookup
● Lock all values that satisfy index lookup (match lookup value, or fall in
lookup range)
● Also lock next key value in index
● Lock mode: S for lookups, X for insert/delete/update
● Ensures that range queries will conflict with inserts/deletes/updates
● Regardless of which happens first, as long as both are concurrent
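A small sketch (my own helper, not a DBMS API) of which index entries a range lookup would lock under next-key locking: every key satisfying the lookup plus the next key beyond the range.

import bisect

def keys_to_lock(index_keys, low, high):
    """index_keys: sorted list of key values at the leaf level.
    Returns the keys a range lookup [low, high] must lock: all matches plus the next key."""
    lo = bisect.bisect_left(index_keys, low)
    hi = bisect.bisect_right(index_keys, high)
    locked = index_keys[lo:hi]              # keys satisfying the lookup (S mode for a lookup)
    if hi < len(index_keys):
        locked.append(index_keys[hi])       # the "next key": blocks inserts into the range
    return locked

print(keys_to_lock([10, 20, 30, 40], 15, 30))   # [20, 30, 40]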
Concurrency in Index Structures
● Indices are unlike other database items in that their only job is to help in
accessing data.
● Index-structures are typically accessed very often, much more than other
database items.
● Treating index-structures like other database items, e.g. by 2-phase locking
of index nodes can lead to low concurrency.
● There are several index concurrency protocols where locks on internal nodes
are released early, and not in a two-phase fashion.
● It is acceptable to have nonserializable concurrent access to an index as
long as the accuracy of the index is maintained.
4 In particular, the exact values read in an internal node of a
B+-tree are irrelevant so long as we land up in the correct leaf node.
Concurrency in Index Structures (Cont.)
● Example of index concurrency protocol:
● Use crabbing instead of two-phase locking on the nodes of the B+-tree, as follows.
During search/insertion/deletion:
● First lock the root node in shared mode.
● After locking all required children of a node in shared mode, release the lock on the
node.
● During insertion/deletion, upgrade leaf node locks to exclusive mode.
● When splitting or coalescing requires changes to a parent, lock the parent in
exclusive mode.
● Above protocol can cause excessive deadlocks
● Searches coming down the tree deadlock with updates going up the tree
● Can abort and restart search, without affecting transaction
● Better protocols are available; see Section 16.9 for one such protocol, the B-link tree
protocol
● Intuition: release lock on parent before acquiring lock on child
4 And deal with changes that may have happened between lock release and
acquire
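A simplified sketch of the crabbing descent described above, using a plain threading.Lock per node in place of shared/exclusive latch modes and ignoring splits and coalescing; the Node class and helper are illustrative only.

import threading

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []       # empty => leaf
        self.latch = threading.Lock()        # stand-in for an S/X latch

def crab_to_leaf(root, key):
    """Descend from the root to the leaf for 'key', holding at most parent+child latches."""
    root.latch.acquire()
    node = root
    while node.children:
        # Follow the child whose key range contains 'key'.
        i = 0
        while i < len(node.keys) and key >= node.keys[i]:
            i += 1
        child = node.children[i]
        child.latch.acquire()                # lock the required child ...
        node.latch.release()                 # ... then release the parent (the "crab" step)
        node = child
    return node                              # the returned leaf is still latched by the caller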
Weak Levels of Consistency
● Degree-two consistency: differs from two-phase locking in that S-locks may
be released at any time, and locks may be acquired at any time
● X-locks must be held till end of transaction
● Serializability is not guaranteed, programmer must ensure that no
erroneous database state will occur
● Cursor stability:
● For reads, each tuple is locked, read, and lock is immediately released
● X-locks are held till end of transaction
● Special case of degree-two consistency
Weak Levels of Consistency in SQL
● SQL allows non-serializable executions
● Serializable: is the default
● Repeatable read: allows only committed records to be read, and repeating
a read should return the same value (so read locks should be retained)
4 However, the phantom phenomenon need not be prevented
– T1 may see some records inserted by T2, but may not see others
inserted by T2
● Read committed: same as degree two consistency, but most systems
implement it as cursor-stability
● Read uncommitted: allows even uncommitted data to be read
● In many database systems, read committed is the default consistency level
● has to be explicitly changed to serializable when required
4 set isolation level serializable
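For instance, with PostgreSQL through psycopg2 the level can be raised for a single transaction as shown below (connection string and table are hypothetical; the exact statement varies across systems):

import psycopg2

conn = psycopg2.connect("dbname=bank")   # hypothetical connection string
with conn:
    with conn.cursor() as cur:
        # Must be the first statement of the transaction.
        cur.execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE")
        cur.execute("SELECT sum(balance) FROM account")
        print(cur.fetchone()[0])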
Transactions across User Interaction
● Many applications need transaction support across user interactions
● Can’t use locking
● Don’t want to reserve database connection per user
● Application level concurrency control
● Each tuple has a version number
● Transaction notes version number when reading tuple
4 select r.balance, r.version into :A, :version
from r where acctId =23
● When writing tuple, check that current version number is same as the version
when tuple was read
4 update r set r.balance = r.balance + :deposit
where acctId = 23 and r.version = :version
● Equivalent to optimistic concurrency control without validating read set
● Used internally in Hibernate ORM system, and manually in many applications
● Version numbering can also be used to support first committer wins check of snapshot
isolation
● Unlike SI, reads are not guaranteed to be from a single snapshot
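A sketch of the version-number check described above, using sqlite3 and a hypothetical r(acctId, balance, version) table; the update takes effect only if the version is unchanged since the read.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE r (acctId INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)")
conn.execute("INSERT INTO r VALUES (23, 1000, 1)")

def deposit(conn, acct, amount):
    # Read the tuple and note its version number.
    balance, version = conn.execute(
        "SELECT balance, version FROM r WHERE acctId = ?", (acct,)).fetchone()
    # ... user thinks, other requests run, etc. ...
    # Write back only if the version is still the one we read; bump the version.
    cur = conn.execute(
        "UPDATE r SET balance = balance + ?, version = version + 1 "
        "WHERE acctId = ? AND version = ?", (amount, acct, version))
    if cur.rowcount == 0:
        raise RuntimeError("concurrent update detected; retry the interaction")
    conn.commit()

deposit(conn, 23, 50)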
Deadlocks
● Consider the following two transactions:
T1: write(X)        T2: write(Y)
    write(Y)            write(X)
(Figure: buffer blocks in main memory, blocks on disk, and local copies x1, y1, x2 in the transactions' private work areas.)
Data Access (Cont.)
● Each transaction Ti has its private work-area in which local copies of all data
items accessed and updated by it are kept.
● Ti's local copy of a data item X is called xi.
● Transferring data items between system buffer blocks and its private
work-area done by:
● read(X) assigns the value of data item X to the local variable xi.
● write(X) assigns the value of local variable xi to data item X in the
buffer block.
● Note: output(BX) need not immediately follow write(X). System can
perform the output operation when it deems fit.
● Transactions
● Must perform read(X) before accessing X for the first time (subsequent
reads can be from local copy)
● write(X) can be executed at any time before the transaction commits
Recovery and Atomicity
● To ensure atomicity despite failures, we first output information describing
the modifications to stable storage without modifying the database itself.
● We study log-based recovery mechanisms in detail
● We first present key concepts
● And then present the actual recovery algorithm
● Less used alternative: shadow-copy and shadow-paging (brief details in
book)
Log-Based Recovery
● A log is kept on stable storage.
● The log is a sequence of log records, and maintains a record of update
activities on the database.
● When transaction Ti starts, it registers itself by writing a <Ti start> log record
● Before Ti executes write(X), a log record
<Ti, X, V1, V2>
is written, where V1 is the value of X before the write (the old value), and V2 is
the value to be written to X (the new value).
● When Ti finishes its last statement, the log record <Ti commit> is written.
● Two approaches using logs
● Deferred database modification
● Immediate database modification
Immediate Database Modification
● The immediate-modification scheme allows updates of an uncommitted
transaction to be made to the buffer, or the disk itself, before the transaction
commits
● Update log record must be written before database item is written
● We assume that the log record is output directly to stable storage
● (We will see later how to postpone log record output to some extent)
● Output of updated blocks to stable storage can take place at any time before
or after transaction commit
● Order in which blocks are output can be different from the order in which
they are written.
● The deferred-modification scheme performs updates to buffer/disk only at
the time of transaction commit
● Simplifies some aspects of recovery
● But has overhead of storing local copy
Transaction Commit
● A transaction is said to have committed when its commit log record is output
to stable storage
● all previous log records of the transaction must have been output already
● Writes performed by a transaction may still be in the buffer when the
transaction commits, and may be output later
Immediate Database Modification Example
Log                      Write          Output
<T0 start>
<T0, A, 1000, 950>
<T0, B, 2000, 2050>
                         A = 950
                         B = 2050
<T0 commit>
<T1 start>
<T1, C, 700, 600>
                         C = 600
                                        BB, BC     (BC output before T1 commits)
<T1 commit>
                                        BA         (BA output after T0 commits)
● Note: BX denotes the block containing X.
Concurrency Control and Recovery
● With concurrent transactions, all transactions share a single disk buffer and a
single log
● A buffer block can have data items updated by one or more transactions
● We assume that if a transaction Ti has modified an item, no other transaction
can modify the same item until Ti has committed or aborted
● i.e. the updates of uncommitted transactions should not be visible to
other transactions
4 Otherwise how to perform undo if T1 updates A, then T2 updates A
and commits, and finally T1 has to abort?
● Can be ensured by obtaining exclusive locks on updated items and
holding the locks till end of transaction (strict two-phase locking)
● Log records of different transactions may be interspersed in the log.
Undo and Redo Operations
● Undo of a log record <Ti, X, V1, V2> writes the old value V1 to X
● Redo of a log record <Ti, X, V1, V2> writes the new value V2 to X
● Undo and Redo of Transactions
● undo(Ti) restores the value of all data items updated by Ti to their old
values, going backwards from the last log record for Ti
4 each time a data item X is restored to its old value V a special log
record <Ti , X, V> is written out
4 when undo of a transaction is complete, a log record
<Ti abort> is written out.
● redo(Ti) sets the value of all data items updated by Ti to the new values,
going forward from the first log record for Ti
4 No logging is done in this case
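A toy sketch of undo(Ti) and redo(Ti) over an in-memory log; the tuple layout ('update', Ti, X, V1, V2) and the 'redo-only' and 'abort' records are my own encoding of the records described above.

def undo(log, db, ti):
    """Scan backwards, restore old values, and log redo-only (CLR-like) records."""
    for rec in reversed(log[:]):
        if rec[0] == "update" and rec[1] == ti:
            _, _, x, v1, _v2 = rec
            db[x] = v1                              # restore the old value
            log.append(("redo-only", ti, x, v1))    # special log record <Ti, X, V>
    log.append(("abort", ti))

def redo(log, db, ti):
    """Scan forwards, reapplying new values; no logging is done during redo."""
    for rec in log:
        if rec[0] == "update" and rec[1] == ti:
            _, _, x, _v1, v2 = rec
            db[x] = v2
        elif rec[0] == "redo-only" and rec[1] == ti:
            _, _, x, v = rec
            db[x] = v

db = {"A": 1000, "B": 2000}
log = [("start", "T0"), ("update", "T0", "A", 1000, 950), ("update", "T0", "B", 2000, 2050)]
undo(log, db, "T0")      # T0 aborts: A and B are restored to 1000 and 2000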
Undo and Redo on Recovering from Failure
(Figure: transactions T1, T2, T3, T4 shown against checkpoint time Tc and failure time Tf.)
(Figure: the log containing <checkpoint L> records, with last_checkpoint pointing to the most recent checkpoint record.)
Failure with Loss of Nonvolatile Storage
● So far we assumed no loss of non-volatile storage
● Technique similar to checkpointing used to deal with loss of non-volatile
storage
● Periodically dump the entire content of the database to stable storage
● No transaction may be active during the dump procedure; a procedure
similar to checkpointing must take place
4 Output all log records currently residing in main memory onto
stable storage.
4 Output all buffer blocks onto the disk.
4 Copy the contents of the database to stable storage.
4 Output a record <dump> to log on stable storage.
Recovering from Failure of Non-Volatile Storage
● To recover from disk failure
● restore database from most recent dump.
● Consult the log and redo all transactions that committed after the
dump
● Can be extended to allow transactions to be active during dump;
known as fuzzy dump or online dump
● Similar to fuzzy checkpointing
Recovery with Early Lock Release and Logical Undo Operations
Recovery with Early Lock Release
● Support for high-concurrency locking techniques, such as those used for
B+-tree concurrency control, which release locks early
● Supports “logical undo”
● Recovery based on “repeating history”, whereby recovery executes exactly
the same actions as normal processing
Logical Undo Logging
● Operations like B+-tree insertions and deletions release locks early.
● They cannot be undone by restoring old values (physical undo), since once
a lock is released, other transactions may have updated the B+-tree.
● Instead, insertions (resp. deletions) are undone by executing a deletion
(resp. insertion) operation (known as logical undo).
● For such operations, undo log records should contain the undo operation to be
executed
● Such logging is called logical undo logging, in contrast to physical undo
logging
4 Operations are called logical operations
● Other examples:
4 delete of tuple, to undo insert of tuple
– allows early lock release on space allocation information
4 subtract amount deposited, to undo deposit
– allows early lock release on bank balance
Physical Redo
● Redo information is logged physically (that is, new value for each write)
even for operations with logical undo
● Logical redo is very complicated since database state on disk may not be
“operation consistent” when recovery starts
● Physical redo logging does not conflict with early lock release
Operation Logging
● Operation logging is done as follows:
1. When operation starts, log <Ti, Oj, operation-begin>. Here Oj is a unique
identifier of the operation instance.
2. While operation is executing, normal log records with physical redo and
physical undo information are logged.
3. When operation completes, <Ti, Oj, operation-end, U> is logged, where U
contains information needed to perform a logical undo.
Example: insert of (key, record-id) pair (K5, RID7) into index I9
(Figure: log records 1, 2, 3, 4 and the compensation records 4', 3', 2', 1' written as they are undone.)
ARIES Data Structures: DirtyPage Table
● DirtyPageTable
● List of pages in the buffer that have been updated
● Contains, for each such page
4 PageLSN of the page
4 RecLSN is an LSN such that log records before this LSN have already
been applied to the page version on disk
– Set to current end of log when a page is inserted into dirty page table
(just before being updated)
– Recorded in checkpoints, helps to minimize redo work
ARIES Data Structures: Checkpoint Log
● Checkpoint log record
● Contains:
4 DirtyPageTable and list of active transactions
4 For each active transaction, LastLSN, the LSN of the last log record
written by the transaction
● Fixed position on disk notes LSN of last completed
checkpoint log record
● Dirty pages are not written out at checkpoint time
4 Instead, they are flushed out continuously, in the background
● Checkpoint is thus very low overhead
● can be done frequently
ARIES Recovery Algorithm
ARIES recovery involves three passes
● Analysis pass: Determines
● Which transactions to undo
● Which pages were dirty (disk version not up to date) at time of crash
● RedoLSN: LSN from which redo should start
● Redo pass:
● Repeats history, redoing all actions from RedoLSN
4 RecLSN and PageLSNs are used to avoid redoing actions already
reflected on page
● Undo pass:
● Rolls back all incomplete transactions
4 Transactions whose abort was complete earlier are not undone
– Key idea: no need to undo these transactions: earlier undo actions
were logged, and are redone as required
ARIES Recovery: 3 Passes
● Analysis, redo and undo passes
● Analysis determines where redo should start
● Undo has to go back till start of earliest incomplete transaction
ARIES Recovery: Analysis
Analysis pass
● Starts from last complete checkpoint log record
● Reads DirtyPageTable from log record
● Sets RedoLSN = min of RecLSNs of all pages in DirtyPageTable
4 In case no pages are dirty, RedoLSN = checkpoint record’s LSN
● Sets undo-list = list of transactions in checkpoint log record
● Reads LSN of last log record for each transaction in undo-list from
checkpoint log record
● Scans forward from checkpoint
ARIES Recovery: Analysis (Cont.)
Analysis pass (cont.)
● Scans forward from checkpoint
● If any log record found for transaction not in undo-list, adds transaction to
undo-list
● Whenever an update log record is found
4 If page is not in DirtyPageTable, it is added with RecLSN set to LSN of
the update log record
● If transaction end log record found, delete transaction from undo-list
● Keeps track of last log record for each transaction in undo-list
4 May be needed for later undo
● At end of analysis pass:
● RedoLSN determines where to start redo pass
● RecLSN for each page in DirtyPageTable used to minimize redo work
● All transactions in undo-list need to be rolled back
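The sketch below walks through the analysis pass on a simplified in-memory log; the record and checkpoint encodings are simplifications of my own, not the actual ARIES structures.

def analysis_pass(log, checkpoint_lsn):
    """log: dict LSN -> record; records are dicts with a 'type' field.
    Returns (redo_lsn, dirty page table, undo list, last LSN per transaction)."""
    ckpt = log[checkpoint_lsn]
    dpt = dict(ckpt["dirty_page_table"])        # page -> RecLSN
    undo_list = set(ckpt["active_txns"])
    last_lsn = dict(ckpt["last_lsn"])           # txn -> LSN of its last log record

    for lsn in sorted(l for l in log if l > checkpoint_lsn):
        rec = log[lsn]
        txn = rec.get("txn")
        if txn is not None and txn not in undo_list and rec["type"] != "end":
            undo_list.add(txn)                  # any record for an unknown txn
        if rec["type"] == "update":
            dpt.setdefault(rec["page"], lsn)    # add page with RecLSN = this LSN
            last_lsn[txn] = lsn                 # track last record, needed for undo
        elif rec["type"] == "end":
            undo_list.discard(txn)              # txn finished; nothing to undo
            last_lsn.pop(txn, None)

    redo_lsn = min(dpt.values()) if dpt else checkpoint_lsn
    return redo_lsn, dpt, undo_list, last_lsn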
ARIES Redo Pass
Redo Pass: Repeats history by replaying every action not already reflected in the
page on disk, as follows:
● Scans forward from RedoLSN. Whenever an update log record is found:
1. If the page is not in DirtyPageTable or the LSN of the log record is less
than the RecLSN of the page in DirtyPageTable, then skip the log record
2. Otherwise fetch the page from disk. If the PageLSN of the page fetched
from disk is less than the LSN of the log record, redo the log record
NOTE: if either test is negative the effects of the log record have already
appeared on the page. First test avoids even fetching the page from
disk!
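A matching sketch of the redo pass with the two skip tests described above; the in-memory page representation is again a simplification of my own.

def redo_pass(log, redo_lsn, dpt, disk_pages):
    """disk_pages: page_id -> {'page_lsn': ..., 'data': ...}. Repeats history from redo_lsn."""
    for lsn in sorted(l for l in log if l >= redo_lsn):
        rec = log[lsn]
        if rec["type"] != "update":
            continue
        page_id = rec["page"]
        # Test 1: page not in DPT, or record older than its RecLSN -> already on disk.
        if page_id not in dpt or lsn < dpt[page_id]:
            continue
        page = disk_pages[page_id]              # only now fetch the page
        # Test 2: PageLSN already covers this record -> skip.
        if page["page_lsn"] >= lsn:
            continue
        page["data"][rec["item"]] = rec["new_value"]   # redo the update
        page["page_lsn"] = lsn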
ARIES Undo Actions
● When an undo is performed for an update log record
● Generate a CLR containing the undo action performed (actions performed
during undo are logged physically or physiologically).
4 CLR for record n noted as n’ in figure below
● Set UndoNextLSN of the CLR to the PrevLSN value of the update log record
4 Arrows indicate UndoNextLSN value
● ARIES supports partial rollback
● Used e.g. to handle deadlocks by rolling back just enough to release reqd. locks
● Figure indicates forward actions after partial rollbacks
4 records 3 and 4 initially, later 5 and 6, then full rollback
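Finally, a sketch of the undo pass: each undone update produces a CLR whose UndoNextLSN is the PrevLSN of the undone record, so records already compensated are skipped if recovery is repeated; the record layout is my own simplification.

def undo_pass(log, undo_list, last_lsn, disk_pages):
    """Rolls back every transaction in undo_list, writing CLRs as it goes."""
    next_lsn = max(log) + 1
    # Always work on the latest not-yet-undone record across all incomplete transactions.
    to_undo = {txn: last_lsn[txn] for txn in undo_list if txn in last_lsn}
    while to_undo:
        txn, lsn = max(to_undo.items(), key=lambda kv: kv[1])
        rec = log[lsn]
        if rec["type"] == "update":
            page = disk_pages[rec["page"]]
            page["data"][rec["item"]] = rec["old_value"]         # undo the change
            page["page_lsn"] = next_lsn
            log[next_lsn] = {"type": "CLR", "txn": txn,          # compensation log record
                             "undo_next_lsn": rec["prev_lsn"]}
            next_lsn += 1
            nxt = rec["prev_lsn"]
        elif rec["type"] == "CLR":
            nxt = rec["undo_next_lsn"]          # skip records already undone earlier
        else:
            nxt = rec.get("prev_lsn")
        if nxt is None:                         # reached <Ti start>: rollback complete
            log[next_lsn] = {"type": "end", "txn": txn}
            next_lsn += 1
            del to_undo[txn]
        else:
            to_undo[txn] = nxt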