Transaction Management and Concurrency Control
In this Lecture
Database transactions and their properties
Concurrency control and what role it plays in
maintaining the database’s integrity
What locking methods are and how they work
A Transaction
Database transactions reflect real-world transactions that
are triggered by events such as buying a product,
registering for a course, or making a deposit into a
checking account.
Transactions are likely to contain many parts, such as
updating a customer’s account, adjusting product
inventory, and updating the seller’s accounts receivable.
All parts of a transaction must be successfully completed
to prevent data integrity problems.
Therefore, executing and managing transactions are
important database system activities.
A Transaction
In database terms, a transaction is any action that reads
from or writes to a database.
A transaction may consist of the following:
A simple SELECT statement to generate a list of table
contents.
A series of related UPDATE statements to change the
values of attributes in various tables.
A series of INSERT statements to add rows to one or more
tables.
A combination of SELECT, UPDATE, and INSERT statements.
A Transaction
A transaction is a logical unit of work that must be
entirely completed or entirely aborted;
No intermediate states are acceptable.
A successful transaction changes the database from
one consistent state to another.
Transaction Properties: ACID
Transaction Properties
Each individual transaction must display
atomicity, consistency, isolation, and durability.
These four properties are sometimes referred
to as the ACID test.
Let’s look briefly at each of the properties.
Transaction Properties -
Atomicity
requires that all operations (SQL requests) of a
transaction be completed;
if not, the transaction is aborted.
If a transaction T1 has four SQL requests, all
four requests must be successfully completed;
otherwise, the entire transaction is aborted.
In other words, a transaction is treated as a
single, indivisible, logical unit of work.
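The all-or-nothing behavior can be sketched with Python's built-in sqlite3 module. This is only an illustration: the product table, the product code P1, and the helper run_transaction are hypothetical names, and the second request fails on purpose because it refers to a table that does not exist.

```python
import sqlite3

def run_transaction(conn, statements):
    """Run all statements as one unit; undo every one of them if any fails."""
    try:
        # The connection context manager commits on success and rolls back
        # when an exception leaves the block.
        with conn:
            for sql, params in statements:
                conn.execute(sql, params)
        return True
    except sqlite3.Error:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (prod_code TEXT PRIMARY KEY, prod_qoh INTEGER)")
conn.execute("INSERT INTO product VALUES ('P1', 35)")
conn.commit()

# Hypothetical transaction: the first request is valid, the second one fails.
ok = run_transaction(conn, [
    ("UPDATE product SET prod_qoh = prod_qoh + ? WHERE prod_code = ?", (100, "P1")),
    ("UPDATE no_such_table SET x = ?", (1,)),
])
qoh = conn.execute(
    "SELECT prod_qoh FROM product WHERE prod_code = 'P1'"
).fetchone()[0]
print(ok, qoh)  # False 35: the successful first UPDATE was undone as well
```

Because the failed request aborts the whole unit, the quantity on hand stays at its original value even though the first UPDATE ran without error.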
Transaction Properties -
Consistency
indicates the permanence of the database’s
consistent state.
A transaction takes a database from one
consistent state to another.
When a transaction is completed, the database
must be in a consistent state.
If any of the transaction parts violates an integrity
constraint, the entire transaction is aborted.
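As a rough sketch of this rule, the example below declares an integrity constraint (quantity on hand may not go negative) and lets a transaction violate it. The tables product and sale, the CHECK constraint, and the function sell are assumptions made for the illustration, using Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product ("
    " prod_code TEXT PRIMARY KEY,"
    " prod_qoh INTEGER CHECK (prod_qoh >= 0))"  # hypothetical integrity constraint
)
conn.execute("CREATE TABLE sale (prod_code TEXT, units INTEGER)")
conn.execute("INSERT INTO product VALUES ('P1', 35)")
conn.commit()

def sell(conn, prod_code, units):
    """Record a sale and reduce inventory as one transaction."""
    try:
        with conn:  # COMMIT on success, ROLLBACK if an exception escapes
            conn.execute("INSERT INTO sale VALUES (?, ?)", (prod_code, units))
            conn.execute(
                "UPDATE product SET prod_qoh = prod_qoh - ? WHERE prod_code = ?",
                (units, prod_code),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # constraint violated: nothing from this transaction remains

first = sell(conn, "P1", 30)   # 35 - 30 = 5, allowed
second = sell(conn, "P1", 30)  # 5 - 30 would be negative, so the transaction aborts
qoh = conn.execute("SELECT prod_qoh FROM product").fetchone()[0]
sales = conn.execute("SELECT COUNT(*) FROM sale").fetchone()[0]
print(first, second, qoh, sales)  # True False 5 1
```

The second sale's INSERT had already succeeded when the UPDATE violated the constraint, yet no trace of it remains, so the database moves only between consistent states.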
Transaction Properties -
Isolation
means that the data used during the execution of a
transaction cannot be used by a second transaction
until the first one is completed.
In other words, if transaction T1 is being executed and
is using the data item X, that data item cannot be
accessed by any other transaction (T2 … Tn) until T1
ends.
This property is particularly useful in multiuser
database environments because several users can
access and update the database at the same time.
Transaction Properties -
Durability
ensures that once transaction changes are
done and committed,
they cannot be undone or lost, even in the
event of a system failure.
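Durability is hard to demonstrate without crashing a machine, but a small approximation is possible with a file-based SQLite database: a committed change is still there after the session is closed and the file is reopened, while a change that was never committed is not. The file name demo.db and the product table are hypothetical, and closing a connection is only a stand-in for a real failure.

```python
import os
import sqlite3
import tempfile

def read_qoh(conn):
    return conn.execute(
        "SELECT prod_qoh FROM product WHERE prod_code = 'P1'"
    ).fetchone()[0]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.db")

    session1 = sqlite3.connect(path)
    session1.execute(
        "CREATE TABLE product (prod_code TEXT PRIMARY KEY, prod_qoh INTEGER)"
    )
    session1.execute("INSERT INTO product VALUES ('P1', 35)")
    session1.execute(
        "UPDATE product SET prod_qoh = prod_qoh - 30 WHERE prod_code = 'P1'"
    )
    session1.commit()  # the value 5 is now permanently recorded

    session1.execute("UPDATE product SET prod_qoh = 0 WHERE prod_code = 'P1'")
    session1.close()   # session ends before COMMIT: the pending change is discarded

    session2 = sqlite3.connect(path)
    qoh_after_restart = read_qoh(session2)
    session2.close()

print(qoh_after_restart)  # 5
```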
Transaction Management with SQL
Transaction Management
with SQL
The American National Standards Institute (ANSI) has
defined standards that govern SQL database
transactions.
Transaction support is provided by two SQL
statements: COMMIT and ROLLBACK.
The ANSI standards require that when a transaction
sequence is initiated by a user or an application
program, the sequence must continue through all
succeeding SQL statements until one of the following
four events occurs:
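1. A COMMIT statement is reached, in which case all changes
are permanently recorded within the database.
The COMMIT statement automatically ends the SQL transaction.
2. A ROLLBACK statement is reached, in which case all changes
are aborted and the database is rolled back to its previous
consistent state.
3. The end of a program is successfully reached, in which case
all changes are permanently recorded within the database.
This action is equivalent to COMMIT.
4. The program is abnormally terminated, in which case the
changes are aborted and the database is rolled back to its
previous consistent state. This action is equivalent to ROLLBACK.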
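A minimal sketch of COMMIT and ROLLBACK in action, using Python's sqlite3 module with automatic transaction handling switched off so that the statements are issued explicitly. The account table and its values are hypothetical. Note that SQLite needs an explicit BEGIN to open the transaction, whereas under the ANSI model described here a transaction starts implicitly with the first SQL statement.

```python
import sqlite3

# isolation_level=None: the module issues no implicit BEGIN or COMMIT for us
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (acc_id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 500)")

def balance():
    return conn.execute(
        "SELECT balance FROM account WHERE acc_id = 1"
    ).fetchone()[0]

# Transaction ended by ROLLBACK: the deposit is discarded
conn.execute("BEGIN")
conn.execute("UPDATE account SET balance = balance + 100 WHERE acc_id = 1")
conn.execute("ROLLBACK")
after_rollback = balance()

# Transaction ended by COMMIT: the deposit is permanently recorded
conn.execute("BEGIN")
conn.execute("UPDATE account SET balance = balance + 100 WHERE acc_id = 1")
conn.execute("COMMIT")
after_commit = balance()

print(after_rollback, after_commit)  # 500 600
```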
Transaction Log
A DBMS uses a transaction log to keep track of all
transactions that update the database.
The DBMS uses the information stored in this log for a
recovery requirement triggered by any of the following:
a ROLLBACK statement,
a program’s abnormal termination,
or a system failure such as a network discrepancy or a disk
crash.
Transaction Log
Some RDBMSs use the transaction log to recover a
database forward to a currently consistent state.
After a server failure, for example, Oracle
automatically rolls back uncommitted transactions
and rolls forward transactions that were committed
but not yet written to the physical database.
This behavior is required for transactional correctness
and is typical of any transactional DBMS.
Transaction Log
While the DBMS executes transactions that modify the
database, it also automatically updates the transaction log.
The transaction log stores the following:
A record for the beginning of the transaction.
For each transaction component (SQL statement):
– The type of operation being performed (INSERT, UPDATE,
DELETE).
– The names of the objects affected by the transaction (the name
of the table).
– The “before” and “after” values for the fields being updated.
– Pointers to the previous and next transaction log entries for the
same transaction.
The ending (COMMIT) of the transaction.
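The log contents listed above can be pictured as a simple record structure. The sketch below is one possible in-memory model, not the layout any particular DBMS uses; the class LogRecord, its field names, and the sample values are all hypothetical. It also shows why the previous/next pointers are useful: they let the DBMS pick out one transaction's entries even when entries of other transactions are interleaved in the log.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LogRecord:
    trl_id: int                  # log entry ID
    trx_num: int                 # transaction this entry belongs to
    prev_ptr: Optional[int]      # previous entry of the same transaction
    next_ptr: Optional[int]      # next entry of the same transaction
    operation: str               # START, INSERT, UPDATE, DELETE, or COMMIT
    table: Optional[str] = None
    row_id: Optional[str] = None
    attribute: Optional[str] = None
    before_value: Optional[object] = None
    after_value: Optional[object] = None

def records_for(log, trx_num):
    """Follow the next pointers to list one transaction's entries in order."""
    by_id = {r.trl_id: r for r in log}
    current = next(r for r in log if r.trx_num == trx_num and r.prev_ptr is None)
    chain = [current]
    while current.next_ptr is not None:
        current = by_id[current.next_ptr]
        chain.append(current)
    return chain

# Transaction 101: two UPDATE statements; entry 3 belongs to another transaction
log = [
    LogRecord(1, 101, None, 2, "START"),
    LogRecord(2, 101, 1, 4, "UPDATE", "PRODUCT", "P1", "PROD_QOH", 35, 5),
    LogRecord(3, 102, None, None, "START"),
    LogRecord(4, 101, 2, 5, "UPDATE", "CUSTOMER", "C1", "CUST_BALANCE", 0, 30),
    LogRecord(5, 101, 4, None, "COMMIT"),
]
print([r.operation for r in records_for(log, 101)])
# ['START', 'UPDATE', 'UPDATE', 'COMMIT']
```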
Transaction Log
Figure 1 illustrates a simplified transaction log that reflects a basic
transaction composed of two SQL UPDATE statements.
If a system failure occurs, the DBMS will examine the transaction log for all
uncommitted or incomplete transactions and restore (ROLLBACK) the
database to its previous state on the basis of that information.
When that step is completed, the DBMS uses the log to roll forward, that is, to
write to the database, all committed transactions whose changes were not
physically written before the failure occurred.
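The recovery procedure can be sketched in a few lines. This is a deliberately simplified model: the log is a list of tuples, the "database" is a dictionary, and the function recover and the sample values are hypothetical. It also assumes that an item changed by an uncommitted transaction was not changed afterwards by any other transaction, which is what locking is meant to guarantee.

```python
def recover(db, log):
    """Undo uncommitted work, then redo committed work, using the log."""
    committed = {trx for trx, op, *_ in log if op == "COMMIT"}

    # Undo pass (newest entry first): restore the "before" values
    for trx, op, *rest in reversed(log):
        if op == "UPDATE" and trx not in committed:
            item, before, _after = rest
            db[item] = before

    # Redo pass (oldest entry first): reapply the "after" values
    for trx, op, *rest in log:
        if op == "UPDATE" and trx in committed:
            item, _before, after = rest
            db[item] = after
    return db

# State on disk at the moment of the failure:
# T1's uncommitted change reached the disk, T2's committed change did not.
disk = {"PROD_QOH": 135, "CUST_BALANCE": 0}
log = [
    ("T1", "START"),
    ("T1", "UPDATE", "PROD_QOH", 35, 135),    # T1 never committed
    ("T2", "START"),
    ("T2", "UPDATE", "CUST_BALANCE", 0, 30),
    ("T2", "COMMIT"),
]
recovered = recover(dict(disk), log)
print(recovered)  # {'PROD_QOH': 35, 'CUST_BALANCE': 30}
```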
Figure 1: A Transaction Log
Transaction Log
A feature used by the DBMS to keep track of all transaction operations that
update the database.
The information stored in this log is used by the DBMS for recovery
purposes.
If a ROLLBACK is issued before the termination of a transaction, the DBMS
will restore the database only for that particular transaction, rather than for
all of them, to maintain the durability of the previous transactions.
In other words, committed transactions are not rolled back.
Concurrency Control: Problems and Solutions
Concurrency Control
Coordinating the simultaneous execution of transactions in a multiuser
database system is known as concurrency control.
The objective of concurrency control is to ensure the serializability of
transactions in a multiuser database environment.
To achieve this goal, most concurrency control techniques are oriented
toward preserving the isolation property of concurrently executing
transactions.
Concurrency control is important because the simultaneous execution of
transactions over a shared database can create several data integrity and
consistency problems.
The Scheduler
A special DBMS process that establishes the order in which the operations are
executed within concurrent transactions.
The scheduler interleaves the execution of database operations to ensure
serializability and isolation of transactions.
To determine the appropriate order, the scheduler bases its actions on
concurrency control algorithms, such as locking or time stamping methods.
Not all transactions are serializable.
The DBMS determines what transactions are serializable and proceeds to
interleave the execution of the transaction’s operations.
Generally, transactions that are not serializable are executed on a first-come,
first-served basis by the DBMS.
The scheduler’s main job is to create a serializable schedule of a transaction’s
operations, in which the interleaved execution of the transactions (T1, T2, T3,
etc.) yields the same results as if the transactions were executed in serial
order (one after another).
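The idea of a serializable schedule can be illustrated by brute force: run an interleaved schedule, run every serial order, and compare the results. The sketch below does exactly that for a single starting state, which is far weaker than what a real scheduler guarantees; the helpers read, write, run, and matches_some_serial_order, as well as the items X and Y, are hypothetical.

```python
from itertools import permutations

def read(item):
    def step(db, local):
        local[item] = db[item]          # copy the stored value into private memory
    return step

def write(item, delta):
    def step(db, local):
        db[item] = local[item] + delta  # write back the privately computed value
    return step

def run(schedule, transactions, initial):
    """Execute one step of the named transaction for each entry in schedule."""
    db = dict(initial)
    local = {name: {} for name in transactions}
    position = {name: 0 for name in transactions}
    for name in schedule:
        transactions[name][position[name]](db, local[name])
        position[name] += 1
    return db

def matches_some_serial_order(schedule, transactions, initial):
    for order in permutations(transactions):
        serial = [name for name in order for _ in transactions[name]]
        if run(schedule, transactions, initial) == run(serial, transactions, initial):
            return True
    return False

initial = {"X": 35, "Y": 50}
same_item = {"T1": [read("X"), write("X", 100)], "T2": [read("X"), write("X", -30)]}
other_item = {"T1": [read("X"), write("X", 100)], "T3": [read("Y"), write("Y", -30)]}

conflict = matches_some_serial_order(["T1", "T2", "T1", "T2"], same_item, initial)
harmless = matches_some_serial_order(["T1", "T3", "T1", "T3"], other_item, initial)
print(conflict, harmless)  # False True
```

Interleaving is harmless when the transactions touch different data items, but the same interleaving on a shared item produces a result that no serial order could produce, which is the kind of schedule the scheduler must prevent.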
Concurrency Control -
Problems
The three main problems are:
lost updates
uncommitted data
and inconsistent retrievals.
Concurrency Control –
Problems – Lost Updates
The lost update problem occurs when two concurrent transactions, T1 and
T2, are updating the same data element and one of the updates is lost
(overwritten by the other transaction).
Assume that you have a product whose current PROD_QOH value is 35.
Also assume that two concurrent transactions, T1 and T2, occur and update
the PROD_QOH value for some item in the PRODUCT table.
The transactions are shown in Table 2.
The sequence depicted in Table 4 shows how the lost update problem can arise
Concurrency Control –
Problems – Lost Updates
This occurs when a transaction reads a product’s PROD_QOH value from
the table before a previous transaction on the same product has been
committed.
Note that the first transaction (T1) has not yet been committed when the
second transaction (T2) is executed.
Therefore, T2 still operates on the value 35, and its subtraction yields 5 in
memory.
In the meantime, T1 writes the value 135 to disk, which is promptly
overwritten by T2. In short, the addition of 100 units is “lost” during the
process.
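The sequence just described can be replayed step by step. The sketch below uses a dictionary as the "disk" and plain variables as each transaction's memory; the function names are hypothetical, while the values (35 on hand, T1 adds 100, T2 subtracts 30) come from the example above.

```python
def serial_execution(qoh=35):
    """T1 finishes completely before T2 starts."""
    disk = {"PROD_QOH": qoh}
    disk["PROD_QOH"] = disk["PROD_QOH"] + 100   # T1: 35 + 100 = 135
    disk["PROD_QOH"] = disk["PROD_QOH"] - 30    # T2: 135 - 30 = 105
    return disk["PROD_QOH"]

def interleaved_execution(qoh=35):
    """T2 reads PROD_QOH before T1 has written and committed its result."""
    disk = {"PROD_QOH": qoh}
    t1_memory = disk["PROD_QOH"]      # T1 reads 35
    t2_memory = disk["PROD_QOH"]      # T2 also reads 35
    t1_memory = t1_memory + 100       # T1 computes 135 in memory
    t2_memory = t2_memory - 30        # T2 computes 5 in memory
    disk["PROD_QOH"] = t1_memory      # T1 writes 135
    disk["PROD_QOH"] = t2_memory      # T2 overwrites it with 5
    return disk["PROD_QOH"]

print(serial_execution(), interleaved_execution())  # 105 5
```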
Concurrency Control – Problems
– Uncommitted Data
Occurs when two transactions, T1 and T2, are executed concurrently and the first
transaction (T1) is rolled back after the second transaction (T2) has already
accessed the uncommitted data—thus violating the isolation property of
transactions.
T1 is forced to roll back due to an error during the updating of the invoice’s total;
it rolls back all the way, undoing the inventory update as well.
This time the T1 transaction is rolled back to eliminate the addition of the 100 units.
(See Table 5.)
Because T2 subtracts 30 from the original 35 units, the correct answer should be 5.
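A step-by-step replay makes the problem concrete. As before, a dictionary stands in for the disk and the function names are hypothetical; the order of the steps is an assumption consistent with the description (T2 reads after T1 writes and before T1 rolls back).

```python
def correct_execution(qoh=35):
    """T1 is rolled back before T2 reads anything."""
    disk = {"PROD_QOH": qoh}
    before_image = disk["PROD_QOH"]
    disk["PROD_QOH"] = before_image + 100   # T1 writes 135 (uncommitted)
    disk["PROD_QOH"] = before_image         # T1 rolls back to 35
    disk["PROD_QOH"] = disk["PROD_QOH"] - 30  # T2: 35 - 30 = 5
    return disk["PROD_QOH"]

def uncommitted_data_execution(qoh=35):
    """T2 reads T1's uncommitted value, then T1 is rolled back."""
    disk = {"PROD_QOH": qoh}
    before_image = disk["PROD_QOH"]
    disk["PROD_QOH"] = before_image + 100   # T1 writes 135 (uncommitted)
    t2_memory = disk["PROD_QOH"]            # T2 reads the uncommitted 135
    disk["PROD_QOH"] = before_image         # T1 rolls back to 35
    disk["PROD_QOH"] = t2_memory - 30       # T2 writes 135 - 30 = 105
    return disk["PROD_QOH"]

print(correct_execution(), uncommitted_data_execution())  # 5 105
```

The 100 units that T1 added never officially existed, yet they survive in T2's result.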
Concurrency Control – Problems
– Inconsistent Retrievals
Occurs when a transaction accesses data before and after one or more
other transactions finish working with such data.
For example, an inconsistent retrieval would occur if transaction T1
calculated some summary (aggregate) function over a set of data while
another transaction (T2) was updating the same data.
The problem is that the transaction might read some data before it is
changed and other data after it is changed, thereby yielding inconsistent
results.
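A small replay shows how such a mixed read produces a wrong total. The product codes A, B, and C, their quantities, and the 10-unit move are hypothetical values chosen for the illustration: T1 sums the quantities on hand while T2 moves 10 units from product C to product A, which should leave the total unchanged.

```python
def total_without_interference(products):
    """T1 runs alone: every value is read from the same consistent state."""
    return sum(products.values())

def total_with_interleaved_update(products):
    """T1 reads A, then T2 runs and commits, then T1 reads B and C."""
    db = dict(products)
    total = db["A"]        # T1 reads A before T2 changes it
    db["A"] += 10          # T2 adds 10 units to A ...
    db["C"] -= 10          # ... takes them from C, and commits
    total += db["B"]       # T1 reads B (untouched)
    total += db["C"]       # T1 reads C after T2 changed it
    return total

products = {"A": 8, "B": 32, "C": 15}
print(total_without_interference(products))     # 55
print(total_with_interleaved_update(products))  # 45
```

Both the state before T2 and the state after T2 add up to 55; only the mixture of the two that T1 happened to read yields 45.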
Concurrency Control – Problems
– Inconsistent Retrievals
To illustrate the problem, assume the following conditions:
Concurrency Control – Lock
Granularity
Indicates the level of lock use.
Locking can take place at the following levels: database, table, page, row,
or even field (attribute).
Concurrency Control – Lock
Granularity – Database Level
In a database-level lock, the entire database is locked, thus preventing the
use of any tables in the database by transaction T2 while transaction T1 is
being executed.
This level of locking is good for batch processes, but it is unsuitable for
multiuser DBMSs.
You can imagine how s-l-o-w data access would be if thousands of
transactions had to wait for the previous transaction to be completed
before the next one could reserve the entire database.
An example of a database-level lock
Concurrency Control – Lock
Granularity – Table Level
In a table-level lock, the entire table is locked, preventing access to any row
by transaction T2 while transaction T1 is using the table.
If a transaction requires access to several tables, each table may be locked.
However, two transactions can access the same database as long as they
access different tables.
Table-level locks, while less restrictive than database-level locks, cause traffic
jams when many transactions are waiting to access the same table.
Such a condition is especially irksome if the lock forces a delay when different
transactions require access to different parts of the same table—that is, when
the transactions would not interfere with each other.
Consequently, table-level locks are not suitable for multiuser DBMSs.
An example of a table level lock
Concurrency Control – Lock
Granularity – Page Level
In a page-level lock, the DBMS locks an entire diskpage.
A diskpage, or page, is the equivalent of a diskblock, which can be described
as a directly addressable section of a disk.
A page has a fixed size, such as 4K, 8K, or 16K.
For example, if you want to write only 73 bytes to a 4K page, the entire 4K
page must be read from disk, updated in memory, and written back to disk.
A table can span several pages, and a page can contain several rows of one
or more tables.
Page-level locks are currently the most frequently used locking
method for multiuser DBMSs.
An example of a page level
lock
Concurrency Control – Lock
Granularity – Row Level
A row-level lock is much less restrictive than the locks discussed earlier.
The DBMS allows concurrent transactions to access different rows of the
same table even when the rows are located on the same page.
Although the row-level locking approach improves the availability of data, its
management requires high overhead because a lock exists for each row in a
table of the database involved in a conflicting transaction.
Modern DBMSs automatically escalate a lock from a row level to a page level
when the application session requests multiple locks on the same page.
An example of a row level
lock
Concurrency Control – Lock
Granularity – Field Level
The field-level lock allows concurrent transactions to access the same row as
long as they require the use of different fields (attributes) within that row.
Although field-level locking clearly yields the most flexible multiuser data
access, it is rarely implemented in a DBMS because it requires an extremely
high level of computer overhead and because the row-level lock is much
more useful in practice.
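The five granularity levels can be compared with a toy model in which a lock is identified by a key, and the key gets longer as the granularity gets finer. Everything here is a hypothetical sketch: the Resource tuple, lock_key, and must_wait are invented names, every lock is treated as exclusive, and one granularity is assumed to apply to the whole system.

```python
from typing import NamedTuple

class Resource(NamedTuple):
    table: str
    page: int
    row: int
    field: str

def lock_key(granularity, r):
    """The unit that actually gets locked when a transaction touches r."""
    keys = {
        "database": (),
        "table": (r.table,),
        "page": (r.table, r.page),
        "row": (r.table, r.page, r.row),
        "field": (r.table, r.page, r.row, r.field),
    }
    return keys[granularity]

def must_wait(granularity, held_by_t1, wanted_by_t2):
    """T2 waits when its request maps to the unit T1 has already locked."""
    return lock_key(granularity, held_by_t1) == lock_key(granularity, wanted_by_t2)

t1 = Resource("PRODUCT", page=1, row=1, field="PROD_QOH")
t2 = Resource("PRODUCT", page=1, row=2, field="PROD_QOH")  # same page, other row

for level in ("database", "table", "page", "row", "field"):
    print(level, must_wait(level, t1, t2))
# database True
# table True
# page True
# row False
# field False
```

The two transactions never touch the same row, yet they block each other at every level coarser than the row, which is the trade-off between concurrency and lock-management overhead that the preceding slides describe.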
Summary
Concurrency control coordinates the simultaneous execution of transactions.
The concurrent execution of transactions can result in three main problems:
lost updates, uncommitted data, and inconsistent retrievals.
The scheduler is responsible for establishing the order in which the
concurrent transaction operations are executed.
The transaction execution order is critical and ensures database integrity in
multiuser database systems.