
UNIT 3
Query Processing and Transactions

Query Processing

• Translation of high-level queries into low-level expressions
• A step-wise process
• Requires concepts of relational algebra and file structures
• Extracts data from the database
• Covers how queries are processed and how they are optimized
Query Processing
From the above diagram:
• The first step is to transform the query into a standard form.
• The query is translated from SQL into a relational algebra expression.
• During this process, the parser checks the syntax and verifies the relations and the attributes used in the query.
Query Optimization

• The main aim of query optimization is to minimize the cost function:

  Cost = I/O cost + CPU cost + communication cost

• It defines how an RDBMS can improve the performance of a query by re-ordering its operations.
• It is the process of selecting the most efficient query-evaluation plan from among several strategies when the query is complex.
Example
SELECT Ename FROM Employee
WHERE Salary > 5000;

Translated into a relational algebra expression:

π Ename (σ Salary > 5000 (Employee))

OR

σ Salary > 5000 (π Ename, Salary (Employee))
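The two expressions describe the same result but different evaluation orders. A minimal sketch in Python on a small hypothetical in-memory Employee relation (illustrative data and names only) shows that selecting first and then projecting produces the same answer as projecting a narrowed relation first and selecting afterwards; the optimizer's job is to pick the cheaper ordering.

# A minimal sketch on a hypothetical in-memory Employee relation (illustrative data).
employee = [
    {"Ename": "Asha",  "Salary": 7000},
    {"Ename": "Ravi",  "Salary": 4000},
    {"Ename": "Meena", "Salary": 9000},
]

# Plan 1: select first, then project Ename.
plan1 = [{"Ename": t["Ename"]} for t in employee if t["Salary"] > 5000]

# Plan 2: project Ename and Salary first, select, then project Ename at the end.
narrowed = [{"Ename": t["Ename"], "Salary": t["Salary"]} for t in employee]
plan2 = [{"Ename": t["Ename"]} for t in narrowed if t["Salary"] > 5000]

assert plan1 == plan2   # same answer; the optimizer chooses the cheaper ordering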


Transaction
• A transaction is a single logical unit of work that accesses and
possibly modifies the contents of a database.

• Transactions access data using read and write operations.

• In order to maintain consistency in a database, certain properties must hold before and after each transaction.
• These are called the ACID properties.


Transaction States:
Active State:
• When the instructions of the transaction are running, the transaction is in the active state. If all the read and write operations are performed without any error, it goes to the "partially committed state"; if any instruction fails, it goes to the "failed state".

Partially Committed:
• After completion of all the read and write operations, the changes are made in main memory or the local buffer. If the changes are made permanent on the database, the state changes to the "committed state"; in case of failure it goes to the "failed state".
Failed State:
• The transaction enters the "failed state" when any of its instructions fails, or when a failure occurs while making the changes permanent on the database.

Aborted State:
• After any type of failure, the transaction moves from the "failed state" to the "aborted state". Since in the previous states the changes were made only to the local buffer or main memory, these changes are deleted or rolled back.
Committed State:
• This is the state in which the changes have been made permanent on the database; the transaction is complete and then moves to the "terminated state".

Terminated State:
• If there is no roll-back, or the transaction comes from the "committed state", the system is consistent and ready for a new transaction, and the old transaction is terminated.
a. Atomicity
• Transactions do not occur partially.
• Each transaction is treated as one unit and either runs to completion or is not executed at all.
• It involves the following two operations.
—Abort: If a transaction aborts, changes made to the database are not
visible.
—Commit: If a transaction commits, changes made are visible.
Atomicity is also known as the ‘All or nothing rule’.
Consider the following transaction T consisting of T1 and T2: transfer of 100 from account X to account Y.

• If the transaction fails after completion of T1 but before completion of T2 (say, after write(X) but before write(Y)), then the amount has been deducted from X but not added to Y.
• This results in an inconsistent database state.
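As a rough illustration (not a real DBMS), the sketch below models the transfer on an in-memory account table: if a failure occurs between write(X) and write(Y), the partial change is rolled back so no half-finished transfer becomes visible. The account values and the fail_after_debit flag are hypothetical.

# Minimal sketch of the all-or-nothing rule on an in-memory account table.
accounts = {"X": 500, "Y": 200}

def transfer(db, amount, fail_after_debit=False):
    before = dict(db)            # keep before-images for undo
    try:
        db["X"] -= amount        # T1: debit X
        if fail_after_debit:
            raise RuntimeError("crash between write(X) and write(Y)")
        db["Y"] += amount        # T2: credit Y
    except Exception:
        db.update(before)        # abort: roll the partial change back
        raise

try:
    transfer(accounts, 100, fail_after_debit=True)
except RuntimeError:
    pass
assert accounts == {"X": 500, "Y": 200}   # no partial transfer is visible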
b. Consistency
• Integrity constraints must be maintained so that the database is consistent before
and after the transaction.
• It refers to the correctness of a database
• The total amount before and after the transaction must be maintained.
Example:
• Total before T occurs = 500 + 200 = 700.
Total after T occurs = 400 + 300 = 700.
Therefore, the database is consistent. Inconsistency occurs in case T1 completes
but T2 fails.
c. Isolation
• This property ensures that multiple transactions can occur concurrently without leading to inconsistency of the database state.
• Transactions occur independently, without interference.
• Changes occurring in a particular transaction will not be visible to any other transaction.
• This property ensures that the concurrent execution of transactions produces the same result as executing them one after another.
Example

• Let X = 500, Y = 500.
• Consider two transactions T and T''. T updates both items (say, X := X * 100 and Y := Y - 50), while T'' simply computes the sum X + Y.
• If T'' runs in between T's two writes, it reads the new value of X (50,000) but the old value of Y (500), so it computes X + Y = 50,000 + 500 = 50,500.
• This sum is consistent neither with the sum before T (X + Y = 500 + 500 = 1,000) nor with the sum at the end of T (X + Y = 50,000 + 450 = 50,450).

This results in database inconsistency. Hence, transactions must take place in isolation, and changes should be visible only after they have been made to the main memory.
d. Durability

• This property ensures that once a transaction has completed execution, its updates and modifications to the database are written to disk and persist even if a system failure occurs.
• These updates become permanent and are stored in non-volatile memory.
• The effects of the transaction are thus never lost.


Concurrency Control
• Executing a single transaction at a time increases the waiting time of the other transactions, which may delay the overall execution. Hence, to increase the overall throughput and efficiency of the system, several transactions are executed concurrently.

• Concurrency control is a very important concept in DBMS: it ensures that several users can operate on the data simultaneously without introducing inconsistency into the data.
Example

• Imagine a library database where users can borrow and return books. Two users,
Alice and Bob, both want to borrow the same book from the library at the same time.

Without concurrency control:


Step 1: Alice checks the availability of the book and finds that it's available.
Step 2: Bob also checks the availability of the book and sees that it's available.
Step 3: Both Alice and Bob borrow the book simultaneously.
Step 4: The library database doesn't track that the book was borrowed by both Alice and
Bob, leading to inconsistencies.
With concurrency control:

Step 1: Alice checks the availability of the book and finds that it's available.
Step 2: The database locks the book to prevent other users from borrowing it.
Step 3: Bob also checks the availability of the book but is unable to borrow it because
it's already locked by Alice's transaction.
Step 4: After Alice returns the book, the lock is released, and Bob can then borrow it.
Step 5: Both transactions complete successfully without conflicts, and the database
remains consistent.
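A minimal sketch of the library scenario, assuming one lock per book copy and using Python's threading.Lock; the user names and the borrow helper are hypothetical, and a real DBMS uses a lock manager rather than a single mutex.

import threading

book_lock = threading.Lock()     # hypothetical per-book lock
borrowed_by = None

def borrow(user):
    global borrowed_by
    # acquire(blocking=False) returns False immediately if the lock is already held
    if book_lock.acquire(blocking=False):
        borrowed_by = user       # only the lock holder may update availability
        return True
    return False                 # the book is locked by another transaction

print(borrow("Alice"))   # True  - Alice's transaction locks the book
print(borrow("Bob"))     # False - Bob must wait until the lock is released
book_lock.release()      # Alice returns the book, releasing the lock
print(borrow("Bob"))     # True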
Concurrency Control Problems
• A database transaction consists of two major operations, read and write.

• It is very important to manage these operations during the concurrent execution of transactions in order to maintain the consistency of the data.

Problems:

1. Dirty Read Problem (Write-Read conflict)

2. Lost Update Problem (Write-Write conflict)

3. Unrepeatable Read Problem (Read-Write conflict)


1. Dirty Read Problem (Write-Read conflict)
The dirty read problem occurs when:
i. one transaction updates an item of the database,
ii. somehow that transaction fails,
iii. and before the data is rolled back, the updated database item is accessed by another transaction.
iv. Hence, there is a write-read conflict between the two transactions.

Example:
• Consider two transactions TX and TY in the below diagram performing read/write
operations on account A where the available balance in account A is $300:
• At time t1, transaction TX reads the value of account A, i.e., $300.
• At time t2, transaction TX adds $50 to account A, which becomes $350.
• At time t3, transaction TX writes the updated value of account A, i.e., $350.
• At time t4, transaction TY reads account A and sees the value $350.
• At time t5, transaction TX rolls back due to a server problem, and the value changes back to $300 (as initially). But the value of account A remains $350 for transaction TY, which has committed using it.
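The same timeline can be replayed as a small sketch on an in-memory value of account A (figures as in the example above; variable names are illustrative):

committed_A = 300            # value of account A on disk
TX_A = committed_A           # t1: TX reads A = 300
TX_A += 50                   # t2: TX adds $50 -> 350
uncommitted_A = TX_A         # t3: TX writes 350, still uncommitted
TY_A = uncommitted_A         # t4: TY reads the uncommitted 350 (dirty read)
uncommitted_A = committed_A  # t5: TX rolls back; A is 300 again
assert TY_A == 350 and committed_A == 300   # TY used a value that never became permanent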
2. Lost Update Problem (W-W Conflict)
The problem occurs when:
i. two different database transactions perform read/write operations on the same database items in concurrent execution,
ii. which makes the values of the items incorrect and hence the database inconsistent.
iii. Hence, there arises the lost update problem.

Example:
Consider the below diagram where two transactions TX and TY, are performed on the
same account A where the balance of account A is $300.
i. At time t1, transaction TX reads the value of account A, i.e., $300 (read only).
ii. At time t2, transaction TX deducts $50 from account A, which becomes $250 (deducted but not yet written).
iii. At time t3, transaction TY reads the value of account A, which is still $300, because TX has not written its update yet.
iv. At time t4, transaction TY adds $100 to account A, which becomes $400 (added but not yet written).
v. At time t6, transaction TX writes account A as $250, since TY has not yet written its value.
vi. At time t7, transaction TY writes account A as the $400 it computed at time t4. The value written by TX, i.e., $250, is lost.
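The write-write conflict can be replayed the same way (a sketch with illustrative variable names, following the timeline above):

A = 300             # balance of account A
TX_local = A        # t1: TX reads 300
TX_local -= 50      # t2: TX computes 250 (not yet written)
TY_local = A        # t3: TY reads 300, since TX has not written yet
TY_local += 100     # t4: TY computes 400 (not yet written)
A = TX_local        # t6: TX writes 250
A = TY_local        # t7: TY writes 400 - TX's update of 250 is lost
assert A == 400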
3. Unrepeatable Read Problem (Read-Write conflict)
• Also known as the inconsistent retrievals problem; it occurs when, within one transaction, two different values are read for the same database item.
Example:
• Consider two transactions, TX and TY, performing the read/write operations on
account A, having an available balance = $300. The diagram is shown below:
• At time t1, transaction TX reads the value from account A, i.e., $300.
• At time t2, transaction TY reads the value from account A, i.e., $300.
• At time t3, transaction TY updates the value of account A by adding $100 to the
available balance, and then it becomes $400.
• At time t4, transaction TY writes the updated value, i.e., $400.
• After that, at time t5, transaction TX reads the available value of account A, and that
will be read as $400.
• It means that within the same transaction TX, two different values of account A are read:
• $300 initially, and, after the update made by transaction TY, $400. The read cannot be repeated with the same result, and this is therefore known as the unrepeatable read problem.
Concurrency Control Protocols
• Concurrency Control is the working concept that is required for controlling and
managing the concurrent execution of database operations and thus avoiding the
inconsistencies in the database.
• Thus, for maintaining the concurrency of the database, we have the concurrency
control protocols.

Concurrency Control Protocols

1. Lock-based concurrency control protocol
   Lock-based protocols use two main types of locks, together with two locking disciplines:
   1.1 Shared Lock (S-lock)
   1.2 Exclusive Lock (X-lock)
   1.3 Two-Phase Locking (2PL)
   1.4 Strict Two-Phase Locking (Strict 2PL)
2. Timestamp-based concurrency control protocol
1. Lock-Based Concurrency Control Protocol
• Lock-based concurrency control protocols manage concurrent access to shared data in database systems, ensuring data consistency and preventing conflicts during simultaneous access and modification by multiple transactions.

• These protocols use locks to regulate access to data items, granting exclusive access
to a data item to one transaction at a time.

• Transactions requesting to read or write a data item acquire a lock on that item, while
other transactions attempting to access the same item must wait until the lock is
released.
1.1 Shared Lock (S-lock): Also known as a read lock, a shared lock
allows multiple transactions to read a data item simultaneously.
However, it prevents any transaction from writing to the data
item until all shared locks are released.

1.2 Exclusive Lock (X-lock): Also known as a write lock, an exclusive lock grants exclusive access to a data item, preventing other transactions from reading or writing it until the lock is released.
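A tiny lock-manager sketch for a single data item shows the compatibility rules described above (S-locks share with other S-locks, X-locks share with nothing). It is a simplification: there is no waiting queue, no lock upgrade, and no deadlock handling, and the class and method names are hypothetical.

class ItemLock:
    def __init__(self):
        self.readers = set()    # transactions holding S-locks
        self.writer = None      # transaction holding the X-lock, if any

    def acquire_S(self, txn):
        if self.writer is None:                 # S is compatible with other S-locks
            self.readers.add(txn)
            return True
        return False                            # blocked by an X-lock

    def acquire_X(self, txn):
        if self.writer is None and not self.readers:   # X is compatible with nothing
            self.writer = txn
            return True
        return False

lock = ItemLock()
assert lock.acquire_S("T1") and lock.acquire_S("T2")    # concurrent readers allowed
assert not lock.acquire_X("T3")                         # writer must wait for the readers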
1.3 Two-Phase Locking (2PL):
Growing phase: In this phase, the transaction acquires the locks it needs before performing any modification on the data items; no lock is released during this phase.
Shrinking phase: In this phase, the transaction releases its acquired locks once it has performed its modifications on the data items. Once the transaction starts releasing locks, it cannot acquire any further locks.
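The two phases can be sketched as a simple rule attached to each transaction: once the first lock is released, any further lock request is rejected. This is only the 2PL rule itself (a hypothetical TwoPLTransaction class), not a full lock manager.

# Sketch of the 2PL rule: after the first unlock, no new lock may be acquired.
class TwoPLTransaction:
    def __init__(self):
        self.locks = set()
        self.shrinking = False        # becomes True at the first unlock

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested in shrinking phase")
        self.locks.add(item)

    def unlock(self, item):
        self.shrinking = True
        self.locks.discard(item)

t = TwoPLTransaction()
t.lock("A"); t.lock("B")    # growing phase
t.unlock("A")               # shrinking phase begins
try:
    t.lock("C")             # not allowed any more
except RuntimeError as e:
    print(e)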
1.4 Strict Two-Phase Locking (Strict 2PL):
In Strict 2PL, a transaction follows the two-phase locking rules and, in addition, holds all of its exclusive (write) locks until it commits or aborts. This protocol ensures serializability but can lead to lock contention and potential deadlocks.
2. Timestamp Ordering Protocol (TO):
TO uses timestamps to order transactions and resolves conflicting operations based on this ordering: transactions with earlier timestamps take priority. This protocol helps prevent deadlocks, but it may roll back and restart transactions whose operations arrive out of timestamp order.
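As a sketch of the usual textbook timestamp-ordering checks for a single data item (the slide does not spell out the algorithm, so the read_TS/write_TS tests below are an assumption): a read is rejected if a younger transaction has already written the item, and a write is rejected if a younger transaction has already read or written it.

read_TS, write_TS = 0, 0         # largest timestamps that have read / written the item

def to_read(ts):
    global read_TS
    if ts < write_TS:
        return "abort"           # a younger transaction already wrote the item
    read_TS = max(read_TS, ts)
    return "ok"

def to_write(ts):
    global write_TS
    if ts < read_TS or ts < write_TS:
        return "abort"           # a younger transaction already read or wrote the item
    write_TS = ts
    return "ok"

print(to_write(5))   # ok: T5 writes the item
print(to_read(3))    # abort: T3 is older than the last writer (T5)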
Advantages of Concurrency

• Improved Performance: Concurrency allows multiple transactions to execute


simultaneously, thereby increasing the overall throughput.
• Enhanced Scalability: By supporting concurrent execution of transactions, a DBMS
can scale to handle larger workloads without significantly impacting performance.
• Optimized Resource Utilization: Concurrency enables efficient utilization of system
resources such as CPU, memory, and I/O devices.
• Support for Real-time Applications: Concurrency is essential for real-time and
interactive applications where timely data processing and response are critical.
• Fault Tolerance and Availability: In distributed database systems, concurrency
management plays a crucial role in ensuring fault tolerance and high availability.
Disadvantages
• Overhead: Implementing concurrency control requires additional overhead, such
as acquiring and releasing locks on database objects.
• Complexity: Implementing concurrency control can be complex, particularly in
distributed systems or in systems with complex transactional logic.
Recovery System

• A recovery system in database management refers to the set of processes and mechanisms designed to restore a database to a consistent state after a failure or error occurs.
• It covers the techniques used to recover data lost due to system crashes, transaction errors, viruses, incorrect command execution, etc.
• This is crucial for ensuring data integrity and reliability in database systems.

The main recovery techniques used in a DBMS are:

a. Rollback/Undo recovery technique
b. Commit/Redo recovery technique
c. Checkpoint recovery
Flow of DBMS Recovery

• Main Memory Buffer: When transactions are executed, their changes are
temporarily stored in a main memory buffer. This buffer acts as a staging area
before the changes are written to disk.

• Disk Storage (Logs): The changes from the main memory buffer are then
written to disk in the form of logs. These logs record the sequence of
operations performed by transactions, including updates, inserts, and deletes.

• Database: The actual database resides on disk. It contains the stored data as
well as metadata about the database structure.
a. Rollback/Undo Recovery Technique
• The rollback/undo recovery technique is based on the principle of backing out or
undoing the effects of a transaction that has not been completed successfully due to a
system failure or error.
• This technique is accomplished by undoing the changes made by the transaction using
the log records stored in the transaction log.
• The transaction log contains a record of all the transactions that have been
performed on the database.
• The system uses the log records to undo the changes made by the failed transaction
and restore the database to its previous state.
b.Commit/Redo Recovery Technique
• The commit/redo recovery technique is based on the principle of reapplying the
changes made by a transaction that has been completed successfully to the
database.
• This technique is accomplished by using the log records stored in the transaction log
to redo the changes made by the transaction that was in progress at the time of the
failure or error.
• The system uses the log records to reapply the changes made by the transaction and
restore the database to its most recent consistent state.
c. Checkpoint Recovery

• Checkpoint recovery is a technique used to reduce the recovery time by periodically saving the state of the database in a checkpoint file.
• In the event of a failure, the system can use the checkpoint file to restore the database to the most recent consistent state before the failure occurred, rather than going through the entire log to recover the database.
The log information looks as follows (the log is kept on disk):
• start_transaction(T): This log entry records that transaction T starts the
execution.
• read_item(T, X): This log entry records that transaction T reads the value of
database item X.
• write_item(T, X, old_value, new_value): This log entry records that transaction T changes the value of database item X from old_value to new_value. The old value is sometimes known as the before-image of X, and the new value as the after-image of X.
Ctd…
• commit(T): This log entry records that transaction T has completed all accesses
to the database successfully and its effect can be committed (recorded
permanently) to the database.
• abort(T): This records that transaction T has been aborted.
• checkpoint: A checkpoint is a mechanism where all the previous logs are
removed from the system and stored permanently in a storage disk. Checkpoint
declares a point before which the DBMS was in a consistent state, and all the
transactions were committed.
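The before- and after-images in the write_item entries are what drive UNDO and REDO. A minimal sketch, assuming the log is just an in-memory list of tuples and ignoring checkpoints and buffering, could look like this:

# Hypothetical in-memory log; each write_item entry carries a before- and after-image.
db  = {"X": 300}
log = [
    ("start_transaction", "T1"),
    ("write_item", "T1", "X", 300, 350),   # (T, item, old_value, new_value)
    # no commit(T1) record: the crash happened before T1 committed
]

def recover(db, log):
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in log:                                  # REDO changes of committed transactions
        if rec[0] == "write_item" and rec[1] in committed:
            db[rec[2]] = rec[4]
    for rec in reversed(log):                        # UNDO changes of uncommitted transactions
        if rec[0] == "write_item" and rec[1] not in committed:
            db[rec[2]] = rec[3]                      # restore the before-image

db["X"] = 350          # the uncommitted update had already reached the database
recover(db, log)
assert db["X"] == 300  # T1 never committed, so its effect is undone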
Recovery Techniques

a. Undoing
b. Deferred Update
c. Immediate Update
d. Caching/Buffering
e. Shadow Paging
f. Backward Recovery
g. Forward Recovery
a. Undoing

• If a transaction crashes, the recovery manager may undo the transaction, i.e., reverse its operations.
• This involves examining the log for each entry of the transaction of the form write_item(T, X, old_value, new_value) and setting the value of item X in the database back to old_value.
b. Deferred Update

• This technique does not physically update the database on disk until a transaction
has reached its commit point.
• Before reaching commit, all transaction updates are recorded in the local
transaction workspace.
• If a transaction fails before reaching its commit point, it will not have changed
the database in any way so UNDO is not needed.
• It may be necessary to REDO the effect of the operations that are recorded in the
local transaction workspace, because their effect may not yet have been written in
the database.
• Hence, a deferred update is also known as the No-undo algorithm.
c.Immediate Update

• In the immediate update, the database may be updated by some operations


of a transaction before the transaction reaches its commit point.
• However, these operations are recorded in a log on disk before they are
applied to the database, making recovery still possible.
d. Caching/Buffering:

• One or more disk pages that include data items to be updated are
cached into main memory buffers and then updated in memory before
being written back to disk.
• A collection of in-memory buffers called the DBMS cache is kept
under the control of DBMS for holding these buffers.
• A directory is used to keep track of which database items are in the
buffer.
e. Shadow Paging
• Imagine you have a book that you're constantly updating. Instead of erasing and
rewriting pages every time there's a change, you create a copy of the entire book with
the changes. This copy is your "shadow." When everything is updated and ready, you
swap the old book with the new one.

• In databases, this translates to keeping a shadow copy of the database. When changes
are made, they go to this shadow copy first.
• If something goes wrong during an update, you can revert to the last consistent state
by simply discarding the changes in the shadow copy and sticking with the original
database.
• This helps ensure data integrity during recovery processes.
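The book analogy can be sketched directly: updates go to a shadow copy, and only the final swap makes them current, so a failure before the swap leaves the original untouched. This is a toy dictionary-level sketch; page granularity, the page directory, and the on-disk layout are omitted.

# Shadow-copy sketch: updates go to a copy; the final swap is the "commit".
current = {"page1": "old text", "page2": "old text"}

def update_with_shadow(db, changes, fail=False):
    shadow = dict(db)            # copy the current state
    shadow.update(changes)       # all updates go to the shadow copy
    if fail:
        return db                # failure: discard the shadow, keep the original
    return shadow                # success: the shadow becomes the current copy

current = update_with_shadow(current, {"page1": "new text"}, fail=True)
assert current["page1"] == "old text"     # the original was never touched
current = update_with_shadow(current, {"page1": "new text"})
assert current["page1"] == "new text"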
f.Backward Recovery: It involves restoring a database to a previously
consistent state before a failure occurred, typically using techniques like rolling
back transactions or restoring from backups. This helps recover from failures by
undoing changes that led to the inconsistency.

g. Forward Recovery: "Roll forward" and "REDO" refer to forward recovery. This technique is helpful when a database needs to be brought up to date with all verified changes: the logged modifications of committed transactions are re-applied to the database to roll those changes forward.
State Serializability
• Serializability is a property of a system that describes how different processes operate on shared data.
• If a system runs a set of tasks on shared data, and the end result looks like what you would get if the tasks had executed one after the other, then the system can be called serializable.
• It's like following a recipe step by step: the final dish turns out exactly as described.
• Here, serial behaviour means there is no conflicting overlap in the execution of operations on the data.
• In a DBMS, while data is being written or read, the DBMS can stop all other processes from accessing that data.
Example

Transaction-1      Transaction-2
R(a)
W(a)
                   R(b)
                   W(b)
R(b)
                   R(a)
W(b)
                   W(a)

In this example, Transaction-2 begins its execution before Transaction-1 is finished, and both work on the same data items, "a" and "b", interchangeably. Here "R" = read and "W" = write.
Serializability testing
• A schedule's serializability can be examined by building a serialization graph (precedence graph) for it.
• The precedence graph is a directed graph G(V, E) with vertices V = {T1, T2, T3, ..., Tn}, one per transaction, and directed edges E = {E1, E2, E3, ...}. An edge Ti -> Tj is added when transaction Ti performs a read or write that conflicts with, and comes before, an operation of Tj on the same data item, meaning Ti must precede Tj in any equivalent serial order.
• The schedule is conflict-serializable if and only if this graph contains no cycle.
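Following one plausible reading of the example schedule above (the assignment of operations to the two columns is reconstructed), a small sketch can build the precedence graph from the conflict rule and test it for a cycle:

# Build the precedence graph for the schedule above and test for a cycle.
schedule = [                      # (transaction, operation, item), in execution order
    ("T1", "R", "a"), ("T1", "W", "a"),
    ("T2", "R", "b"), ("T2", "W", "b"),
    ("T1", "R", "b"),
    ("T2", "R", "a"),
    ("T1", "W", "b"),
    ("T2", "W", "a"),
]

edges = set()
for i, (ti, op_i, x) in enumerate(schedule):
    for tj, op_j, y in schedule[i + 1:]:
        if ti != tj and x == y and "W" in (op_i, op_j):   # conflicting operations
            edges.add((ti, tj))

def has_cycle(edges):
    # with only two transactions, a cycle is simply an edge in both directions
    return any((b, a) in edges for (a, b) in edges)

print(edges)             # contains both ('T1', 'T2') and ('T2', 'T1')
print(has_cycle(edges))  # True: this schedule is not conflict-serializable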
Types of Serializability
