
Dbms Notes Ramamoorthy

Uploaded by

rama moorthy
Copyright © All Rights Reserved

UNIT V

QUERY PROCESSING AND TRANSACTION MANAGEMENT

Query Processing in DBMS

Query processing is the set of activities performed in extracting data from the database. It involves several steps for fetching the data from the database. The steps involved are:

1. Parsing and translation
2. Optimization
3. Evaluation

The query processing works in the following way:

Parsing and Translation

 Query processing includes certain activities for data retrieval.
 Initially, the user's query, written in a high-level database language such as SQL, gets translated into expressions that can be further used at the physical level of the file system.
 After this, a variety of query-optimizing transformations and the actual evaluation of the query take place. Thus, before processing a query, a computer system needs to translate it into an internal form that the system can work with.
 The translation step in query processing works like a parser. When a user executes a query, the parser generates the internal form of the query: it checks the syntax of the query and verifies the names of the relations and the attributes used in it.
 The parser creates a tree of the query, known as a 'parse tree', and then translates it into relational algebra. In doing so, it replaces every use of a view in the query with the view's definition.
 Suppose a user executes a query. As we have learned, there are various methods of extracting data from the database. In SQL, suppose a user wants to fetch the names of the employees whose salary is greater than 10000. For doing this, the following query is used:

select emp_name from Employee where salary>10000;

Thus, to make the system understand the user query, it needs to be translated into relational algebra. This query can be written in relational algebra in either of the following equivalent forms:

o πemp_name (σsalary>10000 (Employee))
o πemp_name (σsalary>10000 (πemp_name, salary (Employee)))

After translating the given query, we can execute each relational algebra operation by using different algorithms. In this way, query processing begins its work.
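The claim that one SQL query has several equivalent relational algebra forms can be checked on toy data. The sketch below is a minimal illustration, assuming a hypothetical in-memory Employee relation stored as a list of dicts; `select` and `project` are illustrative helpers, not part of any DBMS API. It evaluates πemp_name (σsalary>10000 (Employee)) and πemp_name (σsalary>10000 (πemp_name, salary (Employee))) and confirms that both give the same result.

```python
# Hypothetical sample relation: each tuple is a dict keyed by attribute name.
Employee = [
    {"emp_name": "Asha", "salary": 12000},
    {"emp_name": "Ravi", "salary": 9000},
    {"emp_name": "Meena", "salary": 15000},
]

def select(predicate, relation):
    """σ: keep only the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def project(attributes, relation):
    """π: keep only the named attributes (set semantics: duplicates removed)."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attributes}
        if row not in result:
            result.append(row)
    return result

# π emp_name (σ salary>10000 (Employee))
plan_a = project(["emp_name"], select(lambda t: t["salary"] > 10000, Employee))

# π emp_name (σ salary>10000 (π emp_name, salary (Employee)))
plan_b = project(["emp_name"],
                 select(lambda t: t["salary"] > 10000,
                        project(["emp_name", "salary"], Employee)))

print(plan_a)            # [{'emp_name': 'Asha'}, {'emp_name': 'Meena'}]
print(plan_a == plan_b)  # True
```

The inner projection in the second form must keep `salary`, otherwise the selection above it would have nothing to test.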

Evaluation

In addition to the relational algebra translation, the translated relational algebra expression must be annotated with instructions that specify how each operation is to be evaluated. Thus, after translating the user query, the system executes a query evaluation plan.
Query Evaluation Plan
o In order to fully evaluate a query, the system needs to
construct a query evaluation plan.
o The annotations in the evaluation plan may specify the algorithm to be used for a particular operation, or the particular index to be used for it.
o Such relational algebra with annotations is referred to
as Evaluation Primitives. The evaluation primitives carry
the instructions needed for the evaluation of the operation.
o Thus, a query evaluation plan defines a sequence of
primitive operations used for evaluating a query. The query
evaluation plan is also referred to as the query execution
plan.
o A query execution engine is responsible for generating the output of the given query. It takes the query execution plan, executes it, and finally produces the output for the user query.
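A query evaluation plan as a sequence of primitives can be pictured as a chain of small operators, each consuming the tuples produced by the one below it. The sketch below is a toy model under stated assumptions: the operator names (`table_scan`, `filter_rows`, `project_rows`, `execute`) are hypothetical, and Python generators stand in for the pipelined evaluation a real execution engine performs.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]

def table_scan(relation: List[Row]) -> Iterator[Row]:
    """Primitive: read every tuple of a relation (a linear scan)."""
    for row in relation:
        yield row

def filter_rows(pred: Callable[[Row], bool], rows: Iterable[Row]) -> Iterator[Row]:
    """Primitive: σ, applied to the tuples produced by its input."""
    for row in rows:
        if pred(row):
            yield row

def project_rows(attrs: List[str], rows: Iterable[Row]) -> Iterator[Row]:
    """Primitive: π without duplicate removal (like SQL without DISTINCT)."""
    for row in rows:
        yield {a: row[a] for a in attrs}

def execute(plan: Iterator[Row]) -> List[Row]:
    """Toy 'execution engine': pull every tuple from the root of the plan."""
    return list(plan)

employee = [{"emp_name": "Asha", "salary": 12000},
            {"emp_name": "Ravi", "salary": 9000}]

# Plan: linear scan of Employee -> select salary > 10000 -> project emp_name
plan = project_rows(["emp_name"],
                    filter_rows(lambda r: r["salary"] > 10000,
                                table_scan(employee)))
print(execute(plan))  # [{'emp_name': 'Asha'}]
```

Because each operator is a generator, no intermediate result is materialized; a tuple flows up the chain as soon as the scan produces it.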

Optimization

o The cost of query evaluation can vary for different types of queries. Because the system is responsible for constructing the evaluation plan, the user does not need to write the query efficiently.
o Usually, a database system generates an efficient query evaluation plan, one that minimizes its cost. This task is performed by the database system and is known as query optimization.
o For optimizing a query, the query optimizer needs an estimated cost of each operation, because the overall cost depends on the memory allocated to the various operations, their execution costs, and so on.
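To make "estimated cost of each operation" concrete, the sketch below compares two ways of evaluating σsalary>10000 (Employee) under a deliberately simplified cost model measured in block reads. The model and all statistics are assumptions for illustration: a linear scan reads every block, while a hypothetical secondary B+-tree index on salary costs its height plus, in the worst case, one block read per matching tuple. Real optimizers use richer models.

```python
from dataclasses import dataclass

@dataclass
class Stats:
    blocks: int          # number of disk blocks holding the relation
    records: int         # number of tuples in the relation
    index_height: int    # height of a hypothetical B+-tree index on salary
    selectivity: float   # estimated fraction of tuples with salary > 10000

def cost_linear_scan(s: Stats) -> float:
    # Read every block of the relation once.
    return s.blocks

def cost_secondary_index(s: Stats) -> float:
    # Traverse the index, then (worst case) one block read per matching tuple.
    return s.index_height + s.selectivity * s.records

def choose_plan(s: Stats) -> str:
    """Return the name of the plan with the lowest estimated cost."""
    costs = {
        "linear scan": cost_linear_scan(s),
        "secondary index scan": cost_secondary_index(s),
    }
    return min(costs, key=costs.get)

# Few matching tuples: the index wins (3 + 100 = 103 vs 500 block reads).
print(choose_plan(Stats(blocks=500, records=10_000, index_height=3, selectivity=0.01)))
# Half the tuples match: the scan wins (3 + 5000 = 5003 vs 500 block reads).
print(choose_plan(Stats(blocks=500, records=10_000, index_height=3, selectivity=0.5)))
```

The same query gets a different best plan depending on the statistics, which is why the optimizer, not the user, is in a better position to choose.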

Transaction

o A transaction is a set of logically related operations. It contains a group of tasks.
o A transaction is an action or series of actions performed by a single user to access the contents of the database.

Example: Suppose a bank employee transfers Rs 800 from X's account to Y's account. This small transaction contains several low-level tasks:

X's Account

Open_Account(X)
Old_Balance = X.balance
New_Balance = Old_Balance - 800
X.balance = New_Balance
Close_Account(X)

Y's Account

Open_Account(Y)
Old_Balance = Y.balance
New_Balance = Old_Balance + 800
Y.balance = New_Balance
Close_Account(Y)
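The low-level steps above can be grouped into a single database transaction. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `account` table and made-up starting balances (X = 5000, Y = 2000). Using the connection as a context manager commits when the block succeeds and rolls back if it raises, so the debit and the credit are applied together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("X", 5000), ("Y", 2000)])
conn.commit()

def transfer(conn, src, dst, amount):
    # 'with conn' commits on success and rolls back on an exception,
    # so both updates form one transaction.
    with conn:
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))

transfer(conn, "X", "Y", 800)
print(dict(conn.execute("SELECT id, balance FROM account")))  # {'X': 4200, 'Y': 2800}
```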

Read(X): The read operation reads the value of X from the database and stores it in a buffer in main memory.

Write(X): The write operation writes the value back to the database from the buffer.

Let's take an example of a debit transaction on an account, which consists of the following operations:

R(X);
X = X - 500;
W(X);

Let's assume the value of X before the start of the transaction is 4000.

o The first operation reads X's value from the database and stores it in a buffer.
o The second operation decreases the value of X by 500. So the buffer will contain 3500.
o The third operation writes the buffer's value to the database. So X's final value will be 3500.
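The three steps can be traced with a toy model in which one dict stands for the database on disk and another for the main-memory buffer. The names `database`, `buffer`, `read` and `write` are illustrative only; the point is that the change becomes visible in the database only at W(X).

```python
database = {"X": 4000}   # value stored on disk
buffer = {}              # main-memory buffer

def read(item):
    """R(item): copy the value from the database into the buffer."""
    buffer[item] = database[item]

def write(item):
    """W(item): copy the buffered value back to the database."""
    database[item] = buffer[item]

read("X")                          # buffer holds 4000
buffer["X"] = buffer["X"] - 500    # buffer holds 3500, database still 4000
write("X")                         # database now holds 3500
print(database["X"])               # 3500
```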
States of Transactions

A transaction in a database can be in one of the following states:

 Active − In this state, the transaction is being executed. This is the initial state of every transaction.
 Partially Committed − When a transaction executes its
final operation, it is said to be in a partially committed
state.
 Failed − A transaction is said to be in a failed state if any of the checks made by the database recovery system fail. A failed transaction can no longer proceed further.
 Aborted − If any of the checks fails and the transaction has
reached a failed state, then the recovery manager rolls back
all its write operations on the database to bring the
database back to its original state where it was prior to the
execution of the transaction. Transactions in this state are
called aborted. The database recovery module can select
one of the two operations after a transaction aborts −
o Re-start the transaction
o Kill the transaction
 Committed − If a transaction executes all its operations
successfully, it is said to be committed. All its effects are
now permanently established on the database system.
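The five states and the moves between them can be written as a small transition table. This is a sketch, assuming the transitions of the usual transaction state diagram (including partially committed to failed, when a check fails after the final operation); the `run` helper is hypothetical and only rejects illegal moves.

```python
# Allowed transitions between the transaction states described above.
TRANSITIONS = {
    "active": {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "failed": {"aborted"},
    "aborted": set(),     # afterwards the transaction is restarted (as new) or killed
    "committed": set(),   # terminal: effects are permanent
}

def run(path):
    """Follow a sequence of states starting from 'active'; return the final state."""
    state = "active"
    for nxt in path:
        if nxt not in TRANSITIONS[state]:
            raise ValueError(f"illegal transition {state} -> {nxt}")
        state = nxt
    return state

print(run(["partially committed", "committed"]))  # committed
print(run(["failed", "aborted"]))                 # aborted
```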

ACID Properties

 A transaction is a very small unit of a program, and it may contain several low-level tasks. A transaction in a database system must maintain Atomicity, Consistency, Isolation, and Durability − commonly known as the ACID properties − in order to ensure accuracy, completeness, and data integrity.
 Atomicity − This property states that a transaction must be
treated as an atomic unit, that is, either all of its operations
are executed or none. There must be no state in a database
where a transaction is left partially completed. States
should be defined either before the execution of the
transaction or after the execution/abortion/failure of the
transaction.
 Consistency − The database must remain in a consistent
state after any transaction. No transaction should have any
adverse effect on the data residing in the database. If the
database was in a consistent state before the execution of a
transaction, it must remain consistent after the execution of
the transaction as well.
 Durability − The database should be durable enough to
hold all its latest updates even if the system fails or
restarts. If a transaction updates a chunk of data in a
database and commits, then the database will hold the
modified data. If a transaction commits but the system fails before the data could be written onto the disk, then that data will be updated once the system springs back into action.
 Isolation − In a database system where more than one transaction is being executed simultaneously and in parallel, the property of isolation states that each transaction will be carried out and executed as if it were the only transaction in the system. No transaction should affect the execution of any other transaction.
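Atomicity and consistency can be observed together in a small experiment. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `account` table whose `CHECK (balance >= 0)` constraint stands in for a consistency rule. The transaction credits B first and then tries to debit A below zero; the constraint fails, and the whole transaction, including the already-executed credit, is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 300), ("B", 100)])
conn.commit()

try:
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("UPDATE account SET balance = balance + 500 WHERE id = 'B'")
        conn.execute("UPDATE account SET balance = balance - 500 WHERE id = 'A'")  # violates CHECK
except sqlite3.IntegrityError:
    pass  # the failed transaction left no partial effects behind

print(dict(conn.execute("SELECT id, balance FROM account")))  # {'A': 300, 'B': 100}
```

The credit to B was executed, yet it does not survive: either all operations of the transaction take effect or none do.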
DBMS Concurrency Control

Concurrency control is the management procedure required for controlling the concurrent execution of operations that take place on a database.

But before learning about concurrency control, we should know about concurrent execution.

Concurrent Execution in DBMS

o In a multi-user system, multiple users can access and use the same database at the same time; this is known as concurrent execution of the database. It means that the same database is accessed simultaneously by different users on a multi-user system.
o While working on database transactions, multiple users often need to use the database to perform different operations, and in that case, concurrent execution of the database is performed.
o Such simultaneous execution should be performed in an interleaved manner in which no operation affects the other executing operations, thus maintaining the consistency of the database. However, concurrent execution of transaction operations raises several challenging problems that need to be solved.

Problems with Concurrent Execution


 In a database transaction, the two main operations are READ and WRITE. These two operations need to be managed during the concurrent execution of transactions, because if their interleaving is not controlled, the data may become inconsistent.
 So, the following problems occur with the concurrent execution of operations:

Lost Update Problem (W-W Conflict)

 The problem occurs when two different database transactions perform read/write operations on the same database items in an interleaved manner (i.e., concurrent execution), which makes the values of the items incorrect and hence makes the database inconsistent.

For example:

Consider the following schedule, in which two transactions TX and TY are performed on the same account A, where the balance of account A is $300.
o At time t1, transaction TX reads the value of account A, i.e., $300 (only read).
o At time t2, transaction TX deducts $50 from account A, which becomes $250 (only deducted, not yet written).
o Meanwhile, at time t3, transaction TY reads the value of account A, which is still $300 because TX has not written its update yet.
o At time t4, transaction TY adds $100 to account A, which becomes $400 (only added, not yet written).
o At time t6, transaction TX writes the value of account A, which is updated to $250, as TY has not written its value yet.
o Similarly, at time t7, transaction TY writes the value of account A computed at time t4, i.e., $400. This means the value written by TX is lost, i.e., $250 is lost.

Hence the data becomes incorrect, and the database becomes inconsistent.
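The interleaving above can be replayed step by step in a toy, single-threaded simulation, with a dict standing in for the database and local variables standing in for each transaction's buffer. It is only a model of the schedule, not of a real DBMS.

```python
db = {"A": 300}

tx_local = db["A"]    # t1: TX reads A (300)
tx_local -= 50        # t2: TX computes 250, not yet written
ty_local = db["A"]    # t3: TY reads A, still 300
ty_local += 100       # t4: TY computes 400, not yet written
db["A"] = tx_local    # t6: TX writes 250
db["A"] = ty_local    # t7: TY writes 400, overwriting TX's update

print(db["A"])        # 400, whereas running TX then TY serially gives 350
```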

Dirty Read Problem (W-R Conflict)

 The dirty read problem occurs when one transaction updates an item of the database and then fails, and before the update is rolled back, the updated item is accessed by another transaction. This creates a write-read conflict between the two transactions.

For example:

Consider two transactions TX and TY performing read/write operations on account A, where the available balance in account A is $300:
o At time t1, transaction TX reads the value of account A, i.e., $300.
o At time t2, transaction TX adds $50 to account A, which becomes $350.
o At time t3, transaction TX writes the updated value to account A, i.e., $350.
o Then at time t4, transaction TY reads account A, which is read as $350.
o Then at time t5, transaction TX rolls back due to a server problem, and the value changes back to $300 (as initially).
o But transaction TY has already read $350, a value that was never committed. This is a dirty read, hence the name Dirty Read Problem.
Unrepeatable Read Problem (R-W Conflict)

 Also known as the Inconsistent Retrievals Problem, it occurs when two different values are read for the same database item within one transaction.

For example:

Consider two transactions, TX and TY, performing read/write operations on account A, which has an available balance of $300:

o At time t1, transaction TX reads the value from account A, i.e., $300.
o At time t2, transaction TY reads the value from account A,
i.e., $300.
o At time t3, transaction TY updates the value of account A
by adding $100 to the available balance, and then it
becomes $400.
o At time t4, transaction TY writes the updated value, i.e.,
$400.
o After that, at time t5, transaction TX reads the available
value of account A, and that will be read as $400.
o This means that within the same transaction TX, two different values of account A are read: $300 initially and, after the update made by transaction TY, $400. This is an unrepeatable read, hence the name Unrepeatable Read Problem.
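A toy replay of this schedule makes the two different reads inside TX explicit. As before, a dict stands in for the database; this models the schedule only.

```python
db = {"A": 300}

first = db["A"]           # t1: TX reads 300
ty_local = db["A"]        # t2: TY reads 300
db["A"] = ty_local + 100  # t3-t4: TY adds 100 and writes 400
second = db["A"]          # t5: TX reads A again

print(first, second)      # 300 400: two different values within TX
```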

Thus, in order to maintain consistency in the database and avoid such problems in concurrent execution, management is needed, and that is where the concept of concurrency control comes into play.

Concurrency Control

Concurrency control is the working concept required for controlling and managing the concurrent execution of database operations, thus avoiding inconsistencies in the database. To maintain this control, we have concurrency control protocols.
Concurrency Control Protocols

The concurrency control protocols ensure the atomicity, consistency, isolation, durability and serializability of the concurrent execution of database transactions. These protocols are categorized as:

o Lock Based Concurrency Control Protocol
o Time Stamp Concurrency Control Protocol
o Validation Based Concurrency Control Protocol

Lock-Based Protocol

In this type of protocol, a transaction cannot read or write a data item until it acquires an appropriate lock on it. There are two types of lock:

1. Shared lock:

o It is also known as a read-only lock. Under a shared lock, the data item can only be read by the transaction.
o It can be shared among transactions because a transaction holding a shared lock cannot update the data item.
2. Exclusive lock:

o Under an exclusive lock, the data item can be both read and written by the transaction.
o This lock is exclusive: multiple transactions cannot modify the same data simultaneously.
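The rules for shared (S) and exclusive (X) locks boil down to a compatibility table: S is compatible only with S, and X is compatible with nothing. The sketch below is a hypothetical, simplified lock table with no wait queues; a request is either granted or refused, and a repeated request by the same transaction simply replaces its own mode.

```python
from collections import defaultdict

# Can a requested mode be granted while another transaction holds the given mode?
COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}

class LockTable:
    """Toy lock table: no queues, each request is granted or refused at once."""

    def __init__(self):
        self.held = defaultdict(dict)  # data item -> {transaction: mode}

    def request(self, txn, item, mode):
        others = [m for t, m in self.held[item].items() if t != txn]
        if all(COMPATIBLE[(mode, m)] for m in others):
            self.held[item][txn] = mode
            return True
        return False

    def release(self, txn, item):
        self.held[item].pop(txn, None)

lt = LockTable()
print(lt.request("T1", "A", "S"))  # True: no other locks on A
print(lt.request("T2", "A", "S"))  # True: shared locks are compatible
print(lt.request("T3", "A", "X"))  # False: X conflicts with the S locks
```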

There are four types of lock protocols available:

1. Simplistic lock protocol

It is the simplest way of locking data during a transaction. Simplistic lock-based protocols require every transaction to obtain a lock on the data before inserting, deleting, or updating it. The data item is unlocked after the transaction completes.

2. Pre-claiming Lock Protocol

o Pre-claiming lock protocols evaluate the transaction to list all the data items on which it needs locks.
o Before initiating execution of the transaction, it requests the DBMS for locks on all those data items.
o If all the locks are granted, then this protocol allows the transaction to begin. When the transaction is completed, it releases all the locks.
o If not all the locks are granted, then the transaction rolls back and waits until all the locks are granted.
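The essence of pre-claiming is "all locks or none". The sketch below assumes only exclusive locks, kept in a hypothetical dict from data item to owning transaction; `preclaim` either locks every requested item or changes nothing and returns False, so that the caller can roll back, wait and retry.

```python
def preclaim(locks, txn, items):
    """Lock every item in `items` for txn, or lock nothing and return False."""
    if any(locks.get(item, txn) != txn for item in items):
        return False               # some item is held by another transaction
    for item in items:
        locks[item] = txn
    return True

def release_all(locks, txn):
    """Release every lock held by txn once it completes."""
    for item in [i for i, owner in locks.items() if owner == txn]:
        del locks[item]

locks = {}
print(preclaim(locks, "T1", ["A", "B"]))  # True: T1 may begin
print(preclaim(locks, "T2", ["B", "C"]))  # False: B is held, and C is not locked either
release_all(locks, "T1")
print(preclaim(locks, "T2", ["B", "C"]))  # True
```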
3. Two-phase locking (2PL)
o The two-phase locking protocol divides the execution of the transaction into three parts.
o In the first part, when the execution of the transaction starts, it seeks permission for the locks it requires.
o In the second part, the transaction acquires all the locks. The third part starts as soon as the transaction releases its first lock.
o In the third part, the transaction cannot demand any new locks. It only releases the acquired locks.

There are two phases of 2PL:

Growing phase: In the growing phase, new locks on data items may be acquired by the transaction, but none can be released.

Shrinking phase: In the shrinking phase, existing locks held by the transaction may be released, but no new locks can be acquired.

If lock conversion is allowed, then the following can happen:

1. Upgrading of a lock (from S(a) to X(a)) is allowed in the growing phase.
2. Downgrading of a lock (from X(a) to S(a)) must be done in the shrinking phase.

Example:
The following shows how locking and unlocking work with 2PL.

Transaction T1:
o Growing phase: from step 1-3
o Shrinking phase: from step 5-7
o Lock point: at 3

Transaction T2:
o Growing phase: from step 2-6
o Shrinking phase: from step 8-9
o Lock point: at 6
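Whether a single transaction follows 2PL can be checked mechanically: once it has released any lock, it must never acquire another. The sketch below is a hypothetical checker over a transaction's own lock/unlock sequence (ignoring the other transactions in the schedule). It reports the lock point as the 1-based position of the last lock acquired, which is where the growing phase ends.

```python
def check_2pl(ops):
    """Validate a lock/unlock sequence against two-phase locking.

    ops is a list of ("lock", item) / ("unlock", item) pairs. Returns the
    1-based position of the lock point; raises ValueError on a violation.
    """
    shrinking = False
    lock_point = None
    for step, (action, item) in enumerate(ops, start=1):
        if action == "lock":
            if shrinking:
                raise ValueError(f"step {step}: lock on {item} after an unlock")
            lock_point = step
        elif action == "unlock":
            shrinking = True       # the shrinking phase has begun
        else:
            raise ValueError(f"unknown action {action!r}")
    return lock_point

ops = [("lock", "A"), ("lock", "B"), ("lock", "C"),
       ("unlock", "A"), ("unlock", "B"), ("unlock", "C")]
print(check_2pl(ops))  # 3
```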

4. Strict Two-phase locking (Strict-2PL)

o The first phase of Strict-2PL is similar to 2PL. In the first phase, after acquiring all the locks, the transaction continues to execute normally.
o The only difference between 2PL and Strict-2PL is that Strict-2PL does not release a lock after using it.
o Strict-2PL waits until the whole transaction commits, and then it releases all the locks at once.
o Strict-2PL does not have a shrinking phase of gradual lock release.
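The difference from plain 2PL is where locks are released. In the sketch below (a hypothetical class, exclusive locks only, no waiting queue), a transaction acquires locks as it accesses items but has no per-item unlock at all: `commit` releases everything in one step.

```python
class StrictTwoPhaseTxn:
    """Toy Strict-2PL transaction: locks are released only together, at commit."""

    def __init__(self, name, locks):
        self.name = name
        self.locks = locks    # shared dict: data item -> owning transaction
        self.held = set()

    def access(self, item):
        owner = self.locks.get(item)
        if owner not in (None, self.name):
            return False      # another transaction holds it: would have to wait
        self.locks[item] = self.name
        self.held.add(item)
        return True

    def commit(self):
        for item in self.held:   # release every lock at once
            del self.locks[item]
        self.held.clear()

locks = {}
t1, t2 = StrictTwoPhaseTxn("T1", locks), StrictTwoPhaseTxn("T2", locks)
print(t1.access("A"))  # True
print(t2.access("A"))  # False: T1 keeps A until it commits
t1.commit()
print(t2.access("A"))  # True
```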

Deadlock in DBMS

 A deadlock is a condition where two or more transactions are waiting indefinitely for one another to give up locks. Deadlock is said to be one of the most feared complications in a DBMS, as no task ever gets finished and every involved transaction stays in a waiting state forever.

For example:

 Transaction T1 holds a lock on some rows in the Student table and needs to update some rows in the Grade table. Simultaneously, transaction T2 holds locks on some rows in the Grade table and needs to update the rows in the Student table held by transaction T1.
 Now the main problem arises: transaction T1 is waiting for T2 to release its lock, and similarly, transaction T2 is waiting for T1 to release its lock. All activities come to a halt and remain at a standstill until the DBMS detects the deadlock and aborts one of the transactions.

Deadlock Avoidance

o When a database is stuck in a deadlock state, it is better to avoid the deadlock in the first place rather than aborting or restarting transactions, which wastes time and resources.
o A deadlock avoidance mechanism is used to detect any deadlock situation in advance. A method like the "wait-for graph" is used for detecting deadlocks, but this method is suitable only for smaller databases. For larger databases, the deadlock prevention method can be used.

Deadlock Detection
o In a database, when a transaction waits indefinitely to obtain a lock, the DBMS should detect whether the transaction is involved in a deadlock or not. The lock manager maintains a wait-for graph to detect deadlock cycles in the database.

Wait-for Graph

o This is a suitable method for deadlock detection. In this method, a graph is created based on the transactions and their locks. If the created graph has a cycle or closed loop, then there is a deadlock.
o The wait-for graph is maintained by the system for every transaction that is waiting for some data held by others. The system keeps checking whether there is any cycle in the graph.

For the scenario above, the wait-for graph has an edge T1 → T2 (T1 waits for T2) and an edge T2 → T1 (T2 waits for T1). These two edges form a cycle, so there is a deadlock.
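Checking the wait-for graph for a cycle is ordinary cycle detection in a directed graph. The sketch below uses a depth-first search with three colours; the graph representation (a dict from each transaction to the set of transactions it waits for) is an assumption for illustration, not how any particular lock manager stores it.

```python
def has_deadlock(wait_for):
    """Return True if the wait-for graph contains a cycle.

    wait_for maps each transaction to the set of transactions it waits for;
    an edge Ti -> Tj means Ti waits for a lock held by Tj.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    colour = {}

    def visit(t):
        colour[t] = GREY
        for u in wait_for.get(t, ()):
            c = colour.get(u, WHITE)
            if c == GREY:
                return True        # back edge to the current path: cycle
            if c == WHITE and visit(u):
                return True
        colour[t] = BLACK
        return False

    return any(colour.get(t, WHITE) == WHITE and visit(t) for t in wait_for)

print(has_deadlock({"T1": {"T2"}, "T2": {"T1"}}))  # True: the Student/Grade scenario
print(has_deadlock({"T1": {"T2"}, "T2": set()}))   # False: T1 simply waits for T2
```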
Deadlock Prevention

o The deadlock prevention method is suitable for large databases. If resources are allocated in such a way that deadlock never occurs, then deadlock is prevented.
o The database management system analyzes the operations of a transaction to determine whether they can create a deadlock situation. If they can, the DBMS does not allow that transaction to be executed.

Wait-Die scheme

 In this scheme, if a transaction requests a resource that is already held with a conflicting lock by another transaction, the DBMS simply checks the timestamps of both transactions. It allows the older transaction to wait until the resource is available for execution.

Let's assume there are two transactions Ti and Tj, and let TS(T) be the timestamp of any transaction T. If Tj holds a lock on some data item and Ti requests that resource, then the following actions are performed by the DBMS:

1. Check if TS(Ti) < TS(Tj) - If Ti is the older transaction and Tj holds the resource, then Ti is allowed to wait until the data item is available for execution. That means if the older transaction is waiting for a resource locked by the younger transaction, the older transaction is allowed to wait until the resource is available.
2. Check if TS(Ti) > TS(Tj) - If Ti is the younger transaction and the older transaction Tj holds the resource, then Ti is killed ("dies") and restarted later with a random delay but with the same timestamp.

Wound wait scheme

o In the wound wait scheme, if the older transaction requests a resource held by the younger transaction, then the older transaction forces the younger one to abort ("wounds" it) and release the resource. After a short delay, the younger transaction is restarted with the same timestamp.
o If the older transaction holds a resource requested by the younger transaction, then the younger transaction is asked to wait until the older one releases it.
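Both prevention schemes reduce to one comparison of timestamps, where a smaller timestamp means an older transaction; they differ only in what happens on each side of the comparison. The two functions below are a minimal sketch of that decision (the returned strings are just labels), taking the requester's and the holder's timestamps.

```python
def wait_die(ts_requester: int, ts_holder: int) -> str:
    """Wait-die decision; a smaller timestamp means an older transaction."""
    if ts_requester < ts_holder:
        return "requester waits"    # older asks younger: allowed to wait
    return "requester dies"         # younger asks older: rolled back, restarted with same timestamp

def wound_wait(ts_requester: int, ts_holder: int) -> str:
    """Wound-wait decision; a smaller timestamp means an older transaction."""
    if ts_requester < ts_holder:
        return "holder is wounded"  # older asks younger: younger holder is aborted
    return "requester waits"        # younger asks older: waits for the release

# Hypothetical timestamps: T1 started at 5 (older), T2 at 10 (younger).
print(wait_die(5, 10), "|", wound_wait(5, 10))  # requester waits | holder is wounded
print(wait_die(10, 5), "|", wound_wait(10, 5))  # requester dies | requester waits
```

In both schemes only younger transactions are ever rolled back, and because a restarted transaction keeps its original timestamp, it eventually becomes the oldest and cannot be rolled back forever.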

Recovery System

o The entire DBMS is a very complex structure, with multiple transactions being performed and carried out every second.
o The robustness of a system depends not only on its complex and secure architecture but also on how data is managed and maintained in the worst cases. If the underlying architecture fails or crashes, there must be techniques and procedures by which data lost during a transaction can be recovered.
o Database recovery is the process of restoring the database to a correct state in the event of a failure, whether at the time of a transaction or after the end of a process. Earlier, you were introduced to database recovery as a service that every DBMS should provide to ensure that the database is dependable and remains in a consistent state in the presence of failures. In this context, dependability refers to both the resilience of the DBMS to various kinds of failure and its ability to recover from those failures.

Recovery Facilities

Every DBMS should offer the following facilities to help with the recovery mechanism:
Backup mechanism makes backup copies of the database at specific intervals.
Logging facilities keep track of the current state of transactions and any changes made to the database.
Checkpoint facility enables updates to the database that are in progress to be made permanent.
Recovery manager allows the database system to restore the database to a reliable and steady state after a failure occurs.

Failure Classification
To find where a problem has occurred, we generalize failures into the following categories:
1. Transaction failure
2. System crash
3. Disk failure
1. Transaction failure
A transaction failure occurs when a transaction fails to execute or reaches a point from which it can't go any further. If only a few transactions or processes are affected, this is called a transaction failure.
Reasons for a transaction failure could be:

1. Logical errors: If a transaction cannot complete due to some code error or an internal error condition, then a logical error occurs.
2. System errors: These occur when the DBMS itself terminates an active transaction because the database system is not able to execute it. For example, the system aborts an active transaction in case of deadlock or resource unavailability.

2. System Crash
o A system failure can occur due to a power failure or another hardware or software failure. Example: an operating system error.

Fail-stop assumption: In a system crash, non-volatile storage is assumed not to be corrupted.
3. Disk Failure
o Disk failure occurs when hard-disk drives or other storage drives fail. It was a common problem in the early days of technology evolution.
o Disk failure occurs due to the formation of bad sectors, a disk head crash, unreachability of the disk, or any other failure that destroys all or part of the disk storage.

Log-Based Recovery
 The log is a sequence of records. The log of each transaction is maintained in stable storage so that if any failure occurs, the database can be recovered from it.
 If any operation is performed on the database, it is recorded in the log.
 The log records must be stored before the corresponding change is actually applied to the database.
 Let's assume there is a transaction to modify the City of a student. The following logs are written for this transaction.
When the transaction is initiated, it writes a 'start' log record:
<Tn, Start>
When the transaction modifies the City from 'Noida' to 'Bangalore', another log record is written to the file:
<Tn, City, 'Noida', 'Bangalore'>

When the transaction is finished, it writes another log record to indicate the end of the transaction:

<Tn, Commit>
There are two approaches to modifying the database:

1. Deferred database modification:

 In the deferred modification technique, the transaction does not modify the database until it has committed.
 In this method, all the logs are created and stored in stable storage, and the database is updated when the transaction commits.
2. Immediate database modification:
 In the immediate modification technique, database modification occurs while the transaction is still active.
 In this technique, the database is modified immediately after every operation, with each log record written before the actual database modification.
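The two approaches differ only in when a logged change reaches the database. The hypothetical `Txn` class below writes a <T, item, old, new> record (as a tuple) before every change in both modes; with `deferred=True` the new values are held back until commit, while with `deferred=False` each one is applied immediately after its log record. The Python list stands in for the log on stable storage, which is a simplification.

```python
class Txn:
    """Toy transaction that logs (T, item, old, new) before each change."""

    def __init__(self, name, db, log, deferred):
        self.name, self.db, self.log, self.deferred = name, db, log, deferred
        self.pending = {}
        self.log.append((name, "start"))

    def write(self, item, new):
        old = self.pending.get(item, self.db[item])
        self.log.append((self.name, item, old, new))  # log record first
        if self.deferred:
            self.pending[item] = new                   # database untouched for now
        else:
            self.db[item] = new                        # immediate modification

    def commit(self):
        self.log.append((self.name, "commit"))
        self.db.update(self.pending)                   # empty in immediate mode
        self.pending.clear()

db, log = {"City": "Noida"}, []
t = Txn("Tn", db, log, deferred=True)
t.write("City", "Bangalore")
print(db["City"])  # Noida: the deferred change is not applied yet
t.commit()
print(db["City"])  # Bangalore
```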
Recovery using Log records

 When the system crashes, it consults the log to find which transactions need to be undone and which need to be redone.
 1. If the log contains the record <Ti, Start> and <Ti, Commit>, or just <Ti, Commit>, then transaction Ti needs to be redone.
 2. If the log contains the record <Ti, Start> but contains neither <Ti, Commit> nor <Ti, Abort>, then transaction Ti needs to be undone.
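The two rules translate directly into set operations over the log. In this sketch, log records are assumed to be tuples: boundary records are pairs such as ("T1", "start") or ("T1", "commit"), and longer tuples are update records, which the classification skips. The second transaction's values are made up for illustration.

```python
def classify(log):
    """Return (redo, undo) lists of transactions according to the two rules."""
    started, committed, aborted = set(), set(), set()
    for rec in log:
        if len(rec) != 2:
            continue                            # update record, not a boundary marker
        txn, kind = rec
        if kind == "start":
            started.add(txn)
        elif kind == "commit":
            committed.add(txn)
        elif kind == "abort":
            aborted.add(txn)
    redo = committed                            # rule 1: a commit record exists
    undo = started - committed - aborted        # rule 2: started but never finished
    return sorted(redo), sorted(undo)

log = [
    ("T1", "start"), ("T1", "City", "Noida", "Bangalore"), ("T1", "commit"),
    ("T2", "start"), ("T2", "City", "Pune", "Delhi"),
]
print(classify(log))  # (['T1'], ['T2'])
```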

Checkpoint

 The checkpoint is a mechanism where all the previous logs are removed from the system and permanently stored on the storage disk.

 The checkpoint is like a bookmark. During the execution of a transaction, such checkpoints are marked, and as the steps of the transaction execute, log files are created.
 When the checkpoint is reached, the transaction's updates are written to the database, and the log file up to that point is removed. The log file is then updated with the new steps of the transaction until the next checkpoint, and so on.
 The checkpoint is used to declare a point before which the DBMS was in a consistent state and all transactions were committed.

Recovery using Checkpoint

 In the following manner, a recovery system recovers the database from a failure:

The recovery system reads the log files from the end to the start. It reads the log files from T4 to T1.
 The recovery system maintains two lists, a redo-list and an undo-list.
 A transaction is put into the redo-list if the recovery system sees a log with <Tn, Start> and <Tn, Commit>, or just <Tn, Commit>. For the transactions in the redo-list, their previous logs are removed, and the transactions are redone before their logs are saved.
For example:
 In the log file, transactions T2 and T3 will have both <Tn, Start> and <Tn, Commit>. Transaction T1 will have only <Tn, Commit> in the log file, because it started before the checkpoint and committed after the checkpoint was crossed. Hence the recovery system puts T1, T2 and T3 into the redo-list.
 A transaction is put into the undo-list if the recovery system sees a log with <Tn, Start> but finds no commit or abort log. All the transactions in the undo-list are undone, and their logs are removed.
For example:
 Transaction T4 will have only <Tn, Start>. So T4 will be put into the undo-list, since this transaction is not yet complete and failed midway.
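The backward scan described above can be sketched as a loop over the reversed log that stops at the most recent checkpoint. One assumption is added beyond the notes, borrowed from the common textbook formulation: the checkpoint record lists the transactions still active when it was taken, written here as ("checkpoint", [...]), so that a transaction like T1 (started before the checkpoint) is still accounted for. The log below is a hypothetical reconstruction of the T1 to T4 example.

```python
def recover_with_checkpoint(log):
    """Build (redo, undo) lists by scanning the log backwards to the last checkpoint.

    Boundary records are pairs (T, "start" | "commit" | "abort"); a checkpoint
    record is ("checkpoint", [transactions active at the checkpoint]).
    """
    redo, finished, started = set(), set(), set()
    active_at_checkpoint = set()
    for rec in reversed(log):
        if rec[0] == "checkpoint":
            active_at_checkpoint = set(rec[1])
            break                       # records before the checkpoint are not needed
        if len(rec) != 2:
            continue                    # update record
        txn, kind = rec
        if kind == "commit":
            redo.add(txn)
            finished.add(txn)
        elif kind == "abort":
            finished.add(txn)
        elif kind == "start":
            started.add(txn)
    undo = (started | active_at_checkpoint) - finished
    return sorted(redo), sorted(undo)

log = [
    ("T1", "start"),
    ("checkpoint", ["T1"]),
    ("T2", "start"), ("T2", "commit"),
    ("T3", "start"), ("T1", "commit"), ("T3", "commit"),
    ("T4", "start"),
]
print(recover_with_checkpoint(log))  # (['T1', 'T2', 'T3'], ['T4'])
```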
