Unit4 Dbms PDF

A transaction is a unit of program execution that accesses and possibly updates data items. A transaction to transfer funds from account A to B involves reading the balances of both accounts, updating the balances, and writing the updated balances back. Two main issues are failures during execution and concurrent execution of multiple transactions.

Uploaded by

RS Gamer
Copyright
© © All Rights Reserved

Transaction Concept (CO4)

• A transaction is a unit of program execution that accesses and possibly updates various data items.
• E.g. transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
• Two main issues to deal with:
– Failures of various kinds, such as hardware failures and system crashes
– Concurrent execution of multiple transactions
Example of Fund Transfer (CO4)
• Transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
• Atomicity requirement
– if the transaction fails after step 3 and before step 6, money
will be “lost” leading to an inconsistent database state
• Failure could be due to software or hardware
– the system should ensure that updates of a partially
executed transaction are not reflected in the database
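
The atomicity requirement can be sketched in Python. This is only an illustration: the in-memory dictionary standing in for the database, the `transfer` helper and the `SimulatedCrash` exception are assumptions made for the example, not part of the slides.

```python
class SimulatedCrash(Exception):
    """Stands in for a software or hardware failure (illustrative only)."""


def transfer(db, src, dst, amount, fail_after_debit=False):
    """Move `amount` from db[src] to db[dst]; undo partial updates on failure."""
    old_values = {src: db[src], dst: db[dst]}  # saved so a rollback is possible
    try:
        a = db[src]            # 1. read(A)
        a = a - amount         # 2. A := A - 50
        db[src] = a            # 3. write(A)
        if fail_after_debit:
            raise SimulatedCrash("failure after step 3")
        b = db[dst]            # 4. read(B)
        b = b + amount         # 5. B := B + 50
        db[dst] = b            # 6. write(B)
        return True
    except SimulatedCrash:
        db.update(old_values)  # partial updates are not left in the database
        return False


accounts = {"A": 1000, "B": 2000}
transfer(accounts, "A", "B", 50, fail_after_debit=True)
print(accounts)  # {'A': 1000, 'B': 2000}
transfer(accounts, "A", "B", 50)
print(accounts)  # {'A': 950, 'B': 2050}
```

A real system cannot rely on in-memory copies surviving a crash; it gets the same effect from a log on stable storage.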
Example of Fund Transfer (cont.) (CO4)
• Durability requirement
– the updates to the database by the transaction must persist even if there are software or
hardware failures.
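• Isolation requirement
– if another transaction T2 reads A and B between steps 3 and 4 of T1, it sees a partially updated database: the sum A + B that it prints is $50 too low.
T1                      T2
1. read(A)
2. A := A – 50
3. write(A)
                        read(A), read(B), print(A+B)
4. read(B)
5. B := B + 50
6. write(B)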
• Isolation can be ensured trivially by running transactions serially
– that is, one after the other.
• However, executing multiple transactions concurrently has significant
benefits, as we will see later.
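
A small Python sketch of the isolation problem. The balances 1000 and 2000, the generator-based interleaving and all function names are assumptions made for the illustration.

```python
def t1_steps(db):
    """T1 as a generator: each `yield` is a point where T2 could be scheduled."""
    a = db["A"]
    a = a - 50
    db["A"] = a        # steps 1-3 done
    yield
    b = db["B"]
    b = b + 50
    db["B"] = b        # steps 4-6 done
    yield


def t2_sum(db):
    """T2: read(A), read(B), return A + B."""
    return db["A"] + db["B"]


def interleaved(db):
    """T2 runs between step 3 and step 4 of T1."""
    t1 = t1_steps(db)
    next(t1)
    seen = t2_sum(db)
    next(t1)
    return seen


def serial(db):
    """T2 runs only after T1 has finished."""
    for _ in t1_steps(db):
        pass
    return t2_sum(db)


print(interleaved({"A": 1000, "B": 2000}))  # 2950
print(serial({"A": 1000, "B": 2000}))       # 3000
```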
ACID Properties (CO4)

A transaction is a unit of program execution that accesses and possibly updates various data items. To preserve the integrity of data, the database system must ensure:
• Atomicity. Either all operations of the transaction are properly
reflected in the database or none are.
• Consistency. Execution of a transaction in isolation preserves
the consistency of the database.
• Isolation. Although multiple transactions may execute
concurrently, each transaction must be unaware of other
concurrently executing transactions.
– That is, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti started, or Tj started execution after Ti finished.
• Durability. After a transaction completes successfully, the
changes it has made to the database persist, even if there are
system failures.
Transaction State (CO4)

• Active – the initial state; the transaction stays in this state while it is executing
• Partially committed – after the final statement has been
executed.
• Failed -- after the discovery that normal execution can no
longer proceed.
• Aborted – after the transaction has been rolled back and the
database restored to its state prior to the start of the
transaction. Two options after it has been aborted:
– restart the transaction
• can be done only if no internal logical error
– kill the transaction
• Committed – after successful completion.
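
The states above can be modelled as a small state machine. The transition table below follows the usual textbook state diagram and is an assumption here, since the diagram itself is not reproduced in this text; `State`, `TRANSITIONS` and `advance` are illustrative names.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    FAILED = "failed"
    ABORTED = "aborted"
    COMMITTED = "committed"


# Allowed moves (assumed from the standard diagram). A restart after an
# abort is modelled as a brand-new transaction that begins in ACTIVE.
TRANSITIONS = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.FAILED: {State.ABORTED},
    State.ABORTED: set(),
    State.COMMITTED: set(),
}


def advance(current, target):
    """Return `target` if the move is allowed, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


state = State.ACTIVE
state = advance(state, State.PARTIALLY_COMMITTED)
state = advance(state, State.COMMITTED)
print(state.value)  # committed
```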
Transaction State (Cont.) (CO4)
Schedules (CO4)

• Schedule – a sequence of instructions that specifies the chronological order in which instructions of concurrent transactions are executed.
– a schedule for a set of transactions must consist of all instructions of those transactions
– must preserve the order in which the instructions appear in each individual transaction.
Serializability (CO4)

• Basic Assumption
– Each transaction preserves database consistency.
– Thus serial execution of a set of transactions preserves database consistency.
– A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule.
• Different forms of schedule equivalence give rise to the notions of:
1. conflict serializability
2. view serializability
Conflicting Instructions (CO4)

• Instructions li and lj of transactions Ti and Tj respectively, conflict if and only if there exists some item Q accessed by both li and lj, and at least one of these instructions wrote Q.
1. li = read(Q), lj = read(Q). li and lj don’t conflict.
2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict.
4. li = write(Q), lj = write(Q). They conflict.
• Intuitively, a conflict between li and lj forces a (logical)
temporal order between them.
– If li and lj are consecutive in a schedule and they do not conflict,
their results would remain the same even if they had been
interchanged in the schedule.
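
The conflict rule can be written as a one-line predicate. The `Op` tuple and the `conflicts` function are illustrative names, not notation from the slides.

```python
from typing import NamedTuple


class Op(NamedTuple):
    txn: int    # index i of transaction Ti
    kind: str   # "R" for read, "W" for write
    item: str   # data item, e.g. "Q"


def conflicts(a, b):
    """True if a and b come from different transactions, touch the same
    item, and at least one of them is a write."""
    return a.txn != b.txn and a.item == b.item and "W" in (a.kind, b.kind)


print(conflicts(Op(1, "R", "Q"), Op(2, "R", "Q")))  # False
print(conflicts(Op(1, "R", "Q"), Op(2, "W", "Q")))  # True
print(conflicts(Op(1, "W", "Q"), Op(2, "R", "Q")))  # True
print(conflicts(Op(1, "W", "Q"), Op(2, "W", "Q")))  # True
```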
Conflict Serializability (CO4)

• If a schedule S can be transformed into a schedule S´ by a series of swaps of non-conflicting instructions, we say that S and S´ are conflict equivalent.
• We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule.
Conflict Serializability (cont..) (CO4)

• Schedule 3 can be transformed into Schedule 6, a serial schedule where T2 follows T1, by a series of swaps of non-conflicting instructions.
– Therefore Schedule 3 is conflict serializable.

[Figure: Schedule 3 and Schedule 6]
Checking Whether a Schedule is Conflict Serializable Or Not (CO4)

• Use the following steps to check whether a given non-serial schedule is conflict serializable or not-
– Step-01: Find and list all the conflicting operations.
– Step-02: Start creating a precedence graph by drawing one node for each transaction.
– Step-03: Draw an edge for each conflict pair: if Xi(V) and Yj(V) form a conflict pair and Xi(V) appears first in the schedule, draw an edge from Ti to Tj. This ensures that Ti gets executed before Tj.
– Step-04: Check if there is any cycle formed in the graph. If there is no cycle, then the schedule is conflict serializable, otherwise not.
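
The four steps can be sketched in Python. The schedule string format (`R1(A), W2(B)`), the function names and the two small test schedules are assumptions made for the illustration.

```python
import re


def parse(schedule):
    """Turn 'R1(A), W2(B)' into [('R', 1, 'A'), ('W', 2, 'B')]."""
    return [(k, int(t), x)
            for k, t, x in re.findall(r"([RW])(\d+)\((\w+)\)", schedule)]


def precedence_edges(ops):
    """Steps 01-03: one edge (Ti, Tj) per conflict pair, earlier op first."""
    edges = set()
    for i, (k1, t1, x1) in enumerate(ops):
        for k2, t2, x2 in ops[i + 1:]:
            if t1 != t2 and x1 == x2 and "W" in (k1, k2):
                edges.add((t1, t2))
    return edges


def has_cycle(edges):
    """Step 04: depth-first search for a cycle in the precedence graph."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    state = {}  # node -> "visiting" or "done"

    def visit(u):
        state[u] = "visiting"
        for v in graph[u]:
            if state.get(v) == "visiting" or (v not in state and visit(v)):
                return True
        state[u] = "done"
        return False

    return any(u not in state and visit(u) for u in graph)


def is_conflict_serializable(schedule):
    return not has_cycle(precedence_edges(parse(schedule)))


# Two made-up schedules, not taken from the slides:
print(is_conflict_serializable("R1(A), W1(A), R2(A), W2(A)"))  # True
print(is_conflict_serializable("R1(A), R2(A), W1(A), W2(A)"))  # False
```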
Conflict Serializability (Cont.) (CO4)

• Example of a schedule that is not conflict serializable:


We are unable to swap instructions in the above schedule to
obtain either the serial schedule < T3, T4 >, or the serial
schedule < T4, T3 >.
Daily Quiz (CO4)

PROBLEMS BASED ON CONFLICT SERIALIZABILITY-

Problem-01:
Check whether the given schedule S is conflict serializable or not-
S : R1(A) , R2(A) , R1(B) , R2(B) , R3(B) , W1(A) , W2(B)
Solution-
Step-01:
List all the conflicting operations and determine the
dependency between the transactions-
R2(A) , W1(A) (T2 → T1)
R1(B) , W2(B) (T1 → T2)
R3(B) , W2(B) (T3 → T2)
Daily Quiz (CO4)

• Step-02:
• Draw the precedence graph-

• Clearly, there exists a cycle in the precedence graph (T1 → T2 → T1).
• Therefore, the given schedule S is not conflict serializable.
Daily Quiz (CO4)

PROBLEMS BASED ON CONFLICT SERIALIZABILITY-

Problem-02:
Check whether the given schedule S is conflict serializable and recoverable or not-
Solution-
Checking Whether S is Conflict Serializable Or Not-
Step-01:
List all the conflicting operations and determine the dependency between the transactions-
• R2(X) , W3(X) (T2 → T3)
• R2(X) , W1(X) (T2 → T1)
• W3(X) , W1(X) (T3 → T1)
• W3(X) , R4(X) (T3 → T4)
• W1(X) , R4(X) (T1 → T4)
• W2(Y) , R4(Y) (T2 → T4)
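Checking Whether a Schedule is Conflict Serializable Or Not contd.. (CO4)

Step-02: Draw the precedence graph-

• Clearly, there exists no cycle in the precedence graph.
• Therefore, the given schedule S is conflict serializable.

Checking Whether S is Recoverable Or Not-

• Conflict serializability alone does not guarantee recoverability. A schedule is recoverable if every transaction that reads a value written by another transaction commits only after that writer commits.
• Checking the commit order of S against this condition shows that the given schedule S is recoverable.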
Finding the Serialized Schedules (CO4)

• All the possible topological orderings of the above precedence graph will be the possible serialized schedules.
• The topological orderings can be found by performing the Topological Sort of the above precedence graph.

After performing the topological sort, the possible serialized schedules are-


1. T1 → T3 → T4 → T2
2. T1 → T4 → T3 → T2
3. T4 → T1 → T3 → T2
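
A Python sketch that enumerates every topological ordering of a precedence graph. The graph figure is not reproduced in this text, so the edge set T1 → T3, T3 → T2, T4 → T2 used below is an assumption, chosen because it is consistent with the three orderings listed above. `all_topological_orders` is an illustrative name.

```python
def all_topological_orders(nodes, edges):
    """Yield every ordering of `nodes` in which u precedes v for each edge (u, v)."""
    indegree = {n: 0 for n in nodes}
    for _, v in edges:
        indegree[v] += 1

    def extend(prefix, remaining):
        if not remaining:
            yield list(prefix)
            return
        for n in sorted(remaining):
            if indegree[n] == 0:          # every predecessor is already placed
                for u, v in edges:
                    if u == n:
                        indegree[v] -= 1
                yield from extend(prefix + [n], remaining - {n})
                for u, v in edges:        # backtrack
                    if u == n:
                        indegree[v] += 1

    yield from extend([], set(nodes))


nodes = ["T1", "T2", "T3", "T4"]
edges = [("T1", "T3"), ("T3", "T2"), ("T4", "T2")]  # assumed graph
for order in all_topological_orders(nodes, edges):
    print(" → ".join(order))
# T1 → T3 → T4 → T2
# T1 → T4 → T3 → T2
# T4 → T1 → T3 → T2
```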
View Serializability (CO4)

• Let S and S´ be two schedules with the same set of transactions. S and S´ are view equivalent if the following three conditions are met, for each data item Q,
1. If in schedule S, transaction Ti reads the initial value of Q, then in schedule S´ also transaction Ti must read the initial value of Q.
Thumb Rule
“Initial readers must be same for all the data items”.

2. If in schedule S transaction Ti executes read(Q), and that value was produced by transaction Tj (if any), then in schedule S´ also transaction Ti must read the value of Q that was produced by the same write(Q) operation of transaction Tj.
Thumb Rule
“Write-read sequence must be same.”
View Serializability (contd..) (CO4)

3. The transaction (if any) that performs the final write(Q) operation in schedule S must also perform the final write(Q) operation in schedule S´.
Thumb Rule
“Final writers must be same for all the data items”
Checking Whether a Schedule is View Serializable Or Not (CO4)
• Method-01:
Check whether the given schedule is conflict serializable or not.
• If the given schedule is conflict serializable, then it is surely view serializable.
• If the given schedule is not conflict serializable, then it may or may not be view serializable. Go and check using other methods.

Thumb Rules
• All conflict serializable schedules are view serializable.
• A view serializable schedule may or may not be conflict serializable.
Checking Whether a Schedule is View Serializable Or Not (CO4)
Method-02:
• Applies when the schedule is not conflict serializable. Check if there exists any blind write operation. (A write to a data item without first reading it is called a blind write.)
– If there does not exist any blind write, then the schedule is surely not view serializable.
– If there exists any blind write, then the schedule may or may not be view serializable. Go and check using other methods.
Thumb Rule
For a schedule that is not conflict serializable, no blind write means not a view serializable schedule.
Checking Whether a Schedule is View Serializable Or Not (CO4)
Method-03:
In this method, try finding a view equivalent serial schedule.
• Using the three view equivalence conditions, write all the dependencies.
• Then, draw a graph using those dependencies.
• If there exists no cycle in the graph, then the schedule is view serializable, otherwise not.
PROBLEMS BASED ON VIEW SERIALIZABILITY (CO4)

Problem-01
Check whether the given schedule S is view serializable or not-
Solution-
• Checking Whether S is Conflict Serializable Or Not-
• Step-01:
List all the conflicting operations and determine the
dependency between the transactions-
– W1(B) , W2(B) (T1 → T2)
– W1(B) , W3(B) (T1 → T3)
– W1(B) , W4(B) (T1 → T4)
– W2(B) , W3(B) (T2 → T3)
– W2(B) , W4(B) (T2 → T4)
– W3(B) , W4(B) (T3 → T4)
• Step-02:
– Draw the precedence graph-
– Clearly, there exists no cycle in the precedence graph.
– Therefore, the given schedule S is conflict serializable.

• Thus, we conclude that the given schedule is also view serializable.


View Serializability (Cont.) (CO4)

• A schedule S is view serializable if it is view equivalent to a serial schedule.
• Every conflict serializable schedule is also view serializable.
• Below is a schedule which is view-serializable but not conflict serializable.

• What serial schedule is above equivalent to?


• Every view serializable schedule that is not conflict
serializable has blind writes.
View Serializability (Cont.) (CO4)

Q. Check the view serializability

Schedule = w3(x),r2(x),w2(y),r1(z),w3(y),w1(y)

View serializable but not conflict serializable.
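
A brute-force check of the answer above, as a Python sketch. It compares the reads-from relation and the final writers of the schedule against every serial order of its transactions; that is exponential in the number of transactions and intended only for small exercises. The function names and the string format are assumptions made for the illustration.

```python
import re
from itertools import permutations


def parse(schedule):
    """Turn 'w3(x),r2(x)' into [('W', 3, 'x'), ('R', 2, 'x')]."""
    return [(k.upper(), int(t), x)
            for k, t, x in re.findall(r"([rwRW])(\d+)\((\w+)\)", schedule)]


def view_profile(ops):
    """Return (reads_from, final_writer) for a list of operations.

    An operation is identified by (transaction, position inside that
    transaction); a read of the initial value maps to None.
    """
    last_write = {}   # item -> (txn, position) of the most recent write
    position = {}     # txn -> number of its operations seen so far
    reads_from = {}
    for kind, txn, item in ops:
        idx = position.get(txn, 0)
        position[txn] = idx + 1
        if kind == "R":
            reads_from[(txn, idx)] = last_write.get(item)
        else:
            last_write[item] = (txn, idx)
    return reads_from, last_write


def view_equivalent_serial_orders(schedule):
    """List every serial order that is view equivalent to `schedule`."""
    ops = parse(schedule)
    target = view_profile(ops)
    txns = sorted({t for _, t, _ in ops})
    matches = []
    for order in permutations(txns):
        serial = [op for t in order for op in ops if op[1] == t]
        if view_profile(serial) == target:
            matches.append(order)
    return matches


s = "w3(x),r2(x),w2(y),r1(z),w3(y),w1(y)"
print(view_equivalent_serial_orders(s))  # [(3, 2, 1)]
```

The only view equivalent serial order is T3, T2, T1: T2 must read x from T3, and T1 must perform the final write of y.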
Schedule (CO4)
Non-serializable schedule (CO4)

• A non-serial schedule which is not serializable is called a non-serializable schedule.
• A non-serializable schedule is not guaranteed to
produce the same effect as produced by some serial
schedule on any consistent database.

Note:
Non-serializable schedules-
• may or may not be consistent
• may or may not be recoverable
Recoverable schedule (CO4)

A schedule is recoverable if, whenever a transaction Tj reads a data item previously written by a transaction Ti, the commit operation of Ti appears before the commit operation of Tj.

Example :

S1: R1(x), W1(x), R2(x), R1(y), R2(y), W2(x), W1(y), C1, C2;
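
The definition can be checked mechanically. The sketch below assumes schedules written as in S1 (`R`, `W` and `C` followed by the transaction number) and ignores aborts; `parse` and `is_recoverable` are illustrative names.

```python
import re


def parse(schedule):
    """'R1(x), W1(x), C1' -> [('R', 1, 'x'), ('W', 1, 'x'), ('C', 1, None)]."""
    return [(k, int(t), x or None)
            for k, t, x in re.findall(r"([RWC])(\d+)(?:\((\w+)\))?", schedule)]


def is_recoverable(schedule):
    """True if every transaction commits after all writers it read from."""
    last_writer = {}   # item -> transaction that wrote it most recently
    committed = set()
    read_from = {}     # reader -> set of other transactions it read from
    for kind, txn, item in parse(schedule):
        if kind == "W":
            last_writer[item] = txn
        elif kind == "R":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                read_from.setdefault(txn, set()).add(writer)
        else:  # commit
            if any(w not in committed for w in read_from.get(txn, ())):
                return False
            committed.add(txn)
    return True


s1 = "R1(x), W1(x), R2(x), R1(y), R2(y), W2(x), W1(y), C1, C2"
print(is_recoverable(s1))  # True
```

In S1, T2 reads x after T1 wrote it, and C1 appears before C2, so the condition holds.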
Irrecoverable schedule (CO4)

• If in a schedule,
– a transaction performs a dirty read operation from an uncommitted transaction (reading from an uncommitted transaction is called a dirty read)
– and commits before the transaction from which it has read the value,
then such a schedule is known as an Irrecoverable Schedule.
• The following schedule is not recoverable if T9 commits
immediately after the read
Irrecoverable schedule (CO4)

S2: R1(x), R2(x), R1(z), R3(x), R3(y), W1(x), W3(y), R2(y), W2(z), W2(y), C1, C2, C3;

Here T2 reads y after W3(y), a dirty read from T3, and commits (C2) before T3 commits (C3).
Types of Recoverable Schedules (CO4)

• A recoverable schedule may be any one of these kinds-
1. Cascading Schedule
2. Cascadeless Schedule
3. Strict Schedule
Cascading Schedule-
• If in a schedule, failure of one transaction causes several other dependent transactions to roll back or abort, then such a schedule is called a Cascading Schedule, Cascading Rollback or Cascading Abort.
• It leads to the wastage of CPU time.
• If T10 fails, T11 and T12 must also be rolled back.
• This can lead to the undoing of a significant amount of work.
Cascadeless Schedules (CO4)

If in a schedule, a transaction is not allowed to read a data item until the last transaction that has written it is committed or aborted, then such a schedule is called a Cascadeless Schedule.
In other words,
• A cascadeless schedule allows only committed read operations.
• Therefore, it avoids cascading rollback and thus saves CPU time.
Strict Schedule (CO4)

• If in a schedule, a transaction is neither allowed to read nor write a data item until the last transaction that has written it is committed or aborted, then such a schedule is called a Strict Schedule.
• In other words,
– A strict schedule allows only committed read and write operations.
– Clearly, a strict schedule imposes more restrictions than a cascadeless schedule.
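
The two definitions differ only in which operations may touch an item that has an uncommitted write. A Python sketch, assuming schedules written with `R`, `W` and `C` operations and no aborts; `classify` and the sample schedules are made up for the illustration.

```python
import re


def parse(schedule):
    """'W1(x), C1' -> [('W', 1, 'x'), ('C', 1, None)]."""
    return [(k, int(t), x or None)
            for k, t, x in re.findall(r"([RWC])(\d+)(?:\((\w+)\))?", schedule)]


def classify(schedule):
    """Report whether a schedule is cascadeless and whether it is strict."""
    dirty = {}   # item -> transaction holding an uncommitted write on it
    cascadeless = strict = True
    for kind, txn, item in parse(schedule):
        if kind == "C":
            # the committing transaction's writes are no longer dirty
            dirty = {x: t for x, t in dirty.items() if t != txn}
            continue
        writer = dirty.get(item)
        if writer is not None and writer != txn:
            strict = False               # read or write of a dirty item
            if kind == "R":
                cascadeless = False      # dirty read
        if kind == "W":
            dirty[item] = txn
    return {"cascadeless": cascadeless, "strict": strict}


print(classify("W1(x), C1, R2(x), W2(x), C2"))  # {'cascadeless': True, 'strict': True}
print(classify("W1(x), W2(x), C1, C2"))         # {'cascadeless': True, 'strict': False}
print(classify("W1(x), R2(x), C1, C2"))         # {'cascadeless': False, 'strict': False}
```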
Database Recovery

The process of restoring the database to the most recent consistent state that existed just before the failure.
Recovery Algorithms (CO4)

• Recovery algorithms are techniques to ensure database consistency and transaction atomicity and durability despite failures
• Recovery algorithms have two parts
1. Actions taken during normal transaction processing to ensure enough information exists to recover from failures
2. Actions taken after a failure to recover the database contents to a state that ensures atomicity, consistency and durability
Log-Based Recovery (CO4)

• A log is kept on stable storage.
– The log is a sequence of log records, and maintains a record of update activities on the database.
• When transaction Ti starts, it registers itself by writing a <Ti start> log record
• Before Ti executes write(X), a log record <Ti, X, V1, V2> is written, where V1 is the value of X before the write, and V2 is the value to be written to X.
– The log record notes that Ti has performed a write on data item X. X had value V1 before the write, and will have value V2 after the write.
• When Ti finishes its last statement, the log record <Ti commit> is written.
• We assume for now that log records are written directly to stable storage (that is, they are not buffered)
Log-Based Recovery (CO4) contd..

• Two approaches using logs
– Deferred database modification
– Immediate database modification
Deferred Database Modification (CO4)

• The deferred database modification scheme records all modifications to the log, but defers all the writes to after partial commit.
• Assume that transactions execute serially
• Transaction starts by writing <Ti start> record to log.
• A write(X) operation results in a log record <Ti, X, V> being written, where V is the new value for X
– Note: old value is not needed for this scheme
• The write is not performed on X at this time, but is deferred.
• When Ti partially commits, <Ti commit> is written to the log
• Finally, the log records are read and used to actually execute the previously deferred writes.
Deferred Database Modification (CO4)

• During recovery after a crash, a transaction needs to be redone if and only if both <Ti start> and <Ti commit> are there in the log.
• Redoing a transaction Ti (redo(Ti)) sets the value of all data items updated by the transaction to the new values.
• Crashes can occur while
– the transaction is executing the original updates, or
– while recovery action is being taken
• Example transactions T0 and T1 (T0 executes before T1):

T0: read(A)             T1: read(C)
    A := A – 50             C := C – 100
    write(A)                write(C)
    read(B)
    B := B + 50
    write(B)
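
The redo rule for deferred modification can be sketched in Python. The tuple-based log format, the function name `recover_deferred` and the initial values A = 1000, B = 2000, C = 700 are assumptions made for the illustration.

```python
def recover_deferred(log, db):
    """Redo every transaction that has both a start and a commit record."""
    started = {rec[1] for rec in log if rec[0] == "start"}
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    to_redo = started & committed
    for rec in log:                      # forward scan, in log order
        if rec[0] == "write" and rec[1] in to_redo:
            _, _, item, new_value = rec
            db[item] = new_value         # redo only needs the new value
    return sorted(to_redo)


full_log = [
    ("start", "T0"),
    ("write", "T0", "A", 950),
    ("write", "T0", "B", 2050),
    ("commit", "T0"),
    ("start", "T1"),
    ("write", "T1", "C", 600),
    ("commit", "T1"),
]

# Crash before <T0 commit>: nothing to redo, the database is untouched.
db = {"A": 1000, "B": 2000, "C": 700}
print(recover_deferred(full_log[:3], db), db)
# [] {'A': 1000, 'B': 2000, 'C': 700}

# Crash after <T1, C, 600> but before <T1 commit>: only T0 is redone.
db = {"A": 1000, "B": 2000, "C": 700}
print(recover_deferred(full_log[:6], db), db)
# ['T0'] {'A': 950, 'B': 2050, 'C': 700}
```

Running `recover_deferred` a second time on the same log leaves the database unchanged, so a crash during recovery is harmless.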
Deferred Database Modification (Cont.) (CO4)

• Below we show the log as it appears at three instances of time.

• If log on stable storage at time of crash is as in case:
(a) No redo actions need to be taken
(b) redo(T0) must be performed since <T0 commit> is present
(c) redo(T0) must be performed followed by redo(T1) since <T0 commit> and <T1 commit> are present
Immediate Database Modification (CO4)

• The immediate database modification scheme allows database updates of an uncommitted transaction to be made as the writes are issued
– since undoing may be needed, update log records must have both old value and new value
• Update log record must be written before database item is written
– We assume that the log record is output directly to stable storage
– Can be extended to postpone log record output, so long as prior to execution of an output(B) operation for a data block B, all log records corresponding to items in B are flushed to stable storage
Immediate Database Modification (Cont.) (CO4)

• Output of updated blocks can take place at any time before or after transaction commit
• Order in which blocks are output can be different from the order in which they are written.
• Recovery procedure has two operations instead of one:
– undo(Ti) restores the value of all data items updated by Ti to their old values, going backwards from the last log record for Ti
– redo(Ti) sets the value of all data items updated by Ti to the new values, going forward from the first log record for Ti
Immediate Database Modification (Cont.) (CO4)

• Both operations must be idempotent
– That is, even if the operation is executed multiple times the effect is the same as if it is executed once
• Needed since operations may get re-executed during recovery
• When recovering after failure:
– Transaction Ti needs to be undone if the log contains the record <Ti start>, but does not contain the record <Ti commit>.
– Transaction Ti needs to be redone if the log contains both the record <Ti start> and the record <Ti commit>.
• Undo operations are performed first, then redo operations.
Immediate Database Modification (Cont.) (CO4)

Log                     Write           Output

<T0 start>
<T0, A, 1000, 950>
<T0, B, 2000, 2050>
                        A = 950
                        B = 2050
<T0 commit>
<T1 start>
<T1, C, 700, 600>
                        C = 600
                                        BB, BC
<T1 commit>
                                        BA
• Note: BX denotes block containing X.
Immediate DB Modification Recovery Example (Cont.) (CO4)
Below we show the log as it appears at three instances of time.

Recovery actions in each case above are:
(a) undo (T0): B is restored to 2000 and A to 1000.
(b) undo (T1) and redo (T0): C is restored to 700, and then A and B are set to 950 and 2050 respectively.
(c) redo (T0) and redo (T1): A and B are set to 950 and 2050 respectively. Then C is set to 600
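
The three cases can be replayed with a short Python sketch. The three log prefixes are reconstructed from the log shown on the previous slide and the recovery actions listed above; the tuple format, the function name `recover_immediate` and the database contents at crash time are assumptions made for the illustration.

```python
def recover_immediate(log, db):
    """Undo incomplete transactions (backward scan), then redo committed ones."""
    started = {rec[1] for rec in log if rec[0] == "start"}
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    to_undo = started - committed
    for rec in reversed(log):            # undo: newest update first
        if rec[0] == "update" and rec[1] in to_undo:
            _, _, item, old, _ = rec
            db[item] = old
    for rec in log:                      # redo: oldest update first
        if rec[0] == "update" and rec[1] in committed:
            _, _, item, _, new = rec
            db[item] = new
    return sorted(to_undo), sorted(committed)


full_log = [
    ("start", "T0"),
    ("update", "T0", "A", 1000, 950),
    ("update", "T0", "B", 2000, 2050),
    ("commit", "T0"),
    ("start", "T1"),
    ("update", "T1", "C", 700, 600),
    ("commit", "T1"),
]
case_a, case_b, case_c = full_log[:3], full_log[:6], full_log

db = {"A": 950, "B": 2050, "C": 700}     # assumed state on disk at the crash
print(recover_immediate(case_a, db), db)
# (['T0'], []) {'A': 1000, 'B': 2000, 'C': 700}

db = {"A": 950, "B": 2000, "C": 600}     # assumed state on disk at the crash
print(recover_immediate(case_b, db), db)
# (['T1'], ['T0']) {'A': 950, 'B': 2050, 'C': 700}
```

Both loops only assign logged values, so running the function again after a crash during recovery gives the same result (idempotence).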
Checkpoints (CO4)

• Problems in the recovery procedure as discussed earlier:
1. searching the entire log is time-consuming
2. we might unnecessarily redo transactions which have already output their updates to the database.
• Streamline the recovery procedure by periodically performing checkpointing:
1. Output all log records currently residing in main memory onto stable storage.
2. Output all modified buffer blocks to the disk.
3. Write a log record <checkpoint> onto stable storage.
Checkpoints (CO4) contd..

• During recovery we need to consider only the most recent transaction Ti that started before the checkpoint, and transactions that started after Ti.
1. Scan backwards from end of log to find the most recent
<checkpoint> record
2. Continue scanning backwards till a record <Ti start> is
found.
3. Need only consider the part of log following above start
record. Earlier part of log can be ignored during recovery,
and can be erased whenever desired.
4. For all transactions (starting from Ti or later) with no <Ti
commit>, execute undo(Ti). (Done only in case of
immediate modification.)
5. Scanning forward in the log, for all transactions starting
from Ti or later with a <Ti commit>, execute redo(Ti).
Checkpoints Example (CO4) contd..

[Figure: timeline of transactions T1 to T4 relative to the checkpoint at time Tc and the system failure at time Tf]

• T1 can be ignored (updates already output to disk due to checkpoint)
• T2 and T3 redone.
• T4 undone
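
A Python sketch of how a log with a checkpoint sorts transactions into these three groups. The log below is a hypothetical one, made up to match the example (T1 commits before the checkpoint, T2 and T3 commit after it, T4 never commits). For simplicity the sketch scans the whole log instead of stopping at the relevant start record.

```python
def classify_after_crash(log):
    """Split transactions into ignore / redo / undo using the last checkpoint."""
    checkpoint = max(
        (i for i, rec in enumerate(log) if rec[0] == "checkpoint"), default=-1
    )
    started, commit_pos = [], {}
    for i, (kind, *rest) in enumerate(log):
        if kind == "start":
            started.append(rest[0])
        elif kind == "commit":
            commit_pos[rest[0]] = i
    return {
        # committed before the checkpoint: updates are already on disk
        "ignore": [t for t in started if t in commit_pos and commit_pos[t] < checkpoint],
        # committed after the checkpoint: redo
        "redo": [t for t in started if t in commit_pos and commit_pos[t] > checkpoint],
        # no commit record: undo
        "undo": [t for t in started if t not in commit_pos],
    }


log = [
    ("start", "T1"),
    ("commit", "T1"),
    ("start", "T2"),
    ("checkpoint",),
    ("start", "T3"),
    ("commit", "T2"),
    ("commit", "T3"),
    ("start", "T4"),
]
print(classify_after_crash(log))
# {'ignore': ['T1'], 'redo': ['T2', 'T3'], 'undo': ['T4']}
```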
Distributed Database (CO4)

Distributed Database Management System
Centralized Databases (CO4)

• Centralized database required that corporate data be stored in a single central site
– Performance degradation as number of remote sites grew

– High cost to maintain large centralized DBs

– Reliability problems with one, central site


Challenges in Centralized Databases (CO4)

• Dynamic business environment and centralized database’s shortcomings spawned a demand for applications based on data access from different sources at multiple locations
• Business operations became more decentralized geographically
• Competition at global level
• Rapid technological change in computers

Distributed Databases (CO4)

• A distributed database management system (DDBMS) governs storage and processing of logically related data over interconnected computer systems in which both data and processing functions are distributed among several sites
• A distributed database system consists of loosely coupled sites that share no physical component
• Database systems that run on each site are independent of each other
• Transactions may access data at one or more sites
Distributed Databases (CO4)

Advantages
• Data are located near “greatest demand” site
• Faster data access
• Faster data processing
• Growth facilitation
• Improved communications
• Reduced operating costs
• User-friendly interface
• Less danger of a single-point failure
• Processor independence
DDBMS Disadvantages

• Complexity of management and control
• Security
• Lack of standards
• Increased storage requirements
• Greater difficulty in managing the data
environment
• Increased training cost
Distributed Processing vs Distributed Database

• Distributed processing – a database’s logical processing is shared among two or more physically independent sites that are connected through a network
– One computer performs I/O, data selection and validation while
second computer creates reports
– Uses a single-site database but the processing chores are shared
among several sites
• Distributed database – stores a logically related database
over two or more physically independent sites. The sites
are connected via a network
– Database is composed of database fragments which are located
at different sites and may also be replicated among various sites
Distributed Data Storage (CO4)

• Replication
– System maintains multiple copies of data, stored in different sites, for faster retrieval and fault tolerance.
• Fragmentation
– Relation is partitioned into several fragments stored in distinct sites
• Replication and fragmentation can be combined
– Relation is partitioned into several fragments: system maintains several identical replicas of each such fragment.
Distributed Transactions

• Transaction may access data at several sites.
• Each site has a local transaction manager responsible for:
– Maintaining a log for recovery purposes
– Participating in coordinating the concurrent execution of the
transactions executing at that site.
• Each site has a transaction coordinator, which is
responsible for:
– Starting the execution of transactions that originate at the
site.
– Distributing subtransactions at appropriate sites for
execution.
– Coordinating the termination of each transaction that
originates at the site, which may result in the transaction
being committed at all sites or aborted at all sites.
Transaction System Architecture
Recovery and Concurrency Control

• In-doubt transactions have a <ready T>, but neither a <commit T>, nor an <abort T> log record.
• The recovering site must determine the commit-abort status of
such transactions by contacting other sites; this can slow and
potentially block recovery.
• Recovery algorithms can note lock information in the log.
– Instead of <ready T>, write out <ready T, L>, where L is the list of locks held by T when the log is written (read locks can be omitted).
– For every in-doubt transaction T, all the locks noted in the
<ready T, L> log record are reacquired.
• After lock reacquisition, transaction processing can resume; the
commit or rollback of in-doubt transactions is performed
concurrently with the execution of new transactions.
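
A Python sketch of finding in-doubt transactions and the locks to reacquire. The tuple-based log format, the lock names and the function name `in_doubt_locks` are assumptions made for the illustration.

```python
def in_doubt_locks(log):
    """Map each in-doubt transaction to the lock list from its <ready T, L> record."""
    ready = {}        # transaction -> locks noted in its ready record
    decided = set()   # transactions with a commit or abort record
    for rec in log:
        if rec[0] == "ready":
            ready[rec[1]] = list(rec[2])
        elif rec[0] in ("commit", "abort"):
            decided.add(rec[1])
    return {t: locks for t, locks in ready.items() if t not in decided}


log = [
    ("ready", "T1", ["X(A)"]),
    ("commit", "T1"),
    ("ready", "T2", ["X(B)", "X(C)"]),
    ("ready", "T3", ["X(D)"]),
    ("abort", "T3"),
]
print(in_doubt_locks(log))  # {'T2': ['X(B)', 'X(C)']}
```

Once the locks of T2 are reacquired, new transactions can run while the recovering site asks other sites whether T2 committed or aborted.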
Directory Systems (CO4)

• Typical kinds of directory information
– Employee information such as name, id, email, phone, office addr, ..
– Even personal information to be accessed from multiple places, e.g. Web browser bookmarks
• White pages
– Entries organized by name or identifier
– Meant for forward lookup to find more about an entry
• Yellow pages
– Entries organized by properties
– For reverse lookup to find entries matching specific requirements
• When directories are to be accessed across an organization
