Chapter 17 (TCDS)

Coping With System Failures
Sukarna Barua
Associate Professor, CSE, BUET
2/11/2024
Failure Modes
▪ Many things can go wrong:
▪ Erroneous data entry
▪ Media failures
▪ Catastrophic failures

Erroneous Data Entry
▪ Erroneous data entry
▪ Mistyping one digit of a phone number
-Generally impossible to detect automatically
▪ Omitting a digit from a phone number
-Can be caught through constraints and triggers
-Can also be caught by validation in the application front end and back end

Media failures
▪ Media failures:
▪ Local failure of a disk, such as corruption of one or a few bits
-Detectable through parity checks
▪ Head crashes, where the entire disk becomes unreadable
-RAID can be used to restore data
-Archiving: a copy of the database is kept on tape or optical disk
-The archive is created periodically
-Full archive
-Incremental archive
-The archive is stored at a safe location
-Online redundant copies of the database can be distributed among several sites

Catastrophic Failures
▪ Catastrophic Failures:
▪ Data storage media destroyed completely
▪ Events: explosions, fires, or vandalism
▪ Protection:
- RAID will not help. Why?
-Archived copies can be used to restore lost data
-Online redundant copies can restore lost data

System failures
▪ Examples of system failures: memory failure, power failure, or operating system failure during the runtime of the database software
▪ A transaction in a database executes a number of steps to modify the database state. [Credit Account A, Debit Account B]
▪ A system failure can cause a transaction to stop at an intermediate state. [Fails after Credit Account A]
▪ This puts the database in an inconsistent state. [Account A credited, Account B not debited]
▪ Protection from system failures:
▪ Log all database changes
▪ After a failure, the logged changes can be reversed or reapplied as needed

What is a transaction?
▪ A unit of execution of database operations.
▪ Starts when a session is created with a database.
▪ Each query is part of an ongoing transaction.
▪ A transaction ends when a COMMIT or ROLLBACK ("abort") command is issued.
▪ A new transaction starts after COMMIT or ROLLBACK.

Atomicity of Transactions
▪ Atomicity of Transactions:
▪ A transaction must execute fully or not at all. This is called atomicity.
▪ Remember the transaction: Credit Account A, Debit Account B.
▪ Either both operations execute or neither executes, which keeps the database in a consistent state.
▪ If one executes but not the other, then the database state is inconsistent.
▪ The transaction manager ensures all transactions execute "atomically"
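
The credit/debit example can be tried out with Python's built-in sqlite3 module. This is only a minimal sketch: the `accounts` table, the `transfer` function, the starting balances, and the simulated failure are illustrative assumptions, not part of the slides.

```python
import sqlite3

def transfer(conn, amount, fail_midway=False):
    """Credit account A and debit account B as one atomic transaction."""
    try:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = 'A'", (amount,))
        if fail_midway:
            # Hypothetical failure between the credit and the debit.
            raise RuntimeError("simulated failure after crediting A")
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = 'B'", (amount,))
        conn.commit()        # both changes become permanent together
    except RuntimeError:
        conn.rollback()      # the partial credit to A is discarded

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 100)])
conn.commit()

transfer(conn, 30, fail_midway=True)
after_failure = balances(conn)    # neither operation took effect
transfer(conn, 30)
after_success = balances(conn)    # both operations took effect
```

After the failed attempt both balances are unchanged; after the successful one A has gained exactly what B has lost.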

Transaction Manager
▪ Transaction manager:
▪ Ensures atomicity of database transactions.
▪ Interacts with the query processor to execute the queries in a transaction
▪ Interacts with the log manager to keep log records
▪ Ensures that no two transactions interfere with one another
▪ Log manager:
▪ Maintains log records
▪ Log records are initially stored in buffers and sent to disk when needed
▪ Recovery manager:
▪ Recovers data after a crash.
▪ Examines log records and repairs data
▪ Buffer manager:
▪ Manages memory buffers and their reads from and writes to disk

Database State
▪ Database elements: relations, disk blocks, tuples, etc.
▪ Database state: a database has a state consisting of the values of its elements.
▪ Consistent state:
▪ Consistent states satisfy:
▪ All explicit database constraints,
▪ Implicit constraints defined by triggers, and
▪ Implicit constraints that are in the mind of the database designer.
▪ Implicit constraints are hard to enforce using database constraints such as check, primary key, foreign key, etc.
▪ Remember the example: credit and debit must happen together. This is an implied business logic constraint.

Correctness Principle
▪ Correctness Principle:
If a transaction executes in the absence of any other transactions or system
failures, and it starts with the database in a consistent state, then the database is
also in a consistent state when the transaction ends.

Converse to Correctness Principle
▪ Converse to Correctness Principle
▪ If only part of a transaction executes, then the database state may become inconsistent.
▪ A transaction is atomic; it must be executed as a whole or not at all.
▪ Transactions that execute simultaneously are likely to lead to an inconsistent state unless their interactions are controlled.
▪ Steps must therefore be taken to control the interactions of different transactions.

Read and Write of Database Elements
▪ A transaction reads and updates a database element (e.g., one stored in a disk block) as follows:
▪ Loads the element from its disk block into a memory buffer
▪ Updates the value of the element in the buffer
▪ The buffer is later written to disk
▪ This may or may not happen immediately [decided by the buffer manager]
▪ To reduce the number of disk I/Os, copying updated buffers to disk may be delayed.

Primitive Operations for Read/Write
▪ Assume X is a database element (e.g., a disk block).
▪ INPUT(X): Copy the disk block containing X to a memory buffer.
▪ READ(X,t): Copy X from its memory buffer to the transaction's local variable t.
▪ If X is not in a memory buffer, then INPUT(X) is executed before READ(X,t).
▪ WRITE(X,t): Copy the value of local variable t to X in its memory buffer.
▪ If X is not in a memory buffer, INPUT(X) is executed first.
▪ OUTPUT(X): Copy the buffer containing X from memory to disk.
▪ READ and WRITE are usually issued by the transaction manager
▪ INPUT and OUTPUT are usually issued by the buffer manager
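
The four primitives can be modelled with two dictionaries, one standing in for the disk and one for the memory buffers. This is a toy sketch under that assumption; the class name `BufferedStore`, its method names, and the starting value 8 are hypothetical and chosen only for illustration.

```python
class BufferedStore:
    """Toy model: `disk` and `buffers` map element names to values."""

    def __init__(self, disk):
        self.disk = dict(disk)
        self.buffers = {}

    def input(self, x):
        """INPUT(X): copy X from disk into a memory buffer."""
        self.buffers[x] = self.disk[x]

    def read(self, x):
        """READ(X,t): return the buffered value of X (the caller keeps it in t)."""
        if x not in self.buffers:
            self.input(x)            # INPUT(X) runs first if X is not buffered
        return self.buffers[x]

    def write(self, x, t):
        """WRITE(X,t): copy t into the buffer for X; the disk is untouched."""
        if x not in self.buffers:
            self.input(x)
        self.buffers[x] = t

    def output(self, x):
        """OUTPUT(X): copy the buffered value of X back to disk."""
        self.disk[x] = self.buffers[x]

store = BufferedStore({"A": 8})
t = store.read("A")                  # READ(A,t)
t = t * 2
store.write("A", t)                  # WRITE(A,t)
disk_before_output = store.disk["A"]   # still the old value
store.output("A")                    # OUTPUT(A)
disk_after_output = store.disk["A"]    # now the new value
```

The point of the sketch is that WRITE changes only the buffer; the disk copy changes only when OUTPUT is issued.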

Transactions in Primitive Steps
▪ Assume a transaction T consists of the following two steps:
A := A*2
B := B*2
▪ Say A=B before T executes.
▪ After T executes, A=B should still hold; otherwise the database is not consistent.
▪ Transaction T can be broken down into primitive database steps as follows:
READ(A,t); t := t*2; WRITE(A,t); READ(B,t); t := t*2; WRITE(B,t);
▪ The buffer manager issues the following commands:
▪ INPUT(A) and INPUT(B), each before the corresponding READ
▪ OUTPUT(A) and OUTPUT(B) after the transaction ends

Transactions in Primitive Steps
▪ Assume A=B=8 before the transaction.

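▪ What happens if the system crashes before OUTPUT(A)?
▪ The database is consistent, as if T never executed.
▪ What happens if the system crashes after OUTPUT(A) but before OUTPUT(B)?
▪ The database is in an inconsistent state: the new value of A is on disk, but B still has its old value.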
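
The two crash scenarios can be simulated with a small sketch. The function `run_T`, its `outputs_before_crash` parameter, and the starting values A=B=8 are assumptions made for illustration; a crash is modelled simply by discarding the buffers and returning whatever is on disk.

```python
def run_T(disk, outputs_before_crash):
    """Run T on a copy of `disk`, performing only the first
    `outputs_before_crash` OUTPUT steps before a simulated crash.
    Returns the disk contents that survive the crash."""
    disk = dict(disk)
    buffers = dict(disk)              # INPUT(A), INPUT(B)
    for x in ("A", "B"):
        t = buffers[x]                # READ(X,t)
        t = t * 2                     # t := t*2
        buffers[x] = t                # WRITE(X,t)
    for x in ("A", "B")[:outputs_before_crash]:
        disk[x] = buffers[x]          # OUTPUT(X)
    return disk                       # buffer contents are lost in the crash

def consistent(disk):
    """The consistency constraint of the example: A = B."""
    return disk["A"] == disk["B"]

start = {"A": 8, "B": 8}
# 0: crash before OUTPUT(A); 1: crash between the outputs; 2: no crash.
results = {k: run_T(start, k) for k in (0, 1, 2)}
```

Only the middle case, where exactly one OUTPUT reached the disk, violates A=B.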

Log Manager: Log Records
▪ Log manager:
▪ Maintains log records
▪ Log records are initially stored in buffers and sent to disk when needed
▪ Helps maintain a consistent database state
▪ What is a log record?
▪ A log record stores what a transaction has done
▪ Stored in non-volatile storage (e.g., disk).
▪ Can be used to recover the database after a crash.
▪ Log records are initially written to log blocks (memory buffers allocated for the log)
▪ Log blocks are written to disk when feasible

Types of Log Records
▪ <START T>: Transaction T has begun.
▪ <COMMIT T>: Transaction T has completed successfully.
▪ Any changes made by T must be stored on disk.
▪ What happens if the buffers have not been copied to disk?
▪ The log manager ensures the changes will be reflected.
▪ <ABORT T>: Transaction T has not completed successfully. Any changes made by T must not be stored on disk.
▪ What if the buffer manager has already written updated buffers to disk?
▪ The log manager ensures the changes will not be reflected.
▪ Delayed writes may be used to reduce disk I/O

Types of Log Records
▪ <T,X,v> (undo logging): Transaction T has changed element X, whose previous value was v.
▪ Changes normally occur in memory buffers first.
▪ This form of log record is used by the "Undo Logging" mechanism
▪ This log record is a response to WRITE(X,t), not OUTPUT(X).
▪ The new value of X is not stored in log records for undo logging.
▪ Why? [we shall see soon]
▪ <T,X,v> (redo logging): Transaction T has changed element X, whose new value is v.
▪ This form of log record is used by the "Redo Logging" mechanism
▪ The old value of X is not stored in log records for redo logging.
▪ Why? [we shall see soon]

Undo Logging: Rules
▪ 𝑈1 : If a transaction T modifies database element X, then the log record of the form <T,X,v> must be written to disk before the updated X is written to disk.
▪ 𝑈2 : If a transaction commits, then its COMMIT log record must be written to disk only after all updated elements have been written to disk.
▪ Summary of 𝑈1 and 𝑈2 : The log records and updates must be written to disk in the following order:
a) Log records for updates
b) Updated elements
c) The COMMIT log record
▪ The ordering of (a) and (b) applies to each element X individually: the log record for X must precede the write of X, but the log records for all elements changed by a transaction need not be written before any element is written.
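
As a rough illustration of the two rules, the sketch below checks whether a given order of disk writes respects them. The event encoding and the function name `check_undo_rules` are hypothetical, and the check assumes every element a transaction changes has its update record on disk before the COMMIT record arrives.

```python
def check_undo_rules(events):
    """Return a list of rule violations for a sequence of disk writes.

    Each event is one of (hypothetical encoding):
      ("log", T, X)    update record <T,X,v> reaches disk
      ("data", T, X)   updated element X reaches disk
      ("commit", T)    <COMMIT T> reaches disk
    """
    logged = set()       # (T, X) pairs whose update record is on disk
    written = set()      # (T, X) pairs whose new value is on disk
    violations = []
    for event in events:
        kind, txn = event[0], event[1]
        if kind == "log":
            logged.add((txn, event[2]))
        elif kind == "data":
            if (txn, event[2]) not in logged:
                violations.append(f"U1: {event[2]} written before <{txn},{event[2]},v>")
            written.add((txn, event[2]))
        elif kind == "commit":
            missing = sorted(x for (t, x) in logged if t == txn and (t, x) not in written)
            if missing:
                violations.append(f"U2: <COMMIT {txn}> before {','.join(missing)} written")
    return violations

good = [("log", "T", "A"), ("log", "T", "B"),
        ("data", "T", "A"), ("data", "T", "B"), ("commit", "T")]
# Per-element ordering is enough: B's log record may follow the write of A.
interleaved = [("log", "T", "A"), ("data", "T", "A"),
               ("log", "T", "B"), ("data", "T", "B"), ("commit", "T")]
bad_u1 = [("data", "T", "A"), ("log", "T", "A"), ("commit", "T")]
bad_u2 = [("log", "T", "A"), ("log", "T", "B"),
          ("data", "T", "A"), ("commit", "T"), ("data", "T", "B")]
```

`good` and `interleaved` both pass, which reflects the note that the (a)-before-(b) ordering applies per element rather than to the transaction as a whole.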

Undo Logging: Rules
▪ Consider the transaction T and its log record processing (step 1 is <START T>; steps 2 to 7 are the six primitive steps of T):
(1) <START T> is written to log.
(4) WRITE(A,t): <T,A,8> is written to log.
(7) WRITE(B,t): <T,B,8> is written to log.
(8) FLUSH LOG flushes the log records to disk.
(9-10) OUTPUT(A) and OUTPUT(B): updated A and B are written to disk.
(11) <COMMIT T> is written to log.
(12) FLUSH LOG flushes the <COMMIT T> log record to disk.

▪ Note that before the <COMMIT T> log record is written to disk, updated A and B must be written to disk [undo logging policy].
Commit of Transactions
▪ What does it mean for a transaction to be complete?
▪ The COMMIT/ABORT record has been written to the log and flushed to disk.
▪ Only then is the user notified of transaction completion (commit/abort).
▪ This is known as the synchronous commit protocol.
▪ Prevents data loss during a system crash. [How?]
▪ What if the COMMIT/ABORT record has been written to the in-memory log only?
▪ The user is notified of transaction completion (commit/abort).
▪ The COMMIT record is only in the memory log, not yet on disk.
▪ This is known as asynchronous commit.
▪ Data loss can happen if the system crashes. [How?]
▪ In both cases, the database remains consistent. [How?]

Recovery Using Undo Logging
▪ Suppose a system failure occurs while a transaction T is running.
▪ Problem: Some changes were written to disk but some were not.
▪ The transaction did not complete.
▪ All partial changes (reflected on disk) need to be reversed.
▪ Solution: The recovery manager uses log records to revert the partial changes made by T.
▪ After all partial changes are reversed, the database returns to a consistent state.

Recovery Using Undo Logging
▪ Step 1: The recovery manager decides whether a transaction T has committed or not.
▪ Step 2a: T is a committed transaction:
▪ If there is a <COMMIT T> log record on disk for transaction T, then T is considered committed.
▪ According to the "Undo Logging" policy, all of its updates have been written to disk.
▪ T has not left the database in an inconsistent state.
▪ The recovery manager need not do anything for a committed transaction.

Recovery Using Undo Logging
▪ Step 2b: T is an uncommitted transaction:
▪ If there is a <START T> but no <COMMIT T> log record on disk for transaction T, then T is considered uncommitted (aborted).
▪ However, some of its updates may have been written to disk.
▪ T is an incomplete transaction and may have left the database in an inconsistent state.
▪ The recovery manager checks all log records of the form <T,X,v>.
▪ The "Undo Logging" policy ensures that if the updated X was written to disk, then the <T,X,v> log record was also written to disk.
▪ The recovery manager reverts each element X to its old value v.
▪ This restores the database to a consistent state.

Recovery Manager Operation
▪ There may be several uncommitted transactions in the log.
▪ The order in which the <T,X,v> records are undone is important.
▪ Algorithm for recovery:
▪ Scan the log records backward from the end and perform the following for each log record 𝑙:
▪ If 𝑙 is <COMMIT T>, then record T as committed.
▪ If 𝑙 is <T,X,v> and T is committed, do nothing.
▪ If 𝑙 is <T,X,v> and T is uncommitted, revert X to its previous value v.
▪ At the end, write <ABORT T> to the log for each uncommitted transaction T, and flush the log to disk. [Why is this needed?]
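
The backward scan can be sketched as follows. The tuple encoding of log records and the function name `undo_recover` are assumptions for illustration. The sketch also treats a transaction with an <ABORT T> record as finished, on the assumption that its changes were already undone by an earlier recovery.

```python
def undo_recover(log, disk):
    """Backward scan over an undo log.

    Restores old values in `disk` for every transaction that has no
    COMMIT or ABORT record, and returns the ABORT records that should
    then be appended to the log and flushed.

    Hypothetical record encoding:
      ("START", T), ("UPDATE", T, X, old_value), ("COMMIT", T), ("ABORT", T)
    """
    finished = set()
    incomplete = []                    # in the order first seen by the scan
    for record in reversed(log):
        kind, txn = record[0], record[1]
        if kind in ("COMMIT", "ABORT"):
            finished.add(txn)
        elif txn not in finished:
            if kind == "UPDATE":
                _, _, x, old = record
                disk[x] = old          # revert X to its previous value
            if txn not in incomplete:
                incomplete.append(txn)
    return [("ABORT", txn) for txn in incomplete]

log = [
    ("START", "T1"), ("UPDATE", "T1", "A", 8), ("COMMIT", "T1"),
    ("START", "T2"), ("UPDATE", "T2", "X", 1), ("UPDATE", "T2", "X", 2),
]
disk = {"A": 16, "X": 3}               # state found on disk after the crash
aborts = undo_recover(log, disk)
```

T2 changed X twice (1 to 2, then 2 to 3). Scanning backward restores 2 first and then 1, which is the value X had before T2 started; a forward scan would leave X at 2, which is why the order matters.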

Crashes During Recovery
▪ What happens if the system crashes again during recovery?
▪ The recovery operation is idempotent.
▪ Remember that <T,X,v> reverts X to its previous value v. What if this is done multiple times?
▪ The effect is the same as if it were done once.
▪ The current value of X does not matter when restoring v; v is simply written over it.
▪ The recovery operation can be repeated multiple times with the same result.
▪ Thus, if the system crashes again during recovery, the recovery process can simply be started again.

Recovery Manager Operation
▪ Analyze the recovery manager's operation for the following cases (step numbers refer to the undo logging example):
▪ System crashes after step 12.
▪ System crashes before step 11.
▪ System crashes before step 6.
▪ System crashes before step 4.
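
One way to explore these cases is to replay the twelve steps up to the crash point and then run undo recovery on whatever reached the disk. The sketch below is an assumption-laden model: names such as `run_until` and `recover` are hypothetical, A=B=8 initially, and the log buffer reaches disk only at the explicit FLUSH LOG steps (a real log manager may flush earlier).

```python
def run_until(crash_before):
    """Execute steps 1..crash_before-1 of the undo-logging example and
    return what survives a crash: (disk, log_on_disk)."""
    disk = {"A": 8, "B": 8}
    mem, log_buf, log_disk = {}, [], []
    t = None

    def flush():                                   # FLUSH LOG
        log_disk.extend(log_buf)
        log_buf.clear()

    def read(x):                                   # READ(X,t), with INPUT(X) if needed
        nonlocal t
        mem.setdefault(x, disk[x])
        t = mem[x]

    def double():                                  # t := t*2
        nonlocal t
        t = t * 2

    def write(x):                                  # WRITE(X,t) plus its undo record
        log_buf.append(("UPDATE", "T", x, mem[x]))
        mem[x] = t

    steps = [
        lambda: log_buf.append(("START", "T")),    # 1
        lambda: read("A"),                         # 2
        double,                                    # 3
        lambda: write("A"),                        # 4
        lambda: read("B"),                         # 5
        double,                                    # 6
        lambda: write("B"),                        # 7
        flush,                                     # 8
        lambda: disk.update(A=mem["A"]),           # 9  OUTPUT(A)
        lambda: disk.update(B=mem["B"]),           # 10 OUTPUT(B)
        lambda: log_buf.append(("COMMIT", "T")),   # 11
        flush,                                     # 12
    ]
    for step in steps[:crash_before - 1]:
        step()
    return disk, log_disk                          # mem and log_buf are lost

def recover(disk, log_disk):
    """Undo every update of a transaction with no COMMIT record on disk."""
    committed = {r[1] for r in log_disk if r[0] == "COMMIT"}
    for r in reversed(log_disk):
        if r[0] == "UPDATE" and r[1] not in committed:
            disk[r[2]] = r[3]
    return disk

# "After step 12" is modelled as a crash before a hypothetical step 13.
outcomes = {k: recover(*run_until(k)) for k in (13, 11, 6, 4)}
```

In this model, only the crash after step 12 leaves the doubled values in place. A crash before step 11 finds both new values on disk but no <COMMIT T>, so both are undone; crashes before steps 6 and 4 find nothing on disk to undo.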

Checkpointing
▪ The log can become very large.
▪ Scanning the entire log after a system crash is expensive.
▪ Use a checkpointing mechanism:
▪ Log records before the checkpoint need not be examined during recovery
▪ All transactions that started before the checkpoint must have a COMMIT or ABORT record in the log.

Checkpointing
▪ Checkpointing Algorithm:
▪ Step 1: Stop accepting new transactions.
▪ Step 2: Wait until all active transactions commit or abort.
▪ Step 3: Flush the logs to the disk.
▪ Step 4: Write a log record <CKPT> and flush this log to disk.
▪ Step 5: Resume accepting new transactions.

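▪ After a system failure, the recovery manager only needs to scan the log backward as far as the most recent <CKPT> record.
▪ Every transaction that started before <CKPT> has a COMMIT or ABORT record on disk.
▪ With undo logging, a COMMIT or ABORT record on disk means the transaction's changes have been either fully written to the database or fully undone, so these transactions need no recovery work.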
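
A small sketch of how the <CKPT> record bounds the scan, assuming the log is a Python list of tuples and the checkpoint record is encoded as `("CKPT",)`; both the encoding and the function name `records_to_examine` are hypothetical.

```python
def records_to_examine(log):
    """Return the part of the log a recovery manager must scan:
    everything after the most recent ("CKPT",) record, or the whole
    log if no checkpoint has been taken."""
    for i in range(len(log) - 1, -1, -1):
        if log[i] == ("CKPT",):
            return log[i + 1:]
    return log

log = [
    ("START", "T1"), ("UPDATE", "T1", "A", 5), ("COMMIT", "T1"),
    ("CKPT",),
    ("START", "T2"), ("UPDATE", "T2", "B", 10),
]
tail = records_to_examine(log)
```

Here only the two records of T2 need to be examined; everything belonging to T1 lies before the checkpoint and can be ignored.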
Nonquiescent Checkpointing
▪ Problem with normal checkpointing:
▪ All new transactions must wait until all existing transactions complete.
▪ This effectively shuts down the database for a long time if some active transactions take a long time to complete.
▪ Solution: Nonquiescent checkpointing
▪ Algorithm for nonquiescent checkpointing:
▪ Step 1: Write a log record <START CKPT(T1,T2,...,Tk)> and flush the log. T1,T2,...,Tk are all the active transactions.
▪ Step 2: Wait until T1,T2,...,Tk finish, but do not prohibit new transactions from starting.
▪ Step 3: When T1,T2,...,Tk have all finished, write a log record <END CKPT> and flush the log.
Recovery With Nonquiescent Checkpointing
▪ Recovery process algorithm:
▪ The recovery manager scans log records backward from the end.
▪ It processes log records as before, reverting changes where necessary.
▪ Two cases determine where the scan can stop:
▪ Case 1: If it first finds an <END CKPT>, then all incomplete transactions began after the matching <START CKPT(T1,T2,...,Tk)>. So the scan can stop when that <START CKPT> record is encountered.
▪ Case 2: If it first finds a <START CKPT(T1,T2,...,Tk)> with no <END CKPT>, the crash occurred during the checkpoint, and it knows that only T1,T2,...,Tk were active when the checkpoint began. So the scan can stop at the earliest <START> record among T1,T2,...,Tk (only those that had not completed before the crash need to be considered).
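
The two stopping cases can be sketched as a function that returns the index of the earliest log record the backward scan must reach. The record encoding and the name `scan_stop_index` are hypothetical, transaction names are assumed unique, and in Case 2 the sketch considers only those of T1,...,Tk that have no COMMIT or ABORT record.

```python
def scan_stop_index(log):
    """Index of the earliest record the backward scan has to examine.

    Hypothetical record encoding:
      ("START", T), ("UPDATE", T, X, old), ("COMMIT", T), ("ABORT", T),
      ("START_CKPT", (T1, ..., Tk)), ("END_CKPT",)
    """
    finished = set()
    seen_end = False
    for i in range(len(log) - 1, -1, -1):
        kind = log[i][0]
        if kind in ("COMMIT", "ABORT"):
            finished.add(log[i][1])
        elif kind == "END_CKPT":
            seen_end = True
        elif kind == "START_CKPT":
            if seen_end:
                return i                            # Case 1: stop at <START CKPT>
            pending = [t for t in log[i][1] if t not in finished]
            if not pending:
                return i
            # Case 2: stop at the earliest <START> of a still-incomplete Ti.
            return min(log.index(("START", t)) for t in pending)
    return 0                                        # no checkpoint: scan everything

full_log = [
    ("START", "T1"), ("UPDATE", "T1", "A", 5),
    ("START", "T2"), ("UPDATE", "T2", "B", 10),
    ("START_CKPT", ("T1", "T2")),
    ("UPDATE", "T2", "C", 15),
    ("START", "T3"), ("UPDATE", "T1", "D", 20),
    ("COMMIT", "T1"), ("UPDATE", "T3", "E", 25),
    ("COMMIT", "T2"), ("END_CKPT",),
    ("UPDATE", "T3", "F", 30),
]
case1 = scan_stop_index(full_log)        # crash after <END CKPT>
case2 = scan_stop_index(full_log[:10])   # crash during the checkpoint
```

In the first run the scan stops at the <START CKPT> record (index 4). In the second, T1 has committed but T2 has not, so the scan continues back to <START T2> (index 2).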
