
Chapter 17 (TCDS)

Coping With System Failures
Sukarna Barua
Associate Professor, CSE, BUET
03/20/2024
Failure Modes
➢ Many things can go wrong in a database system:
  ▪ Erroneous data entry
  ▪ Media failures
  ▪ Catastrophic failures
  ▪ System failures

Erroneous Data Entry
➢ Examples of erroneous data entry:
  ▪ Mistyping one digit of a phone number
    - Generally impossible for the system to detect, since the result still looks like a valid phone number
  ▪ Omitting a digit from a phone number
    - Can be caught with constraints and triggers
    - Can also be caught by validation in the application front end and back end

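The constraint approach above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `contact` table, the `phone` column, and the 11-digit rule are assumptions made for the example, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rule: a phone number is exactly 11 characters, all of them digits.
conn.execute(
    "CREATE TABLE contact ("
    " phone TEXT CHECK (length(phone) = 11 AND phone NOT GLOB '*[^0-9]*')"
    ")"
)

def try_insert(phone):
    """Return True if the database accepts the phone number, False otherwise."""
    try:
        conn.execute("INSERT INTO contact (phone) VALUES (?)", (phone,))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()
        return False

print(try_insert("01712345678"))  # correct number: accepted
print(try_insert("0171234567"))   # one digit omitted: rejected by the CHECK
print(try_insert("01712345679"))  # one digit mistyped: accepted, error undetected
```

The mistyped number passes because it still satisfies every rule the database knows about, which is why that kind of error cannot be caught by constraints.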
Media failures
➢ Media failures:
  ▪ Local failure of a disk, such as the corruption of one or a few bits.
    - Detectable through parity checks
  ▪ Head crashes, where the entire disk becomes unreadable.
    - RAID can be used to restore the data.
    - Archiving: keep a copy of the database on tape or optical disk.
    - The archive is created periodically:
      - Full archive: the complete database is archived.
      - Incremental archive: only the changes since the previous archive are archived.
    - The archive is stored at a safe location.
    - Online redundant copies of the database can be distributed among several sites.

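As a small illustration of the parity check mentioned above, the sketch below computes a single even-parity bit for a block of bytes. Real disks use stronger codes; the function names and the sample block are made up for this example.

```python
def parity_bit(block: bytes) -> int:
    """Even-parity bit: 1 if the block holds an odd number of 1 bits, else 0."""
    ones = sum(bin(byte).count("1") for byte in block)
    return ones % 2

def is_intact(block: bytes, stored_parity: int) -> bool:
    """Compare the recomputed parity with the parity stored alongside the block."""
    return parity_bit(block) == stored_parity

block = b"database block"
stored = parity_bit(block)

# Flip a single bit in the first byte to simulate a local media failure.
corrupted = bytes([block[0] ^ 0b00000100]) + block[1:]

print(is_intact(block, stored))      # the original block passes the check
print(is_intact(corrupted, stored))  # the single-bit error is detected
```

A single parity bit only detects an odd number of flipped bits; two flipped bits cancel out and go unnoticed.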
Catastrophic Failures
➢ Catastrophic failures:
  ▪ The data storage media are destroyed completely.
  ▪ Events: explosions, fires, or vandalism.
  ▪ Protection:
    - RAID will not help. Why?
    - Archived copies can be used to restore lost data.
    - Online redundant copies can restore lost data.

System failures
➢ Examples of system failures: memory failure, power failure, or operating system failure during the runtime of the database software.
➢ A system failure can leave the database in an inconsistent state:
  ▪ A transaction executes a number of steps to modify the database state. [Credit Account A, Debit Account B]
  ▪ A system failure can cause a transaction to stop at an intermediate step. [Fails after Credit Account A]
  ▪ This leaves the database in an inconsistent state. [Account A is credited, Account B is not debited]
➢ Protection from system failures:
  ▪ Log all database changes.
  ▪ After a failure, the logged changes can be reversed (undone) or reapplied (redone).

What is a transaction?
➢ Transaction: a transaction is a unit of execution of database operations.
  ▪ A transaction consists of one or more SQL statements.
  ▪ Transactions typically include DML (insert/update/delete) statements.
  ▪ A transaction must COMMIT to make its changes permanent in the database.
➢ A transaction
  ▪ starts when a session is created with a database [SQLPLUS, front-end, etc.];
  ▪ includes every query issued in the session as part of the ongoing transaction;
  ▪ ends when a COMMIT or ROLLBACK ("abort") command is issued.
  ▪ A new transaction starts automatically after the COMMIT or ROLLBACK.

Atomicity of Transactions
➢ Atomicity of transactions:
  ▪ A transaction must execute fully or not at all. This is called atomicity.
  ▪ Recall the transaction: Credit Account A, Debit Account B.
  ▪ Either both operations execute or neither executes, so that the database stays in a consistent state.
  ▪ If one executes but not the other, the database state is inconsistent.
  ▪ The transaction manager ensures that all transactions execute atomically.

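The credit/debit transaction can be tried out with Python's built-in sqlite3 module. The sketch below is an illustration only: the `account` table, the starting balances, and the `fail_midway` switch that simulates a failure between the two steps are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 100)])
conn.commit()

def transfer(credit_to, debit_from, amount, fail_midway=False):
    """Credit one account and debit another as a single transaction."""
    try:
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                     (amount, credit_to))
        if fail_midway:
            raise RuntimeError("simulated failure after the credit step")
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                     (amount, debit_from))
        conn.commit()      # both changes become permanent together
        return True
    except RuntimeError:
        conn.rollback()    # the partial credit is undone
        return False

def balances():
    """Return the current balances as a dict, e.g. {"A": ..., "B": ...}."""
    return dict(conn.execute("SELECT name, balance FROM account"))

transfer("A", "B", 30, fail_midway=True)
print(balances())  # unchanged: the credit to A was rolled back
transfer("A", "B", 30)
print(balances())  # A credited and B debited together
```

The failed attempt leaves both balances untouched, which is the "nothing at all" half of atomicity; the successful attempt applies both updates.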
Transaction Manager
➢ Database components responsible for maintaining a consistent database state:
➢ Transaction manager:
  ▪ Ensures atomicity of database transactions.
➢ Log manager:
  ▪ Maintains log records for recovery purposes.
➢ Recovery manager:
  ▪ Recovers data after a crash using log records.
➢ Buffer manager:
  ▪ Manages memory buffers and their reads/writes from/to disk.
➢ Query processor:
  ▪ Executes database queries as received from the transaction manager.

Transaction Manager
➢ Transaction manager:
  ▪ Ensures atomicity of database transactions.
  ▪ Interacts with the query processor to execute the queries in a transaction.
  ▪ Interacts with the log manager to keep log records.
  ▪ Ensures that no two transactions interfere with one another.
  ▪ This property is known as isolation (discussed later).

Database State
➢ Database elements: relations, disk blocks, tuples, etc.
➢ Database state: a database has a state consisting of the values of its elements.
➢ Consistent state:
  ▪ A consistent state satisfies:
    ▪ all explicit database constraints,
    ▪ implicit constraints defined by triggers, and
    ▪ implicit constraints that are in the mind of the database designer.
  ▪ Implicit constraints are hard to enforce using database constraints such as check, primary key, foreign key, etc.
  ▪ Recall the example: credit and debit must happen together. This is an implied business logic constraint.

Correctness Principle
➢ Correctness principle:
  If a transaction executes in the absence of any other transactions or system failures, and it starts with the database in a consistent state, then the database is also in a consistent state when the transaction ends.

Converse to Correctness Principle
➢ Converse to the correctness principle:
  ▪ If only part of a transaction executes, the resulting database state may be inconsistent.
  ▪ A transaction is atomic; it must be executed as a whole or not at all.
  ▪ Transactions that execute simultaneously are likely to lead to an inconsistent state unless their interactions are controlled.
  ▪ Steps must therefore be taken to control the interactions of different transactions.

Read and Write of Database Elements
➢ A transaction reads and updates a database element (e.g., a disk block) as follows:
  ▪ Loads the element from its disk block into a memory buffer.
  ▪ Updates the value of the element in the buffer.
  ▪ The buffer is later written back to disk:
    ▪ This may or may not happen immediately [decided by the buffer manager].
    ▪ To reduce the number of disk I/Os, copying updated buffers to disk may be delayed.

Primitive Operations for Read/Write
➢ Assume X is a database element (e.g., a disk block).
  ▪ INPUT(X): Copy disk block X to a memory buffer.
  ▪ READ(X,t): Copy the value of X from its memory buffer to the transaction's local variable t.
    ▪ If X is not in a memory buffer, INPUT(X) is executed before READ(X,t).
  ▪ WRITE(X,t): Copy the value of t to X in its memory buffer.
    ▪ If X is not in a memory buffer, INPUT(X) is executed first.
  ▪ OUTPUT(X): Copy X from its memory buffer to disk.
➢ READ and WRITE are usually issued by the transaction manager.
➢ INPUT and OUTPUT are usually issued by the buffer manager.

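The four primitives can be mimicked with two Python dictionaries, one standing for the disk and one for the memory buffers. This is a toy model under stated assumptions: the element names and values are made up, and `READ` returns the value rather than filling a variable `t`, since Python has no output parameters.

```python
disk = {"A": 8, "B": 8}   # contents of the disk (values are made up)
buffers = {}              # memory buffers managed by the buffer manager

def INPUT(x):
    """Copy disk block x into a memory buffer."""
    buffers[x] = disk[x]

def READ(x):
    """Return the buffered value of x, fetching it from disk first if needed."""
    if x not in buffers:
        INPUT(x)
    return buffers[x]

def WRITE(x, t):
    """Store t as the new buffered value of x; the disk is not touched."""
    if x not in buffers:
        INPUT(x)
    buffers[x] = t

def OUTPUT(x):
    """Copy the buffered value of x back to disk."""
    disk[x] = buffers[x]

t = READ("A")
WRITE("A", t * 2)
print(buffers["A"], disk["A"])  # buffer already holds the new value, disk the old one
OUTPUT("A")
print(buffers["A"], disk["A"])  # now both agree
```

The gap between `WRITE` and `OUTPUT` is exactly the window in which a crash loses an update that the transaction believes it has made.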
Transactions in Primitive Steps
➢ Assume a transaction T consists of the following two steps:
    A := A*2
    B := B*2
➢ Say that A=B before T executes.
➢ After T executes, A=B must still hold; otherwise the database is not consistent.
➢ Transaction T can be broken down into primitive database steps as follows:
    READ(A,t); t := t*2; WRITE(A,t); READ(B,t); t := t*2; WRITE(B,t);
➢ The buffer manager issues the following commands:
  ▪ INPUT(A) before READ(A,t) and INPUT(B) before READ(B,t).
  ▪ OUTPUT(A) and OUTPUT(B) after the transaction ends.


Transactions in Primitive Steps

➢ Assume A=B before the transaction.
➢ What happens if the system crashes before OUTPUT(A)?
  ▪ The database is consistent, as if T never executed.
➢ What happens if the system crashes after OUTPUT(A) but before OUTPUT(B)?
  ▪ The database is in an inconsistent state: A has been doubled on disk, but B has not.
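The two crash scenarios can be replayed with a short simulation. It is a sketch only: the starting value 8 and the `crash_before` parameter are assumptions made for the example, and a crash is modelled simply as losing the memory buffers.

```python
def run_T(disk, crash_before=None):
    """Run T (A := A*2; B := B*2) and return the disk contents afterwards.

    crash_before may be "OUTPUT(A)", "OUTPUT(B)" or None (no crash).
    """
    disk = dict(disk)                            # work on a copy of the disk
    buffers = {"A": disk["A"], "B": disk["B"]}   # INPUT(A), INPUT(B)

    t = buffers["A"]; t = t * 2; buffers["A"] = t   # READ(A,t); t := t*2; WRITE(A,t)
    t = buffers["B"]; t = t * 2; buffers["B"] = t   # READ(B,t); t := t*2; WRITE(B,t)

    for x in ("A", "B"):
        if crash_before == f"OUTPUT({x})":
            return disk                          # crash: buffers are lost
        disk[x] = buffers[x]                     # OUTPUT(x)
    return disk

start = {"A": 8, "B": 8}
print(run_T(start, crash_before="OUTPUT(A)"))  # nothing reached disk: A = B still holds
print(run_T(start, crash_before="OUTPUT(B)"))  # only A reached disk: A != B
print(run_T(start))                            # no crash: A = B holds again
```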

Log Manager: Log Records

➢ Log manager:
  ▪ Maintains log records.
  ▪ Log records are initially stored in buffers and sent to disk when needed.
  ▪ Helps maintain a consistent database state.

➢ What is a log record?
  ▪ A log record stores what a transaction has done.
  ▪ Stored in non-volatile storage (e.g., disk).
  ▪ Can be used to recover the database after a crash.
  ▪ Log records are initially written to log blocks (memory buffers allocated for logs).
  ▪ Log blocks are written to disk as soon as feasible.

Types of Log Records
➢ <START T>: Transaction T has begun.
➢ <COMMIT T>: Transaction T has completed successfully.
  ▪ Any changes made by T must be stored on disk.
  ▪ What happens if the buffers have not been copied to disk?
  ▪ The log manager ensures the changes will be reflected.
➢ <ABORT T>: Transaction T has not completed successfully. Any changes made by T must not be stored on disk.
  ▪ What if the buffer manager has already written updated buffers to disk?
  ▪ The log manager ensures the changes will not be reflected.
  ▪ Delayed writes may be used to reduce disk I/O.

Types of Log Records

➢ <T, X, v>: Transaction T has changed element X, whose previous value was v.
  ▪ Changes normally occur in memory buffers first.
  ▪ This log record is used by the “Undo Logging” mechanism.
  ▪ This log record is written in response to WRITE(X,t), not OUTPUT(X).
  ▪ The new value of X is not stored in log records for undo logging.
  ▪ Why? [we shall see soon]

➢ <T, X, v>: Transaction T has changed element X, whose new value is v.
  ▪ This log record is used by the “Redo Logging” mechanism.
  ▪ The old value of X is not stored in log records for redo logging.
  ▪ Why? [we shall see soon]

Undo Logging: Rules

➢ U1: If a transaction T modifies database element X, then the log record of the form <T, X, v> must be written to disk before the updated X is written to disk.

➢ U2: If a transaction commits, then its <COMMIT T> log record must be written to disk only after all updated elements have been written to disk.
➢ Summary of U1 and U2: the log records and updates must be written to disk in the following order:
    a) The <T, X, v> log records.
    b) The updated elements X.
    c) The <COMMIT T> log record.
➢ The ordering of (a) and (b) applies to each element X individually [it need not hold for all the elements changed by a transaction taken together].

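Undo Logging: Rules

➢ Consider the transaction T (A := A*2; B := B*2) and its log record processing. Steps 2-7 are the six primitive steps of T; steps 4 and 7 are WRITE(A,t) and WRITE(B,t):
    (1) <START T> is written to the log.
    (4) <T, A, v> is written to the log, where v is the old value of A.
    (7) <T, B, v> is written to the log, where v is the old value of B.
    (8) FLUSH LOG flushes the log to disk.
    (9-10) The updated A and B are written to disk.
    (11) <COMMIT T> is written to the log.
    (12) FLUSH LOG flushes the <COMMIT T> log record to disk.

➢ Note that the updated A and B must be written to disk before the <COMMIT T> log record is written to disk [Undo logging policy].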
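The twelve steps can be replayed in a toy model that records the order in which things reach the disk. This is a sketch under stated assumptions: A and B start at 8 and are taken to be in memory already, log records are plain tuples, and the helper names are invented for the example.

```python
disk = {"A": 8, "B": 8}
buffers = dict(disk)            # assume A and B have already been INPUT
log_buffer, log_disk = [], []   # in-memory log and on-disk log
events = []                     # order in which things reach the disk

def log(record):
    """Append a record to the in-memory log only."""
    log_buffer.append(record)

def flush_log():
    """FLUSH LOG: move every buffered log record to the on-disk log."""
    for record in log_buffer:
        log_disk.append(record)
        events.append(("log", record))
    log_buffer.clear()

def write(x, new_value):
    """WRITE(x,t): log the OLD value of x, then change the buffer."""
    log(("T", x, buffers[x]))
    buffers[x] = new_value

def output(x):
    """OUTPUT(x): copy the buffered value of x to disk."""
    disk[x] = buffers[x]
    events.append(("data", x))

log(("START", "T"))             # step 1
write("A", buffers["A"] * 2)    # steps 2-4
write("B", buffers["B"] * 2)    # steps 5-7
flush_log()                     # step 8
output("A")                     # step 9
output("B")                     # step 10
log(("COMMIT", "T"))            # step 11
flush_log()                     # step 12

for event in events:
    print(event)
```

In the printed order, each undo record precedes the data write for the same element (rule U1), and the COMMIT record comes last (rule U2).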

Commit of Transactions

➢ What does it mean when a transaction is successful?
  ▪ The COMMIT/ABORT record has been written to the log and flushed to disk.
  ▪ Only then is the user notified of transaction completion (commit/abort).
  ▪ This is known as the synchronous commit protocol.
  ▪ Prevents data loss during a system crash. [How?]

➢ What if the COMMIT/ABORT record has been written to the in-memory log only?
  ▪ The user is notified of transaction completion (commit/abort).
  ▪ COMMIT is written only to the in-memory log and is not yet on disk.
  ▪ This is known as asynchronous commit.
  ▪ Data loss can happen due to a system crash. [How?]
  ▪ In both cases, the database remains consistent. [How?]
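At the file level, the difference between the two protocols comes down to whether the log write is forced to the device before the user is told the outcome. The sketch below uses Python's `os.fsync` for the forced write. It is an analogy rather than how any particular DBMS implements commit, and the file name and record strings are made up.

```python
import os
import tempfile

def append_log(path, record, synchronous=True):
    """Append one record to the log file; force it to disk if synchronous."""
    with open(path, "a", encoding="utf-8") as log_file:
        log_file.write(record + "\n")
        if synchronous:
            log_file.flush()              # push Python's buffer to the OS
            os.fsync(log_file.fileno())   # ask the OS to write it to the device
    # Without fsync the record may still sit in an OS cache when we return,
    # so a power failure at this point could lose it.

log_path = os.path.join(tempfile.mkdtemp(), "txn.log")
append_log(log_path, "<COMMIT T1>", synchronous=True)    # safe to confirm to the user
append_log(log_path, "<COMMIT T2>", synchronous=False)   # confirmed before it is durable

with open(log_path, encoding="utf-8") as log_file:
    print(log_file.read().splitlines())
```

Both records are readable while the system stays up; the two calls only differ in what survives a crash immediately after the call returns.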

Recovery Using Undo Logging

➢ Suppose a system failure occurs while a transaction T is running.

➢ Problem: some of T's changes were written to disk but others were not.
  ▪ The transaction did not complete.
  ▪ All partial changes (reflected on disk) need to be reversed.

➢ Solution: the recovery manager uses log records to revert the partial changes made by T.
  ▪ After all partial changes are reversed, the database returns to a consistent state.

Recovery Using Undo Logging

➢ Step 1: The recovery manager decides whether a transaction T has committed or not.

➢ Step 2a: T is a committed transaction:
  ▪ If there is a <COMMIT T> log record on disk for transaction T, then T is considered committed.
  ▪ According to the "Undo Logging" policy, all of T's updates have already been written to disk.
  ▪ T has not left the database in an inconsistent state.
  ▪ The recovery manager does not need to do anything for a committed transaction.

Recovery Using Undo Logging

➢ Step 2b: T is an uncommitted transaction:
  ▪ There is a <START T> but no <COMMIT T> log record on disk for transaction T.
  ▪ T is considered uncommitted (aborted).
  ▪ However, some of its updates may have been written to disk.
  ▪ T is an incomplete transaction and may have left the database in an inconsistent state.
  ▪ The recovery manager checks all log records of the form <T, X, v>.
  ▪ The “Undo Logging” policy ensures that if the updated X was written to disk, then the <T, X, v> log record was also written to disk.
  ▪ The recovery manager reverts the value of each such element X to its old value v.
  ▪ This restores the database to a consistent state.

Recovery Manager Operation

➢ Log record scanning order:
  ▪ There may be several uncommitted transactions in the log.
  ▪ The order of reverting the <T, X, v> records is important! [must be backward from the end]

➢ Algorithm for recovery:
  ▪ Scan the log records backward from the end and perform the following for each log record:
    ▪ If it is <COMMIT T>, then record T as committed.
    ▪ If it is <T, X, v> and T is committed, do nothing.
    ▪ If it is <T, X, v> and T is uncommitted, revert X to its previous value v.
  ▪ At the end, write <ABORT T> to the log for every uncommitted transaction T, and flush the log to disk. [why needed?]
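The recovery algorithm can be written as a short function over an in-memory model of the log. This is a sketch, not a production recovery manager: log records are modelled as tuples `("START", T)`, `("COMMIT", T)`, `("ABORT", T)` and `(T, X, old_value)`, the disk is a dictionary, and flushing the log is left out. As an assumption of this sketch, a transaction that already has an ABORT record is undone again (which is harmless) but does not get a second ABORT record.

```python
def undo_recover(log, disk):
    """Undo every uncommitted transaction; return the transactions newly aborted."""
    committed, aborted, to_abort = set(), set(), []
    for record in reversed(log):             # scan backward from the end
        kind = record[0]
        if kind == "COMMIT":
            committed.add(record[1])
        elif kind == "ABORT":
            aborted.add(record[1])
        elif kind == "START":
            txn = record[1]
            if txn not in committed and txn not in aborted:
                to_abort.append(txn)
        else:                                 # update record (T, X, old_value)
            txn, x, old_value = record
            if txn not in committed:
                disk[x] = old_value           # revert X to its previous value
    for txn in to_abort:
        log.append(("ABORT", txn))            # a real system would now flush the log
    return to_abort

# Crash after OUTPUT(A) but before OUTPUT(B): A was doubled on disk, B was not.
log = [("START", "T"), ("T", "A", 8), ("T", "B", 8)]
disk = {"A": 16, "B": 8}
print(undo_recover(log, disk))   # transactions that were rolled back
print(disk)                      # A is back to its old value
print(log[-1])                   # the ABORT record added by recovery
```

Scanning backward matters when one transaction changes the same element twice: the oldest logged value is applied last, so the element ends up with the value it had before the transaction started.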

Crashes During Recovery

➢ What happens if the system crashes again during recovery?
  ▪ The recovery operation is idempotent.
  ▪ Recall that <T, X, v> reverts X to its previous value v. What if this is done multiple times?
  ▪ The effect is the same as if it were done once.
  ▪ The current value of X does not matter when rewriting v; we simply write v.
  ▪ The recovery operation can be repeated multiple times and the result is the same.
  ▪ Thus, if the system crashes again during recovery, the recovery process can simply be started again.
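Idempotence is easy to check in a toy model: undoing part of the log, "crashing", and then undoing all of it gives the same disk contents as undoing it once. The values and the `(X, old_value)` record shape below are assumptions made for the example.

```python
def undo(records, disk):
    """Apply undo records (X, old_value) in backward order."""
    for x, old_value in reversed(records):
        disk[x] = old_value   # plain overwrite: the current value is irrelevant

records = [("A", 8), ("B", 8)]    # old values logged for an uncommitted T

once = {"A": 16, "B": 16}
undo(records, once)               # recovery runs to completion

twice = {"A": 16, "B": 16}
undo(records[1:], twice)          # recovery undoes only B, then crashes again
undo(records, twice)              # recovery restarts from the end of the log

print(once)
print(once == twice)
```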

Recovery Manager Operation

➢ Analyze the recovery manager's operation for the following cases:
  ▪ The system crashes after step 12.
  ▪ The system crashes before step 11.
  ▪ The system crashes before step 6.
  ▪ The system crashes before step 4.

Checkpointing

➢ The log can become very large.
  ▪ It is expensive to scan the full log after a system crash.
➢ Use a checkpointing mechanism:
  ▪ Log records before the checkpoint need not be examined during recovery.
  ▪ All transactions that started before the checkpoint must have a COMMIT or ABORT record in the log.

Checkpointing

➢ Checkpointing algorithm:
  ▪ Step 1: Stop accepting new transactions.
  ▪ Step 2: Wait until all active transactions commit or abort.
  ▪ Step 3: Flush the log to disk.
  ▪ Step 4: Write a log record <CKPT> and flush the log to disk again.
  ▪ Step 5: Resume accepting new transactions.

➢ After a system failure, the recovery manager only needs to examine log records back to the most recent <CKPT> record.
  ▪ Every transaction T that started before the <CKPT> has a <COMMIT T> or <ABORT T> record on disk.
  ▪ <COMMIT T> ensures that T's changes were written to the database; <ABORT T> ensures that they were undone.

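The effect of a quiescent checkpoint on recovery can be shown with a function that collects the records a backward scan has to look at. The tuple encoding of log records, including `("CKPT",)` for the checkpoint record, is an assumption of this sketch.

```python
def records_to_examine(log):
    """Return the log records recovery must look at, newest first.

    The backward scan stops at the most recent ("CKPT",) record.
    """
    examined = []
    for record in reversed(log):
        if record == ("CKPT",):
            break
        examined.append(record)
    return examined

log = [
    ("START", "T1"), ("T1", "A", 5), ("COMMIT", "T1"),
    ("CKPT",),
    ("START", "T2"), ("T2", "B", 10),
]
for record in records_to_examine(log):
    print(record)
```

Only the two records written by T2 are examined; everything T1 did lies before the checkpoint and is skipped.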
Nonquiescent Checkpointing
➢ Problem with normal (quiescent) checkpointing:
  ▪ All new transactions must wait until all existing transactions complete.
  ▪ This effectively shuts down the database for a long time if some active transactions take a long time to complete.
➢ Solution: nonquiescent checkpointing.
➢ Algorithm for nonquiescent checkpointing:
  ▪ Step 1: Write a log record <START CKPT (T1, ..., Tk)> and flush the log. T1, ..., Tk are all the active transactions.
  ▪ Step 2: Wait until T1, ..., Tk all commit or abort, but do not prohibit new transactions from starting.
  ▪ Step 3: When T1, ..., Tk have all finished, write a log record <END CKPT> and flush the log.

Recovery With Nonquiescent Checkpointing
➢ Recovery process algorithm:
  ▪ The recovery manager scans log records backward from the end.
  ▪ It processes log records as before, reverting changes where necessary.
  ▪ It stops scanning as follows:
    ▪ Case 1: If it first finds an <END CKPT>, then all incomplete transactions began after the matching <START CKPT (T1, ..., Tk)>. So scanning can stop when that <START CKPT (T1, ..., Tk)> is encountered.
    ▪ Case 2: If it first finds a <START CKPT (T1, ..., Tk)> with no <END CKPT> after it, then it knows that, among the transactions that started before the checkpoint, only T1, ..., Tk were active. So the scan can stop at the earliest <START Ti> record among T1, ..., Tk.
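The two stopping cases can be expressed as a function that returns the position of the oldest log record the backward scan has to reach. It is a sketch under assumptions: log records are tuples, `("START CKPT", (T1, ..., Tk))` and `("END CKPT",)` stand for the checkpoint records, and checkpoints do not overlap. The sample log is invented for the example.

```python
def scan_stop_index(log):
    """Return the index of the oldest log record that recovery must examine."""
    seen_end = False
    for i in range(len(log) - 1, -1, -1):      # scan backward from the end
        record = log[i]
        if record[0] == "END CKPT":
            seen_end = True
        elif record[0] == "START CKPT":
            if seen_end:
                return i                       # Case 1: stop at the matching START CKPT
            active = record[1]                 # Case 2: the checkpoint never finished
            starts = [j for j in range(i)
                      if log[j][0] == "START" and log[j][1] in active]
            return min(starts) if starts else i
    return 0                                   # no checkpoint: scan the whole log

log = [
    ("START", "T1"), ("T1", "A", 5), ("START", "T2"), ("T2", "B", 10),
    ("START CKPT", ("T1", "T2")),
    ("T2", "C", 15), ("START", "T3"), ("T1", "D", 20), ("COMMIT", "T1"),
    ("T3", "E", 25), ("COMMIT", "T2"),
    ("END CKPT",),
    ("T3", "F", 30),
]
print(scan_stop_index(log))        # Case 1: crash after END CKPT
print(scan_stop_index(log[:10]))   # Case 2: crash before T2 commits and before END CKPT
```

In the first call the scan stops at the START CKPT record; in the second it has to go back to the start of T1, the earliest of the transactions listed in the checkpoint.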
