Unit 6 File Indexing and Transaction Processing
Overview
Heap File Definition:
The simplest type of file organization where records are placed in the
order they are inserted.
Often used with additional access paths like secondary indexes for
efficient searching and retrieval.
Insertion of Records
Efficiency:
Insertion is very efficient: the last disk block of the file is copied into a
buffer, the new record is appended, and the block is rewritten back to the
disk.
Searching, however, requires a linear scan: on average, if exactly one
record satisfies the search condition, half of the file blocks must be read
and searched.
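The two operations above can be sketched in a few lines of Python. This is a minimal in-memory simulation, assuming blocks are fixed-size lists of records; the blocking factor `BFR` and the sample records are illustrative, not from the notes.

```python
# Minimal sketch of heap (unordered) file operations. Blocks are modeled
# as lists of records; BFR is a hypothetical blocking factor.
BFR = 3  # records per block (illustrative value)

def insert(blocks, record):
    """Append to the last block; allocate a new block when it is full."""
    if not blocks or len(blocks[-1]) == BFR:
        blocks.append([])          # start a fresh last block
    blocks[-1].append(record)

def linear_search(blocks, key):
    """Scan block by block; on average reads b/2 blocks for a unique hit."""
    for block_no, block in enumerate(blocks):
        for rec in block:
            if rec == key:
                return block_no    # blocks read = block_no + 1
    return None                    # full scan needed to conclude "not found"

blocks = []
for r in [10, 42, 7, 99, 3, 55]:
    insert(blocks, r)
print(linear_search(blocks, 99))   # found in block 1
```

Note that a failed search must read all b blocks, which is why heap files are usually paired with secondary indexes for retrieval.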
Deleting Records
Procedure:
Locate the block containing the record, copy the block into a buffer,
delete the record from the buffer, and rewrite the block back to the disk.
This leaves unused space in the disk block, leading to wasted storage if
many records are deleted.
Deletion Marker:
Reorganization
Purpose:
Record Structure
Spanned vs. Unspanned Organization:
Unordered files can use spanned (records can span multiple blocks) or
unspanned (records are confined to a single block) organization.
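For unspanned organization, the blocking factor and block count follow directly from the block and record sizes. A small sketch, with illustrative sizes (B, R, and r are assumed values, not taken from the notes):

```python
import math

# Unspanned blocking factor: bfr = floor(B / R), where B is the block
# size and R the fixed record size; b = ceil(r / bfr) blocks hold r records.
B, R, r = 512, 100, 30000          # illustrative sizes in bytes / records
bfr = B // R                       # records that fit in one block
b = math.ceil(r / bfr)             # blocks needed for the whole file
wasted = B - bfr * R               # unused bytes per block (unspanned only)
print(bfr, b, wasted)              # 5 records/block, 6000 blocks, 12 bytes
```

Spanned organization avoids the per-block waste by letting a record continue in the next block, at the cost of extra pointer handling.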
Modification of Records:
Direct Access
Unordered Fixed-Length Records:
If records are numbered 0, 1, 2, ..., r − 1, and records in each block are
numbered 0, 1, ..., bfr − 1 (where bfr is the blocking factor), the i-th
record is located in block ⌊i / bfr⌋ and is the (i mod bfr)-th record in
that block.
Such files are often called relative or direct files due to direct access by
relative positions.
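The addressing rule above is a one-line computation, sketched here in Python:

```python
# Relative (direct) addressing for fixed-length, unspanned records:
# record i lives in block floor(i / bfr) at slot (i mod bfr).
def locate(i, bfr):
    return i // bfr, i % bfr       # (block number, position within block)

print(locate(7, 3))                # record 7 with bfr = 3 -> block 2, slot 1
```

This is what makes such files "direct": the block holding record i can be read in a single access, with no search at all.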
Access Paths:
Overview
Ordered (Sequential) File Definition:
Advantages
Efficient Reading:
Binary Search:
Search Efficiency
Range Search:
Random Access:
Deletion:
Efficiency Improvement:
Record Modification
Nonordering Field:
Ordering Field:
Including Overflow:
Periodic reorganization sorts and merges the overflow file with the
master file, removing records marked for deletion.
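The reorganization step can be sketched as a merge of two sorted sequences that drops records whose deletion marker is set. The record format (key, deleted-flag pairs) and the sample data are illustrative:

```python
import heapq

# Sketch of periodic reorganization: merge the sorted master file with a
# sorted overflow file, discarding records marked for deletion.
def reorganize(master, overflow):
    merged = heapq.merge(master, overflow)        # both inputs sorted by key
    return [(k, d) for k, d in merged if not d]   # drop marked records

master = [(1, False), (4, True), (9, False)]      # (4, True) is marked deleted
overflow = [(2, False), (6, False)]
print(reorganize(master, overflow))
# [(1, False), (2, False), (6, False), (9, False)]
```

Because both files are already sorted, the merge is a single sequential pass, which is what makes periodic (rather than per-insert) reorganization affordable.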
Ordered files are often used with an additional access path called a
primary index, resulting in an indexed-sequential file.
Clustered File:
If the ordering attribute is not a key, the file is called a clustered file.
Summary
Access Time:
Use in Databases:
Key Points
1. Primary Index:
Structure: Index stores values of the index field along with pointers to
disk blocks containing those values, allowing binary search on the
index.
Binary Search: Performed on the index, which is smaller than the data
file, enhancing search efficiency.
Unique values for the ordering key field ensure each record is
distinctly accessible.
2. Clustering Index:
Used when the ordering field is not a key field (multiple records can
have the same value).
Only one clustering index per file is allowed, like the primary index.
Index Entries:
<K(i), P(i)> where K(i) is the primary key and P(i) is the block
pointer.
Example:
Sparse Index: Contains index entries for only some of the search key
values (a primary index is typically sparse, with one entry per block
anchor).
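A lookup through a sparse primary index can be sketched as a binary search over the anchors followed by a scan of one data block. The index and data contents below are illustrative:

```python
import bisect

# Sketch of a sparse primary-index lookup: each entry <K(i), P(i)> holds the
# first key (block anchor) of data block P(i).
index_keys   = [1, 20, 40, 60]        # K(i): anchor key of each data block
index_blocks = [0, 1, 2, 3]           # P(i): pointers to data blocks
data_blocks  = [[1, 5, 12], [20, 25, 33], [40, 41, 58], [60, 70]]

def lookup(key):
    # Binary search for the last anchor <= key, then scan that one block.
    pos = bisect.bisect_right(index_keys, key) - 1
    if pos < 0:
        return False                   # key smaller than every anchor
    return key in data_blocks[index_blocks[pos]]

print(lookup(33), lookup(34))          # True False
```

The binary search touches only the (much smaller) index, so the total cost is roughly log2 of the number of index blocks plus one data-block access.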
Efficiency:
Summary
Primary Index: Efficient for ordered files with unique key fields,
enhancing search performance but complex to manage for insertions
and deletions.
Clustering Index: Used for non-key ordered fields with potential for
multiple records sharing the same field value.
Concept:
Fan-out (fo): The blocking factor of the index; each level divides the
search space by fo rather than the factor of 2 achieved by binary search.
Structure:
Second Level: Primary index for the first level, using block anchors
(one entry per block of the first level).
Calculation:
Advantages:
File Organization:
Search Algorithm:
Finally, read the data file block and search for the record.
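The search algorithm can be sketched end to end for a two-level index: one block access per index level, then a scan of the data block the final pointer names. The level contents below are illustrative:

```python
import bisect

# Sketch of a multilevel-index search over a hypothetical two-level index.
top_keys, top_ptrs = [1, 40], [0, 1]                 # top (second) level
lvl1 = [([1, 20], [0, 1]), ([40, 60], [2, 3])]       # first-level blocks
data_blocks = [[1, 5, 12], [20, 25, 33], [40, 41, 58], [60, 70]]

def search(key):
    # One block access per index level: last anchor <= key at each step.
    i = bisect.bisect_right(top_keys, key) - 1       # pick level-1 block
    if i < 0:
        return False
    keys, ptrs = lvl1[top_ptrs[i]]
    j = bisect.bisect_right(keys, key) - 1           # pick data block
    return j >= 0 and key in data_blocks[ptrs[j]]    # final data-block scan

print(search(33), search(100))                       # True False
```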
Single-User DBMS:
Multiuser DBMS:
Transaction:
Database Items:
Types of Failures:
Key Operations:
Transaction States:
Log Characteristics:
Log Records:
Recovery:
Commit Process:
Force-write log buffer to disk to ensure all log entries are saved.
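The force-write rule can be sketched as a tiny write-ahead log: log records accumulate in a buffer, and commit flushes and fsyncs the log file before the transaction is considered committed. The class name, record format, and file path are all illustrative:

```python
import os
import tempfile

# Sketch of write-ahead logging at commit time: the [commit, T] record and
# everything before it must reach stable storage before commit completes.
class Log:
    def __init__(self, path):
        self.f = open(path, "a")
    def append(self, record):
        self.f.write(record + "\n")    # still buffered in memory
    def commit(self, tid):
        self.append(f"[commit, {tid}]")
        self.f.flush()                 # push the buffer to the OS
        os.fsync(self.f.fileno())      # force-write to stable storage

path = os.path.join(tempfile.mkdtemp(), "wal.log")
log = Log(path)
log.append("[write_item, T1, X, old=5, new=9]")
log.commit("T1")
print(open(path).read().splitlines()[-1])   # [commit, T1]
```

Until `commit` returns, a crash loses at most an uncommitted transaction, which recovery can then undo or simply ignore.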
Policies:
Implementation:
DBMIN allocates buffers based on the estimated needs for each file
instance in a query.
1. Atomicity:
2. Consistency Preservation:
3. Isolation:
4. Durability (Permanency):
Levels of Isolation
Isolation levels define the degree to which a transaction must be isolated
from the data modifications made by other transactions. The isolation
levels, from least isolated to most isolated, include:
Level 0 Isolation:
Level 1 Isolation:
Level 2 Isolation:
Snapshot Isolation:
Summary
The ACID properties—Atomicity, Consistency, Isolation, and Durability—are
critical for ensuring the reliability and integrity of database transactions. By
enforcing these properties, the DBMS can handle concurrent transactions,
maintain consistency, and recover from failures while ensuring that the
database remains in a valid state. Different levels of isolation offer varying
degrees of transaction isolation, balancing between performance and the
strictness of isolation.