
Unit 6: File indexing and Transaction Processing
16.6 Files of Unordered Records (Heap Files)

Overview
Heap File Definition:

The simplest type of file organization where records are placed in the
order they are inserted.

New records are added at the end of the file.

Also known as a heap or pile file.

Often used with additional access paths like secondary indexes for
efficient searching and retrieval.

Insertion of Records
Efficiency:

Very efficient due to the simplicity of appending records.

The last disk block of the file is copied into a buffer, the new record is
appended, and the block is rewritten back to the disk.

The address of the last file block is stored in the file header.
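A minimal Python sketch of this append step, assuming a toy in-memory heap file with a fixed number of records per block (all names here are illustrative, not a real DBMS interface):

```python
# Minimal sketch of inserting a record into a heap (unordered) file.
# HeapFile, BLOCK_CAPACITY, etc. are illustrative assumptions.

BLOCK_CAPACITY = 4  # records per block, kept tiny for the example

class HeapFile:
    def __init__(self):
        self.blocks = [[]]          # list of blocks; each block is a list of records
        self.last_block_addr = 0    # "address" of the last block, as kept in the file header

    def insert(self, record):
        # Copy the last block into a buffer, append the record, and rewrite the block.
        buffer = list(self.blocks[self.last_block_addr])
        if len(buffer) == BLOCK_CAPACITY:
            # Last block is full: allocate a new block and update the header.
            self.blocks.append([])
            self.last_block_addr += 1
            buffer = []
        buffer.append(record)
        self.blocks[self.last_block_addr] = buffer  # rewrite the block back to "disk"

f = HeapFile()
for name in ["Aaron", "Baker", "Chen", "Dube", "Evans"]:
    f.insert({"name": name})
print(f.blocks)   # two blocks: the first full, the second holding the fifth record
```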

Searching for Records


Search Procedure:

Involves a linear search through the file block by block, making it an
expensive operation.

On average, if one record satisfies the search condition, half of the file
blocks must be read and searched.

For a file with \( b \) blocks, this requires searching \( b/2 \) blocks on
average.

If no records or multiple records satisfy the search condition, all \( b \)
blocks must be searched.
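A small sketch of the linear search just described, assuming the file is modeled as a list of blocks, each holding record dictionaries (the helper names are illustrative):

```python
# Sketch of a linear (block-by-block) search over a heap file.
# `blocks` is assumed to be a list of blocks, each a list of record dicts.

def linear_search(blocks, field, value):
    """Return the first record whose `field` equals `value`, scanning block by block."""
    blocks_read = 0
    for block in blocks:              # each iteration models one block access
        blocks_read += 1
        for record in block:
            if record.get(field) == value:
                return record, blocks_read
    return None, blocks_read          # no match: all b blocks were read

blocks = [[{"id": 1}, {"id": 2}], [{"id": 3}, {"id": 4}]]
print(linear_search(blocks, "id", 3))   # found in the 2nd block -> ({'id': 3}, 2)
```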

Deleting Records
Procedure:

Locate the block containing the record, copy the block into a buffer,
delete the record from the buffer, and rewrite the block back to the disk.

This leaves unused space in the disk block, leading to wasted storage if
many records are deleted.

Deletion Marker:

An alternative method involves using an extra byte or bit (a deletion
marker) stored with each record.

A specific value of the marker indicates a deleted record.

Search programs only consider records without the deletion marker.

Reorganization
Purpose:

Periodic reorganization is needed to reclaim unused space from deleted
records.

During reorganization, file blocks are accessed sequentially, and
records are packed by removing deleted records.

Blocks are filled to capacity after reorganization.


Alternative Use of Deleted Space:

Space of deleted records can be reused for new records, requiring
additional bookkeeping to track empty locations.

Record Structure
Spanned vs. Unspanned Organization:

Unordered files can use spanned (records can span multiple blocks) or
unspanned (records are confined to a single block) organization.

Applicable to both fixed-length and variable-length records.

Modification of Records:

Modifying a variable-length record may necessitate deletion and
reinsertion if the modified record does not fit in the original space.

Reading Records in Sorted Order


Creating a Sorted Copy:

Reading records in order of some field values requires creating a sorted
copy of the file.

Sorting large disk files is expensive and requires special external
sorting techniques.

Direct Access
Unordered Fixed-Length Records:

Using unspanned blocks and contiguous allocation, direct access by
record position is straightforward.

If records are numbered 0, 1, 2, ..., r − 1, and records in each block are
numbered 0, 1, ..., bfr − 1 (where bfr is the blocking factor), the i-th
record is located in block ⌊ i / bfr ⌋ and is the (i mod bfr)-th record in
that block.

Such files are often called relative or direct files due to direct access by
relative positions.
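The block-number and offset arithmetic above can be sketched directly; the blocking factor and record numbers below are arbitrary example values:

```python
# Computing the location of the i-th record in a file of unspanned,
# fixed-length records with blocking factor bfr (illustrative sketch).

def locate(i, bfr):
    block_number = i // bfr        # block containing record i   (floor(i / bfr))
    offset_in_block = i % bfr      # position of record i inside that block
    return block_number, offset_in_block

bfr = 3   # assume 3 records fit per block
for i in range(7):
    print(i, locate(i, bfr))
# record 0 -> block 0, slot 0 ... record 6 -> block 2, slot 0
```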

Access Paths:

Direct access by position facilitates constructing access paths such as
indexes, improving search and retrieval efficiency.

16.7 Files of Ordered Records (Sorted Files)

Overview
Ordered (Sequential) File Definition:

Records are physically ordered on disk based on the values of an
ordering field.

If the ordering field is a key field (unique value in each record), it is
called the ordering key.

Often referred to as a sequential file when ordered by an ordering
key.

Advantages
Efficient Reading:

Reading records in order of the ordering key is efficient because no
sorting is required.

Facilitates efficient sequential access and search conditions like
key = value or range conditions such as value1 < key < value2.

Next Record Access:

Finding the next record in order usually requires no additional block
accesses unless the current record is the last in the block.

Binary Search:

Binary search can be used on blocks rather than records, improving
search efficiency.

For a file with \( b \) blocks, a binary search accesses \( \log_2(b) \)
blocks on average, which is better than the linear search requiring
\( \frac{b}{2} \) blocks on average.
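A minimal sketch of binary search on the blocks of a sorted file, assuming each block is a sorted list of key values (a simplification of real records):

```python
from bisect import bisect_left

# Binary search on blocks of an ordered (sorted) file.
# `blocks` is a list of blocks; each block is a sorted list of key values.

def block_binary_search(blocks, key):
    lo, hi = 0, len(blocks) - 1
    accesses = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]          # one block access
        accesses += 1
        if key < block[0]:           # key precedes this block's first record
            hi = mid - 1
        elif key > block[-1]:        # key follows this block's last record
            lo = mid + 1
        else:                        # key, if present, must be in this block
            i = bisect_left(block, key)
            found = i < len(block) and block[i] == key
            return found, accesses
    return False, accesses

blocks = [[2, 5, 8], [11, 14, 17], [20, 23, 26], [29, 32, 35]]
print(block_binary_search(blocks, 23))   # (True, 2): found after two block accesses
```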

Search Efficiency
Range Search:


Efficient due to the contiguous nature of records that satisfy the
search condition.

Random Access:

Provides no advantages for random or ordered access based on
nonordering fields.

Requires a linear search for random access.

Insertion and Deletion


Insertion:

Expensive as records must remain ordered.

Requires finding the correct position and shifting records, which is
time-consuming for large files.

On average, half the records must be moved, involving reading and
rewriting blocks.

Deletion:

Less severe if deletion markers and periodic reorganization are
used.

Efficiency Improvement:

Keep unused space in each block for new records.

Use an overflow or transaction file for new records, periodically
merging it with the main file to maintain order.
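A toy sketch of the overflow (transaction) file technique, assuming keys only and in-memory lists standing in for the main and overflow files:

```python
# Sketch of the overflow (transaction) file technique for a sorted file.
# `main` stays sorted on the ordering key; new records go to `overflow`
# and are merged in periodically. All structures here are illustrative.

main = [10, 20, 30, 40]     # sorted main file (keys only, for brevity)
overflow = []                # unordered overflow file

def insert(key):
    overflow.append(key)     # cheap: append, no shifting of the main file

def lookup(key):
    # A search must consult both the main file and the overflow file.
    return key in main or key in overflow

def reorganize():
    # Periodic reorganization: sort the overflow and merge it into the main file.
    global main, overflow
    main = sorted(main + overflow)
    overflow = []

insert(25)
insert(5)
print(lookup(25), main, overflow)   # True [10, 20, 30, 40] [25, 5]
reorganize()
print(main)                         # [5, 10, 20, 25, 30, 40]
```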

Record Modification
Nonordering Field:

Modified by changing and rewriting the record in its original
location.

Ordering Field:

Modification requires deletion of the old record and insertion of the
modified record due to the position change.

Reading and Reorganization


Sequential Reading:


Efficient if ignoring overflow records.

Double buffering can be used to read blocks consecutively.

Including Overflow:

Requires merging overflow records into the correct positions.

Periodic reorganization sorts and merges the overflow file with the
master file, removing records marked for deletion.

Use Cases and Limitations


Primary Index:

Ordered files are often used with an additional access path called a
primary index, resulting in an indexed-sequential file.

Improves random access time on the ordering key field.

Clustered File:

If the ordering attribute is not a key, the file is called a clustered file.

Summary
Access Time:

Table 16.3 summarizes the average access time, in block accesses, to find
a specific record in a file with \( b \) blocks under each organization.

Use in Databases:

Ordered files are rarely used in databases unless combined with a primary
index.

Key Points

Ordered records provide efficient sequential and range-based access.

Binary search on blocks enhances search efficiency.

Insertions and deletions are costly operations due to the need to
maintain order.

Combining with primary indexes (indexed-sequential files) enhances
performance in database applications.

17.1 Types of Single-Level Ordered Indexes

Introduction to Ordered Indexes


Concept: Similar to a textbook index which lists important terms in
alphabetical order with page numbers for easy lookup.

Purpose: Allows efficient access to records using indexing fields
without scanning the entire file linearly.

Structure: Index stores values of the index field along with pointers to
disk blocks containing those values, allowing binary search on the
index.

Key Fields and Indexing


Indexing Field: Single field used to create an index, storing values in an
ordered manner for efficient searching.

Binary Search: Performed on the index, which is smaller than the data
file, enhancing search efficiency.

Multilevel Indexes: Extensions of binary search that partition the search
space n-ways for better performance.

Types of Ordered Indexes


1. Primary Index:

Defined on the ordering key field of an ordered file.

Unique values for the ordering key field ensure each record is
distinctly accessible.

The index entry consists of the ordering key and a pointer to the
disk block containing the first record with that key value.

2. Clustering Index:

Used when the ordering field is not a key field (multiple records can
have the same value).

The file is clustered based on the ordering field.

Only one clustering index per file is allowed, like the primary index.

3. Secondary Index:

Defined on non-ordering fields.

Allows multiple indexes on different fields to enhance access
efficiency.

Can exist alongside a primary or clustering index.

Primary Indexes
Structure:

The index is itself an ordered file of fixed-length records, each consisting
of a primary key value and a pointer to a disk block.

Each entry corresponds to the first record in a block (the anchor
record).

Index Entries:

<K(i), P(i)> where K(i) is the primary key and P(i) is the block
pointer.

Can point to the physical address, record address, or logical
address of the block or record.

Example:

Index entry: <K(1) = "Aaron, Ed", P(1) = address of block 1>

Sparse vs. Dense Indexes:

Dense Index: Index entry for every search key value.

Sparse Index: Index entry for some search key values (primary
index is typically sparse).

Efficiency:

Smaller index size compared to the data file.

Requires fewer block accesses for search operations.

Example calculation shows the reduction in block accesses from 13
to 6 when using a primary index (a small lookup sketch follows this list).

Insertion and Deletion Issues:

Insertion: Inserting a new record may require moving records and
updating index entries.

Overflow Handling: Use of unordered overflow files or linked lists
of overflow records to manage insertions and deletions.

Deletion: Handled using deletion markers.
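As referenced above, a minimal sketch of a lookup through a sparse primary index with block anchors; the data and index structures are illustrative assumptions, not the textbook's example:

```python
from bisect import bisect_right

# Sketch of a lookup through a sparse primary index.
# index[i] = (K(i), P(i)): the ordering key of the first (anchor) record of
# block i and a "pointer" to that block.

data_blocks = [
    ["Aaron, Ed", "Abbott, Diane", "Acosta, Marc"],
    ["Adams, John", "Adams, Robin", "Akers, Jan"],
    ["Alexander, Ed", "Alfred, Bob", "Allen, Sam"],
]
index = [(block[0], i) for i, block in enumerate(data_blocks)]  # <K(i), P(i)>

def primary_index_lookup(key):
    keys = [k for k, _ in index]
    # Find the last index entry whose anchor key is <= the search key.
    pos = bisect_right(keys, key) - 1
    if pos < 0:
        return None                      # key sorts before the first anchor
    block = data_blocks[index[pos][1]]   # one data-block access
    return key if key in block else None

print(primary_index_lookup("Adams, Robin"))   # found in block 1
print(primary_index_lookup("Zeta, Max"))      # None: not in the file
```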

Summary
Primary Index: Efficient for ordered files with unique key fields,
enhancing search performance but complex to manage for insertions
and deletions.

Clustering Index: Used for non-key ordered fields with potential for
multiple records sharing the same field value.

Secondary Index: Flexible access method for non-ordering fields,
supporting multiple indexes for varied query optimization.

Understanding the different types of single-level ordered indexes and their
applications helps in designing efficient database systems, balancing
search performance with manageability.

17.2 Multilevel Indexes
Purpose: Reduce search space in an index more efficiently than a
binary search.

Concept:

Fan-out (fo): The blocking factor for the index, reducing the search
space faster than binary search (n-ways instead of 2-ways).

Example: With a 4,096-byte block size and entries (4-byte block
pointer + 9-byte key), the fan-out can be up to 315.

Structure:

First Level (Base Level): Index file with distinct values.

Second Level: Primary index for the first level, using block anchors
(one entry per block of the first level).

Subsequent Levels: Each level indexes the previous one, reducing
entries by a factor of fo.

Calculation:

Number of Blocks per Level:

First Level: \( r_1 \) entries.

Second Level: \( r_2 = \lceil \frac{r_1}{fo} \rceil \).

Continue until the entries fit in a single block.

Number of Levels (t): \( t = \lceil \log_{fo} (r_1) \rceil \).
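A small sketch of this calculation, using the block and entry sizes quoted earlier; the first-level entry count r1 below is an assumed example value:

```python
import math

# Sketch: fan-out and number of levels of a multilevel index, using the
# figures quoted above (4,096-byte blocks, 9-byte keys, 4-byte pointers).

block_size = 4096
entry_size = 9 + 4                       # key + block pointer
fo = block_size // entry_size            # fan-out: 315 entries per index block
print("fan-out:", fo)

r1 = 300_000                             # assumed number of first-level entries
levels = []
r = r1
while True:
    blocks = math.ceil(r / fo)           # blocks needed at this level
    levels.append(blocks)
    if blocks == 1:                      # stop once a level fits in one block
        break
    r = blocks                           # the next level indexes these blocks

t = len(levels)                          # equals ceil(log_fo(r1))
print("blocks per level:", levels)       # e.g. [953, 4, 1]
print("levels t:", t, "=", math.ceil(math.log(r1, fo)))
```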

Advantages:

Fewer block accesses compared to binary search.

Example: Searching a multilevel index might require 4 block
accesses compared to 12 in a single-level binary search.

File Organization:

Common in business data processing (e.g., IBM's ISAM).

Indexed sequential file: ordered file with multilevel primary index.

Search Algorithm:

Algorithm 17.1: Searches a nondense multilevel primary index.

Read the top-level block.

Iterate through index levels, updating the pointer (p) to the next block
based on key comparison.

Finally, read the data file block and search for the record.
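A sketch in the spirit of Algorithm 17.1 (not the textbook's exact pseudocode), assuming small hard-coded index levels and data blocks:

```python
from bisect import bisect_right

# Sketch of searching a nondense multilevel primary index.
# index_levels[0] is the single top-level block; each entry is (key, pointer).
# Pointers at the lowest index level refer to data blocks.

index_levels = [
    [(1, 0), (200, 1)],                              # top (second) level
    [[(1, 0), (50, 1), (120, 2)],                    # first-level blocks
     [(200, 3), (260, 4), (300, 5)]],
]
data_blocks = [[1, 10, 40], [50, 75, 110], [120, 150, 180],
               [200, 220, 250], [260, 280, 290], [300, 310, 330]]

def search(key):
    block = index_levels[0]                   # read the top-level block
    for level in range(1, len(index_levels) + 1):
        # Pick the entry with the largest key <= the search key.
        keys = [k for k, _ in block]
        pos = bisect_right(keys, key) - 1
        if pos < 0:
            return None
        p = block[pos][1]
        if level < len(index_levels):
            block = index_levels[level][p]    # descend to the next index level
        else:
            data = data_blocks[p]             # finally read the data file block
            return key if key in data else None

print(search(220))   # 220 (found)
print(search(45))    # None (not in the file)
```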

Dynamic Multilevel Index:

To handle insertions and deletions, dynamic multilevel indexes (e.g.,
B-trees and B+-trees) leave space in blocks and use specific algorithms
for growing and shrinking the index.

20.1 Introduction to Transaction Processing


20.1.1 Single-User versus Multiuser Systems

Single-User DBMS:

Only one user can use the system at a time.

Typically found in personal computer systems.

Multiuser DBMS:

Allows many users to access the system and database
concurrently.

Common in applications like airline reservations, banking, and
supermarkets.

Concurrency in Multiuser Systems:

Achieved through multiprogramming, allowing multiple processes to
execute simultaneously.

Processes are interleaved by the operating system to keep the CPU
busy.

Multiple hardware processors enable parallel processing.

Database Access in Multiuser Systems:

Users or programs constantly access and modify database items
concurrently.

20.1.2 Transactions, Database Items, Read and Write Operations, and DBMS Buffers

Transaction:

Logical unit of database processing.

Includes read, write, insert, delete, or retrieval operations.

Database Items:

Represented as named data items.

Can be of varying granularity, from a whole disk block to a single
attribute value.

Read and Write Operations:

read_item(X): Reads database item X into a program variable.

write_item(X): Writes the value of program variable X into the
database item X.

Database Buffer Management:

Buffers in main memory hold disk pages.

Buffer replacement policies determine which buffers are replaced
when new disk pages are needed.
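A toy sketch of read_item and write_item in terms of a buffer pool; the block layout and helper names are illustrative assumptions, not a real DBMS interface:

```python
# Sketch of read_item(X) / write_item(X) over a toy buffer pool.
# `disk` maps block ids to dicts of named items; all structures are illustrative.

disk = {0: {"X": 100, "Y": 40}, 1: {"Z": 7}}
buffer_pool = {}                 # block_id -> in-memory copy of the block

def block_of(item):
    return 0 if item in disk[0] else 1   # toy "address lookup" for an item

def fetch(block_id):
    if block_id not in buffer_pool:      # block not in a buffer: read it from disk
        buffer_pool[block_id] = dict(disk[block_id])
    return buffer_pool[block_id]

def read_item(X):
    """read_item(X): copy item X from its buffered block into a program variable."""
    return fetch(block_of(X))[X]

def write_item(X, value):
    """write_item(X): update X in the buffer; the block is flushed to disk later."""
    fetch(block_of(X))[X] = value

def flush(block_id):
    disk[block_id] = dict(buffer_pool[block_id])   # write the buffer back to disk

x = read_item("X")        # 100
write_item("X", x - 10)   # updates the buffered copy only
flush(0)                  # eventually the block is written back
print(disk[0]["X"])       # 90
```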

20.1.3 Why Concurrency Control Is Needed

Concurrency Problems:

Lost Update: Interleaved operations overwrite changes made by
another transaction.

Temporary Update (Dirty Read): A transaction reads uncommitted
changes made by another transaction.

Incorrect Summary: Aggregate functions may calculate incorrect
values due to interleaved updates.

Unrepeatable Read: A transaction reads the same item twice and
receives different values due to updates by another transaction.
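A deterministic sketch of the lost update problem, with the interleaving hard-coded and the values arbitrary:

```python
# Simulation of the lost update problem: T1 and T2 both read X, then both
# write it, and T2's write overwrites (loses) T1's update.

X = 80                      # e.g., seats available on a flight

# T1 wants to subtract 5, T2 wants to add 4.
t1_local = X                # T1: read_item(X)
t2_local = X                # T2: read_item(X)   (interleaved before T1 writes)
t1_local -= 5
X = t1_local                # T1: write_item(X)  -> 75
t2_local += 4
X = t2_local                # T2: write_item(X)  -> 84, T1's update is lost

print(X)                    # 84, but the correct serial result would be 79
```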

20.1.4 Why Recovery Is Needed

Types of Failures:

Transaction Failures: Computer/system errors, transaction errors,
concurrency control enforcement.

Disk Failures: Read/write malfunctions, head crashes.

Physical Problems and Catastrophes: Power failure, fire, theft,
sabotage.

Importance of Recovery:

System must recover from failures to ensure database integrity and
consistency.

Recovery techniques are crucial for maintaining database reliability.

20.2 Transaction and System Concepts

20.2.1 Transaction States and Additional Operations


Transaction: An atomic unit of work, either fully completed or not done
at all.

Key Operations:

BEGIN_TRANSACTION: Marks the start of a transaction.

READ/WRITE: Operations on database items within the transaction.

END_TRANSACTION: Marks the end of transaction operations;
checks for commit or abort.

COMMIT_TRANSACTION: Successfully ends the transaction,
making changes permanent.

ROLLBACK/ABORT: Ends the transaction unsuccessfully, undoing
any changes.

Transaction States:

Active: Transaction starts execution and performs READ/WRITE
operations.

Partially Committed: Transaction has ended, but final checks (e.g.,
for serializability) are pending.

Committed: Transaction successfully completes, and changes are
made permanent.

Failed: Transaction encounters an error and must be rolled back.

Terminated: Transaction finishes, and system information is
removed. Failed transactions may restart.

State Transition Diagram:

States: Active, Partially Committed, Committed, Failed, Terminated.

Transitions: Begin, End, Commit, Abort, Read/Write operations.

20.2.2 The System Log


Purpose: Track transaction operations and database changes for
recovery.

Log Characteristics:

Sequential, append-only file on disk.

Log buffers in main memory hold recent entries, written to disk
when filled or needed.

Periodic backups to archival storage.

Log Records:

[start_transaction, T]: Marks the start of transaction T.

[write_item, T, X, old_value, new_value]: Records change of item X


by transaction T.

[read_item, T, X]: Records read of item X by transaction T.

[commit, T]: Marks the successful completion of transaction T.

[abort, T]: Marks the abortion of transaction T.

Recovery:

Undo changes by tracing backward through log records.

Redo operations if a transaction’s updates are recorded but not yet
applied to the database.
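A sketch of an append-only log and a backward UNDO pass for an uncommitted transaction; the log record formats mirror the list above, while the in-memory database and helpers are illustrative:

```python
# Sketch of an append-only system log and a backward UNDO pass for a
# transaction that did not commit.

db = {"X": 100, "Y": 50}
log = []                                  # sequential, append-only

def write_item(T, item, new_value):
    log.append(["write_item", T, item, db[item], new_value])  # old and new values
    db[item] = new_value

log.append(["start_transaction", "T1"])
write_item("T1", "X", 80)
write_item("T1", "Y", 70)
# ... crash before [commit, T1] is written ...

def undo(T):
    # Trace backward through the log, restoring the old values of T's writes.
    for rec in reversed(log):
        if rec[0] == "write_item" and rec[1] == T:
            _, _, item, old_value, _ = rec
            db[item] = old_value

undo("T1")
print(db)   # {'X': 100, 'Y': 50}: T1's effects are undone
```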

20.2.3 Commit Point of a Transaction

Commit Point: Transaction reaches this point when all operations are
successfully executed, and effects are logged.

Commit Process:

Write a commit record [commit, T] to the log.

Force-write log buffer to disk to ensure all log entries are saved.

Ensure transaction effects are permanently recorded in the
database.

Recovery from Failure:

Transactions without a commit record are rolled back.

Transactions with a commit record have their changes redone if
necessary.
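A sketch of the commit point, where the [commit, T] record is appended and the log buffer is force-written before the transaction counts as committed; the structures are illustrative, not a real DBMS interface:

```python
# Sketch of the commit point: the [commit, T] record is appended and the log
# buffer is force-written to "disk" before the transaction counts as committed.

log_buffer = []          # log records still in main memory
log_on_disk = []         # log records already forced to disk

def log_append(record):
    log_buffer.append(record)

def force_write_log():
    log_on_disk.extend(log_buffer)       # flush every buffered log record
    log_buffer.clear()

def commit(T):
    log_append(["commit", T])
    force_write_log()                    # commit point: the record is safely on disk

def is_committed(T):
    # After a crash, only transactions with a [commit, T] record on disk
    # are redone; all others are rolled back.
    return ["commit", T] in log_on_disk

log_append(["start_transaction", "T1"])
log_append(["write_item", "T1", "X", 100, 80])
commit("T1")
print(is_committed("T1"))   # True
```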

20.2.4 DBMS-Specific Buffer Replacement Policies


Purpose: Efficiently manage disk pages in main memory buffers.

Policies:

Domain Separation (DS): Separate buffers for different disk page
types (index, data, log). Basic LRU is used within domains.

Hot Set Method: Keeps frequently accessed pages in memory
during specific operations (e.g., nested-loop joins).

DBMIN Method: Uses the Query Locality Set Model (QLSM) to predict
page references and allocate buffers accordingly.

Implementation:

DS achieves better performance on average, but is static.

Variations like GRU add dynamic load-balancing.

DBMIN allocates buffers based on the estimated needs for each file
instance in a query.

20.3 Desirable Properties of Transactions


In database management, transactions should possess several properties
collectively known as the ACID properties. These properties ensure that
transactions are processed reliably and that the integrity of the database is
maintained. The ACID properties are:

1. Atomicity:

A transaction is an indivisible unit of processing. It should be
executed in its entirety or not executed at all.

Responsibility: Transaction recovery subsystem of a DBMS.

If a transaction fails (e.g., due to a system crash), the recovery
technique must undo any effects of the transaction on the database.
Conversely, write operations of a committed transaction must
eventually be written to disk.

2. Consistency Preservation:

A transaction should transform the database from one consistent
state to another, ensuring that integrity constraints are not violated.

Responsibility: Database programmers and the DBMS module
enforcing integrity constraints.

Definition: A consistent database state satisfies all specified
constraints and conditions. If the database is consistent before a
transaction starts, it should remain consistent after the transaction
completes, provided no interference from other transactions occurs.

3. Isolation:

Transactions should appear to execute in isolation from each other,
even though they may be executing concurrently.

Responsibility: Concurrency control subsystem of the DBMS.

Explanation: Isolation ensures that the execution of a transaction is
not affected by other concurrently executing transactions. One
method to enforce isolation is to prevent a transaction's updates
from being visible to others until it is committed, solving the
temporary update problem and eliminating cascading rollbacks.

4. Durability (Permanency):

Changes applied to the database by a committed transaction must
persist, meaning they must not be lost due to any failure.

Responsibility: Recovery subsystem of the DBMS.

Explanation: Durability ensures that once a transaction commits, its
changes are permanent and will survive subsequent system failures.

Levels of Isolation
Isolation levels define the degree to which a transaction must be isolated
from the data modifications made by other transactions. The isolation
levels, from least isolated to most isolated, include:

Level 0 Isolation:

The transaction does not overwrite dirty reads of higher-level
transactions.

Level 1 Isolation:

Prevents lost updates.

Level 2 Isolation:

Prevents lost updates and dirty reads.

Level 3 Isolation (True Isolation):

Ensures repeatable reads in addition to preventing lost updates and
dirty reads. This means that if a transaction reads a data item, it will
read the same value in subsequent reads within the same
transaction, ensuring that no other transaction can modify the data
item during this period.

Snapshot Isolation:

Based on the idea of providing each transaction with a "snapshot"


of the database at a particular point in time, ensuring that it
operates on a consistent set of data. This approach is discussed in
detail in Section 20.6 and again in Chapter 21, Section 21.4.

Summary
The ACID properties—Atomicity, Consistency, Isolation, and Durability—are
critical for ensuring the reliability and integrity of database transactions. By
enforcing these properties, the DBMS can handle concurrent transactions,
maintain consistency, and recover from failures while ensuring that the
database remains in a valid state. Different levels of isolation offer varying
degrees of transaction isolation, balancing between performance and the
strictness of isolation.
