Mod 4
What is an Index?
An index in a database is a data structure that enhances the speed of data retrieval operations on a
table. It is akin to the index at the back of a book, allowing you to quickly locate the desired content
without scanning every page.
Purpose of an Index
Caveats:
An entry-sequenced file stores records in the order they are entered, without any logical relationship
between them. A simple index is used to quickly locate records in such files by creating an auxiliary
structure.
1. Structure:
Printed using ChatGPT to PDF, powered by PDFCrowd HTML to PDF API. 1/4
A simple index is a table containing two fields:
Key field: The value used to identify a record (e.g., ID or Name).
Pointer field: A reference to the physical address (record location) in the file.
2. Indexing Process:
The index is created by scanning the file and storing key-pointer pairs for each record in the
index table.
The index is sorted by the key values for faster search.
Benefits:
1. Faster Search:
Instead of scanning the entire file, the index allows direct access to the desired record.
2. Efficient Updates:
New records are simply appended to the data file; on insertion or deletion only the index needs adjustment, so the file itself is never reorganized.
Example:
Key    Pointer
101    Address 1
102    Address 2
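The structure and indexing process described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the `records` list stands in for the entry-sequenced file, list positions stand in for physical addresses, and the names `build_index` and `lookup` are hypothetical.

```python
import bisect

# Hypothetical entry-sequenced file: records kept in arrival order.
# A list position plays the role of the physical address.
records = [
    (102, "Bob"),
    (101, "Alice"),
    (103, "Carol"),
]

def build_index(records):
    """Scan the file once and return (key, pointer) pairs sorted by key."""
    return sorted((key, pos) for pos, (key, _) in enumerate(records))

def lookup(index, records, key):
    """Binary-search the sorted index, then follow the pointer into the file."""
    i = bisect.bisect_left(index, (key,))
    if i < len(index) and index[i][0] == key:
        return records[index[i][1]]
    return None

index = build_index(records)
print(index)
print(lookup(index, records, 101))
```

The file keeps its arrival order; only the index is sorted, which is what makes the binary search possible.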
Limitations:
As file size grows, the index also grows, requiring more memory.
The index must be updated with every record addition or deletion.
When an index becomes too large to fit in memory, efficient techniques are used to manage and access
it without compromising performance. Below are key aspects:
Challenges:
1. Memory Constraints:
Large datasets can lead to index structures that exceed the available memory.
Frequent disk I/O is required, which can slow down query execution.
2. Performance Overhead:
Scanning or updating large indexes stored on disk increases latency.
Techniques:
1. Multi-Level Indexing:
Breaks the index into smaller parts.
Uses a hierarchy, such as a two-level or three-level index, where higher-level indexes fit in
memory, and lower levels reside on disk.
2. B-Tree and B+ Tree Indexes:
Balanced tree structures whose nodes are sized to match disk blocks, so a lookup needs only a few disk reads.
B+ Trees keep all record entries in the leaf nodes and use internal nodes only for routing; the tree stays height-balanced, so every lookup reads the same number of nodes.
3. Partitioned Indexing:
Splits the index into smaller segments based on criteria like range or hash values.
Each partition can be loaded or queried independently.
4. Clustered Indexes:
Store the data physically sorted, reducing the need for large separate indexes.
5. Use of Cache:
Frequently accessed parts of the index are cached in memory.
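As a rough illustration of multi-level indexing, the sketch below splits a sorted index into fixed-size blocks and keeps only the first key of each block in a small top-level index. The block size, the in-memory lists standing in for disk blocks, and the function names are all assumptions made for the example.

```python
import bisect

BLOCK_SIZE = 3  # assumed number of index entries per disk block

def build_two_level(sorted_entries, block_size=BLOCK_SIZE):
    """Split a sorted (key, pointer) index into blocks plus a top-level index."""
    blocks = [sorted_entries[i:i + block_size]
              for i in range(0, len(sorted_entries), block_size)]
    top = [block[0][0] for block in blocks]  # first key of each block
    return top, blocks

def lookup(top, blocks, key):
    """Search the small top level, then scan exactly one lower-level block."""
    b = bisect.bisect_right(top, key) - 1  # last block whose first key <= key
    if b < 0:
        return None
    for k, ptr in blocks[b]:  # stands in for a single disk read
        if k == key:
            return ptr
    return None

entries = [(k, f"addr{k}") for k in range(100, 110)]
top, blocks = build_two_level(entries)
print(top)
print(lookup(top, blocks, 104))
```

Only `top` has to stay in memory; each lookup touches one lower-level block, which is the point of the hierarchy.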
Conclusion:
Disk-based index optimization techniques like B-Trees and multi-level indexing help manage large
indexes, balancing memory usage and query performance.
Indexing for multiple keys allows efficient retrieval of records based on combinations of multiple
columns (keys). This is particularly useful in scenarios where queries involve multiple criteria.
1. Composite Index:
A single index is created on multiple columns.
The index is sorted based on the combination of keys in the specified order.
Useful for queries like `WHERE key1 = X AND key2 = Y`.
2. Separate Indexes on Each Key:
Separate indexes are created for each key.
The query optimizer can combine these indexes, for example by intersecting the row sets each one returns, to filter results efficiently.
Useful for independent queries on each key.
3. Bitmap Index:
Uses bitmaps to represent key values and their presence in rows.
Efficient for combining multiple keys using logical operations like AND, OR.
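The bitmap approach can be sketched with plain Python integers used as bit vectors, where bit i is set when row i holds a given value. The sample rows, column names, and helper functions are hypothetical and chosen only to show the AND/OR combination.

```python
# Hypothetical table; the row number is the bit position in each bitmap.
rows = [
    {"city": "Pune", "age": 30},
    {"city": "Delhi", "age": 25},
    {"city": "Pune", "age": 25},
    {"city": "Delhi", "age": 30},
]

def build_bitmaps(rows, column):
    """Map each distinct value to an int whose bit i is set if row i has it."""
    bitmaps = {}
    for i, row in enumerate(rows):
        bitmaps[row[column]] = bitmaps.get(row[column], 0) | (1 << i)
    return bitmaps

def matching_rows(bitmap):
    """Decode a bitmap back into a list of row numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

city = build_bitmaps(rows, "city")
age = build_bitmaps(rows, "age")

both = city["Pune"] & age[25]    # City = Pune AND Age = 25
either = city["Pune"] | age[25]  # City = Pune OR Age = 25
print(matching_rows(both), matching_rows(either))
```

Each combined condition costs one bitwise operation per machine word, regardless of how many rows match.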
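Given a table with columns `City` and `Age`, a composite index on `(City, Age)` allows efficient lookups on `City` alone or on `City` and `Age` together, but generally not on `Age` alone.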
Benefits:
Considerations:
Composite indexes are order-sensitive; the sequence of keys matters for optimal use.
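To make the order sensitivity concrete, here is a small sketch of a composite index on `(City, Age)` held as a sorted list. The sample data, the pointer labels such as `"r0"`, and the function names are assumptions for illustration; a real database would use a B-tree rather than a Python list.

```python
import bisect

# Hypothetical rows: (city, age, pointer to the record).
people = [
    ("Pune", 30, "r0"),
    ("Delhi", 25, "r1"),
    ("Pune", 25, "r2"),
    ("Delhi", 30, "r3"),
]

entries = sorted(people)                 # ordered by city, then age
keys = [(c, a) for c, a, _ in entries]   # composite keys in index order
ptrs = [p for _, _, p in entries]

def by_city_and_age(city, age):
    """Both columns given: one binary search on the full key."""
    lo = bisect.bisect_left(keys, (city, age))
    hi = bisect.bisect_right(keys, (city, age))
    return ptrs[lo:hi]

def by_city(city):
    """Leading column only: still a contiguous range in the index."""
    lo = bisect.bisect_left(keys, (city,))
    hi = bisect.bisect_right(keys, (city, float("inf")))
    return ptrs[lo:hi]

def by_age(age):
    """Trailing column only: matches are scattered, so every entry is scanned."""
    return [p for (_, a), p in zip(keys, ptrs) if a == age]

print(by_city_and_age("Pune", 25), by_city("Delhi"), by_age(25))
```

Queries on the leading column narrow to a contiguous slice, while a query on `Age` alone has to walk the whole index, which is why the sequence of keys matters.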
A secondary key is a non-primary attribute used to retrieve records from a database. Unlike primary
keys, secondary keys do not uniquely identify a record but allow multiple records to share the same
value. Retrieval using combinations of secondary keys involves accessing records based on multiple non-
primary attributes.
Process of Retrieval:
Example:
Benefits:
1. Efficient Filtering:
Combines indexes to narrow down results without scanning the entire table.
2. Scalability:
Useful for large datasets with complex queries.
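One common way to retrieve by combinations of secondary keys is to keep, for each secondary key, a list of the primary keys that share each value, and then intersect or union those lists. The sketch below assumes that approach; the sample records, field names, and `build_secondary_index` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical records keyed by primary key.
records = {
    1: {"city": "Pune", "dept": "Sales"},
    2: {"city": "Delhi", "dept": "Sales"},
    3: {"city": "Pune", "dept": "HR"},
    4: {"city": "Pune", "dept": "Sales"},
}

def build_secondary_index(records, field):
    """Map each secondary-key value to the set of primary keys that share it."""
    index = defaultdict(set)
    for pk, rec in records.items():
        index[rec[field]].add(pk)
    return index

by_city = build_secondary_index(records, "city")
by_dept = build_secondary_index(records, "dept")

# city = Pune AND dept = Sales: intersect the two primary-key sets.
and_keys = sorted(by_city["Pune"] & by_dept["Sales"])
# city = Delhi OR dept = HR: union the two primary-key sets.
or_keys = sorted(by_city["Delhi"] | by_dept["HR"])

matches = [records[pk] for pk in and_keys]  # fetch only the matching records
print(and_keys, or_keys)
```

The data records are read only after the key sets have been combined, so the table itself is never scanned.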
Considerations: