0% found this document useful (0 votes)
5 views

Hashing and Types of Files

Uploaded by

marumbomwanaisha
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
5 views

Hashing and Types of Files

Uploaded by

marumbomwanaisha
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 28

File Organization in DBMS

+
Index & Hashing in DBMS
GROUP NO 5
Introduction to File Organization

Definition: File organization refers to the way data is


stored in a database.
Importance: Efficient file organization is crucial for
quick data retrieval, efficient storage utilization, and
overall system performance.
Objectives

● To understand the different types of file organization.


● To learn the advantages and disadvantages of each type.
● To know how file organization impacts database
performance.
Types of File Organization

1. Heap (Unordered) File Organization


2. Sequential File Organization
3. Hashing File Organization
4. Clustered File Organization
5. Indexing File Organization
Heap (Unordered) File Organization

Description: Data is stored in the order it is inserted.

Advantages:

● Simple and easy to implement.


● Efficient for bulk loading of data.

Disadvantages:

● Slow retrieval as a linear search is required.


● Inefficient use of storage if many deletions occur.
Sequential File Organization

Description: Data is stored in a sequential order based on a key field.

Advantages:

● Efficient for range queries.


● Easier to implement binary search for fast retrieval.

Disadvantages:

● Insertion, deletion, and updates can be costly.


● Requires reorganization to maintain order.
Hashing File Organization

Description: Uses a hash function to determine the location of data.

Advantages:

● Very fast access for exact match queries.


● Efficient for large databases.

Disadvantages:

● Not suitable for range queries.


● Collisions can affect performance and require additional handling (e.g., chaining, open
addressing).
Clustered File Organization

● Description: Related records are grouped and stored together based


on a clustering field.
● Advantages:
○ Improves performance for related data retrieval.
○ Efficient use of I/O operations.
● Disadvantages:
○ Complexity in managing and maintaining clusters.
○ Can lead to wasted space if clustering is not properly managed.
Indexing File Organization

Description: Uses an index to quickly locate data records without searching the
entire file.

Types of Indexes:

● Single-level Index: Simple index where each entry points to a data block.
● Multi-level Index: Indexes of indexes, useful for very large datasets.
● B-tree Index: Balanced tree structure, widely used in databases.
Indexing File Organization

Advantages:

● Significant speedup in data retrieval.


● Supports both exact match and range queries.

Disadvantages:

● Additional storage for indexes.


● Overhead of maintaining indexes during insertions, deletions, and updates.
Introduction to Indexing and Hashing

Definition: Techniques used to optimize the speed of


data retrieval in a database.
Importance: Critical for enhancing the performance and
efficiency of database queries.
Objectives

● Understand the concepts of indexing and hashing.


● Learn the types and techniques of indexing and hashing.
● Identify the advantages and disadvantages of each
method.
● Explore practical use cases.
What is Indexing?

Description: A data structure that improves the


speed of data retrieval operations on a database
table.
Purpose: To quickly locate and access the data
without searching every row in a database table.
Types of Indexes

1. Primary Index
2. Secondary Index
3. Clustered Index
4. Non-Clustered Index
5. Unique Index
6. Composite Index
Primary Index

● Description: An index on a set of fields that


includes the primary key for the table.
● Advantages: Ensures uniqueness of data.
● Disadvantages: Only one primary index per table.
Secondary Index

Description: An index that is not a primary index and can be


created on non-primary key fields.
Advantages: Allows for efficient access to data based on non-
key attributes.
Disadvantages: Requires additional storage and maintenance.
Clustered Index

Description: Sorts the data rows in the table on their


key values. Only one clustered index per table.
Advantages: Improves performance of range queries.
Disadvantages: Expensive to maintain for insertions
and deletions.
Non-Clustered Index

Description: Contains a sorted list of references to the table


data, separate from the actual table.
Advantages: Multiple non-clustered indexes can exist per
table.
Disadvantages: Slower than clustered index for range
queries.
Advantages of Indexing

● Faster data retrieval.


● Efficient for range queries.
● Improves overall database performance.
Disadvantages of Indexing

● Additional storage space required.


● Overhead for index maintenance during data
modifications.
● Can slow down write operations (insertions, updates,
deletions).
What is Hashing?

Description: A technique to directly map a key to its


location in the storage, using a hash function.
Purpose: To provide constant-time access to data for
exact match queries.
Types of Hashing

1. Static Hashing
2. Dynamic Hashing
Static Hashing

Description: Fixed number of primary pages. Hash


function maps search-key values to the set of pages.
Advantages: Simple and easy to implement.
Disadvantages: Performance degrades as the dataset
grows (overflow pages).
Dynamic Hashing

● Description: The hash function is dynamically


modified to accommodate the growth of the database.
● Advantages: Scalable and handles growing datasets
efficiently.
● Disadvantages: More complex to implement and
manage.
Hash Functions

Definition: Function that converts input into a fixed-


size string of bytes.
Properties: Deterministic, uniform distribution, fast
computation, and minimal collisions.
Handling Collisions

Chaining: Each bucket in the hash table points to a


linked list of records.
Open Addressing: Searches for the next free slot
within the hash table using techniques like linear
probing, quadratic probing, or double hashing.
Advantages of Hashing

● Very fast data retrieval for exact match queries.


● Efficient use of storage space.
● Simple implementation for static hashing.
Disadvantages of Hashing

● Not suitable for range queries.


● Potential for collisions, requiring collision
resolution techniques.
● Dynamic hashing can be complex to implement.

You might also like