L2.2-File Organization Techniques
Insertion:
Takes the last disk block into a buffer;
Adds the new record to it;
Rewrites the block back to the disk.
Deletion:
Find the corresponding block;
Copy the block into a buffer;
Delete the record from the buffer;
Searching is not efficient.
It needs scanning the file record by record (linear search).
Deletion leaves unused space in the disk block, resulting in wasted storage space.
The files must be periodically reorganized to reclaim the unused space.
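The operations above can be sketched on a small in-memory model of an unordered file. This is only an illustration: the list-of-blocks layout, the `BLOCK_SIZE` capacity, and all function names are assumptions, not part of the slides.

```python
# Hypothetical model: a file is a list of blocks, a block is a list of records,
# and a record is a (key, payload) tuple. BLOCK_SIZE is an assumed capacity.
BLOCK_SIZE = 3

def heap_insert(blocks, record):
    """Add the record to the last block, starting a new block when it is full."""
    if not blocks or len(blocks[-1]) >= BLOCK_SIZE:
        blocks.append([])
    blocks[-1].append(record)

def heap_search(blocks, key):
    """Linear search: scan every block, record by record."""
    for block_no, block in enumerate(blocks):
        for slot, record in enumerate(block):
            if record is not None and record[0] == key:
                return block_no, slot
    return None

def heap_delete(blocks, key):
    """Mark the slot as unused (None); the space stays wasted until reorganization."""
    position = heap_search(blocks, key)
    if position is None:
        return False
    block_no, slot = position
    blocks[block_no][slot] = None
    return True

def heap_reorganize(blocks):
    """Rebuild the file without the unused slots."""
    packed = []
    for block in blocks:
        for record in block:
            if record is not None:
                heap_insert(packed, record)
    return packed
```

Deleting only marks the slot, which is why the periodic reorganization step is needed to get the space back.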
Ordered files
Also known as a sequential file
Records in a file can be sorted based on the values of one or more fields called the
ordering field
2. If the value of the key field in the first record on the page is greater than the required value, the required value, if it
exists, occurs on an earlier page.
Therefore, we repeat the above steps using the lower half of the file as the new search area.
3. If the value of the key field in the last record on the page is less than the required value, the required value occurs on a
later page, and so we repeat the above steps using the top half of the file as the new search area.
Using binary search, half the search space is eliminated with each page retrieved.
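A minimal sketch of this page-level binary search follows, assuming the file is modelled as a list of non-empty pages, each holding sorted key values. The function name and the `pages_read` counter are illustrative assumptions.

```python
def binary_search_pages(pages, key):
    """Search an ordered file page by page.

    pages: list of non-empty pages; each page is a sorted list of keys and
    the pages themselves are in key order (an assumed in-memory model).
    Returns (page_number or None, number_of_pages_read).
    """
    low, high = 0, len(pages) - 1
    pages_read = 0
    while low <= high:
        mid = (low + high) // 2
        page = pages[mid]          # "retrieve" the middle page
        pages_read += 1
        if key < page[0]:          # first record is greater: go to earlier pages
            high = mid - 1
        elif key > page[-1]:       # last record is smaller: go to later pages
            low = mid + 1
        else:                      # key falls within this page's range
            return (mid if key in page else None), pages_read
    return None, pages_read
```

With five pages, at most three pages are read before the search area is exhausted, which reflects the halving described above.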
But searching on the non-ordering field values is the same as in the unordered file.
Deletion can be done efficiently.
Insertion of a record with key k:
Find the position to insert the record with k;
Make space for the record with k;
Unsorted array A:
Value: 8 5 12 6 15 9 4 3 7 10
Index: 0 1 2 3 4 5 6 7 8 9
Sorted array A:
Value: 3 4 5 6 7 8 9 10 12 15
Index: 0 1 2 3 4 5 6 7 8 9
A technique that runs in constant time, O(1), is needed; hence the hashing technique.
Hash files
Hashing provides a function h, hash function which is applied to the hash field value of
a record and yields the address of the disk block in which the record is stored.
The base field is called the hash field, or if it is also a key of the file, it is called the hash
key.
Records in a hash file appear to be randomly distributed across the available space
Character strings are converted into integers before the function is applied using some
Hashing O(1)
Each element is placed at the index equal to its own value, so that 8 is stored at index 8.
Index: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
A:     - - - 3 4 - - - 8 - 10 -  -  13 -
Space is wasted.
A mathematical model is developed to improve this technique.
Hashing
Hash Function
A hash function maps the key space to the hash table, for example h(x) = x.
To avoid wasting space, the hash function is improved using different methods (hash algorithms):
h(x) = x % 10, where 10 is the size of the hash table.
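As a small illustration, the sketch below applies the modulo hash function to the keys 3, 4, 8, 10 and 13 used earlier. The helper name `bucket_keys` is hypothetical.

```python
def bucket_keys(keys, table_size):
    """Group keys by their hash address h(x) = x % table_size."""
    table = {}
    for key in keys:
        address = key % table_size
        table.setdefault(address, []).append(key)
    return table

# Keys 3 and 13 map to the same address, which is a collision.
addresses = bucket_keys([3, 4, 8, 10, 13], 10)
```

The table now needs only 10 slots, but two keys share address 3, so a collision-handling technique is required.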
Open addressing
Chained overflow
Multiple hashing
Unchained overflow
Open addressing
If a collision occurs, the system performs a linear search to find the first available slot to insert the new record.
When the last bucket has been searched, the system starts back at the first bucket.
Searching for a record employs the same technique used to store a record, except that the
record is considered not to exist when an unused slot is encountered before the record is
located
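A minimal sketch of open addressing with a linear search for the next free slot is shown below. It assumes the table is a fixed-size Python list with `None` marking unused slots and uses h(x) = x % table size; the function names are illustrative.

```python
def probe_insert(table, key):
    """Store key at its hash address, or at the next free slot after it.

    Returns the slot used, or None when the table is full.
    """
    size = len(table)
    start = key % size
    for step in range(size):
        slot = (start + step) % size   # wraps back to the first bucket
        if table[slot] is None:
            table[slot] = key
            return slot
    return None

def probe_search(table, key):
    """Follow the same probe sequence; an unused slot means the key is absent."""
    size = len(table)
    start = key % size
    for step in range(size):
        slot = (start + step) % size
        if table[slot] is None:
            return None
        if table[slot] == key:
            return slot
    return None
```

Note that this sketch does not handle deletion, which would need extra care so that later searches do not stop early at a freed slot.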
In case of a collision, insert the record into the next available slot.
[Figure: hash table with slots 0 to 10]
Chained Overflow
An overflow area is maintained for collisions that cannot be placed at the hash address.
With the chained overflow technique, each bucket has an additional field, sometimes
called a synonym pointer, that indicates whether a collision has occurred, and if so,
Minimize collision
Easy to calculate
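The chained overflow technique described earlier can be sketched as follows. This is an assumption-laden illustration: each bucket is modelled as a dictionary with a `synonym` field holding the position of the first colliding record in a separate overflow list, and overflow entries link to each other in the same way. None of these names come from the slides.

```python
def new_table(size):
    """Create empty buckets; 'synonym' is None while no collision has occurred."""
    return [{"key": None, "synonym": None} for _ in range(size)]

def chained_insert(buckets, overflow, key):
    """Place key at its hash address, or chain it into the overflow area."""
    bucket = buckets[key % len(buckets)]
    if bucket["key"] is None:
        bucket["key"] = key
        return
    overflow.append({"key": key, "synonym": None})
    new_position = len(overflow) - 1
    entry = bucket
    while entry["synonym"] is not None:    # walk to the end of the chain
        entry = overflow[entry["synonym"]]
    entry["synonym"] = new_position

def chained_search(buckets, overflow, key):
    """Check the home bucket, then follow the synonym pointers."""
    entry = buckets[key % len(buckets)]
    while True:
        if entry["key"] == key:
            return True
        if entry["synonym"] is None:
            return False
        entry = overflow[entry["synonym"]]
```

Compared with open addressing, colliding records never occupy another key's home bucket, at the cost of the extra pointer field.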
It is not sufficient simply to scatter the records that represent tuples of a relation among various
blocks
Example 1:
SELECT * FROM R
If the tuples of relation R are scattered across different blocks, we would have to examine every block in
the storage system to find them.
A better idea is to reserve some blocks, perhaps several whole cylinders, for relation R.
Now, at least we can find the tuples of R without scanning the entire data store
Indexing
Reserving some blocks, perhaps several whole cylinders, for relation R may not work with queries
Example 2:
This query specifies a value for attribute a, hence each record would have to be compared against that value.
An index is any data structure that takes the value of one or more fields and finds the records with
Each index file associates values of the search key with pointers to data-file records that have that value
for the attribute(s) of the search key.
Classification of an Index
Dense Index
An index entry is created for every search key value (every record in the data file).
Sparse Index
An index entry is created for only some of the search key values.
A dense index is a sequence of blocks holding only the keys of the records and pointers
to the records themselves. The dense index supports queries that ask for records with a
given search key value.
Example:
Given key value K, we search the index blocks for K, and when we find it, we follow the associated pointer to the
record with key K.
It might appear that we need to examine every block of the index, or half the blocks of the index, on
average, before we find K.
However, there are several factors that make the index-based search more efficient than it seems.
Factors that make index based search more efficient include:
1. The number of index blocks is usually small compared with the number of data blocks.
2. Since keys are sorted, we can use binary search to find K. If there are n blocks of the index,
we only look at log₂ n of them.
3. The index may be small enough to be kept permanently in main memory buffers. If so, the
search for key K involves only main-memory accesses, and there are no expensive disk I/O’s
to be performed.
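The dense-index lookup described above can be sketched with Python's standard `bisect` module. The sketch assumes the index is held in main memory as two parallel lists (sorted keys and record pointers) and that a "pointer" is simply the record's position in the data file; both function names are hypothetical.

```python
import bisect

def build_dense_index(records):
    """One index entry per record: (key, pointer). Records are assumed sorted by key."""
    keys = [record[0] for record in records]
    pointers = list(range(len(records)))
    return keys, pointers

def dense_lookup(keys, pointers, k):
    """Binary search the sorted keys for K and return the associated pointer."""
    i = bisect.bisect_left(keys, k)
    if i < len(keys) and keys[i] == k:
        return pointers[i]
    return None
```

For example, with records keyed 10, 20, 30 and 40, looking up 30 returns pointer 2, and looking up 25 returns `None` because a dense index has an entry for every key that exists.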
Classification of an Index
Dense Index
Multilevel index
B-tree
B+ tree
Types of Indexes
Single-level index
Index and data communicate directly
[Figure: an index entry (200, BP1) pointing directly to a data block]
Multilevel index
The index is broken down into several indices.
Single level indexes
Single-level index
Primary Indexing
file.
There is one index entry for each block in the data file, so the number of index entries equals the number of disk blocks.
Each index entry has the value of the primary key field for the first record in a block.
The first record in each block of the data file, for which the index entry is created, is known as the block anchor.
Primary Index: a fixed-length index with 2 fields:
• Ordering field (primary key)
• Pointer (disk block address)
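A primary index of this kind can be sketched as a list of (first key in block, block number) pairs. The in-memory block layout and the function names below are assumptions made for illustration only.

```python
import bisect

def build_primary_index(blocks):
    """One entry per data block: (key of the block's first record, block number)."""
    return [(block[0][0], block_no) for block_no, block in enumerate(blocks)]

def primary_lookup(blocks, index, key):
    """Find the last index entry whose key is <= the search key, then scan that block."""
    first_keys = [first_key for first_key, _ in index]
    i = bisect.bisect_right(first_keys, key) - 1
    if i < 0:
        return None                       # key is smaller than every first key
    block_no = index[i][1]
    for record in blocks[block_no]:
        if record[0] == key:
            return record
    return None
```

Only one data block is examined per lookup, because the ordering of the file guarantees that the record, if it exists, lies in the block selected by the index.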
The total number of entries in the index is equal to the total number of disk
blocks
A clustered index is created to retrieve records that have the same value
for the clustering field.
A clustered index contains the value and a pointer to the first block in the
data file that has a record with that value for its clustering field.
Single level indexes
Secondary Index
Provides a secondary means of accessing data files for which a primary means
already exists.
Data file records could be ordered, unordered, or hashed.
The secondary index could be created on a candidate key with unique values
in every record or a non key field with duplicate values.
The secondary index is an ordered file with two fields:
1. Indexing field (same datatype as some non-ordering field)
2. Block pointer or a record pointer
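The two-field structure above can be sketched for the case of a non-key field with duplicate values. The sketch assumes an unordered data file modelled as a list of dictionaries and uses record positions as record pointers; the field name `dept` and the function names are hypothetical.

```python
import bisect

def build_secondary_index(records, field):
    """Ordered list of (indexing field value, record pointer), one entry per record."""
    return sorted((record[field], position) for position, record in enumerate(records))

def secondary_lookup(records, index, value):
    """Return every record whose indexing field equals value."""
    values = [v for v, _ in index]
    low = bisect.bisect_left(values, value)
    high = bisect.bisect_right(values, value)
    return [records[position] for _, position in index[low:high]]
```

Because the data file is not ordered on the indexing field, matching records may sit anywhere in the file, so each pointer may lead to a different block.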
consecutive blocks.
For example, to retrieve all the records with search key 20, we
not only have to look at two index blocks, but we are sent by
a primary index
Multilevel Indexes
A multilevel index reduces the search space at each step by the blocking factor of the index (number of
records per block), an improvement over binary search, which reduces the search space by a factor of
2.
The blocking factor is also known as the fan-out of the multi-level index.
Multi-level index considers the index file, that is the first level (or base level) as an
level. This index to the first level is called the second level of the multi-level index.
Since second level is a primary index, we can use block anchors such that the second
level has one entry for each block of the first level.
The process is repeated until all entries of some level t fit in a single block called the
top index.
Each level reduces the number of entries by a factor of the index fan-out.
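The repeated construction of levels can be sketched as a short calculation. The function name and the sample figures (30,000 first-level entries, fan-out 68) are assumptions chosen for illustration, not values from the slides.

```python
def index_levels(first_level_entries, fan_out):
    """Return the number of blocks at each level, from the first level to the top.

    Each level holds one entry per block of the level below it, and each
    block holds fan_out entries.
    """
    if first_level_entries < 1 or fan_out < 2:
        raise ValueError("need at least one entry and a fan-out of 2 or more")
    levels = []
    entries = first_level_entries
    while True:
        blocks = -(-entries // fan_out)   # ceiling division
        levels.append(blocks)
        if blocks == 1:                   # this level fits in one block: top index
            return levels
        entries = blocks                  # next level: one entry per block

# Assumed example: 30,000 first-level entries with a fan-out of 68.
example_levels = index_levels(30000, 68)
```

In this assumed example the levels need 442, 7 and 1 blocks, so t = 3 and a search reads one block per level.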
Multi-level Index
Example: