FS Lecture
Indexed Sequential Access
• Up to this point, we have had to choose between
viewing a file from an indexed point of view or from a
sequential point of view.
• Here, we are looking for a single organizational method
that provides both of these views simultaneously.
• Why care about obtaining both views simultaneously? If
an application requires both interactive random access
and cosequential batch processing, both sets of actions
have to be carried out efficiently (e.g., a student record
system at a university).
Maintaining a Sequence Set: The
Use of Blocks II
• Using blocks, we can thus keep a sequence set in order
by key without ever having to sort the entire set of
records.
• However, there are certain costs associated with this
approach:
– A blocked file takes up more space than an
unblocked file because of internal fragmentation.
– The order of the records is not necessarily physically
sequential throughout the file. The maximum
guaranteed extent of physical sequentiality is within a
block.
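The idea of keeping the sequence set ordered by reorganizing one block at a time can be sketched as follows. This is a minimal illustration, not the lecture's own code: `BLOCK_CAPACITY`, `insert_record`, and the sample names are assumptions, and a Python list of lists stands in for blocks linked in logical order on disk.

```python
import bisect

BLOCK_CAPACITY = 4  # hypothetical: maximum number of records per block


def insert_record(blocks, key):
    """Insert key into a list of sorted blocks, splitting a block on overflow.

    The order of `blocks` stands in for the logical (linked) order of the
    blocks; it says nothing about their physical position in the file.
    """
    if not blocks:
        blocks.append([key])
        return
    # Choose the last block whose first key is <= key (or the first block).
    firsts = [block[0] for block in blocks]
    i = max(bisect.bisect_right(firsts, key) - 1, 0)
    bisect.insort(blocks[i], key)
    if len(blocks[i]) > BLOCK_CAPACITY:
        mid = len(blocks[i]) // 2
        # Only this one block is reorganized; the file is never sorted as a whole.
        blocks[i:i + 1] = [blocks[i][:mid], blocks[i][mid:]]


blocks = []
for name in ["DAVIS", "ADAMS", "EVANS", "BAIRD", "CARTER", "BIXBY", "FOLKS"]:
    insert_record(blocks, name)
```

After these insertions the first block holds three records in a four-record block, which is the internal fragmentation cost mentioned above.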
Maintaining a Sequence Set: The
Use of Blocks III
• An important aspect of using blocks is the choice
of a block size. There are two considerations to keep
in mind when choosing a block size:
– The block size should be such that we can hold
several blocks in memory at once
– The block size should be such that we can
access a block without having to bear the cost
of a disk seek within the block read or block
write operation.
Adding a Simple Index to the
Sequence Set
• Each of the blocks we created for our Sequence Set
contains a range of records that might contain the
record we are seeking.
• We can construct a simple single-level index for these
blocks.
• The combination of this kind of index with the
sequence set of blocks provides complete indexed
sequential access. This method works well as long as
the entire index can be held in memory.
• If the entire index cannot be held in memory, then we
can use a B+ Tree, which is a B-Tree index plus a
sequence set that holds the records.
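A single-level, in-memory index over the blocks might look like the sketch below. It assumes the index stores the highest key of each block; the sample blocks and the helper name `find` are hypothetical.

```python
import bisect

# Hypothetical sequence set: each block is a sorted list of keys.
blocks = [
    ["ADAMS", "BAIRD", "BIXBY"],
    ["BOONE", "BYNUM", "CARTER"],
    ["DAVIS", "EMBRY", "EVANS"],
]
# Single-level index held in memory: the highest key of each block.
index = [block[-1] for block in blocks]


def find(key):
    """Return (block_number, found) after a binary search of the index."""
    i = bisect.bisect_left(index, key)
    if i == len(index):
        return None, False  # key is larger than every key in the file
    # In a real file this is the single block read needed to fetch the record.
    return i, key in blocks[i]
```

Indexed access goes through `find`; sequential access simply walks `blocks` in order, which is what gives both views at once.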
The Content of the Index:
Separators Instead of Keys
• The index serves as a kind of road map for the
sequence set ==> We do not need to have keys in
the index set.
• What we really need are separators capable of
distinguishing between two blocks.
• We can save space by using variable-length
separators and placing the shortest separator in the
index structure.
• Rules are:
– Key < separator ==> Go left.
– Key = separator ==> Go right.
– Key > separator ==> Go right.
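One way to derive a shortest separator and apply the rules above is sketched here. The function names and sample keys are assumptions made for illustration; the separator is taken as the shortest prefix of the first key in the right-hand block that is still greater than the last key in the left-hand block.

```python
import bisect


def shortest_separator(last_left, first_right):
    """Shortest prefix of first_right that compares greater than last_left."""
    for n in range(1, len(first_right) + 1):
        prefix = first_right[:n]
        if prefix > last_left:
            return prefix
    raise ValueError("keys must be distinct and in ascending order")


# Hypothetical (first key, last key) pairs for six consecutive blocks.
ranges = [("ADAMS", "BERNE"), ("BOLEN", "CAGE"), ("CAMP", "DUTTON"),
          ("EMBRY", "EVANS"), ("FABER", "FOLK"), ("FOLKS", "GADDIS")]
separators = [shortest_separator(ranges[i][1], ranges[i + 1][0])
              for i in range(len(ranges) - 1)]


def choose_block(key):
    """Key < separator goes left; key >= separator goes right."""
    return bisect.bisect_right(separators, key)
```

`bisect_right` places a key that equals a separator to its right, which matches the "Key = separator ==> Go right" rule.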
The Simple Prefix B+ Tree
• The separators we just identified can be formed
into a B-Tree index of the sequence set blocks and
the B-Tree index is called the index set.
• Taken together with the sequence set, the index set
forms a file structure called a simple prefix B+
Tree.
• “simple prefix” indicates that the index set
contains shortest separators, or prefixes of the
keys rather than copies of the actual keys.
Simple Prefix B+ Tree
Maintenance
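• Changes localized to single blocks in the sequence set:
Make the changes in the sequence set only. As long as no block
is split, merged, or redistributed, the existing separators still
distinguish the blocks and the index set is unchanged.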
• Changes involving multiple blocks in the sequence set:
– If blocks are split in the sequence set, a new separator
must be inserted into the index set
– If blocks are merged in the sequence set, a separator
must be removed from the index set.
– If records are re-distributed between blocks in the
sequence set, the value of a separator in the index set
must be changed.
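The split case can be sketched as below. This is a simplified illustration under stated assumptions: the index set is modeled as a flat sorted list of separators rather than a B-Tree, and `CAPACITY`, `insert`, and the sample keys are hypothetical. Merging and redistribution would adjust `separators` in the analogous way (remove one entry, or recompute one entry).

```python
import bisect

CAPACITY = 3  # hypothetical: maximum number of keys per sequence set block


def shortest_separator(last_left, first_right):
    """Shortest prefix of first_right that compares greater than last_left."""
    for n in range(1, len(first_right) + 1):
        if first_right[:n] > last_left:
            return first_right[:n]
    raise ValueError("keys must be distinct and in ascending order")


def insert(blocks, separators, key):
    """Insert key; on overflow, split the block and add one separator."""
    i = bisect.bisect_right(separators, key)  # key >= separator goes right
    bisect.insort(blocks[i], key)
    if len(blocks[i]) > CAPACITY:
        mid = len(blocks[i]) // 2
        left, right = blocks[i][:mid], blocks[i][mid:]
        blocks[i:i + 1] = [left, right]
        # separators[i] now divides the two halves of the split block.
        separators.insert(i, shortest_separator(left[-1], right[0]))


blocks = [["ADAMS", "BAIRD", "BIXBY"], ["BOONE", "BYNUM", "CARTER"]]
separators = ["BO"]
insert(blocks, separators, "BENSON")
```

Inserting "BENSON" overflows the first block, so the block is split and a new separator is added in front of the existing one.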
Index Set Block Size
• The physical size of a node for the index set is usually the same
as the physical size of a block in the sequence set. We therefore
speak of index set blocks, rather than nodes.
• There are a number of reasons for using a common block size for
the index and sequence sets:
– The block size for the sequence set is usually chosen because
there is a good fit among this block size, the characteristics of
the disk drive, and the amount of memory available.
– A common block size makes it easier to implement a
buffering scheme to create a virtual simple prefix B+ Tree.
– The index set blocks and sequence set blocks are often
mingled within the same file to avoid seeking between two
separate files while accessing the simple prefix B+ Tree.
Internal Structure of Index Set
Blocks: A Variable-Order B-Tree
• Given a large, fixed-size block for the index set, how
do we store the separators within it?
• There are many ways to combine the list of
separators, the index to separators, and the list of
Relative Block Numbers (RBNs) into a single index
set block.
• One possible approach stores a separator count and
the total length of the separators in the block, together
with the separators, the index to them, and the RBNs.
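One possible byte layout for such a block is sketched below. The field widths, the field order, and the function names are assumptions chosen for illustration, not a layout prescribed by the lecture: a 2-byte separator count, a 2-byte total separator length, the concatenated separators, a 2-byte offset per separator, and a 4-byte RBN for each of the N + 1 children.

```python
import struct


def pack_index_block(separators, rbns):
    """Pack N separators and N + 1 relative block numbers into bytes."""
    if len(rbns) != len(separators) + 1:
        raise ValueError("need exactly one more RBN than separators")
    encoded = [s.encode("ascii") for s in separators]
    body = b"".join(encoded)
    offsets, pos = [], 0
    for e in encoded:
        offsets.append(pos)  # index to separators: start of each one
        pos += len(e)
    header = struct.pack("<HH", len(encoded), len(body))
    return (header + body
            + struct.pack(f"<{len(offsets)}H", *offsets)
            + struct.pack(f"<{len(rbns)}I", *rbns))


def unpack_index_block(data):
    """Recover the separators and RBNs from a packed block."""
    count, total = struct.unpack_from("<HH", data, 0)
    body = data[4:4 + total]
    offsets = struct.unpack_from(f"<{count}H", data, 4 + total)
    rbns = struct.unpack_from(f"<{count + 1}I", data, 4 + total + 2 * count)
    ends = list(offsets[1:]) + [total]
    seps = [body[a:b].decode("ascii") for a, b in zip(offsets, ends)]
    return seps, list(rbns)


packed = pack_index_block(["BO", "CAM", "E", "F", "FOLKS"], [0, 1, 2, 3, 4, 5])
```

The separator count and total length locate the offset table and the RBN list, and the fixed-width offsets allow a binary search over variable-length separators. A real block would also be padded to the fixed block size.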