Module 4 PDF
4 April 2020 1
Indexed Sequential Access
❖ When we design a file, the important issue is how we will retrieve
information from it:
❖ Indexed: the file can be seen as a set of records that are indexed by keys
(access to a specific record).
❖ Sequential: records are accessed in the order they were entered (one after the other).
❖ Indexed Sequential file = Indexed Sequential Access Method (ISAM)
Range Searches
❖ “Find all students with gpa > 3.0”
❖ If the data is in a sorted file, do a binary search to find the first such
student, then scan to find the others.
❖ The cost of the binary search can be quite high on a large file, since each
probe may require reading a different page.
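The range search described above can be sketched as follows. This is a minimal illustration, not the slides' own code: the record layout (`(gpa, name)` tuples), the function names, and the sample data are all assumptions, and an in-memory list stands in for the sorted file. The probe counter is there only to show where the binary-search cost comes from.

```python
def first_above(records, threshold):
    """Binary search: index of the first record whose gpa is > threshold.

    records is a list of (gpa, name) tuples sorted by gpa (assumed layout).
    Also returns the number of probes, each of which could be a page read
    if the records lived on disk.
    """
    lo, hi, probes = 0, len(records), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if records[mid][0] > threshold:
            hi = mid
        else:
            lo = mid + 1
    return lo, probes


def students_with_gpa_above(records, threshold):
    """Find the first qualifying record, then scan sequentially from there."""
    start, _ = first_above(records, threshold)
    return [name for _, name in records[start:]]


# Hypothetical sample data, sorted by gpa.
records = [(2.1, "ann"), (2.8, "bob"), (3.0, "cy"), (3.2, "dee"), (3.9, "eve")]
result = students_with_gpa_above(records, 3.0)
```

An index over the sorted file replaces the binary search over data pages with a search over a much smaller structure, which is the motivation for ISAM.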
Example ISAM Tree
Root (index block): 40
Index blocks: 20 33 | 51 63
Data blocks: 10* 15* | 20* 27* | 33* 37* | 40* 46* | 51* 55* | 63* 97*
Comments on ISAM
❖ File creation: leaf (data) pages are allocated sequentially, sorted
by search key.
❖ Then index pages are allocated.
❖ Then space for overflow pages.
❖ Index entries: <search key value, page id>; they “direct” the
search for data entries, which are in leaf pages.
❖ Search: start at the root, use key comparisons to go to a leaf.
❖ Insert: find the leaf where the data entry belongs and put it there
❖ (it could go on an overflow page).
❖ Delete: find and remove from the leaf.
❖ If an overflow page becomes empty, deallocate it.
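The search, insert, and delete rules above can be sketched as follows. This is a simplified illustration under several assumptions: the class name `IsamFile` and its methods are hypothetical, the multi-level index is flattened into a single sorted list of separators, each leaf holds two entries, and each leaf's overflow chain is a plain list. The key 42 is not in the initial tree, so the usage below inserts it first before deleting it.

```python
from bisect import bisect_right


class IsamFile:
    """Toy ISAM file: static leaf pages, static index, per-leaf overflow."""

    def __init__(self, sorted_keys, page_capacity=2):
        self.capacity = page_capacity
        # Leaf pages are allocated once, in key order, and never split.
        self.leaves = [sorted_keys[i:i + page_capacity]
                       for i in range(0, len(sorted_keys), page_capacity)]
        # Static index: first key of every leaf except the first one.
        self.separators = [leaf[0] for leaf in self.leaves[1:]]
        self.overflow = [[] for _ in self.leaves]

    def _leaf_no(self, key):
        # Key comparisons against the index decide which leaf to visit.
        return bisect_right(self.separators, key)

    def search(self, key):
        n = self._leaf_no(key)
        return key in self.leaves[n] or key in self.overflow[n]

    def insert(self, key):
        n = self._leaf_no(key)
        if len(self.leaves[n]) < self.capacity:
            self.leaves[n].append(key)
            self.leaves[n].sort()
        else:
            self.overflow[n].append(key)  # leaf is full: use overflow

    def delete(self, key):
        n = self._leaf_no(key)
        for page in (self.overflow[n], self.leaves[n]):
            if key in page:
                page.remove(key)  # the index itself is never changed
                return True
        return False


isam = IsamFile([10, 15, 20, 27, 33, 37, 40, 46, 51, 55, 63, 97])
isam.insert(42)                # leaf 40* 46* is full, so 42 goes to overflow
for key in (42, 51, 97):
    isam.delete(key)
```

After these deletions the value 51 still appears in the index although 51* is gone from the leaves, because the index of an ISAM file is static.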
Index blocks: 20 33 | 51 63
Data blocks: 10* 15* | 20* 27* | 33* 37* | 40* 46* | 51* 55* | 63* 97*
After deleting (42*, 51*, 97*)
Root page (index block): 40
Index blocks: 20 33 | 51 63
Data blocks: 10* 15* | 20* 27* | 33* 37* | 40* 46* | 55* | 63*
Maintaining a Sequence Set
❖ A sequence set is a set of records in physical key order
that stays ordered as records are added and
deleted (an ordered file).
❖ The idea is to use blocks that can be read into memory
and rearranged there quickly. As in B-Trees, blocks can be
split, merged, or redistributed as necessary.
Maintaining a Sequence Set
❖ Using blocks, we can thus keep a sequence set in order
by key without ever having to sort the entire set of records
Block 2 splits and its contents are divided:
Block 1: ADAMS…BAIRD…BIXBY…BOONE…
Block 2: BYNUM…CARSON…CARTER…
Block 3: DENVER…ELLIS…
Block 4: COLE…DAVIS…
Block splitting & concatenation
After DAVIS is deleted:
Block 1: ADAMS…BAIRD…BIXBY…BOONE…
Block 2: BYNUM…CARSON…CARTER…
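Block splitting and concatenation can be sketched as follows. This is a minimal illustration under stated assumptions: a block holds at most 4 keys and should hold at least 2, the blocks are kept in a Python list in logical key order (so physical block numbers and links are not modelled), and an underflowing block is merged with its right neighbour when one exists. The function names and the starting contents of block 2 are assumptions made for the example.

```python
from bisect import bisect_right, insort

CAPACITY = 4               # assumed maximum number of keys per block
MIN_FILL = CAPACITY // 2   # assumed minimum before a block underflows


def find_block(blocks, key):
    """Index of the block that should hold key (blocks are non-empty)."""
    if len(blocks) == 1:
        return 0
    firsts = [block[0] for block in blocks]
    return max(bisect_right(firsts, key) - 1, 0)


def insert(blocks, key):
    i = find_block(blocks, key)
    insort(blocks[i], key)
    if len(blocks[i]) > CAPACITY:
        # Overflow: split the block and divide its contents.
        half = (len(blocks[i]) + 1) // 2
        blocks[i:i + 1] = [blocks[i][:half], blocks[i][half:]]


def delete(blocks, key):
    i = find_block(blocks, key)
    blocks[i].remove(key)  # raises ValueError if the key is absent
    if len(blocks[i]) >= MIN_FILL or len(blocks) == 1:
        return
    # Underflow: pick a neighbour, then concatenate or redistribute.
    j = i + 1 if i + 1 < len(blocks) else i - 1
    left, right = min(i, j), max(i, j)
    merged = blocks[left] + blocks[right]
    if len(merged) <= CAPACITY:
        blocks[left:right + 1] = [merged]          # concatenation
    else:
        half = len(merged) // 2                    # redistribution
        blocks[left], blocks[right] = merged[:half], merged[half:]


blocks = [
    ["ADAMS", "BAIRD", "BIXBY", "BOONE"],
    ["BYNUM", "CARSON", "COLE", "DAVIS"],
    ["DENVER", "ELLIS"],
]
insert(blocks, "CARTER")       # the second block overflows and splits
after_split = [list(b) for b in blocks]
delete(blocks, "DAVIS")        # the COLE block underflows and is merged
```

At no point is the whole set of records sorted again; only the one or two blocks involved in each change are rearranged.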
Choice of Block Size
❖ Block : basic unit of I/O
Adding a Simple Index to the Sequence Set
❖ Each of the blocks we created for our sequence set
contains a range of records that might contain the record we
are seeking.
Sequence set blocks: 1 2 3 4 5 6
Key      Block number
CAGE     2
DUTTON   3
EVANS    4
FOLK     5
GADDIS   6
The content of the Index : Separators Instead of Keys
❖ The index serves as a kind of road map for the sequence set
==> we do not need to have keys in the index set.
Separators between blocks in the sequence set
Separators: BO CAM E F FOLKS
Blocks: 1 2 3 4 5 6
Potential separators between block 3 (CAMP - DUTTON) and block 4 (EMBRY - EVANS):
DUTU, DVXGHESJF, DZ, E, EBQX, ELEEMOSYNARY
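One way to pick a separator and then use the separators as a road map is sketched below. The function names are hypothetical, and two conventions are assumed rather than taken from the slides: the separator chosen is the shortest prefix of the first key in the right-hand block that still sorts after the last key in the left-hand block, and a search key equal to a separator is routed to the block on the right.

```python
from bisect import bisect_right


def shortest_separator(last_key_left, first_key_right):
    """Shortest prefix of first_key_right that sorts after last_key_left."""
    for length in range(1, len(first_key_right) + 1):
        prefix = first_key_right[:length]
        if prefix > last_key_left:
            return prefix
    # Keys are equal or out of order: fall back to the full key.
    return first_key_right


# Separators between the six sequence set blocks, as listed above.
SEPARATORS = ["BO", "CAM", "E", "F", "FOLKS"]


def block_for(key, separators=SEPARATORS):
    """Block number (1-based) whose key range should contain key."""
    # Count the separators that are <= key; that many blocks are skipped.
    return bisect_right(separators, key) + 1


sep_3_4 = shortest_separator("DUTTON", "EMBRY")
target = block_for("EMBRY")
```

Here `shortest_separator("DUTTON", "EMBRY")` gives `E`, the shortest entry in the list of potential separators between blocks 3 and 4, and `block_for("EMBRY")` routes the search to block 4.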
The Simple Prefix B+ Tree
❖ The separators we just identified can be formed into a B-Tree
index of the sequence set blocks; this B-Tree index is called
the index set.
❖ Taken together with the sequence set, the index set forms a file
structure called a simple prefix B+ Tree (B-Tree index + sequence set).
A B-tree index set for the sequence set, forming a simple prefix B+ tree
Index set: BO CAM F FOLKS
Sequence set blocks: 1 2 3 4 5 6
Simple Prefix B+ tree Maintenance
❖ Changes localized to single blocks in the sequence set : Make
the changes to the sequence set and to the index set
❖ Changes involving multiple blocks in the sequence set:
Simple Prefix B+ tree Maintenance
❖ Deletion without concatenation or redistribution
❖ Delete EMBRY, FOLKS
Deletion of EMBRY and FOLKS from the sequence set
Index set:
Root: E
Next level: BO CAM | F FOLKS
Sequence set blocks: 1 2 3 4 5 6
Insertion of EATON into the sequence set
Index set: BO CAM | F FOLKS
Sequence set blocks: 1 2 3 4 5 6
An insertion into block 1 causes a split and consequent addition of block 7
Index set:
Root: BO E
Next level: AY | CAM | F FOLKS
Sequence set blocks: 1 7 2 3 4 5 6
A deletion from block 2 causes underflow and the
consequent concatenation of block 2 and block 3
Index set:
Root: E
Next level: AY BO | F FOLKS
Sequence set blocks: 1 7 2 4 5 6
Index Set Block Size
❖ The physical size of a node for the index set is usually the same as the
physical size of a block in the sequence set. We then speak of index set
blocks rather than nodes.
❖ There are a number of reasons for using a common block size for the
index and sequence sets:
❖ The block size for the sequence set is usually chosen because there
is a good fit among this block size, the characteristics of the disk
drive, and the amount of memory available.
❖ The index set blocks and sequence set blocks are often mingled
within the same file to avoid seeking between two separate files while
accessing the simple prefix B+ Tree.
Internal Structure of Index Set Blocks : A variable-order B-tree
❖ Given a large, fixed-size block for the index set, how do we
store the separators within it?
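Separators: As, Ba, Bro, C, Ch, Cra, Dele, Edi, Err, Fa, Fle
Concatenated separators: AsBaBroCChCraDeleEdiErrFaFle
Index to separators: 00 02 04 07 08 10 13 17 20 23 25
Structure of an index set block:
Separator count: 11
Total length of separators: 28
Separators: AsBaBroCChCraDeleEdiErrFaFle
Index to separators: 00 02 04 07 08 10 13 17 20 23 25
Relative block numbers: B01 B02 B03 B04 B05 B06 B07 B08 B09 B10 B11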
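The block layout above (separator count, total length, concatenated separators, index to separators, relative block numbers) can be sketched as follows. This is an illustration under stated assumptions, not the slides' own code: the dictionary layout and function names are hypothetical, the last separator is taken to be the three-letter `Fle` because the offset 25 and total length 28 imply three characters, and a leading block reference `B00` is assumed so that 11 separators have the 12 child references a search needs.

```python
def build_index_block(separators, children):
    """Pack variable-length separators into one index set block."""
    if len(children) != len(separators) + 1:
        raise ValueError("need one more child reference than separators")
    offsets, position = [], 0
    for separator in separators:
        offsets.append(position)   # where this separator starts
        position += len(separator)
    return {
        "count": len(separators),
        "total_length": position,
        "text": "".join(separators),
        "offsets": offsets,
        "children": list(children),
    }


def separator_at(block, i):
    """Recover separator i from the concatenated text via the offsets."""
    start = block["offsets"][i]
    if i + 1 < block["count"]:
        end = block["offsets"][i + 1]
    else:
        end = block["total_length"]
    return block["text"][start:end]


def child_for(block, key):
    """Binary search over the offsets; keys >= a separator go right."""
    lo, hi = 0, block["count"]
    while lo < hi:
        mid = (lo + hi) // 2
        if key < separator_at(block, mid):
            hi = mid
        else:
            lo = mid + 1
    return block["children"][lo]


separators = ["As", "Ba", "Bro", "C", "Ch", "Cra",
              "Dele", "Edi", "Err", "Fa", "Fle"]
children = [f"B{n:02d}" for n in range(12)]   # B00 is an assumed entry
block = build_index_block(separators, children)
```

Because the offsets are fixed-width, the binary search works even though the separators themselves vary in length. The search key `Beck` (a made-up example) falls between `Ba` and `Bro`, so `child_for(block, "Beck")` returns the third block reference.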
Loading a Simple Prefix B+ Tree
B+ Trees
❖ The difference between a simple prefix B+ Tree and a plain
B+ Tree is that the plain B+ tree does not involve the use of
prefixes as separators.
❖ B and B+ Trees are not the only tools useful for file structure
design.
❖ Simple indexes are useful when they can be held fully in main
memory.
B-Trees, B+ Trees and Simple Prefix B+ Trees in Perspective
Common characteristics of B and B+ and Prefix B+ trees
❖ Paged index structure ==> broad and shallow trees
B-Trees, B+ Trees and Simple Prefix B+ Trees in Perspective
Differences between the various structures
❖ B+ Trees:
❖ In a B+ Tree, all the keys and record information are contained in
a linked set of blocks known as the sequence set.
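The linked sequence set is what makes sequential and range access cheap: once the starting block is found, the scan just follows the links. The sketch below illustrates this under stated assumptions: the class `SeqBlock`, the helper names, and the two-keys-per-block grouping are hypothetical, and the scan starts at the first block, whereas in a full B+ Tree the index set would locate the starting block.

```python
class SeqBlock:
    """One block of the sequence set, linked to its successor."""

    def __init__(self, keys):
        self.keys = keys
        self.next = None


def link_blocks(key_groups):
    """Build a linked list of blocks from groups of keys in key order."""
    head = previous = None
    for keys in key_groups:
        block = SeqBlock(list(keys))
        if previous is None:
            head = block
        else:
            previous.next = block
        previous = block
    return head


def range_scan(block, low, high):
    """Collect every key in [low, high] by following the block links."""
    found = []
    while block is not None:
        for key in block.keys:
            if key > high:
                return found       # keys are ordered, so we can stop
            if key >= low:
                found.append(key)
        block = block.next
    return found


head = link_blocks([["ADAMS", "BAIRD"], ["BYNUM", "CARSON"], ["DENVER", "ELLIS"]])
matches = range_scan(head, "BAIRD", "DENVER")
```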