
Module 4 PDF

Indexed sequential files provide both indexed and sequential access to records simultaneously. They organize records into blocks that can be read into memory efficiently for local rearrangement on insertions and deletions. The blocks are kept sequentially ordered through splitting and concatenation. A simple index maps key ranges to block numbers to allow indexed access. Separators in the index distinguish blocks rather than storing exact keys to save space.

Uploaded by

Chethan Narayana

FILE STRUCTURES(Module 4)

Indexed Sequential File Access and Prefix B+ Trees

4 April 2020 1
Indexed Sequential Access
❖ When we design a file, the important issue is how we will retrieve information from it:
❖ Indexed: the file can be seen as a set of records that are indexed by key (direct access to a specific record)
❖ Sequential: the file can be read one record after the other, in order by key

❖ Here, we are looking for a single organizational method that provides both of these views simultaneously (Indexed + Sequential = B+ Tree).

❖ Why care about obtaining both views simultaneously?

Example: if an application requires both interactive random access and cosequential batch processing, both sets of actions have to be carried out efficiently.

❖ Sequential file → Indexed Sequential file → B+ tree

❖ Indexed Sequential file = Indexed Sequential Access Method (ISAM)
Range Searches
❖ “Find all students with gpa > 3.0”
❖ If the data is in a sorted file, do a binary search to find the first such student, then scan to find the others
❖ The cost of a binary search on a large file can be quite high, because each probe may require a disk access

❖ Simple idea: create an “index file”

❖ Level of indirection again (do the binary search on the much smaller index file)
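The range search described above can be sketched in a few lines of Python. This is only an illustration: the student records and the function name `students_above` are made up, and an in-memory list stands in for the sorted file.

```python
from bisect import bisect_right

# Hypothetical sorted "file": (gpa, name) records ordered by gpa.
records = [(2.1, "Avery"), (2.8, "Bolen"), (3.0, "Camp"),
           (3.2, "Embry"), (3.7, "Faber"), (3.9, "Frost")]

def students_above(sorted_records, threshold):
    """Return every record whose gpa is strictly greater than threshold."""
    gpas = [gpa for gpa, _ in sorted_records]   # key column of the sorted file
    first = bisect_right(gpas, threshold)       # binary search: first gpa > threshold
    return sorted_records[first:]               # sequential scan from that point on

result = students_above(records, 3.0)
print(result)
```

The binary search locates the starting point and the slice plays the role of the sequential scan; with an index file, the same binary search would run over the index entries instead of the data records.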
Example ISAM Tree

❖ Each node can hold 2 entries

❖ No need for “next-leaf-page” pointers (why?)

Root page (index block): 40
Index blocks: 20 33 | 51 63
Data blocks (leaf pages): 10* 15* | 20* 27* | 33* 37* | 40* 46* | 51* 55* | 63* 97*
Comments on ISAM
❖ File creation: leaf (data) pages are allocated sequentially, sorted by search key.
❖ Then index pages are allocated.
❖ Then space for overflow pages is allocated.
❖ Index entries: <search key value, page id>; they “direct” the search for data entries, which are in leaf pages
❖ Search: start at the root, use key comparisons to go to a leaf
❖ Insert: find the leaf where the data entry belongs and put it there
❖ (it could go on an overflow page if the leaf is full)
❖ Delete: find the entry and remove it from the leaf.
❖ If an overflow page becomes empty, deallocate it

Static tree structure: inserts and deletes affect only leaf pages
After inserting (23*, 48*, 41*, 42*)
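Root page (index block): 40
Index blocks: 20 33 | 51 63
Primary leaf pages (data blocks): 10* 15* | 20* 27* | 33* 37* | 40* 46* | 51* 55* | 63* 97*
Overflow pages: 23* (chained from leaf 20* 27*); 48* 41* (chained from leaf 40* 46*), followed by a second overflow page holding 42*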
After deleting (42*, 51*, 97*)
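Root page (index block): 40
Index blocks: 20 33 | 51 63
Primary leaf pages (data blocks): 10* 15* | 20* 27* | 33* 37* | 40* 46* | 55* | 63*
Overflow pages: 23* (chained from leaf 20* 27*); 48* 41* (chained from leaf 40* 46*)
Note: 51 still appears in the index although 51* was deleted, because the index is static.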
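The ISAM insertions and deletions in this example can be replayed with a small sketch. It is a toy model under stated assumptions, not a real ISAM implementation: the class name `IsamFile` is hypothetical, pages are Python lists, and the two index levels of the example are flattened into one sorted list of keys.

```python
from bisect import bisect_right

class IsamFile:
    """Toy ISAM file: static index over primary leaf pages, plus overflow chains."""

    def __init__(self, sorted_keys, page_size=2):
        self.page_size = page_size
        # Leaf pages are allocated sequentially and filled in key order.
        self.leaves = [sorted_keys[i:i + page_size]
                       for i in range(0, len(sorted_keys), page_size)]
        # Static index (flattened): first key of every leaf except the first.
        self.index = [leaf[0] for leaf in self.leaves[1:]]
        # One overflow chain (a list of pages) per primary leaf.
        self.overflow = [[] for _ in self.leaves]

    def _leaf_no(self, key):
        # The number of index entries <= key tells us which leaf to visit.
        return bisect_right(self.index, key)

    def search(self, key):
        n = self._leaf_no(key)
        return key in self.leaves[n] or any(key in p for p in self.overflow[n])

    def insert(self, key):
        n = self._leaf_no(key)
        if len(self.leaves[n]) < self.page_size:
            self.leaves[n].append(key)
            self.leaves[n].sort()
            return
        chain = self.overflow[n]
        if not chain or len(chain[-1]) == self.page_size:
            chain.append([])                    # allocate a new overflow page
        chain[-1].append(key)

    def delete(self, key):
        n = self._leaf_no(key)
        if key in self.leaves[n]:
            self.leaves[n].remove(key)
            return
        for page in self.overflow[n]:
            if key in page:
                page.remove(key)
                break
        # Deallocate overflow pages that became empty; the index never changes.
        self.overflow[n] = [p for p in self.overflow[n] if p]

isam = IsamFile([10, 15, 20, 27, 33, 37, 40, 46, 51, 55, 63, 97])
for k in (23, 48, 41, 42):
    isam.insert(k)
for k in (42, 51, 97):
    isam.delete(k)
print(isam.leaves, isam.overflow, isam.index)
```

Only leaf and overflow pages change; `isam.index` still holds 51 after 51* is deleted, which is what "static tree structure" means here.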
Maintaining a Sequence Set
❖ A sequence set is a set of records in physical key order that stays ordered as records are added and deleted (an ordered file).

❖ Sequence set + Simple index ➔ Simple Prefix B+ Tree

❖ Since sorting and resorting the entire sequence set as records are added and deleted is expensive, we look at other strategies.

❖ In particular, we look at a way to localize the changes.

❖ The idea is to use blocks that can be read into memory and rearranged there quickly. As in B-Trees, blocks can be split, merged or redistributed as necessary.
Maintaining a Sequence Set
❖ Using blocks, we can thus keep a sequence set in order by key without ever having to sort the entire set of records.

❖ However, there are certain costs associated with this approach:

❖ A blocked file takes more space than an unblocked file because of internal fragmentation.

❖ The order of the file is not physically sequential throughout the file. The maximum guaranteed extent of physical sequentiality is within a block.
Block splitting & concatenation
Initial blocked sequence set:
Block 1: ADAMS … BAIRD … BIXBY … BOONE …
Block 2: BYNUM … CARSON … COLE … DAVIS …
Block 3: DENVER … ELLIS …

Insert CARTER: block 2 splits and its contents are divided between block 2 and the new block 4:
Block 1: ADAMS … BAIRD … BIXBY … BOONE …
Block 2: BYNUM … CARSON … CARTER …
Block 3: DENVER … ELLIS …
Block 4: COLE … DAVIS …
Block splitting & concatenation
After DAVIS is deleted:
Block 1: ADAMS … BAIRD … BIXBY … BOONE …
Block 2: BYNUM … CARSON … CARTER …
Block 3: available for reuse
Block 4: COLE … DENVER … ELLIS …
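The split and concatenation steps in this example can be sketched as follows. The class `Block`, the functions `insert`, `delete` and `scan`, and the limits `CAPACITY = 4` and `MIN_FILL = 2` are all assumptions chosen so that the sketch reproduces the example; redistribution is left out to keep it short.

```python
from bisect import insort

CAPACITY = 4   # assumed maximum number of records per block
MIN_FILL = 2   # assumed minimum number of records per block

class Block:
    """One block of the sequence set, linked to its logical successor."""
    def __init__(self, keys=None):
        self.keys = keys or []
        self.next = None

def insert(block, key):
    """Insert key; on overflow, split and return the new block (else None)."""
    insort(block.keys, key)
    if len(block.keys) <= CAPACITY:
        return None
    mid = (len(block.keys) + 1) // 2
    new_block = Block(block.keys[mid:])          # upper half moves to a new block
    block.keys = block.keys[:mid]
    new_block.next, block.next = block.next, new_block
    return new_block

def delete(block, key):
    """Delete key; on underflow, absorb the successor and return it (else None)."""
    block.keys.remove(key)
    nxt = block.next
    if (len(block.keys) < MIN_FILL and nxt is not None
            and len(block.keys) + len(nxt.keys) <= CAPACITY):
        block.keys += nxt.keys                   # concatenate the two blocks
        block.next = nxt.next
        return nxt                               # this block is free for reuse
    return None

def scan(first):
    """Read the whole sequence set in key order by following the links."""
    while first is not None:
        yield from first.keys
        first = first.next

b1 = Block(["ADAMS", "BAIRD", "BIXBY", "BOONE"])
b2 = Block(["BYNUM", "CARSON", "COLE", "DAVIS"])
b3 = Block(["DENVER", "ELLIS"])
b1.next, b2.next = b2, b3

b4 = insert(b2, "CARTER")     # block 2 splits; block 4 receives COLE, DAVIS
freed = delete(b4, "DAVIS")   # block 4 underflows and absorbs block 3
print(list(scan(b1)))
```

Each operation touches at most two blocks, which is the sense in which changes are localized; the logical order lives in the `next` links, not in the physical block numbers.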
Choice of Block Size
❖ Block: the basic unit of I/O

❖ An important aspect of using blocks is the choice of the block size

❖ Two considerations to keep in mind:

❖ The block size should be such that we can hold several blocks in memory at once

❖ The block size should be such that we can access a block without having to bear the cost of a disk seek within the block read or block write operation
Adding a Simple Index to the Sequence Set
❖ Each of the blocks we created for our sequence set contains a range of records that might contain the record we are seeking.

❖ We construct a simple single-level index for these blocks.

❖ The combination of this kind of index with the sequence set of blocks provides complete indexed sequential access (index + sequence set = indexed sequential access).

❖ This method works well as long as the entire index can be held in memory.

❖ If the entire index cannot be held in memory, then we use a B+ Tree, which is a B-Tree index plus a sequence set (B-Tree + sequence set = B+ Tree).


Sequence of blocks

Block 1: ADAMS - BERNE
Block 2: BOLEN - CAGE
Block 3: CAMP - DUTTON
Block 4: EMBRY - EVANS
Block 5: FABER - FOLK
Block 6: FOLKS - GADDIS

Simple index for the sequence set

Key       Block Number
BERNE     1
CAGE      2
DUTTON    3
EVANS     4
FOLK      5
GADDIS    6
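A lookup in this simple index can be sketched with a binary search over the last key of each block. The function name `find_block` is hypothetical, and the index is held in two parallel in-memory lists as an assumption.

```python
from bisect import bisect_left

# Index from the table above: last key in each block, and its block number.
index_keys = ["BERNE", "CAGE", "DUTTON", "EVANS", "FOLK", "GADDIS"]
block_numbers = [1, 2, 3, 4, 5, 6]

def find_block(key):
    """Return the block that would hold key, or None if key is past the last block."""
    pos = bisect_left(index_keys, key)   # first index entry whose key is >= key
    return block_numbers[pos] if pos < len(index_keys) else None

print(find_block("BURNS"), find_block("EVANS"), find_block("ZEBRA"))
```

Only the block returned by `find_block` then needs to be read and searched; sequential processing ignores the index and reads the blocks in order.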
The content of the Index : Separators Instead of Keys
❖ The index serves as a kind of road map for the sequence set ==> we do not need to have keys in the index set.

❖ What we really need are separators capable of distinguishing between two blocks.

❖ We can save space by using variable-length separators and placing the shortest separators in the index structure.

❖ The rules are:
key < separator ==> go left
key = separator ==> go right
key > separator ==> go right
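The three rules reduce to counting how many separators are less than or equal to the search key. The sketch below applies them to the separators BO, CAM, E, F, FOLKS that this module uses for its six example blocks; the function name `route` and the flat list layout are assumptions.

```python
from bisect import bisect_right

separators = ["BO", "CAM", "E", "F", "FOLKS"]   # separator i sits between blocks i+1 and i+2
blocks = [1, 2, 3, 4, 5, 6]

def route(key):
    """Apply the rules: go right past every separator <= key, stop at the first one > key."""
    return blocks[bisect_right(separators, key)]

for key in ("CAGE", "CAMP", "E", "FOLK", "FOLKS"):
    print(key, "-> block", route(key))
```

`bisect_right` is used rather than `bisect_left` so that a key equal to a separator goes right, as the second rule requires.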
Separators between blocks in the sequence set
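Block 1: ADAMS - BERNE
Separator: BO
Block 2: BOLEN - CAGE
Separator: CAM
Block 3: CAMP - DUTTON
Separator: E
Block 4: EMBRY - EVANS
Separator: F
Block 5: FABER - FOLK
Separator: FOLKS
Block 6: FOLKS - GADDIS

A list of potential separators between block 3 (CAMP - DUTTON) and block 4 (EMBRY - EVANS): DUTU, DVXGHESJF, DZ, E, EBQX, ELEEMOSYNARY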
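One way to pick the shortest separator between two adjacent blocks is to take the shortest prefix of the next block's first key that is still greater than the previous block's last key. The function name `shortest_separator` is hypothetical, and this sketch considers only prefixes of the first key, which is one common choice rather than the only valid one.

```python
def shortest_separator(last_key, first_key):
    """Shortest prefix p of first_key with last_key < p <= first_key."""
    if not last_key < first_key:
        raise ValueError("keys must be distinct and in ascending order")
    for length in range(1, len(first_key) + 1):
        prefix = first_key[:length]
        if prefix > last_key:
            return prefix
    return first_key  # not reached: first_key itself is greater than last_key

# (last key of a block, first key of the next block) for the six example blocks
boundaries = [("BERNE", "BOLEN"), ("CAGE", "CAMP"), ("DUTTON", "EMBRY"),
              ("EVANS", "FABER"), ("FOLK", "FOLKS")]
found = [shortest_separator(a, b) for a, b in boundaries]
print(found)
```

For FOLK and FOLKS no shorter prefix works, so the separator is the full key FOLKS.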
The Simple Prefix B+ Tree
❖ The separators we just identified can be formed into a B-Tree index of the sequence set blocks; this B-Tree index is called the index set.

❖ Taken together with the sequence set, the index set forms a file structure called a simple prefix B+ Tree (B-Tree index + sequence set).

❖ “Simple prefix” indicates that the index set contains shortest separators, or prefixes of the keys, rather than copies of the actual keys.
A B-tree index set for the sequence set, forming a simple prefix B+ tree

Index set:
Root: E
Left child: BO CAM
Right child: F FOLKS

Sequence set:
Block 1: ADAMS - BERNE
Block 2: BOLEN - CAGE
Block 3: CAMP - DUTTON
Block 4: EMBRY - EVANS
Block 5: FABER - FOLK
Block 6: FOLKS - GADDIS
Simple Prefix B+ tree Maintenance
❖ Changes localized to single blocks in the sequence set: make the changes in the sequence set block; the index set is unaffected as long as its separators still distinguish the blocks
❖ Changes involving multiple blocks in the sequence set:

❖ If blocks are split in the sequence set, a new separator must be inserted into the index set

❖ If blocks are merged in the sequence set, a separator must be removed from the index set.

❖ If records are redistributed between blocks in the sequence set, the value of the separator in the index set must be changed.
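The three index set updates listed above can be sketched on a flattened index. This is a simplification: a real index set is a B-Tree, whereas here the separators and block numbers sit in two parallel lists, and the function names `on_split`, `on_merge` and `on_redistribute` are hypothetical.

```python
separators = ["BO", "CAM", "E", "F", "FOLKS"]
blocks = [1, 2, 3, 4, 5, 6]          # blocks[i] lies to the left of separators[i]

def on_split(pos, new_separator, new_block):
    """The block at position pos split; new_block holds its upper half."""
    separators.insert(pos, new_separator)
    blocks.insert(pos + 1, new_block)

def on_merge(pos):
    """The blocks at positions pos and pos + 1 were concatenated into the first."""
    del separators[pos]
    return blocks.pop(pos + 1)        # block number that becomes free for reuse

def on_redistribute(pos, new_separator):
    """Records moved between the blocks at pos and pos + 1; only the separator changes."""
    separators[pos] = new_separator

on_split(0, "AY", 7)    # block 1 splits into ADAMS - AVERY and the new block 7
freed = on_merge(2)     # block 2 absorbs block 3, so separator CAM disappears
print(separators, blocks, freed)
```

The two calls mirror this module's examples of a split of block 1 and a concatenation of blocks 2 and 3; in a B-Tree index set the same insertions and removals could in turn cause index blocks to split or merge.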
Simple Prefix B+ tree Maintenance
❖ Deletion without concatenation or redistribution
❖ Delete EMBRY, FOLKS

❖ Insertion without splitting
❖ Insert EATON
Deletion of EMBRY and FOLKS from the sequence set

Index set:
Root: E
Left child: BO CAM
Right child: F FOLKS

Sequence set:
Block 1: ADAMS - BERNE
Block 2: BOLEN - CAGE
Block 3: CAMP - DUTTON
Block 4: ERVIN - EVANS
Block 5: FABER - FOLK
Block 6: FROST - GADDIS
Insertion of EATON into the sequence set

Index set:
Root: E
Left child: BO CAM
Right child: F FOLKS

Sequence set:
Block 1: ADAMS - BERNE
Block 2: BOLEN - CAGE
Block 3: CAMP - DUTTON
Block 4: EATON - EVANS
Block 5: FABER - FOLK
Block 6: FROST - GADDIS
An insertion into block 1 causes a split and consequent addition of block 7

Index set:
Root: BO E
Children: AY | CAM | F FOLKS

Sequence set:
Block 1: ADAMS - AVERY
Block 7: AYERS - BERNE
Block 2: BOLEN - CAGE
Block 3: CAMP - DUTTON
Block 4: ERVIN - EVANS
Block 5: FABER - FOLK
Block 6: FROST - GADDIS
A deletion from block 2 causes underflow and the consequent concatenation of block 2 and block 3

Index set:
Root: E
Children: AY BO | F FOLKS

Sequence set:
Block 1: ADAMS - AVERY
Block 7: AYERS - BERNE
Block 2: BOLEN - DUTTON
Block 4: ERVIN - EVANS
Block 5: FABER - FOLK
Block 6: FROST - GADDIS
Index Set Block Size
❖ The physical size of a node for the index set is usually the same as the physical size of a block in the sequence set. We then speak of index set blocks rather than nodes.

❖ There are a number of reasons for using a common block size for the index and sequence sets:

❖ The block size for the sequence set is usually chosen because there is a good fit among this block size, the characteristics of the disk drive, and the amount of memory available.

❖ A common block size makes it easier to implement a buffering scheme to create a virtual simple prefix B+ Tree.

❖ The index set blocks and sequence set blocks are often mingled within the same file to avoid seeking between two separate files while accessing the simple prefix B+ Tree.
Internal Structure of Index Set Blocks : A variable-order B-tree

❖ Given a large, fixed-size block for the index set, how do we store the separators within it?

❖ There are many ways to combine the list of separators, the index to separators, and the list of Relative Block Numbers (RBNs) into a single index set block.

❖ One possible approach includes a separator count and keeps a count of the total length of separators.
Separators: As, Ba, Bro, C, Ch, Cra, Dele, Edi, Err, Fa, File

Variable-length separators and corresponding index:
Concatenated separators: AsBaBroCChCraDeleEdiErrFaFile
Index to separators: 00 02 04 07 08 10 13 17 20 23 25

Structure of an index set block:
Separator count: 11
Total length of separators: 28
Separators: AsBaBroCChCraDeleEdiErrFaFile
Index to separators: 00 02 04 07 08 10 13 17 20 23 25
Relative block numbers: B01 B02 B03 B04 B05 B06 B07 B08 B09 B10 B11
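The index set block layout can be sketched as a small in-memory structure with a binary search over the variable-length separators. Several things here are assumptions: the dictionary field names, the function names, and the RBN list B00 to B11. A block with n separators routes to n + 1 child blocks, so the sketch uses twelve RBNs; the count and total length are computed from the separators rather than copied from the figure.

```python
def build_index_block(separators, rbns):
    """Pack separators into one string plus an offset table (the index to separators)."""
    if len(rbns) != len(separators) + 1:
        raise ValueError("need one more RBN than separators")
    offsets, pos = [], 0
    for s in separators:
        offsets.append(pos)          # starting position of each separator
        pos += len(s)
    return {"count": len(separators), "total_length": pos,
            "text": "".join(separators), "offsets": offsets, "rbns": rbns}

def separator_at(block, i):
    """Slice separator i out of the concatenated text using the offset table."""
    start = block["offsets"][i]
    end = block["offsets"][i + 1] if i + 1 < block["count"] else block["total_length"]
    return block["text"][start:end]

def find_rbn(block, key):
    """Binary search: count the separators <= key, then follow that RBN."""
    lo, hi = 0, block["count"]
    while lo < hi:
        mid = (lo + hi) // 2
        if separator_at(block, mid) <= key:
            lo = mid + 1             # key = separator or key > separator: go right
        else:
            hi = mid                 # key < separator: go left
    return block["rbns"][lo]

seps = ["As", "Ba", "Bro", "C", "Ch", "Cra", "Dele", "Edi", "Err", "Fa", "File"]
rbns = ["B%02d" % i for i in range(12)]      # hypothetical RBNs B00 .. B11
block = build_index_block(seps, rbns)
print(block["offsets"], find_rbn(block, "Cat"))
```

The offset table is what makes binary search possible even though the separators have different lengths: each probe costs one table lookup and one slice.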
Loading a Simple Prefix B+ Tree

❖ Successive insertion is not a good loading method, because splitting and redistribution are relatively expensive and are best reserved for tree maintenance.

❖ Starting from a sorted file, however, we can place the records into sequence set blocks one by one, starting a new block when the one we are working with fills up.

❖ As we make the transition between two sequence set blocks, we can determine the shortest separator for the blocks.

❖ We can collect these separators into an index set block that we build and hold in memory until it is full.
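The loading procedure above can be sketched as a single pass over sorted keys. The names `load` and `shortest_separator` and the block capacity of 2 are assumptions made for illustration; separators are collected into one list, whereas a real loader would write out each index set block as it fills and build the higher index levels as well.

```python
def shortest_separator(last_key, first_key):
    """Shortest prefix of first_key that is greater than last_key."""
    for length in range(1, len(first_key) + 1):
        if first_key[:length] > last_key:
            return first_key[:length]
    raise ValueError("keys must be distinct and in ascending order")

def load(sorted_keys, block_capacity):
    """One pass: fill sequence set blocks in order, emitting a separator at each transition."""
    blocks, separators, current = [], [], []
    for key in sorted_keys:
        if len(current) == block_capacity:       # current block is full
            separators.append(shortest_separator(current[-1], key))
            blocks.append(current)               # "write" the finished block
            current = []
        current.append(key)
    if current:
        blocks.append(current)
    return blocks, separators

keys = ["ADAMS", "BERNE", "BOLEN", "CAGE", "CAMP", "DUTTON",
        "EMBRY", "EVANS", "FABER", "FOLK", "FOLKS", "GADDIS"]
blocks, separators = load(keys, block_capacity=2)
print(separators)
```

No block is ever revisited once it has been appended, which is why the output can be written sequentially and no splitting or redistribution occurs during the load.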
Loading a Simple Prefix B+ Tree : Advantages
❖ The advantages of loading a simple prefix B+ Tree in this way almost always outweigh the disadvantages associated with the possibility of creating blocks that contain too few records or too few separators.

❖ A particular advantage is that the loading process goes more quickly because:
❖ The output can be written sequentially
❖ We make only one pass over the data
❖ No blocks need to be reorganized as we proceed

❖ Advantages after the tree is loaded:
❖ The blocks are 100% full
❖ Sequential loading creates a degree of spatial locality within our file ==> seeking can be minimized
B+ Trees
❖ B+ tree separator: contains copies of actual keys

Index set block: ALWAYS/ASPECT/BETTER | Index to separators: 00 06 12 | Next separator: CAT

Sequence set blocks: ACCESS - ALSO | ALWAYS - ASK | ASPECT - BEST | BETTER - CAST | Next block: CATCH - CHECK
B+ Trees
❖ The difference between a simple prefix B+ Tree and a plain B+ Tree is that the plain B+ Tree does not use prefixes as separators.

❖ Instead, the separators in the index set are simple copies of the actual keys.

❖ Simple prefix B+ Trees are often more desirable than plain B+ Trees because the prefix separators take up less space than the full keys.

❖ Plain B+ Trees, however, are sometimes more desirable since:
❖ They do not need variable-length separator fields
❖ Some key sets are not easy to compress effectively
B-Trees, B+ Trees and Simple Prefix B+ Trees in Perspective

❖ B and B+ Trees are not the only tools useful for file structure design.

❖ Simple indexes are useful when they can be held fully in main memory.

❖ Hashing can provide much faster access than B and B+ Trees.
B-Trees, B+ Trees and Simple Prefix B+ Trees in Perspective
Common characteristics of B and B+ and Prefix B+ trees
❖ Paged index structure ==> broad and shallow trees

❖ Height-balanced trees

❖ Trees are grown bottom up

❖ Operations used are splitting, merging and redistribution

❖ Two-to-three splitting and redistribution can be used to obtain greater storage efficiency

❖ Can be implemented as virtual tree structures

❖ Can be adapted for use with variable-length records
B-Trees, B+ Trees and Simple Prefix B+ Trees in Perspective
Differences between the various structures

❖ B-Trees: multi-level indexes to data files that are entry-sequenced

❖ Strength: simplicity of implementation

❖ Weakness: excessive seeking is necessary for sequential access
B-Trees, B+ Trees and Simple Prefix B+ Trees in Perspective
Differences between the various structures

❖ B-Trees with associated information:

❖ These are B-Trees that contain record contents at every level of the B-Tree

❖ Strength: can save space

❖ Weakness: works only when the record information is located within the B-Tree. Otherwise, too much seeking is involved in retrieving the record information
B-Trees, B+ Trees and Simple Prefix B+ Trees in Perspective
Differences between the various structures
❖ B+ Trees:
❖ In a B+ Tree, all the keys and record information are contained in a linked set of blocks known as the sequence set

❖ Indexed access is provided through the index set

❖ Advantages over a B-Tree:
❖ The sequence set can be processed in a truly linear, sequential way (B+ Tree)
❖ The index is built with a single key or separator per block of data records (B+ Tree) rather than one key per data record (B-Tree)
❖ ==> the index is smaller and hence shallower
B-Trees, B+ Trees and Simple Prefix B+ Trees in Perspective
Differences between the various structures
❖ Simple Prefix B+ Trees:
❖ The separators in the index set are smaller than the keys in the sequence set ==> the tree is even smaller
