
FS Mod 3 - Multilevel Indexing and B-Trees

The document discusses multilevel indexing and B-trees. It begins by describing the problem of slow access times when keeping indexes on secondary storage. B-trees were developed as a solution, providing rapid data access and retrieval with minimal overhead. The document then covers B-tree properties such as balancing, paging to improve disk utilization, searching, insertion which can cause splitting and promotion, and deletion which can cause merging or redistribution to maintain the B-tree structure.

Uploaded by

Mahesh R J
Copyright © All Rights Reserved

Multilevel Indexing and B-Trees

Introduction: Invention of B-Trees
• The goal was the discovery of a general method for storing and retrieving data in large file systems that would provide rapid access to the data with minimal overhead cost.
• In 1979, Douglas Comer wrote an article, “The Ubiquitous B-Tree”.
• In 1972, R. Bayer and E. McCreight published “Organization and Maintenance of Large Ordered Indexes”, which announced B-trees to the world.
Statement of the Problem
• The fundamental problem with keeping an index on secondary storage is that secondary storage is slow. This can be broken down into two more specific problems:
– Searching the index must be faster than binary searching.
– Insertion and deletion must be as fast as searching.
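To see why binary searching is too slow on secondary storage, here is a small C++ sketch (the function name worstCaseBinaryProbes is hypothetical) that counts the worst-case number of probes for a binary search over n sorted keys, which is $\lfloor \log_2 n \rfloor + 1$. Assuming each probe costs one seek, as it would if the index lived on disk with no caching, an index of 1,000,000 keys needs up to 20 seeks.

```cpp
#include <cassert>
#include <cstdint>

// Worst-case number of probes for binary search over n sorted keys.
// Each probe examines the middle key; in the worst case the search continues
// in the larger remaining half, which holds floor(n / 2) keys.
int worstCaseBinaryProbes(std::uint64_t n) {
    assert(n > 0);
    int probes = 0;
    while (n > 0) {
        ++probes;
        n /= 2;  // size of the larger half left after one probe
    }
    return probes;  // equals floor(log2(n)) + 1 for the original n
}
```

For example, worstCaseBinaryProbes(1000000) returns 20 and worstCaseBinaryProbes(15) returns 4.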
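Indexing with Binary Search Trees
• Keeping a list in sorted order so that we can perform binary searches is costly, which motivates representing the sorted list as a binary search tree instead: a new key is linked into the tree rather than shifted into place.
(Figure: the tree after adding the keys NP, MB, TM, LA, UF, ND, TS, NK.)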
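A minimal C++ sketch of an unbalanced binary search tree, assuming string keys and in-memory nodes (the names Node, bstInsert, and bstHeight are illustrative, not from the slides). Each node holds only a key and two child links. Inserting the keys NP MB TM LA UF ND TS NK in that order into an empty tree gives a height of 4, while inserting the same eight keys in sorted order degenerates into a chain of height 8, which is why balance matters.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// A node carries just a key and links to its left and right subtrees.
struct Node {
    std::string key;
    std::unique_ptr<Node> left, right;
    explicit Node(std::string k) : key(std::move(k)) {}
};

// Plain (unbalanced) insertion: walk down to an empty link and attach there,
// so the tree's shape depends entirely on the insertion order.
void bstInsert(std::unique_ptr<Node>& root, const std::string& key) {
    std::unique_ptr<Node>* link = &root;
    while (*link) {
        link = (key < (*link)->key) ? &(*link)->left : &(*link)->right;
    }
    *link = std::make_unique<Node>(key);
}

// Height counted in nodes: the worst-case number of nodes visited by a search
// (and of seeks, if every node sits in its own disk block).
int bstHeight(const std::unique_ptr<Node>& node) {
    if (!node) return 0;
    return 1 + std::max(bstHeight(node->left), bstHeight(node->right));
}
```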
AVL Trees
• Named in honor of the Russian mathematicians G. M. Adel’son-Vel’skii and E. M. Landis, who first defined them.
• An AVL tree is a height-balanced tree: a limit is placed on the difference allowed between the heights of any two subtrees sharing a common root.
• In an AVL tree, the maximum allowable difference is one.
• An AVL tree is therefore called a height-balanced 1-tree, or HB(1) tree.
• It is a member of a more general class of height-balanced trees known as HB(k), which are permitted to be k levels out of balance.
• The following tree has the AVL, or HB(1), property.
(Figure: tree built from the key sequence B C G E F D A.)
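The slides do not show how the HB(1) property is maintained during insertion. A common approach, used in this hypothetical C++ sketch (AvlNode, avlInsert, and isHB1 are illustrative names), is to rebalance with single and double rotations on the way back up from each insertion. Inserting B C G E F D A in that order yields a tree rooted at E with a height of 4 (counting nodes), in which no two subtrees sharing a common root differ in height by more than one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <memory>
#include <string>
#include <utility>

struct AvlNode {
    char key;
    int height = 1;  // height, in nodes, of the subtree rooted here
    std::unique_ptr<AvlNode> left, right;
    explicit AvlNode(char k) : key(k) {}
};

int h(const std::unique_ptr<AvlNode>& n) { return n ? n->height : 0; }
void update(AvlNode& n) { n.height = 1 + std::max(h(n.left), h(n.right)); }

// Single rotations: the child on one side becomes the new subtree root.
std::unique_ptr<AvlNode> rotateRight(std::unique_ptr<AvlNode> n) {
    std::unique_ptr<AvlNode> l = std::move(n->left);
    n->left = std::move(l->right);
    update(*n);
    l->right = std::move(n);
    update(*l);
    return l;
}

std::unique_ptr<AvlNode> rotateLeft(std::unique_ptr<AvlNode> n) {
    std::unique_ptr<AvlNode> r = std::move(n->right);
    n->right = std::move(r->left);
    update(*n);
    r->left = std::move(n);
    update(*r);
    return r;
}

// Recursive insertion that restores the HB(1) property on the way back up.
std::unique_ptr<AvlNode> avlInsert(std::unique_ptr<AvlNode> n, char key) {
    if (!n) return std::make_unique<AvlNode>(key);
    if (key < n->key) n->left = avlInsert(std::move(n->left), key);
    else n->right = avlInsert(std::move(n->right), key);
    update(*n);
    int balance = h(n->left) - h(n->right);
    if (balance > 1) {                                  // left side too tall
        if (h(n->left->right) > h(n->left->left))       // left-right case
            n->left = rotateLeft(std::move(n->left));
        return rotateRight(std::move(n));
    }
    if (balance < -1) {                                 // right side too tall
        if (h(n->right->left) > h(n->right->right))     // right-left case
            n->right = rotateRight(std::move(n->right));
        return rotateLeft(std::move(n));
    }
    return n;
}

// True if every pair of subtrees sharing a root differs in height by <= 1.
bool isHB1(const std::unique_ptr<AvlNode>& n) {
    if (!n) return true;
    return std::abs(h(n->left) - h(n->right)) <= 1 && isHB1(n->left) && isHB1(n->right);
}
```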
Paged Binary Trees
• Disk utilization of a binary search tree is extremely inefficient: when we read a node, it contains only three useful pieces of information, the key value and the addresses of the left and right subtrees.
• A disk read transfers far more data than one node occupies, so most of the data read from the disk is wasted. Since disk accesses are the critical factor in the cost of searching, this is a waste we cannot afford.
• A paged binary tree attempts to address the problem by locating multiple binary nodes on the same disk page.
• We then do not incur the cost of a disk seek just to get a few bytes.
• Once we take the time to seek to an area of the disk, we read an entire page from the file.
• Paging is a potential solution to the inefficient disk utilization of binary search trees.
• By dividing a binary tree into pages and then storing each page in a block of contiguous locations on disk, we should be able to reduce the number of seeks associated with any search.
• Paging therefore has the potential to result in faster searching on secondary storage.
• In the paged tree of this example, we are able to locate any of the 63 nodes in the tree with no more than two disk accesses.
• Every page holds 7 nodes and can branch to eight new pages.
• If we extend the tree by one more level, we add 64 new pages and can find any one of 511 nodes in only three seeks.
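The arithmetic above generalizes. If every page holds a complete binary subtree of k nodes, it has k + 1 references to child pages, so level i of pages contains $(k+1)^{i-1}$ pages, and d levels of pages hold $k\sum_{i=0}^{d-1}(k+1)^i = (k+1)^d - 1$ nodes, each reachable in at most d seeks. A short C++ sketch of that count (nodesReachable is a hypothetical name):

```cpp
#include <cassert>
#include <cstdint>

// Number of nodes a paged binary tree can hold when each page stores a
// complete binary subtree of nodesPerPage nodes and a search may use at most
// `seeks` page reads: (nodesPerPage + 1)^seeks - 1.
std::uint64_t nodesReachable(std::uint64_t nodesPerPage, int seeks) {
    assert(seeks >= 1);
    const std::uint64_t branching = nodesPerPage + 1;  // child pages per page
    std::uint64_t total = 1;
    for (int i = 0; i < seeks; ++i) total *= branching;
    return total - 1;
}
```

With 7 nodes per page this gives 63 nodes for two seeks and 511 for three; with 511 nodes per page, three seeks already reach 134,217,727 nodes.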
Problems with paged trees
• Inefficient disk usage: in the previous tree there are seven nodes per page. Of the 14 reference fields in a single page, 6 point to nodes within the same page, so we are using 14 reference fields to distinguish among only 8 subtrees. Space is still wasted.
• How do we build a paged tree? To build a well-balanced paged tree directly, we need the complete sorted list of keys in advance.
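When the whole sorted key list is available in advance, a balanced binary tree can be built by making the middle key the root of every subtree; complete three-level subtrees of the result can then be cut out as 7-node pages. This C++ sketch (TreeNode, buildBalanced, and treeHeight are hypothetical names) shows only that construction; the catch the slide points to is that it needs the full sorted list up front.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct TreeNode {
    int key;
    std::unique_ptr<TreeNode> left, right;
};

// Builds a balanced binary search tree from the sorted range keys[lo, hi)
// by making the middle key the root and recursing on each half.
std::unique_ptr<TreeNode> buildBalanced(const std::vector<int>& keys,
                                        std::size_t lo, std::size_t hi) {
    if (lo >= hi) return nullptr;
    const std::size_t mid = lo + (hi - lo) / 2;
    auto node = std::make_unique<TreeNode>();
    node->key = keys[mid];
    node->left = buildBalanced(keys, lo, mid);
    node->right = buildBalanced(keys, mid + 1, hi);
    return node;
}

// Height counted in nodes.
int treeHeight(const std::unique_ptr<TreeNode>& n) {
    return n ? 1 + std::max(treeHeight(n->left), treeHeight(n->right)) : 0;
}
```

With 63 sorted keys the result is a complete tree of height 6: its top three levels form one 7-node page and the eight subtrees below form eight more, matching the 63-node, two-seek example.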
B-Trees:
• Create a B-Tree for the following elements
An object oriented representation of B-Trees
Class BTree: Supporting Files of B-Tree Nodes

• Class BTree uses in-memory BTreeNode objects, adds the file access portion, and enforces a consistent size for the nodes.
• The following code defines class BTree.
Searching in a B-Tree
• Characteristics of most B-tree algorithms:
1. They are iterative.
2. They work in two stages, operating alternately on entire pages (class BTree) and then within pages (class BTreeNode).
• The search procedure is iterative: it loads a page into memory, searches through the page, and then looks for the key at successively lower levels of the tree until it reaches the leaf level.
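A minimal in-memory C++ sketch of this two-stage search, assuming the variant in which every key appears at the leaf level and each interior entry holds the largest key of the corresponding child (consistent with the UpdateKey and LargestKey calls in the insertion code). Page, PagedFile, and btreeSearch are hypothetical stand-ins for the slides' BTreeNode and BTree classes; page reads are counted rather than performed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One page: sorted keys plus, for interior pages, child page addresses.
// In an interior page, keys[i] is the largest key in the subtree children[i].
struct Page {
    bool isLeaf;
    std::vector<int> keys;
    std::vector<int> children;  // empty for leaves
};

// Hypothetical stand-in for the index file: pages are addressed by position
// and every read is counted as one disk access.
struct PagedFile {
    std::vector<Page> pages;
    int reads = 0;
    const Page& read(int addr) { ++reads; return pages[addr]; }
};

// Iterative search: read one page per level (stage 1), search inside it
// (stage 2), and descend until the leaf level is reached.
bool btreeSearch(PagedFile& file, int rootAddr, int key) {
    int addr = rootAddr;
    while (true) {
        const Page& page = file.read(addr);
        // First key >= the search key, found by binary search within the page.
        auto it = std::lower_bound(page.keys.begin(), page.keys.end(), key);
        if (it == page.keys.end()) return false;  // larger than every key
        if (page.isLeaf) return *it == key;
        const std::size_t i = static_cast<std::size_t>(it - page.keys.begin());
        addr = page.children[i];
    }
}
```

For instance, with leaves {1, 2, 3}, {4, 5, 6}, {7, 8, 9} under a root {3, 6, 9}, searching for 5 reads exactly two pages, the root and then the middle leaf.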
Insertion
• There are two important observations we can make about the insertion, splitting, and promotion process:
– It begins with a search that proceeds all the way down to the leaf level.
– After the insertion location is found at the leaf level, the work of insertion, overflow detection, and splitting proceeds upward from the bottom.
• The first operation in method Insert is to search from the root down to the leaf level for the key, using FindLeaf:
thisNode = FindLeaf(key);
• The next step is to insert the key into the leaf node:
result = thisNode->Insert(key, recAddr);
• When overflow is detected, the node must be split into two nodes using the following code:
newNode = NewNode();
thisNode->Split(newNode);
Store(thisNode);
Store(newNode);
• The next step is to update the parent node. Since the largest key in thisNode has changed, method UpdateKey is used to record the change:
parentNode->UpdateKey(largestKey, thisNode->LargestKey());
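A minimal in-memory C++ sketch of these leaf-level steps, assuming a leaf is a sorted vector of int keys holding at most maxKeys entries and that each parent entry holds the largest key of its child. Leaf, InsertResult, insertIntoLeaf, and updateParent are hypothetical names; unlike the slides' code, nothing is stored to a file and record addresses are omitted. Promoting the new node's largest key into the parent is the promotion step; a parent that overflows in turn would be split the same way.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical leaf: keys kept in sorted order.
struct Leaf {
    std::vector<int> keys;
    int largestKey() const { return keys.back(); }  // leaf must be non-empty
};

// Outcome of inserting into a leaf: if it overflowed, the new right sibling.
struct InsertResult {
    bool split = false;
    Leaf newNode;
};

InsertResult insertIntoLeaf(Leaf& leaf, int key, std::size_t maxKeys) {
    auto pos = std::lower_bound(leaf.keys.begin(), leaf.keys.end(), key);
    leaf.keys.insert(pos, key);                       // keep keys sorted
    InsertResult result;
    if (leaf.keys.size() <= maxKeys) return result;  // no overflow
    // Overflow: move the upper half of the keys into a new node.
    const auto half = static_cast<std::ptrdiff_t>(leaf.keys.size() / 2);
    result.split = true;
    result.newNode.keys.assign(leaf.keys.begin() + half, leaf.keys.end());
    leaf.keys.erase(leaf.keys.begin() + half, leaf.keys.end());
    return result;
}

// parentKeys[i] is the largest key in child i. Record the changed largest
// key of thisNode (like UpdateKey), then promote the new node's largest key.
void updateParent(std::vector<int>& parentKeys, int oldLargest,
                  const Leaf& thisNode, const InsertResult& r) {
    auto it = std::find(parentKeys.begin(), parentKeys.end(), oldLargest);
    assert(it != parentKeys.end());
    *it = thisNode.largestKey();
    if (r.split) {
        const int promoted = r.newNode.largestKey();
        parentKeys.insert(std::lower_bound(parentKeys.begin(), parentKeys.end(), promoted),
                          promoted);
    }
}
```

For example, inserting 25 into a full leaf {10, 20, 30, 40} with maxKeys = 4 and parent entries {40, 80} splits the leaf into {10, 20} and {25, 30, 40}, and the parent becomes {20, 40, 80}.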
Testing the B-Tree
Worst Case Search Depth
• It is important to understand the relationship between the page size of a B-tree, the number of keys to be stored in the tree, and the number of levels to which the tree can extend.
• Example: Suppose we want to store 1,000,000 keys and that, given the nature of the storage hardware and the size of the keys, it is reasonable to consider using a B-tree of order 512.
• In the worst case, what will be the maximum number of disk accesses required to locate a key in the tree? In other words, how deep will the tree be?
• We can answer this by noting that every key appears in the leaf level. Hence, we need to calculate the maximum height of a tree with 1,000,000 keys in the leaves.
• We use the formal definition of B-tree properties to calculate the minimum number of descendants that can extend from any level of a B-tree of some given order.
• The worst case occurs when every page of the tree has only the minimum number of descendants.
• In such a case, the keys are spread over a maximal height for the tree and a minimal breadth.
• For a B-tree of order $m$, the minimum number of descendants from the root page is 2, so the second level of the tree contains only 2 pages.
• Each of these pages, in turn, has at least $\lceil m/2 \rceil$ descendants.
• The third level then contains $2 \times \lceil m/2 \rceil$ pages.
• The general pattern of the relation between depth and the minimum number of descendants takes the following form:
Level | Minimum number of descendants
1 (root) | $2$
2 | $2 \times \lceil m/2 \rceil$
3 | $2 \times \lceil m/2 \rceil^2$
... | ...
d | $2 \times \lceil m/2 \rceil^{d-1}$
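For a B-tree of order $m$, the root has at least 2 descendants and every other page at least $\lceil m/2 \rceil$, so level $d$ has at least $2\lceil m/2 \rceil^{d-1}$ descendants. If every key appears at the leaf level and each leaf key counts as one descendant (a reference to a data record), a tree of depth $d$ holding $N$ keys must satisfy $N \ge 2\lceil m/2 \rceil^{d-1}$, i.e. $d \le 1 + \log_{\lceil m/2 \rceil}(N/2)$. This C++ sketch (function names are hypothetical) evaluates the closed form and also finds the largest whole depth by stepping through the levels.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Minimum number of descendants extending from level d (root = level 1)
// of a B-tree of order m: 2 * ceil(m/2)^(d-1).
std::uint64_t minDescendants(std::uint64_t m, int d) {
    const std::uint64_t half = (m + 1) / 2;  // ceil(m/2)
    std::uint64_t count = 2;
    for (int level = 1; level < d; ++level) count *= half;
    return count;
}

// Closed-form worst-case depth bound: 1 + log_{ceil(m/2)}(N / 2).
double depthBound(std::uint64_t m, std::uint64_t n) {
    const double half = static_cast<double>((m + 1) / 2);
    return 1.0 + std::log(n / 2.0) / std::log(half);
}

// Largest whole depth d whose minimal leaf level, 2 * ceil(m/2)^(d-1),
// can still hold n keys.
int maxDepth(std::uint64_t m, std::uint64_t n) {
    assert(m >= 3);  // ceil(m/2) must be at least 2 for the loop to end
    int d = 1;
    while (minDescendants(m, d + 1) <= n) ++d;
    return d;
}
```

For m = 512 and N = 1,000,000, depthBound returns about 3.37 and maxDepth returns 3, so in the worst case the tree is three levels deep and a key can be located in at most three disk accesses.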
Deletion, Merging and Redistribution
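1. Deleting C from the tree above requires no change other than removing C from its leaf.
2. Deleting P changes P to O in the second level and in the root, because P was the largest key in its leaf and O becomes the new largest key.
3. Deleting H causes an underflow, and the two leaf nodes are merged.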
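A minimal C++ sketch of the two underflow remedies, assuming leaves are sorted vectors of int keys, minKeys is the minimum occupancy allowed for a non-root page, and an underflowing node is repaired with its right sibling. The names Fix, removeKey, and fixUnderflow are hypothetical. Parent updates (replacing a changed largest key, as with P and O, or removing the entry of a merged sibling) are left to the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

enum class Fix { None, Redistributed, Merged };

// Removes key from a sorted leaf; returns false if the key is absent.
bool removeKey(std::vector<int>& leaf, int key) {
    auto it = std::lower_bound(leaf.begin(), leaf.end(), key);
    if (it == leaf.end() || *it != key) return false;
    leaf.erase(it);
    return true;
}

// Restores the minimum occupancy of `node` using its right sibling.
// Redistribution: if the sibling has keys to spare, share all keys evenly.
// Merging: otherwise move every sibling key into `node`, emptying the sibling.
Fix fixUnderflow(std::vector<int>& node, std::vector<int>& sibling, std::size_t minKeys) {
    assert(node.empty() || sibling.empty() || node.back() < sibling.front());
    if (node.size() >= minKeys) return Fix::None;
    if (sibling.size() > minKeys) {
        std::vector<int> all(node);
        all.insert(all.end(), sibling.begin(), sibling.end());
        auto mid = all.begin() + static_cast<std::ptrdiff_t>(all.size() / 2);
        node.assign(all.begin(), mid);
        sibling.assign(mid, all.end());
        return Fix::Redistributed;
    }
    node.insert(node.end(), sibling.begin(), sibling.end());
    sibling.clear();
    return Fix::Merged;
}
```

For example, with minKeys = 2, deleting 5 from {5, 10} next to a sibling {20, 30, 40} redistributes to {10, 20} and {30, 40}, whereas an underflowing {10} next to {20, 30} merges into {10, 20, 30}.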
