0% found this document useful (0 votes)
51 views

Indexing and Hashing: (Emphasis On B+ Trees)

This document discusses different indexing methods for databases, with an emphasis on B+ trees. It defines indexing concepts like pointers and index records. It explains index-sequential files and their disadvantages. It then describes how B+ tree indexes maintain efficiency during data insertion and deletion by using a balanced tree structure. It provides an example of a B+ tree and discusses search, insertion, and deletion processes. It compares B+ and B- trees and binary trees versus B-trees.

Uploaded by

Goutham Tvs
Copyright
© Attribution Non-Commercial (BY-NC)
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
51 views

Indexing and Hashing: (Emphasis On B+ Trees)

This document discusses different indexing methods for databases, with an emphasis on B+ trees. It defines indexing concepts like pointers and index records. It explains index-sequential files and their disadvantages. It then describes how B+ tree indexes maintain efficiency during data insertion and deletion by using a balanced tree structure. It provides an example of a B+ tree and discusses search, insertion, and deletion processes. It compares B+ and B- trees and binary trees versus B-trees.

Uploaded by

Goutham Tvs
Copyright
© Attribution Non-Commercial (BY-NC)
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 23

Indexing and Hashing

(emphasis on B+ trees)

By Huy Nguyen
Cs157b
0900-1015 TR
Lee, Sin-Min
Storage
• We have been studying database languages
and queries and have yet to study how data is
stored.

• We will look at how data is accessed and the


different Index methods used.

• Different types index methods have good and


bad qualities which I will try to address.
Indices
• Indices is used to look up, input and delete
data in a ordered manner.

• Speed and efficiency is a main goal in the


different types of Indexing.

• Speed and efficiency includes access


type, access time, insertion time, deletion
time, and space overhead.
Definitions
• Pointer- identifies the disk block and offset
in the disk block.

• Index record- holds search key value and


pointers to the records with the value.

• Clustering index- an index whose search


key also defines the sequential order of
the file.
Index-Sequential File Organization
• Index-Sequential files are files (which holds
information for data) ordered sequentially on a
search key.

• Main disadvantage is that performance


degrades as file size grows for lookups and
sequential scans.

• Degradation can be fixed with reorganization of


the file. Reorganization require lot of overhead
space so frequent reorganization is undesirable.
B+ Tree Index Files
• Index-Sequential file organization is used
but not as much as B+ index structure.

• B+ Tree indexing maintain efficiency


despite insertion and deletion of data.

• They use idea of a balance tree in which


every path from the root of the tree to a
leaf is of the same length.
B+ Tree Index Files(cont’d)
• In a B+ Tree data structures, each node
corresponds to a disk block.

• Each node is kept between half-full and


completely full.

• A node in a B+- tree has n-1 search key


values K1, K2….Kn-1 and n pointers P1,
P2…Pn.
B+ Tree Index Files(cont’d)
• Search-key values are kept in sorted
order.

• Leaf nodes store the records instead of


pointers to records.

• Search-key values are kept in sorted order


B+ Tree Index Files(cont’d)
• Pointer P points to a file record with a
search key value of a K.

• Each leaf holds up to n-1 values .

• A non-leaf node can hold up to n pointers


and must hold n/2 ceiling pointers.
B+ Tree Index Files(cont’d)
• Number of pointers in a node is called fan
out of a node.

• The root must hold at least 2 pointers but


can have less than n/2.
Example of B+-Tree of Order 4 (with
L = 5)

41 55 75

22 30 35 45 49 60 67 87 91 100

12 22 30 35 41 45 49 55 60 67 75 87 91 100
15 24 32 37 43 46 51 57 62 72 77 88 94 110
20 27 34 38 44 48 53 58 64 74 80 90 97 112
28 39 54 59 65 83 99 114
29 66 85 120
Search in a B+ Tree
• Search: Start at root; use key comparisons
to go to leaf.
Inserting a Data Entry into a B+
Tree
1) Find correct leaf node
2) Add index entry to the node
3) If enough space, done!
4) Else, split the node
Redistribute entries evenly between the current
node and the new node
5) Insert <middle key, ptr to new node> to the
parent
6) Go to Step 3
After
After
Deleting a Data Entry from a B+
Tree
1) Find correct leaf node
2) Remove the entry from the node
3) If the node is at least half full, done!
4) Else, possibly borrow some entries from a
sibling
5) If not possible, merge the node with the sibling
6) Delete the separator between the node and the
sibling from the parent node
7) Go to Step 3
After
Binary Trees VS. BTrees
• Binary tree only have 2 children max.

• For large files binary tree will be too high because of


the limit of children and not enough keys per records.

• Btrees disk size can have many children depending


on the disk block.

• Btrees are more realistic for indexing files because


they easily maintain balance and can store many
keys in only a few records.
B+ VS. B- Trees
• B+ trees store redundant search key values
because index is smaller.

• In a B+ tree, all pointers to data records exists at


the leaf-level nodes.

• B-tree eliminates redundancy but require


additional pointers to do so.

• In a B-tree, pointers to data records exist at all


levels of the tree.
References
• http://www.cs.sjsu.edu/~lee/cs157b/cs157
b.html
• http://www.cs.ualberta.ca/~drafiei/291/note
s/6-tree-indexes.pdf
• A. Silberschatz, H.F. Korth, S. Sudershan:
Database System Concepts, 5th ed.,
McGraw-Hill, 2006

You might also like