0% found this document useful (0 votes)
55 views46 pages

B Tree

The document discusses the structure and properties of B+ trees, which are commonly used dynamic multi-level index structures for databases. B+ trees allow for efficient insertion, deletion, and search operations. The structure includes internal nodes that link to child nodes and leaf nodes that store data records.

Uploaded by

Shreya Das
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
55 views46 pages

B Tree

The document discusses the structure and properties of B+ trees, which are commonly used dynamic multi-level index structures for databases. B+ trees allow for efficient insertion, deletion, and search operations. The structure includes internal nodes that link to child nodes and leaf nodes that store data records.

Uploaded by

Shreya Das
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 46

B+ Tree Index Structure

B+-Tree Index File


• So far, we have seen single level indexing (primary indexing, cluster based
indexing and secondary indexing) and also multilevel indexing for accessing the
records stored in disk
• ISAM files – Ordered files with a multilevel primary/clustering index
• Index sequential files are static structure
• Disadvantage of indexed-sequential files
• Performance degrades as file grows, since many overflow blocks get created
• Periodic reorganization of entire file is required
• If the files are dynamic, we need to go for dynamic multi-level index structures
based on B+ trees.
• B+ tree indices are alternative of to index sequential files
Advantages of B+ Tree
• Advantage of B+-tree index files:
• Unlike index sequential structure, B+ tree has many fan outs i.e. number of
pointers to child nodes of a node, which minimizes the number of I/O access
• Automatically reorganizes itself with small, local, changes
• Efficient for insertion, deletion and update and search
• Keep keys in sorted order for sequential traversing
• Use a hierarchical index to minimize the number of disk reads
• Reorganization of entire file is not required to maintain performance

• (Minor) disadvantage of B+-trees:


• Extra insertion and deletion overhead, space overhead
• Advantages of B+-trees outweigh disadvantages, and they are used extensively.
Properties of B+ Tree
B+ tree is rooted tree satisfying following properties
• It is a generalization of BST
• Balanced search trees : All leaves are at the same level
• It has root, internal nodes and leaf nodes
• Root may be leaf or a node with two or more children
• Each node that is not a root or a leaf has between and n children
• A leaf node has between and n − 1 values

• Special cases: If the root is not a leaf, it has at least 2 children


If the root is a leaf (that is, there are no other nodes in the tree), it
can have between 0 and (n − 1) values
Properties of B+ Tree

• B+ tree is an extension of B tree. B+ tree has index page and data


page. In B tree, keys/ indices and records both are stored in leaf nodes
as well as internal nodes. That’s why the fan-out in B-trees is less
• However, in B+ tree, internal nodes contain the search keys/index and
leaf nodes store the data records
• All leaf nodes are linked up as a singly linked list to make the search
operation more efficient
• Structure of leaf and non-leaf nodes is different in B+ tree, while, it is
same in B tree.
Properties of B+ Tree
• B+ tree use fill factor to control growth and shrinkage. Minimum fill factor is
50%
• Support both random and sequential access of records

Root
40

20 33 51 63

10 15 20 27 33 37 40 46 51 55 63 97
Properties of B+ Tree

• Order of B+ tree is number of children of a node


• Order of internal node
The maximum number of tree pointers held in it
Maximum of (n-1) keys can be present in an internal node
• Order of leaf node
The maximum number of record pointers held in it
It is equal to the number of keys in a leaf node
Node Structure
• Typical node

• Ki are the search-key values


• Pi are pointers to children (for non-leaf nodes) or pointers to records or
buckets of records (for leaf nodes).
• The search-keys in a node are ordered
K1 < K2 < K3 < . . . < Kn–1
(assume no duplicate keys)

• Node structure order =3 P1 K1 P2 K2 P3


Leaf Nodes in B+ tree

• For i = 1, 2, . . ., n–1, pointer Pi points to a file record with search-key value Ki,
• If Li, Lj are leaf nodes and i < j, Li’s search-key values are less than or equal to
Lj’s search-key values
• Pn points to next leaf node in search-key order
Leaf Nodes in B+ tree

P1 K1 P2 K2 P3

Record of K2
Record of K1
Record of K2
Non Leaf Nodes in B+ tree

• Non leaf nodes form a multi-level sparse index on the leaf


nodes. For a non-leaf node with n pointers:
• All the search-keys in the subtree to which P1 points are less
than K1
• For 2  i  n – 1, all the search-keys in the subtree to which
Pi points have values greater than or equal to Ki–1 and less
than Ki
• All the search-keys in the subtree to which Pn points have
values greater than or equal to Kn–1
Non Leaf Nodes in B+ tree

P1 K1 P2 K2 P3

S1 S2 S3
Example of B+ Tree

• Order n =6
• Leaf nodes must have between 3 and 5 values
((n-1/2 ) and n –1, with n = 6
• Non-leaf nodes other than root must have between 3 and 6 children
(n/2 and n with n =6
• Root must have at least 2 children
Observations about B+ tress

• Since the inter-node connections are done by pointers, there is no


assumption that in the B+-tree, the “logically” close blocks are
“physically” close.
• The non-leaf levels of the B+-tree form a hierarchy of sparse indices.
• The B+-tree contains a relatively small number of levels (logarithmic
in the size of the main file), thus searches can be conducted efficiently.
• Insertions and deletions to the main file can be handled efficiently, as
the index can be restructured in logarithmic time
Searching in B+ tree

Searching in B+ tree
• Search records which has the search key value k
• Examine the root and finding the smallest search key (k1) in the node that
is greater than k
• Follow the pointer P1 for the corresponding k1 until reach to the leaf node
• If k is equal to key in node then follow the corresponding pointer to
access the record.
B+ Tree Insertion (1/4)
• Find the appropriate leaf node in which search key would appear
• If node contains key, do nothing. Insert the record with the key into file and add to
bucket a pointer to file
• Otherwise, insert the key in the node (if not full) and position it so that search key
values are sorted in the node and then add the record to file

• Example : order = 6, no. of keys = 5


10 13 14 15  

10 12 13 14 15
• Insert 12
B+ Tree Insertion (2/4)
• If node is full or overflow, i.e. when number of search key values
exceeds n-1. Do the following steps

• Insert 11:
10 11 12 13 14 15

Node is overflow as the maximum number of keys= 5


B+ Tree Insertion (3/4)
If overflow happens in leaf node
• Split the node into two nodes
• 1st node contains ceil((n-1)/2) values ( n: number of pointer)
• 2nd node contains remaining values
• Copy the smallest search value of the 2nd node to parent node

13        

10 11 12     13 14 15    
B+ Tree Insertion (4/4)
If overflow happens in non-leaf node
• Split the node into two nodes
• 1st node contains (ceil(n/2) -1) values ( n: number of pointer/children)
• Move the smallest of remaining value , together with pointer, to parent
node
• 2nd node contains remaining values
12        

10 11       13 14  15    
Insert 1, 3, 5, 7, 9, 2, 4, 6, 8, 10
• Insert
1
1

28
Insert 1, 3, 5, 7, 9, 2, 4, 6, 8, 10
1

• Insert 3,
5

29
Insert 1, 3, 5, 7, 9, 2, 4, 6, 8, 10
1

• Insert 3,
5
1 3 5

30
Insert 1, 3, 5, 7, 9, 2, 4, 6, 8, 10
1 3 5

• Insert
7

31
Insert 1, 3, 5, 7, 9, 2, 4, 6, 8, 10
1 3 5

• Insert
7

1 3 5 7

32
Insert 1, 3, 5, 7, 9, 2, 4, 6, 8, 10
5

1 3 5 7

• Insert
9

33
Insert 1, 3, 5, 7, 9, 2, 4, 6, 8, 10
5

1 3 5 7

• Insert
9
5

1 3 5 7 9

34
Insert 1, 3, 5, 7, 9, 2, 4, 6, 8, 10
5

1 3 5 7 9

• Insert
2

35
Insert 1, 3, 5, 7, 9, 2, 4, 6, 8, 10
5

1 3 5 7 9

• Insert
2
5

1 2 3 5 7 9

36
Insert 1, 3, 5, 7, 9, 2, 4, 6, 8, 10
5

1 2 3 5 7 9

• Insert
4

37
Insert 1, 3, 5, 7, 9, 2, 4, 6, 8, 10
5

1 2 3 5 7 9

• Insert
3 5
4
1 2 3 4 5 7 9

38
Insert 1, 3, 5, 7, 9, 2, 4, 6, 8, 10
3 5

1 2 3 4 5 7 9

• Insert
6
3 5 7

1 3 5 7
2 4 6 9

40
Insert 1, 3, 5, 7, 9, 2, 4, 6, 8, 10
3 5 7

1 2 3 4 5 6 7 9

• Insert
8
3 5 7

1 2 3 4 5 6 7 8 9

42
Insert 1, 3, 5, 7, 9, 2, 4, 6, 8, 10
3 5 7

1 2 3 4 5 6 7 8 9

• Insert
10 7

3 5 9

7 8 9 10

1 2 3 4 5 6
44
• Example : To calculate of order p of B+ tree

• Search key field is V = 9 bytes . Block size is B = 512 bytes . Record pointer is Pr = 7 bytes . Block
pointer/tree pointer is P = 6 bytes
• An internal node can have up to p tree pointers and p-1 key fields, these must fit in a single block . Thus,
p * P + (p-1) * V ≤ B
p*6+(p-1)9 ≤ 512 29
15p ≤ 521
p ≤ 34 for intermediate nodes
• The leaf nodes have the same number of values and pointers, except they are data pointers and next
pointer.
• The order of pleaf can be calculated as follows:
Pleaf * (Pr+V) + P ≤ 512
Pleaf * (7+9) + 6 ≤ 512
Pleaf * 16 ≤ 512 - 6
Pleaf * 16 ≤ 506
Pleaf ≤ 31
• Construct a B+ tree on previous example
• Assume each node is 69% full
• p = 34, pleaf = 31
• On the average, each internal node have 0.69 * 34 ~ 23 pointers and 22 key values
• On the average leaf node has 0.69 * 31 ~ 21 data record pointers

• Root 1 node 22 key entries (22+1=23) ptrs


• Level1 23odes 23*22 (506) (506+23=529) ptrs
• Level2 23*23(529) 528*22(11638) (11638+529=12167)
• Leaf 529*23(12167) 12167*21 (255507) data record ptrs
Numerical on Secondary Index
In a file, number of records r= 30,000
Each record is fixed length with size= 100 bytes. Block size= 1024 bytes
Blocking factor (bfr) = flooring (1024/100)= 10 records/ block
Number of required blocks for r = ceiling (30000/10)= 3000
If we perform linear search on data file , it will take 3000/2= 1500 block access on average
Let we use secondary index on a non key attribute of size 9 bytes and index pointer is 6 bytes long ,
so each index record is of (9+6)= 15 bytes long
Thus blocking factor is = flooring (1024/15)= 68 entries/block
Since secondary index is dense, so number of index entries = number of records
No of required blocks in index structure = ceiling (30000/ 68) =442 blocks
If we do perform binary search then it would require ceiling (log(442))= 9 blocks access
To search a record using the index we need one additional block access in data file
Thus total block access is 10
Numerical on Multi level Index
• In multilevel index, the index file is the first level (base) of multilevel index, as an
ordered file with distinct values for each primary key.
• We can create a primary index for the first level is called the second level multilevel
index. The second level has one entry for each block, as it is a primary index (we can
use block anchors)
• The process repeats for 3rd level and so on until all entries fit in one disk block for
the top level
b1= number of blocks in first level= 442 blocks
b2= number of blocks in second level= ceiling (442/68) = 7 blocks
b3 = ceiling (7/ 68)= 1 block
So, three block access + 1 ( for record in data file)= 4 block access are needed

You might also like