0% found this document useful (0 votes)
17 views76 pages

B Trees

Uploaded by

kr gael
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
17 views76 pages

B Trees

Uploaded by

kr gael
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 76

B-trees

Eduardo Laber
David Sotelo
What are B-trees?
• Balanced search trees designed for secondary
storage devices

• Similar to AVL-trees but better at minimizing


disk I/O operations

• Main data structure used by DBMS to store


and retrieve information
What are B-trees?
• Nodes may have many children (from a few to
thousands)

• Branching factor can be quite large

• Every B-tree of n keys has height O(log n)

• In practice, its height is smaller than the height of


an AVL-Tree
B-trees and Branching Factor
Definition of B-trees
B-tree is a rooted tree containing the following
five properties:

1. Every node x has the following attributes:


a) x.n, the number of keys stored in node x
b) The x.n keys: x.key1 ≤ x.key2 ≤ ... ≤ x.keyx.n
c) The boolen x.leaf indicating if x is a leaf or an
internal node
Definition of B-trees
2. If x is an internal node it contains x.n + 1
pointers x.p1 , x.p2 , ... , x.p(x.n + 1) to its children

3. The keys x.keyi separate ranges of trees stored


in each subtree (x.pi , x.pi+1 )

4. All leaves have the same depth == tree’s


height.
Definition of B-trees
5. Bounds on the number of keys of a node:

• Let B be a positive integer representing the order of


the B-tree.

• Every node (except the root) must have at least B keys.

• Every node (except the root) must have at most 2B


keys.

• Root is free to contain between 1 and 2B nodes (why?)


Example of a B-tree
Exercise 1

Enumerate all valid B-trees of order 2


that represent the set {1, 2, ... , 8}
Exercise 1
Solution:

4 5

1 2 3 5 6 7 8 1 2 3 4 6 7 8

3 6

1 2 4 5 7 8
The height of a B-tree
Theorem : Let h be the height of a B-tree of n
keys and order B > 1. Then: h ≤ log B (n+1)/2

Proof:
• Root contains at least one key.
• All other nodes contain at least B keys
• At least one key at depth 0
• At least 2B keys at depth 1
• At least 2B2 + B keys at depth 2
• At least 2Bi + Bi-1 + Bi-2 + ... + B keys at depth i
Proof (continued)
h
n 1   2 B i

i 1
h 1
B 1
 n 1  2
B 1
h n 1 h
 n 2 B  1  B
2
n 1
 h log B ■
2
Searching a B-tree
• Similar to searching a binary search tree.

• Multiway branching decision according to the


number of the node’s chidren.

• Recursive procedure with a time complexity of


O(B log B n) for a tree of n keys and order B.
Searching a B-tree
B-TREE-SEARCH (x, k)
1 i=1
2 while i ≤ x.n and k > x.keyi do i = i + 1
3 if i ≤ x.n and k == x.keyi then return (x, i)
4 if x.leaf then return NIL
5 else
DISK-READ(x.pi)
return B-TREE-SEARCH (x.pi , k)
Searching a B-tree
• Search for the key F

C G K P S

A B D E F H I L M N O Q R T U
Searching a B-tree
• Search for the key F

C G K P S

A B D E F H I L M N O Q R T U
Searching a B-tree
• Search for the key F

C G K P S

A B D E F H I L M N O Q R T U
Searching a B-tree
• Search for the key N

C G K P S

A B D E F H I L M N O Q R T U
Searching a B-tree
Lemma: The time complexity of procedure
B-TREE-SEARCH is O(B log B n)

Proof:
• Number of recursive calls is equal to tree’s
height.
• The height of a B-tree is O(log B n)
• Cost between B and 2B iterations per call.
• Total of O(B log B n) steps. ■
Exercise 2
• Suppose that B-TREE-SEARCH is implemented
to use binary search rather than linear search
within each node.

• Show that this changes makes the time


complexity O(lg n), independently of how B
might be chosen as a function of n.
Exercise 2
Solution:
• By using binary search the number of steps of
the algorithm becomes O(lg B log B n) .

• Observe that log B n = lg n / lg B .

• Therefore O(lg B log B n) = O(lg n).


Linear or Binary B-tree search ?
Lemma: If 1 < B < n then lg n ≤ B logB n

Proof: lg n B
B
B log B n log B n 
lg B
B

lg n  B 
n B  B n B / B
 B
   
B
 lg n  B lg n / B lg n / n B

 lg n B
/ n lg n lg n
B 1
Inserting a key into a B-tree
• The new key is always inserted into an existing leaf node (why?)

• Firstly we search for the leaf position at which to insert the new
key.

• If such a node is full we split it.

• A split operation splits a full node around its median key into
two nodes having B keys each.

• Median key moves up into splitted node’s parent


(insertion recursive call).
Split operation
• Inserting key F into a full node (B = 2)

A C E G K M O Q
Split operation
• Node found but already full

A C E FG K M O Q
Split operation
• Median key identified

A C E FG K M O Q
Split operation
• Splitting the node

E J

A C F G K M O Q
Inserting a key into a B-tree
• Insertion can be propagated upward (B = 2)

E J T X

A C F G K M O Q UW Y Z
Inserting a key into a B-tree
• Insertion can be propagated upward (B = 2)

E J T X

A C F G K MNO Q UW Y Z
Inserting a key into a B-tree
• Insertion can be propagated upward (B = 2)

E J N T X

A C F G K M O Q UW Y Z

SPLIT
Inserting a key into a B-tree
• Insertion can be propagated upward (B = 2)

E J SPLIT T X

A C F G K M O Q U W Y Z
Inserting a key into a B-tree
B-TREE-INSERT (x, k, y)
1 i=1
2 while i ≤ x.n and k < x.keyi do i = i + 1
3 x.n = x.n + 1
4 x.keyi = k
5 x.pi+1 = y
6 for j = x.n downto i+1 do
7 x.keyj = x.keyj-1
8 x.pj = x.pj-1
9 end-for
10 DISK-WRITE(x)
Inserting a key into a B-tree
B-TREE-INSERT (x, k)
11 if x.n > 2*B then
12 [m, z] = SPLIT (x)
13 if x.parent != NIL then
14 DISK-READ (x.parent)
15 end-if
16 else
17 x.parent = ALLOCATE-NODE()
18 DISK-WRITE (x)
19 root = x.parent
20 end-else
21 B-TREE-INSERT (x.parent, m, z)
22 end-if
Inserting a key into a B-tree
SPLIT (x)
1 z = ALLOCATE-NODE()
2 m = FIND-MEDIAN (x)
3 COPY-GREATER-ELEMENTS(x, m, z)
4 DISK-WRITE (z)
5 COPY-SMALLER-ELEMENTS(x, m, x)
6 DISK-WRITE (x)
7 return [m, z]
Inserting a key into a B-tree
• Function B-TREE-INSERT has three arguments:
– The node x at which an element of key k should be inserted
– The key k to be inserted
– A pointer y to the left child of k to be used as one of the
pointers of x during insertion process.

• There is a global variable named root which is a pointer to the root


of the B-Tree.

• Observe that the field x.parent was not defined as an original B-


tree attribute, but is considered just to simplify the process.

• The fields x.leaf should also be updated accordingly.


Inserting a key into a B-tree
Lemma: The time complexity of B-TREE-INSERT is O(B log B n)
Proof:
• Recall that B-TREE-SEARCH function is called first and costs
O(log n) by using binary search. Then, B-TREE-INSERT starts by
visiting a node and proceeds upward.

• At most one node is visited per level/depth and only visited nodes
can be splitted. A most one node is created during the insertion
process. Cost for splitting is proportional to 2B.

• Number of visited nodes is equal to tree’s height and the height of


a B-tree is O(log B n). Cost between B and 2B iterations per visited
node. Total of O(B log B n) steps. ■
Some questions on insertion
• Which split operation increases the tree’s height?
The split of the root of the tree.

• How many DISK-READ operations are executed by the


insertion algorithm?
Every node was read at least twice.

• Does binary search make sense here?


Not exactly. We already pay O(B) to split a node (for
finding the median).
Drawbacks of our insertion method
• Once that the key’s insertion node is found it may be
necessary to read its parent node again (due to splitting).

• DISK-READ/WRITE operations are expensive and would be


executed al least twice for each node in the key’s path.

• It would be necessary to store a nodes’s parent or to use


the recursion stack to keep its reference.

• (Mond and Raz, 1985) provide a solution that spends one


DISK-READ/WRITE per visited node (See at CLRS)
Exercise 3

• Show the results of inserting the keys


E, H, B, A, F, G, C, J, D, I
in order into an empty B-tree of order 1.
Exercise 3
Solution: (final configuration)
E

B G I

A C D F H J
Exercise 4

• Does a B-tree of order 1 is a good choice for a balanced


search tree?

• What about the expression h ≤ log B (n+1)/2 when B =


1?
Deleting a key from a B-tree
• Analogous to insertion but a little more complicated.

• A key can be deleted from any node (not just a leaf) and
can affect its parent and its children (insertion operation
just affect parents).

• One must ensure that a node does not get to small


during deletion (less than B keys).

• As a result deleting a node is the most complex operation


on B-trees. It will be considered in 4 particular cases.
Deleting a key from a B-tree

• Case 1: The key is in a leaf node with more


than B elements.

Procedure:
 Just remove the key from the node.
Deleting a key from a B-tree
• Case 1: The key is in a leaf node with more
than B elements (B = 2)

E J T X

A C D F G K M O Q U W Y Z
Deleting a key from a B-tree
• Case 1: The key is in a leaf node with more
than B elements (B = 2)

E J T X

A D F G K M O Q U W Y Z
Deleting a key from a B-tree
Case 2: The join procedure
• The key k1 to be deleted is in a leaf x with exactly B elements.
• Let y be a node that is an “adjacent brother” of x.
• Suppose that y has exactly B elements.
Procedure:
 Remove the key k1.
 Let k2 be the key that separates nodes x and y in their parent.
 Join the the nodes x and y and move the key k2 from the parent
to the new joined node.
 If the parent of x becomes with B-1 elements and also has an
“adjacent brother” with B elements, apply the join procedure
recursively for the parent of x (seen as x) and its adjacent
brother (seen as y).
Deleting a key from a B-tree
• Case 2: Delete key Q (B = 2)

K T X

...
H I O Q U W Y Z
Deleting a key from a B-tree
• Case 2: Delete key Q (B = 2)

F
Parent

K T X

...
H I O Q U W Y Z
Node x Node y
Deleting a key from a B-tree
• Case 2: Delete key Q (B = 2)

F
Parent

K T X

...
H I O U W Y Z
Node x Node y
Deleting a key from a B-tree
• Case 2: Delete key Q (B = 2)

F
Parent

K T X

...
H I O U W Y Z
Node x Node y
Deleting a key from a B-tree
• Case 2: Delete key Q (B = 2)

F
Parent

K X

...
H I O T U W Y Z
Join
Deleting a key from a B-tree
• Case 2: Delete key Q (B = 2)

K X

...
H I O T U W Y Z
Deleting a key from a B-tree
Case 3: join and split
• The key k1 to be deleted is in a leaf x with exactly B elements.
• Let y be a node that is an “adjacent brother” of x.
• Suppose that y has more than B elements.
Procedure:
 Remove the key k1.
 Let k2 be the key that separates nodes x and y in their parent.
 Join the the nodes x and y and move the key k2 from the parent
to the new joined node z.
 Find the median key m of z
 Determine the new nodes x and y by splitting z around m.
 Insert m into the parent of x and y.
Deleting a key from a B-tree
• Case 3: Delete key F (B = 2)

E J T X

A C D F G K M O Q U W Y Z
Deleting a key from a B-tree
• Case 3: Delete key F (B = 2)

E J T X

A C D F G K M O Q U W Y Z
Deleting a key from a B-tree
• Case 3: Delete key F (B = 2)

E J T X

A C D G K M O Q U W Y Z
Deleting a key from a B-tree
• Case 3: Delete key F (B = 2)

N
Parent

E J T X

A C D G K M O Q U W Y Z
Node y Node x
Deleting a key from a B-tree
• Case 3: Delete key F (B = 2)

N
Parent

E J T X

A C D G K M O Q U W Y Z
Node y Node x
Deleting a key from a B-tree
• Case 3: Delete key F (B = 2)

J T X

Join
A C D E G
K M O Q U W Y Z
Deleting a key from a B-tree
• Case 3: Delete key F (B = 2)

J T X
Median key

A C D E G
K M O Q U W Y Z
Deleting a key from a B-tree
• Case 3: Delete key F (B = 2)

D J T X

Split

A C E G K M O Q U W Y Z
Deleting a key from a B-tree
• Case 3: Delete key F (B = 2)

D J T X

A C E G K M O Q U W Y Z
Deleting a key from a B-tree
Case 4: internal node
• The key k1 to be deleted is in a node x that is not a leaf or
a root.

Procedure:
 Let k2 be the smallest key that is greater than k1.
 Let y be the node of k2, which will be a leaf.
 Insert key k2 into x.
 Remove the key k1 from x.
 Solve now the problem of removing k2 from a leaf y,
previously considered.
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)

D J T X

A C E G K M O Q U W Y Z
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)

N
Node x

D J T X

A C E G K M O Q U W Y Z
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)

N
Node x

D J T X

A C E G K M O Q U W Y Z
Node y
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)

N
Node x

D J U X

A C E G K M O Q W Y Z
Node y
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)

N
Node x

D J U X

A C E G K M O Q W Y Z
Node y
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)

N
Node x

D J U X

A C E G K M O Q W Y Z
Node y
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)

N
Node x

D J U X

A C E G K M O Q W Y Z
Node y
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)

N
Node x

D J U

A C E G K M O Q W X Y Z
Node y
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)

N
Node x

D J U

A C E G K M O Q W X Y Z
Node y
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)

N
Node x

D J U

A C E G K M O Q W X Y Z
Node y
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)

N
Node x

D J U

A C E G K M O Q W X Y Z
Node y
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)

N
Node x

D J U

A C E G K M O Q W X Y Z
Node y
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)

D J N U

A C E G K M O Q W X Y Z

You might also like