
B-Trees with Functional and Imperative Implementation

Liu Xinyu

September 6, 2010

Abstract

B-tree is introduced in the book Introduction to Algorithms [1] as one of the advanced data structures. It is important to modern file systems, some of which are implemented based on the B+ tree, an extension of the B-tree. It is also widely used in many database systems. This post provides implementations of B-trees both in the imperative way described in [1] and in a functional way with a modify-and-fix approach. Multiple programming languages are used, including C++, Haskell, Python and Scheme/Lisp. There may be mistakes in the post; please feel free to point them out. This post is generated by LaTeX2e and provided under the GNU FDL (GNU Free Documentation License). Please refer to http://www.gnu.org/copyleft/fdl.html for details.

Keywords: B-Trees

Introduction

In the book Introduction to Algorithms, the B-tree is introduced through the problem of how to access large blocks of data on magnetic disks or other secondary storage devices [1]. B-trees are commonly used in databases and file systems. It is also helpful to understand the B-tree as a generalization of the balanced binary search tree [2].
Liu Xinyu Email: [email protected]

Figure 1 makes it easy to see the differences and similarities between a B-tree and a binary search tree.
Figure 1: An example of B-Tree

Let's review the definition of the binary search tree [3]. A binary search tree is either an empty node, or a node that contains 3 parts: a value, a left child which is a binary search tree, and a right child which is also a binary search tree. It satisfies the following constraint. All the values in the left child tree are less than the value of this node, and the value of this node is less than any value in its right child tree. For any node n, this constraint can be written as

$\forall x \in LEFT(n), \forall y \in RIGHT(n) \Rightarrow VALUE(x) < VALUE(n) < VALUE(y)$ (1)

If we extend this definition to allow multiple keys and children, we get the following definition. A B-tree is either an empty node, or a node that contains n keys and n+1 children, where each child is also a B-tree. We denote these keys and children as $key_1, key_2, ..., key_n$ and $c_1, c_2, ..., c_n, c_{n+1}$. Figure 2 illustrates a B-Tree node.

The keys and children in a node satisfy the following order constraints. Keys are stored in non-decreasing order, that is, $key_1 \leq key_2 \leq ... \leq key_n$. For each $key_i$, all values stored in child $c_i$ are no bigger than $key_i$, while all values stored in child $c_{i+1}$ are no less than $key_i$.

Figure 2: A B-Tree node, with children and keys laid out as C[1], K[1], C[2], K[2], ..., C[n], K[n], C[n+1]

These constraints can also be written as equation (2).

$\forall x_i \in c_i, i = 1, ..., n+1 \Rightarrow x_1 \leq key_1 \leq x_2 \leq key_2 \leq ... \leq x_n \leq key_n \leq x_{n+1}$ (2)

Finally, if we add some constraints to keep the tree balanced, we get the complete definition of B-tree. All leaves have the same depth. We define an integer $t \geq 2$ as the minimum degree of the B-tree. Each node can have at most 2t - 1 keys. Each node, except the root, must have at least t - 1 keys.

In this post, I'll first introduce how to generate B-trees with the insertion algorithm, and two different methods will be explained. One method is discussed in the book [1]. The other is a modify-and-fix approach, which is quite similar to the algorithm Okasaki used for red-black trees [4]; this method is also discussed in Wikipedia [2]. After that, I'll explain how to delete an element from a B-tree. In the last part, an algorithm for searching in a B-tree is also provided. This article provides example implementations in C, C++, Haskell, Python, and Scheme/Lisp. All source code can be downloaded as described in appendix 7; please refer to the appendix for detailed information about building and running it.
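The order constraint (2) and the balance rules can be checked mechanically. The following Python sketch is illustrative only. The `Node` dataclass and the `check_btree` function are hypothetical names that are not part of the implementations in this post, and an empty root is treated as a valid empty tree.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    keys: list = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def check_btree(root, t):
    """Return True if root obeys the order constraint (2) and the balance rules for degree t."""
    leaf_depths = set()

    def walk(node, lo, hi, depth, is_root):
        n = len(node.keys)
        # At most 2t - 1 keys; at least t - 1 keys unless the node is the root.
        if n > 2 * t - 1 or (not is_root and n < t - 1):
            return False
        # Keys in non-decreasing order ...
        if any(a > b for a, b in zip(node.keys, node.keys[1:])):
            return False
        # ... and inside the range [lo, hi] inherited from the parent's keys.
        if any((lo is not None and k < lo) or (hi is not None and k > hi)
               for k in node.keys):
            return False
        if not node.children:
            leaf_depths.add(depth)
            return True
        if len(node.children) != n + 1:  # n keys need n + 1 children
            return False
        bounds = [lo] + node.keys + [hi]
        return all(walk(c, bounds[i], bounds[i + 1], depth + 1, False)
                   for i, c in enumerate(node.children))

    # All leaves must also share one depth.
    return walk(root, None, None, 0, True) and len(leaf_depths) <= 1


print(check_btree(Node([4], [Node([1, 2]), Node([5, 6, 7])]), 2))  # True
print(check_btree(Node([4], [Node([1, 5]), Node([6])]), 2))        # False: 5 > 4 in the left child
```

The checker passes each child the pair of neighbouring parent keys as its allowed range, which is exactly the chain $x_1 \leq key_1 \leq x_2 \leq ...$ in equation (2).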

Definition

Like the binary search tree, a B-tree can be defined recursively. Because a node holds multiple keys and children, a collection container can be used to store them.

Definition of B-tree in C++


ISO C++ supports using a constant integral value as a template parameter. This feature can be used to define B-trees with different minimum degrees as different types.

    #include <vector>

    // t : minimum degree of B-tree
    template<class K, int t>
    struct BTree{
      typedef K key_type;
      typedef std::vector<K> Keys;
      typedef std::vector<BTree*> Children;

      BTree(){}

      // Recursively release all children.
      ~BTree(){
        for(typename Children::iterator it=children.begin();
            it!=children.end(); ++it)
          delete (*it);
      }

      bool full(){ return keys.size() == 2*t-1; }

      bool leaf(){ return children.empty(); }

      Keys keys;
      Children children;
    };

In order to support random access to keys and children, the inner data structure uses STL vector. The node recursively releases all its children. Two simple auxiliary member functions, full and leaf, are provided to test whether a node is full or is a leaf node.

Definition of B-tree in Python

If the minimum degree is 2, the B-tree is commonly called a 2-3-4 tree. For illustration purposes, I set the 2-3-4 tree as the default.

    TREE_2_3_4 = 2 #by default, create 2-3-4 tree

    class BTreeNode:
        def __init__(self, t=TREE_2_3_4, leaf=True):
            self.leaf = leaf
            self.t = t
            self.keys = [] #self.data = ...
            self.children = []

A B-tree can store satellite data along with its keys, but satellite data is omitted in this post. There are also some auxiliary member functions defined.

    class BTreeNode:
        #...
        def is_full(self):
            return len(self.keys) == 2*self.t-1

This member function is used to test whether a node is full.

Definition of B-tree in Haskell

In Haskell, record syntax is used to define BTree, so that keys and children can be accessed easily later on. Some auxiliary functions are also provided.

    data BTree a = Node{ keys :: [a]
                       , children :: [BTree a]
                       , degree :: Int} deriving (Eq, Show)

    -- Auxiliary functions

    empty deg = Node [] [] deg

    full :: BTree a -> Bool
    full tr = (length $ keys tr) > 2*(degree tr)-1

Note that full here tests whether a node holds more than 2t - 1 keys, that is, whether it overflows. The C++ and Python versions instead test for exactly 2t - 1 keys. The insert-then-fix method used by the Haskell program relies on the overflow meaning.

Definition of B-tree in Scheme/Lisp

In Scheme/Lisp, a list can contain both children and keys at the same time, so we can organize a B-tree with children and keys interspersed in a list. For instance, the list below represents a B-tree whose root has one key C and two children. The left child is a leaf node with keys A and B, while the right child is also a leaf, with keys D and E.

    ((A B) C (D E))

However, this definition doesn't hold the minimum degree t. The solution is to pass t as a parameter to all operations. Some auxiliary functions are provided so that we can access and test a B-tree easily.

    (define (keys tr)
      (if (null? tr)
          '()
          (if (list? (car tr))
              (keys (cdr tr))
              (cons (car tr) (keys (cdr tr))))))

    (define (children tr)
      (if (null? tr)
          '()
          (if (list? (car tr))
              (cons (car tr) (children (cdr tr)))
              (children (cdr tr)))))

    (define (leaf? tr)
      (or (null? tr)
          (not (list? (car tr)))))

Here we assume that a key is a simple value, such as a number or a string, but not a list. An element that is a list represents a child B-tree. All the above functions are defined based on this assumption.
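To see how the interspersed representation works, here is a small Python sketch that mirrors the Scheme keys, children, and leaf? functions on nested Python lists. The names `keys`, `children`, and `is_leaf` are hypothetical. Like the Scheme code, the sketch assumes a key is never itself a list.

```python
def keys(tr):
    """Keys of an interspersed node: every element that is not a nested list."""
    return [x for x in tr if not isinstance(x, list)]


def children(tr):
    """Children of an interspersed node: every element that is a nested list."""
    return [x for x in tr if isinstance(x, list)]


def is_leaf(tr):
    """Like the Scheme leaf?: the node is empty or starts with a key."""
    return not tr or not isinstance(tr[0], list)


# The example tree from the text: root key C, leaves (A B) and (D E).
tree = [["A", "B"], "C", ["D", "E"]]
print(keys(tree))      # ['C']
print(children(tree))  # [['A', 'B'], ['D', 'E']]
print(is_leaf(tree))   # False
```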

Insertion

Insertion is the basic operation of a B-tree; a B-tree can be created by inserting keys repeatedly. The essential idea of insertion is similar to that of the binary search tree. If the key to be inserted is x, we examine the keys in a node to find a position where all the keys on the left are less than x, while all the keys on the right are greater than x. After that, we can recursively insert x into the child node at this position.

However, this basic idea needs to be fine-tuned. The first question is the termination criterion of the recursion. The rule is simple: if the node to insert into is a leaf, we needn't recurse, because a leaf node has no children at all. We can just put x between the left-hand keys and the right-hand keys, which increases the number of keys in the leaf by one. The second question is how to keep the balance properties of the B-tree during insertion. If a leaf already has 2t - 1 keys, inserting x into it breaks the rule that each node can have at most 2t - 1 keys. The following sections show 2 major methods to solve this problem.
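The basic idea, before any balancing is added, can be sketched in Python as follows. This is an illustration under stated assumptions. A node is a hypothetical `(keys, children)` pair of lists, and the sketch deliberately ignores the 2t - 1 limit to show the second problem described above.

```python
from bisect import bisect_left, insort


def naive_insert(node, x):
    """Insert x using only the basic idea; node is a (keys, children) pair of lists.

    There is no 2t - 1 check, so a leaf can grow without limit.
    """
    keys, children = node
    if not children:              # a leaf: the recursion stops here
        insort(keys, x)           # put x between the smaller and larger keys
        return
    i = bisect_left(keys, x)      # keys[:i] < x <= keys[i:]
    naive_insert(children[i], x)  # descend into the child at that position


# A 2-3-4 tree (t = 2): root G, leaves (A C) and (M P X).
root = (["G"], [(["A", "C"], []), (["M", "P", "X"], [])])
for x in ["J", "K", "N"]:
    naive_insert(root, x)
print(root[1][1][0])  # ['J', 'K', 'M', 'N', 'P', 'X']: more than 2t - 1 = 3 keys
```

The right leaf ends up with six keys. This is exactly the overflow that the splitting and insert-then-fix methods below prevent.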

3.1

Splitting

Consider inserting a key into a node that already has 2t - 1 keys. One solution is to split the node before insertion. In this case, we divide the node into 3 parts, as shown in Figure 3. The left part contains the first t - 1 keys and t children, while the right part contains the last t - 1 keys and t children. Both the left part and the right part are valid B-tree nodes. The middle part is just the t-th key, which is pushed up to the parent node. If the node is the root, the t-th key becomes the only key of a new root, whose 2 children are the left and right parts.

3.1.1 Imperative splitting

If we skip the disk accessing part explained in [1], the imperative splitting algorithm can be shown as below.

    procedure B-TREE-SPLIT-CHILD(node, i)
      x ← CHILDREN(node)[i]
      y ← CREATE-NODE()
      INSERT(KEYS(node), i, KEYS(x)[t])
      INSERT(CHILDREN(node), i + 1, y)
      KEYS(y) ← KEYS(x)[t + 1...2t - 1]
      KEYS(x) ← KEYS(x)[1...t - 1]
      if x is not leaf then
        CHILDREN(y) ← CHILDREN(x)[t + 1...2t]
        CHILDREN(x) ← CHILDREN(x)[1...t]
      end if
    end procedure

Figure 3: Split node. (a) Before the split: keys K[1], ..., K[t], ..., K[2t-1] and children C[1], ..., C[2t]. (b) After the split: K[t] moves up into the parent. The left node keeps K[1], ..., K[t-1] with C[1], ..., C[t], and the right node keeps K[t+1], ..., K[2t-1] with C[t+1], ..., C[2t].

This algorithm takes 2 parameters: a B-tree node, and the index of the child of this node that will be split.

Split implemented in C++

The algorithm can be implemented in C++ as a member function of the B-tree node.

    template<class K, int t>
    struct BTree{
      //...
      // Split the full child children[i] and lift its middle key into this node.
      void split_child(int i){
        BTree<K, t>* x = children[i];
        BTree<K, t>* y = new BTree<K, t>();
        keys.insert(keys.begin()+i, x->keys[t-1]);
        children.insert(children.begin()+i+1, y);
        y->keys = Keys(x->keys.begin()+t, x->keys.end());
        x->keys = Keys(x->keys.begin(), x->keys.begin()+t-1);
        if(!x->leaf()){
          y->children = Children(x->children.begin()+t, x->children.end());
          x->children = Children(x->children.begin(), x->children.begin()+t);
        }
      }
    };

Split implemented in Python

We can define the splitting operation as a member method of the B-tree node as the following.

    class BTreeNode:
        #...
        def split_child(self, i):
            t = self.t
            x = self.children[i]
            y = BTreeNode(t, x.leaf)
            self.keys.insert(i, x.keys[t-1])
            self.children.insert(i+1, y)
            y.keys = x.keys[t:]
            x.keys = x.keys[:t-1]
            if not y.leaf:
                y.children = x.children[t:]
                x.children = x.children[:t]

3.1.2 Functional splitting

In the functional algorithm, splitting returns a tuple. The tuple contains the left part and the right part as B-trees, along with the middle key.

    function B-TREE-SPLIT(node)
      ks ← KEYS(node)[1...t - 1]
      ks' ← KEYS(node)[t + 1...2t - 1]
      cs ← [], cs' ← []
      if node is not leaf then
        cs ← CHILDREN(node)[1...t]
        cs' ← CHILDREN(node)[t + 1...2t]
      end if
      return (CREATE-B-TREE(ks, cs), KEYS(node)[t], CREATE-B-TREE(ks', cs'))
    end function

Split implemented in Haskell

The Haskell Prelude provides the take and drop functions to get parts of a list. These functions just return an empty list if the list passed in is empty, so there is no need to test whether the node is a leaf.

    split :: BTree a -> (BTree a, a, BTree a)
    split (Node ks cs t) = (c1, k, c2) where
        c1 = Node (take (t-1) ks) (take t cs) t
        c2 = Node (drop t ks) (drop t cs) t
        k  = head (drop (t-1) ks)

Split implemented in Scheme/Lisp

As mentioned previously, the minimum degree t is passed as a parameter, and the splitting is performed according to t.

    (define (split tr t)
      (if (leaf? tr)
          (list (list-head tr (- t 1))
                (list-ref tr (- t 1))
                (list-tail tr t))
          (list (list-head tr (- (* t 2) 1))
                (list-ref tr (- (* t 2) 1))
                (list-tail tr (* t 2)))))

A leaf node has no children at all. So, to split a leaf, the program simply takes the first t - 1 keys to form the left child and the keys after the t-th one to form the right child. The t-th key is left as the only key of the new node. It returns these 3 parts in a list. When splitting a branch node, the children must also be taken into account because keys and children are interspersed. That's why the first 2t - 1 elements form the left part, the element at (0-based) position 2t - 1 is the middle key, and the elements after it form the right part.
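The index arithmetic of the Scheme split can be checked with a short Python port on nested lists. It is a sketch, not the author's code. `split_interspersed` and `is_leaf` are hypothetical names, and Python slicing stands in for list-head, list-ref, and list-tail.

```python
def is_leaf(tr):
    # Same test as the Scheme leaf?: empty, or the first element is a key.
    return not tr or not isinstance(tr[0], list)


def split_interspersed(tr, t):
    """Split an interspersed node into [left, middle_key, right] like the Scheme split."""
    # Leaf: only keys, so the middle key sits at index t - 1.
    # Branch: child, key, child, ..., so the middle key sits at index 2t - 1.
    m = t - 1 if is_leaf(tr) else 2 * t - 1
    return [tr[:m], tr[m], tr[m + 1:]]


print(split_interspersed(["A", "B", "C"], 2))  # [['A'], 'B', ['C']]
branch = [["A"], "B", ["C"], "D", ["E"], "F", ["G"]]
print(split_interspersed(branch, 2))  # [[['A'], 'B', ['C']], 'D', [['E'], 'F', ['G']]]
```

The returned list is itself an interspersed node with a single key. That is why the Scheme fix-root can use the result of split directly as the new root.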

3.2

Split before insert method

Note that the split solution pushes a key up to the parent node, so the parent may become full if it already has 2t - 1 keys. To handle this, [1] checks every node on the path from the root down to the leaf and splits any node on this path that is full. The parent of such a node has already been examined on the way down (except when the node is the root), so the parent has fewer than 2t - 1 keys. Pushing up one key therefore won't overfill the parent. This approach needs only a single pass down the tree, without any back-tracking.

The main insert algorithm first checks whether the root node needs splitting. If so, it creates a new node, sets the root as its only child, performs the splitting, and sets the new node as the root. After that, the algorithm inserts the key into the non-full node.

    function B-TREE-INSERT(T, k)
      r ← T
      if r is full then
        s ← CREATE-NODE()
        APPEND(CHILDREN(s), r)
        B-TREE-SPLIT-CHILD(s, 1)
        r ← s
      end if
      B-TREE-INSERT-NONFULL(r, k)
      return r
    end function

The algorithm B-TREE-INSERT-NONFULL assumes that the node passed in is not full. If it is a leaf node, the new key is simply inserted at the proper position according to its order. If it is a branch node, the algorithm finds the proper child node into which the new key will be inserted. If this child node is full, it is split first.

    procedure B-TREE-INSERT-NONFULL(T, k)
      if T is leaf then
        i ← 1
        while i ≤ LENGTH(KEYS(T)) and k > KEYS(T)[i] do
          i ← i + 1
        end while
        INSERT(KEYS(T), i, k)
      else
        i ← LENGTH(KEYS(T))
        while i ≥ 1 and k < KEYS(T)[i] do
          i ← i - 1
        end while
        i ← i + 1
        if CHILDREN(T)[i] is full then
          B-TREE-SPLIT-CHILD(T, i)
          if k > KEYS(T)[i] then
            i ← i + 1
          end if
        end if
        B-TREE-INSERT-NONFULL(CHILDREN(T)[i], k)
      end if
    end procedure

Note that this algorithm is recursive. However, the recursion depth stays small. A B-tree's minimum degree t is typically chosen to match the magnetic disk structure, and the tree is balanced, so even a small depth can hold a huge amount of data. With t = 10, a B-tree of height 10 holds at least $2t^h - 1 = 2 \times 10^{10} - 1$ keys, about 20 billion. Of course, it is easy to eliminate the recursive call to improve the algorithm. In the language-specific implementations below, I'll eliminate recursion in the C++ program and show the recursive version in the Python program.
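The height bound quoted above follows from a short calculation, sketched here in Python. It assumes, as in [1], that height counts the edges from the root to a leaf. `min_keys` and `max_keys` are hypothetical helper names.

```python
def min_keys(t, h):
    """Fewest keys in a B-tree of minimum degree t and height h: 2 * t**h - 1."""
    # The root has at least 1 key and 2 children; every other node has at least
    # t - 1 keys, and every non-root branch has at least t children.
    # Depth d >= 1 therefore has at least 2 * t**(d - 1) nodes, giving
    # 1 + (t - 1) * (2 + 2t + ... + 2t**(h - 1)) = 2 * t**h - 1.
    return 2 * t ** h - 1


def max_keys(t, h):
    """Most keys: every node full, with 2t - 1 keys and 2t children."""
    # (2t - 1) * (1 + 2t + ... + (2t)**h) = (2t)**(h + 1) - 1
    return (2 * t) ** (h + 1) - 1


print(min_keys(10, 10))  # 19999999999, about 20 billion
print(max_keys(10, 10))  # 204799999999999
```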

Insert implemented in C++

The main insert program in C++ examines whether the root is full and performs splitting accordingly. Then it calls insert_nonfull to do the further processing.

    template<class K, int t>
    BTree<K, t>* insert(BTree<K, t>* tr, K key){
      BTree<K, t>* root(tr);
      if(root->full()){
        BTree<K, t>* s = new BTree<K, t>();
        s->children.push_back(root);
        s->split_child(0);
        root = s;
      }
      return insert_nonfull(root, key);
    }

The recursion is eliminated in the insert_nonfull function. If the current node is a leaf, it calls ordered_insert to insert the key at the correct position. If it is a branch node, the program finds the proper child tree and makes it the current node for the next loop iteration. Splitting is performed if the child tree is full.

    template<class K, int t>
    BTree<K, t>* insert_nonfull(BTree<K, t>* tr, K key){
      BTree<K, t>* root(tr);
      while(!tr->leaf()){
        unsigned int i=0;
        while(i < tr->keys.size() && tr->keys[i] < key)
          ++i;
        if(tr->children[i]->full()){
          tr->split_child(i);
          if(key > tr->keys[i])
            ++i;
        }
        tr = tr->children[i];
      }
      ordered_insert(tr->keys, key);
      return root;
    }

Here, ordered_insert is defined as follows.

    // Declare this before insert_nonfull: when K is a std type such as
    // std::string, argument-dependent lookup will not find it in the global namespace.
    template<class Coll>
    void ordered_insert(Coll& coll, typename Coll::value_type x){
      typename Coll::iterator it = coll.begin();
      while(it != coll.end() && *it < x)
        ++it;
      coll.insert(it, x);
    }

For convenience, I defined auxiliary functions to convert a list of keys into a B-tree.

    #include <numeric>

    template<class T>
    T* insert_key(T* t, typename T::key_type x){
      return insert(t, x);
    }

    template<class Iterator, class T>
    T* list_to_btree(Iterator first, Iterator last, T* t){
      return std::accumulate(first, last, t, insert_key<T>);
    }

In order to print the result as a human-readable string, a recursive conversion function is provided.

    #include <sstream>
    #include <string>

    template<class T>
    std::string btree_to_str(T* tr){
      typename T::Keys::iterator k;
      typename T::Children::iterator c;

      std::ostringstream s;
      s<<"(";
      if(tr->leaf()){
        k=tr->keys.begin();
        s<<*k++;
        for(; k!=tr->keys.end(); ++k)
          s<<", "<<*k;
      }
      else{
        for(k=tr->keys.begin(), c=tr->children.begin();
            k!=tr->keys.end(); ++k, ++c)
          s<<btree_to_str(*c)<<", "<<*k<<", ";
        s<<btree_to_str(*c);
      }
      s<<")";
      return s.str();
    }

With all the above definitions, some simple test cases can be fed to verify the program.

    #include <algorithm>
    #include <iostream>
    #include <iterator>

    int main(){
      const char* ss[] = {"G", "M", "P", "X", "A", "C", "D", "E", "J", "K",
                          "N", "O", "R", "S", "T", "U", "V", "Y", "Z"};
      const size_t n = sizeof(ss)/sizeof(ss[0]);

      BTree<std::string, 2>* tr234=list_to_btree(ss, ss+n, new BTree<std::string, 2>);
      std::cout<<"2-3-4 tree of ";
      std::copy(ss, ss+n, std::ostream_iterator<std::string>(std::cout, ", "));
      std::cout<<"\n"<<btree_to_str(tr234)<<"\n";
      delete tr234;

      BTree<std::string, 3>* tr=list_to_btree(ss, ss+n, new BTree<std::string, 3>);
      std::cout<<"B-tree with t=3 of ";
      std::copy(ss, ss+n, std::ostream_iterator<std::string>(std::cout, ", "));
      std::cout<<"\n"<<btree_to_str(tr)<<"\n";
      delete tr;
    }

Running these lines generates the following result:

    2-3-4 tree of G, M, P, X, A, C, D, E, J, K, N, O, R, S, T, U, V, Y, Z,
    (((A), C, (D)), E, ((G, J, K), M, (N, O)), P, ((R), S, (T), U, (V), X, (Y, Z)))
    B-tree with t=3 of G, M, P, X, A, C, D, E, J, K, N, O, R, S, T, U, V, Y, Z,
    ((A, C), D, (E, G, J, K), M, (N, O), P, (R, S), T, (U, V, X, Y, Z))

Figure 4 shows the result.
Figure 4: insert result. (a) Insert result of a 2-3-4 tree (root keys E, P); (b) insert result of a B-tree with minimum degree 3 (root keys D, M, P, T).

Insert implemented in Python

Implementing the above insertion algorithm in Python is straightforward; we only change the index to start from 0 instead of 1.

    def B_tree_insert(tr, key): # + data parameter
        root = tr
        if root.is_full():
            s = BTreeNode(root.t, False)
            s.children.insert(0, root)
            s.split_child(0)
            root = s
        B_tree_insert_nonfull(root, key)
        return root

Insertion into a non-full node is implemented as follows.

    def B_tree_insert_nonfull(tr, key):
        if tr.leaf:
            ordered_insert(tr.keys, key)
            #disk_write(tr)
        else:
            i = len(tr.keys)
            while i>0 and key < tr.keys[i-1]:
                i = i-1
            #disk_read(tr.children[i])
            if tr.children[i].is_full():
                tr.split_child(i)
                if key>tr.keys[i]:
                    i = i+1
            B_tree_insert_nonfull(tr.children[i], key)

The function ordered_insert is used to insert an element into an ordered list. The standard Python list doesn't keep its elements ordered, so the program is written as below.

    def ordered_insert(lst, x):
        i = len(lst)
        lst.append(x)
        while i>0 and lst[i]<lst[i-1]:
            (lst[i-1], lst[i]) = (lst[i], lst[i-1])
            i=i-1

For an array-based collection of length n, appending at the tail is much more efficient than inserting at another position, which takes O(n) time. This program first appends the new element at the end of the existing collection. It then iterates from the last element toward the first one, checking whether each pair of adjacent elements is ordered and swapping the two if not. Note that the swapping loop can still move up to n elements in the worst case.

To easily create a B-tree from a list of keys, we can write a simple helper function.

    def list_to_B_tree(l, t=TREE_2_3_4):
        tr = BTreeNode(t)
        for x in l:
            tr = B_tree_insert(tr, x)
        return tr


By default, this function creates a 2-3-4 tree, and the user can specify the minimum degree as the second argument. The first argument is a list of keys. The function starts from an empty tree and repeatedly inserts every key into it.

In order to print the B-tree out for verification, an auxiliary printing function is provided.

    def B_tree_to_str(tr):
        res = "("
        if tr.leaf:
            res += ", ".join(tr.keys)
        else:
            for i in range(len(tr.keys)):
                res+= B_tree_to_str(tr.children[i]) + ", " + tr.keys[i] + ", "
            res += B_tree_to_str(tr.children[len(tr.keys)])
        res += ")"
        return res

After that, some smoke test cases can be used to verify the insertion program.

    class BTreeTest:
        def run(self):
            self.test_insert()

        def test_insert(self):
            lst = ["G", "M", "P", "X", "A", "C", "D", "E", "J", "K",
                   "N", "O", "R", "S", "T", "U", "V", "Y", "Z"]
            tr = list_to_B_tree(lst)
            print(B_tree_to_str(tr))
            print(B_tree_to_str(list_to_B_tree(lst, 3)))

Running the test cases prints two different B-trees, identical to the C++ program's outputs.

    (((A), C, (D)), E, ((G, J, K), M, (N, O)), P, ((R), S, (T), U, (V), X, (Y, Z)))
    ((A, C), D, (E, G, J, K), M, (N, O), P, (R, S), T, (U, V, X, Y, Z))
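The swap-based ordered_insert above does the same job as `bisect.insort` from Python's standard library. Here is an alternative sketch, which is an option rather than the author's code. Binary search finds the slot, and `list.insert` shifts the tail. This is typically faster than a Python-level swap loop, even though both move O(n) elements in the worst case.

```python
from bisect import insort


def ordered_insert(lst, x):
    """Insert x into the already sorted list lst, in place."""
    # insort uses bisect_right to find the slot, then lst.insert at that slot,
    # so an equal key is placed after the existing ones.
    insort(lst, x)


keys = ["A", "C", "G"]
ordered_insert(keys, "D")
print(keys)  # ['A', 'C', 'D', 'G']
```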

3.3

Insert then fix method

Another approach to B-tree insertion is to simply find the position for the new key and insert it. Such an insertion may violate the B-tree properties, so we then apply a fixing procedure. If a leaf contains too many keys, we split it into 2 leaves and push a key up to the parent branch node. Of course, this operation may cause the parent node to violate the B-tree properties, so the algorithm needs to traverse from the leaf to the root to perform the fixing. With a recursive implementation, this fixing can also be organized from top to bottom.

    function B-TREE-INSERT(T, k)
      return FIX-ROOT(RECURSIVE-INSERT(T, k))
    end function

FIX-ROOT examines whether the root node contains too many keys and splits it if necessary. In this method, FULL?(T) tests whether T holds more than 2t - 1 keys.

    function FIX-ROOT(T)
      if FULL?(T) then
        (C', K, C'') ← B-TREE-SPLIT(T)
        T ← CREATE-B-TREE([K], [C', C''])
      end if
      return T
    end function

The inner function RECURSIVE-INSERT(T, k) first checks whether T is a leaf node or a branch node. It inserts directly into a leaf and recursively inserts into a branch.

    function RECURSIVE-INSERT(T, k)
      if LEAF?(T) then
        INSERT(KEYS(T), k)
        return T
      else
        initialize empty arrays k', k'', c', c''
        i ← 1
        while i ≤ LENGTH(KEYS(T)) and KEYS(T)[i] < k do
          APPEND(k', KEYS(T)[i])
          APPEND(c', CHILDREN(T)[i])
          i ← i + 1
        end while
        k'' ← KEYS(T)[i...LENGTH(KEYS(T))]
        c'' ← CHILDREN(T)[i + 1...LENGTH(CHILDREN(T))]
        c ← CHILDREN(T)[i]
        left ← (k', c')
        right ← (k'', c'')
        return MAKE-B-TREE(left, RECURSIVE-INSERT(c, k), right)
      end if
    end function

Figure 5 shows the branch case. The algorithm first locates the position: for a certain key $k_i$, if the new key k satisfies $k_{i-1} < k < k_i$, then we need to recursively insert k into child $c_i$. This position divides the node into 3 parts: the left part, the child $c_i$, and the right part. The procedure MAKE-B-TREE takes 3 arguments: the left part, the result of inserting k into $c_i$, and the right part. It merges these 3 parts into a new B-tree branch node. However, inserting a key into a child may push the child over the limit on the number of keys a node can have, which violates the B-tree property. MAKE-B-TREE detects this situation and fixes the problem by splitting.

Figure 5: Insert a key to a branch node. (a) Locate the child to insert into: in a node with keys K[1], ..., K[n] and children C[1], ..., C[n+1], the new key k with K[i-1] < k < K[i] belongs to child C[i]. (b) Recursively insert k into C[i].

    function MAKE-B-TREE(L, C, R)
      if FULL?(C) then
        return FIX-FULL(L, C, R)
      else
        T ← CREATE-NEW-NODE()
        KEYS(T) ← KEYS(L) + KEYS(R)
        CHILDREN(T) ← CHILDREN(L) + [C] + CHILDREN(R)
        return T
      end if
    end function

FIX-FULL just calls the splitting process.

    function FIX-FULL(L, C, R)
      (C', K, C'') ← B-TREE-SPLIT(C)
      T ← CREATE-NEW-NODE()
      KEYS(T) ← KEYS(L) + [K] + KEYS(R)
      CHILDREN(T) ← CHILDREN(L) + [C', C''] + CHILDREN(R)
      return T
    end function

Note that splitting may push one extra key up to the parent node. If this push-up makes the parent violate the B-tree property, the violation is fixed recursively: either at the level above or, finally, by FIX-ROOT.
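The functional insert-then-fix method can be sketched in Python as a port of the pseudocode above and of the Haskell program that follows. It is a minimal sketch with several assumptions. It uses a hypothetical `Node` whose lists are never mutated after construction, and the minimum degree `t` is passed explicitly. The helper `overflows` plays the role of FULL?, testing for more than 2t - 1 keys.

```python
from bisect import insort
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    keys: list = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def overflows(node, t):
    # FULL? in the insert-then-fix sense: more than 2t - 1 keys.
    return len(node.keys) > 2 * t - 1


def split(node, t):
    # B-TREE-SPLIT: (left, middle key, right); slicing leaves a leaf's child lists empty.
    left = Node(node.keys[:t - 1], node.children[:t])
    right = Node(node.keys[t:], node.children[t:])
    return left, node.keys[t - 1], right


def make_btree(left, c, right, t):
    # MAKE-B-TREE plus FIX-FULL: left/right are (keys, children) pairs around child c.
    (lks, lcs), (rks, rcs) = left, right
    if overflows(c, t):
        c1, k, c2 = split(c, t)
        return Node(lks + [k] + rks, lcs + [c1, c2] + rcs)
    return Node(lks + rks, lcs + [c] + rcs)


def recursive_insert(node, k, t):
    if not node.children:              # leaf: insert in order
        keys = list(node.keys)         # copy, so the old tree is untouched
        insort(keys, k)
        return Node(keys)
    i = 0
    while i < len(node.keys) and node.keys[i] < k:
        i += 1
    left = (node.keys[:i], node.children[:i])
    right = (node.keys[i:], node.children[i + 1:])
    return make_btree(left, recursive_insert(node.children[i], k, t), right, t)


def fix_root(node, t):
    if overflows(node, t):             # grow a new root above the two halves
        c1, k, c2 = split(node, t)
        return Node([k], [c1, c2])
    return node


def insert(tree, k, t):
    return fix_root(recursive_insert(tree, k, t), t)


def to_str(node):
    # Same layout as the printers in this post: (child, key, child, ..., child).
    if not node.children:
        return "(" + ", ".join(map(str, node.keys)) + ")"
    parts = []
    for c, k in zip(node.children, node.keys):
        parts += [to_str(c), str(k)]
    parts.append(to_str(node.children[-1]))
    return "(" + ", ".join(parts) + ")"


def list_to_btree(keys, t):
    tree = Node()
    for k in keys:
        tree = insert(tree, k, t)
    return tree


print(to_str(list_to_btree("GMPXACDEJKNORSTUVYZ", 3)))
# ((A, C, D, E), G, (J, K), M, (N, O), P, (R, S), T, (U, V, X, Y, Z))
```

Because the sketch follows the same insert-then-fix steps, it builds the same trees as the Haskell and Scheme programs below rather than the ones produced by the split-before-insert programs.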

Insert implemented in Haskell

The above recursive algorithm can be realized in Haskell as an insert-and-fix program. The main program is provided as the following.

    insert :: (Ord a) => BTree a -> a -> BTree a
    insert tr x = fixRoot $ ins tr x

It just calls an auxiliary function ins, then examines the root node and fixes it if it contains too many keys.

    import qualified Data.List as L
    --...

    ins :: (Ord a) => BTree a -> a -> BTree a
    ins (Node ks [] t) x = Node (L.insert x ks) [] t
    ins (Node ks cs t) x = make (ks', cs') (ins c x) (ks'', cs'')
        where
          (ks', ks'') = L.partition (<x) ks
          (cs', (c:cs'')) = L.splitAt (length ks') cs

The ins function uses pattern matching to handle the two different cases. If the node to insert into is a leaf, it calls the insert function from the Haskell standard library (Data.List), which inserts the new key x at the proper position to keep the keys ordered. If the node is a branch node, the program recursively inserts the key into the child whose key range covers x. After that, it calls the make function to combine the results into a new node. The make function also performs the examining and fixing.

The function fixRoot first checks whether the root node contains too many keys; if it exceeds the limit, splitting is applied. The split result is used to make a new root, so the height of the tree increases by one.

    fixRoot :: BTree a -> BTree a
    fixRoot (Node [] [tr] _) = tr -- shrink height
    fixRoot tr = if full tr then Node [k] [c1, c2] (degree tr)
                 else tr
        where
          (c1, k, c2) = split tr

The following is the implementation of the make function.

    make :: ([a], [BTree a]) -> BTree a -> ([a], [BTree a]) -> BTree a
    make (ks', cs') c (ks'', cs'')
        | full c = fixFull (ks', cs') c (ks'', cs'')
        | otherwise = Node (ks'++ks'') (cs'++[c]++cs'') (degree c)

And fixFull is given as below.

    fixFull :: ([a], [BTree a]) -> BTree a -> ([a], [BTree a]) -> BTree a
    fixFull (ks', cs') c (ks'', cs'') = Node (ks'++[k]++ks'')
                                             (cs'++[c1, c2]++cs'')
                                             (degree c)
        where
          (c1, k, c2) = split c

In order to print the B-tree content, an auxiliary function toString is provided to convert a B-tree to a string.

    toString :: (Show a) => BTree a -> String
    toString (Node ks [] _) = "("++(L.intercalate ", " (map show ks))++")"
    toString tr = "("++(toStr (keys tr) (children tr))++")" where
        toStr (k:ks) (c:cs) = (toString c)++", "++(show k)++", "++(toStr ks cs)
        toStr [] [c] = toString c

With all the above definitions, the insertion program can be verified with some simple test cases.

    listToBTree :: (Ord a) => [a] -> Int -> BTree a
    listToBTree lst t = foldl insert (empty t) lst

    testInsert = do
      putStrLn $ toString $ listToBTree "GMPXACDEJKNORSTUVYZ" 3
      putStrLn $ toString $ listToBTree "GMPXACDEJKNORSTUVYZ" 2

Running testInsert generates the following result.

    ((A, C, D, E), G, (J, K), M, (N, O), P, (R, S), T, (U, V, X, Y, Z))
    (((A), C, (D)), E, ((G, J, K), M, (N)), O, ((P), R, (S), T, (U), V, (X, Y, Z)))

Comparing these results with the output of the C++ and Python programs, as shown in Figure 6, we can find some differences. However, the B-trees built by the Haskell program are still valid, because all B-tree properties are satisfied. The difference comes from the different insertion approach.

Figure 6: insert and fixing results. (a) Insert result of a 2-3-4 tree (insert-fixing method); (b) insert result of a B-tree with minimum degree 3 (insert-fixing method), whose root holds the keys G, M, P, T.

Insert implemented in Scheme/Lisp

The main function for insertion in Scheme/Lisp is given as the following.

    (define (btree-insert tr x t)
      (define (ins tr x)
        (if (leaf? tr)
            (ordered-insert tr x) ;; leaf
            (let* ((res (partition-by tr x))
                   (left (car res))
                   (c (cadr res))
                   (right (caddr res)))
              (make-btree left (ins c x) right t))))
      (fix-root (ins tr x) t))

The program simply calls an internal function and performs fixing on its result. The internal ins function examines whether the current node is a leaf node. A leaf only contains keys, so we can locate the position and insert the new key there. Otherwise, we partition the node into 3 parts: the left part, the child on which the recursive insertion is performed, and the right part. The program does the recursive insertion and then combines these three parts into a new node; fixing happens during the combination.

The function ordered-insert traverses an ordered list and inserts the new key at the proper position, as below.

    (define (ordered-insert lst x)
      (define (insert-by less-p lst x)
        (if (null? lst)
            (list x)
            (if (less-p x (car lst))
                (cons x lst)
                (cons (car lst) (insert-by less-p (cdr lst) x)))))
      (if (string? x)
          (insert-by string<? lst x)
          (insert-by < lst x)))

In order to handle B-trees whose keys are either strings or numbers, we abstract the less-than function as a parameter and pass it to an internal function. Function partition-by uses a similar approach.

(define (partition-by tr x)
  (define (part-by pred tr x)
    (if (= (length tr) 1)
        (list '() (car tr) '())
        (if (pred (cadr tr) x)
            (let* ((res (part-by pred (cddr tr) x))
                   (left (car res))
                   (c (cadr res))
                   (right (caddr res)))
              (list (cons-pair (car tr) (cadr tr) left) c right))
            (list '() (car tr) (cdr tr)))))
  (if (string? x)
      (part-by string<? tr x)
      (part-by < tr x)))

Here cons-pair is a helper function that puts a child and a key in front of a list.

(define (cons-pair c k lst)
  (cons c (cons k lst)))

To fix the root of a B-tree that contains too many keys, a fix-root function is provided.

(define (full? tr t) ;; t: minimum degree
  (> (length (keys tr))
     (- (* 2 t) 1)))

(define (fix-root tr t)
  (cond ((full? tr t) (split tr t))
        (else tr)))

When we turn the result of the recursive insertion into a new node, we need to do fixing if the resulting child contains too many keys.

(define (make-btree l c r t)
  (cond ((full? c t) (fix-full l c r t))
        (else (append l (cons c r)))))

(define (fix-full l c r t)
  (append l (split c t) r))

With all the above facilities, we can test the program for verification. To build a B-tree easily from a list of keys, some simple helper functions are given.

(define (list->btree lst t)
  (fold-left (lambda (tr x) (btree-insert tr x t)) '() lst))

(define (str->slist s)
  (if (string-null? s)
      '()
      (cons (string-head s 1) (str->slist (string-tail s 1)))))

The same simple test case as the Haskell one is fed to our program.

(define (test-insert)
  (list->btree (str->slist "GMPXACDEJKNORSTUVYZBFHIQW") 3))

Evaluating the test-insert function gives the following B-tree.

(((A B) C (D E F) G (H I J K)) M ((N O) P (Q R S) T (U V) W (X Y Z)))

It is the same as the result output by the Haskell program.
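The three-way split performed by partition-by can be mirrored in Python. Like the Scheme code, this assumes a branch node is a flat list that alternates children and keys: c0, k1, c1, k2, c2, .... It is a hypothetical sketch. The strings "c0", "c1", ... below only stand in for subtrees.

```python
def partition_by(node, x):
    """Split a branch stored flat as [c0, k1, c1, k2, c2, ...] into
    (left, child, right): `left` holds the child/key pairs whose key is < x,
    `child` is the subtree that may contain x, and `right` starts at the
    first key that is >= x."""
    i = 0
    while i + 1 < len(node) and node[i + 1] < x:
        i += 2                      # skip one child/key pair
    return node[:i], node[i], node[i + 1:]


node = ["c0", "B", "c1", "D", "c2"]      # placeholders stand in for subtrees
print(partition_by(node, "C"))           # (['c0', 'B'], 'c1', ['D', 'c2'])
```

If x equals a key in the node, that key becomes the first element of the right part. The deletion code relies on this to detect that x lives in the current node.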

Deletion

Deletion is another basic operation of B-trees. Deleting a key from a B-tree may violate the B-tree balance properties: a node other than the root must not contain too few keys (no fewer than $t-1$ keys, where t is the minimum degree). As with insertion, there are two approaches. We can do some preparation so that the node from which the key will be deleted contains enough keys. Alternatively, we can do some fixing after the deletion if the node is left with too few keys.
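The following hypothetical Python checker makes these balance properties concrete. It is not part of the original programs, but a deletion test could call it after every step. It assumes a minimal Node class with keys and children lists and treats a node without children as a leaf. Besides the key-count bounds, it checks key order and that all leaves sit at the same depth.

```python
class Node:
    """Hypothetical minimal node for illustration: a leaf has no children."""
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)


def check(node, t, is_root=True, lo=None, hi=None):
    """Return the height of `node` if the subtree obeys the B-tree properties
    for minimum degree t; raise ValueError on the first violation found."""
    n = len(node.keys)
    # Only the root may hold fewer than t - 1 keys (an empty root leaf is an empty tree).
    fewest = (1 if node.children else 0) if is_root else t - 1
    if not fewest <= n <= 2 * t - 1:
        raise ValueError(f"{node.keys}: {n} keys, expected {fewest}..{2 * t - 1}")
    if any(a > b for a, b in zip(node.keys, node.keys[1:])):
        raise ValueError(f"{node.keys}: keys are out of order")
    if node.keys and ((lo is not None and node.keys[0] < lo) or
                      (hi is not None and node.keys[-1] > hi)):
        raise ValueError(f"{node.keys}: keys fall outside the parent's range")
    if not node.children:
        return 1
    if len(node.children) != n + 1:
        raise ValueError(f"{node.keys}: expected {n + 1} children")
    bounds = [lo] + node.keys + [hi]    # child j lies between bounds[j] and bounds[j+1]
    heights = {check(c, t, False, bounds[j], bounds[j + 1])
               for j, c in enumerate(node.children)}
    if len(heights) != 1:
        raise ValueError(f"{node.keys}: leaves at different depths")
    return heights.pop() + 1


def leaf(s):
    return Node(list(s))


good = Node(["P"], [Node(["C", "G", "M"], [leaf("AB"), leaf("DEF"), leaf("JKL"), leaf("NO")]),
                    Node(["T", "X"], [leaf("QRS"), leaf("UV"), leaf("YZ")])])
print(check(good, 3))   # 3
```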

4.1 Merge before delete method

The textbook [1] describes the deletion algorithm only in prose, and leaves the pseudo code as an exercise. That description is a good reference when writing the pseudo code.

4.1.1 Merge before delete algorithm implemented imperatively

The first case is trivial: if the key k to be deleted is located in node x and x is a leaf node, we can remove k from x directly. Note that this is a terminal case. In most B-trees the root is not a leaf, so the program examines non-leaf nodes first. The second case states that the key k is located in node x, but x isn't a leaf. This case has 3 sub-cases. If the child node y that precedes k contains enough keys (at least t), we replace k in node x with $k'$, the predecessor of k in the subtree rooted at y. We then recursively remove $k'$ from y. The predecessor $k'$ is the last key of the rightmost leaf in that subtree. Only when y is itself a leaf is it simply the last key of y. If y doesn't contain enough keys, but the child node z that follows k contains at least t keys, we replace k in node x with $k'$, the successor of k in the subtree rooted at z. We then recursively remove $k'$ from z. The successor $k'$ is the first key of the leftmost leaf in that subtree, which is the first key of z when z is a leaf.
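In cases 2a and 2b, locating the replacement key means walking down to a leaf, not just reading the adjacent child. Below is a minimal Python sketch. It assumes a hypothetical Node class with keys and children lists (a leaf has no children) and uses 0-based indices.

```python
class Node:
    """Hypothetical minimal node: a leaf has no children."""
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)


def predecessor(x, i):
    """Largest key in the subtree just left of x.keys[i] (0-based i)."""
    y = x.children[i]
    while y.children:            # follow the rightmost edge down to a leaf
        y = y.children[-1]
    return y.keys[-1]


def successor(x, i):
    """Smallest key in the subtree just right of x.keys[i] (0-based i)."""
    z = x.children[i + 1]
    while z.children:            # follow the leftmost edge down to a leaf
        z = z.children[0]
    return z.keys[0]


def leaf(s):
    return Node(list(s))


root = Node(["P"], [Node(["C", "G", "M"], [leaf("AB"), leaf("DEF"), leaf("JKL"), leaf("NO")]),
                    Node(["T", "X"], [leaf("QRS"), leaf("UV"), leaf("YZ")])])
print(predecessor(root, 0), successor(root, 0))  # O Q
```

On the deletion test tree used later in this section (root P), the predecessor of P is O, while the last key of P's left child is M. The two coincide only when the child is a leaf.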

Otherwise, if neither y nor z contains enough keys, we merge y, k and z into one new node, which then contains $2t-1$ keys. We then recursively remove k from this new node. After the merge, the current node x may contain no keys at all. This happens when k was the only key in x, so y and z were its only two children. In that case we need to shrink the tree height by one. Case 2 is illustrated in figures 7, 8, and 9.
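Case 2c only happens when neither y nor z holds t keys. Both are non-root nodes, so each then holds exactly $t-1$ keys, and the size of the merged node follows directly:

$$ \underbrace{(t-1)}_{y} + \underbrace{1}_{k} + \underbrace{(t-1)}_{z} = 2t-1 $$

This is the largest number of keys a node may hold, so the merge never over-fills it. For example, with $t = 3$, merging (D, E), G and (J, K) gives the five-key node (D, E, G, J, K). This is what happens when G is deleted in the test run later in this section.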

Figure 7: case 2a. Replace and delete from predecessor.

Case 2 deletes keys recursively, but the recursion can be turned into a purely imperative loop. We'll show such a program in the C++ implementation. The last case states that if k can't be located in node x, the algorithm needs to find the child node $c_i$ of x whose subtree may contain k. Before the deletion is recursively applied to $c_i$, we need to be sure that $c_i$ contains at least t keys. If it doesn't, we make the following adjustment. We check the two siblings of $c_i$, which are $c_{i-1}$ and $c_{i+1}$. Suppose either of them contains enough keys (at least t). We then move one key from x down to $c_i$ and one key from that sibling up to x. We also need to move the corresponding child from the sibling to $c_i$. This operation gives $c_i$ enough keys for deletion, and we can next delete k from $c_i$ recursively.
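The borrowing step of case 3a is a rotation through the parent. Below is a minimal Python sketch. It assumes a hypothetical Node class with keys and children lists and a 0-based child index i. Like the imperative programs in this section, it mutates the nodes in place.

```python
class Node:
    """Hypothetical minimal node: a leaf has no children."""
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)


def borrow_from_left(x, i):
    """Case 3a with the left sibling: rotate one key through the parent x."""
    c, sib = x.children[i], x.children[i - 1]
    c.keys.insert(0, x.keys[i - 1])      # parent key moves down to the front of c
    x.keys[i - 1] = sib.keys.pop()       # sibling's last key moves up into x
    if sib.children:                     # sibling's last child moves over to c
        c.children.insert(0, sib.children.pop())


def borrow_from_right(x, i):
    """Case 3a with the right sibling: the mirror image of borrow_from_left."""
    c, sib = x.children[i], x.children[i + 1]
    c.keys.append(x.keys[i])             # parent key moves down to the end of c
    x.keys[i] = sib.keys.pop(0)          # sibling's first key moves up into x
    if sib.children:                     # sibling's first child moves over to c
        c.children.append(sib.children.pop(0))


def leaf(s):
    return Node(list(s))


x = Node(["T", "X"], [leaf("QRS"), leaf("UV"), leaf("YZ")])
borrow_from_left(x, 1)
print(x.keys, x.children[0].keys, x.children[1].keys)
# ['S', 'X'] ['Q', 'R'] ['T', 'U', 'V']
```

The call above reproduces the last step of the test run later in this section. There, (U, V) borrows from (Q, R, S) before U is deleted.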

Figure 8: case 2b. Replace and delete from successor.

Figure 9: case 2c. Merge and delete.


If neither sibling contains enough keys, we merge $c_i$, a key from x, and one of the siblings into a new node, and perform the deletion on this new node. Case 3 is illustrated in figures 10 and 11.
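One detail of case 3b is easy to get wrong. When $c_i$ is merged with its left sibling, the merged node ends up at index $i-1$, so the descent must continue there rather than at index i. Below is a minimal Python sketch that returns the index to descend into. It assumes a hypothetical Node class and 0-based indices.

```python
class Node:
    """Hypothetical minimal node: a leaf has no children."""
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)


def merge_children(x, i):
    """Merge x.children[i], x.keys[i] and x.children[i + 1] into one child."""
    y, z = x.children[i], x.children[i + 1]
    y.keys += [x.keys.pop(i)] + z.keys
    y.children += z.children
    del x.children[i + 1]


def merge_with_sibling(x, i):
    """Case 3b: merge child i with a sibling and return the merged child's index.

    The left sibling is preferred, as in the pseudo code; in that case the
    merged node sits at index i - 1, so the descent must continue there."""
    if i > 0:
        merge_children(x, i - 1)
        return i - 1
    merge_children(x, i)
    return i


def leaf(s):
    return Node(list(s))


# With t = 3, neither (N, O) nor its left sibling (E, J) has t keys.
x = Node(["C", "L"], [leaf("AB"), leaf("EJ"), leaf("NO")])
j = merge_with_sibling(x, 2)
print(j, x.keys, x.children[j].keys)   # 1 ['C'] ['E', 'J', 'L', 'N', 'O']
```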

Figure 10: case 3a. Borrow from left sibling.

Implementing the above 3 cases as pseudo code gives the following B-tree deletion algorithm. First, some auxiliary functions perform simple tests and operations on a B-tree.

function CAN-DEL(T)
    return number of keys of T ≥ t
end function

Function CAN-DEL tests whether a B-tree node contains enough keys (no fewer than t keys).

procedure MERGE-CHILDREN(T, i)                 ▷ Merge children i and i + 1
    x ← CHILDREN(T)[i]
    y ← CHILDREN(T)[i + 1]
    APPEND(KEYS(x), KEYS(T)[i])
    CONCAT(KEYS(x), KEYS(y))
    CONCAT(CHILDREN(x), CHILDREN(y))
    REMOVE(KEYS(T), i)
    REMOVE(CHILDREN(T), i + 1)
end procedure

Figure 11: case 3b. Borrow Merge and delete.

Procedure MERGE-CHILDREN merges the i-th child, the i-th key, and the (i + 1)-th child of node T into a new child. After merging, it removes the i-th key and the (i + 1)-th child from T. With these helper functions, the main algorithm of B-tree deletion is described as below.

function B-TREE-DELETE(T, k)
    i ← 1
    while i ≤ LENGTH(KEYS(T)) do
        if k = KEYS(T)[i] then
            if T is leaf then                                      ▷ case 1
                REMOVE(KEYS(T), k)
            else                                                   ▷ case 2
                if CAN-DEL(CHILDREN(T)[i]) then                    ▷ case 2a
                    KEYS(T)[i] ← LAST-KEY(CHILDREN(T)[i])          ▷ predecessor: last key of the rightmost leaf in this subtree
                    B-TREE-DELETE(CHILDREN(T)[i], KEYS(T)[i])
                else if CAN-DEL(CHILDREN(T)[i + 1]) then           ▷ case 2b
                    KEYS(T)[i] ← FIRST-KEY(CHILDREN(T)[i + 1])     ▷ successor: first key of the leftmost leaf in this subtree
                    B-TREE-DELETE(CHILDREN(T)[i + 1], KEYS(T)[i])
                else                                               ▷ case 2c
                    MERGE-CHILDREN(T, i)
                    B-TREE-DELETE(CHILDREN(T)[i], k)
                    if KEYS(T) = NIL then
                        T ← CHILDREN(T)[i]                         ▷ Shrinks height
                    end if
                end if
            end if
            return T
        else if k < KEYS(T)[i] then
            BREAK
        else
            i ← i + 1
        end if
    end while
    if T is leaf then
        return T                                                   ▷ k doesn't exist in T at all
    end if
    if not CAN-DEL(CHILDREN(T)[i]) then                            ▷ case 3
        if i > 1 and CAN-DEL(CHILDREN(T)[i - 1]) then              ▷ case 3a: left sibling
            INSERT(KEYS(CHILDREN(T)[i]), KEYS(T)[i - 1])
            KEYS(T)[i - 1] ← POP-BACK(KEYS(CHILDREN(T)[i - 1]))
            if CHILDREN(T)[i] isn't leaf then
                c ← POP-BACK(CHILDREN(CHILDREN(T)[i - 1]))
                INSERT(CHILDREN(CHILDREN(T)[i]), c)
            end if
        else if i < LENGTH(CHILDREN(T)) and CAN-DEL(CHILDREN(T)[i + 1]) then   ▷ case 3a: right sibling
            APPEND(KEYS(CHILDREN(T)[i]), KEYS(T)[i])
            KEYS(T)[i] ← POP-FRONT(KEYS(CHILDREN(T)[i + 1]))
            if CHILDREN(T)[i] isn't leaf then
                c ← POP-FRONT(CHILDREN(CHILDREN(T)[i + 1]))
                APPEND(CHILDREN(CHILDREN(T)[i]), c)
            end if
        else                                                       ▷ case 3b
            if i > 1 then
                MERGE-CHILDREN(T, i - 1)
                i ← i - 1                                          ▷ the merged child now sits at i - 1
            else
                MERGE-CHILDREN(T, i)
            end if
        end if
    end if
    B-TREE-DELETE(CHILDREN(T)[i], k)                               ▷ recursive delete
    if KEYS(T) = NIL then                                          ▷ Shrinks height
        T ← CHILDREN(T)[1]
    end if
    return T
end function

4.1.2 Merge before deletion algorithm implemented in C++

The C++ implementation given here isn't a direct translation of the above pseudo code. Instead, the recursion is eliminated, as suits a purely imperative program. To simplify some B-tree node operations, a few auxiliary member functions are added to the B-tree node class definition.

template<class K, int t>
struct BTree{
  // ...
  // merge children[i], keys[i], and children[i+1] into one node
  void merge_children(int i){
    BTree<K, t>* x = children[i];
    BTree<K, t>* y = children[i+1];
    x->keys.push_back(keys[i]);
    concat(x->keys, y->keys);
    concat(x->children, y->children);
    keys.erase(keys.begin()+i);
    children.erase(children.begin()+i+1);
    y->children.clear();
    delete y;
  }

  key_type replace_key(int i, key_type key){
    keys[i]=key;
    return key;
  }

  bool can_remove(){ return keys.size()>=t; }
  // ...

Function replace_key updates the i-th key of a node with a new value and returns that value. Typically, the new value is pulled from a subtree, as described in the deletion algorithm. Function can_remove tests whether a node contains enough keys for further deletion. Function merge_children merges the i-th child, the i-th key, and the (i+1)-th child into one node. This is the reverse of splitting. It roughly doubles the number of keys in a node, which ensures the node has enough keys for further deletion. Note that C++, unlike the other languages equipped with GC, requires the memory to be released explicitly after merging. This function uses a concat function to concatenate two collections. It is defined as the following.

#include <algorithm>  // std::copy, std::remove
#include <iterator>   // std::insert_iterator

template<class Coll>
void concat(Coll& x, Coll& y){
  std::copy(y.begin(), y.end(), std::insert_iterator<Coll>(x, x.end()));
}

With these helper functions, the main B-tree deletion program is given as below.

template<class T>
T* del(T* tr, typename T::key_type key){
  T* root(tr);
  while(!tr->leaf()){
    unsigned int i = 0;
    bool located(false);
    while(i < tr->keys.size()){
      if(key == tr->keys[i]){
        located = true;
        if(tr->children[i]->can_remove()){ // case 2a
          // the predecessor is the last key of the rightmost leaf under children[i]
          T* y = tr->children[i];
          while(!y->leaf()) y = y->children.back();
          key = tr->replace_key(i, y->keys.back());
          tr = tr->children[i]; // go on deleting the predecessor from this subtree
        }
        else if(tr->children[i+1]->can_remove()){ // case 2b
          // the successor is the first key of the leftmost leaf under children[i+1]
          T* z = tr->children[i+1];
          while(!z->leaf()) z = z->children.front();
          key = tr->replace_key(i, z->keys.front());
          tr = tr->children[i+1];
        }
        else{ // case 2c
          tr->merge_children(i);
          if(tr->keys.empty()){ // shrinks height; only the root can become empty here
            T* temp = tr->children[0];
            tr->children.clear();
            delete tr;
            tr = root = temp;
          }
        }
        break;
      }
      else if(key > tr->keys[i])
        i++;
      else
        break;
    }
    if(located)
      continue;
    if(!tr->children[i]->can_remove()){ // case 3
      if(i>0 && tr->children[i-1]->can_remove()){
        // case 3a: left sibling
        tr->children[i]->keys.insert(tr->children[i]->keys.begin(), tr->keys[i-1]);
        tr->keys[i-1] = tr->children[i-1]->keys.back();
        tr->children[i-1]->keys.pop_back();
        if(!tr->children[i]->leaf()){
          tr->children[i]->children.insert(tr->children[i]->children.begin(),
                                           tr->children[i-1]->children.back());
          tr->children[i-1]->children.pop_back();
        }
      }
      else if(i+1<tr->children.size() && tr->children[i+1]->can_remove()){
        // case 3a: right sibling
        tr->children[i]->keys.push_back(tr->keys[i]);
        tr->keys[i] = tr->children[i+1]->keys.front();
        tr->children[i+1]->keys.erase(tr->children[i+1]->keys.begin());
        if(!tr->children[i]->leaf()){
          tr->children[i]->children.push_back(tr->children[i+1]->children.front());
          tr->children[i+1]->children.erase(tr->children[i+1]->children.begin());
        }
      }
      else{ // case 3b
        if(i>0)
          tr->merge_children(--i); // merged with the left sibling, now at the new i
        else
          tr->merge_children(i);
      }
    }
    tr = tr->children[i];
  }
  tr->keys.erase(std::remove(tr->keys.begin(), tr->keys.end(), key), tr->keys.end());
  if(root->keys.empty() && !root->leaf()){ // shrinks height
    T* temp = root->children[0];
    root->children.clear();
    delete root;
    root = temp;
  }
  return root;
}

Note how the recursion is eliminated. The main loop terminates only when the node being examined is a leaf. Until then, the program walks down the B-tree along the path that may contain the key to be deleted. Along the way it makes the proper adjustments, borrowing keys from sibling nodes or merging, so that every node on this path has enough keys for the deletion. To verify this program, a quick and simple parsing function is provided that turns a B-tree description string into a B-tree. Error handling is omitted for illustration purposes.

template<class T>
T* parse(std::string::iterator& first, std::string::iterator last){
  T* tr = new T;
  ++first; // '('
  while(first!=last){
    if(*first=='('){ // child
      tr->children.push_back(parse<T>(first, last));
    }
    else if(*first == ',' || *first == ' ')
      ++first; // skip delimiter
    else if(*first == ')'){
      ++first;
      return tr;
    }
    else{ // key
      typename T::key_type key;
      while(*first!=',' && *first!=')')
        key+=*first++;
      tr->keys.push_back(key);
    }
  }
  // should never run here
  return 0;
}

template<class T>
T* str_to_btree(std::string s){
  std::string::iterator first(s.begin());
  return parse<T>(first, s.end());
}

After that, the testing can be performed as below.

template<class T>
T* test_del(T* tr, typename T::key_type key){
  std::cout<<"delete "<<key<<"==>\n";
  tr = del(tr, key);
  std::cout<<btree_to_str(tr)<<"\n";
  return tr;
}

void test_delete(){
  std::cout<<"test delete...\n";
  const char* s="(((A, B), C, (D, E, F), G, (J, K, L), M, (N, O)), "
    "P, ((Q, R, S), T, (U, V), X, (Y, Z)))";
  typedef BTree<std::string, 3> BTr;
  BTr* tr = str_to_btree<BTr>(s);
  std::cout<<"before delete:\n"<<btree_to_str(tr)<<"\n";
  const char* ks[] = {"F", "M", "G", "D", "B", "U"};
  for(unsigned int i=0; i<sizeof(ks)/sizeof(char*); ++i)
    tr=test_del(tr, ks[i]);
  delete tr;
}

Running test_delete generates the result below.

test delete...
before delete:
(((A, B), C, (D, E, F), G, (J, K, L), M, (N, O)), P, ((Q, R, S), T, (U, V), X, (Y, Z)))
delete F==>
(((A, B), C, (D, E), G, (J, K, L), M, (N, O)), P, ((Q, R, S), T, (U, V), X, (Y, Z)))
delete M==>
(((A, B), C, (D, E), G, (J, K), L, (N, O)), P, ((Q, R, S), T, (U, V), X, (Y, Z)))
delete G==>
(((A, B), C, (D, E, J, K), L, (N, O)), P, ((Q, R, S), T, (U, V), X, (Y, Z)))
delete D==>
((A, B), C, (E, J, K), L, (N, O), P, (Q, R, S), T, (U, V), X, (Y, Z))
delete B==>
((A, C), E, (J, K), L, (N, O), P, (Q, R, S), T, (U, V), X, (Y, Z))
delete U==>
((A, C), E, (J, K), L, (N, O), P, (Q, R), S, (T, V), X, (Y, Z))

Figures 12, 13, and 14 show this deletion test step by step. The modified nodes are shaded. The first 5 steps are the same as the example shown in figure 18.8 of the textbook [1].

a. The B-tree before any deletion;



b. After deleting key F, case 1;

Figure 12: Result of the B-tree deletion program (1)

c. After deleting key M, case 2a;

d. After deleting key G, case 2c;

Figure 13: Result of the B-tree deletion program (2)

e. After deleting key D, case 3b; the height shrinks;

f. After deleting key B, case 3a, borrowing from the right sibling;

g. After deleting key U, case 3a, borrowing from the left sibling;

Figure 14: Result of the B-tree deletion program (3)

4.1.3 Merge before deletion algorithm implemented in Python

In the Python implementation, memory management is handled by the GC. As in the C++ program, some auxiliary member functions are added to the B-tree node definition.

class BTreeNode:
    #...
    def merge_children(self, i):
        # merge children[i] and children[i+1] by pushing keys[i] down
        self.children[i].keys += [self.keys[i]] + self.children[i+1].keys
        self.children[i].children += self.children[i+1].children
        self.keys.pop(i)
        self.children.pop(i+1)

    def replace_key(self, i, key):
        self.keys[i] = key
        return key

    def can_remove(self):
        return len(self.keys) >= self.t

The member functions have the same names as in the C++ program, so the previous subsection explains what each of them does. In contrast to the C++ program, this Python program uses recursion, similar to the pseudo code.

def B_tree_delete(tr, key):
    i = len(tr.keys)
    while i > 0:
        if key == tr.keys[i-1]:
            if tr.leaf:  # case 1 in CLRS
                tr.keys.remove(key)
                # disk_write(tr)
            else:  # case 2 in CLRS
                if tr.children[i-1].can_remove():  # case 2a
                    # predecessor: last key of the rightmost leaf under children[i-1]
                    y = tr.children[i-1]
                    while not y.leaf:
                        y = y.children[-1]
                    key = tr.replace_key(i-1, y.keys[-1])
                    B_tree_delete(tr.children[i-1], key)
                elif tr.children[i].can_remove():  # case 2b
                    # successor: first key of the leftmost leaf under children[i]
                    z = tr.children[i]
                    while not z.leaf:
                        z = z.children[0]
                    key = tr.replace_key(i-1, z.keys[0])
                    B_tree_delete(tr.children[i], key)
                else:  # case 2c
                    tr.merge_children(i-1)
                    B_tree_delete(tr.children[i-1], key)
                    if tr.keys == []:  # tree shrinks in height
                        tr = tr.children[i-1]
            return tr
        elif key > tr.keys[i-1]:
            break
        else:
            i = i-1
    # case 3
    if tr.leaf:
        return tr  # key doesn't exist at all
    if not tr.children[i].can_remove():
        if i > 0 and tr.children[i-1].can_remove():  # left sibling
            tr.children[i].keys.insert(0, tr.keys[i-1])
            tr.keys[i-1] = tr.children[i-1].keys.pop()
            if not tr.children[i].leaf:
                tr.children[i].children.insert(0, tr.children[i-1].children.pop())
        elif i + 1 < len(tr.children) and tr.children[i+1].can_remove():  # right sibling
            tr.children[i].keys.append(tr.keys[i])
            tr.keys[i] = tr.children[i+1].keys.pop(0)
            if not tr.children[i].leaf:
                tr.children[i].children.append(tr.children[i+1].children.pop(0))
        else:  # case 3b
            if i > 0:
                tr.merge_children(i-1)
                i = i-1  # the merged child now sits at index i-1
            else:
                tr.merge_children(i)
    B_tree_delete(tr.children[i], key)
    if tr.keys == []:  # tree shrinks in height
        tr = tr.children[0]
    return tr

To verify the deletion program, similar test cases are fed to the function.

from functools import reduce  # reduce is no longer a builtin in Python 3

def test_delete():
    print("test delete")
    t = 3
    tr = BTreeNode(t, False)
    tr.keys = ["P"]
    tr.children = [BTreeNode(t, False), BTreeNode(t, False)]
    tr.children[0].keys = ["C", "G", "M"]
    tr.children[0].children = [BTreeNode(t), BTreeNode(t), BTreeNode(t), BTreeNode(t)]
    tr.children[0].children[0].keys = ["A", "B"]
    tr.children[0].children[1].keys = ["D", "E", "F"]
    tr.children[0].children[2].keys = ["J", "K", "L"]
    tr.children[0].children[3].keys = ["N", "O"]
    tr.children[1].keys = ["T", "X"]
    tr.children[1].children = [BTreeNode(t), BTreeNode(t), BTreeNode(t)]
    tr.children[1].children[0].keys = ["Q", "R", "S"]
    tr.children[1].children[1].keys = ["U", "V"]
    tr.children[1].children[2].keys = ["Y", "Z"]
    print(B_tree_to_str(tr))
    lst = ["F", "M", "G", "D", "B", "U"]
    reduce(test_del, lst, tr)

def test_del(tr, key):
    print("delete", key)
    tr = B_tree_delete(tr, key)
    print(B_tree_to_str(tr))
    return tr

In this test case, the B-tree is constructed manually. It is identical to the B-tree built in the C++ deletion test. Running the test function generates the following result.

test delete
(((A, B), C, (D, E, F), G, (J, K, L), M, (N, O)), P, ((Q, R, S), T, (U, V), X, (Y, Z)))
delete F
(((A, B), C, (D, E), G, (J, K, L), M, (N, O)), P, ((Q, R, S), T, (U, V), X, (Y, Z)))
delete M
(((A, B), C, (D, E), G, (J, K), L, (N, O)), P, ((Q, R, S), T, (U, V), X, (Y, Z)))
delete G
(((A, B), C, (D, E, J, K), L, (N, O)), P, ((Q, R, S), T, (U, V), X, (Y, Z)))
delete D
((A, B), C, (E, J, K), L, (N, O), P, (Q, R, S), T, (U, V), X, (Y, Z))
delete B
((A, C), E, (J, K), L, (N, O), P, (Q, R, S), T, (U, V), X, (Y, Z))
delete U
((A, C), E, (J, K), L, (N, O), P, (Q, R), S, (T, V), X, (Y, Z))

This result is the same as the output of the C++ program.

4.2 Delete and fix method

The previous subsections show how complex the deletion algorithm is: there are several cases, and each case has its own sub-cases. Another way to design the deletion algorithm is delete-then-fix, similar to the insert-then-fix strategy.

To delete a key from a B-tree, we first locate the node that contains this key. This is a traversal from the root towards the leaves. Starting from the root, if the key isn't in the current node, we go deeper and deeper until we reach a node that contains it, or a leaf. If this node is a leaf, we remove the key directly. We then examine whether the deletion leaves the node with too few keys to maintain the B-tree balance properties. If it is a branch node, removing the key breaks the node into two parts, which we need to merge together. The merging is a recursive process, shown in figure 15. If the two nodes are not leaves, we put their keys together and recursively merge the last child of the left part and the first child of the right part into one new child node. Otherwise, if they are leaves, we merely put all keys together.

So far, the deletion is straightforward. However, deleting decreases the number of keys in a node, which may violate the B-tree balance properties. The solution is to perform fixing along the path we traversed from the root. During the recursive deletion, the branch node is broken into 3 parts:

- the left part, with all keys less than k, say $k_1, k_2, ..., k_{i-1}$, and children $c_1, c_2, ..., c_{i-1}$;
- the right part, with all keys greater than k, say $k_i, k_{i+1}, ..., k_n$, and children $c_{i+1}, c_{i+2}, ..., c_{n+1}$;
- the child $c_i$, to which the recursive deletion is applied and which becomes $c_i'$.

We need to combine these 3 parts into a new node, as shown in figure 16. At this point, we examine whether $c_i'$ contains enough keys. It may have too few: fewer than $t-1$, rather than $t$ as in the merge-before-delete approach. In that case we borrow a key-child pair from either the left part or the right part and do the inverse of splitting. Figure 17 shows an example of borrowing from the left part. If both the left part and the right part are empty, we can simply push $c_i'$ up.

Figure 15: Delete a key from a branch node. Removing $k_i$ breaks the node into 2 parts, the left part and the right part. Merging these 2 parts is a recursive process. When the two parts are leaves, the merging terminates.

Figure 16: Denote by $c_i'$ the result of recursively deleting key k from child $c_i$. We should do fixing when combining the left part, $c_i'$ and the right part into a new node.

4.2.1 Delete and fix algorithm implemented functionally

Summarizing all the above analysis, we can draft the delete-and-fix algorithm.

function B-TREE-DELETE(T, k)
    return FIX-ROOT(DEL(T, k))
end function

function DEL(T, k)
    if CHILDREN(T) = NIL then                          ▷ leaf node
        DELETE(KEYS(T), k)
        return T
    else                                               ▷ branch node
        n ← LENGTH(KEYS(T))
        i ← LOWER-BOUND(KEYS(T), k)
        if i ≤ n and KEYS(T)[i] = k then
            kl ← KEYS(T)[1, ..., i - 1]
            kr ← KEYS(T)[i + 1, ..., n]
            cl ← CHILDREN(T)[1, ..., i]
            cr ← CHILDREN(T)[i + 1, ..., n + 1]
            return MERGE(CREATE-B-TREE(kl, cl), CREATE-B-TREE(kr, cr))
        else
            kl ← KEYS(T)[1, ..., i - 1]
            kr ← KEYS(T)[i, ..., n]
            c ← CHILDREN(T)[i]
            cl ← CHILDREN(T)[1, ..., i - 1]
            cr ← CHILDREN(T)[i + 1, ..., n + 1]
            return MAKE-B-TREE((kl, cl), DEL(c, k), (kr, cr))
        end if
    end if
end function

Figure 17: Borrow a key-child pair from left part and un-split to a new child.

The main delete function calls an internal DEL function to perform the work. After that, it applies FIX-ROOT to check whether the tree height needs to shrink. So the FIX-ROOT function defined in the insertion section should be updated as the following.

function FIX-ROOT(T)
    if KEYS(T) = NIL and CHILDREN(T) ≠ NIL then     ▷ Single child, shrink the height
        T ← CHILDREN(T)[1]
    else if FULL?(T) then
        T ← B-TREE-SPLIT(T)
    end if
    return T
end function

The recursive merging algorithm is given below. The left part and the right part are passed as arguments. If they are leaves, we just put all keys together. Otherwise, we recursively merge the last child of the left part and the first child of the right part into a new child. We then combine this merged child with the remaining two parts into a new node.

function MERGE(L, R)
    if L, R are leaves then
        T ← CREATE-NEW-NODE()
        KEYS(T) ← KEYS(L) + KEYS(R)
        return T
    else
        m ← LENGTH(KEYS(L))
        n ← LENGTH(KEYS(R))
        kl ← KEYS(L)
        kr ← KEYS(R)
        cl ← CHILDREN(L)[1, ..., m]
        cr ← CHILDREN(R)[2, ..., n + 1]
        c ← MERGE(CHILDREN(L)[m + 1], CHILDREN(R)[1])
        return MAKE-B-TREE((kl, cl), c, (kr, cr))
    end if
end function

To combine the three parts (the left part L, the right part R and the child C) into a node, we need to check whether C contains enough keys. Insertion already checks that C doesn't contain too many keys, so the algorithm is updated to cover both conditions.

function MAKE-B-TREE(L, C, R)
    if FULL?(C) then
        return FIX-FULL(L, C, R)
    else if LOW?(C) then
        return FIX-LOW(L, C, R)
    else
        T ← CREATE-NEW-NODE()
        KEYS(T) ← KEYS(L) + KEYS(R)
        CHILDREN(T) ← CHILDREN(L) + [C] + CHILDREN(R)
        return T
    end if
end function

FIX-LOW is defined as the following. If the left part isn't empty, FIX-LOW borrows a key-child pair from it and un-splits that pair with the child so the child has enough keys. It then recursively calls MAKE-B-TREE. If the left part is empty, it tries to borrow a key-child pair from the right part instead. If both sides are empty, it returns the child node as the result, so that the height shrinks.

function FIX-LOW(L, C, R)
    (kl, cl) ← L
    (kr, cr) ← R
    m ← LENGTH(kl)
    n ← LENGTH(kr)
    if kl ≠ NIL then
        kl′ ← kl[1, ..., m - 1]
        cl′ ← cl[1, ..., m - 1]
        C′ ← UN-SPLIT(cl[m], kl[m], C)
        return MAKE-B-TREE((kl′, cl′), C′, R)
    else if kr ≠ NIL then
        kr′ ← kr[2, ..., n]
        cr′ ← cr[2, ..., n]
        C′ ← UN-SPLIT(C, kr[1], cr[1])
        return MAKE-B-TREE(L, C′, (kr′, cr′))
    else
        return C
    end if
end function

Function UN-SPLIT is defined as the inverse of splitting.

function UN-SPLIT(L, k, R)
    T ← CREATE-B-TREE-NODE()
    KEYS(T) ← KEYS(L) + [k] + KEYS(R)
    CHILDREN(T) ← CHILDREN(L) + CHILDREN(R)
    return T
end function
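The following minimal Python sketch shows these pieces working together. It is a hypothetical translation, not one of the original programs, and it makes three assumptions. First, leaves are Node objects without children. Second, new nodes are built instead of mutating old ones. Third, FIX-FULL uses its own split point: t - 1 keys stay on the left and the t-th key moves up. The insertion section's split may differ.

```python
from bisect import bisect_left

class Node:
    """Hypothetical B-tree node; a leaf has no children. Nodes are never mutated."""
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)

    def __repr__(self):  # bracketed layout used by the program output
        parts = [repr(self.children[0])] if self.children else []
        for j, k in enumerate(self.keys):
            parts.append(str(k))
            if self.children:
                parts.append(repr(self.children[j + 1]))
        return "(" + ", ".join(parts) + ")"

def full(c, t):
    return len(c.keys) > 2 * t - 1

def low(c, t):
    return len(c.keys) < t - 1

def split(c, t):
    # Assumed split point: t - 1 keys stay left, the t-th key moves up.
    return (Node(c.keys[:t - 1], c.children[:t]), c.keys[t - 1],
            Node(c.keys[t:], c.children[t:]))

def unsplit(c1, k, c2):  # UN-SPLIT, the inverse of split
    return Node(c1.keys + [k] + c2.keys, c1.children + c2.children)

def make(left, c, right, t):
    """MAKE-B-TREE: join (keys, children) parts around child c, fixing c."""
    (kl, cl), (kr, cr) = left, right
    if full(c, t):                                  # FIX-FULL
        a, k, b = split(c, t)
        return Node(kl + [k] + kr, cl + [a, b] + cr)
    if low(c, t):                                   # FIX-LOW
        if kl:  # borrow the last key/child pair of the left part
            return make((kl[:-1], cl[:-1]), unsplit(cl[-1], kl[-1], c), right, t)
        if kr:  # otherwise borrow the first pair of the right part
            return make(left, unsplit(c, kr[0], cr[0]), (kr[1:], cr[1:]), t)
        return c                                    # nothing to borrow: push c up
    return Node(kl + kr, cl + [c] + cr)

def merge(lnode, rnode, t):
    """MERGE: glue the two halves left after removing a key from a branch."""
    if not lnode.children:                          # two leaves
        return Node(lnode.keys + rnode.keys)
    return make((lnode.keys, lnode.children[:-1]),
                merge(lnode.children[-1], rnode.children[0], t),
                (rnode.keys, rnode.children[1:]), t)

def _del(tr, k, t):
    if not tr.children:                             # leaf: drop one copy of k
        keys = list(tr.keys)
        if k in keys:
            keys.remove(k)
        return Node(keys)
    i = bisect_left(tr.keys, k)                     # LOWER-BOUND, 0-based
    if i < len(tr.keys) and tr.keys[i] == k:        # k lives in this branch
        return merge(Node(tr.keys[:i], tr.children[:i + 1]),
                     Node(tr.keys[i + 1:], tr.children[i + 1:]), t)
    return make((tr.keys[:i], tr.children[:i]), _del(tr.children[i], k, t),
                (tr.keys[i:], tr.children[i + 1:]), t)

def fix_root(tr, t):
    if not tr.keys and tr.children:                 # a single child: shrink
        return tr.children[0]
    if full(tr, t):
        a, k, b = split(tr, t)
        return Node([k], [a, b])
    return tr

def delete(tr, k, t):
    return fix_root(_del(tr, k, t), t)

def leaf(s):
    return Node(list(s))

tree = Node(["M"], [Node(["C", "G"], [leaf("AB"), leaf("DEF"), leaf("HIJK")]),
                    Node(["P", "T", "W"], [leaf("NO"), leaf("QRS"), leaf("UV"), leaf("XYZ")])])
for key in "EGAMU":
    tree = delete(tree, key, 3)
    print("delete", key)
    print(tree)
```

Under these assumptions, the sketch deletes E, G, A, M and U from the tree built by the insertion test. It prints the same five trees as the Haskell run in the next subsection, ending with ((B, C, D, F), H, (I, J, K, N, O), P, (Q, R, S, T, V), W, (X, Y, Z)). Deleting a key that isn't present returns an equivalent tree.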

4.2.2 Delete and fix algorithm implemented in Haskell

Based on the analysis of the delete-then-fix approach, a Haskell program can be written accordingly. The core deletion function is simple. It calls an internal removing function, then examines the root node to see whether the height of the tree can be shrunk.

import qualified Data.List as L

delete :: (Ord a) => BTree a -> a -> BTree a
delete tr x = fixRoot $ del tr x

del :: (Ord a) => BTree a -> a -> BTree a
del (Node ks [] t) x = Node (L.delete x ks) [] t
del (Node ks cs t) x =
    case L.elemIndex x ks of
      Just i -> merge (Node (take i ks) (take (i+1) cs) t)
                      (Node (drop (i+1) ks) (drop (i+1) cs) t)
      Nothing -> make (ks', cs') (del c x) (ks'', cs'')
    where
      (ks', ks'') = L.partition (<x) ks
      (cs', (c:cs'')) = L.splitAt (length ks') cs

Let's focus on the del function. To delete a key from a leaf node, it just calls the delete function defined in the Data.List library. If the key doesn't exist at all, this pre-defined delete function simply returns the list unmodified. To delete a key from a branch node, it first checks whether the key is in this node. If so, it removes the key and applies the recursive merge. Otherwise, it locates the proper child and does a recursive delete-then-fix on this child. The partition and splitAt functions from Data.List split the key and children lists at the right position. All elements on the left are less than the key, while the elements on the right are greater than it.

The recursive merge program has two patterns: merging two leaves and merging two branches. It is given as the following.

merge :: BTree a -> BTree a -> BTree a
merge (Node ks [] t) (Node ks' [] _) = Node (ks++ks') [] t
merge (Node ks cs t) (Node ks' cs' _) = make (ks, init cs)
                                             (merge (last cs) (head cs'))
                                             (ks', tail cs')

Here init, last, head and tail are list functions defined in the Haskell Prelude. The fixing part of delete-then-fix is defined inside the make function.

make :: ([a], [BTree a]) -> BTree a -> ([a], [BTree a]) -> BTree a
make (ks', cs') c (ks'', cs'')
    | full c = fixFull (ks', cs') c (ks'', cs'')
    | low c = fixLow (ks', cs') c (ks'', cs'')
    | otherwise = Node (ks'++ks'') (cs'++[c]++cs'') (degree c)

Function low tests whether a node contains too few keys.

low :: BTree a -> Bool
low tr = (length $ keys tr) < (degree tr)-1

The actual fixing tries to borrow keys from either the left part or the right part, as the following.

fixLow :: ([a], [BTree a]) -> BTree a -> ([a], [BTree a]) -> BTree a
fixLow (ks'@(_:_), cs') c (ks'', cs'') = make (init ks', init cs')
                                              (unsplit (last cs') (last ks') c)
                                              (ks'', cs'')
fixLow (ks', cs') c (ks''@(_:_), cs'') = make (ks', cs')
                                              (unsplit c (head ks'') (head cs''))
                                              (tail ks'', tail cs'')
fixLow _ c _ = c

A pattern like x@(_:_) ensures that x is not empty. Function unsplit performs the inverse of splitting, as below.

unsplit :: BTree a -> a -> BTree a -> BTree a
unsplit c1 k c2 = Node ((keys c1)++[k]++(keys c2))
                       ((children c1)++(children c2)) (degree c1)

To verify the Haskell program, we can provide some simple test cases. The import below belongs at the top of the module.

import Control.Monad (foldM)

testDelete = foldM delShow (listToBTree "GMPXACDEJKNORSTUVYZBFHIQW" 3) "EGAMU" where
    delShow tr x = do
      let tr' = delete tr x
      putStrLn $ "delete "++(show x)
      putStrLn $ toString tr'
      return tr'

Functions listToBTree and toString are defined in the previous section, which explains the insertion algorithm. Running this function generates the following result.

delete 'E'
(((A, B), C, (D, F), G, (H, I, J, K)), M, ((N, O), P, (Q, R, S), T, (U, V), W, (X, Y, Z)))
delete 'G'
(((A, B), C, (D, F), H, (I, J, K)), M, ((N, O), P, (Q, R, S), T, (U, V), W, (X, Y, Z)))
delete 'A'
((B, C, D, F), H, (I, J, K), M, (N, O), P, (Q, R, S), T, (U, V), W, (X, Y, Z))
delete 'M'
((B, C, D, F), H, (I, J, K, N, O), P, (Q, R, S), T, (U, V), W, (X, Y, Z))
delete 'U'
((B, C, D, F), H, (I, J, K, N, O), P, (Q, R, S, T, V), W, (X, Y, Z))

Deleting the same keys from the same B-tree with the merge-before-delete approach produces a different result from the delete-then-fix method. The results differ, but both satisfy the B-tree properties, so both are correct.
Figure 18: Result of delete-then-fixing (1). (a) A B-tree before deleting; (b) after deleting key E.

4.2.3

Delete and fix algorithm implemented in Scheme/Lisp

In order to implement the delete program in Scheme/Lisp, we provide an extra function to test whether a node contains too few keys after deletion.

```scheme
(define (low? tr t) ;; t: minimum degree
  (< (length (keys tr))
     (- t 1)))
```

Some general-purpose list manipulation functions are also defined.

```scheme
(define (rest lst k)
  (list-tail lst (- (length lst) k)))

(define (except-rest lst k)
  (list-head lst (- (length lst) k)))

(define (first lst)
  (if (null? lst) '() (car lst)))

(define (last lst)
  (if (null? lst) '() (car (last-pair lst))))
```

Figure 19: Result of delete-then-fixing (2). (c) After deleting key G; (d) after deleting key A.

Figure 20: Result of delete-then-fixing (3). (e) After deleting key M; (f) after deleting key U.

```scheme
(define (inits lst)
  (if (null? lst) '() (except-last-pair lst)))
```

Function rest extracts the last k elements of a list, while except-rest extracts all but the last k elements. first can be treated as a safe car: it returns the empty list instead of raising an error when the list is empty. Function last returns the last element of a list, or an empty result if the list is empty. Function inits returns all elements except the last one. An inverse of the splitting operation is also provided.

```scheme
(define (unsplit lst)
  (let ((c1 (car lst))
        (k (cadr lst))
        (c2 (caddr lst)))
    (append c1 (list k) c2)))
```

The main function of deletion is defined as follows.

```scheme
(define (btree-delete tr x t)
  (define (del tr x)
    (if (leaf? tr)
        (delete x tr)
        (let* ((res (partition-by tr x))
               (left (car res))
               (c (cadr res))
               (right (caddr res)))
          (if (equal? (first right) x)
              (merge-btree (append left (list c)) (cdr right) t)
              (make-btree left (del c x) right t)))))
  (fix-root (del tr x) t))
```

It is implemented in a similar way to insertion: an internally defined function del is called, and then the fixing process is applied to its result. In del, if the B-tree is a leaf node, the standard list deletion function delete is applied. If it is a branch node, we call the partition-by function defined previously. This function divides the node into 3 parts: the children and keys less than x as the left part, then a child node c, then the keys not less than (greater than or equal to) x together with their children as the right part. If the first key in the right part equals x, x is located in this node; we remove x from the right part and call merge-btree to merge left+c and right-x into one new node.
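To make the three-way split concrete, here is a hypothetical Python counterpart of partition-by (the Scheme version is defined previously and is not reproduced here). It is a sketch under the assumption that a leaf is a list of keys and a branch is an interleaved list [c1, k1, c2, k2, ..., cn+1] whose children are nested lists.

```python
def is_leaf(tr):
    # a leaf holds keys only; a branch interleaves child lists and keys
    return not any(isinstance(e, list) for e in tr)

def partition_by(tr, x):
    # keys sit at the odd positions of a branch [c1, k1, c2, k2, ..., cn+1];
    # j is the position of the first key >= x, or len(tr) if there is none
    j = next((i for i in range(1, len(tr), 2) if tr[i] >= x), len(tr))
    # left: everything before child c; right: starts at the first key >= x
    return tr[:j - 1], tr[j - 1], tr[j:]
```

For example, with tr = [['A', 'B'], 'C', ['D', 'E'], 'G', ['H', 'I']], partition_by(tr, 'G') yields right = ['G', ['H', 'I']]; since its first key is the key being deleted, del would merge left + [c] with right[1:].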
```scheme
(define (merge-btree tr1 tr2 t)
  (if (leaf? tr1)
      (append tr1 tr2)
      (make-btree (inits tr1)
                  (merge-btree (last tr1) (car tr2) t)
                  (cdr tr2)
                  t)))
```

If the first key of the right part is not x, x may be located in c, so we recursively delete x from c. Function fix-root is updated to handle the deletion cases as below.

```scheme
(define (fix-root tr t)
  (cond ((null? tr) '()) ;; empty tree
        ((full? tr t) (split tr t))
        ((null? (keys tr)) (car tr)) ;; shrink height
        (else tr)))
```

We add one case to make-btree to handle a node that contains too few keys after deletion.

```scheme
(define (make-btree l c r t)
  (cond ((full? c t) (fix-full l c r t))
        ((low? c t) (fix-low l c r t))
        (else (append l (cons c r)))))
```

Here fix-low tries to borrow a key and a child either from the left sibling or from the right sibling.

```scheme
(define (fix-low l c r t)
  (cond ((not (null? (keys l)))
         (make-btree (except-rest l 2)
                     (unsplit (append (rest l 2) (list c)))
                     r t))
        ((not (null? (keys r)))
         (make-btree l
                     (unsplit (cons c (list-head r 2)))
                     (list-tail r 2) t))
        (else c)))
```

In order to verify the deleting program, a simple test is fed to the functions defined above.

```scheme
(define (test-delete)
  (define (del-and-show tr x)
    (let ((r (btree-delete tr x 3)))
      (begin (display r) (display "\n") r)))
  (fold-left del-and-show
             (list->btree (str->slist "GMPXACDEJKNORSTUVYZBFHIQW") 3)
             (str->slist "EGAMU")))
```

Running the test generates the following result.

```
(((A B) C (D F) G (H I J K)) M ((N O) P (Q R S) T (U V) W (X Y Z)))
(((A B) C (D F) H (I J K)) M ((N O) P (Q R S) T (U V) W (X Y Z)))
((B C D F) H (I J K) M (N O) P (Q R S) T (U V) W (X Y Z))
((B C D F) H (I J K N O) P (Q R S) T (U V) W (X Y Z))
((B C D F) H (I J K N O) P (Q R S T V) W (X Y Z))
```

Comparing this with the output of the Haskell program in the previous section, we can see that the results are the same.
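The slicing that fix-low performs with rest, except-rest, list-head and list-tail can be mirrored in Python on the same interleaved-list representation. This is a simplified, hypothetical sketch: borrow performs only the merge step and, unlike make-btree, does not re-check whether the merged child overflows.

```python
def keys(tr):
    # keys are the non-list elements of an interleaved node
    return [e for e in tr if not isinstance(e, list)]

def unsplit(lst):
    c1, k, c2 = lst        # [child, key, child]
    return c1 + [k] + c2   # one node holding both children's contents

def borrow(l, c, r):
    # simplified fix-low: the merged node is not re-checked for overflow here
    if keys(l):  # (except-rest l 2), (rest l 2)  ->  l[:-2], l[-2:]
        return l[:-2] + [unsplit(l[-2:] + [c])] + r
    if keys(r):  # (list-head r 2), (list-tail r 2)  ->  r[:2], r[2:]
        return l + [unsplit([c] + r[:2])] + r[2:]
    return c     # no sibling: c replaces its parent
```

For instance, borrow([['A', 'B'], 'C'], ['D'], ['G', ['H', 'I', 'J', 'K']]) merges the low leaf ['D'] with its left sibling and the separator C, giving [['A', 'B', 'C', 'D'], 'G', ['H', 'I', 'J', 'K']].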

Searching

Although searching in a B-tree can be considered a generalized form of the tree search used in binary search trees, it's worth mentioning that in the disk-access case, instead of returning just the satellite data corresponding to the key, it's more meaningful to return the whole node that contains the key.

5.1

Imperative search algorithm

When searching in a binary search tree, there are only 2 directions, left and right, in which to continue; in a B-tree, the search directions must be extended to cover all the children of a node.

```
function B-TREE-SEARCH(T, k)
  loop
    i ← 1
    while i ≤ LENGTH(KEYS(T)) and k > KEYS(T)[i] do
      i ← i + 1
    end while
    if i ≤ LENGTH(KEYS(T)) and k = KEYS(T)[i] then
      return (T, i)
    end if
    if T is leaf then
      return NIL    ▷ k doesn't exist at all
    else
      T ← CHILDREN(T)[i]
    end if
  end loop
end function
```

When searching, the program starts from the root node and examines the keys of the current node from the smallest towards the biggest. If it finds a matching key, it returns the current node together with the index of this key. Otherwise, if it finds a position where $key_i < k < key_{i+1}$, it moves on to examine child node $c_{i+1}$. If it fails to find the key in a leaf node, an empty value (NIL) is returned to indicate failure. Note that in Introduction to Algorithms this procedure is described recursively; here the recursion is eliminated.
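The iterative B-TREE-SEARCH can be sketched in Python as follows. This is a minimal sketch, assuming a hypothetical Node class with keys and children lists (an empty children list marks a leaf); indices are 0-based rather than 1-based as in the pseudocode.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []  # empty list for a leaf

    @property
    def leaf(self):
        return not self.children

def btree_search(t, k):
    # iterative translation of B-TREE-SEARCH; indices are 0-based
    while True:
        i = 0
        while i < len(t.keys) and k > t.keys[i]:
            i += 1
        if i < len(t.keys) and k == t.keys[i]:
            return t, i          # the whole node plus the key's index
        if t.leaf:
            return None          # k does not exist at all
        t = t.children[i]        # descend: one disk read per level
```

With the tree printed by the C++ test below, ((A, C), D, (E, G, J, K), M, (N, O), P, (R, S), T, (U, V, X, Y, Z)), searching for G descends once from the root and returns the leaf (E, G, J, K) with index 1.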


Search program in C++

In the C++ implementation, we can use std::pair from the STL as the return type.

```cpp
template<class T>
std::pair<T*, unsigned int> search(T* t, typename T::key_type k){
  for(;;){
    unsigned int i(0);
    for(; i < t->keys.size() && k > t->keys[i]; ++i);
    if(i < t->keys.size() && k == t->keys[i])
      return std::make_pair(t, i);
    if(t->leaf())
      break;
    t = t->children[i];
  }
  return std::make_pair((T*)0, 0); // not found
}
```

The test cases are given below; the helper that prints a single lookup is declared first so that test_search() can call it.

```cpp
template<class T>
void test_search(T* t, typename T::key_type k){
  std::pair<T*, unsigned int> res = search(t, k);
  if(res.first)
    std::cout<<"found "<<res.first->keys[res.second]<<"\n";
  else
    std::cout<<"not found "<<k<<"\n";
}

void test_search(){
  std::cout<<"test search...\n";
  const char* ss[] = {"G", "M", "P", "X", "A", "C", "D", "E", "J", "K",
                      "N", "O", "R", "S", "T", "U", "V", "Y", "Z"};
  BTree<std::string, 3>* tr = list_to_btree(ss, ss+sizeof(ss)/sizeof(char*),
                                             new BTree<std::string, 3>);
  std::cout<<"\n"<<btree_to_str(tr)<<"\n";
  for(unsigned int i=0; i<sizeof(ss)/sizeof(char*); ++i)
    test_search(tr, ss[i]);
  test_search(tr, "W");
  delete tr;
}
```

Running test_search generates the following result.

```
test search...

((A, C), D, (E, G, J, K), M, (N, O), P, (R, S), T, (U, V, X, Y, Z))
found G
found M
...
found Z
not found W
```

The program finds all the keys we inserted and reports W as not found.

Search program in Python

Changing the above algorithm a bit gives a Python program that corresponds to the recursive pseudo code in the Introduction to Algorithms textbook.

```python
def B_tree_search(tr, key):
    for i in range(len(tr.keys)):
        if key <= tr.keys[i]:
            break
    if key == tr.keys[i]:
        return (tr, i)
    if tr.leaf:
        return None
    else:
        if key > tr.keys[-1]:
            i = i + 1
        # disk read
        return B_tree_search(tr.children[i], key)
```

There is a minor modification from the original pseudo code: we use a for-loop to iterate over the keys, so the boundary check is done by comparing with the last key in the node and adjusting the index if necessary.

Let's feed some simple test cases to this program.

```python
def test_search():
    lst = ["G", "M", "P", "X", "A", "C", "D", "E", "J", "K",
           "N", "O", "R", "S", "T", "U", "V", "Y", "Z"]
    tr = list_to_B_tree(lst, 3)
    print "test search\n", B_tree_to_str(tr)
    for i in lst:
        __test_search(tr, i)
    __test_search(tr, "W")

def __test_search(tr, k):
    res = B_tree_search(tr, k)
    if res is None:
        print k, "not found"
    else:
        (node, i) = res
        print "found", node.keys[i]
```

Running the function test_search generates the following result.

```
found G
found M
...
found Z
W not found
```
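The linear scan in B_tree_search, together with its i = i + 1 adjustment, can be replaced by a binary search within the node using the standard bisect.bisect_left, which returns the first index whose key is not less than the search key (or len(keys) if there is none). A minimal sketch, again assuming a hypothetical Node class with keys and children attributes rather than the article's own Python B-tree class. Unlike the version above, it also handles a node with no keys, where the for-loop never binds i.

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []  # empty list for a leaf

def btree_search(tr, key):
    # first index i with tr.keys[i] >= key, or len(tr.keys) if all are smaller
    i = bisect_left(tr.keys, key)
    if i < len(tr.keys) and tr.keys[i] == key:
        return tr, i
    if not tr.children:
        return None                           # reached a leaf: not found
    return btree_search(tr.children[i], key)  # disk read
```

Binary search reduces the comparisons per node from O(t) to O(lg t), which matters mostly when nodes are large, as in the disk-based setting that motivates B-trees.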

5.2

Functional search algorithm

The imperative algorithm can be turned into a functional one by performing the search recursively on a child when the key cannot be located in the current node.

```
function B-TREE-SEARCH(T, k)
  i ← FIND-FIRST(λx · x ≥ k, KEYS(T))    ▷ LENGTH(KEYS(T)) + 1 if no such key
  if i ≤ LENGTH(KEYS(T)) and k = KEYS(T)[i] then
    return (T, i)
  end if
  if T is leaf then
    return NIL    ▷ k doesn't exist at all
  else
    return B-TREE-SEARCH(CHILDREN(T)[i], k)
  end if
end function
```

Search program in Haskell

In the Haskell program, we first count the keys less than the key to be searched; because the keys are sorted, dropping that many keys leaves the keys not less than k. We then check the first element of the result. If it matches, we return the current node along with the index as a tuple, where the index starts from 0. If it doesn't match, we search recursively until a leaf node is reached.

```haskell
search :: (Ord a) => BTree a -> a -> Maybe (BTree a, Int)
search tr@(Node ks cs _) k
    | matchFirst k $ drop len ks = Just (tr, len)
    | otherwise = if null cs then Nothing
                  else search (cs !! len) k
    where
      matchFirst x (y:_) = x == y
      matchFirst _ _ = False
      len = length $ filter (<k) ks
```

The verification test cases are provided as follows.

```haskell
testSearch = mapM_ (showSearch (listToBTree lst 3)) $ lst ++ "L" where
    showSearch tr x = do
      case search tr x of
        Just (_, i) -> putStrLn $ "found " ++ (show x)
        Nothing -> putStrLn $ "not found " ++ (show x)
    lst = "GMPXACDEJKNORSTUVYZBFHIQW"
```

Here we construct a B-tree from a string, then check whether each element of this string can be located. Finally, a non-existent element L is fed to verify the failure case. Running this test function generates the following results.

```
found G
found M
...
found W
not found L
```

Search program in Scheme/Lisp

Because children and keys are interleaved in a single list in the Scheme/Lisp B-tree definition, the search function only needs to step back one position from a located key to reach the child in front of it.

```scheme
(define (btree-search tr x)
  ;; find the smallest index where keys[i] >= x
  (define (find-index tr x)
    (let ((pred (if (string? x) string>=? >=)))
      (if (null? tr)
          0
          (if (and (not (list? (car tr))) (pred (car tr) x))
              0
              (+ 1 (find-index (cdr tr) x))))))
  (let ((i (find-index tr x)))
    (if (and (< i (length tr)) (equal? x (list-ref tr i)))
        (cons tr i)
        (if (leaf? tr)
            #f
            (btree-search (list-ref tr (- i 1)) x)))))
```

The program defines an inner function to find the index of the first key that is greater than or equal to the key being searched. If the key at this index matches, we are done. Otherwise, the element just before this index is a child that may contain the key, and the search continues there. The program returns false if the current node is a leaf. We can run the testing function below to verify this searching program.

```scheme
(define (test-search)
  (define (search-and-show tr x)
    (if (btree-search tr x)
        (display (list "found" x))
        (display (list "not found" x))))
  (let* ((lst (str->slist "GMPXACDEJKNORSTUVYZBFHIQW"))
         (tr (list->btree lst 3)))
    (map (lambda (x) (search-and-show tr x)) (cons "L" lst))))
```


A non-existent key L is fed first, and then all the elements used to form the B-tree are looked up for verification.

```
(not found L)(found G)(found M)...(found W)
```
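The interleaved-list search can be mirrored in Python to show where the i - 1 step comes from. A minimal sketch, assuming a leaf is a plain list of keys and a branch is a list [c1, k1, c2, ..., cn+1] whose children are nested lists, as in the Scheme representation; btree_search here is a hypothetical Python counterpart of btree-search that returns None instead of #f.

```python
def is_leaf(tr):
    return not any(isinstance(e, list) for e in tr)

def btree_search(tr, x):
    # index of the first key >= x; child lists are skipped, len(tr) if none
    i = next((j for j, e in enumerate(tr)
              if not isinstance(e, list) and e >= x), len(tr))
    if i < len(tr) and tr[i] == x:
        return tr, i
    if is_leaf(tr):
        return None
    return btree_search(tr[i - 1], x)  # the child just before position i
```

In a branch the first element is always a child, so i is at least 1 there and tr[i - 1] is always a child; when every key is smaller than x, i = len(tr) and tr[i - 1] is the last child.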

Notes and short summary

In this post, we explained the B-tree data structure as an extension of the binary search tree. The background knowledge of magnetic disk access is skipped; readers can refer to [1] for details. For the three main operations, insertion, deletion, and searching, both imperative and functional algorithms are illustrated. The complexity isn't discussed in detail here. However, since B-trees are defined to maintain the balance properties, the height of a B-tree is logarithmic in the number of keys, and for a fixed minimum degree all operations mentioned here take $O(\lg N)$ time, where N is the number of keys in the B-tree.
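The $O(\lg N)$ claim can be made concrete with the height bound proved in [1] (Theorem 18.1) for a B-tree with $N \ge 1$ keys, minimum degree $t \ge 2$ and height $h$. The root holds at least one key, and each depth $i \ge 1$ has at least $2t^{i-1}$ nodes with at least $t-1$ keys each, so

$$N \ge 1 + (t-1)\sum_{i=1}^{h} 2t^{i-1} = 1 + 2(t-1)\cdot\frac{t^h - 1}{t - 1} = 2t^h - 1 \quad\Longrightarrow\quad h \le \log_t \frac{N+1}{2}$$

For example, with $t = 3$ and the $N = 25$ keys of "GMPXACDEJKNORSTUVYZBFHIQW" used in the tests, $h \le \log_3 13 \approx 2.33$, so the height is at most 2 and a search visits at most 3 nodes; this is consistent with the trees printed by the deletion tests, whose height never exceeds 2. Each visited node scans up to $2t-1$ keys, so the CPU time is $O(t \log_t N)$, which is $O(\lg N)$ for a fixed $t$.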

Appendix

All programs provided along with this article are free to download.

7.1

Prerequisite software

GNU Make is used to build some of the programs easily. For the C++ and ANSI C programs, GNU GCC and G++ 3.4.4 are used. For the Haskell programs, GHC 6.10.4 is used for building. For the Python programs, Python 2.5 is used for testing, and for the Scheme/Lisp programs, MIT Scheme 14.9 is used. All source files are put in one folder. Invoking make or make all builds the C++ and Haskell programs. Running make Haskell builds only the Haskell program. The executable file is htest (htest.exe on Windows-like OSes). It is also possible to run the program in GHCi.

7.2

Tools

Besides these, I use graphviz to draw most of the figures in this post. In order to translate the B-tree output to a dot script, a Haskell tool is provided. It can be used like this:

bt2dot filename.dot "string"

where filename.dot is the output file for the dot script. The tool parses the string describing the B-tree content and translates it into a dot script. The source code of this tool is BTr2dot.hs; it can also be downloaded with this article from http://sites.google.com/site/algoxy/btree/btree.zip

References
[1] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. Introduction to Algorithms, Second Edition. The MIT Press, 2001. ISBN: 0262032937.
[2] B-tree, Wikipedia. http://en.wikipedia.org/wiki/B-tree
[3] Liu Xinyu. Comparison of imperative and functional implementation of binary search tree. http://sites.google.com/site/algoxy/bstree
[4] Chris Okasaki. Functional Pearls: Red-Black Trees in a Functional Setting. Journal of Functional Programming, 1998.
