
B Tree: Muhammad Haris Department of Computer Science M.haris@nu - Edu.pk

The document discusses B-trees, which are self-balancing search trees used to store data in external memory. It defines B-trees as m-way trees where m is the maximum number of children per node. All leaf nodes must be at the same level. B-trees allow efficient retrieval of data from slow storage devices like disks by minimizing the number of disk accesses needed. The document provides examples of constructing a B-tree by inserting keys in order and explains that insertion may cause nodes to split and keys to propagate up the tree.

B Tree

Muhammad Haris
Department of Computer Science
[email protected]
Today’s Lecture
Introduction to B tree
Properties of B tree
Insertion into B tree
Examples
Activity to submit now
B-tree concept
BST?
AVL trees / balanced trees

• A B-tree is a good structure when much of the tree is in slow memory (disk),
• since the smaller the height, the fewer disk accesses a lookup needs
• Each node is read as one large block of data
• Used in cache applications
Definition of a B-tree
A self-balancing M-way search tree
M is the order of the B-tree
M could be 3, 4, 5, 6, 7, etc.
M is the maximum number of children of a node in a B-tree
(not the number of keys)
All leaf nodes are at the same level
Keys are always kept in sorted order within each node
Definition of a B-tree: properties
Every node can have at most m children
Every node has a minimum number of children:
Root => 2 (unless it is a leaf)
Leaf node => 0 children
Internal node => ceil(m/2)

Every node can have at most m-1 keys
Every node except the root has at least ceil(m/2)-1 keys
The root node can have as few as one key
All new keys are inserted into a leaf node
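The limits above can be tabulated for any order m. The following is a minimal sketch; the function name `btree_bounds` and the dictionary keys are made up for illustration and do not come from the slides.

```python
import math

def btree_bounds(m: int) -> dict:
    """Child/key limits for a B-tree of order m (m = max children per node)."""
    if m < 3:
        raise ValueError("the slides assume m > 2")
    min_children = math.ceil(m / 2)
    return {
        "max_children": m,
        "max_keys": m - 1,
        "min_children_internal": min_children,
        "min_keys_non_root": min_children - 1,
        "min_children_root": 2,   # applies when the root is not a leaf
        "min_keys_root": 1,
    }

for m in (3, 5):
    print(m, btree_bounds(m))
```

For m = 5 this gives 3 to 5 children and 2 to 4 keys for internal nodes; for m = 3 it gives 2 to 3 children and 1 to 2 keys.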
Extra information
Self balancing tree
Nodes may have more than 2 children, within a specific range
Max children = 2t, i.e. m
Min children = ceil(m/2), called t
• Here m = 2t, and t is the branching factor, which must be greater than 1
Balanced = all leaves are at the same height
Disk operations are efficient
m > 2
t > 1
Definition
• The major advantage of the B+ tree (and B-trees in general) over binary
search trees is that they play well with caches. If you have a binary search
tree whose nodes are stored in more or less random order in memory, then
each time you follow a pointer, the machine will have to pull a new
block of memory into the processor cache, which is dramatically slower
than accessing memory already in cache.
• The B+-tree and the B-tree work by having each node store a huge number
of keys or values and have a large number of children. They are typically
packed together in a way that makes it possible for a single node to fit nicely
into cache (or, if stored on disk, to be pulled from the disk in a single
read operation). You then have to do more work to find a key within the
node or determine which child to read next, but because all memory accesses
done on a single node can be done without going back to disk, the access
times are very small. This means that even though in principle a BST might
be better in terms of number of memory accesses, the B+-tree and the B-tree
can perform better in terms of the runtime of those memory accesses.
The typical use case for a B+-tree or B-tree is in a
database, where there is a huge amount of information
and the data are so numerous that they can't all fit into
main memory. Accordingly, the data can then be
stored in a B+-tree or B-tree on a hard disk
somewhere. This minimizes the number of disk reads
necessary to pull in the data during lookups. Some
filesystems (like ext4, I believe) use B-trees as well
for the same reason - they minimize the number of
disk lookups necessary, which is the real bottleneck.
An example B-Tree
A B-tree of order 5

                               [26]
           [6 | 12]                          [42 | 51 | 62]
[1 2 4]  [7 8]  [13 15 18 25]    [27 29]  [45 46 48]  [53 55]  [60 64 70 90]

Note that all the leaves are at the same level


Properties of B-trees

1. Every node x has the following fields:
   a. x.n: the number of keys currently stored in node x,
      e.g. for the node 1|2|4, x.n is 3.
   b. The x.n keys themselves, stored in non-decreasing order,
      so that x.key_1 ≤ x.key_2 ≤ … ≤ x.key_{x.n},
      e.g. the keys of 1|2|4 are ordered.
   c. x.leaf, a boolean that is TRUE if x is a leaf and FALSE otherwise.
2. Each internal (= non-leaf) node contains x.n+1 pointers
   x.c_1, x.c_2, …, x.c_{x.n+1} to its children.
3. Leaf nodes have no such pointers.
4. The keys x.key_i separate the ranges of keys stored in each subtree:
   if k_i is any key stored in the subtree with root x.c_i, then
   k_1 ≤ x.key_1 ≤ k_2 ≤ x.key_2 ≤ … ≤ x.key_{x.n} ≤ k_{x.n+1}
   e.g. consider the 6|12 node:

            [6 | 12]
   [1 2 4]  [7 8]  [13 15 18 25]
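As a rough illustration of these fields, here is a small sketch of a node type with a checker for properties 1b, 2 and 4. The names `Node` and `check` are hypothetical, and deriving x.n and x.leaf from the lists (instead of storing them) is a simplification chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    keys: List[int] = field(default_factory=list)          # x.key_1 .. x.key_{x.n}
    children: List["Node"] = field(default_factory=list)   # x.c_1 .. x.c_{x.n+1}; empty for a leaf

    @property
    def n(self) -> int:        # x.n
        return len(self.keys)

    @property
    def leaf(self) -> bool:    # x.leaf
        return not self.children

def check(x: Node, lo=float("-inf"), hi=float("inf")) -> bool:
    """True if the subtree rooted at x satisfies properties 1b, 2 and 4."""
    if any(a > b for a, b in zip(x.keys, x.keys[1:])):
        return False                       # keys not in non-decreasing order
    if any(not (lo <= k <= hi) for k in x.keys):
        return False                       # key outside the range set by ancestors
    if x.leaf:
        return True
    if len(x.children) != x.n + 1:
        return False                       # internal node needs x.n + 1 children
    bounds = [lo] + x.keys + [hi]
    return all(check(c, bounds[i], bounds[i + 1]) for i, c in enumerate(x.children))

node = Node([6, 12], [Node([1, 2, 4]), Node([7, 8]), Node([13, 15, 18, 25])])
print(node.n, node.leaf, check(node))   # 2 False True
```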
B tree Overall properties
Balanced m-way tree
Nodes can have more than 2 children, but it is still a balanced search tree
All leaf nodes must be at the same level (2, 3, 4, ... or k)
Items are always added to a leaf node
A B-tree of order m has the following properties:
Every node has at most m children
Minimum children: zero for a leaf, 2 for the root, and ceil(m/2) for all other nodes
Every node has at most m-1 keys (values)
The root has at least 1 key
All other nodes have at least ceil(m/2)-1 keys
M-way B-tree
• 5-way tree
A B-tree of order 5, that is, internal nodes can have three, four or five children.
At most m-1 = 4 keys per node

• 3-way tree
A B-tree of order 3, that is, internal nodes can have two or three children.
At most m-1 = 2 keys per node
Inserting a value X into a B-tree

1. Using the search procedure for M-way trees, find the leaf node to which X should be added.

2. Add X to this node in the appropriate place among the values already there. Being a leaf node, it has no subtrees to worry about.

3. If there are M-1 or fewer values in the node after adding X, we are finished.

4. If there are M values in the node after adding X:
• Split the node into three parts (counts shown for odd M):
Left: the first (M-1)/2 values
Middle: the ((M-1)/2 + 1)-th value
Right: the last (M-1)/2 values
Move the middle key up into the parent.

• This strategy might have to be repeated all the way to the top.

• If necessary, the root is split in two and the middle key is promoted to a new root, making the tree one level higher.
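The four steps above can be sketched in Python roughly as follows. This is an illustrative sketch, not code from the lecture: the names `Node`, `insert`, `_insert` and `levels` are assumptions, and for even M the split is simply uneven (the left part gets floor((M-1)/2) keys).

```python
import bisect

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []    # an empty list means this is a leaf

def insert(root, key, m):
    """Insert key into a B-tree of order m; returns the (possibly new) root."""
    split = _insert(root, key, m)
    if split is not None:                 # root overflowed: tree grows one level
        mid, right = split
        return Node([mid], [root, right])
    return root

def _insert(node, key, m):
    """Returns None, or (middle_key, new_right_sibling) if node had to split."""
    i = bisect.bisect_left(node.keys, key)
    if not node.children:                 # steps 1-2: reached the leaf, add in order
        node.keys.insert(i, key)
    else:
        split = _insert(node.children[i], key, m)
        if split is None:
            return None
        mid, right = split                # child split: absorb the promoted key
        node.keys.insert(i, mid)
        node.children.insert(i + 1, right)
    if len(node.keys) <= m - 1:           # step 3: still within the limit
        return None
    h = (m - 1) // 2                      # step 4: node now holds m keys
    mid = node.keys[h]
    right = Node(node.keys[h + 1:], node.children[h + 1:])
    node.keys = node.keys[:h]
    node.children = node.children[:h + 1]
    return mid, right

def levels(root):
    """Keys of every node, grouped level by level from the root down."""
    out, level = [], [root]
    while level:
        out.append([n.keys for n in level])
        level = [c for n in level for c in n.children]
    return out

root = Node()
for k in [1, 12, 8, 2, 25, 6, 14, 28, 17, 7, 52, 16, 48, 68, 3, 26, 29, 53, 55, 45]:
    root = insert(root, k, 5)
for row in levels(root):
    print(row)
```

With the 20-key sequence used in this lecture and order 5, the printed levels are a root [17], two internal nodes [3, 8] and [28, 48], and six leaves.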
1 12 8 2 25 6 14 28 17 7 52 16 48 68 3 26 29 53 55 45

Constructing a B-tree
Suppose we start with an empty B-tree and keys arrive in the following order:
1 12 8 2 25 6 14 28 17 7 52 16 48 68 3 26 29 53 55 45
We want to construct a B-tree of order 5.
The first four items go into the root:

[1 2 8 12]

Putting the fifth item in the root would exceed the limit of M-1 = 4 keys (Step 4).
Therefore, when 25 arrives, we pick the middle key (8) to make a new root.

Constructing a B-tree (contd.)

      [8]
[1 2]   [12 25]

6, 14, 28 get added to the leaf nodes:

          [8]
[1 2 6]   [12 14 25 28]


Constructing a B-tree (contd.)


Adding 17 to the right leaf node would over-fill it, so we take the
middle key (17), promote it to the root and split the leaf:

          [8 | 17]
[1 2 6]   [12 14]   [25 28]

7, 52, 16, 48 get added to the leaf nodes:

              [8 | 17]
[1 2 6 7]   [12 14 16]   [25 28 48 52]


Constructing a B-tree (contd.)


Adding 68 causes us to split the rightmost leaf, promoting 48 to the
root, and adding 3 causes us to split the leftmost leaf, promoting 3
to the root; 26, 29, 53, 55 then go into the leaves:

                    [3 | 8 | 17 | 48]
[1 2]   [6 7]   [12 14 16]   [25 26 28 29]   [52 53 55 68]

Adding 45 over-fills the leaf 25 26 28 29, which splits and promotes 28
to the root; this in turn causes the root to split.


Constructing a B-tree (contd.)

                          [17]
        [3 | 8]                        [28 | 48]
[1 2]  [6 7]  [12 14 16]     [25 26]  [29 45]  [52 53 55 68]

Exercise: Inserting into a B-Tree
Home Task
Insert the following letters into a 3-way B-tree:

C N G A H E K Q M F W L T

Analysis of B-Tree
Two principal components of the running time:
The number of disk accesses
The CPU computing time
B-Tree-Search
Search 55

                          [17]
        [3 | 8]                        [28 | 48]
[1 2]  [6 7]  [12 14 16]     [25 26]  [29 45]  [52 53 55 68]
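The search for 55 in this tree can be traced with a short sketch. The `Node` class and the `search` function below are illustrative names, not taken from the lecture; the inner loop is a plain linear scan of the node's keys, and the returned path shows that only one node per level is visited.

```python
class Node:
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)   # empty for a leaf

def search(root, k):
    """Return (found, path); path lists the keys of every node visited."""
    x, path = root, []
    while True:
        path.append(x.keys)
        i = 0
        while i < len(x.keys) and k > x.keys[i]:   # linear scan within the node
            i += 1
        if i < len(x.keys) and x.keys[i] == k:
            return True, path
        if not x.children:                          # reached a leaf without a match
            return False, path
        x = x.children[i]                           # descend into the i-th subtree

root = Node([17], [
    Node([3, 8], [Node([1, 2]), Node([6, 7]), Node([12, 14, 16])]),
    Node([28, 48], [Node([25, 26]), Node([29, 45]), Node([52, 53, 55, 68])]),
])
found, path = search(root, 55)
print(found, path)   # True [[17], [28, 48], [52, 53, 55, 68]]
```

Three nodes are read for a tree of three levels, one per level, which is the quantity the disk-access analysis counts.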

Analysis of B-Tree Search
Number of disk accesses
The B-TREE-SEARCH procedure performs O(h) = O(log_t n) disk accesses,
where h is the height of the B-tree and n is
the number of keys in the B-tree.
Assumption: 2t = m
Each node has at most 2t-1 items/keys

CPU time
Since x.n < 2t, the while loop of lines 2–3 of B-TREE-SEARCH (CLRS pseudocode)
takes O(t) time within each node
The total CPU time is O(th) = O(t log_t n).
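The bound h = O(log_t n) follows from counting the fewest keys a tree of height h can hold. The derivation below is a sketch under the CLRS convention assumed on this slide (minimum degree t, root with at least 1 key, every other node with at least t-1 keys); the symbols are the ones used above.

```latex
% Depth 1 has at least 2 nodes, depth 2 at least 2t, ..., depth i at least 2t^{i-1}.
\begin{align*}
n &\ge 1 + (t-1)\sum_{i=1}^{h} 2t^{\,i-1}
   = 1 + 2(t-1)\,\frac{t^{h}-1}{t-1}
   = 2t^{h} - 1 \\
\Rightarrow\quad t^{h} &\le \frac{n+1}{2}
\quad\Rightarrow\quad
h \le \log_t \frac{n+1}{2} = O(\log_t n)
\end{align*}
```

Multiplying the O(t) work per node by the O(log_t n) nodes on a root-to-leaf path gives the O(t log_t n) CPU time stated above.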
Self study
Read the code for insertion into a B-tree (CLRS, Chapter 18, pp. 491 to 495)
The analysis of the pseudocode should give O(th)
Contd..
• The CLRS algorithm is actually a proactive insertion algorithm: before going
down to a node, we split it if it is full.
• The advantage of splitting beforehand is that we never traverse a node twice. If we
don't split a node before going down to it and split it only when a new key is
inserted (reactive), we may end up traversing all nodes again from leaf to
root.
• This happens when all nodes on the path from root to leaf are full.
When we come to the leaf node, we split it and move a key up. Moving a
key up causes a split in the parent node (because the parent was already full).
• This cascading effect never happens in the proactive insertion algorithm.
• Proactive insertion has a disadvantage, though: we may do
unnecessary splits.
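A single-pass (proactive) insertion in the style described above might look like the sketch below. It is written from the general description, assuming the minimum-degree convention where a node is full at 2t-1 keys; it is not a transcription of the CLRS pseudocode, and the names `Node`, `split_child` and `insert` are chosen for illustration.

```python
import bisect

class Node:
    def __init__(self, leaf=True):
        self.keys, self.children, self.leaf = [], [], leaf

def split_child(parent, i, t):
    """Split the full child parent.children[i] (2t-1 keys) around its median."""
    full = parent.children[i]
    right = Node(leaf=full.leaf)
    median = full.keys[t - 1]
    right.keys = full.keys[t:]            # last t-1 keys go to the new sibling
    full.keys = full.keys[:t - 1]         # first t-1 keys stay
    if not full.leaf:
        right.children = full.children[t:]
        full.children = full.children[:t]
    parent.keys.insert(i, median)         # parent is known to be non-full
    parent.children.insert(i + 1, right)

def insert(root, k, t):
    """Insert k in one downward pass; returns the (possibly new) root."""
    if len(root.keys) == 2 * t - 1:       # split a full root before descending
        new_root = Node(leaf=False)
        new_root.children.append(root)
        split_child(new_root, 0, t)
        root = new_root
    x = root
    while not x.leaf:
        i = bisect.bisect_left(x.keys, k)
        if len(x.children[i].keys) == 2 * t - 1:
            split_child(x, i, t)          # split before going down
            if k > x.keys[i]:             # pick the correct half after the split
                i += 1
        x = x.children[i]
    bisect.insort(x.keys, k)              # x is a leaf with room for one more key
    return root

root = Node()
for k in [1, 12, 8, 2, 25, 6, 14, 28, 17, 7, 52, 16, 48, 68, 3, 26, 29, 53, 55, 45]:
    root = insert(root, k, 3)
print(root.keys)
```

Because every node on the path is made non-full before it is entered, a split never has to propagate back up; the price is that some splits happen even when the leaf would have had room.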
