
Lec-12 BTrees

The document discusses B-Trees, which are multi-way search trees used to store large amounts of data on disk. B-Trees minimize the number of disk accesses needed for operations. The key properties and operations of B-Trees are described, including searching, inserting keys through node splitting, and the logarithmic height of B-Trees.

Uploaded by

Sara Zara

Lecture 12 30/10/2003 B-Trees

Applied Algorithms
(Lecture 12)
B-Trees
Fall-23

National University Of Computer & Emerging Sciences



Motivation
• When data is too large to fit in main memory, it is retrieved from the disk on demand.

• Thus, for large files, the number of disk accesses becomes important.

• A disk access is extremely expensive compared to a typical computer instruction, because of the mechanical limitations of the drive.

• One disk access takes about as long as 200,000 computer instructions.

• The number of disk accesses will therefore dominate the running time of the solution.

Motivation (contd.)
• Secondary memory (disk) is divided into equal-sized blocks (typical sizes are 512, 2048, 4096, or 8192 bytes).

• The basic I/O operation transfers the contents of one disk block to or from RAM.

• Our goal is to devise a multi-way search tree that minimizes file accesses by exploiting disk block reads.

Multi-way search trees (of order m)

• A generalization of binary search trees.

• Each node has at most m children.

• If k ≤ m is the number of children of a node, then the node has exactly k-1 keys.

• The tree is ordered: the keys in each node are sorted, and they separate the key ranges of the node's subtrees.


B-Trees
• A B-Tree of order m is an m-way search tree.

• B-Trees are balanced search trees designed to work well on direct-access secondary storage devices.

• B-Trees are similar to Red-Black Trees, but are better at minimizing disk I/O operations.

• All leaves are on the same level.



B-Tree Properties
A B-Tree T is a rooted tree, with root root[T], that has the following properties:

1- Every node x has the following fields:

a- n[x], the number of keys currently stored in x.

b- The n[x] keys themselves, stored in non-decreasing (ascending) order:
$\mathrm{key}_1[x] \le \mathrm{key}_2[x] \le \dots \le \mathrm{key}_{n[x]}[x]$.

c- leaf[x], a Boolean value that is TRUE if x is a leaf and FALSE if x is an internal node.

Properties Contd…
2- If x is an internal node, it also contains n[x]+1 pointers $c_1[x], c_2[x], \dots, c_{n[x]+1}[x]$ to its children. Leaf nodes have no children.

3- The keys $\mathrm{key}_i[x]$ separate the ranges of keys stored in each subtree: if $k_i$ is any key stored in the subtree with root $c_i[x]$, then:
$k_1 \le \mathrm{key}_1[x] \le k_2 \le \mathrm{key}_2[x] \le \dots \le \mathrm{key}_{n[x]}[x] \le k_{n[x]+1}$

4- All leaves have the same depth, which is the height h of the tree.
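Properties 1 to 3 can be sketched as an in-memory node. This is only an illustration: the names `BTreeNode` and `separates` are hypothetical, integer keys are assumed, and a real B-Tree would store each node in one disk block.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BTreeNode:
    """In-memory stand-in for one B-Tree node (hypothetical, for illustration)."""
    keys: List[int] = field(default_factory=list)              # key_1[x] .. key_n[x], sorted
    children: List["BTreeNode"] = field(default_factory=list)  # c_1[x] .. c_{n+1}[x]
    leaf: bool = True                                          # leaf[x]

    @property
    def n(self) -> int:
        """n[x]: the number of keys currently stored in the node."""
        return len(self.keys)


def separates(x: BTreeNode) -> bool:
    """Check properties 2 and 3 for x and its direct children only."""
    if x.leaf:
        return True
    if len(x.children) != x.n + 1:      # property 2: n[x]+1 child pointers
        return False
    for i, child in enumerate(x.children):
        lo = x.keys[i - 1] if i > 0 else None   # key to the left of child i
        hi = x.keys[i] if i < x.n else None     # key to the right of child i
        for k in child.keys:
            if (lo is not None and k < lo) or (hi is not None and k > hi):
                return False
    return True
```

For example, a node with keys 10 and 20 needs three children, whose keys fall below 10, between 10 and 20, and above 20 respectively.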


Properties Contd…
5- There are lower and upper bounds on the number of keys a node can contain.

These bounds are expressed in terms of a fixed integer t ≥ 2, called the minimum degree of the B-Tree.

Why can't t be 1?


Properties Contd…

a- Every node other than the root must have at least t-1 keys. Every internal node other than the root thus has at least t children. If the tree is non-empty, the root must have at least one key.

b- Every node can contain at most 2t-1 keys. Therefore, an internal node can have at most 2t children. We say a node is full if it contains exactly 2t-1 keys.
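The bounds in property 5 can be written down as a small helper. This is a sketch only: `key_bounds` and `is_full` are hypothetical names, and the root bound assumes a non-empty tree.

```python
from typing import Tuple


def key_bounds(t: int, is_root: bool) -> Tuple[int, int]:
    """Return (min_keys, max_keys) for a node in a B-Tree of minimum degree t."""
    if t < 2:
        # With t = 1 the lower bound t-1 would allow nodes holding no keys at all.
        raise ValueError("minimum degree t must be at least 2")
    min_keys = 1 if is_root else t - 1   # root of a non-empty tree: at least one key
    max_keys = 2 * t - 1                 # every node: at most 2t-1 keys
    return min_keys, max_keys


def is_full(num_keys: int, t: int) -> bool:
    """A node is full when it holds exactly 2t-1 keys."""
    return num_keys == 2 * t - 1
```

With t = 3, for instance, every non-root node holds between 2 and 5 keys, and an internal node therefore has between 3 and 6 children.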


Height of a B-Tree

• What is the maximum height of a B-Tree with N entries?

• This question is important because the maximum height of a B-Tree gives an upper bound on the number of disk accesses.


Height of a B-Tree

If n ≥ 1, then for any n-key B-Tree T of height h and minimum degree t ≥ 2,

$$h \le \log_t \frac{n+1}{2}$$
Figure: A B-Tree of height 3 containing the minimum possible number of keys. The root root[T] holds 1 key; every other node holds t-1 keys, and every internal node other than the root has t children.

Depth    Number of nodes
0        1
1        2
2        2t
3        2t^2



Proof
• The number of keys is minimized when the root contains one key and all other nodes contain t-1 keys.

• In that case there are 2 nodes at depth 1, 2t nodes at depth 2, $2t^2$ nodes at depth 3, and so on.

• At depth h, there are $2t^{h-1}$ nodes.
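The node counts above can be checked numerically. The following is a minimal sketch with hypothetical function names; it sums t-1 keys per non-root node over all depths and compares the total with the closed form 2t^h - 1.

```python
def nodes_at_depth(t: int, depth: int) -> int:
    """Minimum number of nodes at a given depth (the root is at depth 0)."""
    return 1 if depth == 0 else 2 * t ** (depth - 1)


def min_keys(t: int, h: int) -> int:
    """Fewest keys a B-Tree of height h can hold: 1 in the root, t-1 in every other node."""
    return 1 + (t - 1) * sum(nodes_at_depth(t, d) for d in range(1, h + 1))


def max_height(n: int, t: int) -> int:
    """Largest height h whose minimum key count still fits in n >= 1 keys."""
    h = 0
    while min_keys(t, h + 1) <= n:
        h += 1
    return h


# The summed count agrees with the closed form 2*t**h - 1 for small t and h.
for t in range(2, 6):
    for h in range(0, 6):
        assert min_keys(t, h) == 2 * t ** h - 1
```

Because `max_height` uses integer arithmetic only, it avoids the rounding issues that a floating-point logarithm could introduce near exact powers of t.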


Proof (contd.)

• Thus the number of keys n satisfies the inequality:

$$n \ge 1 + (t-1)\sum_{i=1}^{h} 2t^{i-1}$$

$$n \ge 1 + 2(t-1)\left(\frac{t^{h}-1}{t-1}\right)$$

$$n \ge 2t^{h} - 1$$

$$n + 1 \ge 2t^{h}$$

$$h \le \log_t \frac{n+1}{2}$$

Numerical Example

For N = 2,000,000 (2 million) and m = 100, the maximum height of a tree of order m will be only 3, whereas a binary tree would have a height of at least 20.
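The figures in this example can be reproduced with a short calculation. One assumption is needed that the slide does not state: an order-m tree is taken to have at most m = 2t children per node, so the minimum degree is t = m/2 = 50.

```python
import math

N = 2_000_000   # number of keys
m = 100         # order of the tree (maximum number of children per node)
t = m // 2      # ASSUMPTION: order m corresponds to minimum degree t = m/2

# Height bound from the theorem: h <= log_t((N + 1) / 2)
btree_max_height = math.floor(math.log((N + 1) / 2, t))

# Even a perfectly balanced binary tree with N nodes has height floor(log2 N)
binary_min_height = math.floor(math.log2(N))

print(btree_max_height, binary_min_height)
```

Under that assumption the B-Tree bound evaluates to 3 and the binary-tree height to 20, which matches the numbers quoted on the slide.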


Operations on B-Trees
• Searching a B-Tree.

• Creating an empty B-Tree.

• Splitting a node in B-Tree.

• Inserting a Key into B-Tree.

• Deleting a Key from B-Tree.


Searching a B-Tree

• It is much like searching a BST, except that instead of making a binary, or "two-way", decision at each node, we make a multi-way branching decision according to the number of children of the node.

• In other words, at each internal node x, we make an (n[x]+1)-way branching decision.


Searching (Contd.)

• B-TREE-SEARCH takes as input a pointer to the root node x of a subtree and a key k to be searched for in that subtree.

• The top-level call is thus of the form B-TREE-SEARCH(root[T], k).

• If k is in the B-Tree, the procedure returns the ordered pair (y, i), consisting of a node y and an index i such that $\mathrm{key}_i[y] = k$. Otherwise, it returns NIL.

Searching (contd.)
• The nodes encountered during the recursion form a path downward from the root of the tree.

• The number of disk pages accessed by the procedure is therefore $O(h) = O(\log_t n)$.

• Since n[x] < 2t, the time taken to search within each node is O(t), and the total CPU time is $O(t \cdot h) = O(t \log_t n)$.
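The search procedure described above can be sketched in Python as follows. This is an in-memory illustration, not the lecture's pseudocode: `BTreeNode` and `btree_search` are hypothetical names, indices are 0-based (the slides use 1-based key indices), `None` plays the role of NIL, and the point where a disk read would occur is only marked by a comment.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class BTreeNode:
    """Hypothetical in-memory B-Tree node."""
    keys: List[int] = field(default_factory=list)
    children: List["BTreeNode"] = field(default_factory=list)
    leaf: bool = True


def btree_search(x: BTreeNode, k: int) -> Optional[Tuple[BTreeNode, int]]:
    """Return (node, index) with node.keys[index] == k, or None if k is absent."""
    i = 0
    # Linear scan inside the node: O(t) work, since a node holds fewer than 2t keys.
    while i < len(x.keys) and k > x.keys[i]:
        i += 1
    if i < len(x.keys) and k == x.keys[i]:
        return x, i
    if x.leaf:
        return None
    # In a disk-based tree, the child would be read from disk here (one access per level).
    return btree_search(x.children[i], k)
```

Each recursive call moves one level down, so the number of nodes visited is at most the height of the tree plus one.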


Splitting a node in B-Tree

• Inserting a key into a B-Tree is significantly more complicated than inserting a key into a BST.

• The fundamental operation used during insertion is the splitting of a full node y (having 2t-1 keys) around its median key $\mathrm{key}_t[y]$ into two nodes having t-1 keys each.

• The median key moves up into y's parent.


Splitting (contd..)

• y's parent must be non-full prior to the splitting of y.

• If y has no parent (y is the root), a new root is created to receive the median key, and the tree grows in height by one.

• So splitting is the means by which a B-Tree grows.


Splitting (contd..)

• If a full node is encountered during an insertion, it is necessary to perform a split operation.

• The B-TREE-SPLIT-CHILD algorithm runs in O(t) time, where t is a constant.


Splitting (contd..)

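Example with t = 4 (a full node holds 2t-1 = 7 keys):

Before the split:
Parent: … N W …
Full child y: P Q R S T U V

After the split (the median key S moves up into the parent):
Parent: … N S W …
Children: P Q R | T U V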
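The split shown above can be sketched in Python. This is an illustration under assumptions, not the lecture's B-TREE-SPLIT-CHILD pseudocode: `BTreeNode` and `split_child` are hypothetical names, the tree lives in memory, and the child index `i` is 0-based.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BTreeNode:
    """Hypothetical in-memory B-Tree node."""
    keys: List[str] = field(default_factory=list)
    children: List["BTreeNode"] = field(default_factory=list)
    leaf: bool = True


def split_child(x: BTreeNode, i: int, t: int) -> None:
    """Split the full child x.children[i] around its median key; x must not be full."""
    y = x.children[i]
    if len(y.keys) != 2 * t - 1:
        raise ValueError("only a full node (2t-1 keys) can be split")
    z = BTreeNode(leaf=y.leaf)        # new right sibling of y
    median = y.keys[t - 1]            # key_t[y] in the slides' 1-based notation
    z.keys = y.keys[t:]               # the largest t-1 keys move to z
    y.keys = y.keys[:t - 1]           # the smallest t-1 keys stay in y
    if not y.leaf:
        z.children = y.children[t:]   # the last t children move with their keys
        y.children = y.children[:t]
    x.children.insert(i + 1, z)       # z becomes the child just after y
    x.keys.insert(i, median)          # the median key moves up into the parent
```

Applied to the example, splitting the child P Q R S T U V of a parent holding N and W leaves the parent with N S W and the two children P Q R and T U V. The work is proportional to t because at most t keys and t child pointers are copied.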


Insertion in a B-Tree

• To perform an insertion in a B-Tree, the appropriate node for the key must first be located using an algorithm similar to B-TREE-SEARCH.

• Next, the key must be inserted into the node.

• If the node is not full prior to the insertion, no special action is required.


Insertion in a B-Tree (cont…)

• If the node is full, it must be split. Splitting the node moves one key up to the parent node. What if the parent node is full?

• Then the parent has to be split too.

• This process may repeat all the way up to the root and may require splitting the root node.

• This approach requires two passes. The first pass locates the node where the key should be inserted; the second pass performs any required splits on the ancestor nodes.

Insertion in a B-Tree (cont…)

• Since each access to a node may correspond to a costly disk access, it is desirable to avoid the second pass by ensuring that the parent node is never full.

• To accomplish this, the algorithm splits any full nodes encountered while descending the tree.

• Is there a problem?


Insertion in a B-Tree (cont…)


• This approach may result in unnecessary split operations.

• But it guarantees that the parent never needs to be split, and it eliminates the need for a second pass up the tree.

• What is the penalty?

• Since a split runs in time linear in t, it has little effect on the $O(t \log_t n)$ running time of B-TREE-INSERT.
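The single-pass strategy (split every full node met on the way down) can be sketched as follows. This is a minimal in-memory illustration with hypothetical names (`BTreeNode`, `BTree`, `inorder`); it assumes integer keys, uses Python's `bisect` module to find positions inside a node, and does not model disk reads or writes.

```python
import bisect
from dataclasses import dataclass, field
from typing import List


@dataclass
class BTreeNode:
    """Hypothetical in-memory B-Tree node."""
    keys: List[int] = field(default_factory=list)
    children: List["BTreeNode"] = field(default_factory=list)
    leaf: bool = True


class BTree:
    def __init__(self, t: int) -> None:
        if t < 2:
            raise ValueError("minimum degree t must be at least 2")
        self.t = t
        self.root = BTreeNode()

    def _split_child(self, x: BTreeNode, i: int) -> None:
        """Split the full child x.children[i]; its median key moves up into x."""
        t, y = self.t, x.children[i]
        z = BTreeNode(leaf=y.leaf)
        median = y.keys[t - 1]
        z.keys, y.keys = y.keys[t:], y.keys[:t - 1]
        if not y.leaf:
            z.children, y.children = y.children[t:], y.children[:t]
        x.children.insert(i + 1, z)
        x.keys.insert(i, median)

    def insert(self, k: int) -> None:
        full = 2 * self.t - 1
        if len(self.root.keys) == full:
            # A full root is split first; this is how the tree gains height.
            self.root = BTreeNode(children=[self.root], leaf=False)
            self._split_child(self.root, 0)
        x = self.root
        while not x.leaf:
            i = bisect.bisect_right(x.keys, k)   # index of the child to descend into
            if len(x.children[i].keys) == full:
                self._split_child(x, i)          # x is never full here, so it has room
                if k > x.keys[i]:                # decide which half receives k
                    i += 1
            x = x.children[i]
        bisect.insort(x.keys, k)                 # the leaf is guaranteed to be non-full


def inorder(x: BTreeNode) -> List[int]:
    """Return all keys of the subtree rooted at x in sorted order."""
    if x.leaf:
        return list(x.keys)
    out: List[int] = []
    for i, child in enumerate(x.children):
        out += inorder(child)
        if i < len(x.keys):
            out.append(x.keys[i])
    return out
```

As a usage example, with t = 2 inserting 1, 2, 3 fills the root; inserting 4 then splits it, leaving 2 in a new root with children holding 1 and 3 4. No step ever moves back up the tree, which is the point of splitting on the way down.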

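Initial tree, assuming t = 3
Minimum number of keys in any node other than the root = t-1 = 2
Maximum number of keys in any node = 2t-1 = 5

Root: G M P X
Leaves: A C D E | J K | N O | R S T U V | Y Z

Inserting B (the leaf A C D E has room, so B is simply added to it):

Root: G M P X
Leaves: A B C D E | J K | N O | R S T U V | Y Z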


Inserting L

Root: G M P X
Leaves: A B C D E | J K L | N O | R S U V | Y Z

Inserting F: the target leaf (A B C D E) is already full.

Is it still that simple? No, we have to split the leaf first.


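Inserting F

Root: G M P X
Leaves: A B C D E | J K L | N O | R S U V | Y Z

The leaf A B C D E is split at its median key C, which moves up to the parent node. F then goes into the right half:

Root: C G M P X
Leaves: A B | D E F | J K L | N O | R S U V | Y Z

What will happen if we want to insert T?

What will happen if we want to insert Q?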


Deleting a key from B-Tree


• Deletion from a B-Tree is analogous to insertion, but a little more complicated.

• For further details, see Section 19.3 of the book by Cormen et al.
