Ch18 - B-Trees

B-Trees

Amanna Ghanbari Talouki

B-tree
• Defined by one parameter: t (the minimum degree, t ≥ 2)
• Balanced n-ary search tree
• Each node contains between t-1 and 2t-1 keys/data values (i.e. multiple data values per tree node)
  - keys/data are stored in sorted order
  - one exception: the root can have fewer than t-1 keys
• Each internal node contains between t and 2t children, always one more child than it has keys (the root may have fewer than t children)
  - the keys of a parent delimit the values of the children's keys
  - for example, if key_i = 15 and key_{i+1} = 25, then child i+1 must have keys between 15 and 25
• All leaves have the same depth

Example B-tree: t = 2

                  [G N T]
    [C]        [K]        [Q]        [X]
 [A] [DEF]  [H] [LM]   [P] [RS]   [W] [YZ]

Example B-tree: t = 2 (properties shown on the same tree)
• Balanced: all leaves have the same depth
• Each node contains between t-1 and 2t-1 keys
• Keys are stored in increasing order
• Each internal node contains between t and 2t children
• The keys of a parent delimit the values that a child's keys can take

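The properties above can be checked mechanically. The following is a minimal sketch, not part of the original slides: the names `Node`, `leaf` and `check_btree` are hypothetical, and the tree is held entirely in memory rather than on disk. It rebuilds the example tree for t = 2 and verifies key counts, sorted order, parent/child key ranges and equal leaf depth.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical in-memory node: sorted keys plus child pointers."""
    keys: List[str]
    children: List["Node"] = field(default_factory=list)


def check_btree(node, t, is_root=True, lo=None, hi=None):
    """Return the height of the subtree; raise AssertionError on a violation."""
    n = len(node.keys)
    assert n <= 2 * t - 1, "too many keys"
    assert is_root or n >= t - 1, "too few keys"
    assert node.keys == sorted(node.keys), "keys not sorted"
    # Every key must lie strictly between the parent's delimiting keys.
    for k in node.keys:
        assert (lo is None or lo < k) and (hi is None or k < hi), "key out of range"
    if not node.children:
        return 0
    assert len(node.children) == n + 1, "wrong number of children"
    bounds = [lo] + node.keys + [hi]
    depths = {
        check_btree(child, t, False, bounds[i], bounds[i + 1])
        for i, child in enumerate(node.children)
    }
    assert len(depths) == 1, "leaves at different depths"
    return depths.pop() + 1


def leaf(letters):
    return Node(list(letters))


# The example tree from the slides (t = 2).
example = Node(list("GNT"), [
    Node(["C"], [leaf("A"), leaf("DEF")]),
    Node(["K"], [leaf("H"), leaf("LM")]),
    Node(["Q"], [leaf("P"), leaf("RS")]),
    Node(["X"], [leaf("W"), leaf("YZ")]),
])
height = check_btree(example, t=2)
print(height)  # 2
```
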
When do we use B-trees over other balanced trees?
B-trees are generally an on-disk data structure
Memory is limited or there is a large amount of data to be stored
In the extreme, only one node is kept in memory and the rest on disk
Size of the nodes is often determined by a page size on disk
Databases frequently use B-trees

Notes about B-trees
Because t is generally large, the height of a B-tree is usually small

t = 1001 with height 2, how many values can we have?
• Each internal node contains between t and 2t children, so at most 2t = 2002
• Each node contains between t-1 and 2t-1 keys/data values, so at most 2t-1 = 2001

root: 1 node; level 1: 2002 nodes; level 2: 2002 * 2002 nodes

2001 + 2002 * 2001 + 2002 * 2002 * 2001 = 8,024,024,007
(over 8 billion keys!)

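The arithmetic above generalizes to any t and height. A small sketch (the function name `max_keys` is an illustrative assumption, not from the slides) that sums the maximum key count level by level:

```python
def max_keys(t: int, height: int) -> int:
    """Largest number of keys in a B-tree with parameter t and the given height."""
    keys_per_node = 2 * t - 1      # a full node
    children_per_node = 2 * t      # a full internal node
    # Level d holds at most (2t)^d nodes, each with at most 2t-1 keys.
    return sum(keys_per_node * children_per_node ** d for d in range(height + 1))


print(max_keys(1001, 2))  # 8024024007
```

The sum telescopes to (2t)^(height+1) - 1, which is why the result is one less than 2002^3.
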
Notes about B-trees
Because t is generally large, the height of a B-tree is usually small

We will count both the run-time and the number of disk accesses.

Height of a B-tree
B-trees have a similar feel to BSTs
We saw for BSTs that most of the operations depended on the height of the tree
How can we bound the height of the tree?
We know that nodes must have a minimum number of keys/data items (t-1)
For a tree of height h, what is the smallest number of keys?

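Minimum number of nodes at each depth?
• depth 0 (root): 1 node
• depth 1: 2 nodes (the root has at least 1 key, so at least 2 children)
• depth 2: 2t nodes (every other internal node has at least t children)
• In general? At depth i ≥ 1: 2t^(i-1) nodes
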
Minimum number of keys/values

$$ n \ge 1 + (t-1) \sum_{i=1}^{h} 2t^{i-1} $$

Here 1 is the root's key, (t-1) is the min. keys per node, and 2t^{i-1} is the min. number of nodes at depth i.

Minimum number of keys/values

$$
\begin{aligned}
n &\ge 1 + (t-1) \sum_{i=1}^{h} 2t^{i-1} \\
  &= 1 + 2(t-1) \left( \frac{t^{h} - 1}{t - 1} \right) \\
  &= 2t^{h} - 1
\end{aligned}
$$

so,

$$ t^{h} \le \frac{n+1}{2}, \qquad h \le \log_t \frac{n+1}{2} $$

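The bound can be sanity-checked numerically. This is a sketch under the slide's definitions; `min_keys` and `max_height` are hypothetical helper names, and integer arithmetic is used instead of floating-point logarithms to avoid rounding issues.

```python
def min_keys(t: int, h: int) -> int:
    """Fewest keys in a B-tree of height h: 1 + (t-1) * sum_{i=1..h} 2*t^(i-1)."""
    return 1 + (t - 1) * sum(2 * t ** (i - 1) for i in range(1, h + 1))


def max_height(t: int, n: int) -> int:
    """Largest h with 2*t^h - 1 <= n, i.e. h <= log_t((n+1)/2), for n >= 1."""
    h = 0
    while 2 * t ** (h + 1) - 1 <= n:
        h += 1
    return h


# The sum and the closed form agree.
print(all(min_keys(t, h) == 2 * t ** h - 1 for t in range(2, 6) for h in range(6)))  # True
print(max_height(1001, 10 ** 9))  # 2
```

So a tree with t = 1001 holding a billion keys can have height at most 2.
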
Searching B-Trees
Find value k in B-Tree node x (BTreeSearch)
• a node x stores its number of keys, the keys key[i], and the children child[i]
• make disk reads explicit
• iterate through the sorted keys and find the correct location
• if we find the value in this node, return it
• if it's a leaf and we didn't find it, it's not in the tree
• otherwise, recurse on the proper child, where the value is between the keys

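The search steps above can be sketched as follows. The pseudocode on the original slides was not preserved, so this is a reconstruction under assumptions: `Node`, `disk_read` and `btree_search` are hypothetical names, and `disk_read` is only a stand-in that returns an in-memory node.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    keys: List[str]
    children: List["Node"] = field(default_factory=list)


def disk_read(node):
    """Stand-in for loading a node from disk; here everything is in memory."""
    return node


def btree_search(x, k):
    """Return (node, index) where k is stored, or None if k is not below x."""
    i = 0
    # Iterate through the sorted keys and find the correct location.
    while i < len(x.keys) and k > x.keys[i]:
        i += 1
    if i < len(x.keys) and k == x.keys[i]:
        return x, i                      # found in this node
    if not x.children:
        return None                      # leaf reached: not in the tree
    return btree_search(disk_read(x.children[i]), k)


def leaf(letters):
    return Node(list(letters))


root = Node(list("GNT"), [
    Node(["C"], [leaf("A"), leaf("DEF")]),
    Node(["K"], [leaf("H"), leaf("LM")]),
    Node(["Q"], [leaf("P"), leaf("RS")]),
    Node(["X"], [leaf("W"), leaf("YZ")]),
])
node, index = btree_search(root, "R")
print(node.keys, index)          # ['R', 'S'] 0
print(btree_search(root, "B"))   # None
```
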
Search example: R
Using the example B-tree (t = 2):
• root (G N T): find the correct location, N < R < T; the value is not in this node and this is not a leaf node, so descend to the child between N and T
• node (Q): find the correct location, R > Q; not in this node and this is not a leaf, so descend to the child to the right of Q
• leaf (R S): find the correct location; R is in this node, so return it

Search running time
How many calls to BTreeSearch?
• O(height of the tree)
• O(log_t n)
Disk accesses?
• One for each call: O(log_t n)
Computational time?
• O(t) keys per node
• linear search
• O(t log_t n)

B-Tree insert
Starting at root, follow the search path down the tree
• If the node is full (contains 2t - 1 keys)
  - split the keys into two nodes around the median value
  - add the median value to the parent node
• If the node is a leaf, insert the value into the correct spot

Observations
• Insertions always happen in the leaves

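The insert procedure described above can be sketched as follows. This is a minimal in-memory illustration, not the slides' own code: `Node`, `BTree` and `_split_child` are hypothetical names, keys are assumed to be distinct, and disk reads/writes are omitted.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    keys: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


class BTree:
    def __init__(self, t: int):
        self.t = t
        self.root = Node()

    def _split_child(self, parent: Node, i: int) -> None:
        """Split the full child parent.children[i] around its median key."""
        t = self.t
        child = parent.children[i]
        median = child.keys[t - 1]
        right = Node(child.keys[t:], child.children[t:])
        child.keys = child.keys[: t - 1]       # left half reuses the old node
        child.children = child.children[:t]
        parent.keys.insert(i, median)          # median moves up to the parent
        parent.children.insert(i + 1, right)

    def insert(self, k: str) -> None:
        if len(self.root.keys) == 2 * self.t - 1:
            # Full root: the only case in which the tree grows in height.
            self.root = Node(children=[self.root])
            self._split_child(self.root, 0)
        x = self.root
        while x.children:
            i = bisect_left(x.keys, k)
            if len(x.children[i].keys) == 2 * self.t - 1:
                self._split_child(x, i)        # split full nodes on the way down
                if k > x.keys[i]:
                    i += 1
            x = x.children[i]
        x.keys.insert(bisect_left(x.keys, k), k)   # always lands in a leaf


tree = BTree(t=2)
for key in "GCNAHEKQMFW":
    tree.insert(key)
print(tree.root.keys)                        # ['G']
print([c.keys for c in tree.root.children])  # [['C'], ['K', 'N']]
```

Splitting every full node met on the way down means the parent always has room for the median, so no second pass back up the tree is needed.
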
Insertion: t = 2
Insert sequence: GCNAHEKQMFWLTZDPRXYS

• insert G, C, N: the single root node grows to (C G N)
• insert A: the root (C G N) is full, so split around the median G; the tree is (G) over (C) (N), and A goes into the left leaf: (A C) (N)
• insert H: (G) over (A C) (H N)
• insert E: (G) over (A C E) (H N)
• insert K: (G) over (A C E) (H K N)
• insert Q: the leaf (H K N) is full, so split around K; the tree is (G K) over (A C E) (H) (N), then Q goes in: (A C E) (H) (N Q)
• insert M: (G K) over (A C E) (H) (M N Q)
• insert F: the leaf (A C E) is full, so split around C; the tree is (C G K) over (A) (E) (H) (M N Q), then F goes in: (A) (E F) (H) (M N Q)
• insert W: the root (C G K) is full, so split around G; the tree is (G) over (C) (K) over (A) (E F) (H) (M N Q)
  - the leaf (M N Q) on the search path is full, so split around N: (G) over (C) (K N) over (A) (E F) (H) (M) (Q)
  - then W goes in: (A) (E F) (H) (M) (Q W)
• GCNAHEKQMFW…: the remaining keys are inserted in the same way

Insertion: t = 3
(worked example figure not preserved)

Correctness of insert
Starting at root, follow search path down the tree
• If the node is full (contains 2t - 1 keys), split the keys around the median value into two nodes and add the median value to the parent node
• If the node is a leaf, insert the value into the correct spot

Does it add the value in the correct spot?
• Follows the correct search path
• Inserts in correct position

Do we maintain a proper B-tree?
• Maintain t-1 to 2t-1 keys per node?
  - Always split full nodes when we see them
  - Only split full nodes
• All leaves at the same level?
  - Values are only added at leaves; a split creates a sibling at the same level, and the tree grows in height only when the root is split

Insert running time
Without any splitting?
• Similar to BTreeSearch, with one extra disk write at the leaf
• O(log_t n) disk accesses
• O(t log_t n) computation time

When a node is split
How many disk accesses?
• 3 disk write operations
  - 2 for the new nodes created by the split (one is reused, but must be updated)
  - 1 for the parent node to add the median value
Runtime to split a node?
• O(t): iterating through the elements a few times, since they're already in sorted order
Maximum number of nodes split for a call to insert?
• O(height of the tree)

Running time of insert
O(log_t n) disk accesses
O(t log_t n) computational costs

Review of Deletions
• All deletions take place in leaf nodes
  - To delete an internal key, swap it with its successor or predecessor, which is in a leaf
  - Then delete
• Deficient nodes (fewer than t-1 keys) are legalized by:
  - Rotation with a sibling and the parent
  OR
  - Combining with a key from the parent and a sibling
• Propagate up the tree until a legal node is encountered

Deletion: t = 3
(worked example figures not preserved)

Running time of Deletion
O(log_t n) disk accesses
O(t log_t n) computational costs