
4.4 Balanced Trees: Symbol Table

Symbol table: key-value pair abstraction. Insert a value with specified key. Search for value given key. Delete value with given key. Randomized BST. O(log N) time per op. Store subtree count in each node. Allows 1, 2, or 3 keys per node; keeps tree balanced. Perfect balance. Every path from root to leaf has same length.

Uploaded by

Sahil Qureshi
Copyright
© Attribution Non-Commercial (BY-NC)

4.4 Balanced Trees

Symbol Table Review

Symbol table: key-value pair abstraction.
! Insert a value with specified key.
! Search for value given key.
! Delete value with given key.

Randomized BST.
! O(log N) time per op. [unless you get ridiculously unlucky]
! Store subtree count in each node.
! Generate random numbers for each insert/delete op.

This lecture. 2-3-4 trees, red-black trees, B-trees.

Reference: Chapter 13, Algorithms in Java, 3rd Edition, Robert Sedgewick.

Robert Sedgewick and Kevin Wayne • Copyright © 2006 • http://www.Princeton.EDU/~cos226

2-3-4 Trees

2-3-4 Tree

2-3-4 tree. Generalize node to allow multiple keys; keep tree balanced.

Perfect balance. Every path from root to leaf has same length.

Allow 1, 2, or 3 keys per node.
! 2-node: one key, two children.
! 3-node: two keys, three children.
! 4-node: three keys, four children.

[Figure: 2-3-4 tree with root K R. Keys smaller than K: node C E over leaves A, D, F G J. Keys between K and R: node M O over leaves L, N, Q. Keys larger than R: node W over leaves S V, Y Z.]


2-3-4 Tree: Search

Search.
! Compare search key against keys in node.
! Find interval containing search key.
! Follow associated link (recursively).

2-3-4 Tree: Insert

Insert.
! Search to bottom for key.
! 2-node at bottom: convert to 3-node.
! 3-node at bottom: convert to 4-node.
! 4-node at bottom: ??

[Figure: the example 2-3-4 tree with root K R, shown on both slides.]
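The three search steps can be sketched in Java as follows. This is a minimal illustration, not the course's implementation: the class name TwoThreeFourSearch, the array-based Node layout, and the helper example() (which rebuilds the example tree with root K R by hand) are all assumptions made for this sketch. The link is followed with a loop rather than recursion.

```java
final class TwoThreeFourSearch {
    // Hypothetical node layout: 1, 2, or 3 keys in ascending order.
    static final class Node {
        final char[] keys;
        final Node[] children;   // keys.length + 1 links, or null for a bottom node

        Node(String keys, Node... children) {
            this.keys = keys.toCharArray();
            this.children = children.length == 0 ? null : children;
        }
    }

    static boolean contains(Node x, char key) {
        while (x != null) {
            // Compare the search key against the keys in the node
            // and find the interval that contains it.
            int i = 0;
            while (i < x.keys.length && key > x.keys[i]) i++;
            if (i < x.keys.length && key == x.keys[i]) return true;
            // Follow the link associated with that interval.
            x = (x.children == null) ? null : x.children[i];
        }
        return false;
    }

    // The example tree from the slides, built by hand.
    static Node example() {
        Node left = new Node("CE", new Node("A"), new Node("D"), new Node("FGJ"));
        Node mid = new Node("MO", new Node("L"), new Node("N"), new Node("Q"));
        Node right = new Node("W", new Node("SV"), new Node("YZ"));
        return new Node("KR", left, mid, right);
    }

    public static void main(String[] args) {
        Node root = example();
        System.out.println(contains(root, 'G'));   // true: G is in the bottom node F G J
        System.out.println(contains(root, 'H'));   // false: search ends at F G J
    }
}
```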

2-3-4 Tree: Splitting Four Nodes

Transform tree on the way down.
! Ensures last node is not a 4-node.
! Local transformation to split 4-nodes.

Invariant. Current node is not a 4-node.

Consequence. Insertion at bottom is easy since it's not a 4-node.

2-3-4 Tree: Splitting a Four Node

Ex. To split a four node, move middle key up.

[Figure: node D with subtree A-C and 4-node child K Q W (subtrees E-J, L-P, R-V, X-Z) becomes node D Q with subtree A-C, child K (subtrees E-J, L-P), and child W (subtrees R-V, X-Z).]
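The split in the example above can be sketched as a local transformation on a parent and one of its children. This is an illustrative sketch only: the class SplitFourNode, the list-based Node, and the method splitChild are names invented here, and the code relies on the invariant that the parent is not itself a 4-node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SplitFourNode {
    // Hypothetical node layout: 1 to 3 keys, and either no links (bottom node)
    // or exactly keys.size() + 1 links.
    static final class Node {
        final List<String> keys = new ArrayList<>();
        final List<Node> children = new ArrayList<>();

        Node(String... ks) { keys.addAll(Arrays.asList(ks)); }
    }

    // Split the 4-node at parent.children.get(i) by moving its middle key up.
    static void splitChild(Node parent, int i) {
        Node four = parent.children.get(i);
        if (four.keys.size() != 3 || parent.keys.size() == 3) {
            throw new IllegalArgumentException("need a 4-node child of a non-4-node parent");
        }
        Node left = new Node(four.keys.get(0));
        Node right = new Node(four.keys.get(2));
        if (!four.children.isEmpty()) {
            // The four subtrees are divided between the two new 2-nodes.
            left.children.addAll(four.children.subList(0, 2));
            right.children.addAll(four.children.subList(2, 4));
        }
        parent.keys.add(i, four.keys.get(1));   // middle key moves up
        parent.children.set(i, left);
        parent.children.add(i + 1, right);
    }

    public static void main(String[] args) {
        // Subtrees are stood in for by single nodes labelled with their key range.
        Node kqw = new Node("K", "Q", "W");
        kqw.children.addAll(Arrays.asList(
                new Node("E-J"), new Node("L-P"), new Node("R-V"), new Node("X-Z")));
        Node d = new Node("D");
        d.children.addAll(Arrays.asList(new Node("A-C"), kqw));

        splitChild(d, 1);
        System.out.println(d.keys);                   // [D, Q]
        System.out.println(d.children.get(1).keys);   // [K]
        System.out.println(d.children.get(2).keys);   // [W]
    }
}
```

Only the parent and the split child are touched, which is why the transformation can be applied on the way down without revisiting nodes higher in the tree.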
2-3-4 Tree

Tree grows up from the bottom.

[Figure: 2-3-4 tree growing as keys are inserted.]

2-3-4 Tree: Balance

Property. All paths from root to leaf have same length.

Tree height.
! Worst case: lg N [all 2-nodes]
! Best case: log4 N = 1/2 lg N [all 4-nodes]
! Between 10 and 20 for a million nodes.
! Between 15 and 30 for a billion nodes.
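The height figures for a million and a billion nodes follow from the two bounds; the worked arithmetic below uses lg 10 ≈ 3.32 and rounds to the nearest integer, with h denoting the tree height (a symbol introduced here for the calculation).

```latex
\begin{aligned}
\tfrac{1}{2}\lg N \;&\le\; h \;\le\; \lg N \\
N = 10^{6}:\quad \lg N &= 6\lg 10 \approx 19.9, & \tfrac{1}{2}\lg N &\approx 10.0
  &&\Rightarrow\; 10 \lesssim h \lesssim 20 \\
N = 10^{9}:\quad \lg N &= 9\lg 10 \approx 29.9, & \tfrac{1}{2}\lg N &\approx 14.9
  &&\Rightarrow\; 15 \lesssim h \lesssim 30
\end{aligned}
```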

2-3-4 Tree: Implementation?

Direct implementation. Complicated because of:
! Maintaining multiple node types.
! Implementation of getChild().
! Large number of cases for split().

    private void insert(Key key, Val val) {
        Node x = root;
        // Walk toward the bottom, splitting each 4-node met on the way down.
        while (x.getChild(key) != null) {
            x = x.getChild(key);
            if (x.is4Node()) x.split();
        }
        // After the splits, the bottom node is a 2-node or a 3-node.
        if (x.is2Node()) x.make3Node(key, val);
        else if (x.is3Node()) x.make4Node(key, val);
    }
    [fantasy code]

Symbol Table: Implementations Cost Summary

                        Worst Case                       Average Case
Implementation     Search    Insert    Delete     Search    Insert    Delete
Sorted array       log N     N         N          log N     N         N
Unsorted list      N         1         1          N         1         1
Hashing            N         1         N          1*        1*        1*
BST                N         N         N          log N †   log N †   log N †
Randomized BST     log N ‡   log N ‡   log N ‡    log N     log N     log N
Splay              log N §   log N §   log N §    log N §   log N §   log N §
2-3-4              log N     log N     log N      log N     log N     log N

* assumes hash map is random for all keys
† N is the number of nodes ever inserted
‡ probabilistic guarantee
§ amortized guarantee

Note. Comparisons within nodes are not accounted for.
Red-Black Trees

Red-Black Tree

Represent 2-3-4 tree as a BST.
! Use "internal" edges for 3- and 4- nodes. [red glue]
! Correspondence between 2-3-4 trees and red-black trees. [not 1-1 because 3-nodes swing either way]

Red-Black Tree: Splitting Nodes

Two easy cases. Switch colors.

Two hard cases. Use rotations.

[Figure: one hard case is fixed by a single rotation, the other by a double rotation.]
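The color switch and the two kinds of rotation can be sketched on a small node class. This is a sketch under assumptions, not the lecture's code: the class RedBlackOps and its Node (which stores the color of the link from its parent in a boolean red) are names chosen for this example. A double rotation is two single rotations, first on the child and then on the parent.

```java
final class RedBlackOps {
    static final class Node {
        final char key;
        Node left, right;
        boolean red;   // color of the link from the parent to this node

        Node(char key, boolean red) { this.key = key; this.red = red; }
    }

    // Easy case: split a 4-node (h with two red children) by switching colors.
    static void flipColors(Node h) {
        h.red = !h.red;
        h.left.red = !h.left.red;
        h.right.red = !h.right.red;
    }

    // Single rotation: the left child x becomes the top of the subtree.
    static Node rotateRight(Node h) {
        Node x = h.left;
        h.left = x.right;
        x.right = h;
        x.red = h.red;   // x takes over the color of the link into the subtree
        h.red = true;    // h is now glued to x by a red link
        return x;
    }

    // Mirror image of rotateRight.
    static Node rotateLeft(Node h) {
        Node x = h.right;
        h.right = x.left;
        x.left = h;
        x.red = h.red;
        h.red = true;
        return x;
    }

    // In-order key sequence; rotations must leave it unchanged.
    static String inorder(Node x) {
        return x == null ? "" : inorder(x.left) + x.key + inorder(x.right);
    }

    public static void main(String[] args) {
        // Two red links in a row with different orientations: c -> a -> b.
        Node c = new Node('c', false);
        c.left = new Node('a', true);
        c.left.right = new Node('b', true);

        c.left = rotateLeft(c.left);   // first half of the double rotation
        Node top = rotateRight(c);     // second half
        System.out.println(top.key + " " + inorder(top));   // b abc
    }
}
```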
Red-Black Tree: Splitting Nodes

[Figure: inserting G. Change colors, right rotate R, left rotate E.]

Red-Black Tree: Insertion

[Figure: red-black tree growing as keys are inserted.]

Red-Black Tree: Balance

Property A. Every path from root to leaf has same number of black links.

Property B. At most one red link in-a-row.

Property C. Height of tree is less than 2 lg N + 2.

Symbol Table: Implementations Cost Summary

                        Worst Case                       Average Case
Implementation     Search    Insert    Delete     Search    Insert    Delete
Sorted array       log N     N         N          log N     N         N
Unsorted list      N         1         1          N         1         1
Hashing            N         1         N          1*        1*        1*
BST                N         N         N          log N †   log N †   log N †
Randomized BST     log N ‡   log N ‡   log N ‡    log N     log N     log N
Splay              log N §   log N §   log N §    log N §   log N §   log N §
Red-Black          log N     log N     log N      log N     log N     log N

* assumes hash map is random for all keys
† N is the number of nodes ever inserted
‡ probabilistic guarantee
§ amortized guarantee

Note. Comparisons within nodes are accounted for.
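Properties A and B of red-black trees can be checked with one recursive pass. The sketch below is an assumption-laden illustration: the class RedBlackCheck, its immutable Node, and the methods blackHeight and isRedBlack are invented for this example. Each node stores the color of the link from its parent, and the link into the root counts as black unless the root is marked red.

```java
final class RedBlackCheck {
    static final class Node {
        final int key;
        final boolean red;   // color of the link from the parent to this node
        final Node left, right;

        Node(int key, boolean red, Node left, Node right) {
            this.key = key;
            this.red = red;
            this.left = left;
            this.right = right;
        }
    }

    // Number of black links on every path from the link into x down to a null
    // link, or -1 if Property A or Property B fails somewhere below.
    static int blackHeight(Node x, boolean parentLinkRed) {
        if (x == null) return 0;
        if (x.red && parentLinkRed) return -1;      // Property B: two red links in a row
        int l = blackHeight(x.left, x.red);
        int r = blackHeight(x.right, x.red);
        if (l < 0 || r < 0 || l != r) return -1;    // Property A: unequal black counts
        return l + (x.red ? 0 : 1);
    }

    static boolean isRedBlack(Node root) {
        return blackHeight(root, false) >= 0;
    }

    public static void main(String[] args) {
        // A 4-node: black 2 with red children 1 and 3.
        Node fourNode = new Node(2, false,
                new Node(1, true, null, null), new Node(3, true, null, null));
        System.out.println(isRedBlack(fourNode));   // true
    }
}
```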
Red-Black Trees: Practice

Red-black trees vs. splay trees.
! Fewer rotations than splay trees. [at most 2 per insertion]
! One extra bit per node for color. [possible to eliminate]

Red-black trees vs. hashing.
! Hashing code is simpler and usually faster: arithmetic to compute hash vs. comparison.
! Hashing performance guarantee is weaker.
! BSTs have more flexibility and can support wider range of ops.

In the wild. Red-black trees are widely used as system symbol tables.
! Java: java.util.TreeMap, java.util.TreeSet.
! C++ STL: map, multimap, multiset.
! Linux kernel: linux/rbtree.h.
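As a small illustration of the wider range of ops, java.util.TreeMap (documented as a red-black tree based map) supports ordered queries alongside insert, search, and delete. The class name TreeMapSymbolTable and the helper demo() are made up for this sketch; the keys and values are arbitrary.

```java
import java.util.TreeMap;

final class TreeMapSymbolTable {
    // Builds a small symbol table, then deletes one key.
    static TreeMap<String, Integer> demo() {
        TreeMap<String, Integer> st = new TreeMap<>();
        String[] keys = {"K", "R", "C", "E", "M", "O", "W"};
        for (int i = 0; i < keys.length; i++) st.put(keys[i], i);   // insert
        st.remove("R");                                             // delete
        return st;
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> st = demo();
        System.out.println(st.get("M"));              // search: 4
        System.out.println(st.firstKey());            // smallest key: C
        System.out.println(st.floorKey("L"));         // largest key <= L: K
        System.out.println(st.ceilingKey("L"));       // smallest key >= L: M
        System.out.println(st.headMap("M").size());   // number of keys < M: 3
    }
}
```

The last four calls depend on key order, which a hash table does not maintain.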

B-Trees

B-Tree

B-Tree. Generalizes 2-3-4 trees by allowing up to M links per node.

Main application: file systems.
! Reading a page into memory from disk is expensive.
! Accessing info on a page in memory is free.
! Goal: minimize # page accesses.
! Node size M = page size.

Space-time tradeoff.
! M large ⇒ only a few levels in tree.
! M small ⇒ less wasted space.
! Typical M = 1000, N < 1 trillion.

Bottom line. Number of page accesses is log_M N per op. [3 or 4 in practice!]

B-Tree Example

[Figure: B-tree with M = 5, one node per page; example operation: insert 275.]
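The "3 or 4 in practice" estimate for B-tree page accesses can be checked directly from the typical values M = 1000 and N < 1 trillion; the billion-key line is added here only for comparison.

```latex
\begin{aligned}
\log_M N &= \frac{\log_{10} N}{\log_{10} M} \\
M = 10^{3},\; N = 10^{12}:\quad \log_M N &= \frac{12}{3} = 4 \\
M = 10^{3},\; N = 10^{9}:\quad \log_M N &= \frac{9}{3} = 3
\end{aligned}
```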
B-Tree Example (cont)

[Figure: B-tree example, continued.]

Symbol Table: Implementations Cost Summary

                        Worst Case                       Average Case
Implementation     Search    Insert    Delete     Search    Insert    Delete
Sorted array       log N     N         N          log N     N/2       N/2
Unsorted list      N         N         N          N         N         N
Hashing            N         1         N          1*        1*        1*
BST                N         N         N          log N †   log N †   log N †
Randomized BST     log N ‡   log N ‡   log N ‡    log N     log N     log N
Splay              log N §   log N §   log N §    log N §   log N §   log N §
Red-Black          log N     log N     log N      log N     log N     log N
B-Tree             1         1         1          1         1         1         [page accesses]

B-Tree. Number of page accesses is log_M N per op. [effectively a constant]

B-Trees in the Wild

Variants.
! B trees: Bayer-McCreight. [1972, Boeing]
! B+ trees: all data in external nodes.
! B* trees: keeps pages at least 2/3 full.
! R-trees for spatial searching: GIS, VLSI.

File systems.
! Windows: HPFS.
! Mac: HFS, HFS+.
! Linux: ReiserFS, XFS, Ext3FS, JFS.

Databases.
! Most common index type in modern databases.
! ORACLE, DB2, INGRES, SQL, PostgreSQL, …

Summary

Goal. ST implementation with log N guarantee for all ops.
! Probabilistic: randomized BST.
! Amortized: splay tree.
! Worst-case: red-black tree.
! Algorithms are variations on a theme: rotations when inserting.

Abstraction extends to give search algorithms for huge files.
! B-tree.
Splay Trees

Splay trees = self-adjusting BST.
! Tree automatically reorganizes itself after each op.
! After inserting x or searching for x, rotate x up to root using double rotations.
! Tree remains "balanced" without explicitly storing any balance information.

Amortized guarantee: any sequence of N ops, starting from empty splay tree, takes O(N log N) time.
! Height of tree can be N.
! Individual op can take linear time.

A Self-Adjusting Tree

[Figure: a self-adjusting tree.]

Splay Trees

Splay.
! Check two links above current node.
! ZIG-ZAG: if orientations differ, same as root insertion.
! ZIG-ZIG: if orientations match, do top rotation first.

[Figure: ZIG-ZAG case on nodes x, y, z with subtrees A, B, C, D.]
[Figure: ZIG-ZIG and ZAG-ZAG cases on nodes x, y, z with subtrees A, B, C, D. Panel labels: Root = Splay, Root Insertion, Splay Insertion.]
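The splay rules can be sketched bottom-up with parent links. Everything named here is an assumption for illustration: the class SplayTreeSketch, the helper rotateUp, and insertPlain (a plain BST insert used only to set up a test tree). The single-rotation step used when only one link remains above the node is labelled ZIG, following the example slides.

```java
final class SplayTreeSketch {
    static final class Node {
        final int key;
        Node left, right, parent;

        Node(int key) { this.key = key; }
    }

    Node root;

    // Plain BST insert without splaying; only used to build an example tree.
    void insertPlain(int key) {
        Node n = new Node(key);
        if (root == null) { root = n; return; }
        Node x = root;
        while (true) {
            if (key < x.key) {
                if (x.left == null) { x.left = n; n.parent = x; return; }
                x = x.left;
            } else if (key > x.key) {
                if (x.right == null) { x.right = n; n.parent = x; return; }
                x = x.right;
            } else {
                return;   // key already present
            }
        }
    }

    // Rotate x above its parent, keeping BST order.
    private void rotateUp(Node x) {
        Node p = x.parent, g = p.parent;
        if (p.left == x) {
            p.left = x.right;
            if (x.right != null) x.right.parent = p;
            x.right = p;
        } else {
            p.right = x.left;
            if (x.left != null) x.left.parent = p;
            x.left = p;
        }
        p.parent = x;
        x.parent = g;
        if (g == null) root = x;
        else if (g.left == p) g.left = x;
        else g.right = x;
    }

    // Move x to the root, looking at the two links above it at each step.
    void splay(Node x) {
        while (x.parent != null) {
            Node p = x.parent, g = p.parent;
            if (g == null) {
                rotateUp(x);                              // ZIG: one link left
            } else if ((g.left == p) == (p.left == x)) {
                rotateUp(p);                              // ZIG-ZIG: top rotation first
                rotateUp(x);
            } else {
                rotateUp(x);                              // ZIG-ZAG: rotate x up twice
                rotateUp(x);
            }
        }
    }

    // Search for key and, if found, splay its node to the root.
    boolean search(int key) {
        Node x = root;
        while (x != null && x.key != key) x = key < x.key ? x.left : x.right;
        if (x == null) return false;
        splay(x);
        return true;
    }

    public static void main(String[] args) {
        SplayTreeSketch t = new SplayTreeSketch();
        for (int k = 10; k >= 1; k--) t.insertPlain(k);   // left-leaning chain 10, 9, ..., 1
        t.search(1);
        System.out.println(t.root.key + " " + t.root.right.key + " " + t.root.right.left.key);   // 1 10 8
    }
}
```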

Splay Example

Search for 1.

[Figure: splaying 1 in the left-leaning chain 10, 9, ..., 1. Four ZIG-ZIG steps followed by a final ZIG bring 1 to the root. In the resulting tree, 1 has right child 10; 10 has left child 8; 8 has children 6 and 9; 6 has children 4 and 7; 4 has children 2 and 5; 2 has right child 3.]
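Splay Example

Search for 2.

[Figure: Splay(2) applied to the tree left by the search for 1. In the resulting tree, 2 has children 1 and 8; 8 has children 4 and 10; 4 has children 3 and 6; 6 has children 5 and 7; 10 has left child 9.]

Splay Trees

Intuition.
! Splay rotations halve search path.
! Reduces length of path for many other nodes in tree.

[Figure: two panels. First: insert 1, 2, …, 40, then search 1, search 2, search 3, search 4. Second: insert 1, 2, …, 40, then search for random key.]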

Symbol Table: Implementations Cost Summary

                        Worst Case                       Average Case
Implementation     Search    Insert    Delete     Search    Insert    Delete
Sorted array       log N     N         N          log N     N         N
Unsorted list      N         1         1          N         1         1
Hashing            N         1         N          1*        1*        1*
BST                N         N         N          log N     log N     sqrt(N) †
Randomized BST     log N ‡   log N ‡   log N ‡    log N     log N     log N
Splay              log N §   log N §   log N §    log N §   log N §   log N §

* assumes we know location of node to be deleted
† if delete allowed, insert/search become sqrt(N)
‡ probabilistic guarantee
§ amortized guarantee

Splay: Sequence of N ops takes linearithmic time.

Ahead: Can we do all ops in log N time guaranteed?
