Where We Are: CSE332: Data Abstractions Lecture 6: Dictionaries Binary Search Trees

This document discusses dictionaries and binary search trees. It begins by reviewing the stack, queue, and priority queue abstract data types that have been covered previously. Next, it states that dictionaries (also called maps) associate keys with values and will be the focus going forward. It then provides an overview of binary search trees and how they can be used to implement efficient dictionaries before discussing some simple dictionary implementations and their time complexities.


CSE332: Data Abstractions
Lecture 6: Dictionaries; Binary Search Trees

Dan Grossman
Spring 2010

Where we are

Studying the absolutely essential ADTs of computer science and classic data structures for implementing them

ADTs so far:
1. Stack: push, pop, isEmpty, …
2. Queue: enqueue, dequeue, isEmpty, …
3. Priority queue: insert, deleteMin, …

Next:
4. Dictionary (a.k.a. Map): associate keys with values
– probably the most common, way more than priority queue

Spring 2010 CSE332: Data Abstractions 2

The Dictionary (a.k.a. Map) ADT

• Data:
– set of (key, value) pairs
– keys must be comparable

• Operations:
– insert(key,value)
– find(key)
– delete(key)
– …

(Figure: a dictionary holding djg → Dan Grossman, …; trobison → Tyler Robison, …; sandona1 → Brent Sandona, …; with example calls insert(djg, ….) and find(trobison), which returns Tyler, Robison, …)

Will tend to emphasize the keys, don’t forget about the stored values

Comparison: The Set ADT

The Set ADT is like a Dictionary without any values
– A key is present or not (no repeats)

For find, insert, delete, there is little difference
– In dictionary, values are “just along for the ride”
– So same data-structure ideas work for dictionaries and sets

But if your Set ADT has other important operations this may not hold
– union, intersection, is_subset
– notice these are binary operators on sets
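As a rough illustration of the three Dictionary operations, the sketch below uses Java's standard java.util.TreeMap, which (like the ADT described here) needs comparable keys. The class name DictionaryAdtDemo and the choice of String values are assumptions made only for this example; the lecture does not prescribe a particular library.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class: maps the slide's user names to full names.
class DictionaryAdtDemo {
    // Builds the small example dictionary using put, the library's insert(key, value).
    static Map<String, String> build() {
        Map<String, String> people = new TreeMap<>();
        people.put("djg", "Dan Grossman");
        people.put("trobison", "Tyler Robison");
        people.put("sandona1", "Brent Sandona");
        return people;
    }

    public static void main(String[] args) {
        Map<String, String> people = build();
        System.out.println(people.get("trobison"));         // find(key): prints Tyler Robison
        people.remove("sandona1");                          // delete(key)
        System.out.println(people.containsKey("sandona1")); // prints false
    }
}
```

A java.util.TreeSet would play the same role for the Set ADT: the same key handling, with no stored values.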
Dictionary data structures

Will spend the next 1.5-2 weeks implementing dictionaries with three different data structures

1. AVL trees
– Binary search trees with guaranteed balancing

2. B-Trees
– Also always balanced, but different and shallower

3. Hashtables
– Not tree-like at all

Skipping: Other balanced trees (red-black, splay)

But first some applications and less efficient implementations…

A Modest Few Uses

Any time you want to store information according to some key and be able to retrieve it efficiently
– Lots of programs do that!

• Networks: router tables
• Operating systems: page tables
• Compilers: symbol tables
• Databases: dictionaries with other nice properties
• Search: inverted indexes, phone directories, …
• Biology: genome maps
• …

Simple implementations

For dictionary with n key/value pairs

                          insert    find        delete
• Unsorted linked-list    O(1)      O(n)        O(n)
• Unsorted array          O(1)      O(n)        O(n)
• Sorted linked list      O(n)      O(n)        O(n)
• Sorted array            O(n)      O(log n)    O(n)

We’ll see a Binary Search Tree (BST) probably does better, but not in the worst case unless we keep it balanced
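To make the sorted-array row of the comparison concrete, here is a minimal sketch: find uses binary search (O(log n)), while insert and delete shift elements (O(n)). The class name SortedArrayDictionary, the int keys, and the String values are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sorted-array dictionary; keys and values are kept in parallel lists.
class SortedArrayDictionary {
    private final List<Integer> keys = new ArrayList<>();
    private final List<String> values = new ArrayList<>();

    // Binary search: index of key if present, else -(insertion point) - 1.
    private int indexOf(int key) {
        int lo = 0, hi = keys.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int k = keys.get(mid);
            if (k == key) return mid;
            if (k < key) lo = mid + 1; else hi = mid - 1;
        }
        return -lo - 1;
    }

    // O(log n): only the binary search.
    String find(int key) {
        int i = indexOf(key);
        return i >= 0 ? values.get(i) : null;
    }

    // O(n): later elements shift right to make room.
    void insert(int key, String value) {
        int i = indexOf(key);
        if (i >= 0) { values.set(i, value); return; } // key present: overwrite
        int at = -i - 1;
        keys.add(at, key);
        values.add(at, value);
    }

    // O(n): later elements shift left to close the gap.
    boolean delete(int key) {
        int i = indexOf(key);
        if (i < 0) return false;
        keys.remove(i);   // i is an int, so this removes by index
        values.remove(i);
        return true;
    }
}
```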
Lazy Deletion

(Figure: sorted array 10 12 24 30 41 42 44 45 50, with a present/deleted mark under each entry)

A general technique for making delete as fast as find:
– Instead of actually removing the item just mark it deleted

Plusses:
– Simpler
– Can do removals later in batches
– If re-added soon thereafter, just unmark the deletion

Minuses:
– Extra space for the “is-it-deleted” flag
– Data structure full of deleted nodes wastes space
– find O(log m) time where m is data-structure size (okay)
– May complicate other operations

Some tree terms (mostly review)

• There are many kinds of trees
– Every binary tree is a tree
– Every list is kind of a tree (think of “next” as the one child)

• There are many kinds of binary trees
– Every binary min heap is a binary tree
– Every binary search tree is a binary tree

• A tree can be balanced or not
– A balanced tree with n nodes has a height of O(log n)
– Different tree data structures have different “balance conditions” to achieve this
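A minimal sketch of lazy deletion on a sorted array follows, assuming distinct int keys that are fixed at construction time. The class name LazySortedArray and its methods are hypothetical; the point is only that delete becomes a binary search plus setting a flag.

```java
import java.util.Arrays;

// Hypothetical lazy-deletion wrapper around a sorted array of distinct keys.
class LazySortedArray {
    private final int[] keys;        // sorted; slots are never physically removed here
    private final boolean[] deleted; // the "is-it-deleted" flag, one per slot

    LazySortedArray(int[] sortedKeys) {
        keys = sortedKeys.clone();
        deleted = new boolean[keys.length];
    }

    // O(log m), where m counts all slots, including ones marked deleted.
    boolean contains(int key) {
        int i = Arrays.binarySearch(keys, key);
        return i >= 0 && !deleted[i];
    }

    // As fast as find: locate the slot and mark it.
    boolean delete(int key) {
        int i = Arrays.binarySearch(keys, key);
        if (i < 0 || deleted[i]) return false;
        deleted[i] = true;
        return true;
    }

    // Re-adding a recently deleted key just unmarks it.
    boolean reinsert(int key) {
        int i = Arrays.binarySearch(keys, key);
        if (i < 0 || !deleted[i]) return false;
        deleted[i] = false;
        return true;
    }
}
```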

Binary Trees

• Binary tree is empty or
– a root (with data)
– a left subtree (maybe empty)
– a right subtree (maybe empty)

• Representation: each node holds Data, a left pointer, and a right pointer

• For a dictionary, data will include a key and a value

(Figure: example binary tree, listed level by level: A / B C / D E F / G H / I J)
Binary Trees: Some Numbers

Recall: height of a tree = longest path from root to leaf (count edges)

For binary tree of height h:
– max # of leaves: 2^h
– max # of nodes: 2^(h+1) - 1
– min # of leaves: 1
– min # of nodes: h+1

For n nodes, we cannot do better than O(log n) height, and we want to avoid O(n) height
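The node counts for a binary tree of height h can be tied to the two height bounds with a short derivation. This is a sketch using standard facts about perfect trees and chains, not a proof taken from the lecture; n denotes the number of nodes.

```latex
% A perfect tree has 2^i nodes at depth i, so the maximum node count is
\[
  n_{\max}(h) = \sum_{i=0}^{h} 2^{i} = 2^{h+1} - 1 .
\]
% Any binary tree with n nodes and height h therefore satisfies
\[
  n \le 2^{h+1} - 1
  \quad\Longrightarrow\quad
  h \ge \log_2(n+1) - 1 ,
\]
% which is why the height cannot be better than logarithmic in n.
% At the other extreme, a chain has the minimum node count h + 1, so
\[
  n \ge h + 1
  \quad\Longrightarrow\quad
  h \le n - 1 ,
\]
% which is the O(n) height we want to avoid.
```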

Calculating height

What is the height of a tree with root r?

    int treeHeight(Node root) {
      if(root == null)
        return -1;  // empty tree: makes a single node have height 0
      return 1 + Math.max(treeHeight(root.left),
                          treeHeight(root.right));
    }

Running time for tree with n nodes: O(n) – single pass over tree

Note: non-recursive is painful – need your own stack of pending nodes; much easier to use recursion’s call stack

Tree Traversals

A traversal is an order for visiting all the nodes of a tree

• Pre-order: root, left subtree, right subtree
• In-order: left subtree, root, right subtree
• Post-order: left subtree, right subtree, root

(Figure: an expression tree with root +, whose children are * and 5; * has children 2 and 4)
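The three traversal orders can be compared on the small expression tree from the figure. The sketch below assumes that tree has root + with children * and 5, and that * has children 2 and 4; the class TraversalDemo and its Node type are hypothetical.

```java
// Hypothetical demo of pre-, in-, and post-order traversal on one tree.
class TraversalDemo {
    static class Node {
        final String element;
        final Node left, right;
        Node(String element, Node left, Node right) {
            this.element = element;
            this.left = left;
            this.right = right;
        }
    }

    // Root, then left subtree, then right subtree.
    static void preOrder(Node t, StringBuilder out) {
        if (t == null) return;
        out.append(t.element).append(' ');
        preOrder(t.left, out);
        preOrder(t.right, out);
    }

    // Left subtree, then root, then right subtree.
    static void inOrder(Node t, StringBuilder out) {
        if (t == null) return;
        inOrder(t.left, out);
        out.append(t.element).append(' ');
        inOrder(t.right, out);
    }

    // Left subtree, then right subtree, then root.
    static void postOrder(Node t, StringBuilder out) {
        if (t == null) return;
        postOrder(t.left, out);
        postOrder(t.right, out);
        out.append(t.element).append(' ');
    }

    // Assumed shape of the slide's expression tree: (2 * 4) + 5.
    static Node expressionTree() {
        Node times = new Node("*", new Node("2", null, null), new Node("4", null, null));
        return new Node("+", times, new Node("5", null, null));
    }

    public static void main(String[] args) {
        Node root = expressionTree();
        StringBuilder pre = new StringBuilder();
        StringBuilder mid = new StringBuilder();
        StringBuilder post = new StringBuilder();
        preOrder(root, pre);
        inOrder(root, mid);
        postOrder(root, post);
        System.out.println(pre.toString().trim());  // + * 2 4 5
        System.out.println(mid.toString().trim());  // 2 * 4 + 5
        System.out.println(post.toString().trim()); // 2 4 * 5 +
    }
}
```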
More on traversals

    void inOrderTraversal(Node t){
      if(t != null) {
        inOrderTraversal(t.left);   // visit the left subtree first
        process(t.element);         // then the node itself
        inOrderTraversal(t.right);  // then the right subtree
      }
    }

Sometimes order doesn’t matter
• Example: sum all elements

Sometimes order matters
• Example: print tree with parent above indented children (pre-order)
• Example: evaluate an expression tree (post-order)

(Figure: tree with root A and children B and C; B has children D and E; C has children F and G. Printed pre-order with indentation, it lists A, B, D, E, C, F, G)

Binary Search Tree

• Structural property (“binary”)
– each node has ≤ 2 children
– result: keeps operations simple

• Order property
– all keys in left subtree smaller than node’s key
– all keys in right subtree larger than node’s key
– result: easy to find any given key

(Figure: example BST, listed level by level: 8 / 5 11 / 2 6 10 12 / 4 7 9 14 / 13)
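The "evaluate an expression tree (post-order)" example can be sketched as follows: both subtrees are evaluated before the operator at the root is applied. The class ExpressionEval, its Node layout, and the restriction to + and * are assumptions for illustration.

```java
// Hypothetical expression-tree evaluator; supports only + and * on int operands.
class ExpressionEval {
    static class Node {
        final char op;     // '+' or '*' for internal nodes; unused in leaves
        final int value;   // operand for leaves; unused in internal nodes
        final Node left, right;

        Node(int value) {  // leaf holding an operand
            this.op = 0;
            this.value = value;
            this.left = null;
            this.right = null;
        }

        Node(char op, Node left, Node right) {  // internal node holding an operator
            this.op = op;
            this.value = 0;
            this.left = left;
            this.right = right;
        }
    }

    // Post-order: evaluate left, evaluate right, then combine at the root.
    static int evaluate(Node t) {
        if (t.left == null && t.right == null) return t.value;
        int l = evaluate(t.left);
        int r = evaluate(t.right);
        switch (t.op) {
            case '+': return l + r;
            case '*': return l * r;
            default: throw new IllegalArgumentException("unsupported operator: " + t.op);
        }
    }

    public static void main(String[] args) {
        Node tree = new Node('+', new Node('*', new Node(2), new Node(4)), new Node(5));
        System.out.println(evaluate(tree)); // (2 * 4) + 5 = 13
    }
}
```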

Are these BSTs?

(Figure: two candidate trees, listed level by level. First tree: 5 / 4 8 / 1 7 11 / 3. Second tree: 8 / 5 11 / 2 7 6 10 18 / 4 15 20 / 21)
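One way to answer "is this a BST?" mechanically is to pass down the range of keys each subtree is allowed to hold. A node can be larger than its parent's left child and still violate the order property for an ancestor, which is the kind of mistake such exercises usually hide. The sketch below assumes distinct int keys; BstCheck and its Node type are hypothetical.

```java
// Hypothetical BST validity check using lower/upper bounds per subtree.
class BstCheck {
    static class Node {
        final int key;
        final Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    static boolean isBst(Node t) {
        // long bounds so that every int key lies strictly inside the initial range
        return isBst(t, Long.MIN_VALUE, Long.MAX_VALUE);
    }

    // Every key in this subtree must satisfy lo < key < hi.
    private static boolean isBst(Node t, long lo, long hi) {
        if (t == null) return true;
        if (t.key <= lo || t.key >= hi) return false;
        return isBst(t.left, lo, t.key) && isBst(t.right, t.key, hi);
    }
}
```

Checking only each node against its immediate children is not enough: a 7 placed as the right child of 4, inside the left subtree of 5, passes the local test but fails the bounded one.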
Find in BST, Recursive

    Data find(Key key, Node root){
      if(root == null)
        return null;
      if(key < root.key)
        return find(key,root.left);
      if(key > root.key)
        return find(key,root.right);
      return root.data;
    }

Find in BST, Iterative

    Data find(Key key, Node root){
      while(root != null
            && root.key != key) {
        if(key < root.key)
          root = root.left;
        else  // key > root.key
          root = root.right;
      }
      if(root == null)
        return null;
      return root.data;
    }

(Figure: BST used for both versions, listed level by level: 12 / 5 15 / 2 9 20 / 7 10 17 30)

Other “finding operations”

• Find minimum node
– “the liberal algorithm”
• Find maximum node
– “the conservative algorithm”
• Find predecessor of a non-leaf
• Find successor of a non-leaf
• Find predecessor of a leaf
• Find successor of a leaf

(Figure: BST listed level by level: 12 / 5 15 / 2 9 20 / 7 10 17 30)

Insert in BST

insert(13)
insert(8)
insert(31)

(New) insertions happen only at leaves – easy!

(Figure: the same BST after the three insertions, listed level by level: 12 / 5 15 / 2 9 13 20 / 7 10 17 30 / 8 31)
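The minimum/maximum searches and leaf-only insertion can be sketched together. This version stores int keys only, omits values, and ignores duplicate keys; those choices, and the class name BstInsert, are assumptions for the sketch rather than the course's reference code.

```java
// Hypothetical BST with recursive insert and iterative findMin/findMax.
class BstInsert {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Returns the root of the subtree after inserting key as a new leaf.
    static Node insert(Node root, int key) {
        if (root == null) return new Node(key);  // the empty spot becomes the new leaf
        if (key < root.key) root.left = insert(root.left, key);
        else if (key > root.key) root.right = insert(root.right, key);
        // equal key: already present, nothing to do (assumption)
        return root;
    }

    // Keep going left until there is no left child.
    static Node findMin(Node root) {
        if (root == null) return null;
        while (root.left != null) root = root.left;
        return root;
    }

    // Keep going right until there is no right child.
    static Node findMax(Node root) {
        if (root == null) return null;
        while (root.right != null) root = root.right;
        return root;
    }

    public static void main(String[] args) {
        Node root = null;
        // Rebuild the slide's tree, then apply insert(13), insert(8), insert(31).
        for (int k : new int[] {12, 5, 15, 2, 9, 20, 7, 10, 17, 30, 13, 8, 31}) {
            root = insert(root, k);
        }
        System.out.println(findMin(root).key); // 2
        System.out.println(findMax(root).key); // 31
    }
}
```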
Deletion in BST

(Figure: BST listed level by level: 12 / 5 15 / 2 9 20 / 7 10 17 30)

Why might deletion be harder than insertion?

Deletion

• Removing an item disrupts the tree structure
• Basic idea: find the node to be removed, then “fix” the tree so that it is still a binary search tree
• Three cases:
– node has no children (leaf)
– node has one child
– node has two children

Deletion – The Leaf Case

delete(17)

(Figure: BST before the deletion, listed level by level: 12 / 5 15 / 2 9 20 / 7 10 17 30)

Deletion – The One Child Case

delete(15)

(Figure: BST before the deletion, listed level by level: 12 / 5 15 / 2 9 20 / 7 10 30)
Deletion – The Two Child Case

delete(5)

(Figure: BST before the deletion, listed level by level: 12 / 5 20 / 2 9 30 / 7 10)

What can we replace 5 with?

Deletion – The Two Child Case

Idea: Replace the deleted node with a value guaranteed to be between the two child subtrees

Options:
• successor from right subtree: findMin(node.right)
• predecessor from left subtree: findMax(node.left)
– These are the easy cases of predecessor/successor

Now delete the original node containing successor or predecessor
• Leaf or one child case – easy cases of delete!
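The three deletion cases can be sketched in one recursive method. This version picks the successor option (the minimum of the right subtree) for the two-child case; the predecessor would work equally well. Keys are ints with no stored values, and the class name BstDelete is hypothetical.

```java
// Hypothetical BST deletion covering the leaf, one-child, and two-child cases.
class BstDelete {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Plain BST insert, included only so the example can build a tree.
    static Node insert(Node root, int key) {
        if (root == null) return new Node(key);
        if (key < root.key) root.left = insert(root.left, key);
        else if (key > root.key) root.right = insert(root.right, key);
        return root;
    }

    // Returns the root of the subtree after removing key (if present).
    static Node delete(Node root, int key) {
        if (root == null) return null;               // key not found
        if (key < root.key) {
            root.left = delete(root.left, key);
        } else if (key > root.key) {
            root.right = delete(root.right, key);
        } else if (root.left == null) {
            return root.right;                       // leaf, or only a right child
        } else if (root.right == null) {
            return root.left;                        // only a left child
        } else {
            Node succ = root.right;                  // two children: findMin(root.right)
            while (succ.left != null) succ = succ.left;
            root.key = succ.key;                     // copy the successor's key up
            root.right = delete(root.right, succ.key); // successor has at most one child
        }
        return root;
    }
}
```

On the slide's tree, delete(5) with this choice moves 7 into the place of 5, leaving 9 with only its right child 10.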

BuildTree for BST

• We had buildHeap, so let’s consider buildTree

• Insert keys 1, 2, 3, 4, 5, 6, 7, 8, 9 into an empty BST
– If inserted in given order, what is the tree? (Figure: a chain 1, 2, 3, …, each key the right child of the previous one)
– What big-O runtime for this kind of sorted input? O(n^2). Not a happy place
– Is inserting in the reverse order any better?

BuildTree for BST

• Insert keys 1, 2, 3, 4, 5, 6, 7, 8, 9 into an empty BST

• What if we could somehow re-arrange them
– median first, then left median, right median, etc.
– 5, 3, 7, 2, 1, 4, 8, 6, 9
– What tree does that give us? (Figure: BST listed level by level: 5 / 3 7 / 2 4 6 8 / 1 9)
– What big-O runtime? O(n log n), definitely better
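The "median first" idea can be sketched as a recursive reordering of the sorted keys followed by ordinary BST inserts. This sketch always takes the lower median, so for keys 1 to 9 it yields the order 5, 2, 1, 3, 4, 7, 6, 8, 9, which differs from the order listed on the slide but gives a tree of the same height. BuildTreeDemo and its helpers are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical comparison of sorted-order insertion and median-first insertion.
class BuildTreeDemo {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static Node insert(Node root, int key) {
        if (root == null) return new Node(key);
        if (key < root.key) root.left = insert(root.left, key);
        else if (key > root.key) root.right = insert(root.right, key);
        return root;
    }

    // Height counted in edges; the empty tree has height -1.
    static int height(Node root) {
        return root == null ? -1 : 1 + Math.max(height(root.left), height(root.right));
    }

    // Inserts the keys one by one in the given order.
    static Node build(List<Integer> order) {
        Node root = null;
        for (int k : order) root = insert(root, k);
        return root;
    }

    // Emits the median of sorted[lo..hi], then recurses on the left and right halves.
    static void medianFirst(int[] sorted, int lo, int hi, List<Integer> out) {
        if (lo > hi) return;
        int mid = (lo + hi) >>> 1;
        out.add(sorted[mid]);
        medianFirst(sorted, lo, mid - 1, out);
        medianFirst(sorted, mid + 1, hi, out);
    }

    static List<Integer> medianFirst(int[] sorted) {
        List<Integer> out = new ArrayList<>();
        medianFirst(sorted, 0, sorted.length - 1, out);
        return out;
    }
}
```

With keys 1 through 9, inserting in sorted order produces a chain of height 8, while the median-first order produces a tree of height 3.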
Unbalanced BST

• Balancing a tree at build time is insufficient, as sequences of operations can eventually transform that carefully balanced tree into the dreaded list

• At that point, everything is O(n) and nobody is happy
– find
– insert
– delete

(Figure: a degenerate tree forming the chain 1, 2, 3)

Balanced BST

Observation
• BST: the shallower the better!
• For a BST with n nodes inserted in arbitrary order
– Average height is O(log n) – see text for proof
– Worst case height is O(n)
• Simple cases such as inserting in key order lead to the worst-case scenario

Solution: Require a Balance Condition that
1. ensures depth is always O(log n) – strong enough!
2. is easy to maintain – not too strong!

Potential Balance Conditions

1. Left and right subtrees of the root have equal number of nodes
Too weak! Height mismatch example: (figure)

2. Left and right subtrees of the root have equal height
Too weak! Double chain example: (figure)

3. Left and right subtrees of every node have equal number of nodes
Too strong! Only perfect trees (2^n - 1 nodes)

4. Left and right subtrees of every node have equal height
Too strong! Only perfect trees (2^n - 1 nodes)
The AVL Balance Condition
Left and right subtrees of every node
have heights differing by at most 1

Definition: balance(node) = height(node.left) – height(node.right)

AVL property: for every node x, –1 ≤ balance(x) ≤ 1

• Ensures small depth
– Will prove this by showing that an AVL tree of height h must have a number of nodes exponential in h

• Easy (well, efficient) to maintain
– Using single and double rotations
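The AVL property can be checked directly from the definition of balance(node). The sketch below recomputes heights at every node, so it can take O(n^2) time on a badly unbalanced tree; it is meant only to spell out the condition, not to show how an AVL tree maintains it. AvlCheck and its Node type are hypothetical.

```java
// Hypothetical checker for the AVL balance condition.
class AvlCheck {
    static class Node {
        final int key;
        final Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // Height in edges; the empty tree has height -1.
    static int height(Node n) {
        return n == null ? -1 : 1 + Math.max(height(n.left), height(n.right));
    }

    // balance(node) = height(node.left) - height(node.right)
    static int balance(Node n) {
        return height(n.left) - height(n.right);
    }

    // True when -1 <= balance(x) <= 1 holds for every node x in the tree.
    static boolean isAvlBalanced(Node n) {
        if (n == null) return true;
        int b = balance(n);
        return -1 <= b && b <= 1 && isAvlBalanced(n.left) && isAvlBalanced(n.right);
    }
}
```

A three-node chain fails the check at its root (the balance there is -2 or 2), while a root with a single child passes, since one subtree has height 0 and the other has height -1.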
