AVL Trees
15-122: Principles of Imperative Computation
Frank Pfenning
Lecture 18
March 22, 2011
Introduction
Binary search trees are an excellent data structure to implement associative arrays, maps, sets, and similar interfaces. The main difficulty, as discussed in the last lecture, is that they are efficient only when they are balanced. Straightforward sequences of insertions can lead to highly unbalanced trees with poor asymptotic complexity and unacceptable practical efficiency. For example, if we insert n elements with keys that are in strictly increasing or decreasing order, the complexity will be O(n^2). On the other hand, if we can keep the height in O(log(n)), as it is for a perfectly balanced tree, then the complexity is bounded by O(n log(n)).
The solution is to dynamically rebalance the search tree during insert
or search operations. We have to be careful not to destroy the ordering
invariant of the tree while we rebalance. Because of the importance of binary search trees, researchers have developed many different algorithms
for keeping trees in balance, such as AVL trees, red/black trees, splay trees,
or randomized binary search trees. They differ in the invariants they maintain (in addition to the ordering invariant), and when and how the rebalancing is done.
In this lecture we use AVL trees, a simple and efficient data structure for maintaining balance, and also the first such structure to have been proposed. It is named after its inventors, G. M. Adelson-Velskii and E. M. Landis, who described it in 1962.
[Figure: a binary search tree with root 10, left child 4, and right child 16, which in turn has children 13 and 19; height = 3, height invariant satisfied.]
If we insert a new element with a key of 14, the insertion algorithm for
binary search trees without rebalancing will put it to the right of 13.
[Figure: the same tree after inserting 14 to the right of 13: root 10 with children 4 and 16; 16 has children 13 and 19; 14 is the right child of 13; height = 4, height invariant satisfied.]
Now the tree has height 4, and one path is longer than the others. However, it is easy to check that at each node, the heights of the left and right subtrees still differ by at most one. For example, at the node with key 16, the left subtree has height 2 and the right subtree has height 1, which still obeys our height invariant.
Now consider another insertion, this time of an element with key 15.
This is inserted to the right of the node with key 14.
[Figure: the tree after additionally inserting 15 to the right of 14: height = 5; height invariant violated at the nodes 13, 16, and 10.]
All is well at the node labeled 14: the left subtree has height 0 while the
right subtree has height 1. However, at the node labeled 13, the left subtree
has height 0, while the right subtree has height 2, violating our invariant.
Moreover, at the node with key 16, the left subtree has height 3 while the
right subtree has height 1, also a difference of 2 and therefore an invariant
violation.
We therefore have to take steps to rebalance the tree. We can see without too much trouble that we can restore the height invariant if we move the node labeled 14 up and push the node labeled 13 down and to the right, resulting in the following tree.
[Figure: the rebalanced tree: root 10 with children 4 and 16; 16 has children 14 and 19; 14 has children 13 and 15; height = 4, height invariant restored at 14, 16, and 10.]
In the diagram, each subtree is labeled with the open interval of keys it may contain; for example, (x, y) stands for a subtree whose keys are greater than x and less than y, and (y, +∞) for one whose keys are greater than y. The tree on the right is the result of the left rotation.
[Figure: left rotation at x. Before: root x with left subtree (−∞, x) and right child y, whose subtrees are (x, y) and (y, +∞). After: root y with right subtree (y, +∞) and left child x, whose subtrees are (−∞, x) and (x, y).]
From the intervals we can see that the ordering invariants are preserved, as
are the contents of the tree. We can also see that it shifts some nodes from
the right subtree to the left subtree. We would invoke this operation if the
invariants told us that we have to rebalance from right to left.
We implement this with some straightforward code. First, recall the
type of trees from last lecture. We do not repeat the function is_ordtree
that checks if a tree is ordered.
struct tree {
  elem data;
  struct tree* left;
  struct tree* right;
};
typedef struct tree* tree;

bool is_ordtree(tree T);
The main point to keep in mind is to use (or save) a component of the input before writing to it. We apply this idea systematically, writing to a location immediately after using it on the previous line.
tree rotate_left(tree T)
//@requires is_ordtree(T);
//@requires T != NULL && T->right != NULL;
//@ensures is_ordtree(\result);
//@ensures \result != NULL && \result->left != NULL;
{
  tree root = T->right;
  T->right = root->left;
  root->left = T;
  return root;
}
The right rotation is entirely symmetric. First in pictures:
[Figure: right rotation at z. Before: root z with right subtree (z, +∞) and left child y, whose subtrees are (−∞, y) and (y, z). After: root y with left subtree (−∞, y) and right child z, whose subtrees are (y, z) and (z, +∞).]
Then in code:
tree rotate_right(tree T)
//@requires is_ordtree(T);
//@requires T != NULL && T->left != NULL;
//@ensures is_ordtree(\result);
//@ensures \result != NULL && \result->right != NULL;
{
  tree root = T->left;
  T->left = root->right;
  root->right = T;
  return root;
}
Inserting an Element
[Figure: the first case: the tree rooted at x has height h + 3; its right child y has height h + 2, and y's right subtree, rooted at z, has height h + 1 after the insertion, while the subtrees (−∞, x) and (x, y) have height h. On the right, after a left rotation at x: root y with children x and z.]
We fix this with a left rotation, the result of which is displayed on the right. In the second case, we once again insert into the right subtree, but now the left subtree of the right subtree has height h + 1.
[Figure: the second case: the tree rooted at x has height h + 3; its right child z has height h + 2, and z's left subtree, rooted at y, has height h + 1 after the insertion, while the subtrees (−∞, x) and (z, +∞) have height h. On the right, after a double rotation at z and x: root y with children x and z.]
In that case, a left rotation alone will not restore the invariant (see Exercise 1). Instead, we apply a so-called double rotation: first a right rotation at
z, then a left rotation at the root. When we do this we obtain the picture on
the right, restoring the height invariant.
There are two additional symmetric cases to consider, if we insert the
new element on the left (see Exercise 4).
We can see that in each of the possible cases where we have to restore
the invariant, the resulting tree has the same height h + 2 as before the
insertion. Therefore, the height invariant above the place where we just
restored it will be automatically satisfied.
Checking Invariants
The interface for the implementation is exactly the same as for binary search
trees, as is the code for searching for a key. In various places in the algorithm we have to compute the height of the tree. This could be an operation
of asymptotic complexity O(n), unless we store it in each node and just look
it up. So we have:
struct tree {
  elem data;
  int height;
  struct tree* left;
  struct tree* right;
};
Implementing Insertion
The code for inserting an element into the tree is mostly identical with
the code for plain binary search trees. The difference is that after we insert into the left or right subtree, we call a function rebalance_left or
rebalance_right, respectively, to restore the invariant if necessary and calculate the new height.
tree tree_insert(tree T, elem e)
//@requires is_avl(T);
//@ensures is_avl(\result);
{
  assert(e != NULL); /* cannot insert NULL element */
  if (T == NULL) {
    T = leaf(e); /* create new leaf with data e */
  } else {
    int r = compare(elem_key(e), elem_key(T->data));
    if (r < 0) {
      T->left = tree_insert(T->left, e);
      T = rebalance_left(T); /* also fixes height */
    } else if (r == 0) {
      T->data = e;
    } else { //@assert r > 0;
      T->right = tree_insert(T->right, e);
      T = rebalance_right(T); /* also fixes height */
    }
  }
  return T;
}
Experimental Evaluation
We would like to assess the asymptotic complexity and then experimentally validate it. It is easy to see that both insert and search operations take time O(h), where h is the height of the tree. But how is the height of the tree related to the number of elements stored, if we use the balance invariant of AVL trees? It turns out that h is O(log(n)). It is not difficult to prove this, but it is beyond the scope of this course.
To experimentally validate this prediction, we have to run the code with inputs of increasing size. A convenient way of doing this is to double the size of the input and compare running times. If we insert n elements into the tree and look them up, the running time should be bounded by c * n * log(n) for some constant c. Assume we run it at some size n and observe r = c * n * log(n). If we double the input size we have c * (2n) * log(2n) = 2 * c * n * (1 + log(n)) = 2r + 2cn, so we expect the running time to double, with an additional summand that roughly doubles as n doubles. In order to smooth out minor variations and get bigger numbers, we run each experiment 100 times. Here is the table with the results:
  n       AVL      (vs. 2r)      BSTs
  2^9     0.129                  1.018
  2^10    0.281    2r + 0.023    2.258
  2^11    0.620    2r + 0.058    3.094
  2^12    1.373    2r + 0.133    7.745
  2^13
  2^14
  2^15
We see in the third column, where 2r stands for the doubling of the previous value, that we are quite close to the predicted running time, with an approximately linearly increasing additional summand.
In the fourth column we have run the experiment with plain binary
search trees which do not rebalance automatically. First of all, we see that
they are much less efficient, and second we see that their behavior with
increasing size is difficult to predict, sometimes jumping considerably and
sometimes not much at all. In order to understand this behavior, we need
to know more about the order and distribution of keys that were used in
this experiment. They were strings, compared lexicographically. The keys
Exercises
Exercise 1 Show that in the second insertion case above, a single left rotation at the root will not necessarily restore the height invariant.
Exercise 2 Show, in pictures, that a double rotation is a composition of two rotations. Discuss the situation with respect to the height invariants after the first
rotation.
Exercise 3 Show that left and right rotations are inverses of each other. What can
you say about double rotations?
Exercise 4 Show the two cases in which inserting into the left subtree might violate the height invariant, and show how they are repaired by a right rotation or a double rotation. Which two single rotations does the double rotation consist of in this case?
Exercise 5 Strengthen the invariants in the AVL tree implementation so that the
assertions and postconditions which guarantee that rebalancing restores the height
invariant and reduces the height of the tree follow from the preconditions.