TREES
A connected graph that contains no simple circuits is called a tree. Trees were used as long
ago as 1857, when the English mathematician Arthur Cayley used them to count certain types
of chemical compounds. Since that time, trees have been employed to solve problems in a
wide variety of disciplines.
Trees are particularly useful in computer science, where they are employed in a wide range of
algorithms. For instance, trees are used to construct efficient algorithms for locating items in
a list. They can be used in algorithms, such as Huffman coding, that construct efficient codes
saving costs in data transmission and storage. Trees can be used to study games such as
checkers and chess and can help determine winning strategies for playing these games. Trees
can be used to model procedures carried out using a sequence of decisions. Constructing
these models can help determine the computational complexity of algorithms based on a
sequence of decisions, such as sorting algorithms.
Definition of a Tree
Let G = (V, E) be an undirected graph with no loops and no multiple edges. The graph G is called
a tree if G is connected and contains no cycles. Therefore, any tree must be a simple graph.
But the graph G2 is not a tree because it contains the cycle {a,b},{b,c},{c,a}.
Solution: G1 and G2 are trees, because both are connected graphs with no cycles.
G3 is not a tree because e, b, a, d, e is a cycle in this graph.
G4 is not a tree because it is not connected.
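The definition can be checked mechanically. The following is a minimal sketch, assuming the graph is given as a collection of vertices and a list of undirected edges (the function name is_tree and this input format are illustrative choices, not part of the text). It relies on the standard fact that a connected graph with n vertices contains no cycles exactly when it has n - 1 edges.

```python
from collections import deque


def is_tree(vertices, edges):
    """Return True if the undirected graph (vertices, edges) is a tree."""
    vertices = list(vertices)
    if not vertices:
        return False  # assumption made here: the empty graph is not a tree
    # A connected graph on n vertices is acyclic exactly when it has n - 1 edges.
    if len(edges) != len(vertices) - 1:
        return False
    adjacency = {v: [] for v in vertices}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    # Breadth-first search from an arbitrary vertex to test connectivity.
    seen = {vertices[0]}
    queue = deque([vertices[0]])
    while queue:
        u = queue.popleft()
        for w in adjacency[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(vertices)


# The cycle {a,b},{b,c},{c,a} is not a tree; removing one edge leaves a path.
print(is_tree("abc", [("a", "b"), ("b", "c"), ("c", "a")]))  # False
print(is_tree("abc", [("a", "b"), ("b", "c")]))              # True
```

A graph that is not connected, such as two separate edges on four vertices, fails the edge count or the connectivity check.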
Forests
A graph that contains no cycles but is not necessarily connected is called a forest. Each
connected component of a forest is a tree. G1 and G2 are forests with two trees.
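As a rough illustration, the sketch below tests whether a graph is a forest and, if so, counts its trees. It uses a union-find structure, which is one common way to detect cycles and is not taken from the text. The function name count_trees and the sample graph are hypothetical.

```python
def count_trees(vertices, edges):
    """Return the number of trees if the graph is a forest, else None."""
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent links to the representative, halving paths on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    components = len(parent)
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return None  # this edge closes a cycle, so the graph is not a forest
        parent[root_u] = root_v
        components -= 1  # each accepted edge joins two components
    return components


# Hypothetical forest with two trees: the edge a-b and the path c-d-e.
print(count_trees("abcde", [("a", "b"), ("c", "d"), ("d", "e")]))  # 2
```

A forest with exactly one tree is itself a tree, so the result 1 corresponds to the definition of a tree given earlier.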
Rooted Trees
Figure 4 displays the rooted trees formed by designating a to be the root and c to be the root,
respectively, in the tree T. We usually draw a rooted tree with its root at the top of the graph.
The arrows indicating the directions of the edges in a rooted tree can be omitted, because the
choice of root determines the directions of the edges.
Terminology of a Tree
Parent: Suppose that T is a rooted tree. If v is a vertex in T other than the root, the parent of
v is the unique vertex u such that there is a directed edge from u to v.
Child: When u is the parent of v, v is called a child of u.
Siblings: Vertices with the same parent are called siblings.
Ancestors: The ancestors of a vertex other than the root are the vertices in the path from the
root to this vertex, excluding the vertex itself and including the root (that is, its parent, its
parent’s parent, and so on, until the root is reached).
Descendants: The descendants of a vertex v are those vertices that have v as an ancestor.
Leaf: A vertex of a rooted tree is called a leaf if it has no children.
Internal vertices: Vertices that have children are called internal vertices. The root is an
internal vertex unless it is the only vertex in the graph, in which case it is a leaf.
Subtree: If a is a vertex in a tree, the subtree with a as its root is the subgraph of the tree
consisting of a and its descendants and all edges incident to these descendants.
EXAMPLE 2 : In the rooted tree T (with root a) shown in Figure 5, find the following.
a) The parent of c,
b) The children of g,
c) The siblings of h,
d) All ancestors of e,
e) All descendants of b,
f) All internal vertices,
g) All leaves.
h) What is the subtree rooted at g?
Solution:
a) The parent of c is b.
b) The children of g are h, i, and j.
c) The siblings of h are i and j.
d) The ancestors of e are c, b, and a.
e) The descendants of b are c, d, and e.
f) The internal vertices are a, b, c, g, h, and j.
g) The leaves are d, e, f, i, k, l, and m.
h) The subtree rooted at g is shown in Figure 6.
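The terminology above translates directly into small functions on a table of children. The sketch below is illustrative only: Figure 5 is not reproduced here, so the child lists are an assumption, chosen to be one tree consistent with every answer in the solution. All function names are hypothetical.

```python
# Assumed child lists (one tree consistent with the answers to Example 2).
children = {
    "a": ["b", "f", "g"],
    "b": ["c"],
    "c": ["d", "e"],
    "g": ["h", "i", "j"],
    "h": ["k"],
    "j": ["l", "m"],
}
# Each non-root vertex has exactly one parent.
parent = {c: p for p, kids in children.items() for c in kids}
vertices = set(children) | set(parent)


def siblings(v):
    return [c for c in children[parent[v]] if c != v]


def ancestors(v):
    result = []
    while v in parent:  # stop at the root, which has no parent
        v = parent[v]
        result.append(v)
    return result


def descendants(v):
    result = []
    for c in children.get(v, []):
        result.append(c)
        result.extend(descendants(c))
    return result


def subtree(v):
    """Child lists of the subtree rooted at v."""
    sub, stack = {}, [v]
    while stack:
        u = stack.pop()
        if children.get(u):
            sub[u] = list(children[u])
            stack.extend(children[u])
    return sub


internal = sorted(v for v in vertices if children.get(v))
leaves = sorted(v for v in vertices if not children.get(v))

print(parent["c"])       # b
print(children["g"])     # ['h', 'i', 'j']
print(siblings("h"))     # ['i', 'j']
print(ancestors("e"))    # ['c', 'b', 'a']
print(descendants("b"))  # ['c', 'd', 'e']
print(internal)          # ['a', 'b', 'c', 'g', 'h', 'j']
print(leaves)            # ['d', 'e', 'f', 'i', 'k', 'l', 'm']
```

Storing only the child lists keeps the order of children, which matters later for ordered rooted trees. The parent table is derived from them.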
m-ary tree
A rooted tree is called an m-ary tree if every internal vertex has no more than m children.
The tree is called a full m-ary tree if every internal vertex has exactly m children. An m-ary
tree with m = 2 is called a binary tree.
EXAMPLE 3 : Are the rooted trees in Figure 7 full m-ary trees for some positive integer m?
Solution: T1 is a full binary tree because each of its internal vertices has two children. T2 is a
full 3-ary tree because each of its internal vertices has three children. In T3 each internal
vertex has five children, so T3 is a full 5-ary tree. T4 is not a full m-ary tree for any m because
some of its internal vertices have two children and others have three children.
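The two definitions differ only in "no more than m" versus "exactly m", which the following sketch makes explicit. The trees of Figure 7 are not reproduced here, so the two sample trees and the function names are hypothetical.

```python
def is_m_ary(children, m):
    """Every internal vertex has no more than m children."""
    return all(len(kids) <= m for kids in children.values() if kids)


def is_full_m_ary(children, m):
    """Every internal vertex has exactly m children."""
    return all(len(kids) == m for kids in children.values() if kids)


# Hypothetical trees, given as child lists of their internal vertices.
full_binary = {"r": ["x", "y"], "x": ["p", "q"]}
mixed = {"r": ["x", "y"], "x": ["p", "q", "s"]}

print(is_full_m_ary(full_binary, 2))  # True
print(is_m_ary(mixed, 3))             # True
print(is_full_m_ary(mixed, 3))        # False: r has only two children
```

As with T4 in Example 3, the tree named mixed is not a full m-ary tree for any m, because its internal vertices have different numbers of children.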
EXAMPLE 4 : What are the left and right children of d in the binary tree T shown in Figure
8(a) (where the order is that implied by the drawing)? What are the left and right subtrees of
c?
Solution: The left child of d is f and the right child is g. We show the left and right subtrees
of c in Figures 8(b) and 8(c), respectively.
Representing Organizations
The structure of a large organization can be modeled using a rooted tree. Each vertex in this
tree represents a position in the organization. An edge from one vertex to another indicates
that the person represented by the initial vertex is the (direct) boss of the person represented
by the terminal vertex. The graph shown in Figure 10 displays such a tree. In the organization
represented by this tree, the Director of Hardware Development works directly for the Vice
President of R&D. The root of this tree is the vertex representing the President of the
organization.
Computer File Systems
Files in computer memory can be organized into directories. A directory can contain both
files and subdirectories. The root directory contains the entire file system.
Thus, a file system may be represented by a rooted tree, where the root represents the root
directory, internal vertices represent subdirectories, and leaves represent ordinary files or
empty directories. One such file system is shown in Figure 11. In this system, the file khr is
in the directory rje. (Note that links to files where the same file may have more than one
pathname can lead to circuits in computer file systems.)
Exercises
1) Which of these graphs are trees?
5) Is the rooted tree in Exercise 3 a full m-ary tree for some positive integer m?
6) Is the rooted tree in Exercise 4 a full m-ary tree for some positive integer m?
7) Draw the subtree of the tree in Exercise 3 that is rooted at
i) a. ii) c. iii) e.
8) Draw the subtree of the tree in Exercise 4 that is rooted at
i) a. ii) c. iii) e.
Applications of Trees
Introduction
The first problem is: How should items in a list be stored so that an item can be easily
located? The second problem is: How should a set of characters be efficiently coded by bit
strings?
Solution: Figure 1 displays the steps used to construct this binary search tree.
The word mathematics is the key of the root. Because physics comes after
mathematics (in alphabetical order), add a right child of the root with key physics.
Because geography comes before mathematics, add a left child of the root with key
geography. Next, add a right child of the vertex with key physics, and assign it the
key zoology, because zoology comes after mathematics and after physics.
Similarly, add a left child of the vertex with key physics and assign this new vertex
the key meteorology.
Add a right child of the vertex with key geography and assign this new vertex the key
geology.
Add a left child of the vertex with key zoology and assign it the key psychology.
Add a left child of the vertex with key geography and assign it the key chemistry.
(The reader should work through all the comparisons needed at each step.)
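The insertion procedure described in this solution can be sketched as follows. This is a minimal illustration, assuming keys are compared with Python's built-in string ordering (which agrees with alphabetical order for lowercase words). The class and function names are illustrative.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into the binary search tree rooted at root; return the root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:  # smaller keys go into the left subtree
            if node.left is None:
                node.left = Node(key)
                break
            node = node.left
        elif key > node.key:  # larger keys go into the right subtree
            if node.right is None:
                node.right = Node(key)
                break
            node = node.right
        else:
            break  # key already present; nothing to add
    return root


def contains(root, key):
    """Locate key by following one path down from the root."""
    node = root
    while node is not None and node.key != key:
        node = node.left if key < node.key else node.right
    return node is not None


words = ["mathematics", "physics", "geography", "zoology",
         "meteorology", "geology", "psychology", "chemistry"]
root = None
for word in words:
    root = insert(root, word)

print(root.key, root.left.key, root.right.key)  # mathematics geography physics
print(root.right.right.left.key)                # psychology
print(contains(root, "geology"))                # True
```

Each insertion or lookup follows a single path from the root, so the number of comparisons is at most the number of vertices on the longest root-to-leaf path.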
Prefix Codes
Consider using bit strings of different lengths to encode letters. Letters that occur
more frequently should be encoded using short bit strings, and longer bit strings
should be used to encode rarely occurring letters.
When letters are encoded using varying numbers of bits, some method must be used
to determine where the bits for each character start and end.
For instance, if e were encoded with 0, a with 1, and t with 01, then the bit string 0101
could correspond to eat, tea, eaea, or tt.
One way to ensure that no bit string corresponds to more than one sequence of letters
is to encode letters so that the bit string for a letter never occurs as the first part of the
bit string for another letter. Codes with this property are called prefix codes.
For instance, the encoding of e as 0, a as 10, and t as 11 is a prefix code. A word can
be recovered from the unique bit string that encodes its letters.
For example, the string 10110 is the encoding of ate. To see this, note that the initial 1
does not represent a character, but 10 does represent a (and could not be the first part
of the bit string of another letter).
Then, the next 1 does not represent a character, but 11 does represent t. The final bit,
0, represents e.
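Both codes discussed above can be examined with a short sketch. The function names are illustrative, and the code table is assumed to be a dictionary from letters to nonempty bit strings.

```python
def is_prefix_code(code):
    """True if no codeword is the first part of another letter's codeword."""
    words = list(code.values())
    return not any(
        i != j and w2.startswith(w1)
        for i, w1 in enumerate(words)
        for j, w2 in enumerate(words)
    )


def all_decodings(bits, code):
    """Every letter sequence whose encoding is exactly bits."""
    if not bits:
        return [""]
    results = []
    for letter, word in code.items():
        if bits.startswith(word):
            rest = all_decodings(bits[len(word):], code)
            results.extend(letter + tail for tail in rest)
    return results


ambiguous = {"e": "0", "a": "1", "t": "01"}
prefix = {"e": "0", "a": "10", "t": "11"}

print(is_prefix_code(ambiguous))                # False
print(sorted(all_decodings("0101", ambiguous)))  # ['eaea', 'eat', 'tea', 'tt']
print(is_prefix_code(prefix))                   # True
print(all_decodings("10110", prefix))           # ['ate']
```

For the prefix code the list of decodings has a single entry, which is the uniqueness property described above.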
Example 2: Draw a binary tree that represents the encoding of e by 0, a by 10, t by 110, n by
1110, and s by 1111.
Solution: Figure 5 shows the corresponding binary tree.
Decoding Bit Strings
Each character is encoded with the bit string formed by the labels of the edges on the unique
path from the root to the leaf labeled with that character.
Example 3: Decode the string encoded by 11111011100 using the code in Figure 5.
Solution: This bit string can be decoded by starting at the root, using the sequence of bits to
form a path that stops when a leaf is reached. Consequently, the initial 1111 corresponds to
the path starting at the root, going right four times, leading to a leaf in the graph that has s as
its label, because the string 1111 is the code for s.
Continuing with the fifth bit, we reach a leaf next after going right then left, when the vertex
labeled with a, which is encoded by 10, is visited.
Starting with the seventh bit, we reach a leaf next after going right three times and then left,
when the vertex labeled with n, which is encoded by 1110, is visited.
Finally, the last bit, 0, leads to the leaf that is labeled with e. Therefore, the original word is
sane.
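The decoding walk can be sketched with the tree stored as nested dictionaries. This representation is an illustrative choice, not the text's: each internal vertex is a dictionary keyed by the edge label "0" (left) or "1" (right), and each leaf is the letter itself. The function names are hypothetical.

```python
def build_tree(code):
    """Build the binary tree of a prefix code as nested dictionaries."""
    root = {}
    for letter, bits in code.items():
        node = root
        for bit in bits[:-1]:
            node = node.setdefault(bit, {})  # create internal vertices as needed
        node[bits[-1]] = letter              # the last edge leads to the leaf
    return root


def decode(bits, root):
    """Follow edges from the root; emit a letter and restart at each leaf."""
    letters = []
    node = root
    for bit in bits:
        node = node[bit]
        if isinstance(node, str):  # reached a leaf
            letters.append(node)
            node = root
    if node is not root:
        raise ValueError("bit string ends in the middle of a codeword")
    return "".join(letters)


code = {"e": "0", "a": "10", "t": "110", "n": "1110", "s": "1111"}
tree = build_tree(code)
print(decode("11111011100", tree))  # sane
```

Because the code is a prefix code, every letter sits at a leaf, so the walk never has to guess where a codeword ends.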
HUFFMAN CODING
Huffman coding takes as input the frequencies (which are the probabilities of occurrences) of
symbols in a string and produces as output a prefix code that encodes the string using the
fewest possible bits, among all possible binary prefix codes for these symbols. The algorithm
was developed by David Huffman in a term paper he wrote in 1951 while a graduate
student at MIT. (Note that this algorithm assumes that we already know how many times
each symbol occurs in the string, so we can compute the frequency of each symbol by
dividing the number of times this symbol occurs by the length of the string.)
Huffman coding is a fundamental algorithm in data compression, the subject devoted to
reducing the number of bits required to represent information.
Huffman coding is extensively used to compress bit strings representing text and it also plays
an important role in compressing audio and image files.
Example 4: Use Huffman coding to encode the following symbols with the frequencies
listed: A: 0.08, B: 0.10, C: 0.12, D: 0.15, E: 0.20, F: 0.35. What is the average number of bits
used to encode a character?
Solution: Figure 6 displays the steps used to encode these symbols.
The encoding produced encodes A by 111, B by 110, C by 011, D by 010, E by 10, and F by
00.
The average number of bits used to encode a symbol using this encoding is
3 × 0.08 + 3 × 0.10 + 3 × 0.12 + 3 × 0.15 + 2 × 0.20 + 2 × 0.35 = 2.45.
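The construction can be sketched with a priority queue. The steps of Figure 6 are not reproduced here, so the labeling convention is an assumption: when the two trees of least weight are merged, the tree of larger weight becomes the left subtree (edge label 0) and the other the right subtree (edge label 1). With that convention the sketch reproduces the codewords listed in the solution. The function name is illustrative.

```python
import heapq
from itertools import count


def huffman_code(freqs):
    """Return a dict mapping each symbol to its Huffman codeword."""
    tie = count()  # unique counter so the heap never has to compare trees
    heap = [(w, next(tie), sym) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w_small, _, t_small = heapq.heappop(heap)  # least weight
        w_large, _, t_large = heapq.heappop(heap)  # second least weight
        # Assumed convention: larger weight on the left (0), smaller on the right (1).
        heapq.heappush(heap, (w_small + w_large, next(tie), (t_large, t_small)))
    _, _, tree = heap[0]

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):  # internal vertex: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                        # leaf: a symbol
            codes[node] = prefix or "0"

    walk(tree, "")
    return codes


freqs = {"A": 0.08, "B": 0.10, "C": 0.12, "D": 0.15, "E": 0.20, "F": 0.35}
codes = huffman_code(freqs)
average = sum(freqs[s] * len(codes[s]) for s in freqs)
print(codes["A"], codes["B"], codes["C"], codes["D"], codes["E"], codes["F"])
# 111 110 011 010 10 00
print(round(average, 2))  # 2.45
```

A different labeling convention would change the individual codewords but not their lengths for these frequencies, so the average of 2.45 bits per symbol is unaffected.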
Exercises
1) Build a binary search tree for the words banana, peach, apple, pear, coconut, mango,
and papaya using alphabetical order.
2) Build a binary search tree for the words oenology, phrenology, campanology,
ornithology, ichthyology, limnology, alchemy, and astrology using alphabetical order.
3) Which of these codes are prefix codes?
i) a: 11, e: 00, t: 10, s: 01
ii) a: 0, e: 1, t: 01, s: 001
iii) a: 101, e: 11, t: 001, s: 011, n: 010
iv) a: 010, e: 11, t: 011, s: 1011, n: 1001, i: 10101
4) Construct the binary tree with prefix codes representing these coding schemes.
i) a: 11, e: 0, t: 101, s: 100
ii) a: 1, e: 01, t: 001, s: 0001, n: 00001
iii) a: 1010, e: 0, t: 11, s: 1011, n: 1001, i: 10001
5) What are the codes for a, e, i, k, o, p, and u if the coding scheme is represented by this
tree?
6) Given the coding scheme a: 001, b: 0001, e: 1, r: 0000, s: 0100, t: 011, x: 01010, find
the word represented by
i) 01110100011. ii) 0001110000.
iii) 01100101010.
7) Use Huffman coding to encode these symbols with given frequencies: a: 0.20, b: 0.10,
c: 0.15, d: 0.25, e: 0.30. What is the average number of bits required to encode a
character?
8) Use Huffman coding to encode these symbols with given frequencies: A: 0.10, B:
0.25, C: 0.05, D: 0.15, E: 0.30, F: 0.07, G: 0.08. What is the average number of bits
required to encode a symbol?
Traversal Algorithms
Procedures for systematically visiting every vertex of an ordered rooted tree are called
traversal algorithms.
Three of the most commonly used algorithms are:
1) Preorder Traversal
2) Inorder Traversal
3) Postorder Traversal
Each of these algorithms can be defined recursively.
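1) Preorder Traversal
Definition: Let T be an ordered rooted tree with root r. If T consists only of r, then r is the
preorder traversal of T. Otherwise, suppose that T1, T2, ..., Tn are the subtrees at r from left
to right in T. The preorder traversal begins by visiting r. It continues by traversing T1 in
preorder, then T2 in preorder, and so on, until Tn is traversed in preorder.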
EXAMPLE 1 In which order does a preorder traversal visit the vertices in the ordered rooted
tree T shown in Figure 3?
2) Inorder Traversal
Solution: The steps of the inorder traversal of the ordered rooted tree T are shown in Figure
6.
The inorder traversal begins with an inorder traversal of the subtree with root b, the
root a, the inorder listing of the subtree with root c, which is just c, and the inorder
listing of the subtree with root d.
The inorder listing of the subtree with root b begins with the inorder listing of the
subtree with root e, the root b, and f.
The inorder listing of the subtree with root d begins with the inorder listing of the
subtree with root g, followed by the root d, followed by h, followed by i.
The inorder listing of the subtree with root e is j, followed by the root e, followed by
the inorder listing of the subtree with root k.
The inorder listing of the subtree with root g is l, g, m.
The inorder listing of the subtree with root k is n, k, o, p.
Consequently, the inorder listing of the ordered rooted tree is j, e, n, k, o, p, b, f, a, c,
l, g, m, d, h, i.
3) Postorder Traversal.
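Definition: Let T be an ordered rooted tree with root r. If T consists only of r, then r is the
postorder traversal of T. Otherwise, suppose that T1, T2, ..., Tn are the subtrees at r from left
to right. The postorder traversal begins by traversing T1 in postorder, then T2 in postorder,
and so on, until Tn is traversed in postorder, and ends by visiting r.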
EXAMPLE 4 : In which order does a postorder traversal visit the vertices of the ordered
rooted tree T shown in Figure 3?
Solution: The steps of the postorder traversal of the ordered rooted tree T are shown in
Figure 8.
The postorder traversal begins with the postorder traversal of the subtree with root b,
the postorder traversal of the subtree with root c, which is just c, the postorder
traversal of the subtree with root d, followed by the root a.
The postorder traversal of the subtree with root b begins with the postorder traversal
of the subtree with root e, followed by f, followed by the root b.
The postorder traversal of the rooted tree with root d begins with the postorder
traversal of the subtree with root g, followed by h, followed by i, followed by the root
d.
The postorder traversal of the subtree with root e begins with j, followed by the
postorder traversal of the subtree with root k, followed by the root e.
The postorder traversal of the subtree with root g is l, m, g.
The postorder traversal of the subtree with root k is n, o, p, k.
Therefore, the postorder traversal of T is j, n, o, p, k, e, f, b, c, l, m, g, h, i, d, a.
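All three traversals can be written as short recursive functions. In the sketch below, the child lists of T are inferred from the inorder and postorder walkthroughs above, since Figure 3 is not reproduced here. The functions follow the usual recursive definitions: preorder visits the root first, inorder visits it after the first subtree, and postorder visits it last. The names are illustrative.

```python
# Child lists of T, in left-to-right order (inferred from the walkthroughs).
children = {
    "a": ["b", "c", "d"],
    "b": ["e", "f"],
    "d": ["g", "h", "i"],
    "e": ["j", "k"],
    "g": ["l", "m"],
    "k": ["n", "o", "p"],
}


def preorder(v):
    result = [v]  # visit the root first
    for c in children.get(v, []):
        result.extend(preorder(c))
    return result


def inorder(v):
    kids = children.get(v, [])
    if not kids:
        return [v]
    result = inorder(kids[0]) + [v]  # first subtree, then the root
    for c in kids[1:]:               # then the remaining subtrees
        result.extend(inorder(c))
    return result


def postorder(v):
    result = []
    for c in children.get(v, []):
        result.extend(postorder(c))
    result.append(v)  # visit the root last
    return result


print(", ".join(preorder("a")))   # a, b, e, j, k, n, o, p, f, c, d, g, l, m, h, i
print(", ".join(inorder("a")))    # j, e, n, k, o, p, b, f, a, c, l, g, m, d, h, i
print(", ".join(postorder("a")))  # j, n, o, p, k, e, f, b, c, l, m, g, h, i, d, a
```

The inorder and postorder outputs agree with the listings derived step by step above.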
Exercises
1) Determine the preorder, inorder, and postorder traversals of the following
ordered rooted trees.