Trees
Unit 8: Trees
Dr Fatimah Adamu-Fika
Faculty of Computing
Kaduna, Nigeria
© 2020-2022
CSC 204: FUNDAMENTALS OF DATA STRUCTURES
UNIT 4: TREES
TABLE OF CONTENTS
Binary tree
Types of Binary Tree
Balanced vs Non-Balanced Binary Trees
Binary Tree Representation
Array Binary Tree
Linked Representation of Tree
Summary
Real Life Applications of Tree
Notes and What Comes Later
Exercises
OVERVIEW, CONCEPTS AND TERMINOLOGIES
[Figure: a tree with root A at level 0; nodes B, C, D at level 1; nodes E, F, G, H at level 2; and nodes I, J at level 3. The figure labels the root, siblings, leaf (terminal/external) nodes, height and depth.]
• The height of a node is the number of edges from the node to the deepest leaf (i.e., the longest
path from that node to a leaf node).
• The depth (or level) of a node is the number of edges from the root to the node, i.e., the
distance between the root and the node in question.
o A tree consisting of a single node has depth of 0.
o By convention, an empty tree has a depth of –1.
• The level of a node represents its generation. If the root node is at level 0, then its
children are at level 1, its grandchildren are at level 2, and so on.
• The height of a tree is the height of the root node or the depth of the deepest node, i.e., the
distance (edge count) from the root to the farthest leaf.
• The degree of a node is the total number of branches (children) of that node.
• Path refers to the sequence of nodes along the edges of a tree.
• A collection of disjoint trees is called a forest.
• You can create a forest by removing the root of a tree.
• Subtree represents the descendants of a node.
• Traversing means passing through nodes in a specific order.
• Visiting refers to checking the value of a node when control is on the node.
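To make the definitions of height, depth and degree concrete, the following minimal Python sketch computes them for a small general tree. The class name `TreeNode`, the function names and the parent-child links between the labelled nodes are assumptions made for illustration; they are not taken from the course figure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TreeNode:
    """A node of a general tree: a label plus any number of children."""
    label: str
    children: List["TreeNode"] = field(default_factory=list)


def height(node: Optional[TreeNode]) -> int:
    """Edges on the longest downward path from node to a leaf (-1 if empty)."""
    if node is None:
        return -1
    # A leaf has no children, so max(...) falls back to -1 and the result is 0.
    return 1 + max((height(child) for child in node.children), default=-1)


def depth(root: Optional[TreeNode], label: str, level: int = 0) -> Optional[int]:
    """Edges from root down to the node with this label, or None if absent."""
    if root is None:
        return None
    if root.label == label:
        return level
    for child in root.children:
        found = depth(child, label, level + 1)
        if found is not None:
            return found
    return None


def degree(node: TreeNode) -> int:
    """Number of children (branches) of a node."""
    return len(node.children)


# Hypothetical tree with four levels (edges are assumed for illustration).
root = TreeNode("A", [
    TreeNode("B", [TreeNode("E", [TreeNode("I"), TreeNode("J")]), TreeNode("F")]),
    TreeNode("C", [TreeNode("G")]),
    TreeNode("D", [TreeNode("H")]),
])

print(height(root))      # height of the tree = height of the root
print(depth(root, "J"))  # depth (level) of node J
print(degree(root))      # degree of the root
```

Note that the height of the tree equals the depth of its deepest node, which matches the definition given in the list above.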
TREE ADT
A tree is an Abstract Data Type that follows a hierarchical pattern for data allocation and manipulation.
Generally, a tree allows nodes to contain an arbitrary number of elements, and it allows each node to
have an arbitrary number of children. Hence, we can specify the ADT for an arbitrary collection of
objects that follows a hierarchical pattern in terms of the set of operations on those objects. We can
identify some possible operations of a tree as follows:
TRAVERSING A TREE
In order to perform any operation on a tree, we need to reach the specific node. A tree traversal
algorithm helps in visiting a required node in the tree. There might be different reasons why we may
need to traverse a tree; for example, we might be looking for the smallest value in the tree or we might
be interested in finding the average of all the values in the tree. Either of these operations will require
us to visit each node of the tree.
Linear structures like arrays, linked lists, stacks and queues have only one way of reading the data within
them, i.e., elements are read sequentially in one order. But a hierarchical data structure like a tree can
be traversed in various ways, because there is more than one possible next element from a given node.
Because of the hierarchical nature of a tree, and because nodes are visited in some sequence rather than
in parallel, some nodes must be deferred, i.e., stored in some way and visited at a later time. This is
usually done with the aid of a stack or queue.
There are two common ways of traversing a tree: Depth First Search (DFS) and Breadth First Search (BFS).
Depth-first search is easily implemented via a stack while breadth-first search is easily implemented via
a queue.
• DFS: The search goes as deep as possible down each child before moving on to the next sibling,
which is why it is referred to as depth-first search.
• BFS: Trees are traversed in level-order, where we visit every node on a level before going to a
lower level. This search is referred to as breadth-first search (BFS), as the search tree is
broadened as much as possible on each depth before going to the next depth.
• Binary trees commonly use DFS traversal; thus, this will be our focus.
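The difference between the two strategies can be sketched in a few lines of Python. This is a minimal illustration, assuming the tree is stored as a dictionary that maps each node label to the list of its children; the labels and edges below are hypothetical. The only structural difference between the two functions is the container used to hold deferred nodes: a stack for DFS and a queue for BFS.

```python
from collections import deque
from typing import Dict, List


def dfs(tree: Dict[str, List[str]], root: str) -> List[str]:
    """Depth-first traversal using an explicit stack of deferred nodes."""
    order = []
    stack = [root]
    while stack:
        node = stack.pop()          # most recently deferred node first
        order.append(node)
        # Push children in reverse so the leftmost child is popped first.
        stack.extend(reversed(tree.get(node, [])))
    return order


def bfs(tree: Dict[str, List[str]], root: str) -> List[str]:
    """Breadth-first (level-order) traversal using a queue of deferred nodes."""
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()      # oldest deferred node first
        order.append(node)
        queue.extend(tree.get(node, []))
    return order


# Hypothetical general tree: each key lists its children from left to right.
tree = {
    "A": ["B", "C", "D"],
    "B": ["E", "F"],
    "C": ["G"],
    "D": ["H"],
    "E": ["I", "J"],
}

print(dfs(tree, "A"))  # ['A', 'B', 'E', 'I', 'J', 'F', 'C', 'G', 'D', 'H']
print(bfs(tree, "A"))  # ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
```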
There are many possible implementations of the tree ADT. One possible implementation is to consider
a linked implementation using a singly linked list. We can distinguish between two different kinds of
tree:
SPECIALISED TREES
A tree is characterised by the number of children a node can have, and the orientation imposed on the
nodes and their children.
A general tree is characterised by the lack of any arrangement or limitations on the number of children
a node can have. This means a node can have any number of children, and the orientation of the tree
can be any combination of these. Thus, the degree of the nodes can range from 0 to any number.
Any tree with a hierarchical structure can be described as a general tree. Figure 2 depicts a typical
general tree, with node ‘a’ as the root node.
[Figure 2: a general tree with root node a and nodes b, c, d, e, f, g, h.]
In this course our discussions will focus on trees with nodes that have a maximum of two children.
These types of trees are called binary trees.
Often, a particular application will require a more specialised structure. In this case, we can design and
implement a specialised tree ADT. Our discussions will eventually narrow down to a specialised binary
tree called a binary search tree.
BINARY TREE
A binary tree is a tree data structure in which each parent node can have at most two children. Hence,
the degree of the nodes ranges from 0 to 2.
Each node in a binary tree is represented by three fields: the data, a link to its left child, and a link
to its right child. Figure 1 above is an example of a binary tree.
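The three-field node can be sketched in Python as follows. This is a minimal illustration, not the course's reference implementation; the names `Node`, `data`, `left`, `right` and `size` are assumptions chosen for readability.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Node:
    """A binary tree node: the data plus links to the left and right children."""
    data: Any
    left: Optional["Node"] = None    # None means there is no left child
    right: Optional["Node"] = None   # None means there is no right child


def size(node: Optional[Node]) -> int:
    """Number of nodes in the (sub)tree rooted at node."""
    if node is None:
        return 0
    return 1 + size(node.left) + size(node.right)


# A small hypothetical tree: a has children b and c; c has a left child d.
root = Node("a", Node("b"), Node("c", Node("d")))
print(size(root))        # 4
print(root.left.left)    # None, because b is a leaf
```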
Binary trees are highly functional ADTs that can be subdivided into a variety of types.
Depending on how nodes are arranged in a binary tree, it can be full (or proper), complete or perfect:
• Full binary tree: when every node has either 0 or 2 children (but never 1), such a binary
tree is full. A full binary tree is also known as a proper binary tree.
• Complete binary tree: when all levels except the last one are full of nodes, i.e., all the levels are
completely filled except possibly the lowest one, which is filled from the left, the binary tree is
complete. Hence, all leaf nodes on the last level of a complete binary tree must lean towards the
left; this means the last leaf node may not have a right sibling.
• Perfect binary tree: when all the levels (including the last one) are full of nodes, i.e., every
internal node has exactly two child nodes and all the leaf nodes are at the same level, the binary
tree is perfect. All the internal nodes of a perfect binary tree have a degree of 2.
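As an illustration, the three definitions above can be turned into simple checks. The sketch below is one possible way to write them in Python; the `Node` class and all function names are assumptions made for this example.

```python
from collections import deque


class Node:
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right


def is_full(node):
    """Every node has either 0 or 2 children."""
    if node is None:
        return True
    if (node.left is None) != (node.right is None):
        return False                       # exactly one child
    return is_full(node.left) and is_full(node.right)


def is_complete(root):
    """Level by level, no node may appear after the first missing position."""
    queue = deque([root])
    seen_gap = False
    while queue:
        node = queue.popleft()
        if node is None:
            seen_gap = True
        elif seen_gap:
            return False                   # a node found after an empty slot
        else:
            queue.append(node.left)
            queue.append(node.right)
    return True


def height(node):
    return -1 if node is None else 1 + max(height(node.left), height(node.right))


def count(node):
    return 0 if node is None else 1 + count(node.left) + count(node.right)


def is_perfect(root):
    """A perfect tree of height h has exactly 2**(h + 1) - 1 nodes."""
    return count(root) == 2 ** (height(root) + 1) - 1


perfect = Node("a", Node("b", Node("d"), Node("e")), Node("c", Node("f"), Node("g")))
complete_only = Node("a", Node("b", Node("d")), Node("c"))
full_only = Node("a", Node("b"), Node("c", Node("d"), Node("e")))

print(is_full(perfect), is_complete(perfect), is_perfect(perfect))
print(is_full(complete_only), is_complete(complete_only), is_perfect(complete_only))
print(is_full(full_only), is_complete(full_only), is_perfect(full_only))
```

The second and third example trees show that a tree can be complete without being full, and full without being complete.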
These properties are not always mutually exclusive; a tree can have more than one:
A binary tree of depth d is balanced if every node at depths 0, 1, …, d–2 has two children. A node at
depth d–1 may have two, one, or no children. By definition every node at depth d has no children. A
balanced binary tree of depth d has at least $2^d$ and at most $2^{d+1} - 1$ nodes. Conversely, a
balanced binary tree with n nodes has depth $\lfloor \log_2 n \rfloor$.
An ill-balanced binary tree of depth d could have as few as d+1 nodes. Conversely, a binary tree with
n nodes could have a depth as large as n – 1.
Thus, a balanced binary tree is a specialised tree in which the difference between the heights of the left
subtree and right subtree of any node is not more than 1. A balanced binary tree is also called a
height-balanced binary tree. A balanced binary tree is governed by the following factors:
o the difference between the heights of the left and the right subtree of any node is not more than one.
o the left subtree is balanced.
o the right subtree is balanced.
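The three conditions above translate directly into a recursive check. The following Python sketch is one way to do this, assuming a simple `Node` class and treating the height of an empty subtree as -1; the names are illustrative only.

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right


def _balanced_and_height(node):
    """Return (is_balanced, height) for the subtree rooted at node."""
    if node is None:
        return True, -1
    left_ok, left_height = _balanced_and_height(node.left)
    right_ok, right_height = _balanced_and_height(node.right)
    # Balanced if both subtrees are balanced and their heights differ by <= 1.
    ok = left_ok and right_ok and abs(left_height - right_height) <= 1
    return ok, 1 + max(left_height, right_height)


def is_balanced(root):
    return _balanced_and_height(root)[0]


balanced = Node("a", Node("b", Node("d")), Node("c"))
chain = Node("a", Node("b", Node("c")))   # every node has only a left child

print(is_balanced(balanced))  # True
print(is_balanced(chain))     # False: the root's subtrees differ in height by 2
```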
[Figure: example binary trees in which each node is annotated with its height difference (hdf); the values shown range from 0 to 2.]
In addition to the properties of a tree, a binary tree consists of nodes connected by a maximum of two
links (i.e., edges) in a hierarchy, plus a header (i.e., address to the root node). Every node other than
the root node is the left or right child of exactly one other node (its parent). The size of a binary tree is
the number of nodes (elements). In a binary tree, there is exactly one sequence of links between the
root node and any node N. Both links of a leaf node are null. The header of an empty binary tree is null.
If the index of any element in the array is i, the element at index 2i+1 is its left child and the
element at index 2i+2 is its right child. Also, the parent of any element at index i is given by
the floor of (i–1)/2. In order to represent a tree using an array, the numbering of nodes can run
from 0 to n – 1, where n is the size of the tree, i.e., the number of nodes in the tree.
Thus, for a node at index i, $\text{left}(i) = 2i + 1$ and $\text{right}(i) = 2i + 2$.
Conversely, for a non-root node at index i, $\text{parent}(i) = \lfloor (i - 1)/2 \rfloor$.
The numbers attached to the nodes of the binary tree in Figure 6 correspond to their positions
(indices) in the array representation of the tree. Node b has no left child, so index 3 will be null; similarly,
positions 9 and 10 (left and right children of node d), positions 11 and 12 (left and right children of node
e) and position 13 (left child of node g) will all be null, since there are no elements in those positions.
[Array with positions 0 to 14 holding the nodes a to g of Figure 6; positions without a node are null.]
FIGURE 7. ARRAY REPRESENTATION OF THE BINARY TREE DEPICTED IN FIGURE 6.
The array representation works best with a complete binary tree, as all leaf nodes are at lowest level
towards the left. Thus, there will not be empty positions between elements in the array.
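The index arithmetic for the array representation can be sketched as follows. This is a minimal Python illustration with assumed function names; the two example arrays are hypothetical and are not the tree of Figure 6. `None` plays the role of null for an empty position.

```python
from typing import List, Optional


def left_index(i: int) -> int:
    return 2 * i + 1


def right_index(i: int) -> int:
    return 2 * i + 2


def parent_index(i: int) -> Optional[int]:
    """Floor of (i - 1) / 2; the root (index 0) has no parent."""
    return None if i == 0 else (i - 1) // 2


def element_at(tree: List[Optional[str]], i: int) -> Optional[str]:
    """Element stored at index i, or None if the position is empty or absent."""
    return tree[i] if 0 <= i < len(tree) else None


# A hypothetical complete binary tree: no gaps between elements.
tree = ["a", "b", "c", "d", "e", "f", "g"]
# A hypothetical tree in which b has no left child: index 3 stays empty.
sparse = ["a", "b", "c", None, "d"]

print(element_at(tree, left_index(1)))     # left child of b
print(element_at(tree, right_index(2)))    # right child of c
print(parent_index(6))                     # index of the parent of g
print(element_at(sparse, left_index(1)))   # None: b has no left child
```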
Understanding how array indices map to tree positions is crucial to understanding how a
data structure called the Heap works and how it is used to implement a sorting algorithm
called Heap Sort.
A linked structure is a natural way to represent the binary tree data structure.
[Figure: linked representation of a binary tree, showing nodes d, e, f and g, with the leaf nodes labelled.]
A binary search tree (BST) is a particular application of the binary tree. It is a binary tree in which
every node has only values smaller than its own in its left subtree and only values greater than its
own in its right subtree.
Thus, a BST has the following property: For every node in the binary tree, if that node contains element
elem:
• The node’s left sub-tree contains only elements less than elem (or is empty).
• The node’s right sub-tree contains only elements greater than elem (or is empty).
• Duplicates: Some BSTs do not allow duplicates, while others add equal values as a right child.
Other implementations keep a count of duplicates in the node (we are going to do this one
later).
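The organisation of a BST allows for faster searching, lookup and data manipulation. When the tree is
balanced, each comparison discards roughly half of the remaining elements, much like binary search
on a sorted array, so a search takes O(log n) time rather than the O(n) of a linear search. This makes
BSTs widely applicable in searching and sorting applications.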
TYPES OF BST
Although in this discussion we will be focusing on general binary search trees, we will mention some
specialised types of BSTs but will not go into detailed review.
AVL
An AVL tree is a self-balancing binary search tree in which each node maintains extra information called a
balance factor whose value is either -1, 0 or +1. The AVL tree is named after its inventors, Georgy
Adelson-Velsky and Evgenii Landis.
The balance factor is the difference in heights between the left and right subtrees. The tree is
rebalanced when the height difference exceeds one. AVL trees are height-balanced and are
rebalanced through single or double rotations.
RED-BLACK TREES
A red-black tree is a self-balancing binary search tree similar to an AVL tree, although it is only
approximately height-balanced. It differs from an AVL tree in that it can be rebalanced in at most
three rotations per insertion or deletion, and each node maintains extra information that defines
whether the node is black or red. These colours are what ensure that the tree remains balanced
during insertions and deletions.
B-TREES
A B-tree is a special type of self-balancing search tree in which each node can contain more than one key
and can have more than two children. It is a generalised form of the binary search tree. It is also known
as a height-balanced m-way tree.
BASIC BEHAVIOUR OF A BST
The basic operations that can be performed on a BST data structure are the following:
INSERTION OPERATION
The very first insertion creates the tree. Afterwards, whenever an element is to be added, its proper
location is first located. Start searching from the root node, then if the element is less than the node's
value, search for an empty location in the left subtree and insert the data. Otherwise, search for an
empty location in the right subtree and insert the data.
1. If the tree is empty, the first node becomes the root, and we are done.
2. Compare the new value with the current node's value: if the new value is higher, go right; if it is
lower, go left. If it is the same, the value already exists, so we can increase the duplicate count
(multiplicity).
3. Repeat #2 until we find an empty slot to insert the new node.
4. Insert the node:
a. If it's going to be a left child, connect the parent's left link to it. Otherwise, connect the
parent's right link to it.
Example 1: Given an empty BST, insert the following elements into it: 35, 45, 15, 20.
[Figure: the BST after each insertion (Steps 1 to 4). The final tree has root 35 with left child 15 and right child 45, and 20 as the right child of 15.]
1. Since we have an empty tree, the first insertion will create 35 as the root node.
2. The second element, 45, will become the right child of the root, since 45 is greater than 35.
3. 15 will become the left child of the root, because it is less than the root.
4. 20 will pass down the left subtree of the root because it is less than the root. Since it is greater
than 15, its proper location is the right child of 15.
DELETION OPERATION
To delete a node, first, we locate its place in the tree. We start searching from the root node. If the
search value is less than the node's value, we look down the left subtree; if it is greater, we look
down the right subtree. If we find the element, we remove it.
If a node exists, removing it is a little trickier than adding a new one, so let's explain it with the
following cases.
c. We adjust the link from its parent to connect to its left child, which thus becomes the new
parent of its sibling.
d. Then, we adjust the right link of the left child to point to its sibling.
4. Deleting the root.
a. Deleting the root is very similar to removing nodes with 0, 1, or 2 children discussed
earlier. The only difference is that afterward, we need to update the reference of the
root of the tree.
Example 2: Given a BST with elements: 35, 45, 15, 20, 17, 55, 40 (shown in Figure 10), we are going to
perform the different scenarios for BST Node deletion:
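[Figure 10: a BST with root 35, whose left and right children are 15 and 45; 20 is the right child of 15; 17 is the left child of 20; 40 and 55 are the left and right children of 45.]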
1. We will delete node 17, a leaf node, demonstrated in Figure 11.
2. We will delete node 15, a node with 1 child, shown in Figure 12.
3. We will delete node 45, a node with 2 children, depicted in Figure 13.
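[Figure 11: the tree before and after deleting the leaf node 17.]
[Figure 12: the tree before and after deleting node 15; node 20 takes its place as the left child of 35.]
a. adjust the link from its parent node to connect with its child
[Figure 13: the tree before and after deleting node 45 using its right child; node 55 takes its place as the right child of 35, with 40 as its left child.]
a. adjust the link from its parent node to connect with its right child
b. adjust the left link of the right child to connect with its former sibling
[Figure: the tree before and after deleting node 45 using its left child instead; node 40 takes its place as the right child of 35, with 55 as its right child.]
a. adjust the link from its parent node to connect with its left child
b. adjust the right link of the left child to connect with its former sibling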
Traversing the tree means visiting all nodes (elements) of the tree in some order. We can visit the root,
traverse its left subtree and its right subtree in three different ways:
• In-order: this traverses the left sub-tree first, then visits the root, and then traverses the right
sub-tree. In the list generated from this traversal, each parent node appears between its
children, i.e., its left child and all the descendants in its left subtree appear to its left,
while its right child and all the descendants in its right subtree appear to its right.
• Pre-order: this visits the root first, then traverses the left sub-tree, and then traverses the right
sub-tree. In a pre-order traversal list of nodes, a parent appears before all its
descendants. Thus, the root node is always the first node in the list.
• Post-order: this traverses the left sub-tree first, then traverses the right sub-tree, and then
visits the root last. In a post-order traversal list, a parent node always appears after all
its descendants. So, the root node is always the last node in the list.
Example 4: Given the BST with elements 35, 45, 15, 20, 17, 55, 40 in Figure 10, traversing the tree
will give the following sequences of nodes:
• In-order: 15, 17, 20, 35, 40, 45, 55
• Pre-order: 35, 15, 20, 17, 45, 40, 55
• Post-order: 17, 20, 15, 40, 55, 45, 35
TRAVERSAL ALGORITHMS
We can easily traverse a tree either by using stacks or recursion. Traversing recursively can be done in
three steps.
IN-ORDER TRAVERSAL
Traversing a tree in-order by recursion entails:
1. Recursively traverse the left subtree until all nodes in the subtree are visited.
2. Visit the root node.
3. Recursively traverse the right subtree until all nodes in the subtree are visited.
When performing a pre-order traversal using a stack, we push the right child onto the stack before
the left child, to ensure the left subtree is processed first. We carry out the traversal as follows:
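A stack-based pre-order traversal along these lines can be sketched in Python as shown below. The `Node` class and the function name are assumptions for illustration, and the example tree is built by hand from the elements 35, 45, 15, 20, 17, 55, 40 inserted in that order.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right


def preorder_iterative(root):
    """Visit the node, then its left subtree, then its right subtree."""
    order = []
    stack = [root] if root is not None else []
    while stack:
        node = stack.pop()
        order.append(node.value)           # visit the node
        # Push right first so that the left child is popped (processed) first.
        if node.right is not None:
            stack.append(node.right)
        if node.left is not None:
            stack.append(node.left)
    return order


# BST obtained by inserting 35, 45, 15, 20, 17, 55, 40 in that order.
bst = Node(35,
           Node(15, None, Node(20, Node(17))),
           Node(45, Node(40), Node(55)))

print(preorder_iterative(bst))  # [35, 15, 20, 17, 45, 40, 55]
```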
POST-ORDER TRAVERSAL
Performing a post-order traversal by recursion requires us to:
1. Recursively traverse the left subtree until all nodes in the subtree are visited.
2. Recursively traverse the right subtree until all nodes in the subtree are visited.
3. Visit the node.
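The recursive traversals differ only in where the visit step sits relative to the two recursive calls. The following Python sketch shows all three side by side; the `Node` class and function names are assumptions, and the example tree is the BST built from 35, 45, 15, 20, 17, 55, 40.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right


def inorder(node):
    """Left subtree, then the node, then the right subtree."""
    if node is None:
        return []
    return inorder(node.left) + [node.value] + inorder(node.right)


def preorder(node):
    """The node, then the left subtree, then the right subtree."""
    if node is None:
        return []
    return [node.value] + preorder(node.left) + preorder(node.right)


def postorder(node):
    """Left subtree, then the right subtree, then the node."""
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.value]


# BST obtained by inserting 35, 45, 15, 20, 17, 55, 40 in that order.
bst = Node(35,
           Node(15, None, Node(20, Node(17))),
           Node(45, Node(40), Node(55)))

print(inorder(bst))    # [15, 17, 20, 35, 40, 45, 55]
print(preorder(bst))   # [35, 15, 20, 17, 45, 40, 55]
print(postorder(bst))  # [17, 20, 15, 40, 55, 45, 35]
```

Note that the in-order traversal of a BST lists its elements in ascending order.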
Any binary tree with distinct elements can be reconstructed using information from its traversals. To
rebuild the binary tree, we need its in-order traversal sequence plus either its pre-order traversal or
its post-order traversal sequence. Using the in-order and post-order sequences:
1. Find the last node in the post-order traversal list. This element is the root, as the root node
always gets visited at the end of a post-order traversal.
2. Search for the root node identified in step 1 in the in-order traversal sequence of nodes to
determine its left and right subtrees. Everything on the left side of the root in the in-order traversal
list of nodes forms part of its left subtree and everything on its right side forms part of its right
subtree.
3. For each subtree, repeat until the tree is fully constructed:
a. Step 1: among the nodes of the subtree, find the one that appears last in the post-order
traversal list. This will be the topmost element in the subtree, and the other nodes in that
subtree will be its descendants, since in post-order children always appear to the left of
their parents.
b. Step 2: in the in-order traversal list, find the nodes to its left to form its left subtree and
to its right to form the right subtree.
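The reconstruction steps can be sketched recursively in Python. This is a minimal illustration that assumes all elements are distinct and that both sequences describe the same tree; the names `Node`, `build` and `preorder` are invented for the example.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right


def build(inorder, postorder):
    """Rebuild a binary tree from its in-order and post-order sequences."""
    if not inorder:
        return None
    root_value = postorder[-1]            # step 1: last in post-order is the root
    split = inorder.index(root_value)     # step 2: locate the root in in-order
    # The first `split` post-order entries belong to the left subtree,
    # the remaining ones (except the root itself) to the right subtree.
    left = build(inorder[:split], postorder[:split])
    right = build(inorder[split + 1:], postorder[split:-1])
    return Node(root_value, left, right)


def preorder(node):
    if node is None:
        return []
    return [node.value] + preorder(node.left) + preorder(node.right)


# Traversal sequences of the BST built from 35, 45, 15, 20, 17, 55, 40.
tree = build([15, 17, 20, 35, 40, 45, 55], [17, 20, 15, 40, 55, 45, 35])

print(tree.value)       # 35
print(preorder(tree))   # [35, 15, 20, 17, 45, 40, 55]
```

Checking the pre-order sequence of the rebuilt tree is a simple way to confirm that the original structure was recovered.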
SEARCH OPERATION
Whenever we are looking for an element, we start searching from the root node. If the search element's
value equals the root's value, the element is found. If it is less than the root's value, we search for
the element in the left subtree; otherwise, we search for the element in the right subtree.
We repeat this process on each node until the element is found or we reach an empty (null) link, which
means the element is not in the tree.
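The search procedure can be sketched in Python as follows. The `Node` class and the `search` function are illustrative names, and the example tree is the BST built from 35, 45, 15, 20, 17, 55, 40.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right


def search(root, target):
    """Return the node holding target, or None if target is not in the BST."""
    current = root
    while current is not None:
        if target == current.value:
            return current                 # found
        # Go left for smaller targets, right for larger ones.
        current = current.left if target < current.value else current.right
    return None                            # reached a null link: not present


# BST obtained by inserting 35, 45, 15, 20, 17, 55, 40 in that order.
bst = Node(35,
           Node(15, None, Node(20, Node(17))),
           Node(45, Node(40), Node(55)))

print(search(bst, 17) is not None)   # True
print(search(bst, 50) is not None)   # False
```

Each step of the loop moves one level down the tree, so the number of comparisons is bounded by the height of the tree plus one.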
• The time complexity of the insert, delete, and search algorithms depends on whether the tree is
well-balanced or ill-balanced.
• Insertion: O(log n) well-balanced, O(n) ill-balanced.
o Search from the root until you find a null link, create a new node containing the element
to be added, and replace the null with a reference to the new node. The new node is always
a leaf node.
o Do nothing if you find the element already in the tree.
• Deletion: O(log n) well-balanced, O(n) ill-balanced.
o Search for the node containing the element to delete. If the node found has zero or one
child nodes, replace the link to the node to be deleted with its child link (or null).
o If the node found has two children, first, copy the element from the leftmost node in the
right sub-tree into the node containing the element to be deleted. Next, delete
the leftmost node in the right sub-tree (which can have at most one child, and so is an
"easy" deletion).
o Do nothing if you do not find the element in the tree.
• Search: O(log n) well-balanced, O(n) ill-balanced.
o Search from the root downwards, going left or right depending on whether the target is
less than or greater than the element in the current node.
SUMMARY
• Abstract Data Types that follow a hierarchical pattern for data allocation and manipulation are
known as trees.
• A tree is represented by nodes connected by edges.
• Trees form a hierarchical data structure, with the root node leading to parent nodes, which in turn
lead to children nodes. End nodes that have no children are leaf nodes.
• Trees in data structures, due to their non-linear structural nature, allow faster response time
during search, as well as greater convenience during the design process.
• In in-order traversal, the left subtree is visited first, then the parent
node, and then the right subtree. In pre-order traversal, the parent node is visited first, then
the left subtree and then the right subtree. In post-order traversal, the left subtree is visited
first, then the right subtree, and the parent node is visited last.
• A binary tree can be reconstructed using its traversal information.
• In the next unit, we will look at Sets and Maps including how to implement Hash-table.
EXERCISES
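[Figure: two trees. (a) shows nodes b, c on one level and d, e, f on the next. (b) shows nodes k, l on one level, m, n on the next, and p on the level below.]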
8. Consider the following binary trees and state whether they are: full or not, complete or not,
perfect or not, and balanced or ill-balanced.
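[Figure: four binary trees. (a) shows nodes b, c on one level and d, e, f on the next. (b) shows nodes b, c on one level, d, e on the next, and f, g on the level below. (c) shows root a, then b, c, then d, e, f, g. (d) shows nodes b, c on one level and d, e on the next.]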