Topic 4 Trees
A tree data structure is a hierarchical structure that is used to represent and organize data in a
way that is easy to navigate and search. It is a collection of nodes that are connected by edges
and has a hierarchical relationship between the nodes.
Basic Terminologies:
Parent Node: The node which is the immediate predecessor of a node is called the parent node of
that node. {B} is the parent node of {D, E}.
Child Node: The node which is the immediate successor of a node is called the child
node of that node. Examples: {D, E} are the child nodes of {B}.
Root Node: The topmost node of a tree or the node which does not have any parent node
is called the root node. {A} is the root node of the tree. A non-empty tree must contain
exactly one root node and exactly one path from the root to every other node of the tree.
Leaf Node or External Node: The nodes which do not have any child nodes are called
leaf nodes. {K, L, M, N, O, P} are the leaf nodes of the tree.
Ancestor of a Node: Any predecessor node on the path from the root to that node is
called an ancestor of that node. {A, B} are the ancestor nodes of the node {E}.
TOPIC 4: TREES ADT
Descendant: Any successor node on a path from that node down to a leaf node. {E, I} are
descendants of the node {B}.
Sibling: Children of the same parent node are called siblings. {D,E} are called siblings.
Level of a node: The count of edges on the path from the root node to that node. The root
node has level 0.
Internal Node: A node with at least one child is called an internal node.
Neighbor of a Node: Parent or child nodes of that node are called neighbors of that node.
Height of a Node: The height of a node is the number of edges from the node to the
deepest leaf (i.e., the longest path from the node to a leaf node).
Depth of a Node: The depth of a node is the number of edges from the root to the node.
Height of a Tree: The height of a Tree is the height of the root node or the depth of the
deepest node.
Forest: A collection of disjoint trees is called a forest. You can create a forest by removing
the root of a tree, which leaves each of its subtrees as a separate tree.
Path: Path refers to the sequence of nodes along the edges of a tree.
Visiting: Visiting refers to checking the value of a node when control is on the node.
Traversing: Traversing means passing through nodes in a specific order.
Keys: Key represents a value of a node based on which a search operation is to be carried
out for a node.
The search algorithm relies on the BST property that every value in a node's left subtree is
below the node's value and every value in its right subtree is above it.
If the value we are looking for is below the root, it cannot be in the right subtree, so we only
need to search the left subtree. If the value is above the root, it cannot be in the left subtree,
so we only need to search the right subtree.
Algorithm:
search(root, number)
    If root == NULL
        return NULL;
    If number == root->data
        return root;
    If number < root->data
        return search(root->left, number);
    If number > root->data
        return search(root->right, number);
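The search steps above can be sketched in C as follows. This is a minimal sketch, not a definitive implementation: the `struct Node` layout is an assumption (only the field names `data`, `left` and `right` come from the pseudocode), and the function is assumed to return a pointer to the matching node so that `NULL` can consistently mean "not found".

```c
#include <stddef.h>

/* Assumed node layout; field names follow the pseudocode. */
struct Node {
    int data;
    struct Node *left;
    struct Node *right;
};

/* Returns the node holding `number`, or NULL if it is not in the tree. */
struct Node *search(struct Node *root, int number)
{
    if (root == NULL)
        return NULL;                        /* empty subtree: not found */
    if (number == root->data)
        return root;                        /* found */
    if (number < root->data)
        return search(root->left, number);  /* can only be on the left */
    return search(root->right, number);     /* can only be on the right */
}
```

Each call discards one subtree, so the number of comparisons is at most the height of the tree plus one.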
b) Insert Operation
Inserting a value in the correct position is similar to searching because we must maintain the
rule that every value in the left subtree is less than the root and every value in the right
subtree is larger than the root.
We keep moving to either the right subtree or the left subtree depending on the value, and when
we reach a point where the left or right subtree is null, we put the new node there.
Algorithm:
if (node == NULL)
    return createNode(data);
if (data < node->data)
    node->left = insert(node->left, data);
else if (data > node->data)
    node->right = insert(node->right, data);
return node;
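The insert pseudocode can be made runnable roughly as follows. This is a sketch under stated assumptions: the `struct Node` layout and the body of `createNode` are not given in the notes and are filled in here for illustration, and a key equal to an existing one is simply ignored, as the pseudocode implies.

```c
#include <stdlib.h>

/* Assumed node layout; field names follow the pseudocode. */
struct Node {
    int data;
    struct Node *left;
    struct Node *right;
};

/* Assumed helper: allocates a leaf node, or returns NULL if malloc fails. */
struct Node *createNode(int data)
{
    struct Node *node = malloc(sizeof *node);
    if (node != NULL) {
        node->data = data;
        node->left = NULL;
        node->right = NULL;
    }
    return node;
}

/* Inserts `data` into the subtree rooted at `node` and returns its root. */
struct Node *insert(struct Node *node, int data)
{
    if (node == NULL)
        return createNode(data);            /* empty spot: place it here */
    if (data < node->data)
        node->left = insert(node->left, data);
    else if (data > node->data)
        node->right = insert(node->right, data);
    /* data == node->data: duplicate, tree left unchanged */
    return node;
}
```

A tree is built by starting from `NULL` and reassigning the root on every call, for example `root = insert(root, 8);`.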
c) Deleting a Node in BST
Algorithm
1. Perform search for value X
2. If X is a leaf, delete X
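The notes list only the leaf case. The sketch below also handles the two remaining cases in the usual textbook way, which is an assumption about what the missing steps say: a node with one child is replaced by that child, and a node with two children takes the value of its in-order successor (the smallest key in its right subtree), which is then deleted from the right subtree. The names `deleteNode` and `minNode` are illustrative, and nodes are assumed to have been allocated with `malloc`.

```c
#include <stdlib.h>

/* Assumed node layout; nodes are assumed to be heap-allocated. */
struct Node {
    int data;
    struct Node *left;
    struct Node *right;
};

/* Smallest key in a non-empty subtree: keep going left. */
struct Node *minNode(struct Node *node)
{
    while (node->left != NULL)
        node = node->left;
    return node;
}

/* Deletes x from the subtree rooted at `root` and returns the new root. */
struct Node *deleteNode(struct Node *root, int x)
{
    if (root == NULL)
        return NULL;                                   /* x not found */
    if (x < root->data) {
        root->left = deleteNode(root->left, x);
    } else if (x > root->data) {
        root->right = deleteNode(root->right, x);
    } else if (root->left == NULL || root->right == NULL) {
        /* Leaf (child is NULL) or one child (child takes its place). */
        struct Node *child = root->left ? root->left : root->right;
        free(root);
        return child;
    } else {
        /* Two children: copy the successor's key, then delete it below. */
        struct Node *succ = minNode(root->right);
        root->data = succ->data;
        root->right = deleteNode(root->right, succ->data);
    }
    return root;
}
```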
3. AVL Tree
The AVL tree is named after its inventors, Georgy Adelson-Velsky and Evgenii Landis.
AVL tree is a self-balancing binary search tree in which each node maintains extra
information called a balance factor whose value is either -1, 0 or +1.
Balance Factor
Balance factor of a node in an AVL tree is the difference between the height of the left subtree
and that of the right subtree of that node.
Balance Factor = (Height of Left Subtree - Height of Right Subtree) or (Height of Right Subtree
- Height of Left Subtree)
The self-balancing property of an AVL tree is maintained by the balance factor. The value of
balance factor should always be -1, 0 or +1.
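As an illustration, the balance factor can be computed directly from subtree heights. The sketch below assumes the "left minus right" convention, measures height in edges (so an empty subtree is given height -1 and a leaf has height 0), and uses illustrative names (`height`, `balanceFactor`, `isBalanced`) that do not come from the notes. It recomputes heights on every call for clarity, whereas a practical AVL tree would store the height or balance factor in each node.

```c
#include <stddef.h>

/* Assumed node layout. */
struct Node {
    int data;
    struct Node *left;
    struct Node *right;
};

/* Height in edges; an empty subtree counts as -1 so a leaf has height 0. */
int height(const struct Node *node)
{
    if (node == NULL)
        return -1;
    int lh = height(node->left);
    int rh = height(node->right);
    return 1 + (lh > rh ? lh : rh);
}

/* Balance factor = height of left subtree - height of right subtree. */
int balanceFactor(const struct Node *node)
{
    if (node == NULL)
        return 0;
    return height(node->left) - height(node->right);
}

/* Returns 1 if every node has a balance factor of -1, 0 or +1. */
int isBalanced(const struct Node *node)
{
    if (node == NULL)
        return 1;
    int bf = balanceFactor(node);
    return bf >= -1 && bf <= 1
        && isBalanced(node->left) && isBalanced(node->right);
}
```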
Number of edges: An edge can be defined as the connection between two nodes. If a tree
has N nodes then it will have (N-1) edges. There is only one path from each node to any
other node of the tree.
Depth of a node: The depth of a node is defined as the length of the path from the root to
that node. Each edge adds 1 unit of length to the path. So, it can also be defined as the
number of edges in the path from the root of the tree to the node.
Height of a node: The height of a node can be defined as the length of the longest path
from the node to a leaf node of the tree.
Height of the Tree: The height of a tree is the length of the longest path from the root of
the tree to a leaf node of the tree.
Degree of a Node: The total count of subtrees attached to that node is called the degree
of the node. The degree of a leaf node must be 0. The degree of a tree is the maximum
degree of a node among all the nodes in the tree.
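These properties can be checked on a small binary tree. The sketch below is only an illustration with assumed names (`countNodes`, `countEdges`, `degree`): it counts nodes and edges independently, so the N - 1 relationship can be observed, and it computes the degree of a binary-tree node as its number of non-empty subtrees.

```c
#include <stddef.h>

/* Assumed node layout for a binary tree. */
struct Node {
    int data;
    struct Node *left;
    struct Node *right;
};

/* Number of nodes in the subtree rooted at `node`. */
int countNodes(const struct Node *node)
{
    if (node == NULL)
        return 0;
    return 1 + countNodes(node->left) + countNodes(node->right);
}

/* Number of edges: every non-NULL child pointer is one edge. */
int countEdges(const struct Node *node)
{
    if (node == NULL)
        return 0;
    int edges = 0;
    if (node->left != NULL)
        edges += 1 + countEdges(node->left);
    if (node->right != NULL)
        edges += 1 + countEdges(node->right);
    return edges;
}

/* Degree of a non-NULL node: how many subtrees are attached to it. */
int degree(const struct Node *node)
{
    return (node->left != NULL) + (node->right != NULL);
}
```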
Data Compression: Huffman coding is a popular technique for data compression that
involves constructing a binary tree where the leaves represent characters and their
frequency of occurrence. The resulting tree is used to encode the data in a way that
minimizes the amount of storage required.
Compiler Design: In compiler design, a syntax tree is used to represent the structure of a
program.
Database Indexing: B-trees and other tree structures are used in database indexing to
efficiently search for and retrieve data.
Trees offer efficient searching, depending on the type of tree, with search times of
O(log n) for balanced trees such as AVL trees.
The recursive nature of trees makes them easy to traverse and manipulate using
recursive algorithms.
Trees can become unbalanced, meaning that the height of the tree is skewed towards one side,
which can lead to inefficient search times (O(n) in the worst case).
Trees can require more memory than some other data structures, such as arrays and linked
lists, especially if the tree is very large.
The implementation and manipulation of trees can be complex and require a good
understanding of the algorithms.
Tree Traversal
To perform any operation on a tree, you first need to reach the specific node. A tree
traversal algorithm visits the nodes of the tree in a defined order so that the required node can
be reached.
In-order Traversal
Pre-order Traversal
Post-order Traversal
In-order Traversal
In this traversal method, the left subtree is visited first, then the root and later the right sub-tree.
We should always remember that every node may represent a subtree itself.
If a binary search tree is traversed in-order, the output will produce sorted key values in
ascending order.
We start from A, and following in-order traversal, we move to its left subtree B. B is also
traversed in-order. The process goes on until all the nodes are visited. The output of in-order
traversal of this tree will be:
D→B→E→A→F→C→G
Algorithm
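A minimal recursive sketch of in-order traversal in C is shown here. The node layout with a one-character `label` and the choice to append labels to a caller-supplied buffer (rather than print them) are assumptions made for illustration. The example tree is assumed to have A at the root, B and C as its children, D and E under B, and F and G under C, which is the shape implied by the traversal outputs in these notes.

```c
#include <stddef.h>

/* Assumed node layout: each node carries a one-character label. */
struct Node {
    char label;
    struct Node *left;
    struct Node *right;
};

/* Appends labels to `out` in in-order: left subtree, root, right subtree.
   `out` must have room for every node plus a terminating '\0';
   `*pos` is the next free index and should start at 0. */
void inorder(const struct Node *node, char *out, size_t *pos)
{
    if (node == NULL)
        return;
    inorder(node->left, out, pos);   /* 1. left subtree  */
    out[(*pos)++] = node->label;     /* 2. visit the root */
    out[*pos] = '\0';                /* keep the string terminated */
    inorder(node->right, out, pos);  /* 3. right subtree */
}
```

For the assumed example tree, the buffer ends up holding the labels in the order D, B, E, A, F, C, G.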
Pre-order Traversal
In this traversal method, the root node is visited first, then the left subtree and finally the right
subtree.
We start from A, and following pre-order traversal, we first visit A itself and then move to its left
subtree B. B is also traversed pre-order. The process goes on until all the nodes are visited. The
output of pre-order traversal of this tree will be:
A→B→D→E→C→F→G
Algorithm
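A minimal recursive sketch of pre-order traversal in C follows. As assumptions for illustration, each node carries a one-character `label`, and the labels are appended to a caller-supplied buffer instead of being printed. The example tree is assumed to be A at the root with children B and C, D and E under B, and F and G under C.

```c
#include <stddef.h>

/* Assumed node layout: each node carries a one-character label. */
struct Node {
    char label;
    struct Node *left;
    struct Node *right;
};

/* Appends labels to `out` in pre-order: root, left subtree, right subtree.
   `out` must have room for every node plus a terminating '\0';
   `*pos` is the next free index and should start at 0. */
void preorder(const struct Node *node, char *out, size_t *pos)
{
    if (node == NULL)
        return;
    out[(*pos)++] = node->label;      /* 1. visit the root */
    out[*pos] = '\0';                 /* keep the string terminated */
    preorder(node->left, out, pos);   /* 2. left subtree  */
    preorder(node->right, out, pos);  /* 3. right subtree */
}
```

For the assumed example tree, the buffer ends up holding the labels in the order A, B, D, E, C, F, G.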
Post-order Traversal
In this traversal method, the root node is visited last, hence the name. First we traverse the left
subtree, then the right subtree and finally the root node.
We start from A, and following post-order traversal, we first visit the left subtree B. B is also
traversed post-order. The process goes on until all the nodes are visited. The output of post-order
traversal of this tree will be:
D→E→B→F→G→C→A
Algorithm
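A minimal recursive sketch of post-order traversal in C follows. As assumptions for illustration, each node carries a one-character `label`, and the labels are appended to a caller-supplied buffer instead of being printed. The example tree is assumed to be A at the root with children B and C, D and E under B, and F and G under C.

```c
#include <stddef.h>

/* Assumed node layout: each node carries a one-character label. */
struct Node {
    char label;
    struct Node *left;
    struct Node *right;
};

/* Appends labels to `out` in post-order: left subtree, right subtree, root.
   `out` must have room for every node plus a terminating '\0';
   `*pos` is the next free index and should start at 0. */
void postorder(const struct Node *node, char *out, size_t *pos)
{
    if (node == NULL)
        return;
    postorder(node->left, out, pos);   /* 1. left subtree  */
    postorder(node->right, out, pos);  /* 2. right subtree */
    out[(*pos)++] = node->label;       /* 3. visit the root last */
    out[*pos] = '\0';                  /* keep the string terminated */
}
```

For the assumed example tree, the buffer ends up holding the labels in the order D, E, B, F, G, C, A.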
Activity:
Reverse Polish notation (RPN) is a mathematical notation for arithmetic expressions in which
operators follow their operands. Operators are functions such as addition, subtraction,
multiplication, division, and exponentiation. The operands are the numerical values or variables
on which the operations are performed.
For example, a normal mathematical expression looks like this (infix notation):
(2 + 1) x 8
Conventionally, we evaluate what is inside the brackets first. As a result, we obtain the sum of
2 and 1, which equals 3. Subsequently, we multiply 3 by 8, resulting in 24.
In reverse Polish notation, the same expression is written as:
2 1 + 8 x
A stack allows quick evaluation of this expression in reverse Polish notation. A stack holds
data and supports push and pop operations.
To evaluate the expression, first push the number "2" onto the stack. Now, push "1". There are
only two numbers and nothing to do with them yet. Next, we push the operator "+" onto the stack.
Since we now have an operator and two operands, we can pop them from the stack and perform
the operation. Consequently, we add 1 to 2, and the resulting sum of 3 goes back onto the stack.
Now, only 3 is in the stack. Next, push 8. Again, there are only two numbers. When the operator
"x" is pushed, we can pop the stack. Then, we multiply 3 by 8, resulting in the product of
24. Therefore, the only value in the stack is 24. Because RPN evaluates each term individually,
one by one, brackets are unnecessary.
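The stack-based evaluation described above can be sketched in C as follows. This is an illustration under several assumptions rather than a definitive implementation: tokens are separated by spaces, operands are non-negative integers, both `x` and `*` are accepted for multiplication, and the function name `evalRPN`, the fixed `STACK_MAX` capacity and the error convention (returning -1) are all hypothetical. As a small simplification, an operator is applied to the top two values as soon as it is read instead of being pushed onto the stack first.

```c
#include <ctype.h>
#include <stdlib.h>

#define STACK_MAX 64  /* assumed capacity, enough for short expressions */

/* Evaluates a space-separated RPN expression such as "2 1 + 8 x".
   Stores the value in *result and returns 0, or returns -1 if the
   expression is malformed (unknown token, too few or too many operands). */
int evalRPN(const char *expr, long *result)
{
    long stack[STACK_MAX];
    int top = 0;                       /* number of values on the stack */
    const char *p = expr;

    while (*p != '\0') {
        if (isspace((unsigned char)*p)) {
            p++;
            continue;
        }
        if (isdigit((unsigned char)*p)) {
            char *end;
            long value = strtol(p, &end, 10);
            if (top == STACK_MAX)
                return -1;             /* stack overflow */
            stack[top++] = value;      /* push the operand */
            p = end;
            continue;
        }
        if (top < 2)
            return -1;                 /* operator needs two operands */
        long right = stack[--top];     /* top value */
        long left = stack[--top];      /* value below it */
        switch (*p) {
        case '+': stack[top++] = left + right; break;
        case '-': stack[top++] = left - right; break;
        case 'x':
        case '*': stack[top++] = left * right; break;
        case '/':
            if (right == 0)
                return -1;             /* division by zero */
            stack[top++] = left / right;
            break;
        default:
            return -1;                 /* unknown token */
        }
        p++;
    }
    if (top != 1)
        return -1;                     /* leftover or missing values */
    *result = stack[0];
    return 0;
}
```

Note the operand order for subtraction and division: the value popped first is the right-hand operand, so `5 1 -` means 5 - 1.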
Furthermore, some calculators use reverse Polish notation. Hewlett-Packard was among the first
companies to use this system, in its desktop calculators in the 1970s and 80s.
Example 1:
Example 2:
1. Push 5 into the stack. This is the first value and is at the bottom of the stack.
2. Push 1 into the stack. This is the second value and is in the position above the 5.
3. Apply the subtraction operation by taking two operands from the stack (1 and 5). The top
value (1) is subtracted from the value below it (5), and the result (4) is stored back on the
stack. 4 is now the only value in the stack and is at the bottom.
4. Push 3 into the stack. This value is in the position above 4 in the stack.
5. Apply the multiplication operation by taking the last two numbers off the stack and
multiplying them. The result is then placed back into the stack. After this operation, the
stack now only contains the number 12.