Tree and Huffman Coding
Tree
Dr Deepak Gupta
Assistant Professor, SMIEEE
CSED, MNNIT Allahabad, Prayagraj
Email: [email protected]
Non-linear Data Structures
A data structure is said to be non-linear if its elements form a hierarchical
relationship where data items appear at various levels.
Trees and Graphs are widely used non-linear data structures. Tree and
graph structures represent hierarchical relationships between individual
data elements.
Trees can be defined recursively as:
A tree is a finite set of nodes such that:
1. There is a distinguished node called the root node.
2. The remaining nodes are partitioned into n ≥ 0 disjoint sets T1, T2, …, Tn,
where each of these sets is a tree. The sets T1, T2, …, Tn are the
subtrees of the root.
Basic Terminologies in Tree Data Structure:
Parent Node: The node that is the immediate predecessor of a node is called the parent node
of that node. {B} is the parent node of {D, E}.
Child Node: The node that is the immediate successor of a node is called the child node of
that node. Examples: {D, E} are the child nodes of {B}.
Root Node: The topmost node of a tree or the node that does not have any parent node is
called the root node. {A} is the root node of the tree. A non-empty tree must contain
exactly one root node and exactly one path from the root to all other nodes of the tree.
Leaf Node or External Node: The nodes that do not have any child nodes are called leaf
nodes. {K, L, M, N, O, P, G} are the leaf nodes of the tree.
Ancestor of a Node: Any predecessor node on the path from the root to that node is called
an ancestor of that node. {A, B} are the ancestor nodes of the node {E}.
Descendant: A node x is a descendant of another node y if and only if y is an ancestor of x.
Sibling: Children of the same parent node are called siblings. {D,E} are called siblings.
Level of a node: The count of edges on the path from the root node to that node. The root
node has level 0.
Internal node: A node with at least one child is called an Internal Node.
Neighbour of a Node: The parent and child nodes of a node are called the neighbours of that node.
Subtree: Any node of the tree along with all of its descendants.
Binary Tree
A binary tree is a finite set of nodes that is:
1. Either empty or
2. Consists of a distinguished node called root and the remaining nodes
are partitioned into two disjoint sets T1 (called left subtree) and T2 (called
right subtree) and both of them are binary trees.
Property 1: The maximum number of nodes on any level i is $2^i$, where i ≥ 0.
Property 2: The maximum number of nodes possible in a binary tree of height h is $2^h - 1$.
Proof:
If we sum up the maximum number of nodes possible on each level, we get the
maximum number of nodes possible in the binary tree. The first level
is 0 and the last level is h-1. Using Property 1, the total number of nodes
possible in a binary tree of height h is given by:
$$n = \sum_{i=0}^{h-1} 2^i = 1 + 2^1 + 2^2 + \dots + 2^{h-1} = \frac{2^{(h-1)+1} - 1}{2 - 1} = 2^h - 1$$
Property 3: The minimum number of nodes possible in a binary tree of
height h is equal to h.
Property 7: A strictly binary tree with n non-leaf nodes has n+1 leaf nodes.
Property 8: A strictly binary tree with n leaf nodes always has 2n-1 nodes.
Remark: Number of Leaf nodes = Number of Internal nodes + 1
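Properties 7 and 8 and the remark can be checked on a small example. The following is only a sketch: the names `tnode`, `make_node`, `count_leaves`, `count_internal` and `sample_strict_tree` are hypothetical helpers introduced here, and the five-node tree is an arbitrary strictly binary tree, not one taken from the slides.

```c
#include <assert.h>
#include <stdlib.h>

struct tnode {
    int info;
    struct tnode *left, *right;
};

/* Allocate a node with the given children (hypothetical helper). */
struct tnode *make_node(int info, struct tnode *left, struct tnode *right)
{
    struct tnode *n = malloc(sizeof *n);
    assert(n != NULL);
    n->info = info;
    n->left = left;
    n->right = right;
    return n;
}

/* Number of nodes that have no children. */
int count_leaves(const struct tnode *t)
{
    if (t == NULL) return 0;
    if (t->left == NULL && t->right == NULL) return 1;
    return count_leaves(t->left) + count_leaves(t->right);
}

/* Number of nodes that have at least one child. */
int count_internal(const struct tnode *t)
{
    if (t == NULL || (t->left == NULL && t->right == NULL)) return 0;
    return 1 + count_internal(t->left) + count_internal(t->right);
}

/* A strictly binary tree: every node has either 0 or 2 children. */
struct tnode *sample_strict_tree(void)
{
    return make_node(1,
        make_node(2, make_node(4, NULL, NULL), make_node(5, NULL, NULL)),
        make_node(3, NULL, NULL));
}
```

For this sample tree there are 2 internal nodes and 3 leaf nodes, so the number of leaves is the number of internal nodes plus one, and the total of 5 nodes equals 2*3 - 1.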
2. Extended Binary Tree
If in a binary tree, each empty subtree (NULL link) is replaced by a special
node then the resulting tree is an extended binary tree or 2-tree.
So we can convert a binary tree to an extended binary tree by adding special
nodes to leaf nodes and nodes that have only one child.
The special nodes added to the tree are called external nodes and the
original nodes of the tree are internal nodes.
[Figure: extended binary tree, with a legend distinguishing internal nodes from external nodes]
The path length for any node is the number of edges traversed from that
node to the root node.
The path length of a tree is the sum of the path lengths of all the nodes of
the tree.
The internal path length of a binary tree is the
sum of the path lengths of all internal nodes
which is equal to the sum of levels of all internal
nodes.
Internal path length (I): 0+1+1+2+2+3 = 9
The external path length of a binary tree is the
sum of the path lengths of all external nodes
which is equal to the sum of levels of all
external nodes.
External path length (E): 2+2+3+3+3+4+4 = 21
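The two path lengths can be computed recursively. The sketch below is illustrative only: the original figure is not available, so `sample_tree` is a hypothetical tree chosen to have the same level profile as the example (internal nodes on levels 0, 1, 1, 2, 2, 3), and each NULL link is treated as one external node.

```c
#include <assert.h>
#include <stdlib.h>

struct tnode {
    struct tnode *left, *right;
};

struct tnode *make_node(struct tnode *left, struct tnode *right)
{
    struct tnode *n = malloc(sizeof *n);
    assert(n != NULL);
    n->left = left;
    n->right = right;
    return n;
}

/* Sum of the levels of all internal (original) nodes. */
int internal_path_length(const struct tnode *t, int level)
{
    if (t == NULL) return 0;
    return level + internal_path_length(t->left, level + 1)
                 + internal_path_length(t->right, level + 1);
}

/* Sum of the levels of all external nodes; a NULL link is one external node. */
int external_path_length(const struct tnode *t, int level)
{
    if (t == NULL) return level;
    return external_path_length(t->left, level + 1)
         + external_path_length(t->right, level + 1);
}

int count_nodes(const struct tnode *t)
{
    return t == NULL ? 0 : 1 + count_nodes(t->left) + count_nodes(t->right);
}

/* Hypothetical tree with internal nodes on levels 0, 1, 1, 2, 2, 3. */
struct tnode *sample_tree(void)
{
    struct tnode *d = make_node(NULL, NULL);  /* level 3 */
    struct tnode *p = make_node(d, NULL);     /* level 2 */
    struct tnode *q = make_node(NULL, NULL);  /* level 2 */
    struct tnode *x = make_node(p, NULL);     /* level 1 */
    struct tnode *y = make_node(NULL, q);     /* level 1 */
    return make_node(x, y);                   /* level 0 */
}
```

Calling both functions with level 0 on this tree gives I = 9 and E = 21, which also agrees with the well-known identity E = I + 2n for n = 6 internal nodes.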
Property 10: If the height of a complete binary tree is h, h ≥ 1, then the
minimum number of nodes possible is $2^{h-1}$ and the maximum number of nodes
possible is $2^h - 1$.
Proof: The number of nodes will be maximum when the last level also
contains the maximum number of nodes, i.e. all levels are full, and so the total
number of nodes will be $2^h - 1$ from Property 2.
The number of nodes will be minimum when the last level has only one node. In
this case the total number of nodes will be
Total nodes in a full binary tree of height (h-1) + one node
i.e. $(2^{h-1} - 1) + 1 = 2^{h-1}$.
5. Balanced Binary Tree
A balanced binary tree is a binary tree in which the heights of the left and
right sub-trees of every node differ by at most 1.
Remark: AVL Tree and Red-Black Tree are well-known self-balancing binary
search trees.
The process of visiting each node of the tree exactly once is called tree
traversal. The traversal of a binary tree involves three basic activities:
1. Visiting the root
2. Traversing the left sub-tree
3. Traversing the right sub-tree
The traversals differ in the order in which these three activities are performed:
1. Pre-order Traversal (NLR)
a. Visit the root (N)
b. Traverse the left subtree of the root in pre-order (L)
c. Traverse the right subtree of the root in pre-order (R)
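The three steps translate directly into a recursive function. This is a minimal sketch: `tnode`, `make_node`, `preorder` and `sample_tree` are hypothetical names, nodes carry single-character labels, and the output is appended to a caller-supplied string that is assumed to be large enough.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct tnode {
    char info;
    struct tnode *left, *right;
};

struct tnode *make_node(char info, struct tnode *left, struct tnode *right)
{
    struct tnode *n = malloc(sizeof *n);
    assert(n != NULL);
    n->info = info;
    n->left = left;
    n->right = right;
    return n;
}

/* Append the labels of t to out in NLR order (out must be NUL-terminated). */
void preorder(const struct tnode *t, char *out)
{
    if (t == NULL) return;
    size_t len = strlen(out);
    out[len] = t->info;        /* N: visit the root */
    out[len + 1] = '\0';
    preorder(t->left, out);    /* L: left subtree in pre-order */
    preorder(t->right, out);   /* R: right subtree in pre-order */
}

/* B with left child D (whose right child is H) and right child E. */
struct tnode *sample_tree(void)
{
    struct tnode *d = make_node('D', NULL, make_node('H', NULL, NULL));
    return make_node('B', d, make_node('E', NULL, NULL));
}
```

Starting from an empty string, `preorder(sample_tree(), buf)` leaves "BDHE" in `buf`.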
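Creation of Binary Tree from Pre-order and In-order traversals
[Figure: step-by-step construction. The root splits the in-order sequence into D H B E (left subtree) and I F C J G K (right subtree).]
Left subtree of the root:   Pre-order : B D H E        In-order : D H B E
Right subtree of the root:  Pre-order : C F I G J K    In-order : I F C J G K
Left subtree of B:          Pre-order : D H            In-order : D H
Left subtree of C:          Pre-order : F I            In-order : I F
Right subtree of C:         Pre-order : G J K          In-order : J G K
Resulting tree: B and C are the children of the root; D and E are the children of B;
F and G are the children of C; H is the right child of D; I is the left child of F;
J and K are the left and right children of G.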
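The construction from pre-order and in-order sequences can be sketched recursively: the first pre-order label is the root, and its position in the in-order sequence separates the left and right subtrees. The function names below are hypothetical, and the sketch assumes single-character labels that are all distinct.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct tnode {
    char info;
    struct tnode *left, *right;
};

/* Build the tree described by pre[0..n-1] and in[0..n-1] (distinct labels). */
struct tnode *build_pre_in(const char *pre, const char *in, int n)
{
    if (n <= 0) return NULL;
    struct tnode *root = malloc(sizeof *root);
    assert(root != NULL);
    root->info = pre[0];                 /* first pre-order label is the root */
    int k = 0;
    while (k < n && in[k] != pre[0]) k++;
    assert(k < n);                       /* the root must occur in the in-order sequence */
    root->left = build_pre_in(pre + 1, in, k);
    root->right = build_pre_in(pre + 1 + k, in + k + 1, n - k - 1);
    return root;
}

/* Append the labels of t to out in post-order (used only to check the result). */
void postorder(const struct tnode *t, char *out)
{
    if (t == NULL) return;
    postorder(t->left, out);
    postorder(t->right, out);
    size_t len = strlen(out);
    out[len] = t->info;
    out[len + 1] = '\0';
}
```

For the right subtree in the example, `build_pre_in("CFIGJK", "IFCJGK", 6)` yields a tree whose post-order is I F J K G C, and the left subtree sequences "BDHE" and "DHBE" yield post-order H D E B.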
Creation of Binary Tree from In-order and Post-order traversals
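[Figure: step-by-step construction. The root splits the in-order sequence into D H B E (left subtree) and I F C J G K (right subtree).]
Left subtree of the root:   Post-order : H D E B        In-order : D H B E
Right subtree of the root:  Post-order : I F J K G C    In-order : I F C J G K
Left subtree of B:          Post-order : H D            In-order : D H
Left subtree of C:          Post-order : I F            In-order : I F
Right subtree of C:         Post-order : J K G          In-order : J G K
Resulting tree: B and C are the children of the root; D and E are the children of B;
F and G are the children of C; H is the right child of D; I is the left child of F;
J and K are the left and right children of G.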
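With in-order and post-order sequences the idea is the same, except that the root is the last post-order label. The following is a sketch under the same assumptions as before (hypothetical function names, distinct single-character labels).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct tnode {
    char info;
    struct tnode *left, *right;
};

/* Build the tree described by in[0..n-1] and post[0..n-1] (distinct labels). */
struct tnode *build_in_post(const char *in, const char *post, int n)
{
    if (n <= 0) return NULL;
    struct tnode *root = malloc(sizeof *root);
    assert(root != NULL);
    root->info = post[n - 1];            /* last post-order label is the root */
    int k = 0;
    while (k < n && in[k] != root->info) k++;
    assert(k < n);                       /* the root must occur in the in-order sequence */
    root->left = build_in_post(in, post, k);
    root->right = build_in_post(in + k + 1, post + k, n - k - 1);
    return root;
}

/* Append the labels of t to out in pre-order (used only to check the result). */
void preorder(const struct tnode *t, char *out)
{
    if (t == NULL) return;
    size_t len = strlen(out);
    out[len] = t->info;
    out[len + 1] = '\0';
    preorder(t->left, out);
    preorder(t->right, out);
}
```

For the right subtree in the example, `build_in_post("IFCJGK", "IFJKGC", 6)` gives a tree whose pre-order is C F I G J K.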
Creation of Binary Tree from Pre-order and Post-order traversals
[Figure: first step. The root is A; x1 = B is taken as its left child and x2 = C as its right child.]
Nodes below B:  Pre-order : D E H I    Post-order : D H I E
Nodes below C:  Pre-order : F G        Post-order : F G
The same steps are then applied to the subtree rooted at B:
• Find the node succeeding the root node in pre-order, say x1 (i.e. D), and the node
preceding the root node in post-order, say x2 (i.e. E).
• If x1 != x2, then x1 is taken as the left child (i.e. D) and x2 is taken as the right child
(i.e. E) of the root node.
• Find the position of x2 (i.e. E) in pre-order and the position of x1 (i.e. D) in post-order.
• Consider two sets of pre-order and post-order traversals, one for the left and one for the
right sub-tree of the root.
• The first set consists of the nodes present after x1 and before x2 in the pre-order
traversal (here NULL) and the nodes present before x1 in the post-order traversal (here NULL).
• The second set consists of the nodes present after x2 in the pre-order traversal (here H I)
and the nodes present after x1 and before x2 in the post-order traversal (here H I).
[Figure: second step. D = x1 and E = x2 are the children of B. Nodes below E: Pre-order : H I, Post-order : H I. Nodes below C: Pre-order : F G, Post-order : F G.]
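The x1/x2 procedure can be sketched as a recursive function. Several things here are assumptions rather than part of the slides: the function names are hypothetical, labels are distinct single characters, a node with only one child is arbitrarily given a left child (pre-order and post-order alone cannot tell which side it is on), and the full sequences in the usage note are reconstructed from the partial sequences shown in the figures.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct tnode {
    char info;
    struct tnode *left, *right;
};

/* Build a tree from pre[0..n-1] and post[0..n-1] (distinct labels). */
struct tnode *build_pre_post(const char *pre, const char *post, int n)
{
    if (n <= 0) return NULL;
    struct tnode *root = malloc(sizeof *root);
    assert(root != NULL);
    root->info = pre[0];
    root->left = root->right = NULL;
    if (n == 1) return root;
    /* x1 = pre[1] is the root of the first subtree, which ends at x1 in post-order. */
    int k = 0;
    while (k < n - 1 && post[k] != pre[1]) k++;
    assert(k < n - 1);
    int left_size = k + 1;
    root->left = build_pre_post(pre + 1, post, left_size);
    /* Whatever remains before the root in post-order is the second subtree. */
    root->right = build_pre_post(pre + 1 + left_size, post + left_size,
                                 n - 1 - left_size);
    return root;
}

/* Append the labels of t to out in in-order (used only to check the result). */
void inorder(const struct tnode *t, char *out)
{
    if (t == NULL) return;
    inorder(t->left, out);
    size_t len = strlen(out);
    out[len] = t->info;
    out[len + 1] = '\0';
    inorder(t->right, out);
}
```

With the reconstructed sequences pre-order A B D E H I C F G and post-order D H I E B F G C A, the resulting tree has A at the root, B (children D and E, with H and I below E) on the left and C (children F and G) on the right.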
Arithmetic expression: 4 / ( 2 - ( - 8 * 3 ) )
Construction of Expression Tree:
Scan the given expression, written in postfix form, from left to right and check each
scanned symbol in turn:
1. If the scanned symbol is an operand, create a node for it and push it onto the stack.
2. If the scanned symbol is an operator, pop two nodes from the stack and make them the
children of the operator node (the node popped first becomes the right child), then push
the operator node back onto the stack.
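The stack-based construction can be sketched as follows. This is illustrative only: the names `enode`, `is_operator`, `build_expression_tree` and `evaluate` are hypothetical, the input is assumed to be already tokenized in postfix order, only the four binary operators are handled, and the unary minus in the example is folded into the operand "-8".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct enode {
    const char *token;            /* operand text or a one-character operator */
    struct enode *left, *right;
};

int is_operator(const char *tok)
{
    return strlen(tok) == 1 && strchr("+-*/", tok[0]) != NULL;
}

/* Build an expression tree from n postfix tokens using a stack of subtrees. */
struct enode *build_expression_tree(const char *tokens[], int n)
{
    struct enode **stack = malloc((size_t)n * sizeof *stack);
    assert(stack != NULL);
    int top = 0;
    for (int i = 0; i < n; i++) {
        struct enode *node = malloc(sizeof *node);
        assert(node != NULL);
        node->token = tokens[i];
        node->left = node->right = NULL;
        if (is_operator(tokens[i])) {
            assert(top >= 2);
            node->right = stack[--top];   /* popped first: right operand */
            node->left = stack[--top];    /* popped second: left operand */
        }
        stack[top++] = node;              /* push operand or finished subtree */
    }
    assert(top == 1);                     /* a valid expression leaves one tree */
    struct enode *root = stack[0];
    free(stack);
    return root;
}

/* Evaluate the tree; leaves are parsed as numbers. */
double evaluate(const struct enode *t)
{
    if (t->left == NULL) return strtod(t->token, NULL);
    double a = evaluate(t->left), b = evaluate(t->right);
    switch (t->token[0]) {
    case '+': return a + b;
    case '-': return a - b;
    case '*': return a * b;
    default:  return a / b;
    }
}
```

For the example expression, the postfix token sequence would be 4, 2, -8, 3, *, -, / and the tree has the division operator at its root; evaluating it gives 4 / 26.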
In-order: 10 , 15 , 20 , 23 , 25 , 30 , 35 , 39 , 42
Level order: 30, 20, 39, 10, 25, 35, 42, 15, 23
Remark: Note that the in-order traversal of a binary search tree gives us all keys of
that tree in ascending order.
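The remark can be checked with a small sketch. The names `bst`, `bst_insert` and `bst_inorder` are hypothetical, and duplicate keys are simply ignored, which is one possible convention. Inserting the keys in the level order listed above rebuilds the same tree.

```c
#include <assert.h>
#include <stdlib.h>

struct bst {
    int key;
    struct bst *left, *right;
};

/* Insert key into the BST rooted at root and return the (possibly new) root. */
struct bst *bst_insert(struct bst *root, int key)
{
    if (root == NULL) {
        struct bst *n = malloc(sizeof *n);
        assert(n != NULL);
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    if (key < root->key)
        root->left = bst_insert(root->left, key);
    else if (key > root->key)
        root->right = bst_insert(root->right, key);
    return root;                          /* equal keys are ignored */
}

/* Write the keys in in-order into out starting at index count;
   returns the index after the last key written. */
int bst_inorder(const struct bst *root, int *out, int count)
{
    if (root == NULL) return count;
    count = bst_inorder(root->left, out, count);
    out[count++] = root->key;
    return bst_inorder(root->right, out, count);
}
```

After inserting 30, 20, 39, 10, 25, 35, 42, 15, 23 in that order, `bst_inorder` fills the output array with 10, 15, 20, 23, 25, 30, 35, 39, 42.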
Searching in a Binary Search Tree
Suppose the data elements are - 45, 15, 79, 90, 10, 55, 12, 20, 50
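Searching follows a single path from the root, going left when the key is smaller than the current node and right when it is larger. The sketch below uses hypothetical names (`bst`, `bst_insert`, `bst_search`) and assumes the tree is built by inserting the listed elements in the given order.

```c
#include <assert.h>
#include <stdlib.h>

struct bst {
    int key;
    struct bst *left, *right;
};

/* Insert key into the BST rooted at root and return the (possibly new) root. */
struct bst *bst_insert(struct bst *root, int key)
{
    if (root == NULL) {
        struct bst *n = malloc(sizeof *n);
        assert(n != NULL);
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    if (key < root->key)
        root->left = bst_insert(root->left, key);
    else if (key > root->key)
        root->right = bst_insert(root->right, key);
    return root;
}

/* Iterative search: returns the node holding key, or NULL if it is absent. */
struct bst *bst_search(struct bst *root, int key)
{
    while (root != NULL && root->key != key)
        root = key < root->key ? root->left : root->right;
    return root;
}
```

With the elements 45, 15, 79, 90, 10, 55, 12, 20, 50, a search for 55 visits 45, 79 and then 55, while a search for 60 runs off the tree below 55 and returns NULL.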
Given a BST, the task is to delete a node in this BST, which can be broken
down into 3 cases:
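Case 1. Delete a Leaf Node in BST
Case 2. Delete a Node with Single Child in BST
Deleting a node with a single child is also simple in a BST. Link the node's only child to
the node's parent in place of the node, and delete the node.
Case 3. Delete a Node with Both Children in BST
Replace the key of the node with that of its in-order successor (or in-order predecessor),
and then delete that successor (or predecessor) node.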
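A recursive sketch of deletion is shown below. The names are hypothetical, and for a node with two children the sketch copies the in-order successor into the node and then deletes the successor, which is one common choice (the in-order predecessor would work equally well).

```c
#include <assert.h>
#include <stdlib.h>

struct bst {
    int key;
    struct bst *left, *right;
};

struct bst *bst_insert(struct bst *root, int key)
{
    if (root == NULL) {
        struct bst *n = malloc(sizeof *n);
        assert(n != NULL);
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    if (key < root->key)
        root->left = bst_insert(root->left, key);
    else if (key > root->key)
        root->right = bst_insert(root->right, key);
    return root;
}

/* Delete key from the BST rooted at root and return the new subtree root. */
struct bst *bst_delete(struct bst *root, int key)
{
    if (root == NULL) return NULL;               /* key not present */
    if (key < root->key) {
        root->left = bst_delete(root->left, key);
    } else if (key > root->key) {
        root->right = bst_delete(root->right, key);
    } else if (root->left == NULL || root->right == NULL) {
        /* Leaf or single child: the child (possibly NULL) takes the node's place. */
        struct bst *child = root->left != NULL ? root->left : root->right;
        free(root);
        return child;
    } else {
        /* Two children: copy the in-order successor, then delete it. */
        struct bst *succ = root->right;
        while (succ->left != NULL) succ = succ->left;
        root->key = succ->key;
        root->right = bst_delete(root->right, succ->key);
    }
    return root;
}

int bst_inorder(const struct bst *root, int *out, int count)
{
    if (root == NULL) return count;
    count = bst_inorder(root->left, out, count);
    out[count++] = root->key;
    return bst_inorder(root->right, out, count);
}
```

In a tree built from 45, 15, 79, 90, 10, 55, 12, 20, 50, deleting 15 (two children), 55 (single child) and 12 (leaf) exercises all three situations and leaves the keys 10, 20, 45, 50, 79, 90 in in-order.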
A binary tree with n nodes has 2n pointers, out of which n+1 are always
NULL, so about half of the space allocated for pointers is
wasted. We can use this wasted space to hold some useful information.
A left NULL pointer can be used to store the address of inorder predecessor
of the node and a right NULL pointer can be used to store the address of
inorder successor of the node.
These pointers are called threads and a binary tree which implements these
pointers is called a threaded binary tree.
Types of Threaded Binary Tree
Depending on the type of threading, there are two types of threaded binary
tree:
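#include <stdbool.h>

struct node
{
    struct node *left;   /* left child, or thread to the inorder predecessor */
    bool lthread;        /* true if left is a thread */
    int info;            /* data stored in the node */
    bool rthread;        /* true if right is a thread */
    struct node *right;  /* right child, or thread to the inorder successor */
};
[Figure: Fully in-threaded binary tree]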
If the left pointer of the node is a thread, then thread will point to inorder
predecessor. If the left pointer is not a thread i.e. node has a left child then
to find the inorder predecessor, we move to this left child and keep on
moving right till we find a node with no right child.
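The rule for finding the inorder predecessor can be sketched in a few lines. The function name `inorder_predecessor` is hypothetical, and the sketch assumes a tree without a header node, so the left thread of the first node in in-order is NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node {
    struct node *left;   /* left child, or thread to the inorder predecessor */
    bool lthread;        /* true if left is a thread */
    int info;
    bool rthread;        /* true if right is a thread */
    struct node *right;  /* right child, or thread to the inorder successor */
};

/* Inorder predecessor of p in a fully in-threaded tree (NULL if there is none). */
struct node *inorder_predecessor(const struct node *p)
{
    assert(p != NULL);
    if (p->lthread)
        return p->left;              /* the thread points at the predecessor */
    struct node *q = p->left;        /* otherwise start at the left child ... */
    while (q != NULL && !q->rthread)
        q = q->right;                /* ... and move right while a right child exists */
    return q;
}
```

In a threaded tree with root 20, children 10 and 30, and 15 as the right child of 10, the predecessor of 20 is found by moving to 10 and then right to 15, whose right pointer is a thread.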
The technique for balancing a binary search tree was introduced by the Soviet
mathematicians G. M. Adelson-Velsky and E. M. Landis in 1962.
AVL tree is a self-balancing binary search tree in which the heights of the
two sub-trees of a node may differ by at most one. Because of this property,
AVL tree is also known as a height-balanced tree.
The key advantage of using an AVL tree is that it takes O(log n) time to
perform search, insertion and deletion operations in the average case as well as
the worst case (because the height of the tree is limited to O(log n)).
The structure of an AVL tree is same as that of a binary search tree but with
a little difference. In its structure, it stores an additional variable called the
BalanceFactor.
The balance factor of a node is calculated by subtracting the height of its
right sub-tree from the height of its left sub-tree.
Balance factor = Height (left sub-tree) – Height (right sub-tree)
A binary search tree in which every node has a balance factor of -1, 0 or 1 is
said to be height balanced. A node with any other balance factor is considered
to be unbalanced and requires rebalancing.
bf = hl - hr ∈ {-1, 0, 1}
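The balance factor follows directly from the heights of the two sub-trees. The sketch below uses hypothetical names (`avl`, `height`, `balance_factor`, `is_height_balanced`) and assumes that an empty tree has height 0 and a single node has height 1. It recomputes heights on every call for clarity; practical implementations usually store the height or balance factor in each node.

```c
#include <assert.h>
#include <stddef.h>

struct avl {
    int key;
    struct avl *left, *right;
};

/* Height in levels: 0 for an empty tree, 1 for a single node. */
int height(const struct avl *t)
{
    if (t == NULL) return 0;
    int hl = height(t->left), hr = height(t->right);
    return 1 + (hl > hr ? hl : hr);
}

/* Balance factor = Height(left sub-tree) - Height(right sub-tree). */
int balance_factor(const struct avl *t)
{
    assert(t != NULL);
    return height(t->left) - height(t->right);
}

/* Returns 1 if every node has a balance factor of -1, 0 or 1, otherwise 0. */
int is_height_balanced(const struct avl *t)
{
    if (t == NULL) return 1;
    int bf = balance_factor(t);
    return bf >= -1 && bf <= 1
        && is_height_balanced(t->left)
        && is_height_balanced(t->right);
}
```

For a chain of three nodes linked only through left pointers, the middle node has balance factor 1 (left-heavy) and the top node has balance factor 2, so the chain is not height balanced.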
If the balance factor of a node is 1, then it means that the left sub-tree of the
tree is one level higher than that of the right sub-tree. Such a tree is called
Left-heavy tree.
If the balance factor of a node is 0, then it means that the height of the left
sub-tree is equal to the height of its right sub-tree.
If the balance factor of a node is -1, then it means that the left sub-tree of
the tree is one level lower than that of the right sub-tree. Such a tree is
called Right-heavy tree.
Prepared by: Dr Deepak Gupta, CSED, MNNIT Allahabad, India
Types of Rotations
1. LL Rotation: Inserted node is in the left subtree of left subtree of C
3. Insert A
4. Insert E
On inserting E, the BST becomes unbalanced as the balance factor of I is 2. If we travel from E
to I, we find that E was inserted in the right subtree of the left subtree of I, so we perform
an LR rotation on node I. LR rotation = RR rotation + LL rotation.
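The rotations themselves are short pointer manipulations. The sketch below is an assumption-laden illustration: the function names are hypothetical, balance factors are not stored or updated, and the tree in the usage note (I at the root with left child B, right child J, B's children A and H, and E as the left child of H) is a guess at the shape of the missing figure just after E is inserted.

```c
#include <assert.h>
#include <stddef.h>

struct avl {
    char key;
    struct avl *left, *right;
};

/* Single right rotation (fixes the LL case); returns the new subtree root. */
struct avl *rotate_right(struct avl *a)
{
    assert(a != NULL && a->left != NULL);
    struct avl *b = a->left;
    a->left = b->right;
    b->right = a;
    return b;
}

/* Single left rotation (fixes the RR case); returns the new subtree root. */
struct avl *rotate_left(struct avl *a)
{
    assert(a != NULL && a->right != NULL);
    struct avl *b = a->right;
    a->right = b->left;
    b->left = a;
    return b;
}

/* LR case: left-rotate the left child, then right-rotate the node itself. */
struct avl *rotate_left_right(struct avl *a)
{
    assert(a != NULL);
    a->left = rotate_left(a->left);
    return rotate_right(a);
}
```

Applying `rotate_left_right` to I in the assumed tree makes H the new root of the subtree, with B (children A and E) on its left and I (right child J) on its right.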
5. Insert C, F, D
Deleting a node from an AVL tree is similar to that in a binary search tree.
Deletion may disturb the balance factor of an AVL tree and therefore the tree
needs to be rebalanced in order to maintain the AVLness.
For this purpose, we need to perform rotations. The two types of rotations
are L rotation and R rotation. Here, we will discuss R rotations. L rotations
are the mirror images of them.
If the node to be deleted is present in the left sub-tree of the critical
node, then an L rotation needs to be applied. If the node to be deleted is
present in the right sub-tree of the critical node, then an R rotation needs to
be applied.
Let us consider that, A is the critical node and B is the root node of its left
sub-tree. If node X, present in the right sub-tree of A, is to be deleted, then
there can be three different situations:
1. R0 rotation (Node B has balance factor 0 )
If node B has a balance factor of 0, and the balance factor of node A is
disturbed upon deleting node X, then the tree is rebalanced by
applying an R0 rotation.
Example: Delete node 30 from the AVL tree shown in the following image.
2. R1 Rotation (Node B has balance factor 1)
Huffman Coding
a: 05 b: 48 c: 07 d: 17 e: 10 f: 13
Sort the nodes based on their frequencies. Further, consider the two nodes
having minimum frequencies:
a: 05 c: 07 e: 10 f: 13 d: 17 b: 48
Connect these two nodes at a newly created common node that will store
only the sum of frequencies of all the nodes connected below it.
12 (children a: 05 and c: 07)    e: 10    f: 13    d: 17    b: 48
Repeat the same process. The two nodes having minimum frequencies are now e: 10 and
the node 12; connect them at a new node 22:
22 (children 12 and e: 10; 12 has children a: 05 and c: 07)    f: 13    d: 17    b: 48
Further, consider the two nodes having minimum frequencies, f: 13 and d: 17, and
connect them at a new node 30:
22 (children 12 and e: 10; 12 has children a: 05 and c: 07)    30 (children f: 13 and d: 17)    b: 48
Further, consider the two nodes having minimum frequencies, 22 and 30, and
connect them at a new node 52:
52 (children 22 and 30)    b: 48
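Further, consider the two nodes having minimum frequencies, 52 and b: 48, and
connect them at the root node 100. Each left edge is labelled 0 and each right edge 1.
[Figure: final Huffman tree. 100 → 52 (0), b: 48 (1); 52 → 22 (0), 30 (1); 22 → 12 (0), e: 10 (1); 30 → f: 13 (0), d: 17 (1); 12 → a: 05 (0), c: 07 (1)]
Character    Code
a            0000
b            1
c            0001
d            011
e            001
f            010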
Character    Frequency    Code    Size (bits)
a            5            0000    5*4 = 20
b            48           1       48*1 = 48
c            7            0001    7*4 = 28
d            17           011     17*3 = 51
e            10           001     10*3 = 30
f            13           010     13*3 = 39
Totals: characters 6 * 8 = 48 bits; codes 18 bits; encoded message 216 bits
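Without encoding, the total size of the string was 100 * 8 = 800 bits (100 characters
at 8 bits each). After encoding, the size is reduced to 48 + 18 + 216 = 282 bits
(the characters, their codes, and the encoded message).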
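The whole procedure can be sketched in C. This is an illustration rather than a reference implementation: all names are hypothetical, the minimum is found by a simple linear scan instead of a priority queue, and at most 64 symbols are supported. Which child ends up on the left or right may differ from the figure, so the 0/1 bit patterns can differ too, but with these frequencies the code lengths and the total encoded size come out the same.

```c
#include <assert.h>
#include <stdlib.h>

struct hnode {
    int freq;
    char symbol;                  /* meaningful only for leaves */
    struct hnode *left, *right;
};

struct hnode *new_hnode(int freq, char symbol, struct hnode *l, struct hnode *r)
{
    struct hnode *n = malloc(sizeof *n);
    assert(n != NULL);
    n->freq = freq;
    n->symbol = symbol;
    n->left = l;
    n->right = r;
    return n;
}

/* Remove and return the node with the smallest frequency from nodes[0..*n-1]. */
struct hnode *extract_min(struct hnode *nodes[], int *n)
{
    int m = 0;
    for (int i = 1; i < *n; i++)
        if (nodes[i]->freq < nodes[m]->freq) m = i;
    struct hnode *min = nodes[m];
    (*n)--;
    nodes[m] = nodes[*n];         /* fill the gap with the last node */
    return min;
}

/* Repeatedly join the two least frequent nodes under a new node (1 <= count <= 64). */
struct hnode *build_huffman(const char symbols[], const int freqs[], int count)
{
    struct hnode *nodes[64];
    assert(count >= 1 && count <= 64);
    for (int i = 0; i < count; i++)
        nodes[i] = new_hnode(freqs[i], symbols[i], NULL, NULL);
    int n = count;
    while (n > 1) {
        struct hnode *a = extract_min(nodes, &n);
        struct hnode *b = extract_min(nodes, &n);
        nodes[n++] = new_hnode(a->freq + b->freq, '\0', a, b);
    }
    return nodes[0];
}

/* Code length of symbol = depth of its leaf; -1 if the symbol is not in the tree. */
int code_length(const struct hnode *t, char symbol, int depth)
{
    if (t == NULL) return -1;
    if (t->left == NULL && t->right == NULL)
        return t->symbol == symbol ? depth : -1;
    int d = code_length(t->left, symbol, depth + 1);
    return d >= 0 ? d : code_length(t->right, symbol, depth + 1);
}

/* Size of the encoded message in bits: sum of frequency * code length over all leaves. */
int encoded_bits(const struct hnode *t, int depth)
{
    if (t == NULL) return 0;
    if (t->left == NULL && t->right == NULL) return t->freq * depth;
    return encoded_bits(t->left, depth + 1) + encoded_bits(t->right, depth + 1);
}
```

For the frequencies a: 5, b: 48, c: 7, d: 17, e: 10, f: 13, the root has frequency 100, the code lengths are 4, 1, 4, 3, 3, 3, and `encoded_bits(root, 0)` gives 216, matching the table.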