Information Entropy Fundamentals
UNIT I
Information entropy fundamentals: Information - entropy - properties of information and entropy -
relation between information and probability - mutual and self-information - coding theory - code
efficiency and redundancy - Shannon's theorem - construction of basic codes - Shannon-Fano
coding, Huffman coding - arithmetic coding.
1.1 INTRODUCTION
The performance of a communication system is measured in terms of its error probability. The
performance of the system depends upon the available signal power, the channel noise and the bandwidth;
based on these parameters it is possible to establish the conditions for error-free transmission.
Information theory provides the mathematical modelling and analysis of communication systems.
The amount of information carried by a message mk with probability of occurrence Pk is Ik = log2(1/Pk) bits.
Example 1: Calculate the amount of information carried by a message whose probability of occurrence is Pk = 1/4.
Solution:
Ik = log2(1/Pk) = log2(4) = 2 bits
Example 2: Calculate the amount of information if binary digits (binits) occur with equal
likelihood in binary PCM.
Solution:
In binary PCM there are only two binary levels, i.e. 1 or 0. Since they occur with equal
likelihood, their probabilities of occurrence are P1 = P2 = 1/2, and the amount of information
carried by each binit is
Ik = log2(1/Pk) = log2(2) = 1 bit
More generally, for M equally likely messages the probability of each message is Pk = 1/M, hence the above
equation becomes Ik = log2(M). We know that M = 2^N when each message is represented by N binits, hence
Ik = log2(2^N) = N bits
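These results can be checked with a few lines of Python; the helper name self_information is just an illustrative choice:

```python
import math

def self_information(p):
    """Ik = log2(1/Pk), the information in bits carried by a message of probability p."""
    return math.log2(1.0 / p)

print(self_information(1 / 4))    # Example 1: 2.0 bits
print(self_information(1 / 2))    # Example 2: 1.0 bit per binit in binary PCM

# Additivity for independent messages: I(m1 and m2) = I1 + I2.
p1, p2 = 1 / 4, 1 / 8
print(self_information(p1 * p2), self_information(p1) + self_information(p2))   # 5.0 5.0
```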
2. Prove the statement
“If receiver knows the message being transmitted, the amount of information carried is zero”.
Solution: Here it is stated that the receiver "knows" the message. This means only one message is
transmitted. Hence the probability of occurrence of this message is Pk = 1, because only one
message exists and its occurrence is certain. The amount of information carried by this message is
Ik = log2(1/Pk) = log2(1) = 0 bits
This proves the statement that if the receiver knows the message, the amount of information carried is zero.
As Pk is decreased from 1 to 0, Ik increases monotonically from 0 to infinity. This shows that the amount
of information conveyed is greater when the receiver correctly identifies less likely messages.
3. Prove the statement
"If I1 is the information carried by message m1 and I2 is the information carried by message
m2, then the amount of information carried compositely due to m1 and m2 is I1,2 = I1 + I2."
Proof: Assume m1 and m2 are independent, so that the probability of the composite message is P1 P2. Then
I1,2 = log2(1/(P1 P2)) = log2(1/P1) + log2(1/P2) = I1 + I2
Hence proved.
1.4 ENTROPY
Entropy of a source is a measure of the average information per message. If a source emits M messages
with probabilities P1, P2, ..., PM, the entropy is
H = Σk Pk log2(1/Pk) bits/message
Basically, source codes try to reduce the redundancy present in the source and represent
the source with fewer bits that carry more information.
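A minimal sketch of this entropy calculation in Python, assuming the source is described simply by a list of symbol probabilities:

```python
import math

def entropy(probs):
    """H = sum of Pk * log2(1/Pk) in bits/symbol; terms with Pk = 0 contribute nothing."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# A source emitting four messages with unequal probabilities:
print(entropy([0.5, 0.25, 0.125, 0.125]))   # 1.75 bits/symbol
```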
Properties: The entropy is zero when the outcome is certain.
Consider the first case, when one message has Pk = 1 (and all other messages have zero probability).
Its contribution to the entropy is Pk log2(1/Pk) = 1 · log2(1) = 0, so
H = 0
Now consider the second case, when Pk = 0. Instead of putting Pk = 0 directly, let us consider
the limiting case, i.e.
lim(Pk→0) Pk log2(1/Pk)
The RHS of the above equation is zero as Pk → 0. Hence the entropy contribution is again zero, and
H = 0
1. Prove the statement
"If there are M equally likely messages, then the entropy of the source is log2 M."
Proof:
We know that for M equally likely messages, the probability of each message is P = 1/M.
This probability is the same for all M messages:
P1 = P2 = P3 = ... = PM = 1/M
Entropy is given by
H = Σk Pk log2(1/Pk) = M · (1/M) · log2(M) = log2 M
Hence proved.
2. Prove that the upper bound on entropy is given by H(X) ≤ log2 M, i.e. Hmax = log2 M. Here M is the
number of messages emitted by the source.
H(X) = Σk Pk log2(1/Pk) ……….(1)
This states that the entropy of a zero-memory information source with M symbols becomes maximum
if and only if all the source symbols are equiprobable. The source emits M symbols
(X = s0, s1, ..., sM−1) with probabilities {P0, P1, ..., PM−1}.
From equation (1),
H(X) − log2 M = Σk Pk log2(1/Pk) − log2 M ……….(2)
Multiplying log2 M in equation (2) by 1 and replacing 1 by Σk Pk (since Σk Pk = 1, the sum of the probabilities),
H(X) − log2 M = Σk Pk log2(1/Pk) − Σk Pk log2 M = Σk Pk log2(1/(M Pk))
To convert this equality sign into an inequality sign, the property of the natural logarithm,
ln x ≤ x − 1 (with equality only at x = 1), is used:
Σk Pk log2(1/(M Pk)) ≤ (1/ln 2) Σk Pk [1/(M Pk) − 1] = (1/ln 2) (Σk 1/M − Σk Pk) = (1/ln 2)(1 − 1) = 0
Hence H(X) ≤ log2 M, with equality (Hmax = log2 M) if and only if Pk = 1/M for all k.
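The bound can be checked numerically: for any distribution over M symbols the entropy stays at or below log2 M and reaches it only in the equiprobable case. A small illustrative check:

```python
import math

def entropy(probs):
    """Same entropy helper as above: H = sum of Pk * log2(1/Pk) in bits."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

M = 4
equiprobable = [1 / M] * M
skewed = [0.7, 0.1, 0.1, 0.1]

print(entropy(equiprobable), math.log2(M))   # 2.0 2.0  -> H = log2 M when equiprobable
print(entropy(skewed), math.log2(M))         # about 1.357 vs 2.0 -> H < log2 M otherwise
```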
1.6.1 Joint entropy
The joint entropy of two discrete random variables X and Y is the entropy of their pairing (X, Y):
H(X, Y) = − Σx Σy p(x, y) log2 p(x, y)
If X and Y are independent, their joint entropy is the sum of their individual entropies.
For example, if (X, Y) represents the position of a chess piece, X the row and Y the column, then the
joint entropy of the row of the piece and the column of the piece is the entropy of the position of
the piece.
Despite similar notation, joint entropy should not be confused with cross-entropy.
The conditional entropy or conditional uncertainty of X given the random variable Y (also called the
equivocation of X about Y) is the average over Y of the entropy of X conditioned on each value of Y:
H(X|Y) = Σy p(y) H(X | Y = y) = − Σx Σy p(x, y) log2 p(x|y)
Because entropy can be conditioned on a random variable or on that random variable taking a certain
value, care should be taken not to confuse these two definitions of conditional entropy, the former of
which is in more common use. A basic property of this form of conditional entropy is that:
H(X|Y) = H(X, Y) − H(Y)
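A short sketch that computes the joint, marginal and conditional entropies from a joint probability table and confirms H(X|Y) = H(X, Y) − H(Y); the joint distribution used here is only an illustrative assumption:

```python
import math

def H(probs):
    """Entropy in bits of a collection of probabilities."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Illustrative joint pmf p(x, y) for X in {0, 1} and Y in {0, 1}.
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

H_XY = H(p_xy.values())                                        # joint entropy H(X, Y)
p_y = {y: sum(p for (x, yy), p in p_xy.items() if yy == y) for y in (0, 1)}
H_Y = H(p_y.values())                                          # marginal entropy H(Y)

# Conditional entropy H(X|Y): average over y of the entropy of X given Y = y.
H_X_given_Y = sum(p_y[y] * H([p_xy[(x, y)] / p_y[y] for x in (0, 1)])
                  for y in (0, 1))

print(round(H_X_given_Y, 6), round(H_XY - H_Y, 6))             # H(X|Y) = H(X,Y) - H(Y)
```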
Mutual information I(X;Y) represents the reduction in uncertainty about the channel input that results
from observing the channel output:
I(X;Y) = H(X) − H(X|Y) ……….(1)
The mutual information I(X;Y) has the following important properties:
1. The mutual information of a channel is symmetric, that is
I(X;Y) = I(Y;X)
where I(X;Y) represents the amount of uncertainty about the channel input that is resolved by observing
the channel output, and I(Y;X) is the amount of uncertainty about the channel output that is resolved by
sending the channel input.
I(Y;X) = H(Y) − H(Y|X) ……….(2)
Substituting the expansions of equations (1) and (2) and interchanging the order of summation, both
expressions reduce to
I(X;Y) = Σx Σy p(x, y) log2 [ p(x, y) / (p(x) p(y)) ]
which is symmetric in x and y. Hence proved.
2. The mutual information is always nonnegative, that is, I(X;Y) ≥ 0.
The joint probability can be written as
p(x, y) = p(x|y) p(y) ……….(3)
so that
I(X;Y) = Σx Σy p(x, y) log2 [ p(x|y) / p(x) ] = Σx Σy p(x, y) log2 [ p(x, y) / (p(x) p(y)) ] ……….(4)
Applying the inequality ln z ≤ z − 1 to −I(X;Y) = Σx Σy p(x, y) log2 [ p(x) p(y) / p(x, y) ] shows that
−I(X;Y) ≤ 0, i.e. I(X;Y) ≥ 0. I(X;Y) is zero only when the input and output symbols are statistically
independent, that is, when p(x, y) = p(x) p(y).
3. The mutual information of a channel is related to the joint entropy of the channel input
and channel output by
I(X;Y) = H(X) + H(Y) − H(X, Y)
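The three properties can be checked numerically for the same kind of joint distribution. The sketch below evaluates I(X;Y) directly from its defining sum and compares it with H(X) + H(Y) − H(X, Y); the joint pmf is again an illustrative assumption:

```python
import math

def H(probs):
    """Entropy in bits of a collection of probabilities."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

def mutual_information(p_xy):
    """I(X;Y) = sum over (x, y) of p(x,y) * log2[ p(x,y) / (p(x) p(y)) ]."""
    xs = sorted({x for x, _ in p_xy})
    ys = sorted({y for _, y in p_xy})
    p_x = {x: sum(p_xy[(x, y)] for y in ys) for x in xs}
    p_y = {y: sum(p_xy[(x, y)] for x in xs) for y in ys}
    return sum(p * math.log2(p / (p_x[x] * p_y[y]))
               for (x, y), p in p_xy.items() if p > 0)

# Illustrative joint pmf of channel input X and channel output Y.
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
p_yx = {(y, x): p for (x, y), p in p_xy.items()}   # roles of X and Y swapped

I_xy = mutual_information(p_xy)
print(I_xy >= 0)                                             # property 2: nonnegativity
print(round(I_xy, 6), round(mutual_information(p_yx), 6))    # property 1: symmetry

# Property 3: I(X;Y) = H(X) + H(Y) - H(X,Y); both marginals here are {0.5, 0.5}.
print(round(I_xy, 6), round(H([0.5, 0.5]) + H([0.5, 0.5]) - H(p_xy.values()), 6))
```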
Coding theory is one of the most important and direct applications of information theory. It can be
subdivided into source coding theory and channel coding theory. Using a statistical description for
data, information theory quantifies the number of bits needed to describe the data, which is the
information entropy of the source.
Data compression (source coding): There are two formulations for the compression problem:
1. Lossless data compression: the data must be reconstructed exactly.
2. Lossy data compression: bits are allocated so that the data can be reconstructed to within a
specified fidelity level, measured by a distortion function.
This division of coding theory into compression and transmission is justified by the information
transmission theorems, or source–channel separation theorems that justify the use of bits as the
universal currency for information in many contexts. However, these theorems only hold in the
situation where one transmitting user wishes to communicate to one receiving user. In scenarios with
more than one transmitter (the multiple-access channel), more than one receiver (the broadcast
channel) or intermediary "helpers" (the relay channel), or more general networks, compression
followed by transmission may no longer be optimal. Network information theory refers to these
multi-agent communication models.
Source coding refers to the conversion of the symbols of a source into binary data suitable for
transmission. The objective of source coding is to minimize the average bit rate required to
represent the source. Code length and code efficiency are the terms related to source coding.
The design of a variable-length code such that its average codeword length approaches the entropy
of the discrete memoryless source (DMS) is often referred to as entropy coding. There are two common types:
1. Shannon-Fano coding
2. Huffman coding
Code efficiency
The code efficiency is denoted by η and is defined as η = Lmin / L, where Lmin is the minimum
possible value of the average codeword length L. When η approaches unity, the code is said to be
efficient (i.e. η = 1). Since, by Shannon's source coding theorem, Lmin = H(X), the efficiency can be written as
η = H(X) / L
The code redundancy is defined as 1 − η.
SHANNON-FANO CODING
This method of encoding is directed towards constructing a reasonably efficient separable binary
code for a source without memory. Let {S} be the ensemble of messages to be transmitted and {P} be
their corresponding probabilities. The code must satisfy two requirements:
1. No codeword Ck can be obtained from another codeword by adding more binary bits to
the shorter sequence (prefix-free property).
2. The transmission of the encoded message is reasonably efficient, that is, 1 and 0 appear
independently and with almost equal probabilities.
Coding algorithm:
Step 1: List the symbols in descending order of their probabilities.
Step 2: Partition the symbol set into the two most nearly equiprobable subsets {x1} and {x2}.
Step 3: Assign '0' to each symbol contained in one subset and '1' to each symbol in the other subset.
Step 4: Repeat the same procedure for the subsets {x1} and {x2} until each subset contains a single
symbol; that is, {x1} is partitioned into the subsets {x11} and {x12}, and the codeword
corresponding to a message in {x12} will begin with 01.
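The partitioning procedure above can be sketched in a few lines of Python. This is only an illustrative implementation (the function name shannon_fano and the tie-breaking rule for the split point are assumptions); as a test case it uses the six-symbol source from Problem 2 later in this unit and reproduces its codeword lengths.

```python
def shannon_fano(symbols):
    """symbols: list of (symbol, probability) pairs sorted in descending probability.
    Returns a dict mapping each symbol to its binary codeword."""
    codes = {s: "" for s, _ in symbols}

    def split(group):
        if len(group) <= 1:
            return
        total = sum(p for _, p in group)
        # Choose the split index that makes the two subsets most nearly equiprobable.
        running, best_i, best_diff = 0.0, 1, float("inf")
        for i in range(1, len(group)):
            running += group[i - 1][1]
            diff = abs(running - (total - running))
            if diff < best_diff:
                best_i, best_diff = i, diff
        upper, lower = group[:best_i], group[best_i:]
        for s, _ in upper:
            codes[s] += "0"
        for s, _ in lower:
            codes[s] += "1"
        split(upper)
        split(lower)

    split(symbols)
    return codes

src = [("x1", 0.30), ("x2", 0.25), ("x3", 0.20), ("x4", 0.12), ("x5", 0.08), ("x6", 0.05)]
print(shannon_fano(src))
# x1: '00', x2: '01', x3: '10', x4: '110', x5: '1110', x6: '1111' -> lengths 2, 2, 2, 3, 4, 4
```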
HUFFMAN CODING
The Huffman procedure repeatedly combines the two symbols of lowest probability into a single combined
symbol, reorders the list at each splitting stage, and assigns 0 and 1 to the two entries combined at each stage.
Step 7: Read the codeword of each symbol from the last splitting stage.
Problem 1: A discrete memoryless source emits five symbols with probabilities {0.4, 0.1, 0.2, 0.1, 0.2}.
Find the Huffman code and its average length, placing the combined symbol as high as possible.
Solution:
L = Σk Pk nk = 2.2 bits/symbol
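A minimal Huffman sketch using Python's heapq module; it tracks only the codeword lengths, which is enough to recover the average length. The function and variable names are illustrative, and tie-breaking may differ from the hand construction that places the combined symbol as high as possible, but the average length of 2.2 bits/symbol is the same.

```python
import heapq

def huffman_lengths(probs):
    """Return Huffman codeword lengths for a list of symbol probabilities."""
    # Heap entries: (probability, unique id, list of symbol indices in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, i2, s2 = heapq.heappop(heap)
        # Every symbol in the two merged subtrees gains one bit of codeword length.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, i2, s1 + s2))
    return lengths

probs = [0.4, 0.1, 0.2, 0.1, 0.2]
lengths = huffman_lengths(probs)
L = sum(p * n for p, n in zip(probs, lengths))
print(lengths, round(L, 2))   # average length 2.2 bits/symbol
```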
Problem 2: Compare the Huffman coding and Shannon-Fano coding algorithms for data
compression. For a discrete memoryless source X with six symbols x1, x2, ..., x6, find a
compact code for every symbol if the probability distribution is as follows:
P(x1) = 0.30, P(x2) = 0.25, P(x3) = 0.20, P(x4) = 0.12, P(x5) = 0.08, P(x6) = 0.05
Calculate the entropy of the source, the average length of the code, and the efficiency and
redundancy of the code.
Solution:
i) Entropy of the source:
H(X) = Σk Pk log2(1/Pk)
= 0.3 log2(1/0.3) + 0.25 log2(1/0.25) + 0.2 log2(1/0.2) + 0.12 log2(1/0.12) + 0.08 log2(1/0.08) + 0.05 log2(1/0.05)
= 2.36 bits/symbol
ii) Shannon-Fano coding of the ordered symbols (by the algorithm above) gives the codeword lengths
n1 = 2, n2 = 2, n3 = 2, n4 = 3, n5 = 4, n6 = 4.
iii) The average length of the code is given as
L = Σk Pk nk
Putting the values in the above equation,
L = (0.3)(2) + (0.25)(2) + (0.2)(2) + (0.12)(3) + (0.08)(4) + (0.05)(4)
= 2.38 bits/symbol
iv) To obtain the code efficiency:
Code efficiency is given by
η = H(X) / L = 2.36 / 2.38
= 0.99
v) To obtain the redundancy of the code:
Redundancy is given as
1 − η = 1 − 0.99 = 0.01
Here 0.01 indicates that 1% of the bits in the code are redundant.
vi) Huffman coding:
1) To obtain the codewords, the two least probable symbols are repeatedly combined as described
above. For this source, Huffman coding also yields codeword lengths of 2, 2, 2, 3, 4 and 4 bits, so its
average length, efficiency and redundancy equal those of the Shannon-Fano code.
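For reference, the figures in this worked example can be reproduced with a short sketch that uses the probabilities and the codeword lengths quoted above (the same lengths apply to both codes here):

```python
import math

probs = [0.30, 0.25, 0.20, 0.12, 0.08, 0.05]   # P(x1) ... P(x6)
lengths = [2, 2, 2, 3, 4, 4]                   # codeword lengths nk

H = sum(p * math.log2(1.0 / p) for p in probs)    # entropy of the source
L = sum(p * n for p, n in zip(probs, lengths))    # average codeword length
efficiency = H / L
redundancy = 1 - efficiency

print(round(H, 2), round(L, 2), round(efficiency, 2), round(redundancy, 2))
# 2.36 2.38 0.99 0.01
```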
ARITHMETIC CODING
Principle:
In arithmetic coding the entire message is mapped to a single sub-interval of [0, 1); any number inside
that interval serves as the codeword, so codes are calculated for complete (usually short) messages
rather than for individual symbols.
The number of characters that can be encoded in one message depends upon the precision of the number
that can be represented.
As the length of the message increases, the number of significant digits in the codeword also increases.
Example 1:
With the following symbols and their probabilities of occurrence, encode the message "went#"
using the arithmetic coding algorithm.
Symbol:       e     n     t     w     #
Probability:  0.3   0.3   0.2   0.1   0.1
Solution:
Step 1: First arrange all the probabilities in cumulative form.
In the above figure, observe that the probability of 'e' is assigned the range 0 to 0.3. Similarly,
'n' is assigned the range 0.3 to 0.6, 't' the range 0.6 to 0.8, 'w' the range 0.8 to 0.9 and '#' the
range 0.9 to 1.0.
Step 2: Encoding of 'w' and 'e' in 'went#'
The first character of the string 'went#' is 'w'. Observe that 'w' lies in the cumulative probability
range 0.8 to 0.9, hence the final codeword will lie in the range 0.8 to 0.9; the total range of
probabilities is now 0.8 to 0.9 only. This range is subdivided in the same proportions to encode the
second character 'e', giving the range 0.8 to 0.83.
Based on the above calculations, the cumulative probabilities are indicated in the figure.
Step 3: Encoding of 'n', the third character
The range 0.8 to 0.83 is again subdivided in proportion to the symbol probabilities, and 'n' occupies
the sub-range 0.809 to 0.818.
Fig. c: Cumulative probabilities of occurrence at the 3rd character place
Step 4: Encoding of 't', the fourth character
In Fig. c, observe that 'n' has the cumulative probability range 0.809 to 0.818 (width 0.009). Hence,
to determine the range of the 4th character in the string after 'wen', this range is subdivided as follows:
Character   Cumulative probability
e           0.809 + 0.3 × 0.009 = 0.8117
n           0.8117 + 0.3 × 0.009 = 0.8144
t           0.8144 + 0.2 × 0.009 = 0.8162
w           0.8162 + 0.1 × 0.009 = 0.8171
#           0.8171 + 0.1 × 0.009 = 0.818
Based on the above calculations, the cumulative probabilities are indicated in figure.
th
Fig d: cumulative probabilities of occurrence at 4 character place
Above figure shows that the codeword for the string ‗went‘ will lie in between 0.8144≤codeword
≤0.8162
Step 5: Encoding of the complete string
In Fig. d, the cumulative probability range of 't' is 0.8144 to 0.8162 (width 0.0018). Hence we have
to determine the range of the 5th character after 'went':
Character   Cumulative probability
e           0.8144 + 0.3 × 0.0018 = 0.81494
n           0.81494 + 0.3 × 0.0018 = 0.81548
t           0.81548 + 0.2 × 0.0018 = 0.81584
w           0.81584 + 0.1 × 0.0018 = 0.81602
#           0.81602 + 0.1 × 0.0018 = 0.8162
Fig. e: Cumulative probabilities of occurrence at the 5th character place
The above figure shows that the codeword for the string 'went#' will lie in the range
0.81602 ≤ codeword ≤ 0.8162
Any number in the above range will represent the string 'went#'.
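The interval obtained by hand above can be reproduced with a short encoder sketch. The symbol table and the message are taken from the example; the function names are illustrative. The printed interval matches 0.81602 ≤ codeword ≤ 0.8162 up to floating-point rounding.

```python
# Symbol probabilities in the order listed in Step 1 of the example.
probs = {"e": 0.3, "n": 0.3, "t": 0.2, "w": 0.1, "#": 0.1}

def cumulative_ranges(probs):
    """Assign each symbol a sub-interval [low, high) of [0, 1) in the listed order."""
    ranges, start = {}, 0.0
    for sym, p in probs.items():
        ranges[sym] = (start, start + p)
        start += p
    return ranges

def arithmetic_encode(message, probs):
    """Narrow the interval [low, high) once per character; any number inside
    the final interval represents the whole message."""
    ranges = cumulative_ranges(probs)
    low, high = 0.0, 1.0
    for ch in message:
        width = high - low
        sym_low, sym_high = ranges[ch]
        low, high = low + width * sym_low, low + width * sym_high
    return low, high

low, high = arithmetic_encode("went#", probs)
print(round(low, 5), round(high, 5))   # 0.81602 0.8162
```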