
IT T64 INFORMATION CODING TECHNIQUES  J. VEERENDESWARI
UNIT I
Information entropy fundamentals: Information - entropy - properties of information and entropy -
relation between information and probability - mutual and self-information - coding theory - code
efficiency and redundancy - Shannon's theorem - construction of basic codes - Shannon-Fano
coding, Huffman coding - arithmetic coding.

1.1 INTRODUCTION
The performance of a communication system is measured in terms of its error probability. The
performance of the system depends upon the available signal power, the channel noise and the bandwidth;
based on these parameters it is possible to establish the conditions for error-free transmission.
Information theory is used for the mathematical modelling and analysis of communication systems.

1.2 INFORMATION OR SELF INFORMATION


Let us consider a communication system which transmits messages m1, m2, m3, ... with
probabilities of occurrence P1, P2, P3, .... The amount of information carried by the message mk
with probability Pk is

Amount of information: Ik = log2 (1/Pk)


Unit of information is ‘bits’
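As a quick numerical check of this formula (not part of the original notes), the self-information can be computed directly; the probability values used below are arbitrary examples.

```python
import math

def self_information(p_k: float) -> float:
    """Amount of information I_k = log2(1/P_k) carried by a message of probability P_k."""
    return math.log2(1.0 / p_k)

# Arbitrary example values: a likely message carries little information,
# an unlikely one carries much more.
print(self_information(0.5))   # 1.0 bit
print(self_information(0.25))  # 2.0 bits
print(self_information(0.01))  # about 6.64 bits
```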

1.3 PROPERTIES OF INFORMATION


1. The more the uncertainty about a message, the more the information it carries.
2. If the receiver knows the message being transmitted, the amount of information carried is zero.
3. If I1 is the information carried by the message m1 and I2 is the information carried by the
message m2, then the amount of information carried compositely by m1 and m2 is I1 + I2.
4. If there are M = 2^N equally likely messages, then the amount of information carried by each
message will be N bits.
Example 1: Calculate the amount of information if Pk = 1/4.
Solution: The information is given as
Ik = log2 (1/Pk) = log2 (4) = 2 bits

Example 2: Calculate the amount of information if binary digits (binits) occur with equal
likelihood in binary PCM.
Solution:
In binary PCM there are only two binary levels, i.e. 1 or 0. Since they occur with
equal likelihood, their probabilities of occurrence are
P1 = P0 = 1/2
Hence the amount of information carried by each binit is
Ik = log2 (1/Pk) = log2 (2) = 1 bit
Thus the correct identification of a binary PCM digit carries 1 bit of information.


1. Prove the following statement:
If there are M equally likely and independent messages, then the amount of
information carried by each message is I = N bits, where M = 2^N and N is an integer.
Solution: Since all the M messages are equally likely and independent, the probability of occurrence
of each message is 1/M.
The amount of information is given as
Ik = log2 (1/Pk)
Here the probability of each message is Pk = 1/M, hence the above equation becomes
Ik = log2 (M)
We know that M = 2^N, hence
Ik = log2 (2^N) = N bits
2. Prove the statement:
"If the receiver knows the message being transmitted, the amount of information carried is zero."

Solution: Here it is stated that the receiver "knows" the message. This means only one message is
transmitted, and its occurrence is certain, so its probability is Pk = 1. The amount of information
carried by this type of message is
Ik = log2 (1/Pk) = log2 (1) = 0 bits
This proves the statement that if the receiver knows the message, the amount of information carried is zero.
As Pk decreases from 1 to 0, Ik increases monotonically from 0 to infinity. This shows that the amount
of information conveyed is greater when the receiver correctly identifies less likely messages.
3. Prove the statement:
If I1 is the information carried by message m1 and I2 is the information carried by message
m2, then the amount of information carried compositely by m1 and m2 is I1,2 = I1 + I2.
Solution: Since the messages m1 and m2 are independent, their joint probability of occurrence is P1 P2. Hence
I1,2 = log2 (1/(P1 P2)) = log2 (1/P1) + log2 (1/P2) = I1 + I2


1.4 ENTROPY
Entropy of a source is the measure of the average information per message. If a source emits M
messages with probabilities P1, P2, ..., PM, its entropy is

H = Σk Pk log2 (1/Pk)   bits/message   (sum over k = 1 to M)

Basically, source codes try to reduce the redundancy present in the source and represent
the source with fewer bits that carry more information.
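A minimal sketch of this entropy formula in Python; the probability lists in the example calls are arbitrary illustrations, not taken from the notes.

```python
import math

def entropy(probs):
    """Entropy H = sum of P_k * log2(1/P_k), in bits/symbol.
    Terms with P_k = 0 contribute nothing (the limit p*log2(1/p) -> 0)."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Equally likely symbols reach the upper bound log2(M):
print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0 = log2(4)
# A skewed source has lower entropy:
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75
```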

1.5 PROPERTIES OF ENTROPY


1. Entropy is zero if the event is sure or impossible, i.e.,
H = 0 if Pk = 0 or 1
2. When Pk = 1/M for all M symbols, the symbols are equally likely. For such a source the
entropy is given as
H = log2 M
3. The upper bound on entropy is given as
Hmax = log2 M

Example 1: Calculate the entropy when Pk = 0 and when Pk = 1.

Solution:
The entropy is given as
H = Σk Pk log2 (1/Pk)
With Pk = 1 (a single, certain message), the above equation gives
H = 1 × log2 (1) = 0
Now consider the second case, Pk = 0. Instead of putting Pk = 0 directly, let us consider
the limiting case, i.e.,
H = lim (Pk → 0) Pk log2 (1/Pk)
The right-hand side of the above equation tends to zero as Pk → 0. Hence the entropy is zero:
H = 0
1. Prove the statement:
"If there are M equally likely messages, then the entropy of the source is log2 M."
Proof:
We know that for M equally likely messages, the probability of each message
is P = 1/M.
This probability is the same for all M messages:
P1 = P2 = P3 = P4 = ... = PM = 1/M
The entropy is given by
H = Σk Pk log2 (1/Pk) = Σk (1/M) log2 (M)   (add M number of terms)
In the above equation there are M identical terms in the summation. Hence after adding these
terms the equation becomes
H = M × (1/M) log2 (M) = log2 M
2. Prove that the upper bound on entropy is given as Hmax = log2 M, where M is the number of
messages emitted by the source.

Proof: The statement means that the entropy of a zero-memory information source with M
symbols becomes maximum if and only if all the source symbols are equiprobable.
Let the source emit the M symbols (X = s1, s2, ..., sM) with probabilities {P1, P2, ..., PM}, where
Σk Pk = 1 [sum of probabilities].

The entropy of the source is
H = Σk Pk log2 (1/Pk)   ...(1)
Consider the difference H - log2 M. Multiplying log2 M by 1 and replacing 1 by Σk Pk,
H - log2 M = Σk Pk log2 (1/Pk) - Σk Pk log2 M = Σk Pk log2 (1/(M Pk))   ...(2)
To convert this equality into an inequality, the property of the natural logarithm is used:
ln x ≤ x - 1, with equality only when x = 1. Applying it with x = 1/(M Pk),
H - log2 M ≤ (1/ln 2) Σk Pk (1/(M Pk) - 1) = (1/ln 2) (Σk 1/M - Σk Pk) = (1/ln 2)(1 - 1) = 0
Hence H ≤ log2 M, with equality if and only if Pk = 1/M for all k, i.e., Hmax = log2 M.

1.6 TYPES OF ENTROPY

1.6.1 Joint entropy

The joint entropy of two discrete random variables X and Y is merely the entropy of their pairing (X, Y):

H(X, Y) = Σx Σy p(x, y) log2 (1/p(x, y))

If X and Y are independent, then their joint entropy is the sum of their individual entropies.

For example, if (X,Y) represents the position of a chess piece — X the row and Y the column, then the
joint entropy of the row of the piece and the column of the piece will be the entropy of the position of
the piece.

Despite similar notation, joint entropy should not be confused with cross entropy.

1.6.2 Conditional entropy (equivocation)

The conditional entropy or conditional uncertainty of X given the random variable Y (also called the
equivocation of X about Y) is the average conditional entropy over Y:

H(X|Y) = Σy p(y) H(X|Y = y) = Σx Σy p(x, y) log2 (1/p(x|y))

Because entropy can be conditioned on a random variable or on that random variable taking a certain
value, care should be taken not to confuse these two definitions of conditional entropy, the former of
which is in more common use. A basic property of this form of conditional entropy is that

H(X|Y) = H(X, Y) - H(Y)
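A small sketch, assuming an arbitrary two-by-two joint probability table (not from the notes), showing how the joint entropy H(X,Y), the marginal entropy H(Y) and the conditional entropy H(X|Y) = H(X,Y) - H(Y) relate.

```python
import math

# Joint probabilities p(x, y) for a toy pair of binary random variables (example values).
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def H(probs):
    """Entropy of a distribution given as an iterable of probabilities."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

H_XY = H(p_xy.values())                                   # joint entropy H(X,Y)
p_y = {y: sum(p for (x, y2), p in p_xy.items() if y2 == y) for y in (0, 1)}
H_Y = H(p_y.values())                                     # marginal entropy H(Y)
H_X_given_Y = H_XY - H_Y                                  # conditional entropy H(X|Y)
print(H_XY, H_Y, H_X_given_Y)
```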

1.7 MUTUAL INFORMATION


The mutual information of a channel is the average amount of information about the channel input
that is gained by observing the channel output. It is represented by I(X;Y):

I(X;Y) = H(X) - H(X|Y)

Mutual information I(X;Y) represents the amount of uncertainty about the channel input that is
resolved by observing the channel output.
The mutual information I(X;Y) has the following important properties:
1. The mutual information of a channel is symmetric, that is,
I(X;Y) = I(Y;X)
where I(X;Y) represents the amount of uncertainty about the channel input that is resolved by observing the channel
output, and I(Y;X) is the measure of the uncertainty about the channel output that is resolved by sending the channel input.

Writing H(X) and H(X|Y) in terms of the channel probabilities, the mutual information can be expressed as
I(X;Y) = Σx Σy p(x, y) log2 ( p(x|y) / p(x) )   ...(1)
From Bayes' rule of conditional probabilities,
p(x|y) / p(x) = p(y|x) / p(y)   ...(2)
By substituting equation (2) into equation (1) and interchanging the order of summation, we get
I(X;Y) = Σx Σy p(x, y) log2 ( p(y|x) / p(y) ) = I(Y;X)
Hence proved.
2. The mutual information is always nonnegative, I(X;Y) ≥ 0.
Using the joint probability p(x, y) = p(x|y) p(y), equation (1) can be written as
I(X;Y) = Σx Σy p(x, y) log2 ( p(x, y) / (p(x) p(y)) )   ...(3)
Applying the fundamental inequality ln z ≤ z - 1 to the negative of this expression,
-I(X;Y) = Σx Σy p(x, y) log2 ( p(x) p(y) / p(x, y) ) ≤ (1/ln 2) Σx Σy p(x, y) ( p(x) p(y) / p(x, y) - 1 ) = 0
Hence I(X;Y) ≥ 0, with equality if and only if
p(x, y) = p(x) p(y) for all x and y
i.e., I(X;Y) is zero when the input and output symbols are statistically independent.
3. The mutual information of a channel is related to the joint entropy of the channel input
and channel output by
I(X;Y) = H(X) + H(Y) - H(X, Y)
1.8 CODING THEORY

Coding theory is one of the most important and direct applications of information theory. It can be
subdivided into source coding theory and channel coding theory. Using a statistical description for
data, information theory quantifies the number of bits needed to describe the data, which is the
information entropy of the source.

• Data compression (source coding): there are two formulations of the compression problem:

1. Lossless data compression: the data must be reconstructed exactly.

2. Lossy data compression: allocates the bits needed to reconstruct the data within a specified
fidelity level measured by a distortion function. This subset of information theory is called
rate-distortion theory.

• Error-correcting codes (channel coding): while data compression removes as much
redundancy as possible, an error-correcting code adds just the right kind of redundancy (i.e.,
error correction) needed to transmit the data efficiently and faithfully across a noisy channel.

This division of coding theory into compression and transmission is justified by the information
transmission theorems, or source–channel separation theorems that justify the use of bits as the
universal currency for information in many contexts. However, these theorems only hold in the
situation where one transmitting user wishes to communicate to one receiving user. In scenarios with
more than one transmitter (the multiple-access channel), more than one receiver (the broadcast
channel) or intermediary "helpers" (the relay channel), or more general networks, compression
followed by transmission may no longer be optimal. Network information theory refers to these
multi-agent communication models.
Source coding refers to the conversion of the symbols of a source into binary data suitable for
transmission. The objective of source coding is to minimize the average bit rate required to
represent the source. Code length and efficiency are the terms related to source coding.
The design of a variable-length code whose average code word length approaches the
entropy of the discrete memoryless source (DMS) is often referred to as entropy coding. Two such codes are
1. Shannon-Fano coding
2. Huffman coding

Code efficiency

The code efficiency is denoted η and is defined as η = Lmin / L, where Lmin is the minimum
possible value of the average code word length L. When η approaches unity, the code is said to be efficient (i.e., η = 1).
Since the entropy is the lower bound on the average code word length, the efficiency can also be written as

η = H(X) / L
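A brief sketch of this efficiency calculation; the probabilities and code lengths passed in are placeholder values, not from any example in these notes.

```python
import math

def code_efficiency(probs, lengths):
    """eta = H(X) / L, where L = sum of P_k * l_k is the average code word length."""
    H = sum(p * math.log2(1.0 / p) for p in probs if p > 0)
    L = sum(p * l for p, l in zip(probs, lengths))
    return H / L

# Placeholder example: four symbols with the code lengths of some prefix code.
eta = code_efficiency([0.5, 0.25, 0.125, 0.125], [1, 2, 3, 3])
print(eta, 1 - eta)   # efficiency and redundancy (here eta = 1, redundancy = 0)
```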

SHANNON – FANO CODING

This method of encoding is directed towards constructing a reasonably efficient, separable binary
code for a source without memory. Let {S} be the ensemble of messages to be transmitted and {P} be
their corresponding probabilities.

It is desired to associate a codeword or sequence Ck of binary digits, of unspecified length lk, with
each message sk such that:

1. No codeword Ck can be obtained from another by adding more binary bits to
the shorter sequence (i.e., no codeword is a prefix of another).
2. The transmission of the encoded message is reasonably efficient, that is, 1 and 0 appear
independently and with almost equal probabilities.

Coding algorithm:

Step 1: List the symbols in descending order of their probabilities.

Step 2: Partition the symbol set into the two most nearly equiprobable subsets {x1} and {x2}.

Step 3: Assign '0' to each symbol contained in one subset and '1' to each symbol in the other subset.

Step 4: Repeat the same procedure for the subsets {x1} and {x2} until each subset contains a single
symbol; that is, {x1} is partitioned into the subsets {x11} and {x12}, and the codeword
corresponding to a message in {x12} will begin with 01.

This encoding procedure is said to be an "optimum" procedure for minimizing the average number of
bits per message/symbol (see the sketch below).
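The following is one possible rendering of this algorithm in Python; the recursive split chooses the partition point that makes the two subsets most nearly equiprobable, and the probabilities in the example call are those of the six-symbol source used in Problem 2 later in this unit.

```python
def shannon_fano(symbols):
    """symbols: list of (symbol, probability) pairs. Returns {symbol: codeword}."""
    # Step 1: list the symbols in descending order of probability.
    symbols = sorted(symbols, key=lambda sp: sp[1], reverse=True)
    codes = {s: "" for s, _ in symbols}

    def split(group):
        if len(group) <= 1:
            return
        total = sum(p for _, p in group)
        # Step 2: find the partition point giving the two most nearly equiprobable subsets.
        running, best_i, best_diff = 0.0, 1, float("inf")
        for i in range(1, len(group)):
            running += group[i - 1][1]
            diff = abs(2 * running - total)
            if diff < best_diff:
                best_diff, best_i = diff, i
        upper, lower = group[:best_i], group[best_i:]
        # Step 3: assign '0' to one subset and '1' to the other.
        for s, _ in upper:
            codes[s] += "0"
        for s, _ in lower:
            codes[s] += "1"
        # Step 4: repeat on each subset until every subset holds a single symbol.
        split(upper)
        split(lower)

    split(symbols)
    return codes

print(shannon_fano([("x1", 0.3), ("x2", 0.25), ("x3", 0.2),
                    ("x4", 0.12), ("x5", 0.08), ("x6", 0.05)]))
# Produces code lengths 2, 2, 2, 3, 4, 4 for these probabilities.
```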

Example: (figure: Shannon-Fano code table for a sample source, with the average code length of the
code words and the entropy of the source)
HUFFMAN CODING

Step 1: List the source symbols in order of decreasing probabilities.

Step 2: Splitting: assign a 0 and a 1 to the two source symbols of lowest probability. This is referred to
as the splitting stage of the coding.
Step 3: Combine these two source symbols into a new source symbol with probability equal to the sum of
the two original probabilities.
Step 4: Place the new symbol in the list in accordance with its probability value.
Step 5: Repeat the procedure until the final list of source symbols contains only two symbols.
Step 6: Assign 0 and 1 to these two symbols.
Step 7: Read off the codeword of each symbol by tracing back from the last splitting stage.
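A minimal sketch of these steps using a priority queue; the exact codewords depend on how ties are broken (the notes place the combined symbol as high as possible), but the average code length is unaffected. The probabilities in the example call are those of Problem 1 below.

```python
import heapq
from itertools import count

def huffman(symbols):
    """symbols: list of (symbol, probability). Returns {symbol: codeword}."""
    tie = count()  # tie-breaker so the heap never compares the symbol groups
    # Each heap entry: (probability, tie, list of (symbol, partial codeword)).
    heap = [(p, next(tie), [(s, "")]) for s, p in symbols]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Splitting stage: take the two lowest-probability entries and prepend 0 / 1.
        p1, _, grp1 = heapq.heappop(heap)
        p2, _, grp2 = heapq.heappop(heap)
        merged = [(s, "0" + c) for s, c in grp1] + [(s, "1" + c) for s, c in grp2]
        # Combine into a new symbol with the summed probability and reinsert it.
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return dict(heap[0][2])

probs = [("s1", 0.4), ("s2", 0.1), ("s3", 0.2), ("s4", 0.1), ("s5", 0.2)]
codes = huffman(probs)
avg_len = sum(dict(probs)[s] * len(c) for s, c in codes.items())
print(codes, avg_len)   # average length 2.2 bits/symbol for these probabilities
```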

Problem 1: A discrete memoryless source emits five symbols with probabilities {0.4, 0.1, 0.2, 0.1, 0.2}.
Find the Huffman code and its average length by placing the combined symbol as high as possible.
Solution: (figure: Huffman coding table and resulting codewords)

L = 2.2 bits/symbol

Problem 2: Compare the Huffman coding and Shannon-Fano coding algorithms for data
compression. For a discrete memoryless source X with six symbols x1, x2, ..., x6, find a
compact code for each symbol if the probability distribution is as follows:
P1 = 0.3, P2 = 0.25, P3 = 0.2, P4 = 0.12, P5 = 0.08, P6 = 0.05
Calculate the entropy of the source, the average length of the code, and the efficiency and redundancy of the
code.
Solution:
i) Entropy of the source:

The entropy is H = Σk Pk log2 (1/Pk). For six messages the above equation becomes

H = P1 log2 (1/P1) + P2 log2 (1/P2) + P3 log2 (1/P3) + P4 log2 (1/P4) + P5 log2 (1/P5) + P6 log2 (1/P6)

Putting in the values,
H = 0.3 log2 (1/0.3) + 0.25 log2 (1/0.25) + 0.2 log2 (1/0.2) + 0.12 log2 (1/0.12) + 0.08 log2 (1/0.08) +
0.05 log2 (1/0.05)
  = 0.521 + 0.5 + 0.4644 + 0.367 + 0.2915 + 0.216
H = 2.36 bits of information/message

ii) To obtain the codewords:
(figure: Shannon-Fano coding table giving the codeword and code length lk of each symbol)

iii) To obtain the average number of bits per message (L):
L is given as
L = Σk Pk lk
Putting the values in the above equation,
L = (0.3)(2) + (0.25)(2) + (0.2)(2) + (0.12)(3) + (0.08)(4) + (0.05)(4)
  = 2.38 bits/message
iv) To obtain the code efficiency:
The code efficiency is given by
η = H / L = 2.36 / 2.38 = 0.99
v) To obtain the redundancy of the code:
The redundancy is given as
γ = 1 - η = 1 - 0.99 = 0.01
Here 0.01 indicates that 1% of the bits in the code are redundant.
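The arithmetic in parts (i) to (v) can be checked with a few lines of Python; the code lengths 2, 2, 2, 3, 4, 4 are those used in part (iii).

```python
import math

P = [0.3, 0.25, 0.2, 0.12, 0.08, 0.05]
l = [2, 2, 2, 3, 4, 4]                      # Shannon-Fano code lengths from part (iii)

H = sum(p * math.log2(1 / p) for p in P)    # entropy, about 2.36 bits/message
L = sum(p * k for p, k in zip(P, l))        # average length, 2.38 bits/message
eta = H / L                                 # efficiency, about 0.99
print(round(H, 4), L, round(eta, 4), round(1 - eta, 4))
```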
vi) Huffman coding:
1) To obtain the codewords:
(figure: Huffman coding table and resulting codewords)

ARITHMETIC CODING
Principle:
• The codes are calculated for short messages.
• The number of characters in the encoded message depends upon the precision of the number that
can be represented.
• As the length of the message increases, the number of significant digits in the codeword also
increases.

Example 1:
With the following symbols and their probabilities of occurrence, encode the message "went#"
using the arithmetic coding algorithm.

Symbol       e     n     t     w     #
Probability  0.3   0.3   0.2   0.1   0.1

Solution:
Step 1: First, arrange all the probabilities cumulatively, so that each symbol is assigned a sub-range
of 0 to 1 equal to its probability:

e: 0.0 to 0.3,  n: 0.3 to 0.6,  t: 0.6 to 0.8,  w: 0.8 to 0.9,  #: 0.9 to 1.0

Fig. a: cumulative probabilities of occurrence at the first character place

In the above figure, observe that the probability of 'e' is assigned the range 0 to 0.3. Similarly, the
probability of 'n' is assigned the range 0.3 to 0.6.
Step 2: Encoding of 'w' in 'went#'
The first character of the string 'went#' is 'w'. Observe that 'w' lies in the cumulative probability
range of 0.8 to 0.9. Hence the final codeword will lie in the range 0.8 to 0.9, and the working range
of probabilities is now 0.8 to 0.9 (width 0.1).

Character    Cumulative probability
e      0.80 + 0.3 × 0.1 = 0.83
n      0.83 + 0.3 × 0.1 = 0.86
t      0.86 + 0.2 × 0.1 = 0.88
w      0.88 + 0.1 × 0.1 = 0.89
#      0.89 + 0.1 × 0.1 = 0.90

Fig. b: cumulative probabilities of occurrence at the second character place


Step 3:
The second character of the string 'went#' is 'e'. In fig. (b), observe that 'e' has the cumulative
probability range 0.80 to 0.83. Hence new cumulative probabilities of all the characters are calculated
in the range 0.80 to 0.83 (width 0.03).
Character    Cumulative probability
e      0.800 + 0.3 × 0.03 = 0.809
n      0.809 + 0.3 × 0.03 = 0.818
t      0.818 + 0.2 × 0.03 = 0.824
w      0.824 + 0.1 × 0.03 = 0.827
#      0.827 + 0.1 × 0.03 = 0.830

Based on the above calculations, the cumulative probabilities are indicated in the figure.

Fig. c: cumulative probabilities of occurrence at the 3rd character place
Step 4:
In fig. (c), observe that 'n' has the cumulative probability range 0.809 to 0.818 (width 0.009). Hence
determine the cumulative probabilities of occurrence of the 4th character in the string, after 'wen':
Character    Cumulative probability
e      0.8090 + 0.3 × 0.009 = 0.8117
n      0.8117 + 0.3 × 0.009 = 0.8144
t      0.8144 + 0.2 × 0.009 = 0.8162
w      0.8162 + 0.1 × 0.009 = 0.8171
#      0.8171 + 0.1 × 0.009 = 0.8180

Based on the above calculations, the cumulative probabilities are indicated in the figure.

Fig. d: cumulative probabilities of occurrence at the 4th character place

The above figure shows that the codeword for the string 'went' will lie in the range
0.8144 ≤ codeword ≤ 0.8162.
Step 5: Encoding of the complete string
In fig. (d), the cumulative probability of 't' lies in the range 0.8144 to 0.8162 (width 0.0018). Hence
we have to determine the cumulative probabilities of occurrence of the 5th character after 'went':
Character    Cumulative probability
e      0.81440 + 0.3 × 0.0018 = 0.81494
n      0.81494 + 0.3 × 0.0018 = 0.81548
t      0.81548 + 0.2 × 0.0018 = 0.81584
w      0.81584 + 0.1 × 0.0018 = 0.81602
#      0.81602 + 0.1 × 0.0018 = 0.81620

Fig. e: cumulative probabilities of occurrence at the 5th character place

The above figure shows that the codeword for the string 'went#' will lie in the range
0.81602 ≤ codeword ≤ 0.81620
Any number in the above range will represent the string 'went#'.
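A sketch of the interval-narrowing procedure followed in this example; the function rebuilds the cumulative sub-ranges at each step exactly as in the tables above and returns the final interval, any number inside which represents the message.

```python
def arithmetic_encode(message, probs):
    """probs: dict of symbol -> probability, in the fixed order used for the cumulative ranges.
    Returns the final interval (low, high); any number inside it represents the message."""
    low, high = 0.0, 1.0
    for ch in message:
        width = high - low
        cum = low
        for sym, p in probs.items():
            sym_low, sym_high = cum, cum + p * width   # sub-interval of the current range
            if sym == ch:
                low, high = sym_low, sym_high          # narrow the range to this symbol
                break
            cum = sym_high
    return low, high

probs = {"e": 0.3, "n": 0.3, "t": 0.2, "w": 0.1, "#": 0.1}
print(arithmetic_encode("went#", probs))   # approximately (0.81602, 0.8162), as in the example
```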
