
ECE4007
INFORMATION THEORY AND CODING
Dr. Sangeetha R.G
Associate Professor Senior, SENSE
Syllabus
Module 2: Entropy (6 hours) CO: 2
Uncertainty, self-information, average information, mutual
information and their properties - Entropy and information
rate of Markov sources - Information measures of
continuous random variables.
Basic Information Theory
What is information?

• Can we measure information?
• Consider the following two sentences:
  1. There is a traffic jam on I5
  2. There is a traffic jam on I5 near Exit 234

Sentence 2 seems to have more information than sentence 1. From the semantic viewpoint, sentence 2 provides more useful information.
What is information?
• It is hard to measure the “semantic” information!
• Consider the following two sentences:
  1. There is a traffic jam on I5 near Exit 160
  2. There is a traffic jam on I5 near Exit 234

It is not clear whether sentence 1 or sentence 2 has more information!
What is information?

• Let’s attempt a different definition of information.
  – How about counting the number of letters in the two sentences:
    1. There is a traffic jam on I5 (22 letters)
    2. There is a traffic jam on I5 near Exit 234 (33 letters)

This is definitely something we can measure and compare!
What is information?
• First attempt to quantify information: Hartley (1928).
  – Every symbol of the message has a choice of s possibilities.
  – A message of length l can therefore have s^l distinguishable possibilities.
  – The information measure is then the logarithm of s^l:

        I = log(s^l) = l log(s)

Intuitively, this definition makes sense: one symbol (letter) carries information log(s), so a sentence of length l should have l times more information, i.e. l log(s).
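As a rough sketch (not from the slides; the alphabet size and message length below are made-up values), Hartley’s measure can be computed directly:

    import math

    def hartley_information(s, length, base=2):
        # Hartley's measure: I = log(s^length) = length * log(s)
        return length * math.log(s, base)

    # Example: a 5-symbol message over a 26-letter alphabet
    print(hartley_information(26, 5))   # 5 * log2(26) ≈ 23.5 bits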
How about we measure information as the number of Yes/No questions one has to ask to get the correct answer in a simple game:

How many questions?

[Figure: a 2×2 grid of cells 1–4 (answer: 2 questions) and a 4×4 grid of cells 1–16 (answer: 4 questions); a circle is hidden in one cell and must be located with Yes/No questions.]

Randomness is due to uncertainty about where the circle is!
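As a quick illustrative check (grid sizes taken from the game above), the number of Yes/No questions needed to locate the circle among N equally likely cells is log2(N):

    import math

    for n_cells in (4, 16):
        # Each Yes/No question halves the remaining possibilities
        print(n_cells, "cells ->", int(math.log2(n_cells)), "questions")
    # 4 cells -> 2 questions, 16 cells -> 4 questions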


Shannon’s Information Theory
Claude Shannon, “A Mathematical Theory of Communication,” Bell System Technical Journal, 1948.

• Shannon’s measure of information is the number of bits needed to represent the amount of uncertainty (randomness) in a data source, and is defined as the entropy

        H = −∑_{i=1}^{n} p_i log(p_i)

where there are n symbols 1, 2, …, n, each with probability of occurrence p_i.
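A minimal Python sketch of this definition (the probability lists below are illustrative, not from the slides):

    import math

    def entropy(probs, base=2):
        # H = -sum(p_i * log(p_i)); terms with p_i = 0 contribute nothing by convention
        return -sum(p * math.log(p, base) for p in probs if p > 0)

    print(entropy([0.5, 0.5]))                 # 1.0 bit
    print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0 bits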
Shannon’s Entropy
• Consider the following string consisting of symbols a and b:

  abaabaababbbaabbabab…

  – On average, there are equal numbers of a and b.
  – The string can be considered as the output of a source that emits symbol a or b with equal probability:

  [Figure: a source emitting a with probability 0.5 and b with probability 0.5.]

We want to characterize the average information generated by the source!
Intuition on Shannon’s Entropy

Why H = −∑_{i=1}^{n} p_i log(p_i)?

Suppose you have a long random string of two binary symbols 0 and 1, where the probabilities of symbols 0 and 1 are p_0 and p_1 = 1 − p_0.

Ex: 00100100101101001100001000100110001…

If the string is long enough, say of length N, it is likely to contain N·p_0 0’s and N·p_1 1’s. The probability that this string pattern occurs is

        p = p_0^{N p_0} · p_1^{N p_1}

Hence, the number of possible patterns is 1/p = p_0^{−N p_0} · p_1^{−N p_1}.

The number of bits needed to represent all possible patterns is

        log(p_0^{−N p_0} · p_1^{−N p_1}) = −∑_{i=0}^{1} N p_i log(p_i)

The average number of bits needed to represent one symbol is therefore −∑_{i=0}^{1} p_i log(p_i).
More Intuition on Entropy
• Assume a binary memoryless source, e.g., a flip of a coin. How much information do we receive when we are told that the outcome is heads?
  – If it’s a fair coin, i.e., P(heads) = P(tails) = 0.5, we say that the amount of information is 1 bit.
    H(X) = −[(0.5)log2(0.5) + (0.5)log2(0.5)] = 1 bit
  – If we already know that it will be (or was) heads, i.e., P(heads) = 1, the amount of information is zero!
    H(X) = −[(1)log2(1) + (0)log2(0)] = 0 bits   (with 0·log 0 taken as 0)
  – If the coin is not fair, e.g., P(heads) = 0.9, the amount of information is more than zero but less than one bit!
  – Intuitively, the amount of information received is the same whether P(heads) = 0.9 or P(heads) = 0.1.
    H(X) = −[(0.1)log2(0.1) + (0.9)log2(0.9)] ≈ 0.469 bits
Self Information
• So, let’s look at it the way Shannon did.
• Assume a memoryless source with
  – alphabet A = (a1, …, an)
  – symbol probabilities (p1, …, pn).
• How much information do we get when finding out that the next symbol is ai?
• According to Shannon, the self information of ai is

        I(ai) = −log(pi)

Why?
Assume two independent events A and B, with probabilities P(A) = pA and P(B) = pB. For both events to happen, the probability is pA · pB. However, the amounts of information should be added, not multiplied, which suggests using the logarithm. We also want the information to increase with decreasing probability, so we use the negative logarithm.
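A small sketch of this additivity argument (pA and pB are made-up probabilities):

    import math

    def self_information(p, base=2):
        # I(x) = -log(P(x))
        return -math.log(p, base)

    pA, pB = 0.25, 0.5
    # For independent events, the information of the joint event is the sum of the parts
    print(self_information(pA * pB))                      # 3.0 bits
    print(self_information(pA) + self_information(pB))    # 2.0 + 1.0 = 3.0 bits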
Self Information

Example 1:

Example 2:

Which logarithm? Pick the one you like! If you pick the natural log, you’ll measure in nats; if you pick the 10-log, you’ll get Hartleys; if you pick the 2-log (like everyone else), you’ll get bits.
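As a hedged illustration of how the three units relate (the probability 0.1 is an arbitrary example):

    import math

    p = 0.1
    print(-math.log2(p))     # ≈ 3.32 bits
    print(-math.log(p))      # ≈ 2.30 nats
    print(-math.log10(p))    # 1.00 Hartley
    # The units differ only by constant factors: 1 nat = log2(e) bits, 1 Hartley = log2(10) bits.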
Self Information

On average over all the symbols, we get

        H(X) = ∑_i p_i I(a_i) = −∑_i p_i log(p_i)

H(X) is called the first-order entropy of the source. This can be regarded as the degree of uncertainty about the following symbol.
Entropy

Example: Binary Memoryless Source

  BMS → 01101000…

Let P(1) = p and P(0) = 1 − p.
Then H(X) = −p log(p) − (1 − p) log(1 − p), often denoted h(p).

[Figure: plot of h(p) versus p, rising from 0 at p = 0 to 1 bit at p = 0.5 and back to 0 at p = 1.]

The uncertainty (information) is greatest when p = 0.5.
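A brief sketch of this binary entropy function (the evaluation points are chosen only for illustration):

    import math

    def h(p):
        # Binary entropy h(p) = -p*log2(p) - (1-p)*log2(1-p), with h(0) = h(1) = 0
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    for p in (0.0, 0.1, 0.5, 0.9, 1.0):
        print(p, round(h(p), 4))   # the maximum, 1.0 bit, occurs at p = 0.5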
Example
Three symbols a, b, c with corresponding probabilities:

  P = {0.5, 0.25, 0.25}

What is H(P)?

Three weather conditions in Corvallis (Rain, Sunny, Cloudy) with corresponding probabilities:

  Q = {0.48, 0.32, 0.20}

What is H(Q)?
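A hedged worked answer, reusing the entropy function sketched earlier:

    import math

    def entropy(probs):
        return -sum(p * math.log2(p) for p in probs if p > 0)

    print(entropy([0.5, 0.25, 0.25]))    # H(P) = 1.5 bits
    print(entropy([0.48, 0.32, 0.20]))   # H(Q) ≈ 1.499 bits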
Entropy: Three properties
1. It can be shown that 0 ≤ H ≤ log(N).
2. Maximum entropy (H = log N) is reached when all symbols are equiprobable, i.e., pi = 1/N.
3. The difference log2(N) − H is called the redundancy of the source.
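For instance, a quick sketch of property 3 using the weather example above (values approximate):

    import math

    H_Q = 1.499                        # entropy of Q from the previous example, in bits
    redundancy = math.log2(3) - H_Q    # log2(N) - H with N = 3 symbols
    print(round(redundancy, 3))        # ≈ 0.086 bits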
Properties of Self Information

Note that I(xi) satisfies the following properties:
1. I(xi) = 0 for P(xi) = 1
2. I(xi) ≥ 0
3. I(xi) > I(xj) if P(xi) < P(xj)
4. I(xi, xj) = I(xi) + I(xj) if xi and xj are independent
Joint entropy
› Recall that the entropy of rv X over alphabet X is defined by

        H(X) = − ∑_{x ∈ X} P_X(x) log P_X(x)

  Shorter notation: for X ∼ p, let H(X) = − ∑_x p(x) log p(x) (where the summation is over the domain of X).
› The joint entropy of (jointly distributed) rvs X and Y with (X, Y) ∼ p is

        H(X, Y) = − ∑_{x,y} p(x, y) log p(x, y)

  This is simply the entropy of the rv Z = (X, Y).
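A minimal sketch computing a joint entropy from a joint pmf (the 2×2 table below is an invented example, not from the slides):

    import math

    # Hypothetical joint pmf p(x, y) for X, Y in {0, 1}
    p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

    H_XY = -sum(p * math.log2(p) for p in p_xy.values() if p > 0)
    print(round(H_XY, 4))   # H(X, Y) ≈ 1.8464 bits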
Conditional entropy
› Let (X, Y) ∼ p.
› For x ∈ Supp(X), the random variable Y | X = x is well defined. The entropy of Y conditioned on X is defined by

        H(Y|X) := E_{x←X} [ H(Y | X = x) ]

  This measures the uncertainty in Y given X.
› Let p_X and p_{Y|X} be the marginal and conditional distributions induced by p. Then

        H(Y|X) = ∑_{x ∈ X} p_X(x) · H(Y | X = x)
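Continuing the same invented joint pmf, a sketch of H(Y|X) as the weighted average of the conditional entropies:

    import math

    p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

    # Marginal p_X(x)
    p_x = {}
    for (x, y), p in p_xy.items():
        p_x[x] = p_x.get(x, 0.0) + p

    # H(Y|X) = sum_x p_X(x) * H(Y | X = x)
    H_Y_given_X = 0.0
    for x, px in p_x.items():
        cond = [p / px for (xx, yy), p in p_xy.items() if xx == x]
        H_Y_given_X += px * -sum(q * math.log2(q) for q in cond if q > 0)

    print(round(H_Y_given_X, 4))   # ≈ 0.8464 bits; note H(X) + H(Y|X) ≈ 1 + 0.8464 = H(X, Y)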
Conditional and Joint Entropies
• Using the input probabilities P(xi), output probabilities P(yj), transition probabilities P(yj/xi) and joint probabilities P(xi, yj), various entropy functions for a channel with m inputs and n outputs are defined:

        H(X) = − ∑_{i=1}^{m} P(xi) log2 P(xi)

        H(Y) = − ∑_{j=1}^{n} P(yj) log2 P(yj)
Contd…

The joint entropy H(X, Y) is the average uncertainty of the communication channel as a whole. A few useful relationships among the above entropies are as follows:
a. H(X, Y) = H(X/Y) + H(Y)
b. H(X, Y) = H(Y/X) + H(X)
c. H(X, Y) = H(X) + H(Y), when X and Y are statistically independent
d. H(X/Y) = H(X, Y) − H(Y)


Contd…

The conditional entropy (or conditional uncertainty) of X given the random variable Y is the average conditional entropy over Y. The joint entropy of two discrete random variables X and Y is merely the entropy of their pairing (X, Y); this implies that if X and Y are independent, then their joint entropy is the sum of their individual entropies.
The Mutual Information
• Mutual information measures the amount of information that can be obtained about one random variable by observing another.
• It is important in communication, where it can be used to maximize the amount of information shared between sent and received signals.
• The mutual information I(X; Y) of a channel is defined by:

        I(X; Y) = H(X) − H(X|Y)   bits/symbol

• Since H(X) represents the uncertainty about the channel input before the channel output is observed and H(X/Y) represents the uncertainty about the channel input after the channel output is observed, the mutual information I(X; Y) represents the uncertainty about the channel input that is resolved by observing the channel output.
Properties of Mutual Information I(X; Y)
• I(X; Y) = I(Y; X)
• I(X; Y) ≥ 0
• I(X; Y) = H(Y) − H(Y/X)
• I(X; Y) = H(X) + H(Y) − H(X, Y)

• The entropy corresponding to mutual information, i.e. I(X; Y), indicates a measure of the information transmitted through a channel. Hence, it is called ‘transferred information’.
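A hedged numerical check of the last identity, reusing the invented joint pmf from the joint-entropy sketch:

    import math

    p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

    def H(probs):
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # Marginals of X and Y
    p_x, p_y = {}, {}
    for (x, y), p in p_xy.items():
        p_x[x] = p_x.get(x, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p

    I_xy = H(p_x.values()) + H(p_y.values()) - H(p_xy.values())
    print(round(I_xy, 4))   # ≈ 0.1245 bits; non-negative, and equal to H(X) - H(X|Y)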
BSC
Mutual Information
Conditional Self Information
Average Mutual Information
Average Self Information
Information Measures for Continuous Random Variables
