ECE4007 Information Theory and Coding: DR - Sangeetha R.G
I = log(s^l) = l·log(s)

[Figure: a 4 × 4 grid of 16 equally likely cells, numbered 1–16.]
How many yes/no questions are needed to identify one cell? log2(16) = 4.
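A minimal Python sketch (my addition, not from the slides) of the halving strategy behind this count: each yes/no question rules out half of the 16 equally likely cells, so 4 = log2(16) questions always suffice. The target cell below is an arbitrary stand-in.

```python
import math

# Identify one of 16 equally likely cells with yes/no questions.
# Each question ("is the cell in this half?") halves the candidate set.
candidates = list(range(1, 17))
target = 11                      # hypothetical secret cell, chosen arbitrarily
questions = 0
while len(candidates) > 1:
    half = candidates[:len(candidates) // 2]
    questions += 1
    candidates = half if target in half else candidates[len(candidates) // 2:]

print(questions, math.log2(16))  # 4 questions, matching log2(16) = 4 bits
```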
[Figure: a binary source emitting the symbols a and b, each with probability 0.5: abaabaababbbaabbabab …]
Intuition on Shannon’s Entropy
Why H = − Σ (i = 1 to n) pi·log(pi) ?
Suppose you have a long random string of the two binary symbols 0 and 1, where the probabilities of symbols 0 and 1 are p0 and p1 = 1 − p0.
Ex: 00100100101101001100001000100110001 …
If the string is long enough, say of length N, it is likely to contain N·p0 0's and N·p1 1's.
The probability that this string pattern occurs is
p = p0^(N·p0) · p1^(N·p1)
Hence, the number of possible patterns is
1/p = p0^(−N·p0) · p1^(−N·p1)
The number of bits needed to represent all possible patterns is
log( p0^(−N·p0) · p1^(−N·p1) ) = − Σ (i = 0 to 1) N·pi·log(pi)
The average number of bits per symbol is therefore
− Σ (i = 0 to 1) pi·log(pi)
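A quick numerical sketch of this counting argument (my own illustration, with assumed values p0 = 0.7 and N = 1000): the log of the number of typical patterns, divided by N, comes out close to − Σ pi·log(pi).

```python
import math

# Assumed source: p0 = 0.7, p1 = 0.3, string length N = 1000.
p0, p1, N = 0.7, 0.3, 1000
n0, n1 = round(N * p0), round(N * p1)

# Probability of one typical pattern, p = p0^(N*p0) * p1^(N*p1), in log form.
log2_p = n0 * math.log2(p0) + n1 * math.log2(p1)
bits_from_1_over_p = -log2_p / N          # log2(1/p) / N, as on the slide

# Exact count of strings with n0 zeros and n1 ones, via the binomial coefficient.
bits_from_count = math.log2(math.comb(N, n0)) / N

H = -(p0 * math.log2(p0) + p1 * math.log2(p1))
print(bits_from_1_over_p, bits_from_count, H)   # all approximately 0.88 bits/symbol
```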
More Intuition on Entropy
• Assume a binary memoryless source, e.g., a flip of a coin. How much
information do we receive when we are told that the outcome is heads?
– If it’s a fair coin, i.e., P(heads) = P (tails) = 0.5, we say that the amount
of information is 1 bit.
– H(X) = −[(0.5)log(0.5) + (0.5)log(0.5)] = 1 bit
– If we already know that it will be (or was) heads, i.e., P(heads) = 1, the
amount of information is zero!
– H(X) = −[(1)log(1) + (0)log(0)] = 0 bits (using the convention 0·log(0) = 0)
– If the coin is not fair, e.g., P(heads) = 0.9, the amount of information is
more than zero but less than one bit: H(X) = −[(0.9)log(0.9) + (0.1)log(0.1)] ≈ 0.47 bits.
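The three coin cases can be checked with a short Python sketch (my addition, using only the standard math module):

```python
import math

def binary_entropy(p):
    """H(p) = -[p*log2(p) + (1-p)*log2(1-p)], with 0*log(0) taken as 0."""
    terms = [q * math.log2(q) for q in (p, 1 - p) if q > 0]
    return -sum(terms)

for p in (0.5, 1.0, 0.9):
    print(f"P(heads) = {p}: H = {binary_entropy(p):.3f} bits")
# 0.5 -> 1.000 bits, 1.0 -> 0.000 bits, 0.9 -> about 0.469 bits
```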
Which logarithm? Pick the one you like! If you pick the natural log,
you'll measure in nats; if you pick the 10-log, you'll get Hartleys;
and if you pick the 2-log (like everyone else), you'll get bits.
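A small sketch (my own, with an assumed example distribution) showing that the choice of base only rescales the result: bits, nats and Hartleys differ by constant factors.

```python
import math

# Same distribution, three units: the only difference is the log base.
p = [0.5, 0.25, 0.25]
H_bits     = -sum(x * math.log2(x)  for x in p)   # base-2  -> bits
H_nats     = -sum(x * math.log(x)   for x in p)   # base-e  -> nats
H_hartleys = -sum(x * math.log10(x) for x in p)   # base-10 -> Hartleys
print(H_bits, H_nats, H_hartleys)
print(H_bits * math.log(2), H_nats)   # conversion: 1 bit = ln(2) nats
```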
Self Information
Let P(x) be the probability of outcome x. Then the self information of x, often denoted I(x), is
I(x) = log(1/P(x)) = −log P(x)
[Figure: uncertainty as a function of the symbol probability p, 0 ≤ p ≤ 1.]
The uncertainty (information) is greatest when p = 0.5.
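As an illustration (mine, not from the slides), the sketch below evaluates the self information −log2 P(x) for a few probabilities and scans the binary entropy over a grid to confirm that the maximum sits at p = 0.5.

```python
import math

# Self-information of an outcome with probability P(x).
self_info = lambda p: -math.log2(p)
print(self_info(0.5), self_info(0.25), self_info(1.0))   # 1.0, 2.0, 0.0 bits

# Scan the binary entropy H(p) on a grid: the maximum sits at p = 0.5.
H = lambda p: -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
grid = [i / 100 for i in range(1, 100)]
p_max = max(grid, key=H)
print(p_max, H(p_max))   # 0.5, 1.0 bit
```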
Example
Three symbols a, b, c with corresponding probabilities:
What is H(P)?
What is H(Q)?
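The slide's actual probability values are not reproduced here, so the sketch below uses hypothetical distributions P and Q over {a, b, c} purely to show how H(P) and H(Q) would be computed.

```python
import math

def entropy(dist):
    """H = -sum p*log2(p) over symbols with nonzero probability."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Hypothetical distributions (the slide's actual numbers are not shown here).
P = {'a': 0.5, 'b': 0.25, 'c': 0.25}
Q = {'a': 1/3, 'b': 1/3, 'c': 1/3}
print(entropy(P))   # 1.5 bits
print(entropy(Q))   # log2(3), about 1.585 bits
```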
Entropy: Three properties
1. It can be shown that 0 ≤ H ≤ log N.
2. Maximum entropy (H = log N) is reached when all symbols are equiprobable, i.e., pi = 1/N.
3. Minimum entropy (H = 0) is obtained when one symbol has probability 1 (a certain outcome).
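A numerical spot-check of property 1 (my own sketch): random distributions over N = 8 symbols always land between 0 and log2 N, with the extremes reached by the deterministic and the uniform distribution.

```python
import math, random

def entropy(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

# Random distributions over N symbols all satisfy 0 <= H <= log2(N).
N = 8
for _ in range(5):
    w = [random.random() for _ in range(N)]
    p = [x / sum(w) for x in w]
    assert 0 <= entropy(p) <= math.log2(N) + 1e-9

print(entropy([1.0] + [0.0] * (N - 1)))    # 0        (one certain symbol)
print(entropy([1 / N] * N), math.log2(N))  # log2(N)  (uniform distribution)
```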
H(X) = − Σ (x ∈ X) P_X(x)·log P_X(x)
› Shorter notation: for X ∼ p, let H(X) = − Σ_x p(x)·log p(x) (where the summation is over the domain of X).
› The joint entropy of (jointly distributed) rvs X and Y with (X, Y) ∼ p is
H(X, Y) = − Σ (x, y) p(x, y)·log p(x, y)
This is simply the entropy of the rv Z = (X, Y).
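A minimal sketch (mine, with an assumed joint pmf) of the joint-entropy formula:

```python
import math

# Hypothetical joint pmf p(x, y) over two binary random variables.
p_xy = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}

# H(X, Y) = - sum over (x, y) of p(x, y) * log2 p(x, y)
H_XY = -sum(p * math.log2(p) for p in p_xy.values() if p > 0)
print(H_XY)   # 1.75 bits: the entropy of the single rv Z = (X, Y)
```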
Conditional entropy
› Let (X, Y) ∼ p.
› For x ∈ Supp(X), the random variable Y|X = x is well defined. The entropy of Y conditioned on X is defined by
H(Y|X) := E_{x←X}[ H(Y|X = x) ] = E_X[ H(Y|X) ]
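To make the expectation concrete, here is a sketch (my own, reusing the hypothetical joint pmf from above) that computes H(Y|X) as the p(x)-weighted average of H(Y|X = x), and compares it against the well-known chain rule H(X, Y) = H(X) + H(Y|X) as a sanity check.

```python
import math
from collections import defaultdict

# Conditional entropy H(Y|X) = E_{x<-X}[ H(Y | X = x) ]
# for a small hypothetical joint pmf p(x, y).
p_xy = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}

# Marginal p(x).
p_x = defaultdict(float)
for (x, _), p in p_xy.items():
    p_x[x] += p

# Weighted average of the per-x entropies H(Y | X = x).
H_Y_given_X = 0.0
for x, px in p_x.items():
    cond = [p / px for (xx, _), p in p_xy.items() if xx == x]   # p(y | x)
    H_Y_given_X += px * -sum(q * math.log2(q) for q in cond if q > 0)

H_XY = -sum(p * math.log2(p) for p in p_xy.values() if p > 0)
H_X = -sum(p * math.log2(p) for p in p_x.values() if p > 0)
print(H_Y_given_X, H_XY - H_X)   # chain rule: H(Y|X) = H(X,Y) - H(X)
```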