
COM 5130 Information Theory Spring 2024

Solution to Homework Assignment No. 2


1. (a) We have
$$\log p(X_1, X_2, \ldots, X_n)^{1/n} = \frac{1}{n} \log p(X_1, X_2, \ldots, X_n) = \frac{1}{n} \sum_{i=1}^{n} \log p(X_i).$$

Since $X_1, X_2, \ldots, X_n$ are i.i.d., $\log p(X_1), \log p(X_2), \ldots, \log p(X_n)$ are also i.i.d. By the weak law of large numbers, as $n \to \infty$, $\log p(X_1, X_2, \ldots, X_n)^{1/n}$ converges in probability to the common mean
$$E[\log p(X)] = -H(X).$$
Therefore, as $n \to \infty$, $p(X_1, X_2, \ldots, X_n)^{1/n} \to 2^{-H(X)}$ in probability.
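As a quick numerical sanity check (not part of the original solution), the Python sketch below simulates this convergence; the distribution `p` is an illustrative assumption, not taken from the problem.

```python
import numpy as np

# Illustrative source distribution (an assumption for this check).
p = np.array([0.5, 0.25, 0.125, 0.125])
H = -np.sum(p * np.log2(p))               # H(X) in bits

rng = np.random.default_rng(0)
n = 10_000
samples = rng.choice(len(p), size=n, p=p)

# (1/n) sum_i log2 p(X_i) -> -H(X), so p(X_1,...,X_n)^{1/n} -> 2^{-H(X)}
avg_log = np.mean(np.log2(p[samples]))
print(2 ** avg_log, 2 ** -H)              # the two values should be close
```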


(b) We have
$$-\frac{1}{n} \log q(X_1, X_2, \ldots, X_n) = -\frac{1}{n} \sum_{i=1}^{n} \log q(X_i).$$
Since $X_1, X_2, \ldots, X_n$ are i.i.d., $-\log q(X_1), -\log q(X_2), \ldots, -\log q(X_n)$ are also i.i.d. By the weak law of large numbers, as $n \to \infty$, $-\frac{1}{n} \log q(X_1, X_2, \ldots, X_n)$ converges in probability to the common mean
$$E[-\log q(X)] = \sum_{x \in \mathcal{X}} p(x) \log \frac{1}{q(x)} = \sum_{x \in \mathcal{X}} p(x) \log \frac{1}{p(x)} + \sum_{x \in \mathcal{X}} p(x) \log \frac{p(x)}{q(x)} = H(X) + D(p \| q).$$
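The identity $E[-\log q(X)] = H(X) + D(p \| q)$ can be checked numerically; in the sketch below, `p` and `q` are arbitrary illustrative distributions on a common alphabet.

```python
import numpy as np

# Illustrative distributions on a three-letter alphabet (assumptions).
p = np.array([0.5, 0.25, 0.25])
q = np.array([0.25, 0.25, 0.5])

cross_entropy = -np.sum(p * np.log2(q))   # E_p[-log2 q(X)]
H = -np.sum(p * np.log2(p))               # H(X)
D = np.sum(p * np.log2(p / q))            # D(p||q)
print(cross_entropy, H + D)               # equal up to rounding
```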

2. (a) Since
$$H(X) = \sum_{x} p(x) \log \frac{1}{p(x)} = \frac{1}{2} \cdot 1 + \frac{1}{4} \cdot 2 + \frac{1}{8} \cdot 3 + \frac{1}{8} \cdot 3 = 1.75$$
we have
$$|A_\epsilon^{(n)}| \leq 2^{n(H(X)+\epsilon)} = 2^{8 \cdot (1.75+0.1)} = 2^{14.8} \approx 28526.2.$$
Since $|A_\epsilon^{(n)}|$ is an integer, we obtain $|A_\epsilon^{(n)}| \leq 28526$.
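A minimal sketch verifying the arithmetic above, assuming $n = 8$ and $\epsilon = 0.1$ as in the problem:

```python
import numpy as np

p = np.array([1/2, 1/4, 1/8, 1/8])        # source distribution from the problem
H = -np.sum(p * np.log2(p))               # 1.75 bits
n, eps = 8, 0.1
bound = 2 ** (n * (H + eps))              # 2^14.8 ~ 28526.2
print(H, bound, int(bound))               # integer part gives the bound 28526
```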

(b) Denote the number of A's in the sequence by $n_A$, the number of B's by $n_B$, the number of C's by $n_C$, and the number of D's by $n_D$. We have
$$p(x_1, x_2, \ldots, x_n) = \left(\frac{1}{2}\right)^{n_A} \left(\frac{1}{4}\right)^{n_B} \left(\frac{1}{8}\right)^{n_C} \left(\frac{1}{8}\right)^{n_D} = 2^{-(n_A + 2n_B + 3n_C + 3n_D)}.$$
Since
$$2^{-n(H(X)+\epsilon)} = 2^{-14.8} \leq p(x_1, x_2, \ldots, x_n) \leq 2^{-n(H(X)-\epsilon)} = 2^{-13.2}$$
we obtain $n_A + 2n_B + 3n_C + 3n_D = 14$ (the only integer between 13.2 and 14.8), and also $n_A + n_B + n_C + n_D = 8$. The following are all possible combinations of $(n_A, n_B, n_C, n_D)$ along with the total number of distinct permutations for each combination.

(n_A, n_B, n_C, n_D)        total number of permutations
(5, 0, 0, 3)                8!/(5! 3!) = 56
(5, 0, 1, 2)                8!/(5! 1! 2!) = 168
(4, 2, 0, 2)                8!/(4! 2! 2!) = 420
(5, 0, 2, 1)                8!/(5! 2! 1!) = 168
(4, 2, 1, 1)                8!/(4! 2! 1! 1!) = 840
(3, 4, 0, 1)                8!/(3! 4! 1!) = 280
(5, 0, 3, 0)                8!/(5! 3!) = 56
(4, 2, 2, 0)                8!/(4! 2! 2!) = 420
(3, 4, 1, 0)                8!/(3! 4! 1!) = 280
(2, 6, 0, 0)                8!/(2! 6!) = 28

Hence there are 2716 sequences in the typical set in total. Since $2^{11} = 2048 < 2716 < 2^{12} = 4096$, we should have $Rn = 12$, which gives $R = 3/2$.
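Since there are only $4^8 = 65536$ length-8 sequences, the count of 2716 can be verified by brute force; a minimal sketch:

```python
from itertools import product
from math import log2

probs = {'A': 1/2, 'B': 1/4, 'C': 1/8, 'D': 1/8}
n, H, eps = 8, 1.75, 0.1

count = 0
for seq in product('ABCD', repeat=n):             # all 4^8 = 65536 sequences
    logp = sum(log2(probs[s]) for s in seq)
    if -n * (H + eps) <= logp <= -n * (H - eps):  # 2^-14.8 <= p(x^n) <= 2^-13.2
        count += 1
print(count)                                      # 2716
```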

3. For $L_n \subseteq \mathcal{X}^n$ such that $|L_n| \leq 2^{Rn}$, where $R < H$, we have
$$P\{L_n\} = P\{L_n \cap A_\epsilon^{(n)}\} + P\{L_n \cap A_\epsilon^{(n)c}\} \leq \sum_{x \in L_n \cap A_\epsilon^{(n)}} P(x) + P\{A_\epsilon^{(n)c}\}.$$
For sufficiently large $n$, we have $P\{A_\epsilon^{(n)}\} > 1 - \epsilon$, which implies $P\{A_\epsilon^{(n)c}\} < \epsilon$. Also,
$$\sum_{x \in L_n \cap A_\epsilon^{(n)}} P(x) \leq \sum_{x \in L_n \cap A_\epsilon^{(n)}} 2^{-n(H-\epsilon)} \leq |L_n| \, 2^{-n(H-\epsilon)} \leq 2^{-n(H-R-\epsilon)}.$$
Hence, we obtain
$$P\{L_n\} < 2^{-n(H-R-\epsilon)} + \epsilon.$$
Choosing $\epsilon < H - R$ ensures $H - R - \epsilon > 0$, so the right-hand side can be made arbitrarily small for sufficiently large $n$. Therefore, we have
$$\lim_{n \to \infty} P\{L_n\} = 0.$$
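The bound can be illustrated numerically: for a Bernoulli source, the best possible $L_n$ collects the $2^{Rn}$ individually most probable sequences, and even its total probability vanishes when $R < H$. The parameters below ($p = 0.3$, $R = 0.5$) are illustrative assumptions.

```python
import numpy as np
from math import comb

p, R = 0.3, 0.5                            # illustrative source bias and rate
H = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))   # H ~ 0.881 > R

for n in [10, 20, 40, 80]:
    budget = int(2 ** (R * n))             # |L_n| <= 2^{Rn}
    prob, taken = 0.0, 0
    for k in range(n + 1):                 # fewer 1s => more probable (p < 1/2)
        cnt = min(comb(n, k), budget - taken)
        prob += cnt * (p ** k) * ((1 - p) ** (n - k))
        taken += cnt
        if taken == budget:
            break
    print(n, prob)                         # P{L_n} decreases toward 0
```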

4. (a) By the chain rule of entropy, we have
$$\begin{aligned}
&\frac{1}{n} H(X_1, X_2, \ldots, X_n) - \frac{1}{n+1} H(X_1, X_2, \ldots, X_{n+1}) \\
&= \frac{1}{n} H(X_1, X_2, \ldots, X_n) - \frac{1}{n+1} \left[ H(X_{n+1} \mid X_1, X_2, \ldots, X_n) + H(X_1, X_2, \ldots, X_n) \right] \\
&= \frac{1}{n(n+1)} H(X_1, X_2, \ldots, X_n) - \frac{1}{n+1} H(X_{n+1} \mid X_1, X_2, \ldots, X_n) \\
&= \frac{1}{n(n+1)} \left[ H(X_1) + H(X_2 \mid X_1) + \cdots + H(X_n \mid X_1, X_2, \ldots, X_{n-1}) \right] - \frac{1}{n+1} H(X_{n+1} \mid X_1, X_2, \ldots, X_n).
\end{aligned}$$
Since the process is stationary, we have
$$H(X_1) + H(X_2 \mid X_1) + \cdots + H(X_n \mid X_1, X_2, \ldots, X_{n-1}) = H(X_{n+1}) + H(X_{n+1} \mid X_n) + \cdots + H(X_{n+1} \mid X_2, X_3, \ldots, X_n).$$
Therefore, we have
$$\begin{aligned}
&\frac{1}{n} H(X_1, X_2, \ldots, X_n) - \frac{1}{n+1} H(X_1, X_2, \ldots, X_{n+1}) \\
&= \frac{1}{n(n+1)} \left[ H(X_{n+1}) + H(X_{n+1} \mid X_n) + \cdots + H(X_{n+1} \mid X_2, X_3, \ldots, X_n) \right] - \frac{n}{n(n+1)} H(X_{n+1} \mid X_1, \ldots, X_n) \\
&= \frac{1}{n(n+1)} \sum_{i=0}^{n-1} \left[ H(X_{n+1} \mid X_n, X_{n-1}, \ldots, X_{n-i+1}) - H(X_{n+1} \mid X_1, X_2, \ldots, X_n) \right] \geq 0
\end{aligned}$$
since conditioning reduces entropy.
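For a stationary Markov chain the joint entropy simplifies to $H(X_1, \ldots, X_n) = H(X_1) + (n-1) H(X_2 \mid X_1)$, which makes this monotonicity easy to observe numerically; the transition matrix below is an illustrative assumption.

```python
import numpy as np

# Illustrative stationary two-state Markov chain.
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
mu = np.array([0.8, 0.2])                  # stationary: mu P = mu

H1 = -np.sum(mu * np.log2(mu))             # H(X_1)
Hc = -np.sum(mu[:, None] * P * np.log2(P)) # H(X_2 | X_1)

# H(X_1,...,X_n)/n = [H(X_1) + (n-1) H(X_2|X_1)] / n is nonincreasing in n.
for n in range(1, 6):
    print(n, (H1 + (n - 1) * Hc) / n)
```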


(b) By the stationarity and the chain rule of entropy, we have
$$\begin{aligned}
H(X_0 \mid X_{-1}, X_{-2}, \ldots, X_{-n}) &= H(X_0, X_{-1}, \ldots, X_{-n}) - H(X_{-1}, X_{-2}, \ldots, X_{-n}) \\
&= H(X_n, X_{n-1}, \ldots, X_0) - H(X_n, X_{n-1}, \ldots, X_1) \\
&= H(X_0 \mid X_1, X_2, \ldots, X_n).
\end{aligned}$$
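For $n = 1$ and a stationary Markov chain, the identity reduces to $H(X_0 \mid X_{-1}) = H(X_0 \mid X_1)$, which can be checked directly from the joint distribution; the chain below is an illustrative assumption.

```python
import numpy as np

# Illustrative stationary two-state Markov chain.
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
mu = np.array([0.8, 0.2])                  # mu P = mu

joint = mu[:, None] * P                    # P(X_0 = i, X_1 = j)
H_fwd = -np.sum(joint * np.log2(P))        # H(X_1 | X_0) = H(X_0 | X_{-1})
Pb = joint / mu[None, :]                   # backward transitions P(X_0 = i | X_1 = j)
H_bwd = -np.sum(joint * np.log2(Pb))       # H(X_0 | X_1)
print(H_fwd, H_bwd)                        # equal, as the identity predicts
```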

5. (a) Suppose $P = [P_{ij}]$ is an $n \times n$ doubly stochastic transition probability matrix. Let
$$\mu = (1/n, 1/n, \ldots, 1/n)$$
be the uniform distribution of length $n$, and let $p_i$ denote the $i$th row of $P$, i.e.,
$$p_i = (P_{i1}, P_{i2}, \ldots, P_{in})$$
for $1 \leq i \leq n$. Then we have
$$\mu P = \frac{1}{n} \sum_{i=1}^{n} p_i = \frac{1}{n} \left( \sum_{i=1}^{n} P_{i1}, \sum_{i=1}^{n} P_{i2}, \ldots, \sum_{i=1}^{n} P_{in} \right) = \mu$$
since $\sum_{i=1}^{n} P_{ij} = 1$ for all $1 \leq j \leq n$. Thus, the uniform distribution is a stationary distribution for a Markov chain with a doubly stochastic transition probability matrix.
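A quick numerical check with an assumed doubly stochastic matrix (rows and columns each sum to 1):

```python
import numpy as np

# Illustrative 3x3 doubly stochastic matrix.
P = np.array([[0.2, 0.3, 0.5],
              [0.5, 0.2, 0.3],
              [0.3, 0.5, 0.2]])
mu = np.full(3, 1/3)                       # uniform distribution

print(mu @ P)                              # equals mu, so mu is stationary
```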
(b) Let $P = [P_{ij}]$ be an $n \times n$ transition probability matrix for a Markov chain. We have $P_{ij} \geq 0$ for all $1 \leq i, j \leq n$, and $\sum_{j=1}^{n} P_{ij} = 1$ for all $1 \leq i \leq n$. Suppose
$$\mu = (1/n, 1/n, \ldots, 1/n)$$
is a stationary distribution for this Markov chain. We then have
$$\mu P = \frac{1}{n} \left( \sum_{i=1}^{n} P_{i1}, \sum_{i=1}^{n} P_{i2}, \ldots, \sum_{i=1}^{n} P_{in} \right) = \mu$$
which yields
$$\sum_{i=1}^{n} P_{ij} = 1$$
for all $1 \leq j \leq n$. Therefore, $P$ is doubly stochastic.
6. (a) Let $P = [P_{ij}]$ be the transition probability matrix for this Markov chain, where $P_{ij} = P(X_n = j \mid X_{n-1} = i)$. It is clear that the alphabet of $X_n$ is $\{0, 1, 2\}$. When $X_{n-1} = 0$, the only possible value of $X_n$ is 1. When $X_{n-1} = 1$, $X_n$ can be 0, 1, or 2 with probabilities 1/4, 1/2, and 1/4, respectively. When $X_{n-1} = 2$, again the only possible value of $X_n$ is 1. We then obtain
$$P = \begin{pmatrix} 0 & 1 & 0 \\ 1/4 & 1/2 & 1/4 \\ 0 & 1 & 0 \end{pmatrix}.$$
(b) Let $\mu = (\mu_0, \mu_1, \mu_2)$ be the stationary distribution, so that $\mu = \mu P$. We have
$$\begin{cases} \mu_1 / 4 = \mu_0 \\ \mu_0 + \mu_1 / 2 + \mu_2 = \mu_1 \\ \mu_1 / 4 = \mu_2 \\ \mu_0 + \mu_1 + \mu_2 = 1 \end{cases}$$
which yields $\mu_0 = 1/6$, $\mu_1 = 2/3$, $\mu_2 = 1/6$. Hence $\lim_{n \to \infty} P(X_n = 0) = 1/6$, $\lim_{n \to \infty} P(X_n = 1) = 2/3$, and $\lim_{n \to \infty} P(X_n = 2) = 1/6$. Therefore, we obtain
$$\lim_{n \to \infty} H(X_n) = \frac{1}{6} \log 6 + \frac{2}{3} \log \frac{3}{2} + \frac{1}{6} \log 6 = \log 3 - \frac{1}{3} \log 2.$$
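The stationary distribution and the limiting entropy can be double-checked numerically (entropies in bits, i.e., base-2 logarithms); a minimal sketch:

```python
import numpy as np

P = np.array([[0,   1,   0  ],
              [1/4, 1/2, 1/4],
              [0,   1,   0  ]])

# The stationary distribution is the left eigenvector of P for eigenvalue 1.
w, v = np.linalg.eig(P.T)
mu = np.real(v[:, np.argmax(np.isclose(w, 1))])
mu = mu / mu.sum()
print(mu)                                  # [1/6, 2/3, 1/6]

H = -np.sum(mu * np.log2(mu))              # lim H(X_n)
print(H, np.log2(3) - 1/3)                 # both ~1.2516 bits
```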
(c) We have
$$\lim_{n \to \infty} H(X_n \mid X_{n-1}) = \sum_{i=0}^{2} \sum_{j=0}^{2} \mu_i P_{ij} \log \frac{1}{P_{ij}} = \frac{2}{3} \left( \frac{1}{4} \log 4 + \frac{1}{2} \log 2 + \frac{1}{4} \log 4 \right) = \log 2.$$
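The same quantity in code, using the convention $0 \log 0 = 0$ for the impossible transitions:

```python
import numpy as np

P = np.array([[0, 1, 0], [1/4, 1/2, 1/4], [0, 1, 0]])
mu = np.array([1/6, 2/3, 1/6])

# H(X_n | X_{n-1}) = sum_i mu_i sum_j P_ij log2(1/P_ij), with 0 log 0 = 0.
with np.errstate(divide='ignore', invalid='ignore'):
    terms = np.where(P > 0, -P * np.log2(P), 0.0)
print(mu @ terms.sum(axis=1))              # 1.0 bit = log 2
```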

(d) We have
$$\lim_{n \to \infty} I(X_n; X_{n-1}) = \lim_{n \to \infty} \left[ H(X_n) - H(X_n \mid X_{n-1}) \right] = \lim_{n \to \infty} H(X_n) - \lim_{n \to \infty} H(X_n \mid X_{n-1}) = \log 3 - \frac{4}{3} \log 2.$$
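In bits, this is $\log_2 3 - 4/3 \approx 0.2516$; a short check combining parts (b) and (c):

```python
import numpy as np

mu = np.array([1/6, 2/3, 1/6])
H_X = -np.sum(mu * np.log2(mu))            # lim H(X_n) from part (b)
H_cond = 1.0                               # lim H(X_n | X_{n-1}) = log 2 from (c)
print(H_X - H_cond, np.log2(3) - 4/3)      # both ~0.2516 bits
```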
