hw2 Sol
1. Since X1, X2, ..., Xn are i.i.d., log p(X1), log p(X2), ..., log p(Xn) are i.i.d. By the weak law of large numbers, as n → ∞, (1/n) log p(X1, X2, ..., Xn) approaches the common mean E[log p(X)] = −H(X) in probability, and hence p(X1, X2, ..., Xn)^{1/n} → 2^{−H(X)} in probability.
Since X1, X2, ..., Xn are i.i.d., −log q(X1), −log q(X2), ..., −log q(Xn) are i.i.d. By the weak law of large numbers, as n → ∞, −(1/n) log q(X1, X2, ..., Xn) approaches the common mean
$$
E[-\log q(X)] = \sum_{x \in \mathcal{X}} p(x) \log \frac{1}{q(x)}
= \sum_{x \in \mathcal{X}} p(x) \log \frac{1}{p(x)} + \sum_{x \in \mathcal{X}} p(x) \log \frac{p(x)}{q(x)}
= H(X) + D(p\|q)
$$
in probability.
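As a quick numerical sanity check of this limit, the following Python sketch simulates i.i.d. draws from a small pmf p and evaluates −(1/n) log q(X1, ..., Xn) for a second pmf q; the particular p and q below are arbitrary examples chosen only for illustration.

```python
import numpy as np

# Example pmfs on a 4-symbol alphabet (chosen for illustration only).
p = np.array([0.5, 0.25, 0.125, 0.125])
q = np.array([0.25, 0.25, 0.25, 0.25])

H_p = -np.sum(p * np.log2(p))          # entropy H(X) in bits
D_pq = np.sum(p * np.log2(p / q))      # relative entropy D(p||q)

rng = np.random.default_rng(0)
n = 100_000
x = rng.choice(len(p), size=n, p=p)    # X1, ..., Xn i.i.d. ~ p

# -(1/n) log q(X1, ..., Xn) = -(1/n) sum_i log q(Xi)
empirical = -np.mean(np.log2(q[x]))

print(f"H(p) + D(p||q)    = {H_p + D_pq:.4f}")
print(f"-(1/n) log q(X^n) = {empirical:.4f}")   # close to the line above for large n
```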
2. (a) Since
$$
H(X) = \sum_{x} p(x) \log \frac{1}{p(x)} = \frac{1}{2} \cdot 1 + \frac{1}{4} \cdot 2 + \frac{1}{8} \cdot 3 + \frac{1}{8} \cdot 3 = 1.75,
$$
we have
$$
|A_\epsilon^{(n)}| \le 2^{n(H(X)+\epsilon)} = 2^{8 \cdot (1.75+0.1)} = 2^{14.8} \approx 28526.2.
$$
As $|A_\epsilon^{(n)}|$ is an integer, we obtain $|A_\epsilon^{(n)}| \le 28526$.
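A one-line check of this bound, assuming n = 8 and ε = 0.1 as above:

```python
import math

n, H, eps = 8, 1.75, 0.1
bound = 2 ** (n * (H + eps))       # 2^{14.8} ≈ 28526.2
print(bound, math.floor(bound))    # so |A_eps^(n)| <= 28526
```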
(b) Denote the number of A's in the sequence as nA, the number of B's as nB, the number of C's as nC, and the number of D's as nD. We have
$$
p(x_1, x_2, \ldots, x_n) = \left(\frac{1}{2}\right)^{n_A} \left(\frac{1}{4}\right)^{n_B} \left(\frac{1}{8}\right)^{n_C} \left(\frac{1}{8}\right)^{n_D} = 2^{-(n_A + 2n_B + 3n_C + 3n_D)}.
$$
Since a sequence is typical if and only if $2^{-n(H(X)+\epsilon)} \le p(x_1, x_2, \ldots, x_n) \le 2^{-n(H(X)-\epsilon)}$, we need
$$
8 \times 1.65 = 13.2 \le n_A + 2n_B + 3n_C + 3n_D \le 8 \times 1.85 = 14.8,
$$
that is, $n_A + 2n_B + 3n_C + 3n_D = 14$. Counting the length-8 sequences that satisfy this condition, there are in total 2716 sequences in the typical set. Since $2^{11} = 2048 < 2716 < 2^{12} = 4096$, we should have nR = 12, which gives R = 3/2.
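The count 2716 can be confirmed by brute force. Below is a short Python sketch (an illustration under the same n = 8, ε = 0.1 setting) that enumerates all 4^8 sequences and applies the typicality condition above.

```python
from itertools import product
from math import log2

# Symbol probabilities for A, B, C, D and the parameters from part (a).
probs = {"A": 1/2, "B": 1/4, "C": 1/8, "D": 1/8}
n, H, eps = 8, 1.75, 0.1

typical = 0
for seq in product(probs, repeat=n):                 # all 4^8 = 65536 sequences
    p_seq = 1.0
    for s in seq:
        p_seq *= probs[s]
    if H - eps <= -log2(p_seq) / n <= H + eps:       # typicality condition
        typical += 1

print(typical)   # 2716, so nR = 12 and R = 3/2 suffice to index the typical set
```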
3. Let Ln be a set of sequences of length n with $|L_n| \le 2^{nR}$, where R < H. For sufficiently large n, we have $P\{A_\epsilon^{(n)}\} > 1 - \epsilon$, which implies $P\{A_\epsilon^{(n)c}\} < \epsilon$. Also,
$$
\sum_{x \in L_n \cap A_\epsilon^{(n)}} P(x) \le \sum_{x \in L_n \cap A_\epsilon^{(n)}} 2^{-n(H-\epsilon)} \le |L_n| \, 2^{-n(H-\epsilon)} \le 2^{nR} \, 2^{-n(H-\epsilon)} = 2^{-n(H-R-\epsilon)}.
$$
Hence, we obtain
$$
P\{L_n\} = P\{L_n \cap A_\epsilon^{(n)}\} + P\{L_n \cap A_\epsilon^{(n)c}\} < 2^{-n(H-R-\epsilon)} + \epsilon,
$$
which can be made arbitrarily small for sufficiently large n (choosing ε < H − R). Therefore, we have
$$
\lim_{n \to \infty} P\{L_n\} = 0.
$$
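As a concrete illustration, take the source of Problem 2 (H = 1.75 bits) with R = 1.5 < H, and let Ln be the best possible choice: the 2^{⌊nR⌋} most probable sequences. The Python sketch below computes P(Ln) exactly for a few small n; the probability shrinks toward 0 as n grows, consistent with the limit above.

```python
from itertools import product
import numpy as np

# Source from Problem 2 (symbols A, B, C, D) and a rate R = 1.5 < H = 1.75.
probs = np.array([1/2, 1/4, 1/8, 1/8])
R = 1.5

for n in range(2, 9, 2):
    # Exact probability of every length-n sequence.
    seq_probs = np.array([np.prod(p) for p in product(probs, repeat=n)])
    k = int(2 ** np.floor(n * R))              # |Ln| <= 2^{nR}
    top_k = np.sort(seq_probs)[::-1][:k]       # the k most probable sequences
    print(n, round(top_k.sum(), 4))            # P(Ln) shrinks toward 0 with n
```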
4. (a) By the chain rule of entropy, we have
$$
\begin{aligned}
\frac{1}{n} H(X_1, X_2, \ldots, X_n) & - \frac{1}{n+1} H(X_1, X_2, \ldots, X_{n+1}) \\
& = \frac{1}{n} H(X_1, X_2, \ldots, X_n) - \frac{1}{n+1} \left[ H(X_{n+1} \mid X_1, X_2, \ldots, X_n) + H(X_1, X_2, \ldots, X_n) \right] \\
& = \frac{1}{n(n+1)} H(X_1, X_2, \ldots, X_n) - \frac{1}{n+1} H(X_{n+1} \mid X_1, X_2, \ldots, X_n) \\
& = \frac{1}{n(n+1)} \left[ H(X_1) + H(X_2 \mid X_1) + \cdots + H(X_n \mid X_1, X_2, \ldots, X_{n-1}) \right] - \frac{1}{n+1} H(X_{n+1} \mid X_1, X_2, \ldots, X_n).
\end{aligned}
$$
Since the process is stationary, we have
$$
H(X_i \mid X_1, X_2, \ldots, X_{i-1}) = H(X_{n+1} \mid X_{n-i+2}, X_{n-i+3}, \ldots, X_n), \qquad 1 \le i \le n.
$$
Therefore, we have
$$
\begin{aligned}
\frac{1}{n} H(X_1, X_2, \ldots, X_n) & - \frac{1}{n+1} H(X_1, X_2, \ldots, X_{n+1}) \\
& = \frac{1}{n(n+1)} \left[ H(X_{n+1}) + H(X_{n+1} \mid X_n) + \cdots + H(X_{n+1} \mid X_2, X_3, \ldots, X_n) \right] - \frac{n}{n(n+1)} H(X_{n+1} \mid X_1, \ldots, X_n) \\
& = \frac{1}{n(n+1)} \sum_{i=0}^{n-1} \left[ H(X_{n+1} \mid X_n, X_{n-1}, \ldots, X_{n-i+1}) - H(X_{n+1} \mid X_1, X_2, \ldots, X_n) \right] \ge 0,
\end{aligned}
$$
since conditioning reduces entropy. Hence H(X1, X2, ..., Xn)/n is nonincreasing in n.
(b) Since the process is stationary, we have
$$
\begin{aligned}
H(X_0 \mid X_{-1}, X_{-2}, \ldots, X_{-n}) & = H(X_0, X_{-1}, \ldots, X_{-n}) - H(X_{-1}, X_{-2}, \ldots, X_{-n}) \\
& = H(X_n, X_{n-1}, \ldots, X_0) - H(X_n, X_{n-1}, \ldots, X_1) \\
& = H(X_0 \mid X_1, X_2, \ldots, X_n).
\end{aligned}
$$
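Both facts can be checked numerically for a concrete stationary process. The Python sketch below uses a made-up two-state Markov chain started from its stationary distribution (an arbitrary illustration) and computes exact joint entropies by enumerating paths: H(X1, ..., Xn)/n comes out nonincreasing, and the two conditional entropies in part (b) agree.

```python
import numpy as np
from itertools import product

# A made-up two-state Markov chain, started from its stationary distribution
# so that the process is stationary.
T = np.array([[0.9, 0.1],
              [0.3, 0.7]])                          # T[i, j] = P(X_{t+1}=j | X_t=i)
evals, evecs = np.linalg.eig(T.T)
pi = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
pi /= pi.sum()                                      # stationary distribution: pi T = pi

def block_joint(m):
    """Exact joint pmf of m consecutive variables, as an array of shape (2,)*m."""
    joint = np.zeros((2,) * m)
    for path in product(range(2), repeat=m):
        p = pi[path[0]]
        for a, b in zip(path, path[1:]):
            p *= T[a, b]
        joint[path] = p
    return joint

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# (a) H(X_1, ..., X_n)/n is nonincreasing in n.
print([round(H(block_joint(n)) / n, 4) for n in range(1, 8)])

# (b) H(X_0 | X_{-1}, ..., X_{-n}) = H(X_0 | X_1, ..., X_n), here for n = 5.
n = 5
joint = block_joint(n + 1)                 # pmf of n+1 consecutive variables
past = H(joint) - H(joint.sum(axis=n))     # condition on the n variables before X_0
future = H(joint) - H(joint.sum(axis=0))   # condition on the n variables after X_0
print(round(past, 6), round(future, 6))    # the two values coincide
```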
5. (a) Let P = [Pij] be an n × n doubly stochastic transition probability matrix for a Markov chain, so that Pij ≥ 0 and $\sum_{i=1}^{n} P_{ij} = \sum_{j=1}^{n} P_{ij} = 1$, and let µ = (p1, p2, ..., pn) with pi = 1/n for 1 ≤ i ≤ n. Then we have
$$
\mu P = \left( \sum_{i=1}^{n} p_i P_{i1}, \sum_{i=1}^{n} p_i P_{i2}, \ldots, \sum_{i=1}^{n} p_i P_{in} \right)
= \frac{1}{n} \left( \sum_{i=1}^{n} P_{i1}, \sum_{i=1}^{n} P_{i2}, \ldots, \sum_{i=1}^{n} P_{in} \right) = \mu,
$$
since $\sum_{i=1}^{n} P_{ij} = 1$ for all 1 ≤ j ≤ n. Thus, the uniform distribution is a stationary distribution for a Markov chain with a doubly stochastic transition probability matrix.
(b) Let P = [Pij] be an n × n transition probability matrix for a Markov chain. We have Pij ≥ 0, for all 1 ≤ i, j ≤ n, and $\sum_{j=1}^{n} P_{ij} = 1$, for all 1 ≤ i ≤ n. Suppose µ = (1/n, 1/n, ..., 1/n) is a stationary distribution for this Markov chain. We then have
$$
\mu P = \frac{1}{n} \left( \sum_{i=1}^{n} P_{i1}, \sum_{i=1}^{n} P_{i2}, \ldots, \sum_{i=1}^{n} P_{in} \right) = \mu,
$$
which yields
$$
\sum_{i=1}^{n} P_{ij} = 1
$$
for all 1 ≤ j ≤ n. Therefore, P is doubly stochastic.
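A small numerical illustration of both directions, using an arbitrary 3 × 3 doubly stochastic matrix chosen only for this example:

```python
import numpy as np

# An arbitrary 3x3 doubly stochastic matrix (rows and columns each sum to 1).
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.4, 0.4],
              [0.3, 0.3, 0.4]])
n = P.shape[0]
mu = np.full(n, 1 / n)                       # uniform distribution

# (a) Doubly stochastic => uniform is stationary: mu P = mu.
print(np.allclose(mu @ P, mu))               # True

# (b) Uniform stationary => columns sum to 1 (doubly stochastic).
print(np.allclose(P.sum(axis=0), 1.0))       # True, consistent with mu P = mu
```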
6. (a) Let P = [Pij ] be the transition probability matrix for this Markov chain where
Pij = P (Xn = j|Xn−1 = i). It is clear that the alphabet for Xn is {0, 1, 2}. When
Xn−1 = 0, the only possible value of Xn is 1. When Xn−1 = 1, Xn can be 0, 1, 2
with probabilities 1/4, 1/2, 1/4, respectively. When Xn−1 = 2, again the only
possible value of Xn is 1. We then obtain
$$
P = \begin{pmatrix} 0 & 1 & 0 \\ 1/4 & 1/2 & 1/4 \\ 0 & 1 & 0 \end{pmatrix}.
$$
(b) Let µ = (µ0, µ1, µ2) be the stationary distribution; then µ = µP. We have
$$
\begin{aligned}
\mu_1/4 &= \mu_0, \\
\mu_0 + \mu_1/2 + \mu_2 &= \mu_1, \\
\mu_1/4 &= \mu_2, \\
\mu_0 + \mu_1 + \mu_2 &= 1.
\end{aligned}
$$
Solving this system gives µ = (µ0, µ1, µ2) = (1/6, 2/3, 1/6).
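A quick numerical check of this stationary distribution against the transition matrix from part (a):

```python
import numpy as np

# Transition matrix from part (a).
P = np.array([[0,   1,   0  ],
              [1/4, 1/2, 1/4],
              [0,   1,   0  ]])

mu = np.array([1/6, 2/3, 1/6])               # candidate stationary distribution
print(np.allclose(mu @ P, mu), mu.sum())     # True, 1.0
```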
(d) We have