

SOLUTIONS TO SELECTED PROBLEMS IN CHAPTER 2

.. Prove Fano's inequality.


Solution: Let

    E = 1 if X = Y, and E = 0 otherwise,

so that P{E = 0} = P{X ≠ Y} = Pe. Since E is a function of (X, Y),

    H(X|Y) = H(X, E|Y)
           = H(E|Y) + H(X|E, Y)
           ≤ H(E) + P{E = 0} H(X|E = 0, Y) + P{E = 1} H(X|E = 1, Y)
           ≤ H(Pe) + Pe log|X|
           ≤ 1 + Pe log|X|,

where the first inequality uses the fact that conditioning reduces entropy, and the second uses H(E) = H(Pe), H(X|E = 1, Y) = 0 (E = 1 implies X = Y), and H(X|E = 0, Y) ≤ log|X|.
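As a quick sanity check, not part of the original solution, the following Python sketch draws a random joint pmf on a common alphabet, treats Y itself as the guess of X, and verifies that H(X|Y) ≤ H(Pe) + Pe log|X| ≤ 1 + Pe log|X|. The alphabet size, pmf, and seed are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(1)
    m = 4                                   # common alphabet size |X| = |Y| = m
    p = rng.random((m, m))
    p /= p.sum()                            # random joint pmf p(x, y)

    def H(q):
        q = q[q > 0]
        return float(-(q * np.log2(q)).sum())

    H_XY = H(p)                             # H(X, Y)
    H_Y = H(p.sum(axis=0))                  # H(Y)
    H_X_given_Y = H_XY - H_Y                # H(X | Y)
    Pe = 1.0 - np.trace(p)                  # P{X != Y}, the error probability when guessing X by Y
    h_Pe = H(np.array([Pe, 1.0 - Pe]))

    assert H_X_given_Y <= h_Pe + Pe * np.log2(m) + 1e-12
    assert h_Pe <= 1.0
    print(H_X_given_Y, h_Pe + Pe * np.log2(m), 1 + Pe * np.log2(m))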

.. Prove the Csiszár sum identity.


Solution: The identity can be proved by induction, or more simply by using the chain rule of mutual information. Note that by the chain rule, the mutual information between Z^n and W can be expanded in n! ways, depending on how we order the elements of Z^n. For instance, both of the following expansions are valid:

    I(Z^n; W) = ∑_{i=1}^n I(Z_i; W | Z^{i-1})
              = ∑_{j=1}^n I(Z_j; W | Z_{j+1}^n),

where the two orderings Z^n = (Z_1, ..., Z_n) and Z^n = (Z_n, ..., Z_1) have been used. Therefore, we have

    ∑_{i=1}^n I(X_{i+1}^n; Y_i | Y^{i-1}, U)
        (a) = ∑_{i=1}^n ∑_{j=i+1}^n I(X_j; Y_i | Y^{i-1}, X_{j+1}^n, U)
        (b) = ∑_{j=2}^n ∑_{i=1}^{j-1} I(X_j; Y_i | Y^{i-1}, X_{j+1}^n, U)
        (c) = ∑_{j=2}^n I(X_j; Y^{j-1} | X_{j+1}^n, U)
        (d) = ∑_{j=1}^n I(X_j; Y^{j-1} | X_{j+1}^n, U)
            = ∑_{i=1}^n I(Y^{i-1}; X_i | X_{i+1}^n, U),

where (a) and (c) follow from the chain rule of mutual information, (b) is obtained by switching the order of summation, and (d) follows from the fact that Y^0 = ∅, so the j = 1 term is zero.
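The identity can also be checked numerically. The sketch below is not part of the original solution: it builds a random joint pmf of (X^3, Y^3) on binary alphabets (with U constant) and compares the two sides of the Csiszár sum identity for n = 3. The alphabet sizes and the random seed are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(0)

    # random joint pmf of (X1, X2, X3, Y1, Y2, Y3); axes 0-2 are the X's, axes 3-5 the Y's
    p = rng.random((2,) * 6)
    p /= p.sum()

    def H(axes):
        """Entropy (in bits) of the marginal of p over the given set of axes."""
        axes = set(axes)
        if not axes:
            return 0.0
        drop = tuple(a for a in range(p.ndim) if a not in axes)
        q = p.sum(axis=drop) if drop else p
        q = q[q > 0]
        return float(-(q * np.log2(q)).sum())

    def I(A, B, C=()):
        """Conditional mutual information I(A; B | C) computed from p."""
        A, B, C = set(A), set(B), set(C)
        return H(A | C) + H(B | C) - H(A | B | C) - H(C)

    X1, X2, X3, Y1, Y2, Y3 = range(6)

    # sum_i I(X_{i+1}^n; Y_i | Y^{i-1}) = sum_i I(Y^{i-1}; X_i | X_{i+1}^n) for n = 3
    lhs = I({X2, X3}, {Y1}) + I({X3}, {Y2}, {Y1})      # the i = 3 term is zero
    rhs = I({Y1}, {X2}, {X3}) + I({Y1, Y2}, {X3})      # the i = 1 term is zero
    assert abs(lhs - rhs) < 1e-10
    print(lhs, rhs)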
.. Prove the properties of jointly typical sequences with the δ(ε) constants explicitly specified.
Solution:
(a) If (x^n, y^n) ∈ T_ε^(n), then for all (x, y) ∈ X × Y,

    |π(x, y | x^n, y^n) − p(x, y)| ≤ ε p(x, y),

or equivalently,

    p(x, y) − ε p(x, y) ≤ π(x, y | x^n, y^n) ≤ p(x, y) + ε p(x, y).

Summing over x ∈ X, we get

    ∑_{x∈X} (p(x, y) − ε p(x, y)) ≤ ∑_{x∈X} π(x, y | x^n, y^n) ≤ ∑_{x∈X} (p(x, y) + ε p(x, y)),

and therefore

    p(y) − ε p(y) ≤ π(y | y^n) ≤ p(y) + ε p(y),

i.e., y^n ∈ T_ε^(n)(Y). Similarly, summing over y ∈ Y, we get x^n ∈ T_ε^(n)(X).

(b) By definition,

    log p(y^n | x^n) = ∑_{i=1}^n log p_{Y|X}(y_i | x_i)
                     = ∑_{(x,y)∈X×Y} n π(x, y | x^n, y^n) log p(y | x).

But since (x^n, y^n) ∈ T_ε^(n)(X, Y), for every (x, y) ∈ X × Y,

    p(x, y) − ε p(x, y) ≤ π(x, y | x^n, y^n) ≤ p(x, y) + ε p(x, y).

Therefore, since log p(y | x) ≤ 0,

    log p(y^n | x^n) ≤ ∑_{(x,y)∈X×Y} n (p(x, y) − ε p(x, y)) log p(y | x) = −n(H(Y | X) − δ(ε)),

where δ(ε) = ε H(Y | X). Similarly, log p(y^n | x^n) ≥ −n(H(Y | X) + δ(ε)). Hence

    2^{−n(H(Y|X)+δ(ε))} ≤ p(y^n | x^n) ≤ 2^{−n(H(Y|X)−δ(ε))},

i.e., p(y^n | x^n) ≐ 2^{−nH(Y|X)}.

(c) First, we establish the upper bound on |T_ε^(n)(Y | x^n)|. Since

    1 ≥ ∑_{y^n: (x^n,y^n)∈T_ε^(n)} p(y^n | x^n) ≥ ∑_{y^n: (x^n,y^n)∈T_ε^(n)} 2^{−n(H(Y|X)+δ(ε))},

we have

    |T_ε^(n)(Y | x^n)| ≤ 2^{n(H(Y|X)+δ(ε))}.

For the lower bound, let {k(x, y) : (x, y) ∈ X × Y} be any set of nonnegative integers such that if π(x, y | x^n, y^n) = k(x, y)/n for all (x, y), then (x^n, y^n) ∈ T_ε^(n). Now consider

    (1/n) log |T_ε^(n)(Y | x^n)|
        ≥ (1/n) ∑_{x∈X} ( log (∑_{y∈Y} k(x, y))! − ∑_{y∈Y} log k(x, y)! )
        ≥ ((1 − ε_n)/n) ∑_{x∈X} ( (∑_{y∈Y} k(x, y)) log ∑_{y∈Y} k(x, y) − ∑_{y∈Y} k(x, y) log k(x, y) )
        = (1 − ε_n) ∑_{x∈X} ∑_{y∈Y} (k(x, y)/n) log ( ∑_{y'∈Y} k(x, y') / k(x, y) ),

where the first inequality counts the sequences y^n that have the conditional type {k(x, y)} with respect to x^n, and the second inequality comes from Stirling's formula for some ε_n such that ε_n → 0 as n → ∞. Note that

    p(x, y) − ε p(x, y) ≤ k(x, y)/n ≤ p(x, y) + ε p(x, y),

so that ∑_{y'∈Y} k(x, y')/n ≥ p(x) − ε p(x). Substituting in the above inequality, we obtain

    (1/n) log |T_ε^(n)(Y | x^n)|
        ≥ (1 − ε_n) ∑_{p(x,y)>0} (p(x, y) − ε p(x, y)) log ( (p(x) − ε p(x)) / (p(x, y) + ε p(x, y)) )
        = (1 − ε_n)(1 − ε) ( ∑_{p(x,y)>0} p(x, y) log ( p(x)/p(x, y) ) + log ( (1 − ε)/(1 + ε) ) )
        = (1 − ε_n)(1 − ε) ( H(Y | X) + log ( (1 − ε)/(1 + ε) ) )
        → H(Y | X) as ε → 0 and n → ∞.

Hence, for n sufficiently large, |T_ε^(n)(Y | x^n)| ≥ 2^{n(H(Y|X)−δ(ε))} for some δ(ε) that tends to zero as ε → 0.
(d) Let X̃^n and Ỹ^n be independent with X̃^n ~ ∏_{i=1}^n p_X(x̃_i) and Ỹ^n ~ ∏_{i=1}^n p_Y(ỹ_i). Consider

    P{(X̃^n, Ỹ^n) ∈ T_ε^(n)} = ∑_{(x^n,y^n)∈T_ε^(n)} ∏_{i=1}^n p_X(x_i) p_Y(y_i)
        ≤ ∑_{(x^n,y^n)∈T_ε^(n)} 2^{−n(H(X)−δ(ε))} 2^{−n(H(Y)−δ(ε))}
        = |T_ε^(n)| 2^{−n(H(X)−δ(ε))} 2^{−n(H(Y)−δ(ε))}
        ≤ 2^{n(H(X,Y)+δ(ε))} 2^{−n(H(X)−δ(ε))} 2^{−n(H(Y)−δ(ε))}
        = 2^{−n(I(X;Y)−3δ(ε))}.
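The probability bounds in property (b) can be illustrated numerically. The sketch below is not part of the original solution: it draws (x^n, y^n) i.i.d. from a fixed p(x, y) and, whenever the pair is jointly ε-typical, checks that p(y^n | x^n) lies between 2^{−n(H(Y|X)+δ(ε))} and 2^{−n(H(Y|X)−δ(ε))} with δ(ε) = εH(Y|X). The particular pmf, n, and ε are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(2)
    p = np.array([[0.4, 0.1],
                  [0.1, 0.4]])                 # joint pmf p(x, y) on {0,1} x {0,1}
    px = p.sum(axis=1)
    p_y_given_x = p / px[:, None]

    def H(q):
        q = q[q > 0]
        return float(-(q * np.log2(q)).sum())

    H_Y_given_X = H(p) - H(px)                 # H(Y|X) in bits
    n, eps = 2000, 0.2
    delta = eps * H_Y_given_X

    for _ in range(200):
        idx = rng.choice(4, size=n, p=p.ravel())            # sample (x_i, y_i) pairs
        x, y = idx // 2, idx % 2
        # empirical joint pmf pi(a, b | x^n, y^n)
        pi = np.array([[np.mean((x == a) & (y == b)) for b in (0, 1)] for a in (0, 1)])
        if np.all(np.abs(pi - p) <= eps * p):               # jointly eps-typical?
            log_p = np.log2(p_y_given_x[x, y]).sum()        # log p(y^n | x^n)
            assert -n * (H_Y_given_X + delta) - 1e-9 <= log_p <= -n * (H_Y_given_X - delta) + 1e-9
    print("all jointly typical pairs satisfied the bounds of property (b)")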

Alternative solution:
(a) i. Since (x^n, y^n) ∈ T_ε^(n)(X, Y), we have |π(x, y | x^n, y^n) − p(x, y)| ≤ ε p(x, y) for all (x, y) ∈ X × Y. Hence, for every x ∈ X,

    |π(x | x^n) − p(x)| ≤ ∑_{y∈Y} |π(x, y | x^n, y^n) − p(x, y)| ≤ ∑_{y∈Y} ε p(x, y) = ε p(x),

i.e., x^n ∈ T_ε^(n)(X). Similarly, we can show that y^n ∈ T_ε^(n)(Y).

ii. Since x^n ∈ T_ε^(n)(X), we have |π(x | x^n) − p(x)| ≤ ε p(x) for all x ∈ X. By setting g(x) = −log p_X(x) in the typical average lemma, we have

    (1 − ε) E[−log p_X(X)] ≤ −(1/n) ∑_{i=1}^n log p_X(x_i) ≤ (1 + ε) E[−log p_X(X)],

i.e., (1 − ε) H(X) ≤ −(1/n) log p(x^n) ≤ (1 + ε) H(X), since p(x^n) = ∏_{i=1}^n p_X(x_i). Thus

    2^{−n(H(X)+δ(ε))} ≤ p(x^n) ≤ 2^{−n(H(X)−δ(ε))}, where δ(ε) = ε H(X).

Similarly, 2^{−n(H(Y)+δ(ε))} ≤ p(y^n) ≤ 2^{−n(H(Y)−δ(ε))}, where δ(ε) = ε H(Y).

iii. From property iv below, 2^{−n(H(X,Y)+εH(X,Y))} ≤ p(x^n, y^n) ≤ 2^{−n(H(X,Y)−εH(X,Y))}, and from property ii, 2^{−n(H(Y)+εH(Y))} ≤ p(y^n) ≤ 2^{−n(H(Y)−εH(Y))}. Since p(x^n | y^n) = p(x^n, y^n)/p(y^n), combining the two sets of bounds gives

    2^{−n(H(X|Y)+δ(ε))} ≤ p(x^n | y^n) ≤ 2^{−n(H(X|Y)−δ(ε))}, where δ(ε) = ε(H(X, Y) + H(Y)).

iv. Since (x^n, y^n) ∈ T_ε^(n)(X, Y), we have (1 − ε) p(x, y) ≤ π(x, y | x^n, y^n) ≤ (1 + ε) p(x, y) for all (x, y) ∈ X × Y. Hence, for any nonnegative function g on X × Y,

    (1 − ε) ∑_{(x,y)} p(x, y) g(x, y) ≤ ∑_{(x,y)} π(x, y | x^n, y^n) g(x, y) ≤ (1 + ε) ∑_{(x,y)} p(x, y) g(x, y),

i.e., (1 − ε) E[g(X, Y)] ≤ (1/n) ∑_{i=1}^n g(x_i, y_i) ≤ (1 + ε) E[g(X, Y)]. Taking g(x, y) = −log p(x, y) gives

    (1 − ε) H(X, Y) ≤ −(1/n) ∑_{i=1}^n log p(x_i, y_i) ≤ (1 + ε) H(X, Y),

i.e., 2^{−n(H(X,Y)+εH(X,Y))} ≤ p(x^n, y^n) ≤ 2^{−n(H(X,Y)−εH(X,Y))}.

(b) From the lower bound on p(x^n, y^n),

    1 = ∑_{(x^n,y^n)∈X^n×Y^n} p(x^n, y^n)
      ≥ ∑_{(x^n,y^n)∈T_ε^(n)(X,Y)} p(x^n, y^n)
      ≥ |T_ε^(n)(X, Y)| 2^{−n(H(X,Y)+εH(X,Y))}.

Thus |T_ε^(n)(X, Y)| ≤ 2^{n(H(X,Y)+δ(ε))}, where δ(ε) = ε H(X, Y).

For the other direction, by the law of large numbers, for n sufficiently large,

    1 − ε ≤ P{(X^n, Y^n) ∈ T_ε^(n)(X, Y)} ≤ |T_ε^(n)(X, Y)| 2^{−n(H(X,Y)−εH(X,Y))}.

Thus |T_ε^(n)(X, Y)| ≥ (1 − ε) 2^{n(H(X,Y)−εH(X,Y))} = 2^{n(H(X,Y)−δ(ε))}, where δ(ε) = ε H(X, Y) − (1/n) log(1 − ε). Therefore, |T_ε^(n)(X, Y)| ≐ 2^{nH(X,Y)}.
(c) Since

    1 = ∑_{y^n∈Y^n} p(y^n | x^n)
      ≥ ∑_{y^n∈T_ε^(n)(Y|x^n)} p(y^n | x^n)
      ≥ |T_ε^(n)(Y | x^n)| 2^{−n(H(Y|X)+δ(ε))},

where the last step uses the lower bound on p(y^n | x^n) established in property (b) of the first solution, we have |T_ε^(n)(Y | x^n)| ≤ 2^{n(H(Y|X)+δ(ε))} with δ(ε) = ε H(Y | X).

(d) Since Y = g(X), we have p(x, y) = p(x) if y = g(x) and p(x, y) = 0 otherwise.
If y_i = g(x_i) for all i ∈ [1 : n], then for y = g(x) we have |π(x, y | x^n, y^n) − p(x, y)| = |π(x | x^n) − p(x)| ≤ ε p(x) = ε p(x, y) (using x^n ∈ T_ε^(n)(X)), while for y ≠ g(x) both π(x, y | x^n, y^n) and p(x, y) are zero. Hence y^n ∈ T_ε^(n)(Y | x^n).
Conversely, if y^n ∈ T_ε^(n)(Y | x^n) and y_i ≠ g(x_i) for some i ∈ [1 : n], then |π(x_i, y_i | x^n, y^n) − p(x_i, y_i)| ≤ ε p(x_i, y_i) = 0 forces π(x_i, y_i | x^n, y^n) = 0, contradicting the fact that the pair (x_i, y_i) occurs in (x^n, y^n). Hence y_i = g(x_i) for all i ∈ [1 : n].
Therefore, y^n ∈ T_ε^(n)(Y | x^n) if and only if y_i = g(x_i) for all i ∈ [1 : n].



(e) Since x^n ∈ T_{ε'}^(n)(X), we have |π(x | x^n) − p(x)| ≤ ε' p(x) for all x ∈ X. Let Y^n ~ ∏_{i=1}^n p_{Y|X}(y_i | x_i). For each x with π(x | x^n) > 0, consider the positions i for which x_i = x; by the law of large numbers, the empirical distribution of the y_i in these positions converges to p(y | x). Hence, for any ε'' > 0 and n large enough, with probability approaching one,

    | π(x, y | x^n, Y^n)/π(x | x^n) − p(y | x) | ≤ ε'' p(y | x) for all (x, y),

and therefore

    (1 − ε')(1 − ε'') p(x) p(y | x) ≤ π(x, y | x^n, Y^n) ≤ (1 + ε')(1 + ε'') p(x) p(y | x),

that is, (1 − ε) p(x, y) ≤ π(x, y | x^n, Y^n) ≤ (1 + ε) p(x, y), provided ε'' is chosen small enough that ε' + ε'' + ε'ε'' ≤ ε, which is possible since ε' < ε. Consequently,

    lim_{n→∞} P{(x^n, Y^n) ∈ T_ε^(n)(X, Y)} = 1.

(f) If x^n ∈ T_{ε'}^(n)(X) with ε' < ε, then by the conditional typicality lemma, for n sufficiently large,

    1 − ε ≤ P{(x^n, Y^n) ∈ T_ε^(n)(X, Y)}
          = P{Y^n ∈ T_ε^(n)(Y | x^n)}
          ≤ |T_ε^(n)(Y | x^n)| 2^{−n(H(Y|X)−δ(ε))}.

Hence |T_ε^(n)(Y | x^n)| ≥ (1 − ε) 2^{n(H(Y|X)−δ(ε))}.

.. Inequalities. Label each of the following statements with =, ≤, or ≥. Justify each answer.
(a) H(X | Z) vs. H(X | Y) + H(Y | Z).
(b) h(X + Y) vs. h(X), if X and Y are independent continuous random variables.
(c) h(X + aY) vs. h(X + Y), if Y ~ N(0, 1) is independent of X and a ≥ 1.
(d) I(X^2; Y^2) vs. I(X_1; Y_1) + I(X_2; Y_2), if p(y^2 | x^2) = p(y_1 | x_1) p(y_2 | x_2).
(e) I(X^2; Y^2) vs. I(X_1; Y_1) + I(X_2; Y_2), if p(x^2) = p(x_1) p(x_2).
(f) I(aX + Y; bX) vs. I(X + Y/a; X), if Y ~ N(0, 1) is independent of X and a, b ≠ 0.

Solution:
(a) ≤. Consider

    H(X | Z) ≤ H(X, Y | Z) = H(Y | Z) + H(X | Y, Z) ≤ H(Y | Z) + H(X | Y).

(b) ≥. Consider

    h(X + Y) ≥ h(X + Y | Y) = h(X | Y) = h(X).

(c) ≥. Let aY = Y_1 + Y_2, where Y_1 ~ N(0, 1) and Y_2 ~ N(0, a^2 − 1) are independent of each other and of X. Then, from part (b),

    h(X + aY) = h(X + Y_1 + Y_2) ≥ h(X + Y_1) = h(X + Y).
(d) ≤. Since p(y_1, y_2 | x_1, x_2) = p(y_1 | x_1) p(y_2 | x_2),

    I(X_1, X_2; Y_1, Y_2) = H(Y_1, Y_2) − H(Y_1, Y_2 | X_1, X_2)
        = H(Y_1, Y_2) − H(Y_1 | X_1, X_2) − H(Y_2 | X_1, X_2, Y_1)
        = H(Y_1, Y_2) − H(Y_1 | X_1) − H(Y_2 | X_2)
        = I(X_1; Y_1) + I(X_2; Y_2) − I(Y_1; Y_2)
        ≤ I(X_1; Y_1) + I(X_2; Y_2).
(e) ≥. Since X_1 and X_2 are independent,

    I(X_1, X_2; Y_1, Y_2) = H(X_1, X_2) − H(X_1, X_2 | Y_1, Y_2)
        = H(X_1) + H(X_2) − H(X_1 | Y_1, Y_2) − H(X_2 | X_1, Y_1, Y_2)
        = I(X_1; Y_1, Y_2) + I(X_2; X_1, Y_1, Y_2)
        ≥ I(X_1; Y_1) + I(X_2; Y_2).
(f) =. Since a, b ≠ 0,

    I(aX + Y; bX) = h(aX + Y) − h(aX + Y | bX)
        = h(aX + Y) − h(Y)
        = h(aX + Y) − log|a| − (h(Y) − log|a|)
        = h(X + Y/a) − h(Y/a)
        = h(X + Y/a) − h(X + Y/a | X)
        = I(X + Y/a; X).
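Part (d) can also be illustrated numerically. The sketch below is not part of the original solution: it draws a random (possibly correlated) input pmf p(x_1, x_2) and a random product channel p(y_1 | x_1) p(y_2 | x_2), and checks that I(X^2; Y^2) ≤ I(X_1; Y_1) + I(X_2; Y_2). The alphabet sizes and the seed are arbitrary.

    import numpy as np

    rng = np.random.default_rng(3)

    def H(q):
        q = np.asarray(q).ravel()
        q = q[q > 0]
        return float(-(q * np.log2(q)).sum())

    # random input pmf p(x1, x2) and random product channel p(y1|x1) p(y2|x2)
    px = rng.random((2, 2)); px /= px.sum()
    W1 = rng.random((2, 2)); W1 /= W1.sum(axis=1, keepdims=True)
    W2 = rng.random((2, 2)); W2 /= W2.sum(axis=1, keepdims=True)

    # joint pmf p(x1, x2, y1, y2)
    p = px[:, :, None, None] * W1[:, None, :, None] * W2[None, :, None, :]

    def MI(p_ab):
        """I(A; B) from a joint pmf whose first half of the axes is A and second half is B."""
        pa = p_ab.sum(axis=tuple(range(p_ab.ndim // 2, p_ab.ndim)))
        pb = p_ab.sum(axis=tuple(range(p_ab.ndim // 2)))
        return H(pa) + H(pb) - H(p_ab)

    I_joint = MI(p)                    # I(X1, X2; Y1, Y2)
    I1 = MI(p.sum(axis=(1, 3)))        # I(X1; Y1)
    I2 = MI(p.sum(axis=(0, 2)))        # I(X2; Y2)
    assert I_joint <= I1 + I2 + 1e-12
    print(I_joint, I1 + I2)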
.. Mrs. Gerber's Lemma. Let H^{-1}: [0, 1] → [0, 1/2] be the inverse of the binary entropy function, and let a ∗ p = a(1 − p) + (1 − a)p denote binary convolution. Here X is a binary random variable, U is an arbitrary random variable, Z ~ Bern(p) is independent of (X, U), and Y = X ⊕ Z; similarly Y^n = X^n ⊕ Z^n, where Z^n is i.i.d. Bern(p) independent of (X^n, U).
(a) Show that H(H^{-1}(u) ∗ p) is convex in u for every p ∈ [0, 1].
(b) Use part (a) to prove the scalar MGL

    H^{-1}(H(Y | U)) ≥ H^{-1}(H(X | U)) ∗ p.

(c) Use part (b) and induction to prove the vector MGL

    H^{-1}(H(Y^n | U)/n) ≥ H^{-1}(H(X^n | U)/n) ∗ p.
Solution:
(b) We have the following chain of inequalities:

    H(Y | U) = H(X ⊕ Z | U)
             = E_U[ H(X ⊕ Z | U = u) ]                      (i)
             = E_U[ H(H^{-1}(H(X | U = u)) ∗ p) ]            (ii)
             ≥ H(H^{-1}(E_U[H(X | U = u)]) ∗ p)              (iii)
             = H(H^{-1}(H(X | U)) ∗ p),

where (i) follows from the definition of conditional entropy, (ii) follows from the fact that Z ~ Bern(p) is independent of (X, U), and (iii) is obtained from the convexity of H(H^{-1}(u) ∗ p) in u (part (a)) via Jensen's inequality. Since H^{-1}: [0, 1] → [0, 1/2] is an increasing function, applying H^{-1} to both sides of the above inequality yields

    H^{-1}(H(Y | U)) ≥ H^{-1}(H(X | U)) ∗ p.

(c) We use induction. The base case n = 1 follows from part (b). Assume the inequality holds for n − 1. Then we have the following chain of inequalities:

    H(Y^n | U)/n = ((n−1)/n) · H(Y^{n−1} | U)/(n−1) + (1/n) · H(Y_n | Y^{n−1}, U)
        ≥ ((n−1)/n) H( H^{-1}(H(X^{n−1} | U)/(n−1)) ∗ p ) + (1/n) H(Y_n | Y^{n−1}, U)                  (i)
        ≥ ((n−1)/n) H( H^{-1}(H(X^{n−1} | U)/(n−1)) ∗ p ) + (1/n) H(X_n ⊕ Z_n | Y^{n−1}, X^{n−1}, U)    (ii)
        = ((n−1)/n) H( H^{-1}(H(X^{n−1} | U)/(n−1)) ∗ p ) + (1/n) H(X_n ⊕ Z_n | X^{n−1}, U)             (iii)
        ≥ ((n−1)/n) H( H^{-1}(H(X^{n−1} | U)/(n−1)) ∗ p ) + (1/n) H( H^{-1}(H(X_n | X^{n−1}, U)) ∗ p )  (iv)
        ≥ H( H^{-1}( H(X^{n−1} | U)/n + H(X_n | X^{n−1}, U)/n ) ∗ p )                                   (v)
        = H( H^{-1}(H(X^n | U)/n) ∗ p ).

Here (i) follows from the induction hypothesis; (ii) follows from the fact that conditioning reduces entropy; (iii) follows from the fact that X_n ⊕ Z_n is independent of Y^{n−1} conditioned on (X^{n−1}, U); (iv) is obtained from part (b) applied with conditioning variable (X^{n−1}, U); and finally (v) follows from the convexity of H(H^{-1}(u) ∗ p) in u. By applying the increasing function H^{-1} to both sides of the above inequality, we have

    H^{-1}(H(Y^n | U)/n) ≥ H^{-1}(H(X^n | U)/n) ∗ p.
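The convexity claim in part (a) can be checked numerically. The following sketch is not part of the original solution: it inverts the binary entropy function by bisection and verifies that the second differences of u ↦ H(H^{-1}(u) ∗ p) are nonnegative on a grid. The grid, the value of p, and the tolerance are arbitrary choices.

    import numpy as np

    def h2(a):
        """Binary entropy in bits."""
        a = np.clip(a, 1e-15, 1 - 1e-15)
        return -a * np.log2(a) - (1 - a) * np.log2(1 - a)

    def h2_inv(u):
        """Inverse of h2 on [0, 1/2], computed by bisection."""
        lo, hi = 0.0, 0.5
        for _ in range(60):
            mid = (lo + hi) / 2
            if h2(mid) < u:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    def f(u, p):
        a = h2_inv(u)
        return h2(a * (1 - p) + (1 - a) * p)       # H(H^{-1}(u) * p)

    p = 0.2
    us = np.linspace(0.01, 0.99, 199)
    vals = np.array([f(u, p) for u in us])
    second_diff = vals[:-2] - 2 * vals[1:-1] + vals[2:]
    assert (second_diff >= -1e-9).all()            # convexity: nonnegative second differences
    print("min second difference:", second_diff.min())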

.. Differential entropy and MSE. Let (X, Y) = (X^n, Y^k) ~ f(x^n, y^k) be a pair of random sequences with covariance matrices K_X and K_Y, respectively, and cross-covariance matrix K_XY = E[(X − E(X))(Y − E(Y))^T]. Let X̂ be an estimate of X given Y and let K be the covariance matrix of the error (X − X̂).
(a) Using the fact that for any U ~ f(u^n),

    h(U) ≤ (1/2) log((2πe)^n |K_U|),

show that

    h(X | Y) ≤ (1/2) log((2πe)^n |K|).

(b) In particular, show that

    h(X | Y) ≤ (1/2) log((2πe)^n |K_X − K_XY K_Y^{-1} K_YX|).

Solution:
(a) Since X̂ is a function of Y,

    h(X | Y) = h(X − X̂ | Y) ≤ h(X − X̂) ≤ (1/2) log((2πe)^n |K|).

(b) Let X̂ be the minimum mean square error linear estimate of X given Y, i.e., X̂ = K_XY K_Y^{-1} Y (assuming zero means for both X and Y). Then

    h(X | Y) ≤ (1/2) log((2πe)^n |K|) = (1/2) log((2πe)^n |K_X − K_XY K_Y^{-1} K_YX|).
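As a numerical aside (not part of the original solution), the identity behind part (b), namely that the error covariance of the linear estimator K_XY K_Y^{-1} Y equals the Schur complement K_X − K_XY K_Y^{-1} K_YX, can be verified directly from a randomly generated covariance matrix:

    import numpy as np

    rng = np.random.default_rng(4)
    n, k = 3, 2
    A = rng.standard_normal((n + k, n + k))
    K = A @ A.T + 1e-6 * np.eye(n + k)            # random positive definite covariance of (X, Y)

    KX, KXY = K[:n, :n], K[:n, n:]
    KYX, KY = K[n:, :n], K[n:, n:]

    B = KXY @ np.linalg.inv(KY)                   # linear MMSE estimator: X_hat = B Y
    # covariance of the error X - B Y, expanded in terms of the blocks of K
    K_err = KX - B @ KYX - KXY @ B.T + B @ KY @ B.T
    schur = KX - KXY @ np.linalg.inv(KY) @ KYX
    assert np.allclose(K_err, schur)
    print(np.round(schur, 3))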

.. Maximum differential entropy. Let X ~ f(x) be a zero-mean random variable and X* ~ f(x*) be a zero-mean Gaussian random variable with the same variance as X.
(a) Show that

    ∫ f_X(x) log f_{X*}(x) dx = ∫ f_{X*}(x) log f_{X*}(x) dx = −h(X*).

(b) Using part (a) and the nonnegativity of the relative entropy, conclude that

    h(X) = −D(f_X || f_{X*}) − ∫ f_X(x) log f_{X*}(x) dx ≤ h(X*)

with equality iff X is Gaussian.
(c) Following similar steps, show that if X ~ f(x) is a zero-mean random vector and X* ~ f(x*) is a zero-mean Gaussian random vector with the same covariance matrix, then

    h(X) ≤ h(X*)

with equality iff X is Gaussian.
Solution:
(a) Let X* ~ N(0, σ²). Since f_{X*}(x) = (1/√(2πσ²)) exp(−x²/(2σ²)), we have

    ∫ f_X(x) log f_{X*}(x) dx = −∫ f_X(x) ( (1/2) log(2πσ²) + (x²/(2σ²)) log e ) dx
        = −∫ f_{X*}(x) ( (1/2) log(2πσ²) + (x²/(2σ²)) log e ) dx          (a)
        = ∫ f_{X*}(x) log f_{X*}(x) dx = −h(X*),

where equality (a) follows since E[(X*)²] = E[X²].


(b) By the nonnegativity of relative entropy,

    h(X) = −∫ f_X(x) log f_X(x) dx
         = −∫ f_X(x) ( log (f_X(x)/f_{X*}(x)) + log f_{X*}(x) ) dx
         = −D(f_X || f_{X*}) − ∫ f_X(x) log f_{X*}(x) dx
         = −D(f_X || f_{X*}) + h(X*)
         ≤ h(X*),

with equality iff D(f_X || f_{X*}) = 0, i.e., iff X is Gaussian.

(c) Let X* ~ N(0, K). Since f_{X*}(x) = (2π)^{−n/2} |K|^{−1/2} exp(−(1/2) x^T K^{-1} x), we have

    ∫ f_X(x) log f_{X*}(x) dx = −∫ f_X(x) ( (1/2) log((2π)^n |K|) + (1/2) x^T K^{-1} x log e ) dx
        = −∫ f_{X*}(x) ( (1/2) log((2π)^n |K|) + (1/2) x^T K^{-1} x log e ) dx
        = ∫ f_{X*}(x) log f_{X*}(x) dx,

where the second equality follows since X and X* have the same covariance matrix K, so E[X^T K^{-1} X] = E[(X*)^T K^{-1} X*]. Hence, following similar steps as in part (b),

    h(X) = −D(f_X || f_{X*}) + h(X*) ≤ h(X*),

with equality iff X is Gaussian.
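A small numerical illustration of part (b), not part of the original solution: among zero-mean, unit-variance densities, the Gaussian has the largest differential entropy. The sketch compares the closed-form entropies (in bits) of the unit-variance Gaussian, Laplace, and uniform densities.

    import numpy as np

    # differential entropies (in bits) of zero-mean densities with variance 1
    h_gauss = 0.5 * np.log2(2 * np.pi * np.e)      # N(0, 1)
    a = np.sqrt(3.0)                               # Unif[-a, a] has variance a^2/3 = 1
    h_unif = np.log2(2 * a)
    b = 1.0 / np.sqrt(2.0)                         # Laplace with scale b has variance 2b^2 = 1
    h_laplace = np.log2(2 * b * np.e)

    assert h_unif < h_gauss and h_laplace < h_gauss
    print(h_gauss, h_laplace, h_unif)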

.. Maximum conditional differential entropy. Let (X, Y) = (X^n, Y^k) ~ f(x^n, y^k) be a pair of random sequences with covariance matrices K_X = E[(X − E(X))(X − E(X))^T] and K_Y = E[(Y − E(Y))(Y − E(Y))^T], respectively, and cross-covariance matrix K_XY = E[(X − E(X))(Y − E(Y))^T]. Let K_{X|Y} = E[(X − E(X|Y))(X − E(X|Y))^T] be the covariance matrix of the error vector of the minimum mean square error (MMSE) estimate of X given Y.
(a) Show that

    h(X | Y) ≤ (1/2) log((2πe)^n |K_{X|Y}|)

with equality if (X, Y) are jointly Gaussian.
(b) Show that

    h(X | Y) ≤ (1/2) log((2πe)^n |K_X − K_XY K_Y^{-1} K_YX|)

with equality if (X, Y) are jointly Gaussian.

Solution:
(a) Consider

    h(X | Y) = h(X − E(X | Y) | Y) ≤ h(X − E(X | Y)) ≤ (1/2) log((2πe)^n |K_{X|Y}|),

with equality if (X, Y) are jointly Gaussian, where the last inequality follows from Problem .. .
(b) Let X̂ = AY with A = K_XY K_Y^{-1} be the linear MMSE estimate of X given Y. Then

    h(X | Y) = h(X − AY | Y)
             ≤ h(X − AY)
             ≤ (1/2) log((2πe)^n |E[(X − AY)(X − AY)^T]|)
             = (1/2) log((2πe)^n |K_X − K_XY K_Y^{-1} K_YX|),

with equality if (X, Y) are jointly Gaussian.


.. Hadamard inequality. Let Y^n ~ N(0, K). Use the fact that

    h(Y^n) ≤ (1/2) log( (2πe)^n ∏_{i=1}^n K_ii )

to prove Hadamard's inequality

    det(K) ≤ ∏_{i=1}^n K_ii.

Solution: Hadamard's inequality follows immediately since

    (1/2) log((2πe)^n |K|) = h(Y^n) ≤ (1/2) log( (2πe)^n ∏_{i=1}^n K_ii ).
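A quick numerical check of Hadamard's inequality (not part of the original solution), using a randomly generated positive semidefinite matrix:

    import numpy as np

    rng = np.random.default_rng(5)
    A = rng.standard_normal((5, 5))
    K = A @ A.T                                  # random positive semidefinite matrix
    assert np.linalg.det(K) <= np.prod(np.diag(K)) + 1e-9
    print(np.linalg.det(K), np.prod(np.diag(K)))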

.. Conditional entropy power inequality. Let X ~ f(x) and Z ~ f(z) be independent random variables and Y = X + Z. Then by the EPI,

    2^{2h(Y)} ≥ 2^{2h(X)} + 2^{2h(Z)}

with equality iff both X and Z are Gaussian.
(a) Show that log(2^x + 2^y) is convex in (x, y).
(b) Let X^n and Z^n be conditionally independent given an arbitrary random variable U, with conditional densities f(x^n | u) and f(z^n | u), respectively, and let Y^n = X^n + Z^n. Use part (a), the scalar EPI, and induction to prove the conditional EPI

    2^{2h(Y^n | U)/n} ≥ 2^{2h(X^n | U)/n} + 2^{2h(Z^n | U)/n}.

Solution:
(a) The Hessian matrix of φ(u, v) = log(2^u + 2^v) (natural logarithm; the base of the logarithm only scales φ by a positive constant and does not affect convexity) is

    ∇²φ(u, v) = [ ∂²φ/∂u²    ∂²φ/∂u∂v ]
                [ ∂²φ/∂v∂u   ∂²φ/∂v²  ]
              = ( (ln 2)² 2^{u+v} / (2^u + 2^v)² ) [  1  −1 ]
                                                   [ −1   1 ].

Since ∇²φ(u, v) is positive semidefinite, φ(u, v) is convex.


(b) For n = 1, the result is immediate from the scalar EPI. We use induction to prove the vector case. Assuming the conditional EPI holds for n − 1, we show that it also holds for n using part (a) and the scalar EPI. Consider

    2h(Y^n | U)/n = ((n−1)/n) · 2h(Y^{n−1} | U)/(n−1) + (1/n) · 2h(Y_n | Y^{n−1}, U)
        ≥ ((n−1)/n) log( 2^{2h(X^{n−1}|U)/(n−1)} + 2^{2h(Z^{n−1}|U)/(n−1)} ) + (1/n) · 2h(Y_n | Y^{n−1}, U),

where the inequality follows from the induction hypothesis. If we can show that

    2h(Y_n | Y^{n−1}, U) ≥ log( 2^{2h(X_n | X^{n−1}, U)} + 2^{2h(Z_n | Z^{n−1}, U)} ),           (∗)

then combining the two inequalities gives

    2h(Y^n | U)/n ≥ ((n−1)/n) log( 2^{2h(X^{n−1}|U)/(n−1)} + 2^{2h(Z^{n−1}|U)/(n−1)} )
                      + (1/n) log( 2^{2h(X_n|X^{n−1},U)} + 2^{2h(Z_n|Z^{n−1},U)} )
        ≥ log( 2^{2h(X^n|U)/n} + 2^{2h(Z^n|U)/n} ),

where the last inequality follows from the convexity of φ(u, v) = log(2^u + 2^v) together with the chain rule. It remains to show inequality (∗):

    2h(Y_n | Y^{n−1}, U) ≥ 2h(Y_n | X^{n−1}, Z^{n−1}, U)                                        (∗1)
        = E[ 2h(Y_n | X^{n−1} = x^{n−1}, Z^{n−1} = z^{n−1}, U = u) ]
        ≥ E[ log( 2^{2h(X_n | X^{n−1}=x^{n−1}, Z^{n−1}=z^{n−1}, U=u)}
                  + 2^{2h(Z_n | X^{n−1}=x^{n−1}, Z^{n−1}=z^{n−1}, U=u)} ) ]                      (∗2)
        = E[ log( 2^{2h(X_n | X^{n−1}=x^{n−1}, U=u)} + 2^{2h(Z_n | Z^{n−1}=z^{n−1}, U=u)} ) ]
        ≥ log( 2^{2h(X_n | X^{n−1}, U)} + 2^{2h(Z_n | Z^{n−1}, U)} ),                            (∗3)

where inequality (∗1) holds since Y^{n−1} is a function of (X^{n−1}, Z^{n−1}) and conditioning reduces entropy (data processing), inequality (∗2) follows from the scalar EPI applied conditionally (X_n and Z_n are conditionally independent given (X^{n−1}, Z^{n−1}, U)), and inequality (∗3) follows from the convexity of φ(u, v).

Now assume that for n − 1, equality holds in all of the above inequalities iff X^{n−1} and Z^{n−1} are Gaussian with K_{X^{n−1}} = a K_{Z^{n−1}} for some a > 0. We show that the same is true for n. First, we find the equality condition for each of the above inequalities. Equality holds

- in the final convexity step iff (h(X^{n−1}) − h(Z^{n−1}))/(n−1) = h(X_n | X^{n−1}) − h(Z_n | Z^{n−1});
- in (∗1) iff the distribution of Y_n depends on (X^{n−1}, Z^{n−1}) only through the sum X^{n−1} + Z^{n−1}, which implies that

    E(Y_n | X^{n−1} + Z^{n−1}) = E(X_n + Z_n | X^{n−1} + Z^{n−1}) = E(X_n | X^{n−1}) + E(Z_n | Z^{n−1})

  has to be a function of X^{n−1} + Z^{n−1}; this happens only if E(X_n | X^{n−1}) and E(Z_n | Z^{n−1}) are the same linear function up to an additive constant;
- in (∗2) iff the conditional distributions F(x_n | x^{n−1}) and F(z_n | z^{n−1}) are Gaussian;
- in (∗3) iff Var(X_n | X^{n−1} = x^{n−1}) and Var(Z_n | Z^{n−1} = z^{n−1}) do not depend on x^{n−1} and z^{n−1}, respectively.

Note that if X^n and Z^n are Gaussian with K_{X^n} = a K_{Z^n}, then all of these conditions are satisfied. To prove the necessity, observe from the equality conditions for (∗1), (∗2), and (∗3) that X^n and Z^n must be Gaussian. To show that K_{X^n} = a K_{Z^n}, write

    K_{X^n} = [ A    B ]        K_{Z^n} = [ A'    B' ]
              [ B^T  C ],                 [ B'^T  C' ],

where A and A' are (n−1)×(n−1), B and B' are (n−1)×1, and C and C' are scalars. By the induction hypothesis, A = K_{X^{n−1}} = a K_{Z^{n−1}} = a A' for some a. Since X^n and Z^n are Gaussian, we have

    E(X_n | X^{n−1}) = B^T A^{-1} X^{n−1},
    E(Z_n | Z^{n−1}) = B'^T (A')^{-1} Z^{n−1},

and from the equality constraint for (∗1) and the fact that A = a A', we conclude that B = a B'. Lastly, from the equality condition for the final convexity step we conclude that C = a C', and hence K_{X^n} = a K_{Z^n}.
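As a numerical aside (not part of the original solution), the convexity claimed in part (a) can be checked by comparing a finite-difference Hessian of φ(u, v) = ln(2^u + 2^v) with the closed form given above and confirming that it is positive semidefinite. The sampled points, step size, and tolerances are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(6)

    def f(u, v):
        return np.log(2.0 ** u + 2.0 ** v)         # natural log, as in the closed form above

    for _ in range(200):
        u, v = rng.uniform(-5, 5, size=2)
        d = 1e-4
        fuu = (f(u + d, v) - 2 * f(u, v) + f(u - d, v)) / d ** 2
        fvv = (f(u, v + d) - 2 * f(u, v) + f(u, v - d)) / d ** 2
        fuv = (f(u + d, v + d) - f(u + d, v - d) - f(u - d, v + d) + f(u - d, v - d)) / (4 * d ** 2)
        num_hess = np.array([[fuu, fuv], [fuv, fvv]])

        s = 2.0 ** u + 2.0 ** v
        closed = (np.log(2.0) ** 2) * 2.0 ** (u + v) / s ** 2 * np.array([[1.0, -1.0], [-1.0, 1.0]])

        assert np.allclose(num_hess, closed, atol=1e-4)
        assert np.linalg.eigvalsh(num_hess).min() >= -1e-5    # positive semidefinite
    print("finite-difference Hessian matches the closed form and is PSD")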

.. Entropy rate of a stationary source. Let X = {X_i} be a discrete stationary random process.
(a) Show that

    H(X^n)/n ≤ H(X^{n−1})/(n−1)  for n = 2, 3, . . . .

(b) Conclude that the entropy rate

    H(X) = lim_{n→∞} H(X^n)/n

is well defined.
(c) Show that for a continuous stationary ergodic process Y = {Y_i},

    h(Y^n)/n ≤ h(Y^{n−1})/(n−1)  for n = 2, 3, . . . .

Solution:
(a) We first show that H(X_n | X^{n−1}) is nonincreasing in n. By stationarity, we have

    H(X_{n+1} | X^n) ≤ H(X_{n+1} | X_2^n) = H(X_n | X^{n−1}).

Now we have

    H(X^n)/n = ( H(X^{n−1}) + H(X_n | X^{n−1}) )/n
        = H(X^{n−1})/(n−1) − (1/n) ( H(X^{n−1})/(n−1) − H(X_n | X^{n−1}) )
        = H(X^{n−1})/(n−1) − (1/n) ( (1/(n−1)) ∑_{i=1}^{n−1} H(X_i | X^{i−1}) − H(X_n | X^{n−1}) )
        ≤ H(X^{n−1})/(n−1) − (1/n) ( (1/(n−1)) ∑_{i=1}^{n−1} H(X_n | X^{n−1}) − H(X_n | X^{n−1}) )
        = H(X^{n−1})/(n−1),

where the inequality follows since H(X_i | X^{i−1}) ≥ H(X_n | X^{n−1}) for i ≤ n.

(b) Since H(X^n)/n is nonincreasing in n and bounded below by zero, the limit H(X) = lim_{n→∞} H(X^n)/n exists; moreover, H(X) ≤ H(X^n)/n for all n.
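A small numerical illustration of part (a), not part of the original solution: for a stationary binary symmetric Markov chain with flip probability q, H(X^n) = 1 + (n − 1)H_b(q) bits, and the sketch checks that H(X^n)/n is nonincreasing in n. The chain and the value of q are arbitrary choices.

    import numpy as np

    def hb(q):
        """Binary entropy in bits."""
        return -q * np.log2(q) - (1 - q) * np.log2(1 - q)

    q = 0.1      # flip probability; the stationary distribution is uniform, so H(X_1) = 1 bit
    # H(X^n) = H(X_1) + sum_{i=2}^{n} H(X_i | X_{i-1}) = 1 + (n - 1) * hb(q) by Markovity and stationarity
    rates = [(1 + (n - 1) * hb(q)) / n for n in range(1, 21)]
    assert all(rates[i + 1] <= rates[i] + 1e-12 for i in range(len(rates) - 1))
    print(np.round(rates, 4))      # nonincreasing, converging to the entropy rate hb(q)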
.. Worst noise for estimation. Let X ~ N(0, P) and Z be independent of X with zero mean and variance N. Show that the minimum MSE of estimating X given X + Z is upper bounded as

    E[(X − E(X | X + Z))²] ≤ PN/(P + N)

with equality if Z is Gaussian. Thus, Gaussian noise is the worst noise when the input to the channel is Gaussian.

Solution: Since the (nonlinear) MMSE is upper bounded by the MSE of the best linear estimator,

    E[(X − E(X | X + Z))²] ≤ E[(X − (P/(P + N))(X + Z))²]
        = E[( (N/(P + N)) X − (P/(P + N)) Z )²]
        = (N/(P + N))² P + (P/(P + N))² N
        = PN/(P + N),

where the cross term vanishes since X and Z are independent with zero means. Equality holds if Z is Gaussian.
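The following Monte Carlo sketch (not part of the original solution) illustrates the bound for one specific non-Gaussian noise: Z takes the values ±√N with equal probability, in which case the conditional mean has the closed form E(X | X + Z = y) = y − √N tanh(√N y / P). The simulated MSE falls below PN/(P + N), which Gaussian noise would attain. The values of P, N, and the sample size are arbitrary.

    import numpy as np

    rng = np.random.default_rng(7)
    P, N, m = 1.0, 0.5, 2_000_000
    s = np.sqrt(N)

    X = rng.normal(0.0, np.sqrt(P), m)
    Z = s * rng.choice([-1.0, 1.0], m)                 # binary noise with mean 0 and variance N
    Y = X + Z

    # E(X | Y = y) = y - s * tanh(s * y / P) for Gaussian X and binary +/- s noise
    mmse_est = np.mean((X - (Y - s * np.tanh(s * Y / P))) ** 2)
    bound = P * N / (P + N)
    print(mmse_est, bound)                             # the estimate lies below the bound
    assert mmse_est <= bound + 1e-3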
.. Worst noise for information. Let X and Z be independent, zero-mean random variables with variances P and Q, respectively.
(a) Show that

    h(X | X + Z) ≤ (1/2) log( 2πe PQ/(P + Q) )

with equality iff both X and Z are Gaussian. (Hint: use the maximum differential entropy lemma or the EPI or Problem .. .)
(b) Let X* and Z* be independent zero-mean Gaussian random variables with variances P and Q, respectively. Use part (a) to show that

    I(X*; X* + Z*) ≤ I(X*; X* + Z)

with equality iff Z is Gaussian.


Solution:
(a) By Problem .., we have

    h(X | X + Z) ≤ (1/2) log( 2πe E[(X − E(X | X + Z))²] ) ≤ (1/2) log( 2πe PQ/(P + Q) ),

where the second inequality holds since the MMSE is upper bounded by the linear MMSE PQ/(P + Q), with equality in both places if both X and Z are Gaussian.
(b) We have

    I(X*; X* + Z) = h(X*) − h(X* | X* + Z)
        ≥ (1/2) log(2πeP) − (1/2) log( 2πe PQ/(P + Q) )
        = h(X*) − h(X* | X* + Z*)
        = I(X*; X* + Z*),

where the inequality follows from part (a) applied to X* and Z, with equality iff Z is Gaussian.

.. Variations on the joint typicality lemma. Let (X, Y, Z) ~ p(x, y, z) and 0 < ε' < ε. Prove the following statements.
(a) Let (X̃^n, Ỹ^n) ~ ∏_{i=1}^n p_{X,Y}(x̃_i, ỹ_i) and Z̃^n ~ ∏_{i=1}^n p_{Z|X}(z̃_i | x̃_i), conditionally independent of Ỹ^n given X̃^n. Then

    P{(X̃^n, Ỹ^n, Z̃^n) ∈ T_ε^(n)(X, Y, Z)} ≤ 2^{−n(I(Y;Z|X)−δ(ε))}.

(b) Let (x^n, y^n) ∈ T_ε^(n)(X, Y) and Z^n ~ Unif(T_ε^(n)(Z | x^n)). Then

    P{(x^n, y^n, Z^n) ∈ T_ε^(n)(X, Y, Z)} ≤ 2^{−n(I(Y;Z|X)−δ(ε))}.

(c) Let x^n ∈ T_{ε'}^(n)(X), let y^n be an arbitrary sequence, and let Z^n ~ p(z^n | x^n), where

    p(z^n | x^n) = ∏_{i=1}^n p_{Z|X}(z_i | x_i) / P{Z̃^n ∈ T_{ε'}^(n)(Z | x^n)}   if z^n ∈ T_{ε'}^(n)(Z | x^n),
                 = 0                                                              otherwise,

with Z̃^n ~ ∏_{i=1}^n p_{Z|X}(z̃_i | x_i). Then

    P{(x^n, y^n, Z^n) ∈ T_ε^(n)(X, Y, Z)} ≤ 2^{−n(I(Y;Z|X)−δ(ε))}.

Solution:
(a) Consider

    P{(X̃^n, Ỹ^n, Z̃^n) ∈ T_ε^(n)(X, Y, Z)} = ∑_{(x^n,y^n,z^n)∈T_ε^(n)(X,Y,Z)} p(x^n, y^n) p(z^n | x^n)
        ≤ |T_ε^(n)(X, Y, Z)| 2^{−n(H(X,Y)−δ(ε))} 2^{−n(H(Z|X)−δ(ε))}
        ≤ 2^{n(H(X,Y,Z)+δ(ε))} 2^{−n(H(X,Y)−δ(ε))} 2^{−n(H(Z|X)−δ(ε))}
        = 2^{−n(I(Y;Z|X)−3δ(ε))},

since H(X, Y, Z) − H(X, Y) − H(Z | X) = H(Z | X, Y) − H(Z | X) = −I(Y; Z | X).

(b) Since Z^n ~ Unif(T_ε^(n)(Z | x^n)),

    P{(x^n, y^n, Z^n) ∈ T_ε^(n)(X, Y, Z)} = ∑_{z^n∈T_ε^(n)(Z|x^n,y^n)} p(z^n | x^n)
        = ∑_{z^n∈T_ε^(n)(Z|x^n,y^n)} 1/|T_ε^(n)(Z | x^n)|
        ≤ 2^{n(H(Z|X,Y)+δ(ε))} 2^{−n(H(Z|X)−δ(ε))}
        = 2^{−n(I(Y;Z|X)−2δ(ε))},

where the inequality uses |T_ε^(n)(Z | x^n, y^n)| ≤ 2^{n(H(Z|X,Y)+δ(ε))} and |T_ε^(n)(Z | x^n)| ≥ 2^{n(H(Z|X)−δ(ε))} for n sufficiently large.

(c) Since P{Z̃^n ∈ T_{ε'}^(n)(Z | x^n)} ≥ 1 − ε' for n sufficiently large, we have p(z^n | x^n) ≤ 2^{−n(H(Z|X)−δ(ε'))}/(1 − ε') for every z^n in the support of Z^n. Hence

    P{(x^n, y^n, Z^n) ∈ T_ε^(n)(X, Y, Z)} = ∑_{z^n∈T_ε^(n)(Z|x^n,y^n)} p(z^n | x^n)
        ≤ ∑_{z^n∈T_ε^(n)(Z|x^n,y^n)} 2^{−n(H(Z|X)−δ(ε'))}/(1 − ε')
        ≤ (1/(1 − ε')) 2^{n(H(Z|X,Y)+δ(ε))} 2^{−n(H(Z|X)−δ(ε'))}
        = 2^{−n(I(Y;Z|X)−δ(ε))}

for some δ(ε) → 0 as ε → 0 (absorbing δ(ε'), the earlier δ terms, and the factor 1/(1 − ε')).

.. Jointly typical triples. Given (X, Y, Z) ~ p(x, y, z), let

    A_n = { (x^n, y^n, z^n) : (x^n, y^n) ∈ T_ε^(n)(X, Y), (y^n, z^n) ∈ T_ε^(n)(Y, Z), (x^n, z^n) ∈ T_ε^(n)(X, Z) }.

(a) Show that |A_n| ≤ 2^{n(H(X,Y)+H(Y,Z)+H(X,Z)+δ(ε))/2}. (Hint: first show that |A_n| ≤ 2^{n(H(X,Y)+H(Z|Y)+δ(ε))}.)
(b) Does a corresponding lower bound hold?
Solution:
(a) Let

    B_n := { (x^n, y^n, z^n) : (x^n, y^n) ∈ T_ε^(n)(X, Y), (y^n, z^n) ∈ T_ε^(n)(Y, Z) }.

Since A_n ⊆ B_n, we have |A_n| ≤ |B_n|. Consider

    |B_n| = |{ (x^n, y^n, z^n) : y^n ∈ T_ε^(n)(Y), x^n ∈ T_ε^(n)(X | y^n), z^n ∈ T_ε^(n)(Z | y^n) }|
          = ∑_{y^n∈T_ε^(n)(Y)} |{ (x^n, z^n) : x^n ∈ T_ε^(n)(X | y^n), z^n ∈ T_ε^(n)(Z | y^n) }|
          ≤ |T_ε^(n)(Y)| · |T_ε^(n)(X | y^n)| · |T_ε^(n)(Z | y^n)|
          ≤ 2^{n(H(Y)+δ(ε))} 2^{n(H(X|Y)+δ(ε))} 2^{n(H(Z|Y)+δ(ε))}
          = 2^{n(H(X,Y)+H(Z|Y)+3δ(ε))}.

Similarly, by defining C_n := { (x^n, y^n, z^n) : (y^n, z^n) ∈ T_ε^(n)(Y, Z), (x^n, z^n) ∈ T_ε^(n)(X, Z) }, we have |A_n| ≤ |C_n| ≤ 2^{n(H(X,Z)+H(Y|Z)+3δ(ε))}. Combining these two upper bounds on |A_n| yields

    |A_n|² ≤ 2^{n(H(X,Y)+H(Z|Y)+3δ(ε))} 2^{n(H(X,Z)+H(Y|Z)+3δ(ε))}
           = 2^{n(H(X,Y)+H(X,Z)+H(Z|Y)+H(Y|Z)+6δ(ε))},

which implies that

    |A_n| ≤ 2^{n(H(X,Y)+H(X,Z)+H(Z|Y)+H(Y|Z)+6δ(ε))/2}
          ≤ 2^{n(H(X,Y)+H(X,Z)+H(Y,Z)+6δ(ε))/2},

where the last step uses H(Z | Y) + H(Y | Z) ≤ H(Y, Z).

(b) No. The bound is not tight in general. For random variables always satisfying X = Y = Z, we have |A_n| ≤ 2^{n(H(X)+δ(ε))} from the upper bound on the size of the typical set, whereas the bound in part (a) reduces to |A_n| ≤ 2^{n((3/2)H(X)+δ(ε))}; hence a matching lower bound of the form 2^{n(H(X,Y)+H(Y,Z)+H(X,Z)−δ(ε))/2} cannot hold in general.

.. ε and ε'. Let (X, Y) be a pair of independent Bern(1/2) random variables. Let k = (n/2)(1 + ε) and let x^n be the binary sequence consisting of k 1s followed by n − k 0s.
(a) Check that x^n ∈ T_ε^(n)(X).
(b) Let Y^n be an i.i.d. Bern(1/2) sequence, independent of x^n. Show that

    P{(x^n, Y^n) ∈ T_ε^(n)(X, Y)} ≤ P{ ∑_{i=1}^k Y_i < (k + 1)/2 },

which converges to 1/2 as n → ∞. Thus, the fact that x^n ∈ T_ε^(n)(X) and Y^n ~ ∏_{i=1}^n p_{Y|X}(y_i | x_i) does not necessarily imply that P{(x^n, Y^n) ∈ T_ε^(n)(X, Y)} → 1.
Remark: This problem illustrates that in general we need ε' < ε in the conditional typicality lemma.
Solution:
(a) Since π(1 | x^n) = k/n = (1/2)(1 + ε) = p_X(1)(1 + ε) and π(0 | x^n) = 1 − k/n = (1/2)(1 − ε) = p_X(0)(1 − ε), we have |π(x | x^n) − p_X(x)| ≤ ε p_X(x) for x ∈ {0, 1}. Thus x^n ∈ T_ε^(n)(X).
(b) For (x^n, Y^n) to be jointly typical, there should be approximately half 1s and half 0s among the first k bits of Y^n and among the last n − k bits of Y^n. In particular, joint typicality requires (n/4)(1 − ε) ≤ ∑_{i=1}^k Y_i ≤ (n/4)(1 + ε) and (n/4)(1 − ε) ≤ ∑_{i=k+1}^n Y_i ≤ (n/4)(1 + ε), and these two events are independent, so

    P{(x^n, Y^n) ∈ T_ε^(n)(X, Y)}
        ≤ P{ (n/4)(1 − ε) ≤ ∑_{i=1}^k Y_i ≤ (n/4)(1 + ε) } P{ (n/4)(1 − ε) ≤ ∑_{i=k+1}^n Y_i ≤ (n/4)(1 + ε) }
        ≤ P{ ∑_{i=1}^k Y_i ≤ (n/4)(1 + ε) }
        ≤ P{ ∑_{i=1}^k Y_i < (k + 1)/2 }.

The last inequality holds since (n/4)(1 + ε) = k/2 < (k + 1)/2. Since ∑_{i=1}^k Y_i ~ Binomial(k, 1/2), the event { ∑_{i=1}^k Y_i < (k + 1)/2 } is the event that at most about half of the first k bits are 1s, whose probability converges to 1/2 as n → ∞ by the central limit theorem. Thus the fact that x^n ∈ T_ε^(n)(X) and Y^n ~ ∏_{i=1}^n p_{Y|X}(y_i | x_i) does not necessarily imply that P{(x^n, Y^n) ∈ T_ε^(n)(X, Y)} → 1.
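A simulation sketch of this construction, not part of the original solution: it estimates both the probability that (x^n, Y^n) is jointly ε-typical and the upper bound P{∑_{i=1}^k Y_i < (k + 1)/2}. For the parameters below, the bound hovers near 1/2 while the typicality probability itself comes out far below 1 (in fact close to 0), illustrating the remark. The values of n, ε, and the number of trials are arbitrary.

    import numpy as np

    rng = np.random.default_rng(8)
    n, eps, trials = 2000, 0.1, 20000
    k = int(n / 2 * (1 + eps))                     # 1100 ones followed by 900 zeros

    hits_typical, hits_bound = 0, 0
    for _ in range(trials):
        y = rng.integers(0, 2, n)
        s1, s2 = y[:k].sum(), y[k:].sum()          # number of 1s among the first k / last n-k bits
        # empirical joint pmf of (x_i, y_i) over the pairs (1,0), (1,1), (0,0), (0,1); p(x,y) = 1/4 each
        counts = np.array([k - s1, s1, (n - k) - s2, s2]) / n
        if np.all(np.abs(counts - 0.25) <= eps * 0.25):
            hits_typical += 1
        if s1 < (k + 1) / 2:
            hits_bound += 1

    print("P{(x^n, Y^n) jointly typical} ~", hits_typical / trials)
    print("P{sum_{i<=k} Y_i < (k+1)/2}  ~", hits_bound / trials)    # approaches 1/2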
