Entropy, Relative Entropy and Mutual Information



since −t log t ≥ 0 for 0 ≤ t ≤ 1 , and is strictly positive for t not equal to 0 or 1.


Therefore the conditional entropy H(Y |X) is 0 if and only if Y is a function of X .

6. Conditional mutual information vs. unconditional mutual information. Give examples of joint random variables X , Y and Z such that

(a) I(X; Y | Z) < I(X; Y ) ,


(b) I(X; Y | Z) > I(X; Y ) .

Solution: Conditional mutual information vs. unconditional mutual information.

(a) The last corollary to Theorem 2.8.1 in the text states that if X → Y → Z , that
is, if p(x, y | z) = p(x | z)p(y | z) , then I(X; Y ) ≥ I(X; Y | Z) . Equality holds if
and only if I(X; Z) = 0 , i.e., if X and Z are independent.
A simple example of random variables satisfying the inequality condition above is
the following: let X be a fair binary random variable, and let Y = X and Z = Y .
In this case,

I(X; Y ) = H(X) − H(X | Y ) = H(X) = 1

and,
I(X; Y | Z) = H(X | Z) − H(X | Y, Z) = 0.
Thus I(X; Y ) > I(X; Y | Z) .
(b) This example is also given in the text. Let X, Y be independent fair binary
random variables and let Z = X + Y . In this case we have that,

I(X; Y ) = 0
and,

    I(X; Y | Z) = H(X | Z) − H(X | Y, Z) = H(X | Z) = 1/2.

So I(X; Y ) < I(X; Y | Z) . Note that in this case X, Y, Z do not form a Markov chain.
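As a quick numerical check of both examples, the following Python sketch (the helper functions and the encoding of the joint distributions are ours, not from the text) computes I(X; Y ) and I(X; Y | Z) directly:

    from math import log2
    from itertools import product

    def H(p):
        # entropy (in bits) of a distribution given as {outcome: probability}
        return -sum(q * log2(q) for q in p.values() if q > 0)

    def marginal(joint, idx):
        # marginalize a joint distribution keyed by tuples onto the coordinates in idx
        out = {}
        for k, q in joint.items():
            key = tuple(k[i] for i in idx)
            out[key] = out.get(key, 0.0) + q
        return out

    def I(joint, a, b, c=()):
        # I(A; B | C) = H(A,C) + H(B,C) - H(A,B,C) - H(C); with c=() this is I(A; B)
        a, b, c = list(a), list(b), list(c)
        return (H(marginal(joint, a + c)) + H(marginal(joint, b + c))
                - H(marginal(joint, a + b + c)) - H(marginal(joint, c)))

    # Example (a): X a fair bit, Y = X, Z = Y; joint over (x, y, z).
    pa = {(x, x, x): 0.5 for x in (0, 1)}
    print(round(I(pa, [0], [1]), 6), round(I(pa, [0], [1], [2]), 6))   # 1.0 and 0.0

    # Example (b): X, Y independent fair bits, Z = X + Y.
    pb = {(x, y, x + y): 0.25 for x, y in product((0, 1), repeat=2)}
    print(round(I(pb, [0], [1]), 6), round(I(pb, [0], [1], [2]), 6))   # 0.0 and 0.5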
7. Coin weighing. Suppose one has n coins, among which there may or may not be one
counterfeit coin. If there is a counterfeit coin, it may be either heavier or lighter than
the other coins. The coins are to be weighed by a balance.

(a) Find an upper bound on the number of coins n so that k weighings will find the
    counterfeit coin (if any) and correctly declare it to be heavier or lighter.
(b) (Difficult) What is the coin weighing strategy for k = 3 weighings and 12 coins?

Solution: Coin weighing.

(a) For n coins, there are 2n + 1 possible situations or “states”.
    • One of the n coins is heavier.
    • One of the n coins is lighter.
    • They are all of equal weight.

11. A measure of correlation. Let X1 and X2 be identically distributed, but not
necessarily independent, and let ρ = 1 − H(X2 |X1 )/H(X1 ) .

Solution: A measure of correlation.

(b) Since 0 ≤ H(X2 |X1 ) ≤ H(X1 ) , we have

        0 ≤ H(X2 |X1 )/H(X1 ) ≤ 1

    and hence 0 ≤ ρ ≤ 1 .
(c) ρ = 0 iff I(X1 ; X2 ) = 0 iff X1 and X2 are independent.
(d) ρ = 1 iff H(X2 |X1 ) = 0 iff X2 is a function of X1 . By symmetry, X1 is a
    function of X2 , i.e., X1 and X2 have a one-to-one relationship.

12. Example of joint entropy. Let p(x, y) be given by

                 Y = 0    Y = 1
        X = 0     1/3      1/3
        X = 1      0       1/3

Find

(a) H(X), H(Y ).


(b) H(X | Y ), H(Y | X).
(c) H(X, Y ).
(d) H(Y ) − H(Y | X).
(e) I(X; Y ) .
(f) Draw a Venn diagram for the quantities in (a) through (e).

Solution: Example of joint entropy

(a) H(X) = (2/3) log(3/2) + (1/3) log 3 = 0.918 bits = H(Y ) .
(b) H(X|Y ) = (1/3) H(X|Y = 0) + (2/3) H(X|Y = 1) = 0.667 bits = H(Y |X) .
(c) H(X, Y ) = 3 × (1/3) log 3 = 1.585 bits.

(d) H(Y ) − H(Y |X) = 0.251 bits.


(e) I(X; Y ) = H(Y ) − H(Y |X) = 0.251 bits.
(f) See Figure 1.
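The numbers in parts (a) through (e) can be reproduced directly from the 2×2 table; a short Python check (the helper function and variable names are ours):

    from math import log2

    p = {(0, 0): 1/3, (0, 1): 1/3, (1, 0): 0.0, (1, 1): 1/3}   # p(x, y) from the table

    def H(dist):
        return -sum(q * log2(q) for q in dist.values() if q > 0)

    px = {x: sum(q for (a, b), q in p.items() if a == x) for x in (0, 1)}
    py = {y: sum(q for (a, b), q in p.items() if b == y) for y in (0, 1)}

    Hxy, Hx, Hy = H(p), H(px), H(py)
    print(round(Hx, 4), round(Hy, 4))               # (a) 0.918 bits each
    print(round(Hxy - Hy, 4), round(Hxy - Hx, 4))   # (b) H(X|Y) = H(Y|X) = 0.667 bits
    print(round(Hxy, 4))                            # (c) 1.585 bits
    print(round(Hy - (Hxy - Hx), 4))                # (d), (e) I(X;Y) ≈ 0.251 bits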

13. Inequality. Show ln x ≥ 1 − 1/x for x > 0.

Solution: Inequality. Using the Remainder form of the Taylor expansion of ln(x)
about x = 1 , we have for some c between 1 and x

    ln(x) = ln(1) + (x − 1) − (x − 1)^2 /(2c^2 ) ≤ x − 1,

since the remainder term is nonpositive. Letting y = 1/x , this gives ln(1/y) ≤ 1/y − 1 ,
or

    ln y ≥ 1 − 1/y

with equality iff y = 1 .
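As a quick numerical sanity check (not part of the text), the inequality can be spot-checked in Python:

    from math import log

    for x in [0.01, 0.1, 0.5, 1.0, 2.0, 10.0, 100.0]:
        assert log(x) >= 1 - 1/x - 1e-12   # ln x >= 1 - 1/x, with equality only at x = 1
    print("ln x >= 1 - 1/x holds at all sampled points")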

14. Entropy of a sum. Let X and Y be random variables that take on values x1 , x2 , . . . , xr
and y1 , y2 , . . . , ys , respectively. Let Z = X + Y.

(a) Show that H(Z|X) = H(Y |X). Argue that if X, Y are independent, then H(Y ) ≤
H(Z) and H(X) ≤ H(Z). Thus the addition of independent random variables
adds uncertainty.
(b) Give an example of (necessarily dependent) random variables in which H(X) >
H(Z) and H(Y ) > H(Z).
(c) Under what conditions does H(Z) = H(X) + H(Y ) ?

Solution: Entropy of a sum.

(a) Z = X + Y . Hence p(Z = z|X = x) = p(Y = z − x|X = x) .


    H(Z|X) = Σx p(x) H(Z|X = x)
           = − Σx p(x) Σz p(Z = z|X = x) log p(Z = z|X = x)
           = − Σx p(x) Σy p(Y = y|X = x) log p(Y = y|X = x)      (substituting y = z − x)
           = Σx p(x) H(Y |X = x)
           = H(Y |X).

If X and Y are independent, then H(Y |X) = H(Y ) . Since I(X; Z) ≥ 0 ,


we have H(Z) ≥ H(Z|X) = H(Y |X) = H(Y ) . Similarly we can show that
H(Z) ≥ H(X) .
(b) Consider the following joint distribution for X and Y . Let

        X = −Y = 1 with probability 1/2,
        X = −Y = 0 with probability 1/2.

    Then H(X) = H(Y ) = 1 , but Z = 0 with probability 1 and hence H(Z) = 0 .
(c) We have
        H(Z) ≤ H(X, Y ) ≤ H(X) + H(Y )
    because Z is a function of (X, Y ) and H(X, Y ) = H(X) + H(Y |X) ≤ H(X) + H(Y ) .
    We have equality iff (X, Y ) is a function of Z and H(Y ) = H(Y |X) , i.e., X and Y
    are independent.
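For a numerical illustration, the sketch below uses an arbitrary random joint distribution for part (a) (our choice, not from the text) and the stated X = −Y example for part (b):

    from math import log2
    import random

    def H(dist):
        return -sum(q * log2(q) for q in dist.values() if q > 0)

    def marg(joint, f):
        # push a joint distribution keyed by tuples through a coordinate map f
        out = {}
        for k, q in joint.items():
            out[f(k)] = out.get(f(k), 0.0) + q
        return out

    # (a) H(Z|X) = H(Y|X) for any joint p(x, y); here a random example.
    random.seed(0)
    w = {(x, y): random.random() for x in range(3) for y in range(3)}
    s = sum(w.values())
    pxy = {k: v / s for k, v in w.items()}
    pxz = marg(pxy, lambda k: (k[0], k[0] + k[1]))   # (x, z) with z = x + y
    px = marg(pxy, lambda k: k[0])
    print(H(pxz) - H(px), H(pxy) - H(px))            # H(Z|X) and H(Y|X): equal

    # (b) X = -Y equiprobable on {0, 1}: Z = X + Y = 0 always, so H(Z) = 0 < H(X) = 1.
    pb = {(1, -1): 0.5, (0, 0): 0.5}
    print(H(marg(pb, lambda k: k[0])), H(marg(pb, lambda k: k[0] + k[1])))   # 1.0 and 0.0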
15. Data processing. Let X1 → X2 → X3 → · · · → Xn form a Markov chain in this
order; i.e., let

    p(x1 , x2 , . . . , xn ) = p(x1 )p(x2 |x1 ) · · · p(xn |xn−1 ).

Reduce I(X1 ; X2 , . . . , Xn ) to its simplest form.

Solution: Data Processing. By the chain rule for mutual information,

    I(X1 ; X2 , . . . , Xn ) = I(X1 ; X2 ) + I(X1 ; X3 |X2 ) + · · · + I(X1 ; Xn |X2 , . . . , Xn−1 ).   (2.20)

By the Markov property, the past and the future are conditionally independent given
the present and hence all terms except the first are zero. Therefore

    I(X1 ; X2 , . . . , Xn ) = I(X1 ; X2 ).   (2.21)
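As an illustration (with made-up transition probabilities), one can verify numerically that I(X1 ; X2 , X3 ) = I(X1 ; X2 ) for a chain X1 → X2 → X3 :

    from math import log2

    def H(dist):
        return -sum(q * log2(q) for q in dist.values() if q > 0)

    def marg(joint, idx):
        out = {}
        for k, q in joint.items():
            key = tuple(k[i] for i in idx)
            out[key] = out.get(key, 0.0) + q
        return out

    def I(joint, a, b):
        # I(A; B) = H(A) + H(B) - H(A, B)
        return H(marg(joint, a)) + H(marg(joint, b)) - H(marg(joint, a + b))

    # p(x1, x2, x3) = p(x1) p(x2|x1) p(x3|x2), with arbitrary example kernels.
    p1 = [0.6, 0.4]
    P12 = [[0.7, 0.3], [0.2, 0.8]]
    P23 = [[0.9, 0.1], [0.5, 0.5]]
    joint = {(a, b, c): p1[a] * P12[a][b] * P23[b][c]
             for a in range(2) for b in range(2) for c in range(2)}

    print(I(joint, [0], [1, 2]))   # I(X1; X2, X3)
    print(I(joint, [0], [1]))      # I(X1; X2) -- the same value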
16. Bottleneck. Suppose a (non-stationary) Markov chain starts in one of n states, necks
down to k < n states, and then fans back to m > k states. Thus X1 → X2 → X3 ,
i.e., p(x1 , x2 , x3 ) = p(x1 )p(x2 |x1 )p(x3 |x2 ) , for all x1 ∈ {1, 2, . . . , n} , x2 ∈ {1, 2, . . . , k} ,
x3 ∈ {1, 2, . . . , m} .

(a) Show that the dependence of X1 and X3 is limited by the bottleneck by proving
    that I(X1 ; X3 ) ≤ log k.
(b) Evaluate I(X1 ; X3 ) for k = 1 , and conclude that no dependence can survive such
    a bottleneck.
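The solution to this problem is not included in this excerpt, but part (a) can be illustrated numerically: for a randomly chosen chain with a bottleneck of k = 2 states, I(X1 ; X3 ) stays below log k = 1 bit. The sizes and distributions below are arbitrary choices of ours:

    from math import log2
    import random

    def H(dist):
        return -sum(q * log2(q) for q in dist.values() if q > 0)

    def marg(joint, idx):
        out = {}
        for k, q in joint.items():
            key = tuple(k[i] for i in idx)
            out[key] = out.get(key, 0.0) + q
        return out

    def rand_dist(size):
        w = [random.random() for _ in range(size)]
        s = sum(w)
        return [x / s for x in w]

    random.seed(1)
    n, k, m = 8, 2, 8
    p1 = rand_dist(n)
    P12 = [rand_dist(k) for _ in range(n)]
    P23 = [rand_dist(m) for _ in range(k)]
    joint = {(a, b, c): p1[a] * P12[a][b] * P23[b][c]
             for a in range(n) for b in range(k) for c in range(m)}

    I13 = H(marg(joint, [0])) + H(marg(joint, [2])) - H(marg(joint, [0, 2]))
    print(I13, "<=", log2(k))   # I(X1; X3) is below log k = 1 bit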
the entropy function. If we assume that n is large enough so that the probability
that n(p − ε) ≤ k ≤ n(p + ε) is greater than 1 − ε , then we see that EK ≥
(1 − ε)n(H(p) − 2δ) − 2 , which is very good since nH(p) is an upper bound on the
number of pure random bits that can be produced from the bent coin sequence.

18. World Series. The World Series is a seven-game series that terminates as soon as
either team wins four games. Let X be the random variable that represents the outcome
of a World Series between teams A and B; possible values of X are AAAA, BABABAB,
and BBBAAAA. Let Y be the number of games played, which ranges from 4 to 7.
Assuming that A and B are equally matched and that the games are independent,
calculate H(X) , H(Y ) , H(Y |X) , and H(X|Y ) .
Solution:
World Series. Two teams play until one of them has won 4 games.
There are 2 (AAAA, BBBB) World Series with 4 games. Each happens with probability (1/2)^4 .
There are 8 = 2 · (4 choose 3) World Series with 5 games. Each happens with probability (1/2)^5 .
There are 20 = 2 · (5 choose 3) World Series with 6 games. Each happens with probability (1/2)^6 .
There are 40 = 2 · (6 choose 3) World Series with 7 games. Each happens with probability (1/2)^7 .

The probability of a 4 game series ( Y = 4 ) is 2(1/2)^4 = 1/8 .
The probability of a 5 game series ( Y = 5 ) is 8(1/2)^5 = 1/4 .
The probability of a 6 game series ( Y = 6 ) is 20(1/2)^6 = 5/16 .
The probability of a 7 game series ( Y = 7 ) is 40(1/2)^7 = 5/16 .

    H(X) = Σx p(x) log(1/p(x))
         = 2(1/16) log 16 + 8(1/32) log 32 + 20(1/64) log 64 + 40(1/128) log 128
         = 5.8125

    H(Y ) = Σy p(y) log(1/p(y))
          = (1/8) log 8 + (1/4) log 4 + (5/16) log(16/5) + (5/16) log(16/5)
          = 1.924


Y is a deterministic function of X, so if you know X there is no randomness in Y. Or,


H(Y |X) = 0 .
Since H(X) + H(Y |X) = H(X, Y ) = H(Y ) + H(X|Y ) , it is easy to determine
H(X|Y ) = H(X) + H(Y |X) − H(Y ) = 3.889
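The counts, probabilities, and entropies above can be reproduced by brute-force enumeration; a Python sketch (the enumeration scheme is ours, not from the text):

    from math import log2
    from itertools import product
    from collections import Counter

    # Enumerate outcomes of up to 7 fair games, truncating once a team reaches 4 wins.
    series = set()
    for outcome in product("AB", repeat=7):
        wins = {"A": 0, "B": 0}
        played = []
        for g in outcome:
            wins[g] += 1
            played.append(g)
            if wins[g] == 4:
                break
        series.add("".join(played))

    def H(d):
        return -sum(q * log2(q) for q in d.values() if q > 0)

    px = {s: 0.5 ** len(s) for s in series}        # each game is an independent fair coin
    py = {}
    for s, q in px.items():
        py[len(s)] = py.get(len(s), 0.0) + q

    print(Counter(len(s) for s in series))         # 2, 8, 20 and 40 series of length 4, 5, 6, 7
    print(H(px))                                   # H(X) = 5.8125
    print(round(H(py), 3))                         # H(Y) = 1.924
    print(round(H(px) - H(py), 3))                 # H(X|Y) = H(X) - H(Y) since Y = f(X): 3.889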

19. Infinite entropy. This problem shows that the entropy of a discrete random variable
!
can be infinite. Let A = ∞ n=2 (n log n)
2 −1 . (It is easy to show that A is finite by

bounding the infinite sum by the integral of (x log 2 x)−1 .) Show that the integer-
valued random variable X defined by Pr(X = n) = (An log 2 n)−1 for n = 2, 3, . . . ,
has H(X) = +∞ .
Solution: Infinite entropy. By definition, p n = Pr(X = n) = 1/An log 2 n for n ≥ 2 .
Therefore

    H(X) = − Σ_{n=2}^∞ p(n) log p(n)

         = − Σ_{n=2}^∞ (1/(A n log^2 n)) log(1/(A n log^2 n))

         = Σ_{n=2}^∞ log(A n log^2 n) / (A n log^2 n)

         = Σ_{n=2}^∞ (log A + log n + 2 log log n) / (A n log^2 n)

         = log A + Σ_{n=2}^∞ 1/(A n log n) + Σ_{n=2}^∞ (2 log log n)/(A n log^2 n).

The first term is finite. For base 2 logarithms, all the elements in the sum in the last
term are nonnegative. (For any other base, the terms of the last sum eventually all
become positive.) So all we have to do is bound the middle sum, which we do by
comparing with an integral.

    Σ_{n=2}^∞ 1/(A n log n) > ∫_2^∞ 1/(A x log x) dx = K ln ln x |_2^∞ = +∞ .
≥ 0.

Thus,
H(P2 ) ≥ H(P1 ).

29. Inequalities. Let X , Y and Z be joint random variables. Prove the following
inequalities and find conditions for equality.

(a) H(X, Y |Z) ≥ H(X|Z) .


(b) I(X, Y ; Z) ≥ I(X; Z) .
(c) H(X, Y, Z) − H(X, Y ) ≤ H(X, Z) − H(X) .
(d) I(X; Z|Y ) ≥ I(Z; Y |X) − I(Z; Y ) + I(X; Z) .

Solution: Inequalities.

(a) Using the chain rule for conditional entropy,

H(X, Y |Z) = H(X|Z) + H(Y |X, Z) ≥ H(X|Z),

with equality iff H(Y |X, Z) = 0 , that is, when Y is a function of X and Z .
(b) Using the chain rule for mutual information,

I(X, Y ; Z) = I(X; Z) + I(Y ; Z|X) ≥ I(X; Z),

with equality iff I(Y ; Z|X) = 0 , that is, when Y and Z are conditionally inde-
pendent given X .
(c) Using first the chain rule for entropy and then the definition of conditional mutual
information,

H(X, Y, Z) − H(X, Y ) = H(Z|X, Y ) = H(Z|X) − I(Y ; Z|X)


≤ H(Z|X) = H(X, Z) − H(X) ,

with equality iff I(Y ; Z|X) = 0 , that is, when Y and Z are conditionally inde-
pendent given X .
(d) Using the chain rule for mutual information,

I(X; Z|Y ) + I(Z; Y ) = I(X, Y ; Z) = I(Z; Y |X) + I(X; Z) ,

and therefore
I(X; Z|Y ) = I(Z; Y |X) − I(Z; Y ) + I(X; Z) .
We see that this inequality is actually an equality in all cases.
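A randomized Python spot-check of (a) through (d) on an arbitrary joint distribution of (X, Y, Z) (the helper names and distribution are ours); (d) holds as an equality up to floating-point error:

    from math import log2
    import random

    def H(d):
        return -sum(q * log2(q) for q in d.values() if q > 0)

    def Hof(joint, idx):
        # joint entropy of the coordinates listed in idx
        out = {}
        for key, q in joint.items():
            k2 = tuple(key[i] for i in idx)
            out[k2] = out.get(k2, 0.0) + q
        return H(out)

    random.seed(2)
    w = {(x, y, z): random.random() for x in range(2) for y in range(3) for z in range(2)}
    s = sum(w.values())
    p = {k: v / s for k, v in w.items()}

    X, Y, Z = [0], [1], [2]
    Ixz_y = Hof(p, X + Y) + Hof(p, Y + Z) - Hof(p, X + Y + Z) - Hof(p, Y)   # I(X;Z|Y)
    Izy_x = Hof(p, X + Z) + Hof(p, X + Y) - Hof(p, X + Y + Z) - Hof(p, X)   # I(Z;Y|X)
    Izy = Hof(p, Z) + Hof(p, Y) - Hof(p, Y + Z)                             # I(Z;Y)
    Ixz = Hof(p, X) + Hof(p, Z) - Hof(p, X + Z)                             # I(X;Z)

    print(Hof(p, X + Y + Z) - Hof(p, Z) >= Hof(p, X + Z) - Hof(p, Z))       # (a) H(X,Y|Z) >= H(X|Z)
    print(Hof(p, X + Y) + Hof(p, Z) - Hof(p, X + Y + Z) >= Ixz)             # (b) I(X,Y;Z) >= I(X;Z)
    print(Hof(p, X + Y + Z) - Hof(p, X + Y) <= Hof(p, X + Z) - Hof(p, X))   # (c)
    print(abs(Ixz_y - (Izy_x - Izy + Ixz)) < 1e-12)                         # (d) equality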
