Entropy, Relative Entropy and Mutual Information
(a) The last corollary to Theorem 2.8.1 in the text states that if X → Y → Z, that is, if p(x, z | y) = p(x | y)p(z | y), then I(X; Y ) ≥ I(X; Y | Z). Equality holds if and only if I(X; Z) = 0, i.e., if X and Z are independent.
A simple example of random variables satisfying the conditions above is the following: let X be a fair binary random variable and let Y = X and Z = Y. In this case,
I(X; Y ) = H(X) − H(X | Y ) = 1 bit,
and
I(X; Y | Z) = H(X | Z) − H(X | Y, Z) = 0.
So I(X; Y ) > I(X; Y | Z).
(b) This example is also given in the text. Let X, Y be independent fair binary
random variables and let Z = X + Y . In this case we have that,
I(X; Y ) = 0
and
I(X; Y | Z) = H(X | Z) − H(X | Y, Z) = H(X | Z) = 1/2.
So I(X; Y ) < I(X; Y | Z). Note that in this case X, Y, Z are not Markov.
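As a quick numerical sanity check of both examples, one can compute the mutual informations directly from the joint pmfs. The short Python sketch below is illustrative only (the helper names are ours, not the text's); it evaluates I(X; Y) and I(X; Y | Z) for each example.

```python
from itertools import product
from math import log2

def H(p):
    """Entropy (in bits) of a pmf given as a dict {outcome: prob}."""
    return -sum(q * log2(q) for q in p.values() if q > 0)

def marginal(p, idx):
    """Marginalize a joint pmf {(x, y, z): prob} onto the coordinates in idx."""
    m = {}
    for outcome, q in p.items():
        key = tuple(outcome[i] for i in idx)
        m[key] = m.get(key, 0.0) + q
    return m

def I_cond(p, a, b, c):
    """I(A; B | C) = H(A,C) + H(B,C) - H(A,B,C) - H(C); c may be empty."""
    return (H(marginal(p, a + c)) + H(marginal(p, b + c))
            - H(marginal(p, a + b + c)) - H(marginal(p, c)))

# Example (a): X a fair bit, Y = X, Z = Y.
pa = {(x, x, x): 0.5 for x in (0, 1)}
print(I_cond(pa, (0,), (1,), ()), I_cond(pa, (0,), (1,), (2,)))   # 1.0  0.0

# Example (b): X, Y independent fair bits, Z = X + Y.
pb = {(x, y, x + y): 0.25 for x, y in product((0, 1), repeat=2)}
print(I_cond(pb, (0,), (1,), ()), I_cond(pb, (0,), (1,), (2,)))   # 0.0  0.5
```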
(b) Since 0 ≤ H(X2 |X1 ) ≤ H(X1 ), we have
0 ≤ H(X2 |X1 )/H(X1 ) ≤ 1,
and hence 0 ≤ ρ ≤ 1, where ρ = 1 − H(X2 |X1 )/H(X1 ).
(c) ρ = 0 iff I(X1 ; X2 ) = 0 iff X1 and X2 are independent.
(d) ρ = 1 iff H(X2 |X1 ) = 0 iff X2 is a function of X1 . By symmetry, X1 is a function of X2 , i.e., X1 and X2 have a one-to-one relationship.
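The iff conditions above can be illustrated numerically. The sketch below assumes the definition ρ = 1 − H(X2 |X1 )/H(X1 ) implied by the derivation; the helper and the two test distributions are our own illustrative choices.

```python
from math import log2

def H(p):
    """Entropy (bits) of a pmf dict {outcome: prob}."""
    return -sum(q * log2(q) for q in p.values() if q > 0)

def rho(joint):
    """rho = 1 - H(X2|X1)/H(X1) for a joint pmf {(x1, x2): prob}."""
    p1 = {}
    for (x1, _), q in joint.items():
        p1[x1] = p1.get(x1, 0.0) + q
    h12, h1 = H(joint), H(p1)
    return 1 - (h12 - h1) / h1          # H(X2|X1) = H(X1, X2) - H(X1)

# Independent fair bits: rho = 0.
print(rho({(x1, x2): 0.25 for x1 in (0, 1) for x2 in (0, 1)}))
# X2 = X1 (one-to-one): rho = 1.
print(rho({(x, x): 0.5 for x in (0, 1)}))
```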
7. Coin weighing. Suppose one has n coins, among which there may or may not be one counterfeit coin. If there is a counterfeit coin, it may be either heavier or lighter than the other coins. The coins are to be weighed by a balance.
(a) Find an upper bound on the number of coins n so that k weighings will find the counterfeit coin (if any) and correctly declare it to be heavier or lighter.
(b) (Difficult) What is the coin weighing strategy for k = 3 weighings and 12 coins?
12. Example of joint entropy. Let p(x, y) be given by

           y = 0    y = 1
  x = 0     1/3      1/3
  x = 1      0       1/3

Find H(X), H(Y ), H(X|Y ), H(Y |X), and H(X, Y ).
(a) H(X) = (2/3) log(3/2) + (1/3) log 3 = 0.918 bits = H(Y ).
(b) H(X|Y ) = (1/3) H(X|Y = 0) + (2/3) H(X|Y = 1) = 0.667 bits = H(Y |X).
(c) H(X, Y ) = 3 × (1/3) log 3 = 1.585 bits.
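These three values are easy to confirm numerically from the table above; the following sketch (our own helper, not part of the solution) recomputes them.

```python
import numpy as np

# Joint pmf p(x, y): rows indexed by x, columns by y.
p = np.array([[1/3, 1/3],
              [0.0, 1/3]])

def H(q):
    """Entropy in bits of an array of probabilities."""
    q = q[q > 0]
    return float(-(q * np.log2(q)).sum())

HX  = H(p.sum(axis=1))          # marginal of X
HY  = H(p.sum(axis=0))          # marginal of Y
HXY = H(p.flatten())            # joint entropy
print(HX, HY, HXY, HXY - HY)    # 0.918  0.918  1.585  H(X|Y) = 0.667
```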
14. Entropy of a sum. Let X and Y be random variables that take on values x1 , x2 , . . . , xr and y1 , y2 , . . . , ys , respectively. Let Z = X + Y.
(a) Show that H(Z|X) = H(Y |X). Argue that if X, Y are independent, then H(Y ) ≤
H(Z) and H(X) ≤ H(Z). Thus the addition of independent random variables
adds uncertainty.
(b) Give an example of (necessarily dependent) random variables in which H(X) >
H(Z) and H(Y ) > H(Z).
(c) Under what conditions does H(Z) = H(X) + H(Y ) ?
Solution: Entropy of a sum.
(c) We have H(Z) ≤ H(X, Y ) ≤ H(X) + H(Y ), because Z is a function of (X, Y ) and H(X, Y ) = H(X) + H(Y |X) ≤ H(X) + H(Y ). We have equality iff (X, Y ) is a function of Z and H(Y ) = H(Y |X), i.e.,
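A small numerical illustration of parts (a) and (b), using distributions of our own choosing rather than anything from the text: for independent fair bits the sum gains entropy, while for the dependent pair Y = −X the sum Z = X + Y is constant.

```python
import numpy as np

def H(q):
    q = np.asarray(q, dtype=float)
    q = q[q > 0]
    return float(-(q * np.log2(q)).sum())

def entropy_of_sum(pairs):
    """pairs: dict {(x, y): prob}; returns (H(X), H(Y), H(Z)) for Z = X + Y."""
    px, py, pz = {}, {}, {}
    for (x, y), q in pairs.items():
        px[x] = px.get(x, 0.0) + q
        py[y] = py.get(y, 0.0) + q
        pz[x + y] = pz.get(x + y, 0.0) + q
    return H(list(px.values())), H(list(py.values())), H(list(pz.values()))

# (a) Independent fair bits: H(Z) >= H(X) and H(Z) >= H(Y).
indep = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
print(entropy_of_sum(indep))     # (1.0, 1.0, 1.5)

# (b) Dependent pair Y = -X: H(Z) = 0 < H(X) = H(Y) = 1.
dep = {(x, -x): 0.5 for x in (0, 1)}
print(entropy_of_sum(dep))       # (1.0, 1.0, 0.0)
```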
15. Data processing. Let X1 → X2 → X3 → · · · → Xn form a Markov chain in this order; i.e., let
p(x1 , x2 , . . . , xn ) = p(x1 )p(x2 |x1 ) · · · p(xn |xn−1 ).
Reduce I(X1 ; X2 , . . . , Xn ) to its simplest form.
Solution: Data Processing. By the chain rule for mutual information,
I(X1 ; X2 , . . . , Xn ) = I(X1 ; X2 ) + I(X1 ; X3 |X2 ) + · · · + I(X1 ; Xn |X2 , . . . , Xn−1 ). (2.20)
By the Markov property, the past and the future are conditionally independent given the present, and hence all terms except the first are zero. Therefore
I(X1 ; X2 , . . . , Xn ) = I(X1 ; X2 ).
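As a sanity check, one can verify I(X1 ; X2 , X3 ) = I(X1 ; X2 ) on a small chain. The sketch below builds a three-variable binary Markov chain with arbitrarily chosen crossover probabilities (the construction and names are our own, not from the text).

```python
from math import log2

def H(p):
    return -sum(q * log2(q) for q in p.values() if q > 0)

def marginal(p, idx):
    m = {}
    for outcome, q in p.items():
        key = tuple(outcome[i] for i in idx)
        m[key] = m.get(key, 0.0) + q
    return m

def I(p, a, b):
    """Mutual information I(A; B) between coordinate groups a and b."""
    return H(marginal(p, a)) + H(marginal(p, b)) - H(marginal(p, a + b))

# Markov chain X1 -> X2 -> X3: fair X1, then two binary symmetric channels.
flip1, flip2 = 0.1, 0.3      # arbitrary crossover probabilities
joint = {}
for x1 in (0, 1):
    for x2 in (0, 1):
        for x3 in (0, 1):
            p12 = flip1 if x2 != x1 else 1 - flip1
            p23 = flip2 if x3 != x2 else 1 - flip2
            joint[(x1, x2, x3)] = 0.5 * p12 * p23

print(I(joint, (0,), (1, 2)))    # I(X1; X2, X3)
print(I(joint, (0,), (1,)))      # I(X1; X2) -- the two agree
```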
18. World Series. The World Series is a seven-game series that terminates as soon as
either team wins four games. Let X be the random variable that represents the outcome
of a World Series between teams A and B; possible values of X are AAAA, BABABAB,
and BBBAAAA. Let Y be the number of games played, which ranges from 4 to 7.
Assuming that A and B are equally matched and that the games are independent,
calculate H(X) , H(Y ) , H(Y |X) , and H(X|Y ) .
Solution:
World Series. Two teams play until one of them has won 4 games.
There are 2 (AAAA, BBBB) World Series with 4 games. Each happens with probability (1/2)^4.
There are 8 = 2 (4 choose 3) World Series with 5 games. Each happens with probability (1/2)^5.
There are 20 = 2 (5 choose 3) World Series with 6 games. Each happens with probability (1/2)^6.
There are 40 = 2 (6 choose 3) World Series with 7 games. Each happens with probability (1/2)^7.
H(X) = Σx p(x) log(1/p(x))
     = 2(1/16) log 16 + 8(1/32) log 32 + 20(1/64) log 64 + 40(1/128) log 128
     = 5.8125 bits.
H(Y ) = Σy p(y) log(1/p(y))
     = (1/8) log 8 + (1/4) log 4 + (5/16) log(16/5) + (5/16) log(16/5)
     = 1.924 bits.
Since Y is a deterministic function of X, H(Y |X) = 0. Then, because H(X) + H(Y |X) = H(Y ) + H(X|Y ), we get H(X|Y ) = H(X) + H(Y |X) − H(Y ) = 5.8125 − 1.924 = 3.889 bits.
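The counts and entropies above can be confirmed by brute-force enumeration of all possible series; the sketch below is illustrative, not part of the original solution.

```python
from itertools import product
from math import log2

# Enumerate all best-of-seven outcomes: a series is a string over {A, B}
# that ends exactly when one team reaches 4 wins.
outcomes = []
for n in range(4, 8):
    for seq in product("AB", repeat=n):
        s = "".join(seq)
        winner = s[-1]
        if s.count(winner) == 4 and s[:-1].count(winner) == 3:
            outcomes.append(s)

pX = {s: 0.5 ** len(s) for s in outcomes}            # each game is a fair coin
pY = {}
for s, q in pX.items():
    pY[len(s)] = pY.get(len(s), 0.0) + q

def H(p):
    return -sum(q * log2(q) for q in p.values() if q > 0)

HX, HY = H(pX), H(pY)
print(len(outcomes))     # 70 possible series (2 + 8 + 20 + 40)
print(HX, HY)            # 5.8125, 1.924
print(HX - HY)           # H(X|Y) = H(X) - H(Y), since H(Y|X) = 0
```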
19. Infinite entropy. This problem shows that the entropy of a discrete random variable can be infinite. Let A = Σ_{n=2}^∞ 1/(n log² n). (It is easy to show that A is finite by bounding the infinite sum by the integral of 1/(x log² x).) Show that the integer-valued random variable X defined by Pr(X = n) = 1/(A n log² n) for n = 2, 3, . . . , has H(X) = +∞.
Solution: Infinite entropy. By definition, pn = Pr(X = n) = 1/(A n log² n) for n ≥ 2.
Therefore
H(X) = − Σ_{n=2}^∞ p(n) log p(n)
     = − Σ_{n=2}^∞ (1/(A n log² n)) log(1/(A n log² n))
     = Σ_{n=2}^∞ log(A n log² n) / (A n log² n)
     = Σ_{n=2}^∞ (log A + log n + 2 log log n) / (A n log² n)
     = log A + Σ_{n=2}^∞ 1/(A n log n) + Σ_{n=2}^∞ (2 log log n)/(A n log² n).
The first term is finite. For base 2 logarithms, all the elements in the sum in the last
term are nonnegative. (For any other base, the terms of the last sum eventually all
become positive.) So all we have to do is bound the middle sum, which we do by
comparing with an integral.
Σ_{n=2}^∞ 1/(A n log n) > ∫_2^∞ dx/(A x log x) = K ln ln x |_2^∞ = +∞.
We conclude that H(X) = +∞.
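Numerically the divergence is very slow (the partial sums of the middle term grow roughly like log log N), so a computation can only hint at it. The sketch below, with arbitrarily chosen truncation points N, shows the entropy of the truncated distribution creeping upward while the truncated normalizing constant stabilizes.

```python
from math import log2

def truncated(N):
    """Normalizing constant and entropy (bits) of the distribution truncated at N:
    p_n proportional to 1/(n log2(n)^2) for 2 <= n <= N."""
    A_N = sum(1.0 / (n * log2(n) ** 2) for n in range(2, N + 1))
    H_N = 0.0
    for n in range(2, N + 1):
        p = 1.0 / (A_N * n * log2(n) ** 2)
        H_N -= p * log2(p)
    return A_N, H_N

for N in (10**3, 10**4, 10**5, 10**6):
    A_N, H_N = truncated(N)
    print(N, round(A_N, 4), round(H_N, 4))   # A_N stabilizes; H_N keeps growing
```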
29. Inequalities. Let X , Y and Z be joint random variables. Prove the following
inequalities and find conditions for equality.
Solution: Inequalities.
(a) Using the chain rule for conditional entropy,
H(X, Y |Z) = H(X|Z) + H(Y |X, Z) ≥ H(X|Z),
with equality iff H(Y |X, Z) = 0, that is, when Y is a function of X and Z.
(b) Using the chain rule for mutual information,
I(X, Y ; Z) = I(X; Z) + I(Y ; Z|X) ≥ I(X; Z),
with equality iff I(Y ; Z|X) = 0, that is, when Y and Z are conditionally independent given X.
(c) Using first the chain rule for entropy and then the definition of conditional mutual information,
H(X, Y, Z) − H(X, Y ) = H(Z|X, Y ) = H(Z|X) − I(Y ; Z|X) ≤ H(Z|X) = H(X, Z) − H(X),
with equality iff I(Y ; Z|X) = 0, that is, when Y and Z are conditionally independent given X.
(d) Using the chain rule for mutual information,
I(X; Z|Y ) + I(Z; Y ) = I(X, Y ; Z) = I(Z; Y |X) + I(X; Z),
and therefore
I(X; Z|Y ) = I(Z; Y |X) − I(Z; Y ) + I(X; Z).
We see that this inequality is actually an equality in all cases.
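All four parts can be spot-checked on a randomly drawn joint pmf. The sketch below (the random test and helper names are ours) verifies (a)-(c) as inequalities and (d) as an identity.

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.random((3, 4, 5))           # arbitrary alphabet sizes for X, Y, Z
p /= p.sum()                        # random joint pmf p(x, y, z)

def H(q):
    q = q[q > 0]
    return float(-(q * np.log2(q)).sum())

HXYZ = H(p)
HXY, HXZ, HYZ = H(p.sum(2)), H(p.sum(1)), H(p.sum(0))
HX, HY, HZ = H(p.sum((1, 2))), H(p.sum((0, 2))), H(p.sum((0, 1)))

# (a) H(X, Y | Z) >= H(X | Z)
print(HXYZ - HZ >= HXZ - HZ - 1e-12)
# (b) I(X, Y; Z) >= I(X; Z)
print(HXY + HZ - HXYZ >= HX + HZ - HXZ - 1e-12)
# (c) H(X, Y, Z) - H(X, Y) <= H(X, Z) - H(X)
print(HXYZ - HXY <= HXZ - HX + 1e-12)
# (d) I(X; Z | Y) == I(Z; Y | X) - I(Z; Y) + I(X; Z)  (always an equality)
lhs = HXY + HYZ - HXYZ - HY
rhs = (HXZ + HXY - HXYZ - HX) - (HY + HZ - HYZ) + (HX + HZ - HXZ)
print(abs(lhs - rhs) < 1e-12)
```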