ECE 542: Information Theory and Coding
Homework 1 Solutions
Problems 2.1, 2.2, 2.6, 2.8, 2.14, 2.21, 2.22, 2.30

1. Coin flips. A fair coin is flipped until the first head occurs. Let X denote the number of flips required.

(a) Find the entropy H(X) in bits. The following expressions may be useful:

$$\sum_{n=0}^{\infty} r^n = \frac{1}{1-r}, \qquad \sum_{n=0}^{\infty} n r^n = \frac{r}{(1-r)^2}.$$

(b) A random variable X is drawn according to this distribution. Find an "efficient" sequence of yes-no questions of the form, "Is X contained in the set S?" Compare H(X) to the expected number of questions required to determine X.

Solution:

(a) The number X of tosses until the first head appears has the geometric distribution with parameter p = 1/2, where P(X = n) = p q^{n-1}, n ∈ {1, 2, ...}. Hence the entropy of X is

$$H(X) = -\sum_{n=1}^{\infty} p q^{n-1} \log\!\left(p q^{n-1}\right)
= -\left[\sum_{n=0}^{\infty} p q^{n} \log p + \sum_{n=0}^{\infty} n p q^{n} \log q\right]
= \frac{-p\log p}{1-q} - \frac{p q \log q}{p^{2}}
= \frac{-p\log p - q\log q}{p} = H(p)/p \ \text{bits.}$$

If p = 1/2, then H(X) = 2 bits.

(b) Intuitively, it seems clear that the best questions are those that have equally likely chances of receiving a yes or a no answer. Consequently, one possible guess is that the most "efficient" series of questions is: Is X = 1? If not, is X = 2? If not, is X = 3? ..., with a resulting expected number of questions equal to $\sum_{n=1}^{\infty} n (1/2^n) = 2$. This should reinforce the intuition that H(X) is a measure of the uncertainty of X. Indeed, in this case the entropy is exactly the same as the average number of questions needed to determine X, and in general E(# of questions) ≥ H(X). This problem has an interpretation as a source coding problem. Let 0 = no, 1 = yes, X = Source, and Y = Encoded Source. Then the set of questions in the above procedure can be written as a collection of (X, Y) pairs: (1, 1), (2, 01), (3, 001), etc. In fact, this intuitively derived code is the optimal (Huffman) code minimizing the expected number of questions.
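A quick numerical sanity check of part (a) and the expected question count in part (b); the variable names and the truncation length N = 200 are arbitrary illustrative choices, since the tail of the geometric sum is negligible for p = 1/2.

```python
import math

# Problem 1 check: X ~ Geometric(p) with P(X = n) = p * q**(n - 1), n = 1, 2, ...
# Truncate the infinite sums at N terms; for p = 1/2 the tail is negligible.
p, q, N = 0.5, 0.5, 200
probs = [p * q ** (n - 1) for n in range(1, N + 1)]

H_X = -sum(pn * math.log2(pn) for pn in probs)                 # entropy in bits
E_questions = sum(n * pn for n, pn in enumerate(probs, 1))     # E[# of questions]

print(H_X, E_questions)  # both come out to ~2.0, matching H(p)/p = 2 bits
```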
2. Entropy of functions. Let X be a random variable taking on a finite number of values. What is the (general) inequality relationship of H(X) and H(Y) if (a) Y = 2^X? (b) Y = cos X?

Solution: Let y = g(x). Then

$$p(y) = \sum_{x:\, g(x) = y} p(x).$$

Consider any set of x's that map onto a single y. For this set,

$$\sum_{x:\, g(x) = y} p(x) \log p(x) \le \sum_{x:\, g(x) = y} p(x) \log p(y) = p(y) \log p(y),$$

since log is a monotone increasing function and p(x) ≤ Σ_{x: g(x)=y} p(x) = p(y). Extending this argument to the entire range of X (and Y), we obtain

$$H(X) = -\sum_{x} p(x) \log p(x) = -\sum_{y} \sum_{x:\, g(x) = y} p(x) \log p(x) \ge -\sum_{y} p(y) \log p(y) = H(Y),$$

with equality iff g is one-to-one with probability one.

(a) Y = 2^X is one-to-one, and hence the entropy, which is just a function of the probabilities (and not the values of the random variable), does not change, i.e., H(X) = H(Y).

(b) Y = cos(X) is not necessarily one-to-one. Hence all that we can say is that H(X) ≥ H(Y), with equality if cosine is one-to-one on the range of X.

6. Zero conditional entropy. Show that if H(Y|X) = 0, then Y is a function of X, i.e., for all x with p(x) > 0, there is only one possible value of y with p(x, y) > 0.

Solution: Zero conditional entropy. Assume that there exists an x, say x_0, and two different values of y, say y_1 and y_2, such that p(x_0, y_1) > 0 and p(x_0, y_2) > 0. Then p(x_0) ≥ p(x_0, y_1) + p(x_0, y_2) > 0, and p(y_1|x_0) and p(y_2|x_0) are not equal to 0 or 1. Thus

$$H(Y|X) = -\sum_{x} p(x) \sum_{y} p(y|x) \log p(y|x) \qquad (2.66)$$
$$\ge p(x_0)\bigl(-p(y_1|x_0)\log p(y_1|x_0) - p(y_2|x_0)\log p(y_2|x_0)\bigr) \qquad (2.67)$$
$$> 0, \qquad (2.68)$$

since −t log t ≥ 0 for 0 ≤ t ≤ 1, and is strictly positive for t not equal to 0 or 1. Therefore the conditional entropy H(Y|X) is 0 if and only if Y is a function of X.

8. World Series. The World Series is a seven-game series that terminates as soon as either team wins four games. Let X be the random variable that represents the outcome of a World Series between teams A and B; possible values of X are AAAA, BABABAB, and BBBAAAA. Let Y be the number of games played, which ranges from 4 to 7. Assuming that A and B are equally matched and that the games are independent, calculate H(X), H(Y), H(Y|X), and H(X|Y).

Solution: World Series. Two teams play until one of them has won 4 games.

There are 2 World Series with 4 games (AAAA, BBBB). Each happens with probability (1/2)^4.
There are 8 = $2\binom{4}{3}$ World Series with 5 games. Each happens with probability (1/2)^5.
There are 20 = $2\binom{5}{3}$ World Series with 6 games. Each happens with probability (1/2)^6.
There are 40 = $2\binom{6}{3}$ World Series with 7 games. Each happens with probability (1/2)^7.

The probability of a 4-game series (Y = 4) is 2(1/2)^4 = 1/8.
The probability of a 5-game series (Y = 5) is 8(1/2)^5 = 1/4.
The probability of a 6-game series (Y = 6) is 20(1/2)^6 = 5/16.
The probability of a 7-game series (Y = 7) is 40(1/2)^7 = 5/16.

$$H(X) = \sum p(x)\log\frac{1}{p(x)} = 2\cdot\tfrac{1}{16}\log 16 + 8\cdot\tfrac{1}{32}\log 32 + 20\cdot\tfrac{1}{64}\log 64 + 40\cdot\tfrac{1}{128}\log 128 = 5.8125$$

$$H(Y) = \sum p(y)\log\frac{1}{p(y)} = \tfrac{1}{8}\log 8 + \tfrac{1}{4}\log 4 + \tfrac{5}{16}\log\tfrac{16}{5} + \tfrac{5}{16}\log\tfrac{16}{5} = 1.924$$

Y is a deterministic function of X, so if you know X there is no randomness in Y. That is, H(Y|X) = 0.

Since H(X) + H(Y|X) = H(X, Y) = H(Y) + H(X|Y), it is easy to determine

$$H(X|Y) = H(X) + H(Y|X) - H(Y) = 3.889.$$
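These three numbers can be confirmed by brute force: enumerate every length-7 win pattern for equally matched, independent games and truncate each at the clinching game. The sketch below is one illustrative way to do this; the helper name clinched_prefix and the dictionary bookkeeping are not part of the solution above.

```python
import math
from itertools import product

# Problem 8 check: every length-7 win pattern occurs with probability (1/2)^7;
# truncating a pattern at the game where one team reaches 4 wins gives the series X.
def clinched_prefix(pattern):
    wins = {'A': 0, 'B': 0}
    for i, g in enumerate(pattern):
        wins[g] += 1
        if wins[g] == 4:
            return ''.join(pattern[:i + 1])

p_x = {}                                  # p(x) for each distinct series outcome x
for pattern in product('AB', repeat=7):
    x = clinched_prefix(pattern)
    p_x[x] = p_x.get(x, 0.0) + 0.5 ** 7   # 2^(7 - len(x)) patterns share prefix x

p_y = {}                                  # p(y), where y = number of games played
for x, p in p_x.items():
    p_y[len(x)] = p_y.get(len(x), 0.0) + p

def entropy_bits(dist):
    return -sum(p * math.log2(p) for p in dist.values())

# ~5.8125, ~1.924, ~3.889; H(X|Y) = H(X) - H(Y) because H(Y|X) = 0
print(entropy_bits(p_x), entropy_bits(p_y), entropy_bits(p_x) - entropy_bits(p_y))
```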
14. Drawing with and without replacement. An urn contains r red, w white, and b black balls. Which has higher entropy, drawing k ≥ 2 balls from the urn with replacement or without replacement? Set it up and show why. (There is both a hard way and a relatively simple way to do this.)

Solution: Drawing with and without replacement. Intuitively, it is clear that if the balls are drawn with replacement, the number of possible choices for the i-th ball is larger, and therefore the conditional entropy is larger. But computing the conditional distributions is slightly involved. It is easier to compute the unconditional entropy.

With replacement. In this case the conditional distribution of each draw is the same for every draw. Thus

$$X_i = \begin{cases} \text{red} & \text{with prob. } \frac{r}{r+w+b} \\ \text{white} & \text{with prob. } \frac{w}{r+w+b} \\ \text{black} & \text{with prob. } \frac{b}{r+w+b} \end{cases} \qquad (2.83)$$

and therefore

$$H(X_i \mid X_{i-1}, \ldots, X_1) = H(X_i) \qquad (2.84)$$
$$= \log(r+w+b) - \frac{r}{r+w+b}\log r - \frac{w}{r+w+b}\log w - \frac{b}{r+w+b}\log b. \qquad (2.85)$$

Without replacement. The unconditional probability of the i-th ball being red is still r/(r+w+b), etc. Thus the unconditional entropy H(X_i) is still the same as with replacement. The conditional entropy H(X_i | X_{i-1}, ..., X_1) is less than the unconditional entropy, and therefore the entropy of drawing without replacement is lower.

21. Data processing. Let X_1 → X_2 → X_3 → ··· → X_n form a Markov chain in this order; i.e., let

$$p(x_1, x_2, \ldots, x_n) = p(x_1)\, p(x_2 \mid x_1) \cdots p(x_n \mid x_{n-1}).$$

Reduce I(X_1; X_2, ..., X_n) to its simplest form.

Solution: Data processing. By the chain rule for mutual information,

$$I(X_1; X_2, \ldots, X_n) = I(X_1; X_2) + I(X_1; X_3 \mid X_2) + \cdots + I(X_1; X_n \mid X_2, \ldots, X_{n-1}). \qquad (2.95)$$

By the Markov property, the past and the future are conditionally independent given the present, and hence all terms except the first are zero. Therefore

$$I(X_1; X_2, \ldots, X_n) = I(X_1; X_2). \qquad (2.96)$$

22. Bottleneck. Suppose a (non-stationary) Markov chain starts in one of n states, necks down to k < n states, and then fans back to m > k states. Thus X_1 → X_2 → X_3, X_1 ∈ {1, 2, ..., n}, X_2 ∈ {1, 2, ..., k}, X_3 ∈ {1, 2, ..., m}.

(a) Show that the dependence of X_1 and X_3 is limited by the bottleneck by proving that I(X_1; X_3) ≤ log k.

(b) Evaluate I(X_1; X_3) for k = 1, and conclude that no dependence can survive such a bottleneck.

Solution: Bottleneck.

(a) From the data processing inequality, and the fact that entropy is maximized by a uniform distribution, we get

$$I(X_1; X_3) \le I(X_1; X_2) = H(X_2) - H(X_2 \mid X_1) \le H(X_2) \le \log k.$$

Thus, the dependence between X_1 and X_3 is limited by the size of the bottleneck; that is, I(X_1; X_3) ≤ log k.

(b) For k = 1, I(X_1; X_3) ≤ log 1 = 0, and since I(X_1; X_3) ≥ 0, we have I(X_1; X_3) = 0. Thus, for k = 1, X_1 and X_3 are independent.

30. Maximum entropy. Find the probability mass function that maximizes the entropy H(X) of a non-negative integer-valued random variable X subject to the constraint E X = Σ_i i p_i = A for a fixed value A > 0, and evaluate this maximum.

Solution: Recall that

$$-\sum_i p_i \log p_i \le -\sum_i p_i \log q_i$$

for any two probability mass functions {p_i} and {q_i} (this is just the non-negativity of relative entropy). Let q_i = α β^i. Then we have that

$$-\sum_i p_i \log p_i \le -\sum_i p_i \log q_i = -\left(\log\alpha \sum_i p_i + \log\beta \sum_i i\, p_i\right) = -\log\alpha - A\log\beta.$$

Notice that the final right-hand-side expression is independent of {p_i}, and that the inequality

$$-\sum_i p_i \log p_i \le -\log\alpha - A\log\beta$$

holds for all α, β such that

$$\sum_{i=0}^{\infty} \alpha\beta^i = 1, \quad \text{i.e.,} \quad \alpha = 1 - \beta.$$

The constraint on the expected value also requires that

$$\sum_{i=0}^{\infty} i\,\alpha\beta^i = A, \quad \text{i.e.,} \quad \frac{\alpha\beta}{(1-\beta)^2} = A.$$

Combining the two constraints we have β/(1 − β) = A, which implies that

$$\beta = \frac{A}{A+1}, \qquad \alpha = \frac{1}{A+1}.$$

Plugging these values into the expression for the maximum entropy,

$$-\log\alpha - A\log\beta = \log(A+1) - A\log\frac{A}{A+1} = (A+1)\log(A+1) - A\log A.$$

The general form of the distribution,

$$p_i = \alpha\beta^i = \frac{1}{A+1}\left(\frac{A}{A+1}\right)^i,$$

can be obtained either by guessing or by Lagrange multipliers, where

$$F(p, \lambda_1, \lambda_2) = -\sum_i p_i \log p_i + \lambda_1\left(\sum_i p_i - 1\right) + \lambda_2\left(\sum_i i\, p_i - A\right)$$

is the function whose gradient we set to 0.

Many of you used Lagrange multipliers but failed to argue that the result obtained is a global maximum. An argument similar to the one above should have been used. On the other hand, one could simply argue that since −H(p) is convex, it has only one local minimum and no local maxima, and therefore the Lagrange multiplier method actually gives the global maximum of H(p).
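A quick numerical sanity check of this result: for an illustrative choice of mean such as A = 3, the pmf p_i = (1/(A+1))(A/(A+1))^i should have mean A and entropy (A+1) log(A+1) − A log A. The truncation length N below is an arbitrary illustration parameter.

```python
import math

# Problem 30 check: p_i = alpha * beta**i on i = 0, 1, 2, ... with the values
# derived above should have mean A and entropy (A+1) log(A+1) - A log A (in bits here).
A, N = 3.0, 2000                      # illustrative mean; truncate the sums at N terms
alpha, beta = 1 / (A + 1), A / (A + 1)

p = [alpha * beta ** i for i in range(N)]

mean = sum(i * pi for i, pi in enumerate(p))
entropy = -sum(pi * math.log2(pi) for pi in p)
closed_form = (A + 1) * math.log2(A + 1) - A * math.log2(A)

print(mean)                           # ~3.0, i.e. the constraint E[X] = A is met
print(entropy, closed_form)           # both ~3.245 bits for A = 3; the two agree
```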
