Problem Q1.1: Consider a random symbol X with the symbol alphabet {1, 2, . . . , M} and a pmf {p1, p2, . . . , pM}. This problem concerns the relationship between the entropy H(X) and the probability p1 of the first symbol. Let Y be a random symbol that is 1 if X = 1 and 0 otherwise. For parts (a) through (d), consider M and p1 to be fixed.

(a) Express H(Y) in terms of the binary entropy function, Hb(α) = −α·log(α) − (1−α)·log(1−α).
Soln: Y is 1 or 0 with probabilities p1 and 1−p1 respectively, so H(Y) = −p1·log(p1) − (1−p1)·log(1−p1). Thus H(Y) = Hb(p1) = Hb(1−p1).
(b) What is the conditional entropy H(X|Y=1)?
Soln: Given Y=1, X = 1 with probability 1, so H(X|Y=1) = 0.
(c) Give a good upper bound to H(X|Y=0) and show how this bound can be met with equality by appropriate choice of p2, . . . , pM. Use this to upper bound H(X|Y).

Soln: Given Y=0, X=1 has probability 0, so there are M−1 elements with non-zero probability. The maximum entropy for an alphabet of M−1 terms is log(M−1), so H(X|Y=0) ≤ log(M−1). Finally, Pr(X=j | X≠1) = pj/(1−p1), so this upper bound on H(X|Y=0) is achieved when p2 = p3 = · · · = pM. Combining this with part (b),

H(X|Y) = p1·H(X|Y=1) + (1−p1)·H(X|Y=0) ≤ (1−p1)·log(M−1).

(d) Give a good upper bound for H(X) and show how this bound can be met with equality by appropriate choice of p2, . . . , pM.

Soln: Note that H(XY) = H(Y) + H(X|Y) ≤ Hb(p1) + (1−p1)·log(M−1), and this is met with equality for p2 = · · · = pM. There are now two equally good approaches. One is to note that H(XY) = H(X) + H(Y|X). Since Y is uniquely specified by X, H(Y|X) = 0, so

H(X) = H(XY) ≤ Hb(p1) + (1−p1)·log(M−1),   (1)

which is met with equality when p2 = p3 = · · · = pM. The other approach is to observe that H(X) ≤ H(XY), which leads again to the bound in (1), but with a slightly more tedious demonstration that equality is met for p2 = · · · = pM. This is the Fano bound of information theory; it is useful when p1 is very close to 1 and plays a key role in the noisy channel coding theorem.

(e) For the same value of M as before, let p1, . . . , pM be arbitrary and let pmax be max{p1, . . . , pM}. Is your upper bound in (d) still valid if you replace p1 by pmax? Explain.

Soln: The same bound applies to each symbol, i.e., it holds with p1 replaced by pj for any j, 1 ≤ j ≤ M. Thus it also applies to pmax.
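As a quick numerical sanity check of the bound in (1), the following Python sketch (the helper names Hb and H, and the choice M = 5, p1 = 0.6, are ours) evaluates both sides for the equality case p2 = · · · = pM:

```python
import math

def Hb(p):
    """Binary entropy in bits, with the convention Hb(0) = Hb(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def H(pmf):
    """Entropy in bits of a pmf given as a list of probabilities."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

M, p1 = 5, 0.6
# Equality case of (1): the remaining probability spread uniformly over the other M-1 symbols.
pmf = [p1] + [(1 - p1) / (M - 1)] * (M - 1)
bound = Hb(p1) + (1 - p1) * math.log2(M - 1)
print(H(pmf), bound)   # both print ~1.771, so the bound is met with equality here
```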
Problem Q1.2: Consider a DMS with i.i.d. symbols X1, X2, . . . taking values in X = {a, b, c, d, e}, with probabilities {0.35, 0.25, 0.2, 0.1, 0.1} respectively.

(a) Compute Lmin, the expected codeword length of an optimal variable-length prefix-free code for X.

Soln: Applying the Huffman algorithm, one gets the respective codeword lengths {2, 2, 2, 3, 3} (e.g., the codewords {01, 10, 11, 000, 001}), leading to an expected length of Lmin = 0.35·2 + 0.25·2 + 0.2·2 + 0.1·3 + 0.1·3 = 2.2.

(b) Let Lmin^(2) be the average codeword length for an optimal code over X^2, Lmin^(3) that for X^3, and so on. True or False: for a general DMS, (1/2)·Lmin^(2) ≤ Lmin; explain.

Soln: True: one can define the encoding C2, which maps any (x1, x2) ∈ X^2 into the codeword C2(x1, x2) = C(x1) ∘ C(x2), where C is an optimal prefix-free code over X with codeword lengths L(·), and ∘ denotes concatenation. Then C2 is clearly prefix-free, and

E[L_C2] = Σ_{xi, xj ∈ X} (L(xi) + L(xj)) P{xi, xj} = Σ_{xi ∈ X} L(xi) P{xi} + Σ_{xj ∈ X} L(xj) P{xj} = 2·Lmin.

Thus we get the upper bound Lmin^(2) ≤ 2·Lmin, i.e., (1/2)·Lmin^(2) ≤ Lmin.

(c) Show that Lmin^(3) ≤ Lmin^(2) + Lmin.

Soln: In a similar way as in (b), decomposing X^3 = X^2 × X and concatenating optimal prefix-free codes for X^2 and X, one gets Lmin^(3) ≤ Lmin^(2) + Lmin.
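For part (a), the Huffman computation can be checked with the short Python sketch below (standard library only; the function name huffman_lengths is ours):

```python
import heapq

def huffman_lengths(pmf):
    """Optimal prefix-free codeword lengths via Huffman's algorithm."""
    # Heap entries: (probability, tie-breaking counter, symbol indices in the subtree).
    heap = [(p, i, [i]) for i, p in enumerate(pmf)]
    heapq.heapify(heap)
    lengths = [0] * len(pmf)
    counter = len(pmf)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:          # each merge adds one bit to every symbol in the merged subtree
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

pmf = [0.35, 0.25, 0.2, 0.1, 0.1]
lengths = huffman_lengths(pmf)
print(lengths)                                         # [2, 2, 2, 3, 3]
print(sum(p * l for p, l in zip(pmf, lengths)))        # 2.2 = Lmin
```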
Problem Q1.3: In this problem, we try to construct a code which reduces the data rate at the cost of some amount of distortion in its reconstruction. Consider a binary source X1, X2, . . . that is i.i.d. Bernoulli(1/2) distributed. Obviously, a lossless source code would need 1 bit per source symbol to encode the source, allowing perfect reconstruction.
A lossy source code is defined as follows. An encoder map takes a source string X_1^n and encodes it into nR bits, and a decoder reconstructs the source as X̂_1^n. The goal is to guarantee that for any ε > 0,

Pr{ (1/n)·|X_1^n − X̂_1^n| > d + ε } → 0  as n → ∞,   (2)

where |X_1^n − X̂_1^n| denotes the number of positions in which the source string and its reconstruction differ. The parameter d, which indicates the fraction of symbols that are allowed to be wrong, is often called a fidelity constraint. The lossless code we learned in class corresponds to the case d = 0.
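As a concrete reading of (2), the relevant distortion is simply the fraction of positions where the source string and the reconstruction disagree; a minimal Python sketch (the function name is ours):

```python
def distortion(x, x_hat):
    """Fraction of symbols in which x and its reconstruction x_hat differ."""
    assert len(x) == len(x_hat)
    return sum(a != b for a, b in zip(x, x_hat)) / len(x)

# Example: 1 disagreement out of 4 symbols gives distortion 0.25.
print(distortion([0, 1, 1, 0], [0, 1, 0, 0]))   # 0.25
```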
(a) Find the minimum rate of the lossy source code for the binary source above at d = 1/2, i.e., the reconstruction can have half of its symbols wrong in the sense of (2).

Soln: By encoding all possible sequences into the all-zeros sequence (only one codeword for any n), one satisfies condition (2) with d = 1/2 (by the Law of Large Numbers). Thus the rate is zero. Note that one can do slightly better by encoding any sequence with a majority of zeros into the all-zeros sequence, and any sequence with a majority of ones into the all-ones sequence. That way the rate is still zero, and the probability in (2) is exactly zero for any n.

(b) To achieve d = 1/4, compare the following two approaches, both satisfying the fidelity constraint. Compute the average rate of the two codes.

(b) 1) For a length-2n string, take the first n symbols, send them uncoded, and ignore the rest. The decoder reconstructs the first n symbols exactly and simply sets X̂_i = 0 for i = n+1, . . . , 2n.

Soln: All possible sequences occurring in the first n positions have to be perfectly encoded (meaning with d = 0), and since the symbols are i.i.d. Bernoulli(1/2), the average rate is R = n·H(1/2)/(2n) = 1/2.

(b) 2) For a length-2n string, divide it into 2-letter segments, each taking the value 00, 01, 10, or 11. Construct a new binary string Z_1^n of length n: set Zi = 1 if the i-th segment X_{2i−1}^{2i} = 11, and Zi = 0 otherwise. The encoder applies a lossless code to Z and transmits it. The decoder reconstructs Z, and for each Zi it reconstructs the i-th segment of X: if Zi = 1, the reconstruction is X̂_{2i−1}^{2i} = 11; otherwise X̂_{2i−1}^{2i} = 00.

Soln: We still have n i.i.d. symbols (per block of 2n source symbols) that have to be perfectly encoded, but now with a Bernoulli(1/4) distribution (where 1/4 is the probability of a one). So the average rate becomes R = H(1/4)/2 ≈ 0.406.
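A quick Monte Carlo sketch in Python (the variable names are ours; the printed rates are the analytical values computed above, assuming Z is compressed at its entropy) confirming that both approaches of part (b) meet the 1/4 fidelity constraint while having different rates:

```python
import math
import random

def Hb(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

n = 100_000
x = [random.randint(0, 1) for _ in range(2 * n)]

# Approach (b)1: send the first n symbols uncoded, reconstruct the last n as 0.
x_hat1 = x[:n] + [0] * n

# Approach (b)2: Zi = 1 iff the i-th pair is 11; reconstruct each pair as 11 or 00.
z = [int(x[2 * i] == 1 and x[2 * i + 1] == 1) for i in range(n)]
x_hat2 = [bit for zi in z for bit in ((1, 1) if zi else (0, 0))]

dist = lambda a, b: sum(u != v for u, v in zip(a, b)) / len(a)
print(dist(x, x_hat1), dist(x, x_hat2))   # both empirical distortions close to 1/4
print(1 / 2, Hb(1 / 4) / 2)               # rates: 0.5 versus ~0.406 bits/source symbol
```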
(c) (bonus) Do you think the better one of part (b) is optimal? If not, briefly explain your idea to improve over it.
Soln: It is possible to improve the idea suggested in (b) 2), by dividing, for example, the
strings into 3-letter segments. We then map any 3-sequence with a majority of 0s to 0, and any 3-sequence with a majority of 1s to 1. The 1/4 fidelity constraint is satisfied (on average, one symbol out of 4 is wrong), and for a string of length 3n we have to encode a sequence of length n with i.i.d. Bernoulli(1/2) symbols, leading to an average rate R = n·H(1/2)/(3n) = 1/3.

However, one can do better. Consider T_n(B(1/2)), the type class of the Bernoulli(1/2) distribution. This set is of asymptotic size 2^n (more precisely, log|T_n(B(1/2))|/n → 1). For any ε > 0, we now pick K = 2^{n(1−H(1/4)+ε)} sequences Y_1, . . . , Y_K uniformly at random among the 2^n possible sequences. Then, for a given source sequence y, we transmit only the index of the Y_i which has minimal Hamming distance to y, leading to a rate R = 1 − H(1/4) + ε. The closest Y_i is declared as the reconstruction, and we claim that this satisfies a fidelity constraint of 1/4. Indeed, the volume of a Hamming ball of normalized radius 1/4 is asymptotically 2^{nH(1/4)}, so for any i

P{d(y, Y_i) ≤ 1/4} ≈ 2^{nH(1/4)} / 2^n,

where d(·,·) denotes the fraction of positions in which two sequences differ. Therefore

P{∃ i s.t. d(y, Y_i) ≤ 1/4} = 1 − P{∀ i, d(y, Y_i) > 1/4}
≈ 1 − (1 − 2^{nH(1/4)}/2^n)^{2^{nR}}
≥ 1 − e^{−2^{nR} · 2^{n(H(1/4)−1)}}
= 1 − e^{−2^{n(H(1/4)−1+R)}}
= 1 − e^{−2^{nε}},

where the last inequality uses (1 − x)^k ≤ e^{−xk}. This shows that any rate larger than 1 − H(1/4) can be achieved, and it turns out that this bound is actually the best possible one (cf. the Rate Distortion Theorem).
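For comparison, a few lines of Python (the binary entropy helper is ours) evaluating the rates discussed above against the rate-distortion limit R(1/4) = 1 − H(1/4):

```python
import math

Hb = lambda p: -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(Hb(0.25) / 2)    # pairing scheme of (b) 2): ~0.406 bits/symbol
print(1 / 3)           # 3-letter majority scheme: ~0.333 bits/symbol
print(1 - Hb(0.25))    # rate-distortion limit 1 - H(1/4): ~0.189 bits/symbol
```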
Cite as: Robert Gallager, course materials for 6.450 Principles of Digital Communications I, Fall 2006. MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY].