
ECE 776 - Information theory (Spring 2010)

Midterm

Please give well-motivated answers.

Q1 (1 point). Consider random variables X, Y, Z such that Z = X + Y. Show that H(Z) ≤
H(X) + H(Y). When does equality hold?

Sol.: We have
H(Z) = H(X + Y) ≤ H(X, Y) ≤ H(X) + H(Y),
where the first inequality holds because Z is a function of (X, Y). The first inequality is an
equality if X and Y can always be recovered from X + Y (this depends on the alphabets over
which X and Y are defined; for instance, this is the case if X = {0, 1} and Y = {3, 5}). The
second inequality is an equality if and only if X and Y are independent.
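As a quick numerical check (not part of the original solution), the following Python sketch computes H(Z) and H(X) + H(Y) for the example alphabets above, under the illustrative assumption that X and Y are independent and uniform; the helper name entropy is just for this sketch.

```python
from itertools import product
from collections import Counter
from math import log2

def entropy(dist):
    """Shannon entropy in bits of a pmf given as {value: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# Illustrative assumption: X, Y independent and uniform on the alphabets above
pX = {0: 0.5, 1: 0.5}
pY = {3: 0.5, 5: 0.5}

# Distribution of Z = X + Y
pZ = Counter()
for (x, px), (y, py) in product(pX.items(), pY.items()):
    pZ[x + y] += px * py

print(entropy(pZ), entropy(pX) + entropy(pY))
# 2.0 2.0 -- equality, since (X, Y) can be recovered from X + Y here
```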

Q2 (1 point) Consider the joint distribution p(x, y) given by the following table

x\y 0 1
0 1/4 1/4 .
1 1/2 0

Find the best estimator of X given Y (in the sense of minimizing the error probability) and
calculate its probability of error. Then, calculate H(X|Y ) and verify the Fano inequality for
this estimator.

Sol.: The estimator X̂(Y ) that minimizes the probability of error is given by

X̂(0) = 1
X̂(1) = 0.

The probability of error is Pe = p(X = 0, Y = 0) + p(X = 1, Y = 1) = 1/4 + 0 = 1/4.


We have

H(X|Y) = pY(1) H(X|Y = 1) + pY(0) H(X|Y = 0)
       = (1/4) · 0 + (3/4) · H(1/3) = 0.69 bits.

By Fano's inequality,

H(X|Y) ≤ Pe log2(2 − 1) + H(Pe) = 0 + H(1/4) = 0.81,

so indeed 0.69 ≤ 0.81.
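These numbers can be reproduced with a few lines of Python (a sketch added for illustration, not part of the exam); hb is a binary-entropy helper defined here.

```python
from math import log2

def hb(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

# Joint pmf p(x, y) from the table
p = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.5, (1, 1): 0.0}
pY = {y: p[(0, y)] + p[(1, y)] for y in (0, 1)}

# H(X|Y) = sum_y pY(y) * H(X | Y = y)
HXgY = sum(pY[y] * hb(p[(0, y)] / pY[y]) for y in (0, 1) if pY[y] > 0)

Pe = p[(0, 0)] + p[(1, 1)]          # errors of the estimator Xhat(0) = 1, Xhat(1) = 0
fano = Pe * log2(2 - 1) + hb(Pe)    # Fano bound with a binary alphabet for X

print(round(HXgY, 2), round(fano, 2))   # 0.69 <= 0.81
```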

Q3. (1 point) We want to find out if a treasure lies behind door X =1, 2 or 3. Our prior
knowledge is summarized in the distribution pX (1) = 1/2, pX (2) = 1/4 and pX (3) = 1/4.
We are allowed to ask the question: Is X ∈ S (where S is a subset of {1, 2, 3})? The answer
to the question is Y = 1 if X ∈ S and 0 otherwise. Which subset S would you choose to
maximize the mutual information between X and Y (and thus increase our chances of finding
the treasure)? [Hint: Calculate I(X; Y) for different sets.]

Sol.: Let us evaluate

I(X; Y) = H(Y) − H(Y|X)
        = H(Pr[X ∈ S]) − H(Y|X)
        = H(Pr[X ∈ S]) ≤ 1,

where H(Y) = H(Pr[X ∈ S]) because Y is Bernoulli with parameter Pr[X ∈ S], and H(Y|X) = 0
because Y is a deterministic function of X.

Thus, the mutual information is maximized if we choose S = {1} or S = {2, 3}, since in
those cases, we obtain I(X; Y ) = 1. In other words, from the question at hand, we can gather
at most 1 bit about the location of the treasure, and these choices of S allow us to obtain
this much information.
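A short Python sketch (illustrative, not from the exam) that evaluates I(X; Y) = H(Pr[X ∈ S]) for every candidate subset S:

```python
from math import log2
from itertools import combinations

def hb(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

pX = {1: 0.5, 2: 0.25, 3: 0.25}

# Y = 1{X in S} is a function of X, so I(X; Y) = H(Y) = H(Pr[X in S])
for r in (1, 2):
    for S in combinations(pX, r):
        q = sum(pX[x] for x in S)
        print(set(S), round(hb(q), 3))
# S = {1} (or its complement {2, 3}) gives I(X; Y) = 1 bit; all others give 0.811 bits
```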

Q4 (1 point) We are given the joint pmf p(x, y) defined as below

x\y 0 1
0 1/8 1/8 .
1 1/8 5/8

Are the sequences x^4 = 1110 and y^4 = 0111 individually ε-typical with respect to the
marginal distributions p(x) and p(y), respectively, for ε = 0.1? Are they jointly ε-typical
with respect to p(x, y) with ε = 0.1?

Sol.: We have the marginals p(x) = p(y) = (1/4, 3/4), so that H(X) = H(Y) = −(1/4) log2(1/4) −
(3/4) log2(3/4) = 0.81 bits. Moreover, the joint entropy is H(X, Y) = −3 · (1/8) log2(1/8) −
(5/8) log2(5/8) = 1.55 bits.
It is easy to see that x^4 and y^4 are individually typical with respect to p(x) and p(y) for
any ε > 0, since

−(1/4) log2 p(x^4) = −(1/4) log2 p(y^4) = H(X) = H(Y).
To check whether they are jointly typical, we must calculate

−(1/4) log2 p(x^4, y^4) = −(1/4) log2((5/8)^2 · (1/8)^2) = 1.84 > H(X, Y) + ε = 1.55 + 0.1 = 1.65.

Therefore, the sequences are not jointly typical with respect to the given joint distribution.
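The typicality checks can be verified numerically; the sketch below (added for illustration) computes the empirical rates −(1/4) log2 p(x^4), −(1/4) log2 p(y^4) and −(1/4) log2 p(x^4, y^4) and compares them with the entropies.

```python
from math import log2

# Joint pmf p(x, y) from the table
p = {(0, 0): 1/8, (0, 1): 1/8, (1, 0): 1/8, (1, 1): 5/8}
pX = {x: p[(x, 0)] + p[(x, 1)] for x in (0, 1)}
pY = {y: p[(0, y)] + p[(1, y)] for y in (0, 1)}

def H(dist):
    """Entropy in bits of a pmf given as a dict."""
    return -sum(q * log2(q) for q in dist.values() if q > 0)

x4, y4 = (1, 1, 1, 0), (0, 1, 1, 1)
n, eps = 4, 0.1

emp_x = -sum(log2(pX[x]) for x in x4) / n             # empirical rate of x^4
emp_y = -sum(log2(pY[y]) for y in y4) / n             # empirical rate of y^4
emp_xy = -sum(log2(p[xy]) for xy in zip(x4, y4)) / n  # joint empirical rate

print(abs(emp_x - H(pX)) <= eps, abs(emp_y - H(pY)) <= eps)   # True True
print(abs(emp_xy - H(p)) <= eps)                              # False: 1.84 vs 1.55
```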

P1 (2 points) An i.i.d. source X^n is characterized by the pmf

p(x) = 0.5 if x = 1,  0.3 if x = 2,  0.2 if x = 3.

(a) What is the minimal rate required for lossless compression?


(b) Find the codeword lengths for the binary Huffman code and for the Shannon code. Then,
calculate the average codeword lengths in the two cases, and compare them with the previous
point.

(c) Propose a compression scheme that is able to attain the performance limits of point (a).
(d) If X n is processed according to a function g(·), producing Y n = g(X n ), does the rate
necessary to compress Y n increase or decrease? Why?

Sol.: (a) The minimal rate for lossless compression is
R = H(X) = 1.485 bits/source symbol.
(b) For the Shannon code, we have

ℓ(1) = ⌈− log2(0.5)⌉ = 1
ℓ(2) = ⌈− log2(0.3)⌉ = 2
ℓ(3) = ⌈− log2(0.2)⌉ = 3,

so that the average length is L = 0.5 + 0.6 + 0.6 = 1.7 > H(X).
For the Huffman code, using the usual procedure, we get

ℓ(1) = 1
ℓ(2) = 2
ℓ(3) = 2,

so that the average length is L = 0.5 + 0.6 + 0.4 = 1.5 > H(X) (but quite close to H(X)!).
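For reference, the following Python sketch (not part of the exam) reproduces H(X), the Shannon code lengths, and the Huffman code lengths via the standard heap-based merging procedure.

```python
import heapq
from math import log2, ceil

p = {1: 0.5, 2: 0.3, 3: 0.2}

H = -sum(q * log2(q) for q in p.values())            # entropy: 1.485 bits/symbol

shannon = {x: ceil(-log2(q)) for x, q in p.items()}  # Shannon code lengths 1, 2, 3
L_shannon = sum(p[x] * shannon[x] for x in p)        # 1.7 bits/symbol

# Huffman code lengths: repeatedly merge the two least likely groups;
# every merge adds one bit to the codewords of the merged symbols.
heap = [(q, [x]) for x, q in p.items()]
heapq.heapify(heap)
depth = {x: 0 for x in p}
while len(heap) > 1:
    q1, s1 = heapq.heappop(heap)
    q2, s2 = heapq.heappop(heap)
    for x in s1 + s2:
        depth[x] += 1
    heapq.heappush(heap, (q1 + q2, s1 + s2))
L_huffman = sum(p[x] * depth[x] for x in p)          # 1.5 bits/symbol

print(round(H, 3), shannon, L_shannon, depth, round(L_huffman, 2))
```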
(c) We need to code over blocks of n symbols with n large. As seen in class, we can either use
a typicality-based compression scheme or apply the Shannon code to blocks of n symbols (see
lectures or textbook).
(d) Since Y^n = g(X^n) is a function of X^n, we have H(Y) ≤ H(X), so the rate necessary to
compress Y^n cannot increase; it strictly decreases whenever g is not one-to-one on the
support of X.

P2 (2 points) The evolution of a random process Xi ∈ {1, 2, 3, 4} depends on another random
process Yi ∈ {0, 1}, i = 1, 2, ..., n, in the following way. Process Y1, Y2, ..., Yn is a Markov
chain with transition probability p(yk|yk−1) given by

yk−1\yk 0 1
0 0.2 0.8
1 0.4 0.6
while X^n is such that p(x^n|y^n) = ∏_{i=1}^{n} p(xi|yi), where

p(xi|0) = 1/2 for xi ∈ {1, 2} and 0 otherwise,

and

p(xi|1) = 1/2 for xi ∈ {3, 4} and 0 otherwise.
(a) Is process Y n stationary for any initial distribution Pr[Y1 = 1]? If not, find an initial
distribution that makes Y n stationary. Assume this distribution for the following questions.
(b) Calculate the entropy rate of Y n .
(c) Is process X^n stationary? Is it a Markov chain?

(d) Does the entropy rate of X^n exist? Calculate the entropy (not the entropy rate) of
process X^n. Do you expect the entropy rate to be larger or smaller?

Sol.: (a) No, from the theory on Markov chains, Y n is stationary only if the initial distribution
is the stationary distribution:
Pr[Y1 = 1] = 0.8 / (0.8 + 0.4) = 2/3.
(b) We have
H(Y) = H(Y2|Y1) = (1/3) H(Y2|Y1 = 0) + (2/3) H(Y2|Y1 = 1)
     = (1/3) H(0.2) + (2/3) H(0.4)
     = 0.89 bits.
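A small Python check of the stationary distribution and the entropy rate (an illustrative sketch, not from the exam):

```python
from math import log2

def hb(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

# Transition probabilities p(y_k | y_{k-1})
P = {0: {0: 0.2, 1: 0.8}, 1: {0: 0.4, 1: 0.6}}

# Stationary distribution of a two-state chain: pi_1 = p(0->1) / (p(0->1) + p(1->0))
pi1 = P[0][1] / (P[0][1] + P[1][0])
pi = {0: 1 - pi1, 1: pi1}                        # (1/3, 2/3)

# Entropy rate H(Y) = H(Y_2 | Y_1) under the stationary distribution
rate = pi[0] * hb(P[0][1]) + pi[1] * hb(P[1][1])
print(round(pi1, 3), round(rate, 2))             # 0.667 0.89
```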

(c) The process X^n is stationary. To see this, let us calculate

p(x^n) = Σ_{y^n ∈ Y^n} p(y^n) p(x^n|y^n)
       = Σ_{y^n ∈ Y^n} p(y1) ∏_{k=2}^{n} p(yk|yk−1) ∏_{i=1}^{n} p(xi|yi),

which is the same for all possible time shifts, since p(y1) is the stationary distribution.
A process of this form is a hidden Markov model and in general need not be Markov. In this
particular case, however, Xi reveals Yi exactly (Xi ∈ {1, 2} if and only if Yi = 0), so X^n
is also a Markov chain: p(xk|x^{k−1}) = Σ_{yk} p(xk|yk) p(yk|yk−1), with yk−1 determined by
xk−1, depends only on xk−1.
(d) It exists since the process is stationary. To calculate the entropy, we evaluate
pX(1) = Σ_y p(y) p(1|y) = (1/3) · (1/2) = 1/6
pX(2) = (1/3) · (1/2) = 1/6
pX(3) = (2/3) · (1/2) = 1/3
pX(4) = (2/3) · (1/2) = 1/3,

so that
H(X) = 1.91 bits.
We expect the entropy rate to be smaller than the entropy since the process is not i.i.d.
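The marginal pX and the entropy H(X) can be checked with a short Python sketch (added for illustration):

```python
from math import log2

pi = {0: 1/3, 1: 2/3}                      # stationary distribution of Y
pXgY = {0: {1: 0.5, 2: 0.5, 3: 0.0, 4: 0.0},
        1: {1: 0.0, 2: 0.0, 3: 0.5, 4: 0.5}}

# Marginal pX(x) = sum_y pi(y) p(x|y)
pX = {x: sum(pi[y] * pXgY[y][x] for y in pi) for x in (1, 2, 3, 4)}

H = -sum(q * log2(q) for q in pX.values() if q > 0)
print(pX)            # {1: 1/6, 2: 1/6, 3: 1/3, 4: 1/3}
print(round(H, 3))   # 1.918 bits (the 1.91 quoted above), larger than the entropy rate
```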

P3. Consider the channel described by the following p(y|x) with alphabets X = {0, 1} and
Y = {0, 1, e}:

x\y 0 e 1
0 1−α−ε α ε
1 ε α 1−α−ε

with 0 ≤ α, ε ≤ 1 and α + ε ≤ 1.

(a) Calculate I(X; Y ) for a given p(x) (express the mutual information in terms of pX (1) = p)
(Hint: To simplify the derivation, define a variable E = 1 if Y = e and E = 0 otherwise,
and use the fact that E is a function of Y ...).
(b) Calculate the capacity of the channel at hand.
(c) Argue and verify that if we set α = 0 or ε = 0 we recover known results.

Sol.:
(a) As suggested in the hint, define E = 1 if Y = e and E = 0 otherwise, and let
p̃ = Pr[Y = 1 | E = 0] = (p(1 − α − ε) + (1 − p)ε)/(1 − α). Since E is a function of Y,

I(X; Y) = H(Y) − H(Y|X)
        = H(Y, E) − H(Y, E|X)
        = H(E) + H(Y|E) − H(E|X) − H(Y|E, X)
        = H(α) + (1 − α) H(p̃) − H(α) − (1 − α) H(ε/(1 − α))
        = (1 − α) [H(p̃) − H(ε/(1 − α))],

where we used Pr[E = 1] = α for both inputs (so H(E) = H(E|X) = H(α) and H(Y|E = 1) = 0),
and the fact that, given E = 0, the channel acts as a BSC with crossover probability ε/(1 − α).

(b) The mutual information is maximized for p = 1/2, since then p̃ = 1/2 and H(p̃) = 1, so that

C = (1 − α) (1 − H(ε/(1 − α))).

(c) For α = 0 we recover C = 1 − H(ε), the capacity of a binary symmetric channel with
crossover probability ε, whereas for ε = 0 we obtain C = 1 − α, the capacity of a binary
erasure channel with erasure probability α.
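As a numerical sanity check (a sketch added for illustration, assuming the channel table reconstructed above), the following Python code maximizes I(X; Y) over p on a grid and compares the result with the capacity formula, including the two special cases.

```python
from math import log2

def hb(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def mutual_information(p, alpha, eps):
    """I(X; Y) for input Pr[X = 1] = p on the erasure-plus-error channel."""
    # Transition probabilities p(y | x), with y in {0, e, 1}
    W = {0: {'0': 1 - alpha - eps, 'e': alpha, '1': eps},
         1: {'0': eps, 'e': alpha, '1': 1 - alpha - eps}}
    px = {0: 1 - p, 1: p}
    py = {y: sum(px[x] * W[x][y] for x in px) for y in ('0', 'e', '1')}
    HY = -sum(q * log2(q) for q in py.values() if q > 0)
    HYgX = sum(px[x] * -sum(q * log2(q) for q in W[x].values() if q > 0) for x in px)
    return HY - HYgX

alpha, eps = 0.2, 0.1
C_num = max(mutual_information(k / 1000, alpha, eps) for k in range(1001))
C_formula = (1 - alpha) * (1 - hb(eps / (1 - alpha)))
print(round(C_num, 4), round(C_formula, 4))   # the two values agree

# Special cases: alpha = 0 gives the BSC, eps = 0 gives the BEC
print(round(max(mutual_information(k / 1000, 0.0, eps) for k in range(1001)), 4),
      round(1 - hb(eps), 4))
print(round(max(mutual_information(k / 1000, alpha, 0.0) for k in range(1001)), 4),
      round(1 - alpha, 4))
```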
