Midterm
Sol.: We have
H(Z) = H(X + Y ) ≤ H(X, Y ) ≤ H(X) + H(Y ).
The first inequality holds because Z = X + Y is a function of (X, Y); it holds with equality whenever X and Y can always be recovered from X + Y. Whether this is possible depends on the alphabets of X and Y: for instance, it is the case for X = {0, 1} and Y = {3, 5}, since each value of X + Y then corresponds to a unique pair (X, Y). The second inequality holds with equality if X and Y are
independent.
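As a quick numeric check of this chain of (in)equalities, a short Python sketch can be used; the marginals of X and Y below are arbitrary illustrative choices, not values taken from the exam.

```python
from math import log2
from itertools import product

px = {0: 0.3, 1: 0.7}   # assumed marginal of X (illustrative)
py = {3: 0.6, 5: 0.4}   # assumed marginal of Y (illustrative)

def H(dist):
    # entropy in bits of a distribution given as {value: probability}
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# joint of (X, Y) under independence, and the induced law of Z = X + Y
pxy = {(x, y): px[x] * py[y] for x, y in product(px, py)}
pz = {}
for (x, y), p in pxy.items():
    pz[x + y] = pz.get(x + y, 0.0) + p

# every value of X + Y identifies (X, Y), so all three entropies coincide
print(H(pz), H(pxy), H(px) + H(py))
```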
Q2 (1 point) Consider the joint distribution p(x, y) given by the following table
x\y     0      1
0       1/4    1/4
1       1/2    0
Find the best estimator of X given Y (in the sense of minimizing the error probability) and
calculate its probability of error. Then, calculate H(X|Y ) and verify the Fano inequality for
this estimator.
Sol.: The estimator X̂(Y ) that minimizes the probability of error is given by
X̂(0) = 1
X̂(1) = 0.
The probability of error is
Pe = Pr[Y = 0] Pr[X = 0 | Y = 0] + Pr[Y = 1] Pr[X = 1 | Y = 1] = (3/4) · (1/3) + (1/4) · 0 = 1/4.
Moreover,
H(X | Y) = Pr[Y = 0] H(X | Y = 0) + Pr[Y = 1] H(X | Y = 1) = (3/4) H(1/3) + (1/4) · 0 ≈ 0.69 bits.
By the Fano inequality, H(Pe) + Pe log2(|X| − 1) = H(1/4) + 0 ≈ 0.81 ≥ H(X | Y) ≈ 0.69, which verifies the bound for this estimator.
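These numbers can be double-checked with a short Python sketch over the given joint table:

```python
from math import log2

pxy = {(0, 0): 1/4, (0, 1): 1/4, (1, 0): 1/2, (1, 1): 0.0}
py = {y: pxy[(0, y)] + pxy[(1, y)] for y in (0, 1)}

def H(ps):
    # entropy in bits of a probability vector
    return -sum(p * log2(p) for p in ps if p > 0)

pe = 0.0   # error probability of the MAP estimator X_hat(Y)
hxy = 0.0  # conditional entropy H(X | Y)
for y in (0, 1):
    post = [pxy[(x, y)] / py[y] for x in (0, 1)]   # p(x | y)
    pe += py[y] * (1 - max(post))
    hxy += py[y] * H(post)

fano = H([pe, 1 - pe]) + pe * log2(2 - 1)   # |X| = 2, so the second term is 0
print(pe, hxy, fano)   # 0.25, ~0.69, ~0.81: H(X | Y) <= Fano bound
```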
Q3. (1 point) We want to find out if a treasure lies behind door X =1, 2 or 3. Our prior
knowledge is summarized in the distribution pX (1) = 1/2, pX (2) = 1/4 and pX (3) = 1/4.
We are allowed to ask the question: Is X ∈ S (where S is a subset of {1, 2, 3})? The answer
to the question is Y = 1 if X ∈ S and 0 otherwise. Which subset S would you choose
to maximize the mutual information between X and Y (and thus increasing our chances of
finding the treasure)? [Hint: Calculate I(X; Y ) for different sets].
Sol.: For any choice of S, the answer Y is a deterministic function of X, so I(X; Y) = H(Y) − H(Y|X) = H(Y) = H(Pr[X ∈ S]) ≤ 1 bit. Thus, the mutual information is maximized if we choose S = {1} or S = {2, 3}, since in those cases Pr[X ∈ S] = 1/2 and I(X; Y) = 1 bit. In other words, from the question at hand we can gather at most 1 bit about the location of the treasure, and these choices of S allow us to obtain this much information.
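A short sketch that evaluates I(X; Y) = H(Pr[X ∈ S]) for every nontrivial subset S confirms this:

```python
from math import log2
from itertools import combinations

px = {1: 1/2, 2: 1/4, 3: 1/4}

def Hb(q):
    # binary entropy in bits
    return 0.0 if q in (0.0, 1.0) else -q * log2(q) - (1 - q) * log2(1 - q)

# Y = 1{X in S} is a deterministic function of X, so I(X; Y) = H(Y)
for r in (1, 2):
    for S in combinations(px, r):
        q = sum(px[x] for x in S)   # Pr[Y = 1] = Pr[X in S]
        print(set(S), round(Hb(q), 3))
# S = {1} (equivalently its complement {2, 3}) attains I(X; Y) = 1 bit
```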
x\y     0      1
0       1/8    1/8
1       1/8    5/8
Are the sequences x^4 = 1110 and y^4 = 0111 individually ε-typical with respect to the marginal distributions p(x) and p(y), respectively, for ε = 0.1? Are they jointly ε-typical with respect to p(x, y) with ε = 0.1?
Sol: We have the marginals p(x) = p(y) = (1/4, 3/4) so that H(X) = H(Y ) = −1/4 log2 1/4−
3/4 log2 3/4 = 0.81 bits. Moreover, the joint entropy is H(X, Y ) = −3/8 log2 1/8−5/8 log2 5/8 =
1.55 bits.
It is easy to see that x^4, y^4 are individually typical with respect to p(x) and p(y) for any ε, since
−(1/4) log2 p(x^4) = −(1/4) log2 p(y^4) = H(X) = H(Y).
To check whether they are jointly typical, we must calculate
−(1/4) log2 p(x^4, y^4) = −(1/4) log2 [(1/8)^2 · (5/8)^2] = 1.84 > H(X, Y) + ε = 1.55 + 0.1.
Therefore, the sequences are not jointly typical with respect to the given joint distribution.
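A short sketch verifying the typicality computations above:

```python
from math import log2

px = {0: 1/4, 1: 3/4}
pxy = {(0, 0): 1/8, (0, 1): 1/8, (1, 0): 1/8, (1, 1): 5/8}

x4 = [1, 1, 1, 0]
y4 = [0, 1, 1, 1]
n, eps = 4, 0.1

Hx = -sum(p * log2(p) for p in px.values())      # = H(Y) = 0.81 bits
Hxy = -sum(p * log2(p) for p in pxy.values())    # = 1.55 bits

rate_x = -sum(log2(px[a]) for a in x4) / n
rate_xy = -sum(log2(pxy[(a, b)]) for a, b in zip(x4, y4)) / n
print(rate_x, Hx)           # 0.811 = H(X): individually typical for any eps
print(rate_xy, Hxy + eps)   # 1.839 > 1.649: not jointly typical
```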
(c) Propose a compression scheme that is able to attain the performance limits of point (a).
(d) If X n is processed according to a function g(·), producing Y n = g(X n ), does the rate
necessary to compress Y n increase or decrease? Why?
Sol.: (a) It is
R = H(X) = 1.485 bits/source symbol.
(b) For the Shannon code, the lengths are ℓ(i) = ⌈log2(1/p(i))⌉, which gives
ℓ(1) = 1
ℓ(2) = 2
ℓ(3) = 3,
so that the average length is L = 0.5 + 0.6 + 0.6 = 1.7 > H(X).
For Huffman, using the usual procedure, we get
ℓ(1) = 1
ℓ(2) = 2
ℓ(3) = 2,
so that the average length is L = 0.5 + 0.6 + 0.4 = 1.5 > H(X) (but quite close to H(X)!).
(c) We need to code over blocks of n symbols with n large. As seen in class, we can either use a typicality-based compression scheme or apply a Shannon code to blocks of n symbols (see the lectures or the textbook).
(d) Since H(Y) = H(g(X)) ≤ H(X), the rate needed to compress Y^n cannot increase; it strictly decreases unless g is invertible on the support of X.
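A short sketch verifying the entropy and the average code lengths above, assuming the source distribution p = (0.5, 0.3, 0.2) implied by the arithmetic in this solution:

```python
from math import ceil, log2
import heapq

p = {1: 0.5, 2: 0.3, 3: 0.2}   # assumed distribution, inferred from the solution

H = -sum(q * log2(q) for q in p.values())
shannon = {s: ceil(log2(1 / q)) for s, q in p.items()}   # Shannon code lengths

# Huffman code lengths via repeated merging of the two least likely nodes
heap = [(q, [s]) for s, q in p.items()]
heapq.heapify(heap)
length = {s: 0 for s in p}
while len(heap) > 1:
    q1, s1 = heapq.heappop(heap)
    q2, s2 = heapq.heappop(heap)
    for s in s1 + s2:
        length[s] += 1
    heapq.heappush(heap, (q1 + q2, s1 + s2))

print(H)                                   # ~1.485 bits
print(sum(p[s] * shannon[s] for s in p))   # 1.7 (Shannon)
print(sum(p[s] * length[s] for s in p))    # 1.5 (Huffman)
```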
and
p(xi | yi = 1) = 1/2 for xi ∈ {3, 4}, and 0 otherwise.
(a) Is process Y n stationary for any initial distribution Pr[Y1 = 1]? If not, find an initial
distribution that makes Y n stationary. Assume this distribution for the following questions.
(b) Calculate the entropy rate of Y n .
(c) Is the process X^n stationary? Is it a Markov chain?
(d) Does the entropy rate of X^n exist? Calculate the entropy (not the entropy rate) of the process X^n. Do you expect the entropy rate to be larger or smaller?
Sol.: (a) No, from the theory on Markov chains, Y n is stationary only if the initial distribution
is the stationary distribution:
Pr[Y1 = 1] = 0.8 / (0.8 + 0.4) = 2/3.
(b) We have
H(Y) = H(Y2 | Y1) = (1/3) H(Y2 | Y1 = 0) + (2/3) H(Y2 | Y1 = 1)
= (1/3) H(0.2) + (2/3) H(0.4)
= 0.89 bits.
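A short sketch checking the stationary distribution and the entropy rate, assuming the transition probabilities implied by this solution (0 → 1 with probability 0.8, 1 → 0 with probability 0.4):

```python
from math import log2

a, b = 0.8, 0.4                      # assumed Pr[1|0] and Pr[0|1]
pi1 = a / (a + b)                    # stationary Pr[Y = 1] = 2/3
pi0 = 1 - pi1

def Hb(q):
    # binary entropy in bits
    return 0.0 if q in (0.0, 1.0) else -q * log2(q) - (1 - q) * log2(1 - q)

rate = pi0 * Hb(a) + pi1 * Hb(b)     # entropy rate H(Y2 | Y1)
print(pi1, rate)                     # 0.666..., ~0.89 bits
```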
(c) The process X n is stationary, but is not a Markov chain. To see this, let us calculate
p(x^n) = Σ_{y^n ∈ Y^n} p(y^n) p(x^n | y^n) = Σ_{y^n ∈ Y^n} p(y_1) ∏_{k=2}^{n} p(y_k | y_{k−1}) ∏_{i=1}^{n} p(x_i | y_i),
which is the same for all possible time shifts, but does not factorize as for a Markov chain
(it is in fact a Hidden Markov Model).
(d) It exists since the process is stationary. To calculate the entropy, we evaluate
pX(1) = Σ_y p(y) p(1 | y) = (1/3) · (1/2) = 1/6
pX(2) = (1/3) · (1/2) = 1/6
pX(3) = (2/3) · (1/2) = 1/3
pX(4) = (2/3) · (1/2) = 1/3,
so that
H(X) = 2 · (1/6) log2 6 + 2 · (1/3) log2 3 ≈ 1.92 bits.
We expect the entropy rate to be smaller than the entropy since the process is not i.i.d.
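A short sketch for the marginal of Xi and its entropy, assuming (as in this solution) that p(xi | yi = 0) is uniform on {1, 2}, p(xi | yi = 1) is uniform on {3, 4}, and Pr[Y = 1] = 2/3:

```python
from math import log2

py = {0: 1/3, 1: 2/3}                                   # stationary law of Y
px_given_y = {0: {1: 0.5, 2: 0.5}, 1: {3: 0.5, 4: 0.5}}  # assumed emissions

px = {}
for y, cond in px_given_y.items():
    for x, q in cond.items():
        px[x] = px.get(x, 0.0) + py[y] * q

H = -sum(q * log2(q) for q in px.values())
print(px)   # {1: 1/6, 2: 1/6, 3: 1/3, 4: 1/3}
print(H)    # ~1.92 bits; the entropy rate of X^n is smaller
```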
P3. Consider the channel described by the following p(y|x) with alphabets X = {0, 1} and
Y = {0, 1, e}
x\y     0              e      1
0       1 − α − ε      ε      α
1       α              ε      1 − α − ε
with 0 ≤ α, ε ≤ 1 and α + ε ≤ 1.
(a) Calculate I(X; Y ) for a given p(x) (express the mutual information in terms of pX (1) = p)
(Hint: To simplify the derivation, define a variable E = 1 if Y = e and E = 0 otherwise,
and use the fact that E is a function of Y ...).
(b) Calculate the capacity of the channel at hand.
(c) Argue and verify that if we set α = 0 or ε = 0 we recover known results.
Sol.:
(a) We have I(X; Y) = H(Y) − H(Y|X). Define E = 1 if Y = e and E = 0 otherwise. Since Pr[Y = e | X = x] = ε for both inputs, E is independent of X, and since E is a function of Y,
H(Y) = H(Y, E) = H(E) + H(Y | E) = H(ε) + (1 − ε) H(Y | E = 0)
H(Y | X) = H(E | X) + H(Y | E, X) = H(ε) + (1 − ε) H(α / (1 − ε)).
Given E = 0, the channel behaves as a binary symmetric channel with crossover probability α / (1 − ε), so with pX(1) = p,
Pr[Y = 1 | E = 0] = (p (1 − α − ε) + (1 − p) α) / (1 − ε)
and therefore
I(X; Y) = (1 − ε) [ H( (p (1 − α − ε) + (1 − p) α) / (1 − ε) ) − H( α / (1 − ε) ) ].
(b) The mutual information is maximized by p = 1/2, which yields the capacity
C = (1 − ε) (1 − H( α / (1 − ε) )).
(c) For α = 0 we recover the capacity of a binary erasure channel, C = 1 − ε, whereas for ε = 0 we obtain the capacity of a binary symmetric channel, C = 1 − H(α).
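A short sketch comparing this closed-form capacity with a brute-force maximization of I(X; Y) over p, for one arbitrary choice of α and ε:

```python
from math import log2

def Hb(q):
    # binary entropy in bits
    return 0.0 if q <= 0 or q >= 1 else -q * log2(q) - (1 - q) * log2(1 - q)

def mutual_info(p, alpha, eps):
    # I(X; Y) for the channel above with Pr[X = 1] = p
    py_x = {0: {'0': 1 - alpha - eps, 'e': eps, '1': alpha},
            1: {'0': alpha, 'e': eps, '1': 1 - alpha - eps}}
    px = {0: 1 - p, 1: p}
    py = {y: sum(px[x] * py_x[x][y] for x in px) for y in ('0', 'e', '1')}
    hy = -sum(q * log2(q) for q in py.values() if q > 0)
    hy_x = sum(px[x] * -sum(q * log2(q) for q in py_x[x].values() if q > 0)
               for x in px)
    return hy - hy_x

alpha, eps = 0.1, 0.2                       # arbitrary test values
brute = max(mutual_info(k / 1000, alpha, eps) for k in range(1, 1000))
closed = (1 - eps) * (1 - Hb(alpha / (1 - eps)))
print(brute, closed)                        # agree up to the grid resolution
# alpha = 0 gives C = 1 - eps (BEC); eps = 0 gives C = 1 - Hb(alpha) (BSC)
```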