Problem Sheet 1
(Most questions are from Cover & Thomas; the corresponding question numbers (as in the 1st ed.) are given in brackets at the start of each question.)
1. (b) Ask if x = 1, 2, 3, … in turn, i.e., ask the following questions:
Is x = 1?
If not, is x = 2?
If not, is x = 3?
…
Expected number of questions is ∑_{n=1}^∞ n 2^{−n} = (1/2)/(1 − 1/2)² = 2.
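As a quick numerical check of this sum, a small Python sketch (the truncation point 60 is arbitrary; the tail beyond it is negligible):

    # check that sum over n >= 1 of n * 2^(-n) equals (1/2)/(1 - 1/2)^2 = 2
    partial_sum = sum(n * 2.0 ** -n for n in range(1, 60))
    closed_form = 0.5 / (1 - 0.5) ** 2
    print(partial_sum, closed_form)   # both are 2 up to floating-point rounding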
2. H(x,y) = H(x) + H(y|x) = H(y) + H(x|y), but H(y|x) = 0 since y is a function of x, so H(y) = H(x) − H(x|y) ≤ H(x), with equality iff H(x|y) = 0, which is true only if x is a function of y, i.e. if y is a one-to-one function of x for every value of x with p(x) > 0. Hence
(a) H(y) ≤ H(x) because, for example, (−1)² = 1² (see the sketch below);
(b) H(y) = H(x).
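A concrete numerical illustration of 2(a), taking x uniform on {−1, 0, 1} and y = x² so that 1 and −1 collide (the specific distribution is my choice, not part of the question):

    from math import log2

    px = {-1: 1/3, 0: 1/3, 1: 1/3}          # x uniform on {-1, 0, 1}
    py = {}
    for x, p in px.items():                 # y = x^2 merges x = 1 and x = -1
        py[x * x] = py.get(x * x, 0) + p

    H = lambda d: -sum(p * log2(p) for p in d.values() if p > 0)
    print(H(px), H(py))                     # about 1.585 bits vs 0.918 bits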
3. Maximum is log n iff all elements of p are equal. Minimum is 0 iff only one element of p is non-zero; there are n possible elements that this could be.
4. (a) and (b) are straightforward calculus: easiest to convert the logs to base e first. For the others, assume ½ < p < 1 for convenience (other half follows by symmetry). Since H′′(p) < 0, H(p) is concave and so lies above the straight line 2 − 2p defined in (c) (see the numerical check below). … single maximum. Since D(½) = D(1) = 0 we must have D(p) > 0 for ½ < p < 1. At p = ½ the bound in (e) has the same value and first two derivatives as H(p). For ½ < p < 1 its second derivative is greater than H′′(p) and so the bound follows.
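A numerical spot-check of the claim about the line in (c); this is only a sketch, and only the 2 − 2p chord is checked since the bounds in the other parts are not restated here:

    from math import log2

    def H(p):                                # binary entropy in bits
        return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

    # H agrees with the chord 2 - 2p at p = 1/2 and p = 1 and is concave,
    # so it should lie on or above the chord over [1/2, 1]
    ps = [0.5 + i / 200 for i in range(101)]
    assert all(H(p) >= (2 - 2 * p) - 1e-12 for p in ps)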
5. (a) chain rule, (b) g(x)|x has only one possible value and hence zero entropy, (c) chain rule, (d) entropy is positive. We have equality at (d) iff g(x) is a one-to-one function for every x with p(x) > 0.
6. H(y|x) = ∑_x p(x) H(y | x = x). All terms are non-negative, so the sum is zero only if all terms are zero. For any given term this is true either if p(x) = 0 or if H(y | x = x) is zero. The second case arises only if y | x = x has only one possible value, i.e. y is a function of x. The first case is why we needed the qualification about p(x) > 0 in answers 2 and 5 above.
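A small illustration of this (the two joint tables below are made up for the purpose): H(y|x) comes out as 0 when y is a deterministic function of x and strictly positive otherwise.

    from math import log2

    def cond_entropy(pxy):
        # H(y|x) = sum over (x, y) of p(x, y) * log2(p(x) / p(x, y))
        px = {}
        for (x, _), p in pxy.items():
            px[x] = px.get(x, 0) + p
        return sum(p * log2(px[x] / p) for (x, _), p in pxy.items() if p > 0)

    deterministic = {(0, 0): 0.5, (1, 1): 0.5}                # y = x
    noisy = {(0, 0): 0.25, (0, 1): 0.25, (1, 1): 0.5}         # y random when x = 0
    print(cond_entropy(deterministic), cond_entropy(noisy))   # 0.0 and 0.5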
7. (a) The probability of any given value of x1:4 depends only on the number of 1's and 0's. We create four subsets with equal probabilities to generate a pair of bits and two other subsets to generate one bit only. The expected number of bits generated is E K = 8p(1 − p)³ + 10p²(1 − p)² + 8p³(1 − p) (see the numerical check below).
(b) (a) i.i.d. entropies add, (b) functions reduce entropy, (c) chain rule, (d) the z_i are i.i.d. with entropy of 1 bit, (e) entropy is positive.
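The closed form for E K in (a) can be spot-checked by enumerating the 16 values of x1:4 and assigning bits to equiprobable subsets. The grouping used below (each block of 4 equiprobable sequences gives 2 bits, a leftover pair gives 1 bit, the all-0 and all-1 sequences give nothing) is my reading of the scheme described above, so treat it as an assumption:

    from itertools import product

    def expected_bits(p):
        ek = 0.0
        for w in range(5):                        # w = number of 1's in the four flips
            n = sum(1 for s in product((0, 1), repeat=4) if sum(s) == w)   # 1, 4, 6, 4, 1
            prob = p ** w * (1 - p) ** (4 - w)    # probability of each such sequence
            blocks_of_4, rest = divmod(n, 4)
            ek += prob * (2 * 4 * blocks_of_4 + (2 if rest >= 2 else 0))
        return ek

    p = 0.3
    print(expected_bits(p), 8*p*(1-p)**3 + 10*p**2*(1-p)**2 + 8*p**3*(1-p))   # the two agree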
8. (a) This is true for any Markov chain x→y→z. One possibility is x = y = z, all fair Bernoulli variables.
(b) An example of this was given in lectures. A slightly different example is if x and y are fair binary variables and z = xy. Knowing z entangles x and y.
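A numerical check of the example in (b), taking x and y to be independent fair bits and reading z = xy as their product (logical AND); both the independence and that reading of xy are assumptions on my part:

    from math import log2

    def H(dist):                                             # entropy in bits
        return -sum(p * log2(p) for p in dist.values() if p > 0)

    def marginal(joint, keep):
        m = {}
        for outcome, p in joint.items():
            key = tuple(outcome[i] for i in keep)
            m[key] = m.get(key, 0) + p
        return m

    # outcomes are (x, y, z) with x, y independent fair bits and z = x*y
    joint = {(x, y, x * y): 0.25 for x in (0, 1) for y in (0, 1)}

    Hx, Hy, Hz = (H(marginal(joint, k)) for k in ((0,), (1,), (2,)))
    Hxy, Hxz, Hyz = (H(marginal(joint, k)) for k in ((0, 1), (0, 2), (1, 2)))
    print(Hx + Hy - Hxy, Hxz + Hyz - Hz - H(joint))   # I(x;y) = 0, I(x;y|z) ≈ 0.19 bits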
9. (a) I(x;y;z) = {H(x) − H(x|y)} − {H(x|z) − H(x|y,z)} = H(x) − {H(x,y) − H(y)} − {H(x,z) − H(z)} + {H(x,y,z) − H(y,z)} = H(x) + H(y) + H(z) − H(x,y) − H(y,z) − H(x,z) + H(x,y,z), which is symmetric in x, y and z (checked numerically below).
(b) Use the example from 8(b) above.
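A numerical check of the identity in (a): the two mutual informations are computed directly from their definitions and compared with the joint-entropy expression (the joint distribution below is randomly generated purely for the test):

    import random
    from itertools import product
    from math import log2

    random.seed(0)
    w = [random.random() for _ in range(8)]
    pxyz = {o: v / sum(w) for o, v in zip(product((0, 1), repeat=3), w)}   # arbitrary joint pmf

    def marg(keep):
        m = {}
        for o, p in pxyz.items():
            key = tuple(o[i] for i in keep)
            m[key] = m.get(key, 0) + p
        return m

    def H(keep):
        return -sum(p * log2(p) for p in marg(keep).values() if p > 0)

    px, py, pz = marg((0,)), marg((1,)), marg((2,))
    pxy, pyz, pxz = marg((0, 1)), marg((1, 2)), marg((0, 2))
    I_xy = sum(p * log2(p / (px[(x,)] * py[(y,)])) for (x, y), p in pxy.items())
    I_xy_given_z = sum(p * log2(p * pz[(z,)] / (pxz[(x, z)] * pyz[(y, z)]))
                       for (x, y, z), p in pxyz.items())

    lhs = I_xy - I_xy_given_z
    rhs = H((0,)) + H((1,)) + H((2,)) - H((0, 1)) - H((1, 2)) - H((0, 2)) + H((0, 1, 2))
    print(lhs, rhs)   # the two agree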
11. [~2.16] x and y are correlated binary random variables with p(x=y=0) = 0 and all other joint probabilities equal to 1/3. Calculate H(x), H(y), H(x|y), H(y|x), H(x,y), I(x;y).
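A short sketch of the arithmetic (variable names are mine; the joint table is as stated above):

    from math import log2

    pxy = {(0, 1): 1/3, (1, 0): 1/3, (1, 1): 1/3}      # p(x=0, y=0) = 0
    px, py = {}, {}
    for (x, y), p in pxy.items():
        px[x] = px.get(x, 0) + p                       # p(x) = (1/3, 2/3)
        py[y] = py.get(y, 0) + p                       # p(y) = (1/3, 2/3)

    H = lambda d: -sum(p * log2(p) for p in d.values() if p > 0)
    Hx, Hy, Hxy = H(px), H(py), H(pxy)
    print(Hx, Hy, Hxy - Hy, Hxy - Hx, Hxy, Hx + Hy - Hxy)
    # H(x)   H(y)   H(x|y)  H(y|x)  H(x,y)  I(x;y)
    # 0.918  0.918  0.667   0.667   1.585   0.252   (all in bits)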
12. The data processing inequality says that I(x;z) ≤ I(x;y) = H(y) − H(y|x) ≤ H(y) ≤ log k, where the last inequality is the uniform bound on entropy. If k = 1 then log k = 0 and so x and z must be independent.
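A small numerical illustration of the bound (the chain below, with randomly chosen transition probabilities and y restricted to k = 3 values, is made up for the purpose):

    import random
    from math import log2

    random.seed(1)
    k, nx, nz = 3, 4, 4                                   # alphabet sizes of y, x, z

    def rand_dist(n):
        w = [random.random() for _ in range(n)]
        return [v / sum(w) for v in w]

    px = rand_dist(nx)
    py_x = [rand_dist(k) for _ in range(nx)]              # channel x -> y
    pz_y = [rand_dist(nz) for _ in range(k)]              # channel y -> z

    # joint p(x, z) for the Markov chain x -> y -> z
    pxz = [[px[x] * sum(py_x[x][y] * pz_y[y][z] for y in range(k)) for z in range(nz)]
           for x in range(nx)]
    pz = [sum(row[z] for row in pxz) for z in range(nz)]
    I_xz = sum(pxz[x][z] * log2(pxz[x][z] / (px[x] * pz[z]))
               for x in range(nx) for z in range(nz))
    print(I_xz, log2(k))                                  # I(x;z) stays below log2(k)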