6.441 - Information Theory: Homework 3: Problem 1
Sufficiency: Suppose
$$P_{Y^n|X^n} = \prod_{i=1}^n P_{Y_i|X_i}.$$
Throughout, write $X^{\setminus i} = (X_1,\ldots,X_{i-1},X_{i+1},\ldots,X_n)$ and similarly $Y^{\setminus i}$. Then for all $i = 1, 2, \ldots, n$,
$$P_{Y^{\setminus i}|X^n} = \frac{\sum_{y_i} P_{X^n Y^n}}{P_{X^n}} = \sum_{y_i}\prod_{k=1}^{n} P_{Y_k|X_k} = \prod_{k=1,\,k\neq i}^{n} P_{Y_k|X_k},$$
since $\sum_{y_i} P_{Y_i|X_i}(y_i|x_i) = 1$.
Therefore,
$$P_{Y^n|X^n} = \frac{P_{X^nY^n}}{P_{X^n}} = \frac{P_{X^nY^n}}{P_{X_i}\,P_{X^{\setminus i}|X_i}} = \frac{P_{X^{\setminus i}Y^n|X_i}}{P_{X^{\setminus i}|X_i}} = \frac{P_{X^{\setminus i}Y^n|X_i}}{P_{Y_i|X_i}\,P_{X^{\setminus i}Y^{\setminus i}|X_i}}\cdot\frac{P_{Y_i|X_i}\,P_{X^{\setminus i}Y^{\setminus i}|X_i}}{P_{X^{\setminus i}|X_i}}$$
$$= \frac{P_{X^{\setminus i}Y^n|X_i}}{P_{Y_i|X_i}\,P_{X^{\setminus i}Y^{\setminus i}|X_i}}\cdot P_{Y_i|X_i}\,P_{Y^{\setminus i}|X^n} = \frac{P_{X^{\setminus i}Y^n|X_i}}{P_{Y_i|X_i}\,P_{X^{\setminus i}Y^{\setminus i}|X_i}}\cdot P_{Y_i|X_i}\prod_{k=1,\,k\neq i}^{n} P_{Y_k|X_k}$$
$$= \frac{P_{X^{\setminus i}Y^n|X_i}}{P_{Y_i|X_i}\,P_{X^{\setminus i}Y^{\setminus i}|X_i}}\prod_{k=1}^{n} P_{Y_k|X_k} = \frac{P_{X^{\setminus i}Y^n|X_i}}{P_{Y_i|X_i}\,P_{X^{\setminus i}Y^{\setminus i}|X_i}}\,P_{Y^n|X^n},$$
where the fifth equality uses $\frac{P_{X^{\setminus i}Y^{\setminus i}|X_i}}{P_{X^{\setminus i}|X_i}} = P_{Y^{\setminus i}|X^n}$ and the sixth uses the display above. So for all $i = 1, \ldots, n$,
$$P_{X^{\setminus i}Y^n|X_i} = P_{Y_i|X_i}\,P_{X^{\setminus i}Y^{\setminus i}|X_i}.$$
So $Y_i$ is conditionally independent of $(X^{\setminus i}, Y^{\setminus i})$ given $X_i$, namely $Y_i - X_i - (X^{\setminus i}, Y^{\setminus i})$.
Necessity: If $Y_i - X_i - (X^{\setminus i}, Y^{\setminus i})$ for all $i = 1, 2, \ldots, n$, then $Y_i$ is conditionally independent of $(X^{\setminus i}, Y^{\setminus i})$ given $X_i$, and
$$P_{X^{\setminus i}Y^n|X_i} = P_{Y_i|X_i}\,P_{X^{\setminus i}Y^{\setminus i}|X_i}.$$
So
$$P_{Y^n|X^n} = \frac{P_{X^nY^n}}{P_{X^n}} = \frac{P_{X^nY^n}}{P_{X_1}\,P_{X^{\setminus 1}|X_1}} = \frac{P_{X^{\setminus 1}Y^n|X_1}}{P_{X^{\setminus 1}|X_1}} = \frac{P_{Y_1|X_1}\,P_{X^{\setminus 1}Y^{\setminus 1}|X_1}}{P_{X^{\setminus 1}|X_1}} = P_{Y_1|X_1}\,P_{Y^{\setminus 1}|X^n}.$$
Similarly, applying the chain for $i = 2$ (marginalized over $y_1$) to the remaining factor, we have
$$P_{Y^{\setminus 1}|X^n} = P_{Y_2|X_2}\,P_{Y^{\setminus\{1,2\}}|X^n}.$$
By parity of reasoning, we get $P_{Y^n|X^n} = \prod_{i=1}^n P_{Y_i|X_i}$.
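As a quick numerical sanity check of the sufficiency direction, the following sketch (with an arbitrary binary input distribution and arbitrary per-letter channels, all illustrative choices and not from the problem) verifies for $n = 2$ that a memoryless channel gives $I(Y_1; X_2, Y_2 \mid X_1) = 0$, i.e., $Y_1 - X_1 - (X_2, Y_2)$:

```python
# Numerical check: a memoryless two-letter channel makes Y1 - X1 - (X2, Y2) a Markov chain.
import numpy as np

rng = np.random.default_rng(0)

# arbitrary (strictly positive) input distribution P_{X1 X2} and per-letter channels
P_X = rng.random((2, 2)); P_X /= P_X.sum()
W1 = rng.random((2, 2)); W1 /= W1.sum(axis=1, keepdims=True)   # W1[x1, y1] = P_{Y1|X1}(y1|x1)
W2 = rng.random((2, 2)); W2 /= W2.sum(axis=1, keepdims=True)   # W2[x2, y2] = P_{Y2|X2}(y2|x2)

# memoryless channel: P_{X1 X2 Y1 Y2} = P_{X1 X2} * P_{Y1|X1} * P_{Y2|X2}
P = np.einsum('ab,ay,bz->abyz', P_X, W1, W2)

# conditional mutual information I(Y1; X2, Y2 | X1), in bits
I = 0.0
for x1 in range(2):
    Pc = P[x1] / P[x1].sum()               # P_{X2 Y1 Y2 | X1 = x1}, axes (x2, y1, y2)
    for x2 in range(2):
        for y1 in range(2):
            for y2 in range(2):
                p = Pc[x2, y1, y2]
                I += P[x1].sum() * p * np.log2(p / (Pc[:, y1, :].sum() * Pc[x2, :, y2].sum()))
print(I)   # ~0 up to floating-point rounding
```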
Problem 2
Answer:
Proof. For $Z_1, \ldots, Z_n$ independent Poisson random variables with mean $\lambda$, we have
$$\Pr[Z_i = k] = \frac{\lambda^k e^{-\lambda}}{k!}$$
for every non-negative integer $k$. Therefore, for $T = \sum_{i=1}^n Z_i$, we obtain
$$P^{(\lambda)}_{T Z^n}(t, z_1, \ldots, z_n) = \prod_{i=1}^n \frac{\lambda^{z_i} e^{-\lambda}}{z_i!}\,\mathbf{1}\Big\{t = \sum_{i=1}^n z_i\Big\} = \Big(\lambda^{\sum_{i=1}^n z_i}\, e^{-n\lambda}\Big)\,\frac{\mathbf{1}\{t = \sum_{i=1}^n z_i\}}{\prod_{i=1}^n z_i!} = \big(\lambda^{t}\, e^{-n\lambda}\big)\,\frac{\mathbf{1}\{t = \sum_{i=1}^n z_i\}}{\prod_{i=1}^n z_i!},$$
where $\lambda^{t} e^{-n\lambda}$ is a function of $t$ and $\lambda$ only, and $\frac{\mathbf{1}\{t = \sum_i z_i\}}{\prod_i z_i!}$ is a function of $t$ and the $z_i$'s only. By the Fisher–Neyman factorization theorem, $T = \sum_{i=1}^n Z_i$ is a sufficient statistic of $(Z_1, \ldots, Z_n)$ for $\lambda$.
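As an illustration of what sufficiency means here: the conditional law of $Z^n$ given $T = t$ is multinomial with $t$ trials and uniform cell probabilities $1/n$, independently of $\lambda$. The following short check (with arbitrary illustrative values of $z$ and $\lambda$) confirms this numerically:

```python
# Numerical check that P(Z^n = z | T = t) does not depend on lambda and equals
# the multinomial(t, 1/n, ..., 1/n) pmf, as the factorization predicts.
from scipy.stats import poisson, multinomial

z = [3, 0, 2, 1]                      # a sample value of (Z_1, ..., Z_n)
n, t = len(z), sum(z)

for lam in (0.5, 2.0, 7.3):
    joint = 1.0
    for zi in z:
        joint *= poisson.pmf(zi, lam)            # prod_i P(Z_i = z_i)
    cond = joint / poisson.pmf(t, n * lam)       # P(Z^n = z | T = t); T ~ Poisson(n*lam)
    print(lam, cond)                             # same value for every lambda

print("multinomial:", multinomial.pmf(z, t, [1.0 / n] * n))
```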
Problem 3
Answer:
Let $p_n$ be the probability that $Y_n = Y_0$, and let $\delta$ denote the flip probability of each step. Then we have $p_0 = 1$ and the following recursion formula:
$$p_n = (1-\delta)\,p_{n-1} + \delta\,(1 - p_{n-1})$$
$$\Longrightarrow\quad p_n - \tfrac12 = (1 - 2\delta)\big(p_{n-1} - \tfrac12\big)$$
$$\Longrightarrow\quad p_n = \tfrac12 + \tfrac12\,(1 - 2\delta)^n.$$
So $H(Y_0 \mid Y_n) = h(p_n)$, and $I(Y_0; Y_n) = H(Y_0) - H(Y_0 \mid Y_n) = \log 2 - h(p_n)$.
In order to study the behavior of $I(Y_0; Y_n)$ as $n$ grows, we should study the behavior of $h(p)$ near $p = \tfrac12$. Since
$$h'(p)\Big|_{p=\frac12} = \log\frac{1-p}{p}\,\Big|_{p=\frac12} = 0, \qquad h''(p)\Big|_{p=\frac12} = -\frac{\log e}{(1-p)\,p}\,\Big|_{p=\frac12} = -4\log e,$$
for $n$ large a second-order Taylor expansion gives
$$h(p_n) \approx h\big(\tfrac12\big) + \tfrac12\,h''\big(\tfrac12\big)\big(p_n - \tfrac12\big)^2 = \log 2 - 2\log e\,\Big(\tfrac12(1-2\delta)^n\Big)^2 = \log 2 - \tfrac12\log e\,(1-2\delta)^{2n}.$$
Therefore
$$I(Y_0; Y_n) \approx \tfrac12\log e\,(1-2\delta)^{2n} \qquad \text{as } n \to \infty.$$
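A quick numerical check of the recursion, the closed form, and the asymptotic approximation (the crossover probability $\delta = 0.1$ below is an arbitrary illustrative choice):

```python
# Check p_n = 1/2 + (1/2)(1-2d)^n and I(Y_0;Y_n) ~ (1/2) log e (1-2d)^{2n}.
import numpy as np

d = 0.1
h = lambda p: -p * np.log2(p) - (1 - p) * np.log2(1 - p)   # binary entropy, bits

p = 1.0
for n in range(1, 21):
    p = (1 - d) * p + d * (1 - p)            # recursion for p_n
    closed = 0.5 + 0.5 * (1 - 2 * d) ** n    # closed form
    assert abs(p - closed) < 1e-12
    I_exact = 1.0 - h(p)                                   # log 2 - h(p_n), in bits
    I_approx = 0.5 * np.log2(np.e) * (1 - 2 * d) ** (2 * n)
    if n % 5 == 0:
        print(n, I_exact, I_approx)          # the two agree increasingly well
```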
Problem 4
Answer:
Without loss of generality, assume that $E[X_G] = E[Y_G] = E[Y] = 0$ (otherwise we can add an offset to them, and a constant offset does not change the mutual information).

Define $\beta = \frac{E[X_G Y_G]}{E[X_G^2]}$. If $\beta = 0$, then $I(X_G; Y_G) = 0 \le I(X_G; Y)$.

If $\beta \neq 0$, we define $N_G = \frac{1}{\beta}Y_G - X_G$ and $N = \frac{1}{\beta}Y - X_G$. Then we have
$$E[X_G N] = E[X_G N_G] = \frac{1}{\beta}E[X_G Y_G] - E[X_G^2] = 0.$$
$N_G$ is jointly Gaussian with $X_G$, so $X_G \perp N_G$; then
$$E[N_G^2] = D[N_G] = \frac{1}{\beta^2}D[Y_G] - D[X_G] = \frac{1}{\beta^2}D[Y] - D[X_G] \ge D[N] = E[N^2].$$
So $N$ is a noise, uncorrelated with $X_G$, with $E[N^2] \le E[N_G^2]$. Since Gaussian noise minimizes the mutual information for a Gaussian input under a second-moment constraint, we have
$$I(X_G;\, X_G + N_G) \le I(X_G;\, X_G + N)$$
$$\Longrightarrow\quad I\big(X_G;\, \tfrac{1}{\beta}Y_G\big) \le I\big(X_G;\, \tfrac{1}{\beta}Y\big)$$
$$\Longrightarrow\quad I(X_G;\, Y_G) \le I(X_G;\, Y).$$
If $Y_G$ is not jointly Gaussian with $X_G$, this may not hold true. Counterexample:
$$Y_G = \begin{cases} X_G, & |X_G| \le 1,\\ -X_G, & |X_G| > 1.\end{cases}$$
Then $Y_G$ has a Gaussian (marginal) distribution, but $I(X_G; Y_G)$ can be infinite, since $Y_G$ is a deterministic function of the continuous $X_G$.
(Acknowledgement: Tianren Liu)
Problem 5
Answer:
Assume w.l.o.g. that $E[A] = E[B] = E[C] = 0$. Then
$$A - B - C \iff I(A; C \mid B) = 0.$$
To compute $I(A; C \mid B)$, write $C = \alpha(B + N)$ where $N \perp B$, and $A = \beta(N + Z)$ where $Z \perp N$.
We shall compute the relations among $\alpha$, $\beta$, $\sigma_N^2$ and $\sigma_Z^2$.
$$\alpha = \frac{E[BC]}{E[B^2]} = \frac{r_{BC}\,\sigma_B\,\sigma_C}{\sigma_B^2} = \frac{r_{BC}\,\sigma_C}{\sigma_B},$$
$$\sigma_N^2 = \frac{1}{\alpha^2}\,\sigma_C^2 - \sigma_B^2 = \frac{\sigma_B^2\,\sigma_C^2}{r_{BC}^2\,\sigma_C^2} - \sigma_B^2 = \sigma_B^2\Big(\frac{1}{r_{BC}^2} - 1\Big),$$
$$\beta = \frac{E[AN]}{E[N^2]} = \frac{\frac{1}{\alpha}E[AC] - E[AB]}{\sigma_N^2} = \frac{\frac{r_{AC}\,\sigma_A\,\sigma_B}{r_{BC}} - r_{AB}\,\sigma_A\,\sigma_B}{\sigma_N^2} = \frac{(r_{AC} - r_{AB}\,r_{BC})\,\sigma_A\,\sigma_B}{r_{BC}\,\sigma_N^2},$$
$$\sigma_Z^2 = \frac{1}{\beta^2}\,\sigma_A^2 - \sigma_N^2.$$
Then
$$I(A; C \mid B) = 0 \iff I(A; N) = I(N;\, N + Z) = \frac12\log\Big(1 + \frac{\sigma_N^2}{\sigma_Z^2}\Big) = 0$$
$$\iff 0 = \frac{\sigma_N^2}{\sigma_Z^2} = \frac{\beta^2\,\sigma_N^2}{\sigma_A^2 - \beta^2\,\sigma_N^2} \iff 0 = \beta = \frac{(r_{AC} - r_{AB}\,r_{BC})\,\sigma_A\,\sigma_B}{r_{BC}\,\sigma_N^2} \iff r_{AC} = r_{AB}\,r_{BC}.$$
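A quick Monte Carlo check of the Gaussian condition $r_{AC} = r_{AB}\,r_{BC}$, on an arbitrarily chosen Gaussian Markov chain (the coefficients and noise scales below are illustrative only):

```python
# Check r_AC = r_AB * r_BC for a jointly Gaussian Markov chain A - B - C.
import numpy as np

rng = np.random.default_rng(1)
m = 1_000_000

A = rng.normal(size=m)
B = 0.7 * A + rng.normal(scale=0.5, size=m)     # B depends on A only
C = -1.3 * B + rng.normal(scale=2.0, size=m)    # C depends on B only, so A - B - C

r = np.corrcoef([A, B, C])
r_AB, r_BC, r_AC = r[0, 1], r[1, 2], r[0, 2]
print(r_AC, r_AB * r_BC)    # the two agree up to Monte Carlo error
```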
For the discrete version, $A - B - C$ if and only if
$$P_{ABC} = P_A\,P_{B|A}\,P_{C|B} = P_{AB}\,P_{C|B} \iff P_{ABC}\,P_B = P_{AB}\,P_{BC}.$$
In terms of the $x_{abc}$'s, the condition is: for any $a \in \mathcal{A}$, $b \in \mathcal{B}$, $c \in \mathcal{C}$, it holds that
$$\sum_{a'\in\mathcal{A},\,c'\in\mathcal{C}} x_{abc}\,x_{a'bc'} = \sum_{a'\in\mathcal{A},\,c'\in\mathcal{C}} x_{abc'}\,x_{a'bc}.$$
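The bilinear identity can also be checked numerically on a randomly generated Markov-chain pmf (alphabet sizes and the random seed below are arbitrary illustrative choices):

```python
# Check sum_{a',c'} x_{abc} x_{a'bc'} == sum_{a',c'} x_{abc'} x_{a'bc} when A - B - C.
import numpy as np

rng = np.random.default_rng(2)
nA, nB, nC = 3, 4, 2

PA = rng.random(nA); PA /= PA.sum()
PBgA = rng.random((nA, nB)); PBgA /= PBgA.sum(axis=1, keepdims=True)
PCgB = rng.random((nB, nC)); PCgB /= PCgB.sum(axis=1, keepdims=True)
x = np.einsum('a,ab,bc->abc', PA, PBgA, PCgB)      # x[a,b,c] = P_{ABC}(a,b,c), A - B - C

lhs = x * x.sum(axis=(0, 2))[None, :, None]                       # x_{abc} * P_B(b)
rhs = x.sum(axis=2)[:, :, None] * x.sum(axis=0)[None, :, :]       # P_{AB}(a,b) * P_{BC}(b,c)
print(np.allclose(lhs, rhs))                                      # True
```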
Problem 6
Answer:
Since $A - B - C$, for any $a, b, c$ with $P_A(a) > 0$ it holds that
$$P_{ABC}(a, b, c) = P_A(a)\,P_{B|A}(b|a)\,P_{C|B}(c|b).$$
Since $A - C - B$, it holds that
$$P_{ABC}(a, b, c) = P_A(a)\,P_{C|A}(c|a)\,P_{B|C}(b|c)$$
$$\Longrightarrow\quad P_{B|A}(b|a)\,P_{C|B}(c|b) = P_{C|A}(c|a)\,P_{B|C}(b|c)$$
$$\Longrightarrow\quad P_{B|A}(b|a) = P_{C|A}(c|a)\,\frac{P_{B|C}(b|c)}{P_{C|B}(c|b)} = P_{C|A}(c|a)\,\frac{P_B(b)}{P_C(c)}.$$
Taking the sum over all possible $b$, we have
$$1 = \sum_b P_{B|A}(b|a) = P_{C|A}(c|a)\,\frac{\sum_b P_B(b)}{P_C(c)} = \frac{P_{C|A}(c|a)}{P_C(c)}.$$
So $P_{C|A}(c|a) = P_C(c)$, and then $P_{B|A}(b|a) = P_B(b)$. Therefore
$$P_{BC|A}(b, c\,|\,a) = P_{B|A}(b|a)\,P_{C|B}(c|b) = P_B(b)\,P_{C|B}(c|b) = P_{BC}(b, c),$$
i.e., $A \perp (B, C)$.
This implies that if $T$ is a sufficient statistic of $Y$ for $\theta$ and $P_{TY}(t, y) > 0$ everywhere, then $\theta \perp (Y, T)$.
Bonus: all the counterexamples satisfy either $B = C$ or $B = \bar{C}$, but $P_{B|A}(\cdot\,|\,0) \neq P_{B|A}(\cdot\,|\,1)$.
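The bonus remark can be illustrated numerically: with $B = C$ a deterministic copy (the conditional probabilities below are arbitrary), both Markov chains $A - B - C$ and $A - C - B$ hold, yet $A$ is not independent of $(B, C)$; it is the positivity assumption that fails.

```python
# Illustration: B = C gives I(A;C|B) = I(A;B|C) = 0 but I(A; B,C) > 0.
import numpy as np

def mi(P):
    """Mutual information I(U;V) in bits for a joint pmf P[u, v] (zeros allowed)."""
    Pu, Pv = P.sum(axis=1, keepdims=True), P.sum(axis=0, keepdims=True)
    mask = P > 0
    return float((P[mask] * np.log2(P[mask] / (Pu @ Pv)[mask])).sum())

# A ~ Bern(1/2), P(B=1|A=0)=0.3, P(B=1|A=1)=0.8, and C = B
P = np.zeros((2, 2, 2))                      # axes: (a, b, c)
for a, pa in enumerate([0.5, 0.5]):
    for b, pb in enumerate([[0.7, 0.3], [0.2, 0.8]][a]):
        P[a, b, b] = pa * pb                 # c = b

# I(A; C | B) and I(A; B | C) are both 0 (both Markov chains hold) ...
for cond_axis in (1, 2):
    I = 0.0
    for k in range(2):
        Pk = np.take(P, k, axis=cond_axis)   # joint of A and the remaining variable, given value k
        w = Pk.sum()
        I += w * mi(Pk / w)
    print(I)
# ... but A is not independent of (B, C):
print(mi(P.reshape(2, 4)))                   # I(A; B,C) > 0
```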
Problem 7
Answer:
First, we will show that for jointly Gaussian $(X, Y)$, $I(X; Y) = 0 \iff X \perp Y$.
Assume w.l.o.g. that $E[X] = E[Y] = 0$. Define $\beta = \frac{E[XY]}{E[Y^2]}$. If $\beta = 0$, then $X$ and $Y$ are uncorrelated, hence (by joint Gaussianity) $X \perp Y$ and $I(X; Y) = 0$.
If $\beta \neq 0$, write $X = \beta(Y + N)$ with $N \perp Y$ and $\sigma_N^2 = \frac{1}{\beta^2}\sigma_X^2 - \sigma_Y^2$; therefore
$$I(X; Y) = I\big(\beta(Y + N);\, Y\big) = I(Y;\, Y + N) = \frac12\log\Big(1 + \frac{\sigma_Y^2}{\frac{1}{\beta^2}\sigma_X^2 - \sigma_Y^2}\Big) = \frac12\log\Big(1 + \frac{\beta^2\sigma_Y^2}{\sigma_X^2 - \beta^2\sigma_Y^2}\Big) > \frac12\log 1 = 0.$$
So $I(X; Y) = 0 \iff X \perp Y$.
Back to the original problem. If $I(A; C) = I(B; C) = 0$, we have $A \perp C$ and $B \perp C$, so every covariance between $(A, B)$ and $C$ vanishes and, by joint Gaussianity, $(A, B) \perp C$. So $I(A, B; C) = 0$.
For general (not jointly Gaussian) $A, B, C$, consider the following distribution:
$$P_{AB|C=0}(0,0) = P_{AB|C=0}(1,1) = \tfrac13, \qquad P_{AB|C=0}(0,1) = P_{AB|C=0}(1,0) = \tfrac16,$$
$$P_{AB|C=1}(0,0) = P_{AB|C=1}(1,1) = P_{AB|C=1}(0,1) = P_{AB|C=1}(1,0) = \tfrac14.$$
Here $I(A; C) = I(B; C) = 0$, but $I(A, B; C) > 0$. So the statement is false in general, due to the counterexample above.
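The counterexample is easy to check numerically; the marginal of $C$ is taken to be uniform below, which is an arbitrary illustrative choice (the conclusion does not depend on it):

```python
# Check I(A;C) = I(B;C) = 0 while I(A,B;C) > 0 for the counterexample above.
import numpy as np

def mi(P):
    """Mutual information I(U;V) in bits for a joint pmf P[u, v]."""
    Pu, Pv = P.sum(axis=1, keepdims=True), P.sum(axis=0, keepdims=True)
    return float((P * np.log2(P / (Pu @ Pv))).sum())

PAB_given_C0 = np.array([[1/3, 1/6], [1/6, 1/3]])
PAB_given_C1 = np.array([[1/4, 1/4], [1/4, 1/4]])
P = 0.5 * np.stack([PAB_given_C0, PAB_given_C1], axis=2)   # P[a, b, c], with C uniform

print(mi(P.sum(axis=1)))        # I(A;C) = 0
print(mi(P.sum(axis=0)))        # I(B;C) = 0
print(mi(P.reshape(4, 2)))      # I(A,B;C) > 0
```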
Problem 8
Answer:
Since $\{X_n\}$ is a Markov chain, the entropy rate is
$$R = \lim_{n\to\infty} H(X_n \mid X_{n-1}).$$
In this example, the stationary distribution is $\lim_{n\to\infty} P_{X_n} = \big(\tfrac13, \tfrac13, \tfrac13\big)$. So
$$R = \frac13\Big(\frac12\log 2 + \frac14\log 4 + \frac14\log 4\Big) + \frac13\Big(\frac12\log 2 + \frac12\log 2\Big) + \frac13\log 1 = \frac13\Big(\frac32\log 2 + \log 2\Big) = \frac56\log 2.$$
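The computation $R = \sum_i \pi_i H(P_{i\cdot})$ used above can be packaged generically; the transition matrix of this problem is given in the problem statement and is not reproduced here, so the matrix below is only a hypothetical placeholder to show the helper in action:

```python
# Generic entropy rate R = sum_i pi_i * H(P[i, :]) of a finite Markov chain.
import numpy as np

def entropy_rate(P):
    """Entropy rate (in bits) of the stationary Markov chain with transition matrix P."""
    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    pi = pi / pi.sum()                                   # stationary distribution
    with np.errstate(divide='ignore', invalid='ignore'):
        logs = np.where(P > 0, np.log2(P), 0.0)
    return float(-(pi[:, None] * P * logs).sum())

P_placeholder = np.array([[0.9, 0.1],      # NOT the matrix from the problem
                          [0.2, 0.8]])
print(entropy_rate(P_placeholder))
```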
Problem 9
Answer:
We claim that the entropy rate of a random walk on any connected simple graph $G = (V, E)$ is
$$R = \frac{\sum_{v\in V} d_v \log d_v}{\sum_{v\in V} d_v}, \qquad \text{where } \sum_{v\in V} d_v = 2|E|.$$
Since the random walk is a Markov chain, the entropy rate is $R = \lim_{n\to\infty} H(X_n \mid X_{n-1})$. As $n$ goes to infinity, the distribution of $X_n$ converges to the stationary distribution, the (only) all-nonnegative eigenvector of the random-walk matrix; it is
$$\lim_{n\to\infty} P_{X_n} = \frac{1}{\sum_{v\in V} d_v}\,\big(d_{v_1}, \ldots, d_{v_{|V|}}\big).$$
So the entropy rate is
$$R = \lim_{n\to\infty} \sum_{v_i\in V} \Pr[X_n = v_i]\, H(X_{n+1} \mid X_n = v_i) = \frac{1}{\sum_{v\in V} d_v}\sum_{v_i\in V} d_{v_i}\log d_{v_i}.$$
In this particular problem the number of edges is fixed at four, so $\sum_{v\in V} d_v = 2|E| = 8$, and we have to maximize/minimize the sum $\sum_{v\in V} d_v \log d_v$.
Here are all the possible degree sequences of connected simple graphs with four edges:
$$(2,2,2,2);\quad (3,2,2,1);\quad (2,2,2,1,1);\quad (3,2,1,1,1);\quad (4,1,1,1,1).$$
Among these, $(3,2,2,1)$ maximizes the value, to $3\log 3 + 4\log 2$; the corresponding graph is a triangle with a pendant edge, and its entropy rate is $\frac{3\log 3 + 4\log 2}{8}$. $(2,2,2,1,1)$ minimizes the value, to $6\log 2$; the corresponding graph is a path of five nodes, with entropy rate $\frac34\log 2$.
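The five candidates can be checked directly from the formula above; the edge lists below are one representative connected graph per degree sequence, and the rates are reported in bits:

```python
# Entropy rates of the stationary random walk on each connected simple graph
# with four edges, from R = sum_v (d_v / 2|E|) * log2(d_v).
import numpy as np

graphs = {
    "cycle C4 (2,2,2,2)":            [(0, 1), (1, 2), (2, 3), (3, 0)],
    "triangle + pendant (3,2,2,1)":  [(0, 1), (1, 2), (2, 0), (0, 3)],
    "path P5 (2,2,2,1,1)":           [(0, 1), (1, 2), (2, 3), (3, 4)],
    "spider (3,2,1,1,1)":            [(0, 1), (1, 2), (0, 3), (0, 4)],
    "star K_{1,4} (4,1,1,1,1)":      [(0, 1), (0, 2), (0, 3), (0, 4)],
}

for name, edges in graphs.items():
    deg = np.zeros(1 + max(max(e) for e in edges))
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    rate = sum(d * np.log2(d) for d in deg if d > 0) / deg.sum()
    print(f"{name}: {rate:.4f} bits")
```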