
6.441 - Information Theory: Homework 3

Instructed by Y. Polyanskiy
Due on Feb. 28, 2012
Weihao Gao [email protected]
Problem 1
Answer:
Proof. (Notation: $X^n = (X_1, \ldots, X_n)$, $Y^n = (Y_1, \ldots, Y_n)$; write $X_{\setminus i}$, $Y_{\setminus i}$ for all coordinates except the $i$-th.)

Sufficiency: If
$$P_{Y_1 \ldots Y_n \mid X_1 \ldots X_n} = \prod_{i=1}^n P_{Y_i \mid X_i},$$
then for all $i = 1, 2, \ldots, n$,
$$P_{Y_{\setminus i} \mid X_{\setminus i}} = \frac{P_{X_{\setminus i} Y_{\setminus i}}}{P_{X_{\setminus i}}} = \frac{P_{X^n Y^n}}{P_{X^n}} \cdot \frac{P_{X_i}}{P_{X_i Y_i}} = \frac{P_{Y^n \mid X^n}}{P_{Y_i \mid X_i}} = \frac{\prod_{k=1}^n P_{Y_k \mid X_k}}{P_{Y_i \mid X_i}} = \prod_{k=1,\, k \neq i}^n P_{Y_k \mid X_k}.$$
Therefore,
$$P_{Y^n \mid X^n} = \frac{P_{X^n Y^n}}{P_{X^n}} = \frac{P_{X^n Y^n}}{P_{X_i}\, P_{X_{\setminus i} \mid X_i}} = \frac{P_{X_{\setminus i} Y^n \mid X_i}}{P_{X_{\setminus i} \mid X_i}} = \frac{P_{X_{\setminus i} Y^n \mid X_i}}{P_{Y_i \mid X_i}\, P_{X_{\setminus i} Y_{\setminus i} \mid X_i}} \cdot \frac{P_{Y_i \mid X_i}\, P_{X_{\setminus i} Y_{\setminus i} \mid X_i}}{P_{X_{\setminus i} \mid X_i}}$$
$$= \frac{P_{X_{\setminus i} Y^n \mid X_i}}{P_{Y_i \mid X_i}\, P_{X_{\setminus i} Y_{\setminus i} \mid X_i}} \cdot P_{Y_i \mid X_i} \cdot \frac{P_{X_{\setminus i} Y_{\setminus i}}}{P_{X_{\setminus i}}} = \frac{P_{X_{\setminus i} Y^n \mid X_i}}{P_{Y_i \mid X_i}\, P_{X_{\setminus i} Y_{\setminus i} \mid X_i}} \cdot P_{Y_i \mid X_i} \cdot P_{Y_{\setminus i} \mid X_{\setminus i}}$$
$$= \frac{P_{X_{\setminus i} Y^n \mid X_i}}{P_{Y_i \mid X_i}\, P_{X_{\setminus i} Y_{\setminus i} \mid X_i}} \cdot P_{Y_i \mid X_i} \cdot \prod_{k=1,\, k \neq i}^n P_{Y_k \mid X_k} = \frac{P_{X_{\setminus i} Y^n \mid X_i}}{P_{Y_i \mid X_i}\, P_{X_{\setminus i} Y_{\setminus i} \mid X_i}} \cdot \prod_{k=1}^n P_{Y_k \mid X_k} = \frac{P_{X_{\setminus i} Y^n \mid X_i}}{P_{Y_i \mid X_i}\, P_{X_{\setminus i} Y_{\setminus i} \mid X_i}} \cdot P_{Y^n \mid X^n}.$$
So for all $i = 1, \ldots, n$,
$$P_{X_{\setminus i} Y^n \mid X_i} = P_{Y_i \mid X_i}\, P_{X_{\setminus i} Y_{\setminus i} \mid X_i}.$$
So $Y_i \perp (X_{\setminus i}, Y_{\setminus i})$ given $X_i$, namely $Y_i - X_i - (X_{\setminus i}, Y_{\setminus i})$.
Necessity: If $Y_i - X_i - (X_{\setminus i}, Y_{\setminus i})$ for all $i = 1, 2, \ldots, n$, then $Y_i \perp (X_{\setminus i}, Y_{\setminus i})$ given $X_i$ and
$$P_{X_{\setminus i} Y^n \mid X_i} = P_{Y_i \mid X_i}\, P_{X_{\setminus i} Y_{\setminus i} \mid X_i}.$$
So
$$P_{Y^n \mid X^n} = \frac{P_{X^n Y^n}}{P_{X^n}} = \frac{P_{X^n Y^n}}{P_{X_1}\, P_{X_{\setminus 1} \mid X_1}} = \frac{P_{X_{\setminus 1} Y^n \mid X_1}}{P_{X_{\setminus 1} \mid X_1}} = \frac{P_{Y_1 \mid X_1}\, P_{X_{\setminus 1} Y_{\setminus 1} \mid X_1}}{P_{X_{\setminus 1} \mid X_1}} = P_{Y_1 \mid X_1}\, \frac{P_{X_{\setminus 1} Y_{\setminus 1}}}{P_{X_{\setminus 1}}} = P_{Y_1 \mid X_1}\, P_{Y_{\setminus 1} \mid X_{\setminus 1}}.$$
Similarly, we have
$$P_{Y_{\setminus 1} \mid X_{\setminus 1}} = P_{Y_2 \mid X_2}\, P_{Y_{\setminus\{1,2\}} \mid X_{\setminus\{1,2\}}}.$$
Repeating this argument, we get $P_{Y^n \mid X^n} = \prod_{i=1}^n P_{Y_i \mid X_i}$.
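As a numerical sanity check of the sufficiency direction, the identity $P_{X_{\setminus i} Y^n \mid X_i} = P_{Y_i \mid X_i}\, P_{X_{\setminus i} Y_{\setminus i} \mid X_i}$ can be verified on a small example. A minimal Python sketch, assuming $n = 2$, binary alphabets, and randomly chosen (dependent) inputs with a memoryless channel:

```python
import numpy as np

rng = np.random.default_rng(0)

# n = 2, binary alphabets. Dependent input law P_{X1 X2} and a
# memoryless channel P_{Y1 Y2 | X1 X2} = P_{Y1|X1} P_{Y2|X2}.
P_X = rng.random((2, 2)); P_X /= P_X.sum()
W1 = rng.random((2, 2)); W1 /= W1.sum(axis=1, keepdims=True)   # P_{Y1|X1}
W2 = rng.random((2, 2)); W2 /= W2.sum(axis=1, keepdims=True)   # P_{Y2|X2}

# Joint P_{X1 X2 Y1 Y2}
P = np.einsum('ab,ay,bz->abyz', P_X, W1, W2)

# Check the claimed identity for i = 1:
# P_{X2 Y1 Y2 | X1} = P_{Y1|X1} * P_{X2 Y2 | X1}
P_X1 = P.sum(axis=(1, 2, 3))                              # P_{X1}
lhs = P / P_X1[:, None, None, None]                       # P_{X2 Y1 Y2 | X1}
P_X2Y2_given_X1 = P.sum(axis=2) / P_X1[:, None, None]     # P_{X2 Y2 | X1}
rhs = np.einsum('ay,abz->abyz', W1, P_X2Y2_given_X1)
print(np.allclose(lhs, rhs))   # True: Y1 - X1 - (X2, Y2) holds
```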
Problem 2
Answer:
Proof. For $Z_1, \ldots, Z_n$ independent Poisson random variables with mean $\lambda$, we have
$$\Pr[Z_i = k] = \frac{\lambda^k e^{-\lambda}}{k!}$$
for every non-negative integer $k$. Therefore, for $T = \sum_{i=1}^n Z_i$, we obtain
$$P^{(\lambda)}_{T Z^n}(t, z_1, \ldots, z_n) = \prod_{i=1}^n \frac{\lambda^{z_i} e^{-\lambda}}{z_i!}\;\delta\!\left(t - \sum_{i=1}^n z_i\right) = \left(\prod_{i=1}^n \lambda^{z_i} e^{-\lambda}\right)\frac{\delta\!\left(t - \sum_{i=1}^n z_i\right)}{\prod_{i=1}^n z_i!} = \lambda^t e^{-n\lambda}\,\frac{\delta\!\left(t - \sum_{i=1}^n z_i\right)}{\prod_{i=1}^n z_i!},$$
where $\lambda^t e^{-n\lambda}$ is a function of $t$ and $\lambda$ only, and $\delta\!\left(t - \sum_{i=1}^n z_i\right)\big/\prod_{i=1}^n z_i!$ is a function of $t$ and the $z_i$'s only. By the Fisher factorization theorem, $T = \sum_{i=1}^n Z_i$ is a sufficient statistic of $(Z_1, \ldots, Z_n)$ for $\lambda$.
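The factorization can also be seen empirically: conditioned on $T = t$, the vector $(Z_1, \ldots, Z_n)$ is multinomial with uniform cell probabilities, independent of $\lambda$. A small sampling sketch (parameter values chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(1)

def cond_mean_given_T(lam, n=3, t=6, trials=200_000):
    """Empirical E[Z_1 | sum Z_i = t] for i.i.d. Poisson(lam) samples."""
    Z = rng.poisson(lam, size=(trials, n))
    hit = Z.sum(axis=1) == t
    return Z[hit, 0].mean()

# The conditional law of Z^n given T = t does not depend on lam:
# both prints should be close to t/n = 2, for any lam.
print(cond_mean_given_T(lam=1.0))
print(cond_mean_given_T(lam=4.0))
```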
Problem 3
Answer:
Let $p_n$ be the probability that $Y_n = Y_0$. Then we have $p_0 = 1$ and the recursion
$$p_n = (1-\delta)\, p_{n-1} + \delta\, (1 - p_{n-1}) \;\Longrightarrow\; p_n - \tfrac12 = (1-2\delta)\left(p_{n-1} - \tfrac12\right) \;\Longrightarrow\; p_n = \tfrac12 + \tfrac12 (1-2\delta)^n.$$
So $H(Y_0 \mid Y_n) = h(p_n)$ and $I(Y_0; Y_n) = H(Y_0) - H(Y_0 \mid Y_n) = \log 2 - h(p_n)$.

To study the behavior of $I(Y_0; Y_n)$ as $n$ grows, we study $h(p)$ for $p \to \tfrac12$. Since
$$h'(p)\Big|_{p=\frac12} = \log\frac{1-p}{p}\,\Big|_{p=\frac12} = 0, \qquad h''(p)\Big|_{p=\frac12} = -\frac{\log e}{(1-p)p}\,\Big|_{p=\frac12} = -4\log e,$$
for $n$ large,
$$h(p_n) \approx h\!\left(\tfrac12\right) - 2\log e\left(p_n - \tfrac12\right)^2 = \log 2 - 2\log e\cdot\tfrac14\,(1-2\delta)^{2n} = \log 2 - \tfrac{\log e}{2}\,(1-2\delta)^{2n}.$$
Therefore
$$I(Y_0; Y_n) \approx \frac{\log e}{2}\,(1-2\delta)^{2n}.$$
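A quick numerical check of the recursion and of the asymptotic rate (a sketch in bits, with $\delta = 0.1$ chosen arbitrarily):

```python
import numpy as np

def h2(p):
    """Binary entropy in bits."""
    q = np.clip(p, 1e-12, 1 - 1e-12)
    return -q * np.log2(q) - (1 - q) * np.log2(1 - q)

delta = 0.1
for n in [5, 10, 20, 40]:
    p_n = 0.5 + 0.5 * (1 - 2 * delta) ** n                  # Pr[Y_n = Y_0]
    exact = 1.0 - h2(p_n)                                   # I(Y_0; Y_n) in bits
    approx = (1 - 2 * delta) ** (2 * n) / (2 * np.log(2))   # (log e / 2)(1-2d)^{2n}
    print(n, exact, approx)
```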
Problem 4
Answer:
Without loss of generality, assume that $E[X_G] = E[Y_G] = E[Y] = 0$ (otherwise we can add an offset to them, and a constant offset does not change the mutual information).

Define $\alpha = \dfrac{E[X_G Y_G]}{E[X_G^2]}$. If $\alpha = 0$, then $I(X_G; Y_G) = 0 \le I(X_G; Y)$.

If $\alpha \neq 0$, we define $N_G = \frac{1}{\alpha} Y_G - X_G$ and $N = \frac{1}{\alpha} Y - X_G$. Then we have
$$E[X_G N] = E[X_G N_G] = \frac{1}{\alpha} E[X_G Y_G] - E[X_G^2] = 0.$$
$N_G$ is jointly Gaussian with $X_G$, so $X_G \perp N_G$, and then
$$E[N_G^2] = \mathrm{Var}[N_G] = \frac{1}{\alpha^2}\mathrm{Var}[Y_G] - \mathrm{Var}[X_G] = \frac{1}{\alpha^2}\mathrm{Var}[Y] - \mathrm{Var}[X_G] \ge \mathrm{Var}[N] = E[N^2].$$
So $N$ is a noise, uncorrelated with $X_G$, with $E[N^2] \le \sigma^2_{N_G}$. So we have
$$I(X_G; X_G + N_G) \le I(X_G; X_G + N) \;\Longrightarrow\; I\!\left(X_G; \tfrac{1}{\alpha} Y_G\right) \le I\!\left(X_G; \tfrac{1}{\alpha} Y\right) \;\Longrightarrow\; I(X_G; Y_G) \le I(X_G; Y).$$
If $Y_G$ is not jointly Gaussian with $X_G$, this may no longer hold.
Counter-example: take $Y_G = X_G$ when $|X_G| \le 1$ and $Y_G = -X_G$ when $|X_G| > 1$. Then $Y_G$ has a Gaussian marginal distribution, but $I(X_G; Y_G)$ can be infinite.
(Acknowledgement: Tianren Liu)
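The counter-example can be checked by simulation: flipping the sign of a symmetric Gaussian outside $[-1, 1]$ leaves the marginal Gaussian while keeping the map invertible, so the mutual information with $X_G$ is unbounded. A minimal sketch of the marginal check:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(1_000_000)
y = np.where(np.abs(x) <= 1.0, x, -x)   # Y_G = X_G on |X_G| <= 1, -X_G otherwise

# Compare empirical quantiles of Y_G with those of X_G ~ N(0,1):
qs = [0.01, 0.1, 0.25, 0.5, 0.75, 0.9, 0.99]
print(np.quantile(y, qs))
print(np.quantile(x, qs))   # nearly identical: Y_G is marginally N(0,1)
```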
Problem 5
Answer:
Assume w.l.o.g. that $E[A] = E[B] = E[C] = 0$. Then
$$A - B - C \iff I(A; C \mid B) = 0.$$
To compute $I(A; C \mid B)$, write $C = \alpha (B + N)$ where $N \perp B$, and $A = \beta N + Z$ where $Z \perp N$. We shall compute the relations among $\alpha$, $\beta$, $\sigma_N$ and $\sigma_Z$:
$$\alpha = \frac{E[BC]}{E[B^2]} = \frac{r_{BC}\,\sigma_C}{\sigma_B},$$
$$\sigma_N^2 = \frac{1}{\alpha^2}\sigma_C^2 - \sigma_B^2 = \frac{\sigma_B^2 \sigma_C^2}{r_{BC}^2\,\sigma_C^2} - \sigma_B^2 = \sigma_B^2\left(\frac{1}{r_{BC}^2} - 1\right),$$
$$\beta = \frac{E[AN]}{E[N^2]} = \frac{E[AC]/\alpha - E[AB]}{\sigma_N^2} = \frac{r_{AC}\,\sigma_A \sigma_C/\alpha - r_{AB}\,\sigma_A \sigma_B}{\sigma_N^2} = \frac{(r_{AC} - r_{AB}\, r_{BC})\,\sigma_A \sigma_B}{r_{BC}\,\sigma_N^2},$$
$$\sigma_Z^2 = \sigma_A^2 - \beta^2 \sigma_N^2.$$
Then
$$I(A; C \mid B) = 0 \iff I(A; N) = I(\beta N + Z; N) = \frac12 \log\left(1 + \frac{\beta^2 \sigma_N^2}{\sigma_Z^2}\right) = 0$$
$$\iff 0 = \frac{\beta^2 \sigma_N^2}{\sigma_Z^2} = \frac{\beta^2 \sigma_N^2}{\sigma_A^2 - \beta^2 \sigma_N^2} \iff 0 = \beta = \frac{(r_{AC} - r_{AB}\, r_{BC})\,\sigma_A \sigma_B}{r_{BC}\,\sigma_N^2} \iff r_{AC} = r_{AB}\, r_{BC}.$$
For the discrete version, $A - B - C$ if and only if
$$P_{ABC} = P_A\, P_{B \mid A}\, P_{C \mid B} = P_{AB}\, P_{C \mid B} \iff P_{ABC}\, P_B = P_{AB}\, P_{BC}.$$
In terms of the $x_{abc}$'s, the condition is that for all $a \in \mathcal{A}$, $b \in \mathcal{B}$, $c \in \mathcal{C}$,
$$x_{abc} \sum_{a' \in \mathcal{A},\, c' \in \mathcal{C}} x_{a'bc'} = \sum_{a' \in \mathcal{A},\, c' \in \mathcal{C}} x_{abc'}\, x_{a'bc}.$$
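The correlation condition $r_{AC} = r_{AB}\, r_{BC}$ can be checked by sampling a Gauss-Markov triple; the coefficients below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
m = 1_000_000

# A Gauss-Markov chain A - B - C with arbitrarily chosen coefficients.
A = rng.standard_normal(m)
B = 0.7 * A + 0.5 * rng.standard_normal(m)
C = -1.3 * B + 0.9 * rng.standard_normal(m)

r = np.corrcoef([A, B, C])
r_AB, r_BC, r_AC = r[0, 1], r[1, 2], r[0, 2]
print(r_AC, r_AB * r_BC)   # nearly equal: r_AC = r_AB * r_BC
```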
Problem 6
Answer:
Since $A - B - C$, for any $a, b, c$ with $P_A(a) > 0$, it holds that
$$P_{ABC}(a, b, c) = P_A(a)\, P_{B \mid A}(b \mid a)\, P_{C \mid B}(c \mid b).$$
Since $A - C - B$, it holds that
$$P_{ABC}(a, b, c) = P_A(a)\, P_{C \mid A}(c \mid a)\, P_{B \mid C}(b \mid c).$$
Therefore
$$P_{B \mid A}(b \mid a)\, P_{C \mid B}(c \mid b) = P_{C \mid A}(c \mid a)\, P_{B \mid C}(b \mid c)$$
$$\Longrightarrow\; P_{B \mid A}(b \mid a) = P_{C \mid A}(c \mid a)\, \frac{P_{B \mid C}(b \mid c)}{P_{C \mid B}(c \mid b)} = P_{C \mid A}(c \mid a)\, \frac{P_B(b)}{P_C(c)}.$$
Taking the sum over all possible $b$, we have
$$1 = \sum_b P_{B \mid A}(b \mid a) = P_{C \mid A}(c \mid a) \sum_b \frac{P_B(b)}{P_C(c)} = \frac{P_{C \mid A}(c \mid a)}{P_C(c)}.$$
So $P_{C \mid A}(c \mid a) = P_C(c)$, and then $P_{B \mid A}(b \mid a) = P_B(b)$. Therefore, $A \perp (B, C)$.

This implies that if $T$ is a sufficient statistic of $Y$ for $\theta$ and $P_{TY}(t, y) > 0$ everywhere, then $\theta \perp (Y, T)$.

Bonus: all the counter-examples satisfy either $B = C$ or $B = -C$, but $P_{B \mid A}(\cdot \mid 0) \neq P_{B \mid A}(\cdot \mid 1)$.
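The need for the positivity assumption can be illustrated with the degenerate case $B = C$: a short enumeration sketch showing that both Markov chains hold while $A$ is not independent of $(B, C)$:

```python
import itertools
import numpy as np

# Joint pmf of (A, B, C) with A = B = C uniform on {0, 1}:
# both A - B - C and A - C - B hold, but A depends on (B, C).
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[1, 1, 1] = 0.5

def is_markov(P, order):
    """Check X - Y - Z for the axis permutation `order` of P."""
    Q = np.transpose(P, order)
    for x, y, z in itertools.product(range(2), repeat=3):
        # P(x,y,z) P(y) == P(x,y) P(y,z)  (positivity-free form)
        lhs = Q[x, y, z] * Q[:, y, :].sum()
        rhs = Q[x, y, :].sum() * Q[:, y, z].sum()
        if not np.isclose(lhs, rhs):
            return False
    return True

print(is_markov(P, (0, 1, 2)))   # A - B - C : True
print(is_markov(P, (0, 2, 1)))   # A - C - B : True
P_A = P.sum(axis=(1, 2))
P_BC = P.sum(axis=0)
print(np.allclose(P, P_A[:, None, None] * P_BC))   # A independent of (B, C)? False
```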
Problem 7
Answer:
First, we will show that for jointly Gaussian $(X, Y)$, $I(X; Y) = 0 \iff X \perp Y$.

Assume w.l.o.g. that $E[X] = E[Y] = 0$. Define $\alpha = \dfrac{E[XY]}{E[Y^2]}$. If $X \perp Y$, then $\alpha = 0$ and $I(X; Y) = 0$.

If $\alpha \neq 0$, define $N = \frac{1}{\alpha} X - Y$. Then $N$ is Gaussian and independent of $Y$, with $\sigma_N^2 = \frac{1}{\alpha^2}\sigma_X^2 - \sigma_Y^2$. Therefore,
$$I(X; Y) = I(\alpha(Y + N); Y) = I(Y; Y + N) = \frac12 \log\left(1 + \frac{\sigma_Y^2}{\frac{1}{\alpha^2}\sigma_X^2 - \sigma_Y^2}\right) = \frac12 \log\left(1 + \frac{\alpha^2 \sigma_Y^2}{\sigma_X^2 - \alpha^2 \sigma_Y^2}\right) > \frac12 \log 1 = 0.$$
So $I(X; Y) = 0 \iff X \perp Y$.

Back to the original problem. If $I(A; C) = I(B; C) = 0$ and $(A, B, C)$ are jointly Gaussian, we have $A \perp C$ and $B \perp C$, so the cross-covariance of $(A, B)$ with $C$ is zero, and for jointly Gaussian vectors this gives $(A, B) \perp C$. So $I(A, B; C) = 0$.

For general $A, B, C$, consider the following distribution:
$$P_{AB \mid C=0}(0,0) = P_{AB \mid C=0}(1,1) = \tfrac13, \qquad P_{AB \mid C=0}(0,1) = P_{AB \mid C=0}(1,0) = \tfrac16,$$
$$P_{AB \mid C=1}(0,0) = P_{AB \mid C=1}(1,1) = P_{AB \mid C=1}(0,1) = P_{AB \mid C=1}(1,0) = \tfrac14.$$
Here $I(A; C) = I(B; C) = 0$, but $I(A, B; C) > 0$. So the statement is false in general, by the counter-example above.
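The counter-example can be verified directly. The sketch below assumes $C$ uniform on $\{0, 1\}$ (the marginal of $C$ is not specified above):

```python
import numpy as np

def mi(Pxy):
    """Mutual information (bits) of a joint pmf given as a 2-D array."""
    Px = Pxy.sum(axis=1, keepdims=True)
    Py = Pxy.sum(axis=0, keepdims=True)
    mask = Pxy > 0
    return float((Pxy[mask] * np.log2(Pxy[mask] / (Px @ Py)[mask])).sum())

# P_{AB|C=0} and P_{AB|C=1} from the solution, with C ~ Bernoulli(1/2) assumed.
P_AB_given_C0 = np.array([[1/3, 1/6], [1/6, 1/3]])
P_AB_given_C1 = np.array([[1/4, 1/4], [1/4, 1/4]])
P_ABC = 0.5 * np.stack([P_AB_given_C0, P_AB_given_C1], axis=2)  # axes (a, b, c)

P_AC = P_ABC.sum(axis=1)                 # joint of (A, C)
P_BC = P_ABC.sum(axis=0)                 # joint of (B, C)
P_AB_C = P_ABC.reshape(4, 2)             # joint of ((A, B), C)
print(mi(P_AC), mi(P_BC), mi(P_AB_C))    # 0.0, 0.0, > 0
```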
Problem 8
Answer:
Since $\{X_n\}$ is a Markov chain, the entropy rate is
$$R = \lim_{n\to\infty} H(X_n \mid X_{n-1}).$$
In this example, the stationary distribution is $\lim_{n\to\infty} P_{X_n} = \left(\tfrac13, \tfrac13, \tfrac13\right)$. So
$$R = \frac13\left(\frac12\log 2 + \frac14\log 4 + \frac14\log 4\right) + \frac13\left(\frac12\log 2 + \frac12\log 2\right) + \frac13\log 1 = \frac13\left(\frac32\log 2 + \log 2\right) = \frac56\log 2.$$
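Since the transition matrix itself is not reproduced above, the sketch below only illustrates the computation $R = \sum_i \pi_i H(P_{i,\cdot})$ on a hypothetical 3-state chain (the matrix is an assumption, not the chain from the problem statement):

```python
import numpy as np

def entropy_rate(P):
    """Entropy rate (bits) of a stationary Markov chain with transition matrix P."""
    # Stationary distribution: left eigenvector of P for eigenvalue 1.
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1))])
    pi = pi / pi.sum()
    with np.errstate(divide='ignore', invalid='ignore'):
        logP = np.where(P > 0, np.log2(P), 0.0)
    return float(-(pi[:, None] * P * logP).sum())

# Hypothetical 3-state chain, for illustration only.
P = np.array([[0.5, 0.25, 0.25],
              [0.5, 0.5,  0.0 ],
              [0.0, 1.0,  0.0 ]])
print(entropy_rate(P))
```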
Problem 9
Answer:
We claim that the entropy rate of the stationary random walk on a simple connected graph $G = (V, E)$ is
$$R = \frac{\sum_{v \in V} d_v \log d_v}{\sum_{v \in V} d_v}, \qquad \text{where } \sum_{v \in V} d_v = 2|E|.$$
Since the random walk is a Markov chain, the entropy rate is $R = \lim_{n\to\infty} H(X_n \mid X_{n-1})$. As $n$ goes to infinity, the distribution of $X_n$ converges to the stationary distribution, the (unique) all-nonnegative left eigenvector of the random walk matrix:
$$\lim_{n\to\infty} \Pr[X_n = v] = \frac{d_v}{\sum_{u \in V} d_u}.$$
So the entropy rate is
$$R = \lim_{n\to\infty} \sum_{v \in V} \Pr[X_n = v]\, H(X_{n+1} \mid X_n = v) = \frac{1}{\sum_{u \in V} d_u} \sum_{v \in V} d_v \log d_v.$$
In this particular problem the graph has 4 edges, so $\sum_{v \in V} d_v = 2|E| = 8$ is fixed and we have to maximize/minimize the sum $\sum_{v \in V} d_v \log d_v$. The possible degree sequences of a connected simple graph with 4 edges are:
$$(2,2,2,2);\;\; (3,2,2,1);\;\; (2,2,2,1,1);\;\; (3,2,1,1,1);\;\; (4,1,1,1,1).$$
Among these, $(3,2,2,1)$ maximizes the sum, with $\sum_v d_v \log d_v = 3\log 3 + 4\log 2$; the corresponding graph is a triangle with a pendant edge. $(2,2,2,1,1)$ minimizes the sum, with $\sum_v d_v \log d_v = 6\log 2$; the corresponding graph is a path of five nodes.
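The comparison over the five degree sequences can be automated; a minimal sketch evaluating $R = \sum_v d_v \log d_v \big/ \sum_v d_v$ in bits for each candidate:

```python
import numpy as np

def walk_entropy_rate(degrees):
    """Entropy rate (bits) of the stationary random walk on a graph with the given degrees."""
    d = np.array(degrees, dtype=float)
    return float((d * np.log2(d)).sum() / d.sum())

candidates = [(2, 2, 2, 2), (3, 2, 2, 1), (2, 2, 2, 1, 1),
              (3, 2, 1, 1, 1), (4, 1, 1, 1, 1)]
for deg in candidates:
    print(deg, walk_entropy_rate(deg))
# Largest rate: (3, 2, 2, 1), the triangle with a pendant edge;
# smallest rate: (2, 2, 2, 1, 1), the path on five vertices.
```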