EE6340 - Information Theory Problem Set 5 Solution
Since $\sum_{i=1}^{q} D^{-n_i} < 1$, the number of $D$-ary sequences of length $n_{\max}$ that begin with some codeword is at most
$$\sum_{i=1}^{q} D^{\,n_{\max}-n_i} = D^{\,n_{\max}} \sum_{i=1}^{q} D^{-n_i} < D^{\,n_{\max}}.$$
Hence there are sequences of length $n_{\max}$ that do not start with any codeword. These, and all longer sequences that have them as prefixes, cannot be decoded. This situation is best visualised using a tree.
Alternatively, we can map the codewords onto dyadic intervals on the real line, corresponding to the real numbers whose $D$-ary expansions start with the codewords. Since the interval for a codeword of length $n_i$ has length $D^{-n_i}$ and $\sum_{i=1}^{q} D^{-n_i} < 1$, there exist intervals not covered by any codeword. The sequences in these intervals do not begin with any codeword and hence cannot be decoded.
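To make the argument concrete, here is a small Python sketch (the binary code {0, 10, 110}, which satisfies the Kraft inequality strictly with sum 7/8 < 1, is an illustrative choice, not part of the problem). It enumerates all sequences of length $n_{\max}$ and prints those that do not begin with any codeword:

    from itertools import product

    # Illustrative binary code with strict Kraft inequality:
    # 2^-1 + 2^-2 + 2^-3 = 7/8 < 1
    code = ["0", "10", "110"]
    n_max = max(len(c) for c in code)

    # Keep the length-n_max sequences that start with no codeword;
    # these (and every extension of them) cannot be decoded.
    undecodable = ["".join(bits)
                   for bits in product("01", repeat=n_max)
                   if not any("".join(bits).startswith(c) for c in code)]
    print(undecodable)  # ['111']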
3. A possible assignment of optimal codes for each state is

   Next state    S1    S2    S3
   Code C1       0     10    11
   Code C2       10    0     11
   Code C3       -     0     1
The average codeword length of the next symbol, conditioned on the previous state, under this coding scheme is
$$E(L|C_1) = \tfrac{1}{2}(1) + \tfrac{1}{4}(2) + \tfrac{1}{4}(2) = 1.5$$
$$E(L|C_2) = \tfrac{1}{4}(2) + \tfrac{1}{2}(1) + \tfrac{1}{4}(2) = 1.5$$
$$E(L|C_3) = 0(1) + \tfrac{1}{2}(1) + \tfrac{1}{2}(1) = 1$$
Note that this code assignment achieves the conditional entropy lower bound.
To find the unconditional average, we have to find the stationary distribution on the states. Let $\mu$ be the stationary distribution. Solving $\mu = \mu P$ gives $\mu = (2/9,\ 4/9,\ 1/3)$, so the unconditional average number of bits per source symbol is
$$\sum_{i=1}^{3} \mu_i\, E(L|C_i) = 4/3.$$
The entropy rate of the Markov chain is
$$H = H(X_2|X_1) = \sum_{i=1}^{3} \mu_i\, H(X_2|X_1 = S_i) = 4/3.$$
We observe that the unconditional average number of bits per source symbol equals the entropy rate of the Markov chain. This is because the expected length of each code $C_i$ equals the entropy of the next state given $S_i$, namely $H(X_2|X_1 = S_i)$, so the compression is maximal.
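As a quick numerical check, here is a Python sketch; the transition matrix P is read off from the conditional probabilities used in the expected-length calculations above:

    import numpy as np

    # Rows: next-state probabilities from S1, S2, S3
    P = np.array([[1/2, 1/4, 1/4],
                  [1/4, 1/2, 1/4],
                  [0.0, 1/2, 1/2]])

    # Stationary distribution: left eigenvector of P for eigenvalue 1
    evals, evecs = np.linalg.eig(P.T)
    mu = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
    mu /= mu.sum()
    print(mu)  # [2/9, 4/9, 1/3]

    L = np.array([1.5, 1.5, 1.0])       # E(L|C_i) from above
    H_cond = np.array([1.5, 1.5, 1.0])  # H(X2|X1 = S_i)
    print(mu @ L, mu @ H_cond)          # both equal 4/3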
4. Binary Huffman code:

   Code   Symbol   Probability (successive merges)
   00     X1       6/21  6/21  6/21  9/21  12/21  1
   10     X2       5/21  5/21  6/21  6/21   9/21
   11     X3       4/21  4/21  5/21  6/21
   010    X4       3/21  3/21  4/21
   0110   X5       2/21  3/21
   0111   X6       1/21

   E(length) = 51/21 ≈ 2.43 bits.
Ternary Huffman code (a dummy symbol X7 of probability 0 is added so that the number of symbols is congruent to 1 mod D - 1 = 2):

   Code   Symbol   Probability (successive merges)
   1      X1       6/21  6/21  10/21  1
   2      X2       5/21  5/21   6/21
   00     X3       4/21  4/21   5/21
   01     X4       3/21  3/21
   020    X5       2/21  3/21
   021    X6       1/21
   022    X7       0

   E(length) = 34/21 ≈ 1.62 ternary digits.
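Both expected lengths can be checked with a short Python sketch: in a D-ary Huffman construction the expected codeword length equals the sum of the probabilities of all merged nodes, after padding with zero-probability dummy symbols so that the symbol count is congruent to 1 mod D - 1:

    import heapq

    def huffman_expected_length(probs, D=2):
        # Each merge of the D smallest nodes adds one symbol to every
        # codeword beneath it, so E[length] = sum of merged probabilities.
        heap = list(probs)
        while (len(heap) - 1) % (D - 1) != 0:
            heap.append(0.0)        # dummy zero-probability symbols
        heapq.heapify(heap)
        total = 0.0
        while len(heap) > 1:
            merged = sum(heapq.heappop(heap) for _ in range(D))
            total += merged
            heapq.heappush(heap, merged)
        return total

    p = [6/21, 5/21, 4/21, 3/21, 2/21, 1/21]
    print(huffman_expected_length(p, D=2))  # 51/21 ~ 2.43 bits
    print(huffman_expected_length(p, D=3))  # 34/21 ~ 1.62 ternary digits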
5. a) With 6 questions, the player can cover at most 63 values. This can be seen by induction: with 1 question he can cover 1 value; with 2 questions he covers 1 value with the first question and, depending on the answer, there are 2 possible subsets of values over which the remaining question can be asked, so $f(2) = 1 + 2f(1) = 3$. Extending this argument, $f(k) = 1 + 2f(k-1)$, so with 6 questions of the form "Is X = i?" he can cover $f(6) = 2^6 - 1 = 63$ different values.
Thus, if the player wants to maximise his expected return, he should choose the 63 outcomes with the highest values of $p(x)v(x)$ and play to isolate these values. The first question should be "Is X = i?", where $i$ is the median of these 63 values. After one half is eliminated by the first answer, the second question should be "Is X = j?", where $j$ is the median of the remaining half, and so on. The maximum expected winnings are the sum of the 63 chosen values of $p(x)v(x)$.
b) If arbitrary questions are allowed, the game reduces to twenty questions to determine the object. The return is $\sum_x p(x)\,(v(x) - l(x))$, where $l(x)$ is the number of questions required to determine the object. Maximising the expected return is equivalent to minimising the expected number of questions, and thus the optimal strategy is to construct a Huffman code for the source and use it to construct the question strategy. Since a Huffman code satisfies $H \le E[l] < H + 1$,
$$\sum_x p(x)v(x) - H - 1 \le \text{Expected return} \le \sum_x p(x)v(x) - H.$$
c) A computer wishing to minimise the return to the player will want to minimise $\sum_x p(x)v(x) - H(X)$ over choices of $p(x)$. Note that this is only a lower bound on the expected winnings of the player. Although the expected winnings of the player may be larger, we will assume that the computer wants to minimise this lower bound.
Let
$$J(p) = \sum_i p_i v_i + \sum_i p_i \log p_i + \lambda \sum_i p_i.$$
Differentiating with respect to $p_i$ and setting to 0 gives $v_i + \log p_i + 1 + \lambda = 0$. After normalising to ensure that $p$ is a pmf,
$$p_i = \frac{2^{-v_i}}{\sum_j 2^{-v_j}}.$$
Now let $r_i = \frac{2^{-v_i}}{\sum_j 2^{-v_j}}$. Then
$$\sum_i p_i v_i + \sum_i p_i \log p_i = \sum_i p_i \log p_i - \sum_i p_i \log 2^{-v_i} = \sum_i p_i \log p_i - \sum_i p_i \log r_i - \log\Big(\sum_j 2^{-v_j}\Big) = D(p\|r) - \log\Big(\sum_j 2^{-v_j}\Big).$$
Since $D(p\|r) \ge 0$ with equality iff $p = r$, the return is minimised by choosing $p_i = r_i$. This is the distribution that the computer must choose.
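A minimal sketch of the computer's optimal choice (the prize values v below are arbitrary placeholders), verifying that the minimised objective equals $-\log \sum_j 2^{-v_j}$:

    import numpy as np

    def adversarial_pmf(v):
        # p_i = 2^{-v_i} / sum_j 2^{-v_j} minimises sum_i p_i v_i - H(p)
        w = np.exp2(-np.asarray(v, dtype=float))
        return w / w.sum()

    v = np.array([1.0, 2.0, 3.0, 4.0])  # placeholder prize values
    p = adversarial_pmf(v)
    objective = p @ v + (p * np.log2(p)).sum()
    print(objective, -np.log2(np.exp2(-v).sum()))  # equal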
6. a) We need to minimise $C = \sum_i p_i c_i l_i$ subject to $\sum_i 2^{-l_i} \le 1$. We will assume equality in the constraint and let $r_i = 2^{-l_i}$. Let $Q = \sum_i p_i c_i$ and $q_i = (p_i c_i)/Q$. Then $q$ also forms a probability distribution and we can write $C$ as
$$C = \sum_i p_i c_i l_i = Q \sum_i q_i \log \frac{1}{r_i} = Q\Big(\sum_i q_i \log \frac{q_i}{r_i} - \sum_i q_i \log q_i\Big) = Q\,(D(q\|r) + H(q)).$$
We can minimise $C$ by choosing $r = q$, i.e.
$$l_i = -\log \frac{p_i c_i}{\sum_j p_j c_j},$$
which gives the minimum cost $C^* = Q\,H(q)$. Here we ignore any integer constraints on $l_i$.
b) With the integer constraint $l_i = \lceil -\log q_i \rceil$, each length exceeds its optimal value by less than 1, and a Huffman code on $q$ does at least as well, so
$$C^* \le C_{\text{Huffman}} \le C^* + Q.$$
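A numerical sketch of part a) (the pmf and costs below are placeholders), verifying that the unconstrained minimum cost equals $Q\,H(q)$:

    import numpy as np

    p = np.array([0.5, 0.25, 0.125, 0.125])  # placeholder source pmf
    c = np.array([1.0, 2.0, 4.0, 8.0])       # placeholder symbol costs

    Q = (p * c).sum()
    q = p * c / Q                 # tilted distribution
    l_opt = -np.log2(q)           # optimal real-valued lengths
    C_star = (p * c * l_opt).sum()
    print(C_star, Q * (-(q * np.log2(q)).sum()))  # C* = Q H(q), equal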
7. a) Since $l_i = \lceil \log \frac{1}{p_i} \rceil$, we have
$$\log \frac{1}{p_i} \le l_i < \log \frac{1}{p_i} + 1,$$
which, multiplying by $p_i$ and summing over $i$, gives
$$H(X) \le L = \sum_i p_i l_i < H(X) + 1. \qquad (1)$$
The difficult part is to prove that the code is a prefix code. By the choice of $l_i$, we have
$$2^{-l_i} \le p_i < 2^{-(l_i - 1)}.$$
Thus $F_j$, $j > i$, differs from $F_i$ by at least $p_i \ge 2^{-l_i}$, and will therefore differ from $F_i$ in at least one place among the first $l_i$ bits. Thus no codeword is a prefix of any other codeword.
   Symbol   Probability   F_i (decimal)   F_i (binary)   l_i   Code
   1        0.5           0.0             0.0            1     0
   2        0.25          0.5             0.10           2     10
   3        0.125         0.75            0.110          3     110
   4        0.125         0.875           0.111          3     111
b) The Shannon code in this case achieves the entropy bound (1.75 bits) and is therefore optimal.
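The table above can be reproduced with a short sketch of the Shannon code construction: sort the probabilities in decreasing order, take $F_i$ as the cumulative sum of the preceding probabilities, and keep the first $l_i = \lceil \log \frac{1}{p_i} \rceil$ bits of its binary expansion:

    import math

    def shannon_code(probs):
        # Codeword for symbol i: first ceil(log2(1/p_i)) bits of F_i
        probs = sorted(probs, reverse=True)
        codes, F = [], 0.0
        for p in probs:
            l = math.ceil(-math.log2(p))
            codes.append(format(int(F * 2**l), f"0{l}b"))
            F += p
        return codes

    print(shannon_code([0.5, 0.25, 0.125, 0.125]))
    # ['0', '10', '110', '111']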