Digital Communication Quiz
In-class quiz
(b) Find an example of a Huffman code with 7 code words in which one code word has
length 6 but ⌈log(1/p_i)⌉ = 4.
[8 points] Since Huffman codes are full, the only possibility is to have two code words of
length 6, and one code word of each shorter length down to 1 (see Figure 1).
[Figure 1: the Huffman code tree. The leaf probabilities, from the longest code word to the shortest, are ε, p, p, p + ε, 2p + ε, 3p + 2ε, and 5p + 3ε; the intermediate-node probabilities, from the deepest merge up to the root, are p + ε, 2p + ε, 3p + 2ε, 5p + 3ε, 8p + 5ε, and 13p + 8ε.]
Our objective here is to maximize the probability of one of the code words of length 6
while maintaining the constraints of a Huffman code. Let p be the probability of one code
word of length 6, and let ε (arbitrarily small) be the probability of the other. We choose
the probability of each of the other nodes to be as small as possible, while still allowing
the Huffman algorithm to choose the above code.
The figure shows the minimum probability that can be assigned to each leaf node, and
results in 1 = 13p + 8ε. Since ε can be chosen arbitrarily small, we see that we can choose
p to have any desired value greater than 1/13. Thus we can choose p > 2^{-4}, which leads
to ⌈log(1/p_i)⌉ = 4.
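As a quick sanity check (added here, not part of the original solution), the following Python sketch runs the Huffman algorithm on the leaf probabilities of Figure 1, breaking ties in favor of the newest intermediate node so that the chain-shaped tree above is the one selected; exact rational arithmetic avoids spurious tie-breaks from floating point.

```python
import heapq
import math
from fractions import Fraction

def huffman_lengths(probs):
    """Code word length for each symbol under the Huffman algorithm.
    Ties are broken in favor of the newest intermediate node."""
    heap = [(p, i, [i]) for i, p in enumerate(probs)]  # (prob, tie-break, leaf ids)
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    order = 0
    while len(heap) > 1:
        p1, _, ids1 = heapq.heappop(heap)
        p2, _, ids2 = heapq.heappop(heap)
        for i in ids1 + ids2:       # each merge adds one bit to every
            lengths[i] += 1         # code word in the merged subtree
        order -= 1                  # negative, so new nodes win ties against leaves
        heapq.heappush(heap, (p1 + p2, order, ids1 + ids2))
    return lengths

eps = Fraction(1, 10**6)
p = (1 - 8 * eps) / 13              # 13p + 8*eps = 1, so p is just above 1/13
probs = [eps, p, p, p + eps, 2*p + eps, 3*p + 2*eps, 5*p + 3*eps]
print(huffman_lengths(probs))       # [6, 6, 5, 4, 3, 2, 1]
print(math.ceil(math.log2(1 / p)))  # 4, even though one code word has length 6
```

With a different tie-breaking rule the algorithm may return another, equally optimal code, which is why the probabilities above are chosen merely to allow the Huffman algorithm to pick this tree.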
An alternative approach is to set p = 1/16 at the outset and then either choose ε as
above, or choose any ε small enough and then choose p_1 to make Σ_i p_i = 1.
(c) Explain how this can be generalized to Huffman codes in which l_i − ⌈log(1/p_i)⌉ is
arbitrarily large for at least one code word.
[6 points] The coefficients of p, moving from right to left on the upper nodes, are
1, 1, 2, 3, 5, 8, 13. These are the terms of the Fibonacci series, which increases geometrically as (1 + √5)/2. More precisely, the nth term of the series is

\[
F_n = \frac{1}{\sqrt{5}}\left[\left(\frac{1+\sqrt{5}}{2}\right)^{\!n} - \left(\frac{1-\sqrt{5}}{2}\right)^{\!n}\right].
\]
If we extend the argument in part (b) to a tree of length n − 1 with n code words, we get

\[
-\log p \ \approx\ n \log\frac{1+\sqrt{5}}{2} - \log\sqrt{5}.
\]

This is increasing linearly with n but at a smaller slope than 1. Since the length of the code word is increasing with n with slope 1, the difference between the length and the log pmf is growing without bound.¹

¹You were not expected to go through this entire analytic argument.
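To illustrate numerically (an added sketch using the closed form above), the gap between the code word length n − 1 and −log p ≈ log F_n grows roughly as n(1 − log((1+√5)/2)):

```python
import math

# Gap between the code word length (n - 1) and -log2(p) for p just above 1/F_n,
# where F_n is the nth Fibonacci number (the root coefficient of the chain tree).
F = [1, 1]
while len(F) < 40:
    F.append(F[-1] + F[-2])

for n in (7, 15, 25, 35):
    length = n - 1                   # longest code word among n code words
    log_term = math.log2(F[n - 1])   # -log2(p) for p = 1/F_n; F[0] holds F_1
    print(n, length, round(log_term, 2), round(length - log_term, 2))
```

The printed gap (2.30 for n = 7, about 10.9 for n = 35) increases without bound, as claimed.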
[6 points] There are very few of the large-probability code words, so even though each has large probability individually, their aggregate probability is very small.
(c) Assume that there are both intermediate nodes and leaf nodes at some given length l.
Prove that each code word of length l has a probability p ≥ q_l/2, where q_l is the maximum
of the probabilities of the intermediate nodes of length l.
[6 points] For each intermediate node (and in particular the most probable one), both of
the immediate descendants of that node have probabilities, say q′ and q′′, satisfying q′ ≤ p
and q′′ ≤ p: if q′ > p (or q′′ > p), the node of probability p could be interchanged
with the subtree stemming from q′ (or q′′) with a reduction in the average code word
length. Thus q_l = q′ + q′′ ≤ 2p, and hence p ≥ q_l/2.
(d) Let m be the shortest length for which leaf nodes exist (you may assume that all such
leaf nodes correspond to atypical n-tuples). Let M_m be the number of leaf nodes of length
m. Let δ_m ≤ δ be the sum of the probabilities of these atypical leaf nodes. Find a lower
bound to δ_m in terms of q_m (the maximum of the probabilities of the intermediate nodes
of length m) and M_m. Hint: Use part (c).
[6 points] Let p_i, 1 ≤ i ≤ M_m, be the probabilities of the leaf nodes of length m. Then
Σ_{i=1}^{M_m} p_i = δ_m. From part (c), p_i ≥ q_m/2 for each i, 1 ≤ i ≤ M_m. Thus

\[
\delta_m = \sum_{i=1}^{M_m} p_i \ \geq\ \sum_{i=1}^{M_m} \frac{q_m}{2} \ =\ \frac{M_m q_m}{2}.
\]
(e) Find a lower bound to q_m in terms of δ_m and M_m. Hint: The sum of the probabilities
of the intermediate nodes plus leaf nodes at length m must be one.
[6 points] Let q_m(i) be the probability of the ith intermediate node at length m. Thus
q_m = max_i {q_m(i)}. The number of intermediate nodes of length m is 2^m − M_m, so

\[
q_m (2^m - M_m) \ \geq\ \sum_{i=1}^{2^m - M_m} q_m(i) \ =\ 1 - \delta_m,
\]

and therefore

\[
q_m \ \geq\ \frac{1 - \delta_m}{2^m - M_m}.
\]
(f) Let β_m = M_m/2^m be the fraction of nodes at length m that are leaf nodes. Show that

\[
\frac{\beta_m}{1-\beta_m} \ \leq\ \frac{2\delta_m}{1-\delta_m}.
\]

Combining parts (d) and (e),

\[
\delta_m \ \geq\ \frac{M_m q_m}{2} \ \geq\ \frac{M_m (1-\delta_m)}{2(2^m - M_m)} \ =\ \frac{\beta_m (1-\delta_m)}{2(1-\beta_m)},
\]

where the last equality follows by dividing numerator and denominator by 2^m. Rearranging gives the desired inequality.
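As an added spot check, with hypothetical numbers chosen to satisfy the part (d) and (e) bounds, the chain of inequalities can be verified directly:

```python
# Hypothetical values: length m = 4 with M_m = 3 leaf nodes of total probability 0.12.
m, M_m, delta_m = 4, 3, 0.12
beta_m = M_m / 2**m                        # fraction of length-m nodes that are leaves
q_m_lower = (1 - delta_m) / (2**m - M_m)   # part (e) lower bound on q_m
assert delta_m >= M_m * q_m_lower / 2      # part (d) combined with part (e)
assert beta_m / (1 - beta_m) <= 2 * delta_m / (1 - delta_m)   # part (f)
```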
Problem Q-3 (30 points)
(a) Let U be a source output. Find the probability that the distance from U to V_1 exceeds
some given number r. Ignore edge effects throughout, i.e., assume that the sample value
of U is more than r away from the boundary of the region A.
[5 points] For any given sample value u of U, the distance from u to the sample value v_1
of V_1 is less than or equal to r if v_1 lies within a circle of radius r (i.e., of area πr²) around
u. Since V_1 is uniformly selected over the area A, the probability that V_1 lies within this
area is πr²/A (we are using the symbol A both for the area of the region and for the region
itself). Thus, for each u, Pr(‖V_1 − u‖ > r) = 1 − πr²/A.
Many of you tried to approach this problem componentwise (i.e., along each dimension).
Since A is unspecified and we are dealing with circular regions around u, that approach
doesn't quite work.
Note that it is important to distinguish between Pr(‖V_1 − u‖ > r) for a particular sample
value u and the unconditional probability Pr(‖V_1 − U‖ > r). Most of you missed this
distinction and derived the former and then equated it to the latter.
(b) Find the probability that the distance from U to each of the {V_j} (i.e., to the closest
of the {V_j}) exceeds r.
[5 points] Since V_1, . . . , V_M are independent, the events ‖V_j − u‖ > r are independent
events for any given u. Thus, for any given u,

\[
\Pr\Bigl(\min_j \|V_j - u\| > r\Bigr) \ =\ \left(1 - \frac{\pi r^2}{A}\right)^{\!M}.
\]
(c) Assume that r²/A is extremely small and approximate the probability in (b) as e^{−Mg(r)}
for the appropriate function g(r).
[5 points] For any ε > 0, (1 − ε)^M = e^{M ln(1−ε)}. For ε very small, ln(1 − ε) ≈ −ε, so
(1 − ε)^M ≈ e^{−Mε}. Using πr²/A for ε, this becomes

\[
\left(1 - \frac{\pi r^2}{A}\right)^{\!M} \ \approx\ e^{-M\pi r^2/A},
\]

so g(r) = πr²/A.
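The quality of this approximation is easy to check numerically; a small added sketch with arbitrary illustrative values:

```python
import math

# (1 - eps)^M versus exp(-M * eps) for a small eps = pi * r^2 / A.
M, r2_over_A = 10000, 1e-6
eps = math.pi * r2_over_A
print((1 - eps) ** M)       # exact value, about 0.96907
print(math.exp(-M * eps))   # approximation; agrees to about six significant figures
```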
(d) Let R be the error when the source output is represented as the closest quantization
point. Express the distribution function of the random variable R in terms of your answer
to part (c).
[7 points] The distribution function of R is F_R(r) = Pr(R ≤ r) = 1 − Pr(R > r). However,
R > r means that ‖V_j − U‖ > r for all j, 1 ≤ j ≤ M, which is the event whose probability
was found in part (c). Thus

\[
F_R(r) \ =\ 1 - e^{-M\pi r^2/A}.
\]
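An added aside: this distribution function is easy to sample from by inverse-transform sampling, which gives an independent check on the mean square error computed in part (e). A sketch, taking A = 1:

```python
import math
import random

# If X ~ Uniform(0,1), then r = sqrt(-A ln(X) / (pi M)) has CDF 1 - exp(-M pi r^2 / A).
A_area, M = 1.0, 200
random.seed(0)
samples = [math.sqrt(-A_area * math.log(random.random()) / (math.pi * M))
           for _ in range(100_000)]
mean_r2 = sum(r * r for r in samples) / len(samples)
print(mean_r2, A_area / (math.pi * M))   # both ~ 1.59e-3; E[R^2] = A/(pi M)
```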
(e) Find the mean square error. The mean square error here is averaged over both the
source output and the random choice of quantizer points. Compare your result with that
of a quantizer using square quantization regions.
[6 points] From part (d), the probability density of R is

\[
f_R(r) \ =\ \frac{2\pi M r}{A}\, \exp\!\left(\frac{-\pi M r^2}{A}\right).
\]

The mean square error per dimension is then (1/2)∫ r² f_R(r) dr. Substituting y for r²,
this simplifies to

\[
\mathrm{MSE} \ =\ \frac{1}{2}\int_{y=0}^{\infty} \frac{\pi M y}{A} \exp\!\left(\frac{-\pi M y}{A}\right) dy \ =\ \frac{A}{2\pi M}.
\]
[2 points] For the 2D quantizer using square regions, each of area A/M, the MSE per
dimension is (1/12)(A/M). Thus, the random choice of quantization points in 2D is not
as good as the much more straightforward uniform scalar quantizer.
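A direct Monte Carlo simulation (an added sketch; A is taken as the unit square and M is large enough that edge effects are small, though they bias the estimate slightly upward) reproduces both constants:

```python
import math
import random

# Per-dimension MSE of a random 2D quantizer on the unit square (A = 1).
M, trials = 200, 5000
random.seed(1)
total = 0.0
for _ in range(trials):
    points = [(random.random(), random.random()) for _ in range(M)]
    ux, uy = random.random(), random.random()
    total += min((vx - ux)**2 + (vy - uy)**2 for vx, vy in points)
mse = total / (2 * trials)           # divide by 2 for MSE per dimension
print(mse)                           # ~ 1/(2 pi M) = 7.96e-4
print(1 / (2 * math.pi * M))         # random-quantizer prediction
print(1 / (12 * M))                  # square-grid quantizer: 4.17e-4, smaller
```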
Some students attempted to find the distribution of R2 and then find its mean. While
this is not necessary here, it is useful to know. See the solutions to HW 5.2(e) to see how
to do this.
Several students were confused regarding the limits of integration in this part (and the
range of r in defining the distribution in part (d)). If we derived a precise distribution
for R, then r clearly could not exceed the diameter of the region A (the diameter is simply
the largest distance between any two points in A). However, in part (d), we approximated
the distribution function by 1 − e^{−Mπr²/A}. For this to be a valid distribution, r must range
from 0 to infinity. This is why the limits of integration are 0 to infinity.
The other way of looking at this is that under the assumption r² ≪ A, the diameter
of A is much larger than r. This, combined with the fact that the distribution falls off
exponentially in r, is why integrating from 0 to infinity is justified.
Some of you forgot to normalize both MSEs per dimension, leading to an incorrect
comparison of the random and square cases.
Problem Q-4 (15 points)
(a) Express the coefficients {u_k} as inner products involving u(t), {θ_k}, and {A_k}.
[7 points] As done in the notes several times, we have

\[
\int_{-\infty}^{\infty} u(t)\,\theta_j^*(t)\,dt \ =\ \int_{-\infty}^{\infty} \sum_k u_k\,\theta_k(t)\,\theta_j^*(t)\,dt \ =\ \sum_k u_k \int_{-\infty}^{\infty} \theta_k(t)\,\theta_j^*(t)\,dt \ =\ u_j A_j.
\]

Thus,

\[
u_k \ =\ \frac{1}{A_k}\,\langle u, \theta_k \rangle.
\]
(b) Find the energy ‖u‖² = ∫_{−∞}^{∞} |u(t)|² dt in the simplest form you can in terms of {u_k},
{θ_k}, and {A_k}.
[8 points] Again, as before,

\[
\int_{-\infty}^{\infty} u(t)\,u^*(t)\,dt \ =\ \int_{-\infty}^{\infty} u(t) \sum_k u_k^*\,\theta_k^*(t)\,dt \ =\ \sum_k u_k^* \int_{-\infty}^{\infty} u(t)\,\theta_k^*(t)\,dt \ =\ \sum_k |u_k|^2 A_k.
\]
Common errors in this part had to do with scaling. People either forgot to scale or
assumed that all of the A_j's were identical.
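An added numeric check of both parts, with hypothetical real-valued waveforms (so the conjugates drop out) and inner products approximated by Riemann sums:

```python
import math

# Orthogonal but not orthonormal waveforms on [0, 1): A_1 = 1, A_2 = A_3 = 1/2.
N = 10_000
dt = 1.0 / N
ts = [(i + 0.5) * dt for i in range(N)]
theta = [
    [1.0] * N,                                  # theta_1, A_1 = 1
    [math.cos(2 * math.pi * t) for t in ts],    # theta_2, A_2 = 1/2
    [math.sin(2 * math.pi * t) for t in ts],    # theta_3, A_3 = 1/2
]

def inner(f, g):
    """Riemann-sum approximation of the inner product <f, g>."""
    return sum(a * b for a, b in zip(f, g)) * dt

A = [inner(th, th) for th in theta]             # the A_k (here 1, 1/2, 1/2)
coeffs = [2.0, -1.0, 0.5]                       # true u_k used to build u(t)
u = [sum(c * th[i] for c, th in zip(coeffs, theta)) for i in range(N)]

u_hat = [inner(u, th) / A_k for th, A_k in zip(theta, A)]   # part (a)
print(u_hat)                                    # ~ [2.0, -1.0, 0.5]
print(inner(u, u))                              # part (b): energy = 4.625 ...
print(sum(c * c * A_k for c, A_k in zip(u_hat, A)))  # ... equals sum |u_k|^2 A_k
```

Forgetting to divide by A_k in part (a), or dropping the A_k weights in part (b), reproduces exactly the scaling errors described above.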
Cite as: Robert Gallager, course materials for 6.450 Principles of Digital Communications I, Fall 2006. MIT OpenCourseWare
(http://ocw.mit.edu/), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY].