EE/Stats 376A Winter 2017
Lecture 14 — February 28
Lecturer: David Tse Scribe: Sagnik M, Vivek B
14.1 Outline
• Gaussian channel and capacity
• Information measures for continuous random variables
14.2 Recap
So far, we have focused only on communication channels with a discrete alphabet. For
instance, the binary erasure channel (BEC) is a good model for links and routes in a network
where packets of data are transferred correctly or lost entirely. The binary symmetric channel
(BSC), on the other hand, is quite an oversimplification of errors in a physical channel and is therefore not a very realistic model.
14.3.2 Capacity
Theorem 1. In a Gaussian channel with average power constrained by P and noise distributed as $\mathcal{N}(0, \sigma^2)$, the channel capacity C is given by $\frac{1}{2}\log\left(1 + \frac{P}{\sigma^2}\right)$.
Theorem 1 is a cornerstone result of information theory, and we will devote this section to analyzing the sphere-packing structure of the Gaussian channel and giving an intuitive explanation of the theorem above. In the next lecture we will prove the theorem rigorously.
The sphere-packing argument for the Gaussian channel is characterized by the noise and the output spheres. In particular, the ratio of the volume of the output sphere to the volume of an 'average' noise sphere gives us an upper bound on the codebook size, $2^{nR}$.
Noise Spheres: Consider any input codeword X of length n and the received vector Y, where $Y_i = X_i + Z_i$, $Z_i \sim \mathcal{N}(0, \sigma^2)$. Then the radius of the noise sphere is the distance between the input vector X and the received vector Y, equal to $\sqrt{\sum_{i=1}^n (X_i - Y_i)^2} = \sqrt{\sum_{i=1}^n Z_i^2}$.
Now, $E[Z_i^2] = \mathrm{Var}(Z_i) + E[Z_i]^2 = \sigma^2$, and since the $Z_i$'s are i.i.d., by the Weak Law of Large Numbers $\sum_{i=1}^n Z_i^2 \approx n\sigma^2$. Thus the 'average' radius of a noise sphere is approximately $\sqrt{n\sigma^2}$.
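As a quick numerical illustration of this concentration (a small simulation sketch, not part of the original notes; the block length n and the noise variance are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2 = 10_000, 2.0                         # block length and noise variance (illustrative)

Z = rng.normal(0.0, np.sqrt(sigma2), size=n)    # i.i.d. N(0, sigma^2) noise samples
radius_sq = np.sum(Z**2)                        # squared radius of the noise sphere

print(f"sum of Z_i^2             = {radius_sq:.1f}")      # concentrates near n * sigma^2
print(f"n * sigma^2              = {n * sigma2:.1f}")
print(f"sqrt(n * sigma^2) radius = {(n * sigma2) ** 0.5:.1f}")
```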
Output Sphere: Because of the power constraint P, the input sphere has radius at most $\sqrt{nP}$. The noise expands the input sphere into the output sphere by $n\sigma^2$ in energy. Since the output vectors have energy no greater than $nP + n\sigma^2$, they lie in a sphere of radius $\sqrt{n(P + \sigma^2)}$.
Therefore, the number of non-intersecting noise spheres in the output sphere is at most
$$\frac{C_n \left(n(P + \sigma^2)\right)^{n/2}}{C_n \left(n\sigma^2\right)^{n/2}} = \left(1 + \frac{P}{\sigma^2}\right)^{n/2},$$
where $C_n$ is the constant in the volume $C_n r^n$ of an n-dimensional sphere of radius r.
For decoding with a low probability of error,
$$2^{nR} \le \left(1 + \frac{P}{\sigma^2}\right)^{n/2} \implies R \le \frac{1}{2}\log\left(1 + \frac{P}{\sigma^2}\right).$$
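For instance (an illustrative numerical sketch, not from the lecture; the values of P and $\sigma^2$ are arbitrary), the capacity expression can be evaluated directly:

```python
import numpy as np

def gaussian_capacity(P, sigma2):
    """Gaussian channel capacity in bits per channel use: (1/2) * log2(1 + P / sigma2)."""
    return 0.5 * np.log2(1.0 + P / sigma2)

# Example: SNR = P / sigma^2 = 10 gives roughly 1.73 bits per channel use.
print(gaussian_capacity(P=10.0, sigma2=1.0))
```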
Similar to the case of a discrete alphabet, we expect the expression $\frac{1}{2}\log\left(1 + \frac{P}{\sigma^2}\right)$ to be the solution of a mutual information maximization problem with the power constraint (14.1):
$$\max_{f_X} \; I(X; Y) \quad \text{s.t.} \quad E[X^2] \le P, \qquad (14.2)$$
where $I(X; Y)$ is some notion of mutual information between continuous random variables X and Y. A rigorous proof of the result necessitates the introduction of the corresponding information measures for continuous random variables.
[Figure: X → Digital to Analog → Physical Channel → Analog to Digital → Y]
Now, we will show that Definition 2 is sensible by proving that it is approximated by the mutual information of the discretized versions of X and Y. Since X and Y are the limits of arbitrarily fine discretizations, we will have shown that the definition is consistent with our previous definitions.
For ∆ > 0, define
$$X^\Delta = i\Delta \quad \text{if } i\Delta \le X < (i + 1)\Delta,$$
$$Y^\Delta = i\Delta \quad \text{if } i\Delta \le Y < (i + 1)\Delta.$$
Now, for small $\Delta$, we have $p(X^\Delta) \approx \Delta f(X)$, $p(Y^\Delta) \approx \Delta f(Y)$, and $p(X^\Delta, Y^\Delta) \approx \Delta^2 f(X, Y)$.
From the above equations we see that, as $\Delta$ becomes arbitrarily small, $I(X^\Delta; Y^\Delta)$ approximates $I(X; Y)$ to arbitrary precision. Therefore, the definition of mutual information for continuous random variables is consistent with that for discrete ones.
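To make the approximation step explicit (a short expansion of the argument above, added here rather than taken from the original notes), write out the discrete mutual information and note that the $\Delta$ factors cancel inside the logarithm:
$$I(X^\Delta; Y^\Delta) = \sum_{i,j} p(x_i^\Delta, y_j^\Delta) \log \frac{p(x_i^\Delta, y_j^\Delta)}{p(x_i^\Delta)\, p(y_j^\Delta)} \approx \sum_{i,j} \Delta^2 f(x_i, y_j) \log \frac{\Delta^2 f(x_i, y_j)}{\Delta f(x_i)\, \Delta f(y_j)} = \sum_{i,j} \Delta^2 f(x_i, y_j) \log \frac{f(x_i, y_j)}{f(x_i)\, f(y_j)} \xrightarrow{\;\Delta \to 0\;} \iint f(x, y) \log \frac{f(x, y)}{f(x)\, f(y)}\, dx\, dy = I(X; Y).$$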
Now, of course there are similarities and dissimilarities between entropy and differential
entropy. We explore some of these in detail:
1. $H(X) \ge 0$ for any discrete random variable X. However, $h(X)$ need not be non-negative for every continuous random variable. This is because a probability mass function is always at most 1, but a density function can be arbitrarily large (see the example after this list).
2. $H(X)$ is label-invariant. However, $h(X)$ need not be label-invariant. The simplest change of labels, say $Y = aX$ for a scalar a, proves this. Indeed, the density function $f_Y$ is given by $f_Y(y) = \frac{1}{|a|} f_X\left(\frac{y}{a}\right)$, and so
$$h(Y) = E\left[\log \frac{1}{f_Y(Y)}\right] = E\left[\log \frac{|a|}{f_X(Y/a)}\right] = \log|a| + E\left[\log \frac{1}{f_X(X)}\right] = h(X) + \log|a|.$$
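As a concrete illustration of the first point above (an example added here, not from the original notes): for $X \sim \mathrm{Unif}[0, a]$ with $0 < a < 1$, the density $f(x) = 1/a > 1$ on $[0, a]$, so
$$h(X) = E\left[\log_2 \frac{1}{f(X)}\right] = \log_2 a < 0, \qquad \text{e.g., } a = \tfrac{1}{2} \text{ gives } h(X) = -1 \text{ bit}.$$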
2. Normal Distribution. Suppose $X \sim \mathcal{N}(0, 1)$. Then $f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ for every x, and $\log_2 \frac{1}{f(x)} = \log_2 \sqrt{2\pi} + \frac{x^2}{2}\log_2 e$. The differential entropy of X is
$$h(X) = E\left[\log_2 \frac{1}{f(X)}\right] = \log_2 \sqrt{2\pi} + \frac{\log_2 e}{2} E[X^2]$$
$$\implies h(X) = \frac{1}{2}\log_2 2\pi + \frac{\log_2 e}{2}\left(\mathrm{Var}(X) + E[X]^2\right) = \frac{1}{2}\log_2 2\pi e.$$
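As a quick numerical sanity check (a sketch using SciPy, not part of the original notes), the closed form $\frac{1}{2}\log_2 2\pi e \approx 2.047$ bits can be compared against the differential entropy reported by scipy.stats, which is returned in nats:

```python
import numpy as np
from scipy.stats import norm

# Differential entropy of N(0, 1): closed form vs. SciPy (converted from nats to bits).
closed_form = 0.5 * np.log2(2 * np.pi * np.e)
scipy_bits = norm(loc=0, scale=1).entropy() / np.log(2)

print(f"closed form: {closed_form:.6f} bits")   # ~2.047096
print(f"scipy      : {scipy_bits:.6f} bits")    # ~2.047096
```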