Chap 3 Capacity of AWGN Channels

The document summarizes the proof that the capacity of an additive white Gaussian noise (AWGN) channel is equal to W log2(1+SNR) bits per second, where W is the bandwidth and SNR is the signal-to-noise ratio. The proof uses Shannon's random coding technique and typical set decoding. It shows that the probability of error goes to zero exponentially with code length N as long as the code rate is less than the channel capacity. This establishes that error-free transmission is possible at rates below capacity.

Uploaded by

Wesley Liverpool

Chapter 3

Capacity of AWGN channels

In this chapter we prove that the capacity of an AWGN channel with bandwidth $W$ and signal-to-noise ratio SNR is $W \log_2(1 + \mathrm{SNR})$ bits per second (b/s). The proof that reliable transmission is possible at any rate less than capacity is based on Shannon's random code ensemble, typical-set decoding, the Chernoff-bound law of large numbers, and a fundamental result of large-deviation theory. We also sketch a geometric proof of the converse. Readers who are prepared to accept the channel capacity formula without proof may skip this chapter.

3.1 Outline of proof of the capacity theorem

The first step in proving the channel capacity theorem or its converse is to use the results of Chapter 2 to replace a continuous-time AWGN channel model $Y(t) = X(t) + N(t)$ with bandwidth $W$ and signal-to-noise ratio SNR by an equivalent discrete-time channel model $\mathbf{Y} = \mathbf{X} + \mathbf{N}$ with a symbol rate of $2W$ real symbols/s and the same SNR, without loss of generality or optimality.

We then wish to prove that arbitrarily reliable transmission can be achieved on the discrete-time channel at any rate (nominal spectral efficiency)

$$\rho < C_{[b/2D]} = \log_2(1 + \mathrm{SNR}) \text{ b/2D}.$$

This will prove that reliable transmission can be achieved on the continuous-time channel at any data rate

$$R < C_{[b/s]} = W C_{[b/2D]} = W \log_2(1 + \mathrm{SNR}) \text{ b/s}.$$

We will prove this result by use of Shannon's random code ensemble and a suboptimal decoding technique called typical-set decoding.

Shannon's random code ensemble may be defined as follows. Let $S_x = P/2W$ be the allowable average signal energy per symbol (dimension), let $\rho$ be the data rate in b/2D, and let $N$ be the code block length in symbols. A block code $\mathcal{C}$ of length $N$, rate $\rho$, and average energy $S_x$ per dimension is then a set of $M = 2^{\rho N/2}$ real sequences (codewords) $\mathbf{c}$ of length $N$ such that the expected value of $\|\mathbf{c}\|^2$ under an equiprobable distribution over $\mathcal{C}$ is $N S_x$.

For example, the three 16-QAM signal sets shown in Figure 3 of Chapter 1 may be regarded as three block codes of length 2 and rate 4 b/2D with average energies per dimension of $S_x = 5$, $6.75$ and $4.375$, respectively.


In Shannon's random code ensemble, every symbol $c_k$ of every codeword $\mathbf{c} \in \mathcal{C}$ is chosen independently at random from a Gaussian ensemble with mean 0 and variance $S_x$. Thus the average energy per dimension over the ensemble of codes is $S_x$, and by the law of large numbers the average energy per dimension of any particular code in the ensemble is highly likely to be close to $S_x$.

We consider the probability of error under the following scenario. A code $\mathcal{C}$ is selected randomly from the ensemble as above, and then a particular codeword $\mathbf{c}_0$ is selected for transmission. The channel adds a noise sequence $\mathbf{n}$ from a Gaussian ensemble with mean 0 and variance $S_n = N_0/2$ per symbol. At the receiver, given $\mathbf{y} = \mathbf{c}_0 + \mathbf{n}$ and the code $\mathcal{C}$, a typical-set decoder implements the following decision rule (where $\varepsilon$ is some small positive number):

- If there is one and only one codeword $\mathbf{c} \in \mathcal{C}$ within squared distance $N(S_n \pm \varepsilon)$ of the received sequence $\mathbf{y}$, then decide on $\mathbf{c}$;
- Otherwise, give up.

A decision error can occur only if one of the following two events occurs:

- The squared distance $\|\mathbf{y} - \mathbf{c}_0\|^2$ between $\mathbf{y}$ and the transmitted codeword $\mathbf{c}_0$ is not in the range $N(S_n \pm \varepsilon)$;
- The squared distance $\|\mathbf{y} - \mathbf{c}_i\|^2$ between $\mathbf{y}$ and some other codeword $\mathbf{c}_i \ne \mathbf{c}_0$ is in the range $N(S_n \pm \varepsilon)$.

Since $\mathbf{y} - \mathbf{c}_0 = \mathbf{n}$, the probability of the first of these events is the probability that $\|\mathbf{n}\|^2$ is not in the range $N(S_n - \varepsilon) \le \|\mathbf{n}\|^2 \le N(S_n + \varepsilon)$. Since $\mathbf{N} = \{N_k\}$ is an iid zero-mean Gaussian sequence with variance $S_n$ per symbol and $\|\mathbf{N}\|^2 = \sum_k N_k^2$, this probability goes to zero as $N \to \infty$ for any $\varepsilon > 0$ by the weak law of large numbers. In fact, by the Chernoff bound of the next section, this probability goes to zero exponentially with $N$.
For any particular other codeword $\mathbf{c}_i \in \mathcal{C}$, the probability of the second event is the probability that a code sequence drawn according to an iid Gaussian pdf $p_X(x)$ with symbol variance $S_x$ and a received sequence drawn independently according to an iid Gaussian pdf $p_Y(y)$ with symbol variance $S_y = S_x + S_n$ are "typical" of the joint pdf $p_{XY}(x, y) = p_X(x) p_N(y - x)$, where here we define "typical" by the distance $\|\mathbf{x} - \mathbf{y}\|^2$ being in the range $N(S_n \pm \varepsilon)$. According to a fundamental result of large-deviation theory, this probability goes to zero as $e^{-NE}$, where, up to terms of the order of $\varepsilon$, the exponent $E$ is given by the relative entropy (Kullback-Leibler divergence)

$$D(p_{XY} \| p_X p_Y) = \int dx \, dy \; p_{XY}(x, y) \log \frac{p_{XY}(x, y)}{p_X(x) p_Y(y)}.$$

If the logarithm is binary, then this is the mutual information $I(X; Y)$ between the random variables $X$ and $Y$ in bits per dimension (b/D). In the Gaussian case considered here, the mutual information is easily evaluated as

$$I(X; Y) = E_{XY}\left[ -\frac{1}{2} \log_2 2\pi S_n - \frac{(y - x)^2 \log_2 e}{2 S_n} + \frac{1}{2} \log_2 2\pi S_y + \frac{y^2 \log_2 e}{2 S_y} \right] = \frac{1}{2} \log_2 \frac{S_y}{S_n} \text{ b/D}.$$

Since $S_y = S_x + S_n$ and $\mathrm{SNR} = S_x/S_n$, this expression is equal to the claimed capacity in b/D.
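As a quick numerical illustration of the capacity formula, the following minimal Python sketch evaluates $C_{[b/2D]} = \log_2(1 + \mathrm{SNR})$ and $C_{[b/s]} = W C_{[b/2D]}$. The function name `awgn_capacity`, the choice to pass the SNR in dB, and the example values are assumptions made for this illustration only.

```python
import math

def awgn_capacity(snr_db, bandwidth_hz):
    """Return (C in b/2D, C in b/s) for an AWGN channel.

    snr_db and bandwidth_hz are illustrative inputs; the text works with
    the linear ratio SNR = Sx/Sn and the bandwidth W.
    """
    snr = 10.0 ** (snr_db / 10.0)      # dB -> linear power ratio
    c_2d = math.log2(1.0 + snr)        # C[b/2D] = log2(1 + SNR) = 2 I(X;Y)
    c_bps = bandwidth_hz * c_2d        # C[b/s] = W * C[b/2D]
    return c_2d, c_bps

# Example: 20 dB SNR over a 1 MHz channel (hypothetical numbers).
c_2d, c_bps = awgn_capacity(snr_db=20.0, bandwidth_hz=1.0e6)
print(f"C = {c_2d:.3f} b/2D = {c_bps / 1e6:.3f} Mb/s")
```

Note that the capacity in b/D is half of the first returned value, in line with $I(X;Y) = \frac{1}{2} \log_2 (S_y/S_n)$.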


Thus we can say that the probability that any incorrect codeword $\mathbf{c}_i \in \mathcal{C}$ is "typical" with respect to $\mathbf{y}$ goes to zero as $2^{-N(I(X;Y) - \delta(\varepsilon))}$, where $\delta(\varepsilon)$ goes to zero as $\varepsilon \to 0$. By the union bound, the probability that any of the $M - 1 < 2^{\rho N/2}$ incorrect codewords is "typical" with respect to $\mathbf{y}$ is upperbounded by

$$\Pr\{\text{any incorrect codeword "typical"}\} < 2^{\rho N/2} \, 2^{-N(I(X;Y) - \delta(\varepsilon))},$$

which goes to zero exponentially with $N$ provided that $\rho < 2I(X; Y)$ b/2D and $\varepsilon$ is small enough.

In summary, the probabilities of both types of error go to zero exponentially with $N$ provided that

$$\rho < 2I(X; Y) = \log_2(1 + \mathrm{SNR}) = C_{[b/2D]} \text{ b/2D}$$

and $\varepsilon$ is small enough. This proves that an arbitrarily small probability of error can be achieved using Shannon's random code ensemble and typical-set decoding.

To show that there is a particular code of rate $\rho < C_{[b/2D]}$ that achieves an arbitrarily small error probability, we need merely observe that the probability of error over the random code ensemble is the average probability of error over all codes in the ensemble, so there must be at least one code in the ensemble that achieves this performance. More pointedly, if the average error probability is $\Pr(E)$, then no more than a fraction $1/K$ of the codes can achieve error probability worse than $K \Pr(E)$ for any constant $K > 0$; e.g., at least 99% of the codes achieve performance no worse than $100 \Pr(E)$. So we can conclude that almost all codes in the random code ensemble achieve very small error probabilities. Briefly, "almost all codes are good" (when decoded by typical-set or maximum-likelihood decoding).
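The scenario of this section can be tried out on a small scale. The sketch below is a hedged Monte Carlo illustration, not part of the proof: it draws one code from the Gaussian ensemble, sends codeword 0 repeatedly, and applies the typical-set decision rule stated above. The function names and all parameter values ($N = 16$, $\rho = 1$ b/2D, $S_x = 15$, $S_n = 1$, $\varepsilon = 0.7$) are assumptions chosen so that pure Python runs quickly; at such a short block length the asymptotic statements hold only roughly.

```python
import random

def typical_set_decode(y, code, sn, eps):
    """Return the index of the unique codeword whose squared distance from y
    lies in N(Sn - eps) .. N(Sn + eps); return None ("give up") otherwise."""
    n = len(y)
    lo, hi = n * (sn - eps), n * (sn + eps)
    hits = []
    for i, c in enumerate(code):
        d2 = sum((yk - ck) ** 2 for yk, ck in zip(y, c))
        if lo <= d2 <= hi:
            hits.append(i)
    return hits[0] if len(hits) == 1 else None

def simulate_error_rate(n=16, rho=1.0, sx=15.0, sn=1.0, eps=0.7,
                        trials=200, seed=1):
    """Estimate the error rate of one random code with M = 2^(rho*N/2) codewords."""
    rng = random.Random(seed)
    m = round(2 ** (rho * n / 2))
    # Every symbol of every codeword is iid Gaussian with variance Sx.
    code = [[rng.gauss(0.0, sx ** 0.5) for _ in range(n)] for _ in range(m)]
    errors = 0
    for _ in range(trials):
        # Always send codeword 0; the ensemble is symmetric in the codewords.
        y = [ck + rng.gauss(0.0, sn ** 0.5) for ck in code[0]]
        if typical_set_decode(y, code, sn, eps) != 0:
            errors += 1
    return errors / trials

print("estimated error rate:", simulate_error_rate())
```

With these illustrative parameters the rate is well below $\log_2(1 + 15) = 4$ b/2D, so nearly all observed failures should come from the first error event (the noise norm falling outside the typical range) rather than from an incorrect codeword looking typical.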

3.2 Laws of large numbers

The channel capacity theorem is essentially an application of various laws of large numbers.

3.2.1 The Chernoff bound

The weak law of large numbers states that the probability that the sample average of a sequence of $N$ iid random variables differs from the mean by more than $\varepsilon > 0$ goes to zero as $N \to \infty$, no matter how small $\varepsilon$ is. The Chernoff bound shows that this probability goes to zero exponentially with $N$, for arbitrarily small $\varepsilon$.

Theorem 3.1 (Chernoff bound) Let $S_N$ be the sum of $N$ iid real random variables $X_k$, each with the same probability distribution $p_X(x)$ and mean $\bar{X} = E_X[X]$. For $\tau > \bar{X}$, the probability that $S_N \ge N\tau$ is upperbounded by

$$\Pr\{S_N \ge N\tau\} \le e^{-N E_c(\tau)},$$

where the Chernoff exponent $E_c(\tau)$ is given by

$$E_c(\tau) = \max_{s \ge 0} \; s\tau - \mu(s),$$

where $\mu(s)$ denotes the semi-invariant moment-generating function, $\mu(s) = \log E_X[e^{sX}]$.


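Proof. The indicator function $\Phi(S_N \ge N\tau)$ of the event $\{S_N \ge N\tau\}$ is bounded by

$$\Phi(S_N \ge N\tau) \le e^{s(S_N - N\tau)}$$

for any $s \ge 0$. Therefore

$$\Pr\{S_N \ge N\tau\} = \overline{\Phi(S_N \ge N\tau)} \le \overline{e^{s(S_N - N\tau)}}, \quad s \ge 0,$$

where the overbar denotes expectation. Using the facts that $S_N = \sum_k X_k$ and that the $X_k$ are independent, we have

$$\overline{e^{s(S_N - N\tau)}} = \prod_k \overline{e^{s(X_k - \tau)}} = e^{-N(s\tau - \mu(s))},$$

where $\mu(s) = \log \overline{e^{sX}}$. Optimizing the exponent over $s \ge 0$, we obtain the Chernoff exponent

$$E_c(\tau) = \max_{s \ge 0} \; s\tau - \mu(s).$$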
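To make the bound of Theorem 3.1 concrete, here is a minimal sketch for a sum of Bernoulli($p$) variables, for which $\mu(s) = \log(1 - p + p e^s)$. The Bernoulli example, the function names, and the use of a ternary search to maximize the concave function $s\tau - \mu(s)$ are all assumptions made for illustration; any one-dimensional maximizer would do.

```python
import math

def mu(s, p):
    """Semi-invariant moment-generating function log E[e^{sX}] for X ~ Bernoulli(p)."""
    return math.log(1.0 - p + p * math.exp(s))

def chernoff_exponent(tau, p, s_max=50.0, iters=200):
    """Maximize s*tau - mu(s) over 0 <= s <= s_max by ternary search.

    s_max is an assumed search limit; it must exceed the maximizing s.
    """
    lo, hi = 0.0, s_max
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if a * tau - mu(a, p) < b * tau - mu(b, p):
            lo = a
        else:
            hi = b
    s = 0.5 * (lo + hi)
    return s * tau - mu(s, p)

def exact_tail(n, tau, p):
    """Exact Pr{S_N >= N*tau} for a sum of N iid Bernoulli(p) variables."""
    k_min = math.ceil(n * tau - 1e-12)
    return sum(math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
               for k in range(k_min, n + 1))

n, tau, p = 100, 0.7, 0.5
ec = chernoff_exponent(tau, p)
print("Chernoff bound:", math.exp(-n * ec), " exact tail:", exact_tail(n, tau, p))
```

For the Bernoulli case the exponent has the known closed form $\tau \log(\tau/p) + (1 - \tau) \log((1 - \tau)/(1 - p))$, which gives a convenient check on the numerical maximization.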

We next show that the Chernoff exponent is positive:

Theorem 3.2 (Positivity of Chernoff exponent) The Chernoff exponent $E_c(\tau)$ is positive when $\tau > \bar{X}$, provided that the random variable $X$ is nondeterministic.

Proof. Define $X(s)$ as a random variable with the same alphabet as $X$, but with the tilted probability density function $q(x, s) = p(x) e^{sx - \mu(s)}$. This is a valid pdf because $q(x, s) \ge 0$ and

$$\int q(x, s) \, dx = e^{-\mu(s)} \int e^{sx} p(x) \, dx = e^{-\mu(s)} e^{\mu(s)} = 1.$$

Evidently $\mu(0) = \log E_X[1] = 0$, so $q(x, 0) = p(x)$ and $X(0) = X$.

Define the moment-generating (partition) function

$$Z(s) = e^{\mu(s)} = E_X[e^{sX}] = \int e^{sx} p(x) \, dx.$$

Now it is easy to see that

$$Z'(s) = \int x e^{sx} p(x) \, dx = e^{\mu(s)} \int x \, q(x, s) \, dx = Z(s) \bar{X}(s).$$

Similarly,

$$Z''(s) = \int x^2 e^{sx} p(x) \, dx = Z(s) \overline{X^2}(s).$$

Consequently, from $\mu(s) = \log Z(s)$, we have

$$\mu'(s) = \frac{Z'(s)}{Z(s)} = \bar{X}(s);$$

$$\mu''(s) = \frac{Z''(s)}{Z(s)} - \left( \frac{Z'(s)}{Z(s)} \right)^2 = \overline{X^2}(s) - \bar{X}(s)^2.$$

Thus the second derivative $\mu''(s)$ is the variance of $X(s)$, which must be strictly positive unless $X(s)$ and thus $X$ is deterministic.


We conclude that if $X$ is a nondeterministic random variable with mean $\bar{X}$, then $\mu(s)$ is a strictly convex function of $s$ that equals 0 at $s = 0$ and whose derivative at $s = 0$ is $\bar{X}$. It follows that the function $s\tau - \mu(s)$ is a strictly concave function of $s$ that equals 0 at $s = 0$ and whose derivative at $s = 0$ is $\tau - \bar{X}$. Thus if $\tau > \bar{X}$, then the function $s\tau - \mu(s)$ has a unique maximum which is strictly positive.

Exercise 1. Show that if $X$ is a deterministic random variable (i.e., the probability that $X$ equals its mean $\bar{X}$ is 1) and $\tau > \bar{X}$, then $\Pr\{S_N \ge N\tau\} = 0$.

The proof of this theorem shows that the general form of the function $f(s) = s\tau - \mu(s)$ when $X$ is nondeterministic is as shown in Figure 1. The second derivative $f''(s)$ is negative everywhere, so the function $f(s)$ is strictly concave and has a unique maximum $E_c(\tau)$. The slope $f'(s) = \tau - \bar{X}(s)$ therefore decreases continually from its value $f'(0) = \tau - \bar{X} > 0$ at $s = 0$. The slope becomes equal to 0 at the value of $s$ for which $\tau = \bar{X}(s)$; in other words, to find the maximum of $f(s)$, keep increasing the "tilt" until the tilted mean $\bar{X}(s)$ is equal to $\tau$. If we denote this value of $s$ by $s^*(\tau)$, then we obtain the following parametric equations for the Chernoff exponent:

$$E_c(\tau) = s^*(\tau)\tau - \mu(s^*(\tau)); \qquad \tau = \bar{X}(s^*(\tau)).$$
[Figure 1: plot of $f(s)$ versus $s$. The curve starts at 0 with slope $\tau - \bar{X}$ at $s = 0$ and reaches its maximum $E_c(\tau)$, where the slope is 0, at $s = s^*(\tau)$.]

Figure 1. General form of function $f(s) = s\tau - \mu(s)$ when $\tau > \bar{X}$.

We will show below that the Chernoff exponent $E_c(\tau)$ is the correct exponent, in the sense that

$$\lim_{N \to \infty} \frac{\log \Pr\{S_N \ge N\tau\}}{N} = -E_c(\tau).$$

The proof will be based on a fundamental theorem of large-deviation theory.

We see that finding the Chernoff exponent is an exercise in convex optimization. In convex optimization theory, $E_c(\tau)$ and $\mu(s)$ are called conjugate functions. It is easy to show from the properties of $\mu(s)$ that $E_c(\tau)$ is a continuous, strictly convex function of $\tau$ that equals 0 at $\tau = \bar{X}$ and whose derivative at $\tau = \bar{X}$ is 0.
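The "increase the tilt until the tilted mean equals $\tau$" recipe can be sketched numerically. The example below assumes a discrete random variable (a fair die, chosen only for illustration), so that the integrals defining $Z(s)$ become sums; the function names and the bisection limit `s_hi` are likewise assumptions of this sketch.

```python
import math

def tilted(xs, ps, s):
    """Return the tilted pmf q(x, s) = p(x) e^{s x - mu(s)} and mu(s) = log Z(s)."""
    z = sum(p * math.exp(s * x) for x, p in zip(xs, ps))   # partition function Z(s)
    qs = [p * math.exp(s * x) / z for x, p in zip(xs, ps)]
    return qs, math.log(z)

def tilted_mean_var(xs, ps, s):
    """Mean and variance of the tilted variable X(s), i.e. mu'(s) and mu''(s)."""
    qs, _ = tilted(xs, ps, s)
    mean = sum(q * x for x, q in zip(xs, qs))
    var = sum(q * (x - mean) ** 2 for x, q in zip(xs, qs))
    return mean, var

def solve_tilt(xs, ps, tau, s_hi=50.0, iters=200):
    """Bisection for s*(tau): the tilted mean increases with s because mu''(s) > 0."""
    lo, hi = 0.0, s_hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tilted_mean_var(xs, ps, mid)[0] < tau:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Fair die with mean 3.5; ask for the exponent of Pr{S_N >= 4.5 N}.
xs = [1, 2, 3, 4, 5, 6]
ps = [1.0 / 6.0] * 6
tau = 4.5
s_star = solve_tilt(xs, ps, tau)
ec = s_star * tau - tilted(xs, ps, s_star)[1]   # Ec(tau) = s* tau - mu(s*)
print("s* =", s_star, " Ec(tau) =", ec)
```

The pair `(s_star, ec)` is one point on the parametric curve $E_c(\tau) = s^*(\tau)\tau - \mu(s^*(\tau))$, $\tau = \bar{X}(s^*(\tau))$.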

3.2.2 Chernoff bounds for functions of rvs

If $g: \mathcal{X} \to \mathbb{R}$ is any real-valued function defined on the alphabet $\mathcal{X}$ of a random variable $X$, then $g(X)$ is a real random variable. If $\{X_k\}$ is a sequence of iid random variables $X_k$ with the same distribution as $X$, then $\{g(X_k)\}$ is a sequence of iid random variables $g(X_k)$ with the same distribution as $g(X)$. The Chernoff bound thus applies to the sequence $\{g(X_k)\}$, and shows that the probability that the sample mean $\frac{1}{N} \sum_k g(X_k)$ exceeds $\tau$ goes to zero exponentially with $N$ as $N \to \infty$ whenever $\tau > \overline{g(X)}$.


Let us consider any finite set $\{g_j\}$ of such functions $g_j: \mathcal{X} \to \mathbb{R}$. Because the Chernoff bound decreases exponentially with $N$, we can conclude that the probability that any of the sample means $\frac{1}{N} \sum_k g_j(X_k)$ exceeds its corresponding expectation $\overline{g_j(X)}$ by a given fixed $\varepsilon > 0$ goes to zero exponentially with $N$ as $N \to \infty$.

We may define a sequence $\{X_k\}$ to be $\varepsilon$-typical with respect to a function $g_j: \mathcal{X} \to \mathbb{R}$ if $\frac{1}{N} \sum_k g_j(X_k) < \overline{g_j(X)} + \varepsilon$. We can thus conclude that the probability that $\{X_k\}$ is not $\varepsilon$-typical with respect to any finite set $\{g_j\}$ of functions $g_j$ goes to zero exponentially with $N$ as $N \to \infty$.

A simple application of this result is that the probability that the sample mean $\frac{1}{N} \sum_k g_j(X_k)$ is not in the range $\overline{g_j(X)} \pm \varepsilon$ goes to zero exponentially with $N$ as $N \to \infty$ for any $\varepsilon > 0$, because this probability is the sum of the two probabilities

$$\Pr\Big\{\sum_k -g_j(X_k) \ge N(-\overline{g_j(X)} + \varepsilon)\Big\} \quad \text{and} \quad \Pr\Big\{\sum_k g_j(X_k) \ge N(\overline{g_j(X)} + \varepsilon)\Big\}.$$

More generally, if the alphabet $\mathcal{X}$ is finite, then by considering the indicator functions of each possible value of $X$ we can conclude that the probability that all observed relative frequencies in a sequence are not within $\varepsilon$ of the corresponding probabilities goes to zero exponentially with $N$ as $N \to \infty$. Similarly, for any alphabet $\mathcal{X}$, we can conclude that the probability that any finite number of sample moments $\frac{1}{N} \sum_k X_k^m$ are not within $\varepsilon$ of the corresponding expected moments $\overline{X^m}$ goes to zero exponentially with $N$ as $N \to \infty$.

In summary, the Chernoff bound law of large numbers allows us to say that as $N \to \infty$ we will almost surely observe a sample sequence $\mathbf{x}$ which is typical in every (finite) way that we might specify.

3.2.3 Asymptotic equipartition principle

One consequence of any law of large numbers is the asymptotic equipartition principle (AEP): as $N \to \infty$, the observed sample sequence $\mathbf{x}$ of an iid sequence whose elements are chosen according to a random variable $X$ will almost surely be such that $p_X(\mathbf{x}) \approx 2^{-N H(X)}$, where $H(X) = E_X[-\log_2 p(x)]$. If $X$ is discrete, then $p_X(x)$ is its probability mass function (pmf) and $H(X)$ is its entropy; if $X$ is continuous, then $p_X(x)$ is its probability density function (pdf) and $H(X)$ is its differential entropy.

The AEP is proved by observing that $-\log_2 p_X(\mathbf{x})$ is a sum of iid random variables $-\log_2 p_X(x_k)$, so the probability that $-\log_2 p_X(\mathbf{x})$ differs from its mean $N H(X)$ by more than $N\varepsilon$, for any $\varepsilon > 0$, goes to zero as $N \to \infty$. The Chernoff bound shows that this probability in fact goes to zero exponentially with $N$.

A consequence of the AEP is that the set $T_\varepsilon$ of all sequences $\mathbf{x}$ that are $\varepsilon$-typical with respect to the function $-\log_2 p_X(x)$ has a total probability that approaches 1 as $N \to \infty$. Since for all sequences $\mathbf{x} \in T_\varepsilon$ we have $p_X(\mathbf{x}) \approx 2^{-N H(X)}$ (i.e., the probability distribution $p_X(\mathbf{x})$ is approximately uniform over $T_\varepsilon$), this implies that the "size" $|T_\varepsilon|$ of $T_\varepsilon$ is approximately $2^{N H(X)}$. In the discrete case, the "size" $|T_\varepsilon|$ is the number of sequences in $T_\varepsilon$, whereas in the continuous case $|T_\varepsilon|$ is the volume of $T_\varepsilon$.

In summary, the AEP implies that as $N \to \infty$ the observed sample sequence $\mathbf{x}$ will almost surely lie in an $\varepsilon$-typical set $T_\varepsilon$ of size $\approx 2^{N H(X)}$, and within that set the probability distribution $p_X(\mathbf{x})$ will be approximately uniform.
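The AEP is easy to observe empirically for a discrete source. The following sketch, in which the three-symbol pmf and the function names are assumptions chosen for illustration, draws iid sequences and compares $-\frac{1}{N} \log_2 p_X(\mathbf{x})$ with the entropy $H(X)$.

```python
import math
import random

def entropy_bits(pmf):
    """H(X) = E[-log2 p(X)] for a pmf given as a dict symbol -> probability."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0.0)

def sample_log_prob_rate(pmf, n, rng):
    """Draw an iid sequence x of length n and return -(1/n) log2 p(x)."""
    symbols = list(pmf)
    weights = [pmf[a] for a in symbols]
    x = rng.choices(symbols, weights=weights, k=n)
    return -sum(math.log2(pmf[a]) for a in x) / n

pmf = {"a": 0.5, "b": 0.25, "c": 0.25}   # hypothetical source with H(X) = 1.5 bits
rng = random.Random(0)
for n in (10, 100, 10000):
    print(n, sample_log_prob_rate(pmf, n, rng), "vs H(X) =", entropy_bits(pmf))
```

As $N$ grows, the printed rates should cluster ever more tightly around $H(X)$, which is the statement $p_X(\mathbf{x}) \approx 2^{-N H(X)}$ in logarithmic form.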


3.2.4 Fundamental theorem of large-deviation theory

As another application of the law of large numbers, we prove a fundamental theorem of large-deviation theory. A rough statement of this result is as follows: if an iid sequence $\mathbf{X}$ is chosen according to a probability distribution $q(x)$, then the probability that the sequence will be typical of a second probability distribution $p(x)$ is approximately

$$\Pr\{\mathbf{x} \text{ typical for } p \mid q\} \approx e^{-N D(p\|q)},$$

where the exponent $D(p\|q)$ denotes the relative entropy (Kullback-Leibler divergence)

$$D(p\|q) = E_p\left[\log \frac{p(x)}{q(x)}\right] = \int_{\mathcal{X}} dx \; p(x) \log \frac{p(x)}{q(x)}.$$

Again, $p(x)$ and $q(x)$ denote pmfs in the discrete case and pdfs in the continuous case; we use notation that is appropriate for the continuous case.

Exercise 2 (Gibbs' inequality).
(a) Prove that for $x > 0$, $\log x \le x - 1$, with equality if and only if $x = 1$.
(b) Prove that for any pdfs $p(x)$ and $q(x)$ over $\mathcal{X}$, $D(p\|q) \ge 0$, with equality if and only if $p(x) = q(x)$.

Given $p(x)$ and $q(x)$, we will now define a sequence $\mathbf{x}$ to be $\varepsilon$-typical with regard to $\log p(x)/q(x)$ if the log likelihood ratio $\lambda(\mathbf{x}) = \log p(\mathbf{x})/q(\mathbf{x})$ is in the range $N(D(p\|q) \pm \varepsilon)$, where $D(p\|q) = E_p[\lambda(x)]$ is the mean of $\lambda(x) = \log p(x)/q(x)$ under $p(x)$. Thus an iid sequence $\mathbf{X}$ chosen according to $p(x)$ will almost surely be $\varepsilon$-typical by this definition.

The desired result can then be stated as follows:

Theorem 3.3 (Fundamental theorem of large-deviation theory) Given two probability distributions $p(x)$ and $q(x)$ on a common alphabet $\mathcal{X}$, for any $\varepsilon > 0$, the probability that an iid random sequence $\mathbf{X}$ drawn according to $q(x)$ is $\varepsilon$-typical for $p(x)$, in the sense that $\log p(\mathbf{x})/q(\mathbf{x})$ is in the range $N(D(p\|q) \pm \varepsilon)$, is bounded by

$$(1 - \delta(N)) e^{-N(D(p\|q) + \varepsilon)} \le \Pr\{\mathbf{x} \text{ typical for } p \mid q\} \le e^{-N(D(p\|q) - \varepsilon)},$$

where $\delta(N) \to 0$ as $N \to \infty$.

Proof. Define the $\varepsilon$-typical region

$$T_\varepsilon = \left\{\mathbf{x} \;\Big|\; N(D(p\|q) - \varepsilon) \le \log \frac{p(\mathbf{x})}{q(\mathbf{x})} \le N(D(p\|q) + \varepsilon)\right\}.$$

By any law of large numbers, the probability that $\mathbf{X}$ will fall in $T_\varepsilon$ goes to 1 as $N \to \infty$; i.e.,

$$1 - \delta(N) \le \int_{T_\varepsilon} d\mathbf{x} \; p(\mathbf{x}) \le 1,$$

where $\delta(N) \to 0$ as $N \to \infty$. It follows that

$$\int_{T_\varepsilon} d\mathbf{x} \; q(\mathbf{x}) \le \int_{T_\varepsilon} d\mathbf{x} \; p(\mathbf{x}) e^{-N(D(p\|q) - \varepsilon)} \le e^{-N(D(p\|q) - \varepsilon)};$$

$$\int_{T_\varepsilon} d\mathbf{x} \; q(\mathbf{x}) \ge \int_{T_\varepsilon} d\mathbf{x} \; p(\mathbf{x}) e^{-N(D(p\|q) + \varepsilon)} \ge (1 - \delta(N)) e^{-N(D(p\|q) + \varepsilon)}.$$


Since we can choose an arbitrarily small $\varepsilon > 0$ and $\delta(N) > 0$, it follows that the exponent $D(p\|q)$ is the correct exponent for this probability, in the sense that

$$\lim_{N \to \infty} \frac{\log \Pr\{\mathbf{x} \text{ typical for } p \mid q\}}{N} = -D(p\|q).$$

Exercise 3 (Generalization of Theorem 3.3).
(a) Generalize Theorem 3.3 to the case in which $q(x)$ is a general function over $\mathcal{X}$. State any necessary restrictions on $q(x)$.
(b) Using $q(x) = 1$ in (a), state and prove a form of the Asymptotic Equipartition Principle.

As an application of Theorem 3.3, we can now prove:

Theorem 3.4 (Correctness of Chernoff exponent) The Chernoff exponent $E_c(\tau)$ is the correct exponent for $\Pr\{S_N \ge N\tau\}$, in the sense that

$$\lim_{N \to \infty} \frac{\log \Pr\{S_N \ge N\tau\}}{N} = -E_c(\tau),$$

where $S_N = \sum_k x_k$ is the sum of $N$ iid nondeterministic random variables drawn according to some distribution $p(x)$ with mean $\bar{X} < \tau$, and $E_c(\tau) = \max_{s \ge 0} s\tau - \mu(s)$ where $\mu(s) = \log \overline{e^{sX}}$.

Proof. Let $s^*$ be the $s$ that maximizes $s\tau - \mu(s)$ over $s \ge 0$. As we have seen above, for $s = s^*$ the tilted random variable $X(s^*)$ with tilted distribution $q(x, s^*) = p(x) e^{s^* x - \mu(s^*)}$ has mean $\bar{X}(s^*) = \tau$, whereas for $s = 0$ the untilted random variable $X(0)$ with untilted distribution $q(x, 0) = p(x)$ has mean $\bar{X}(0) = \bar{X}$.

Let $q(0)$ denote the untilted distribution $q(x, 0) = p(x)$ with mean $\bar{X}(0) = \bar{X}$, and let $q(s^*)$ denote the optimally tilted distribution $q(x, s^*) = p(x) e^{s^* x - \mu(s^*)}$ with mean $\bar{X}(s^*) = \tau$. Then $\log q(x, s^*)/q(x, 0) = s^* x - \mu(s^*)$, so

$$D(q(s^*) \| q(0)) = s^* \tau - \mu(s^*) = E_c(\tau).$$

Moreover, the event that $\mathbf{X}$ is $\varepsilon$-typical with respect to the variable $\log q(x, s^*)/q(x, 0) = s^* x - \mu(s^*)$ under $q(x, 0) = p(x)$ is the event that $s^* S_N - N\mu(s^*)$ is in the range $N(s^* \tau - \mu(s^*) \pm \varepsilon)$, since $\tau$ is the mean of $X$ under $q(x, s^*)$. This event is equivalent to $S_N$ being in the range $N(\tau \pm \varepsilon/s^*)$. Since $\varepsilon$ may be arbitrarily small, it is clear that the correct exponent of the event $\Pr\{S_N \approx N\tau\}$ is $E_c(\tau)$. This event evidently dominates the probability $\Pr\{S_N \ge N\tau\}$, which we have already shown to be upperbounded by $e^{-N E_c(\tau)}$.

Exercise 4 (Chernoff bound $\Rightarrow$ divergence upper bound.) Using the Chernoff bound, prove that for any two distributions $p(x)$ and $q(x)$ over $\mathcal{X}$,

$$\Pr\left\{\log \frac{p(\mathbf{x})}{q(\mathbf{x})} \ge N D(p\|q) \;\Big|\; q\right\} \le e^{-N D(p\|q)}.$$

[Hint: show that the $s$ that maximizes $s\tau - \mu(s)$ is $s = 1$.]
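The two-sided bound of Theorem 3.3 can be checked exactly in a simple discrete case. The sketch below assumes $p$ = Bernoulli($a$) and $q$ = Bernoulli($b$), for which the log likelihood ratio of a sequence depends only on its number of ones $k$, so the probability of the typical set is a finite binomial sum. The Bernoulli setting, the function names, and the parameter values are illustrative assumptions.

```python
import math

def log_binom_pmf(n, k, b):
    """log of C(n, k) b^k (1-b)^(n-k), computed with lgamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(b) + (n - k) * math.log(1.0 - b))

def kl_bernoulli(a, b):
    """D(p||q) in nats for p = Bernoulli(a), q = Bernoulli(b)."""
    return a * math.log(a / b) + (1.0 - a) * math.log((1.0 - a) / (1.0 - b))

def typical_probabilities(n, a, b, eps):
    """Probability of the eps-typical set for p, under q and under p.

    A sequence with k ones has log likelihood ratio
    k log(a/b) + (n-k) log((1-a)/(1-b)); it is typical if this lies in
    N(D - eps) .. N(D + eps).
    """
    d = kl_bernoulli(a, b)
    pr_q = pr_p = 0.0
    for k in range(n + 1):
        llr = k * math.log(a / b) + (n - k) * math.log((1.0 - a) / (1.0 - b))
        if n * (d - eps) <= llr <= n * (d + eps):
            pr_q += math.exp(log_binom_pmf(n, k, b))
            pr_p += math.exp(log_binom_pmf(n, k, a))
    return pr_q, pr_p

n, a, b, eps = 400, 0.7, 0.5, 0.02
d = kl_bernoulli(a, b)
pr_q, pr_p = typical_probabilities(n, a, b, eps)
print("lower bound:", pr_p * math.exp(-n * (d + eps)))
print("Pr under q :", pr_q)
print("upper bound:", math.exp(-n * (d - eps)))
```

Here `pr_p` plays the role of $1 - \delta(N)$ in the theorem, so the printed lower bound is $(1 - \delta(N)) e^{-N(D(p\|q) + \varepsilon)}$ for this particular $N$.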


3.2.5 Proof of the forward part of the capacity theorem

We now prove that with Shannon's random Gaussian code ensemble and with a slightly different definition of typical-set decoding, we can achieve reliable communication at any rate $\rho < C_{[b/2D]} = \log_2(1 + \mathrm{SNR})$ b/2D.

We recall that under this scenario the joint pdf of the channel input $X$ and output $Y$ is

$$p_{XY}(x, y) = p_X(x) p_N(y - x) = \frac{1}{\sqrt{2\pi S_x}} e^{-x^2/2S_x} \frac{1}{\sqrt{2\pi S_n}} e^{-(y - x)^2/2S_n}.$$

Since $Y = X + N$, the marginal probability of $Y$ is

$$p_Y(y) = \frac{1}{\sqrt{2\pi S_y}} e^{-y^2/2S_y},$$

where $S_y = S_x + S_n$. On the other hand, since incorrect codewords are independent of the correct codeword and of the output, the joint pdf of an incorrect codeword symbol $X'$ and of $Y$ is

$$q_{XY}(x', y) = p_X(x') p_Y(y) = \frac{1}{\sqrt{2\pi S_x}} e^{-(x')^2/2S_x} \frac{1}{\sqrt{2\pi S_y}} e^{-y^2/2S_y}.$$

We now redefine typical-set decoding as follows. An output sequence $\mathbf{y}$ will be said to be $\varepsilon$-typical for a code sequence $\mathbf{x}$ if

$$\lambda(\mathbf{x}, \mathbf{y}) = \log \frac{p_{XY}(\mathbf{x}, \mathbf{y})}{p_X(\mathbf{x}) p_Y(\mathbf{y})} \ge N(D(p_{XY} \| p_X p_Y) - \varepsilon).$$

Substituting for the pdfs and recalling that $D(p_{XY} \| p_X p_Y) = \frac{1}{2} \log S_y/S_n$, we find that this is equivalent to

$$\frac{\|\mathbf{y} - \mathbf{x}\|^2}{S_n} \le \frac{\|\mathbf{y}\|^2}{S_y} + 2N\varepsilon.$$

Since $\|\mathbf{y}\|^2/N$ is almost surely very close to its mean $S_y$, this amounts to asking that $\|\mathbf{y} - \mathbf{x}\|^2/N$ be very close to its mean $S_n$ under the hypothesis that $\mathbf{x}$ and $\mathbf{y}$ are drawn according to the joint pdf $p_{XY}(x, y)$. The correct codeword will therefore almost surely meet this test.

According to Exercise 4, the probability that any particular incorrect codeword meets the test

$$\lambda(\mathbf{x}, \mathbf{y}) = \log \frac{p_{XY}(\mathbf{x}, \mathbf{y})}{p_X(\mathbf{x}) p_Y(\mathbf{y})} \ge N D(p_{XY} \| p_X p_Y)$$

is upperbounded by $e^{-N D(p_{XY} \| p_X p_Y)} = 2^{-N I(X;Y)}$. If we relax this test by an arbitrarily small number $\varepsilon > 0$, then by the continuity of the Chernoff exponent, the exponent will decrease by an amount $\delta(\varepsilon)$ which can be made arbitrarily small. Therefore we can assert that the probability that a random output sequence $\mathbf{Y}$ will be $\varepsilon$-typical for a random incorrect sequence $\mathbf{X}$ is upperbounded by

$$\Pr\{\mathbf{Y} \; \varepsilon\text{-typical for } \mathbf{X}\} \le 2^{-N(I(X;Y) - \delta(\varepsilon))},$$

where $\delta(\varepsilon) \to 0$ as $\varepsilon \to 0$.


Now if the random codes have rate $\rho < 2I(X; Y)$ b/2D, then there are $M = 2^{\rho N/2}$ codewords, so by the union bound the total probability of any incorrect codeword being $\varepsilon$-typical is upperbounded by

$$\Pr\{\mathbf{Y} \; \varepsilon\text{-typical for any incorrect } \mathbf{X}\} \le (M - 1) 2^{-N(I(X;Y) - \delta(\varepsilon))} < 2^{-N(I(X;Y) - \rho/2 - \delta(\varepsilon))}.$$

If $\rho < 2I(X; Y)$ and $\varepsilon$ is small enough, then the exponent will be positive and this probability will go to zero as $N \to \infty$.

Thus we have proved the forward part of the capacity theorem: the probability of any kind of error with Shannon's random code ensemble and this variant of typical-set decoding goes to zero as $N \to \infty$, in fact exponentially with $N$.
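To get a feel for how the union bound $2^{-N(I(X;Y) - \rho/2 - \delta(\varepsilon))}$ behaves, the sketch below evaluates its exponent and the block length at which it drops below a target value. The function names and the treatment of $\delta(\varepsilon)$ as a directly supplied number `delta` are assumptions of this illustration; the text only asserts that $\delta(\varepsilon) \to 0$ as $\varepsilon \to 0$.

```python
import math

def union_bound_log2(n, rho, snr, delta=0.0):
    """log2 of the bound 2^{-N (I(X;Y) - rho/2 - delta)}, with I = 0.5*log2(1 + SNR)."""
    i_xy = 0.5 * math.log2(1.0 + snr)
    return -n * (i_xy - rho / 2.0 - delta)

def block_length_for(target, rho, snr, delta=0.0):
    """Smallest N for which the union bound is at most `target`.

    Returns None when the exponent is not positive (rho too close to,
    or above, 2 I(X;Y) for the given delta).
    """
    margin = 0.5 * math.log2(1.0 + snr) - rho / 2.0 - delta
    if margin <= 0.0:
        return None
    return math.ceil(-math.log2(target) / margin)

# SNR = 15 gives C = 4 b/2D; compare a rate below capacity with a rate at capacity.
print(block_length_for(1e-6, rho=3.0, snr=15.0))
print(block_length_for(1e-6, rho=4.0, snr=15.0))
```

The required block length grows without bound as $\rho$ approaches $\log_2(1 + \mathrm{SNR})$, since the margin in the exponent shrinks to zero.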

3.3 Geometric interpretation and converse

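For AWGN channels, the channel capacity theorem has a nice geometric interpretation in terms of the geometry of spheres in real Euclidean $N$-space $\mathbb{R}^N$.

By any law of large numbers, the probability that the squared Euclidean norm $\|\mathbf{X}\|^2$ of a random sequence $\mathbf{X}$ of iid Gaussian variables of mean zero and variance $S_x$ per symbol falls in the range $N(S_x \pm \varepsilon)$ goes to 1 as $N \to \infty$, for any $\varepsilon > 0$. Geometrically, the typical region

$$T_\varepsilon = \{\mathbf{x} \in \mathbb{R}^N \mid N(S_x - \varepsilon) \le \|\mathbf{x}\|^2 \le N(S_x + \varepsilon)\}$$

is a spherical shell with outer squared radius $N(S_x + \varepsilon)$ and inner squared radius $N(S_x - \varepsilon)$. Thus the random $N$-vector $\mathbf{X}$ will almost surely lie in the spherical shell $T_\varepsilon$ as $N \to \infty$. This phenomenon is known as "sphere hardening."

Moreover, the pdf $p_X(\mathbf{x})$ within the spherical shell $T_\varepsilon$ is approximately uniform, as we expect from the asymptotic equipartition principle (AEP). Since $p_X(\mathbf{x}) = (2\pi S_x)^{-N/2} \exp(-\|\mathbf{x}\|^2/2S_x)$, within $T_\varepsilon$ we have

$$(2\pi e S_x)^{-N/2} e^{-(N/2)(\varepsilon/S_x)} \le p_X(\mathbf{x}) \le (2\pi e S_x)^{-N/2} e^{(N/2)(\varepsilon/S_x)}.$$

Moreover, the fact that $p_X(\mathbf{x}) \approx (2\pi e S_x)^{-N/2}$ implies that the volume of $T_\varepsilon$ is approximately $|T_\varepsilon| \approx (2\pi e S_x)^{N/2}$. More precisely, we have

$$1 - \delta(N) \le \int_{T_\varepsilon} p_X(\mathbf{x}) \, d\mathbf{x} \le 1,$$

where $\delta(N) \to 0$ as $N \to \infty$. Since $|T_\varepsilon| = \int_{T_\varepsilon} d\mathbf{x}$, we have

$$1 \ge (2\pi e S_x)^{-N/2} e^{-(N/2)(\varepsilon/S_x)} |T_\varepsilon| \;\Rightarrow\; |T_\varepsilon| \le (2\pi e S_x)^{N/2} e^{(N/2)(\varepsilon/S_x)};$$

$$1 - \delta(N) \le (2\pi e S_x)^{-N/2} e^{(N/2)(\varepsilon/S_x)} |T_\varepsilon| \;\Rightarrow\; |T_\varepsilon| \ge (1 - \delta(N)) (2\pi e S_x)^{N/2} e^{-(N/2)(\varepsilon/S_x)}.$$

Since these bounds hold for any $\varepsilon > 0$, this implies that

$$\lim_{N \to \infty} \frac{\log |T_\varepsilon|}{N} = \frac{1}{2} \log 2\pi e S_x = H(X),$$

where $H(X) = \frac{1}{2} \log 2\pi e S_x$ denotes the differential entropy of a Gaussian random variable with mean zero and variance $S_x$.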
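Sphere hardening can be observed with a short simulation. The sketch below estimates, for several dimensions $N$, the fraction of iid Gaussian vectors whose squared norm falls in the shell $N(S_x \pm \varepsilon)$; the function name, the values $S_x = 1$ and $\varepsilon = 0.2$, and the trial count are assumptions chosen for illustration.

```python
import random

def fraction_in_shell(n, sx, eps, trials, rng):
    """Fraction of iid N(0, Sx) vectors of length n with
    N(Sx - eps) <= ||x||^2 <= N(Sx + eps)."""
    hits = 0
    for _ in range(trials):
        norm_sq = sum(rng.gauss(0.0, sx ** 0.5) ** 2 for _ in range(n))
        if n * (sx - eps) <= norm_sq <= n * (sx + eps):
            hits += 1
    return hits / trials

rng = random.Random(0)
for n in (10, 100, 1000):
    print(n, fraction_in_shell(n, sx=1.0, eps=0.2, trials=300, rng=rng))
```

The fraction should approach 1 as $N$ increases, since the standard deviation of $\|\mathbf{X}\|^2/N$ is $S_x \sqrt{2/N}$ for Gaussian symbols.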


We should note at this point that practically all of the volume of an $N$-sphere of squared radius $N(S_x + \varepsilon)$ lies within the spherical shell $T_\varepsilon$ as $N \to \infty$, for any $\varepsilon > 0$. By dimensional analysis, the volume of an $N$-sphere of radius $r$ must be given by $A_N r^N$ for some constant $A_N$ that does not depend on $r$. Thus the ratio of the volume of an $N$-sphere of squared radius $N(S_x - \varepsilon)$ to that of an $N$-sphere of squared radius $N(S_x + \varepsilon)$ must satisfy

$$\frac{A_N (N(S_x - \varepsilon))^{N/2}}{A_N (N(S_x + \varepsilon))^{N/2}} = \left( \frac{S_x - \varepsilon}{S_x + \varepsilon} \right)^{N/2} \to 0 \text{ as } N \to \infty, \text{ for any } \varepsilon > 0.$$

It follows that the volume of an $N$-sphere of squared radius $N S_x$ is also approximated by $e^{N H(X)} = (2\pi e S_x)^{N/2}$ as $N \to \infty$.

Exercise 5. In Exercise 4 of Chapter 1, the volume of an $N$-sphere of radius $r$ was given as

$$V(N, r) = \frac{(\pi r^2)^{N/2}}{(N/2)!},$$

for $N$ even. In other words, $A_N = \pi^{N/2}/((N/2)!)$. Using Stirling's approximation, $m! \to (m/e)^m$ as $m \to \infty$, show that this exact expression leads to the same asymptotic approximation for $V(N, r)$ as was obtained above by use of the asymptotic equipartition principle.

The sphere-hardening phenomenon may seem somewhat bizarre, but even more unexpected phenomena occur when we code for the AWGN channel using Shannon's random code ensemble.

In this case, each randomly chosen transmitted $N$-vector $\mathbf{X}$ will almost surely lie in a spherical shell $T_X$ of squared radius $\approx N S_x$, and the random received $N$-vector $\mathbf{Y}$ will almost surely lie in a spherical shell $T_Y$ of squared radius $\approx N S_y$, where $S_y = S_x + S_n$.

Moreover, given the correct transmitted codeword $\mathbf{c}_0$, the random received vector $\mathbf{Y}$ will almost surely lie in a spherical shell $T_\varepsilon(\mathbf{c}_0)$ of squared radius $\approx N S_n$ centered on $\mathbf{c}_0$. A further consequence of the AEP is that almost all of the volume of this nonzero-mean shell, whose center $\mathbf{c}_0$ has squared Euclidean norm $\|\mathbf{c}_0\|^2 \approx N S_x$, lies in the zero-mean shell $T_Y$ whose squared radius is $\approx N S_y$, since the expected squared Euclidean norm of $\mathbf{Y} = \mathbf{c}_0 + \mathbf{N}$ is

$$E_N[\|\mathbf{Y}\|^2] = \|\mathbf{c}_0\|^2 + N S_n \approx N S_y.$$

"Curiouser and curiouser," said Alice.

We thus obtain the following geometrical picture. We choose $M = 2^{\rho N/2}$ code vectors at random according to a zero-mean Gaussian distribution with variance $S_x$, which almost surely puts them within the shell $T_X$ of squared radius $\approx N S_x$. Considering the probable effects of a random noise sequence $\mathbf{N}$ distributed according to a zero-mean Gaussian distribution with variance $S_n$, we can define for each code vector $\mathbf{c}_i$ a typical region $T_\varepsilon(\mathbf{c}_i)$ of volume $|T_\varepsilon(\mathbf{c}_i)| \approx (2\pi e S_n)^{N/2}$, which falls almost entirely within the shell $T_Y$ of volume $|T_Y| \approx (2\pi e S_y)^{N/2}$.

Now if a particular code vector $\mathbf{c}_0$ is sent, then the probability that the received vector $\mathbf{y}$ will fall in the typical region $T_\varepsilon(\mathbf{c}_0)$ is nearly 1. On the other hand, the probability that $\mathbf{y}$ will fall in the typical region $T_\varepsilon(\mathbf{c}_i)$ of some other independently-chosen code vector $\mathbf{c}_i$ is approximately equal to the ratio $|T_\varepsilon(\mathbf{c}_i)|/|T_Y|$ of the volume of $T_\varepsilon(\mathbf{c}_i)$ to that of the entire shell, since if $\mathbf{y}$ is generated according to $p_Y(\mathbf{y})$ independently of $\mathbf{c}_i$, then it will be approximately uniformly distributed over $T_Y$. Thus this probability is approximately

$$\Pr\{\mathbf{Y} \text{ typical for } \mathbf{c}_i\} \approx \frac{|T_\varepsilon(\mathbf{c}_i)|}{|T_Y|} \approx \frac{(2\pi e S_n)^{N/2}}{(2\pi e S_y)^{N/2}} = \left( \frac{S_n}{S_y} \right)^{N/2}.$$

As we have seen in earlier sections, this argument may be made precise.
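The volume approximations used in this geometric picture can be checked numerically. The sketch below compares $\frac{1}{N} \log V(N, r)$ for $r^2 = N S$, computed from the exact formula $V(N, r) = (\pi r^2)^{N/2}/(N/2)!$, with the Gaussian differential entropy $\frac{1}{2} \log 2\pi e S$ (natural logarithms). It is a numerical sanity check rather than the derivation asked for in Exercise 5; the function names are assumptions, and `math.lgamma` is used for $\log (N/2)!$, which also extends the formula to odd $N$ through the gamma function.

```python
import math

def log_sphere_volume_per_dim(n, s):
    """(1/N) ln V(N, r) for squared radius r^2 = N*s,
    with V(N, r) = (pi r^2)^(N/2) / (N/2)!  and  ln (N/2)! = lgamma(N/2 + 1)."""
    return (0.5 * n * math.log(math.pi * n * s) - math.lgamma(0.5 * n + 1.0)) / n

def gaussian_entropy_nats(s):
    """H(X) = (1/2) ln(2 pi e S) for a zero-mean Gaussian of variance S."""
    return 0.5 * math.log(2.0 * math.pi * math.e * s)

sn, sy = 1.0, 16.0
for n in (10, 100, 10000):
    gap = gaussian_entropy_nats(sn) - log_sphere_volume_per_dim(n, sn)
    print(n, "per-dimension gap:", gap)

# Per-dimension log volume ratio of a noise sphere to the output sphere;
# the A_N and N factors cancel, leaving (1/2) ln(Sn/Sy).
n = 100
ratio = log_sphere_volume_per_dim(n, sn) - log_sphere_volume_per_dim(n, sy)
print("ratio exponent:", ratio, " vs 0.5*ln(Sn/Sy) =", 0.5 * math.log(sn / sy))
```

The gap shrinks roughly like $(\log N)/N$, which is why the entropy-based volume estimate is adequate at the exponential scale used in the argument.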


It follows then that if $\rho < \log_2(1 + S_x/S_n)$ b/2D, or equivalently $M = 2^{\rho N/2} < (S_y/S_n)^{N/2}$, then the probability that $\mathbf{Y}$ is typical with respect to any of the $M - 1$ incorrect codewords is very small, which proves the forward part of the channel capacity theorem.

On the other hand, it is clear from this geometric argument that if $\rho > \log_2(1 + S_x/S_n)$ b/2D, or equivalently $M = 2^{\rho N/2} > (S_y/S_n)^{N/2}$, then the probability of decoding error must be large. For the error probability to be small, the decision region for each code vector $\mathbf{c}_i$ must include almost all of its typical region $T_\varepsilon(\mathbf{c}_i)$. If the volume of the $M = 2^{\rho N/2}$ typical regions exceeds the volume of $T_Y$, then this is impossible. Thus in order to have small error probability we must have

$$2^{\rho N/2} (2\pi e S_n)^{N/2} \le (2\pi e S_y)^{N/2} \;\Rightarrow\; \rho \le \log_2 \frac{S_y}{S_n} = \log_2\left(1 + \frac{S_x}{S_n}\right) \text{ b/2D}.$$

This argument may also be made precise, and is the converse to the channel capacity theorem.

In conclusion, we obtain the following picture of a capacity-achieving code. Let $T_Y$ be the $N$-shell of squared radius $\approx N S_y$, which is almost the same thing as the $N$-sphere of squared radius $N S_y$. A capacity-achieving code consists of the centers $\mathbf{c}_i$ of $M$ typical regions $T_\varepsilon(\mathbf{c}_i)$, where $\|\mathbf{c}_i\|^2 \approx N S_x$ and each region $T_\varepsilon(\mathbf{c}_i)$ consists of an $N$-shell of squared radius $\approx N S_n$ centered on $\mathbf{c}_i$, which is almost the same thing as an $N$-sphere of squared radius $N S_n$. As $\rho \to C_{[b/2D]} = \log_2(1 + \frac{S_x}{S_n})$ b/2D, these regions $T_\varepsilon(\mathbf{c}_i)$ form an almost disjoint partition of $T_Y$. This picture is illustrated in Figure 2.

Figure 2. Packing $\approx (S_y/S_n)^{N/2}$ typical regions $T_\varepsilon(\mathbf{c}_i)$ of squared radius $\approx N S_n$ into a large typical region $T_Y$ of squared radius $\approx N S_y$.

3.3.1 Discussion

It is natural in view of the above picture to frame the problem of coding for the AWGN channel as a sphere-packing problem. In other words, we might expect that a capacity-achieving code basically induces a disjoint partition of an $N$-sphere of squared radius $N S_y$ into about $(S_y/S_n)^{N/2}$ disjoint decision regions, such that each decision region includes the sphere of squared radius $N S_n$ about its center.

However, it can be shown by geometric arguments that such a disjoint partition is impossible as the code rate approaches capacity. What then is wrong with the sphere-packing approach? The subtle distinction that makes all the difference is that Shannon's probabilistic approach does not require decision regions to be disjoint, but merely probabilistically almost disjoint. So the solution to Shannon's coding problem involves what might be called "soft sphere-packing."

We will see that hard sphere-packing (i.e., maximizing the minimum distance between code vectors subject to a constraint on average energy) is a reasonable approach for moderate-size codes at rates not too near to capacity. However, to obtain reliable transmission at rates near capacity, we will need to consider probabilistic codes and decoding algorithms that follow more closely the spirit of Shannon's original work.

No ratings yet
(T) $ B (T) - ML (T) 1 DT + Z Akl
10 pages
(Ebook PDF) Data Analysis and Decision Making 4Th Edition
No ratings yet
(Ebook PDF) Data Analysis and Decision Making 4Th Edition
42 pages
Noise, Information Theory, and Entropy
No ratings yet
Noise, Information Theory, and Entropy
34 pages
Channel Capacity: 1 Preliminaries and Definitions
No ratings yet
Channel Capacity: 1 Preliminaries and Definitions
5 pages
Finite Blocklength Coding For Channels With Side Information at The Receiver
No ratings yet
Finite Blocklength Coding For Channels With Side Information at The Receiver
5 pages
The Information Theory: C.E. Shannon, A Mathematical Theory of Communication'
No ratings yet
The Information Theory: C.E. Shannon, A Mathematical Theory of Communication'
43 pages
NLP and Generative AI Syllabus - 2025
No ratings yet
NLP and Generative AI Syllabus - 2025
5 pages
Digital Communication Chapter 3
No ratings yet
Digital Communication Chapter 3
37 pages
Cdma Capacity
No ratings yet
Cdma Capacity
3 pages
Notes It
No ratings yet
Notes It
46 pages
BCS 2015-16
No ratings yet
BCS 2015-16
29 pages
Finite Elements in Analysis & Design: Abhishek Arora, Benjamin M. Ward, Caglar Oskay
No ratings yet
Finite Elements in Analysis & Design: Abhishek Arora, Benjamin M. Ward, Caglar Oskay
25 pages
Unit 2 - 1
No ratings yet
Unit 2 - 1
19 pages
Rohini 15720602071
No ratings yet
Rohini 15720602071
2 pages
Applications of Error-Control Coding
No ratings yet
Applications of Error-Control Coding
30 pages
Information Theory
No ratings yet
Information Theory
97 pages
Information and Digital Transmission: Haykin Chapter 9 Carlson Chapter 16
No ratings yet
Information and Digital Transmission: Haykin Chapter 9 Carlson Chapter 16
27 pages
MATHESH Matlab Final Output
No ratings yet
MATHESH Matlab Final Output
19 pages
ECE 771 Lecture 10 - The Gaussian Channel
No ratings yet
ECE 771 Lecture 10 - The Gaussian Channel
9 pages
T4 NoiseAndMutualInformation
No ratings yet
T4 NoiseAndMutualInformation
8 pages
Dirlik Examples
No ratings yet
Dirlik Examples
13 pages
Noise, Information Theory, and Entropy: CS414 - Spring 2007
No ratings yet
Noise, Information Theory, and Entropy: CS414 - Spring 2007
44 pages
Channel Capacity and The Channel Coding Theorem, Part I: Information Theory 2013
No ratings yet
Channel Capacity and The Channel Coding Theorem, Part I: Information Theory 2013
17 pages
Introduction To Information Theory Channel Capacity and Models
No ratings yet
Introduction To Information Theory Channel Capacity and Models
36 pages
Informationtheory Ii: Model Answers To Exercise 4 of March 24, 2010
No ratings yet
Informationtheory Ii: Model Answers To Exercise 4 of March 24, 2010
4 pages
Control Syste1
No ratings yet
Control Syste1
26 pages
Additive White Gaussian Noise
No ratings yet
Additive White Gaussian Noise
6 pages
Channel Capacity PDF
No ratings yet
Channel Capacity PDF
4 pages
A Mathematical Theory of Communication: by C.E.Shannon Presented by Ling Shi
No ratings yet
A Mathematical Theory of Communication: by C.E.Shannon Presented by Ling Shi
10 pages
Quantum Computing
No ratings yet
Quantum Computing
122 pages
MMW
No ratings yet
MMW
3 pages
Lecture1 Slides
No ratings yet
Lecture1 Slides
10 pages
A Survey On Post-Quantum Cryptography For 5G6G Communications - v1.2 (Cleared)
No ratings yet
A Survey On Post-Quantum Cryptography For 5G6G Communications - v1.2 (Cleared)
6 pages
MCMidterm SIM 2021 MCQ Questions Answers
No ratings yet
MCMidterm SIM 2021 MCQ Questions Answers
5 pages
Econometrics Work-Sheet, Fikadu
No ratings yet
Econometrics Work-Sheet, Fikadu
3 pages
RiskManagement B00246928
No ratings yet
RiskManagement B00246928
8 pages
A Little Magic Means A Lot
No ratings yet
A Little Magic Means A Lot
29 pages
Shannon Source Coding Theorem
No ratings yet
Shannon Source Coding Theorem
3 pages
Spectral Density
No ratings yet
Spectral Density
27 pages
Pdsa Ga4
No ratings yet
Pdsa Ga4
3 pages
Understanding Absence Quota
No ratings yet
Understanding Absence Quota
14 pages
E-Commerce With Digital Signature
No ratings yet
E-Commerce With Digital Signature
18 pages
Absolute/Global Extrema: Maxima and Minima of A Function of One Variable
No ratings yet
Absolute/Global Extrema: Maxima and Minima of A Function of One Variable
3 pages
The Multinomial Logit Model For Nominal Response Data: James J. Dignam
No ratings yet
The Multinomial Logit Model For Nominal Response Data: James J. Dignam
19 pages
Regula Falsi PDF
No ratings yet
Regula Falsi PDF
8 pages
Assignment 1 - PS9
No ratings yet
Assignment 1 - PS9
3 pages
Additive White Gaussian Noise
No ratings yet
Additive White Gaussian Noise
7 pages
Low Variance Sampling Techniques For Particle Filter
No ratings yet
Low Variance Sampling Techniques For Particle Filter
7 pages
04 - Absolute Extrema
No ratings yet
04 - Absolute Extrema
4 pages
Introduction To Bioinformatics
No ratings yet
Introduction To Bioinformatics
2 pages
Differential Forms
From Everand
Differential Forms
Henri Cartan
5/5 (2)
Digital Signal and Image Processing using MATLAB, Volume 3: Advances and Applications, The Stochastic Case
From Everand
Digital Signal and Image Processing using MATLAB, Volume 3: Advances and Applications, The Stochastic Case
Gérard Blanchet
3/5 (1)