
Tutorial 3

1 Examples of Channel Capacity

1. Noiseless Binary Channel


Suppose that we have a channel whose binary input is reproduced exactly at the output, as shown in Fig. 1.

Figure 1: Noiseless Binary Channel

In this case, any transmitted bit is received without error. Hence, one error-free bit can be transmitted per
use of the channel, and the capacity is 1 bit. We can also calculate the information capacity as follows.

C = max_{p(x)} I(X; Y) = max_{p(x)} [H(X) − H(X|Y)] = 1 bit,

which is achieved by using p(x) = (1/2, 1/2).
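As a quick numerical companion (a sketch, not part of the tutorial; the helper mutual_information is our own name), the same value can be obtained by evaluating I(X; Y) directly from the channel matrix:

```python
# A minimal numerical check: mutual information I(X;Y) of a discrete
# memoryless channel for a given input distribution.
import numpy as np

def mutual_information(p_x, P_y_given_x):
    """I(X;Y) in bits for input distribution p_x and channel matrix P_y_given_x[x, y]."""
    p_x = np.asarray(p_x, dtype=float)
    P = np.asarray(P_y_given_x, dtype=float)
    p_xy = p_x[:, None] * P               # joint distribution p(x, y)
    p_y = p_xy.sum(axis=0)                # output marginal p(y)
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x[:, None] * p_y[None, :])[mask])))

# Noiseless binary channel: the output equals the input.
P_noiseless = np.array([[1.0, 0.0],
                        [0.0, 1.0]])
print(mutual_information([0.5, 0.5], P_noiseless))   # -> 1.0 bit
```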

2. Noisy Channel with Nonoverlapping Outputs


This channel has two possible outputs corresponding to each of the two inputs as shown in Fig. 2.

Figure 2: Noisy Channel with Nonoverlapping Outputs

The channel appears to be noisy, but really is not. Even though the output of the channel is a random
consequence of the input, the input can be determined from the output, and hence every transmitted bit

can be recovered without error. The capacity of this channel is also 1 bit per transmission. We can also
calculate the information capacity as follows.

C = max_{p(x)} I(X; Y) = max_{p(x)} [H(X) − H(X|Y)] = 1 bit.

3. Noisy Typewriter
In this case the channel input is either received unchanged at the output with probability 1/2 or is transformed into the next letter with probability 1/2, as shown in Fig. 3.

Figure 3: Noisy Typewriter

If the input has 26 symbols and we use every alternate input symbol, we can transmit one of 13 symbols
without error with each transmission. Hence, the capacity of this channel is log 13 bits per transmission.
We can also calculate the information capacity as follows.

C = max_{p(x)} I(X; Y) = max_{p(x)} [H(Y) − H(Y|X)] = log 26 − 1 = log 13,

which is achieved by using p(x) distributed uniformly over all the inputs.
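A numerical aside (a sketch with an assumed 26-letter alphabet that wraps around from the last letter to the first): with a uniform input, I(X; Y) = log 26 − 1 = log 13 ≈ 3.70 bits.

```python
# Noisy typewriter: each letter stays put or shifts to the next letter, each
# with probability 1/2.
import numpy as np

n = 26
P = np.zeros((n, n))
for x in range(n):
    P[x, x] = 0.5
    P[x, (x + 1) % n] = 0.5        # shifted to the next letter with probability 1/2

p_x = np.full(n, 1.0 / n)
p_y = p_x @ P                      # uniform over the 26 outputs
H_Y = -np.sum(p_y * np.log2(p_y))  # log2(26)
H_Y_given_X = 1.0                  # one fair coin flip per channel use
print(H_Y - H_Y_given_X, np.log2(13))   # both ~3.700 bits
```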

4. Binary Symmetric Channel


Consider the binary symmetric channel (BSC), which is shown in Fig. 4. This is a binary channel in which
the input symbols are complemented with probability p.

Figure 4: Binary Symmetric Channel

We bound the mutual information by

I(X; Y) = H(Y) − H(Y|X)
        = H(Y) − ∑_x p(x) H(Y|X = x)
        = H(Y) − ∑_x p(x) H(p)
        = H(Y) − H(p)
        ≤ 1 − H(p),

where the last inequality follows because Y is a binary random variable. Equality is achieved when
the input distribution is uniform. Hence, the information capacity of a binary symmetric channel with
parameter p is

C = 1 − H(p) bits.
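A short sanity check (a sketch; p = 0.1 is an arbitrary example value): with a uniform input the mutual information equals 1 − H(p).

```python
# BSC capacity check: I(X;Y) = H(Y) - H(p) with a uniform input.
import numpy as np

def binary_entropy(p):
    """H(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

p = 0.1
P_bsc = np.array([[1 - p, p],
                  [p, 1 - p]])
p_x = np.array([0.5, 0.5])
p_y = p_x @ P_bsc                                # output distribution (uniform by symmetry)
I = binary_entropy(p_y[0]) - binary_entropy(p)   # H(Y) - H(p)
print(I, 1 - binary_entropy(p))                  # both ~0.531 bits
```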

5. Binary Erasure Channel


The analog of the binary symmetric channel in which some bits are lost (rather than corrupted) is the
binary erasure channel. In this channel, a fraction α of the bits are erased. The receiver knows which bits
have been erased. The binary erasure channel has two inputs and three outputs as shown in Fig. 5.
We calculate the capacity of the binary erasure channel as follows:

C = max_{p(x)} I(X; Y)
  = max_{p(x)} [H(Y) − H(Y|X)]
  = max_{p(x)} H(Y) − H(α).

The first guess for the maximum of H(Y) would be log 3, but we cannot achieve this by any choice of input distribution p(x).

Figure 5: Binary Erasure Channel

Letting E be the event {Y = e} and using the expansion

H(Y) = H(Y, E) = H(E) + H(Y|E),

which holds because E is a function of Y, and letting P(X = 1) = π, we have

H(Y ) = H((1 − π)(1 − α), α, π(1 − α))


= −(1 − π)(1 − α) log((1 − π)(1 − α)) − α log α − π(1 − α) log(π(1 − α))
= (1 − α)(−(1 − π) log(1 − π) − π log π) + (−α log α − (1 − α) log(1 − α))
= (1 − α)H(π) + H(α)

Hence

C = max_{p(x)} H(Y) − H(α)
  = max_{p(x)} (1 − α)H(π) + H(α) − H(α)
  = max_{p(x)} (1 − α)H(π)
  = 1 − α,

where capacity is achieved by π = 1/2.
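A numerical confirmation (a sketch; α = 0.3 is an arbitrary example value): scanning over π = P(X = 1) shows that I(X; Y) peaks at 1 − α, attained at π = 1/2.

```python
# BEC capacity check: maximize H(Y) - H(alpha) over the input distribution.
import numpy as np

def entropy(p):
    """Entropy in bits of a probability vector p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

alpha = 0.3
best = 0.0
for pi in np.linspace(0.01, 0.99, 99):
    # Output distribution over (0, e, 1); conditional entropy is H(Y|X) = H(alpha).
    p_y = [(1 - pi) * (1 - alpha), alpha, pi * (1 - alpha)]
    I = entropy(p_y) - entropy([alpha, 1 - alpha])
    best = max(best, I)
print(best, 1 - alpha)    # both ~0.700 bits
```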

2 Exercises

1. An additive noise channel.


Find the channel capacity of the following discrete memoryless channel:

where Pr{Z = 0} = Pr{Z = a} = 1/2. The alphabet for X is X = {0, 1}. Assume that Z is independent of X. Observe that the channel capacity depends on the value of a.
Solution:
The channel can be modeled as follows.

Y = X + Z,    X ∈ {0, 1}, Z ∈ {0, a}

We have to distinguish various cases depending on the values of a.


a = 0. In this case, Y = X, and max I(X; Y) = max H(X) = 1. Hence the capacity is 1 bit per transmission.

a ≠ 0, ±1. In this case, Y has four possible values 0, 1, a and 1 + a. Knowing Y, we know which X was sent, and hence H(X|Y) = 0. Hence max I(X; Y) = max H(X) = 1 bit per transmission, achieved by a uniform distribution on the input X.

a = 1. In this case Y has three possible output values, 0, 1 and 2, and the channel is identical to the binary erasure channel with erasure probability α = 1/2. Hence, the capacity of this channel is 1 − α = 1/2 bit per transmission.

a = −1. This is similar to the case a = 1, and the capacity is also 1/2 bit per transmission. (The cases are checked numerically below.)
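The sketch below (the helper mi_bits and the example values a = 0, 3, 1 are our own assumptions, not from the solution) computes I(X; Y) for a uniform input in each case:

```python
# Additive noise channel Y = X + Z with X uniform on {0, 1} and Z uniform on {0, a}.
import numpy as np
from itertools import product

def mi_bits(p_x, channel):
    """I(X;Y) for p_x over inputs and channel: dict input -> dict output -> prob."""
    p_xy = {}
    for x, px in p_x.items():
        for y, pyx in channel[x].items():
            p_xy[(x, y)] = p_xy.get((x, y), 0.0) + px * pyx
    p_y = {}
    for (x, y), q in p_xy.items():
        p_y[y] = p_y.get(y, 0.0) + q
    return sum(q * np.log2(q / (p_x[x] * p_y[y])) for (x, y), q in p_xy.items() if q > 0)

for a in [0, 3, 1]:
    channel = {x: {} for x in (0, 1)}
    for x, z in product((0, 1), (0, a)):
        channel[x][x + z] = channel[x].get(x + z, 0.0) + 0.5
    print(a, mi_bits({0: 0.5, 1: 0.5}, channel))   # -> 1.0, 1.0, 0.5 bits
```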

2. Channels with memory have higher capacity.


Consider a binary symmetric channel with Yi = Xi ⊕Zi , where ⊕ is mod 2 addition, and Xi , Yi ∈ {0, 1}.
Suppose that {Zi } has constant marginal probabilities P r{Zi = 1} = p = 1 − P r{Zi = 0}, but that
Z1 , Z2 , · · · , Zn are not necessarily independent. Assume that Z n is independent of the input X n . Let
C = 1 − H(p, 1 − p). Show that max_{p(x1, x2, ..., xn)} I(X1, X2, ..., Xn; Y1, Y2, ..., Yn) ≥ nC.
Solution:
Since

Yi = Xi ⊕ Zi ,

where Zi = 1 with probability p and Zi = 0 with probability 1 − p, and the Zi are not necessarily independent,

I(X1, X2, ..., Xn; Y1, Y2, ..., Yn)
    = H(X1, X2, ..., Xn) − H(X1, X2, ..., Xn | Y1, Y2, ..., Yn)
    = H(X1, X2, ..., Xn) − H(Z1, Z2, ..., Zn | Y1, Y2, ..., Yn)
    ≥ H(X1, X2, ..., Xn) − H(Z1, Z2, ..., Zn)
    = H(X1, X2, ..., Xn) − ∑_{i=1}^{n} H(Zi | Z1, ..., Z_{i−1})
    ≥ H(X1, X2, ..., Xn) − ∑_{i=1}^{n} H(Zi)
    = H(X1, X2, ..., Xn) − nH(p)
    = n − nH(p)

if X1, X2, ..., Xn are chosen i.i.d. ∼ Bern(1/2). The capacity of the channel with memory over n uses of the channel is

nC^{(n)} = max_{p(x1, x2, ..., xn)} I(X1, X2, ..., Xn; Y1, Y2, ..., Yn)
         ≥ I(X1, X2, ..., Xn; Y1, Y2, ..., Yn) with X1, X2, ..., Xn i.i.d. ∼ Bern(1/2)
         ≥ n(1 − H(p))
         = nC

Hence channels with memory have higher capacity.


Remark: The intuitive explanation for this result is that correlation between the noise samples decreases the effective noise; one could use the information from past samples of the noise to combat the present noise.
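A tiny numerical demonstration (a sketch with assumed values n = 2, p = 0.1, and fully correlated noise Z1 = Z2, which is not the only possibility) that I(X^n; Y^n) exceeds nC:

```python
# Correlated-noise BSC used twice, with X1, X2 i.i.d. uniform inputs.
import numpy as np
from itertools import product

p = 0.1
pmf = {}   # joint pmf over ((x1, x2), (y1, y2))
for x1, x2, z in product((0, 1), (0, 1), (0, 1)):
    pz = p if z == 1 else 1 - p
    key = ((x1, x2), (x1 ^ z, x2 ^ z))      # same noise bit hits both uses
    pmf[key] = pmf.get(key, 0.0) + 0.25 * pz

px, py = {}, {}
for (x, y), q in pmf.items():
    px[x] = px.get(x, 0.0) + q
    py[y] = py.get(y, 0.0) + q
I = sum(q * np.log2(q / (px[x] * py[y])) for (x, y), q in pmf.items() if q > 0)
nC = 2 * (1 - (-p * np.log2(p) - (1 - p) * np.log2(1 - p)))
print(I, nC)   # I ~ 1.531 > nC ~ 1.062
```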

3. Consider the discrete memoryless channel Y = X + Z (mod 11), where Z takes the values 1, 2 and 3, each with probability 1/3,

and X ∈ {0, 1, · · · , 10}. Assume that Z is independent of X.


(a) Find the capacity.
(b) What is the maximizing p∗ (x)?
Solution:

Y = X + Z (mod 11),

where Z = 1, 2 or 3, each with probability 1/3.

In this case,

H(Y |X) = H(Z|X) = H(Z) = log 3,

since Z is independent of X, and hence the capacity of the channel is

C = max_{p(x)} I(X; Y)
  = max_{p(x)} [H(Y) − H(Y|X)]
  = max_{p(x)} H(Y) − log 3
  = log 11 − log 3,

which is attained when Y has a uniform distribution, which occurs (by symmetry) when X has a uniform
distribution.

(a) The capacity of the channel is log(11/3) bits/transmission.

(b) The capacity is achieved by a uniform distribution on the inputs: p(X = i) = 1/11 for i = 0, 1, ..., 10.
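A short verification (a sketch; the array P and the loop are our own construction): with a uniform input, I(X; Y) = log 11 − log 3 ≈ 1.87 bits.

```python
# Mod-11 additive noise channel with Z uniform on {1, 2, 3}.
import numpy as np

n, noise_vals = 11, [1, 2, 3]
P = np.zeros((n, n))
for x in range(n):
    for z in noise_vals:
        P[x, (x + z) % n] = 1.0 / 3.0      # p(y | x)

p_x = np.full(n, 1.0 / n)
p_y = p_x @ P                              # uniform over the 11 outputs
H_Y = -np.sum(p_y * np.log2(p_y))
H_Y_given_X = np.log2(3)                   # H(Z) = log 3, Z independent of X
print(H_Y - H_Y_given_X, np.log2(11 / 3))  # both ~1.874 bits
```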


4. Using two channels at once.


Consider two discrete memoryless channels (X1, p(y1|x1), Y1) and (X2, p(y2|x2), Y2) with capacities C1 and C2, respectively. A new channel (X1 × X2, p(y1|x1) × p(y2|x2), Y1 × Y2) is formed in which x1 ∈ X1 and x2 ∈ X2 are sent simultaneously, resulting in (y1, y2). Find the capacity of this channel.
Solution:
Suppose we are given two channels, (X1 , p(y1 |x1 ), Y1 ) and (X2 , p(y2 |x2 ), Y2 ), which we can use at
the same time. We can define the product channel as the channel, (X1 × X2 , p(y1 , y2 |x1 , x2 ) =
p(y1 |x1 )p(y2 |x2 ), Y1 × Y2 ). To find the capacity of the product channel, we must find the distribution
p(x1, x2) on the input alphabet X1 × X2 that maximizes I(X1, X2; Y1, Y2). Since the joint distribution factors as

p(x1, x2, y1, y2) = p(x1, x2) p(y1|x1) p(y2|x2),

Y1 → X1 → X2 → Y2 forms a Markov chain, and therefore

I(X1 , X2 ; Y1 , Y2 ) = H(Y1 , Y2 ) − H(Y1 , Y2 |X1 , X2 )


= H(Y1 , Y2 ) − H(Y1 |X1 , X2 ) − H(Y2 |X1 , X2 , Y1 )
= H(Y1 , Y2 ) − H(Y1 |X1 ) − H(Y2 |X2 )
= H(Y1 ) + H(Y2 |Y1 ) − H(Y1 |X1 ) − H(Y2 |X2 )
≤ H(Y1 ) + H(Y2 ) − H(Y1 |X1 ) − H(Y2 |X2 )
= I(X1 ; Y1 ) + I(X2 ; Y2 ),

Equality holds if Y1 and Y2 are independent, which is the case when X1 and X2 are independent. Hence,

C = max_{p(x1, x2)} I(X1, X2; Y1, Y2)
  ≤ max_{p(x1, x2)} [I(X1; Y1) + I(X2; Y2)]
  = max_{p(x1)} I(X1; Y1) + max_{p(x2)} I(X2; Y2)
  = C1 + C2,

with equality iff p(x1, x2) = p∗(x1) p∗(x2), where p∗(x1) and p∗(x2) are the distributions that maximize C1 and C2, respectively.
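A small numerical illustration (the example channels BSC(0.1) and BEC(0.3) are our own assumption, not from the text): with independent uniform inputs on the product channel, I(X1, X2; Y1, Y2) = I(X1; Y1) + I(X2; Y2).

```python
# Product of a BSC(0.1) and a BEC(0.3) used simultaneously.
import numpy as np

def mi(p_x, P):
    """I(X;Y) in bits for input p_x and channel matrix P[x, y]."""
    p_xy = p_x[:, None] * P
    p_y = p_xy.sum(axis=0)
    m = p_xy > 0
    return float(np.sum(p_xy[m] * np.log2(p_xy[m] / (p_x[:, None] * p_y[None, :])[m])))

P1 = np.array([[0.9, 0.1], [0.1, 0.9]])               # BSC(0.1)
P2 = np.array([[0.7, 0.3, 0.0], [0.0, 0.3, 0.7]])     # BEC(0.3), middle column = erasure
u2 = np.array([0.5, 0.5])

P_prod = np.kron(P1, P2)                               # p(y1,y2 | x1,x2) = p(y1|x1) p(y2|x2)
u4 = np.kron(u2, u2)                                   # independent uniform inputs
print(mi(u4, P_prod), mi(u2, P1) + mi(u2, P2))         # both ~0.531 + 0.700 = 1.231 bits
```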

5. The Z channel.
The Z-channel has binary input and output alphabets and transition probabilities p(y|x) given by the
following matrix:
Q = [  1    0
      1/2  1/2 ],    x, y ∈ {0, 1}

Find the capacity of the Z-channel and the maximizing input probability distribution.
Solution:
First we express I(X; Y), the mutual information between the input and output of the Z-channel, as a function of x = Pr(X = 1):

H(Y|X) = Pr(X = 0) H(Y|X = 0) + Pr(X = 1) H(Y|X = 1)
       = Pr(X = 0) · 0 + Pr(X = 1) · 1
       = x
  H(Y) = H(Pr(Y = 1)) = H(x/2)
I(X; Y) = H(Y) − H(Y|X) = H(x/2) − x

Since I(X; Y ) = 0 when x = 0 and x = 1, the maximum mutual information is obtained for some value
of x such that 0 < x < 1.
Using elementary calculus, we determine that
(d/dx) I(X; Y) = (1/2) log((1 − x/2)/(x/2)) − 1,

which is equal to zero for x = 2/5. (It is reasonable that Pr(X = 1) < 1/2 because X = 1 is the noisy input to the channel.) So the capacity of the Z-channel in bits is H(1/5) − 2/5 = 0.722 − 0.4 = 0.322.
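A numerical cross-check (a sketch; the grid search is ours and x follows the solution's notation x = Pr(X = 1)): maximize I(X; Y) = H(x/2) − x over x in (0, 1).

```python
# Z-channel capacity by direct maximization of H(x/2) - x.
import numpy as np

def H2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * np.log2(p) - (1 - p) * np.log2(1 - p)

xs = np.linspace(0.001, 0.999, 999)
I = np.array([H2(x / 2) - x for x in xs])
print(xs[np.argmax(I)], I.max())   # -> x ~ 0.4, capacity ~ 0.322 bits
```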


6. Time-varying channels.
Consider a time-varying discrete memoryless channel. Let Y1, Y2, ..., Yn be conditionally independent given X1, X2, ..., Xn, with conditional distribution given by p(y|x) = ∏_{i=1}^{n} pi(yi | xi). Let X = (X1, X2, ..., Xn), Y = (Y1, Y2, ..., Yn). Find max_{p(x)} I(X; Y).


Solution:
We can use the same chain of inequalities as in the proof of the converse to the channel coding theorem. Hence

I(X^n; Y^n) = H(Y^n) − H(Y^n | X^n)
            = H(Y^n) − ∑_{i=1}^{n} H(Yi | Y1, ..., Y_{i−1}, X^n)
            = H(Y^n) − ∑_{i=1}^{n} H(Yi | Xi),

since by the definition of the channel, Yi depends only on Xi and is conditionally independent of
everything else. Continuing the series of inequalities, we have

I(X^n; Y^n) = H(Y^n) − ∑_{i=1}^{n} H(Yi | Xi)
            = ∑_{i=1}^{n} H(Yi | Y^{i−1}) − ∑_{i=1}^{n} H(Yi | Xi)
            ≤ ∑_{i=1}^{n} H(Yi) − ∑_{i=1}^{n} H(Yi | Xi)
            ≤ ∑_{i=1}^{n} (1 − H(pi)),

where the last inequality uses H(Yi) ≤ 1 and H(Yi | Xi) = H(pi) for the i-th binary symmetric channel, with equality if X1, X2, ..., Xn are chosen i.i.d. ∼ Bern(1/2). Hence



max_{p(x)} I(X1, X2, ..., Xn; Y1, Y2, ..., Yn) = ∑_{i=1}^{n} (1 − H(pi)).

7. Unused symbols.
Show that the capacity of the channel with probability transition matrix

P_{y|x} = [ 2/3  1/3   0
            1/3  1/3  1/3
             0   1/3  2/3 ]

is achieved by a distribution that places zero probability on one of the input symbols. What is the capacity of
this channel?
Solution:
Let the probabilities of the three input symbols be p1, p2 and p3. Then the probabilities of the three output symbols can be easily calculated to be (2/3 p1 + 1/3 p2, 1/3, 1/3 p2 + 2/3 p3), and therefore

I(X; Y) = H(Y) − H(Y|X)
        = H(2/3 p1 + 1/3 p2, 1/3, 1/3 p2 + 2/3 p3) − (p1 + p3) H(2/3, 1/3) − p2 log 3
        = H(1/3 + 1/3 (p1 − p3), 1/3, 1/3 − 1/3 (p1 − p3)) − (p1 + p3) H(2/3, 1/3) − (1 − p1 − p3) log 3,
where we have substituted p2 = 1 − p1 − p3. Now if we fix p1 + p3, then the second and third terms are fixed, and the first term is maximized if p1 − p3 = 0, i.e., if p1 = p3.
Now setting p1 = p3, we have

I(X; Y) = H(1/3, 1/3, 1/3) − (p1 + p3) H(1/3, 2/3) − (1 − p1 − p3) log 3
        = log 3 − (p1 + p3) H(1/3, 2/3) − (1 − p1 − p3) log 3
        = (p1 + p3)(log 3 − H(1/3, 2/3)),

which is maximized if p1 + p3 is as large as possible (since log 3 > H(1/3, 2/3)). Therefore the maximizing distribution corresponds to p1 + p3 = 1, p1 = p3, and therefore (p1, p2, p3) = (1/2, 0, 1/2). We have the capacity of this channel as

C = log 3 − H(1/3, 2/3) = log 3 − (log 3 − 2/3) = 2/3 bits.

Remark: The intuitive reason why p2 = 0 for the maximizing distribution is that, conditional on the input being 2, the output is uniformly distributed. The same uniform output distribution can be achieved without using symbol 2 (by setting p1 = p3), so the use of symbol 2 does not add any information: it does not change the entropy of the output, and the conditional entropy H(Y|X = 2) is the maximum possible, namely log 3, so any positive probability on symbol 2 only reduces the mutual information.
Note that not using a symbol is optimal only if the uniform output distribution can be achieved without
use of that symbol. For example, in the Z channel example above, both symbols are used, even though
one of them gives a conditionally uniform distribution on the output.
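A brute-force check (a sketch; the coarse grid over the input simplex is our own construction) that the maximum ≈ 2/3 bit is attained at (p1, p2, p3) = (1/2, 0, 1/2):

```python
# Grid search over input distributions for the 3x3 channel above.
import numpy as np

P = np.array([[2/3, 1/3, 0.0],
              [1/3, 1/3, 1/3],
              [0.0, 1/3, 2/3]])

def mi(p_x, P):
    p_xy = p_x[:, None] * P
    p_y = p_xy.sum(axis=0)
    m = p_xy > 0
    return float(np.sum(p_xy[m] * np.log2(p_xy[m] / (p_x[:, None] * p_y[None, :])[m])))

best, best_p = -1.0, None
for p1 in np.arange(0.0, 1.0 + 1e-9, 0.05):
    for p2 in np.arange(0.0, 1.0 - p1 + 1e-9, 0.05):
        p = np.array([p1, p2, 1.0 - p1 - p2])
        val = mi(p, P)
        if val > best:
            best, best_p = val, p
print(best_p, best, 2/3)   # -> [0.5, 0.0, 0.5], ~0.667 bits
```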

8. Erasures and errors in a binary channel.


Consider a channel with binary inputs that has both erasures and errors. Let the probability of error be ϵ and the probability of erasure be α, so that the channel is as illustrated below.

(a) Find the capacity of this channel.
(b) Specialize to the case of the binary symmetric channel (α = 0).
(c) Specialize to the case of the binary erasure channel (ϵ = 0).
Solution:
(a) As with the examples in the text, we set the input distribution for the two inputs to be π and 1 − π.

Then

C = max_{p(x)} I(X; Y)
  = max_{p(x)} [H(Y) − H(Y|X)]
  = max_{p(x)} H(Y) − H(1 − ϵ − α, α, ϵ)

As in the case of the erasure channel, the maximum value of H(Y) cannot be log 3, since the probability of the erasure symbol is α, independent of the input distribution. Thus,

H(Y) = H(π(1 − ϵ − α) + (1 − π)ϵ, α, (1 − π)(1 − ϵ − α) + πϵ)
     = H(α) + (1 − α) H((π + ϵ − πα − 2πϵ)/(1 − α), (1 − π − ϵ + 2πϵ − α + απ)/(1 − α))
     ≤ H(α) + (1 − α),

with equality iff (π + ϵ − πα − 2πϵ)/(1 − α) = 1/2, which can be achieved by setting π = 1/2.
Therefore the capacity of this channel is

C = H(α) + 1 − α − H(1 − α − ϵ, α, ϵ)
  = H(α) + 1 − α − H(α) − (1 − α) H((1 − α − ϵ)/(1 − α), ϵ/(1 − α))
  = (1 − α) (1 − H((1 − α − ϵ)/(1 − α), ϵ/(1 − α)))

(b) Setting α = 0, we get

C = 1 − H(ϵ),

which is the capacity of the binary symmetric channel.


(c) Setting ϵ = 0, we get

C = 1 − α,

which is the capacity of the binary erasure channel.
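As a closing numerical check (a sketch with assumed example values α = 0.2 and ϵ = 0.05), the closed-form capacity matches I(X; Y) evaluated at the uniform input:

```python
# Erasures-and-errors binary channel: closed form vs. direct computation.
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

alpha, eps = 0.2, 0.05
# Channel matrix over outputs (0, e, 1) for inputs 0 and 1.
P = np.array([[1 - alpha - eps, alpha, eps],
              [eps, alpha, 1 - alpha - eps]])
p_x = np.array([0.5, 0.5])
p_y = p_x @ P
I_uniform = entropy(p_y) - entropy([1 - alpha - eps, alpha, eps])
C_formula = (1 - alpha) * (1 - entropy([(1 - alpha - eps) / (1 - alpha), eps / (1 - alpha)]))
print(I_uniform, C_formula)   # both ~0.53 bits
```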
