Noisy Channel Theorem
Shannon figured this all out. His theorem asserts that if the source entropy
is less than the channel capacity, then ε-reliable communication (communication
with error probability below ε) is possible for every ε > 0. But if the source
entropy is greater than the channel capacity, then there is a positive lower
limit on the error probability that can be achieved.
Joint Probabilities
sent \ received     y1      y2      Marginal X prob. (source prob.)
x1                  0.1     0.3     0.4
x2                  0.4     0.2     0.6
Marginal Y          0.5     0.5
$$P(y_j \mid x_i) = \frac{P(x_i \,\&\, y_j)}{P(x_i)}.$$
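For instance, from the joint table above,

$$P(y_1 \mid x_1) = \frac{P(x_1 \,\&\, y_1)}{P(x_1)} = \frac{0.1}{0.4} = 0.25.$$

Applying the formula to every cell of the joint table gives the following table.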
Transition Probabilities
sent \ received     y1      y2      Row sum
x1                  0.25    0.75    1
x2                  0.67    0.33    1
Summarizing: From a joint probability distribution on X × Y we found
both source probabilities for X and transition probabilities from X to Y. The
process is reversible, which is very important for the sequel. If you begin with
just source and transition probabilities, then you can easily construct a full
table of joint probabilities: just multiply each row of the transition table by its
corresponding source probability. The joint probability table is constructed from
information about both the source (the source probabilities) and the channel
(the transition probabilities). From the joint probabilities you can find the marginal
probabilities for the received variables. For the same channel, different sources
lead to different joint probabilities.
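As an illustration, here is a short Python sketch (not part of the original notes; it assumes NumPy is available) that carries out both directions of this conversion for the example table above.

    import numpy as np

    # Joint probability table P(x_i & y_j) from the example above:
    # rows are the sent words x1, x2; columns are the received words y1, y2.
    joint = np.array([[0.1, 0.3],
                      [0.4, 0.2]])

    # Joint -> source, received-marginal, and transition probabilities.
    source = joint.sum(axis=1)              # P(x_i): row sums, [0.4, 0.6]
    marginal_y = joint.sum(axis=0)          # P(y_j): column sums, [0.5, 0.5]
    transition = joint / source[:, None]    # P(y_j | x_i) = P(x_i & y_j) / P(x_i)

    # Reverse direction: source + transition -> joint.
    # Multiply each row of the transition table by its source probability.
    joint_again = transition * source[:, None]

    print(transition)                       # approximately [[0.25, 0.75], [0.67, 0.33]]
    print(np.allclose(joint, joint_again))  # True: the process is reversible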
Given either a joint probability model on X × Y OR (equivalently) a source
S and a channel Ch, we define the mutual information

$$I(x_i, y_j) = \log_2 \frac{P(y_j \mid x_i)}{P(y_j)}.$$
The mutual information is the base-2 logarithm of the ratio of a conditional
probability to the marginal probability of the same outcome yj. In terms of
sources and channels, it is the log of the ratio of the transition probability to yj
(from xi) and the probability that yj will be received when a random word is
transmitted from the source.
In our example, using the transition probabilities above and the marginal
received probabilities P(y1) = P(y2) = 0.5, we have

Mutual Information I(xi, yj)

sent \ received     y1                            y2
x1                  log2(0.25/0.5) = -1.00        log2(0.75/0.5) ≈ 0.58
x2                  log2((2/3)/0.5) ≈ 0.42        log2((1/3)/0.5) ≈ -0.58
When yj is more likely to occur after xi than it is overall, the mutual informa-
tion is positive. The occurrence of xi indicates an increased probability of yj .
When yj is less likely to occur after xi than it is overall, the mutual information
is negative. The occurrence of xi indicates a decreased probability of yj . If the
mutual information is zero, I(xi, yj) = 0, then P(yj | xi) = P(yj), so the
occurrence of xi gives us no basis for revising our estimate of the probability
of yj.
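In the example above, for instance, I(x1, y1) = log2(0.25/0.5) = -1 < 0: the
occurrence of x1 lowers the probability of y1 from 0.5 to 0.25. On the other
hand, I(x1, y2) = log2(0.75/0.5) ≈ 0.58 > 0, since x1 raises the probability of
y2 from 0.5 to 0.75.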
Mutual information is symmetric in the two variables: I(xi , yj ) = I(yj , xi ).
Indeed, since
$$P(y_j \mid x_i)\,P(x_i) = P(x_i \,\&\, y_j) = P(x_i \mid y_j)\,P(y_j)$$
we have
$$\frac{P(y_j \mid x_i)}{P(y_j)} = \frac{P(x_i \mid y_j)}{P(x_i)}.$$
Finally, we can define the Average Mutual Information of a joint distribution
on X × Y (equivalently, of a channel specified by transition probabilities
P(yj | xi) together with a source specified by its word probabilities P(xi)) to be
the expected value of the mutual information:

$$I(X, Y) = \sum_{i=1}^{M} \sum_{j=1}^{N} P(x_i \,\&\, y_j)\, I(x_i, y_j).$$
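For concreteness, the following Python sketch (not part of the original notes; it assumes NumPy) computes the element-level and average mutual information for the example joint table.

    import numpy as np

    joint = np.array([[0.1, 0.3],
                      [0.4, 0.2]])          # P(x_i & y_j) from the example

    source = joint.sum(axis=1)              # P(x_i)
    marginal_y = joint.sum(axis=0)          # P(y_j)
    transition = joint / source[:, None]    # P(y_j | x_i)

    # Element-level mutual information I(x_i, y_j) = log2( P(y_j | x_i) / P(y_j) ).
    mutual_info = np.log2(transition / marginal_y)

    # Average mutual information: the expected value of I(x_i, y_j)
    # with respect to the joint distribution.
    avg_mutual_info = np.sum(joint * mutual_info)

    print(mutual_info)       # approximately [[-1.0, 0.58], [0.42, -0.58]]
    print(avg_mutual_info)   # approximately 0.12 bits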
1.3 Capacity of a Binary Symmetric Channel
Main Example.
Our goal is to compute the channel capacity of a binary symmetric channel,
Ch(ε), with cross-over probability ε.
We work with a source, S(p), with words {0, 1} and probabilities P(0) = p
and P(1) = q = 1 - p.
Joint Probabilities
Binary symmetric channel (ε), source (p, q)

sent \ received     0            1            Marginal X
0                   p(1 - ε)     pε           p
1                   qε           q(1 - ε)     q
Mutual Information
Binary symmetric channel (ε), source (p, q)

sent \ received     0                                   1
0                   log2[(1 - ε)/(p(1 - ε) + qε)]       log2[ε/(pε + q(1 - ε))]
1                   log2[ε/(p(1 - ε) + qε)]             log2[(1 - ε)/(pε + q(1 - ε))]
The average mutual information for the source and binary symmetric chan-
nel, obtained from the joint probabilities and the element-level mutual informa-
tion, is
$$
\begin{aligned}
I(\epsilon, p) &= p(1-\epsilon)\log_2\frac{1-\epsilon}{p(1-\epsilon)+q\epsilon}
  + p\epsilon\log_2\frac{\epsilon}{p\epsilon+q(1-\epsilon)} \\
&\quad + q\epsilon\log_2\frac{\epsilon}{p(1-\epsilon)+q\epsilon}
  + q(1-\epsilon)\log_2\frac{1-\epsilon}{p\epsilon+q(1-\epsilon)} \\
&= (1-\epsilon)\log_2(1-\epsilon) + \epsilon\log_2\epsilon
  - \bigl(p(1-\epsilon)+q\epsilon\bigr)\log_2\bigl(p(1-\epsilon)+q\epsilon\bigr) \\
&\quad - \bigl(p\epsilon+q(1-\epsilon)\bigr)\log_2\bigl(p\epsilon+q(1-\epsilon)\bigr).
\end{aligned}
$$
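As a numerical sanity check on this simplification (a sketch, not part of the original notes), one can compare the four-term sum with the shorter expression at an arbitrary choice of ε and p.

    import math

    eps, p = 0.2, 0.3                        # arbitrary test values
    q = 1 - p
    py0 = p * (1 - eps) + q * eps            # P(receive 0)
    py1 = p * eps + q * (1 - eps)            # P(receive 1)

    # Four-term form: joint probabilities times element-level mutual information.
    four_term = (p * (1 - eps)   * math.log2((1 - eps) / py0)
                 + p * eps       * math.log2(eps / py1)
                 + q * eps       * math.log2(eps / py0)
                 + q * (1 - eps) * math.log2((1 - eps) / py1))

    # Simplified form from the last two lines of the display above.
    simplified = ((1 - eps) * math.log2(1 - eps) + eps * math.log2(eps)
                  - py0 * math.log2(py0) - py1 * math.log2(py1))

    print(abs(four_term - simplified) < 1e-12)   # True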
If, among all sources, we choose one for which the channel has the greatest
average mutual information, then that maximal value is a property of the
channel alone.
Definition. The Capacity of a discrete memoryless channel is the average
mutual information computed with the source that gives the greatest such
average.
To find the capacity of the binary symmetric channel, we have to maximize
I(ε, p) with respect to p on the domain 0 ≤ p ≤ 1. It is a calculus problem.
Differentiating with respect to p (remembering that q = 1 - p) and setting
the result equal to zero shows that the maximum average mutual information
occurs at p = q = 1/2. Hence
$$
\begin{aligned}
\text{Channel capacity} &= (1-\epsilon)\log_2(1-\epsilon) + \epsilon\log_2\epsilon
  - \tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} \\
&= 1 + \epsilon\log_2\epsilon + (1-\epsilon)\log_2(1-\epsilon).
\end{aligned}
$$
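A quick numerical check (a sketch, not part of the original notes; it assumes NumPy) that p = 1/2 does maximize I(ε, p) and that the maximum agrees with the closed form above.

    import numpy as np

    def avg_mi(eps, p):
        # Simplified form of the average mutual information I(eps, p).
        q = 1 - p
        py0 = p * (1 - eps) + q * eps
        py1 = p * eps + q * (1 - eps)
        return ((1 - eps) * np.log2(1 - eps) + eps * np.log2(eps)
                - py0 * np.log2(py0) - py1 * np.log2(py1))

    def capacity(eps):
        # Closed form: 1 + eps*log2(eps) + (1 - eps)*log2(1 - eps).
        return 1 + eps * np.log2(eps) + (1 - eps) * np.log2(1 - eps)

    eps = 0.1
    ps = np.linspace(0.01, 0.99, 9801)       # grid of source probabilities p
    values = avg_mi(eps, ps)
    print(ps[np.argmax(values)])             # approximately 0.5
    print(values.max(), capacity(eps))       # both approximately 0.531 bits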
The capacity of the channel depends on the cross-over probability ε. If
the cross-over error rate is 50%, then the channel has no capacity to transmit
information reliably. The lower the cross-over error rate, the more efficiently
information can be sent. If ε = 0, so that there is no error at all, then it takes
just 1 bit through the channel to transmit 1 bit of information.
[Figure: the channel capacity 1 + ε log2 ε + (1 - ε) log2(1 - ε) plotted against the cross-over probability ε, for 0 ≤ ε ≤ 1.]
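To reproduce the data behind this curve (a sketch, not part of the original notes), one can evaluate the capacity formula at a few cross-over rates. Note that the formula is symmetric about ε = 1/2: a channel that flips almost every bit is as useful as a nearly perfect one, because the flips can simply be undone.

    import math

    def capacity(eps):
        # C(eps) = 1 + eps*log2(eps) + (1 - eps)*log2(1 - eps), with C(0) = C(1) = 1.
        if eps in (0.0, 1.0):
            return 1.0
        return 1 + eps * math.log2(eps) + (1 - eps) * math.log2(1 - eps)

    for eps in (0.0, 0.01, 0.1, 0.25, 0.5, 0.75, 0.9, 0.99, 1.0):
        print(f"eps = {eps:4.2f}   capacity = {capacity(eps):.3f} bits")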
The hope is to take a word stream from the source, encode it in some fancy
way so that after being sent through the noisy channel we can detect and correct
as many transmission errors as possible and then decode, in the end producing
a message that is virtually error free, nearly identical with the word stream
produced by the source. Can we do this?
Theorem (Shannon's noisy coding theorem, also known as the channel coding theorem)
Given a memoryless source and a discrete memoryless channel:
1. If the source entropy is less than the channel capacity, then the error
probability can be reduced to any desired level by using a sufficiently
complex encoder and decoder. There exist codes that can do the job.
2. If the source entropy is greater than the channel capacity, arbitrarily small
error probability cannot be achieved. There is a limit to how effective a
code can be.
In the cases where the noisy coding theorem asserts existence of good codes,
the proof of the theorem gives no indication of how to create these codes. The
theorem is a pure existence theorem. Moreover, the theorem does not claim
that the codes with these minimal error rates are easy to implement. There
has to be some good underlying structure to the code in order for encoding and
decoding to be efficient.