
THE ZERO ERROR CAPACITY OF A NOISY CHANNEL

Claude E. Shannon
Bell Telephone Laboratories, Murray Hill, New Jersey
Massachusetts Institute of Technology, Cambridge, Mass.

Abstract

The zero error capacity C_0 of a noisy channel is defined as the least upper bound of rates at which it is possible to transmit information with zero probability of error. Various properties of C_0 are studied; upper and lower bounds and methods of evaluation of C_0 are given. Inequalities are obtained for the C_0 of the "sum" and "product" of two given channels. The analogous problem of zero error capacity C_{0F} for a channel with a feedback link is considered. It is shown that while the ordinary capacity of a memoryless channel with feedback is equal to that of the same channel without feedback, the zero error capacity may be greater. A solution is given to the problem of evaluating C_{0F}.

Introduction

The ordinary capacity C of a noisy channel may be thought of as follows. There exists a sequence of codes for the channel of increasing block length such that the input rate of transmission approaches C and the probability of error in decoding at the receiving point approaches zero. Furthermore, this is not true for any value higher than C. In some situations it may be of interest to consider, rather than codes with probability of error approaching zero, codes for which the probability is zero, and to investigate the highest possible rate of transmission (or the least upper bound of these rates) for such codes. This rate, C_0, is the main object of investigation of the present paper. It is interesting that while C_0 would appear to be a simpler property of a channel than C, it is in fact more difficult to calculate and leads to a number of as yet unsolved problems.

We shall consider only finite discrete memoryless channels. Such a channel is specified by a finite transition matrix ||p_i(j)|| where p_i(j) is the probability of input letter i being received as output letter j (i = 1,2,...,a; j = 1,2,...,b) and \sum_j p_i(j) = 1. Equivalently, such a channel may be represented by a line diagram such as Fig. 1.

[Fig. 1]

The channel being memoryless means that successive operations are independent. If the input letters i and j are used, the probability of output letters k and l will be p_i(k) p_j(l). A sequence of input letters will be called an input word, a sequence of output letters an output word. A mapping of M messages (which we may take to be the integers 1,2,...,M) into a subset of input words of length n will be called a block code of length n. R = (1/n) log M will be called the input rate for this code. Unless otherwise specified, a code will mean such a block code. We will, throughout, use natural logarithms and natural (rather than binary) units of information, since this simplifies the analytical processes that will be employed.

A decoding system for a block code of length n is a method of associating a unique input message (an integer from 1 to M) with each possible output word of length n, that is, a function from output words of length n to the integers 1 to M. The probability of error for a code is the probability, when the M input messages are used each with probability 1/M, that the noise and the decoding system will lead to an input message different from the one that actually occurred.

If we have two given channels, it is possible to form a single channel from them in two natural ways, which we call the sum and product of the two channels.

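Because the channel is memoryless, the probability that a given input word is received as a given output word is simply the product of the per-letter transition probabilities, and the input rate is (1/n) log M in natural units. The short Python sketch below makes these two definitions concrete; the transition matrix used is an illustrative placeholder, not the channel of Fig. 1 (which is not reproduced in this text).

```python
import numpy as np

# Illustrative transition matrix ||p_i(j)||: rows are input letters,
# columns are output letters; each row sums to one.  (Hypothetical
# numbers -- the channel of Fig. 1 is not reproduced here.)
P = np.array([
    [0.7, 0.3, 0.0],
    [0.0, 0.6, 0.4],
    [0.5, 0.0, 0.5],
])
assert np.allclose(P.sum(axis=1), 1.0)

def word_probability(P, input_word, output_word):
    """Probability that input_word is received as output_word.

    Memorylessness: successive letters are independent, so the probability
    is the product of the letter transition probabilities.
    """
    return float(np.prod([P[i, j] for i, j in zip(input_word, output_word)]))

def input_rate(M, n):
    """Input rate R = (1/n) ln M of a block code, in natural units."""
    return np.log(M) / n

print(word_probability(P, [0, 1], [1, 2]))   # p_0(1) * p_1(2) = 0.3 * 0.4
print(input_rate(M=5, n=2))                  # (1/2) ln 5, cf. the code of Fig. 2 below
```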
The sum of two channels is the channel formed by using inputs from either of the two given channels, with the same transition probabilities, to the set of output letters consisting of the logical sum of the two output alphabets. Thus the sum channel is defined by a transition matrix formed by placing the matrix of one channel below and to the right of that for the other channel and filling the remaining two rectangles with zeros. If ||p_i(j)|| and ||p'_i(j)|| are the individual matrices, the sum has the following matrix:

\begin{pmatrix}
p_1(1) & \cdots & p_1(r) & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \vdots \\
p_a(1) & \cdots & p_a(r) & 0 & \cdots & 0 \\
0 & \cdots & 0 & p'_1(1) & \cdots & p'_1(r') \\
\vdots & & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & p'_{a'}(1) & \cdots & p'_{a'}(r')
\end{pmatrix}

The product of two channels is the channel whose input alphabet consists of all ordered pairs (i,i'), where i is a letter from the first channel alphabet and i' from the second, whose output alphabet is the similar set of ordered pairs of letters from the two individual output alphabets, and whose transition probability from (i,i') to (j,j') is p_i(j) p'_{i'}(j').

The sum of channels corresponds physically to a situation where either of two channels may be used (but not both), a new choice being made for each transmitted letter. The product channel corresponds to a situation where both channels are used each unit of time. It is interesting to note that multiplication and addition of channels are both associative and commutative, and that the product distributes over a sum. Thus one can develop a kind of algebra for channels in which it is possible to write, for example, a polynomial \sum_n a_n X^n where the a_n are non-negative integers and X is a channel. We shall not, however, investigate here the algebraic properties of this system.

The Zero Error Capacity

In a discrete channel we will say that two input letters are adjacent if there is an output letter which can be caused by either of these two. Thus, i and j are adjacent if there exists a t such that both p_i(t) and p_j(t) do not vanish. In Fig. 1, a and c are adjacent, while a and d are not.

If all input letters are adjacent to each other, any code with more than one word has a probability of error at the receiving point greater than zero. In fact, the probability of error in decoding words satisfies

P_e \ge \frac{M-1}{M} p_{\min}^n

where p_min is the smallest (non-vanishing) among the p_i(j), n is the length of the code and M is the number of words in the code. To prove this, note that any two words have a possible output word in common, namely the word consisting of the sequence of common output letters when the two input words are compared letter by letter. Each of the two input words has a probability at least p_min^n of producing this common output word. In using the code, the two particular input words will each occur 1/M of the time and will cause the common output at least p_min^n of the time. This output can be decoded in only one way. Hence at least one of these situations leads to an error. This error, (1/M) p_min^n, is assigned to this code word, and from the remaining M - 1 code words another pair is chosen. A source of error to the amount (1/M) p_min^n is assigned in similar fashion to one of these, and this is a disjoint event. Continuing in this manner, we obtain a total of at least ((M - 1)/M) p_min^n as probability of error.

If it is not true that the input letters are all adjacent to each other, it is possible to transmit at a positive rate with zero probability of error. The least upper bound of all rates which can be achieved with zero probability of error will be called the zero error capacity of the channel and denoted by C_0. If we let M_0(n) be the largest number of words in a code of length n, no two of which are adjacent, then C_0 is the least upper bound of the numbers (1/n) log M_0(n) when n varies through all positive integers.

One might expect that C_0 would be equal to log M_0(1), that is, that if we choose the largest possible set of non-adjacent letters and form all sequences of these of length n, then this would be the best error-free code of length n. This is not, in general, true, although it holds in many cases, particularly when the number of input letters is small. The first failure occurs with five input letters, with the channel in Fig. 2. In this channel, it is possible to choose at most two non-adjacent letters, for example 0 and 2. Using sequences of these, 00, 02, 20, and 22, we obtain four words in a code of length two. However, it is possible to construct a code of length two with five members, no two of which are adjacent, as follows: 00, 12, 24, 31, 43. It is readily verified that no two of these are adjacent. Thus, C_0 for this channel is at least (1/2) log 5.

[Fig. 2]

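The adjacency structure of the Fig. 2 channel is the pentagon: letter i is adjacent to i ± 1 (mod 5). A short sketch (assuming that cyclic structure, which is how the code 00, 12, 24, 31, 43 is normally read) can verify that no two of the five length-two words are adjacent.

```python
from itertools import combinations

# Pentagon adjacency assumed for the Fig. 2 channel: each letter is adjacent
# to itself and to its two cyclic neighbours i-1, i+1 (mod 5).
def adjacent_letters(i, j, q=5):
    return (i - j) % q in (0, 1, q - 1)

# Two words are adjacent iff they are adjacent letter by letter.
def adjacent_words(u, v):
    return all(adjacent_letters(a, b) for a, b in zip(u, v))

code = [(0, 0), (1, 2), (2, 4), (3, 1), (4, 3)]
assert not any(adjacent_words(u, v) for u, v in combinations(code, 2))
print("no two of the five words are adjacent; C_0 >= (1/2) log 5")
```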
No method has been found for determining C_0 for the general discrete channel, and this we propose as an interesting unsolved problem in coding theory. We shall develop a number of results which enable one to determine C_0 in many special cases, for example, in all channels with five or fewer input letters, with the single exception of the channel of Fig. 2 (or channels equivalent in adjacency structure to it). We will also develop some general inequalities enabling one to estimate C_0 quite closely in most cases.

It may be seen, in the first place, that the value of C_0 depends only on which input letters are adjacent to each other. Let us define the adjacency matrix A_{ij} for a channel as follows:

A_{ij} = 1 if input letter i is adjacent to j or if i = j; A_{ij} = 0 otherwise.

Suppose two channels have the same adjacency matrix (possibly after renumbering the input letters of one of them). Then it is obvious that a zero error code for one will be a zero error code for the other and, hence, that the zero error capacity C_0 for one will also apply to the other.

The adjacency structure contained in the adjacency matrix can also be represented as a linear graph. Construct a graph with as many vertices as there are input letters, and connect two distinct vertices with a line or branch of the graph if the corresponding input letters are adjacent. Two examples are shown in Fig. 3, corresponding to the channels of Figs. 1 and 2.

[Fig. 3]

Theorem 1: The zero error capacity C_0 of a discrete memoryless channel is bounded by the inequalities

-\log \min_{P_i} \sum_{ij} A_{ij} P_i P_j \le C_0 \le \min_{p_i(j)} C

\sum_i P_i = 1, \quad P_i \ge 0

\sum_j p_i(j) = 1, \quad p_i(j) \ge 0

where C is the capacity of any channel with transition probabilities p_i(j) and having the adjacency matrix A_{ij}.

The upper bound is fairly obvious. The zero error capacity is certainly less than or equal to the ordinary capacity for any channel with the same adjacency matrix, since the former requires codes with zero probability of error while the latter requires only codes approaching zero probability of error. By minimizing the capacity through variation of the p_i(j) we find the lowest upper bound available through this argument. Since the capacity is a continuous function of the p_i(j) in the closed region defined by p_i(j) \ge 0, \sum_j p_i(j) = 1, we may write min instead of greatest lower bound.

It is worth noting that it is only necessary to consider one particular channel in performing this minimization, although there are an infinite number with the same adjacency matrix. This one particular channel is obtained as follows from the adjacency matrix. If A_{ik} = 1 for a pair i, k, define an output letter j with p_i(j) and p_k(j) both differing from zero. Now if there are any three input letters, say i, k, l, all adjacent to each other, define an output letter, say m, with p_i(m), p_k(m), p_l(m) all different from zero. In the adjacency graph this corresponds to a complete sub-graph with three vertices. Next, subsets of four letters, or complete subgraphs of four vertices, say i, k, l, m, are given an output letter, each being connected to it, and so on. It is evident that any channel with the same adjacency matrix differs from the one just described only by variation in the number of output symbols for some of the pairs, triplets, etc., of adjacent input letters. If a channel has more than one output symbol for an adjacent subset of input letters, then its capacity is reduced by identifying these. If a channel contains no output letter for, say, a triplet i, k, l of adjacent input letters, this will occur as a special case of our canonical channel, which has output letter m for this triplet, when p_i(m), p_k(m) and p_l(m) all vanish.

The lower bound of the theorem will now be proved. We use the procedure of random codes based on probabilities P_i for the letters, these being chosen to minimize the quadratic form \sum_{ij} A_{ij} P_i P_j.

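As a numerical illustration of the lower bound of Theorem 1, one can minimize the quadratic form \sum_{ij} A_{ij} P_i P_j over probability vectors. The sketch below (a crude search, not part of the paper) does this for the pentagon channel of Fig. 2; the minimum there is 1/2 (attained by splitting probability between two non-adjacent letters), so Theorem 1 gives C_0 ≥ log 2, weaker than the (1/2) log 5 obtained from the explicit five-word code.

```python
import numpy as np

def adjacency_matrix_cycle(q=5):
    """A_ij = 1 if i = j or i, j are cyclic neighbours (the Fig. 2 pentagon)."""
    return np.array([[1.0 if (i - j) % q in (0, 1, q - 1) else 0.0
                      for j in range(q)] for i in range(q)])

def quad(A, P):
    """The quadratic form sum_ij A_ij P_i P_j."""
    return float(P @ A @ P)

A = adjacency_matrix_cycle(5)

# Two natural candidates: the uniform distribution, and probability 1/2 on
# each of two non-adjacent letters (0 and 2).
uniform = np.full(5, 1 / 5)
two_point = np.array([0.5, 0.0, 0.5, 0.0, 0.0])
print(quad(A, uniform), quad(A, two_point))            # 0.6 and 0.5

# Crude random search over the simplex to look for anything smaller.
rng = np.random.default_rng(0)
best = min(quad(A, uniform), quad(A, two_point))
for _ in range(20000):
    best = min(best, quad(A, rng.dirichlet(np.ones(5))))
print(f"smallest value found: {best:.3f}")
print(f"Theorem 1 lower bound: {-np.log(best):.3f} nats  (log 2 = {np.log(2):.3f})")
```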
Construct an ensemble of codes, each containing M words, each word n letters long. The words in a code are chosen by the following stochastic method. Each letter of each word is chosen independently of all others and is the letter i with probability P_i. We now compute the probability in the ensemble that any particular word is not adjacent to any other word in its code. The probability that the first letter of one word is adjacent to the first letter of a second word is \sum_{ij} A_{ij} P_i P_j, since this sums the cases of adjacency with coefficient 1 and those of non-adjacency with coefficient 0. The probability that two words are adjacent in all letters, and therefore adjacent as words, is (\sum_{ij} A_{ij} P_i P_j)^n. The probability of non-adjacency is, therefore, 1 - (\sum_{ij} A_{ij} P_i P_j)^n. The probability that all M - 1 other words in a code are not adjacent to a given word is, since they are chosen independently,

\Big[ 1 - \Big( \sum_{ij} A_{ij} P_i P_j \Big)^n \Big]^{M-1}

which is, by a well known inequality, greater than 1 - (M - 1)(\sum_{ij} A_{ij} P_i P_j)^n, which in turn is greater than 1 - M(\sum_{ij} A_{ij} P_i P_j)^n. If we set M = (1 - \epsilon)^n (\sum_{ij} A_{ij} P_i P_j)^{-n}, we then have, by taking \epsilon small, a rate as close as desired to -\log \sum_{ij} A_{ij} P_i P_j. Furthermore, once \epsilon is chosen, by taking n sufficiently large, we can insure that M(\sum_{ij} A_{ij} P_i P_j)^n = (1 - \epsilon)^n is as small as desired, say, less than \delta. The probability in the ensemble of codes of a particular word being adjacent to any other in its own code is now less than \delta. This implies that there are codes in the ensemble for which the ratio of the number of such undesired words to the total number in the code is less than or equal to \delta. For, if not, the ensemble average would be worse than \delta. Select such a code and delete from it the words having this property. We have reduced our rate only by at most (1/n) \log (1 - \delta)^{-1}. Since \epsilon and \delta were both arbitrarily small, we obtain error-free codes arbitrarily close to the rate -\log \min_{P_i} \sum_{ij} A_{ij} P_i P_j as stated in the theorem.

In connection with the upper bound of Theorem 1, the following result is useful in evaluating the minimum C. It is also interesting in its own right and will prove useful later in connection with channels having a feedback link.

Theorem 2: In a discrete memoryless channel with transition probabilities p_i(j) and input letter probabilities P_i the following three statements are equivalent.

1) The rate of transmission

R = \sum_{ij} P_i p_i(j) \log \Big( p_i(j) \Big/ \sum_k P_k p_k(j) \Big)

is stationary under variation of all non-vanishing P_i subject to \sum_i P_i = 1, and under variation of the p_i(j) for those p_i(j) such that P_i p_i(j) > 0 and subject to \sum_j p_i(j) = 1.

2) The mutual information between input-output pairs, I_{ij} = \log ( p_i(j) / \sum_k P_k p_k(j) ), is constant, I_{ij} = I, for all ij pairs of non-vanishing probability (i.e. pairs for which P_i p_i(j) > 0).

3) We have p_i(j) = r_j, a function of j only, whenever P_i p_i(j) > 0; and also \sum_{i \in S_j} P_i = h, a constant independent of j, where S_j is the set of input letters that can produce output letter j with probability greater than zero. We also have I = \log h^{-1}.

The p_i(j) and P_i corresponding to the maximum and minimum capacity when the p_i(j) are varied (keeping, however, any p_i(j) that are zero fixed at zero) satisfy 1), 2) and 3).

Proof: We will show first that 1) and 2) are equivalent, and then that 2) and 3) are equivalent.

R is a bounded continuous function of its arguments P_i and p_i(j) in the (bounded) region of allowed values defined by \sum_i P_i = 1, P_i \ge 0, \sum_j p_i(j) = 1, p_i(j) \ge 0. R has a finite partial derivative with respect to any p_i(j) > 0. In fact, we readily calculate

\partial R / \partial p_i(j) = P_i \log \Big( p_i(j) \Big/ \sum_k P_k p_k(j) \Big)

A necessary and sufficient condition that R be stationary for small variation of the non-vanishing p_i(j), subject to the conditions given, is that

\partial R / \partial p_i(j) = \partial R / \partial p_i(k)

for all i, j, k such that P_i, p_i(j), p_i(k) do not vanish. This requires that

P_i \log \Big( p_i(j) \Big/ \sum_m P_m p_m(j) \Big) = P_i \log \Big( p_i(k) \Big/ \sum_m P_m p_m(k) \Big)

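The quantities appearing in Theorem 2 are easy to compute directly. The sketch below (illustrative only) evaluates R and the pairwise informations I_ij = log(p_i(j)/Q_j) for a channel constructed to satisfy condition 3) -- every output letter has p_i(j) = r_j on its support and the same total input probability h -- and confirms that all the I_ij on the support equal log h^{-1}.

```python
import numpy as np

# A channel built to satisfy condition 3) of Theorem 2: output j can be
# caused by inputs S_j = {j, j+1 mod 5}, each with probability 1/2, so
# p_i(j) = r_j = 1/2 on the support.  (An illustrative channel, not a
# figure from the paper.)
q = 5
p = np.zeros((q, q))
for j in range(q):
    for i in (j, (j + 1) % q):
        p[i, j] = 0.5

P = np.full(q, 1 / q)                 # uniform input probabilities
Q = P @ p                             # Q_j = sum_k P_k p_k(j)

# Rate R = sum_ij P_i p_i(j) log(p_i(j)/Q_j) and the informations I_ij on the support.
support = p > 0
I = np.where(support, np.log(np.where(support, p, 1.0) / Q), 0.0)
R = float(np.sum(P[:, None] * p * I))

h = max(P[p[:, j] > 0].sum() for j in range(q))   # the same for every j here
print("all I_ij equal:", np.allclose(I[support], np.log(1 / h)))
print("R =", R, "= log(1/h) =", np.log(1 / h))    # both equal log(5/2)
```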
p_i(j)/Q_j = p_i(k)/Q_k

In other words, p_i(j)/Q_j is independent of j, a function of i only, whenever P_i > 0 and p_i(j) > 0. This function of i we call \alpha_i. Thus

p_i(j) = \alpha_i Q_j unless P_i p_i(j) = 0.

Now, taking the partial derivative of R with respect to P_i, we obtain:

\partial R / \partial P_i = \sum_j p_i(j) \log ( p_i(j)/Q_j ) - 1

For R to be stationary subject to \sum_i P_i = 1 we must have \partial R/\partial P_i = \partial R/\partial P_k. Thus

\sum_j p_i(j) \log ( p_i(j)/Q_j ) = \sum_j p_k(j) \log ( p_k(j)/Q_j )

Since for P_i p_i(j) > 0 we have p_i(j)/Q_j = \alpha_i, this becomes

\sum_j p_i(j) \log \alpha_i = \sum_j p_k(j) \log \alpha_k

\log \alpha_i = \log \alpha_k

Thus \alpha_i is independent of i and may be written \alpha. Consequently

p_i(j)/Q_j = \alpha

\log ( p_i(j)/Q_j ) = \log \alpha = I

whenever P_i p_i(j) > 0.

The converse result is an easy reversal of the above argument. If \log ( p_i(j)/Q_j ) = I, then \partial R/\partial P_i = I - 1, by a simple substitution in the \partial R/\partial P_i formula. Hence R is stationary under variation of P_i constrained by \sum_i P_i = 1. Further, \partial R/\partial p_i(j) = P_i I = \partial R/\partial p_i(k), and hence the variation of R also vanishes subject to \sum_j p_i(j) = 1.

We now prove that 2) implies 3). Suppose \log ( p_i(j)/Q_j ) = I whenever P_i p_i(j) > 0. Then p_i(j) = e^I Q_j, a function of j only under this same condition. Also, if q_j(i) is the conditional probability of i given j, then

q_j(i) = P_i p_i(j)/Q_j = e^I P_i

1 = \sum_{i \in S_j} q_j(i) = e^I \sum_{i \in S_j} P_i

so that \sum_{i \in S_j} P_i = e^{-I} = h, a constant independent of j, and I = \log h^{-1}.

To prove that 3) implies 2) we assume p_i(j) = r_j when P_i p_i(j) > 0. Then

q_j(i) = P_i p_i(j)/Q_j = P_i r_j/Q_j = \lambda_j P_i \quad (say)

Now, summing the equation P_i \lambda_j = q_j(i) over i \in S_j and using the assumption from 3) that \sum_{i \in S_j} P_i = h, we obtain

h \lambda_j = 1

Consequently \lambda_j is h^{-1} and independent of j. Hence I_{ij} = \log ( p_i(j)/Q_j ) = \log \lambda_j = \log h^{-1} = I.

The last statement of the theorem, concerning minimum and maximum capacity under variation of the p_i(j), follows from the fact that R at these points must be stationary under variation of all non-vanishing P_i and p_i(j), and hence the corresponding P_i and p_i(j) satisfy condition 1) of the theorem.

For simple channels it is usually more convenient to apply particular tricks in trying to evaluate C_0, instead of the bounds given in Theorem 1, which involve maximizing and minimizing processes. The simplest lower bound, as mentioned before, is obtained by merely finding the logarithm of the maximum number of non-adjacent input letters.

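This simplest lower bound is just the logarithm of the independence number of the adjacency graph. For channels with only a few letters it can be found by exhaustive search, as in the sketch below (illustrative; brute force is exponential in the number of letters).

```python
from itertools import combinations
import math

def max_nonadjacent_set(adjacent, letters):
    """Largest subset of letters no two of which are adjacent (brute force)."""
    letters = list(letters)
    for size in range(len(letters), 0, -1):
        for subset in combinations(letters, size):
            if all(not adjacent(i, j) for i, j in combinations(subset, 2)):
                return subset
    return ()

# Pentagon adjacency of Fig. 2: i adjacent to i +/- 1 (mod 5).
pentagon = lambda i, j: (i - j) % 5 in (1, 4)
best = max_nonadjacent_set(pentagon, range(5))
print(best, "-> lower bound", math.log(len(best)), "nats")   # 2 letters, log 2
```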
A very useful device for determining C_0, which works in many cases, may be described using the notion of an adjacency-reducing mapping. By this we mean a mapping of letters into other letters, i \to \alpha(i), with the property that if i and j are not adjacent in the channel (or graph) then \alpha(i) and \alpha(j) are not adjacent. If we have a zero-error code, then we may apply such a mapping letter by letter to the code and obtain a new code which will also be of the zero-error type, since no adjacencies can be produced by the mapping.

Theorem 3: If all the input letters i can be mapped by an adjacency-reducing mapping i \to \alpha(i) into a subset of the letters no two of which are adjacent, then the zero-error capacity C_0 of the channel is equal to the logarithm of the number of letters in this subset.

For, in the first place, by forming all sequences of these letters we obtain a zero-error code at this rate. Secondly, any zero error code for the channel can be mapped into a code using only these letters and containing, therefore, at most e^{nC_0} non-adjacent words.

The zero-error capacities, or, more exactly, the equivalent numbers of input letters, for all adjacency graphs up to five vertices are shown in Fig. 4. These can all be found readily by the method of Theorem 3, except for the channel of Fig. 2 mentioned previously, for which we know only that the zero-error capacity lies in the range (1/2) log 5 \le C_0 \le log (5/2).

All graphs with six vertices have been examined and the capacities of all of these can also be found by this theorem, with the exception of four. These four can be given in terms of the capacity of Fig. 2, so that this case is essentially the only unsolved problem up to seven vertices. Graphs with seven vertices have not been completely examined, but at least one new situation arises, the analog of Fig. 2 with seven input letters.

As examples of how the M_0 values were computed by the method of adjacency-reducing mappings, several of the graphs in Fig. 4 have been labelled to show a suitable mapping. The scheme is as follows. All nodes labelled a are mapped into node \alpha, as is \alpha itself. All nodes labelled b, and also \beta, are mapped into node \beta. All nodes labelled c, and \gamma, are mapped into node \gamma. It is readily verified that no new adjacencies are produced by the mappings indicated and that the \alpha, \beta, \gamma nodes are non-adjacent.

C_0 for Sum and Product Channels

Theorem 4: If two memoryless channels have zero-error capacities C_0' = log A and C_0'' = log B, their sum has a zero-error capacity greater than or equal to log (A + B) and their product a zero-error capacity greater than or equal to C_0' + C_0''. If the graph of either of the two channels can be reduced to non-adjacent points by the mapping method (Theorem 3), then these inequalities can be replaced by equalities.

Proof: It is clear that in the case of the product, the zero error capacity is at least C_0' + C_0'', since we may form a product code from two codes with rates close to C_0' and C_0''. If these codes are not of the same length, we use for the new code the least common multiple of the individual lengths and form all sequences of the code words of each of the codes up to this length. To prove equality in case one of the graphs, say that for the first channel, can be mapped into A non-adjacent points, suppose we have a code for the product channel. The letters for the product code, of course, are ordered pairs of letters corresponding to the original channels. Replace the first letter in each pair in all code words by the letter corresponding to reduction by the mapping method. This reduces or preserves adjacency between words in the code. Now sort the code words into A^n subsets according to the sequences of first letters in the ordered pairs. Each of these subsets can contain at most B^n members, since this is the largest possible number of code words for the second channel of this length. Thus, in total, there are at most A^n B^n words in the code, giving the desired result.

In the case of the sum of the two channels, we first show how, from two given codes for the two channels, to construct a code for the sum channel with equivalent number of letters equal to A^{1-\delta} + B^{1-\delta}, where \delta is arbitrarily small and A and B are the equivalent numbers of letters for the two codes. Let the two codes have lengths n_1 and n_2. The new code will have length n, where n is the smallest integer greater than both n_1/\delta and n_2/\delta. Now form codes for the first channel and for the second channel for all lengths k from zero to n as follows. Let k equal a n_1 + b, where a and b are integers and b < n_1. We form all sequences of a words from the given code for the first channel and fill in the remaining b letters arbitrarily, say all with the first letter in the code alphabet. We achieve at least A^{k - \delta n} different words of length k, none of which is adjacent to any other. In the same way we form codes for the second channel and achieve B^{k - \delta n} words in this code of length k. We now intermingle the k code for the first channel with the n - k code for the second channel in all \binom{n}{k} possible ways, and do this for each value of k. This produces a code n letters long with at least

\sum_k \binom{n}{k} A^{k - n\delta} B^{n - k - n\delta} = (AB)^{-n\delta} (A + B)^n

different words. It is readily seen that no two of these different words are adjacent. The rate is at least log (A + B) - \delta log AB, and since \delta was arbitrarily small, we can achieve a rate arbitrarily close to log (A + B).

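Theorem 4 can be checked on small examples by building the adjacency structures of the sum and product channels directly from those of the factors (in the sum the letter sets are disjoint and letters from different channels are never adjacent; in the product, two ordered pairs are adjacent exactly when they are adjacent, or equal, in both coordinates) and computing M_0(1) by brute force. The sketch below (illustrative, exponential-time search) does this for two small graphs that both reduce by the mapping method, so that C_0 = log M_0(1) for them.

```python
from itertools import combinations, product

def independence_number(n, adjacent):
    """Largest set of vertices 0..n-1 with no two adjacent (brute force)."""
    for size in range(n, 0, -1):
        for s in combinations(range(n), size):
            if all(not adjacent(i, j) for i, j in combinations(s, 2)):
                return size
    return 0

# Two small adjacency structures (self-adjacency left implicit):
# channel 1: a triangle (three mutually adjacent letters), A = 1 equivalent letter;
# channel 2: a path 0-1-2, B = 2 equivalent letters.
adj1 = lambda i, j: True
adj2 = lambda i, j: abs(i - j) == 1

A = independence_number(3, adj1)
B = independence_number(3, adj2)

# Sum channel: disjoint union; letters from different channels never adjacent.
sum_letters = [(0, i) for i in range(3)] + [(1, i) for i in range(3)]
def adj_sum(a, b):
    (c1, i), (c2, j) = sum_letters[a], sum_letters[b]
    return (c1 == c2) and (adj1(i, j) if c1 == 0 else adj2(i, j))
M0_sum = independence_number(len(sum_letters), adj_sum)

# Product channel: ordered pairs; adjacent iff adjacent (or equal) in both coordinates.
prod_letters = list(product(range(3), range(3)))
def adj_prod(a, b):
    (i1, i2), (j1, j2) = prod_letters[a], prod_letters[b]
    return (i1 == j1 or adj1(i1, j1)) and (i2 == j2 or adj2(i2, j2))
M0_prod = independence_number(len(prod_letters), adj_prod)

print(A, B)                                         # 1, 2
print(M0_sum, "== A + B:", M0_sum == A + B)         # 3
print(M0_prod, "== A * B:", M0_prod == A * B)       # 2
```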

[Fig. 4 - All graphs with 1, 2, 3, 4, 5 nodes and the corresponding M_0 for channels with these as adjacency graphs (note C_0 = log M_0)]

To show that it is not possible, when one of the graphs reduces by mapping to non-adjacent points, to exceed the rate corresponding to the number of letters A + B, consider any given code of length n for the sum channel. The words in this consist of sequences of letters, each letter corresponding to one or the other of the two channels. The words may be subdivided into classes corresponding to the pattern of the choices of letters between the two channels. There are 2^n such classes, with \binom{n}{k} classes in which exactly k of the letters are from the first channel and n - k from the second. Consider now a particular class of words of this type. Replace the letters from the first channel alphabet by the corresponding non-adjacent letters. This does not harm the adjacency relations between words in the code. Now, as in the product case, partition the code words according to the sequence of letters involved from the first channel. This produces at most A^k subsets. Each of these subsets contains at most B^{n-k} members, since this is the greatest possible number of non-adjacent words for the second channel of length n - k. In total, then, summing over all values of k and taking account of the \binom{n}{k} classes for each k, there are at most

\sum_k \binom{n}{k} A^k B^{n-k} = (A + B)^n

words in the code for the sum channel. This proves the desired result.

Theorem 4, of course, is analogous to known results for ordinary capacity C, where the product channel has the sum of the ordinary capacities and the sum channel has an equivalent number of letters equal to the sum of the equivalent numbers of letters for the individual channels. We conjecture, but have not been able to prove, that the equalities in Theorem 4 hold in general, not just under the conditions given. We now prove a lower bound for the probability of error when transmitting at a rate greater than C_0.

Theorem 5: In any code of length n and rate R > C_0, C_0 > 0, the probability of error P_e will satisfy

P_e \ge (1 - e^{-n(R - C_0)}) p_{\min}^n

where p_min is the minimum non-vanishing p_i(j).

Proof: By definition of C_0 there are not more than e^{nC_0} non-adjacent words of length n. With R > C_0, among e^{nR} words there must, therefore, be an adjacent pair. The adjacent pair has a common output word which either can cause with a probability at least p_min^n. This output word cannot be decoded into both inputs. At least one, therefore, must cause an error when it leads to this output word. This gives a contribution of at least e^{-nR} p_min^n to the probability of error P_e. Now omit this word from consideration and apply the same argument to the remaining e^{nR} - 1 words of the code. This will give another adjacent pair and another contribution of error of at least e^{-nR} p_min^n. The process may be continued until the number of code points remaining is just e^{nC_0}. At this time, the computed probability of error must be at least

(e^{nR} - e^{nC_0}) e^{-nR} p_{\min}^n = (1 - e^{-n(R - C_0)}) p_{\min}^n.

Channels with a Feedback Link

We now consider the corresponding problem for channels with complete feedback. By this we mean that there exists a return channel sending back from the receiving point to the transmitting point, without error, the letters actually received. It is assumed that this information is received at the transmitting point before the next letter is transmitted, and can be used, therefore, if desired, in choosing the next transmitted letter.

It is interesting that for a memoryless channel the ordinary forward capacity is the same with or without feedback. This will be shown in Theorem 6. On the other hand, the zero error capacity may, in some cases, be greater with feedback than without. In the channel shown in Fig. 5, for example, C_0 = log 2. However, we will see as a result of Theorem 7 that with feedback the zero error capacity C_{0F} = log 2.5.

[Fig. 5]

We first define a block code of length n for a feedback system. This means that at the transmitting point there is a device with two inputs, or, mathematically, a function with two arguments. One argument is the message to be transmitted, the other the past received letters (which have come in over the feedback link). The value of the function is the next letter to be transmitted. Thus, the function may be thought of as x_{j+1} = f(k, v_j), where x_{j+1} is the (j + 1)st transmitted letter in a block, k is an index ranging from 1 to M and represents the specific message, and v_j is a received word of length j. Thus j ranges from 0 to n - 1, and v_j over all received words of these lengths.

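A feedback block code is thus nothing more than a strategy function f(k, v). One transmission round can be sketched as below (illustrative Python; the encoder f, the channel sampler, and the alphabet are placeholders, not anything specified in the paper).

```python
import numpy as np

rng = np.random.default_rng(0)

def transmit_block(f, P, k, n):
    """Send message k with a length-n feedback block code.

    f(k, v) returns the next input letter given the message index k and
    the word v of letters received so far (fed back without error).
    P is the channel transition matrix ||p_i(j)||.
    """
    v = []                                   # received word so far
    for _ in range(n):
        x = f(k, tuple(v))                   # next transmitted letter
        y = rng.choice(P.shape[1], p=P[x])   # channel output for letter x
        v.append(int(y))                     # perfect feedback of y
    return tuple(v)

# A trivial placeholder strategy that ignores the feedback (so it is just an
# ordinary block code): always send letter k mod a.
P = np.array([[0.9, 0.1], [0.2, 0.8]])
f = lambda k, v: k % P.shape[0]
print(transmit_block(f, P, k=3, n=5))
```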
word" and this is sent as the first transmitted The sum on m may be thought of as summed first
letter. If the feedback link sends back a, say, on the m’s which result in the same I (for the
as the first received letter, the next trans- given v), .reoalling that x Is a function of m
mitted letter will be f(k, o). If this is and v, and then summing on the different x18. In
received as $. the next transmitted letter will the first summation, the term Pr[y/v,ml is
be f(k,ap), etc. constant at P&T/X] and the coefficient of the
1cgarithm sume to Pr[x,y/v]. Thus we can write
Theorem 6: In a memorylees discrete
channel with feedback, the forward capacity is
equal to the ordinary capacity C (without feed- AI= z
back). The average change in mutual information X,Y
Ll between received sequence v end message m
for a letter of text is not greater than C. Row consider the rate for the channel (in the’
ordinary sense without feedback) if we should
Proof: Let v be the received sequence to assign to the x18 the probabilities q(x)
date of a block, m the message, x the next trans- = Pr[x/v] . ,!l!he probabilities for pairs, r(x,y),
mitted letter and y the next received letter, and for the y’s alone, w(y), in this situation
These are all random variables and,. also, x is a would then be
function of m and v. This function, namely, is
the one which defines the encoding procedure with r(x,y) = q (x3 P&/xl
feedback whereby the next transmitted letter x is =-Pr[x/v] Pr [y/x]
determined by the ..Jssege m and the feedback
information v from the previous received signals.
The channel being memoryless implies that the = pr Cx,r/vl
next operation is independent of the past, in w(y) = ) dX.Y)
particular, PrCY/Xl = PrCY/X,Vl.

The average change in mutual information,


when a particular v has been received, due to the = ; P.r CX,Y/Vl
x,y pair is given by (we are averaging over X
messages m and next received letters y, for a
given v): = prcs/vl

Hence the rate would be


R = ) r(x,y) log w
XSY
PrEv,y,m]
log PrCv,YlPr[m] - c PrCm/vl* ‘= prCY/Xl
PrCx*Y/vl log Pr[Y,v]
5XL,
E
AI

Since Ric C, the channel capacity (C being the


no~lrimum possible B for all q(x) assignments), we
Since Pr[m/vl = x Pr[y,m/vl, the second sum mey conclude that
9
be rewritten as AI CC.

Since the average change in I per letter is


not greater than C, the average change in n
The two suma then combine to give letters is not greater than nC. Hence, in a blodr
code of length IL with input rate R, if R > C then
the equivocation at the end of a block will be at
least B - C, just as in the non-feedback case.
In other words, it is not possible to approach
zero equivocation (or, as easily follows, zero
probability of error) at a rate exceeding the
Channel capacity. It is, of course, possible to
do this at rates less than C, since certainly
anything that can be done without feedback can
be done with feedback.

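The key identity in this proof, \Delta I = \sum_{x,y} Pr[x,y|v] log(Pr[y|x]/Pr[y|v]), is exactly the mutual information of a single channel use with input distribution q(x) = Pr[x|v]. A small numeric sketch (illustrative values, not a channel from the paper) confirms that it never exceeds the capacity, here approximated by maximizing over a grid of input distributions.

```python
import numpy as np

def mutual_information(q, P):
    """I(X;Y) in nats for input distribution q and transition matrix P."""
    w = q @ P                                       # output distribution Pr[y]
    mask = (q[:, None] * P) > 0
    ratio = np.where(mask, P / np.where(w > 0, w, 1.0), 1.0)
    return float(np.sum(np.where(mask, q[:, None] * P * np.log(ratio), 0.0)))

# Illustrative binary channel.
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])

# Capacity approximated by a grid search over input distributions q(x).
grid = np.linspace(0.0, 1.0, 2001)
C = max(mutual_information(np.array([a, 1 - a]), P) for a in grid)

# Delta I for any conditional distribution q(x) = Pr[x | v] induced by a
# feedback strategy is this same functional, so Delta I <= C.
for a in (0.1, 0.5, 0.8):
    assert mutual_information(np.array([a, 1 - a]), P) <= C + 1e-12
print(f"capacity ~ {C:.4f} nats; all sampled Delta I values are <= C")
```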
It is interesting that the first sentence of Theorem 6 can be generalized readily to channels with memory, provided they are of such a nature that the internal state of the channel can be calculated at the transmitting point from the initial state and the sequence of letters that have been transmitted. If this is not the case, the conclusion of the theorem will not always be true; that is, there exist channels of a more complex sort for which the forward capacity with feedback exceeds that without feedback. We shall not, however, give the details of these generalizations here.

Returning now to the zero-error problem, we define a zero error capacity C_{0F} for a channel with feedback in the obvious way: the least upper bound of rates for block codes with no errors. The next theorem solves the problem of evaluating C_{0F} for memoryless channels with feedback, and indicates how rapidly C_{0F} may be approached as the block length n increases.

Theorem 7: In a memoryless discrete channel with complete feedback of received letters to the transmitting point, the zero error capacity C_{0F} is zero if all pairs of input letters are adjacent. Otherwise C_{0F} = log P_0^{-1}, where

P_0 = \min_{P_i} \max_j \sum_{i \in S_j} P_i

P_i being a probability assigned to input letter i (\sum_i P_i = 1) and S_j the set of input letters which can cause output letter j with probability greater than zero. A zero error block code of length n can be found for such a feedback channel which transmits at a rate

R \ge C_{0F} \Big( 1 - \frac{2}{n} \log_2 2t \Big)

where t is the number of input letters.

The P_0 occurring in this theorem has the following meaning. For any given assignment of probabilities P_i to the input letters one may calculate, for each output letter j, the total probability of all input letters that can (with positive probability) cause j. This is \sum_{i \in S_j} P_i. Output letters for which this is large may be thought of as "bad" in that when received there is a large uncertainty as to the cause. To obtain P_0 one adjusts the P_i so that the worst output letter in this sense is as good as possible.

We first show that if all letters are adjacent to each other, C_{0F} = 0. In fact, in any coding system, any two messages, say m_1 and m_2, can lead to the same received sequence with positive probability. Namely, the first transmitted letters corresponding to m_1 and m_2 have a possible received letter in common. Assuming this occurs, calculate the next transmitted letters in the coding system for m_1 and m_2. These also have a possible received letter in common. Continuing in this manner, we establish a received word which could be produced by either m_1 or m_2, and therefore they cannot be distinguished with certainty.

Now consider the case where not all pairs are adjacent. We will first prove, by induction on the block length n, that the rate log P_0^{-1} cannot be exceeded with a zero error code. For n = 0 the result is certainly true. The inductive hypothesis will be that no block code of length n - 1 transmits at a rate greater than log P_0^{-1}, or, in other words, can resolve with certainty more than

e^{(n-1) \log P_0^{-1}} = P_0^{-(n-1)}

different messages. Now suppose (in contradiction to the desired result) we have a block code of length n resolving M messages with M > P_0^{-n}. The first transmitted letter for the code partitions these M messages among the input letters for the channel. Let F_i be the fraction of the messages assigned to letter i (that is, for which i is the first transmitted letter). Now these F_i are like probability assignments to the different letters, and therefore, by definition of P_0, there is some output letter, say letter k, such that \sum_{i \in S_k} F_i \ge P_0. Consider the set of messages for which the first transmitted letter belongs to S_k. The number of messages in this set is at least P_0 M. Any of these can cause output letter k as first received letter. When this happens there are n - 1 letters yet to be transmitted, and since M > P_0^{-n} we have P_0 M > P_0^{-(n-1)}. Thus we have a zero error code of block length n - 1 transmitting at a rate greater than log P_0^{-1}, contradicting the inductive assumption. Note that the coding function for this code of length n - 1 is formally defined from the original coding function by fixing the first received letter at k.

We must now show that the rate log P_0^{-1} can actually be approached as closely as desired with zero error codes. Let P_i be the set of probabilities which, when assigned to the input letters, give P_0 for \min_{P_i} \max_j \sum_{i \in S_j} P_i. The general scheme of the code will be to divide the M original messages into t different groups corresponding to the first transmitted letter. The number of messages in these groups will be approximately proportional to P_1, P_2, ..., P_t. The first transmitted letter, then, will correspond to the group containing the message to be transmitted.

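P_0 is the value of a small linear program: minimize z subject to \sum_{i \in S_j} P_i \le z for every output letter j, \sum_i P_i = 1, P_i \ge 0. The sketch below (illustrative) solves it with scipy for a channel whose support sets are those that the worked example after Theorem 7 implies for Fig. 5 (three confusable output letters with supports {1,2}, {2,3}, {3,1}, and an output reachable only from the fourth letter); the figure itself is not reproduced here, so that structure is an assumption.

```python
import numpy as np
from scipy.optimize import linprog

def zero_error_feedback_capacity(supports, t):
    """C_0F = -ln P_0, where P_0 = min over P of max_j sum_{i in S_j} P_i.

    Solved as an LP in variables (P_1..P_t, z): minimize z subject to
    sum_{i in S_j} P_i - z <= 0 for each output letter j and sum_i P_i = 1.
    """
    c = np.zeros(t + 1)
    c[-1] = 1.0                                       # objective: minimize z
    A_ub = np.zeros((len(supports), t + 1))
    for row, S in enumerate(supports):
        A_ub[row, list(S)] = 1.0
        A_ub[row, -1] = -1.0
    b_ub = np.zeros(len(supports))
    A_eq = np.concatenate([np.ones(t), [0.0]])[None, :]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (t + 1))
    P0 = res.fun
    return res.x[:t], P0, -np.log(P0)

# Assumed support sets S_j for the Fig. 5 channel (0-based letters 0,1,2,3).
supports = [{0, 1}, {1, 2}, {2, 0}, {3}]
P, P0, C0F = zero_error_feedback_capacity(supports, t=4)
print(P)            # about [0.2, 0.2, 0.2, 0.4]
print(P0, C0F)      # 0.4 and log(5/2) ~ 0.916 nats
```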
Whatever letter is received, the number of possible messages compatible with this received letter will be approximately P_0 M. This subset of possible messages is known both at the receiver and (after the received letter is sent back to the transmitter) at the transmitting point.

The code system next subdivides this subset of messages into t groups, again approximately in proportion to the probabilities P_i. The second letter transmitted is that corresponding to the group containing the actual message. Whatever letter is received, the number of messages compatible with the two received letters is now, roughly, P_0^2 M.

This process is continued until only a few messages (less than t^2) are compatible with all the received letters. The ambiguity among these is then resolved by using a pair of non-adjacent letters in a simple binary code. The code thus constructed will be a zero error code for the channel.

Our first concern is to estimate carefully the approximation involved in subdividing the messages into the t groups. We will show that for any M and any set of P_i with \sum_i P_i = 1, it is possible to subdivide the M messages into groups of m_1, m_2, ..., m_t such that m_i = 0 whenever P_i = 0 and

| m_i/M - P_i | \le 1/M, \quad i = 1, ..., t

We assume without loss of generality that P_1, P_2, ..., P_s are the non-vanishing P_i. Choose m_1 to be the largest integer such that m_1/M \le P_1. Let P_1 - m_1/M = \delta_1. Clearly |\delta_1| < 1/M. Next choose m_2 to be the smallest integer such that m_2/M \ge P_2, and let P_2 - m_2/M = \delta_2. We have |\delta_2| \le 1/M. Also |\delta_1 + \delta_2| \le 1/M, since \delta_1 and \delta_2 are opposite in sign and each less than 1/M in absolute value. Next m_3 is chosen so that m_3/M approximates P_3 to within 1/M. If \delta_1 + \delta_2 \le 0, then m_3/M is chosen less than or equal to P_3. If \delta_1 + \delta_2 \ge 0, then m_3/M is chosen greater than or equal to P_3. Thus again |P_3 - m_3/M| = |\delta_3| \le 1/M and |\delta_1 + \delta_2 + \delta_3| \le 1/M. Continuing in this manner through P_{s-1}, we obtain approximations for P_1, P_2, ..., P_{s-1} with the property that

|\delta_1 + \delta_2 + \cdots + \delta_{s-1}| \le 1/M, or

| M(P_1 + P_2 + \cdots + P_{s-1}) - (m_1 + m_2 + \cdots + m_{s-1}) | \le 1.

If we now define m_s as M - \sum_1^{s-1} m_i, then this inequality can be written |M(1 - P_s) - (M - m_s)| \le 1. Hence |P_s - m_s/M| \le 1/M. Thus we have achieved the objective of keeping all approximations m_i/M to within 1/M of P_i and having \sum_i m_i = M.

Returning now to our main problem, note first that if P_0 = 1 then C_{0F} = 0 and the theorem is trivially true. We assume, then, that P_0 < 1. We wish to show that P_0 \le 1 - 1/t. Consider the set of input letters which have the maximum value of P_i. This maximum is certainly greater than or equal to the average 1/t. Furthermore, we can arrange to have at least one of these input letters not connected to some output letter. For suppose this is not the case. Then either there are no other input letters beside this set, and we contradict the assumption that P_0 < 1, or there are other input letters with smaller values of P_i. In this case, by reducing the P_i for one input letter in the maximum set and increasing correspondingly that for some input letter which does not connect to all output letters, we do not increase the value of \sum_{i \in S_j} P_i for any S_j, and we create an input letter of the desired type. By consideration of an output letter to which this input letter does not connect, we see that P_0 \le 1 - 1/t.

Now suppose we start with M messages and subdivide into groups approximating proportionality to the P_i as described above. Then when a letter has been received, the set of possible messages (compatible with this received letter) will be reduced to those in the groups corresponding to letters which connect to the actual received letter. Each output letter connects to not more than t - 1 input letters (otherwise we would have P_0 = 1). For each of the connecting groups, the error in approximating P_i has been less than or equal to 1/M. Hence the total relative number in all connecting groups for any output letter is less than or equal to P_0 + (t - 1)/M. The total number of possible messages after receiving the first letter consequently drops from M to a number less than or equal to P_0 M + t - 1.

In the coding system to be used, this remaining possible subset of messages is subdivided again among the input letters so as to approximate, in the same fashion, the probabilities P_i.

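The subdivision lemma is constructive, and the alternating floor/ceiling rule described above is easy to implement. The sketch below (illustrative) rounds M P_i to integers m_i while keeping every |m_i/M - P_i| within 1/M and \sum_i m_i = M.

```python
import math

def subdivide(M, P, tol=1e-12):
    """Split M messages into groups m_i with |m_i/M - P_i| <= 1/M and sum m_i = M.

    Follows the alternating rule in the text: keep a running sum of the
    discrepancies delta_i = P_i - m_i/M and round each group down or up so
    that the running sum never exceeds 1/M in absolute value; the last
    non-vanishing group absorbs whatever remains.
    """
    idx = [i for i, prob in enumerate(P) if prob > tol]   # non-vanishing P_i
    m = [0] * len(P)
    running = 0.0
    for i in idx[:-1]:
        target = M * P[i]
        m[i] = math.floor(target) if running <= 0 else math.ceil(target)
        running += P[i] - m[i] / M
    m[idx[-1]] = M - sum(m)                               # forces sum m_i = M
    return m

P = [0.2, 0.2, 0.2, 0.4, 0.0]
for M in (7, 23, 100):
    m = subdivide(M, P)
    assert sum(m) == M
    assert all(abs(mi / M - pi) <= 1.0 / M + 1e-12 for mi, pi in zip(m, P))
    print(M, m)
```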
This subdivision can be carried out both at the receiving point and the transmitting point, using the same standard procedure (say, exactly the one described above), since with the feedback both terminals have available the required data, namely the first received letter.

The second transmitted letter obtained by this procedure will again reduce, at the receiving point, the number of possible messages, to a value not greater than P_0(P_0 M + t - 1) + t - 1. This same process continues with each transmitted letter. If the upper bound on the number of possible remaining messages after k letters is M_k, then M_{k+1} = P_0 M_k + t - 1. The solution of this difference equation is

M_k = A P_0^k + \frac{t - 1}{1 - P_0}

This may be readily verified by substitution in the difference equation. To satisfy the initial condition M_0 = M requires A = M - (t - 1)/(1 - P_0). Thus the solution becomes

M_k = \Big( M - \frac{t - 1}{1 - P_0} \Big) P_0^k + \frac{t - 1}{1 - P_0}
    = M P_0^k + \frac{t - 1}{1 - P_0} (1 - P_0^k)
    \le M P_0^k + t(t - 1)

since we have seen above that 1 - P_0 \ge 1/t.

If the process described is carried out for n_1 steps, where n_1 is the smallest integer \ge d and d is the solution of M P_0^d = 1, then the number of possible messages left consistent with the received sequence will be not greater than 1 + t(t - 1) \le t^2 (since t > 1, otherwise we should have C_{0F} = 0). Now the pair of non-adjacent letters assumed in the theorem may be used to resolve the ambiguity among these t^2 or fewer messages. This will require not more than 1 + \log_2 t^2 = \log_2 2t^2 additional letters. Thus, in total, we have used not more than d + 1 + \log_2 2t^2 = d + \log_2 4t^2 = n, say, as block length. We have transmitted in this block length a choice from M = P_0^{-d} messages. Thus the zero error rate we have achieved is

R = \frac{1}{n} \log M = \frac{d \log P_0^{-1}}{d + \log_2 4t^2}
  = \Big( 1 - \frac{1}{n} \log_2 4t^2 \Big) \log P_0^{-1}
  = \Big( 1 - \frac{1}{n} \log_2 4t^2 \Big) C_{0F}

Thus we can approximate C_{0F} as closely as desired with zero error codes.

As an example of Theorem 7, consider the channel in Fig. 5. We wish to evaluate P_0. It is easily seen that we may take P_1 = P_2 = P_3 in forming the min max of Theorem 7, for if they are unequal the maximum \sum_{i \in S_j} P_i for the corresponding three output letters would be reduced by equalizing. Also it is evident, then, that P_4 = P_1 + P_2, since otherwise a shift of probability one way or the other would reduce the maximum. We conclude, then, that P_1 = P_2 = P_3 = 1/5 and P_4 = 2/5. Finally, the zero error capacity with feedback is log P_0^{-1} = log 5/2.

There is a close connection between the min max process of Theorem 7 and the process of finding the minimum capacity for the channel under variation of the non-vanishing transition probabilities p_i(j) as in Theorem 2. It was noted there that at the minimum capacity each output letter can be caused by the same total probability of input letters. Indeed, it seems very likely that the probabilities of input letters attaining the minimum capacity are exactly those which solve the min max problem of Theorem 7, and, if this is so, then C_min = log P_0^{-1}.

Acknowledgement

I am indebted to Peter Elias for first pointing out that a feedback link could increase the zero-error capacity, as well as for several suggestions that were helpful in the proof of Theorem 7.
