The Zero Error Capacity of A Noisy Channel
Claude E. Shannon
Bell Telephone Laboratories, Murray Hill, New Jersey
Massachusetts Institute of Technology, Cambridge, Mass.
Abstract
Introduction
[...] channels is the channel formed by using inputs from either of the two given channels with the same transition probabilities to the set of output letters consisting of the logical sum of the two output alphabets. Thus the sum channel is defined by a transition matrix formed by placing the matrix of one channel below and to the right of that for the other channel and filling the remaining two rectangles with zeros. If ||p_i(j)|| and ||p'_i(j)|| are the individual matrices, the sum has the following matrix:

    p_1(1)   ...  p_1(r)     0         ...  0
      .            .         .              .
      .            .         .              .
    p_a(1)   ...  p_a(r)     0         ...  0
    0        ...  0          p'_1(1)   ...  p'_1(r')
      .            .         .              .
      .            .         .              .
    0        ...  0          p'_a'(1)  ...  p'_a'(r')

The product of two channels is the channel whose input alphabet consists of all ordered pairs (i,i'), where i is a letter from the first channel alphabet and i' from the second, whose output alphabet is the similar set of ordered pairs of letters from the two individual output alphabets, and whose transition probability from (i,i') to (j,j') is p_i(j) p'_{i'}(j').

The sum of channels corresponds physically to a situation where either of two channels may be used (but not both), a new choice being made for each transmitted letter. The product channel corresponds to a situation where both channels are used each unit of time. It is interesting to note that multiplication and addition of channels are both associative and commutative, and that the product distributes over a sum. Thus one can develop a kind of algebra for channels in which it is possible to write, for example, a polynomial Σ_n a_n X^n, where the a_n are non-negative integers and X is a channel. We shall not, however, investigate here the algebraic properties of this system.

The Zero Error Capacity

In a discrete channel we will say that two input letters are adjacent if there is an output letter which can be caused by either of these two. Thus, i and j are adjacent if there exists a t such that both p_i(t) and p_j(t) do not vanish. In Fig. 1, a and c are adjacent, while a and d are not.

If all input letters are adjacent to each other, any code with more than one word has a probability of error at the receiving point greater than zero. In fact, the probability of error in decoding words satisfies

    P_e >= (1/2) p_min^n

where p_min is the smallest (non-vanishing) among the p_i(j), n is the length of the code, and M is the number of words in the code. To prove this, note that any two words have a possible output word in common, namely the word consisting of the sequence of common output letters when the two input words are compared letter by letter. Each of the two input words has a probability at least p_min^n of producing this common output word. In using the code, the two particular input words will each occur 1/M of the time and will cause the common output at least (1/M) p_min^n of the time. This output can be decoded in only one way. Hence at least one of these situations leads to an error. This error, (1/M) p_min^n, is assigned to this code word, and from the remaining M - 1 code words another pair is chosen. A source of error to the amount (1/M) p_min^n is assigned in similar fashion to one of these, and this is a disjoint event. Continuing in this manner, we obtain a total of at least (M/2)(1/M) p_min^n = (1/2) p_min^n as probability of error.

If it is not true that the input letters are all adjacent to each other, it is possible to transmit at a positive rate with zero probability of error. The least upper bound of all rates which can be achieved with zero probability of error will be called the zero error capacity of the channel and denoted by C_0. If we let M_0(n) be the largest number of words in a code of length n, no two of which are adjacent, then C_0 is the least upper bound of the numbers (1/n) log M_0(n) when n varies through all positive integers.

One might expect that C_0 would be equal to log M_0(1), that is, that if we choose the largest possible set of non-adjacent letters and form all sequences of these of length n, then this would be the best error-free code of length n. This is not, in general, true, although it holds in many cases, particularly when the number of input letters is small. The first failure occurs with five input letters with the channel in Fig. 2. In this channel, it is possible to choose at most two non-adjacent letters, for example 0 and 2. Using sequences of these, 00, 02, 20, and 22, we obtain four words in a code of length two. However, it is possible to construct a code of length two with five members no two of which are adjacent, as follows: 00, 12, 24, 31, 43. It is readily verified that no two of these are adjacent. Thus, C_0 for this channel is at least (1/2) log 5.

[Fig. 2]
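The adjacency check behind this count is mechanical. A minimal sketch (assuming, as Fig. 2 indicates, that the five input letters 0, ..., 4 are cyclically adjacent, each letter adjacent to itself and to its two neighbors mod 5):

    # Verify that the five words 00, 12, 24, 31, 43 are pairwise non-adjacent
    # for the pentagon channel of Fig. 2 (assumed adjacency: cyclic distance <= 1).

    def letters_adjacent(x, y):
        # Two letters are adjacent if some output can arise from both;
        # for the pentagon this means cyclic distance 0 or 1.
        return min((x - y) % 5, (y - x) % 5) <= 1

    def words_adjacent(u, v):
        # Words are adjacent only if they are adjacent in every position.
        return all(letters_adjacent(x, y) for x, y in zip(u, v))

    code = [(0, 0), (1, 2), (2, 4), (3, 1), (4, 3)]
    assert not any(words_adjacent(u, v)
                   for i, u in enumerate(code) for v in code[i + 1:])
    print("five pairwise non-adjacent words of length 2")  # rate (1/2) log 5

Each pair of words fails to be adjacent in at least one of the two positions, which is exactly the condition for the words to be distinguishable with certainty.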
No method has been found for determining C_0 for the general discrete channel, and this we propose as an interesting unsolved problem in coding theory. We shall develop a number of results which enable one to determine C_0 in many special cases, for example, in all channels with five or less input letters, with the single exception of the channel of Fig. 2 (or channels equivalent in adjacency structure to it). We will also develop some general inequalities enabling one to estimate C_0 quite closely in most cases.

It may be seen, in the first place, that the value of C_0 depends only on which input letters are adjacent to each other. Let us define the adjacency matrix for a channel, A_ij, as follows:

    A_ij = 1   if input letter i is adjacent to j or if i = j
    A_ij = 0   otherwise

Suppose two channels have the same adjacency matrix (possibly after renumbering the input letters of one of them). Then it is obvious that a zero error code for one will be a zero error code for the other and, hence, that the zero error capacity C_0 for one will also apply to the other.

The adjacency structure contained in the adjacency matrix can also be represented as a linear graph. Construct a graph with as many vertices as there are input letters, and connect two distinct vertices with a line or branch of the graph if the corresponding input letters are adjacent. Two examples are shown in Fig. 3, corresponding to the channels of Figs. 1 and 2.

[Fig. 3]

Theorem 1: The zero error capacity C_0 of a discrete memoryless channel is bounded by the inequalities

    - log min_{P_i} Σ_ij A_ij P_i P_j  <=  C_0  <=  min_{p_i(j)} C

    Σ_i P_i = 1,  P_i >= 0

    Σ_j p_i(j) = 1,  p_i(j) >= 0

where C is the capacity of any channel with transition probabilities p_i(j) and having the adjacency matrix A_ij.

The upper bound is fairly obvious. The zero error capacity is certainly less than or equal to the ordinary capacity for any channel with the same adjacency matrix, since the former requires codes with zero probability of error while the latter requires only codes approaching zero probability of error. By minimizing the capacity through variation of the p_i(j) we find the lowest upper bound available through this argument. Since the capacity is a continuous function of the p_i(j) in the closed region defined by p_i(j) >= 0, Σ_j p_i(j) = 1, we may write min instead of greatest lower bound.

It is worth noting that it is only necessary to consider a particular channel in performing this minimization, although there are an infinite number with the same adjacency matrix. This one particular channel is obtained as follows from the adjacency matrix. If A_ik = 1 for a pair i, k, define an output letter j with p_i(j) and p_k(j) both differing from zero. Now if there are any three input letters, say i, k, l, all adjacent to each other, define an output letter, say m, with p_i(m), p_k(m), p_l(m) all different from zero. In the adjacency graph this corresponds to a complete sub-graph with three vertices. Next, subsets of four letters, or complete subgraphs of four vertices, say i, k, l, m, are given an output letter, each being connected to it, and so on. It is evident that any channel with the same adjacency matrix differs from that just described only by variation in the number of output symbols for some of the pairs, triplets, etc., of adjacent input letters. If a channel has more than one output symbol for an adjacent subset of input letters, then its capacity is reduced by identifying these. If a channel contains no element, say for a triplet i, k, l of adjacent input letters, this will occur as a special case of our canonical channel which has output letter m for this triplet when p_i(m), p_k(m) and p_l(m) all vanish.
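The lower bound of Theorem 1 can be evaluated numerically: minimize the quadratic form Σ_ij A_ij P_i P_j over the probability simplex. A sketch only (multi-start SLSQP; any quadratic-programming routine would do), applied to the pentagon channel of Fig. 2:

    import numpy as np
    from scipy.optimize import minimize

    n = 5
    # Adjacency matrix of the pentagon: A_ij = 1 iff cyclic distance <= 1.
    A = np.array([[1.0 if min((i - j) % n, (j - i) % n) <= 1 else 0.0
                   for j in range(n)] for i in range(n)])

    def q(P):
        return P @ A @ P                      # sum_ij A_ij P_i P_j

    cons = [{"type": "eq", "fun": lambda P: P.sum() - 1.0}]
    rng = np.random.default_rng(0)
    # The uniform point is a stationary point, so restart from random simplex points.
    best = min(minimize(q, rng.dirichlet(np.ones(n)), method="SLSQP",
                        bounds=[(0.0, 1.0)] * n, constraints=cons).fun
               for _ in range(20))
    print(best, np.log2(1.0 / best))          # best -> 1/2, so C_0 >= 1 bit

The minimum 1/2 is attained by concentrating probability on two non-adjacent letters, so this random-coding bound gives C_0 >= log 2; the five-word code exhibited above already achieves (1/2) log 5, showing that the lower bound of Theorem 1 is not tight for this channel.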
[...] each containing M words, each word n letters long. The words in a code are chosen by the following stochastic method. Each letter of each word is chosen independently of all others and is the letter i with probability P_i. We now compute the probability in the ensemble that any particular word is not adjacent to any other word in its code. The probability that the first letter of one word is adjacent to the first letter of a second word is Σ_ij A_ij P_i P_j, since this sums the cases of adjacency with coefficient 1 and those of non-adjacency with coefficient 0. The probability that two words are adjacent in all letters, and therefore adjacent as words, is (Σ_ij A_ij P_i P_j)^n. The probability of non-adjacency is, therefore, 1 - (Σ_ij A_ij P_i P_j)^n. The probability that all M - 1 other words in a code are not adjacent to a given word is, since they are chosen independently,

    [1 - (Σ_ij A_ij P_i P_j)^n]^(M-1)

[...]

1) The rate

    R = Σ_ij P_i p_i(j) log (p_i(j) / Σ_s P_s p_s(j))

is stationary under variation of all non-vanishing P_i subject to Σ_i P_i = 1, and under variation of the p_i(j) for those p_i(j) such that P_i p_i(j) > 0, subject to Σ_j p_i(j) = 1.

2) The mutual information between input-output pairs, I_ij = log (p_i(j) / Σ_s P_s p_s(j)), is constant, I_ij = I, for all i, j pairs of non-vanishing probability (i.e. pairs for which P_i p_i(j) > 0).

3) We have p_i(j) = r_j, a function of j only, whenever P_i p_i(j) > 0; and also Σ_{i in S_j} P_i = h, a constant independent of j, where S_j is the set of input letters that can produce output letter j with probability greater than zero. We also have I = log h^{-1}.

[...] output letter j, then this is equivalent to [...]
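The page is fragmentary at this point; the standard completion of the random-coding computation interrupted above, consistent with the lower bound stated in Theorem 1, runs as follows (a reconstruction, not Shannon's surviving wording):

\[
\text{Write } q = \sum_{ij} A_{ij} P_i P_j. \text{ Then }
\Pr\!\left[\text{a word is non-adjacent to all } M-1 \text{ others}\right]
= (1 - q^n)^{M-1} \ge 1 - (M-1)\,q^n .
\]
\[
\text{With } M = \lceil \tfrac{1}{2} q^{-n} \rceil,\ (M-1)q^n \le \tfrac{1}{2},
\text{ so on the average at least } M/2 \text{ words have no adjacent partner,}
\]
and these surviving words are pairwise non-adjacent. Hence
\[
M_0(n) \;\ge\; \tfrac{1}{4}\, q^{-n}
\qquad\Longrightarrow\qquad
C_0 \;\ge\; -\log \min_{P} \sum_{ij} A_{ij} P_i P_j .
\]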
[...]

    p_i(j)/Q_j = p_i(k)/Q_k

(Q_j denoting Σ_s P_s p_s(j).) In other words, p_i(j)/Q_j is independent of j, a function of i only, whenever P_i > 0 and p_i(j) > 0. This function of i we call u_i. Thus p_i(j) = u_i Q_j [...]

[...] the variation of R also vanishes subject to Σ_j p_i(j) = 1. [...]

We now prove that 2) implies 3). Suppose log (p_i(j)/Q_j) = I whenever P_i p_i(j) > 0. Then p_i(j) = e^I Q_j [...]
By this we mean a mapping of letters into other letters, i -> a(i), with the property that if i and j are not adjacent in the channel (or graph) then a(i) and a(j) are not adjacent. If we have a zero-error code, then we may apply such a mapping letter by letter to the code and obtain a new code which will also be of the zero-error type, since no adjacencies can be produced by the mapping.

Theorem 2: If all the input letters i can be mapped by an adjacency-reducing mapping i -> a(i) into a subset of the letters no two of which are adjacent, then the zero-error capacity C_0 of the channel is equal to the logarithm of the number of letters in this subset.

For, in the first place, by forming all sequences of these letters we obtain a zero-error code at this rate. Secondly, any zero error code for the channel can be mapped into a code using only these letters and containing, therefore, at most e^{n C_0} non-adjacent words.

The zero-error capacities, or, more exactly, the equivalent numbers of input letters, for all adjacency graphs up to five vertices are shown in Fig. 4. These can all be found readily by the method of Theorem 3, except for the channel of Fig. 2 mentioned previously, for which we know only that the zero-error capacity lies in the range (1/2) log 5 <= C_0 <= log 5/2.

All graphs with six vertices have been examined, and the capacities of all of these can also be found by this theorem, with the exception of four. These four can be given in terms of the capacity of Fig. 2, so that this case is essentially the only unsolved problem up to seven vertices. Graphs with seven vertices have not been completely examined, but at least one new situation arises, the analog of Fig. 2 with seven input letters.

As examples of how the M_0 values were computed by the method of adjacency-reducing mappings, several of the graphs in Fig. 4 have been labelled to show a suitable mapping. The scheme is as follows. All nodes labelled a are mapped into node α, as is α itself. All nodes labelled b, and also β, are mapped into node β. All nodes labelled c, and γ, are mapped into node γ. It is readily verified that no new adjacencies are produced by the mappings indicated and that the α, β, γ nodes are non-adjacent.

C_0 for Sum and Product Channels

Theorem 4: If two memoryless channels have zero-error capacities C_0' = log A and C_0'' = log B, their sum has a zero-error capacity greater than or equal to log (A + B) and their product a zero-error capacity greater than or equal to C_0' + C_0''. If the graph of either of the two channels can be reduced to non-adjacent points by the mapping method (Theorem 3), then these inequalities can be replaced by equalities.

Proof: It is clear that in the case of the product, the zero error capacity is at least C_0' + C_0'', since we may form a product code from two codes with rates close to C_0' and C_0''. If these codes are not of the same length, we use for the new code the least common multiple of the individual lengths and form all sequences of the code words of each of the codes up to this length. To prove equality in case one of the graphs, say that for the first channel, can be mapped into A non-adjacent points, suppose we have a code for the product channel. The letters for the product code, of course, are ordered pairs of letters corresponding to the original channels. Replace the first letter in each pair in all code words by the letter corresponding to reduction by the mapping method. This reduces or preserves adjacency between words in the code. Now sort the code words into A^n subsets according to the sequences of first letters in the ordered pairs. Each of these subsets can contain at most B^n members, since this is the largest possible number of code words for the second channel of this length. Thus, in total, there are at most A^n B^n words in the code, giving the desired result.

In the case of the sum of the two channels, we first show how, from two given codes for the two channels, to construct a code for the sum channel with equivalent number of letters equal to A^{1-δ} + B^{1-δ}, where δ is arbitrarily small and A and B are the equivalent numbers of letters for the two codes. Let the two codes have lengths n_1 and n_2. The new code will have length n, where n is the smallest integer greater than both n_1/δ and n_2/δ. Now form codes for the first channel and for the second channel for all lengths k from zero to n as follows. Let k equal a n_1 + b, where a and b are integers and b < n_1. We form all sequences of a words from the given code for the first channel and fill in the remaining b letters arbitrarily, say all with the first letter in the code alphabet. We achieve at least A^{k - δn} different words of length k, none of which is adjacent to any other. In the same way we form codes for the second channel and achieve B^{k - δn} words in this code of length k. We now intermingle the k code for the first channel with the n - k code for the second channel in all (n choose k) possible ways, and do this for each value of k. This produces a code n letters long with at least

    Σ_k (n choose k) A^{k - nδ} B^{n - k - nδ} = (AB)^{-nδ} (A + B)^n

different words. It is readily seen that no two of these different words are adjacent. The rate is at least log (A + B) - δ log AB, and since δ was arbitrarily small, we can achieve a rate arbitrarily close to log (A + B).

To show that it is not possible, when one of the graphs reduces by mapping to non-adjacent points, to exceed the rate corresponding to the number of letters A + B, consider any given code of length n for the sum channel. The words in this consist of sequences of letters, each letter corresponding to one or the other of the two channels.
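As an aside, the adjacency bookkeeping used in this proof is easy to mechanize. In the sketch below (an illustration, not part of the paper), letters are taken adjacent to themselves, so the product channel's adjacency matrix is the Kronecker product of the two factors and the sum channel's is their block-diagonal union; M_0(1) is then the largest set of pairwise non-adjacent letters.

    import numpy as np
    from itertools import combinations

    def sum_channel(A1, A2):
        # Disjoint output alphabets: letters of different channels are never adjacent.
        n1, n2 = len(A1), len(A2)
        S = np.zeros((n1 + n2, n1 + n2), dtype=int)
        S[:n1, :n1], S[n1:, n1:] = A1, A2
        return S

    def product_channel(A1, A2):
        # (i,i') and (j,j') share an output pair iff i,j share an output
        # and i',j' share one: the Kronecker product of the two matrices.
        return np.kron(A1, A2)

    def M0(Adj):
        # Largest number of pairwise non-adjacent letters (brute force).
        n = len(Adj)
        for size in range(n, 0, -1):
            for S in combinations(range(n), size):
                if all(Adj[i][j] == 0 for i, j in combinations(S, 2)):
                    return size
        return 0

    A = np.array([[1, 1], [1, 1]])   # two letters, adjacent:      M0 = 1
    B = np.array([[1, 0], [0, 1]])   # two letters, non-adjacent:  M0 = 2
    print(M0(sum_channel(A, B)), M0(product_channel(A, B)))   # 3 = 1 + 2, 2 = 1 * 2

The printed values illustrate the single-letter cases of Theorem 4: equivalent numbers of letters add for the sum channel and multiply for the product channel.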
[Fig. 4 - All graphs with 1, 2, 3, 4, 5 nodes and the corresponding M_0 for channels with these as adjacency graphs (note C_0 = log M_0)]
The words may be subdivided into classes corresponding to the pattern of the choices of letters between the two channels. There are 2^n such classes, with (n choose k) classes in which exactly k of the letters are from the first channel and n - k from the second. Consider now a particular class of words of this type. Replace the letters from the first channel alphabet by the corresponding non-adjacent letters. This does not harm the adjacency relations between words in the code. Now, as in the product case, partition the code words according to the sequence of letters involved from the first channel. This produces at most A^k subsets. Each of these subsets contains at most B^{n-k} members, since this is the greatest possible number of non-adjacent words for the second channel of length n - k. In total, then, summing over all values of k and taking account of the (n choose k) classes for each k, there are at most

    Σ_k (n choose k) A^k B^{n-k} = (A + B)^n

words in the code for the sum channel. This proves the desired result.

Channels with a Feedback Link

We now consider the corresponding problem for channels with complete feedback. By this we mean that there exists a return channel sending back from the receiving point to the transmitting point, without error, the letters actually received. It is assumed that this information is received at the transmitting point before the next letter is transmitted, and can be used, therefore, if desired, in choosing the next transmitted letter.

It is interesting that for a memoryless channel the ordinary forward capacity is the same with or without feedback. This will be shown in Theorem 6. On the other hand, the zero error capacity may, in some cases, be greater with feedback than without. In the channel shown in Fig. 5, for example, C_0 = log 2. However, we will see as a result of Theorem 7 that with feedback the zero error capacity C_0F = log 2.5.
word" and this is sent as the first transmitted The sum on m may be thought of as summed first
letter. If the feedback link sends back a, say, on the m’s which result in the same I (for the
as the first received letter, the next trans- given v), .reoalling that x Is a function of m
mitted letter will be f(k, o). If this is and v, and then summing on the different x18. In
received as $. the next transmitted letter will the first summation, the term Pr[y/v,ml is
be f(k,ap), etc. constant at P&T/X] and the coefficient of the
1cgarithm sume to Pr[x,y/v]. Thus we can write
Theorem 6: In a memorylees discrete
channel with feedback, the forward capacity is
equal to the ordinary capacity C (without feed- AI= z
back). The average change in mutual information X,Y
Ll between received sequence v end message m
for a letter of text is not greater than C. Row consider the rate for the channel (in the’
ordinary sense without feedback) if we should
Proof: Let v be the received sequence to assign to the x18 the probabilities q(x)
date of a block, m the message, x the next trans- = Pr[x/v] . ,!l!he probabilities for pairs, r(x,y),
mitted letter and y the next received letter, and for the y’s alone, w(y), in this situation
These are all random variables and,. also, x is a would then be
function of m and v. This function, namely, is
the one which defines the encoding procedure with r(x,y) = q (x3 P&/xl
feedback whereby the next transmitted letter x is =-Pr[x/v] Pr [y/x]
determined by the ..Jssege m and the feedback
information v from the previous received signals.
The channel being memoryless implies that the = pr Cx,r/vl
next operation is independent of the past, in w(y) = ) dX.Y)
particular, PrCY/Xl = PrCY/X,Vl.
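The extraction breaks off here; the concluding step of the proof, reconstructed from the quantities just defined (the standard argument, not the paper's surviving text):

\[
w(y) \;=\; \sum_x \Pr[x,y/v] \;=\; \Pr[y/v],
\]
so the rate of the ordinary feedback-free channel under the input assignment \(q(x)\) is
\[
\sum_{x,y} r(x,y)\,\log\frac{\Pr[y/x]}{w(y)}
\;=\; \sum_{x,y} \Pr[x,y/v]\,\log\frac{\Pr[y/x]}{\Pr[y/v]}
\;=\; \Delta I .
\]
Since the capacity \(C\) is the maximum of such rates over input assignments \(q(x)\), we have \(\Delta I \le C\) for every \(v\), hence on the average, and the forward capacity with feedback cannot exceed \(C\).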
It is interesting that the first sentence of Theorem 6 can be generalized readily to channels with memory provided they are of such a nature that the internal state of the channel can be calculated at the transmitting point from the initial state and the sequence of letters that have been transmitted. If this is not the case, the conclusion of the theorem will not always be true; that is, there exist channels of a more complex sort for which the forward capacity with feedback exceeds that without feedback. We shall not, however, give the details of these generalizations here.

Returning now to the zero-error problem, we define a zero error capacity C_0F for a channel with feedback in the obvious way--the least upper bound of rates for block codes with no errors. The next theorem solves the problem of evaluating C_0F for memoryless channels with feedback, and indicates how rapidly C_0F may be approached as the block length n increases.

Theorem 7: In a memoryless discrete channel with complete feedback of received letters to the transmitting point, the zero error capacity C_0F is zero if all pairs of input letters are adjacent. Otherwise C_0F = log P_0^{-1}, where

    P_0 = min_{P_i} max_j Σ_{i in S_j} P_i

P_i being a probability assigned to input letter i (Σ_i P_i = 1) and S_j the set of input letters which can cause output letter j with probability greater than zero. A zero error block code of length n can be found for such a feedback channel which transmits at a rate

    R >= C_0F (1 - (2/n) log_2 2t)

where t is the number of input letters.

The P_0 occurring in this theorem has the following meaning. For any given assignment of probabilities P_i to the input letters one may calculate, for each output letter j, the total probability Σ_{i in S_j} P_i of all input letters that can (with positive probability) cause j. Output letters for which this is large may be thought of as "bad" in that when received there is a large uncertainty as to the cause. To obtain P_0 one adjusts the P_i so that the worst output letter in this sense is as good as possible.

We first show that if all letters are adjacent to each other, C_0F = 0. In fact, in any coding system, any two messages, say m_1 and m_2, can lead to the same received sequence with positive probability. Namely, the first transmitted letters corresponding to m_1 and m_2 have a possible received letter in common. Assuming this occurs, calculate the next transmitted letters in the coding system for m_1 and m_2. These also have a possible received letter in common. Continuing in this manner we establish a received word which could be produced by either m_1 or m_2, and therefore they cannot be distinguished with certainty.

Now consider the case where not all pairs are adjacent. We will first prove, by induction on the block length n, that the rate log P_0^{-1} cannot be exceeded with a zero error code. For n = 0 the result is certainly true. The inductive hypothesis will be that no block code of length n - 1 transmits at a rate greater than log P_0^{-1}, or, in other words, can resolve with certainty more than

    e^{(n-1) log P_0^{-1}} = P_0^{-(n-1)}

different messages. Now suppose (in contradiction to the desired result) we have a block code of length n resolving M messages with M > P_0^{-n}. The first transmitted letter for the code partitions these M messages among the input letters for the channel. Let F_i be the fraction of the messages assigned to letter i (that is, for which i is the first transmitted letter). Now these F_i are like probability assignments to the different letters, and therefore, by definition of P_0, there is some output letter, say letter k, such that Σ_{i in S_k} F_i >= P_0. Consider the set of messages for which the first transmitted letter belongs to S_k. The number of messages in this set is at least P_0 M. Any of these can cause output letter k as first received letter. When this happens there are n - 1 letters yet to be transmitted, and since M > P_0^{-n} we have P_0 M > P_0^{-(n-1)}. Thus we have a zero error code of block length n - 1 transmitting at a rate greater than log P_0^{-1}, contradicting the inductive assumption. Note that the coding function for this code of length n - 1 is formally defined from the original coding function by fixing the first received letter at k.

We must now show that the rate log P_0^{-1} can actually be approached as closely as desired with zero error codes. Let P_i be the set of probabilities which, when assigned to the input letters, give P_0 for min_P max_j Σ_{i in S_j} P_i. The general scheme of the code will be to divide the M original messages into t different groups corresponding to the first transmitted letter. The number of messages in these groups will be approximately proportional to P_1, P_2, ..., P_t. The first transmitted letter, then, will correspond to the group containing the message to be transmitted.
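Finding P_0 is a small linear program: minimize z subject to Σ_{i in S_j} P_i <= z for every output letter j, Σ_i P_i = 1, P_i >= 0. A sketch (taking, as an assumed reading of Fig. 2, S_j = {j, j+1 mod 5} for the pentagon channel; the structure of Fig. 5 is not reproduced here):

    import numpy as np
    from scipy.optimize import linprog

    t = 5
    S = [(j, (j + 1) % t) for j in range(t)]     # inputs that can cause output j

    # Variables: P_0..P_4 and z (last).  Objective: minimize z.
    c = np.zeros(t + 1); c[-1] = 1.0
    A_ub = np.zeros((len(S), t + 1))
    for row, Sj in enumerate(S):
        for i in Sj:
            A_ub[row, i] = 1.0
        A_ub[row, -1] = -1.0                     # sum_{i in S_j} P_i - z <= 0
    b_ub = np.zeros(len(S))
    A_eq = np.ones((1, t + 1)); A_eq[0, -1] = 0.0   # sum_i P_i = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * t + [(None, None)])
    P0 = res.x[-1]
    print(P0, np.log2(1 / P0))    # 0.4, so C_0F = log 2.5 (about 1.32 bits)

For the pentagon this gives P_0 = 2/5, i.e. C_0F = log 5/2, which may be compared with the range (1/2) log 5 <= C_0 <= log 5/2 quoted earlier for the same channel without feedback: with feedback the upper bound of Theorem 1 is attained.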
Whatever letter is received, the number of possible messages compatible with this received letter will be approximately P_0 M. This subset of possible messages is known both at the receiver and (after the received letter is sent back to the transmitter) at the transmitting point.

The code system next subdivides this subset of messages into t groups, again approximately in proportion to the probabilities P_i. The second letter transmitted is that corresponding to the group containing the actual message. Whatever letter is received, the number of messages compatible with the two received letters is now, roughly, P_0^2 M.

This process is continued until only a few messages (less than t^2) are compatible with all the received letters. The ambiguity among these is then resolved by using a pair of non-adjacent letters in a simple binary code. The code thus constructed will be a zero error code for the channel.

Our first concern is to estimate carefully the approximation involved in subdividing the messages into the t groups. We will show that for any M and any set of P_i, Σ P_i = 1, it is possible to subdivide the M messages into groups of m_1, m_2, ..., m_t such that m_i = 0 whenever P_i = 0 and

    |m_i/M - P_i| <= 1/M,    i = 1, ..., t.

We assume without loss of generality that P_1, P_2, ..., P_s are the non-vanishing P_i. Choose m_1 to be the largest integer such that m_1/M <= P_1. Let P_1 - m_1/M = δ_1. Clearly |δ_1| < 1/M. Next choose m_2 to be the smallest integer such that m_2/M >= P_2, and let P_2 - m_2/M = δ_2. We have |δ_2| <= 1/M. Also |δ_1 + δ_2| <= 1/M, since δ_1 and δ_2 are opposite in sign and each less than 1/M in absolute value. Next m_3 is chosen so that m_3/M approximates, to within 1/M, to P_3. If δ_1 + δ_2 >= 0, then m_3/M is chosen greater than or equal to P_3 (so that δ_3 <= 0); if δ_1 + δ_2 <= 0, then m_3/M is chosen less than or equal to P_3. Thus again |P_3 - m_3/M| = |δ_3| <= 1/M and |δ_1 + δ_2 + δ_3| <= 1/M. Continuing in this manner through P_{s-1} we obtain approximations for P_1, P_2, ..., P_{s-1} with the property that

    |δ_1 + δ_2 + ... + δ_{s-1}| <= 1/M,

or |M(P_1 + P_2 + ... + P_{s-1}) - (m_1 + m_2 + ... + m_{s-1})| <= 1. If we now define m_s as M - Σ_1^{s-1} m_i, then this inequality can be written |M(1 - P_s) - (M - m_s)| <= 1. Hence |m_s/M - P_s| <= 1/M. Thus we have achieved the objective of keeping all approximations m_i/M to within 1/M of P_i and having Σ m_i = M.

Returning now to our main problem, note first that if P_0 = 1 then C_0F = 0 and the theorem is trivially true. We assume, then, that P_0 < 1. We wish to show that P_0 <= 1 - 1/t. Consider the set of input letters which have the maximum value of P_i. This maximum is certainly greater than or equal to the average, 1/t. Furthermore, we can arrange to have at least one of these input letters not connected to some output letter. For suppose this is not the case. Then either there are no other input letters beside this set, and we contradict the assumption that P_0 < 1, or there are other input letters with smaller values of P_i. In this case, by reducing the P_i for one input letter in the maximum set and increasing correspondingly that for some input letter which does not connect to all output letters, we do not increase the value of P_0 (for any S_j) and create an input letter of the desired type. By consideration of an output letter to which this input letter does not connect, we see that P_0 <= 1 - 1/t.

Now suppose we start with M messages and subdivide into groups approximating proportionality to the P_i as described above. Then when a letter has been received, the set of possible messages (compatible with this received letter) will be reduced to those in the groups corresponding to letters which connect to the actual received letter. Each output letter connects to not more than t - 1 input letters (otherwise we would have P_0 = 1). For each of the connecting groups, the error in approximating P_i has been less than or equal to 1/M. Hence the total relative number in all connecting groups for any output letter is less than or equal to P_0 + (t - 1)/M. The total number of possible messages after receiving the first letter consequently drops from M to a number less than or equal to P_0 M + t - 1.

In the coding system to be used, this remaining possible subset of messages is subdivided again among the input letters to approximate in the same fashion the probabilities P_i. This subdivision can be carried out both at
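The subdivision procedure just described is easily made concrete. A minimal sketch (my illustration; the rounding direction alternates with the sign of the accumulated error, as in the text):

    import math

    def subdivide(M, P):
        # Split M messages into groups m_i with m_i = 0 whenever P_i = 0,
        # |m_i/M - P_i| <= 1/M for every i, and sum(m_i) = M.
        t = len(P)
        m = [0] * t
        support = [i for i in range(t) if P[i] > 0]
        running = 0.0                            # accumulated delta_i = P_i - m_i/M
        for i in support[:-1]:
            m[i] = math.ceil(M * P[i]) if running > 0 else math.floor(M * P[i])
            running += P[i] - m[i] / M           # stays within [-1/M, 1/M]
        m[support[-1]] = M - sum(m)              # last group absorbs the remainder
        return m

    print(subdivide(7, [0.2, 0.2, 0.2, 0.4]))    # e.g. [1, 2, 1, 3]

In the example, every group size m_i is within 1/7 of 7 P_i, matching the bound established above.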
receiving point and transmitting point using the same standard procedure (say, exactly the one described above), since with the feedback both terminals have available the required data, namely the first received letter.

The second transmitted letter obtained by this procedure will again reduce at the receiving point the number of possible messages, to a value not greater than P_0(P_0 M + t - 1) + t - 1. This same process continues with each transmitted letter. If the upper bound on the number of possible remaining messages after k letters is M_k, then M_{k+1} = P_0 M_k + t - 1. The solution of this difference equation is

    M_k = A P_0^k + (t - 1)/(1 - P_0)

This may be readily verified by substitution in the difference equation. To satisfy the initial condition M_0 = M requires A = M - (t - 1)/(1 - P_0). Thus the solution becomes

    M_k = (M - (t - 1)/(1 - P_0)) P_0^k + (t - 1)/(1 - P_0)

[...] length a choice from M = T^d messages (T = P_0^{-1}). Thus the zero error rate we have achieved is

    R = (1/n) log M = d log P_0^{-1} / (d + log_2 4t^2)

      = (1 - (1/n) log_2 4t^2) log P_0^{-1}

      = (1 - (1/n) log_2 4t^2) C_0F

Thus we can approximate to C_0F as closely as desired with zero error codes.

As an example of Theorem 7, consider the channel in Fig. 5. We wish to evaluate P_0. It is easily seen that we may take P_1 = P_2 = P_3 in forming the min max of Theorem 7, for if they are unequal the maximum of Σ_{i in S_j} P_i for the corresponding three output letters would be reduced by equalizing. Also it is evident, then, that P_4 = P_1 + P_2, since otherwise a shift of probability one way or the other would reduce the maximum. We conclude, then, that P_1 = P_2 = P_3 [...]
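For completeness, the verification by substitution mentioned above (routine algebra, reconstructed):

\[
M_{k+1} \;=\; P_0\!\left[A P_0^{k} + \frac{t-1}{1-P_0}\right] + t - 1
\;=\; A P_0^{k+1} + \frac{(t-1)P_0 + (t-1)(1-P_0)}{1-P_0}
\;=\; A P_0^{k+1} + \frac{t-1}{1-P_0},
\]

which is the asserted form with k advanced by one; the constant A is then fixed by the initial condition M_0 = M.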