IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. IT-13, NO. 2, APRIL 1967

Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm

ANDREW J. VITERBI
Abstract—The probability of error in decoding an optimal convolutional code transmitted over a memoryless channel is bounded from above and below as a function of the constraint length of the code. For all but pathological channels the bounds are asymptotically (exponentially) tight for rates above R₀, the computational cutoff rate of sequential decoding. As a function of constraint length the performance of optimal convolutional codes is shown to be superior to that of block codes of the same length, the relative improvement increasing with rate. The upper bound is obtained for a specific probabilistic nonsequential decoding algorithm which is shown to be asymptotically optimum for rates above R₀ and whose performance bears certain similarities to that of sequential decoding algorithms.

Manuscript received May 20, 1966; revised November 14, 1966. The research for this work was sponsored by Applied Mathematics Division, Office of Aerospace Research, U. S. Air Force, Grant AFOSR-700-65.
The author is with the Department of Engineering, University of California, Los Angeles, Calif.

I. SUMMARY OF RESULTS

Since Elias[1] first proposed the use of convolutional (tree) codes for the discrete memoryless channel, it has been conjectured that the performance of this class of codes is potentially superior to that of block codes of the same length. The first quantitative verification of this conjecture was due to Yudkin,[2] who obtained
VITERBI: ERROR BOUNDS FOR CONVOLUTIONAL CODES 261
[Fig. 1. Encoder: K-stage shift register, v inner-product computers over GF(q), adder, and channel symbol (signal) selector with commutator producing channel symbols ξ₁, ξ₂, ..., ξ_r.]
an upper bound on the error probability of an optimal convolutional code as a function of its constraint length, which is achieved when the Fano sequential decoding algorithm[3] is employed.

In this paper, we obtain a lower bound on the error probability of an optimal convolutional code, independent of the decoding algorithm, which for all but pathological channels is asymptotically (exponentially) equal to the upper bound for rates above R₀, the computational cutoff rate of sequential decoding. Also, a new probabilistic nonsequential decoding algorithm is described, which exhibits and exploits a fundamental property of convolutional codes. An upper bound on error probability utilizing this decoding algorithm is derived by random coding arguments, which coincides with the upper bound of Yudkin.[2] In the limit of very noisy channels, upper and lower bounds are shown to coincide asymptotically (exponentially) for all rates, and the negative exponent of the error probability, also known as the reliability, is shown to be

    lim_{N→∞} (1/N) ln (1/P_e) = C/2,    0 ≤ R ≤ C/2
                                = C − R,  C/2 ≤ R < C

where N is the code constraint length (in channel symbols), R is the transmission rate, and C is channel capacity. This represents a considerable improvement over block codes for the same channels. Also, it is shown that in general in the neighborhood of capacity the negative exponent is linear in (C − R), rather than quadratic as is the case for block codes. Finally, a semisequential modification of the decoding algorithm is described which has several of the basic properties of sequential decoding methods.[3],[4]

II. DESCRIPTION AND PROPERTIES OF THE ENCODER

The message to be transmitted is assumed to be encoded into the data sequence a, whose components are elements of the finite field of q elements, GF(q), where q is a prime or a power of a prime. All messages are assumed equally likely; hence all sequences a of a fixed number of symbols are equally probable. The encoder consists of a K-stage shift register, v inner-product computers, and an adder, all operating over GF(q), together with a channel symbol selector connected as shown in Fig. 1. After each q-ary symbol of the sequence is shifted into the shift register, the uth computer (u = 1, 2, ..., v) forms the inner product of the vector in the shift register, which is a subsequence of a, with some fixed K-dimensional vector g_u, whose components are also elements of GF(q). The result is a matrix multiplication of the K-symbol subsequence of a (as a row vector) with a K × v matrix G (whose uth column is g_u) to produce v symbols of the sequence b. This is added to v symbols of a previously stored (or generated) q-ary sequence c, whose total length is (L + K − 1)v symbols. The v-symbol subsequence of z thus generated can be any one of q^v v-component vectors. By properly selecting the matrix G and subsequence of c [or by selecting them at random with uniform probability from among the ensemble of all q^{Kv} matrices and q^v vectors with components in GF(q)], all possible v-symbol subsequences of z can be made to occur with equal probability. Finally, the channel symbol selection (or signal selection in the case of continuous channels) consists of a mapping of each q-ary symbol of z onto an r-ary channel symbol x_j of the channel input sequence x (where r ≤ q), as follows: let n₁ of the q-ary symbols be mapped into ξ₁, n₂ into ξ₂, etc., such that

    Σ_{i=1}^{r} n_i = q.

Thus, if each symbol of z is with uniform probability any element of GF(q), the probability distribution of the jth channel input symbol x_j is

    p(x_j = ξ_i) = n_i/q   (i = 1, 2, ..., r)  for all j

and by proper choice of q and r any rational channel input distribution can be attained. Furthermore, since one q-ary data symbol thus produces v channel symbols, the transmission rate of the system is

    R = (ln q)/v  nats/channel symbol   (1)

and thus, by proper choice of q (which must be a prime or the power of a prime) and v, any rate can be closely approximated.

We note also that the encoder thus produces a tree code with q branches, each containing v channel symbols, emanating from each branching node, since for every
shift of the register a potentially different set of v channel symbols is generated for each of the q possible values of the data symbol. An example is shown in Fig. 2 for q = 2, v = 3, r = 2, K = 3. The data symbol a_i is indicated below each branch, while the channel symbols x_i are indicated above each branch.

The procedure continues until L data symbols are fed into the shift register, followed by a sequence of K − 1 zeros. L is known as the (branch) tree length, and N = Kv as the (symbol) constraint length of the code. The overall encoding algorithm thus produces a tree code with L branching levels. All branches contain v channel symbols except for the q^L final branches, which contain N = Kv channel symbols. The example of Fig. 2 shows such a tree code for L = 4 and K = 3.

A basic property of the convolutional code thus generated by the K-stage shift register is the following.

A) Two divergent paths of the tree code will converge (i.e., produce the same channel symbols) after the data symbols corresponding to the two paths have been identical for K consecutive branches. Two paths are

[Fig. 2. Tree code for q = 2, v = 3, r = 2, L = 4, K = 3.]

Denoting by m the number of branches to be decoded and letting

    μ = m/K   (2)

and having denoted the constraint length in channel symbols by

    N = Kv   (3)

we obtain from (1), (2), and (3)

    m ln q = μNR.   (4)

The optimal decoder for paths which are a priori equally likely must compute the q^m = e^{μNR} likelihood functions p(y | a), where a = (a_{i+1}, ..., a_{i+m}) is an m-component q-ary vector which specifies the path, and y = (y_{i+1}, ..., y_{i+m+K−1}) is an (m + K − 1)v = (μ + 1)N − v component vector, and select the path corresponding to the greatest. The resulting error probability is lower bounded by the lower bound[5]–[7] for the best block code with e^{μNR} words of length (μ + 1)N − v channel symbols transmitted over a memoryless channel with discrete input space:

    P_e(μ, N, R) ≥ exp {−N(μ + 1)[E_L(R, μ) + o(N)]}   (5)

where

    o(N) → 0 as N → ∞

and

    E_L(R, μ) = l.u.b._{0≤ρ<∞} [E₀(ρ) − ρ μR/(μ + 1)].
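The encoder just described is straightforward to render in code. The sketch below is a minimal Python illustration for q = 2 (so GF(q) arithmetic is just arithmetic mod 2), using the Fig. 2 parameters q = 2, v = 3, K = 3, L = 4; the particular generator matrix G and the all-zero choice of c are illustrative assumptions, not taken from the paper. It also checks the bookkeeping of (1) through (4).

```python
import math

def convolutional_encode(a, G, c, q=2):
    """Encoder of Fig. 1: a K-stage shift register, v inner-product
    computers (the columns g_u of G), and an adder, all over GF(q).
    Returns the (L + K - 1)*v symbols of the sequence z + c."""
    K, v = len(G), len(G[0])
    register = [0] * K
    out = []
    t = 0
    # Feed the L data symbols, then K - 1 zeros, as the text prescribes.
    for symbol in list(a) + [0] * (K - 1):
        register = [symbol] + register[:-1]       # shift the new symbol in
        for u in range(v):                        # the u-th inner product
            b_u = sum(register[i] * G[i][u] for i in range(K)) % q
            out.append((b_u + c[t]) % q)          # add the stored sequence c
            t += 1
    return out

# Fig. 2 parameters (G and c are made-up examples).
q, v, K, L = 2, 3, 3, 4
G = [[1, 1, 1], [0, 1, 1], [1, 0, 1]]
c = [0] * ((L + K - 1) * v)
x = convolutional_encode([1, 0, 1, 1], G, c)
assert len(x) == (L + K - 1) * v                  # 18 channel symbols

# Bookkeeping of (1)-(4):
N = K * v                                         # (3) constraint length
R = math.log(q) / v                               # (1) rate in nats/symbol
m = L                                             # branches to decode
mu = m / K                                        # (2)
assert math.isclose(m * math.log(q), mu * N * R)  # (4): q^m = e^(mu*N*R)
```

With these choices the first branch of the output is simply the first row of G, since the register then holds (1, 0, 0).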
Theorem 1

The probability of error in decoding an arbitrarily long convolutional code tree of constraint length N (channel symbols) transmitted over a memoryless channel is bounded by

Corollary 2

For low rates a tighter lower bound than that of Theorem 1 is:

    P_e ≥ exp {−N[E_L(R) + o(N)]}

Thus, we have for each of the q^{K−1} possible vectors (a₂, a₃, ..., a_K). It thus performs q^{K−1} comparisons, each among q path likelihood functions. Let the path corresponding to the greatest likelihood function in each comparison be denoted the survivor. Only the q^{K−1} survivors of as many comparisons are preserved for further consideration; the remaining paths are discarded. Among the q^{K−1} survivors
each of the q^{K−1} vectors (a₂, a₃, ..., a_K) is represented uniquely, since by the nature of the comparisons no two survivors can agree in this entire subsequence.

Step 2 begins with the computation, for each survivor of Step 1, of the likelihood functions of the q branches

    (a₁', a₂', ..., a_i', j, a_{i+2}, ..., a_{i+K}),   j = 0, 1, ..., q − 1

it is reduced by a factor of q by the comparisons. Thus, just after the (L − 1)th step there are only q survivors:

    (a₁^{(L−1)}, ..., a_{L−1}^{(L−1)}, 0, 0, ..., 0),
    (a₁^{(L−1)}, ..., a_{L−1}^{(L−1)}, 1, 0, ..., 0),
        ⋮
    (a₁^{(L−1)}, ..., a_{L−1}^{(L−1)}, q − 1, 0, ..., 0)
Also,

    R₀ = E₀(1) = C/2

so that the upper and lower bound exponents coincide for all rates: they remain at the zero-rate level of C/2 up to R = C/2 and then decrease linearly for rates up to C. This is to be compared with the corresponding result for block codes:[5]

    E_b(R) = C/2 − R,       0 ≤ R ≤ C/4          (29)
           = (√C − √R)²,    C/4 ≤ R < C.

[Fig. 6. E(R) and E_b(R) for the binary symmetric channels with p = 0.01, p = 0.1, and p = 0.4, rates normalized by C.]
The two exponents for very noisy channels, (28) and (29), are plotted in Fig. 5. The relative improvement increases with rate. For R = R₀ = C/2, the exponent for convolutional codes is almost six times that for block codes. While the upper and lower bound exponents are identical in the limiting case, we see from the example of the error-bound exponents for three binary symmetric channels (with p = 0.01, p = 0.1, and p = 0.4), shown normalized by C in Fig. 6, that as the channel becomes less noisy the upper and lower bounds diverge for R < R₀. In fact, if for all ρ, E₀''(ρ) = 0, then E₀(ρ) = ρC, so that R₀ = C. Thus, the upper bound exponent equals R₀ for all R < C.

There remains to show that this significant improvement over the performance of block codes is achievable without additional decoding complexity. But we observe that in decoding L branches, or L ln q nats, the decoding algorithm considered makes slightly less than Lq^K branch likelihood function computations, or Lvq^K = (L/K)Nq^K symbol likelihood function computations. Now the equivalent block code transmits L ln q nats in blocks of K ln q nats at a rate R = ln q/v = K ln q/N nats/symbol, which corresponds to transmitting one of q^K words of length N symbols. Thus, the decoder must perform Nq^K symbol likelihood function computations per block and repeat this L/K times. Consequently, the number of computations is essentially the same for the convolutional code decoding algorithm described as is required for maximum likelihood decoding of the equivalent block code.

We should note, however, that since K − 1 zeros are inserted between trees of L branches, the actual rate for convolutional codes is reduced by a factor of L/(L + K − 1) from that of block codes, a minor loss since, because of the greatly increased exponent, we can afford to increase L (which affects P_e only linearly) enough to make this factor insignificant.

VII. A SEMI-SEQUENTIAL MODIFICATION OF THE DECODING ALGORITHM

We observe from (22), with the substitution N = Kv = K ln q/R, that

    P_e < a(q^K)^{−R₀/R}   for 0 ≤ R = R₀ − ε < R₀   (30)

for the specific decoding algorithm considered. However, as we have just noted, the number of likelihood function computations per decoded branch is slightly less than q^K,
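The "almost six times" figure can be checked numerically from the very noisy channel exponents (28) and (29). A short sketch, with C normalized to 1 (the function names here are mine, not the paper's):

```python
import math

def E_conv(R, C=1.0):
    """Very noisy channel reliability of convolutional codes, eq. (28)."""
    return C / 2 if R <= C / 2 else C - R

def E_block(R, C=1.0):
    """Very noisy channel reliability of block codes, eq. (29)."""
    return C / 2 - R if R <= C / 4 else (math.sqrt(C) - math.sqrt(R)) ** 2

# At R = R0 = C/2 the ratio is (1/2) / (1 - 1/sqrt(2))**2, about 5.83.
ratio = E_conv(0.5) / E_block(0.5)
print(round(ratio, 2))
```

The convolutional exponent there is C/2 while the block exponent is (√C − √(C/2))², giving a ratio of about 5.83, i.e., "almost six times."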
which means that the error probability decreases more than linearly with computational complexity for rates in this region.

Now let us consider an iterated version of the previous algorithm. At first we shall employ the aid of a magic genie. It is clear that the nonsequential decoding algorithm can be modified to make decisions based on k branches, where k < K, the constraint length, and that the resulting error probability is the same as (30) with K replaced by k. Thus, suppose the decoder attempts to decode the L-branch tree using k = 1, and at the end of the tree the genie either tells him he is correct or requires him to start over with k = 2, and that he proceeds in this way, each time increasing k by 1, until he is either told he is correct or he reaches the constraint length K. Then, since at each iteration the number of computations is increased by a factor q, the number of computations per branch performed by the end of the kth iteration is q + q² + ··· + q^k = q(q^k − 1)/(q − 1) < 2q^k. Thus, denoting the total number of computations per branch by γ, we have, using (30),

    Prob (γ > 2q^k) < [a(q − 1)/(1 − q^{−ε/R})] (q^k)^{−R₀/R}

of the decoded path with a threshold. If it exceeds this threshold, the total path is accepted as correct; otherwise the algorithm is repeated with k increased by 1. Since the last N symbols occur after the tree has stopped branching, these can be affected by the last K branches only, since no more than K data symbols are in the coder shift register when these channel symbols are being generated. Thus, there are only q^K possible combinations of channel symbols for the final branches, which are of length N channel symbols. The upper bound on the probability of error for a threshold decision involving q^K code words of block length N selected independently is[11]

    P_e < 2 exp [−N E_r(R)]

where

    E_r(R) = max_{p(x)} max_{0<ρ≤1} [E₀(ρ) − ρR],   0 ≤ R < C

and

    R = K ln q/N = ln q/v  as before.
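The per-branch computation count q + q² + ··· + q^k = q(q^k − 1)/(q − 1) < 2q^k quoted above is easy to confirm numerically; a small sketch:

```python
def computations_per_branch(q, k):
    """Total computations per branch by the end of the k-th iteration,
    where iteration j costs q**j branch likelihood computations."""
    return sum(q ** j for j in range(1, k + 1))

for q in (2, 3, 5):
    for k in range(1, 12):
        total = computations_per_branch(q, k)
        assert total == q * (q ** k - 1) // (q - 1)  # closed form
        assert total < 2 * q ** k                    # the 2*q**k bound
```

So even if every iteration up to k must be repeated from scratch, the total work is at most twice that of the final iteration alone.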