Viterbi Algorithm
Jeremiah F. Hayes
Concordia University Montreal
Originally published in
IEEE Communications Magazine
March 1975 - Volume 13, Number 2
AUTHOR’S INTRODUCTION
When I wrote the tutorial on the Viterbi algorithm (VA), I was a member of the data theory group at Bell Labs, whose main work was voiceband modems. State of the art was the 9600 bps modem, which was about the size of a VCR and cost $10,000. (At the time, the rule of thumb on cost was a dollar per bps.) In that generation of modems, the only error-control measure was automatic adaptive equalizers, which combated intersymbol interference (ISI). The application of coding theory was at low ebb and had no place in the modem. Our interest in the Viterbi algorithm was in its ability to deal with ISI. Perhaps the VA could help us improve the rate to 14,400 bps on dial-up connections. Dared we to dream of 19,200 bps in some distant future? The Shannon capacity of voiceband lines was estimated at something in the neighborhood of 25,000 bps.

In 1975 it was unimaginable that modems would be a small component of a computer weighing less than five pounds and would reach rates in excess of 56,000 bps. The Shannon capacity of telephone channels has increased as the telephone network improved through the deployment of modern digital technology and optical fiber. Digital technology has also allowed the implementation of modulation and coding techniques that have made a quantum leap in the state of the art of digital transmission.

To my mind the opening shot of the revolution in modulation and coding was the work of Ungerboeck on channel trellis coding. This was the successful combination of modulation and coding (codulation) that had been talked about for some time, where signals were designed according to Euclidean rather than Hamming distance. A key element of this technique was the Viterbi algorithm, already used for convolutional codes. This made the Viterbi algorithm a standard component of tens of millions of high-speed modems, and firmly established the value of research in coding technologies. The symbol “VA” is ubiquitous in the block diagrams of modern receivers.

Essentially, the VA finds a path through any Markov graph, which is a sequence of states governed by a Markov chain. The many practical applications of the VA go well beyond convolutional decoding and channel trellis decoding. It is also used for fading communication channels, partial response channels in recording systems, optical character recognition, and voice recognition. The model is so fundamental that one would expect ever-widening application. Recently, it was applied to DNA sequence analysis.

The interesting but still not widely used algorithm described in 1975 has proved to be a key building block of modern information infrastructure.

I would like to close by saluting the role of the editor of the magazine at that time, Steve Weinstein, who was also a member of the data theory group. He solicited the article and provided encouragement and much needed criticism while I was writing it. The final draft would not have been what it was without his work.
INTERSYMBOL INTERFERENCE

A major impairment encountered in the high-speed transmission of digital data over voice frequency lines is intersymbol interference. Consider the situation depicted in Fig. 1a, in which a single pulse is transmitted over a relatively narrowband channel, resulting in the pulse being smeared in time at the output. A sequence of pulses, such as in pulse amplitude modulation systems, suffers intersymbol interference when the energy from one pulse spills over into adjacent symbol intervals so as to interfere with the detection of these adjacent pulses (Fig. 1b). Thus, a sample at the center of a symbol interval is a weighted sum of amplitudes of pulses in several adjacent intervals. This effect, combined with random noise, leads to error.

The current practice is to minimize the effect of intersymbol interference by channel equalization [9], which adjusts the pulse shape so that it does not interfere with neighboring pulses at pulse centers (Fig. 1a and b). Although this approach is effective in many cases, minimizing the effect of intersymbol interference in this way is inherently suboptimum, since even the interference contains information about the symbols that were transmitted. In theory, when the channel causes a time dispersion of signal energy, the whole received signal rather than center values should be used to detect any symbol or group of symbols. Heretofore, the obstacle to optimum detection of a whole sequence of pulses has been computational complexity. The number of computations required in a straightforward approach grows exponentially with the length of the transmitted sequence. Furthermore, computation cannot begin until the entire sequence has been received.

The significance of the dynamic programming approach is that the number of computations required for optimum detection grows only linearly with the length of the transmitted sequence, and hence computations can be carried out while the sequence is being received. Although this approach reduces computational complexity, it is still, in most cases of practical interest, beyond the capability of present-day processors. This difficulty will, perhaps, be overcome with the growth of computer technology. Moreover, there are suboptimum implementations that may yield performance close to the optimum.

FIGURE 2. A pedestrian example.

¹ A number of texts on the subject have been written. See, for example, [1] and [2].
² For a treatment of the Viterbi algorithm from a different point of view we recommend [21].
³ For a somewhat more complex example, see [2, ch. 1].

EXAMPLE OF DYNAMIC PROGRAMMING

Dynamic programming is essentially a computational procedure for finding an optimum path or trajectory. The following rather pedestrian example³ will serve to illustrate its basic principles. A certain Professor X walks each day from his office in the EE building to the faculty lounge for lunch (Fig. 2). Between the two buildings lie two small streams, christened by some campus wag as the Publish and the Perish. Each stream runs north to south and is crossed by two foot bridges. In our example, these bridges shall be designated by the stream they cross and by the appellation north or south. One day our scholarly friend decides to find the shortest path to the faculty lounge. He could, of course, simply calculate the length of all possible paths and choose the shortest. However, sensing that a general principle is involved, Professor X eschews the brute force approach. He first writes down the
distances from his office to the two bridges across the Publish. He then postulates that the optimum path is via the north bridge across the Perish. Under this assumption, he calculates the minimum path from his office to this bridge by comparing the two paths over the Publish. The same procedure is repeated for the south bridge across the Perish. At this point, the professor notes that for the purpose of further calculations, he need only keep track of the shortest path to the north bridge on the Perish and the shortest path to the south bridge on the Perish. In observing this simplification, the good professor has hit upon the basic principle of dynamic programming, the principle of optimality. The optimum total path must lie along the optimum path from his office to either the north or south bridge across the Perish. The final step is a comparison of the total distances to the lounge via the north and south bridge across the Perish. The step-by-step procedure followed by Professor X is shown in Table 1 (note the distances in Fig. 2).

TABLE 1. Comparison of total distances to the faculty lounge. [Table body not recoverable; surviving fragments: "via south bridge", 0.8, 1.0, 1.3.]

In carrying out these calculations, six additions are necessary. Brute force enumeration would require eight additions. Now, if Professor X had to cross $N$ streams, each with two bridges, dynamic programming would require $4(N - 1) + 2$ additions, whereas straight enumeration would require $N\,2^N$ additions (six and eight, respectively, for $N = 2$). Notice the difference between linear and exponential growth with $N$ here.

This example illustrates forward dynamic programming since the computation proceeds from the starting point of the journey. The computation could just as well have been carried out from the faculty lounge working backward, illustrating backward dynamic programming.⁴ The principle of optimality applies to both; the optimum total path must lie along an optimum subpath from the beginning or end to any intermediate point. This principle, applied to finite dimensional problems, gives rise to systematic and efficient algorithms for calculating optimum paths.

⁴ For our example, the distinction between backward and forward is trivial. However, there are problems that naturally fit one or the other. We shall see one shortly.

BASEBAND SIGNAL MODEL

We shall now relate this general mathematical theory to the reception of digital data signals. Let us first consider a mathematical model of a baseband signal, i.e., a signal not modulated onto a carrier.⁵ Let the sequence of numbers, called information symbols, to be transmitted be denoted $a_1, \ldots, a_N$, where $N$, the number of information symbols, is large but finite. It is assumed that these symbols are independent and can each assume $L$ equally probable values. These symbols amplitude modulate a train of pulses occurring at intervals $T$ to produce the transmitted waveform

$$\sum_{i=1}^{N} a_i\,p(t - iT) \tag{1}$$

where $p(t)$ is the transmitted pulse and the symbol rate is $1/T$ Bd. The bit rate over the baseband channel is $(1/T)\log_2 L$ bits/s.

The output of the baseband channel is written

$$y(t) = \sum_{i=1}^{N} a_i\,h(t - iT) + n(t) \tag{2}$$

where $n(t)$ is white Gaussian noise with double-sided power density spectrum $N_0/2$ W/Hz and where $h(t)$ is the convolution of $p(t)$ with the impulse response of the baseband channel. In the following derivation we shall, for simplicity, refer to $h(t)$ as the channel impulse response.

A key assumption is that $h(t)$ has finite duration $mT$ (as suggested in Fig. 1). This assumption has two consequences. First, all elements of the $N$ symbol sequence are received in the finite interval $0 \le t \le \tau$, $(N + m)T < \tau < \infty$. The second consequence bears upon a term that arises in the sequel. We make the definition

$$r_{i-j} \triangleq \int_0^{\tau} h(t - iT)\,h(t - jT)\,dt. \tag{3a}$$

Now the finite memory of the channel implies that

$$r_{i-j} = 0, \quad \text{for } |i - j| > m. \tag{3b}$$

We shall refer to $m$ as the memory of the channel in units of $T$.

⁵ The derivation of the Viterbi algorithm presented in the sequel is due to Ungerboeck [10], who also considered the passband case.

MAXIMUM LIKELIHOOD SEQUENCE ESTIMATION

Our objective is to operate on the received signal $y(t)$, $0 \le t \le \tau$, so as to produce an estimate $a_1^*, a_2^*, \ldots, a_N^*$ of the sequence of transmitted symbols $a_1, a_2, \ldots, a_N$. Given that $y(t)$ is perturbed by additive noise, we cannot reproduce the transmitted sequence with certainty. Rather, we seek to minimize the probability of sequence error,⁶ i.e., the probability that $a_1^*, a_2^*, \ldots, a_N^*$ is different from $a_1, a_2, \ldots, a_N$. Under our assumptions on the transmitted sequence, maximum likelihood sequence estimation (MLSE) produces this minimum error probability.

In order to define MLSE, first define the probability density functional

$$p\left[y(t),\ 0 \le t \le \tau \mid a_1 = \hat a_1,\ a_2 = \hat a_2,\ \ldots,\ a_N = \hat a_N\right]$$

as the probability that $y(t)$, $0 \le t \le \tau$, is received under the assumption that the transmitted symbols are $\hat a_1, \hat a_2, \ldots, \hat a_N$. Notice that for a particular received signal, there are $L^N$ values of this quantity since there are $L^N$ sequences $\hat a_1, \ldots, \hat a_N$. In MLSE we estimate the transmitted sequence to be the sequence that maximizes this likelihood.⁷ Ostensibly, $L^N$ calculations are required to find this maximum. The virtue of the Viterbi algorithm is that the number of calculations necessary for MLSE grows linearly with $N$ rather than exponentially.

⁶ In a later section, the distinction between sequence error and bit error is discussed.
⁷ For a discussion of decision rules, see [11].
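The linear-versus-exponential contrast that MLSE shares with the Professor X example can be checked by counting additions directly. The following is a minimal Python sketch, not the procedure of Table 1 itself: the function names and the distances are hypothetical (the distances of Fig. 2 are not reproduced here), and every stream is assumed to have exactly two bridges. Under the convention that each leg after the first costs one addition, dynamic programming uses 4(N - 1) + 2 additions and enumeration uses N times 2^N.

```python
from itertools import product


def shortest_by_dp(first, between, last):
    """Forward dynamic programming over N streams with two bridges each.

    first[b]         : office -> bridge b on the first stream
    between[s][a][b] : bridge a on stream s+1 -> bridge b on stream s+2
    last[b]          : bridge b on the last stream -> lounge
    Returns (shortest distance, number of additions performed).
    """
    best = list(first)          # shortest distance to each bridge so far
    additions = 0
    for leg in between:
        # Principle of optimality: keep only the best path into each bridge.
        best = [min(best[a] + leg[a][b] for a in range(2)) for b in range(2)]
        additions += 4
    total = min(best[b] + last[b] for b in range(2))
    additions += 2
    return total, additions


def shortest_by_enumeration(first, between, last):
    """Brute force: add up every one of the 2**N complete paths."""
    n = len(between) + 1
    best, additions = float("inf"), 0
    for path in product(range(2), repeat=n):
        total = first[path[0]]
        for s, leg in enumerate(between):
            total += leg[path[s]][path[s + 1]]
            additions += 1
        total += last[path[-1]]
        additions += 1
        best = min(best, total)
    return best, additions


if __name__ == "__main__":
    # Hypothetical distances for two streams (bridge 0 = north, 1 = south).
    first = [0.5, 0.4]
    between = [[[0.3, 0.6], [0.7, 0.2]]]
    last = [0.4, 0.5]
    print(shortest_by_dp(first, between, last))
    print(shortest_by_enumeration(first, between, last))
```

For two streams the two functions report six and eight additions, matching the counts quoted for Professor X; repeating the middle leg to model more streams shows the gap widening quickly.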
We now expand the quadratic term under the integral sign, resulting in three terms, one of which is

$$\int_0^{\tau} y^2(t)\,dt.$$

[...]

$$Z_i = \int_0^{\tau} y(t)\,h(t - iT)\,dt; \quad i = 1, 2, \ldots, N$$

can be viewed as the sampled output of the filter matched to the channel impulse response $h(t)$. All that we need to know about the received signal $y(t)$ is in these samples. The term $r_{i-j}$ indicates our knowledge of the memory in the channel (see Eq. (3b)).

⁸ The random process $y(t)$ is approximated by a sequence of Karhunen-Loève expansions. All of the coefficients in these expansions are Gaussian random variables. A limit of these expansions yields Eq. (4). For details, see [12] and [13].

THE VITERBI ALGORITHM

Although Eq. (5) reflects considerable simplification, a brute force approach to its minimization requires $L^N$ calculations. Like Professor X, we shall eschew this approach. By suitable definition [...]

$$\begin{aligned}
D &= -2\sum_{i=1}^{N} \hat a_i Z_i + \sum_{i=1}^{N}\sum_{j=1}^{N} \hat a_i \hat a_j r_{i-j} \\
  &= -2\sum_{i=1}^{N-1} \hat a_i Z_i + \sum_{i=1}^{N-1}\sum_{j=1}^{N-1} \hat a_i \hat a_j r_{i-j} + \cdots
\end{aligned} \tag{6}$$

Note that $\sigma_k$ contains all the data symbols, except for $\hat a_{k+1}$, that will determine $y_{k+1}$. There is a one-to-one correspondence between a sequence of state vectors $\sigma_m, \sigma_{m+1}, \ldots, \sigma_N$ and an estimated sequence of transmitted symbols $\hat a_1, \hat a_2, \ldots, \hat a_N$, although it is apparent that the set of state vectors has much redundancy. The problem of choosing an optimum sequence from the set $\hat a_1, \ldots, \hat a_N$ can therefore be recast as that of choosing an optimum $\sigma_m, \ldots, \sigma_N$. Estimating the optimum sequence of states $\sigma_m, \sigma_{m+1}, \ldots, \sigma_N$ can be viewed as optimum path selection through a lattice representing the states. In Fig. 3 such a lattice is shown for the case $L = 2$ and $m = 3$. The dotted lines indicate the transitions that can be made from one state to another. For example, state $\sigma_6 = \{\hat a_4, \hat a_5, \hat a_6\} = \{-1, +1, -1\}$ can
$$\min_{\sigma_m, \ldots, \sigma_N} U(Z_1, \ldots, Z_N; \sigma_m, \ldots, \sigma_N).$$

Expanding upon this notation, we write [...]

Equation (10) indicates that the minimization is carried out in two steps:

1) With $\sigma_N$ held fixed to one of its $L^m$ values, minimize over $\sigma_m, \ldots, \sigma_{N-1}$ ($L^{N-1}$ comparisons [...]

This decomposition can be continued. It is easiest to express this with some additional notation. We write

$$F(\sigma_{k+1}) = \min_{\sigma_k}\left[V(Z_{k+1}; \sigma_k, \sigma_{k+1}) + F(\sigma_k)\right], \quad k = m, \ldots, N - 1 \tag{14}$$

where

$$F(\sigma_m) \triangleq U(Z_1, \ldots, Z_m; \sigma_m)$$

and [...]
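The recursion of Eq. (14) can be illustrated with a short Python sketch that minimizes a metric of the form $-2\sum_i \hat a_i Z_i + \sum_i\sum_j \hat a_i \hat a_j r_{i-j}$ over all symbol sequences. Several things here are assumptions rather than the article's own formulation: the matched-filter samples `Z` and the correlations `r[0..m]` are supplied as plain lists, $r$ is taken to be symmetric ($r_{-l} = r_l$), the state is the tuple of the last $m$ symbol estimates, and the branch metric is obtained by collecting the terms of the double sum that involve the newest symbol, since the explicit form of $V$ is not legible above. A brute-force search is included only as a cross-check.

```python
from itertools import product


def metric(a, Z, r):
    """Path metric: -2*sum(a_i*Z_i) + sum_i sum_j a_i*a_j*r_{|i-j|}."""
    m = len(r) - 1
    total = -2.0 * sum(ai * zi for ai, zi in zip(a, Z))
    for i in range(len(a)):
        for j in range(len(a)):
            if abs(i - j) <= m:          # finite memory: r vanishes beyond m
                total += a[i] * a[j] * r[abs(i - j)]
    return total


def brute_force(Z, r, alphabet):
    """Exhaustive search over all L**N sequences (for checking only)."""
    return min(product(alphabet, repeat=len(Z)), key=lambda a: metric(a, Z, r))


def viterbi(Z, r, alphabet):
    """Minimize the same metric with one survivor per state."""
    m = len(r) - 1
    # state (last m symbols) -> (accumulated metric F, surviving path)
    survivors = {(): (0.0, ())}
    for z in Z:
        new_survivors = {}
        for state, (cost, path) in survivors.items():
            for a in alphabet:
                # Branch metric: every term of the path metric that
                # involves the new symbol a and the symbols in the state.
                past = sum(r[l] * state[-l] for l in range(1, len(state) + 1))
                v = -2.0 * a * z + a * a * r[0] + 2.0 * a * past
                nxt = (state + (a,))[-m:] if m > 0 else ()
                if nxt not in new_survivors or cost + v < new_survivors[nxt][0]:
                    new_survivors[nxt] = (cost + v, path + (a,))
        survivors = new_survivors
    return min(survivors.values(), key=lambda s: s[0])[1]


if __name__ == "__main__":
    Z = [0.9, -1.3, 0.4, 1.7, -0.6, -1.1]   # hypothetical samples
    r = [1.0, 0.5, 0.2]                      # hypothetical r_0, r_1, r_2 (m = 2)
    best = viterbi(Z, r, (-1, 1))
    print(best, metric(best, Z, r))
```

The work per received sample is at most $L^{m+1}$ branch evaluations, independent of $N$, so the total grows linearly with the sequence length, whereas `brute_force` evaluates all $L^N$ sequences.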