Viterbi
1 author:
John Wiss
Parsons
All content following this page was uploaded by John Wiss on 30 January 2020.
PRELIMINARIES
This memo addresses the design of the rate ½ core for implementing a Viterbi decoder
suitable for IEEE 802.11a and other wireless standards.
DECODER ARCHITECTURE
A Viterbi decoder consists of three major blocks: a branch metric computer, a
depuncturing block, and a Viterbi Algorithm (VA) block. One could also claim that the
soft-decisioning block is part of the decoder. The generation of soft decisions
is quite different in OFDM than in traditional TCM or PSK based PHYs and is the topic
of a separate memo.
Figure 1 shows the Viterbi decoder architecture. The VA is implemented by the Branch
Metric, Path Metric, and Traceback functions shown in the figure. The Trellis lookup
table (LUT) contains the information necessary to traverse the code’s trellis, which in this
implementation describes the structure of the code (this can be thought of simply as the
state of the convolutional encoder that defines the convolutional code this decoder is
designed for). The “state” merely means the contents of the shift registers in the encoder
at a particular time k. Figure 2 shows the convolutional encoder for a particular “parent”
rate ½ code. The bits b0-b5 can be interpreted as a 6-bit binary integer which defines the
“state” of the encoder. All the VA attempts to do is estimate the state of the encoder (and
hence the user bits $u_k$) at each time k based on the coded bitstream
$c_0^1, c_0^2, c_1^1, c_1^2, \ldots$.
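The idea of the encoder state as a 6-bit integer can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: it assumes the standard IEEE 802.11a
generator polynomials (133 and 171 octal), which this memo does not state explicitly, and
it assumes the newest user bit occupies the least significant bit of the state. The names
`encode`, `G0_DELAYS`, and `G1_DELAYS` are hypothetical.

```python
# Assumption: IEEE 802.11a parent code, generators 133 and 171 (octal).
# Each tuple lists the delays j for which u(k-j) is XORed into the output bit.
G0_DELAYS = (0, 2, 3, 5, 6)  # 133 octal = 1 011 011
G1_DELAYS = (0, 1, 2, 3, 6)  # 171 octal = 1 111 001


def encode(bits, state=0):
    """Encode user bits; return (list of (c1, c2) pairs, final 6-bit state)."""
    coded = []
    for u in bits:
        reg = (state << 1) | u  # 7-bit register, bit j holds u(k-j)
        c1 = sum((reg >> j) & 1 for j in G0_DELAYS) & 1
        c2 = sum((reg >> j) & 1 for j in G1_DELAYS) & 1
        coded.append((c1, c2))
        state = reg & 0x3F  # drop the oldest bit; six bits remain as the state
    return coded, state


# Impulse response: a single 1 followed by six flush zeros.
coded, final_state = encode([1, 0, 0, 0, 0, 0, 0])
```

The impulse response read column by column gives back the two generator polynomials, and
the six trailing zeros return the encoder to state 0, which is the flushing behaviour the
decoder can exploit.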
Figure 1. Viterbi Decoder Block Diagram
BLOCKS OF THE VITERBI DECODER
The following paragraphs address the processing blocks in the Viterbi decoder.
Figure 3 shows a portion of one trellis segment (over one time period k) which is
maintained in the memory of the Viterbi decoder and is defined by the code structure (in
this case the shift register and the taps that are added modulo 2 in the encoder). For our
64-state code there are 128 “branches” or “edges” that represent the change in the
encoder state that occurs when a binary user bit u (“1” or “0”) is clocked into the encoder
at time k. The code bits $c_k^1$ and $c_k^2$ that are produced given the encoder state at
time k, along with the user bit $u_k$, comprise a “branch” or “edge” which links the states
at time k with those at time k+1. Note that for our rate ½ code each state has two branches
leaving it and two entering it, corresponding to the two possibilities of the user bit $u_k$.
The branch metrics are used to compute a set of path metrics (one per state of the code),
updated at each time k, which the Viterbi Algorithm uses to find the most likely sequence
of encoder states and, in turn, the most likely user bit sequence {u}.
The Viterbi Algorithm as implemented in the decoder computes a branch metric which is
equal to the sum of the distances between each code bit $c_k^1$ and $c_k^2$ defining an
edge in the trellis and the soft decision corresponding to that bit, in the
following way:
There are four possible branch metrics BM(0)-BM(3) for the rate ½ code, one for
each possible bit combination of $c_k^1$ and $c_k^2$, even though for a given state only
two will be needed since there are only two edges entering a given state at time k. In the
implementation with log2(N)-bit soft decisions (values 0 to N-1) the branch metrics are
constructed as follows:
$$
\begin{array}{lcc}
 & SD(0) & SD(1) \\
BM(0) = SD(0) + SD(1) & 0 & 0 \\
BM(1) = \{N-1-SD(0)\} + SD(1) & 1 & 0 \\
BM(2) = SD(0) + \{N-1-SD(1)\} & 0 & 1 \\
BM(3) = \{N-1-SD(0)\} + \{N-1-SD(1)\} & 1 & 1
\end{array}
\qquad (1)
$$
Note that SD(0) refers to $c_k^1$ and SD(1) refers to $c_k^2$. In QPSK, SD(0) represents
the soft decision along the I axis and SD(1) represents the soft decision along the Q
axis. For higher order constellations the mapping is more complex. Note that each branch
metric is minimized when the received soft decisions from the slicer correspond to the
strong “0” or strong “1” indicated in the columns SD(0) and SD(1) of equation (1).
The smallest branch metric corresponds to the best segment of the path. For
punctured code rates, an erase feature must be implemented for SD(0) and SD(1) to cover
the cases where either or both of these bits have been deleted by the puncturing circuit in
the transmitter. The erase flags modify equation (1) as follows:
$$
\begin{array}{lcc}
 & SD(0) & SD(1) \\
BM(0) = e_0\,SD(0) + e_1\,SD(1) & 0 & 0 \\
BM(1) = e_0\{N-1-SD(0)\} + e_1\,SD(1) & 1 & 0 \\
BM(2) = e_0\,SD(0) + e_1\{N-1-SD(1)\} & 0 & 1 \\
BM(3) = e_0\{N-1-SD(0)\} + e_1\{N-1-SD(1)\} & 1 & 1
\end{array}
\qquad (1a)
$$

where $e_0$, $e_1$ = 0 for erased bits corresponding to SD(0) and SD(1) respectively, and
are unity otherwise.
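Equations (1) and (1a) can be written as one small function. The sketch below is
illustrative only: the function name `branch_metrics` and the choice of N = 8 (3-bit soft
decisions) in the usage lines are assumptions, not values taken from this memo.

```python
def branch_metrics(sd0, sd1, n_levels, e0=1, e1=1):
    """Return [BM(0), BM(1), BM(2), BM(3)] following equations (1) and (1a).

    sd0, sd1 -- soft decisions in the range 0 .. n_levels - 1
    e0, e1   -- erase flags: 0 for a punctured (erased) bit, 1 otherwise
    """
    top = n_levels - 1
    d0 = (sd0, top - sd0)  # distance of SD(0) from a strong 0 / a strong 1
    d1 = (sd1, top - sd1)  # distance of SD(1) from a strong 0 / a strong 1
    # BM index i: bit 0 of i is the hypothesis for SD(0), bit 1 is for SD(1).
    return [e0 * d0[i & 1] + e1 * d1[i >> 1] for i in range(4)]


# SD(0) is a strong 0 and SD(1) is a strong 1, so BM(2) should be the smallest.
bm_full = branch_metrics(0, 7, n_levels=8)
# Same symbol with SD(1) erased by puncturing: only SD(0) contributes.
bm_erased = branch_metrics(0, 7, n_levels=8, e1=0)
```

With the second bit erased, BM(0) and BM(2) become equal, as do BM(1) and BM(3), so the
erased position no longer influences which path is preferred.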
gets to the current state by $u_k = 1$. Thus at the new time k+1 we have two possible path
metric choices for each state i:

$$PM_i(k+1) = \min\{\, PM_i^0(k+1),\; PM_i^1(k+1) \,\} \qquad (3)$$

and the survivor path for state i, which will be written in the Traceback RAM, is updated
as:

$$SP_i(k+1) = \begin{cases} 0, & \text{if } PM_i^0(k+1) < PM_i^1(k+1) \\ 1, & \text{otherwise} \end{cases} \qquad (4)$$
Note that in equation (2) the term $PM^1_{ps=1}(k)$ refers to the path metric of the previous
state that leads to the current state i by an edge corresponding to the user bit $u_k = 1$,
which is defined by the trellis structure. The same applies to $PM^0_{ps=0}(k)$ for
$u_k = 0$. Since there is no feedback in the encoder, the previous states for the $u_k = 1$
and $u_k = 0$ hypotheses are very simple to determine and will be given later. The selection
of which branch metric BM(k) to use, however, is tied intimately to the code structure and
hence to the trellis definition for the particular parent code being used.
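A minimal sketch of the add-compare-select step behind equations (3) and (4) follows.
Equation (2) is not reproduced in this copy of the memo, so the sketch assumes the usual
form: each candidate is the path metric of the previous state plus the branch metric of
the connecting edge. The function name `acs_step` and the metric values are made up for
illustration; the four trellis rows are copied from the trellis table for CS = 0..3.

```python
def acs_step(pm_prev, bm, trellis_rows):
    """One add-compare-select pass over the listed current states.

    pm_prev      -- path metrics PM(k), indexed by state number
    bm           -- [BM(0), BM(1), BM(2), BM(3)] for the current symbol
    trellis_rows -- (PS0, PS1, BM0, BM1) tuples, one per current state
    """
    pm_new, survivors = [], []
    for ps0, ps1, bm0, bm1 in trellis_rows:
        cand0 = pm_prev[ps0] + bm[bm0]  # assumed form of equation (2), branch 0
        cand1 = pm_prev[ps1] + bm[bm1]  # assumed form of equation (2), branch 1
        pm_new.append(min(cand0, cand1))             # equation (3)
        survivors.append(0 if cand0 < cand1 else 1)  # equation (4)
    return pm_new, survivors


# First four rows of the trellis table (CS = 0..3) with made-up metrics.
rows = [(0, 32, 0, 3), (0, 32, 3, 0), (1, 33, 2, 1), (1, 33, 1, 2)]
pm_prev = [0] * 64
pm_prev[1], pm_prev[32], pm_prev[33] = 2, 5, 1
pm_new, survivors = acs_step(pm_prev, [0, 7, 7, 14], rows)
```

Note that a tie between the two candidates selects survivor 1, because equation (4) uses a
strict inequality.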
Renormalization/PM saturation
Since the length of a particular packet may be many times the traceback depth of the VA,
some method for limiting the path metrics should be employed in order to make the adders
in the ACS (add-compare-select) units as small as possible and to conserve memory for the
64 PM values. The way to achieve this is to “renormalize” the decoder and to apply
saturation to the PM values. Saturation ensures that large PMs (indicating an unlikely
path through the trellis, which will not be selected as the survivor path if valid decoding
is occurring) do not wrap around into small path metrics, which would confuse the minimum
PM selection circuit. Renormalization keeps the path metrics correct relative to one
another: a global adjustment is applied to all metrics to keep them within a desired
range. There are many ways to renormalize; one way is to subtract a constant from all the
PMs (all PMs are positive unsigned numbers) whenever all of the PMs exceed this
constant.
TRACEBACK AND DECODING
The principle of the decoding process is to eventually (and continuously) decode the bits
comprising the most likely path backward through the traceback RAM (TB RAM), based on the
smallest accumulated path metric up to the point when decoding begins. The TB RAM
contains a set of bit sequences representing the user bits $u_k$ for each possible survivor
path through the trellis (64 possible paths). The RAM is traced backward because the path
metrics are “mature” only after a period of time, and the sequence of source bits is
obtained in reverse order when traversing the TB RAM from the best state (the one with the
smallest PM). Since the bits must be delivered in the order they were encoded, a LIFO
function may be used for the decoded bits read from the TB RAM. There are three functions
occurring at the same time in the decoder:
1. The TB RAM is updated at each time step with the survivor paths
2. The best path traceback is occurring to find the maximum likelihood user bit
sequence
3. The decoder is outputting decoded data
Usually the TB RAM is updated in order in a circular fashion, and the traceback functions
and data output functions operate on different parts of the RAM at any instant.
The manipulation of the pointers to the RAM and the depth of the RAM are chosen to produce
good BER results and an easy hardware implementation. Trellis termination by intentional
flushing of the encoder may also need to be handled in the decoder, as well as continuous
decoding.
Figures 4a and 4b show a very memory-efficient way to manage the traceback RAM (TB
RAM) for decoding. The RAM is split into 6 banks which are accessed by 4 pointers
that control the writing and reading of the TB RAM. The RAM depth is equal to 3 times
the traceback depth of the decoder and is broken into six segments, each ½ of the TB
depth (HTBL) long. The survivor paths are written at the write pointer shown in the
figure. The write pointer is always incremented by 1 at each time step and
wraps modulo 6*HTBL. The three read pointers (traceback 1, traceback 2, and
output) are always decremented by 1, except when the time index is a multiple of
HTBL, at which point the context of the RAMs advances forward by one of the 6 banks (each
pointer is advanced by 2*HTBL from its location just prior to the HTBL
time boundary).
The two traceback pointers are set by initially choosing the best path metric at a time
that is a multiple of HTBL (k = n*HTBL, n an integer). The best state (BS) is the one with
the lowest PM among all 64 states. Traceback 1 then starts from BS and walks
backward through the survivor path which ends at BS. Every traceback advances one step
per input to the decoder. After HTBL steps the final state of traceback
1, FS1 (which is really the starting state of the survivor path segment ending at BS), is
taken as the starting state for traceback 2. Likewise, the final state of traceback 2 (FS2)
is taken as the initial state for the output pointer.
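The backward walk over one bank can be sketched as below. This is a simplified
illustration, not the memo's hardware design: it assumes each TB RAM row holds one survivor
bit per state, and that the predecessor of a state is found as in the PS 0 / PS 1 columns
of the trellis table (PS 0 = CS div 2, PS 1 = CS div 2 + 32). The function name
`traceback` and the toy data are hypothetical.

```python
def traceback(sp_rows, start_state):
    """Walk backward through survivor rows; return (final state, bits read)."""
    state = start_state
    bits = []
    for row in reversed(sp_rows):         # newest decisions are visited first
        sp = row[state]                   # survivor bit stored for this state
        bits.append(sp)
        state = (state >> 1) | (sp << 5)  # PS 0 or PS 1 from the trellis table
    return state, bits


# Toy TB RAM segment built from a known state sequence (illustrative data).
# It uses CS = (2*PS + u) mod 64, which is how the trellis table is laid out.
states = [0]
sp_rows = []
for u in [1, 0, 1, 1, 0, 0, 1, 0]:
    ps = states[-1]
    cs = ((ps << 1) | u) & 0x3F
    row = [0] * 64
    row[cs] = ps >> 5                     # which predecessor column was taken
    sp_rows.append(row)
    states.append(cs)

final_state, bits = traceback(sp_rows, states[-1])
```

The returned final state plays the role of FS1 (or FS2): it is handed to the next traceback
as its starting state, and the bits come out newest first, which is why a reversal stage is
needed before output.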
[Figures 4a and 4b. Traceback RAM management. Each panel shows the TB RAM from k=0 to
k=6*HTBL-1 (labels: HTBL, 64) with the positions of the write, traceback 1, traceback 2,
and output pointers and of the states BS, FS 1, and FS 2, at k=HTBL, k=2*HTBL, k=3*HTBL,
k=4*HTBL, and k=5*HTBL.]
Thus at each HTBL boundary the best state serves as the starting state with “infinite
memory”, since the traceback RAM is not reinitialized except at message boundaries, and
messages may be thousands of bits long. The output traceback always starts 2*HTBL
(the traceback depth) time steps after the start of traceback 1 from the best state BS.
Every HTBL time steps the 6 RAM banks are advanced such that traceback 1
starts at the best state (lowest PM) in the bank the write pointer has just finished,
traceback 2 starts at the final state of the preceding traceback 1 pass, and the output
traceback starts at the final state of the preceding traceback 2 pass, and so on. Thus the
entire traceback starts from the best state and
proceeds backwards through the traceback 1 RAM, the traceback 2 RAM, and finally
the output RAM. The total delay through the decoder is 6*HTBL, or three times the
traceback depth.
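One way to read the bank rotation described above is as a closed-form address for each of
the four pointers at time k. The sketch below is an interpretation, not a transcription of
Figures 4a and 4b: it assumes that in period p = k div HTBL the write pointer fills bank
p mod 6 while traceback 1, traceback 2, and output read banks p-1, p-3, and p-5 (mod 6)
from the top down. The function name `tb_ram_pointers` is hypothetical.

```python
def tb_ram_pointers(k, htbl):
    """Addresses of the four TB RAM pointers at time step k (assumed schedule)."""
    depth = 6 * htbl
    period, offset = divmod(k, htbl)

    def reading(banks_behind):
        # Read pointers start at the top of their bank and count down.
        bank = (period - banks_behind) % 6
        return bank * htbl + (htbl - 1 - offset)

    return {
        "write": k % depth,
        "traceback1": reading(1),  # bank the write pointer just finished
        "traceback2": reading(3),
        "output": reading(5),
    }


HTBL = 24
at_start = tb_ram_pointers(0, HTBL)
before_boundary = tb_ram_pointers(2 * HTBL - 1, HTBL)
after_boundary = tb_ram_pointers(2 * HTBL, HTBL)
```

Under this reading the output pointer starts at 2*HTBL-1, and at each HTBL boundary every
read pointer lands 2*HTBL above its decremented position. A decision written in period p
is read by the output traceback in period p+5 and leaves the LIFO one period later, which
matches the total delay of 6*HTBL.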
OUTPUT LIFO
A LIFO function is used in conjunction with the output RAM. The LIFO serves to
reverse the bits stored in the survivor path that has the best path metric state at each
HTBL boundary. Essentially the LIFO operates as shown in Figure 5:
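[Figure 5. LIFO operation. Three drawings: an initial fill in the initial write direction
(Write 0, Write 1, ..., LIFO Full), followed by passes in which each location is read and
then written (Read 0 Then Write 0, Read 1 Then Write 1, ..., Read N-2 Then Write N-2,
Read N-1 Then Write N-1) with the write direction reversed from the previous pass.]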
The LIFO always uses a read-followed-by-write pattern that zigzags up and down the
LIFO buffer. As seen in Figure 5, the second drawing shows that the read pointer
reverses the order in which the data was written into the LIFO. In the third drawing the
data written during the previous pass is again read out backwards. The depth of the LIFO is
equal to HTBL. It can be shown that continuous operation of the LIFO will always
reverse the outputs modulo the LIFO depth (equal to HTBL in this case).
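The zigzag read-then-write behaviour can be checked with a short software model. This is
a sketch under simple assumptions (one address register and a direction flag); the class
name `ZigZagLifo` is hypothetical, and a depth of 3 is used instead of HTBL to keep the
example small.

```python
class ZigZagLifo:
    """Single-buffer LIFO: read the old value, write the new one, then move."""

    def __init__(self, depth):
        self.buf = [None] * depth
        self.addr = 0
        self.step = 1  # +1 while moving up the buffer, -1 while moving down

    def push(self, value):
        out = self.buf[self.addr]   # read first ...
        self.buf[self.addr] = value  # ... then write to the same location
        nxt = self.addr + self.step
        if 0 <= nxt < len(self.buf):
            self.addr = nxt
        else:
            self.step = -self.step  # end of the buffer: reverse, stay in place
        return out


lifo = ZigZagLifo(depth=3)
outputs = [lifo.push(i) for i in range(9)]
```

After the initial fill (which returns nothing useful), every block of `depth` inputs comes
out in reverse order, with a latency of one full block.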
QUICK TRACEBACK MODE
The IEEE 802.11a burst packet structure has a header that is encoded using rate ½ coding
and is always exactly 24 information bits (48 encoded bits) long. The last 6 information
bits into the encoder are all zero (flush) bits to allow for rapid, reliable decoding. The
header must be decoded rapidly in order to configure the receiver (the slicer, for example)
for the modulation type used for the data portion of the burst.
It appears that a good value for HTBL is 24, which allows good performance
for the punctured codes (implying a traceback length of 48). If this is indeed the case
then modifying the Viterbi continuous decoding mode to support the quick
traceback is particularly simple. Essentially all that is required is to initialize the output
pointer to point to TB RAM location 6*HTBL-1 instead of 2*HTBL-1 and to force the
output traceback to start at state 0. This will trick the Viterbi decoder into decoding
the message right after the deinterleaved header soft decisions are clocked into the
decoder. The output of the LIFO containing the decoded header will then start appearing
24 time steps later rather than after the full decoding delay of 6*HTBL (144 time
steps).
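The effect of the changed initial value can be checked by stepping the output pointer
through the first HTBL time steps. The sketch below assumes one reading of the pointer
rule given earlier (decrement by 1 each step, and add 2*HTBL when the new time index is a
multiple of HTBL, modulo 6*HTBL), and assumes the header survivor decisions occupy
addresses 0 to HTBL-1. The function names are hypothetical.

```python
HTBL = 24
DEPTH = 6 * HTBL


def step_read_pointer(ptr, k_next, htbl=HTBL):
    """Move a read pointer from time k_next - 1 to time k_next (assumed rule)."""
    ptr -= 1
    if k_next % htbl == 0:
        ptr += 2 * htbl  # bank advance at an HTBL boundary
    return ptr % (6 * htbl)


def pointer_after(start, steps):
    """Pointer value after the given number of time steps, starting at k = 0."""
    ptr = start
    for k_next in range(1, steps + 1):
        ptr = step_read_pointer(ptr, k_next)
    return ptr


normal = pointer_after(2 * HTBL - 1, HTBL)  # continuous decoding mode
quick = pointer_after(6 * HTBL - 1, HTBL)   # quick traceback mode
```

In quick traceback mode the output pointer arrives at address HTBL-1, the last header
decision, exactly when the header has been written, so the output traceback can start
there from state 0. In continuous mode it would instead be reading a bank that holds no
header data yet.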
So the Quick Traceback feature (for HTBL = 24) consists of the following:
Upon completion of the quick traceback the slicer and decoder/depuncturer must be
initialized for continuous decoding with the correct puncture rate for the remaining data
symbols in the OFDM burst:
RENORMALIZATION AND VITERBI PARAMETERS
The Viterbi decoder path metrics may grow without bound if some rescaling of the path
metrics is not performed. There are a few techniques for renormalization of the decoder
path metrics. One type of renormalization consists of subtracting a constant from all of
the path metrics if all of the metrics (all 64) are above a so-called “renormalization
threshold” (and saturating the metrics to a maximum allowed value) as follows:
where pm0 and pm1 are the path metric contenders into the ACS block and PM_sat is
the saturation value.
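The renormalization and saturation steps described above could be sketched as follows.
The numeric values of `PM_SAT` and `RENORM_THRESHOLD` are placeholders chosen for
illustration (this memo defers the actual values to a later memo), and the function names
are hypothetical.

```python
PM_SAT = 255            # hypothetical saturation value (8-bit path metrics)
RENORM_THRESHOLD = 128  # hypothetical renormalization threshold


def select_survivor(pm0, pm1, pm_sat=PM_SAT):
    """Pick the smaller ACS contender and clamp it to the saturation value."""
    return min(pm0, pm1, pm_sat)


def renormalize(pms, threshold=RENORM_THRESHOLD):
    """Subtract the threshold from every PM, but only if all PMs exceed it."""
    if min(pms) > threshold:
        return [pm - threshold for pm in pms]
    return list(pms)


clamped = select_survivor(300, 400)
shifted = renormalize([130, 200, 255])
unchanged = renormalize([130, 5, 255])
```

Because the same constant is removed from every metric, the differences between path
metrics, and therefore the survivor decisions, are unaffected by renormalization.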
A later memo will specify the above mentioned values for good decoder performance
with IEEE 802.11a.
TRELLIS DEFINITION
CS PS 0 PS 1 BM 0 BM 1
0 0 32 0 3
1 0 32 3 0
2 1 33 2 1
3 1 33 1 2
4 2 34 3 0
5 2 34 0 3
6 3 35 1 2
7 3 35 2 1
8 4 36 3 0
9 4 36 0 3
10 5 37 1 2
11 5 37 2 1
12 6 38 0 3
13 6 38 3 0
14 7 39 2 1
15 7 39 1 2
16 8 40 0 3
17 8 40 3 0
18 9 41 2 1
19 9 41 1 2
20 10 42 3 0
21 10 42 0 3
22 11 43 1 2
23 11 43 2 1
24 12 44 3 0
25 12 44 0 3
26 13 45 1 2
27 13 45 2 1
28 14 46 0 3
29 14 46 3 0
30 15 47 2 1
31 15 47 1 2
32 16 48 1 2
33 16 48 2 1
34 17 49 3 0
35 17 49 0 3
36 18 50 2 1
37 18 50 1 2
38 19 51 0 3
39 19 51 3 0
40 20 52 2 1
41 20 52 1 2
42 21 53 0 3
43 21 53 3 0
44 22 54 1 2
45 22 54 2 1
46 23 55 3 0
47 23 55 0 3
48 24 56 1 2
49 24 56 2 1
50 25 57 3 0
51 25 57 0 3
52 26 58 2 1
53 26 58 1 2
54 27 59 0 3
55 27 59 3 0
56 28 60 2 1
57 28 60 1 2
58 29 61 0 3
59 29 61 3 0
60 30 62 1 2
61 30 62 2 1
62 31 63 3 0
63 31 63 0 3
CS = Current State
PS 0 = Previous State for $u_k$ = 0
PS 1 = Previous State for $u_k$ = 1
BM 0 = Branch Metric to use for $u_k$ = 0
BM 1 = Branch Metric to use for $u_k$ = 1
The bit order for the BM numbering is given above in equation (1a).
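The table above can be regenerated from the code structure, which is a useful cross-check
when building the Trellis LUT. The sketch below assumes the standard IEEE 802.11a
generator polynomials (133 and 171 octal), which this memo does not state explicitly, and
assumes that the PS 1 column differs from PS 0 only in the oldest register bit. The names
`trellis_row`, `G0_DELAYS`, and `G1_DELAYS` are hypothetical.

```python
# Assumption: IEEE 802.11a parent code, generators 133 and 171 (octal).
# Each tuple lists the delays j for which u(k-j) is XORed into the output bit.
G0_DELAYS = (0, 2, 3, 5, 6)  # 133 octal, produces the bit tied to SD(0)
G1_DELAYS = (0, 1, 2, 3, 6)  # 171 octal, produces the bit tied to SD(1)


def parity(reg, delays):
    """XOR of the register bits selected by the given delays."""
    return sum((reg >> j) & 1 for j in delays) & 1


def trellis_row(cs):
    """Return (PS 0, PS 1, BM 0, BM 1) for current state cs."""
    row = [cs >> 1, (cs >> 1) | 32]
    for oldest in (0, 1):         # oldest register bit: 0 for PS 0, 1 for PS 1
        reg = cs | (oldest << 6)  # 7-bit register, bit j holds u(k-j)
        # BM index uses the bit order of equations (1) and (1a).
        row.append(parity(reg, G0_DELAYS) + 2 * parity(reg, G1_DELAYS))
    return tuple(row)


TRELLIS = [trellis_row(cs) for cs in range(64)]
```

Under these assumptions the generated rows agree with the table, for example
`TRELLIS[2] == (1, 33, 2, 1)` and `TRELLIS[63] == (31, 63, 0, 3)`.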
PUNCTURING PATTERNS
Figure 6 shows the puncturing patterns for the rate 2/3 and 3/4 codes: