Document Content 4 Am MINE
Document Content 4 Am MINE
COMMUNICATION
CHAPTER 1
INTRODUCTION
The use of convolution encoder with probabilistic decoding can significantly improve
the error performance of a communication system. The Viterbi algorithm , which is widely
used decoding algorithms, is optimal, but its complexity in both number of computations and
memory requirement exponentially increases with the constraint length k of the code. Hence
when the codes with a longer constraint length are required in order to achieve a low error
probability, decoding algorithms whose complexity does not depend on k becomes
attractive.Several multiple paths, Breath first decoding algorithms, such as M-algorithm,
Simmon’s algorithm have been proposed to the alternatives of the Viterbi algorithm .
Unfortunately, with these algorithms, should the correct path be lost, then its recovery is
rather difficult, leading to very long error events.
The error propagation is usually contained by organizing the data in frames or blocks
with a known starting state or by using some special recovery schemes . Trellis coded
modulation schemes are used in many bandwidth efficient systems. Typically, a TCM uses
convolutional code, which leads to high complexity of the viterbi decoder for the TCM
decoder, even if the constraint length of the convolutional code is moderate. For example, the
rate ¾ convolutional code used in a TCM system, has constraint length of having the
computational complexity is equivalent that of a VD for a rate ½ convolutional code with a
constraint length of 9 due to the large number of transitions in the trellis .
So, in terms of power consumption, the viterbi decoder is a dominant module in a TCM
decoder. General solutions for low power viterbi decoder design have been studied by
existing work. Power reduction in VD s could be achieved by reducing the number of states or
by over scaling the supply voltage. Over scaling of the supply voltage usually needs to take
into the whole consideration the whole system that includes includes the VD, at which we are
focusing at our research. T-algorithm has been shown to be very efficient in reducing the
power consumption.
However searching for the optimal PM in the feedback loop still reduces the decoding
speed. To overcome this drawback, two variations of the T-algorithm have been proposed, the
relaxed adaptive viterbi decoder , which suggests using an estimated optimal PM, instead of
finding the real one each cycle and the limited search parallel state VD based on scarce
transition (SST) . When applied to the high rate convolution codes the relaxed adaptive viterbi
decoder suffers a severe degradation of bit-error-rate performance due to inherent drifting
error between the estimated optimal PM and the accurate one
On the other side, the SST based scheme requires predecoding and reencoding process
and is not suitable for TCM decoders. Here we propose an add-compare-select unit (ACSU)
for VDs incorporating T-algorithm for a rate ½ code, which will decrease the power
efficiently. A systematic way is shown to analyze and to achieve the theoretical iteration
bound. We discuss low power viterbi decoder design for the rate ½ code. Finally, simulation
results of convolutional encoder and the VD have been reported.
Then BMs are fed into the ACSU that recursively computes PMs and output
decision bits for each possible state transition. After that the decision bits are stored in and
retrieved from the SMU in order to decode the source bits along the final survivor path. T-
algorithm requires extra computation in the ACSU loop for calculating optimal PM. So It
automatically reduce the decoding speed.
For the decoding of convolutional codes we can observe two types soft decision decoding and
hard decision decoding. Soft decision decoding is much complex at which we are not
concentrating at our research.
2. Memory Requirements
Since the algorithm maintains a trellis structure and stores path metrics for each
state, the memory usage can become excessive, especially for long constraint lengths. The
need to store survivor paths for traceback operations further increases memory demands. In
resource- constrained environments, such as embedded systems or low-power devices, this
limitation becomes a critical concern.
3. Latency Issues
The Viterbi decoder introduces significant latency due to the need to process the
entire trellis before making final decoding decisions. This is particularly problematic for real-
time applications such as high-speed wireless communication, video streaming, and satellite
communications, where low-latency decoding is essential. The traceback operation further
adds to the delay, making it unsuitable for time-sensitive applications.
4. Power Consumption
High computational complexity and memory usage directly impact power
consumption. The Viterbi decoder continuously updates path metrics, survivor paths, and
performs traceback operations, leading to increased energy consumption. This is a significant
drawback in battery-operated devices such as mobile phones, IoT devices, and satellites,
where energy efficiency is a priority.
5. Performance Degradation for Noisy Channels
While the Viterbi algorithm is an optimal Maximum Likelihood Sequence
Estimation (MLSE) decoder, its performance degrades in highly noisy environments. In
channels with severe interference, fading, or burst errors, the decoded sequence may suffer
from error propagation, reducing the effectiveness of the decoder. Techniques like soft-
decision decoding help improve performance but further increase computational complexity.
6. Inflexibility in Handling Varying Code Rates
The standard Viterbi decoder is optimized for a fixed convolutional code rate. If
the system requires adaptive coding rates to handle dynamic channel conditions, a separate
decoder or additional modifications like puncturing and depuncturing are needed. This
increases system complexity and requires additional processing steps.
7. Limited Suitability for High-Rate Codes
For high-rate convolutional codes, the number of states increases significantly,
making the Viterbi algorithm computationally expensive. Alternative decoding
approaches,such as the Turbo decoder or LDPC decoder, become more efficient for such
sens.
DEPT,ELECTRONICS AND COMMUNICATION ENGINEERING, SIET,AMALAPURAM 5
MODIFICATION OF VITERBI DECODER FOR HIGH SPEED
COMMUNICATION
CHAPTER 2
PROPOSED SYSTEM
In this project we propose an architecture for viterbi decoder with T-algorithm which can
effectively reduce the power consumption with a negligible decrease in speed.Implementation
result is for code rate ½ with constraint length 9 used for trellis coded modulation. This
architecture reduces the power up to 64% without any performance loss when compared with
the ideal viterbi decoder, while the degradation in the clock speed is negligible.Viterbi
decoder with T-algorithm.The viterbi decoder with T–algorithm is shown in the figure 2.1. As
compared with the Ideal viterbi decoder here we will have the extra computation like
Threshold generator and purge unit. We will modify the path of ACSU by the extra
computation of threshold generator and purge unit as shown in the figure 2.1. By using the
Threshold generator the number of computations will decrease, by that the power
consumption of entire system will decrease. The theoretical iteration bound also we will
get.The rate ½ convolutional code employed in TCM system.
Preliminary bit error rate(BER) have been discussed . Estimated BER performance of the
VD employing T-algorithm with different values of T over an additive Gaussian noise
channel The simulation is based on TCM employing the rate 1/2 code. Compared to the ideal
viterbi algorithm, the threshold ‘Tpm’ can be lowered to 0.3 with less than 0.1 db
performance loss. The functional block diagram of the VD with T-algorithm is shown in
fig. 3.B.16. The minimum value of each BM group (BMG) can be calculated in BMU or
TMU and then a passed to the ‘Threshold Generator’ unit (TGU) to calculate (PMopt+T).
(PMopt+T) and the new PMs are then compared in the ‘purge unit’ (PU). The 64 states and
PMs are labeled from 0 to
63. The precomputation steps is expressed as…
PMopt= min[min{min(cluster0(n-2))+min(BMG0(n-1)),
min(cluster1(n-2))+min(BMG1(n-1)),
min(cluster2(n-2))+min(BMG3(n-1))
min(cluster3(n-2))+min(BMG2(n-1))}+min(even BMs(n),
min{min(cluser0(n-2))+min(BMG1(n-1))
min(cluster1(n-2)+min(BMG0(n-1))
min(cluster2(n-2))+min(BMG2(n-1)),
min(cluster3(n-2))+min(BMG3(n-1))}+min(oddBMs(n))]
We can obtain (PMopt+T) during the period when ACSU updates for new PMs. The only extra
calculation for T-algorithm is the comparison between the (PMopt+T) and all the PMs.
Therefore the critical path is greatly shortened as the eq.
TT-algo=Tadder+T4-in_comp+2T2in_comp
Encoder Output:
Function: This block represents the sequence of encoded symbols generated by the
Convolutional Encoder. The encoder takes the original data bits and adds redundancy
according to the defined code rate and constraint length. This output is the direct input to the
Viterbi decoder, replacing the channel symbols that would be present in a system with a
noisy channel. It is assumed the data is being sent through a channel that is noisy, and the
decoder is receiving the corrupted data.
The T-algorithm is also the iteration bond we can get for viterbi decoder when T-
algorithm is employed. The functional block of Fig. 16 is slightly different from the Fig. 14,
where the minimum BM is sent to the PPAU from the BMU. Since the estimated optimal PM
calculated in each cycle, an accurate an optimal PM is also needed every 6 to 7 cycles to
compensate for the estimated one. For example, at time slot n, the decoder memorizes
PMopt_esti(n) and PMs(n). After 7 cycles, PMopt_accu(n)- PMopt_esti(n) is added to
PMopt_esti(n+7). The problem with this compensation scheme is that the error between
PMopt_esti and PMopt_accu accumulates over at least 7-cycles due to the inherent delay of
the scheme itself.
The Branch metric can be calculated by two types: Hamming distance and Euclidean
distance . Consider a VD for a convoluional code with a constraint length k, where each state
receives p candidate paths. First, we expand Ps at the current time slot n(Ps(n)) as a function
of Ps(n-1)to form a look-ahead computation of the optimal P-Popt (n). If branch metrics are
calculated based on the Euclidean distance, popt(n) is the minimum value of Ps(n) can be get
as
popt(n) =min{p0(n),p1(n),……..p2-1k(n)} (1)
=min{min[p0,0(n-1)+B0,0(n),p0,1(n-1)+B0,1(n)………., p0,p(n-1)+B0,p(n)],
Min[p1,0(n-1)+B1,0(n),p1,1(n-1)+B1,1(n),……,p1,p(n-1) +B1,p(n)],...........,
In the design example , with a coding rate of ¾ and constraint length of 7, the minimum
precomputation steps for the VD to meet the iteration bound is 2 according to (4). It is the
same value as we obtained from direct architecture design . In some cases, the number of
remaining metrics may slightly expand during a certain pipeline stage after addition with Bs.
Usually, the extra delay can be absorbed by an optimized architecture or circuit design. Even
if the extra delay is hard to eliminate, the resultant clock speed is very close to the theoretical
bound. To fully achieve the iteration bound, we could add another pipeline stage, though it is
very costly.
Computational overhead (compared with conventional T-algorithm) is an important
factor that should be carefully evaluated. Most of the computational overhead comes from
adding Bs to the metrics at each stage as indicated in (4). In other words, If there are m
remaining metrics after comparison in a stage, the computational overhead from this stage is
at least m addition operations. The exact overhead varies from case to case based on the
convolutional code’s trellis diagram. Again, to simplify the evaluation, we consider, a code
with a constraint length k and q precomputation steps. Also, we assume that each remaining
metric would cause a computational overhead of one addition operation. In this case, the
number of metrics will reduce at a ratio of 2(k-1)/q and the overall computational overhead is
(measured with addition operation)
Noverhead=20+2(k-1)/q+22(k—1)/q................+2(q-1)(k-1)/q
=2q.(k-1)/q-1/2(k-1)/q-1
=2k-1-1/2(k-1)/q-1 (7)
DEPT,ELECTRONICS AND COMMUNICATION ENGINEERING, SIET,AMALAPURAM 15
MODIFICATION OF VITERBI DECODER FOR HIGH SPEED
COMMUNICATION
The estimated computational overhead according to (7) is 63/ (26/q-1) when k=7 and
q.≤ 6, which almost exponentially to q. In a real design the overhead increases even faster
than what is given by (7) when other factors (such as comparisons or expansion of metrics as
we mentioned above) are taken into consideration. Therefore, a small number of
precomputational steps is preferred even though the iteration bound may not be fully satisfied.
In most cases, one or two-step precomputation is a good choice.
The above analysis also reveals that precomputation is not a good option for low rate
convolutional codes (rate of 1/RC, .), because it usually needs more than two steps to
effectively reduce the critical path(in that case, R=1 in(4) and qb is k-1). However, for TCM
systems, where high-rate convolutional codes are always employed, Two steps of
precomputation could achieve the iteration bound or make a big difference in terms of clock
speed. In addition, the computational overhead is a small.
In order to decode the input sequence, the survivor path, or shortest path through the
trellis must be traced. The selected minimum metric path from the ACS output points the path
from each state to its predecessor. In theory, decoding of the shortest path would require the
processing of the entire input sequence. In practice the survivor paths merge after some
number of iterations, as shown in bold lines in the 4-state example of Figure 3.B.19. From the
point they merge together, the decoding is unique. The trellis depth at which all the survivor
paths merge with high probability is referred as the survivor path length.
METHODOLOGY
Vivado “projects” are directory structures that contain all the files needed by a particular
design. Some of these files are user-created source files that describe and constrain the design, but
many others are system files created by Vivado to manage the design, simulation, and
implementation of projects. In a typical design, you will only be concerned with the user-created
source files. But, in the future, if you need more information about your design, or if you need
more precise control over certain implementation details, you can access the other files as well.
When setting up a project in Vivado, you must give the project a unique name, choose a
location to store all the project files, specify the type of project you are creating, add any pre-
existing source files or constraints files (you might add existing sources if you are modifying an
earlier design, but if you are creating a new design from scratch, you won’t add any existing files
– you haven’t written
DEPT,ELECTRONICS AND COMMUNICATION ENGINEERING, SIET,AMALAPURAM 17
MODIFICATION OF VITERBI DECODER FOR HIGH SPEED
COMMUNICATION
them yet), and finally, select which physical chip you are designing for. These steps are illustrated
below.
Start Vivado
In Windows, you can start Vivado by clicking the shortcut on the desktop. After Vivado is started,
the window should look similar to the picture in figure 1.
Click on “Create Project” in the Quick Start panel. This will open the New Project dialog as shown
in the Figure
Set Project Name and Location: Enter a name for the project. In the figure, the project name is
“project_1”, which isn’t a particularly useful name. It’s usually a good idea to make the project
name more descriptive, so you can more readily identify your designs in the future. For example, if
you design a seven-segment controller, you might call the project “seven segment controller”. For
projects related to coursework, you might include the course name and project number - for
example,
“ee214_project2”. You should avoid having spaces in the project name or location, because spaces
can cause certain tools to fail.
Select Project Type: The “project type” configures certain design tools and the IDE appearance
based on the type of project you are intending to create. Most of the time, and for all Real Digital
courses, you will choose “RTL Project” to configure the tools for the creation of a new design.
(RTL stands for Register Transfer Language, which is a term sometimes used to mean a hardware
design language like Verilog).
Add Existing Sources: In a typical new or early-stage design, you won’t add any existing sources
because you haven’t created them yet. But as you complete more designs and build up a library of
previously completed and known good designs, you may elect to add sources and them use them in
a new design. For now, there are no existing sources to add, so just click Next.
Select Parts: Xilinx produces many different parts, and the synthesizer needs to know exactly
what part you are using so it can produce the correct programming file. To specify the correct
part, you need to know the device family and package, and less critically, the speed and
temperature grades (the speed and temperature grades only affect special-purpose simulation
results, and they have no effect on the synthesizer’s ability to produce accurate circuits). You
must choose the appropriate part for the device installed on your board.
For example, the Blackboard uses a zynq device with the following attributes:
Family Zynq-7000
Package clg400
Speed Grade -1
Temperature Grade C
Check Project Configuration Summary: On the last page of the Create Project Wizard a
summary of the project configuration is shown. Verify all the information in the summary is
correct, and in particular make sure the correct FPGA part is selected. If anything is incorrect, click
back and fix it; otherwise, click Finish to finish creating an empty project.
After you have finished with the Create Project Wizard, the main IDE window will be displayed.
This is the main “working” window where you enter and simulate your Verilog code, launch the
synthesizer, and program your board. The left-most pane is the flow navigator that shows all the
current files in the project, and the processes you can run on those files. To the right of the flow
navigator is the project manager window where you enter source code, view simulation data, and
interact with your design. The console window across the bottom shows a running status log. Over
the next few projects, you will interact with all of the panels.
All projects require at least two types of source files – an HDL file (Verilog or VHDL) to
describe the circuit, and a constraints file to provide the synthesizer with the information it needs to
map your circuit into the target chip.
This tutorial presents the steps required to implement a Verilog circuit on your Real Digital
board: first, a Verilog source file is created to define the circuits behavior (again, for this tutorial,
you can simply copy or download the completed file rather than typing it); second, a constraints
files is created to define how the Verilog circuit is mapped into the Xiling logic device (again,
copied or downloaded for this tutorial); third, the Verilog source file and constraints file are
synthesized into a “.bit” file that can be programmed onto your board; and fourth, the device is
configured with the circuit.
After the Verilog source file is created, it can be directly simulated. Simulation (discussed in
more detail later) lets you work with a computer model of a circuit, so you can check its behavior
before taking the time to implement it in a physical device. The simulator lets you drive all the
circuit inputs with varying patterns over time, and to check that the outputs behave as expected
under all conditions.
After the constraint file is created, the design can be synthesized. The synthesis process
translates Verilog source code into logical operations, and it uses the constraints file to map the
logical
DEPT,ELECTRONICS AND COMMUNICATION ENGINEERING, SIET,AMALAPURAM 24
MODIFICATION OF VITERBI DECODER FOR HIGH SPEED
COMMUNICATION
operations into a given chip. In particular (for our needs here), the constraints file defines which
Verilog circuit nodes are attached to which pins on the Xilinx chip package, and therefore, which
circuit nodes are attached to which physical devices on your board. The synthesis process creates a
“bit” file that can be directly programmed into the Xilinx chip.
There are many ways to define a logic circuit, and many types of source files including
VHDL, Verilog, EDIF and NGC netlists, DCP checkpoint files, TCL scripts, System C files, and
many others. We will use the Verilog language in this course, and introduce it gradually over the
first several projects. For now, you can get familiar with some of the basic concepts by reading the
following.
To create a Verilog source file for your project, right-click on “Design Sources” in the Sources
panel, and select Add Sources. The Add Sources dialog box will appear as shown – select “Add or
create design sources” and click next.
Figure 3.10: Add or create design sources using Add Source Dialog
In the Add or Create Design Sources dialog, click on Create File, enter project1_demo as
filename, and click OK. The newly created file will appear in the list as shown. Click Finish to
move to the next step.
ModelSim /VHDL, ModelSim /VLOG, ModelSim /LNL, and ModelSim /PLUS are
produced by Model Technolog, a Mentor Graphics Corporation company. Copying, duplication, or
other reproduction is prohibited without the written consent of Model Technology. The information
in this manual is subject to change without notice and does not represent a commitment on the part
of Model Technology.
The program described in this manual is furnished under a license agreement and may not
be used or copied except in accordance with the terms of the agreement. The online documentation
provided with this product may be printed by the end-user. The number of copies that may be
printed is limited to the number of licenses purchased.
ModelSim is a registered trademark and Signal Spy, TraceX, ChaseX and Model
Technology are trademarks of Mentor Graphics Corporation. PostScript is a registered trademark of
Adobe Systems Incorporated. UNIX is a registered trademark of AT&T in the USA and other
countries. FLEXlm is a trademark of Globetrotter Software, Inc. IBM, AT, and PC are registered
trademarks, AIX and RISC System/6000 are trademarks of International Business Machines
Corporation. Windows, Microsoft, and MS-DOS are registered trademarks of Microsoft
Corporation. OSF/Motif is a trademark of the Open Software Foundation, Inc. in the USA and other
countries. SPARC is a registered trademark and SPARCstation is a trademark of SPARC
International, Inc. Sun Microsystems is a registered trademark, and Sun, SunOS and OpenWindows
are trademarks of Sun Microsystems is a registered trademark, and Sun, SunOS and OpenWindows
are trademarks of Sun Microsystems, Inc. All other trademarks and registered trademarks are the
properties of their respective holders.
Copyright © 1990 -2003, Model Technology, a Mentor Graphics Corporation company. All
rights reserved. Confidential. Online documentation may be printed by licensed customers of Model
Technology and Mentor Graphics for internal business purposes only.
Software versions:
This documentation was written to support ModelSim SE 5.7e for UNIX and Microsoft
Windows 98/Me/NT/2000/XP. If the ModelSim software you are using is a later release, check the
README file that accompanied the software. Any supplemental information will be there.
Although this document covers both VHDL and Verilog simulation, you will find it a useful
reference even if your design work is limited to a single HDL. ModelSim’s graphic interface
While your operating system interface provides the window-management frame, ModelSim
controls all internal-window features including menus, buttons, and scroll bars.
The resulting simulator interface remains consistent within these operating systems:
3.3 ALGORITHM:
The received noisy symbols from the communication channel are fed into the Branch Metric
Unit (BMU) for further processing.
The BMU calculates the branch metric for each possible state transition.
It measures the difference (or Euclidean distance) between the received symbol and the
expected symbol for each transition.
The ACSU updates the path metrics (cumulative metric for each state) by adding the branch
metric to the previous path metric.
It compares multiple paths leading to the same state and retains the one with the lowest metric
(best path).
Instead of retaining all paths, a Threshold Generator is used to set a limit on path metrics.
Decision Process:
If the path metric is above the threshold (T) → The path is retained.
If the path metric is below the threshold (T) → The path is pruned (discarded).
The Purge Unit eliminates the weak paths that do not meet the threshold, reducing computation
complexity.
The surviving paths (after thresholding) are stored in the PMU for traceback operations.
The PMU retains the best path history required for decoding.
The SMU traces back through the stored survivor paths to determine the most likely sequence
of states.
The final decoded bit sequence is extracted from the traced path and sent as output.
The algorithm repeats for the next set of received symbols until the entire message is decoded
CHAPTER -4
PROCESS OF IMPLEMENTATION
The convolution encoder simply uses the polynomial of constraint length and we will
do XOR operation with the input code bits. The constraint length of our code is 9 so we will
use 9 length polynomial. Here our code rate is ½ so we will use two polynomials to get the
two outputs. There are many choices of choosing the polynomials for any order code. We are
using the two polynomials are 110101111 and 100110101.
polyB=1+x3+x4+x6+x8
This module takes input data and performs convolutional encoding. The encoder uses
generator polynomials configured by the user. When punctured encoding is enabled, the encoder
performs 1/2 rate encoding irrespective of the encoder rate. The puncture unit will use the 1/2 rate
code to generate the appropriate user-programmed rate.
we will generate the two output bits by using this two polynomials and one input sequence. Our
constraint length 9 convolution encoder have been shown in the figure.1.
+
FIRST
P(B) OUTPUT
Z Z Z Z
- -
- - - - - -
Z -
Z
1 Z Z Z 1 1
1 1 1 1
1 1
P(A) +
SECOND OUTPUT
Configurable Parameters
The following core parameters give the user the capability to tailor the core to realize different
Convolutional Encoder configurations. These parameters can be configured through the GUI dialog
box in IPexpress.
Constraint Length:
This defines the constraint register length. The value can be any integer from 3 to 12.
Input Rate:
This defines the input symbol rate for the encoder. The input rate for non-punctured codes is
always
DEPT,ELECTRONICS AND COMMUNICATION ENGINEERING, SIET,AMALAPURAM 32
MODIFICATION OF VITERBI DECODER FOR HIGH SPEED
COMMUNICATION
1. For punctured codes, the input rate can be any value from 2 to 12.
Output Rate:
This defines the output symbol rate for the encoder. The output rate for non-punctured codes can
be any value between from 2 to 8. For punctured codes, the output rate can be any value from 3 to
23 (k+1 to 2k -1,where k is the input rate).
Generator Polynomials:
PolyA, polyB, polyC, polyD etc... Are generator polynomials. For non-punctured encoders, the
number of generator polynomials is always equal to the output rate. For punctured encoders, the
number of generator polynomials is 2.
The encoder supports punctured or non-punctured data. For punctured data, the block size
(punctured block size) is equal to the input rate. The two puncture patterns PP0 and PP1 can be
defined by the user. The total number of 1’s in both puncture patterns must equal the output r
The branch metric is a measure of the distance between what was transmitted and what
was received and is defined each arc in the trellis. In hard decision decoding, where we have
given a sequence of parity bits, the branch metric is the hamming distance between the
expected parity bits and the received bits. As example is shown in Fig. 3.10, Where
received bits are
00. Foreach state transition, the number on the arc shows branch metric for its transition.
Two of the branch metrics are 0, corresponding to only states and transitions where the
corresponding hamming distance is 0.
An attractive soft decision metric is the square of the difference between the received and
expected. If the convolutional code produces the p parity bits, and the corresponding analog
samples are v= v1, v2, v3,…..,vp, then we can construct the branch metric as (1)
BM[U,V]= ∑�
� (𝑼𝒊 − 𝑽𝒊)𝟐 −−−−−−(1)
𝒊=𝟏
time i 00 i+1
00 0/000
1/112
0/101
01
1/011
state
0/112
10
1/000
0/011
11
1/101
4.1.3 Add-Compare-Select:
A new value of the state metrics has to be computed at each time instant. In other
words, the state metrics have to be updated every clock cycle. Because of this recursion,
pipelining, a common approach to increase the throughput of the system, is not applicable.
The Add- Compare-Select (ACS) unit hence is the module that consumes the most power and
area. In order to obtain the required precision, a resolution of 7 bits for the state metrics is
essential,
while 5 bits are needed for the branch metrics. Since the state metrics are always
positive numbers and since only positive branch metrics are added to them, the accumulated
metrics would grow indefinitely without normalization. In this project we have chosen to
implement modulo normalization, which requires keeping an additional bit (8 instead of 7).
The operation of the ACS unit is shown in Figure 3.12. The new branch metrics are added to
previous state metrics to form the candidates for the new state metrics. The comparison can be
Hamming distance between the received code word and the allowed code word is
calculated by checking the corresponding bit positions of the two code words. For example
hamming distance between the code words 00 and 11 is 0 or the hamming distance between
the code words 00 and 11 is 2.
The hamming distance metric is cumulative so that the path with the largest total metric
is final winner. Thus the hard decision Viterbi decoding makes use of maximum hamming
distance in order to determine the output of the decoder.
The actual working of the hard decision Viterbi decoder is as explained in the following
figures. The trellis is drawn for each time tick, The corrupted data bit stream at the input of
the hard decision Viterbi decoder is [01 10 00 10 11].
Step 1: At time t = 0, we have received the bits 01. The decoder always starts at the initial
state of 00. From this point on it has two paths available, but neither matches the incoming
bits. The decoder computes the branch metric for both of these and will continue
simultaneously along both of these branches, in contrast to the sequential decoding where a
choice is made at every decision point.
The metric for both branches is equal to 1, which means that one of the two bits was
“matched” with the incoming bits. The corresponding trellis and path metric are as shown in
figure 3.13.
8
Fig.4.4:
Step 2: At time t = 1, we have received the bits 10. The decoder fans out from these two states
to all four of the possible states. The branch metrics for these branches are computed and
added to the previous branch metrics. The corresponding trellis new path metrics are as shown
in Figure 3.14.
Step 3: At time t = 2, we have received the bits 00. The paths progress forward and now begin
to converge at the nodes. Two metrics are computed for each of the paths coming into a node.
Considering the maximum Hamming distance principle, at each node we discard the path with
the lower metric because it is less likely. This is as shown in Figure 3.15
This discarding of paths at each node helps to reduce the number of paths that have to be
examined and thus, gives the Viterbi method of decoding its strength. The corresponding
trellis and new path metrics are as shown in Figure 3.16.
Step 5: At time t = 4, we have received the bits 11. The procedure from Step 4 is repeated.
But now, the trellis is complete. The corresponding trellis and new path metrics are as
shown in Figure 3.19 and 3.20.
The path with the highest metric is looked for and a winner path is traced. The path traced by
the states 00, 10, 01, 10, 01, 00 and corresponding to the bits 10100 is the decoded sequence
and is as shown in Figure 3.B.15.
Figure.4.12 Decoded sequence 10100 for the noisy encoded bit stream 01 10 00 1011.
Thus, we see how the hard decision Viterbi decoder, using maximum Hamming distances,
works and achieves the decoded data bit stream, from a convolutionally encoded input data bit
stream transmitted over an AWGN channel from the transmitter.
Suppose the receiver has computed the path metric PM[s,i] for each state s (of which there are
2k1 ,where k is the constraint length) at time step i, The value of PM[s , i] is the total number
of bit errors detected when comparing the received parity bits to the most likely transmitted
message, considering all messages that could have been sent by the transmitter until time step
i( starting from state ‘00’, which we will take by convention to be the starting state always).
Among all possible states at time step i, the most likely state is the one with the smallest
path metric. I f there are more than one state, they are all equally possibilities. Now we
determine the path metric at time step i+1, PM[s,i+1],First observe that if the transmitter is at
state s at time step i+1, then it must have been in only one of the two possible states at time
step i. These two predecessor states, labeled α and β, are always same for a given state. In
fact, they depend only on the constraint length of the code and not on the parity functions.
Fig. 3.11 shows the predecessor states for each state. For instance, for state 00, α=00 and
β=01; for state 01, α=10,β=11.
The Survivor Metric Unit (SMU) plays a crucial role in the Viterbi decoder, as it
determines which state sequences (paths) should be retained for the final decision process.
The Viterbi algorithm operates by finding the most likely sequence of transmitted symbols
based on the received noisy data. However, without optimization, the algorithm requires
significant memory and computational power, as it stores all possible paths. The T-algorithm
is an efficient path-pruning technique that dynamically eliminates paths with high path
metrics, thereby reducing the computational complexity of the decoder. This makes the
Viterbi decoder more power-efficient and suitable for real-time applications.
The SMU works in conjunction with the Add-Compare-Select Unit (ACSU), which
updates the path metrics of all states at each time step. Instead of storing all paths, the SMU
retains only the most probable ones, discarding those whose path metrics exceed a
dynamically adjusted threshold, T. This threshold is determined based on the minimum path
metric at each decoding stage. Paths with significantly larger metrics than the best one are
unlikely to contribute to the correct decoded sequence and are removed to save memory and
processing power. By applying the T-algorithm, the SMU ensures that only a limited number
of survivor paths are maintained, leading to a significant reduction in storage requirements.
There are two common methods for implementing the SMU: traceback memory and
register-exchange. In the traceback method, only the necessary decisions are stored, and the
final path is reconstructed by tracing back through memory. This method is memory-efficient
but requires additional processing time. The register-exchange method, on the other hand,
continuously updates and shifts entire state sequences, making it faster but requiring more
memory. The choice of implementation depends on the trade-off between speed, power
consumption, and memory availability.
One of the key challenges in the SMU with the T-algorithm is setting the threshold T
dynamically to balance computational savings and decoding accuracy. If T is too tight, useful
paths might be discarded, leading to decoding errors. If T is too loose, computational savings
will be minimal. Adaptive threshold techniques are often used to optimize this balance based
on channel conditions. Overall, the SMU with the T-algorithm significantly enhances the
efficiency of the Viterbi decoder, making it ideal for applications in wireless communications,
satellite communication, and error correction in storage devices.
RESULTS
This waveform shows the Viterbi decoder with the T-algorithm in action, verifying its
functionality. The threshold-based pruning reduces computational complexity by discarding
unlikely paths. Key signals include received bits, encoded output, decoded output, path
metrics, and survivor paths. The results confirm that the decoder is working correctly while
optimizing processing efficiency.
XilinxVivado Output:
This synthesized design in Xilinx Vivado represents the Viterbi decoder with the T-
algorithm. It shows the placement of BMU, PMU, ACSU, and SMU on the FPGA. The layout
helps analyze resource utilization and optimize pruning for efficient decoding.
This image shows the floorplan of the Viterbi decoder with the T-algorithm in Xilinx
Vivado. It maps BMU, PMU, ACSU, and SMU onto the FPGA's logic and clock regions.
Good floorplanning improves speed, resource use, and power efficiency.
This synthesis report in Xilinx Vivado shows the resource usage and optimization of the
Viterbi decoder with the T-algorithm. It provides details on hardware efficiency, memory
usage, and timing performance. The report helps in evaluating the design’s feasibility for
FPGA implementation.
This power analysis report shows the estimated power consumption of the Viterbi
decoder with the T-algorithm in Xilinx Vivado. The total on-chip power is 0.076 W, with
static power consumption dominating. These results help evaluate the energy efficiency of the
decoder for FPGA implementation.
The design employs advanced techniques like pipelining and parallelism to maintain
high decoding speed, ensuring minimal degradation in clock speed despite the power-saving
measures. The bit error rate (BER) remains virtually unchanged, ensuring robust performance
even in noisy communication channels. This makes the architecture ideal for real-time, high-
speed applications in mobile communication, satellite links, and wireless sensor network.
FUTURE SCOPE :
The future scope of this low-power Viterbi Decoder design offers several promising
avenues for further enhancement and practical application. First, additional power
optimization techniques, such as dynamic voltage and frequency scaling (DVFS) or adaptive
power control, could be explored to further reduce power consumption while maintaining
performance, especially in fluctuating signal environments. Hardware implementation on
platforms like FPGAs or ASICs could provide insights into the real-world feasibility of the
design, allowing further optimizations in terms of area, power, and speed. Additionally, the
decoder could be extended to support more advanced modulation schemes like QAM and
OFDM, which are integral to modern communication standards such as 5G, enabling higher
data rates and improved performance in noisy channels. Another area of development could
be the adaptation of the decoder for soft-decision decoding, turbo codes, or LDPC codes,
which would offer improved error-correction capabilities, particularly in challenging
communication environment.
The low-power nature of the design also makes it ideal for emerging
technologies like IoT devices and satellite communication, where energy efficiency and
reliable data transmission are critical. Moreover, incorporating machine learning for adaptive
algorithm selection and energy-aware learning could further optimize decoding strategies
based on real- time conditions. Finally, integrating the Viterbi decoder into future
communication standards, such as 5G and beyond, could help achieve high data rates and
reliability, particularly in advanced applications like massive MIMO and beamforming.
Overall, this project’s future direction lies in enhancing the decoder’s power efficiency,
flexibility, and compatibility with next-generation communication systems.
Annexure 1
History of ModelSim:
1991 – Initial Release
Mentor Graphics acquired Model Technology, integrating ModelSim into its suite of
electronic design automation (EDA) tools.
Enhanced support for mixed VHDL and Verilog simulations, improving co-simulation
capabilities.
Implemented support for SystemVerilog, aligning with industry trends towards advanced
verification methodologies.
Significant user interface overhaul and integration with Mentor Graphics' verification tools,
enhancing user experience and productivity.
Introduced support for VHDL-2008 standards, providing designers with updated language
features.
Improved performance and scalability for large-scale designs, addressing the growing
complexity in digital systems.
Transitioned to a date-based versioning scheme, starting with 2019.1, to reflect the year and
release sequence.
Further integration with Siemens EDA tools, following Siemens' acquisition of Mentor
Graphics in 2017.
Introduced cloud-based simulation capabilities, aligning with industry shifts towards cloud
computing.
Enhanced support for the latest FPGA architectures and improved integration with other EDA
tools.
Annexure 2
SAMPLE PROGRAM
module convolutional_encoder
( input Clock,
input Reset,
input [8:0] DataIn, // 9-bit input data
output reg [17:0] EncodedOut // 18-bit encoded output (1/2 rate)
);
reg [2:0] shift_reg;
integer i;
module BMU (
input [1:0] ReceivedBits, // Received bits
input [1:0] EncodedBits, // Encoder output bits
output reg [1:0] BranchMetric // Hamming
distance
);
always @(*) begin
BranchMetric = (ReceivedBits[0] ^ EncodedBits[0]) +
(ReceivedBits[1] ^ EncodedBits[1]);
end
endmodule
module ACSU (
input [7:0] PathMetricIn0, // Path metric for state 0
input [7:0] PathMetricIn1, // Path metric for state 1
input [1:0] BranchMetric0, // Branch metric for state 0
input [1:0] BranchMetric1, // Branch metric for state 1
input [7:0] Threshold, // Threshold value
output reg [7:0] PathMetricOut, // Selected path metric
output reg SurvivorPath // Indicates the selected path
);
reg [7:0] Metric0, Metric1;
// Apply threshold
if (Metric0 > Threshold) Metric0 = 8'd255;
if (Metric1 > Threshold) Metric1 = 8'd255;
module PMU (
input [7:0] PathMetric0,
input [7:0] PathMetric1,
input [7:0] Threshold,
input SurvivorPath,
output reg [7:0] SelectedMetric
);
always @(*) begin
// Apply threshold to path metrics
if (PathMetric0 > Threshold && SurvivorPath == 0)
SelectedMetric = 8'd255; // Infinity-like value for invalid paths
else if (PathMetric1 > Threshold && SurvivorPath == 1)
SelectedMetric = 8'd255;
else
SelectedMetric = SurvivorPath ? PathMetric1 : PathMetric0;
end
endmodule
module SMU (
input Clock,
input Reset,
input SurvivorPath,
input [8:0] DataIn,
output reg [8:0] SurvivorSequence // Reconstructed decoded output
);
always @(posedge Clock or posedge Reset) begin
if (Reset) begin
SurvivorSequence <= 9'b0;
end else begin
SurvivorSequence <= DataIn; // Directly propagate DataIn
end
end
endmodule
module viterbi_decoder_top (
input Clock,
input Reset,
input [8:0] DataIn,
input [1:0] ReceivedBits,
input [7:0] Threshold,
output [8:0] DecodedOutput,
output [17:0] EncodedBits,
output [1:0] BranchMetric, // BMU output
output [7:0] PathMetricOut, // ACSU Path Metric Output
output SurvivorPath // ACSU Survivor Path Output
);
// Internal wires
wire [1:0] BranchMetric_internal;
wire [7:0] PathMetricOut_internal;
wire SurvivorPath_internal;
// Encoder
convolutional_encoder enc (
.Clock(Clock),
.Reset(Reset),
.DataIn(DataIn),
.EncodedOut(EncodedBits)
);
// BMU
BMU bmu (
.ReceivedBits(ReceivedBits),
.EncodedBits(EncodedBits[1:0]), // Compare first encoded pair
.BranchMetric(BranchMetric_internal)
);
// ACSU
ACSU acsu (
.PathMetricIn0(8'd0),
.PathMetricIn1(8'd10),
.BranchMetric0(BranchMetric_internal),
.BranchMetric1(BranchMetric_internal),
.Threshold(Threshold),
.PathMetricOut(PathMetricOut_internal),
.SurvivorPath(SurvivorPath_internal)
);
// SMU
SMU smu (
.Clock(Clock),
.Reset(Reset),
.SurvivorPath(SurvivorPath_internal),
.DataIn(DataIn),
.SurvivorSequence(DecodedOutput)
);
REFERENCES:
[1] A.J. Viterbi, “Error bounds for convolutional codes and an asymptotically optimum
[2] J. A. Heller and I. M. Jacobs. “Viterbi Decoding for Satellite and Space Communication”,
October 1971.
[3] S. Lin and D. J. Costello, “Error Control Coding”, Second Edition. Prentice-
[4] G. Forney, “The Viterbi Algorithm”, Proceedings of the IEEE, pp. 268-278, March
1973.
[5] J.K. Omura, “On the Viterbi decoding algorithm”, IEEE Transactions on
Areaefficient architectures for the viterbi algorithm Part II: Applications,” IEEE
[9] H.L. Lou, “Implementing the Viterbi Algorithm”, IEEE Signal Processing Magazine,
IEEE Transactions on Communication, Vol. 37, no. 11, pp. 1220-1222, Nov.
1989.
[11] J. J. Kong and K.K. Parhi, “Low-latency architectures for high-throughput rate
[14] P. J. Black and T. H.-Y. Meng, “Hybrid survivor path architectures for viterbi decoders”,
International Conference on Acoustics, Speech, and Signal Processing,
and area efficient high-throughput Viterbi decoders”, IEEE Journal of Solid State
[17]. F. Chan and D. Haccoun, “Adaptive viterbi decoding of convolutional codes over
[19]. D.Haccoun and M.Ferguson, Genralised stack algorithms for the decoding of
convolutional codes” IEEE Trans. Inform. Theory, vol. IT-21, pp. 638-651, nov. 1975.
[20]. C.F.L in, “A truncated viterbi algorithm approach to trellis codes”, Ph.D dissertation,
Dep. Electrical eng.,, Rensselaer polytechnic Inst., Troy,NY, Sept. 1986.
[21]. “Bandwidth modulations”, Consultative committee for space data system matera, Italy,
CCSDS 401(3.3.6)Green book, Issue 1, Apr. 2003.
[22]. R.A. Abdullah and N.R Shan hag,” Error resilient low-power viterbi decoder
architectures”, IEEE trans. Signal process, vol. 57, n0. 12 pp. 4906-4917, Dec. 2009.
[23]. J.Jin and C.Y. Tsui, “low-power limited-search parallel state viterbi
decoderimplementation based on scarece sate transition”.IEEE trans. Very large scale
integration(VLSI systems). Vol. 15., no. 11 pp. 1172-1176oct. 2007.
[24]. “ Viterbi decoding of convolutional codes”, MIT 6.2 DAFT lecture notes October , 2010.
[25]. J.He, H.Liu, and Z.Wang,” A fast ACSU architecture for viterbi decoder using T-
algorithm,” in proc. 43 rd IEEE asimolar conf. signal syst. Comput., nov 2009, pp. 231-235.