
High Performance FPGA based Floating Point Arithmetics

Project report for Computer Arithmetic Algorithms

Andreas Ehliar and Per Karlström


{ehliar,perk}@isy.liu.se

June 13, 2006

1 Introduction
We decided to investigate what floating point arithmetic performance it is possible to achieve in a modern FPGA. In order to gain a thorough understanding of the issues involved we decided to implement a fast FPU ourselves. We expect that an expert in the field could come up with a better solution, especially given the limited amount of time available for this project. However, a search on the Internet did not turn up any references to high performance FPUs on Virtex-4 FPGAs.
The Virtex-4 uses a relatively standard FPGA architecture with CLBs consisting of four slices, each containing two 4-LUTs and two flip-flops. The FPGA also has a large number of embedded memories and DSP blocks containing high speed multipliers and adders. In addition, the Virtex-4 contains a number of specialized components which were not used in this project. For further details about the Virtex-4 FPGA, see the Virtex-4 User Guide [2]. The DSP blocks are thoroughly described in the XtremeDSP User Guide [3].
In order to test the FPU in a realistic environment we decided to implement a complex radix-2 butterfly kernel with the FPU adder and multiplier we were going to implement. Such a kernel can be used to implement, for example, higher radix FFTs.
We selected a simple floating point format with no denormalized numbers and neither NaN nor Inf.

2 Methodology
To verify the final result we implemented a C++ class for floating point numbers. The number of bits in the mantissa and exponent could be configured from 1 to 31 bits. The C++ model was used to generate the test vectors for the RTL test benches.
An initial RTL model was then developed and tested against the floating point test data. The RTL model was written with the hardware in mind, but it was not optimized for the Virtex-4 FPGA.
The performance of the initial RTL model was evaluated and the most critical part of the design was optimized to better fit the FPGA. After each optimization, the model was verified with the test benches. This was repeated until the performance was satisfactory.
Finally, the design was tested in an FPGA by downloading test data to the FPGA and uploading the results from the butterfly calculation for verification against test patterns generated by the C++ model.

3 Floating point format


The first version of the RTL code was fairly configurable with regard to mantissa and exponent sizes. In order to ease the development of an optimized FPGA implementation, we decided to limit the floating point format to a maximum of one sign bit, 10 bits of exponent, and 15 bits of mantissa with an implicit one. The mantissa is represented using regular unsigned binary numbers. The exponent is encoded using excess-511.
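As an illustration, packing and unpacking this format in software might look as follows. This is our own sketch, not the project's C++ model: the field widths and excess-511 bias come from the text, while the helper names and the truncating conversion are our assumptions.

```cpp
#include <cmath>
#include <cstdint>

// Sketch of the report's format: 1 sign bit, 10-bit exponent in
// excess-511, 15-bit mantissa with an implicit leading one.
// An all-zero word encodes zero (no denormals, NaN, or Inf).
constexpr int EXP_BITS  = 10;
constexpr int MANT_BITS = 15;
constexpr int BIAS      = 511;   // excess-511 exponent encoding

uint32_t fp_pack(double x) {
    if (x == 0.0) return 0;
    uint32_t sign = (x < 0.0) ? 1u : 0u;
    int e;
    double frac = std::frexp(std::fabs(x), &e);   // |x| = frac * 2^e, frac in [0.5, 1)
    double m = frac * 2.0;                        // significand in [1, 2)
    uint32_t exp_field = (uint32_t)((e - 1) + BIAS);
    // Truncate the fractional mantissa bits (a simplification of ours).
    uint32_t mant = (uint32_t)((m - 1.0) * (1 << MANT_BITS));
    return (sign << (EXP_BITS + MANT_BITS)) | (exp_field << MANT_BITS) | mant;
}

double fp_unpack(uint32_t w) {
    if (w == 0) return 0.0;
    uint32_t sign = w >> (EXP_BITS + MANT_BITS);
    int exp_field = (int)((w >> MANT_BITS) & ((1u << EXP_BITS) - 1));
    uint32_t mant = w & ((1u << MANT_BITS) - 1);
    double m = 1.0 + (double)mant / (1 << MANT_BITS);  // restore implicit one
    double mag = std::ldexp(m, exp_field - BIAS);
    return sign ? -mag : mag;
}
```

With this layout, 1.0 packs to a zero mantissa with exponent field 511, matching the excess-511 encoding above.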

4 Multiplier
The multiplier is quite simple to construct due to the large number of multiplier blocks available in the FPGA. A single multiplier is used for the mantissa and an adder is used for the exponent. It is also necessary to normalize the result of the multiplication. This normalizer is very simple since, given normalized inputs to the multiplier, the most significant bit of the product can only be located at one of two bit positions. The overall architecture of the multiplier is shown in figure 1.
A simple rounding scheme was chosen where the rounding is done before the normalization. This can be implemented essentially for free in the DSP48 blocks in the FPGA. This can be contrasted with the rounding schemes used in IEEE-754, where rounding is performed after normalization, with an extra small normalization step required to check for overflow after rounding; that would not map well onto the DSP48 block. Except for the utilization of the DSP48 block, no FPGA specific optimizations were performed in the multiplier block.
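The round-before-normalize datapath can be sketched as a behavioural model. This is our own illustration under stated assumptions, not the project's RTL: 16-bit significands with the implicit one made explicit (bit 15 set, value in [1,2) scaled by 2^15), and unbiased integer exponents to keep the exponent adder trivial.

```cpp
#include <cstdint>

// Behavioural sketch of the multiplier datapath in section 4.
// sig: 16-bit significand with bit 15 = implicit one; exp: unbiased.
struct Fp { int exp; uint32_t sig; };

Fp fp_mul(Fp a, Fp b) {
    // 16x16 -> 32-bit product; for normalized inputs the leading one
    // lands in bit 31 or bit 30 (product in [1,4) scaled by 2^30).
    uint32_t prod = a.sig * b.sig;
    // Round BEFORE normalizing: one fixed constant added to the raw
    // product, which maps onto the DSP48's post-adder for free.
    prod += 1u << 14;
    int e = a.exp + b.exp;
    if (prod & (1u << 31)) {
        return { e + 1, prod >> 16 };   // product in [2,4): shift right once
    }
    return { e, prod >> 15 };           // product in [1,2): already normalized
}
```

The single fixed rounding constant is exact for the unshifted case and half an ulp off when the product needs the extra shift — precisely the simplification this scheme trades for the free DSP48 implementation.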

Figure 1: The floating point multiplier architecture (mantissa multiply followed by Round and Normalize stages)

5 Adder/Subtracter
A floating point adder is more complicated than a floating point multiplier. The basic architecture of the adder is shown in figure 2. The first step compares the operands and swaps them if necessary so that the larger number always enters the left path. This step also adds the implicit one if the input operands are non-zero. In the next step, the smaller number is aligned so that the exponents of both operands match. After this step, an addition or subtraction of the two numbers is performed. A subtraction can never produce a negative result because of the earlier compare and swap step.
The normalization step is the final step. It is implemented using two pipeline stages. The first stage looks at the mantissa in 4-bit intervals as shown in figure 3. The first module looks at the first four bits and outputs a normalized result assuming a one was found in these bits. An extra output signal, shown as gray lines in the figure, is used to signal that all four bits were zero. The second module assumes that the first four bits were all zero and instead looks at the following four bits, outputting a normalized result. This is repeated for the remaining bits of the mantissa. The next stage decides which of the previous results should be used. If all bits were zero, a zero is output as the result. The value needed to correct the exponent is generated according to the same scheme; this is shown as dashed lines in the figure.
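The steps above can be sketched as a behavioural model. This is our own illustration, not the project's RTL: significand layout (16 bits, bit 15 = implicit one), unbiased exponents, and helper names are all our assumptions, and the final normalization is written as a simple loop rather than the two-stage hardware scheme.

```cpp
#include <cstdint>
#include <utility>

// Behavioural sketch of the adder datapath in section 5.
struct Fp { bool neg; int exp; uint32_t sig; };  // sig: bit 15 = implicit one

Fp fp_add(Fp a, Fp b) {
    // Step 1: compare and swap so the larger operand takes the left
    // path; a subtraction then never yields a negative significand.
    if (b.exp > a.exp || (b.exp == a.exp && b.sig > a.sig)) std::swap(a, b);
    // Step 2: align the smaller significand to the larger exponent.
    int d = a.exp - b.exp;
    uint32_t small = (d < 32) ? (b.sig >> d) : 0;
    // Step 3: effective addition or subtraction.
    uint32_t sum = (a.neg == b.neg) ? a.sig + small : a.sig - small;
    if (sum == 0) return { false, 0, 0 };        // exact cancellation
    // Step 4: normalize — bring the leading one back to bit 15.
    int e = a.exp;
    if (sum >> 16) { sum >>= 1; ++e; }           // carry out of the add
    while (!(sum & (1u << 15))) { sum <<= 1; --e; }
    return { a.neg, e, sum };
}
```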

Figure 2: The overall architecture for the adder (Compare/Select, Align, Add, and Find-leading-one stages)

Figure 3: The normalizer architecture (four "ff1 in 4" modules with shifters feeding a priority decoder and a 4-to-1 mux)
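The chunked scheme of figure 3 can be sketched behaviourally as follows. This is our own software model, assuming a 16-bit unnormalized mantissa; in hardware the four chunk modules run in parallel and a priority decoder selects among them, whereas here a sequential loop stands in for that selection.

```cpp
#include <cstdint>

// Sketch of the two-stage normalizer in figure 3 for a 16-bit mantissa.
struct NormResult { uint16_t mant; int exp_adj; bool zero; };

NormResult normalize16(uint16_t m) {
    // Four "ff1 in 4" modules, each looking at one 4-bit interval.
    for (int chunk = 0; chunk < 4; ++chunk) {
        uint16_t bits = (uint16_t)((m >> (12 - 4 * chunk)) & 0xF);
        if (bits == 0) continue;   // "gray line" signal: all four bits zero
        // The leading one is in this chunk; shift it up to bit 15.
        for (int b = 0; b < 4; ++b) {
            int pos = 15 - 4 * chunk - b;
            if (m & (1u << pos)) {
                int shift = 15 - pos;     // exponent correction amount
                return { (uint16_t)(m << shift), -shift, false };
            }
        }
    }
    return { 0, 0, true };   // all bits zero: a zero is output as the result
}
```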

5.1 FPGA optimizations

Initially the adder met timing at 250 MHz, but it no longer did so once it was inserted into a complex butterfly. At this point further optimizations were required. The first FPGA specific optimization was to make sure that the adder/subtracter was implemented using only one LUT per bit. A standard adder structure compared to an adder structure supporting both addition and subtraction is shown in figure 4.
Another optimization concerned the exponent selection in the normalization step. At first, this was implemented using a 5-to-1 mux in front of an adder. By implementing a 2-to-1 mux directly in the same LUT used for the addition, a smaller 4-to-1 mux could be used in front of the adder. In order to make sure that this mux was placed near the adder, RLOC directives were used to place the components in relation to each other.
In both the exponent and mantissa muxes, the reset signal of the flip-flop was used to set the result to zero instead of embedding this logic into the LUT.
Another technique that we tried was to construct a 4-to-1 mux combined with a priority decoder, as shown in figure 5. This mux should achieve slightly better performance than an ordinary mux since there is only one level of LUTs. In a later stage of the implementation we moved the OR function to the previous pipeline stage as well.

Figure 4: A regular adder using 1 LUT/bit compared to an adder/subtracter using 1 LUT/bit

Figure 5: A priority decoder combined with a 4-to-1 mux. Functionality table:

X Y Z | O
1 x x | A
0 1 x | B
0 0 1 | C
0 0 0 | D
6 Floorplanning
In order to improve the performance of the final system we tried to locate different pipeline stages close to each other by using RLOC directives. Doing this resulted in more regularity and a smaller area footprint.

7 Results and Discussion


Knowing the FPGA architecture is important when writing efficient HDL code. A good understanding of the Virtex-4 architecture enables the designer to use the fabric in ways not (yet) supported by the synthesis tools. In some cases the gains can be substantial; in other cases the gains are more limited.
With the initial RLOC optimizations we achieved better timing results, but as soon as we tried to use RLOC over pipeline boundaries we got worse timing results. Eventually we managed to reach a 250 MHz clock frequency for the radix-2 butterfly by using RLOC. The floorplan for this implementation is shown in figure 6. At this point, however, a number of low level optimizations had been done which enabled the design to meet timing at 250 MHz even without the use of RLOC. Unfortunately, the RLOC:ed radix-4 butterfly could not be fit into the FPGA because one radix-2 butterfly was too wide, and we did not have time to correct this problem. Thus the radix-4 butterfly could only be placed without RLOC directives. The radix-4 butterfly also met timing at 250 MHz. The floorplan for the radix-4 butterfly is shown in figure 7. Table 1 lists the final resource utilization in the FPGA for various components. The radix-2 and radix-4 are complex valued butterflies, whereas the floating point adder and multiplier operate on real values.

Resource     Radix-4   Radix-2   Adder   Multiplier   Available
LUTs         10104     2514      73      372          30720
Flip Flops   14432     3660      63      325          30720
DSP48        16        4         1       0            192

Table 1: Component resource utilization

There are a number of opportunities for further optimization in this design. For example, instead of using CLBs for the shifting, a multiplier could be used for this task by sending in the number to be shifted as one operand and a bit vector with a single one in a suitable position as the other operand.
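The idea can be demonstrated arithmetically. This is our own minimal sketch of the proposed optimization, not something implemented in the project: shifting x left by s positions is the same as multiplying x by the one-hot constant 2^s, so a DSP block multiplier can stand in for LUT-based shifter logic.

```cpp
#include <cstdint>

// A barrel shift expressed as a multiplication, as suggested above:
// the DSP multiplier does the work instead of CLB shifter logic.
uint32_t shift_via_mul(uint16_t x, unsigned s) {
    uint32_t one_hot = 1u << s;      // bit vector with a single one at position s
    return (uint32_t)x * one_hot;    // equivalent to (uint32_t)x << s
}
```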
If the application of the floating point blocks is known it is possible to do some application specific optimizations. For example, in a butterfly with an adder and a subtracter operating on the same operands, the first compare stage could be shared between them. If the application can tolerate it, further pipelining could increase the performance significantly. If the latency tolerance is very high, bit-serial arithmetic could probably be used as well. In this project we limited the pipeline depth to compare well with the FPUs used in CPUs.
According to a post on comp.arch.fpga it is possible to achieve 400 MHz performance for IEEE single precision floating point arithmetic. Few details are available, but a key technique is to use the DSP48 block for the adder, since an adder implemented with a carry chain would be too slow. The post-normalization step is supposedly implemented using both DSP48 blocks and Block RAMs [1]. The pipeline depth of this implementation is not known.

Figure 6: RLOC:ed complex butterfly

Figure 7: Non-RLOC:ed radix-4 butterfly
It would also be interesting to look at the newly announced Virtex-5 architecture. Its 6-LUT architecture should reduce the number of logic levels and the amount of routing all over the design. Unfortunately, no tools that target the Virtex-5 are publicly available today.

8 RLOC related problems


It is relatively easy to RLOC individual pipeline stages, but once we tried to hierarchically RLOC several pipeline stages, the performance suddenly decreased. Generally, the place and route tool seems to place modules quite far from each other. This balances the different pipeline stages and eases routing due to lower congestion. However, as soon as we started to RLOC several pipeline stages together, the distance between two non-RLOCed stages grew larger and it became harder to meet timing. In the end, we had to RLOC at least some parts of all modules involved in the design to be able to meet timing.

9 Conclusions
The Virtex-4 FPGA is not really suited for floating point arithmetic, although with the techniques detailed in this report it is possible to get relatively decent performance. We would have liked to achieve higher performance, though. We also realized that the placer does a pretty good job, and it is not trivial to achieve higher performance by doing some of the placement by hand.

References
[1] Ray Andraka; Re: Floating point reality check, news:comp.arch.fpga, 14 May 2006.
[2] Xilinx; Virtex-4 User Guide.
[3] Xilinx; XtremeDSP for Virtex-4 FPGAs User Guide.

