
Volume II, Issue IV, April 2015 IJRSI ISSN 2321 - 2705

Implementation of 32 Bit Floating Point MAC Unit to Feed Weighted Inputs to Neural Networks

Yadagiri Karri#, Prof. Rajesh Misra*
#Department of ECE, CUTM, JITM

Abstract- This paper describes an FPGA implementation of an IEEE-754 format single precision floating point MAC unit that is used in artificial neural networks to feed the weighted inputs to the neurons. The use of floating point numbers extends the range of data representation from very small to very large numbers, which is highly desirable for Artificial Neural Networks.

Keywords- FPGA, IEEE-754, floating point MAC, weighted inputs, Artificial Neural Networks.

I. INTRODUCTION
The main goal of this paper is the design of a floating point MAC unit. Representing real numbers in binary format requires floating point numbers, and in this paper floating point numbers are represented according to the IEEE 754 standard format. In single precision, a floating point number consists of a 32-bit word divided into 1 bit for the sign, 8 bits for the exponent with a bias of 127, and 23 bits for the significand. The standard supports two types of formats: the binary interchange format and the decimal interchange format. In many applications computation is done using floating point arithmetic. Earlier, floating point operations were mainly implemented in software, and hardware implementation was an option only for mainstream general purpose processors because the cost of the hardware was not reasonable; today every microprocessor has dedicated hardware for handling the various floating point operations. In artificial neural network applications a floating point MAC unit is required in order to achieve the desired performance, and because of advancements in reconfigurable logic such MAC units can now be implemented on FPGAs. The goal of this project is the FPGA implementation of a floating point MAC unit for ANN applications. A floating point number is given by equation (1):

Z = (-1)^S * 2^(E - bias) * (1.M)                (1)

Equation (1) represents the IEEE 32-bit single precision floating point format. One very important requirement of the IEEE-754 representation is that a number must be represented by its closest equivalent for the chosen precision, which means every operation is assumed to be performed with infinite precision and then rounded. Any floating point number is first converted into format (1), and further operations are then performed.

Floating-point (FP) addition is based on a sequence of mantissa operations: swap, shift, add, normalize and round. A floating point adder first compares the exponents of the two input operands, then swaps the operands and shifts the mantissa of the smaller number to align them. The number has to be adjusted if the incoming number is negative. Finally, the sum is renormalized, the exponent is adjusted accordingly, and the resulting mantissa is truncated by an appropriate rounding scheme. If extra speed is required, FP adders use leading-zero anticipatory (LZA) logic to carry out the pre-decoding for the normalization shift in parallel with the mantissa addition.

Floating point multiplication basically involves XORing the signs, multiplying the significands and adding the exponents of the two numbers. The exponent obtained after the addition is called the tentative result exponent, and the bias has to be subtracted from it. The result is a normalized number if the MSB of the significand product is 1. In this paper a floating point multiplier, a floating point adder/subtractor and an accumulator are designed, and a floating point MAC unit is then built from them; a MAC basically consists of an adder, a multiplier and an accumulator.

In the IEEE 754 format certain values are reserved for special number representations, as follows. If the exponent is 0 and the mantissa is 0, the number is zero. If the exponent is 0 and the mantissa is greater than 0, the number is a subnormal number. If the exponent lies strictly between 0 and 255, the number is a normal number. If the exponent is 255 and the mantissa is 0, the number is infinity. If the exponent is 255 and the mantissa is greater than 0, the value is not a number (NaN).

Table 1. Special Numbers

S.No  Exponent     Mantissa  Output
1     = 0          = 0       Zero
2     = 0          > 0       Subnormal
3     0 < E < 255  any       Normal
4     = 255        = 0       Infinity
5     = 255        > 0       NaN
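
Table 1 translates into a handful of comparisons on the two fields. The following minimal VHDL sketch illustrates this decoding; the entity and port names are our own assumptions, not taken from the paper's design:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Illustrative decoder for the special-number classes of Table 1:
-- one flag per class, driven by the exponent and mantissa fields.
entity special_numbers is
  port (
    exponent  : in  std_logic_vector(7 downto 0);
    mantissa  : in  std_logic_vector(22 downto 0);
    is_zero   : out std_logic;  -- exponent = 0,   mantissa = 0
    is_subn   : out std_logic;  -- exponent = 0,   mantissa > 0
    is_normal : out std_logic;  -- 0 < exponent < 255
    is_inf    : out std_logic;  -- exponent = 255, mantissa = 0
    is_nan    : out std_logic   -- exponent = 255, mantissa > 0
  );
end entity;

architecture rtl of special_numbers is
  signal exp_all0, exp_all1, man_all0 : std_logic;
begin
  exp_all0 <= '1' when unsigned(exponent) = 0   else '0';
  exp_all1 <= '1' when unsigned(exponent) = 255 else '0';
  man_all0 <= '1' when unsigned(mantissa) = 0   else '0';

  is_zero   <= exp_all0 and man_all0;
  is_subn   <= exp_all0 and not man_all0;
  is_normal <= not exp_all0 and not exp_all1;
  is_inf    <= exp_all1 and man_all0;
  is_nan    <= exp_all1 and not man_all0;
end architecture;

Note that the normal flag deliberately ignores the mantissa: once the exponent is in the normal range, any significand is valid.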
The concept of the ANN is borrowed from biology, where the neural network plays an important and key role in the human body: work in the human body is done with the help of the neural network. A neural network is just a web of interconnected neurons, millions and millions in number. With the help of these interconnected neurons all the parallel processing in the human body is done, and the human body is the best example of parallel processing.

An Artificial Neuron is basically an engineering approximation of the biological neuron: a device with many inputs and one output. An ANN consists of a large number of simple processing elements that are interconnected with each other and arranged in layers. Like biological neurons, the artificial neurons receive inputs from other elements or other artificial neurons; after the inputs are weighted and added, the result is transformed by a transfer function into the output. The transfer function may be, for example, a sigmoid, a hyperbolic tangent or a step function.

Fig 1 Artificial Neuron

Fig 2 Functions of an Artificial Neuron

II. RELATED WORK

Guillermo Marcus presents a multiplier and an adder/subtractor for single precision floating point numbers in IEEE format [2]; they have pipelined architectures and are implemented in VHDL. Mohamed Al-Ashrafy presents a floating point multiplier in IEEE single precision format [1]; the multiplier does not implement rounding and just presents the significand multiplication result. Carlos Minchola has presented an FPGA implementation of a Decimal Floating Point (DFP) adder/subtractor. Lamiaa S. A. Hamid [9] has presented a high speed generic Floating Point Unit (FPU) consisting of multiplier and adder/subtractor units; a novel multiplication algorithm is proposed and used in the multiplier implementation.

III. ORGANIZATION OF WORK

In this section the IEEE 754 single precision format, the floating point adder, the floating point multiplier and the floating point MAC unit are explained. The MAC consists of a multiplier and an accumulator unit: the multiplier multiplies two numbers, and the result is added to the number already stored in the accumulator.

3.1. STANDARD IEEE 754 FORMAT

The standard binary floating point format was issued by IEEE in 1985 [8]. It covers different types of floating-point formats (e.g. single, double), special coding representations (e.g. 0, +∞, −∞), rounding mechanisms, arithmetic operations, etc. The standard radix-2 binary floating-point representation can be written as in equation (1), with S as the sign bit and M as the mantissa or fraction.

Figure 3. IEEE single precision floating point format: sign (bit 31), exponent (bits 30-23), mantissa (bits 22-0)
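
The field layout of Figure 3 amounts to three slices of the 32-bit word. A minimal VHDL sketch of this split (the entity and port names are our own, for illustration only):

library ieee;
use ieee.std_logic_1164.all;

-- Illustrative field split per Figure 3: bit 31 is the sign,
-- bits 30-23 the biased exponent, bits 22-0 the mantissa.
entity fp_fields is
  port (
    x        : in  std_logic_vector(31 downto 0);
    sign     : out std_logic;
    exponent : out std_logic_vector(7 downto 0);
    mantissa : out std_logic_vector(22 downto 0)
  );
end entity;

architecture rtl of fp_fields is
begin
  sign     <= x(31);
  exponent <= x(30 downto 23);
  mantissa <= x(22 downto 0);
end architecture;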

3.2. FLOATING POINT ADDER

The computation is done in four major steps:

1. Sorting: put the number with the larger magnitude on the top and the number with the smaller magnitude on the bottom.
2. Alignment: align the two numbers so that they have the same exponent. This is done by adjusting the exponent of the small number to match the exponent of the big number; the significand of the small number has to shift to the right according to the difference in exponents.
3. Addition/subtraction: add or subtract the significands of the two aligned numbers.
4. Normalization: adjust the result to the normalized format. Three types of normalization procedures may be needed:
i) after a subtraction, the result may contain leading zeros in front;
ii) after a subtraction, the result may be too small to be normalized and thus needs to be converted to zero;
iii) after an addition, the result may generate a carry-out bit.

During alignment and normalization, the lower bits of the significand are discarded when shifted out. The design is divided into four stages, each corresponding to a step in the foregoing algorithm. The circuit in the first stage compares the magnitudes and finds the larger and the smaller number. The comparison is done between exp1&frac1 and exp2&frac2, which implies that the exponents are compared first, and if they are the same, the significands are compared.

Figure 4 Block Diagram for floating point addition

The circuit in the second stage performs alignment: it first calculates the difference between the two exponents, which is expb - exps, and then shifts the significand fracs to the right by this amount.
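
This alignment step can be sketched combinationally in VHDL as follows. The names expb, exps and fracs follow the text; everything else is our assumption, including that the sorting stage guarantees expb >= exps:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Illustrative alignment stage: shift the smaller significand right
-- by the exponent difference. Significands carry the hidden '1'.
entity fp_align is
  port (
    expb  : in  std_logic_vector(7 downto 0);   -- larger exponent
    exps  : in  std_logic_vector(7 downto 0);   -- smaller exponent
    fracs : in  std_logic_vector(23 downto 0);  -- smaller significand
    fraca : out std_logic_vector(23 downto 0)   -- aligned significand
  );
end entity;

architecture rtl of fp_align is
  signal d : natural range 0 to 255;
begin
  -- Exponent difference; the sorting stage ensures this is non-negative.
  d <= to_integer(unsigned(expb)) - to_integer(unsigned(exps));
  -- The right shift discards the low bits, which is exactly the
  -- precision loss during alignment mentioned earlier.
  fraca <= std_logic_vector(shift_right(unsigned(fracs), d));
end architecture;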
The circuit in the third stage performs sign-magnitude addition; note that the operands are extended by 1 bit to accommodate the carry-out bit. The circuit in the fourth stage performs normalization, which adjusts the result so that the final output conforms to the normalized format. The normalization circuit is constructed in three segments. The first segment counts the number of leading zeros; it is somewhat like a priority encoder. The second segment shifts the significand to the left by the amount specified by the leading-zero counting circuit. The last segment checks the carry-out and zero conditions and generates the final normalized number.
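
The first segment can indeed be written like a priority encoder. A minimal VHDL sketch of a 24-bit leading-zero counter (our own illustration, with assumed names):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Illustrative leading-zero counter for a 24-bit significand.
entity lzc24 is
  port (
    frac  : in  std_logic_vector(23 downto 0);
    count : out unsigned(4 downto 0)   -- 0 .. 24
  );
end entity;

architecture rtl of lzc24 is
begin
  process (frac)
    variable n : integer;
  begin
    n := 24;                       -- default: all bits are zero
    for i in 23 downto 0 loop      -- find the highest set bit
      if frac(i) = '1' then
        n := 23 - i;               -- leading zeros before the first '1'
        exit;
      end if;
    end loop;
    count <= to_unsigned(n, 5);
  end process;
end architecture;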
3.3. FLOATING POINT MULTIPLIER

Given two floating point numbers n1 and n2, the product n after multiplication is:

n = n1 * n2
  = (-1)^s1 * m1 * 2^e1 * (-1)^s2 * m2 * 2^e2
  = (-1)^(s1 XOR s2) * (m1 * m2) * 2^(e1 + e2)
In Figure 5 we present a general multiplier block diagram. The sign, exponent and mantissa are first extracted from both numbers. Pipelining has been used in the design of the multiplier. The sign bits of the two numbers are XORed. The 8-bit exponents are added, and the bias is then subtracted from the sum; the subtraction is easily achieved by adding a carry-in to the sum and then subtracting 128 from it by complementing the most significant bit. A 48-bit multiplier is used for multiplying the significands, and the 28 most significant bits of the product are kept: 24 bits are the mantissa bits, 3 bits are for proper rounding, and 1 bit is for range overflow. The result is then normalized for proper approximation to the closest value; the normalization consists of a possible single-bit right shift, with the exponent incremented correspondingly, depending on the overflow bit. The resultant sign, exponent and mantissa are then obtained. In this paper three floating point multipliers have been designed, using a carry save adder, a carry look-ahead adder and a ripple carry adder; the same flow is used for all of them, and only the adder used for the exponent addition differs.
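
The exponent step described above (add with a carry-in of 1, then subtract 128 by complementing the most significant bit, which together remove the bias of 127) can be sketched in VHDL as follows; this is our illustration of the trick, with exponent overflow and underflow handling omitted:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Illustrative tentative-exponent step: e_out = e1 + e2 - 127.
entity exp_add_bias is
  port (
    e1, e2 : in  std_logic_vector(7 downto 0);
    e_out  : out std_logic_vector(7 downto 0)
  );
end entity;

architecture rtl of exp_add_bias is
  signal sum : unsigned(7 downto 0);
begin
  -- e1 + e2 + 1 (the carry-in), truncated to 8 bits in this sketch.
  sum <= unsigned(e1) + unsigned(e2) + 1;
  -- Complementing the MSB subtracts 128, giving e1 + e2 - 127.
  e_out(7)          <= not sum(7);
  e_out(6 downto 0) <= std_logic_vector(sum(6 downto 0));
end architecture;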

Figure 5 Block Diagram for floating point multiplier

3.4. FLOATING POINT MAC UNIT

The MAC unit is basically composed of an adder, a multiplier and an accumulator. The inputs given to the MAC are fetched from a memory location and fed to the multiplier block, which performs the multiplication and passes the result to the adder; the adder adds it to the previously accumulated result and stores the sum back in a memory location. The complete process is achieved in a single cycle. The design consists of the 32-bit floating point adder and multiplier and one register serving as the memory location. The most important feature that differentiates a general purpose processor from a digital signal processor is its multiply and accumulate unit: every DSP algorithm requires some form of multiplication and accumulation, which makes the MAC one of the most important blocks in DSP systems. The adders usually used are carry save, carry select or ripple carry adders, because of their speed. The computation of the key DSP sum, i.e. Σ b(k)·x(n-k), is easily realized by this operation, as the sketch below shows.
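
A structural VHDL sketch of such a MAC is given below; the component names fp_mult and fp_add stand in for the multiplier and adder described in sections 3.2 and 3.3 and are our assumptions, not the paper's actual entity names:

library ieee;
use ieee.std_logic_1164.all;

-- Illustrative MAC structure: multiplier feeding an adder whose
-- second operand is the accumulator register.
entity fp_mac is
  port (
    clk, rst : in  std_logic;
    a, b     : in  std_logic_vector(31 downto 0);
    acc_out  : out std_logic_vector(31 downto 0)
  );
end entity;

architecture structural of fp_mac is
  component fp_mult is
    port (x, y : in  std_logic_vector(31 downto 0);
          p    : out std_logic_vector(31 downto 0));
  end component;
  component fp_add is
    port (x, y : in  std_logic_vector(31 downto 0);
          s    : out std_logic_vector(31 downto 0));
  end component;

  signal product, sum, acc : std_logic_vector(31 downto 0);
begin
  u_mult : fp_mult port map (x => a, y => b, p => product);
  u_add  : fp_add  port map (x => product, y => acc, s => sum);

  -- Accumulator register holding the running sum of products.
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        acc <= (others => '0');   -- all-zeros word is +0.0 in IEEE-754
      else
        acc <= sum;
      end if;
    end if;
  end process;

  acc_out <= acc;
end architecture;

Registering only the accumulator keeps the multiply and add combinational, matching the single-cycle operation described above.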
Figure 6 Block Diagram for floating point MAC

IV. RESULTS

The proposed MAC unit is implemented with Xilinx ISE Design Suite 9.2, targeting a device of the Virtex-II Pro family. The device utilization for the floating point MAC unit is given below.

Table 2. Device Utilization Summary of FPMAC

Logic utilization        Used  Available  Utilization
Number of slices          610       1408          43%
Number of 4 input LUTs   1123       2816          39%
Number of bonded IOBs      99        140          70%
Number of MULT18x18s        4         12          33%

Figure 7. Simulation of floating point MAC unit

V. CONCLUSION

An FP adder and an FP multiplier are presented in this paper. Both have pipelined architectures, are implemented in VHDL and are fully synthesizable. A complete MAC unit is then designed using the floating point adder and multiplier, and its FPGA implementation is carried out. This MAC unit is used to feed the weighted inputs to a neuron in Artificial Neural Networks.

REFERENCES

[1]. Mohamed Al-Ashrafy, Ashraf Salem and Wagdi Anis, "An Efficient Implementation of Floating Point Multiplier", in Proceedings of IEEE, 2011.
[2]. Guillermo Marcus, Patricia Hinojosa, Alfonso Avila and Juan Nolazco-Flores, "A Fully Synthesizable Single-Precision, Floating-Point Adder/Subtractor and Multiplier in VHDL for General and Educational Use", in Proceedings of the Fifth IEEE International Caracas Conference on Devices, Circuits and Systems, Dominican Republic.
[3]. Xilinx Inc, ISE, http://www.xilinx.com.
[4]. Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Designs, 1st ed., Oxford: Oxford University Press, 2000.
[5]. John G. Proakis and Dimitris G. Manolakis, Digital Signal Processing: Principles, Algorithms and Applications, Third Edition, 1996.
[6]. D. Patterson and J. Hennessy, Computer Organization and Design: The Hardware/Software Interface, Morgan Kaufmann, 2005.
[7]. Mentor Graphics Inc, FPGA Advantage, http://www.mentor.com/fpgaadvantage.
[8]. IEEE Standards Board, IEEE-754, IEEE Standard for Binary Floating-Point Arithmetic, New York: IEEE, 1985.
[9]. Lamiaa S. A. Hamid, Khaled A. Sheata, Hassan El-Ghitani and Mohamed Elsaid, "Design of Generic Floating Point Multiplier and Adder/Subtractor Units", in Proceedings of the 12th IEEE International Conference on Computer Modelling and Simulation, 2010.
[10]. Yajuan Ch. and Q. Wu, "Design and implementation of PID controller based on FPGA and genetic algorithm", in Proceedings of the 2011 International Conference on Electronics and Optoelectronics, Dalian: IEEE, 2011, pp. 308-311. DOI: 10.1109/ICEOE.2011.6013491.
[11]. G. Zhenbin, X. Zeng, J. Wang and J. Liu, "FPGA implementation of adaptive IIR filters with particle swarm optimization algorithm", in 11th IEEE Singapore International Conference on Communication Systems, Guangzhou: IEEE, 2008, pp. 1364-1367. DOI: 10.1109/ICCS.2008.4737406.
[12]. T. Otsuka, T. Aoki, E. Hosoya and A. Onozawa, "An Image Recognition System for Multiple Video Inputs over a Multi-FPGA System", in IEEE 6th International Symposium on Embedded Multicore SoCs, Aizu-Wakamatsu: IEEE, 2012, pp. 1-7. DOI: 10.1109/MCSoC.2012.33.
[13]. A. Ramakrishnan and J. M. Conrad, "Analysis of floating point operations in microcontrollers", in Proceedings of IEEE Southeastcon, Nashville: IEEE, 2011, pp. 97-100. DOI: 10.1109/SECON.2011.5752913.
[14]. K. Underwood, "FPGAs vs. CPUs", in Proceedings of the 2004 ACM/SIGDA 12th International Symposium on Field-Programmable Gate Arrays, New York: ACM Press, 2004, pp. 171-180. DOI: 10.1145/968280.968305.
