
VLSI Implementation of Bit Serial Architecture based Multiplier in Floating Point Arithmetic

Jitesh R Shinde
Research Scholar & IEEE Member, Nagpur, India

Suresh S Salankar
Professor, Electronics & Communication Department, G.H. Raisoni College of Engineering, Nagpur, India

Abstract— VLSI implementation of neural network processing or digital signal processing based applications comprises a large number of multiplication operations. A key design issue in such applications therefore depends on efficient realization of the multiplier block, which involves a trade-off between precision, dynamic range, area, speed and power consumption of the circuit.

The study in this paper investigates the performance of a VLSI implementation of a bit serial architecture based multiplier (Type III) in floating point arithmetic (IEEE 754 single precision format).

Results of implementation of a 32x32 bit multiplier on FPGA as well as on a backend VLSI design tool indicate that the bit serial architecture based multiplier design provides a good trade-off in terms of area, speed, power and precision over the array multiplier and other multiplier approaches proposed over the last decade. In other words, the bit serial architecture based multiplier (Type III) approach may provide a good multi-objective solution for VLSI circuits.

Keywords—Array Multiplier, bit serial architecture based multiplier, floating point arithmetic, Not a Number (NaN), underflow or de-normalized number, Ripple Carry Adder (RCA).

I. INTRODUCTION

Multiplication is the fundamental operation in neural network processing or digital signal processing (DSP) based applications. There are certain applications in this domain wherein not only should the design be area-power and speed efficient, but it also demands a high degree of precision and dynamic range. Therefore, in such cases implementation of the multiplier block in floating point arithmetic with adequately chosen parameters appears to be a good compromise [1].

For performing floating point multiplication the numbers are represented in the desired floating point format. The product is obtained by multiplying the mantissas and adding the exponents. The sign bits are combined separately to determine the sign of the product [2].
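To make these steps concrete, the following Python sketch models the procedure just described (combine the signs, add the exponents with the double bias removed, multiply the 24 bit significands and renormalize). It is a simplified software illustration that assumes normalized inputs and truncates instead of rounding; it is not the hardware design presented later in this paper.

import struct

def fp32_mul_sketch(a_bits, b_bits):
    """Multiply two IEEE 754 single precision words given as 32-bit integers.
    Simplified model: assumes normalized inputs, truncates instead of rounding."""
    sign = ((a_bits >> 31) ^ (b_bits >> 31)) & 1           # sign of the product
    exp_a = (a_bits >> 23) & 0xFF
    exp_b = (b_bits >> 23) & 0xFF
    man_a = (a_bits & 0x7FFFFF) | 0x800000                 # restore the hidden '1'
    man_b = (b_bits & 0x7FFFFF) | 0x800000

    exp = exp_a + exp_b - 127                              # add exponents, remove the double bias
    prod = man_a * man_b                                   # 24 x 24 -> 48 bit product

    if prod & (1 << 47):                                   # product in [2, 4): shift right, bump exponent
        prod >>= 1
        exp += 1
    man = (prod >> 23) & 0x7FFFFF                          # keep 23 fraction bits (truncated)

    return (sign << 31) | ((exp & 0xFF) << 23) | man

def to_bits(x):
    return struct.unpack('<I', struct.pack('<f', x))[0]

def from_bits(b):
    return struct.unpack('<f', struct.pack('<I', b))[0]

print(from_bits(fp32_mul_sketch(to_bits(1.5), to_bits(2.5))))   # 3.75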

II. PROBLEM STATEMENT

The demand of our research work, based on multi-objective optimization for VLSI implementation of a neural network, was that the multiplier used in the design should be area-speed-power efficient and should simultaneously have a high degree of precision and dynamic range.

The multiplier approaches presented in the last decade did not satisfy our goal, i.e. a VLSI implementation of a floating point multiplier which is area-power-speed efficient and has a high degree of precision and dynamic range.

There were three possible approaches to meet our design constraints, viz.

- array multiplier (MUL1) (fig. 2.1),
- multiplier based on the digit serial architecture approach (Type III) (MUL2) [3],
- multiplier based on the bit serial architecture approach (Type III) (MUL3) [4].

Fig.2.1: An example of 4×4 Array multiplier

Fig.2.2: Bit-serial type-III multiplier with word-length of 4 bits
Fig.2.3: Digit cell for type-III multiplier

Bit-serial arithmetic and communication is efficient for computational processes, allowing good communication within and between VLSI chips and tightly pipelined arithmetic structures. It is ideal for neural networks as it minimizes the interconnect requirement by eliminating multi-wire busses [15].

A comparative analysis of the N×N multipliers, with multiplicand data size A and multiplier data size N = 8 in both cases, is shown in table 2.1.

Table 2.1: Comparison of Multipliers

Parameters | MUL1 (Array) | MUL2 (Digit) | MUL3 (Bit Serial)
G1 (AND gates) | N² = 64 | W*(D*D) = 32 | N = 8
G2 (Adders) | N(N-1) = 12 | 2*(N/W) = 8 | N = 8
Pipelining | Absent | Present | Present
Speed | Low | Best due to unfolding concept | Better
Area | High | Better than array multiplier | Optimum
Dynamic Power Dissipation | Moderate | Higher than bit serial architecture due to unfolding concept | Optimum

where the notations G1, G2 used in the above table are as follows:

- G1 => approximate number of AND gates required for partial product implementation.
- G2 => approximate number of Full Adders required.
- Digit size D = N/W = 4. Number of foldings W = 2.
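As a quick numeric check, the short Python fragment below evaluates the G1 (AND gate) expressions of Table 2.1 for the parameter values stated above (N = 8, W = 2, D = N/W = 4); it simply restates the table's formulas and adds nothing to the design itself.

# Approximate AND-gate counts (G1) from Table 2.1 for N = 8, W = 2, D = N/W = 4.
N, W = 8, 2
D = N // W                      # digit size for the digit-serial design

g1_array  = N * N               # MUL1: one AND gate per partial-product bit -> 64
g1_digit  = W * (D * D)         # MUL2: digit-serial (Type III)              -> 32
g1_serial = N                   # MUL3: bit-serial (Type III)                -> 8
print(g1_array, g1_digit, g1_serial)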
Comparative analysis suggests that the bit serial architecture (Type III) provides a better trade-off to realize a multi-objective optimization approach for VLSI implementation of a digital neural network.

So, the research work in this paper presents two multipliers, viz. the array multiplier and the bit serial architecture based multiplier, implemented in floating point arithmetic (IEEE 754 single precision format).

III. DESIGN & IMPLEMENTATION

The generalized block schematic of the multiplier used is shown in figure 3.1. The major entities in the block diagram are unpackfp, multiplier block, exponent adder block, fpnormalize, fpround and packfp respectively. The description and working of each block is as follows:

Fig.3.1: Generalized block schematic structure of IEEE 754 single precision multiplier block

Function of unpackFP block, packFP block & logic block to predict nature of output: The unpack block unpacks the incoming data (31 down to 0) into three parts, viz. sign bit (MSB, bit 31), exponent (30 down to 23) and mantissa (22 down to 0). This block maps the 23 bit mantissa into 32 bits by appending zeros at the LSBs.

The packfp block packs the final result of the multiplication obtained after normalization & rounding, i.e. its mantissa, exponent and sign bit, into IEEE 754 single precision format.

The unpackfp and packfp blocks also check the exponent and mantissa parts of the inputs for the following conditions: underflow (de-normalized number), overflow, infinity and not a number (NaN), to verify that the 32 bit input data, and hence the output (via the FPpack block and the logic block to predict the nature of output), is a valid IEEE 754 single precision floating point number.
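As a software illustration of the field handling described above, the Python sketch below splits and reassembles an IEEE 754 single precision word. The function names are illustrative only and do not correspond to the HDL entities of the design; unlike the unpackFP block, this sketch does not widen the 23 bit mantissa to 32 bits.

def unpackfp(word):
    """Split a 32-bit IEEE 754 single precision word into sign, exponent, mantissa.
    Bit 31 is the sign, bits 30..23 the exponent, bits 22..0 the mantissa."""
    sign     = (word >> 31) & 0x1
    exponent = (word >> 23) & 0xFF
    mantissa =  word        & 0x7FFFFF
    return sign, exponent, mantissa

def packfp(sign, exponent, mantissa):
    """Reassemble the three fields into a 32-bit IEEE 754 single precision word."""
    return ((sign & 0x1) << 31) | ((exponent & 0xFF) << 23) | (mantissa & 0x7FFFFF)

# Round-trip example: 0x40490FDB is the single precision encoding of pi.
s, e, m = unpackfp(0x40490FDB)
assert packfp(s, e, m) == 0x40490FDB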

The summary of the conditions used in the unpackFP block and the logic block to predict the nature of output, to check that the 32 bit input data is a valid IEEE 754 single precision floating point number, is given in table 3.1 and in the flowchart of figure 3.2.

Table 3.1: Table listing conditions to check nature of input

Number | sign | exponent | mantissa
normalized number | 0/1 | 01 to FE | any value
de-normalized number (underflow) | 0/1 | 00 | any value
zero | 0/1 | 00 | 0
infinity (overflow) | 0/1 | FF | 0
NaN i.e. Not a Number (inf*0 or inf/inf or 0/0 form) | 0/1 | FF | any value but not 0

The IEEE 754 standard partially solves the problem of underflow by using de-normalized representations, in which a de-normalized representation is characterized by an exponent code of all 0's, interpreted as having the whole part of the significand be an implied 0 instead of an implied 1.
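A minimal Python rendering of the checks in Table 3.1, classifying a 32 bit word by its exponent and mantissa fields; this is only a software model that follows the table directly, not the logic block itself.

def classify_fp32(word):
    """Classify a 32-bit input word according to Table 3.1."""
    exponent = (word >> 23) & 0xFF
    mantissa =  word        & 0x7FFFFF
    if exponent == 0x00:
        return "zero" if mantissa == 0 else "de-normalized number (underflow)"
    if exponent == 0xFF:
        return "infinity (overflow)" if mantissa == 0 else "NaN"
    return "normalized number"            # exponent in 01..FE

print(classify_fp32(0x7F800000))   # infinity (overflow)
print(classify_fp32(0x00000001))   # de-normalized number (underflow)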

Fig.3.2: Flowchart representing logic used in program to check for underflow condition

Fig.3.3: RTL view of bit serial architecture based multiplier block

Fig.3.4: State diagram of fsm_31bsmt31 used in bit serial architecture based multiplier block

Fig.3.4: RTL view of bsmt3_241_test2 used in bit serial architecture based multiplier block
Multiplier block and exponent adder block: The multiplier block performs the multiplication of the two input data (mantissas) coming from the unpackFP0 & unpackFP1 blocks and generates the mantissa part of the final output. The output data bus of the multiplier block implemented with array multiplier logic (32 bit × 32 bit, figure 3.1) was 64 bits wide and the total number of partial products inferred was 64. The output data bus of the multiplier block implemented with bit serial logic was 24 bits wide and the number of AND gates inferred to implement the partial products was 24.

Bit serial architecture based multiplier block working: Initially, by applying rst = '1' in the first clock cycle, the entire circuit is reset, i.e. sel = "00000", load = '0'. In the next clock cycle, with rst = '0', the counter gets initialized and the count value gets loaded into sel. The counter value changes with respect to the external clock, i.e. every 2 cycles of the main clock. The counter increments till count = 25, after which it rolls back to zero.

As soon as the count becomes zero, load is set and the inputs A & B are loaded into bsmt3_241_test2 from the data_gen1 and data_gen2 blocks. The select line sel(4:0) of the multiplexer muxps241 is driven by the counter of the fsm_31bsmt31 block. The multiplexer applies bit b(i) to the d input of the bsmt3_241 block. The bsmt3_241 block performs the multiplication operation based on the bit serial Type III approach as shown in figure 2.2.

At count = 25, the 48 bit (47 down to 0) result of the multiplication is obtained. The stop241 block copies the 24 bits (47 down to 24) into the output port of the multiplier block, i.e. dout.

The RTL views and state diagram of the blocks used in the implementation of this block are shown in figures 3.3, 3.4 & 3.5 respectively.
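As a behavioural reference for the sequence described above (one bit of B applied per step, partial products accumulated into a 48 bit result, upper 24 bits retained), the Python sketch below models the 24 × 24 bit serial multiplication arithmetic. It does not model the clocking, the counter or the internal structure of bsmt3_241; it only reproduces the numerical result.

def bit_serial_mul24(a, b):
    """Behavioural model of the 24x24 bit serial multiplication:
    one partial product per step, selected by one bit of b, accumulated
    into a 48-bit result; the multiplier block keeps the upper 24 bits."""
    assert 0 <= a < (1 << 24) and 0 <= b < (1 << 24)
    acc = 0
    for i in range(24):                      # one multiplier bit per step
        if (b >> i) & 1:                     # bit b(i) selects whether A is added
            acc += a << i
    acc &= (1 << 48) - 1                     # full 48-bit product (bits 47..0)
    dout = acc >> 24                         # bits 47..24, as copied by stop241
    return acc, dout

full, dout = bit_serial_mul24(0xC00000, 0xA00000)   # 24-bit significands of 1.5 and 2.5
print(hex(full), hex(dout))                          # 0x780000000000 0x780000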
The exponent adder block adds the exponent parts of inputs A and B to generate the exponent of output Z.

FPnormalize block: The block schematic of the FPnormalize block is shown in figure 3.5. It first checks whether the MSB of the 23 bit mantissa is one or zero. If it is one, then the mantissa is in de-normalized form and the FPnormalize block converts the de-normalized mantissa into normalized form. The logic used to implement this block is represented in the form of a flowchart in figure 3.6.

Fig.3.5: Block schematic of FPnormalize block

Fig.3.6: Flowchart representing logic used to implement FPnormalize block
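The following Python sketch shows one common way of normalizing a mantissa (shift it left until its most significant bit is set, decrementing the exponent once per shift). It is a generic illustration only; the exact bit level convention used inside the FPnormalize block cannot be read off the figures and may differ.

def fpnormalize(mantissa, exponent, width=24):
    """Generic normalization sketch: shift the mantissa left until its MSB
    (bit width-1) is set, decreasing the exponent by one per shift."""
    if mantissa == 0:
        return mantissa, exponent            # nothing to normalize
    while not (mantissa >> (width - 1)) & 1:
        mantissa = (mantissa << 1) & ((1 << width) - 1)
        exponent -= 1
    return mantissa, exponent

m, e = fpnormalize(0x300000, 10, width=24)
print(hex(m), e)                             # 0xc00000 8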


FPround block: In floating point arithmetic the size of the result of an operation may exceed the size of the binary word used in the number system. In such cases the low order bits have to be eliminated in order to store the result. The method of eliminating these lower order bits is rounding [2].

The FPround block performs the function of rounding. If the third LSB bit of the mantissa is '1', then there is no need of data rounding and SIG_in is passed through unchanged. Otherwise, a '1' is added to SIG_in (22 down to 3) and the result is concatenated with "000" at the LSB end. The block schematic of the FPround block is shown in figure 3.7. The logic used to implement this block is represented in the form of a flowchart in figure 3.8.

Fig.3.7: Block schematic of FPround

Fig.3.8: Flowchart representing logic used to implement FPround block in program
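Read literally, the rounding rule described above can be transcribed into the following Python sketch. Whether this matches the RTL bit for bit is an assumption on our part, since only the textual description and the flowchart of figure 3.8 are available.

def fpround(sig_in):
    """Literal transcription of the rounding rule described above for a
    23-bit significand field sig_in (bits 22..0)."""
    if (sig_in >> 2) & 1:                    # third LSB is '1': no rounding needed
        return sig_in
    upper = (sig_in >> 3) + 1                # add 1 to bits 22..3
    return (upper << 3) & 0x7FFFFF           # concatenate "000" at the LSB end

print(bin(fpround(0b1000)))                  # bit 2 clear -> bits 22..3 incremented: 0b10000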
IV. RESULTS & COMPARISON

The code for the 32×32 bit array multiplier (MUL1) and the bit serial architecture based Type III multiplier (MUL3) was written in the Aldec Active-HDL tool, synthesized with Altera's Quartus tool, and targeted to the FPGA Cyclone II device EP2C5AF256A7. Later, the code was also tested at the backend with a Synopsys tool on 45 nm & 90 nm tech files. The experimental results were found to match the theoretical results.

The performance comparison of the implemented multipliers at the frontend and backend VLSI design levels is given in tables 4.1 and 4.2 respectively, and graphical representations of the total cell area and total dynamic power dissipation are shown in figures 4.1 and 4.2 respectively.
Table 4.1: Performance comparison of implemented multipliers at frontend level

Parameters | MUL1 32×32 bit | MUL3 32×32 bit
Total logic elements | 234 | 209
Total dynamic power dissipation (mW) | 6.11 | 0.54
Worst propagation delay (nsec) | 41.604 | 21.495

Fig 4.1: Total cell area graphical representation of array multiplier and bit serial architecture based multiplier
Table 4.2: Performance comparison of implemented multipliers at backend level

Parameters | MUL1 32×32 bit (90 nm tech file) | MUL3 32×32 bit (90 nm tech file) | MUL1 32×32 bit (45 nm tech file, no workload model) | MUL3 32×32 bit (45 nm tech file, no workload model)
Total area (nm square) | 20677.499757 | 5420.971736 | -- | --
Total cell area (nm square) | 20007.014559 | 5304.729626 | 19676.340759 | 2185.060771
Total dynamic power dissipation (mW) | 0.5913478 | 0.0275544 | 6.3856 | 0.0312965
Data arrival time (nsec) | 19.89 | 4.19 | 1.93 | 1.34

Fig 4.2: Total dynamic power dissipation graphical representation of array multiplier and bit serial architecture based multiplier
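For reference, the relative improvements quoted below follow from Tables 4.1 and 4.2 as 100 * (1 - MUL3/MUL1); a short Python check on a few of the entries, with the values taken from the tables above:

def improvement(mul1, mul3):
    """Percentage improvement of MUL3 over MUL1 for a smaller-is-better metric."""
    return 100.0 * (1.0 - mul3 / mul1)

print(improvement(234, 209))                  # frontend logic elements -> ~10.68 %
print(improvement(41.604, 21.495))            # frontend delay          -> ~48.33 %
print(improvement(0.5913478, 0.0275544))      # 90 nm dynamic power     -> ~95.34 %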
The experimental results at the frontend VLSI design level indicate that MUL3 is better than MUL1:

- in area by 10.6837 %
- in dynamic power dissipation by 83.88 %
- in delay by 48.3342 %

The experimental results at the backend VLSI design level indicate that MUL3 is better than MUL1:

- in total area by 73.7832 % in the 90 nm tech file.
- in total cell area by 73.4856 % in the 90 nm tech file and by 88.8894 % in the 45 nm tech file.
- in dynamic power dissipation by 95.3404 % in the 90 nm tech file and by 99.5099 % in the 45 nm tech file.
- in data arrival time by 79.133 % in the 90 nm tech file and by 30.5699 % in the 45 nm tech file.

V. CONCLUSIONS

The experimental results indicated that the bit serial architecture Type III based multiplier implemented in floating point arithmetic (IEEE 754 single precision format) leads to an area efficient, low power and high speed digital multiplier with a high degree of precision. It has also proven to be a better alternative to the array multiplier. In other words, the approach used by us, i.e. a bit serial architecture Type III based multiplier implemented in floating point arithmetic, provides a good multi-objective optimization solution.

System designers often face problems in realizing and optimizing area, power and speed simultaneously with a high degree of precision & dynamic range in the VLSI implementation of complicated digital circuits. The bit serial architecture Type III based multiplier approach suggested in this paper was found to give better performance than other promising findings available in the literature [5, 6, 7, 8, 9, 10, 11, 12, 13 & 14]. Many of these findings were single-objective optimization based rather than multi-objective optimization based. Thus, the approach suggested in this paper may provide a promising solution in realizing area, power as well as speed efficient optimized designs for VLSI circuits.

Future research work includes application of this module in high-end applications like image processing, neural networks and digital signal processing.

REFERENCES

[1] Jean-Michel Muller, Nicolas Brisebarre, Florent de Dinechin, Claude-Pierre Jeannerod, Vincent Lefèvre, Guillaume Melquiond, Nathalie Revol, Damien Stehlé, Serge Torres, "Handbook of Floating-Point Arithmetic", Birkhäuser Boston, part of Springer Science+Business Media.
[2] A. Nagoor Kani, "Digital Signal Processing", ch. 8, Tata McGraw-Hill.
[3] Yun-Nan Chang, Janardhan H. Satyanarayana, and Keshab K. Parhi, "Systematic Design of High-Speed and Low-Power Digit-Serial Multipliers", IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, Vol. 45, No. 12, December 1998.
[4] P. J. Tayade, A. A. Gurjar, "Systematic Design of High-Speed and Low-Power Digit-Serial Multipliers VLSI Based", International Journal of Management, IT and Engineering, Vol. 2, Issue 5, pp. 439-446, May 2012.
[5] Summit Vaidya, Deepak Dandekar, "Delay-Power Performance Comparison of Multipliers in VLSI Circuit Design", International Journal of Computer Networks & Communications (IJCNC), Vol. II, Issue IV, July 2010.
[6] M. K. Pavuluri, T. S. R. Krishna Prasad, Ch. Rambabu, "Design & Implementation of Complex Floating Point Processor using FPGA", International Journal of VLSI Design & Communication Systems (VLSICS), Vol. IV, Issue V, October 2013.
[7] Prashant Kumar Sahu, Nitin Meena, "Comparative Study of Different Multiplier Architectures", International Journal of Engineering Trends & Technology (IJETT), Vol. IV, Issue X, October 2013.
[8] Deepak Purohit, Himanshu Joshi, "Comparative Study & Analysis of Fast Multipliers", International Journal of Engineering & Technical Research (IJETR), Vol. II, Issue VII, July 2014.
[9] Anitha R, Alekhya Nelapati, L. Jesima W, V. Bagyaveereswaran, "Comparative Study of High Performance Braun's Multiplier using FPGAs", IOSR Journal of Electronics & Communication Engineering (IOSRJECE), Vol. I, Issue IV, pp. 33-37, May-June 2012.
[10] Kumar Mishra, V. Nandanwar, Eskinder Anteneh Ayele, S. B. Dhok, "FPGA Implementation of Single Precision Floating Point Multiplier Using High Speed Compressors", International Journal of Soft Computing & Engineering, Vol. IV, Issue II, May 2014.
[11] B. Jeevan, S. Narendra, C. V. Reddy, K. Sivani, "A High Speed Binary Floating Point Multiplier Using Dadda Algorithm", IEEE, 2013.
[12] Shaifali, Sakshi, "FPGA Design of Pipelined 32-bit Floating Point Multiplier", International Journal of Computational & Management, Vol. XVI, Issue V, September 2013.
[13] Chaitali V. Matey, S. D. Chede, S. M. Sakhare, "Design & Implementation of Floating Point Multiplier Using Wallace and Dadda Algorithm", International Journal of Application or Innovation in Engineering & Management, Vol. III, Issue VI, June 2014.
[14] R. Sai Siva Teja, A. Madhusudhan, "FPGA Implementation of Low Area Floating Point Multiplier Using Vedic Mathematics", International Journal of Engineering & Advanced Engineering, Vol. III, Issue XII, December 2013.
[15] Alan F. Murray, Anthony V. W. Smith, and Zoe F. Butler, "Bit-Serial Neural Networks", American Institute of Physics, 1988.
[16] Jitesh Shinde, Suresh Salankar, "VLSI Implementation of Neural Network", Current Trends in Technology & Science Journal, Vol. 4, Issue 03, April-May 2015.

