VLSI Implementation of Bit Serial Architecture Based Multiplier in Floating Point Arithmetic
Abstract— VLSI implementation of neural network processing or digital signal processing based applications comprises a large number of multiplication operations. A key design issue in such applications therefore lies in the efficient realization of the multiplier block, which involves a trade-off between precision, dynamic range, area, speed and power consumption of the circuit.
I. INTRODUCTION
Multiplication is the fundamental operation in neural
network processing or digital signal processing (DSP) based
applications. There are certain applications in this domain wherein not only should the design be area, power and speed efficient, but the application also demands a high degree of precision and dynamic range. Therefore, in such cases implementation of the multiplier block in floating point arithmetic with adequately chosen parameters appears to be a good compromise [1].
For performing floating point multiplication the numbers are represented in the desired floating point format. The product is obtained by multiplying the mantissas and adding the exponents. The sign bits of the mantissas are handled separately to determine the sign of the product [2].
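To make this concrete, the following is a minimal Python sketch of the field-level procedure (multiply the mantissas with the hidden '1', add the exponents and remove the extra bias, combine the signs); it assumes normalized inputs, truncates instead of rounding, and handles no special cases, so it is only illustrative:

```python
import struct

def fields(x: float):
    """Split an IEEE-754 single precision value into sign, exponent, mantissa."""
    w = struct.unpack('>I', struct.pack('>f', x))[0]
    return w >> 31, (w >> 23) & 0xFF, w & 0x7FFFFF

def fp_mul(a: float, b: float) -> float:
    """Multiply two normalized single precision values field by field."""
    sa, ea, ma = fields(a)
    sb, eb, mb = fields(b)
    sign = sa ^ sb                          # sign of the product
    exp = ea + eb - 127                     # add exponents, remove the doubled bias
    man = (ma | 1 << 23) * (mb | 1 << 23)   # multiply mantissas with the hidden '1'
    if man & (1 << 47):                     # 48-bit product: leading 1 at bit 47 or 46
        man, exp = man >> 24, exp + 1
    else:
        man >>= 23
    man &= 0x7FFFFF                         # drop the hidden bit (truncation, no rounding)
    w = (sign << 31) | (exp << 23) | man
    return struct.unpack('>f', struct.pack('>I', w))[0]

print(fp_mul(3.5, -2.25))   # -7.875
```

This is essentially the decomposition that the unpackFP, multiplier, exponent adder and FPpack blocks described later in the paper realize in hardware.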
The multiplier approaches presented in the last decade did not satisfy our goal, i.e. a VLSI implementation of a floating point multiplier which is area, power and speed efficient and at the same time provides a high degree of precision and dynamic range.

Fig.2.1: An example of 4×4 Array multiplier
Comparative analysis suggests that the bit serial architecture (Type III) provides a better trade-off to realize a multi-objective optimization approach for VLSI implementation of digital neural networks.

So, the research work in this paper presents two multipliers, viz. an array multiplier and a bit serial architecture based multiplier, implemented in floating point arithmetic (IEEE-754 single precision format).
Comparison of multiplier architectures (table fragment):
G1 (AND gates): N² = 64 | W×(D×D) = 32 | N = 8
Speed: Low | Better | Best, due to unfolding concept
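Reading the fragment for a word length of N = 8 (interpreting N as the operand width and G1 as the AND-gate count, which is an assumption), the first row can be checked directly:

```python
N = 8                       # assumed operand word length
array_and_gates = N * N     # array multiplier: one AND gate per partial-product bit
bit_serial_and_gates = N    # bit serial datapath: one AND gate per bit of the parallel operand
print(array_and_gates, bit_serial_and_gates)   # 64 8
```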
(de-normalized number), overflow, infinity and not a number (NaN) to determine whether the 32 bit input data, and hence the output (via the FPpack block and the logic block that predicts the nature of the output), is a valid IEEE 754 single precision floating point number.

The summary of the conditions used in the unpackFP block and in the logic block that predicts the nature of the output, to check that the 32 bit input data is a valid IEEE 754 single precision floating point number, is given in Table 3.1 and in the flowchart of figure 3.2.

Table 3.1 (fragment): sign | exponent | mantissa
De-normalized number (underflow): 0/1 | 00 | any value
Zero: 0/1 | 00 | 0

Fig.3.3: RTL view of bit serial architecture based multiplier block
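The conditions in Table 3.1 follow the standard IEEE 754 field encodings; a behavioural Python sketch of such a classification (not the paper's RTL, and the function name is only illustrative) is:

```python
def classify_fp32(word: int) -> str:
    """Classify a 32-bit word by its exponent and mantissa fields,
    mirroring the checks summarised in Table 3.1."""
    exp = (word >> 23) & 0xFF      # 8-bit exponent field
    man = word & 0x7FFFFF          # 23-bit mantissa field
    if exp == 0x00:
        return "zero" if man == 0 else "de-normalized number (underflow)"
    if exp == 0xFF:
        return "infinity" if man == 0 else "NaN"
    return "normalized"

print(classify_fp32(0x00000001))   # de-normalized number (underflow)
print(classify_fp32(0x7F800000))   # infinity
print(classify_fp32(0xFFC00000))   # NaN
```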
Fig.3.2: Flowchart representing logic used in program to check for underflow condition

Fig.3.4: RTL view of bsmt3_241_test2 used in bit serial architecture based multiplier block
Multiplier block and exponent adder block: The multiplier block performs the multiplication of the two input data (mantissas) coming from the unpackFP0 and unpackFP1 blocks and generates the mantissa part of the final output. The output data bus of the multiplier block implemented with array multiplier logic (32 bit × 32 bit, figure 3.1) was 64 bits wide and the total partial products inferred were 64. The output data bus of the multiplier block implemented with bit serial logic was 24 bits wide and the number of AND gates inferred to implement the partial products was 24.

Bit serial architecture based multiplier block working: Initially, by applying rst = '1' in the first clock cycle, the entire circuit is reset, i.e. sel = '00000' and load = '0'. In the next clock cycle, with rst = '0', the counter gets initialized and the count value gets loaded into sel. The counter value changes with respect to the external clock, i.e. every 2 cycles of the main clock. The counter increments till count = 25, after which it rolls back to zero.
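A behavioural Python sketch of this control sequencing is shown below (reset handling is omitted, and the exact phase of the load pulse, which the next paragraph ties to count = 0, is an assumption; the real control sits in the fsm RTL):

```python
def control_counter(main_clock_cycles: int):
    """Model of the control sequencing: sel advances once every 2 main-clock
    cycles (the 'external' clock), counts 0..25 and rolls back to zero;
    load pulses while the count is zero."""
    sel = 0
    for t in range(main_clock_cycles):
        load = 1 if sel == 0 else 0     # operands are (re)loaded into the datapath here
        yield t, sel, load
        if t % 2 == 1:                  # external clock tick: every 2nd main-clock cycle
            sel = 0 if sel == 25 else sel + 1

for t, sel, load in control_counter(6):
    print(t, sel, load)   # prints 0 0 1, 1 0 1, 2 1 0, 3 1 0, 4 2 0, 5 2 0
```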
1674 2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI)
Authorized licensed use limited to: National Taiwan University. Downloaded on March 21,2024 at 08:52:25 UTC from IEEE Xplore. Restrictions apply.
As soon as the count becomes zero, load is set and the inputs A and B are loaded into bsmt3_241_test2 from the data_gen1 and data_gen2 blocks. The select line sel(4:0) of the multiplexer muxps241 is driven by the counter of the fsm_bsmt31 block. The multiplexer applies bit b(i) to the d input of the bsmt3_241 block. The bsmt3_241 block performs the multiplication operation based on the bit serial approach Type III, as shown in figure 2.2.

At count = 25, the result of the multiplication, i.e. the 48 bit (47 down to 0) result, is obtained. The stop241 block copies 24 bits (47 down to 24) into the output port of the multiplier block, i.e. dout.

The RTL views and state diagram of the blocks used in the implementation of this block are shown in figures 3.3, 3.4 and 3.5 respectively.
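Functionally, this datapath performs a shift-and-add multiplication of the two 24-bit mantissas, with one bit of b consumed per sel step and the upper half of the 48-bit product returned by stop241. A minimal Python model of that behaviour (not the bsmt3_241 RTL; the operand values are arbitrary examples):

```python
def bit_serial_mult(a: int, b: int, width: int = 24) -> int:
    """Shift-and-add model of the bit serial multiply: bit b(i) decides
    whether the shifted copy of a is accumulated in step i."""
    acc = 0
    for i in range(width):       # one step per sel count
        if (b >> i) & 1:         # the multiplexer presents bit b(i)
            acc += a << i        # accumulate the partial product
    return acc                   # 48-bit product (47 down to 0)

prod = bit_serial_mult(0xC00000, 0x800000)   # two example 24-bit mantissas
dout = (prod >> 24) & 0xFFFFFF               # stop241: copy bits 47 down to 24
print(hex(prod), hex(dout))                  # 0x600000000000 0x600000
```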
The exponent adder block adds the exponent parts of inputs A and B to generate the exponent of output Z.
FPnormalize block: The block schematic of the FPnormalize block is shown in figure 3.5. It first checks whether the MSB of the 23 bit mantissa is one or zero. If it is one, the mantissa is in de-normalized form and the FPnormalize block converts the de-normalized mantissa into normalized form. The logic used to implement this block is represented in the form of a flowchart in figure 3.6.

The FPround block performs the function of rounding. If the third LSB of the mantissa is '1', then no rounding of the data is needed and SIG_in is copied to the output unchanged. Otherwise, '1' is added to SIG_in (22 down to 3) and the result is concatenated with "000" at the LSB end. The block schematic of the FPround block is shown in figure 3.7; the logic used to implement it is represented as a flowchart in figure 3.8.

Fig.3.7: Block schematic of FPround
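A literal Python transcription of the rounding rule just stated (it only mirrors the described behaviour; whether it corresponds to a conventional round-to-nearest scheme is not established by this description):

```python
def fp_round(sig_in: int) -> int:
    """FPround behaviour as described: the decision is taken on the
    third LSB (bit 2) of the 23-bit significand SIG_in."""
    if (sig_in >> 2) & 1:                    # third LSB is '1': no rounding needed
        return sig_in
    upper = ((sig_in >> 3) + 1) & 0xFFFFF    # add '1' to SIG_in(22 down to 3)
    return (upper << 3) | 0b000              # concatenate "000" at the LSB end

print(bin(fp_round(0b0000100)))   # third LSB set   -> unchanged: 0b100
print(bin(fp_round(0b0000011)))   # third LSB clear -> 0b1000
```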
The performance comparison of the implemented multipliers at the frontend and backend VLSI design levels is given in tables 4.1 and 4.2 respectively, and graphical representations of the total cell area and total dynamic power dissipation are shown in figures 4.1 and 4.2 respectively.

Table 4.1: Performance comparison of the implemented multipliers at frontend level
Future research work includes application of this module in high-end applications like image processing, neural networks and digital signal processing.
REFERENCES