
Paper M

This document describes a high-speed signed multiplier that uses the Radix-4 Booth algorithm and Ladner-Fischer adder. The Radix-4 Booth algorithm reduces the number of partial products generated during multiplication, increasing speed. The Ladner-Fischer adder allows for partial product generation and addition to occur concurrently, improving performance over traditional ripple carry adders. The proposed multiplier design is implemented in Verilog HDL and synthesized using the Xilinx ISE design suite to target FPGAs. It aims to provide an efficient signed multiplication solution for applications like digital signal processing that require both speed and reduced hardware resources.


A High-Speed Effective Signed Multiplier Using Ladner-Fischer Adder for Hardware Boosters

Pedada Ravi Raj1 and G. V. Subba Reddy2


1M. Tech Scholar, Department of ECE, GRIET, Hyderabad, India

[email protected]
2Department of ECE, GRIET, Hyderabad, India

[email protected]

Abstract. Multiplication is a fundamental arithmetic operation used in many applications, including digital
signal processing and communication systems. In this study, a signed multiplier is designed using the Booth
algorithm and the Ladner-Fischer architecture. Radix-4 Booth encoding halves the number of generated partial
products, which increases the multiplier’s speed and reduces its circuit area. The design generates the partial
products concurrently and adds them with the Ladner-Fischer parallel prefix adder, whereas the previous design
adds them with a ripple carry adder, which increases the latency. As a result, the proposed multiplier achieves
a lower delay. The proposed design is written in Verilog HDL and synthesised with version 14.7 of the Xilinx ISE
design suite.

Keywords: Radix-4 Algorithm, Multiplier, Ladner-Fischer, Look Up Tables (LUTs).

1 Introduction
Multipliers are essential components of many digital systems, including microprocessors, digital filters, and
digital signal processors. They are also used in discrete Fourier transform implementations, range
measurement and correlation. In its simplest form, multiplication is an add-and-shift technique, that is, a
sequence of repeated additions in which the multiplicand is added to itself several times. Implemented this
way, multiplication requires a large amount of hardware and operates slowly, yet speed is the most crucial
aspect in many real-time applications. Because the DSP blocks are not distributed uniformly over a Field
Programmable Gate Array (FPGA), the critical path delay may suffer when many of them must be combined
for wide multiplications [1, 2]. In general, multiplication is performed by first generating partial products
and then adding them. The speed of the multiplier is therefore determined by how quickly the partial
products are generated and accumulated. To accelerate multiplication, the number of partial products should
be reduced and an efficient adder should be used for the accumulation. High-speed specialised multipliers
are employed in linear and vector computing [3]. High-speed pipelined Booth multipliers are found in
digital signal processing (DSP) applications such as communication networks and multimedia. The fast
Fourier transform (FFT) and other high-speed DSP applications rely on additions and multiplications.
Technology is advancing at a rapid pace. Circuits now contain billions of elements that are small, fast, and
consume little power. As a result, the design of any circuit must consider area, speed, and power. To meet
market demands, a device must have a small footprint and low latency [4]. Look-up tables (LUTs) are key
resources in FPGAs and implement Boolean functions. Xilinx FPGAs contain 6-input LUTs, each of which
can implement any Boolean function of up to six inputs. A LUT may be configured as a single 6-input,
single-output LUT or as two 5-input LUTs with distinct outputs but shared inputs. Every LUT output can
be stored in a flip-flop if desired. Dedicated interconnect within each Configurable Logic Block (CLB)
links LUTs without leaving and re-entering the CLB, which substantially lowers the use of global routing
resources. Whenever the FPGA architecture changes, the synthesis methods must adapt to provide the
optimum mapping onto the available resources [5]. Binary addition is the most essential function in
computer arithmetic. VLSI integer adders are essential components of digital signal processors and
general-purpose microprocessors because they are used in floating-point data paths, address generation
units and ALUs [6].
The parallel prefix adder (PPA) is currently regarded as an efficient combinational circuit for adding two
multi-bit values. At the circuit level, speed and circuit complexity are its critical criteria. Such adders are
commonly found in the arithmetic-logic units of modern processors such as digital signal processors and
microprocessors [7]. When designing a VLSI circuit, several metrics must be optimised. Often they cannot
be optimised concurrently, and one must be improved at the expense of one or more of the others. It has
become challenging to implement an integrated circuit that is efficient in area, power, and speed at the same
time. Power dissipation is now considered an important factor in multiplier design. The goal of a good
multiplier is a compact, high-speed, low-power circuit.
1.1 Concept of Radix-4 Algorithm and Sign Extension
Multiplication using the Modified Booth algorithm is a common multiplication technique. It is a redundant signed
radix-4 encoding technique. Its major feature is that, compared with a radix-2 representation, it halves the number
of partial products in a multiplication, which increases speed. It is also known as the bit-pair algorithm. The key
concept is that, rather than shifting and adding for every bit of the multiplier and multiplying by one or zero, only
every second bit position is taken and the multiplicand is multiplied by +1, -1, +2, -2 or 0, as listed in Table 1.
Table 1. Radix-4 Algorithm

i+1    i    i-1    Booth encoding
 0     0     0          0
 0     0     1         +1
 0     1     0         +1
 0     1     1         +2
 1     0     0         -2
 1     0     1         -1
 1     1     0         -1
 1     1     1          0
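The recoding in Table 1 can be sketched in a few lines of Python. This is only an illustrative software model, not the Verilog used in this work; the function name `booth_radix4_digits` and the restriction to even operand widths are assumptions made for the example.

```python
def booth_radix4_digits(b, width):
    """Recode a signed `width`-bit multiplier into radix-4 Booth digits, LSB first."""
    if width % 2:
        raise ValueError("this sketch assumes an even operand width")
    # Table 1: (bit i+1, bit i, bit i-1) -> Booth digit
    table = {
        (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
        (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
    }
    # Python's >> is arithmetic, so this yields the two's complement bits of b.
    bits = [(b >> i) & 1 for i in range(width)]
    digits = []
    prev = 0  # the bit to the right of the LSB is taken as 0
    for i in range(0, width, 2):
        digits.append(table[(bits[i + 1], bits[i], prev)])
        prev = bits[i + 1]
    return digits


print(booth_radix4_digits(127, 8))  # [-1, 0, 0, 2], i.e. -1 + 2 * 4**3 = 127
```

An 8-bit multiplier yields four digits, so only four partial products are needed instead of eight.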

In a multiplier based on Booth encoding (BE), the correct sign of a partial product is determined by the sign
(MSB) of the multiplicand and the corresponding Booth-encoded digit. Bewick’s sign extension (SE)
approach is used here to provide the proper sign for each partial product.
Table 2. Sign Extension

Booth Encoding    MSB Multiplicand    Sign Extension
      0                  0                  0
      0                  1                  0
     +1                  0                  0
     +1                  1                  1
     +2                  0                  0
     +2                  1                  1
     -2                  0                  1
     -2                  1                  0
     -1                  0                  1
     -1                  1                  0
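Table 2 can be condensed into one rule: SE is 0 for a zero digit, and otherwise it is the XOR of the digit's sign with the multiplicand MSB. The short Python sketch below encodes that reading of the table; the function name `sign_extension` is hypothetical and the code is a behavioural model only.

```python
def sign_extension(digit, msb):
    """Return the SE bit of Table 2 for a Booth digit and the multiplicand MSB."""
    if digit == 0:
        return 0  # a zero partial product is never negative
    digit_is_negative = 1 if digit < 0 else 0
    # The partial product is negative when exactly one of the two signs is negative.
    return digit_is_negative ^ msb


for digit in (0, 1, 2, -2, -1):
    for msb in (0, 1):
        print(digit, msb, sign_extension(digit, msb))
```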
1.2 Concept of Ladner-Fischer Adder

Fig. 1. Ladner-Fischer Adder [Website URL: https://renaysha.me/ladner-fischer-adder-63/]

The Ladner-Fischer PPA is used to perform the addition. It is a tree-like structure intended for
high-performance addition and is built from black and grey cells. Every black cell contains 2 AND gates and
1 OR gate and produces both a group generate and a group propagate signal. Every grey cell contains one
AND gate and one OR gate, since it produces only the group generate signal.
It consists of 3 stages: Pre-Processing, Carry generation and Post-Processing.
Pre-Processing stage: During this phase, the propagate and generate signals are computed from every pair
of input bits. The propagate signal is the XOR of the two input bits and the generate signal is their AND.
Carry generation stage: The carry for every bit position is produced at this stage. Group propagate and
group generate signals are formed in the intermediate cells for use by subsequent levels, while the final
carry of each bit position is delivered by the last cell in that bit's column. The carries of all bit positions are
thus available in parallel for computing the sum bits.
Post-Processing stage: In the final step, the carry into each bit position is XORed with the propagate signal
of that bit, and the result is delivered as the sum.
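The three stages can be modelled bit by bit in Python. The sketch below assumes the minimum-depth divide-and-conquer prefix tree that is commonly drawn for the Ladner-Fischer adder; the exact cell placement in Fig. 1 may differ, and the function name `ladner_fischer_add` is hypothetical.

```python
def ladner_fischer_add(a, b, width=8, cin=0):
    """Add two `width`-bit values with a minimum-depth parallel prefix tree."""
    # Pre-processing: per-bit propagate (XOR) and generate (AND).
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]

    # Carry generation: G[i] and P[i] grow into group signals over bits i..0.
    G, P = g[:], p[:]
    G[0] |= p[0] & cin  # fold the carry-in into bit 0
    level = 0
    while (1 << level) < width:
        for i in range(width):
            if (i >> level) & 1:
                j = ((i >> level) << level) - 1  # top bit of the lower block
                G[i] |= P[i] & G[j]  # group generate (grey and black cells)
                P[i] &= P[j]         # group propagate (black cells only)
        level += 1

    # Post-processing: sum bit = propagate XOR carry into that bit.
    carries_in = [cin] + G[:-1]
    total = sum((p[i] ^ carries_in[i]) << i for i in range(width))
    return total, G[width - 1]  # (sum, carry out)


print(ladner_fischer_add(0b0111, 0b0001, width=4))  # (8, 0)
```

Within one level every updated bit reads only from a bit that is not updated in that level, so the in-place update matches what parallel hardware cells would compute.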

2 Literature Survey

Semeen Rehman, Salim Ullah, Akash Kumar and Muhammad Shafique have proposed accurate and
approximate multipliers with high performance for FPGA-based hardware accelerators. Multiplication is a
common arithmetic operation in a variety of applications, including machine learning and image/video
processing. FPGA vendors provide high-performance multipliers as DSP blocks. However, these blocks can
cause additional routing delays, are inefficient for multiplications of smaller bit widths, are limited in
number and have fixed locations on the FPGA. As a result, FPGA vendors also provide soft IP cores
designed for multiplication. Furthermore, for complex applications with competing FPGA resource
requirements, manually optimising the allocation of FPGA resources may improve performance.
D. Kalaiyarasi and M. Saraswathi have proposed an improved high-speed Radix-4 Booth multiplier for
signed and unsigned numbers. Multiplication may be performed with a generic add-and-shift operation, in
which each multiplier bit produces a multiple of the multiplicand that must be added to the partial product.
Multipliers are also used in range measurement, the discrete Fourier transform and correlation. Because
multiplication is a slow operation, the performance of a digital system is typically determined by its
multipliers. The add-and-shift algorithm is a sequence of repeated additions of the multiplicand. Reducing
the number of partial products reduces the number of additions, which enhances performance.
Jie Han, Honglan Jiang, Fabrizio Lombardi and Fei Qiao have proposed approximate Radix-8 Booth
multipliers for low power and high performance. Multipliers are more complex than adders and subtractors,
and their speed generally sets the operating speed of a DSP system. High accuracy is often treated as a
stringent requirement alongside system latency, hardware complexity and power consumption. Because it
encodes the multiplier and thereby reduces the number of partial products, the Booth multiplier is
commonly used for high-performance signed multiplication. The Radix-4 Booth multiplier is highly
efficient owing to the simplicity of generating its partial products, whereas the Radix-8 Booth multiplier is
slow because generating the odd multiples of the multiplicand is costly.
Haroon Waris, Weiqiang Liu and Chenghua Wang have developed approximate Booth multipliers based
on hybrid low-radix encoding. In recent years, the design of energy-efficient embedded systems has become
increasingly important because a wide variety of applications require custom hardware with lower power
consumption. At the same time, the amount of data that these hardware units must handle has grown
dramatically, making it increasingly difficult to meet both requirements. Approximate computing has
emerged as a viable way to resolve this difficulty. Approximate circuits generally refer to approximate
designs of arithmetic circuits such as multipliers and adders. A Booth multiplier with truncation provides
large hardware savings but has a large error; consequently, solutions with error compensation modules have
also been presented. Radix-4 Booth encoding is commonly used to build power-efficient, small-area signed
multipliers because it simplifies the generation of partial products.
Pakkiraiah Chakali and Madhu Kumar Patnala have designed a carry select adder based on the high-speed
Ladner-Fischer adder. A variety of addition algorithms are available, ranging from the simple ripple carry
adder to the more complex carry look-ahead adder (CLA). Multiplication, addition and accumulation are
the three basic operations in any digital signal processing system, and addition is a necessary operation in
every digital, DSP or control system. The performance of the adders therefore determines how quickly and
accurately a digital system operates, and improving adder performance is a key research topic in VLSI
system design. Many adder architectures have been developed and proposed over the last decade to
accelerate binary addition.
L. P. Deepthi Bollepalli, Chris D. Martinez and David H. K. Hoe have proposed a fault-tolerant parallel
prefix adder for FPGA and VLSI design. Circuits in nanoscale technologies particularly require fault
tolerance because the smaller device dimensions make the circuit vulnerable to external disturbances such
as cosmic rays. Future technologies will therefore prioritise a circuit's ability to detect and correct faults.
Optimising the adder design remains an active research subject because the adder commonly determines
the critical path of many digital systems, such as processor data pipelines and digital signal processors.

3 Methodology

Both the existing and the proposed designs follow the same methodology. The difference lies in the partial
product accumulation phase, where the proposed design uses a Ladner-Fischer adder rather than a ripple
carry adder (RCA). The RCA is slower because each full adder must wait for the output carry of the
preceding full adder before it can produce its own result. Owing to the Ladner-Fischer PPA, the proposed
design performs better in terms of latency.
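The latency argument can be made concrete by counting logic stages on the carry path. The sketch below is a gate-level idealisation under stated assumptions: one stage per full adder in the RCA and one stage per prefix level in a minimum-depth tree. It ignores routing and the dedicated carry chains of an FPGA, so it does not predict the synthesis results reported later; both function names are hypothetical.

```python
def rca_carry_stages(width):
    """Carry stages in a ripple carry adder: the carry ripples through every full adder."""
    return width


def prefix_tree_levels(width):
    """Prefix levels in a minimum-depth tree: ceil(log2(width)) for width >= 1."""
    return (width - 1).bit_length()


# Final adder widths for 8x8, 16x16 and 32x32 products.
for width in (16, 32, 64):
    print(width, rca_carry_stages(width), prefix_tree_levels(width))
```

For a 16-bit sum the ripple path has 16 stages while the prefix tree has 4 levels, plus the fixed pre-processing and post-processing stages.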
3.1 Generation of Accurate Signed Partial Products

Fig. 2. LUT configuration: (a) Type-A (b) Type-B (c) Type-C

A LUT in the Type-A configuration performs the Booth encoding, as shown in Fig. 2(a). Its five inputs are
the multiplicand bits a_n and a_{n-1} and the multiplier bits b_{m+1}, b_m and b_{m-1}. The LUT's core
implementation includes three multiplexers. Depending on the BE value, the first MUX selects either
a_{n-1} or a_n for partial product generation. The second MUX, which is controlled by the ‘c’ signal, inverts
the output of the first MUX when required. Lastly, the third MUX forces the partial product to ‘0’ depending
on the value of the ‘z’ signal. The result is passed as the carry propagate signal "pout" to the corresponding
carry chain. The carry generate signal "gout" for the carry chain is driven by the input a_n.
As shown in Fig. 2(b) and (c), Bewick's sign extension approach is applied in each row of partial products.
The five inputs of the LUT are the multiplicand MSB a_n, the signal "pin", and the multiplier bits b_{m+1},
b_m and b_{m-1}. The input "pin" is fixed at "1" for the first partial product row and at "0" for the
subsequent rows. The LUT computes the signal $\overline{SE}$ (the complement of SE), XORs it, and sends
the result as the carry propagate signal to the corresponding carry chain. The carry generate signal "gout" is
provided directly by the "pin" signal. LUT Type-C is used to pass the correct sign information from one
partial product row to the next.
The first row of partial products for an 8x8 multiplier built from LUTs of Type-A, B and C is shown in
Fig. 3(a). The required input carry is computed by the rightmost Type-A LUT in every partial product row.
This carry input is used to represent the partial product in 2's complement form. An 8x8 multiplier produces
4 partial product rows. The last partial product row does not require a Type-C LUT.
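The role of the input carry can be illustrated with a behavioural Python model. It is a sketch under stated assumptions: each row is formed at the word level rather than LUT by LUT, a negative Booth digit is realised as bitwise inversion plus a carry-in of 1, and the function names are hypothetical.

```python
BOOTH_TABLE = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def booth_partial_products(a, b, width=8):
    """Return one (row_bits, carry_in, shift) triple per radix-4 Booth digit."""
    out_mask = (1 << (2 * width)) - 1
    rows = []
    prev = 0  # the bit to the right of the multiplier LSB is 0
    for i in range(0, width, 2):
        hi, lo = (b >> (i + 1)) & 1, (b >> i) & 1
        digit = BOOTH_TABLE[(hi, lo, prev)]
        prev = hi
        magnitude = abs(digit) * a  # 0, a or 2a
        negate = 1 if digit < 0 else 0
        # Two's complement negation = bitwise inversion plus a carry-in of 1.
        row_bits = (~magnitude if negate else magnitude) & out_mask
        rows.append((row_bits, negate, i))
    return rows


def booth_multiply(a, b, width=8):
    """Accumulate the rows and reinterpret the result as a signed 2*width-bit value."""
    total = 0
    for row_bits, carry_in, shift in booth_partial_products(a, b, width):
        total += (row_bits + carry_in) << shift
    total &= (1 << (2 * width)) - 1
    if total >> (2 * width - 1):
        total -= 1 << (2 * width)
    return total


print(len(booth_partial_products(-7, 13)), booth_multiply(-7, 13))  # 4 -91
```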

Fig. 3. First row of partial products for an 8x8 multiplier: (a) First version multiplier (b) Optimized version

3.2 Optimization of Critical Path Delay


In an NxM multiplier, the carry chain in every partial product row is N+4 bits long. It can be shortened to
N+1 bits to reduce the multiplier's critical path delay. A multiplier design with this reduced critical path
delay is shown in Fig. 3(b). The partial product terms pp(x, 0) and pp(x, 1) in every partial product row
need 1 and 2 bits of the multiplicand, respectively, so a single 6-input LUT ‘A1’ can implement both terms.
Similarly, pp(x, 2) in each partial product row can be computed independently by another 6-input LUT,
'A2'. The correct input carry for each partial product row is computed by a separate 6-input LUT called
"CG."
Fig. 4. LUT configuration types: (a) LUT A1 (b) LUT A2/CG

Fig. 4 illustrates the internal configurations of the A1, A2 and CG LUTs. LUT A2 and LUT CG differ only
in their output signal: pp(x, 2) is produced only by LUT A2, whereas cgout is produced only by LUT CG.
For an NxM multiplier, the number of LUTs needed to produce the partial products is

$(N+3) \times \lceil M/2 \rceil$
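As a worked instance of the expression above, assuming that the brackets denote the ceiling function and taking the square operand sizes evaluated in this paper (N = M), the LUT counts for partial product generation alone would be:

```latex
\begin{align*}
N = M = 8:  &\quad (8+3)  \times \lceil 8/2 \rceil  = 11 \times 4  = 44 \\
N = M = 16: &\quad (16+3) \times \lceil 16/2 \rceil = 19 \times 8  = 152 \\
N = M = 32: &\quad (32+3) \times \lceil 32/2 \rceil = 35 \times 16 = 560
\end{align*}
```

These figures exclude the LUTs used in the accumulation stage.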

3.3 Existing Design

Fig. 5. Ripple Carry Adder-based existing signed multiplier

Fig. 5 shows the existing signed multiplier, which uses a ripple carry adder (RCA) in the accumulation
stage. The multiplexer outputs, selected according to the radix-4 Booth encoding, enter the partial product
generation stage, and the generated partial products are then accumulated by the ripple carry adder to
produce the final product. The design is implemented using 6-input LUTs of different types and their
associated carry chains. It thereby computes all partial products in parallel and adds them using the ripple
carry adder.

3.4 Proposed Design

Fig. 6. Proposed signed multiplier using Ladner-Fischer Adder

Fig. 6 shows the proposed signed multiplier using the Ladner-Fischer adder. As in the existing design, the
multiplexer outputs obtained with the radix-4 Booth technique enter the partial product generation stage,
where all partial products are computed in parallel. The generated partial products are then accumulated
using a parallel prefix adder, the Ladner-Fischer adder, which gives better performance in terms of delay.
In the existing design, the RCA is slower because each full adder must wait for the previous full adder to
generate its output carry.
3.5 Generated Partial Products Accumulation
Ternary adders, 4:2 compressors and binary adders can be used to reduce the generated partial products to
the final result. A 4:2 compressor reduces four partial product rows to two output rows. Compared with
binary adders, ternary adders have longer critical path delays. The existing work employs binary adders and
4:2 compressors to reduce the generated partial products. To further improve the delay, the proposed design
replaces the binary adders with the Ladner-Fischer adder.
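The row reduction performed by a 4:2 compressor can be sketched at the word level. The model below builds the compressor from two carry-save (3:2) stages, which is a common textbook construction and an assumption here; the LUT-level compressor used in the designs may be organised differently, and both function names are hypothetical.

```python
def carry_save_3to2(x, y, z):
    """Reduce three non-negative rows to a sum row and a carry row without carry propagation."""
    sum_row = x ^ y ^ z
    carry_row = ((x & y) | (x & z) | (y & z)) << 1
    return sum_row, carry_row


def compress_4to2(r0, r1, r2, r3):
    """Reduce four rows to two rows whose total equals r0 + r1 + r2 + r3."""
    s1, c1 = carry_save_3to2(r0, r1, r2)
    return carry_save_3to2(s1, c1, r3)


s, c = compress_4to2(13, 200, 7, 91)
print(s + c)  # 311, the same as 13 + 200 + 7 + 91
```

Only the final addition of the two output rows needs a carry-propagating adder, which is where the Ladner-Fischer adder is used in the proposed design.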

4 Results and Discussion

In this section, the simulation results of existing and proposed signed multiplier designs will be discussed.

4.1 Simulation results of existing design

Fig. 7. Simulation results of existing 8-bit signed multiplier using RCA

Fig. 7 shows the simulation results of the existing 8-bit signed multiplier. It multiplies two 8-bit operands
and produces a 16-bit result.
Fig. 8. Simulation results of existing 16-bit signed multiplier using RCA

Fig. 8 shows the simulation results of the existing 16-bit signed multiplier. It multiplies two 16-bit
operands and produces a 32-bit result.

Fig. 9. Simulation results of existing 32-bit signed multiplier using RCA

Fig. 9 shows the simulation results of the existing 32-bit signed multiplier. It multiplies two 32-bit
operands and produces a 64-bit result.
4.2 Simulation results of proposed design

Fig. 10. Simulation results of proposed 8-bit signed multiplier using Ladner-Fischer Adder

Fig. 10 shows the simulation results of the proposed 8-bit signed multiplier, which is implemented using the
Ladner-Fischer adder. It multiplies two 8-bit inputs and produces a 16-bit result.

Fig. 11. Simulation results of proposed 16-bit signed multiplier using Ladner-Fischer Adder

Fig. 11 shows the simulation results of the proposed 16-bit signed multiplier, which is implemented using
the Ladner-Fischer adder. It multiplies two 16-bit inputs and produces a 32-bit result.
Fig. 12. Simulation results of proposed 32-bit signed multiplier using Ladner-Fischer Adder

Fig. 12 shows the simulation results of the proposed 32-bit signed multiplier, which is implemented using
the Ladner-Fischer adder. It multiplies two 32-bit inputs and produces a 64-bit result.
4.3 Comparison table of existing and proposed multipliers
Table 3. Comparison Table

Parameters          Existing signed multiplier              Proposed signed multiplier
                    8-bit       16-bit      32-bit          8-bit       16-bit      32-bit
Delay               2.384 ns    8.003 ns    6.170 ns        1.622 ns    7.952 ns    4.486 ns
Power               82 mW       27 mW       14 mW           100 mW      11 mW       14 mW
Frequency           419.5 MHz   124.9 MHz   162.07 MHz      616.67 MHz  125.75 MHz  222.936 MHz
Number of slices    156         576         2316            173         585         2000
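The relative delay improvement follows directly from the delay row of the comparison table. The short Python sketch below only restates that arithmetic; the dictionary layout and the function name `delay_reduction_percent` are assumptions made for the example.

```python
# Delay in ns per operand width: (existing design, proposed design), from the comparison table.
delays_ns = {8: (2.384, 1.622), 16: (8.003, 7.952), 32: (6.170, 4.486)}


def delay_reduction_percent(existing, proposed):
    """Relative delay reduction of the proposed design, in percent."""
    return 100.0 * (existing - proposed) / existing


reductions = {bits: delay_reduction_percent(*pair) for bits, pair in delays_ns.items()}
for bits, value in reductions.items():
    print(f"{bits}-bit: {value:.1f}% lower delay")
```

With these values the reduction is about 32.0% for the 8-bit design, 0.6% for the 16-bit design and 27.3% for the 32-bit design.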

5 Conclusion

In this work, 8-, 16- and 32-bit signed multiplier designs are implemented using the Xilinx ISE Design
Suite, version 14.7. The existing signed multiplier computes the partial products in parallel and adds them
using a ripple carry adder in the accumulation stage, which limits its speed because every full adder must
wait for the output carry of the previous full adder. The proposed work replaces the ripple carry adder with
a parallel prefix adder, the Ladner-Fischer adder, in the accumulation stage and shows better performance
in terms of delay. The results show that the latency is reduced compared with the existing design at all
three bit widths, for example from 2.384 ns to 1.622 ns for the 8-bit multiplier.

References

1. S. Ullah, et al, “Area-optimized low-latency approximate multipliers for FPGA-based hardware
accelerators,” in DAC (IEEE, San Francisco, CA, USA, 2018).
2. I. Kuon, J. Rose, “Measuring the gap between FPGAs and ASICs,” in IEEE TCADICS (IEEE, 2007).
3. Ravindra P Rajput, M. N Shanmukha Swamy, “High speed Modified Booth Encoder multiplier for
signed and unsigned numbers,” in International Conference on Computer Modelling and Simulation
(IEEE, Cambridge, UK, 2012).
4. Yamini devi Ykuntam, Katta Pavani, Krishna Saladi, “Design and analysis of High-speed Wallace tree
multiplier using parallel prefix adders for VLSI circuit designs,” in ICCCNT (IEEE, Kharagpur, India,
2020).
5. G.C. Cardarilli, S. Pontarelli, M. Re, A. Salsano, “On the use of Signed Digit Arithmetic for the new 6-
inputs LUT based FPGAs,” in ICECS (IEEE, Saint Julian’s, Malta, 2008).
6. Shilpa K. C and Shwetha M, Geetha B. C, Lohitha D. M, Navya and Pramod N. V, “Performance
Analysis of Parallel Prefix Adder for data path VLSI design,” in ICICCT (IEEE, Coimbatore, India,
2018).
7. Aung Myo San, Alexey N. Yakunin, “Reducing the Hardware Complexity of a Parallel Prefix Adder,”
in IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (IEEE,
Moscow and St. Petersburg, Russia, 2018).
8. David H. K. Hoe, Chris Martinez and Sri Jyothsna Vundavalli, “Design and Characterization of Parallel
Prefix Adders using FPGAs,” in SSST (IEEE, Auburn, AL, USA, 2011).
9. Chris D. Martinez, L. P. Deepthi Bollepalli, and David H. K. Hoe, “A Fault Tolerant Parallel Prefix
Adder for VLSI and FPGA Design,” in SSST (IEEE, Jacksonville, FL, USA, 2012).
10. V. Gupta, et al, “Low-Power Digital Signal Processing Using Approximate Adders,” in IEEE
Transactions on CAD of Integrated Circuits and Systems, vol. 32, pp. 124-137, 2013.
11. Jamal, K., & Srihari, P., “Analysis of test sequence generators for built-in-self-test implementation,” in
IEEE 2015 International Conference on Advanced Computing and Communication Systems, pp. 1-4,
2015.
12. Honglan Jiang, Jie Han, Fei Qiao, Fabrizio Lombardi, “Approximate Radix-8 Booth Multipliers for Low
Power and High-Performance operation,” in IEEE Transactions on Computers, vol. 65, pp. 2638-2644,
2015.
13. Jamal, K., & Srihari, P., “Low power TPC using BSLFSR,” in International Journal of Engineering and
Technology (IJET), 8(2), 759-e, 2016.
14. Jamal, K., Srihari, P., & Kanakasri, G., “Test Vector Generation using Genetic Algorithm for Fault
Tolerant Systems,” in International Journal of Control Theory and Applications (IJCTA), 9(12), pp.
5591-5598, 2016.
15. A. Kakacak, et al, “Fast multiplier generator for FPGAs with LUT based partial product generation and
column/row compression,” in Integr. VLSI J., vol. 57, pp. 147-157, 2017.
16. D. Kalaiyarasi, M. Saraswathi, “Design of an Efficient High Speed Radix-4 Booth Multiplier for both
Signed and Unsigned Numbers,” in AEEICB (IEEE, Chennai, India, 2018).
17. Jamal, K., Srihari, P., Chari, K. M., & Sabitha, B., “Low power test pattern generation using test-per-
scan technique for BIST implementation,” in ARPN Journal of Engineering and Applied Sciences, vol.
13(8), 2018.
18. Haroon Waris, Chenghua Wang, and Weiqiang Liu, “Hybrid Low Radix Encoding based Approximate
Booth Multipliers,” in IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 67, pp. 3367-
3371, 2020.
19. Salim Ullah, Hendrik Schmidl, Siva Satyendra Sahoo, Semeen Rehman and Akash Kumar, “Area-
optimized Accurate and Approximate Softcore Signed Multiplier Architectures,” in IEEE Transactions
on Computers, vol. 70, pp. 384-392, 2020.
20. Salim Ullah, Semeen Rehman, Muhammad Shafique, Akash Kumar, “High-Performance Accurate and
Approximate Multipliers for FPGA-based Hardware Accelerators,” in IEEE TCAD, vol. 41, pp. 211-
224, 2021.
