Booth Vs Array Mul
FPGA Implementation
by
Anantha Gunturu
Master of Science
in the
Electrical Engineering
Program
December 2019
Analysis of Booth’s Multiplier Algorithm vs Array Multiplier Algorithm and their
FPGA Implementation
Anantha Gunturu
I hereby release this thesis to the public. I understand that this thesis will be made
available from the OhioLINK ETD Center and the Maag Library Circulation Desk for
public access. I also authorize the University or other individuals to make copies of this
thesis as needed for scholarly research.
Signature:
____________________________________
Anantha Gunturu, Student Date
Approvals:
__________________________________
Frank X Li, Thesis Advisor Date
__________________________________
Edward Burden, Committee Member Date
__________________________________
Eric MacDonald, Committee Member Date
__________________________________
Dr. Salvatore A. Sanders, Dean of Graduate Studies Date
ABSTRACT
The purpose of this study is to understand the Booth’s Multiplier algorithm for a 32-bit
input and compare its performance with an Array Multiplier algorithm for a 32-bit input.
The analysis involves implementing the developed VHDL design on an FPGA to
understand and compare the performance of these multiplier algorithms. Efficient
algorithms for signal processing are critical to very large-scale future applications such as
video processing and four-dimensional medical imaging. Similarly, efficient algorithms
are important for embedded and power-limited applications since, by reducing the
number of computations, power consumption can be reduced considerably.
After comparing the implementations of both the 32-bit Array and Booth multipliers on a
Cyclone V FPGA, it was concluded that the Booth multiplier uses 56 Logic
Elements versus 1,719 Logic Elements for the Array multiplier. Both multipliers
showed comparable calculation performance.
ACKNOWLEDGMENT
I would like to express my sincere gratitude and thanks to my advisor, Dr. Frank Li, for
his consideration, support and guidance throughout this thesis without which this study
would not have been possible. I thank Dr. Eric MacDonald and Professor Edward Burden
for participating in this thesis committee and providing their valuable feedback. I thank
the rest of the faculty of the Electrical and Computer Engineering department for their
teachings and support as I pursued this milestone in my career. I would also like to thank
my friends and family for their love, support, and confidence in me.
CHAPTER 1
INTRODUCTION
1.1 Organization
This thesis is organized into 5 chapters. This chapter discusses the motivation and
purpose of this thesis. Chapter 2 provides background information on binary multipliers
and their applications and discusses recent significant research in this field. Chapter 3
discusses the methods of analysis used during this thesis. Chapter 4 discusses the
simulation results obtained from the Altera Quartus II software version 18.1. Chapter 5
concludes the analysis from implementing the subject multiplier algorithms on a DE10
Standard FPGA hardware.
1.2 Motivation
Efficient algorithms for signal processing are critical to very large-scale future
applications such as video processing and four-dimensional medical imaging. Similarly,
efficient algorithms are important for embedded and power-limited applications since, by
reducing the number of computations, power consumption can be reduced considerably.
Multiplication is a crucial operation in several Digital Signal Processing (DSP)
applications involving convolution, Fast Fourier Transform (FFT) and in the Arithmetic
and Logic Unit (ALU) of microprocessors. Several Very Large-Scale Integration (VLSI)
design criteria such as the area, power dissipation, speed and cost are dependent on the
performance of the multipliers that execute the multiplication operation. Understanding
the performance aspects of various multipliers would ultimately help in designing
efficient algorithms that execute these multiplication operations.
1.3 Purpose and Objective
The purpose of this thesis is to understand and study the Radix-2 Booth’s Multiplier
algorithm for a 32-bit input and compare its performance with that of an Array Multiplier
algorithm for a 32-bit input. The study involves implementing the developed VHDL
design on a DE10 Standard FPGA to understand and compare the performance of these
multiplier algorithms. Altera Prime Lite Quartus II version 18.1 is chosen for simulation
of the models. DE10 Standard FPGA development board by Terasic Technologies will be
used for the hardware implementation of these VHDL models.
End of Chapter 1
CHAPTER 2
LITERATURE REVIEW
2.1 Background
        12125
      x 13134
    ---------
        48500    // this is 12125 x 4
       36375     // this is 12125 x 3, shifted 1 position to left
      12125      // this is 12125 x 1, shifted 2 positions to left
     36375       // this is 12125 x 3, shifted 3 positions to left
    12125        // this is 12125 x 1, shifted 4 positions to left
    ---------
    159249750    // this is the result of the 12125 x 13134 operation upon addition of all partial products
It is to be noted that computing the partial products could also involve the addition of the
carry when applicable, to the next partial product in the process of multiplication. The
standard decimal system multiplication process applies to binary system as well, although
it is simpler than the decimal system as there is no table of basic multiplications to
remember.
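The partial-product procedure illustrated above can be sketched in a few lines of Python. This is only an illustrative software model, not part of the thesis design; the function name `partial_products` and its `base` parameter are assumptions introduced here. It lists each shifted partial product and shows that the same routine works for base 2.

```python
def partial_products(multiplicand: int, multiplier: int, base: int = 10) -> list:
    """Return the shifted partial products of schoolbook multiplication.

    Each digit of the multiplier (least significant first) is multiplied by
    the whole multiplicand and shifted left by the digit's position.
    """
    products = []
    position = 0
    while multiplier > 0:
        digit = multiplier % base
        products.append(multiplicand * digit * base ** position)
        multiplier //= base
        position += 1
    return products


# Decimal example from the text: 12125 x 13134
rows = partial_products(12125, 13134)
for shift, row in enumerate(rows):
    print(f"digit position {shift}: {row}")
print("sum of partial products:", sum(rows))

# The same routine in base 2: every partial product is either 0 or a
# shifted copy of the multiplicand, so no multiplication table is needed.
print(partial_products(0b1011, 0b0101, base=2))
```

In base 2 each digit is 0 or 1, which is why a single AND gate per bit pair is enough to form a partial product.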
Another difficulty with the traditional multiplication style is that it handles the sign of
a number with a separate rule, whereas digital processing units encode the sign of each input
within the number itself using the 2’s complement technique. This complicates
the process and often requires adjustments to the processor to accept and handle such
inputs.
2.2 Multipliers
A binary multiplier is an electronic circuit built using binary adders. The multiplication
operation is executed using a sequence of shifting, accumulating and adding the partial
products as explained in section 2.1.
For an n-bit multiplier and an m-bit multiplicand, the resultant product is n + m bits long. The
generation of the n partial products requires n*m two-input AND gates, and summing the n
partial products in a combinational array takes n - 1 rows of adders.
With this understanding of the multiplication process, let us
now design a 2-bit multiplier. A multiplier with 2-bit inputs produces a 4-bit
product. Below are the circuit and truth table representations of a 2x2-bit
multiplier.
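As a rough illustration of the 2x2-bit case, the following Python sketch models the usual textbook gate arrangement (four AND gates and two half adders) and prints its truth table. The gate arrangement and the names `half_adder` and `multiply_2x2` are assumptions made for this sketch; the thesis circuit figure may be drawn differently.

```python
def half_adder(x: int, y: int):
    """Return (sum, carry) of two single bits."""
    return x ^ y, x & y


def multiply_2x2(a1: int, a0: int, b1: int, b0: int):
    """Gate-level 2x2 multiplier: returns product bits (p3, p2, p1, p0)."""
    p0 = a0 & b0                            # lowest partial-product bit
    p1, c1 = half_adder(a1 & b0, a0 & b1)   # middle column
    p2, p3 = half_adder(a1 & b1, c1)        # top column plus carry
    return p3, p2, p1, p0


# Print the truth table: inputs a1 a0 b1 b0, outputs p3 p2 p1 p0
for a in range(4):
    for b in range(4):
        bits = multiply_2x2((a >> 1) & 1, a & 1, (b >> 1) & 1, b & 1)
        print(f"{a:02b} x {b:02b} = {''.join(str(bit) for bit in bits)}")
```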
2.3 Multiplier Algorithms
This section introduces some of the multiplier algorithms popularly used in various signal
and image processing applications.
A sequential multiplier employs a sequential circuit using a single n-bit adder to compute the
product of two binary numbers, X and Y, of n-bit and m-bit length respectively. This
sequential circuit processes the partial products one at a time and repeats the process m
times. In each step a partial product is generated and added to an accumulated
partial sum, and the resulting partial sum is shifted to align it
with the partial product of the next step. Therefore, each step of a sequential multiplication
consists of three operations: generating a partial product, adding it
to the accumulated partial sum, and shifting the partial sum.
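The three operations of each step (generate, add, shift) can be modelled in software as follows. This is a minimal sketch for unsigned inputs, assuming the common arrangement in which the multiplicand is added into the upper part of the accumulator and the accumulator is shifted right; the function name `sequential_multiply` is hypothetical.

```python
def sequential_multiply(x: int, y: int, n: int, m: int) -> int:
    """Unsigned shift-and-add multiply of an n-bit X by an m-bit Y.

    One partial product is handled per step, for m steps in total.
    """
    assert 0 <= x < (1 << n) and 0 <= y < (1 << m)
    acc = 0                 # accumulated partial sum
    multiplier = y
    for _ in range(m):
        if multiplier & 1:  # generate: partial product is X or 0
            acc += x << m   # add: X goes into the upper part of the sum
        acc >>= 1           # shift: realign for the next partial product
        multiplier >>= 1    # move on to the next multiplier bit
    return acc


print(sequential_multiply(13, 11, 4, 4))  # 13 x 11
```

A contribution added at step i is shifted right m - i times in total, so it ends up weighted by 2^i, which is exactly the weight of multiplier bit i.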
Combinational multipliers are used to perform multiplication of two unsigned or signed binary numbers.
Given two n-bit inputs X and Y, it is possible to express the 2n-bit product as a
combinational function P = X.Y. Such multipliers use the technique of partial product
accumulation. Each bit of the multiplier is multiplied by the multiplicand, the
partial product is aligned according to the position of the bit within the multiplier, and the
resulting partial products are then added to form the result. If the multiplier bit is a 1, the partial product
is a shifted copy of the multiplicand; if the multiplier bit is a 0, the partial product is 0.
The array multiplier algorithm is very similar to the traditional multiplication process based on the
add-and-shift technique followed in any number system. It employs an array of full adders
and half adders for the computation of the product. The process involves multiplying each
bit of the multiplier with the entire multiplicand input. These individual multiplications
produce multiple partial products, each shifted by one more position than the last, and all
the partial products are finally added to obtain the result of the multiplication. Refer to section 2.1 for
an illustration.
The below figure shows the multiplication process through the generation of the partial
products and their sum that becomes the result of the multiplication. The example
considered below is a 4x4 input that results in an 8-bit product. p0 to p7 indicate the
product bits, each formed as the sum of the appropriate partial products a_i b_j, where
i, j = 0 to 3.
The below figure shows the implementation of the above discussed 4x4 array multiplier
using a combination of half adders and full adders. These adders execute the sum of
partial products to form the result of the multiplication.
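The adder array described above can be approximated in software by an AND-gate matrix followed by rows of ripple-carry full adders. The sketch below is a behavioural model under that assumption, not a gate-for-gate copy of the thesis figure; `full_adder` and `array_multiply` are names introduced here. A full adder whose carry input is 0 behaves like the half adder used at the end of each row.

```python
def full_adder(x: int, y: int, cin: int):
    """Return (sum, carry_out) of three single bits."""
    s = x ^ y ^ cin
    cout = (x & y) | (cin & (x ^ y))
    return s, cout


def array_multiply(a: int, b: int, n: int = 4) -> int:
    """Unsigned n x n multiply using an AND matrix and rows of adders."""
    a_bits = [(a >> j) & 1 for j in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    # AND-gate matrix: pp[i][j] is the partial-product bit a_j AND b_i
    pp = [[a_bits[j] & b_bits[i] for j in range(n)] for i in range(n)]

    product_bits = []
    row_sum = pp[0] + [0]                # running sum, LSB first, n + 1 bits
    for i in range(1, n):
        product_bits.append(row_sum[0])  # lowest bit is already final
        upper = row_sum[1:]              # n bits aligned with row i
        carry = 0
        new_sum = []
        for j in range(n):               # one row of ripple-carry adders
            s, carry = full_adder(upper[j], pp[i][j], carry)
            new_sum.append(s)
        row_sum = new_sum + [carry]
    product_bits.extend(row_sum)         # remaining n + 1 high bits
    return sum(bit << k for k, bit in enumerate(product_bits))


print(array_multiply(0b1101, 0b1011))    # 13 x 11 with 4-bit inputs
```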
2.3.4 Booth’s Multiplier Algorithm
This algorithm is a powerful and efficient way to compute the product of
two signed binary numbers in two's complement notation. It examines
adjacent pairs of bits of the N-bit multiplier Q, in signed two's complement
representation, including an implicit bit below the least significant bit, Q_{-1} = 0. For
i = 0 to N-1, where the two bits Q_i and Q_{i-1} are equal, the product accumulator P is left unchanged;
where Q_i = 0 and Q_{i-1} = 1, the multiplicand times 2^i is added to P; and where Q_i = 1 and
Q_{i-1} = 0, the multiplicand times 2^i is subtracted from P. The final value of P is the signed
product. The order of the steps is not fixed. Typically, the algorithm proceeds from
LSB to MSB, starting at i = 0; the multiplication by 2^i is then replaced by
incremental shifting of the P accumulator to the right between steps; low bits can be
shifted out, and subsequent additions and subtractions can then be done on just the
highest N bits of P. Below is a flowchart representation of this algorithm.
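The add/subtract rule above translates almost directly into code. The following Python sketch is a software model of the rule only (it uses unbounded integers rather than an N-bit accumulator with shifting); the function name `booth_multiply` is an assumption made for illustration.

```python
def booth_multiply(multiplicand: int, multiplier: int, n: int) -> int:
    """Radix-2 Booth multiplication of signed integers.

    `multiplier` is read as an n-bit two's complement pattern Q. For each
    bit pair (Q_i, Q_{i-1}), with Q_{-1} = 0:
      (0, 1) -> add      multiplicand * 2^i
      (1, 0) -> subtract multiplicand * 2^i
      equal  -> leave the accumulator unchanged
    """
    q = multiplier & ((1 << n) - 1)   # n-bit two's complement pattern
    p = 0                             # product accumulator P
    prev = 0                          # implicit bit Q_{-1}
    for i in range(n):
        cur = (q >> i) & 1
        if cur == 0 and prev == 1:
            p += multiplicand << i
        elif cur == 1 and prev == 0:
            p -= multiplicand << i
        prev = cur
    return p


print(booth_multiply(-7, 3, 4))
print(booth_multiply(-13, -9, 32))
```

Runs of equal bits cost nothing, so a multiplier such as 0111 1000 needs only one subtraction and one addition instead of four additions.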
The below figure shows the architecture of Radix-2 Booth’s Algorithm implementation.
Let us understand the working of Booth’s algorithm using an example. Consider the
multiplicand A = -7 and the multiplier Q = +3. The working of the algorithm can be
represented as a tracing table showing the status at each phase of the
computation. In this case, input A is a negative number, so its 2’s
complement representation is used in the computation:
A = (-7)10 = (1001)2 while (-A) = (+7)10 = (0111)2
At the final stage, when n-1 = 0, the result in PQ = 11101011. The sign bit of PQ is 1, so this is a
negative number, and its magnitude is obtained by taking its 2’s complement:
(00010101)2 = (21)10. Booth’s algorithm preserves the sign of the result, so the product is
(-21)10.
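The register-level behaviour of this example can be traced with a short Python model. It is a sketch under the usual textbook arrangement (an n-bit accumulator P, the multiplier register Q and the extra bit Q_{-1}, shifted right arithmetically as one unit); the names `booth_trace` and `to_signed` are hypothetical, and the n-bit accumulator is assumed not to overflow, which holds for these inputs.

```python
def booth_trace(a: int, q: int, n: int):
    """Trace radix-2 Booth multiplication with n-bit registers.

    Returns (rows, pq) where rows lists (step, operation, P, Q, Q_-1)
    after each shift and pq is the final 2n-bit content of P:Q.
    """
    mask = (1 << n) - 1
    a &= mask                        # two's complement pattern of multiplicand
    p, q, q_1 = 0, q & mask, 0
    rows = []
    for step in range(n):
        pair = (q & 1, q_1)
        if pair == (1, 0):
            p = (p - a) & mask
            op = "P = P - A"
        elif pair == (0, 1):
            p = (p + a) & mask
            op = "P = P + A"
        else:
            op = "no add"
        # arithmetic shift right of P:Q:Q_-1 by one position
        q_1 = q & 1
        q = ((q >> 1) | ((p & 1) << (n - 1))) & mask
        p = (p >> 1) | (p & (1 << (n - 1)))
        rows.append((step, op, format(p, f"0{n}b"), format(q, f"0{n}b"), q_1))
    return rows, (p << n) | q


def to_signed(pattern: int, bits: int) -> int:
    """Interpret a bit pattern as a two's complement number."""
    return pattern - (1 << bits) if pattern >> (bits - 1) else pattern


rows, pq = booth_trace(-7, 3, 4)
for row in rows:
    print(row)
print(format(pq, "08b"), to_signed(pq, 8))
```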
2.3.5 Significant Improvements
a. Booth’s Multiplication Algorithm
There have been significant improvements to Booth’s Multiplication algorithm in which
the number of bits grouped per step is increased, thereby reducing the number of
computation stages. These strategies have proven to greatly improve the performance of
the multipliers and eventually improve the efficiency of signal processing applications.
Table 3 lists the bit grouping and the corresponding operation in the Radix-4, Radix-8, and
Radix-16 types of Booth’s Multiplication algorithm. A similar strategy has also been
followed in developing Radix-32, Radix-128, Radix-256 and even Radix-4096
multipliers, whose further research and implementation have been proposed for optimal
application design.
b. Array Multiplier
Although Array Multipliers are not the top preference for signal processing
applications, there is ongoing research into improving the efficiency of
these multipliers. The use of compressors has been proposed to greatly reduce the number of
half and full adders, thereby reducing the power consumption. 4:2 compressors are
now considered basic components in the design of parallel multipliers. The circuit is called a
compressor because it compresses four partial products into two. Work on making
Array Multipliers applicable to signed inputs has also been proposed.
Radix 4                     Radix 8                     Radix 16
Code  Operation             Code  Operation             Code   Operation
000    0                    0000   0                    00000   0
001    1 * Multiplicand     0001   1 * Multiplicand     00001   1 * Multiplicand
010    1 * Multiplicand     0010   1 * Multiplicand     00010   1 * Multiplicand
011    2 * Multiplicand     0011   2 * Multiplicand     00011   2 * Multiplicand
100   -2 * Multiplicand     0100   2 * Multiplicand     00100   2 * Multiplicand
101   -1 * Multiplicand     0101   3 * Multiplicand     00101   3 * Multiplicand
110   -1 * Multiplicand     0110   3 * Multiplicand     00110   3 * Multiplicand
111    0                    0111   4 * Multiplicand     00111   4 * Multiplicand
                            1000  -4 * Multiplicand     01000   4 * Multiplicand
                            1001  -3 * Multiplicand     01001   5 * Multiplicand
                            1010  -3 * Multiplicand     01010   5 * Multiplicand
                            1011  -2 * Multiplicand     01011   6 * Multiplicand
                            1100  -2 * Multiplicand     01100   6 * Multiplicand
                            1101  -1 * Multiplicand     01101   7 * Multiplicand
                            1110  -1 * Multiplicand     01110   7 * Multiplicand
                            1111   0                    01111   8 * Multiplicand
                                                        10000  -8 * Multiplicand
                                                        10001  -7 * Multiplicand
                                                        10010  -7 * Multiplicand
                                                        10011  -6 * Multiplicand
                                                        10100  -6 * Multiplicand
                                                        10101  -5 * Multiplicand
                                                        10110  -5 * Multiplicand
                                                        10111  -4 * Multiplicand
                                                        11000  -4 * Multiplicand
                                                        11001  -3 * Multiplicand
                                                        11010  -3 * Multiplicand
                                                        11011  -2 * Multiplicand
                                                        11100  -2 * Multiplicand
                                                        11101  -1 * Multiplicand
                                                        11110  -1 * Multiplicand
                                                        11111   0
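Every entry in the table above follows one rule: the top bit of the group carries a negative weight, the remaining multiplier bits carry their usual binary weights, and the extra low-order bit (the overlap with the previous group) adds one. The Python sketch below regenerates the table from that rule; the function name `booth_digit` and the parameter `k` (the number of multiplier bits per group, so radix = 2^k) are assumptions introduced for illustration.

```python
def booth_digit(code: int, k: int) -> int:
    """Multiple of the multiplicand selected by a (k + 1)-bit Booth code.

    The code holds k multiplier bits followed by one overlap bit taken
    from the group below (radix 4: k = 2, radix 8: k = 3, radix 16: k = 4).
    """
    overlap = code & 1                   # lowest bit, shared with group below
    bits = code >> 1                     # the k multiplier bits of this group
    top = (bits >> (k - 1)) & 1          # most significant bit, weight -2^(k-1)
    low = bits & ((1 << (k - 1)) - 1)    # remaining bits, usual weights
    return -top * (1 << (k - 1)) + low + overlap


for k, name in ((2, "Radix 4"), (3, "Radix 8"), (4, "Radix 16")):
    print(name)
    for code in range(1 << (k + 1)):
        print(f"  {code:0{k + 1}b} -> {booth_digit(code, k):+d} * Multiplicand")
```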
End of Chapter 2
CHAPTER 3
DESIGN AND SIMULATION
This report emphasizes studying the Radix-2 Booth’s Multiplier and its comparison with
the Array Multiplier. As a part of this study, VHDL models have been built to simulate
and analyze the performance of these multipliers. Altera Prime Lite Quartus II version
18.1 was used for simulations.
library IEEE;
USE IEEE.std_logic_1164.ALL;
use IEEE.std_logic_unsigned.all;
entity Boothsmult is
port ( clk, st: in std_logic;
Mplier, Mcand : in std_logic_vector (31 downto 0);
Done : out std_logic;
Product : out std_logic_vector (62 downto 0) );
end BoothsMult;
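-- The declarative part is not reproduced in this listing; the declarations
-- below are reconstructed from how the signals are used and are an assumption.
architecture BoothsMult_arch of BoothsMult is
signal state : integer range 0 to 2 := 0;
signal Counter : integer range 0 to 31 := 0;
signal Acc, RegB, Addout : std_logic_vector (32 downto 0);
signal RegC, Compout : std_logic_vector (31 downto 0);
signal Co : std_logic;
alias B0 : std_logic is RegB(0);
alias B1 : std_logic is RegB(1);
begin
Product <= Acc(30 downto 0) & RegB (32 downto 1);
Co <= B1 and not B0; -- B1B0 = 10: subtract, i.e. add the 2's complement of RegC to ACC
Compout <= not RegC when Co = '1' else RegC;
-- std_logic_vector'(0 => Co) is the one-bit vector (Co); the addition zero-extends it,
-- supplying the +1 that completes the 2's complement
Addout <= Acc + (Compout(31) & Compout) + std_logic_vector'(0 => Co);
Process(clk)
begin
if clk'event and clk = '1' then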
case state is
when 0 => if St = '1' then state <= 1; -- load operation
Done <= '0';
ACC <= (others => '0');
RegB <= Mplier & '0';
RegC <= Mcand;
else state <= 0 ;
end if;
when 1 => if (B1 xor B0) = '1' then -- shift operation
ACC <= Addout; state <=2;
else
ACC <= ACC(32) & ACC(32 downto 1);
RegB <= Acc(0) & RegB(32 downto 1);
if Counter /= 31 then
Counter <= Counter +1; state <= 1;
else
Counter <= 0; state <= 0;Done <= '1';
end if;
end if;
when 2 => if Counter /= 31 then
Counter <= Counter +1; state <= 1;
else
Counter <= 0; state <= 0; Done <= '1';
end if;
ACC <= ACC(32) & ACC(32 downto 1);
RegB <= ACC(0) & RegB(32 downto 1);
end case;
end if;
end process;
end BoothsMult_arch;
3.2 Array Multiplier Design and Simulation
entity ArrayMult32 is
port(X, Y: in std_logic_vector(31 downto 0); --32-bit inputs
P: out std_logic_vector(63 downto 0)); --64-bit output
end ArrayMult32;
begin
--Generate AND gates and signals
ANDgen1: for j in 0 to 31 generate --For X input
ANDgen2: for k in 0 to 31 generate --For Y input
XY(j,k) <= X(j) and Y(k); --And each X and Y input bit, store in XY matrix
End generate;
End generate;
----Row 1 special case, 30 full adders with half adder on each end
FA_loopR1 : for col in 2 to 31 generate --Instantiates 30 copies of Full Adder for Row 1
FA_R1_col : FullAdder10ns port map (XY(0,col), XY(1,col - 1), C(1,col),
C(1,col+1), S(2,col-1));
End generate;
-----Rows 2 to 30
FA_loopR2_30 : for row in 2 to 30 generate --Instantiate Rows 2 thru 30
FA_loopR2 : for col in 2 to 31 generate --Instantiates 30 copies of Full Adder each
FA_row_col : FullAdder10ns port map (S(row,col), XY(row,col-1), C(row,col),
C(row,col+1), S(row+1,col-1));
End generate;
End generate;
--Half Adders (n half adders, 32 total): 30 generated here, 2 added in row 1 elsewhere
HA_loopC1 : for row in 2 to 31 generate --Instantiates 30 copies of Half Adder for Column 1
HA_row_C1 : HalfAdder10ns port map (S(row,1), XY(row,0), C(row,2), P(row));
End generate;
--end half adders column 1
end Behavioral;
Following is the declaration of a Half Adder included as a code block in the Array
Multiplier Quartus II project.
library IEEE;
use IEEE.std_logic_1164.all;
--Half Adder Entity Description
entity HalfAdder10ns is
port(X, Y: in std_logic;
Cout, Sum: out std_logic);
end HalfAdder10ns;
Following is the declaration of a Full Adder included as a code block in the Array
Multiplier Quartus II project.
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
entity FullAdder10ns is
Port (X, Y, Cin: in std_logic;
Cout, Sum: out std_logic);
end FullAdder10ns;
begin
end gate_level;
Following are the results from the behavioral simulation of the above VHDL design and
its test bench:
It was also observed that when the delay in the adders has been omitted, the array
multiplier results were computed with an insignificant delay.
Figure 9. Array Multiplier Modelsim simulation for unsigned inputs – no adder delay
The original Array Multiplier is intended for unsigned inputs only. When a signed input
is involved, it is observed that the algorithm still considers it as an unsigned input
(usually a large decimal) and results in a product accordingly.
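The effect described above can be reproduced with plain integer arithmetic. In this Python sketch (the helper names `as_unsigned` and `as_signed` are assumptions introduced here), a 32-bit pattern for -13 is handed to an unsigned multiply, which treats it as a large positive number.

```python
def as_unsigned(value: int, bits: int = 32) -> int:
    """Bit pattern of `value` read as an unsigned number."""
    return value & ((1 << bits) - 1)


def as_signed(pattern: int, bits: int = 32) -> int:
    """Bit pattern read as a two's complement number."""
    return pattern - (1 << bits) if pattern >> (bits - 1) else pattern


x = as_unsigned(-13)   # what an unsigned multiplier sees for -13
y = as_unsigned(9)
product = x * y        # full 64-bit unsigned product

print(x)                        # large positive value, not -13
print(product)                  # unsigned 64-bit result
print(as_signed(product, 64))   # still positive when read as 64-bit signed
```

Only the lower 32 bits of this product agree with the signed result; the upper half differs, which is why an unsigned array multiplier cannot be used directly for signed operands.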
End of Chapter 3
CHAPTER 4
DESIGN AND IMPLEMENTATION
This report emphasizes studying the performance of the Radix-2 Booth’s Multiplier and
its comparison with the Array Multiplier. As a part of this study, the VHDL models of
these multipliers were implemented on the FPGA hardware. Altera Prime Lite Quartus II
version 18.1 was used for simulation and implementation of the models. The DE10 Standard
FPGA development board by Terasic Technologies was used for the hardware
implementation of these VHDL models. The FPGA was configured with these
design modules using the Joint Test Action Group (JTAG) mode. JTAG is an industry
standard for testing the hardware implementation of integrated designs and the
interconnects between integrated circuits (ICs) on printed circuit boards (PCBs).
Figure 12. Terasic DE10 Standard FPGA Board
Resources                        Characteristics
Logic Elements                   110k
ALM                              41910
Register                         166036
Memory (Kb) M10K                 5570
Memory (Kb) MLAB                 621
Variable Precision DSP block     112
18x18 multiplier                 224
FPGA PLL                         6
HPS PLL                          3
3 Gbps Transceiver               9
FPGA GPIO                        288
HPS I/O                          181
LVDS Transmitter                 72
LVDS Receiver                    72
PCIe Hard IP Block               2
FPGA Hard Memory Controller      1
HPS Hard Memory Controller       1
ARM Cortex-A9 MPCore Processor   Dual-Core
Table 4. Intel Cyclone V SX 5CSXFC6D6F31C6N Specifications
4.2. FPGA Design Flow
The standard FPGA design flow begins with the creation of the digital circuit design
using schematics or a hardware description language (HDL) such as Verilog or VHDL.
This digital circuit design flow then proceeds through compilation, simulation,
programming and implementation on the FPGA hardware.
4.3.a Implementation:
library IEEE;
USE IEEE.std_logic_1164.ALL;
use IEEE.std_logic_unsigned.all;
entity implementation is
port ( Clock_50 : in std_logic;
key : in std_logic; -- to enable st signal
SliderSwitch : in std_logic_vector (7 downto 0); -- to input the multiplicand and the multiplier. assignment from sw[0] to sw[7].
seg71, seg72, seg73: out STD_LOGIC_VECTOR (6 downto 0);
LEDR: out STD_LOGIC ); -- to indicate done signal
end implementation;
-- signals for the 7 segment display
signal prod: std_logic_vector (7 downto 0):= "00000000";
signal bcd_1, bcd_2, bcd_3 : std_logic_vector (3 downto 0);
component Boothsmult is
port ( clk, st: in std_logic;
Mplier, Mcand : in std_logic_vector (31 downto 0);
Done : out std_logic;
Product : out std_logic_vector (62 downto 0) );
end component;
component hex_seg7 is
Port (product : in STD_LOGIC_VECTOR (3 downto 0);
seg7 : out STD_LOGIC_VECTOR (0 to 6) );
end component;
component binary_bcd is
Port ( binary : in std_logic_vector (7 downto 0);
hundreds : out std_logic_vector (3 downto 0);
tenths : out std_logic_vector (3 downto 0);
unit : out std_logic_vector (3 downto 0) );
end component ;
begin
end implement_arch;
4.3.b Binary to BCD conversion
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;
entity binary_bcd is
Port ( binary : in std_logic_vector (7 downto 0);
hundreds : out std_logic_vector (3 downto 0);
tenths : out std_logic_vector (3 downto 0);
unit : out std_logic_vector (3 downto 0) );
end binary_bcd;
begin
num := unsigned(binary);
unity := X"0";
tenth := X"0";
hundred := X"0";
-- Loop eight times. If the numerical value of the alias is greater than 4 (that is, 5 or more),
-- then per the shift-and-add-3 algorithm the alias is incremented by 3,
-- and then the contents of the shift register are shifted to the left by 1 place.
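The loop body of the conversion is not reproduced in this listing. As a rough software equivalent of the shift-and-add-3 procedure described in the comments, the following Python sketch converts an 8-bit value to three BCD digits; it is an illustrative model with an assumed function name `binary_to_bcd`, not the thesis VHDL.

```python
def binary_to_bcd(value: int):
    """Convert an 8-bit binary value to (hundreds, tens, units) BCD digits
    using the shift-and-add-3 (double dabble) method."""
    assert 0 <= value <= 255
    hundreds = tens = units = 0
    for i in range(7, -1, -1):         # eight iterations, MSB first
        # Any BCD digit of 5 or more is incremented by 3 before the shift
        if hundreds >= 5:
            hundreds += 3
        if tens >= 5:
            tens += 3
        if units >= 5:
            units += 3
        # Shift the whole register left by one place, feeding in the next bit
        hundreds = ((hundreds << 1) | (tens >> 3)) & 0xF
        tens = ((tens << 1) | (units >> 3)) & 0xF
        units = ((units << 1) | ((value >> i) & 1)) & 0xF
    return hundreds, tens, units


print(binary_to_bcd(117))
print(binary_to_bcd(255))
```

Adding 3 before the shift makes a digit of 5 or more roll over into the next decade when it is doubled, which keeps every 4-bit group a valid decimal digit.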
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
entity hex_seg7 is
Port (product : in STD_LOGIC_VECTOR (3 downto 0);
seg7 : out STD_LOGIC_VECTOR (0 to 6) );
end hex_seg7;
Upon initial compiling of the top-level module and the code blocks, the pin assignment is
completed per the specifications listed in the DE10 Standard User Manual version March
20, 2018.
The project is finally compiled to verify the pin assignment and then the FPGA hardware
is configured in the JTAG mode using the Quartus II Programmer interface.
In the first case, multiplier = (-13)10 = (11111111111111111111111111110011)2
Multiplicand = (-9)10 = (11111111111111111111111111110111)2
Result = (117)10
Case 1
Case 2
Figure 16. Booth’s Multiplier Algorithm Hardware Implementation
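The bit patterns quoted for the first case can be checked with a small Python sketch. The helper names `to_twos_complement` and `from_twos_complement` are hypothetical and only serve this illustration; the 63-bit width used for the product matches the `Product` port of the Booth entity.

```python
def to_twos_complement(value: int, bits: int = 32) -> str:
    """Two's complement bit string of a signed integer."""
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def from_twos_complement(bit_string: str) -> int:
    """Signed integer represented by a two's complement bit string."""
    pattern = int(bit_string, 2)
    bits = len(bit_string)
    return pattern - (1 << bits) if bit_string[0] == "1" else pattern


multiplier = to_twos_complement(-13)
multiplicand = to_twos_complement(-9)
print(multiplier)
print(multiplicand)

product = from_twos_complement(multiplier) * from_twos_complement(multiplicand)
print(product, to_twos_complement(product, 63))
```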
4.4 Array Multiplier Design and Implementation
Additional code blocks to enable implementation of the Array Multiplier algorithm are
added to the initial design. These blocks include a top-level implementation code, a
binary to BCD conversion code and a BCD to Hexadecimal 7 segment display code.
Below are the VHDL codes for these blocks.
4.4.a Implementation:
library IEEE;
USE IEEE.std_logic_1164.ALL;
use IEEE.std_logic_unsigned.all;
entity implementation is
port ( SliderSwitch : in std_logic_vector (7 downto 0); -- to input the multiplicand and the multiplier. assignment from sw[0] to sw[7].
seg71, seg72, seg73: out STD_LOGIC_VECTOR (6 downto 0)
);
end implementation;
component ArrayMult32 is
port(X, Y: in std_logic_vector(31 downto 0); --32-bit inputs
P: out std_logic_vector(63 downto 0)); --64-bit output
end component;
component hex_7seg is
Port (product : in STD_LOGIC_VECTOR (3 downto 0);
seg7 : out STD_LOGIC_VECTOR (0 to 6) );
end component;
component binary_bcd is
Port ( binary : in std_logic_vector (7 downto 0);
hundreds : out std_logic_vector (3 downto 0);
tenths : out std_logic_vector (3 downto 0);
unit : out std_logic_vector (3 downto 0) );
end component ;
begin
Mplier_32 <= "0000000000000000000000000000" & SliderSwitch(3 downto 0);
Mcand_32 <= "0000000000000000000000000000" & SliderSwitch(7 downto 4);
end implement_arch;
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;
entity binary_bcd is
Port ( binary : in std_logic_vector (7 downto 0);
hundreds : out std_logic_vector (3 downto 0);
tenths : out std_logic_vector (3 downto 0);
unit : out std_logic_vector (3 downto 0) );
end binary_bcd;
begin
num := unsigned(binary);
unity := X"0";
tenth := X"0";
hundred := X"0";
-- Loop eight times. If the numerical value of the alias is greater than 4 (that is, 5 or more),
-- then per the shift-and-add-3 algorithm the alias is incremented by 3,
-- and then the contents of the shift register are shifted to the left by 1 place.
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
entity hex_7seg is
Port (product : in STD_LOGIC_VECTOR (3 downto 0);
seg7 : out STD_LOGIC_VECTOR (0 to 6) );
end hex_7seg;
when "0000"=> seg7 <="1000000"; -- '0'
when "0001"=> seg7 <="1111001"; -- '1'
when "0010"=> seg7 <="0100100"; -- '2'
when "0011"=> seg7 <="0110000"; -- '3'
when "0100"=> seg7 <="0011001"; -- '4'
when "0101"=> seg7 <="0010010"; -- '5'
when "0110"=> seg7 <="0000010"; -- '6'
when "0111"=> seg7 <="1111000"; -- '7'
when "1000"=> seg7 <="0000000"; -- '8'
when "1001"=> seg7 <="0011000"; -- '9'
when "1010"=> seg7 <="0001000"; -- 'A'
when "1011"=> seg7 <="0000011"; -- 'b'
when "1100"=> seg7 <="1000110"; -- 'C'
when "1101"=> seg7 <="0100001"; -- 'd'
when "1110"=> seg7 <="0000110"; -- 'E'
when "1111"=> seg7 <="0001110"; -- 'F'
when others => NULL;
end case;
end process;
end Behavioral;
Upon initial compiling of the top-level module and the code blocks, the pin assignment is
completed with reference to the DE10 Standard User Manual version March 20, 2018.
The project is then compiled a final time to verify the pin assignment, and the FPGA hardware is
configured in the JTAG mode using the Quartus II Programmer interface.
Figure 18. Quartus II Programmer interface for Array Multiplier
We discussed earlier that the Array Multiplier algorithm is capable of processing only
unsigned inputs and reviewed the simulation results when a signed as well as an unsigned
input combination is used in the algorithm. The below case shows the implementation of
two positive inputs.
End of Chapter 4
CHAPTER 5
CONCLUSION
The primary objective of this thesis has been to understand the functioning of binary
multipliers and to design their VHDL models in order to analyze and compare the performance of
the individual multipliers. These multipliers play a crucial role in the applications
that use them. Several digital signal processing
applications are based on the multiplication process. Hence the efficiency of these signal
processing applications relies greatly on the performance of these multiplication
algorithms. The goal in designing such critical blocks is to minimize the
space they ultimately occupy on the FPGA.
The subject multipliers of this report were the Booth’s Multiplication Algorithm and the
Array Multiplication Algorithm. The VHDL models were built considering a 32x32 bit
input to these multipliers. Altera Prime Lite Quartus II version 18.1 was used for
simulation and implementation of the models. DE10 Standard FPGA development board
by Terasic Technologies was used for the hardware implementation of these VHDL
models.
Logic utilization is calculated by estimating how many half-ALMs are needed to fit a
design and is a good representation of how full a device is. The logic utilization of the
Booth’s Multiplication algorithm was found to be about 3% of the logic utilization
of the Array Multiplication algorithm. Combinational ALUT usage is the actual number
of completely or partially used half-ALMs in the design after logic analysis and
synthesis. The Booth’s Multiplier needed only 2% of the total ALUTs needed for an
Array Multiplier. It was also observed that as the range of the inputs increases, the
complexity of implementing an Array multiplier increases as a result of an increase in the
number of levels of adders needed to compute the product. See Figures 22 and 23
for a representation of the implementation and design space required by the Booth’s and
Array multipliers respectively, as realized during this study. Therefore, the radix-2
type Booth’s Multiplication algorithm considered for this study proved to be more
area-efficient than the Array multiplier.
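The 3% figure can be reproduced from the Logic Element counts reported in the abstract (56 for the Booth multiplier, 1,719 for the Array multiplier). The Python sketch below also estimates how the cell count of an array multiplier grows with input width; the cell formula used (n^2 AND gates, n half adders, n(n - 2) full adders) is the common textbook count for an n x n array multiplier and is an assumption here, not a figure measured in this study.

```python
# Logic Element counts reported in the abstract
BOOTH_LOGIC_ELEMENTS = 56
ARRAY_LOGIC_ELEMENTS = 1719

ratio = BOOTH_LOGIC_ELEMENTS / ARRAY_LOGIC_ELEMENTS
print(f"Booth uses {ratio:.1%} of the Array multiplier's Logic Elements")


def array_multiplier_cells(n: int) -> dict:
    """Textbook cell count of an n x n array multiplier (assumed formula)."""
    return {
        "and_gates": n * n,
        "half_adders": n,
        "full_adders": n * (n - 2),
    }


# Growth of the adder array with input width
for width in (4, 8, 16, 32):
    print(width, array_multiplier_cells(width))
```

The quadratic growth of the adder array is consistent with the observation that the Array multiplier becomes more complex as the input range increases, while the sequential Booth design reuses one adder.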
BoothsMult ArrayMult
Figure 22. RTL View – Booth’s Multiplier Implementation
Based on the understanding of the performance of Radix-4, Radix-8 and Radix-16 type
Booth’s Multipliers, it can be assumed that the efficiency of signal processing
applications would be greatly improved as a result of lower logic utilization. It is
proposed that further study and implementation of higher radix order Booth’s multipliers
would benefit the efficiency of their applications.
Considering the outcome of this study and the assumptions drawn from it,
it can be noted that the modified Booth’s Algorithm holds promise for
designing future signal processing applications with increased efficiency, lower
power consumption and lower space utilization.
End of Chapter 5
BIBLIOGRAPHY
https://courses.cs.washington.edu/courses/cse467/15wi/docs/Quartus_II_Handbook.pdf
www/global/en_US/portal/dsn/42/doc-us-dsnbk-42-5505271707235-de10-standard-user-manual-sm.pdf
Retrieved from:
https://rocketboards.org/foswiki/pub/Documentation/DE10Standard/DE10-Standard_My_First_Fpga.pdf
4. DE10-Standard Getting Started Guide, Edition April 20, 2017. Retrieved from:
https://rocketboards.org/foswiki/pub/Documentation/DE10Standard/DE10-Standard_Getting_Started_Guide.pdf
5. Intel® Quartus® Prime Pro Edition User Guide, UG-20140 | Edition 2019.09.30.
Retrieved from:
https://www.intel.com/content/www/us/en/programmable/documentation/spj1513986956763.html
2018.
7. Bewick, Gary & Flynn, Michael. (1970). Fast Multiplication: Algorithms And
https://www.researchgate.net/publication/2575879_Fast_Multiplication_Algorithms_And_Implementation
8. Laxman S, Darshan Prabhu R, Mahesh S Shetty, Mrs. Manjula BM, Dr. Chirag
“Simulation Analysis of Binary Multipliers used in the MAC Unit of Digital Signal
End of Document