FPGA-Based Multiplier With A New Approximate Full Adder For Error-Resilient Applications

The document presents a novel FPGA-based multiplier utilizing an approximate full adder (FA) aimed at enhancing performance in error-resilient applications. The proposed multiplier demonstrates significant improvements in power efficiency and power-delay product (PDP), achieving reductions of 56.09% and 73.02%, respectively, compared to existing designs. It is optimized for implementation on FPGAs, yielding better accuracy and efficiency while addressing the challenges of traditional exact computing methods.


FPGA-Based Multiplier with a New Approximate Full Adder for Error-Resilient Applications


Ali Ranjbar, Elham Esmaeili, Roghayeh Rafieisangari, and Nabiollah Shiri
Department of Electrical Engineering, Shiraz Branch, Islamic Azad University, Shiraz, Iran

Abstract— Electronic devices primarily aim to offer low power consumption, high speed, and a compact area. The performance of very large-scale integration (VLSI) devices is influenced by arithmetic operations, of which multiplication is among the most critical. Therefore, a high-speed multiplier is essential for developing any signal-processing module. Numerous multipliers have been reviewed in the existing literature, and their speed is largely determined by how partial products (PPs) are accumulated. To enhance the speed of multiplication beyond current methods, an approximate adder-based multiplier is introduced. This approach allows the PPs of two consecutive bits to be added simultaneously using a novel approximate adder. The proposed multiplier is used in a mean filter structure, described in VHDL, implemented in ISE Design Suite 14.7, and synthesized on the Xilinx Spartan3-XC3S400 FPGA board. Compared to the literature, the proposed multiplier achieves power and power-delay product (PDP) improvements of 56.09% and 73.02%, respectively. The validity of the proposed multiplier is demonstrated through the mean filter system, where it achieves power savings of 33.33%. Additionally, the proposed multiplier provides more accurate results than other approximate multipliers, with higher peak signal-to-noise ratio (PSNR) (30.58%) and structural similarity index metric (SSIM) (22.22%) values, while keeping power consumption low.

Keywords—approximate computing, approximate full adder, multiplier, mean filter.

I. INTRODUCTION

In real-time application systems, the key objectives are speed, power efficiency, and area optimization. Multiplication and addition are crucial components in digital signal processors (DSPs), central processing units (CPUs), and digital filters [1]. Multipliers serve diverse functions, shaped by the specific constraints of each application. Therefore, it is important to analyze the area, power consumption, and delay characteristics of the multipliers used in signal processing tasks [2]. In numerous error-tolerant applications, such as multimedia, image processing, and machine learning, exact computation is not always required [1]-[4], and approximate computing is an effective approach. Approximate computing reduces power consumption and improves the performance of embedded systems. Allowing some errors in the outputs of a complex circuit simplifies its logic expressions, which in turn decreases the logic count. Approximate arithmetic cells require fewer logic gates, lowering power consumption at the cost of accuracy. Recently, researchers have designed various approximate arithmetic circuits [4], such as full adders (FAs), subtractors, compressors, and multipliers.

The FA is a crucial component of arithmetic cells. Using FAs and compressors to add partial products (PPs) simplifies the circuit design, and replacing exact FAs with approximate alternatives can yield significant benefits in circuit performance. In high-speed multipliers, both compressors and FAs enable faster accumulation of PPs. In [5], an approximate 4:2 compressor was proposed for adding PPs, offering improved error metrics. However, this approach leads to a substantial increase in circuit area. The authors used both their proposed exact and approximate compressors to create configurable dual-quality multipliers. However, their approach does not apply to field programmable gate arrays (FPGAs), because current commercial FPGAs lack power-gating capabilities. Moreover, their 4:2 compressor still requires significant hardware resources and power, resulting in moderate error rates when implemented on FPGAs. The discussion above shows that these approximation techniques improve the performance, energy efficiency, and area of various types of multipliers. However, controlling their application is challenging. As a result, approximation techniques that have been successfully applied in application-specific integrated circuits (ASICs) yield limited advantages when transferred to FPGAs. Xilinx and Intel FPGAs offer fast DSP-based multipliers suitable for low-power digital signal processing applications. However, using these multiplier intellectual property (IP) blocks leads to substantial routing delays, because they are only accessible at certain locations on the FPGA. Recently, [6] proposed optimizing multipliers by removing the least significant PP of a 4×2 multiplier to conserve look-up tables (LUTs), and then using this component to construct multipliers with larger operand sizes. This strategy, however, provided only marginal gains in area and power efficiency.

To address these challenges, this research focuses on the design and analysis of a new approximate FA. The FA is applied in a multiplier to obtain an approximate multiplier with various accuracy levels. The introduced approximate multiplier is optimized for efficient implementation on FPGAs. It achieves low power consumption, minimal area, and low latency with minimal accuracy loss. Consequently, this multiplier is well-suited for digital signal-processing applications such as image processing. The multiplier is compared with state-of-the-art designs in terms of delay, power consumption, area, and power-delay product (PDP).
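The proposed FA's error behaviour can be reproduced with a short Python sketch. It is an illustrative behavioral model, not the authors' VHDL, and follows the FA defined in Section II (Sum = A OR Cin, Cout = B, Eqs. (1)-(2)). The sketch enumerates the FA truth table of Fig. 1 and then sweeps the number of approximate bits (NAB) in an 8-bit ripple carry adder. The function names are hypothetical. Tying the LSB carry-in to 0 and normalizing NMED by the largest exact output are also assumptions.

```python
from itertools import product


def approx_fa(a: int, b: int, cin: int) -> tuple[int, int]:
    """Proposed approximate FA: Sum = A OR Cin (1), Cout = B (2)."""
    return a | cin, b


def exact_fa(a: int, b: int, cin: int) -> tuple[int, int]:
    """Conventional (exact) full adder, used as the reference."""
    return a ^ b ^ cin, (a & b) | (cin & (a ^ b))


def fa_error_metrics() -> tuple[list[int], float, float]:
    """Signed ED per truth-table row (approx - exact value of CS), ER, and NMED."""
    eds = []
    for a, b, cin in product((0, 1), repeat=3):
        s, cout = approx_fa(a, b, cin)
        eds.append((2 * cout + s) - (a + b + cin))
    er = sum(e != 0 for e in eds) / len(eds)
    med = sum(abs(e) for e in eds) / len(eds)
    return eds, er, med / 3  # 3 = largest exact FA output (CS = 11)


def rca(a: int, b: int, width: int = 8, nab: int = 0) -> int:
    """Ripple-carry adder whose `nab` least significant cells are approximate."""
    carry, result = 0, 0  # assumption: LSB carry-in tied to 0
    for i in range(width):
        cell = approx_fa if i < nab else exact_fa
        s, carry = cell((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result | (carry << width)


def rca_metrics(width: int = 8, nab: int = 0) -> tuple[float, float]:
    """Exhaustive MED and NMED of the RCA over all 2^(2*width) input pairs."""
    n = 1 << width
    total = sum(abs(rca(a, b, width, nab) - (a + b))
                for a in range(n) for b in range(n))
    med = total / (n * n)
    return med, med / (2 * (n - 1))  # normalize by the largest exact sum


if __name__ == "__main__":
    eds, er, nmed = fa_error_metrics()
    print("ED per row:", eds, "ER:", er, "NMED:", round(nmed, 3))
    for nab in (0, 1, 2, 4, 8):
        med, nmed = rca_metrics(8, nab)
        print(f"NAB{nab}: MED={med:.3f} NMED={nmed:.5f}")
```

Running the script prints the signed ED of each truth-table row, an ER of 0.5 (four of eight rows), and an NMED of about 0.167, followed by MED and NMED for several NAB settings. For NAB1 the model gives MED = 0.5. The reason is that the approximate LSB cell outputs Sum = A0 and passes B0 as its carry, so the 8-bit result is exactly A + B + B0.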
This paper is organized as follows: Section II presents the proposed circuits. Section III implements and analyzes the new approximate designs, including their error analysis. Section IV discusses the application and performance evaluation, and Section V concludes the paper.

II. PROPOSED CIRCUITS

A. Presented Approximate FA

The FA is a core unit for enhancing the performance of digital systems. Various FA techniques are used in intermediate modules to generate sum and carry outputs [7]-[8]. A primary limitation of FAs is their operating speed, which makes minimizing delay a priority. Approximate computing offers a trade-off between accuracy and improvements in circuit parameters. Many applications that require significant computational resources are inherently error-resilient. This follows from the limitations of human visual perception or from the absence of a single correct answer for specific problems. Thus, approximate computing can effectively improve hardware specifications such as area, power, and speed for these error-tolerant applications. In this paper, an approximate FA is proposed. As shown by the truth table in Fig. 1, the Sum and Cout outputs of the FA are erroneous in 4 and 2 of the 8 input combinations, respectively. Its gate-level structure is a single OR gate that generates Sum, while Cout is simply the B input. An 8-bit ripple carry adder (RCA) built from this FA is used for the performance evaluation. The functions of the FA are given by (1)-(2), where + denotes the logical OR:

$$\mathrm{Sum} = A + C_{in} \tag{1}$$

$$C_{out} = B \tag{2}$$

The error distance (ED) of every erroneous input combination is ±1 (|ED| = 1). The error rate (ER) is 0.5 (4 of the 8 input combinations), and the normalized mean error distance (NMED) is 0.166. The ED is the difference between the exact and approximate values of the two-bit result formed by the Cout and Sum (CS) outputs.

Proposed approximate FA truth table (ED = approximate minus exact value of CS; "-" denotes no error):

A  B  Cin | CS | ED
0  0  0   | 00 | -
0  0  1   | 01 | -
0  1  0   | 10 | +1
0  1  1   | 11 | +1
1  0  0   | 01 | -
1  0  1   | 01 | -1
1  1  0   | 11 | +1
1  1  1   | 11 | -

[Gate level: A and Cin drive a single OR gate that produces Sum; Cout is wired directly to B. 8-bit RCA: eight proposed FAs in a ripple chain with inputs A7…A0 and B7…B0, internal carries C1…C7, and outputs S7…S0.]
Fig. 1. Truth table, gate level, and 8-bit RCA of the proposed approximate FA.

Therefore, one objective of this FA is to keep the magnitude of every error at |ED| = 1. Extensive literature indicates that an ED of 2 or more can significantly impair the overall accuracy of a circuit, particularly in more complex structures such as multipliers. In the proposed circuit, by contrast, every error has ED = ±1, which leads to a lower NMED and better accuracy. Having fewer gates on the critical path, and no inverters, reduces static power and delay. In the 8-bit RCA of Fig. 1, the number of approximate bits (NABs) is varied to evaluate the performance of the approximate FAs. NAB1 indicates that an approximate FA is used only for the least significant bit (LSB). For NAB2, approximate FAs are used for the two LSBs. This pattern continues up to NAB8, where approximate FAs occupy every bit position up to the most significant bit (MSB).

B. Approximate Multiplier

Multiplication is a fundamental arithmetic operation in various applications such as finite impulse response (FIR) filters, the discrete cosine transform (DCT), the fast Fourier transform (FFT), and multimedia processing. To achieve optimal quality and accuracy in the output data of signal processing modules, it is essential to improve the speed, area, and power efficiency of the arithmetic modules [9]-[11]. Consequently, the multiplication module is crucial for reducing computation delay and improving system speed [12]. Fig. 2 illustrates the architecture of a typical 8×8 multiplier, which performs multiplication in three primary steps: (1) recoding and generating PPs, (2) reducing PPs, and (3) accumulating those PPs.

[Inputs: A7, A6, …, A0 (8-bit multiplier) and B7, B6, …, B0 (8-bit multiplicand). PP phase; PP reduction tree (PPRT) phase built from FAs, HAs, and 4:3, 5:3, 6:3, and 7:3 counters; last addition in an RCA based on the proposed approximate FA.]
Fig. 2. The architecture of the proposed 8×8 approximate multiplier.

Multipliers can be categorized by their partial product reduction (PPR) method, the common types being linear array multipliers and tree multipliers. In an array multiplier, the multiplication of two binary numbers follows an add-and-shift
method, which involves a regular structure. However, when dealing with a large number of bits, this type of multiplier can incur significant delays and high power consumption due to carry propagation. This highlights the importance of optimizing the carry propagation speed (the critical output path) [13]-[14]. Tree multipliers organize PPs in rows or columns, reducing the number of components compared to array types. The accumulation of PPs often limits the multiplication speed. An array multiplier employing a modified FA-based multiplexer was developed to minimize power consumption [15]. A multiplier that incorporates an approximate 4:2 compressor demonstrates significantly reduced delay and area compared to other types of multipliers [16], although it produces a notable error rate. Given these findings, it is critical to focus on the accumulation process to improve speed, area, and accuracy.

The initial step of multiplication generates the PPs through logical AND operations. Consider an 8×8 multiplier with multiplier A (A0 to A7) and multiplicand B (B0 to B7). Its first step ANDs the LSB of multiplicand B (B0) with every bit of multiplier A. The proposed approximate multiplier comprises three stages, each using adders of different sizes. The first stage contains two half adders (HAs), two proposed approximate FAs, and several counter structures (4:3, 5:3, 6:3, and three 7:3 counters) [12]. The outputs of this stage are passed to the adders of the next stage. The second stage includes three HAs and nine FAs, which add the outputs of the first stage concurrently. Their results are sent to a final RCA, whose output is the product of the 8×8 multiplication. Replacing the 4:2 compressors with approximate FAs means the proposed multiplier needs fewer gates than traditional structures. This reduction lowers power consumption and improves speed, area, and accuracy. Approximate FAs are therefore preferred over 4:2 compressors in the proposed multiplier to minimize the critical-path delay. By strategically selecting the appropriate combination of counters and FAs, the critical paths can be further optimized, improving overall performance.

III. SIMULATION RESULTS

The proposed approximate circuits and the references are described in VHDL to verify their functionality. They are synthesized and implemented using the Xilinx ISE Design Suite 14.7 on the Xilinx Spartan3 XC3S400-4PQ208 FPGA board. The ISE XPower Analyzer and the ISim simulator are used for the delay and power calculations and simulations. The results are acquired under standard operating conditions with a system clock frequency of 50 MHz. Each adder and multiplier is simulated and verified separately. Circuit performance is evaluated by PDP, and accuracy is checked by NMED. Fig. 3 shows the normalized PDP and NMED of the introduced FA and the references for NAB1 in the RCA. The proposed FA has the lowest PDP and shows the best performance in terms of power and NMED.

Consider an N×N multiplier. To assess the quality of the multiplier, the error metrics mean error distance (MED), mean relative error distance (MRED), and NMED are used. $ED_i$ is the ED of the i-th input pair, i.e., the absolute arithmetic difference between the i-th accurate product and its approximate counterpart. $S_i$ denotes the i-th accurate product. MED, MRED, and NMED are defined as follows:

$$MED = \frac{1}{2^{2N}} \sum_{i=1}^{2^{2N}} ED_i \tag{3}$$

$$MRED = \frac{1}{2^{2N}} \sum_{i=1}^{2^{2N}} \frac{ED_i}{S_i} \tag{4}$$

$$NMED = \frac{1}{(2^N - 1)^2} \times \frac{\sum_{i=1}^{2^{2N}} ED_i}{2^{2N}} \tag{5}$$

[Fig. 3: PDP and NMED of the proposed FA and the reference FAs.]
Fig. 3. Results of the PDP and NMED for the proposed FA and the references.

Table I compares the presented 8×8 multiplier with previous works in terms of resource utilization and accuracy. The proposed 8×8 multiplier uses 71 of the 7168 available 4-input LUTs (0.99%) and 9 of the 7168 flip-flops (0.13%). Compared to [6], the proposed multiplier achieves power and PDP improvements of 56.09% and 73.02%, respectively. As shown in Table I, the proposed multiplier has the lowest power, PDP, and MED of all compared designs. Its NMED is second only to that of [6], which uses fewer LUTs (57) but consumes 2.3 times the power and has 3.7 times the PDP.

TABLE I. PERFORMANCE ANALYSIS OF THE 8×8 MULTIPLIERS.

Design   | Power (mW) | Delay (ns) | No. of LUTs | PDP (pJ) | MED    | MRED   | NMED
M [13]   | 0.636      | 5.31       | 68          | 3.377    | 0.0034 | 0.3676 | 0.0154
M [14]   | 0.468      | 4.84       | 71          | 2.265    | 0.0022 | 0.0196 | 0.0034
M [15]   | 0.744      | 5.11       | 101         | 3.802    | 0.0011 | 0.0548 | 0.0154
M [6]    | 0.984      | 8.31       | 57          | 8.177    | 0.0054 | 0.0029 | 0.0008
M [16]   | 0.780      | 5.37       | 91          | 4.189    | 0.0013 | 0.0062 | 0.0020
Proposed | 0.432      | 5.11       | 71          | 2.207    | 0.0010 | 0.0148 | 0.0017

IV. APPLICATION

Image processing is one of the most effective ways to assess multiplier circuits in a real application. In [17], an image enhancement algorithm was implemented on an FPGA. That work highlights the potential of FPGA-based systems in image processing applications, especially image denoising. In a digital image, noise is created by interference that may occur during the image acquisition and transmission stages. A noisy image can be modeled as follows:

$$g(x, y) = f(x, y) + \eta(x, y) \tag{6}$$

The noisy image g(x, y) consists of the original image f(x, y) and the noise η(x, y) added to it. There are many different models for image noise, such as Gaussian, Rayleigh, Erlang, Exponential, Uniform, Bumpy, and Salt and pepper
noise. In this paper, Gaussian noise is added to the input images. A system implementing the mean filter algorithm then reduces the noise and produces an output image with as little noise as possible. The mean filter is a spatial filter used for smoothing, denoising, and restoring images in digital image processing. It can remove various types of noise and is calculated as follows:

$$J(x, y) = \sum_{i=-k}^{k} \sum_{j=-k}^{k} \frac{1}{(2k+1)^2}\, I(x + i, y + j) \tag{7}$$

The mean filter first considers a window around a pixel and then takes the average intensity of the pixels in that window as the new value of that pixel. Usually, the window is a square with 2k + 1 pixels on each side. If I(x, y) is the intensity of pixel (x, y) in the original image, a mean filter with a (2k + 1) × (2k + 1) window changes that intensity from I(x, y) to J(x, y). As can be seen, the mean filter is a linear filter with a (2k + 1) × (2k + 1) mask whose entries are all $w_{i,j} = 1/(2k+1)^2$. In this paper, the image processing algorithm is implemented with a 3×3 window of input image pixels and a 3×3 mean filter mask (k = 1). As shown in Fig. 4, each entry of the mean filter mask is therefore 1/9. In the pre-processing stage, each image is corrupted by Gaussian noise with a variance of 0.003. The grayscale images are then converted to binary images, which serve as the single-bit inputs of the system. The proposed design is implemented on the Xilinx Spartan3 XC3S400-4PQ208 FPGA board. To implement the mean filter with the proposed multiplier, each 3×3 window of input pixels is multiplied by the mean filter mask using the proposed 8×8 multiplier. The products are then added together to obtain the mean value of the center pixel of the input image. Fig. 4 illustrates how the proposed multiplier is used in the mean filter algorithm to reduce image noise.

[Noisy image → FIFO (U8 … U0) → 3×3 window (W8 … W0), multiplied by the 3×3 mean filter mask with every entry equal to 1/9 → Adder 3, Adder 2, Adder 1 → Adder 0 → denoised image.]
Fig. 4. Hardware implementation of the mean filter algorithm using the proposed multiplier on an FPGA.

The output pixels of the system are arranged in order to form the denoised image. As mentioned before, each pixel and its eight surrounding pixels must be multiplied by the mean filter mask, so the input pixels must be arranged in a 3×3 window. To do so, this design uses a first-in-first-out (FIFO) memory with nine flip-flops, shown as U8, U7, …, and U0 in Fig. 4. Each image pixel first enters the first FIFO block (F8) and stays in that flip-flop for one clock cycle. It then moves to the next FIFO block (F7), and so on, until it leaves the last block (F0). On every rising clock edge, the binary pixel in each FIFO block is replicated eight times so that it can serve as one input of the proposed 8×8 multiplier. The other input of each multiplier is the corresponding entry of the mean filter mask. With nine FIFO blocks, nine multipliers are needed to apply the mask to the pixels, shown as W8, W7, …, and W0 in Fig. 4. Finally, the multiplier outputs are summed using four FPGA adders (Adder 3, …, Adder 0). Four adders are used instead of one because of the limited input/output blocks of the FPGA device. This does not affect the order of the pixels, and the system outputs the denoised image correctly. The system output shows that the mean filter implemented with the proposed approximate multiplier reduces the noise in the input images and smooths them to an acceptable degree. As shown in Table II, both sensitive (bioimage) and non-sensitive (standard) images are denoised and smoothed at low power. The outputs also reach high peak signal-to-noise ratio (PSNR) and structural similarity index metric (SSIM) values.

TABLE II. THE PROPOSED IMAGE PROCESSING RESULTS.
(The gray input, noisy input, and denoised output images are not reproduced.)

Image | PSNR (dB) | SSIM | Power (mW) | Power Saving (%)
1     | 49.12     | 0.95 | 1.23       | 33.32
2     | 49.32     | 0.98 | 1.21       | 33.33
3     | 39.99     | 0.88 | 1.25       | 33.31
4     | 39.52     | 0.88 | 1.27       | 33.31
5     | 39.45     | 0.87 | 1.29       | 33.30
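The datapath described above can be sketched in software at a high level as the mean filter of Eq. (7) with k = 1. Binary pixels are widened to 8 bits by replication (0 becomes 0x00, 1 becomes 0xFF), each of the nine window pixels is multiplied by the mask entry, and the products are summed. This is a sketch under stated assumptions, not the authors' implementation:

- The 8-bit encoding of 1/9 as 28/256 is an assumption.
- Border pixels are copied through unchanged, which is also an assumption.
- The names `mean_filter_3x3`, `exact_mul8`, and `psnr` are hypothetical.
- An exact product stands in for the proposed approximate multiplier. A behavioral model of the approximate multiplier can be passed in through the `mul` parameter.

The `psnr` helper follows Eq. (8).

```python
import math
from typing import Callable, List

Image = List[List[int]]

# Assumption: the 1/9 mask entry is encoded in 8-bit fixed point as 28/256 (~0.109).
MASK_Q8 = 28


def exact_mul8(x: int, y: int) -> int:
    """Exact 8x8 unsigned product; a stand-in for the proposed approximate multiplier."""
    return (x & 0xFF) * (y & 0xFF)


def mean_filter_3x3(img: Image, mul: Callable[[int, int], int] = exact_mul8) -> Image:
    """Eq. (7) with k = 1 on an 8-bit image; border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0
            for dy in (-1, 0, 1):  # the nine window pixels (W8 ... W0 in Fig. 4)
                for dx in (-1, 0, 1):
                    acc += mul(img[y + dy][x + dx], MASK_Q8)
            out[y][x] = min(acc >> 8, 255)  # drop the 8 fractional bits of 28/256
    return out


def psnr(ref: Image, test: Image) -> float:
    """PSNR of Eq. (8) for 8-bit images (peak value 255)."""
    n = len(ref) * len(ref[0])
    mse = sum((r - t) ** 2 for rr, tr in zip(ref, test) for r, t in zip(rr, tr)) / n
    return math.inf if mse == 0 else 10 * math.log10(255 ** 2 / mse)


if __name__ == "__main__":
    bits = [[1, 1, 1, 0],
            [1, 1, 1, 0],
            [1, 1, 1, 0],
            [0, 0, 0, 0]]
    img = [[0xFF if b else 0x00 for b in row] for row in bits]  # 8-fold bit replication
    out = mean_filter_3x3(img)
    print(out)
    print(f"PSNR vs. input: {psnr(img, out):.2f} dB")
```

Consider the all-ones window centred at (1, 1). Its sum is 9 × 255 × 28 = 64260, which becomes 251 after the 8-bit shift. This is slightly below the ideal mean of 255 because 28/256 underestimates 1/9. That quantization error is separate from any error introduced by an approximate multiplier.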
The mean filter performance is assessed by substituting the designed approximate multipliers for the exact multipliers. The quality metrics, including PSNR, SSIM, power, and power saving, are evaluated. As shown in Table III, the proposed approximate multiplier is a highly precise design. It achieves the highest PSNR (50.62 dB), the lowest power (1.20 mW), and power savings of 33.33%.

TABLE III. QUALITY AND POWER SAVINGS COMPARISON FOR 8×8 MULTIPLIERS.

Multiplier | PSNR (dB) | SSIM | Power (mW) | Power Savings (%)
[13]       | 15.98     | 0.77 | 1.64       | 14.2
[14]       | 44.98     | 0.99 | 1.38       | 4.2
[15]       | 35.14     | 0.99 | 1.87       | 30
[6]        | 14.63     | 0.81 | 1.92       | 33.30
[16]       | 50.51     | 0.99 | 2.13       | 28.3
Proposed   | 50.62     | 0.99 | 1.20       | 33.33

Fig. 5 shows the SSIM and PSNR of several approximate multipliers; the proposed multiplier is more accurate than the others. In Fig. 5, the proposed design and [16] have the highest SSIM values. The PSNR is calculated by (8), where MSE is the mean squared error between the original and output images:

$$PSNR = 10 \log_{10}\left(\frac{255^2}{MSE}\right) \tag{8}$$

[Fig. 5: PSNR (dB) and SSIM of [6], [13], [14], [15], [16], and this work.]
Fig. 5. PSNR and SSIM for the image processing.

The results of the FPGA implementation of the multiplier and the mean filter system support the contribution of this work to image processing. This is especially true for sensitive images such as medical magnetic resonance imaging (MRI) and computed tomography (CT) scans. The presented system is also a useful approach for disease detection and low-power neural network implementation.

V. CONCLUSION

An approximate full adder (FA)-based 8×8 multiplier is presented and implemented on a field programmable gate array (FPGA). The multiplier uses a new approximate FA to reduce hardware complexity, delay, and power. The multiplier outperforms the look-up table (LUT)-based multipliers available for FPGAs in terms of dynamic power dissipation, power-delay product (PDP), and mean relative error distance (MRED). The results of implementing the multiplier in the mean filter system show that it achieves power savings of 33.33%. Additionally, the proposed multiplier produces more accurate output than other approximate multipliers. It achieves higher quality in terms of peak signal-to-noise ratio (PSNR) and structural similarity index metric (SSIM) while consuming less power.

AI USAGE STATEMENT

The authors confirm that no artificial intelligence tools were used in the preparation of this article.

REFERENCES

[1] M. Rafiee, N. Shiri, and A. Sadeghi, "High-performance 1-bit full adder with excellent driving capability for multistage structures," IEEE Embedded Syst. Lett., vol. 14, no. 1, pp. 47–50, 2021, doi: 10.1109/LES..3108474.
[2] W. Liu, T. Zhang, E. McLarnon, M. O'Neill, P. Montuschi, and F. Lombardi, "Design and analysis of majority logic based approximate adders and multipliers," IEEE Trans. Emerg. Topics Comput., vol. 9, no. 3, pp. 1609–1624, 2019.
[3] M. C. Parameshwara and N. Maroof, "An area-efficient majority logic-based approximate adders with low delay for error-resilient applications," Circuits, Systems, and Signal Processing, vol. 41, pp. 4977–4997, 2022.
[4] E. Esmaeili, F. Pesaran, and N. Shiri, "A high-efficient imprecise discrete cosine transform block based on a novel full adder and Wallace multiplier for bioimages compression," Int. J. Circuit Theory Appl., pp. 1–24, 2023, doi: 10.1002/cta.3551.
[5] A. Sadeghi, R. Rasheedi, I. Partin-Vaisband, and D. Pal, "Energy efficient compact approximate multiplier for error-resilient applications," IEEE Trans. Circuits Syst. II, Exp. Briefs, doi: 10.1109/TCSII.2024.3437235.
[6] S. Ullah, "Area-optimized low-latency approximate multipliers for FPGA-based hardware accelerators," in Proc. 55th Design Automat. Conf. (DAC), San Francisco, CA, USA, Jun. 2018, pp. 1–6, doi: 10.1109/DAC.2018.8465781.
[7] N. Shiri, A. Sadeghi, M. Rafiee, and M. Bigonah, "SR-GDI CNTFET-based magnitude comparator for new generation of programmable integrated circuits," Int. J. Circuit Theory Appl., pp. 1–26, 2022, doi: 10.1002/cta.3251.
[8] M. Mirzaei and S. Mohammadi, "Process variation-aware approximate full adders for imprecision-tolerant applications," Computers & Electrical Engineering, vol. 87, art. no. 106761, 2020, doi: 10.1016/j.compeleceng.2020.106761.
[9] M. Zhang, S. Nishizawa, and S. Kimura, "Area efficient approximate 4–2 compressor and probability-based error adjustment for approximate multiplier," IEEE Trans. Circuits Syst. II (TCAS II), 2023.
[10] Y. Guo, H. Sun, and S. Kimura, "Small-area and low-power FPGA-based multipliers using approximate elementary modules," in Proc. 25th Asia South Pac. Design Autom. Conf. (ASP-DAC), Beijing, China, 2020, pp. 599–604.
[11] H. Waris, C. Wang, and W. Liu, "High-performance approximate half and full adder cells using NAND logic gate," IEICE Electronics Express, vol. 16, art. no. 20190043, 2019, doi: 10.1587/elex.16.20190043.
[12] S. Ullah, H. Schmidl, S. S. Sahoo, S. Rehman, and A. Kumar, "Area optimized accurate and approximate softcore signed multiplier architectures," IEEE Trans. Comput., vol. 70, no. 3, pp. 384–392, Mar. 2021, doi: 10.1109/TC.2020.2988404.
[13] M. Ha and S. Lee, "Multipliers with approximate 4-2 compressors and error recovery modules," IEEE Embedded Syst. Lett., vol. 10, no. 1, pp. 6–9, Mar. 2018.
[14] T. Yang, T. Ukezono, and T. Sato, "Low-power and high-speed approximate multiplier design with a tree compressor," in Proc. 35th Int. Conf. Comput. Design, Boston, MA, USA, Nov. 2017, pp. 89–96.
[15] S. Venkatachalam and S.-B. Ko, "Design of power and area efficient approximate multipliers," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 25, no. 5, pp. 1782–1786, May 2017.
[16] C. Liu, J. Han, and F. Lombardi, "A low-power, high-performance approximate multiplier with configurable partial error recovery," in Proc. Design, Automat. Test Eur. Conf. Exhib. (DATE), Dresden, Germany, Mar. 2014, pp. 1–4, doi: 10.7873/DATE.2014.108.
[17] P. Patel, A. Ahmadi, and M. Khalid, "Implementing an improved image enhancement algorithm on FPGA," in Proc. 2021 IEEE Canadian Conf. Electr. Comput. Eng. (CCECE), doi: 10.1109/CCECE53047.2021.9569049.
