KARATSUBA ALGORITHM: A PARADIGM SHIFT IN MULTIPLICATION EFFICIENCY

Mrs. C. Anjani^1, R. Kavya^2, S. Poojitha Reddy^3, A. Lasya Priya^4
^1 Professor in department of Electronics and Communication Engineering
^2,3,4 UG Students of Sridevi Women's Engineering College
^1,2,3,4 Sridevi Women's Engineering College, Hyderabad, Telangana, India

Abstract:
The Karatsuba algorithm is a fast multiplication algorithm that uses a divide-and-conquer approach to multiply two n-digit numbers. Here, the system takes less time to compute the product than the time taken by a normal multiplication. The divide-and-conquer algorithm reduces the multiplication of two n-digit numbers to three multiplications of n/2-digit numbers and, by repeating this reduction, to at most n^(log2 3) ≈ n^1.58 single-digit multiplications. It is therefore asymptotically faster than the traditional algorithm, which performs n^2 single-digit products. The Karatsuba algorithm was the first multiplication algorithm asymptotically faster than the quadratic "grade school" algorithm. Multiplying large numbers efficiently is an important task; however, the traditional, naive way of multiplying numbers involves multiplying each digit in one number by each digit in the second number, requiring n^2 single-digit computations. As the size of the multiplication increases, the time required by the naive way increases dramatically. To overcome this problem, multipliers and dividers are designed using the Karatsuba algorithm. This algorithm can provide high throughput and high efficiency. It can also reduce the time complexity from O(n^2) to O(n^(log2 3)) ≈ O(n^1.58). The multipliers are designed using a field-programmable gate array (FPGA). In this paper we propose pipelined soft multipliers using the Karatsuba algorithm. Experimental results obtained with Xilinx Vivado demonstrate the efficiency of the proposed pipelined multipliers using the Karatsuba algorithm.

I. INTRODUCTION
The growing demand for edge computing has become increasingly evident in the era of the Internet of Things (IoT), spanning a wide range of applications, from bio-signals to advanced image processing. Take, for instance, the ubiquitous presence of wearable health monitoring devices, crucial given that 47% of cardiac diseases (the leading cause of death globally) manifest outside of hospital settings. Similarly, Unmanned Aerial Vehicles (UAVs), such as drones, are proliferating across various domains including object/self-tracking, search and surveillance, agricultural operations, and entertainment.

Various sectors, including entertainment, agriculture, search and surveillance, object/self-tracking, and wildlife monitoring, witness a surge in the utilization of drones and other UAVs. Field-Programmable Gate Arrays (FPGAs), readily accessible in commercial markets, offer a viable substitute for power-efficient Application-Specific Integrated Circuits (ASICs) in implementing these programs. This is primarily due to FPGAs' rapid prototyping capabilities and their adaptability in post-fabrication datapath adjustments, making them an attractive option for such applications. The adaptability of medical device technology, exemplified by its capacity to adjust to the unique physiological characteristics and fluctuations in heart activity of individual patients, is paramount. Similarly, parallelizable applications handling substantial data volumes frequently opt for methods that enhance throughput and/or minimize power consumption.

While Application-Specific Integrated Circuits (ASICs) offer high power efficiency for implementing such programs, off-the-shelf Field-Programmable Gate Arrays (FPGAs) have emerged as commercially viable alternatives. Their rapid prototyping and post-fabrication datapath versatility make them capable of keeping pace with the rapid evolution of algorithms, which often outstrips hardware updates. Consider, for example, the need for health monitoring devices to adapt to different patients' physiological traits and changes in heart activity. Moreover, there is a significant demand for high throughput and energy efficiency to accelerate parallelizable applications that continuously process large volumes of data.

Challenges in ASIC design, state-of-the-art approximation, and DSP blocks have led to the predominant design of multipliers using FPGA technology. These challenges encompass various aspects:

Firstly, while approximation approaches tailored for ASIC platforms have shown promising performance gains, directly transferring them to FPGAs proves challenging due to the differing architectural specifications of the two platforms. Secondly, approximation techniques are often applied to individual kernels within multi-kernel applications. For instance, in JPEG compression, approximation may involve replacing multiplication or division operations with imprecise versions specifically in the DCT (quantization) stage. However, the reported performance gains typically focus on these individual kernels rather than considering the impact on the entire end-to-end application implementation. Thirdly, much attention is directed towards optimizing multiplication operations, while division receives comparatively little. This underscores the need for comprehensive optimization strategies that address both multiplication and division operations in FPGA-based designs.

Figure 1: Comparing area, delay and energy of 8-, 16-, and 32-bit multipliers and dividers.

Multiplication stands as a prevalent operation within bio-signal or visual processing workloads, and FPGAs integrate built-in DSP units to expedite this process. Nonetheless, there exist three potential reasons why DSP blocks might fail to meet design criteria. Firstly, they may lack sufficient processing power for applications necessitating extensive multiplication or parallel operation, owing to their limited ratio compared to Look-up Tables (LUTs). Additionally, DSP blocks are fixed in place on the FPGA, escalating routing costs and potentially diminishing performance in specific applications. Lastly, DSP blocks prove inadequate in addressing precision requirements because they multiply using only 18×18 bits. Notably, prominent FPGA vendors like Xilinx and Intel provide soft Intellectual Property (IP) cores for operations such as arithmetic.

II. RELATED WORK
A. Customized Convolutional Neural Network for FPGA Platforms:

Image search engines, object detection in mobile robot vision, and a myriad of other applications have embraced Convolutional Neural Networks (CNNs) extensively. Furthermore, edge devices heavily rely on specialized hardware accelerators like FPGAs and Application-Specific Integrated Circuits (ASICs) due to the high memory and computational demands of CNN models. Among these options, FPGA accelerators hold an edge owing to their versatility, low power consumption, and rapid development capabilities, surpassing other specialist hardware accelerators for artificial neural networks. Previous efforts in FPGA acceleration designs predominantly focused on configuring hardware to support CNN model structures. In contrast, our approach leverages reinforcement learning to autonomously search for optimal network designs, empowering users to create custom convolutional neural networks tailored to specialized FPGA hardware.

B. A Real-Time FPGA-Based Accelerator for ECG Analysis and Diagnosis Using Association-Rule Mining:
Telemedicine, which utilizes Information and Communication Technology (ICT) to deliver medical care remotely, emerges as a potential solution to the challenges confronting current healthcare systems. These challenges include catering to an aging population, a rising number of patients, and a shortage of qualified medical professionals. With recent advancements in telemedicine, particularly in wearable ECG monitors, there is a growing demand for more sophisticated and precise automated ECG evaluation and diagnostic systems. To address the need for expedited processing and diagnosis of real-time electrocardiogram (ECG) data, we propose a streaming architecture leveraging Field-Programmable Gate Arrays (FPGAs). Early diagnosis can be facilitated through association-rule mining, where ECG features are compared against established rules. Our suggested Bit_Q_Apriori hardware-oriented data-mining method aims to enhance processing speed. The adoption of the right implementation promises improved scalability, effectiveness, throughput, and cost-efficiency compared to competing hardware solutions.

C. Real-time detection of electrocardiogram wave features using template matching and implementation in FPGA:
The electrocardiogram (ECG) holds immense potential in providing crucial clinical insights into cardiac processes. We present an algorithmic approach for real-time identification and characterization of wave peaks in one-lead ECG data. Initially, the ECG data undergoes preprocessing to eliminate power line interference and high-frequency noise. Subsequently, a set of rule bases is established based on slope and polarity using the first 6,000 samples, laying the foundation for detecting R-peaks, P-waves, and T-waves in beats. For this study, we employed the Spartan III FPGA from Xilinx to execute the code. To validate the methodology, 8-bit encoded ECG data was transmitted to the FPGA via the computer's parallel port using a parallel transfer mechanism. The measured sensitivities for P-waves, R-waves, and T-waves were 97.58%, 98.4%, and 97.78%, respectively, owing to Xilinx's implementation. Among the various wave characteristics detected (including height, polarity, and duration), an average miss rate of 9.3% was attained. In a clinical setting, a medical expert verified the detected wave patterns, emphasizing the reliability and accuracy of the proposed approach.

D. High Performance Accurate and Approximate Multipliers for FPGA-based Hardware Accelerators:
Multiplication plays a critical role in various fields, including machine learning and image/video processing. To build high-performance multipliers, FPGA providers offer digital signal processing (DSP) blocks. However, there are constraints on the number and placement of these multipliers on FPGAs, which can lead to additional routing delays and inefficiencies, especially for narrower bit widths. As a solution, FPGA manufacturers also provide multiplication-optimized soft IP cores. This article argues that despite their advantages, FPGA soft multiplication IP cores require better designs to achieve high performance with low resource consumption. Our proposed generic, area-optimized, low-latency technology improves upon existing softcore designs by leveraging the architectural attributes of FPGAs, such as fast-carry chains and lookup table (LUT) structures. When compared to Xilinx's multiplier LogiCORE IP, our recommended unsigned accurate design can reduce LUT usage by up to 53% across various multiplier sizes, while our proposed signed accurate architecture achieves reductions of up to 25%. Additionally, our unsigned approximate multiplier designs maintain output accuracy while reducing critical path delay (CPD) by up to 51% compared to the LogiCORE IP. The proposed multiplier architecture has enhanced the area and performance of accelerators used in image and video applications. You can find our open-source collection of approximate and exact multipliers at https://cfaed.tu-dresden.de/pd-downloads. Our objectives include facilitating result replication, sparking new inquiries within the FPGA community, and encouraging further study and enhancement in this field.

E. Area-Optimized Low-Latency Approximate Multipliers for FPGA-Based Hardware Accelerators:
The performance advantages of employing ASIC approximation techniques in FPGA-based configurable computing systems are limited by architectural differences between ASICs and FPGAs. This paper introduces a comprehensive solution comprising a freely available library, an efficient design methodology, and an innovative
architecture tailored for approximate multipliers optimized specifically for FPGA-based fabrics. Our approach not only enhances output accuracy but also achieves improvements in area, delay, and energy consumption compared to existing approximate multipliers based on ASICs. Notably, our proposed method outperforms the Xilinx Vivado multiplier IP, delivering significant energy savings (up to 67%), reduced latency (53%), and a 30% improvement in area utilization, all while maintaining high accuracy (average relative error < 1%). For those interested in contributing to this field or witnessing its ongoing progress, our library of approximate multipliers is accessible online at https://cfaed.tu-dresden.de/pd-downloads. This advancement opens up new avenues for research within the FPGA community.

F. Proposed light-weight error-reduction scheme:
Streamlining the error-reduction categories could address the overflow issue observed in both INZeD and MBM, as well as reduce the excessive parameter count (e.g., 256 in REALM). RAPID, unlike REALM/SIMDive, allocates the squared-off area among power-of-two combinations. Key factors considered in this partitioning include:
1. Opting for four fractional most significant bits (MSBs) instead of three for enhanced accuracy.
2. Recommending a reduction in the number of partitions while maintaining four MSBs to conserve resources during parameter selection.
3. Minimizing variance pattern and error volume in each area by optimizing the estimate of the error-magnitude integral.

We derived these methods from Vivado's resource-usage data and fault analysis, illustrated in Figure 2. Our primary research objective has been to find optimal partitioning strategies that minimize the average relative error (ARE). Specifically:
- For 3-coefficient multiplier methods, aggregated sub-region inaccuracies are kept below preset thresholds (e.g., 3% for 5-coefficient schemes and 2.5% for 10-coefficient schemes).
- Error-reduction coefficients for each set of sub-intervals are determined using mathematical methods described in [45].

Table I presents the proposed binary multiplier and divider coefficients. Partitioning is implemented using small multiplexers designed in HDL, with the complexity of conditional statements affecting LUT consumption. We introduce three methods to decrease this complexity, limiting the number of coefficients per method to 10.

Additionally, simplifying conditional statements by comparing only the four most significant bits of the fractional sections during division further reduces complexity. Each 6-LUT functions as a 4:1 multiplexer in hardware, so a 16:1 multiplexer requires one FPGA slice containing four 6-LUTs.

The proposed partitioning mechanism, based on MUXes, maintains scalability compared to REALM and SIMDive, as the resource cost does not increase exponentially with the coefficient count. Our approach demonstrates superior resource-error trade-offs compared to state-of-the-art methods. With ten error coefficients and four MSBs, our method outperforms SIMDive/REALM in terms of LUT usage and achieves an average relative error (ARE) of 0.6%. Refer to Table III for detailed comparisons.

Figure 2: Proposed error reduction schemes of RAPID for multiplication and division based on MSBs of fractional parts.

Table 1: Binary representation of error reduction coefficients in 16-bit multiplier and divider.

III. MITCHELL'S APPROXIMATE ALGORITHM
Mitchell's algorithm for multiplying two numbers using logarithms is straightforward. The logarithms of the input numbers are added and the antilogarithm of the sum is determined. The method used to find the logarithm and the antilogarithm impacts the accuracy. Mitchell presented a simple method to approximate the logarithm and antilogarithm calculations.

The existing units efficiently utilize 6-input Look-up Tables (6-LUTs) and fast carry chains to implement Mitchell's approximate algorithms. To address the first challenge, LeAp is the first logarithmic multiplier tailored specifically for FPGAs. LeAp's design is motivated by the translation of multiplication operations into addition within the logarithmic domain, achieved through Mitchell's algorithm.

Figure 3: Overall structure of multiplier and divider using Mitchell's algorithm

In our FPGA-customized approach, the leading-one detection (LOD) computation relies on 4-bit LODs, configured directly within LUTs. Here's how it works:

1. Zero Detection and Leading-One Detection (LOD): Each 4-bit segment of the operands undergoes simultaneous analysis. One LUT serves as a logical OR function to detect the presence of a '1' in the segment (acting as a zero-detection flag). Another 6-LUT is configured as two 5-LUTs to determine the position of the leading one in the 4-bit segment (LOD4-LUT). The resulting bits from these LUTs determine the position of the leading one in the most significant group through priority logic.
2. Extension to Larger LODs: Similar methods are applied for 16- and 32-bit LODs. For example, in a 16-bit LOD, if the upper half of the operand is zero, the LOD is equal to the lower 8-bit LOD. Otherwise, the position of the leading one is calculated accordingly.
3. LOD Step Orchestration: In our LeAp architecture, LOD steps are orchestrated through a Finite State Machine (FSM) and executed in at most five clock cycles. To maintain efficiency and minimize registers, the LOD implementation is realized as combinational logic, with critical path analysis guiding balanced partitioning for pipelining.
4. Integer Parts Addition: Each 4-bit addition is handled by one Virtex-7 slice, comprising four 6-LUTs and associated fast carry chains, forming a Carry Look-Ahead Adder (CLA). Extending to 8-bit additions involves connecting the carry-out from a prior slice to the carry-in of the next.
5. LUT-Optimized Ternary Addition: We optimize ternary addition by configuring FPGA LUTs and carry chain primitives to implement a ternary adder. This aligns with our error reduction approach, allowing the addition of error reduction coefficients alongside fractional parts in a single step, minimizing resource usage. Unlike other methods, where an additional circuit is required to add error-reduction terms to Mitchell's circuits, our approach seamlessly integrates this process, leveraging fixed FPGA primitive delays without additional overhead.

This streamlined approach ensures efficient computation and resource utilization, critical for FPGA-based systems.

In our LeAp approach [17], we focus solely on reducing error factors based on fractional bits, unlike MBM/INZeD [20, 16], where the interim outcome of the Mitchell multiplication/division is considered. Fortunately, the Xilinx UNISIM library provides the necessary LUTs and fast carry chains for constructing a ternary adder [59]. By carefully configuring FPGA logic units and carry chain elements, we have transformed them into a ternary adder. This enables us to simultaneously incorporate fractional components and error-reduction coefficients while maintaining the same resource footprint, aligning perfectly with our error-reduction methodology. At the end of the ternary adder chain, an additional LUT is needed when the sum of frac1_i + frac2_i + error coefficient_i + Cin (carry-out from bit to bit) results in three bits. However, compared to the raw version, only one extra bit is required at the most significant bit (MSB) position [19]. The fixed delay of FPGA primitives eliminates the need for extra design effort to integrate the error-reduction term with the fractional components. In contrast, in REALM [45], MBM [20], and INZeD [16], Mitchell's circuit cannot accommodate the error-reduction parameter, or half of it, without relying on an additional circuit that operates based on the intermediate addition/subtraction of fractional parts.

Figure 4: 2-, 3-, and 4-stage pipelined model of multipliers and dividers

IV. KARATSUBA ALGORITHM
The main idea of the Karatsuba algorithm is to reduce the multiplication of four sub-problems to the multiplication of three sub-problems. Arithmetic operations like additions and subtractions are performed for the other computations. For this algorithm, two n-digit numbers are taken as the input and the product of the two numbers is obtained as the output.

The Karatsuba algorithm is a recursive algorithm, since it calls smaller instances of itself during execution. According to the algorithm, it calls itself only thrice on n/2-digit numbers in order to achieve the final product of two n-digit numbers. Now, let T(n) represent the number of digit multiplications required while performing the multiplication; since each call makes three recursive calls on n/2-digit numbers, T(n) = 3T(n/2).
Figure 5: Block diagram of multiplier using Karatsuba algorithm

Assume A and B are the two inputs of n bits each. A and B are each divided into two segments, AH, AL and BH, BL. Here AH and BH are the higher-order bits and AL and BL are the lower-order bits.

\[ AB = (2^{\frac{n}{2}} A_H + A_L)(2^{\frac{n}{2}} B_H + B_L) = 2^{n} A_H B_H + 2^{\frac{n}{2}} (A_H B_L + A_L B_H) + A_L B_L \]

By the Karatsuba multiplier algorithm,

\[ A_H B_L + A_L B_H = (A_H + A_L)(B_H + B_L) - A_H B_H - A_L B_L \]

Therefore, four n/2-bit multiplications are reduced to three n/2-bit multiplications. The time complexity of the Karatsuba multiplication algorithm is O(n^(log2 3)) ≈ O(n^1.58).

A. STEPS INVOLVED IN MULTIPLICATION THROUGH KARATSUBA ALGORITHM

Step 1: Assume n is a power of 2.

Step 2: If n equals 1, use multiplication tables to compute P = AB.

Step 3: If n is greater than 1, split the n-digit numbers in half and represent them using the formulas:

\[ A = 10^{\frac{n}{2}} A_1 + A_2 \]

\[ B = 10^{\frac{n}{2}} B_1 + B_2 \]

where A1, A2, B1, and B2 each have n/2 digits.

Step 4: Compute variables U, V, W, and Z as follows:

\[ U = A_1 B_1 \]
\[ V = A_2 B_2 \]
\[ W = (A_1 + A_2)(B_1 + B_2) \]
\[ Z = W - (U + V) \]

Step 5: Obtain the product P by substituting the values into the formula

\[ P = 10^{n} U + 10^{\frac{n}{2}} Z + V \]

\[ P = 10^{n} (A_1 B_1) + 10^{\frac{n}{2}} (A_1 B_2 + A_2 B_1) + A_2 B_2 \]

Step 6: Recursively call the algorithm by passing the sub-problems (A1, B1), (A2, B2), and (A1 + A2, B1 + B2) separately. Store the returned values in variables U, V, and W, respectively.

In this paper, the performance of the Karatsuba algorithm is investigated for multiplicands and multipliers of 4, 8, 16 and 32 bit length. Moreover, the performance of the Karatsuba algorithm is analyzed in terms of the number of multiplications and the total process time.
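The steps listed above can be sketched in software as follows. This is a minimal Python illustration, not the paper's FPGA design: the function name `karatsuba` is chosen here for illustration, the split uses decimal digits as in Steps 3 to 5, and the base case is relaxed so that operands whose length is not a power of 2 are also handled (an assumption that goes beyond Step 1).

```python
def karatsuba(a: int, b: int) -> int:
    """Multiply two non-negative integers using three recursive products."""
    # Base case (Step 2, relaxed): if either operand is a single digit,
    # multiply directly instead of recursing further.
    if a < 10 or b < 10:
        return a * b

    # Step 3: split both operands around half = n // 2 decimal digits,
    # so that a = 10^half * a1 + a2 and b = 10^half * b1 + b2.
    n = max(len(str(a)), len(str(b)))
    half = n // 2
    a1, a2 = divmod(a, 10 ** half)
    b1, b2 = divmod(b, 10 ** half)

    # Steps 4 and 6: three recursive multiplications instead of four.
    u = karatsuba(a1, b1)
    v = karatsuba(a2, b2)
    w = karatsuba(a1 + a2, b1 + b2)
    z = w - (u + v)  # equals a1*b2 + a2*b1

    # Step 5: recombine the partial products.
    return 10 ** (2 * half) * u + 10 ** half * z + v


if __name__ == "__main__":
    print(karatsuba(1234, 5678))  # 7006652
```

In hardware the same structure would operate on bit segments (powers of 2 instead of powers of 10), so the shifts replace the multiplications by 10^half.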

The applications used for performance analysis are implemented using Vivado. The number of multiplications increases along with the bit length due to the processing of the Karatsuba algorithm. In addition, the more the number of multiplications rises, the more the amount of hardware increases. Therefore, the cost required for performing the multiplication operation rises. When compared to each other, the number of multiplications of the Karatsuba algorithm is less than that of the classical multiplication method. The performance of the Karatsuba algorithm in terms of the total process time for different bit lengths is analyzed as shown in Fig. 7.

Figure 7: Performance analysis of Karatsuba algorithm in terms of the total process time for different bit lengths.

The graph illustrates that as the bit length increases, the total processing time also rises. This
trend occurs because the number of necessary
multiplications escalates in tandem with the bit
length. Furthermore, the total processing time
inversely correlates with the processing speed; as
the former increases, the latter decreases due to the
slowdown in multiplication. When juxtaposed with
the classical multiplication method, the Karatsuba
algorithm demonstrates superior performance in
terms of total processing time.
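To make the comparison concrete, the idealized number of elementary multiplications can be counted for the bit lengths studied here. This is a sketch under simplifying assumptions, not the paper's measured data: n is a power of 2, the recursion runs down to 1-bit operands, and the extra carry bit in the sums of the halves is ignored. The function names are hypothetical.

```python
def karatsuba_mults(n: int) -> int:
    """Idealized count of 1-bit multiplications for n-bit operands (n a power of 2)."""
    # Each level replaces one n-bit product by three n/2-bit products.
    return 1 if n == 1 else 3 * karatsuba_mults(n // 2)


def classical_mults(n: int) -> int:
    """Count for the classical method: every bit of one operand meets every bit of the other."""
    return n * n


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        print(f"{n:2d}-bit: classical={classical_mults(n):4d}  karatsuba={karatsuba_mults(n):3d}")
```

Under these assumptions the counts are 16 versus 9, 64 versus 27, 256 versus 81, and 1024 versus 243 for 4, 8, 16, and 32 bits, so the gap widens as the bit length grows.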
V. RESULTS

Figure 6: Performance analysis of Karatsuba algorithm in terms of the number of multiplications for different bit lengths.

Figure 8: Simulation results of multipliers using Mitchell's algorithm

Figure 9: RTL view of RAPID

Figure 10: Power, area, timings of RAPID.

Figure 11: Power, area, timing of proposed system.

VI. CONCLUSION
In our study, employing fine-grain pipelining, we introduce the pioneering design for an approximate multiplier using the Karatsuba algorithm. Compared to current approaches, which are costly and inefficient, this design incorporates innovative error-reduction techniques, achieving an accuracy of 99.4 percent. For instance, when contrasted with pipelined accurate IPs, the pipelined multiplication and division operations could potentially reduce LUT usage by 36%, enhance performance per watt by 2.3 times, and boost throughput by up to 3.3 times.

Through comprehensive end-to-end testing, the design demonstrates significant enhancements in various applications, such as heartbeat detection (35% improvement), JPEG image compression (33% improvement), and Harris corner detection (45% improvement), across delay, area, and Area-Delay-Product (ADP), respectively, without compromising the quality of results. The pipelined design presents an excellent opportunity to expedite the execution of diverse applications that operate on data streams and continuously process vast amounts of data. While Quality of Results (QoR) remains unaffected, latency, area, and ADP improve by 35%, 33%, and 45% respectively, compared to accurate kernels.

Our primary aim is to evaluate the performance of the pipeline mode in various environments, including neural networks, which offer opportunities for SIMD and pipelining. Addressing data dependencies sequentially poses a challenge, often only partially mitigated by processors' out-of-order execution, which fails to fully exploit the potential of pipelining. Consequently, we are developing improved pipelined divider and multiplier implementations to resolve data dependencies and facilitate internal data transfers more efficiently. Notably, intra-unit bypassing would yield faster execution with reduced overhead. Additionally, we aim to create an ALU along the same lines and assess its effectiveness in the data-path of soft CPUs like RISC-V.

One promising application is the mantissa multiplier/divider, where division delay can be up to 35 times longer than addition operations, consuming over 95% of the floating-point unit's area and power. The surge in popularity of this technology is largely attributed to its widespread adoption in 3D graphics software.

REFERENCES
[1] World Health Organisation. 2018. Cardiovascular diseases (CVDs). https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds). (2018).
[2] P. Kostic. 2017. Heart Disease and Early Heart Attack Care. https://www.bnl.gov/hr/occmed/hpp/linkable_files/pdf/EarlyHeartAttackSymptoms.pdf. (2017).
[3] Y. Yang et al. 2019. FPNet: Customized Convolutional Neural Network for FPGA Platforms. In IEEE International Conference on Field-Programmable Technology (ICFPT).
[4] X. Gu et al. 2016. A Real-Time FPGA-Based Accelerator for ECG Analysis and Diagnosis Using Association-Rule Mining. ACM Transactions on Embedded Computing Systems (TECS), 15, 2.
[5] H.K. Chatterjee et al. 2015. Real-time detection of electrocardiogram wave features using template matching and implementation in FPGA. International Journal of Biomedical Engineering and Technology (IJBET), 17, 3.
[6] S. Ullah et al. 2021. High Performance Accurate and Approximate Multipliers for FPGA-based Hardware Accelerators. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD).
[7] S. Ullah et al. 2018. Area-Optimized Low-Latency Approximate Multipliers for FPGA-Based Hardware Accelerators. In IEEE/ACM Design Automation Conference (DAC).
[8] I. Kuon and J. Rose. 2007. Measuring the gap between FPGAs and ASICs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 26, 2.
[9] A. Boutros et al. 2018. Embracing Diversity: Enhanced DSP Blocks for Low-Precision Deep Learning on FPGAs. In IEEE International Conference on Field Programmable Logic and Applications (FPL).
[10] S. Lee et al. 2019. Double MAC on a DSP: Boosting the Performance of Convolutional Neural Networks on FPGAs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 38, 5.
[11] Xilinx. 2015. LogiCORE IP multiplier v12.0. https://www.xilinx.com/support/documentation/ip_documentation/mult_gen/v12_0/pg108-mult-gen.pdf. (2015).
[12] Xilinx. 2016. LogiCORE IP Divider v5.1. https://www.xilinx.com/support/documentation/ip_documentation/div_gen/v5_1/pg151-div-gen.pdf. (2016).

