2018 2nd European Conference on Electrical Engineering and Computer Science (EECS)

Implementation of Low-Power Multiply-Accumulate (MAC) Unit for IoT Processors

Kareem Mansour
Department of Microsystems Engineering - IMTEK
Albert-Ludwigs-Universität Freiburg
Freiburg, Germany
[email protected]

Ahmed Saeed
Electrical Engineering Department
Future University in Egypt
Cairo, Egypt
[email protected]

Abstract—Embedded processors are key building blocks for IoT platforms. Multiply-Accumulate (MAC) units are vital arithmetic circuits in several applications performed by these processors, including digital signal processing (DSP), and it is necessary to reduce the power consumed by the processor. In this paper, the design and implementation of a 32-bit MAC unit optimized for a low power budget and targeting IoT processors is introduced. The proposed MAC unit is capable of performing several 16-bit, dual 16-bit, and 32-bit MAC operations on signed and unsigned numbers with up to three operands involved. The performance of the MAC unit is analyzed in terms of delay and power. The unit is described in VHDL, implemented and simulated in Vivado, and tested on a Nexys 4 DDR board featuring Xilinx's Artix-7 FPGA.

Index Terms—FPGA, IoT, MAC, Multiplier, Vedic, VHDL

I. INTRODUCTION

The Internet of Things (IoT) refers to a giant network that extends to everyday objects, namely 'Things'. These things, while not considered computers, can be sensors, actuators, wearables, mechanical machines, home appliances or even persons that are able to communicate with other objects and computers through the Internet via embedded systems without human intervention. Fig. 1 depicts the components of an IoT device. Sensors in each IoT device monitor and collect data from the surrounding environment; a local processor or microcontroller (MCU) processes these data and interfaces to a wireless device for connectivity [1].

Fig. 1: Block diagram of IoT device.

Typically, mobile IoT devices transfer a small amount of data and are powered through rechargeable batteries and/or ambient energy sources such as solar energy, thermal energy, wind energy, electromagnetic energy from radio transmitters, vibrations or physical motion. This is the reason that power dissipation has become an important concern: devices must use minimal power while still providing good performance. Power consumption depends on the type of sensors, microcontroller and radio transceiver within the device.

Unlike traditional embedded devices, which would contain two separate processors, most cutting-edge devices can handle the interface and run DSP applications by means of one single-core microcontroller. Single-core microcontrollers can reduce power consumption and have the computational power to process real-time signals. Fig. 2 abstracts the components that make up an ARM core, one of the most pervasive processors in the world, embedded in a wide range of products from cell phones to vehicles [2].

DSP applications are typically performed by a Multiply-Accumulate (MAC) unit, which multiplies two numbers and accumulates the result onto an accumulator. The MAC unit is a fundamental block that maximizes the performance of the processor.

This paper introduces a new design for a low-power MAC unit capable of performing several 16-bit, dual 16-bit, and 32-bit operations in which up to three operands are involved; two operands are the multiplicand and multiplier, while the third operand is used for optional accumulation and subtraction. The proposed MAC operations can be carried out on signed and unsigned numbers, and the result can be 32-bit or 64-bit according to the type of operation. In order to maximize the performance, all the MAC operations are executed in one cycle.

The rest of this paper is organized as follows: Section II introduces the architecture and the design of the proposed MAC unit in detail, the simulation and results are discussed in Section III, and Section IV concludes the paper.
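
As a minimal sketch of the three-operand operation described in this introduction, the following Python fragment computes R ± A × B and wraps the result to a 32-bit or 64-bit pattern. The function names (mac, to_signed) are hypothetical, and wrap-around on overflow is an assumption made for illustration; the paper does not state how overflow is handled.

```python
def mac(r, a, b, subtract=False, result_bits=64):
    """Return (r + a*b) or (r - a*b), wrapped to result_bits.

    The value is returned as an unsigned bit pattern. Wrapping on
    overflow is an assumption of this sketch, not a statement about
    the hardware described in the paper.
    """
    product = a * b
    total = r - product if subtract else r + product
    return total & ((1 << result_bits) - 1)


def to_signed(value, bits):
    """Interpret an unsigned bit pattern as a two's-complement integer."""
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit
```

With these definitions, mac(10, 3, 4) gives 22, and to_signed(mac(10, 3, 4, subtract=True), 64) gives -2.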

978-1-7281-1929-8/18/$31.00 ©2018 IEEE

DOI 10.1109/EECS.2018.00072
Fig. 2: ARM core functional units and dataflow model.

II. ARCHITECTURE AND DESIGN

In this section, the architecture and the design of the proposed MAC unit are explained in detail. The multiplicand and multiplier operands are denoted by A and B, respectively, while the third operand is denoted by R, and the result is denoted by Y. The main operations of the proposed MAC unit are listed below:

- Multiply Words and Accumulate
  Y = R ± A × B
- Multiply Halfwords and Accumulate
  Y = R ± A(halfword) × B(halfword)
- Multiply Word by Halfword and Accumulate
  Y = R ± A × B(halfword)
- Dual Multiply Halfwords, Add/Subtract, Accumulate
  Y = R ± A(halfword) × B(halfword) ± A(other halfword) × B(other halfword)

A. Architecture

The proposed MAC unit consists of four 16x16 bit array multipliers whose inputs and outputs are connected to adders and multiplexers. Fig. 3 shows the block diagram of the proposed MAC unit and demonstrates its dataflow architecture. The design of each block will be discussed in the next subsection.

Fig. 3: The block diagram of the proposed MAC unit. (Signals shown, from inputs to output: R[63:0], B[31:0] and A[31:0]; 'ABS' blocks and input multiplexers producing MUL_B[31:0] and MUL_A[31:0]; four 16x16 multipliers with M3 = MUL_B[31:16] × MUL_A[31:16], M2 = MUL_B[31:16] × MUL_A[15:0], M1 = MUL_B[15:0] × MUL_A[31:16] and M0 = MUL_B[15:0] × MUL_A[15:0]; 'Sign' and 'Vedic' blocks producing M3_SIGN, Y_VEDIC, Y_VEDIC_SIGN and M0_SIGN; a multiplexer selecting the 64-bit MULTIPLICATION_RESULT; ADDER64 and ADDER32; a multiplexer selecting the 64-bit ACCUMULATOR_RESULT; output Y[63:0].)

For 16-bit MAC operations, only one multiplier is required to multiply two halfwords. In the case of dual 16-bit operations, two multipliers multiply two pairs of standalone halfwords and their results are then added or subtracted. In 32-bit operations, the four multipliers are connected together along with adders to form a 32x32 bit Vedic multiplier. The result of the multiplication is optionally accumulated on another operand by means of a 32-bit and/or a 64-bit adder.

Multiplexers are responsible for choosing the desired inputs and outputs of the multipliers and adders according to the operation. They select the desired words and halfwords of the input and resulting operands. In addition, they select the type of multiplication, whether signed or unsigned. In this way, the same hardware components can be reused to perform a wide range of operations while consuming less area and power by avoiding component replication.

The multipliers used are unsigned by default, and hence extra hardware is required to perform signed multiplication. For unsigned multiplication, the operands are passed to the multipliers without any change. For signed multiplication, however, the absolute values of the operands are fed into the multipliers, unsigned multiplication is performed, and finally the sign is calculated separately and applied to the result.

B. Design

The multiplier blocks, denoted by 'M0', 'M1', 'M2' and 'M3', are 16x16 bit unsigned array multipliers that use combinational logic to perform parallel multiplication. Array multipliers outperform serial multiplication schemes in terms of speed. The design of an array multiplier is based upon partial product generation, shifting and addition. Each partial product is generated by multiplying the multiplicand by one multiplier bit and is shifted according to the position of that bit. Finally, the result is obtained by adding the shifted partial products. Fig. 4 demonstrates the multiplication method of the array multiplier.

Fig. 4: The multiplication method of array multiplier.

In order to maximize the performance while maintaining minimum power and area and enabling hardware reuse, the Vedic scheme is used to construct a 32x32 bit multiplier. Fig. 5 shows an example of a 4x4 bit Vedic multiplier, where the first row represents the multiplicand (B = b3 b2 b1 b0) while the second row is the multiplier (A = a3 a2 a1 a0). In step 1, the least significant bits are multiplied, giving the least significant bit of the multiplication result. In the subsequent steps, the cross products are added, i.e., a0 × b1 + a1 × b0 in step 2, a0 × b2 + a2 × b0 + a1 × b1 in step 3, etc. Any carry generated by an addition is added into the next step. The same procedure is followed through the final step. The same methodology can be extended to construct a 32x32 bit Vedic multiplier [3].

Fig. 5: Vedic scheme for 4x4-bit multiplier.

In the proposed design, 'M0' is used to multiply the lower halfwords of the input operands and 'M3' is used to multiply the upper halfwords. The other two multipliers, 'M1' and 'M2', multiply the lower halfword of the first operand by the upper halfword of the second operand and vice versa. In this way, the results of multiplying the different halfwords are obtained at the same time and can be used in operations where two standalone multiplication results are required. The result selection depends on the operation and is done by means of multiplexers. In order to use the same hardware to perform 32x32 bit multiplication, the concept of Vedic multiplication is used: the results of the four multipliers are connected to adders to extend the multiplication process. Fig. 6 depicts the connection of adders in the Vedic block.

Fig. 6: The design of the 'Vedic' block. (As labelled in the figure: a 48-bit adder computes Y2 = (M3 & X"0000") + (X"0000" & M2); a 32-bit adder computes Y1 = M1 + (X"0000" & M0[31:16]); a 48-bit adder computes Y_VEDIC[63:16] = Y2 + (X"0000" & Y1); Y_VEDIC[15:0] = M0[15:0].)

The 'ABS' block is used to obtain the absolute value of the signed operands and is required only in signed multiplication operations. To obtain the absolute value, the most-significant bit of the operand is inspected. If it is '1', the operand is negative and the absolute value is the two's complement of the operand. Otherwise, the operand is positive and the absolute value is equal to the operand itself. The obtained absolute value is ready to be used with the unsigned multipliers. Fig. 7 depicts the design of the 'ABS' block.

Fig. 7: The design of the ABS block.

The 'Sign' block is required to calculate the sign of the result in signed multiplication. The result should be negative only if A and B have different signs. In order to achieve this, the most-significant bits of A and B are inspected. The signed result is the two's complement of the unsigned multiplication result when the most-significant bits are different. Otherwise, the signed result is equal to the unsigned multiplication result. Fig. 8 depicts the design of the 'Sign' block.

Fig. 8: The design of the 'Sign' block.

Regarding low power consumption, one MAC operation does not use all the components at a time. Only some specific components need to operate when executing a certain operation, and the rest can be switched off. Therefore, the guard evaluation low-power technique [4] is used to block changes at the inputs of these blocks, hence saving the dynamic power due to transitions. In the next section, the improvement in power consumption after using this technique is reviewed.

III. SIMULATION AND RESULTS

The proposed MAC unit was implemented in VHDL using the Vivado software tool by Xilinx. The power reports of the MAC unit obtained from Xilinx Vivado are shown in Fig. 9. The power consumption is shown with and without the guard evaluation low-power technique in Fig. 9a and Fig. 9b, respectively. The dynamic power consumption is reduced from 29 mW to 22 mW after using this technique. It is worth noting that the power is expected to be reduced further after integrating the proposed MAC unit into the whole processor design because of resource sharing.

Fig. 9: The power report of the MAC unit: (a) With guard evaluation. (b) Without guard evaluation.

The design was simulated using the simulation tool ISim, integrated with Vivado, to test the functionality of the design. The simulation results showed that the MAC unit was able to perform the designed MAC operations correctly. Fig. 10 shows the simulation results of the MAC operations. It is worth mentioning that the test bench covers more operations than those shown in Fig. 10.

Fig. 10: The simulation results of the MAC unit.

IV. CONCLUSIONS

An efficient hardware architecture for a low-power 32x32 bit MAC unit for IoT processors has been designed and implemented in this work. The proposed MAC unit is implemented in VHDL on a Nexys 4 development board featuring a Xilinx FPGA. The implementation results obtained from simulation show a low power consumption of about 22 mW and a small delay. Although the implemented MAC unit already has a low power consumption, it is expected that its power will be reduced further after it is integrated with the other components of the processor. Future work will focus on improving the power results for the whole processor to fit the IoT power budget.

REFERENCES

[1] Pallavi Sethi and Smruti R. Sarangi. Internet of things: Architectures, protocols, and applications. J. Electrical and Computer Engineering, 2017:9324035:1–9324035:25, 2017.
[2] Andrew Sloss, Dominic Symes, and Chris Wright. ARM System Developer's Guide: Designing and Optimizing System Software. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2004.
[3] V. Kulkarni, L. Kulkarni, and V. Kulkarni. High speed and area efficient vedic multiplier. In 2012 International Conference on Devices, Circuits and Systems, pages 360-364, March 2012.
[4] C. Ravishankar, J. H. Anderson, and A. Kennings. FPGA Power Reduction by Guarded Evaluation Considering Logic Architecture. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 31, no. 9, pp. 1305-1318, Sept. 2012.
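
As an illustration of the datapath described in Section II, the following Python sketch models the signed and unsigned 32-bit "Multiply Words and Accumulate" operation using four 16x16 products, the 'ABS' and 'Sign' steps, and the adder arrangement labelled in Fig. 6. It is a behavioural model written for this purpose, not the authors' VHDL: the function names (abs32, vedic_32x32, apply_sign, mac32) are hypothetical, operands are passed as 32-bit unsigned bit patterns, and wrapping the accumulated result to 64 bits is an assumption.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def abs32(x):
    """'ABS' step: if bit 31 is set, return the two's complement of x."""
    return (-x) & MASK32 if x >> 31 else x


def vedic_32x32(a, b):
    """Unsigned 32x32 product built from four 16x16 products M0..M3."""
    a_lo, a_hi = a & MASK16, a >> 16
    b_lo, b_hi = b & MASK16, b >> 16
    m0 = a_lo * b_lo  # lower halfwords
    m1 = a_hi * b_lo  # cross product
    m2 = a_lo * b_hi  # cross product
    m3 = a_hi * b_hi  # upper halfwords
    y2 = (m3 << 16) + m2   # (M3 & X"0000") + (X"0000" & M2)
    y1 = m1 + (m0 >> 16)   # M1 + (X"0000" & M0[31:16])
    upper = y2 + y1        # Y_VEDIC[63:16]
    return (upper << 16) | (m0 & MASK16)  # Y_VEDIC[15:0] = M0[15:0]


def apply_sign(y, a, b):
    """'Sign' step: two's-complement the product if operand signs differ."""
    return (-y) & MASK64 if (a >> 31) != (b >> 31) else y


def mac32(r, a, b, signed=True, subtract=False):
    """Y = R +/- A*B on 32-bit bit patterns, wrapped to 64 bits (assumed)."""
    if signed:
        product = apply_sign(vedic_32x32(abs32(a), abs32(b)), a, b)
    else:
        product = vedic_32x32(a, b)
    total = r - product if subtract else r + product
    return total & MASK64
```

For example, mac32(100, (-3) & MASK32, 7) evaluates to 79, since the unsigned product 3 × 7 = 21 is negated by the sign step before accumulation. Keeping the multipliers unsigned and handling the sign outside them mirrors the hardware-reuse argument of the paper: the same four products serve both signed and unsigned operations.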
