An Area-Power-Efficient Multiplier-Less Processing Element Design For CNN Accelerators
Abstract—Machine learning has achieved remarkable success in various domains. However, the computational demands and memory requirements of these models pose challenges for deployment on privacy-secured or wearable edge devices. To address this issue, we propose an area-power-efficient multiplier-less processing element (PE) in this paper. Prior to implementing the proposed PE, we apply a power-of-2 dictionary-based quantization to the model. We analyze the effectiveness of this quantization method in preserving the accuracy of the original model and present both the standard and a specialized diagram illustrating the schematics of the proposed PE. Our evaluation results demonstrate that our design achieves approximately 30% lower power consumption and a 35% smaller core area compared to a conventional multiplication-and-accumulation (MAC) PE. Moreover, the applied quantization reduces the model size and operand bit-width, resulting in reduced on-chip memory usage and energy consumption for memory accesses.

Keywords—multiplier-less processing element, energy-efficient, area-efficient, machine learning model quantization

I. INTRODUCTION

In recent years, significant advancements have been made in artificial intelligence (AI), particularly in areas such as object detection and allocation, natural language processing, and image generation. To handle complex tasks, GPUs with floating-point multipliers have been predominantly employed in big data centers and servers. However, with the rapid growth of the Internet of Things (IoT) and increasing demands for privacy security, a challenge arises in bringing AI models to edge devices. To address this challenge and reduce on-chip memory usage and power consumption, various hardware accelerators have been proposed to enable more efficient computation. One fundamental circuit component commonly used in these accelerators is the processing element (PE).

The basic component of PEs in hardware accelerators is typically a fixed-point multiplier, which consumes a significant amount of computation energy. One common approach to reducing computation power is the use of approximate multipliers. Fang et al. [1] introduced error balancing techniques, while Perri et al. [2] employed on-chip quantization to preserve better precision in approximate multipliers. However, both designs were developed for general applications and lack optimizations specific to machine learning models. Another method is the use of zero-gating designs [3], but such approaches often require additional hardware overhead to detect zero operands.

In binary arithmetic, multiplying a number by a power of 2 can be achieved efficiently by shifting its bits, eliminating the need for actual multiplications. Leveraging this characteristic, we propose a model quantization method and a corresponding multiplier-less PE. Our proposed method and PE design offer significant advantages, yielding approximately a 30% reduction in computation power and a 35% reduction in core area compared to a baseline multiplication-and-accumulation (MAC) PE. The analysis and evaluations in this study focus on convolutional neural networks (CNNs), which are widely used in image-related applications and comprise massive numbers of multiplication operations.

The remainder of this paper is organized as follows. Section II presents the quantization method employed for the proposed PE design. Section III provides a detailed illustration of the schematics of the proposed PE. Section IV presents the evaluation results, showcasing the performance of our design. Finally, Section V concludes the paper.

II. MODEL QUANTIZATION

Before implementing the proposed multiplier-less PE, a power-of-2 dictionary-based quantization is applied to the pretrained neural network model. In this quantization process, each original weight is represented by a combination of several sub-weights and scaled by a bias coefficient. This approach allows for efficient representation and computation, enabling the subsequent use of the multiplier-less PE in the hardware accelerator design.

Equation (1) defines the power-of-2 dictionary, Dict, which is utilized in the quantization process. For the quantization of a single weight using the items in the dictionary, W = (a + b + c) × 2^(-bias) indicates that the weight W can be represented by three sub-weights in Dict (i.e., a, b, c ∈ Dict), where the bias term is a coefficient shared across the layers, enabling efficient scaling of the quantized weights.

Dict = {0, ±2^0, ±2^1, …, ±2^n}; n ∈ Z^+ (1)

The effectiveness of the bias term in increasing the range of weights after quantization and mitigating the degradation in accuracy has been demonstrated. By incorporating the bias term into the quantization process, the quantized weights can cover a wider range, thereby preserving more information and reducing the impact on model accuracy [4]. This finding highlights the significance of the bias term in
This work is supported in part by the Waseda University Open
Innovation Ecosystem Program for Pioneering Research (W-SPRING),
Grant Number JPMJSP2128.
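The dictionary-based decomposition and shift-based multiplication described in Section II can be sketched in software. The following minimal Python illustration is not the authors' implementation: the greedy nearest-power-of-2 decomposition and all function names are assumptions made for this sketch.

```python
import math

def quantize_weight(w, m=3, n=3, bias=2):
    """Greedily decompose w into at most m sub-weights from
    Dict = {0, +/-2^0, ..., +/-2^n}, under the shared scale 2^(-bias).
    Returns a list of (sign, exponent) pairs; None encodes the 0 entry."""
    target = w * (1 << bias)              # undo the shared bias scaling
    subs = []
    for _ in range(m):
        if target == 0:
            subs.append(None)             # dictionary entry 0
            continue
        # nearest power of 2, clamped to the dictionary range 0..n
        e = min(n, max(0, round(math.log2(abs(target)))))
        s = 1 if target > 0 else -1
        subs.append((s, e))
        target -= s * (1 << e)
    return subs

def shift_mac(x, subs, bias):
    """Multiplier-less x * w: each sub-weight contributes a shifted
    copy of x; the shared bias is applied once at the end."""
    acc = 0
    for sub in subs:
        if sub is not None:
            s, e = sub
            acc += s * (x << e)           # a shift replaces the multiply
    return acc / (1 << bias)
```

For example, with bias = 2 the weight 0.75 decomposes into two non-zero sub-weights, and shift_mac(10, quantize_weight(0.75), 2) evaluates to 7.5 = 10 × 0.75 using only shifts and additions.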
Authorized licensed use limited to: PES University Bengaluru. Downloaded on February 06,2024 at 08:43:04 UTC from IEEE Xplore. Restrictions apply.
TABLE I. ACCURACY TABLE OF THE TESTBENCH VGG9 MODEL

  m \ n |    1     |    2    |    3
    1   | < 2.00 % | 51.45 % | 54.24 %
TABLE III. AN EXAMPLE OF THE BISIGN SUB-WEIGHT ENCODING

  Positive Sub_W | Negative Sub_W | Binary Code
       +2^0      |      -2^0      |     00
       +2^1      |      -2^1      |     01
       +2^2      |      -2^2      |     10
       +2^3      |      -2^3      |     11
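Read as a lookup table, the bisign encoding in Table III maps each 2-bit code to an exponent, with the same code shared by the positive and negative sub-weight of a pair. How the sign is resolved in hardware is outside this excerpt, so the sketch below simply takes it as a separate flag (a hypothetical interface, not the authors' design):

```python
def decode_subweight(code: int, positive: bool) -> int:
    """Table III lookup: a 2-bit code 0b00..0b11 selects the
    magnitude 2^0..2^3; the flag picks the +/- member of the pair."""
    assert 0 <= code <= 0b11, "bisign codes are 2 bits wide"
    magnitude = 1 << code          # code k encodes 2^k
    return magnitude if positive else -magnitude
```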
signals involved. Consequently, the bit-width of the weights
after quantization plays a crucial role in determining the
energy consumption and can be easily calculated by:
Standard_W_bitwidth = m × n (2)

Bisign_W_bitwidth = 2 × (n − 1) (3)
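A quick arithmetic check of (2) and (3) as reconstructed here; the symbol roles (m sub-weights, dictionary parameter n) are inferred from Table I and the surrounding text and may differ from the authors' exact definitions:

```python
def standard_w_bitwidth(m: int, n: int) -> int:
    # Eq. (2): m sub-weights at n bits each
    return m * n

def bisign_w_bitwidth(n: int) -> int:
    # Eq. (3): two sub-weight codes of (n - 1) bits each
    return 2 * (n - 1)

# With n = 3 (2-bit codes, as in Table III) the bisign encoding
# needs 2 * (3 - 1) = 4 bits per weight, versus 3 * 3 = 9 bits
# for m = 3 standard sub-weights.
```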