07) A Time-Domain Binary CNN Engine With Error-Detection-Based Resilience in 28nm CMOS
Abstract—Due to the increasing demand for highly energy-efficient processors for deep neural networks, traditional neural network engines with high-precision weights and activations, which usually occupy huge on/off-chip resources with large power consumption, are no longer suitable for Internet-of-Things applications. Binary neural networks (BNNs) reduce memory size and computation complexity, achieving drastically increased energy efficiency. In this brief, an energy-efficient time-domain binary neural network engine is optimized for image recognition, with time-domain accumulation (TD-MAC), timing-error-detection-based adaptive voltage scaling design and the related approximate computing. The proposed key features are: 1) an error-tolerant adaptive voltage scaling system with TD-MAC chain truncation for aggressive power reduction, working from near-threshold to normal voltage; 2) architectural parallelism and data reuse with 100% TD-MAC utilization; 3) low power TD-MAC based on analog delay lines. Fabricated in a 28nm CMOS process, the whole system achieves a maximum 51.5TOPS/W energy efficiency at 0.42V and 25MHz, with 99.6% accuracy on the MNIST dataset. When the length of the TD-MAC chain is truncated by configuration, at 150MHz with 90% accuracy on MNIST, the proposed BNN achieves a power saving of 13.2% and a further energy efficiency increase of 67.6%.
Index Terms—Adaptive voltage scaling, analog delay line, binary neural network, error resilience, time domain.
Manuscript received May 13, 2021; accepted June 9, 2021. Date of publication June 14, 2021; date of current version August 30, 2021. This work was supported in part by the Natural Science Foundation of Jiangsu Province under Grant BK20200002, and in part by the National Natural Science Foundation of China under Grant 62074035 and Grant 61774038. This brief was recommended by Associate Editor F. Qiao. (Corresponding author: Weiwei Shan.)
Zhikuang Cai is with the National and Local Joint Engineering Laboratory, RF Integration and Micro-Assembly Technology, College of Electronic and Optical Engineering & College of Microelectronics, Nanjing University of Posts and Telecommunications, Nanjing 210023, China.
Boyang Cheng, Yuxuan Du, Xinchao Shang, and Weiwei Shan are with the National ASIC System Engineering Research Center, School of Electronic Science & Engineering, Southeast University, Nanjing 210096, China (e-mail: [email protected]).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TCSII.2021.3088857.
Digital Object Identifier 10.1109/TCSII.2021.3088857
I. INTRODUCTION
ENERGY-EFFICIENT convolutional neural network (CNN) engines are essential for IoT and mobile applications [1]. Traditional CNN engines use 8/16/32-bit fixed-point or even floating-point calculation, which is not energy-efficient. Therefore, researchers are committed to reducing the bit widths of features and weights [2] to reduce memory access and power consumption while preserving accuracy as much as possible. Recently proposed binary neural networks (BNNs) [3] quantize both weights and activations to +1 and −1, which dramatically reduces their memory and computation. Thus, it is preferable to realize energy-efficient hardware with a BNN engine in edge computing applications [4].
The key module of a deep neural network is the MAC unit. In a binarized CNN, the multiply-accumulate (MAC) operation of convolution can be realized by XNOR and bit-counting operations instead of multiplication and accumulation [5]. A special delay cell [3] is designed to complete the multiply and accumulate operation by calculating the delay. This drastically reduces the power and memory footprint, which allows near-threshold voltage (NTV) design to further improve energy efficiency [6].
On the other hand, PVT variations in chip mass production are so severe that sufficient margin must be reserved at design time, especially for NTV designs, yet not much work has considered this effect. Since NN algorithms are inherently noise-tolerant, so that a certain number of errors has little influence on their results, some works have begun to use voltage scaling based on timing error detection to improve energy efficiency, such as [7]. These timing-error-detection-based resilient methods safely minimize the worst-case voltage guard band down to the point of first failure (PoFF) without a costly error correction mechanism.
In this brief, an energy-efficient time-domain binary neural network (TD-BNN) is proposed for energy efficiency and error resilience [9]. Our main contributions include: (1) Time-domain mixed-signal processing that uses a chain of analog delay cells to address the challenge of wide vector summation in BNNs. (2) A new framework that enables aggressive voltage scaling of high-performance DNN engines with acceptable classification accuracy even in the presence of timing errors, by means of approximate computing.
Fabricated in 28nm CMOS, our proposed TD-BNN engine achieves 0.28/48.6mW power consumption and 51.5/6.17 TOPS/W energy efficiency at 0.42/0.9V, with no timing violations and 99.86% accuracy on MNIST. When the length of the TD-MAC chain is truncated by configuration, the proposed BNN achieves a power saving of 13.2% and less computing time at a 150MHz frequency with 90% accuracy on MNIST, resulting in a further 67.6% improvement in energy efficiency.
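
The Introduction notes that a binarized MAC can be realized by XNOR and bit counting instead of multiplication and accumulation. The following is a minimal software sketch of that identity only; it does not model the time-domain delay cells of the proposed engine. The function names pack_bits and xnor_popcount_dot and the bit encoding (+1 as 1, −1 as 0) are assumptions made for illustration and are not taken from the brief.

```python
def pack_bits(values):
    """Pack +1/-1 values into an int, one bit per value (+1 -> 1, -1 -> 0).

    The encoding is an illustrative assumption, not the paper's format.
    """
    word = 0
    for i, v in enumerate(values):
        if v not in (1, -1):
            raise ValueError("values must be +1 or -1")
        if v == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_bits, w_bits, n):
    """Dot product of two length-n {+1, -1} vectors given as packed bits."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ w_bits) & mask   # XNOR: 1 where the signs match
    matches = bin(agree).count("1")     # bit counting (popcount)
    return 2 * matches - n              # matches minus mismatches


# A flattened 3x3 window of binary activations and binary weights.
activations = [1, -1, 1, 1, -1, -1, 1, -1, 1]
weights = [1, 1, -1, 1, -1, 1, 1, -1, -1]

# Reference result using ordinary multiply and accumulate.
reference = sum(a * w for a, w in zip(activations, weights))
fast = xnor_popcount_dot(pack_bits(activations), pack_bits(weights), len(weights))
assert fast == reference
```

With n binary inputs, m of which agree in sign with their weights, the sum of products is m − (n − m) = 2m − n, which is why counting the ones of the XNOR result suffices.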
1549-7747 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
3178 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, VOL. 68, NO. 9, SEPTEMBER 2021
CAI et al.: TIME-DOMAIN BINARY CNN ENGINE WITH ERROR-DETECTION-BASED RESILIENCE IN 28nm CMOS 3179
Fig. 3. Proposed TD-MAC operations with various mode configurations for CLs and FCLs. (a) Overall structure of configurable PE array. (b) Structure and
timing waveform of 3×3 TD-MAC. (c) TD-MAC chain mode with different configurations.
Fig. 5. Error detection and TD-MAC chain truncation in TD-BNN. (a) Transistor-level structure of error detector and its waveform. (b) Configured TD-MAC
Array (6 EDs in TD-MAC Array). (c) TD-MAC chain with truncation (truncated at ED5).