
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, VOL. 68, NO. 9, SEPTEMBER 2021

A Time-Domain Binary CNN Engine With Error-Detection-Based Resilience in 28nm CMOS

Zhikuang Cai, Boyang Cheng, Yuxuan Du, Xinchao Shang, and Weiwei Shan, Member, IEEE

Abstract—Due to the increasing demand for highly energy-efficient processors for deep neural networks, traditional neural network engines with high-precision weights and activations, which usually occupy huge on/off-chip resources with large power consumption, are no longer suitable for Internet-of-Things applications. Binary neural networks (BNNs) reduce memory size and computation complexity, achieving drastically increased energy efficiency. In this brief, an energy-efficient time-domain binary neural network engine is optimized for image recognition, with time-domain accumulation (TD-MAC), timing-error-detection-based adaptive voltage scaling and the related approximate computing. The proposed key features are: 1) an error-tolerant adaptive voltage scaling system with TD-MAC chain truncation for aggressive power reduction, working from near-threshold to normal voltage; 2) architectural parallelism and data reuse with 100% TD-MAC utilization; 3) a low-power TD-MAC based on analog delay lines. Fabricated in a 28nm CMOS process, the whole system achieves a maximum 51.5TOPS/W energy efficiency at 0.42V and 25MHz, with 99.6% accuracy on the MNIST dataset. When the length of the TD-MAC chain is truncated by configuration, with a 90% accuracy on MNIST at 150MHz, the proposed BNN achieves a power saving of 13.2% and a further energy efficiency increase of 67.6%.

Index Terms—Adaptive voltage scaling, analog delay line, binary neural network, error resilience, time domain.

I. INTRODUCTION

ENERGY-EFFICIENT convolutional neural network (CNN) engines are essential for IoT and mobile applications [1]. Traditional CNN engines use 8/16/32-bit fixed-point or even floating-point calculation, which is not energy-efficient. Therefore, researchers are committed to reducing the bit widths of features and weights [2] to reduce memory access and power consumption while ensuring accuracy as much as possible. Recently proposed binary neural networks (BNNs) [3] quantize both weights and activations to +1 and −1, which dramatically reduces their memory and computation. Thus, it is preferable to realize energy-efficient hardware with a BNN engine in edge computing applications [4].

The key module of a deep neural network is the MAC unit. In the binarized CNN, the convolution operation of multiply-accumulate (MAC) can be realized by XNOR and bit-counting operations instead of multiply and accumulate [5]. A special delay cell [3] is designed to complete the multiply-and-accumulate operation by calculating the delay. By this, it drastically reduces the power and memory footprint, which allows near-threshold voltage (NTV) design to further improve energy efficiency [6].

On the other hand, PVT variations in chip mass production are so severe that sufficient margin must be reserved at design time, especially for NTV designs, while little work has considered this effect. Since NN algorithms are inherently noise-tolerant, so that a certain amount of errors does not influence their results much, some work has begun to use voltage scaling based on timing error detection to improve energy efficiency, such as [7]. These timing-error-detection-based resilient methods safely minimize the worst-case voltage guard band down to the point of first failure (PoFF) without a costly error correction mechanism.

In this brief, an energy-efficient time-domain binary neural network (TD-BNN) is proposed for energy efficiency and error resilience [9]. Our main contributions include: (1) time-domain mixed-signal processing that uses an analog delay cell chain to address the challenge of wide vector summation in BNNs; (2) a new framework that enables aggressive voltage scaling of high-performance DNN engines with acceptable classification accuracy even in the presence of timing errors by approximate computing.

Fabricated in 28nm CMOS, our proposed TD-BNN engine achieves 0.28/48.6mW power consumption and 51.5/6.17 TOPS/W energy efficiency at 0.42/0.9V with no timing violations and 99.86% accuracy on MNIST. When the length of the TD-MAC chain is truncated by configuration, the proposed BNN achieves a power saving of 13.2% and less computing time at a 150MHz frequency with a 90% accuracy on MNIST, resulting in a further 67.6% improvement in energy efficiency.

Manuscript received May 13, 2021; accepted June 9, 2021. Date of publication June 14, 2021; date of current version August 30, 2021. This work was supported in part by the Natural Science Foundation of Jiangsu Province under Grant BK20200002, and in part by the National Natural Science Foundation of China under Grant 62074035 and Grant 61774038. This brief was recommended by Associate Editor F. Qiao. (Corresponding author: Weiwei Shan.)

Zhikuang Cai is with the National and Local Joint Engineering Laboratory, RF Integration and Micro-Assembly Technology, College of Electronic and Optical Engineering & College of Microelectronics, Nanjing University of Posts and Telecommunications, Nanjing 210023, China.

Boyang Cheng, Yuxuan Du, Xinchao Shang, and Weiwei Shan are with the National ASIC System Engineering Research Center, School of Electronic Science & Engineering, Southeast University, Nanjing 210096, China (e-mail: [email protected]).

Color versions of one or more figures in this article are available at https://doi.org/10.1109/TCSII.2021.3088857.

Digital Object Identifier 10.1109/TCSII.2021.3088857
1549-7747 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY ROORKEE. Downloaded on June 20,2022 at 10:21:37 UTC from IEEE Xplore. Restrictions apply.

Fig. 1. Software-level algorithm and hardware-level implementation.
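The software/hardware correspondence depicted in Fig. 1 can be sketched in a few lines of Python: a binarized dot product reduces to an XNOR plus popcount compared against a threshold, and the same decision can be taken by racing two delay chains. This is a minimal behavioral sketch; the unit delays and the base-chain construction are illustrative assumptions, not the circuit's actual timings.

```python
import random

FAST, SLOW = 1.0, 2.0  # assumed unit delays: XNOR = 0 -> fast cell, XNOR = 1 -> slow cell

def xnor_popcount_mac(a, w, m):
    """Digital reference: count XNOR matches of 0/1-encoded inputs and
    compare with the threshold m (activations/weights use the 0/1
    hardware encoding of -1/+1)."""
    matches = sum(1 - (ai ^ wi) for ai, wi in zip(a, w))
    return 1 if matches >= m else 0

def td_mac(a, w, m):
    """Idealized time-domain version: each matching (XNOR = 1) cell adds a
    slow delay to the MAC chain; the base chain is modeled as if exactly m
    of its n cells were slow. The 'sign' bit is 1 when the MAC edge
    arrives no earlier than the base edge."""
    n = len(a)
    mac_delay = sum(SLOW if ai == wi else FAST for ai, wi in zip(a, w))
    base_delay = m * SLOW + (n - m) * FAST
    return 1 if mac_delay >= base_delay else 0

# The two implementations agree on random 9-element vectors (n = 9, m = 5),
# matching a 3x3 convolution window.
random.seed(0)
for _ in range(1000):
    a = [random.randint(0, 1) for _ in range(9)]
    w = [random.randint(0, 1) for _ in range(9)]
    assert td_mac(a, w, 5) == xnor_popcount_mac(a, w, 5)
```

The equivalence holds because each XNOR match adds the same delay increment to the MAC chain, so comparing arrival times is the same comparison as counting matches against m.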

II. TIME-DOMAIN BINARY NEURAL NETWORK ENGINE DESIGN

Fig. 2. Analog delay cell circuit design. (a) Definition of its positive and negative output results. (b) Transistor-level structure and its function table.

A. Binary Neural Network

In the binary neural network (BNN), the activation values and weights are binarized to +1 and −1 so that the MAC is simplified. In its hardware implementation, we use 0 to represent −1 and 1 to represent +1, so that the two states are stored in a single bit, saving hardware overhead. And the multiplication of activation and weight is transformed into an XNOR operation.

At the software level, the operation of the complete binary convolutional layer is shown in Fig. 1 left. The activation and weight vectors A = {a(i)} and W = {w(i)} (i = 1, 2, . . . , n) perform convolution and then add the bias b to obtain the intermediate result y1. y1 is used as the input of the batch normalization operation. Then the binary activation judges the output y2 with a sign function to obtain the final result Y.

For BNN hardware, the batch normalization can be combined with the biasing operation [6], so the entire convolution operation is simplified into a single convolution-bias-normalization module, shown by Fig. 1 right. This module consists of the XNOR operation, accumulation, biasing and batch normalization. The activation function at the hardware level is different from the activation function at the software level:

$$Y=\begin{cases}1, & \sum_{i} w_i \odot a_i + b_{BN} \ge m\\ 0, & \sum_{i} w_i \odot a_i + b_{BN} < m\end{cases}\quad (i = 1, 2, \ldots, n) \qquad (1)$$

At the software level, it judges the positive or negative of the input to generate the output, while at the hardware level, the input and an intermediate parameter m need to be compared to determine the output, as shown in (1). Here m is the median of the input data range; for example, when n = 9, the input data range is 1-9 and its median is m = (n + 1)/2 = 5. The activation function module consists of a comparator.

Max pooling [8] is adopted because it can be realized with only an OR gate, so it is more hardware-friendly than the average pooling scheme.

B. Analog Delay Cell

In BNN, the activation function only needs to determine the positive or negative of an intermediate parameter and does not care about the previous calculation process. So, in the hardware implementation, the activation function only needs to know whether the intermediate y2 obtained by batch normalization is greater or less than the median of the input data range. This work takes advantage of this feature and uses delay information to replace the accumulation, biasing and batch normalization operations. As shown in Fig. 2(a), under the same conditions, two inverter chains are designed with the same output load. The relative delay of the rising edges of the outputs Y1 and Y2 of the two inverter chains is used to define the sign of the result.

The Y2 signal is used as the reference signal. X1 and X2 are the same rising signal at the same time. When the rising edge of output signal Y1 arrives earlier than the reference signal Y2, the defined output value is 1. On the other hand, when the rising edge of output signal Y1 arrives later than the signal Y2, the defined output value is 0.

The inverter chain consists of special inverters [9]. Its structure is shown in Fig. 2(b). Two signals a and w control the conduction of 3 parallel PMOS and 3 parallel NMOS transistors and thus affect the charging and discharging ability of the structure, which makes the delay from X to Y controlled by the input signals a and w.

C. TD-MAC Chain

Here we describe how the computations for BNN are efficiently performed by the time-domain technique. Fig. 3(a) shows the details of the TD-MAC array. The fast and slow speeds to generate Y represent XNOR results of 0 or 1, respectively. Thus, the single MAC chain is configured as shown in Fig. 3(b). After the positive edge of the CK signal passes through the two chains, outputs Y1 and Y2 with different delays are obtained. Y1 and Y2 are connected to the data and clock inputs of the DFF, respectively. The output 'sign' can be understood as the sampling result of the DFF when capturing Y1 at clock Y2.

Considering the actual MAC computing in BNN, the proposed TD-MAC structure shows a time-domain accumulator for a 3×3 convolutional operation or a 9-bit summation in FCLs, as shown in Fig. 3(b). The base chain is configured

Fig. 3. Proposed TD-MAC operations with various mode configurations for CLs and FCLs. (a) Overall structure of configurable PE array. (b) Structure and
timing waveform of 3×3 TD-MAC. (c) TD-MAC chain mode with different configurations.
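The chain-mode configuration of Fig. 3(c) can be illustrated with a small Python helper. Here `cut_before`, standing in for the MUX select signals S0/S16/S32/S48 being driven to 0, is an assumption made for illustration rather than the hardware's actual 128-MUX control interface.

```python
def segment_chain(n_macs=64, cut_before=(0, 16, 32, 48)):
    """Split the 64-TD-MAC chain into independent sub-chains by 'cutting'
    in front of the TD-MACs whose MUX select signal is configured to 0.
    Returns the list of TD-MAC indices belonging to each sub-chain."""
    cuts = sorted(set(cut_before)) + [n_macs]
    return [list(range(cuts[i], cuts[i + 1])) for i in range(len(cuts) - 1)]

# CL2-style configuration: four independent chains of 16 TD-MACs each, so
# each segment can perform a 3x3x16 convolution in parallel.
segments = segment_chain()
assert [len(s) for s in segments] == [16, 16, 16, 16]

# With no extra cuts, the 64 TD-MACs form one long chain.
assert segment_chain(64, (0,)) == [list(range(64))]
```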

as the value of ((N + 1)/2 + bias + BN offset) when considering the batch normalization (BN) in BNN, where N equals the length of the XNOR-delay line. Different from the traditional all-digital method, our proposed TD-MAC only needs to obtain the delay difference between the two chains in our structure rather than a precise accumulation result. If activation and weight are the same, the phase difference of Y and CK is large; otherwise, the phase difference is small. At last, the phase difference between the base (Y1) and the MAC (Y2) chains is reflected in the 1-bit 'sign'. There are two situations: 1) if Y2 > Y1, sign = 1; 2) if Y2 < Y1, sign = 0. The timing diagram of a 3×3 TD-MAC is shown in the right part of Fig. 3(b).

Applied in BNN, the whole TD-MAC chain consists of 64 TD-MACs in total. To achieve different MAC operations in CLs and FCLs, 128 2-to-1 MUXs are placed between every two TD-MACs, deciding where to cut the MAC chain and export an output value. For different layers in the TD-BNN engine, 4 chain modes can be configured with various TD-MAC chain lengths and MUX configurations, as shown in Fig. 3(c). For example, when the TD-BNN engine performs the convolutional operation of layer CL2, the select signals S0, S16, S32, S48 are configured to 0 while the select signals of the other MUXs are configured to 1. The 64 TD-MACs are divided into 4 independent TD-MAC chains, each composed of 16 TD-MACs. Since each TD-MAC can complete a 3×3 convolutional operation, each segmented chain can simultaneously complete the 3×3×16 convolutional operation required by the CL2 layer. Thus, all operations in BNN are achieved by the proposed TD-MAC chain with high energy efficiency.

D. Overall Architecture

The overall architecture of our proposed TD-BNN is described in Fig. 4. It mainly consists of controllers, 32kB weight/feature SRAMs, local data buffers and an 8×8 TD-MAC array. On-chip weight/feature SRAMs are connected to weight/feature buffers respectively to provide data for the TD-MAC array through dispatchers. The TD-MAC array is composed of 64 MAC processing units, which can be configured into various computing modes by kernel parallelism and reuse in different convolutional layers and fully connected layers with 100% TD-MAC utilization. Furthermore, AVS with error detection and correction (EDAC) design is integrated into the engine to improve energy efficiency further through the method of approximate computing.

Fig. 4. Overall architecture of proposed time-domain binary CNN engine [9].

III. TD-BNN WITH ADAPTIVE VOLTAGE SCALING AND CONFIGURABLE TRUNCATION

Razor systems [10] can detect timing information in the critical paths, which allows excessive worst-case guardbands to be safely minimized down to the lowest supply voltage, where timing violations start to occur. In conventional digital chips, a mechanism for error correction should be applied to ensure functionality. In fact, in deep learning applications, it is not always necessary to correct errors. A neural network is inherently noise-tolerant in its algorithm and is a natural fit for approximate computing and error-tolerant design.

Taking advantage of this error tolerance, a novel method of TD-MAC chain truncation in the neural network is proposed, which provides aggressive voltage scaling to decrease power when timing errors occur.

To realize error detection with quick response and low overhead in TD-BNN, error detectors (EDs) with 12 transistors are inserted at selected positions. The ED is based on current sensing, which flags a late-arriving transition as an error during the positive clock phase. Both the schematic and the timing waveform of the ED are depicted in Fig. 5(a).

A total of 6 EDs are inserted in the TD-MAC array for error detection, as shown in Fig. 5(b), where a series of TD-MACs form critical paths according to the configurations. They are used to monitor whether there are timing violations when the


Fig. 5. Error detection and TD-MAC chain truncation in TD-BNN. (a) Transistor-level structure of error detector and its waveform. (b) Configured TD-MAC
Array (6 EDs in TD-MAC Array). (c) TD-MAC chain with truncation (truncated at ED5).

Fig. 7. Overall test platform and time-domain waveforms.

Fig. 6. Die microphotograph and chip specifications.

length of the TD-MAC chain changes with different configurations under different supply voltages and possible process and temperature variations. All the timing error signals of the EDs are gathered by dynamic-OR gates, which then propagate an error signal to the adaptive voltage scaling module.

When scaling down the voltage for higher energy efficiency, ED1 is first enabled to monitor timing violations, as it accounts for the longest critical path. Another five points (ED2-ED6) are truncation points, which are in charge of truncating the critical path to be shorter in order to avoid timing errors. This is based on the observation in NN engines that timing errors affect detection precision a lot, so that truncating long paths to avoid timing errors, while enduring approximate computing, is beneficial for the NN system. These truncation points are configured according to different accuracy requirements.

The TD-MACs after ED2-ED6 will be truncated to shorten the critical path when the timing violation rate reaches a certain value. For example, as shown in Fig. 5(c), the TD-MAC chain is truncated at ED5, resulting in only 48 processing elements in the whole chain. This approximate computing ensures correct timing again and provides extra margin for aggressive voltage scaling, which provides a good tradeoff between accuracy and energy efficiency.

Fig. 8. Measured max frequency and power consumption across 0.42-0.9V.

TABLE I: COMPARISONS WITH STATE-OF-THE-ART WORK
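The error-driven control policy described above can be sketched as a simple state update. The step size, voltage floor, and single truncation point below are illustrative assumptions, not the chip's actual AVS parameters.

```python
def avs_step(vdd_mv, chain_len, error_flag,
             vdd_step=20, vdd_min=420, truncated_len=48):
    """One illustrative AVS control step:
    - no timing error: keep scaling the supply down toward the floor;
    - error on the full-length chain: truncate it (e.g. at ED5 -> 48
      TD-MACs) to shorten the critical path instead of raising VDD;
    - error on an already-truncated chain: back off the supply.
    Returns the next (vdd_mv, chain_len) state."""
    if not error_flag:
        return max(vdd_mv - vdd_step, vdd_min), chain_len
    if chain_len > truncated_len:
        return vdd_mv, truncated_len  # shorten critical path, clear the error
    return vdd_mv + vdd_step, chain_len  # no truncation headroom left
```

Starting from, say, 0.56V with the full 64-TD-MAC chain, repeated calls walk the supply down; the first flagged timing error trades chain length for correct timing, which is exactly the accuracy-for-efficiency tradeoff the truncation points are meant to expose.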

IV. MEASUREMENT RESULTS

The proposed time-domain binary neural network engine named TD-BNN is implemented in a 28nm CMOS technology. The chip microphotograph is shown in Fig. 6, occupying an area of 1.92mm×1.35mm with the testing pads. The implementation details are also shown in Fig. 6. Our whole chip consists of a 14kB feature SRAM, an 18kB weight SRAM, a PLL and a TD-BNN core for neural network calculation. The whole chip works reliably at frequencies of 25-500MHz over a wide voltage range of 0.42-0.9V. The overall test platform and time-domain waveforms are shown in Fig. 7. It can be seen that the "NN finish" signal periodically generates high-level pulses, indicating that the function of the TD-BNN engine is correct.

To ensure correct operation across all chips and operating conditions, all chips are required to operate at a conservative frequency for the worst case. Here a conventional worst-case margin is reserved against variation at the nominal voltage. Under these margined conditions, the baseline frequency at 0.9V is measured to be 425MHz.


As shown in Fig. 10, with the TD-MACs truncated to 44 instead of 64 when encountering timing violations, the delay of the TD-MAC chain drops by 31.25% and the power is reduced by 13.2%. These two effects lead to an energy efficiency improvement of 67.6%, with a 90% accuracy on MNIST. So, when higher network accuracy is expected, a longer TD-MAC chain should be used; conversely, the TD-MAC chain can be truncated aggressively to obtain higher energy efficiency. Considering the inherent fault tolerance of neural networks, the truncation method provides a good solution for energy-efficient NN hardware engine design.
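The 67.6% figure follows directly from the two measured reductions: energy per operation is proportional to power times compute time, so a 13.2% power reduction compounds with a 31.25% delay reduction as checked below.

```python
power_scale = 1 - 0.132          # power reduced by 13.2%
delay_scale = 1 - 0.3125         # chain delay reduced by 31.25%
energy_scale = power_scale * delay_scale   # energy per operation ~ P * t
efficiency_gain = 1 / energy_scale - 1     # improvement in ops per joule
assert abs(efficiency_gain - 0.676) < 0.005  # ~67.6%, matching the text
```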

Fig. 9. Power savings with adaptive voltage scaling [9].

Fig. 10. Neural network accuracy changes with voltage, under different TD-MAC chain lengths.

V. CONCLUSION

In summary, a time-domain binary neural network engine named TD-BNN in 28nm CMOS is designed in this brief, which drastically reduces the computing and memory resources of the neural network. A TD-MAC chain comprised of analog delay lines is proposed to achieve low power. Through architectural parallelism and data reuse, 100% TD-MAC utilization is realized for BNN calculation. Considering PVT variations in mass production, an error-resilient AVS system with TD-MAC chain truncation is introduced into the TD-BNN engine, which further reduces power consumption according to actual PVT conditions. With AVS enabled, the whole chip works from near-threshold to normal voltage, resulting in 0.28/48.6mW and 51.5/6.17 TOPS/W at 0.42/0.9V, respectively. With the TD-MAC chain truncated when encountering timing errors, our TD-BNN chip further reduces power by 13.2%. The techniques proposed in this brief effectively improve the energy efficiency of the BNN engine.
Fig. 8 shows the measurement results without activating the error-detection-based resilience for the fabricated BNN processor. The engine can operate at a 0.42-to-0.9V supply voltage with a maximum 425MHz clock frequency at 0.9V and 25MHz at 0.42V. The power consumption at 0.9V and 0.42V is 48.6mW and 0.28mW, respectively. When the voltage is scaled down to 0.42V, the energy efficiency reaches 51.5TOPS/W. For these operations, the detection accuracy on the MNIST dataset is up to 99.6%. Table I shows the comparisons with state-of-the-art work. Our TD-BNN engine achieves ∼4.3× energy efficiency as compared to another time-domain accelerator in [4].

Fig. 9 summarizes the power savings offered by the adaptive voltage frequency scaling technique in a wide voltage range across 0.42V∼0.9V. Three VDD operating points are given, starting with the baseline point, then showing voltage scaling to PoFF and the final supply voltage. With error-detection-enabled voltage tuning, the design provides 15.8%, 22.1% and 41.5% power savings at 0.9V/425MHz, 0.7V/150MHz and 0.56V/25MHz when encountering the first timing violation, respectively. Power consumption is further reduced due to error tolerance. For example, at the baseline of 0.56V and 25MHz, AVS gradually eliminates timing margins by lowering the supply to 0.42V with 99.86% classification accuracy, which provides TD-BNN with a 46.2% power saving. A 20mV voltage reduction is obtained as compared to the PoFF (0.44V), with a 4.7% further power improvement.

Furthermore, both the truncation of TD-MAC chains and encountered timing violations cause a decrease of BNN accuracy. Fig. 10 shows how the accuracy of the neural network changes with voltage, under different TD-MAC chain lengths.

REFERENCES

[1] Y.-H. Chen, T. Krishna, J. S. Emer, and V. Sze, "Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks," IEEE J. Solid-State Circuits, vol. 52, no. 1, pp. 127–138, Jan. 2017.
[2] F. Li, B. Zhang, and B. Liu, "Ternary weight networks," in Proc. NIPS, 2016, p. 5.
[3] K. Ando et al., "BRein memory: A 13-layer 4.2 K neuron/0.8 M synapse binary/ternary reconfigurable in-memory deep neural network accelerator in 65 nm CMOS," in Proc. IEEE Symp. VLSI Circuits Dig. Tech. Papers, Jun. 2017, pp. C24–C25.
[4] A. Sayal, S. Fathima, S. S. T. Nibhanupudi, and J. P. Kulkarni, "All-digital time-domain CNN engine using bidirectional memory delay lines for energy-efficient edge computing," in Proc. IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2019, pp. 228–230.
[5] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, "Binarized neural networks," in Proc. Adv. Neural Inf. Process. Syst., vol. 29, 2016, pp. 4107–4115.
[6] W. Shan et al., "A 510nW wake-up keyword-spotting chip using serial-FFT-based MFCC and binarized depthwise separable CNN in 28nm CMOS," IEEE J. Solid-State Circuits, vol. 56, no. 1, pp. 151–164, Jan. 2021.
[7] P. N. Whatmough, S. K. Lee, D. Brooks, and G.-Y. Wei, "DNN Engine: A 28-nm timing-error tolerant sparse deep neural network processor for IoT applications," IEEE J. Solid-State Circuits, vol. 53, no. 9, pp. 2722–2731, Sep. 2018.
[8] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. NIPS, 2012, pp. 1097–1105.
[9] Y. Du, X. Shang, and W. Shan, "An energy-efficient time-domain binary neural network accelerator with error-detection in 28nm CMOS," in Proc. IEEE Asia Pac. Conf. Circuits Syst., 2020, pp. 70–73.
[10] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," in Proc. Int. Conf. Mach. Learn., 2015, pp. 448–456.

