
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, VOL. 68, NO. 9, SEPTEMBER 2021

A Time-Domain Binary CNN Engine With Error-Detection-Based Resilience in 28nm CMOS

Zhikuang Cai, Boyang Cheng, Yuxuan Du, Xinchao Shang, and Weiwei Shan, Member, IEEE

Abstract—Due to the increasing demand for highly energy-efficient processors for deep neural networks, traditional neural network engines with high-precision weights and activations, which usually occupy huge on/off-chip resources with large power consumption, are no longer suitable for Internet-of-Things applications. Binary neural networks (BNNs) reduce memory size and computation complexity, achieving drastically increased energy efficiency. In this brief, an energy-efficient time-domain binary neural network engine is optimized for image recognition, with time-domain accumulation (TD-MAC), timing-error-detection-based adaptive voltage scaling and the related approximate computing. The proposed key features are: 1) an error-tolerant adaptive voltage scaling system with TD-MAC chain truncation for aggressive power reduction, working from near-threshold to normal voltage; 2) architectural parallelism and data reuse with 100% TD-MAC utilization; 3) a low-power TD-MAC based on analog delay lines. Fabricated in a 28nm CMOS process, the whole system achieves a maximum 51.5TOPS/W energy efficiency at 0.42V and 25MHz, with 99.6% accuracy on the MNIST dataset. When the length of the TD-MAC chain is truncated by configuration, with a 90% accuracy on MNIST at 150MHz, the proposed BNN achieves a power saving of 13.2% and a further energy efficiency increase of 67.6%.

Index Terms—Adaptive voltage scaling, analog delay line, binary neural network, error resilience, time domain.

I. INTRODUCTION

ENERGY-EFFICIENT convolutional neural network (CNN) engines are essential for IoT and mobile applications [1]. Traditional CNN engines use 8/16/32-bit fixed-point or even floating-point calculation, which is not energy-efficient. Therefore, researchers are committed to reducing the bit widths of features and weights [2] to reduce memory access and power consumption while ensuring accuracy as much as possible. Recently proposed binary neural networks (BNNs) [3] quantize both weights and activations to +1 and −1, which dramatically reduces their memory and computation. Thus, it is preferable to realize energy-efficient hardware with a BNN engine in edge computing applications [4].

The key module of a deep neural network is the MAC unit. In the binarized CNN, the convolution operation of multiply-accumulate (MAC) can be realized by XNOR and bit-counting operations instead of multiply and accumulate [5]. A special delay cell [3] is designed to complete the multiply-and-accumulate operation by calculating the delay. By this, it drastically reduces the power and memory footprint, which allows near-threshold voltage (NTV) design to further improve energy efficiency [6].

On the other hand, PVT variations in chip mass production are so severe that sufficient margin must be reserved at design time, especially for NTV designs, while little work has considered this effect. Since NN algorithms are inherently noise-tolerant, so that a certain amount of errors does not influence their results much, some work has begun to use voltage scaling based on timing error detection to improve energy efficiency, such as [7]. These timing-error-detection-based resilient methods safely minimize the worst-case voltage guard band down to the point of first failure (PoFF) without a costly error correction mechanism.

In this brief, an energy-efficient time-domain binary neural network (TD-BNN) is proposed for energy efficiency and error resilience [9]. Our main contributions include: (1) time-domain mixed-signal processing that uses an analog delay cell chain to address the challenge of wide vector summation in BNNs; (2) a new framework that enables aggressive voltage scaling of high-performance DNN engines with acceptable classification accuracy even in the presence of timing errors by approximate computing.

Fabricated in 28nm CMOS, our proposed TD-BNN engine achieves 0.28/48.6mW power consumption and 51.5/6.17 TOPS/W energy efficiency at 0.42/0.9V with no timing violations and 99.86% accuracy on MNIST. When the length of the TD-MAC chain is truncated by configuration, the proposed BNN achieves a power saving of 13.2% and less computing time at a 150MHz frequency with a 90% accuracy on MNIST, resulting in a further 67.6% improvement in energy efficiency.

Manuscript received May 13, 2021; accepted June 9, 2021. Date of publication June 14, 2021; date of current version August 30, 2021. This work was supported in part by the Natural Science Foundation of Jiangsu Province under Grant BK20200002, and in part by the National Natural Science Foundation of China under Grant 62074035 and Grant 61774038. This brief was recommended by Associate Editor F. Qiao. (Corresponding author: Weiwei Shan.)

Zhikuang Cai is with the National and Local Joint Engineering Laboratory, RF Integration and Micro-Assembly Technology, College of Electronic and Optical Engineering & College of Microelectronics, Nanjing University of Posts and Telecommunications, Nanjing 210023, China.

Boyang Cheng, Yuxuan Du, Xinchao Shang, and Weiwei Shan are with the National ASIC System Engineering Research Center, School of Electronic Science & Engineering, Southeast University, Nanjing 210096, China (e-mail: [email protected]).

Color versions of one or more figures in this article are available at https://doi.org/10.1109/TCSII.2021.3088857.

Digital Object Identifier 10.1109/TCSII.2021.3088857
1549-7747 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY ROORKEE. Downloaded on June 20,2022 at 10:21:37 UTC from IEEE Xplore. Restrictions apply.

Fig. 1. Software-level algorithm and hardware-level implementation.
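The software/hardware correspondence depicted in Fig. 1 can be sketched in a few lines of Python: a binarized dot product reduces to an XNOR plus popcount compared against a threshold, and the same decision can be taken by racing two delay chains. This is a minimal behavioral sketch; the unit delays and the base-chain construction are illustrative assumptions, not the circuit's actual timings.

```python
import random

FAST, SLOW = 1.0, 2.0  # assumed unit delays: XNOR = 0 -> fast cell, XNOR = 1 -> slow cell

def xnor_popcount_mac(a, w, m):
    """Digital reference: count XNOR matches of 0/1-encoded inputs and
    compare with the threshold m (activations/weights use the 0/1
    hardware encoding of -1/+1)."""
    matches = sum(1 - (ai ^ wi) for ai, wi in zip(a, w))
    return 1 if matches >= m else 0

def td_mac(a, w, m):
    """Idealized time-domain version: each matching (XNOR = 1) cell adds a
    slow delay to the MAC chain; the base chain is modeled as if exactly m
    of its n cells were slow. The 'sign' bit is 1 when the MAC edge
    arrives no earlier than the base edge."""
    n = len(a)
    mac_delay = sum(SLOW if ai == wi else FAST for ai, wi in zip(a, w))
    base_delay = m * SLOW + (n - m) * FAST
    return 1 if mac_delay >= base_delay else 0

# The two implementations agree on random 9-element vectors (n = 9, m = 5),
# matching a 3x3 convolution window.
random.seed(0)
for _ in range(1000):
    a = [random.randint(0, 1) for _ in range(9)]
    w = [random.randint(0, 1) for _ in range(9)]
    assert td_mac(a, w, 5) == xnor_popcount_mac(a, w, 5)
```

The equivalence holds because each XNOR match adds the same delay increment to the MAC chain, so comparing arrival times is the same comparison as counting matches against m.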

II. TIME-DOMAIN BINARY NEURAL NETWORK ENGINE DESIGN

Fig. 2. Analog delay cell circuit design. (a) Definition of its positive and negative output results. (b) Transistor-level structure and its function table.

A. Binary Neural Network

In the binary neural network (BNN), the activation values and weights are binarized to +1 and −1 so that the MAC is simplified. In its hardware implementation, we use 0 to represent −1 and 1 to represent +1, so that the two states are stored in a single bit, saving hardware overhead. And the multiplication of activation and weight is transformed into an XNOR operation.

At the software level, the operation of the complete binary convolutional layer is shown in Fig. 1 left. The activation and weight vectors A = {a(i)} and W = {w(i)} (i = 1, 2, . . . , n) perform convolution and then add the bias b to obtain the intermediate result y1. y1 is used as the input of the batch normalization operation. Then the binary activation judges the output y2 with a sign function to obtain the final result Y.

For BNN hardware, the batch normalization can be combined with the biasing operation [6], so the entire convolution operation is simplified into a single convolution-bias-normalization module, shown by Fig. 1 right. This module consists of the XNOR operation, accumulation, biasing and batch normalization. The activation function at the hardware level is different from the activation function at the software level:

$$Y=\begin{cases}1, & \sum_{i} w_i \odot a_i + b_{BN} \ge m\\ 0, & \sum_{i} w_i \odot a_i + b_{BN} < m\end{cases}\quad (i = 1, 2, \ldots, n) \qquad (1)$$

At the software level, it judges the positive or negative of the input to generate the output, while at the hardware level, the input and an intermediate parameter m need to be compared to determine the output, as shown in (1). Here m is the median of the input data range; for example, when n = 9, the input data range is 1-9 and its median is m = (n + 1)/2 = 5. The activation function module consists of a comparator.

Max pooling [8] is adopted because it can be realized with only an OR gate, so it is more hardware-friendly than the average pooling scheme.

B. Analog Delay Cell

In BNN, the activation function only needs to determine the positive or negative of an intermediate parameter and does not care about the previous calculation process. So, in the hardware implementation, the activation function only needs to know whether the intermediate y2 obtained by batch normalization is greater or less than the median of the input data range. This work takes advantage of this feature and uses delay information to replace the accumulation, biasing and batch normalization operations. As shown in Fig. 2(a), under the same conditions, two inverter chains are designed with the same output load. The relative delay of the rising edges of the outputs Y1 and Y2 of the two inverter chains is used to define the sign of the result.

The Y2 signal is used as the reference signal. X1 and X2 are the same rising signal at the same time. When the rising edge of output signal Y1 arrives earlier than the reference signal Y2, the defined output value is 1. On the other hand, when the rising edge of output signal Y1 arrives later than the signal Y2, the defined output value is 0.

The inverter chain consists of special inverters [9]. Its structure is shown in Fig. 2(b). Two signals a and w control the conduction of 3 parallel PMOS and 3 parallel NMOS transistors and thus affect the charging and discharging ability of the structure, which makes the delay from X to Y controlled by the input signals a and w.

C. TD-MAC Chain

Here we describe how the computations for BNN are efficiently performed by the time-domain technique. Fig. 3(a) shows the details of the TD-MAC array. The fast and slow speeds to generate Y represent XNOR results of 0 or 1, respectively. Thus, the single MAC chain is configured as shown in Fig. 3(b). After the positive edge of the CK signal passes through the two chains, outputs Y1 and Y2 with different delays are obtained. Y1 and Y2 are connected to the data and clock inputs of the DFF, respectively. The output 'sign' can be understood as the sampling result of the DFF when capturing Y1 at clock Y2.

Considering the actual MAC computing in BNN, the proposed TD-MAC structure shows a time-domain accumulator for a 3×3 convolutional operation or a 9-bit summation in FCLs, as shown in Fig. 3(b). The base chain is configured

Fig. 3. Proposed TD-MAC operations with various mode configurations for CLs and FCLs. (a) Overall structure of configurable PE array. (b) Structure and
timing waveform of 3×3 TD-MAC. (c) TD-MAC chain mode with different configurations.
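The chain-mode configuration of Fig. 3(c) can be illustrated with a small Python helper. Here `cut_before`, standing in for the MUX select signals S0/S16/S32/S48 being driven to 0, is an assumption made for illustration rather than the hardware's actual 128-MUX control interface.

```python
def segment_chain(n_macs=64, cut_before=(0, 16, 32, 48)):
    """Split the 64-TD-MAC chain into independent sub-chains by 'cutting'
    in front of the TD-MACs whose MUX select signal is configured to 0.
    Returns the list of TD-MAC indices belonging to each sub-chain."""
    cuts = sorted(set(cut_before)) + [n_macs]
    return [list(range(cuts[i], cuts[i + 1])) for i in range(len(cuts) - 1)]

# CL2-style configuration: four independent chains of 16 TD-MACs each, so
# each segment can perform a 3x3x16 convolution in parallel.
segments = segment_chain()
assert [len(s) for s in segments] == [16, 16, 16, 16]

# With no extra cuts, the 64 TD-MACs form one long chain.
assert segment_chain(64, (0,)) == [list(range(64))]
```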

as the value of ((N + 1)/2 + bias + BN offset) when considering the batch normalization (BN) in BNN, where N equals the length of the XNOR-delay line. Different from the traditional all-digital method, our proposed TD-MAC only needs to obtain the delay difference between the two chains in our structure rather than a precise accumulation result. If activation and weight are the same, the phase difference of Y and CK is large; otherwise, the phase difference is small. At last, the phase difference between the base (Y1) and the MAC (Y2) chains is reflected in the 1-bit 'sign'. There are two situations: 1) if Y2 > Y1, sign = 1; 2) if Y2 < Y1, sign = 0. The timing diagram of a 3×3 TD-MAC is shown in the right part of Fig. 3(b).

Applied in BNN, the whole TD-MAC chain consists of 64 TD-MACs in total. To achieve different MAC operations in CLs and FCLs, 128 2-to-1 MUXs are placed between every two TD-MACs, deciding where to cut the MAC chain and export an output value. For different layers in the TD-BNN engine, 4 chain modes can be configured with various TD-MAC chain lengths and MUX configurations, as shown in Fig. 3(c). For example, when the TD-BNN engine performs the convolutional operation of layer CL2, the select signals S0, S16, S32, S48 are configured to 0 while the select signals of the other MUXs are configured to 1. The 64 TD-MACs are divided into 4 independent TD-MAC chains, each composed of 16 TD-MACs. Since each TD-MAC can complete a 3×3 convolutional operation, each segmented chain can simultaneously complete the 3×3×16 convolutional operation required by the CL2 layer. Thus, all operations in BNN are achieved by the proposed TD-MAC chain with high energy efficiency.

D. Overall Architecture

The overall architecture of our proposed TD-BNN is described in Fig. 4. It mainly consists of controllers, 32kB weight/feature SRAMs, local data buffers and an 8×8 TD-MAC array. On-chip weight/feature SRAMs are connected to weight/feature buffers respectively to provide data for the TD-MAC array through dispatchers. The TD-MAC array is composed of 64 MAC processing units, which can be configured into various computing modes by kernel parallelism and reuse in different convolutional layers and fully connected layers with 100% TD-MAC utilization. Furthermore, AVS with error detection and correction (EDAC) design is integrated into the engine to improve energy efficiency further through the method of approximate computing.

Fig. 4. Overall architecture of proposed time-domain binary CNN engine [9].

III. TD-BNN WITH ADAPTIVE VOLTAGE SCALING AND CONFIGURABLE TRUNCATION

Razor systems [10] can detect timing information in the critical paths, which allows excessive worst-case guardbands to be safely minimized down to the lowest supply voltage, where timing violations start to occur. In conventional digital chips, a mechanism for error correction should be applied to ensure functionality. In fact, in deep learning applications, it is not always necessary to correct errors. A neural network is inherently noise-tolerant in its algorithm and is a natural fit for approximate computing and error-tolerant design.

Taking advantage of this error tolerance, a novel method of TD-MAC chain truncation in the neural network is proposed, which provides aggressive voltage scaling to decrease power when timing errors occur.

To realize error detection with quick response and low overhead in TD-BNN, error detectors (EDs) with 12 transistors are inserted at selected positions. The ED is based on current sensing, which flags a late-arriving transition as an error during the positive clock phase. Both the schematic and the timing waveform of the ED are depicted in Fig. 5(a).

A total of 6 EDs are inserted in the TD-MAC array for error detection, as shown in Fig. 5(b), where a series of TD-MACs form critical paths according to the configurations. They are used to monitor whether there are timing violations when the


Fig. 5. Error detection and TD-MAC chain truncation in TD-BNN. (a) Transistor-level structure of error detector and its waveform. (b) Configured TD-MAC
Array (6 EDs in TD-MAC Array). (c) TD-MAC chain with truncation (truncated at ED5).

Fig. 7. Overall test platform and time-domain waveforms.

Fig. 6. Die microphotograph and chip specifications.

length of the TD-MAC chain changes with different configurations under different supply voltages and possible process and temperature variations. All the timing error signals of the EDs are gathered by dynamic-OR gates, which then propagate an error signal to the adaptive voltage scaling module.

When scaling down the voltage for higher energy efficiency, ED1 is first enabled to monitor timing violations, as it accounts for the longest critical path. Another five points (ED2-ED6) are truncation points, which are in charge of truncating the critical path to be shorter in order to avoid timing errors. This is based on the observation in NN engines that timing errors affect detection precision a lot, so that truncating long paths to avoid timing errors, while enduring approximate computing, is beneficial for the NN system. These truncation points are configured according to different accuracy requirements.

The TD-MACs after ED2-ED6 will be truncated to shorten the critical path when the timing violation rate reaches a certain value. For example, as shown in Fig. 5(c), the TD-MAC chain is truncated at ED5, resulting in only 48 processing elements in the whole chain. This approximate computing ensures correct timing again and provides extra margin for aggressive voltage scaling, which provides a good tradeoff between accuracy and energy efficiency.

Fig. 8. Measured max frequency and power consumption across 0.42-0.9V.

TABLE I: COMPARISONS WITH STATE-OF-THE-ART WORK
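The error-driven control policy described above can be sketched as a simple state update. The step size, voltage floor, and single truncation point below are illustrative assumptions, not the chip's actual AVS parameters.

```python
def avs_step(vdd_mv, chain_len, error_flag,
             vdd_step=20, vdd_min=420, truncated_len=48):
    """One illustrative AVS control step:
    - no timing error: keep scaling the supply down toward the floor;
    - error on the full-length chain: truncate it (e.g. at ED5 -> 48
      TD-MACs) to shorten the critical path instead of raising VDD;
    - error on an already-truncated chain: back off the supply.
    Returns the next (vdd_mv, chain_len) state."""
    if not error_flag:
        return max(vdd_mv - vdd_step, vdd_min), chain_len
    if chain_len > truncated_len:
        return vdd_mv, truncated_len  # shorten critical path, clear the error
    return vdd_mv + vdd_step, chain_len  # no truncation headroom left
```

Starting from, say, 0.56V with the full 64-TD-MAC chain, repeated calls walk the supply down; the first flagged timing error trades chain length for correct timing, which is exactly the accuracy-for-efficiency tradeoff the truncation points are meant to expose.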

IV. MEASUREMENT RESULTS

The proposed time-domain binary neural network engine named TD-BNN is implemented in a 28nm CMOS technology. The chip microphotograph is shown in Fig. 6, occupying an area of 1.92mm×1.35mm with the testing pads. The implementation details are also shown in Fig. 6. Our whole chip consists of a 14kB feature SRAM, an 18kB weight SRAM, a PLL and a TD-BNN core for neural network calculation. The whole chip works reliably at frequencies of 25-500MHz over a wide voltage range of 0.42-0.9V. The overall test platform and time-domain waveforms are shown in Fig. 7. It can be seen that the "NN finish" signal periodically generates high-level pulses, indicating that the function of the TD-BNN engine is correct.

To ensure correct operation across all chips and operating conditions, all chips are required to operate at a conservative frequency for the worst case. Here a conventional worst-case margin is reserved against variation at the nominal voltage. Under these margined conditions, the baseline frequency at 0.9V is measured to be 425MHz.


As shown in Fig. 10, with the TD-MACs truncated to 44 instead of 64 when encountering timing violations, the delay of the TD-MAC chain drops by 31.25% and the power is reduced by 13.2%. These two effects lead to an energy efficiency improvement of 67.6%, with a 90% accuracy on MNIST. So, when higher network accuracy is expected, a longer TD-MAC chain should be used; conversely, the TD-MAC chain can be truncated aggressively to obtain higher energy efficiency. Considering the inherent fault tolerance of neural networks, the truncation method provides a good solution for energy-efficient NN hardware engine design.
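The 67.6% figure follows directly from the two measured reductions: energy per operation is proportional to power times compute time, so a 13.2% power reduction compounds with a 31.25% delay reduction as checked below.

```python
power_scale = 1 - 0.132          # power reduced by 13.2%
delay_scale = 1 - 0.3125         # chain delay reduced by 31.25%
energy_scale = power_scale * delay_scale   # energy per operation ~ P * t
efficiency_gain = 1 / energy_scale - 1     # improvement in ops per joule
assert abs(efficiency_gain - 0.676) < 0.005  # ~67.6%, matching the text
```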

Fig. 9. Power savings with adaptive voltage scaling [9].

Fig. 10. Neural network accuracy changes with voltage, under different TD-MAC chain lengths.

V. CONCLUSION

In summary, a time-domain binary neural network engine named TD-BNN in 28nm CMOS is designed in this brief, which drastically reduces the computing and memory resources of the neural network. A TD-MAC chain comprised of analog delay lines is proposed to achieve low power. Through architectural parallelism and data reuse, 100% TD-MAC utilization is realized for BNN calculation. Considering PVT variations in mass production, an error-resilient AVS system with TD-MAC chain truncation is introduced into the TD-BNN engine, which further reduces power consumption according to actual PVT conditions. With AVS enabled, the whole chip works from near-threshold to normal voltage, resulting in 0.28/48.6mW and 51.5/6.17 TOPS/W at 0.42/0.9V, respectively. With the TD-MAC chain truncated when encountering timing errors, our TD-BNN chip further reduces power by 13.2%. The techniques proposed in this brief effectively improve the energy efficiency of the BNN engine.
Fig. 8 shows the measurement results without activating the error-detection-based resilience for the fabricated BNN processor. The engine can operate at a 0.42-to-0.9V supply voltage with a maximum 425MHz clock frequency at 0.9V and 25MHz at 0.42V. The power consumption at 0.9V and 0.42V is 48.6mW and 0.28mW, respectively. When the voltage is scaled down to 0.42V, the energy efficiency reaches 51.5TOPS/W. For these operations, the detection accuracy on the MNIST dataset is up to 99.6%. Table I shows the comparisons with state-of-the-art work. Our TD-BNN engine achieves ∼4.3× energy efficiency as compared to another time-domain accelerator in [4].

Fig. 9 summarizes the power savings offered by the adaptive voltage frequency scaling technique in a wide voltage range across 0.42V∼0.9V. Three VDD operating points are given, starting with the baseline point, then showing voltage scaling to PoFF and the final supply voltage. With error-detection-enabled voltage tuning, the design provides 15.8%, 22.1% and 41.5% power savings at 0.9V/425MHz, 0.7V/150MHz and 0.56V/25MHz when encountering the first timing violation, respectively. Power consumption is further reduced due to error tolerance. For example, at the baseline of 0.56V and 25MHz, AVS gradually eliminates timing margins by lowering the supply to 0.42V with 99.86% classification accuracy, which provides TD-BNN with a 46.2% power saving. A 20mV voltage reduction is obtained as compared to the PoFF (0.44V), with a 4.7% further power improvement.

Furthermore, both the truncation of TD-MAC chains and encountered timing violations cause a decrease of BNN accuracy. Fig. 10 shows how the accuracy of the neural network changes with voltage, under different TD-MAC chain lengths.

REFERENCES

[1] Y.-H. Chen, T. Krishna, J. S. Emer, and V. Sze, "Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks," IEEE J. Solid-State Circuits, vol. 52, no. 1, pp. 127–138, Jan. 2017.
[2] F. Li, B. Zhang, and B. Liu, "Ternary weight networks," in Proc. NIPS, 2016, p. 5.
[3] K. Ando et al., "BRein memory: A 13-layer 4.2 K neuron/0.8 M synapse binary/ternary reconfigurable in-memory deep neural network accelerator in 65 nm CMOS," in Proc. IEEE Symp. VLSI Circuits Dig. Tech. Papers, Jun. 2017, pp. C24–C25.
[4] A. Sayal, S. Fathima, S. S. T. Nibhanupudi, and J. P. Kulkarni, "All-digital time-domain CNN engine using bidirectional memory delay lines for energy-efficient edge computing," in Proc. IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2019, pp. 228–230.
[5] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, "Binarized neural networks," in Proc. Adv. Neural Inf. Process. Syst., vol. 29, 2016, pp. 4107–4115.
[6] W. Shan et al., "A 510nW wake-up keyword-spotting chip using serial-FFT-based MFCC and binarized depthwise separable CNN in 28nm CMOS," IEEE J. Solid-State Circuits, vol. 56, no. 1, pp. 151–164, Jan. 2021.
[7] P. N. Whatmough, S. K. Lee, D. Brooks, and G.-Y. Wei, "DNN Engine: A 28-nm timing-error tolerant sparse deep neural network processor for IoT applications," IEEE J. Solid-State Circuits, vol. 53, no. 9, pp. 2722–2731, Sep. 2018.
[8] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. NIPS, 2012, pp. 1097–1105.
[9] Y. Du, X. Shang, and W. Shan, "An energy-efficient time-domain binary neural network accelerator with error-detection in 28nm CMOS," in Proc. IEEE Asia Pac. Conf. Circuits Syst., 2020, pp. 70–73.
[10] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," in Proc. Int. Conf. Mach. Learn., 2015, pp. 448–456.

