

VLSI implementation of transcendental function hyperbolic tangent for deep neural network accelerators

Gunjan Rajput a, Gopal Raut a, Mahesh Chandra b, Santosh Kumar Vishvakarma a,∗

a Department of Electrical Engineering, Indian Institute of Technology Indore, India
b NXP Semiconductors, India

ARTICLE INFO

Keywords: Activation function; Artificial neural network; Hyperbolic tangent (tanh); Digital implementation; Combinational logic

ABSTRACT

Extensive use of neural network applications has prompted researchers to customize designs that speed up their computation through ASIC implementation. The choice of activation function (AF) in a neural network is an essential requirement. Designing an accurate AF architecture in a digital network faces several challenges, since these AFs demand more hardware resources because of their non-linear nature. This paper proposes an efficient approximation scheme for the hyperbolic tangent (tanh) function that is based purely on a combinational design architecture. The approximation is derived from a mathematical analysis that considers the maximum allowable error in a neural network. The results show that the proposed combinational design of the AF is efficient in terms of area, power, and delay, with negligible accuracy loss on the MNIST and CIFAR-10 benchmark datasets. Post-synthesis results show that the proposed design reduces area by 66% and delay by nearly 16% compared to the state-of-the-art.

1. Introduction

Artificial Neural Networks (ANNs) are pertinent in different applications such as image processing, speech recognition, and language processing [1]. ANNs have predominantly been implemented in software. The software approach has the advantage that designers do not need to know the inner model of the ANN elements and can concentrate on the application itself. At present, most ANN applications run only on CPUs and GPUs, which are ill-suited for applications where low power and optimum latency are obligatory. Accelerating neural network applications while reducing their power consumption and latency is therefore a primary requirement.

The main building blocks required for designing a neural network are the adder, the multiplier, and the activation function (AF). Implementing the MAC unit is straightforward, as it requires only a multiplier and an adder tree, whereas implementing the AF is complicated due to its non-linear features. Moreover, the implementation of the tanh function needs both positive and negative exponential functions [2]. Activation functions are non-linear, such as sigmoid, Elliot, tanh, ReLU, soft-max, and many more [3,4], and they involve division and positive/negative exponential calculations [5].

The tanh and sigmoid activation functions are more efficient for training due to their non-linear behavior compared to earlier AFs such as the step and linear functions. Moreover, various other AFs have been proposed, such as ReLU, ELU, SWISH, and Parametric ReLU. Here, we have focused on the implementation of the tanh function; nevertheless, the method can be elaborated for all activation functions. The sigmoid output ranges from 0 to 1 and the hyperbolic tangent from −1 to 1, but both form an s-shaped curve, as given in [6]. Tanh is much better for learning than the sigmoid function [7]. The tanh and sigmoid functions include exponential and division terms, which are very challenging to realize in a digital design architecture. An approximation method is generally taken into consideration to eradicate these problems.

Various approximation techniques have been used for the implementation of the activation function, based on lookup tables (LUTs), a function's series expansion, the COordinate Rotation DIgital Computer (CORDIC) algorithm, stochastic computing, and piece-wise linear (PWL) functions [8–11]. However, directly storing the values of a non-linear AF in LUTs is costly, since it requires more parameters. Most accelerators are not implemented through an instruction set architecture; instead, modules are created separately, which prevents designers from reducing hardware costs. Thus, beyond the multiplier and adder blocks, special attention should be given to other components such as the AF block. A neural network has hidden layers of neurons, and each neuron has its own AF. Therefore, a highly efficient AF in terms of power, area, and delay with adequate accuracy is required.

∗ Corresponding author.
E-mail address: [email protected] (S.K. Vishvakarma).

https://doi.org/10.1016/j.micpro.2021.104270
Received 10 April 2020; Received in revised form 12 February 2021; Accepted 16 April 2021
Available online 11 May 2021
0141-9331/© 2021 Elsevier B.V. All rights reserved.

Table 1
Computational equations for configurable AF design exploration.

Preliminary work   Sigmoid                  Tanh
[15,16]            1/(1 + e^{-x})           1 − 2·Sigmoid(−2x)
[17]               1/(1 + e^{-x})           (e^{2x} − 1)/(e^{2x} + 1)
[16]               [1 + tanh(x/2)]/2        (e^{x} − e^{-x})/(e^{x} + e^{-x})

1.1. Motivation

Non-linear AFs such as sigmoid and tanh provide a smooth transition between excitation and inhibition, which leads to a better neuron response [12]. The mathematical equations used for implementation in the state-of-the-art are summarized in Table 1. However, these works employ an approach to computing the AF that leads to low throughput: the hardware implementation requires both the positive and the negative exponential function to produce the final AF output, and all of those design techniques are costly in hardware. The non-linear tanh transformation requires both positive and negative exponential functions, which is expensive for the LUT-based approach. Moreover, those functions require a multiplier for computing e^{2x} and a divider for the further evaluation of the equations, which increases the area overhead. Further, those approaches are not feasible for higher-precision implementation [13,14]. To overcome these limitations, we have designed a tanh function using combinational logic with the help of OR and AND planes. Combinational logic can provide low latency with a small area.

1.2. Contribution

This article explores the design-space trade-offs of neural networks with a digital design of the AF implementation. The work focuses on the tanh AF at the 180 nm technology node. The key contributions are:

• We implement the tanh non-linear transformation function with the help of a truncated Taylor series and combinational logic circuits.
• Performance and inference accuracy validation are done using the benchmark LeNet deep network with the MNIST and CIFAR-10 datasets.
• We analyze and discuss the circuit's physical parameters such as area, power, and throughput, evaluate them at the 180 nm technology node, and compare the design with the state-of-the-art.

1.3. Organization

The rest of the paper is organized as follows: background and related work are discussed in Section 2. Section 3 explains the digital design and mathematical analysis of the proposed AF method. Section 4 presents results and discussion with an experimental analysis of the proposed function. Section 5 gives the experimental validation of the design, followed by the conclusion in Section 6.

2. Background and related work

The main challenges of DNNs are resource utilization and power consumption on resource-limited devices, whereas the TOT-Net architecture has achieved a higher level of accuracy with less computational load [18]; in that work, TOT-Net reduced the cost of the multiply operators. Different types of activation functions, their learning performance, and optimization approaches have been investigated in [19]. Various techniques have been used for the implementation of an activation function. They mainly fall into two categories: LUT-based approaches and piece-wise approximation methods [20,21]. The error contribution of the activation function at different precisions is expressed in [4]. Moreover, a non-linear activation function such as sigmoid cannot be approximated efficiently using only combinational logic; however, using purely combinational logic has the benefit of providing low latency with a small area overhead compared to conventional ROM-based approaches.

This paper concisely explains the most commonly used activation function (tanh) together with the LUT-based approximation approaches. Precisely, implementing an activation function is a bottleneck between area, power, and accuracy; a minimal approximation in the design architecture requires fewer hardware resources and has lower latency. In Fig. 1 we have plotted the exponential curve (e^x) obtained using various methods, which are explained in the following subsections.

2.1. Storing the activation function value in a LUT [20]

In this method, the function is divided into several ranges, each range is approximated, and the functional value is stored directly in the LUT. This method can be convenient with a highly precise approximate function, but as the precision increases, the hardware requirement and design complexity also increase. Hence, there is a barrier between high precision and area, power, delay, etc. This is a LUT-based approach in which the value is stored directly in the LUT.

2.2. Storing parameter values of the activation function in a LUT [22,23]

In this method, the parameters of the function are stored in the LUT. For example, a and b are the intercepts on the x- and y-axes of a straight line; since these parameters are constant, by varying the coordinates (x and y) in Eq. (1) one can easily obtain the line.

x/a + y/b = 1    (1)

We can save the parameters of the function in the LUT; here, a and b would be stored in the LUT to implement an AF. This method is more convenient than the previous one, but its drawback is that adder and multiplier units are needed to evaluate the function from the stored constants. If the function is complex, more parameters have to be stored in the LUT, especially for higher-precision representations.

2.3. Series expansion [24]

Series expansions such as the Taylor series, the Maclaurin series, Bernstein polynomials, and many more are used to implement non-linear AFs. The most popular, the Taylor series expansion of f(α), is shown in Eq. (2):

f(0) + (f'(0)/1!)·α + (f''(0)/2!)·α² + (f'''(0)/3!)·α³ + ⋯ = Σ_{k=0}^{∞} (f^{(k)}(0)/k!)·α^k    (2)

The mathematical modeling of this equation requires multipliers and adders, and multipliers are power-hungry blocks. However, if higher precision is not the primary requirement, the higher-order terms can be truncated for the implementation of the non-linear transformation function. This approach falls under the category of series expansion methods.

2.4. Piece-wise linear and non-linear function transformation [25–28]

In the piece-wise linear method, the non-linear input range is divided into regions, and the respective values are stored in the LUT. In the case of the sigmoid and tanh functions, there is also a linear region whose values can be stored directly in the LUT; the remaining non-linear part can be approximated and its values stored in the LUT. This method is quite efficient but has lower precision, and it also requires more area and latency. In this method, the pre-calculated ROM values are stored in the LUT.
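To make the piece-wise linear idea above concrete, the following is a minimal Python sketch (not taken from the paper) of a tanh approximation that stores pre-computed segment values and interpolates between them; the 0.5-wide segments and the 3.5 saturation point are illustrative assumptions.

```python
import math

# Hypothetical segment boundaries for the positive half of tanh; the saturated
# region above the last breakpoint is clamped to 1.
BREAKPOINTS = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
LUT = [math.tanh(b) for b in BREAKPOINTS]            # pre-computed ROM contents

def tanh_pwl(x):
    sign = -1.0 if x < 0 else 1.0                    # odd symmetry: store positive half only
    x = abs(x)
    if x >= BREAKPOINTS[-1]:
        return sign * 1.0                            # saturation region
    i = int(x / 0.5)                                 # locate the uniform 0.5-wide segment
    x0, x1 = BREAKPOINTS[i], BREAKPOINTS[i + 1]
    y0, y1 = LUT[i], LUT[i + 1]
    return sign * (y0 + (y1 - y0) * (x - x0) / (x1 - x0))   # linear interpolation

print(tanh_pwl(0.75), math.tanh(0.75))
```

Finer segments reduce the interpolation error but enlarge the ROM, which is exactly the precision/area trade-off discussed above.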


Fig. 1. Performance and accuracy comparison of exponential function calculation using different logic design approaches.

Fig. 2. Hyperbolic tangent transformation curve and the different regions of the curve based on its slope.

Fig. 3. Tanh and sigmoid curves with their derivatives.

2.5. Coordinate rotational digital computer [10,13,16]

The COordinate Rotation DIgital Computer (CORDIC) is a simple and highly effective method in terms of power and resource utilization. The method is based on elementary operations over trigonometric equations: it uses shift, addition, and subtraction operations for the computation of the non-linear AF. Although the CORDIC algorithm is area-efficient and highly accurate, its iterative nature requires more clock cycles and therefore increases latency. The general equations used for the realization of the AF are shown below:

α_{i+1} = α_i + m·ψ_i·β_i·χ^{S_{m,i}}    (3a)
β_{i+1} = β_i + ψ_i·α_i·χ^{S_{m,i}}    (3b)
γ_{i+1} = γ_i + ψ_i·δ_{m,i}    (3c)

where ψ gives the direction of a micro-rotation, which can be clockwise or anticlockwise. m represents the type of coordinate system: if m = 1 the system is circular, if m = 0 the system is linear, and for m = −1 the system is hyperbolic. S_{m,i} is a non-decreasing integer sequence, and δ_{m,i} is the rotation angle. In this method, while performing operations such as addition and shift, the output values are stored in a LUT.

2.6. Approximating the activation function [29,30]

To approximate the AF, the mathematical function itself is approximated, such as

exp(x) ≈ E_x(x) = 2^{1.44x}    (4a)
sigmoid(x) ≈ 1/(1 + 2^{−1.5x})    (4b)

Similarly, the tanh function can be implemented in the same manner as the sigmoid. This method requires fewer cycles than CORDIC, but the latency is still high; it requires four cycles to implement the sigmoid function. There are other methods as well, such as the range-addressable lookup table (RALUT) method and hybrid methods consisting of a LUT and a series. This paper implements an AF based on a combinational circuit obtained from a truncated series expansion.

2.7. Vanishing gradient problem

Since the sigmoid and tanh functions have non-linear behavior, their digital implementation is complicated at all exploration points. The problem has a significant impact on the weight updates. The weight update rule is given in the equation below:

weight(L) = weight(L) − lr × δ(C)/δweight(L)    (5)

The derivative term in the weight update equation is responsible for the vanishing gradient problem. In Eq. (5), lr is the learning rate and C is the cost. In the vanishing gradient problem, the derivative term shows that the weights and bias values are updated by only a very small amount. In back-propagation, at every step only small weight gradients are applied. The sigmoid and its derivative are represented as:

σ(x) = 1/(1 + e^{−x})   and   σ'(x) = e^{−x}/(1 + e^{−x})²    (6)

Here, a large x value at the input of the sigmoid function results in an output close to 0; that is, when the input value is w×a + b the output is almost 0 and the weights are no longer updated. For the sigmoid, the maximum value of the derivative reaches only 0.25, as shown in Fig. 3. Therefore, in every layer the gradients become vanishingly small, and after this there is no update of the weights, which leaves the network very far from the optimal value. On the other hand, the derivative of the tanh function ranges up to 1, as shown in Fig. 3. With tanh, while learning, w and b are larger than with the sigmoid, where w is the weight and b is the bias value. The mathematical argument is shown in Eq. (7):

max σ'(x) < max Tanh'(x)    (7a)
σ'(x) = σ(x)(1 − σ(x)) ≤ 0.25    (7b)
0 < σ(x) < 1 (maximizing the concave quadratic)    (7c)
Tanh'(x) = sech²(x) = (2/(e^{x} + e^{−x}))²    (7d)

The above equations show that the probability of a vanishing gradient is higher for the sigmoid than for the tanh.

Fig. 4. Tanh function analysis and truncation error for different orders of the series expansion.

3. Mathematical analysis and proposed digital design approach for the activation function

The design of neural network accelerators requires trigonometric function calculations such as tanh and sigmoid. The proposed digital design architecture enables such computation using minimum resource utilization with maximum accuracy. The tanh function is divided into three regions: the values of regions I and III can be stored directly in the lookup table, and the values of the remaining region II can be approximated according to the precision required.

3.1. Mathematical modeling and analysis for the activation function

The trigonometric hyperbolic functions are shown in Eq. (8); the tanh expansion in terms of exponential functions is shown in Eq. (8d), and the sigmoid function representation is shown in Eq. (8c).

sinh(x) = (e^{x} − e^{−x})/2    (8a)
cosh(x) = (e^{x} + e^{−x})/2    (8b)
f1(x) = Sigmoid(x) = 1/(1 + e^{−x})    (8c)
f2(x) = tanh(x) = (e^{x} − e^{−x})/(e^{x} + e^{−x}) = 2·f1(2x) − 1    (8d)

We compared the performance of the sigmoid and tanh functions; both show comparable characteristics. Both resemble y = x near the origin, but tanh converges faster than the sigmoid function for the same quantized values. The implementation of a sigmoid function in neural networks requires a bias value, which can affect the optimization. Tanh is used when the value of the neuron is restricted to [−1, 1]; the AF output is then more likely to lie in [−1, 1], as shown in Fig. 3, whereas this is different for the sigmoid function. As shown clearly in the paper by LeCun et al., a strong gradient is required and one should avoid bias in the gradients [31]. Hence, tanh has benefits over the sigmoid function.

We therefore explore the design technique for tanh and implement it using combinational logic. We use a series expansion for the implementation of the AF and truncate the series at the 11th order to obtain optimum accuracy; it is observed that the accuracy is higher at the 11th order than at the 8th, as shown in Fig. 4(a) and Fig. 4(b). The tanh function is a transformation of an exponential function. The Taylor series expansion of exp(x) = p(x) is shown in Eq. (9a), whereas the expansion of the negative exponential exp(−x) = q(x) is shown in Eq. (9b).

p(x) = Σ_{k=0}^{∞} (p^{(k)}(0)/k!)·x^k = Σ_{k=0}^{∞} x^k/k! = 1 + x + x²/2! + x³/3! + ⋯    (9a)
q(x) = Σ_{k=0}^{∞} (q^{(k)}(0)/k!)·x^k = Σ_{k=0}^{∞} (−x)^k/k! = 1 − x + x²/2! − x³/3! + ⋯    (9b)

The behavior of this AF has three basic properties:

lim_{y→∞} tanh(y) = 1    (10a)
lim_{y→0} tanh(y) = y    (10b)
lim_{y→−∞} tanh(y) = −1    (10c)

After substituting the values and making approximations in Eq. (9) up to the 11th order, we choose the points where the function value changes. The tanh function is an odd function that is symmetric about 0; for implementing this non-linear AF, we select the positive half of the function. Using the three fundamental properties of the tanh function, we can optimize our tanh equation, and we then perform a quantization process. The tanh curve is divided into three segments.
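As an illustration of the truncation described above, the following minimal Python sketch evaluates tanh through the truncated expansions p(x) and q(x) of Eqs. (9a) and (9b), clamps the curve to ±1 outside |x| ≤ 3.5 (the three-segment split just described), and compares the 8th- and 11th-order truncation errors; the uniform evaluation grid is our own assumption.

```python
import math

def tanh_series(x, order):
    # tanh formed from the truncated expansions of Eqs. (9a) and (9b)
    p = sum(x**k / math.factorial(k) for k in range(order + 1))
    q = sum((-x)**k / math.factorial(k) for k in range(order + 1))
    return (p - q) / (p + q)

def tanh_approx(x, order=11):
    # Three-segment behaviour: saturate outside the approximated region II
    if x >= 3.5:
        return 1.0
    if x <= -3.5:
        return -1.0
    return tanh_series(x, order)

# Compare the truncation error of the 8th- and 11th-order expansions on the positive half
grid = [i * 3.5 / 1000 for i in range(1001)]
for order in (8, 11):
    errs = [abs(math.tanh(x) - tanh_approx(x, order)) for x in grid]
    print(f"order {order:2d}: max error = {max(errs):.4f}, average error = {sum(errs)/len(errs):.5f}")
```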


Moreover, with more segments we obtain more precision, so there is a trade-off between efficiency and hardware requirements. Suppose there are 2^N quantization levels for N input bits, i.e., the Number of Fragments (NoF); N depends upon the precision required. The NoF and the size of one frame, Q, are given in Eq. (11):

NoF = 2^N    (11a)
Q = Fullscale/2^N    (11b)

Table 2
Tanh implementation: AND-plane representation over the inputs x4 x3 x2 x1 x0 (a primed literal denotes a complemented input).

p0:  x4                        p21: x0', x2', x3', x4'
p1:  x0, x1, x4'               p22: x0', x1', x2', x4'
p2:  x0, x2, x3, x4'           p23: x0, x1, x3, x4
p3:  x0, x4                    p24: x0', x1, x3', x4'
p4:  x0', x2, x4               p25: x1', x2', x3', x4'
p5:  x0, x1, x2, x4'           p26: x0', x1', x2', x3', x4'
p6:  x3', x2, x2', x4'         p27: x2, x3, x4'
p7:  x1', x0, x4', x2'         p28: x1, x3, x4'
p8:  x0', x1, x2', x4'         p29: x0', x1', x3'
p9:  x0, x1, x4                p30: x1, x3, x4
p10: x1', x2, x3', x4'         p31: x0, x1, x3', x4
p11: x1, x2, x3, x4'           p32: x0, x1', x2', x4
p12: x0, x1, x2', x4'          p33: x0, x1', x2', x3', x4'
p13: x0, x2', x3, x4'          p34: x0', x1, x2', x3', x4'
p14: x0, x3'                   p35: x0, x1', x2, x4'
p15: x1, x2', x3', x4'         p36: x0', x1', x2, x3', x4'
p16: x0, x1', x3', x4'         p37: x0, x1', x3
p17: x0', x1', x2', x3'        p38: x0', x2', x3, x4'
p18: x0', x1', x2', x4'        p39: x0', x1', x2, x4'
p19: x0', x1', x3', x4'        p40: x0', x2', x3', x4'
p20: x1', x2', x3', x4'        p41: x2', x3', x4'

Table 3
Tanh implementation: OR-plane representation.

d0: p0
d1: p1, p2, p3, p4
d2: p5, p41, p7, p8, p9, p10
d3: p5, p11, p12, p13, p14, p9, p15, p16, p17
d4: p18, p19, p20, p21, p22, p23, p24, p25, p26
d5: p27, p28, p29, p30, p31, p32
d6: p33, p34, p35, p36, p37, p38, p39, p40

Fig. 5. K-map realization of the proposed tanh function with a 5-bit input and 7-bit output; the d3 output is shown.

3.2. Implementation of the tanh AF

The tanh AF implementation techniques proposed in the state-of-the-art are listed in Table 4. By selecting an appropriate sampling rate, we find the corresponding value of each sample using Eqs. (9a) and (9b) in the tanh function up to the 11th order, and then convert the decimal sample values into binary values with a fixed bit-width. In this implementation, we choose a 7-bit encoding of the AF with 5 input bits. We then generate the K-map of the truncated tanh equation, as shown in Fig. 5. By analyzing the K-map, one can easily obtain the min-terms or max-terms for the implementation of the function. For example, the output d3 is shown in Eq. (12d); the remaining outputs are written similarly from Tables 2 and 3.

The detailed implementation of the tanh function is given in Algorithm 1. After obtaining all of these outputs, we verify the RTL code of the approximated tanh function without considering the pair condition; otherwise, additional shared variables may be accessed simultaneously, which causes race conditions and is responsible for creating hazards. The implementation is done at 180 nm, and the results are compared with the exact tanh function and the LUT-based approximation in terms of area, power, delay, and accuracy. The reference algorithms are likewise implemented at our technology node to verify the proposed AF.

Algorithm 1: Implementation of the tanh activation function
1: With the help of the series expansion, expand tanh.
2: Truncate the series depending on the required precision; in this paper we implement it up to the 11th order.
3: Decide the sampling rate, with each sample of size 1/2^P, for P ∈ N.
4: Select P according to the required accuracy.
5: Find the corresponding function value of each sample.
6: Encode the decimal values into bits, selecting an appropriate bit-width for the implementation.
7: Express the resulting Boolean functions in canonical form.
8: Realize them in SOP or POS form and design the circuit with the help of the AND and OR planes.

d0 = p0    (12a)
d1 = p1 + p2 + p3 + p4    (12b)
d2 = p5 + p41 + p7 + p8 + p9 + p10    (12c)
d3 = p5 + p9 + p11 + p12 + p13 + p14 + p15 + p16 + p17    (12d)
d4 = p18 + p19 + p20 + p21 + p22 + p23 + p24 + p25 + p26    (12e)
d5 = p27 + p28 + p29 + p30 + p31 + p32    (12f)
d6 = p33 + p34 + p35 + p36 + p37 + p38 + p39 + p40    (12g)

The range of the combinational logic of the above approximated function is given in Eq. (13):

f(x) = tanh(x) ≈ { −1 for x ≤ −3.5;  f(x) for −3.5 ≤ x ≤ 3.5;  1 for x ≥ 3.5 }    (13)
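Algorithm 1 can be prototyped in software before committing to RTL. The following is a minimal Python sketch of that flow under our own assumptions (uniform sampling of region II over [0, 3.5], LSB-first bit ordering, simple rounding, and the 5_7 quantization): it derives one sum-of-products expression per output bit with SymPy, mirroring the AND/OR-plane decomposition of Tables 2 and 3. It is an illustration, not the authors' exact encoding.

```python
import math
from sympy import symbols
from sympy.logic import SOPform

IN_BITS, OUT_BITS = 5, 7          # 5-bit input sample index, 7-bit output code (the 5_7 scheme)
FULL_SCALE = 3.5                  # region II spans 0 .. 3.5; outside it the output saturates

def tanh_series(x, order=11):
    # Truncated expansions of Eqs. (9a)-(9b) combined into tanh
    p = sum(x**k / math.factorial(k) for k in range(order + 1))
    q = sum((-x)**k / math.factorial(k) for k in range(order + 1))
    return (p - q) / (p + q)

# Steps 1-6 of Algorithm 1: sample, evaluate the truncated series, encode to 7 bits
codes = [round(tanh_series(i * FULL_SCALE / 2**IN_BITS) * (2**OUT_BITS - 1))
         for i in range(2**IN_BITS)]

# Steps 7-8: one SOP expression (AND-plane terms OR-ed together) per output bit
x = symbols(f"x0:{IN_BITS}")
for bit in range(OUT_BITS):
    minterms = [[(i >> b) & 1 for b in range(IN_BITS)]
                for i, c in enumerate(codes) if (c >> bit) & 1]
    print(f"d{bit} =", SOPform(x, minterms))
```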


4. Results and discussion

The validation of the proposed method is carried out using the LeNet neural architecture model with the Keras library, and the performance parameters are extracted for experimental analysis at the 180 nm technology node.

Table 4
Reference activation function design techniques, features, and acronyms.

1  Storing the activation function value directly in a LUT- or ROM-based implementation [20]            LUT
2  Storing the parameter values used in the model of the activation function [23]                        LUT-P
3  Series expansion method based on a LUT approach [24]                                                  LUT-S
4  Piece-wise linear function and its implementation [32]                                                PWL
5  CORDIC-based configurable activation function [13,33,34]                                              CORDIC
6  Implementation using stochastic computing [35]                                                        SC
7  Proposed method: truncate the series and implement it with a combinational circuit                    Proposed

Table 5
Implementation of the AF using various algorithms at the 180 nm technology node at 0.1 GHz.

AF model     Area (μm²)  Leakage power (nW)  Delay (ns)  Energy (nJ)  EDP (ns·nJ)  ADP (ns·μm²)
LUT [20]     3014.85     30.95               2.42        74.89        181.23       7295.93
LUT-P [23]   2789.12     26.82               2.65        71.07        188.23       7391.17
LUT-S [24]   2353.21     18.92               2.92        55.24        161.30       6871.37
PWL [32]     2052.31     19.27               3.01        58.00        174.58       6177.45
CORDIC [33]  2825        119.40              8.87        1050.72      9319.88      25057.75
SC [35]      4150.81     132.42              9.82        1300.36      12769.57     40760.95
Proposed     1024.04     10.51               2.86        30.06        85.97        2928.75

Fig. 6. Performance analysis and comparison: (a) comparison of the various design implementations in terms of energy, EDP, and ADP (ratios computed by taking the smallest value as 1); (b) figure of merit of the different algorithms for the implementation of the AF.

4.1. Resource utilization

The experimental evaluation is carried out on the RTL of the proposed design, and the state-of-the-art architectures are synthesized using Design Vision (Synopsys). Results are produced by the Synopsys Design Compiler at the 180 nm technology node. The LUT technique is based on a piece-wise linear implementation, whereas the stochastic computing and CORDIC-based methods are approximate with respect to iterative computation [33]. The approximate techniques have some degree of error; with increasing bit-precision and more iterations of computation, the degree of error decreases. Hence, the physical parameters are given at 180 nm for an 8-bit precision memory-element (LUT) based architecture. The stochastic computing-based architecture has a minor degree of error at 8-bit and higher precision, and in the CORDIC architecture the computational accuracy increases with higher precision (16-bit and above). Hence, we selected 8-bit and 16-bit precision for the stochastic computing and CORDIC-based architectures, respectively. The extracted parameters are compared in Table 5. Moreover, based on those bit precisions, we have calculated the test accuracy and baseline error for the CIFAR-10 dataset, as shown in Table 11. The proposed method occupies less area than the other reference methods, as shown in Table 5: the area of the proposed design is reduced by 66.03% compared with the LUT-based method, and by 63.28%, 56.48%, and 50.01% compared with the LUT-P, LUT-S, and PWL methods, respectively. Moreover, among all methods, SC and CORDIC have lower accuracy and larger baseline error than the other techniques.

4.2. Power and delay analysis

In the proposed method, power is reduced by approximately 3× compared with the conventional LUT-based approach: power is reduced by 66.04%, while the delay is slightly increased, by 18.18%, compared with the LUT-based approach. If we consider the piece-wise linear (PWL) approach, power and delay are reduced by 45.46% and 4.98%, respectively. The area-delay product (ADP) and the energy-delay product (EDP) are shown in Fig. 6(a); the proposed method has lower energy, EDP, and ADP than the other methods listed in Table 5.
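The derived metrics and the percentages quoted above can be reproduced directly from the area, power, delay, and energy columns of Table 5; the short Python sketch below does this for the LUT, PWL, and proposed rows (values copied from Table 5).

```python
designs = {  # area (um^2), leakage power (nW), delay (ns), energy (nJ), from Table 5
    "LUT":      dict(area=3014.85, power=30.95, delay=2.42, energy=74.89),
    "PWL":      dict(area=2052.31, power=19.27, delay=3.01, energy=58.00),
    "Proposed": dict(area=1024.04, power=10.51, delay=2.86, energy=30.06),
}

for name, d in designs.items():
    edp = d["delay"] * d["energy"]          # energy-delay product (ns x nJ)
    adp = d["delay"] * d["area"]            # area-delay product (ns x um^2)
    print(f"{name:9s} EDP = {edp:8.2f}  ADP = {adp:9.2f}")

# Relative savings of the proposed design against the LUT reference
ref, new = designs["LUT"], designs["Proposed"]
print(f"area  reduction vs LUT: {100 * (1 - new['area'] / ref['area']):.2f} %")
print(f"power reduction vs LUT: {100 * (1 - new['power'] / ref['power']):.2f} %")
print(f"delay increase  vs LUT: {100 * (new['delay'] / ref['delay'] - 1):.2f} %")
```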


Table 6
Effect of different quantization levels on the tanh AF implementation at the 180 nm technology node at 0.1 GHz.

Quantization level  Area (μm²)  Leakage power (nW)  Delay (ns)  Energy (nJ)  EDP (ns·nJ)  ADP (ns·μm²)
5_7                 1024.04     10.51               2.86        30.06        85.97        2928.75
5_8                 1289.12     16.82               2.95        49.62        141.91       3802.90
5_6                 1001.21     11.92               2.42        28.85        69.82        2422.93
4_8                 995.18      9.11                2.21        20.13        44.49        2109.78
4_7                 982.12      8.99                2.12        19.06        40.41        2082.09

Table 7
Implementation of various AFs at the 180 nm technology node at 0.1 GHz with quantization level 5_7.

AF model  Area (μm²)  Leakage power (nW)  Delay (ns)
ReLU      895.21      9.12                1.82
SWISH     1031.21     11.21               3.12
ELISH     1028.12     10.98               2.99
ELU       995.29      10.55               2.88
SELU      994.21      10.41               2.78
Tanh      1024.04     10.51               2.86

Table 8
Summary of the datasets (MNIST and CIFAR-10).

Dataset   Training set  Test set  Output classes  Image pixels
MNIST     60K           10K       10              28 × 28
CIFAR-10  60K           10K       10              32 × 32

Fig. 7. Absolute relative error of the proposed AF with respect to the exact tanh function.

Fig. 8. Comparison between different quantization levels.

4.3. Figure of Merit (FOM)

Here the proposed figure of merit is expressed as Eq. (14), where A_norm is the normalized area, P_norm the normalized power, and D_norm the normalized delay.

Figure of Merit (FOM) = 1/(A_norm × P_norm × D_norm)    (14)

The FOM of the AF for the various methods, including the proposed technique, is shown in Fig. 6(b). From the results it is observed that the proposed design has lower energy, EDP, and ADP and a higher FOM than the other implementations; the proposed circuit consumes less power and delivers high performance with a lower area overhead. The proposed circuit is therefore well suited for deep neural network applications.

4.4. Error analysis

This section analyzes the maximum, average, and relative errors of the proposed AF algorithm. Since tanh has the symmetry property, these errors are shown for the positive half of the curve. As shown in Fig. 4(b), the absolute error is larger at the 8th order than at the 11th order. The absolute error of the 11th order is calculated using Eq. (15a), where x_a is the true value and x_b is the approximate value; this is why the 11th-order tanh function is chosen as the AF.

Absolute Error = |x_a − x_b|    (15a)
Average Absolute Error = Avg |x_a − x_b|    (15b)

The average error of the 11th order is calculated using Eq. (15b); the average value comes out to be 0.048%, which is very small compared with the 8th-order value of 0.38. The relative error is calculated using Eq. (16a); it is observed that the relative error is larger for the lower order than for the higher order, as shown in Fig. 7. The average relative error, estimated using Eq. (16b), is 0.52% for the 11th order and 57.28% for the 8th order. Therefore, in view of these errors, we truncate the series at the 11th order.

Relative Error = |x_a − x_b| / |x_a|    (16a)
Average Relative Error = Avg(|x_a − x_b| / |x_a|)    (16b)

4.5. Different quantization level effects on tanh

The tanh is implemented using various quantization levels with the help of the series expansion and then converted into the combinational design. For the comparison in this paper we have chosen a 5-bit input with 7-bit output (5_7), 5-bit input with 8-bit output (5_8), 5-bit input with 6-bit output (5_6), 4-bit input with 8-bit output (4_8), and 4-bit input with 7-bit output (4_7). Table 6 compares the different quantization levels; it shows that 5_7 has more area but better performance parameters than the lower quantization levels. The accuracy and the other physical performance parameters are shown in Fig. 8.
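A minimal sketch, under our own sampling and rounding assumptions, of how the kind of error figures discussed in Sections 4.4 and 4.5 can be estimated in software: it evaluates the truncated-series tanh at several input/output bit-widths and reports the average absolute error against the exact function.

```python
import math

def tanh_series(x, order=11):
    # tanh built from the truncated expansions p(x) and q(x) of Eqs. (9a)-(9b)
    p = sum(x**k / math.factorial(k) for k in range(order + 1))
    q = sum((-x)**k / math.factorial(k) for k in range(order + 1))
    return (p - q) / (p + q)

def quantized_tanh(x, in_bits, out_bits, full_scale=3.5, order=11):
    # Sample the input on a 2**in_bits grid and round the output to 2**out_bits levels
    x = min(x, full_scale)
    x = round(x / full_scale * (2**in_bits - 1)) / (2**in_bits - 1) * full_scale
    y = tanh_series(x, order)
    return round(y * (2**out_bits - 1)) / (2**out_bits - 1)

grid = [i * 3.5 / 1000 for i in range(1001)]          # positive half of the curve
for scheme in [(5, 7), (5, 8), (5, 6), (4, 8), (4, 7)]:
    err = sum(abs(math.tanh(x) - quantized_tanh(x, *scheme)) for x in grid) / len(grid)
    print(f"{scheme[0]}_{scheme[1]}: average absolute error = {err:.4f}")
```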


4.6. Implementations of other non-linear activation functions

In this section, other well-known AFs are implemented using the above-described approach based on combinational logic and series expansion. Table 7 shows the implementation results of various AFs such as ReLU, SWISH, ELISH, ELU, and SELU. For the implementation of these AFs we have taken 5_7 as the quantization level. Any non-linear function can be implemented by the method described in Algorithm 1 by selecting the appropriate quantization level; the method applies to all the AFs.

SWISH = x/(1 + e^{−x})    (17a)
ReLU = { 0 for x < 0;  x for x ≥ 0 }    (17b)
ELU = { α(e^{x} − 1) for x ≤ 0;  x for x > 0 }    (17c)
SELU = λ·{ α(e^{x} − 1) for x ≤ 0;  x for x > 0 }    (17d)
ELISH = (e^{x} − 1)/(1 + e^{−x})  or  x/(1 + e^{−x})    (17e)

5. Experimental analysis and validation

We have performed experiments based on the MNIST and CIFAR-10 datasets on the LeNet architecture model for benchmark analysis. A summary of the datasets is given in Table 8.

5.1. Experiment 1: MNIST

The MNIST dataset is a benchmark dataset for image classification [36,37]. It consists of a training set of 60 thousand images and a test set of 10 thousand images in 28 × 28 grayscale, with labels in the range 0 to 9. The MNIST data are trained with the LeNet architecture model using the Keras module. Training over epochs using the proposed Tanh AF architecture for the MNIST dataset on the LeNet architecture is shown in Fig. 9. For the 11th order, the baseline error is 3.48%, whereas for the exact tanh AF the baseline error is 1.98%, which is lower than for the proposed design. However, this can be traded off against the overall performance in terms of hardware implementation, i.e., area, power, and delay.

The experimental analysis of the customized AF in the LeNet model is shown in Table 9. We have chosen the same batch size and number of epochs for all the AF designs. Although the baseline error of the proposed design is higher than for the LUT and LUT-P methods, and the test loss is also higher, the proposed method reaches optimum accuracy with a lower computation time (in this paper the inference time is taken as the computation time). Regarding the hardware designs and their implementation, the proposed algorithm is better than the other compared designs in terms of the energy and accuracy trade-off. The experimental analysis of the various quantization levels on the LeNet architecture using the MNIST dataset is shown in Table 10.

Table 9
Experimental results using the LeNet architecture after customizing the AF for the MNIST dataset.

AF design  Baseline error (%)  Test accuracy (%)  Compute time (s)
LUT        1.98                98.62              348
LUT-P      2.02                98.94              352
LUT-S      3.92                98.82              421
PWL        4.25                98.71              361
CORDIC     5.15                97.98              429
SC         6.12                97.81              435
Proposed   3.48                98.92              249

Table 10
Experimental results at different quantization levels using the LeNet architecture after customizing the AF for the MNIST dataset.

Quantization level  Baseline error (%)  Test accuracy (%)
5_7                 3.48                98.92
5_8                 2.01                98.99
5_6                 3.52                98.81
4_8                 5.82                98.12
4_7                 4.62                97.98

5.2. Experiment 2: CIFAR-10

The CIFAR-10 dataset is used as a benchmark for image classification. It consists of a training set of 60 thousand and a test set of 10 thousand instances; each instance is a 32 × 32 colored image of birds, automobiles, dogs, frogs, etc. For the validation of our customized AF we chose the CIFAR-10 dataset on the LeNet model, as shown in Table 11. We fixed the batch size and the number of epochs at 25 and 100, respectively, for all the methods. Table 11 shows that the baseline error of the proposed design is slightly higher, but it has a good computation time and accuracy. From these results, the proposed AF implementation performs well across all the performance parameters. The experimental analysis of the various quantization levels on the LeNet architecture using the CIFAR-10 dataset is shown in Table 12.

Table 11
Experimental results using the LeNet architecture after customizing the AF for the CIFAR-10 dataset.

AF design  Baseline error (%)  Test accuracy (%)  Compute time (s)
LUT        6.02                69.02              352
LUT-P      6.03                67.99              361
LUT-S      7.42                68.72              412
PWL        6.12                69.63              392
CORDIC     7.25                66.12              418
SC         7.92                66.02              421
Proposed   6.92                68.99              273
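The LeNet validation described above was done in Keras. As an illustration only, the sketch below shows one hypothetical way to emulate a quantized tanh as a custom activation inside a Keras LeNet-style model; the clipping/rounding model, bit-widths, and layer sizes are our assumptions, not the authors' exact setup. Because the rounding is not differentiable, such an activation is best suited to inference-time accuracy checks on an already-trained network.

```python
import tensorflow as tf

def quantized_tanh(x, in_bits=5, out_bits=7, full_scale=3.5):
    # Hypothetical software model of the 5_7 combinational tanh: the input is
    # clipped to +/-full_scale, snapped to the input sampling grid, and the
    # output is rounded to out_bits levels.
    x = tf.clip_by_value(x, -full_scale, full_scale)
    step_in = full_scale / 2.0**in_bits
    x = tf.round(x / step_in) * step_in
    y = tf.tanh(x)
    step_out = 1.0 / (2.0**out_bits - 1)
    return tf.round(y / step_out) * step_out

# Drop-in use inside a LeNet-style Keras model:
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(6, 5, activation=quantized_tanh, input_shape=(28, 28, 1)),
    tf.keras.layers.AveragePooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
```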


Fig. 9. MNIST dataset on the LeNet architecture using the customized AF.

Table 12
Experimental results at different quantization levels using the LeNet architecture after customizing the AF for the CIFAR-10 dataset.

Quantization level  Baseline error (%)  Test accuracy (%)
5_7                 6.92                68.99
5_8                 7.12                70.85
5_6                 7.59                67.92
4_8                 9.82                65.24
4_7                 9.12                66.23

6. Conclusion

A new approximation method for the implementation of tanh using a purely combinational logic design was proposed in this paper. We showed the implementation and its comparative study against various other approximation techniques. Depending on the quantization level, the proposed model has little effect on accuracy. The hardware implementation of the proposed AF is realized at 180 nm for further evaluation in terms of area, power, and delay. The FOM of the proposed design is 3.5× that of the prior arts. Experimental results on the LeNet model for the MNIST and CIFAR-10 datasets also show that the proposed design has optimum accuracy.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors would like to thank the Council of Scientific and Industrial Research (CSIR), New Delhi, Government of India, for providing financial support under the SRF scheme, and the Special Manpower Development Program Chip to System Design, Department of Electronics and Information Technology (DeitY) under the Ministry of Communication and Information Technology, Government of India, for providing the necessary research facilities.

References

[1] Kenji Suzuki (Ed.), Artificial Neural Networks: Architectures and Applications, BoD–Books on Demand, 2013.
[2] Z. Hajduk, Hardware implementation of hyperbolic tangent and sigmoid activation functions, Bull. Pol. Acad. Sci. Tech. Sci. 66 (5) (2018).
[3] Nazeih M. Botros, M. Abdul-Aziz, Hardware implementation of an artificial neural network using field programmable gate arrays (FPGA's), IEEE Trans. Ind. Electron. 41 (6) (1994) 665–667.
[4] Chigozie Nwankpa, et al., Activation functions: Comparison of trends in practice and research for deep learning, 2018, arXiv preprint arXiv:1811.03378.
[5] Stamatis Vassiliadis, Ming Zhang, José G. Delgado-Frias, Elementary function generators for neural-network emulators, IEEE Trans. Neural Netw. 11 (6) (2000) 1438–1449.
[6] Pramod Kumar Meher, An optimized lookup-table for the evaluation of sigmoid function for artificial neural networks, in: 2010 18th IEEE/IFIP International Conference on VLSI and System-on-Chip, IEEE, 2010.
[7] Barry L. Kalman, Stan C. Kwasny, Why tanh: choosing a sigmoidal function, in: [Proceedings 1992] IJCNN International Joint Conference on Neural Networks, Vol. 4, IEEE, 1992.
[8] K. Basterretxea, Jose Manuel Tarela, I. Del Campo, Approximation of sigmoid function and the derivative for hardware implementation of artificial neurons, IEE Proc.-Circuits Dev. Syst. 151 (1) (2004) 18–24.
[9] Ashkan Hosseinzadeh Namin, et al., Efficient hardware implementation of the hyperbolic tangent sigmoid function, in: 2009 IEEE International Symposium on Circuits and Systems, IEEE, 2009.
[10] Vipin Tiwari, Nilay Khare, Hardware implementation of neural network with sigmoidal activation functions using CORDIC, Microprocess. Microsyst. 39 (6) (2015) 373–381.
[11] Ji Li, et al., Hardware-driven nonlinear activation for stochastic computing based deep convolutional neural networks, in: 2017 International Joint Conference on Neural Networks, IJCNN, IEEE, 2017.
[12] J. Kadmon, H. Sompolinsky, Transition to chaos in random neuronal networks, Phys. Rev. X 5 (4) (2015) 041030.
[13] Gopal Raut, et al., A CORDIC based configurable activation function for ANN applications, in: 2020 IEEE Computer Society Annual Symposium on VLSI, ISVLSI, IEEE, 2020.
[14] Djork-Arné Clevert, Thomas Unterthiner, Sepp Hochreiter, Fast and accurate deep network learning by exponential linear units (ELUs), 2015, arXiv preprint arXiv:1511.07289.
[15] Guido Baccelli, et al., NACU: a non-linear arithmetic unit for neural networks, in: 2020 57th ACM/IEEE Design Automation Conference, DAC, IEEE, 2020.
[16] Gopal Raut, et al., Efficient low-precision CORDIC algorithm for hardware implementation of artificial neural network, in: International Symposium on VLSI Design and Test, Springer, Singapore, 2019.
[17] Vipin Tiwari, Ashish Mishra, Neural network-based hardware classifier using CORDIC algorithm, Modern Phys. Lett. B 34 (15) (2020) 2050161.
[18] Najmeh Nazari, et al., TOT-Net: An endeavor toward optimizing ternary neural networks, in: 2019 22nd Euromicro Conference on Digital System Design, DSD, IEEE, 2019.
[19] Mohammad Loni, et al., DeepMaker: A multi-objective optimization framework for deep neural networks in embedded systems, Microprocess. Microsyst. 73 (2020) 102989.
[20] Karl Leboeuf, et al., High speed VLSI implementation of the hyperbolic tangent sigmoid function, in: 2008 Third International Conference on Convergence and Hybrid Information Technology, Vol. 1, IEEE, 2008.
[21] Mustafa Gülsu, Mehmet Sezer, A Taylor polynomial approach for solving differential-difference equations, J. Comput. Appl. Math. 186 (2) (2006) 349–364.
[22] Tao Yang, et al., Design space exploration of neural network activation function circuits, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 38 (10) (2018) 1974–1978.
[23] Joshua Yung Lih Low, Ching Chuen Jong, A memory-efficient tables-and-additions method for accurate computation of elementary functions, IEEE Trans. Comput. 62 (5) (2012) 858–872.
[24] Barry Lee, Neil Burgess, Some results on Taylor-series function approximation on FPGA, in: The Thirty-Seventh Asilomar Conference on Signals, Systems and Computers, Vol. 2, IEEE, 2003.


[25] Babak Zamanlooy, Mitra Mirhassani, Efficient VLSI implementation of neural networks with hyperbolic tangent activation function, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 22 (1) (2013) 39–48.
[26] D.J. Myers, R.A. Hutchinson, Efficient implementation of piece-wise linear activation function for digital VLSI neural networks, Electron. Lett. 25 (1989) 1662.
[27] Ehsan Rasekh, Iman Rasekh, Mohammad Eshghi, PWL approximation of hyperbolic tangent and the first derivative for VLSI implementation, CCECE 2010, IEEE, 2010.
[28] Hussein M.H. Al-Rikabi, et al., Generic model implementation of deep neural network activation functions using GWO-optimized SCPWL model on FPGA, Microprocess. Microsyst. (2020) 103141.
[29] Shaghayegh Gomar, Mitra Mirhassani, Majid Ahmadi, Precise digital implementations of hyperbolic tanh and sigmoid function, in: 2016 50th Asilomar Conference on Signals, Systems and Computers, IEEE, 2016.
[30] Karl Leboeuf, et al., High speed VLSI implementation of the hyperbolic tangent sigmoid function, in: 2008 Third International Conference on Convergence and Hybrid Information Technology, Vol. 1, IEEE, 2008.
[31] Yann A. LeCun, et al., Efficient backprop, in: Neural Networks: Tricks of the Trade, Springer, Berlin, Heidelberg, 2012, pp. 9–48.
[32] Che-Wei Lin, Jeen-Shing Wang, A digital circuit design of hyperbolic tangent sigmoid function for neural networks, in: 2008 IEEE International Symposium on Circuits and Systems, IEEE, 2008.
[33] G. Raut, S. Rai, S.K. Vishvakarma, A. Kumar, RECON: Resource-efficient CORDIC-based neuron architecture, IEEE Open J. Circuits Syst. 2 (2021) 170–181, http://dx.doi.org/10.1109/OJCAS.2020.3042743.
[34] Martine Wedlake, Harry L. Kwok, A CORDIC implementation of a digital artificial neuron, in: 1997 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, PACRIM.
[35] Van-Tinh Nguyen, et al., An efficient hardware implementation of activation functions using stochastic computing for deep neural networks, in: 2018 IEEE 12th International Symposium on Embedded Multicore/Many-Core Systems-on-Chip, MCSoC, IEEE, 2018.
[36] Han Xiao, Kashif Rasul, Roland Vollgraf, Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms, 2017, arXiv preprint arXiv:1708.07747.
[37] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proc. IEEE 86 (11) (1998) 2278–2324.

Gunjan Rajput received the B.Tech. degree in Electronics & Telecommunication Engineering and the M.Tech. degree in VLSI and Embedded Engineering from Delhi Technological University (DCE), Delhi, India, in 2012 and 2015, respectively. She is currently with the Nanoscale Devices, VLSI Circuit & System Lab and pursuing the Ph.D. degree in the Electrical Engineering Department, Indian Institute of Technology Indore, India. Her current research interests include machine learning, edge computing, hardware accelerator design, and reliability.

Gopal Raut received the B.Eng. in electronic engineering and the M.Tech. in VLSI Design from G H Raisoni College of Engineering Nagpur, India, in 2015. He is currently pursuing the Ph.D. degree with the Electrical Engineering Department, Indian Institute of Technology Indore, India. His research focus is compute-efficient and configurable VLSI circuit design for low-power IoT and edge AI applications.

Santosh Kumar Vishvakarma is currently an Associate Professor with the Department of Electrical Engineering, Indian Institute of Technology Indore, India, where he leads the Nanoscale Devices and VLSI Circuit and System Design Lab. He received his Ph.D. degree from the Indian Institute of Technology Roorkee, India, in 2010. From 2009 to 2010, he was with the University Graduate Center, Kjeller, Norway, as a Post-Doctoral Fellow under the European Union project "COMON". His current research interests include nanoscale devices, reliable SRAM memory designs, and configurable circuit design for IoT applications.
