Activation Functions and Their Characteristics in Deep Neural Networks
Bin Ding, Huimin Qian, Jun Zhou
College of Energy and Electrical Engineering, Hohai University, Nanjing, 211100
E-mail: [email protected], [email protected], [email protected]
Abstract: Deep neural networks have achieved remarkable results in many research areas, especially in computer vision and natural language processing. The great success of deep neural networks depends on several factors, among which the development of activation functions is one of the most important. Aware of this, a number of studies have concentrated on the performance improvements obtained by revising a certain activation function in some specific neural networks, but few papers review the activation functions employed by neural networks thoroughly. Therefore, considering their impact on the performance of neural networks with deep architectures, the status and development of commonly used activation functions are investigated in this paper. More specifically, the definitions, the impacts on the neural networks, and the advantages and disadvantages of quite a few activation functions are discussed. Furthermore, experimental results on the MNIST dataset are used to compare the performance of different activation functions.
Key Words: neural network, deep architecture, activation function
=aw’ +b output level of a neural network owing to its value distribu-
tion. While sigmoid function is rarely adopted in the deep
The function g(-), which is named as activation function, neural networks except the output level since its saturation
is used to simulate the response state of biological neuron Exactly, it is soft saturate since it only achieves zero gradi-
and obtain the output y, that is ent in the limit [13]
y=9(2) A) g =0
Caglar et. al. gave the definition of an activation function and
in [13], which is “ An activation function is a function g :
R — R that is differentiable almost everywhere.” mo)
g =0
Nonlinear activation functions have brought neural net- as seen in Fig. 2. The soft saturation results in the difficul-
works their nonlinear capabilities, which means neural net- ties of training a deep neural network. More specific, in the
works can differentiate those data that can not be classified process of optimizing loss function, the derivatives of sig-
linearly in the current data space. moid function, which is contributed to update the weights
The definition given by Caglar [13] shows that the contin- and the bias, will reduce to zero when it comes to the satu-
uous differentiable function can be used as an activation ration area, which brings about the less contributions of the
function. Whereas, it is not always the truth to the deep first several layers in the knowledge learning from training
neural networks due to the difficulties in training. Next, samples. In fact, the situation is called vanishing gradient.
the commonly used activation functions will be analyzed Generally, the vanishing gradient will emerge in a neural
thoroughly. network with more than five layers [6].
2.1 Sigmoid and Its Improvements According to the limitation of Sigmoid function, several
211 Sigmoid improvements have been proposed. Huang et al. [14] intro-
duced double-parameters to sigmoid function. One of the
Sigmoid function is one of the most common forms of ac- parameter is used to generate an appropriate input = which
tivation functions. The definition of sigmoid function is as can not lead the input to the saturation area; the other is to
follows, control the decay of residual error. Caglar et al. [13] pro-
1 posed to inject the noise to the activation functions which
9(x)
“Tter (1)
makes the loss function is easier to optimize. Another de-
in which @ € (—00, +00), g € (0, 1), as seen in Fig. 2. It velopment is hyperbolic tangent function, which will be il-
lustrated next.
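To make the soft saturation concrete, the short Python sketch below (our own illustration, not code from the original experiments; the function names are ours) evaluates the sigmoid of equation (1) and its derivative $g'(x) = g(x)(1 - g(x))$ at a few inputs and shows how the gradient collapses toward zero once $|x|$ becomes large.

```python
import numpy as np

def sigmoid(x):
    # g(x) = 1 / (1 + exp(-x)), equation (1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # g'(x) = g(x) * (1 - g(x)); it tends to 0 as |x| grows (soft saturation)
    s = sigmoid(x)
    return s * (1.0 - s)

for x in (-10.0, -5.0, -1.0, 0.0, 1.0, 5.0, 10.0):
    print(f"x = {x:6.1f}   g(x) = {sigmoid(x):.5f}   g'(x) = {sigmoid_grad(x):.6f}")
# The derivative peaks at 0.25 for x = 0 and is already about 4.5e-5 at |x| = 10,
# which is the soft saturation discussed above.
```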
2.1.2 Hyperbolic Tangent

The hyperbolic tangent function is defined as

$$g(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}, \qquad (2)$$

whose outputs lie in $(-1, 1)$, as seen in Fig. 3.

Figure 3: The graphic depiction of the hyperbolic tangent function and its derivatives.

Its computation is more complicated than that of the sigmoid function, and it exhibits the same soft saturation as the sigmoid function, so it also suffers from the vanishing gradient problem.
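As a rough, hypothetical illustration of why soft-saturating activations hamper deep architectures (this sketch is ours and ignores the weight factors that backpropagation would also multiply in), one can multiply one activation-derivative factor per layer and watch the product shrink with depth:

```python
import numpy as np

def sigmoid_grad(x):
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2

# Assume the pre-activation at every layer happens to sit at x = 2.5,
# a mildly saturated value; the backpropagated gradient then picks up
# one derivative factor per layer and decays geometrically with depth.
x = 2.5
for depth in (1, 3, 5, 8, 10):
    print(f"depth {depth:2d}:  sigmoid factor {sigmoid_grad(x) ** depth:.2e}"
          f"   tanh factor {tanh_grad(x) ** depth:.2e}")
# With the sigmoid, five layers already shrink the factor to roughly 1e-6,
# consistent with vanishing gradients appearing beyond about five layers [6].
```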
2.2 ReLU and Its Improvements

Neuroscience research has found that cortical neurons are rarely in their maximum saturation regime, and it suggests that their activation function can be approximated by a rectifier [29]. That is to say, the operating mode of the neurons has the characteristic of sparsity. More specifically, only one to four percent of the neurons in the brain are activated simultaneously. However, in neural networks with sigmoid or hyperbolic tangent activation functions, almost one half of the neuron units are activated at the same time, which is inconsistent with the findings in neuroscience. Furthermore, activating more neuron units brings about more difficulties in the training of a deep neural network. Rectified linear units (ReLU), first introduced by Nair and Hinton for restricted Boltzmann machines [7], help the hidden layers of a neural network obtain a sparse output matrix, which improves efficiency. ReLU and its improvements are currently among the most popular activation functions used in deep neural networks [6, 16, 24, 26]. It is said that although the main difficulty of training deep networks was resolved by the idea of initializing each layer with unsupervised learning, the employment of ReLU activation functions can also be seen as a breakthrough in directly supervised training of deep networks.

2.2.1 ReLU

The definition of ReLU is as follows:

$$g(x) = \max(0, x) = \begin{cases} x & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases} \qquad (3)$$

The graph is depicted in Fig. 4.

Figure 4: The graphic depiction of the ReLU function and its derivatives.

The ReLU activation function does not saturate for positive inputs, and its derivative function is

$$g'(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x < 0 \end{cases}$$

which is constant when the input $x > 0$. Thus, the vanishing gradient problem is relieved. Specifically, the ReLU function has the following advantages [6, 16]:

- Computations of neural networks with ReLU functions are cheaper than with sigmoid and hyperbolic tangent activation functions, because there is no need to compute exponential functions in the activations.
- Neural networks with ReLU activation functions converge much faster than those with saturating activation functions in terms of training time with gradient descent.
- The ReLU function allows a network to easily obtain sparse representations. More specifically, the output is 0 when the input $x < 0$, which provides sparsity in the activation of the neuron units and improves the efficiency of data learning. When the input $x > 0$, the features of the data are largely retained.
- The derivative of the ReLU function keeps the constant value 1 for positive inputs, which helps avoid getting trapped in local optima and resolves the vanishing gradient effect that occurs with sigmoid and hyperbolic tangent activation functions.
- Deep neural networks with ReLU activation functions can reach their best performance on purely supervised tasks with large labeled datasets, without requiring any unsupervised pre-training.

However, the derivative $g'(x) = 0$ when $x < 0$, so the ReLU function is left-hard-saturating. The corresponding weights might not be updated any more, which leads to the death of some neuron units, meaning that these units will never be activated again. Another disadvantage of ReLU is that the average of the units' outputs is always positive, which causes a bias shift for the units in the next layer. Both attributes have a negative impact on the convergence of the corresponding deep neural networks [17].
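The sketch below (our own illustration, not code from the paper) implements equation (3) and its derivative and makes two of the points above concrete: the output is sparse because roughly half of zero-mean inputs are mapped to exactly zero, and the mean output is positive, which is the bias shift just mentioned; a negative input also receives a zero gradient, the situation that can leave a unit dead.

```python
import numpy as np

def relu(x):
    # g(x) = max(0, x), equation (3)
    return np.maximum(0.0, x)

def relu_grad(x):
    # g'(x) = 1 for x > 0 and 0 for x < 0 (left hard saturation)
    return (np.asarray(x) > 0).astype(float)

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)                   # zero-mean inputs
y = relu(x)

print("fraction of exactly-zero outputs:", np.mean(y == 0.0))  # about 0.5 -> sparsity
print("mean of the outputs:            ", y.mean())            # positive -> bias shift
print("gradient at a negative input:   ", relu_grad(-3.0))     # 0.0 -> possibly 'dead' unit
```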
2.2.2 LReLU, PReLU, and RReLU

The possible death of some neuron units in neural networks with ReLU functions results from the compulsive operation of setting $g(x) = 0$ when $x < 0$. In order to alleviate this potential problem, Maas et al. [17] proposed the leaky rectified linear unit (LReLU), defined in equation (4):
$$g(x) = \begin{cases} x & \text{if } x \ge 0 \\ 0.01x & \text{if } x < 0 \end{cases} \qquad (4)$$

Its derivative function is

$$g'(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0.01 & \text{otherwise} \end{cases}$$

so the LReLU allows a small, non-zero gradient when the unit is saturated and not active. Therefore, there are no zero gradients, and no neuron unit stays "off" permanently. It should be acknowledged that sparsity is lost when ReLU is replaced with LReLU. Fortunately, the experimental results in [17] illustrate that the impact of this modification on classification accuracy is hardly noticeable, while the learning of the neural networks becomes more robust.
Furthermore, He et al. [12] presented the parametric rectified linear unit (PReLU) by replacing the constant 0.01 of equation (4) with a learnable parameter. The definition of PReLU is

$$g(x) = \begin{cases} x & \text{if } x > 0 \\ ax & \text{if } x \le 0 \end{cases} \qquad (5)$$

where $a$ is a learnable parameter. The experiments in [12] show that LReLU can lead to better results than ReLU when the value of $a$ is chosen very carefully, but this requires tedious, repeated training. In contrast, PReLU learns the parameter from the data. It is verified that PReLU converges faster and has a lower training error than ReLU. In addition, it is reported that introducing the parameter $a$ into the activation function does not bring about over-fitting [12]. Wei et al. [18] applied a deep convolutional neural network combining L1 regularization and the PReLU activation function to image retrieval, and their experiments demonstrate that the over-fitting problem is indeed resolved and the efficiency of image retrieval is improved by adopting the PReLU activation function.

Another improvement of ReLU, namely the randomized rectified linear unit (RReLU), should also be discussed. As noted above, the slope of the negative part is set to a constant in the LReLU activation function and is a learnable parameter in the PReLU activation function. In RReLU [10], the slope is instead randomized within a given range during training and then fixed during testing, that is,

$$g(x) = \begin{cases} x & \text{if } x \ge 0 \\ ax & \text{if } x < 0 \end{cases} \qquad (6)$$

where the slope $a$ is drawn from a uniform distribution $U(l, u)$ during training and fixed to its average $(l + u)/2$ during testing.
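The following Python sketch (ours; the PReLU slope value and the RReLU sampling range are illustrative choices, not the exact settings of [12] or [10]) summarizes how the negative-part slope is fixed in LReLU, supplied as a learned parameter in PReLU, and randomized during training in RReLU.

```python
import numpy as np

def lrelu(x, slope=0.01):
    # LReLU, equation (4): a small fixed slope on the negative part.
    return np.where(x > 0, x, slope * x)

def prelu(x, a):
    # PReLU, equation (5): the slope a is a parameter learned together
    # with the network weights by back-propagation.
    return np.where(x > 0, x, a * x)

def rrelu(x, low=0.125, high=0.333, training=True, rng=None):
    # RReLU, equation (6): during training the slope is drawn uniformly
    # from [low, high]; at test time it is fixed to the average slope.
    if training:
        rng = rng if rng is not None else np.random.default_rng()
        a = rng.uniform(low, high, size=np.shape(x))
    else:
        a = (low + high) / 2.0
    return np.where(x > 0, x, a * x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(lrelu(x))
print(prelu(x, a=0.25))
print(rrelu(x, training=False))
```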
2.2.3 ELU and MPELU

The exponential linear unit (ELU) [11] keeps the identity for positive inputs and saturates exponentially for negative inputs:

$$g(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha(e^{x} - 1) & \text{if } x \le 0 \end{cases} \qquad (7)$$

Its derivative function is

$$g'(x) = \begin{cases} 1 & \text{if } x > 0 \\ \alpha e^{x} & \text{if } x \le 0 \end{cases}$$

The graphic depiction is shown in Fig. 6.

Figure 6: The graphic depiction of the ELU and MPELU functions.

The parameter $\alpha$ manages the value to which an ELU saturates for negative network inputs. The vanishing gradient problem is alleviated because the positive part of ELU is the identity; more specifically, the derivative is one when $x > 0$. The left saturation makes deep neural networks with ELU activation functions more robust to input perturbations and noise.
The output average of an ELU is close to zero, which contributes to faster convergence. Experimental results in [11] have shown that ELU enables faster learning, and that its generalization performance is better than that of the ReLU and LReLU activation functions when the deep neural network has more than five layers.

However, ELU shares a drawback with LReLU: searching for a reasonable $\alpha$ is important but time-consuming. Accordingly, Li et al. [8] proposed the multiple parametric exponential linear unit (MPELU):

$$g(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha(e^{\beta x} - 1) & \text{if } x \le 0 \end{cases} \qquad (8)$$

Here $\alpha$ and $\beta$ are learnable parameters that control, respectively, the value to which and the rate at which MPELU saturates. It has been reported that MPELU can become ELU, ReLU, LReLU, or PReLU by adjusting the two parameters $\alpha$ and $\beta$. Therefore, the advantages of MPELU include: 1) the convergence property of ELU is also possessed by MPELU; 2) the generalization capability of MPELU is better than that of ReLU and PReLU, based on experiments on the ImageNet database.
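A minimal Python sketch of ELU (equation (7)) and MPELU (equation (8)), written by us for illustration; it also checks numerically how particular settings of the two parameters recover earlier units, in line with the behaviour reported for MPELU [8].

```python
import numpy as np

def elu(x, alpha=1.0):
    # ELU, equation (7): identity for x > 0, alpha * (exp(x) - 1) otherwise.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def mpelu(x, alpha=1.0, beta=1.0):
    # MPELU, equation (8): alpha sets the saturation value, beta the rate
    # at which the negative part saturates; both are learnable in [8].
    return np.where(x > 0, x, alpha * (np.exp(beta * x) - 1.0))

x = np.linspace(-3.0, 3.0, 7)
print(np.allclose(mpelu(x, alpha=1.0, beta=1.0), elu(x)))            # beta = 1  -> ELU
print(np.allclose(mpelu(x, alpha=0.0, beta=1.0), np.maximum(0, x)))  # alpha = 0 -> ReLU
# For a small beta, alpha * (exp(beta * x) - 1) is approximately alpha * beta * x,
# so the negative part becomes a linear slope as in LReLU/PReLU.
```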
3 EXPERIMENTS

In this paper, we have conducted experiments with a deep convolutional neural network (DCNN), whose structure and parameters can be seen in Fig. 7. The DCNN contains two 5×5 convolutional layers, whose stride is fixed to 1 pixel, and two 2×2 max-pooling layers, followed by a fully-connected (FC) layer. During training, the input to the DCNN is a fixed-size 28×28 gray image. The MNIST dataset is split into a training set with 60,000 samples and a test set with 10,000 samples. The cross-entropy loss function is used, the learning rate is chosen as $10^{-4}$, and the number of iterations is 20,000.

The activation functions sigmoid, hyperbolic tangent, ReLU, RReLU, and ELU are applied to the DCNN in Fig. 7, respectively. For RReLU and ELU, we conducted different experiments by adjusting the values of the parameter $a$ for RReLU and $\alpha$ for ELU to choose their best performance. The classification errors with the different activation functions are listed in Tab. 1.

Table 1: Classification error comparisons of DCNNs with different activation functions on MNIST

Activation function   Parameter   Error (%)
Sigmoid               -           1.15
Tanh                  -           1.12
ReLU                  -           0.8
RReLU                 a           0.99
ELU                   α           1.1

It can be seen that the network with the ReLU activation function achieves the best classification performance among them.
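For reference, the following PyTorch sketch is our reconstruction of the described DCNN under stated assumptions: the channel counts (20 and 50), the SGD optimizer, and the reading of the learning rate as 1e-4 are our guesses; only the two 5×5 convolutions with stride 1, the two 2×2 max-pooling layers, the fully-connected layer, the 28×28 grayscale input, the cross-entropy loss, and the interchangeable activation come from the paper.

```python
import torch
import torch.nn as nn

class DCNN(nn.Module):
    """Rough reconstruction of the experimental DCNN; layer widths are assumptions."""

    def __init__(self, activation: nn.Module):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 20, kernel_size=5, stride=1),   # 28x28 -> 24x24, 20 channels assumed
            activation,
            nn.MaxPool2d(kernel_size=2, stride=2),       # 24x24 -> 12x12
            nn.Conv2d(20, 50, kernel_size=5, stride=1),  # 12x12 -> 8x8, 50 channels assumed
            activation,
            nn.MaxPool2d(kernel_size=2, stride=2),       # 8x8 -> 4x4
        )
        self.classifier = nn.Linear(50 * 4 * 4, 10)      # fully-connected layer, 10 classes

    def forward(self, x):
        return self.classifier(torch.flatten(self.features(x), 1))

# Swap in the activation under test: nn.Sigmoid(), nn.Tanh(), nn.ReLU(),
# nn.RReLU(), or nn.ELU().
model = DCNN(nn.ReLU())
criterion = nn.CrossEntropyLoss()                          # cross-entropy loss, as in the paper
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)   # optimizer is assumed; lr read as 1e-4

images = torch.randn(8, 1, 28, 28)                         # a dummy batch of 28x28 gray images
labels = torch.randint(0, 10, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```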
4 CONCLUSION

In the last decades, deep neural networks have improved rapidly, especially in computer vision and natural language processing. As networks get deeper, training efficiency and accuracy have received a great deal of attention, which stimulates the development of activation functions. Saturating activation functions, such as the sigmoid and the hyperbolic tangent, are being replaced by non-saturating counterparts, such as ReLU and ELU. In this paper, the definitions, advantages, and disadvantages of several popular activation functions have been reviewed. It should be acknowledged that some effective activation functions, such as maxout [27] and softplus [28], have not been investigated in this paper. The aim of this paper is to contribute to the understanding of the development progress, the attributes, and the appropriate choice of activation functions. Further investigation and analysis are required to improve the views in this paper.

REFERENCES

[1] Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, vol. 521, no. 7553, 436-444, 2015.

[2] J. Schmidhuber, Deep learning in neural networks: an overview, Neural Networks, vol. 61, 85-117, 2015.

[3] Y. Guo, Y. Liu, A. Oerlemans, S. Lao, et al., Deep learning for visual understanding: a review, Neurocomputing, vol. 187, 27-48, 2016.

[4] C. Szegedy, W. Liu, Y. Jia, et al., Going deeper with convolutions, In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1-9, 2015.

[5] K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770-778, 2016.

[6] X. Glorot and Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, Journal of Machine Learning Research, vol. 9, 249-256, 2010.

[7] V. Nair and G. E. Hinton, Rectified linear units improve restricted Boltzmann machines, In Proceedings of the International Conference on Machine Learning, 807-814, 2010.

[8] Y. Li, C. Fan, Y. Li, and Q. Wu, Improving deep neural network with multiple parametric exponential linear units, arXiv:1606.00305, 2016.

[9] A. L. Maas, A. Y. Hannun, and A. Y. Ng, Rectifier nonlinearities improve neural network acoustic models, In Proceedings of the International Conference on Machine Learning, vol. 30, no. 1, 1-6, 2013.

[10] B. Xu, N. Wang, T. Chen, and M. Li, Empirical evaluation of rectified activations in convolutional network, arXiv:1505.00853, 2015.

[11] D. Clevert, T. Unterthiner, and S. Hochreiter, Fast and accurate deep network learning by exponential linear units (ELUs), arXiv:1511.07289, 2015.
[12] K. He, X. Zhang, S. Ren, and J. Sun, Delving deep into rectifiers: surpassing human-level performance on ImageNet classification, In Proceedings of the IEEE International Conference on Computer Vision, 1026-1034, 2015.

[13] C. Gulcehre, M. Moczulski, M. Denil, and Y. Bengio, Noisy activation functions, In Proceedings of the International Conference on Machine Learning, 3059-3068, 2016.

[14] Y. Huang, X. Duan, S. Sun, et al., A study of training algorithm in deep neural networks based on Sigmoid activation function, Computer Measurement and Control, vol. 25, no. 2, 126-129, 2017.

[15] Y. LeCun, L. Bottou, G. B. Orr, and K. Müller, Efficient BackProp, Neural Networks: Tricks of the Trade, Springer Berlin Heidelberg, 9-50, 1998.

[16] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, In Proceedings of Advances in Neural Information Processing Systems, 1097-1105, 2012.

[17] A. L. Maas, A. Y. Hannun, and A. Y. Ng, Rectifier nonlinearities improve neural network acoustic models, In Proceedings of the International Conference on Machine Learning, vol. 30, no. 1, 2013.

[18] Q. Wei and W. Wang, Research on image retrieval using deep convolutional neural network combining L1 regularization and PReLU activation function, In IOP Conference Series: Earth and Environmental Science, vol. 69, no. 1, 012156, 2017.

[19] K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv:1409.1556, 2014.

[20] M. D. Zeiler and R. Fergus, Visualizing and understanding convolutional networks, In Proceedings of the European Conference on Computer Vision, 818-833, 2014.

[21] P. Sermanet, D. Eigen, X. Zhang, et al., OverFeat: integrated recognition, localization and detection using convolutional networks, arXiv:1312.6229, 2013.

[22] K. He, X. Zhang, S. Ren, and J. Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, 1904-1916, 2015.

[23] K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman, Return of the devil in the details: delving deep into convolutional nets, arXiv:1405.3531, 2014.

[24] M. D. Zeiler, M. Ranzato, R. Monga, et al., On rectified linear units for speech processing, In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 3517-3521, 2013.

[25] M. Lin, Q. Chen, and S. Yan, Network in network, arXiv:1312.4400, 2013.

[26] R. K. Srivastava, J. Masci, S. Kazerounian, F. Gomez, and J. Schmidhuber, Compete to compute, In Proceedings of Advances in Neural Information Processing Systems, 2310-2318, 2013.

[27] I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio, Maxout networks, arXiv:1302.4389, 2013.

[28] C. Dugas, Y. Bengio, F. Bélisle, C. Nadeau, and R. Garcia, Incorporating second-order functional knowledge for better option pricing, In Proceedings of Advances in Neural Information Processing Systems, 472-478, 2001.

[29] P. Lennie, The cost of cortical computation, Current Biology, vol. 13, no. 6, 493-497, 2003.