Activation Function
Abstract—Artificial Neural Networks are inspired by the human brain and the network of neurons present in the brain. Information is processed and passed on from one neuron to another through neuro-synaptic junctions. Similarly, in artificial neural networks different layers of cells are arranged and connected to each other. The output/information from the inner layers of the neural network is passed on to the next layers and finally to the outermost layer, which gives the output. Non-linearity is applied to the output of the inner layers so that it can be processed further. In an Artificial Neural Network, activation functions are very important, as they help in learning and making sense of the non-linear and complicated mappings between the inputs and the corresponding outputs.
I. INTRODUCTION

Activation functions are specially used in artificial neural networks to transform an input signal into an output signal, which in turn is fed as input to the next layer in the stack. In an artificial neural network, we calculate the sum of products of the inputs and their corresponding weights, and finally apply an activation function to it to get the output of that particular layer, which is supplied as the input to the next layer.

A Neural Network's prediction accuracy depends on the number of layers used and, more importantly, on the type of activation function used. There is no manual that specifies the minimum or maximum number of layers to be used for better results and accuracy, but a rule of thumb is to use a minimum of two layers. Neither does the literature prescribe which type of activation function should be used. Still, it is evident from studies and research that using one or more hidden layers in a neural network reduces the error in predictions.

A neural network's prediction accuracy is also defined by the type of activation function used. The most commonly used activation functions are non-linear. If no activation function is defined, a neural network behaves just like a linear regression model, where the predicted output is the same as the provided input. The same is the case if a linear activation function is used: the output is similar to the input that was fed in, along with some error. A linear activation function's decision boundary is linear, so a network built from linear activations can adapt only to linear changes of the input; in the real world, however, errors possess non-linear characteristics, which interferes with the network's ability to learn from erroneous data. Hence non-linear activation functions are preferred over linear activation functions in a Neural Network.

The most appealing property of Artificial Neural Networks is the ability to adapt their behavior according to the changing characteristics of the system. In the last few decades, many researchers and scientists have studied and investigated a number of methods to improve the performance of Artificial Neural Networks by optimizing the training methods, tuning hyperparameters, or learning parameters or network structures, but not much attention has been paid to activation functions.

II. NEURAL NETWORKS

According to the inventor of one of the first neurocomputers, a neural network can be defined as:

"...a computing system made up of a number of simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs."

Artificial Neural Networks are based on the network of neurons in the mammalian cortex, modelled loosely and on a much smaller scale. Artificial Neural Networks can be algorithms or an actual piece of hardware. There are billions of neurons present in the mammalian brain, which gives an enormous magnitude of interaction and emergent behavior, whereas an Artificial Neural Network may have only hundreds or thousands of processor units, which is very small compared to the mammalian brain structure.

Neural Networks are organized in multiple layers, and each layer is made up of a number of interconnected nodes which have activation functions associated with them. Data is fed to the network via the input layer, which then communicates with the other layers and processes the input data with the help of a system of weighted connections. The processed data is finally obtained through the output layer.
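As a concrete illustration of this sum-of-products computation, the following minimal NumPy sketch (not from the paper; the layer sizes, random weights, and the choice of a sigmoid activation are illustrative assumptions) passes an input through a hidden layer and an output layer:

import numpy as np

def sigmoid(z):
    # Logistic activation: squashes any real value into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def layer_forward(x, W, b, activation=sigmoid):
    # Sum of products of inputs and weights, plus a bias,
    # passed through the activation function.
    return activation(W @ x + b)

# Example: 3 inputs feeding a hidden layer of 4 nodes, whose
# output is in turn fed to a single output node.
rng = np.random.default_rng(0)
x = rng.normal(size=3)                   # input signal
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)
hidden = layer_forward(x, W1, b1)        # inner layer's output
output = layer_forward(hidden, W2, b2)   # outermost layer's output
print(hidden, output)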
III. WHY DO NEURAL NETWORKS NEED ACTIVATION FUNCTIONS?

Neural Networks are networks of multiple layers of neurons consisting of nodes, which are used for classification and prediction when some data is provided as input to the network. There is an input layer, one or more hidden layers, and an output layer. All the layers have nodes, and each node has a weight which is considered while processing information from one layer to the next layer. Solving complicated real-world tasks demands non-linear processing and also a complex architecture for extracting knowledge, which again is our ultimate goal.

IV. THE NEED FOR NON-LINEARITY IN NEURAL NETWORKS

Functions which have a degree of more than one, and hence a curvature when plotted, are known as non-linear functions. A neural network is required to learn, represent, and process any data and any arbitrarily complex function which maps the inputs to the outputs. Neural Networks are in fact known as Universal Function Approximators, which means that they can compute and learn any function provided to them.
Any imaginable process can be represented as a functional computation in a Neural Network. Thus, we need to apply an activation function to make the network dynamic and give it the ability to extract complex and complicated information from data and to represent non-linear, convoluted, arbitrary functional mappings between input and output. Hence, by adding non-linearity to the network with the help of non-linear activation functions, we are able to achieve non-linear mappings from inputs to outputs. An important feature of an activation function is that it must be differentiable, so that we can implement the backpropagation strategy to compute the errors or losses with respect to the weights and eventually optimize the weights using Gradient Descent or any other optimization technique to reduce the error.
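To make the differentiability requirement concrete, here is a hedged single-neuron sketch (the training pair, initial weights, and learning rate are illustrative assumptions, not values from the paper) in which the derivative of tanh carries the error gradient back onto the weights for a gradient-descent update:

import numpy as np

def tanh_prime(z):
    # Derivative of tanh: 1 - tanh(z)^2. It shrinks toward 0 for
    # large |z|, which is also the root of the vanishing gradient
    # problem of saturating activations.
    return 1.0 - np.tanh(z) ** 2

x, target = 0.5, 0.8      # one illustrative training pair
w, b, lr = 0.1, 0.0, 0.5  # initial weight, bias, learning rate

for step in range(100):
    z = w * x + b                     # weighted sum
    y = np.tanh(z)                    # non-linear activation
    loss = 0.5 * (y - target) ** 2    # squared-error loss
    # Chain rule: dL/dw = (y - target) * f'(z) * x. Without a
    # well-defined f'(z) this backward step would be impossible.
    grad_w = (y - target) * tanh_prime(z) * x
    grad_b = (y - target) * tanh_prime(z)
    w -= lr * grad_w                  # gradient-descent update
    b -= lr * grad_b

print(loss)  # close to 0: the neuron has fit the target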
V. TYPES OF ACTIVATION FUNCTIONS

Leaky ReLU is a variant of the Rectified Linear Unit with a slight variation and better performance. It resolves the problem of the gradient of ReLU becoming zero for negative values of x by introducing a new parameter for the negative part of the function, i.e., a slope a. It is expressed as:

f(x) = x, x >= 0
f(x) = ax, x < 0

Swish is an activation function discovered by researchers at Google. The distinguishing feature of the Swish function is that it is not monotonic, which means that the value of the function may decrease even while the values of the inputs are increasing. In some cases, Swish outperforms even the ReLU function. It is expressed mathematically as:

f(x) = x * sigmoid(x) = x / (1 + e^(-x))
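Both formulas translate directly into code. In the sketch below, the leak slope a = 0.01 is a commonly used default assumed purely for illustration:

import numpy as np

def leaky_relu(x, a=0.01):
    # f(x) = x for x >= 0, f(x) = a*x for x < 0: a small but
    # non-zero gradient survives on the negative side.
    return np.where(x >= 0, x, a * x)

def swish(x):
    # f(x) = x * sigmoid(x) = x / (1 + exp(-x))
    return x / (1.0 + np.exp(-x))

xs = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(leaky_relu(xs))  # [-0.02 -0.01  0.    1.    2.  ]
print(swish(xs))       # swish(-2) > swish(-1): non-monotonic

Evaluating Swish on these points shows the non-monotonicity the text describes: the function dips to a minimum near x = -1.28 before rising again.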
VI. CHOOSING THE RIGHT ACTIVATION FUNCTION

The choice of activation function decides how well a given learning task is accomplished. Different Activation Functions have both advantages and disadvantages of their own, and the right choice depends on the type of system that we are designing. For example:

- For classification problems, a combination of sigmoid functions gives better results.
- Because of the vanishing gradient problem, i.e., the gradient reaching the value zero, sigmoid and tanh functions are avoided.
- The ReLU function is the most widely used function and performs better than the other activation functions in most cases.
- If there are dead neurons in our network, then we can use the Leaky ReLU function.
- The ReLU function should be used only in the hidden layers and not in the output layer.

We can experiment with different activation functions while developing a model if there are no time constraints: we start with the ReLU function and then move on to other functions if it does not give satisfactory results. Studies have shown that both the sigmoid and tanh functions are unsuitable for hidden layers, because the slope of the function becomes very small as the input becomes very large or very small, which in turn slows down gradient descent. ReLU is the most preferred choice for hidden layers, as its derivative is 1 for positive inputs. Also, Leaky ReLU can be used in case of zero derivatives. An activation function which approximates the target function faster, and with which the network can be trained at a faster rate, can also be chosen.
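These rules of thumb are easy to express in code. The following is a minimal sketch assuming the TensorFlow/Keras API (the layer widths and the 20-feature input are arbitrary assumptions): ReLU in the hidden layers, with sigmoid reserved for the output layer of a binary classifier:

import tensorflow as tf

model = tf.keras.Sequential([
    # Hidden layers use ReLU, per the guideline above.
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    # Sigmoid appears only at the output layer, not in the hidden ones.
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# If training reveals dead neurons, a hidden Dense layer can be
# followed by tf.keras.layers.LeakyReLU(alpha=0.01) instead of
# using the built-in "relu" activation.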
VII. CONCLUSION

This paper provides a brief description of the various activation functions that are used in the field of deep learning, of the importance of activation functions in developing an effective and efficient deep learning model, and of their role in improving the performance of artificial neural networks. The paper highlights the need for activation functions and the need for non-linearity in neural networks. Firstly, we gave a description of activation functions; then we briefly described the need for activation functions and the need for non-linearity in neural networks. We then described the various types of activation functions that are commonly used in neural networks. Activation functions have the ability to improve the learning rate and the learning of the patterns present in the dataset, which in turn helps to automate the process of feature detection, extraction, and prediction. The paper justifies the use of activation functions in the hidden layers of neural networks and their usefulness in classification across various domains. These activation functions were developed after extensive research and experiments over the years.

There was little emphasis on activation functions in the past, but at present scientists and developers are paying much-needed attention to, and conducting research on, activation functions, as they affect a neural network's performance. Also, there are various activation functions that are not discussed here, as they aren't widely used in deep learning; rather, we have emphasized the most commonly used activation functions. In the future, we could also compare all the activation functions and analyze their performance on standard datasets and architectures to see whether their performance can be improved further.

VIII. REFERENCES

[1.] Karlik, B., & Olgac, A. V. (2011). Performance analysis of various activation functions in generalized MLP architectures of neural networks. International Journal of Artificial Intelligence and Expert Systems, 1(4), 111-122.

[2.] Agostinelli, F., Hoffman, M., Sadowski, P., & Baldi, P. (2014). Learning activation functions to improve deep neural networks. arXiv preprint arXiv:1412.6830.

[3.] Chen, T., & Chen, H. (1995). Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems. IEEE Transactions on Neural Networks, 6(4), 911-917.

[4.] Stinchcombe, M., & White, H. (1989, December). Universal approximation using feedforward networks with non-sigmoid hidden layer activation functions. In IJCNN International Joint Conference on Neural Networks.

[5.] Cao, J., & Wang, J. (2004). Absolute exponential stability of recurrent neural networks with Lipschitz-continuous activation functions and time delays. Neural Networks, 17(3), 379-390.

[6.] Huang, G. B., & Babri, H. A. (1998). Upper bounds on the number of hidden neurons in feedforward networks with arbitrary bounded nonlinear activation functions. IEEE Transactions on Neural Networks, 9(1), 224-229.