
K. J. Somaiya College of Engineering, Mumbai-77
(Autonomous College Affiliated to University of Mumbai)

Batch: B1    Roll No.: 1713082
Experiment / assignment / tutorial No. 1
Grade: AA / AB / BB / BC / CC / CD / DD
Signature of the Staff In-charge with date

Title: Study of various activation functions used in neural networks.


AIM: To study the characteristics of linear and non-linear activation functions commonly used in
neural networks.
OUTCOME: After conducting the experiment, the student will be able to:
1) Describe the characteristics of various activation functions used in practical neural network applications.
2) Make effective decisions in choosing a suitable activation function for a given application.

Theoretical Background:
Neural networks are inspired by the goal of modeling biological neural systems and are increasingly achieving good results in machine learning tasks. The basic computational unit of the brain is a neuron. Approximately 86 billion neurons can be found in the human nervous system, connected by approximately 10^14 – 10^15 synapses. The diagram below shows a simplified drawing of a biological neuron (left) and a common mathematical model (right). Each neuron receives input signals from its dendrites and produces an output signal along its axon. The axon eventually branches out and connects via synapses to the dendrites of other neurons. In the computational model of a neuron, the signals that travel along the axons (e.g. x0) interact multiplicatively (e.g. w0*x0) with the dendrites of the other neuron based on the synaptic strength at that synapse (e.g. w0). The idea is that the synaptic strengths (the weights w) are learnable and control the strength of influence of one neuron on another, as well as its direction: excitatory for a positive weight, inhibitory for a negative weight. In the basic model, the dendrites carry the signal to the cell body, where all the signals are summed. If the final sum is above a certain threshold, the neuron fires, sending a spike along its axon. In the computational model, the firing of the neuron is modeled with an activation function f.

Figure: Simplified drawing of a biological neuron and its mathematical model
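
As a concrete illustration of this computational model, the following minimal Python sketch (the input and weight values are assumed for demonstration) forms the weighted sum of a neuron's inputs and passes it through an activation function f:

import numpy as np

def neuron_output(x, w, f):
    """Weighted sum of the inputs (the summed dendrite signals) passed through activation f."""
    net = np.dot(w, x)              # w0*x0 + w1*x1 + ...
    return f(net)                   # "firing" modelled by the activation function

# Assumed example values for illustration
x = np.array([0.5, -1.2, 3.0])      # input signals
w = np.array([0.8, 0.3, -0.5])      # synaptic strengths (learnable weights)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
print(neuron_output(x, w, sigmoid))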

Activation functions can be either linear or non-linear and are used to control the outputs of neural networks across different domains, including object recognition and classification, speech recognition, scene understanding and description, cancer detection systems, fingerprint detection, weather forecasting and self-driving cars. Training a neural network involves fitting it to the training data with an optimization algorithm that uses gradient information, usually gradient descent, so that the network's output captures the patterns in the data. Ongoing research has shown that a proper choice of activation function improves results in neural-network-based applications. The expected output determines the type of activation function to be deployed in the hidden and output layers of a network.
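
As a hedged illustration of this choice (the final-layer values below are assumed), an identity output is appropriate when the network predicts a real-valued quantity, whereas a softmax output turns the final-layer values into class probabilities:

import numpy as np

logits = np.array([2.0, 1.0, 0.1])          # assumed raw outputs of the final layer

# Identity (linear) output: used when the network predicts a real-valued quantity
regression_output = logits                   # values are passed through unchanged

# Softmax output: used for multi-class classification
shifted = np.exp(logits - logits.max())      # subtract the max for numerical stability
class_probabilities = shifted / shifted.sum()

print(regression_output)
print(class_probabilities, class_probabilities.sum())   # probabilities summing to 1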
Commonly used activation functions and their characteristics are as follows.
1. Identity
   f(x) = x, for all x
   f'(x) = 1, for all x
   - Linear
   - Cannot detect complex patterns in data
   - Generally used in the input layer
   - Used for linear regression tasks

2. Binary Step
   f(x) = 0 for x < 0; 1 for x >= 0
   f'(x) = 0 for x != 0; undefined at x = 0
   - Non-linear
   - Used in the output layer
   - Was used earlier in the Perceptron
   - Used for binary classification

3. Bipolar Step
   f(x) = -1 for x < 0; 1 for x >= 0
   f'(x) = 0 for x != 0; undefined at x = 0
   - Non-linear
   - Used in the output layer
   - Was used earlier in the Perceptron
   - Used for binary classification

4. Sigmoid
   f(x) = 1 / (1 + e^(-x))
   f'(x) = f(x) * (1 - f(x))
   - Non-linear
   - Non-zero-centered output
   - Low convergence rate
   - Suffers from the vanishing gradient problem

5. Bipolar Sigmoid
   f(x) = (1 - e^(-x)) / (1 + e^(-x))
   f'(x) = 2e^x / (e^x + 1)^2
   - Non-linear
   - Zero-centered output
   - Suffers from the vanishing gradient problem

6. Hyperbolic Tangent
   f(x) = 2 / (1 + e^(-2x)) - 1
   f'(x) = 1 - f(x)^2
   - Non-linear
   - Zero-centered output
   - Steeper than the sigmoid
   - Suffers from the vanishing gradient problem

7. Rectified Linear Unit (ReLU)
   f(x) = max(0, x)
   f'(x) = 0 for x < 0; 1 for x >= 0
   - Non-linear
   - Does not suffer from the vanishing gradient problem
   - Allows faster and more effective training of the network
   - Units tend to die out (the gradient is zero for negative inputs), and the function is non-differentiable at the origin

8. Leaky ReLU
   f(x) = max(ax, x), where a is a small positive slope
   f'(x) = a for x < 0; 1 for x >= 0
   - Non-linear
   - Does not suffer from the vanishing gradient problem
   - Allows faster and more effective training of the network
   - Overcomes the dying ReLU problem

9. SoftMax
   f(x_i) = e^(x_i) / sum_j e^(x_j)
   df(x_i)/dx_j = f(x_i) * (1 - f(x_i)) for i = j; -f(x_i) * f(x_j) for i != j
   - Non-linear
   - Used for multi-class classification problems

10. Maxout
    f(x) = max(w1^T x, w2^T x)
    f'(x) = w_i, the weight vector of the linear piece that attains the maximum
    - Non-linear
    - Generalizes ReLU and Leaky ReLU
    - Does not saturate and does not die
    - Computationally complex
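
The derivative expressions listed above can be sanity-checked numerically; the sketch below (the test point and step size are assumed) compares the closed-form sigmoid and tanh derivatives against central finite differences:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0

x, h = 0.7, 1e-5                        # assumed test point and step size

# Closed-form derivatives from the table above
d_sigmoid = sigmoid(x) * (1.0 - sigmoid(x))
d_tanh = 1.0 - tanh(x) ** 2

# Central finite-difference approximations
fd_sigmoid = (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h)
fd_tanh = (tanh(x + h) - tanh(x - h)) / (2.0 * h)

print(d_sigmoid, fd_sigmoid)            # the two values should agree closely
print(d_tanh, fd_tanh)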

Observations: 1) Execute Python code to observe the characteristic plots of the above activation functions and plot them below.
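
A minimal plotting sketch for this observation is given below, using NumPy and Matplotlib; the input range and the Leaky ReLU slope are assumptions made for display purposes:

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-5, 5, 500)             # assumed input range for the plots
alpha = 0.1                             # assumed Leaky ReLU slope

functions = {
    "Identity": x,
    "Binary Step": np.where(x < 0, 0, 1),
    "Bipolar Step": np.where(x < 0, -1, 1),
    "Sigmoid": 1 / (1 + np.exp(-x)),
    "Bipolar Sigmoid": (1 - np.exp(-x)) / (1 + np.exp(-x)),
    "Hyperbolic Tangent": np.tanh(x),
    "ReLU": np.maximum(0, x),
    "Leaky ReLU": np.maximum(alpha * x, x),
}

fig, axes = plt.subplots(4, 2, figsize=(8, 12))
for ax, (name, y) in zip(axes.flat, functions.items()):
    ax.plot(x, y)
    ax.set_title(name)
    ax.grid(True)
plt.tight_layout()
plt.show()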

[Characteristic plots for the Identity, Binary Step, Bipolar Step, Sigmoid, Hyperbolic Tangent, Rectified Linear Unit (ReLU), Leaky ReLU and Bipolar Sigmoid activation functions are pasted here.]
2) With the help of the given references, find out the activation functions used in the hidden and output layers of the following well-known deep neural networks.

Network: AlexNet
  Hidden layers: Rectified Linear Unit (ReLU); Output layer: Softmax
  Reference: A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," 2012. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/summary

Network: VGGNet
  Hidden layers: Rectified Linear Unit (ReLU); Output layer: Softmax
  Reference: K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," arXiv, 2015. [Online]. Available: https://arxiv.org/pdf/1409.1556.pdf

Network: GoogleNet
  Hidden layers: Rectified Linear Unit (ReLU); Output layer: Softmax
  Reference: C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2015, pp. 1–9.

Network: ResNet-56
  Hidden layers: Rectified Linear Unit (ReLU); Output layer: Rectified Linear Unit (ReLU)
  Reference: K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in IEEE Conf. Comput. Vis. Pattern Recognit. IEEE, 2016, pp. 770–778. [Online]. Available: https://doi.org/10.1109/CVPR.2016.9

Network: MobileNet
  Hidden layers: Rectified Linear Unit (ReLU); Output layer: Softmax
  Reference: A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications," arXiv, 2017. [Online]. Available: http://arxiv.org/abs/1704.04861

Network: LeNet-5
  Hidden layers: Tanh; Output layer: Softmax
  Reference: Y. LeCun et al., "Handwritten digit recognition with a back-propagation network," in Advances in Neural Information Processing Systems, 1990.

3) The following figure shows a two-layer neural network consisting of one hidden layer and one output layer. Two activation functions, S = Bipolar Sigmoid and L = Identity, can be implemented for the three neurons/nodes in the network.

[Figure: inputs x1 and x2 connect to two hidden neurons through weights W1–W4; the hidden neurons connect to the output y through weights W5 and W6.]

Activation functions in the hidden and output layer, if the network simulates linear regression, y = β1x1 + β2x2:
L = Identity. The identity function takes the inputs, multiplied by the weights for each neuron, and creates an output signal proportional to the input. In one sense, a linear function is better than a step function because it allows multiple outputs rather than just yes or no, and the value it returns is a continuous numerical value.

Activation functions in the hidden and output layer, if the network simulates a binary classifier:
S = Bipolar Sigmoid. The sigmoid (logistic) activation function maps input values into the range (0, 1), which can be read as the probability of belonging to a class, so it is well suited to classification. The sigmoid function can be scaled to have any range of output values, depending upon the problem; when the range is from -1 to 1, it is called the bipolar sigmoid.
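
The forward pass through this two-layer network can be sketched as follows (the weight values W1 through W6 and the inputs are assumed, and the mapping of weights to hidden neurons is one possible reading of the figure); swapping the activation arguments between the identity and the bipolar sigmoid gives the regression and binary-classifier variants:

import numpy as np

def identity(z):
    return z

def bipolar_sigmoid(z):
    return (1 - np.exp(-z)) / (1 + np.exp(-z))

def forward(x, hidden_act, output_act):
    # Assumed weight values; W1-W4 feed the two hidden neurons, W5 and W6 feed the output neuron
    W_hidden = np.array([[0.2, 0.4],     # weights into hidden neuron 1
                         [0.6, 0.1]])    # weights into hidden neuron 2
    w_output = np.array([0.5, 0.7])      # weights into the output neuron y
    h = hidden_act(W_hidden @ x)         # hidden-layer activations
    return output_act(w_output @ h)      # network output y

x = np.array([1.0, -2.0])                # assumed inputs x1, x2

y_regression = forward(x, identity, identity)                # linear regression variant (L)
y_classifier = forward(x, bipolar_sigmoid, bipolar_sigmoid)  # binary classifier variant (S)
print(y_regression, y_classifier)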


4) State the desirable properties of activation functions used in neural networks. (You may add more rows if required.)
Sr. No.  Desirable property
1. Non-linearity
1.1. Enables complex mapping between input and output
2. Differentiable (continuously)
2.1. Required for the backpropagation algorithm
2.1.1. Backpropagation requires computation of the gradient of the activation function at each iteration step
2.2. The slope of the curve should be computable at any point
3. Monotonic and continuous
3.1. Should be a function which is either entirely non-increasing or non-decreasing
3.1.1. Helps keep the gradients/derivatives from vanishing (becoming zero)

5) For the following network consisting of a single neuron, two inputs and one output, write Python code to find the output for the various activation functions you have studied above. Assume some relevant values for x1, x2, w1 and w2.

[Figure: single neuron with inputs x1 and x2, weights w1 and w2, net input yin and output yout.]

x = [x1, x2] | w = [w1, w2] | Net input to neuron, yin = x1*w1 + x2*w2 | Activation function used at neuron output | Output of neuron, yout
[6, 3]   | [2, 4]   | 24  | Identity                | 24
[6, -3]  | [-2, 4]  | -24 | Binary Step             | 0
[7, -5]  | [1, 2]   | -3  | Bipolar Step            | -1
[2, 4]   | [5, 1]   | 14  | Sigmoid                 | 0.9999991684719722
[2, 4]   | [-5, 1]  | -6  | Bipolar Sigmoid         | -0.9950547536867305
[1, 2]   | [7, 4]   | 15  | Tanh                    | 0.999999999999813
[7, 8]   | [6, 2]   | 58  | ReLU                    | 58
[7, 8]   | [-6, 2]  | -26 | Leaky ReLU (slope 0.1)  | -2.6
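
A sketch of the requested Python code is given below; the example input and weight vectors match the first row of the table, and the Leaky ReLU slope of 0.1 matches the value implied by the last row:

import numpy as np

def net_input(x, w):
    """yin = x1*w1 + x2*w2"""
    return float(np.dot(x, w))

# Activation functions studied above; the Leaky ReLU slope of 0.1 is an assumption
activations = {
    "Identity":        lambda z: z,
    "Binary Step":     lambda z: 0 if z < 0 else 1,
    "Bipolar Step":    lambda z: -1 if z < 0 else 1,
    "Sigmoid":         lambda z: 1 / (1 + np.exp(-z)),
    "Bipolar Sigmoid": lambda z: (1 - np.exp(-z)) / (1 + np.exp(-z)),
    "Tanh":            lambda z: float(np.tanh(z)),
    "ReLU":            lambda z: max(0.0, z),
    "Leaky ReLU":      lambda z: z if z >= 0 else 0.1 * z,
}

x, w = [6, 3], [2, 4]                   # assumed inputs and weights (first table row)
yin = net_input(x, w)
print(f"Net input yin = {yin}")
for name, f in activations.items():
    print(f"{name}: yout = {f(yin)}")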


Conclusion:

In an artificial neuron, the weighted sum of the inputs is calculated from the given inputs and weights, and this sum is passed to an activation function that converts it into the output. An activation function thus maps the net input to the output, and it is what allows a neural network to learn complex relationships and patterns in data. One important use of the activation function is to keep the output restricted to a particular range; another is to introduce non-linearity. In most cases, ReLU and its variants are used in the hidden layers, and softmax or a linear function is used in the final layer, depending upon the type of problem.

Signature of faculty in-charge
