0% found this document useful (0 votes)

11 views4 pages

Activation Functions

Uploaded by

aimad baigouar

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

11 views4 pages

Activation Functions

Uploaded by

aimad baigouar

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 4

Activation Functions

Linear 1
ELU 1
ReLU 2
LeakyReLU 3
Sigmoid 3
Tanh 4
Softmax 4

Linear
A straight line function where activation is proportional to input ( which is the weighted sum from neuron ).

Function Derivative
R(z,m) = \begin{Bmatrix} z*m \\■ \end{Bmatrix}
R'(z,m) = \begin{Bmatrix} m \\■ \end{Bmatrix}

Pros

• It gives a range of activations, so it is not binary activation.

• We can definitely connect a few neurons together and if more than 1 fires, we could take the max ( or
softmax) and decide based on that.
Cons

• For this function, derivative is a constant. That means, the gradient has no relationship with X.
• It is a constant gradient and the descent is going to be on constant gradient.
• If there is an error in prediction, the changes made by back propagation is constant and not
depending on the change in input delta(x) !

ELU
Exponential Linear Unit or its widely known name ELU is a function that tend to converge cost to zero
faster and produce more accurate results. Different to other activation functions, ELU has a extra alpha
constant which should be positive number.
ELU is very similiar to RELU except negative inputs. They are both in identity function form for
non-negative inputs. On the other hand, ELU becomes smooth slowly until its output equal to -α whereas
RELU sharply smoothes.

Function Derivative
R(z) = \begin{Bmatrix} z & z > 0 \\■ α.( e^z – 1) & z <=R'(z)
0 \end{Bmatrix}
= \begin{Bmatrix} 1 & z>0 \\■α.e^z & z<0 \end{Bmatrix}
Pros

• ELU becomes smooth slowly until its output equal to -α whereas RELU sharply smoothes.
• ELU is a strong alternative to ReLU.
• Unlike to ReLU, ELU can produce negative outputs.
Cons
• For x > 0, it can blow up the activation with the output range of [0, inf].

ReLU
A recent invention which stands for Rectified Linear Units. The formula is deceptively simple: .
Despite its name and appearance, it’s not linear and provides the same benefits as Sigmoid but with better
performance.

Function Derivative
R(z) = \begin{Bmatrix} z & z > 0 \\■ 0 & z <= 0 \end{Bmatrix}
R'(z) = \begin{Bmatrix} 1 & z>0 \\■0 & z<0 \end{Bmatrix}

Pros

• It avoids and rectifies vanishing gradient problem.

• ReLu is less computationally expensive than tanh and sigmoid because it involves simpler
mathematical operations.
Cons

• One of its limitation is that it should only be used within Hidden layers of a Neural Network Model.
• Some gradients can be fragile during training and can die. It can cause a weight update which will
makes it never activate on any data point again. Simply saying that ReLu could result in Dead
Neurons.
• In another words, For activations in the region (x<0) of ReLu, gradient will be 0 because of which the
weights will not get adjusted during descent. That means, those neurons which go into that state will
stop responding to variations in error/ input ( simply because gradient is 0, nothing changes ). This is
called dying ReLu problem.
• The range of ReLu is [0, inf). This means it can blow up the activation.
Further reading

• Deep Sparse Rectifier Neural Networks Glorot et al., (2011)

• Yes You Should Understand Backprop, Karpathy (2016)
LeakyReLU
LeakyRelu is a variant of ReLU. Instead of being 0 when , a leaky ReLU allows a small, non-zero,
constant gradient (Normally, ). However, the consistency of the benefit across tasks is
presently unclear. 1

Function Derivative
R(z) = \begin{Bmatrix} z & z > 0 \\■ \alpha z & z <= 0 \end{Bmatrix}
R'(z) = \begin{Bmatrix} 1 & z>0 \\■\alpha & z<0 \end{Bmatrix}

Pros

• Leaky ReLUs are one attempt to fix the "dying ReLU" problem by having a small negative slope (of
0.01, or so).
Cons

• As it possess linearity, it can't be used for the complex Classification. It lags behind the Sigmoid and
Tanh for some of the use cases.
Further reading

• Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification,

Kaiming He et al. (2015)

Sigmoid
Sigmoid takes a real value as input and outputs another value between 0 and 1. It’s easy to work with and
has all the nice properties of activation functions: it’s non-linear, continuously differentiable, monotonic,
and has a fixed output range.

Function Derivative

S(z) = 1 +1e −z S 0(z) = S(z) ⋅ (1 − S(z))

Pros

• It is nonlinear in nature. Combinations of this function are also nonlinear!

• It will give an analog activation unlike step function.
• It has a smooth gradient too.
• It’s good for a classifier.
• The output of the activation function is always going to be in range (0,1) compared to (-inf, inf) of
linear function. So we have our activations bound in a range. Nice, it won’t blow up the activations
then.
Cons

• Towards either end of the sigmoid function, the Y values tend to respond very less to changes in X.
• It gives rise to a problem of “vanishing gradients”.
• Its output isn’t zero centered. It makes the gradient updates go too far in different directions. 0 <
output < 1, and it makes optimization harder.
• Sigmoids saturate and kill gradients.
• The network refuses to learn further or is drastically slow ( depending on use case and until gradient
/computation gets hit by floating point value limits ).
Further reading

• Yes You Should Understand Backprop, Karpathy (2016)

Tanh
Tanh squashes a real-valued number to the range [-1, 1]. It's non-linear. But unlike Sigmoid, its output is
zero-centered. Therefore, in practice the tanh non-linearity is always preferred to the sigmoid nonlinearity.
1

Function Derivative
z
tanh(z) = ee z − e −z
tanh 0(z) = 1 − tanh(z)2
+ e −z

Pros

• The gradient is stronger for tanh than sigmoid ( derivatives are steeper).
Cons

• Tanh also has the vanishing gradient problem.

Softmax
Softmax function calculates the probabilities distribution of the event over ‘n’ different events. In general
way of saying, this function will calculate the probabilities of each target class over all possible target
classes. Later the calculated probabilities will be helpful for determining the target class for the given
inputs.
References

1(1, 2) http://cs231n.github.io/neural-networks-1/

Aesthetic and Regenative Gynecology
No ratings yet
Aesthetic and Regenative Gynecology
12 pages
Navara Engine Maintenance (Yd25ddti)
100% (6)
Navara Engine Maintenance (Yd25ddti)
50 pages
Quarter 4 - Module 5:: Storing Desserts
100% (2)
Quarter 4 - Module 5:: Storing Desserts
9 pages
CMA Examination Sample Questions
No ratings yet
CMA Examination Sample Questions
9 pages
Activation Function
No ratings yet
Activation Function
36 pages
Activation Functions - Ipynb - Colaboratory
No ratings yet
Activation Functions - Ipynb - Colaboratory
10 pages
Lec08-1Activation Functions
No ratings yet
Lec08-1Activation Functions
19 pages
Activation
No ratings yet
Activation
7 pages
Deep Learning: International Islamic University of Chittagong
No ratings yet
Deep Learning: International Islamic University of Chittagong
31 pages
Types of Neural Network Activation Functions_ How to Choose_ (1)
No ratings yet
Types of Neural Network Activation Functions_ How to Choose_ (1)
36 pages
Activation Function
No ratings yet
Activation Function
43 pages
Activation Function in NN
No ratings yet
Activation Function in NN
29 pages
7 Types of Neural Network Activation Functions
No ratings yet
7 Types of Neural Network Activation Functions
16 pages
Lecture 2.1.2activation Function
No ratings yet
Lecture 2.1.2activation Function
15 pages
Study of Ensemble of Activation Functions in Deep Learning
No ratings yet
Study of Ensemble of Activation Functions in Deep Learning
10 pages
4 4 Choosing The Right Activation Function For Neural Networks
No ratings yet
4 4 Choosing The Right Activation Function For Neural Networks
25 pages
Activation F
No ratings yet
Activation F
4 pages
Activation Function
No ratings yet
Activation Function
18 pages
Lect 5- Non Linear Activation Functions
No ratings yet
Lect 5- Non Linear Activation Functions
41 pages
4-Neural Networks and Activation Function
No ratings yet
4-Neural Networks and Activation Function
28 pages
Ml Ppt Activation Functions
No ratings yet
Ml Ppt Activation Functions
12 pages
Pr1_ANN_Writeup.docx
No ratings yet
Pr1_ANN_Writeup.docx
7 pages
Artificial Neural Networks(ANN)
No ratings yet
Artificial Neural Networks(ANN)
67 pages
3. Activation Function
No ratings yet
3. Activation Function
14 pages
5 TH
No ratings yet
5 TH
22 pages
Need and Use of Activation Functions in Anndeep Learning
No ratings yet
Need and Use of Activation Functions in Anndeep Learning
7 pages
Ijisae 4865
No ratings yet
Ijisae 4865
8 pages
Unit 3 Deep Learning
No ratings yet
Unit 3 Deep Learning
11 pages
Activation Functions
No ratings yet
Activation Functions
34 pages
Activation Functions
No ratings yet
Activation Functions
8 pages
Unit 2b
No ratings yet
Unit 2b
11 pages
lecture 9-NN- modified
No ratings yet
lecture 9-NN- modified
94 pages
Activation Functions
No ratings yet
Activation Functions
23 pages
Act_Fun
No ratings yet
Act_Fun
7 pages
UNIT-III Activation-function
No ratings yet
UNIT-III Activation-function
6 pages
Feed Forward NN
No ratings yet
Feed Forward NN
35 pages
Activation Function: Deep Neural Networks
No ratings yet
Activation Function: Deep Neural Networks
47 pages
06 AIS302 ANN backpropagation
No ratings yet
06 AIS302 ANN backpropagation
83 pages
Machine Learning (CSO851) - Lecture 08
No ratings yet
Machine Learning (CSO851) - Lecture 08
27 pages
Activations
No ratings yet
Activations
8 pages
what are the activation functions, how do i deter...
No ratings yet
what are the activation functions, how do i deter...
3 pages
Deep Learning Tutorial 3
No ratings yet
Deep Learning Tutorial 3
12 pages
Neural_Networks_Activation_Functions__1694135997
No ratings yet
Neural_Networks_Activation_Functions__1694135997
7 pages
Activation Functions: Sigmoid, Tanh, Relu, Leaky Relu, Prelu, Elu, Threshold Relu and Softmax Basics For Neural Networks and Deep Learning
No ratings yet
Activation Functions: Sigmoid, Tanh, Relu, Leaky Relu, Prelu, Elu, Threshold Relu and Softmax Basics For Neural Networks and Deep Learning
15 pages
Mod 2.3 - Activation Function, Loss Functions
No ratings yet
Mod 2.3 - Activation Function, Loss Functions
12 pages
Activation Function
No ratings yet
Activation Function
13 pages
Perceptron in Machine Learning
No ratings yet
Perceptron in Machine Learning
11 pages
Activation function
No ratings yet
Activation function
20 pages
DL_Assi02
No ratings yet
DL_Assi02
9 pages
Unit Iv
No ratings yet
Unit Iv
34 pages
Ad3451 Ml Unit 4 Notes
No ratings yet
Ad3451 Ml Unit 4 Notes
34 pages
Lesson 13
No ratings yet
Lesson 13
17 pages
Activation Functions
No ratings yet
Activation Functions
9 pages
Activation Functions in Neural Networks - 241102 - 224129
No ratings yet
Activation Functions in Neural Networks - 241102 - 224129
7 pages
ReLu Heuristics For Avoiding Local Bad Minima
100% (2)
ReLu Heuristics For Avoiding Local Bad Minima
10 pages
Rectified Linear Units (ReLU) in Deep Learning - Kaggle
No ratings yet
Rectified Linear Units (ReLU) in Deep Learning - Kaggle
3 pages
Module1 - Upto Loss Function
No ratings yet
Module1 - Upto Loss Function
137 pages
Sequence Mode
No ratings yet
Sequence Mode
15 pages
Module 2
No ratings yet
Module 2
13 pages
Activation Functions
No ratings yet
Activation Functions
10 pages
26- netinput activation function forward and back propogation
No ratings yet
26- netinput activation function forward and back propogation
41 pages
Activation Function
No ratings yet
Activation Function
4 pages
Functii de Activare1
No ratings yet
Functii de Activare1
89 pages
Introduction to Logarithms and Exponentials
From Everand
Introduction to Logarithms and Exponentials
Simone Malacrida
No ratings yet
Aud Theo 96 102
No ratings yet
Aud Theo 96 102
7 pages
1888 Victorian
No ratings yet
1888 Victorian
18 pages
A Thousand Splendid Suns 24052021
No ratings yet
A Thousand Splendid Suns 24052021
5 pages
Shipping Law Dissertation
100% (2)
Shipping Law Dissertation
4 pages
Anthony BusinessPlan
No ratings yet
Anthony BusinessPlan
5 pages
0478 m17 Ms 22
No ratings yet
0478 m17 Ms 22
5 pages
Bandaging
No ratings yet
Bandaging
19 pages
8P
No ratings yet
8P
4 pages
NIJ Standard of Armour Materials
No ratings yet
NIJ Standard of Armour Materials
27 pages
Module Two - Docx - 1607439360085 PDF
No ratings yet
Module Two - Docx - 1607439360085 PDF
11 pages
Russians in Alaska 1732 1867 1st Edition Lydia Black - Download the ebook today to explore every detail
100% (2)
Russians in Alaska 1732 1867 1st Edition Lydia Black - Download the ebook today to explore every detail
46 pages
5th Term - Operations Management
No ratings yet
5th Term - Operations Management
58 pages
1.-Germans at Meat 1910
No ratings yet
1.-Germans at Meat 1910
4 pages
Leaflet on the Fees Evolution 2025 001
No ratings yet
Leaflet on the Fees Evolution 2025 001
47 pages
Untitled
No ratings yet
Untitled
875 pages
United States Patent: (10) Patent No.: US 6,660,751 B1
No ratings yet
United States Patent: (10) Patent No.: US 6,660,751 B1
9 pages
SERVPERF and Servperf
No ratings yet
SERVPERF and Servperf
19 pages
Bona Pre
No ratings yet
Bona Pre
11 pages
Research Report: Bvlgari
No ratings yet
Research Report: Bvlgari
49 pages
Coffee Seedlings Distribution Guidelines
No ratings yet
Coffee Seedlings Distribution Guidelines
2 pages
Answer of The Case Questions: 1. Answer: Summary of The Case
No ratings yet
Answer of The Case Questions: 1. Answer: Summary of The Case
10 pages
AMOM Lecture1 - Fundamentals
No ratings yet
AMOM Lecture1 - Fundamentals
50 pages
Iso 20766 2 2018
No ratings yet
Iso 20766 2 2018
9 pages
Effectiveness and Efficiency Analysis of Parallel Flow and Counter Flow Heat Exchangers
No ratings yet
Effectiveness and Efficiency Analysis of Parallel Flow and Counter Flow Heat Exchangers
6 pages
Elreph 3000 Family Operator Reference PDF
No ratings yet
Elreph 3000 Family Operator Reference PDF
79 pages
Inventions and Discoveries: Invention - Inventor
No ratings yet
Inventions and Discoveries: Invention - Inventor
13 pages

Activation Functions

Uploaded by

Activation Functions

Uploaded by

Activation Functions

• It gives a range of activations, so it is not binary activation.

• It avoids and rectifies vanishing gradient problem.

• Deep Sparse Rectifier Neural Networks Glorot et al., (2011)

• Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification,

S(z) = 1 +1e −z S 0(z) = S(z) ⋅ (1 − S(z))

• It is nonlinear in nature. Combinations of this function are also nonlinear!

• Yes You Should Understand Backprop, Karpathy (2016)

• Tanh also has the vanishing gradient problem.

You might also like