9.b Handout-4-Activation Functions


“dislike” (activation near zero) certain linear regions of its input space. Hence, with an appropriate
loss function on the neuron’s output, we can turn a single neuron into a linear classifier:

Binary Softmax classifier. For example, we can interpret σ(∑_i w_i x_i + b) to be the probability of
one of the classes, P(y_i = 1 | x_i; w). The probability of the other class would be
P(y_i = 0 | x_i; w) = 1 − P(y_i = 1 | x_i; w), since they must sum to one. With this
interpretation, we can formulate the cross-entropy loss as we have seen in the Linear
Classification section, and optimizing it would lead to a binary Softmax classifier (also known as
logistic regression). Since the sigmoid function is restricted to be between 0 and 1, the predictions of
this classifier are based on whether the output of the neuron is greater than 0.5.
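
As a rough NumPy sketch (the helper names and the example weights below are made up purely for illustration), a single sigmoid neuron used as a binary classifier of this kind might look like:

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^{-x})
    return 1.0 / (1.0 + np.exp(-x))

def predict(x, w, b):
    # P(y = 1 | x; w) = sigmoid(w . x + b); predict class 1 when it exceeds 0.5
    p = sigmoid(np.dot(w, x) + b)
    return int(p > 0.5), p

# Made-up weights and input, purely for illustration.
w = np.array([0.5, -1.0, 2.0])
b = -0.1
x = np.array([1.0, 0.5, 0.25])
print(predict(x, w, b))  # (predicted class, probability of class 1)
```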

Binary SVM classifier. Alternatively, we could attach a max-margin hinge loss to the output of the
neuron and train it to become a binary Support Vector Machine.

Regularization interpretation. The regularization loss in both SVM/Softmax cases could in this
biological view be interpreted as gradual forgetting, since it would have the effect of driving all
synaptic weights w towards zero after every parameter update.

A single neuron can be used to implement a binary classifier (e.g. binary Softmax or binary SVM
classifiers).

Commonly used activation functions


Every activation function (or non-linearity) takes a single number and performs a certain fixed
mathematical operation on it. There are several activation functions you may encounter in
practice:

Left: Sigmoid non-linearity squashes real numbers to range between [0,1]. Right: The tanh non-linearity
squashes real numbers to range between [-1,1].


Sigmoid. The sigmoid non-linearity has the mathematical form σ(x) = 1/(1 + e^{-x}) and is
shown in the image above on the left. As alluded to in the previous section, it takes a real-valued
number and “squashes” it into the range between 0 and 1. In particular, large negative numbers
become 0 and large positive numbers become 1. The sigmoid function has seen frequent use
historically since it has a nice interpretation as the firing rate of a neuron: from not firing at all (0)
to fully-saturated firing at an assumed maximum frequency (1). In practice, the sigmoid non-
linearity has recently fallen out of favor and it is rarely ever used. It has two major drawbacks:

Sigmoids saturate and kill gradients. A very undesirable property of the sigmoid neuron is
that when the neuron’s activation saturates at either tail of 0 or 1, the gradient at these
regions is almost zero. Recall that during backpropagation, this (local) gradient will be
multiplied by the gradient of this gate’s output for the whole objective. Therefore, if the local
gradient is very small, it will effectively “kill” the gradient and almost no signal will flow
through the neuron to its weights and recursively to its data. Additionally, one must pay
extra caution when initializing the weights of sigmoid neurons to prevent saturation. For
example, if the initial weights are too large then most neurons would become saturated and
the network will barely learn.
Sigmoid outputs are not zero-centered. This is undesirable since neurons in later layers of
processing in a Neural Network (more on this soon) would be receiving data that is not
zero-centered. This has implications on the dynamics during gradient descent, because if
the data coming into a neuron is always positive (e.g. x > 0 elementwise in
f = w^T x + b), then the gradient on the weights w will during backpropagation become
either all positive, or all negative (depending on the gradient of the whole expression f).
This could introduce undesirable zig-zagging dynamics in the gradient updates for the
weights. However, notice that once these gradients are added up across a batch of data the
final update for the weights can have variable signs, somewhat mitigating this issue.
Therefore, this is an inconvenience but it has less severe consequences compared to the
saturated activation problem above.
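
To make the saturation point concrete, here is a minimal NumPy sketch (the helper names are our own, not from the notes) of the sigmoid and its local gradient σ(x)(1 − σ(x)); at the tails the gradient nearly vanishes, which is exactly what “kills” the backpropagated signal:

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^{-x})
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Local gradient of the sigmoid: sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

# The gradient is largest at x = 0 and almost zero at either tail.
for x in [-10.0, -2.0, 0.0, 2.0, 10.0]:
    print(f"x = {x:6.1f}  sigmoid = {sigmoid(x):.5f}  grad = {sigmoid_grad(x):.5f}")
```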

Tanh. The tanh non-linearity is shown on the image above on the right. It squashes a real-valued
number to the range [-1, 1]. Like the sigmoid neuron, its activations saturate, but unlike the
sigmoid neuron its output is zero-centered. Therefore, in practice the tanh non-linearity is always
preferred to the sigmoid nonlinearity. Also note that the tanh neuron is simply a scaled sigmoid
neuron, in particular the following holds: tanh(x) = 2σ(2x) − 1.
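
A quick numerical check of the identity tanh(x) = 2σ(2x) − 1 (the helper below is just an illustrative sketch):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Verify tanh(x) = 2 * sigmoid(2x) - 1 on a grid of points.
x = np.linspace(-5.0, 5.0, 11)
assert np.allclose(np.tanh(x), 2.0 * sigmoid(2.0 * x) - 1.0)
```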


Left: Rectified Linear Unit (ReLU) activation function, which is zero when x < 0 and then linear with slope 1
when x > 0. Right: A plot from Krizhevsky et al. (pdf) paper indicating the 6x improvement in convergence
with the ReLU unit compared to the tanh unit.

ReLU. The Rectified Linear Unit has become very popular in the last few years. It computes the
function f(x) = max(0, x). In other words, the activation is simply thresholded at zero (see
image above on the left). There are several pros and cons to using the ReLUs:

(+) It was found to greatly accelerate (e.g. a factor of 6 in Krizhevsky et al.) the convergence
of stochastic gradient descent compared to the sigmoid/tanh functions. It is argued that
this is due to its linear, non-saturating form.
(+) Compared to tanh/sigmoid neurons that involve expensive operations (exponentials,
etc.), the ReLU can be implemented by simply thresholding a matrix of activations at zero.
(-) Unfortunately, ReLU units can be fragile during training and can “die”. For example, a
large gradient flowing through a ReLU neuron could cause the weights to update in such a
way that the neuron will never activate on any datapoint again. If this happens, then the
gradient flowing through the unit will forever be zero from that point on. That is, the ReLU
units can irreversibly die during training since they can get knocked off the data manifold.
For example, you may find that as much as 40% of your network can be “dead” (i.e. neurons
that never activate across the entire training dataset) if the learning rate is set too high. With
a proper setting of the learning rate this is less frequently an issue.
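
As a sketch of both the ReLU itself and one way to estimate the fraction of “dead” units mentioned above (the batch of pre-activations here is synthetic, purely for illustration):

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x), applied elementwise
    return np.maximum(0.0, x)

# Hypothetical pre-activations for a batch: shape (batch_size, num_units).
np.random.seed(0)
pre = np.random.randn(256, 100)
pre[:, :10] -= 100.0   # force the first 10 units to be "dead" for illustration
acts = relu(pre)

# A unit is "dead" on this batch if it never activates on any example.
dead_fraction = np.mean((acts > 0).sum(axis=0) == 0)
print(f"fraction of dead units: {dead_fraction:.2f}")  # 0.10 here
```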

Leaky ReLU. Leaky ReLUs are one attempt to fix the “dying ReLU” problem. Instead of the function
being zero when x < 0, a leaky ReLU will instead have a small negative slope (of 0.01, or so). That
is, the function computes f(x) = 1(x < 0)(αx) + 1(x >= 0)(x) where α is a small
constant. Some people report success with this form of activation function, but the results are
not always consistent. The slope in the negative region can also be made into a parameter of
each neuron, as seen in PReLU neurons, introduced in Delving Deep into Rectifiers, by Kaiming He
et al., 2015. However, the consistency of the benefit across tasks is presently unclear.
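
A minimal sketch of the leaky ReLU, assuming the small constant α is fixed (in a PReLU, α would instead be a learned per-neuron parameter):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # f(x) = x for x >= 0, alpha * x for x < 0
    return np.where(x >= 0, x, alpha * x)

# Negative inputs are scaled by alpha instead of being zeroed out.
print(leaky_relu(np.array([-2.0, -0.5, 0.0, 3.0])))
```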


Maxout. Other types of units have been proposed that do not have the functional form
f(w^T x + b) where a non-linearity is applied on the dot product between the weights and the
data. One relatively popular choice is the Maxout neuron (introduced recently by Goodfellow et al.)
that generalizes the ReLU and its leaky version. The Maxout neuron computes the function
max(w_1^T x + b_1, w_2^T x + b_2). Notice that both ReLU and Leaky ReLU are a special case of this
form (for example, for ReLU we have w_1 = 0, b_1 = 0). The Maxout neuron therefore enjoys all the
benefits of a ReLU unit (linear regime of operation, no saturation) and does not have its
drawbacks (dying ReLU). However, unlike the ReLU neurons it doubles the number of parameters
for every single neuron, leading to a high total number of parameters.
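
A small sketch of the Maxout computation, including a check that zeroing one linear piece recovers the ReLU (the weights below are random placeholders):

```python
import numpy as np

def maxout(x, W1, b1, W2, b2):
    # max(W1 x + b1, W2 x + b2), taken elementwise over the two linear pieces
    return np.maximum(W1 @ x + b1, W2 @ x + b2)

# ReLU as a special case: set the first linear piece to zero.
np.random.seed(0)
x = np.array([1.0, -2.0, 0.5])
W2, b2 = np.random.randn(4, 3), np.random.randn(4)
W1, b1 = np.zeros_like(W2), np.zeros_like(b2)
assert np.allclose(maxout(x, W1, b1, W2, b2), np.maximum(0.0, W2 @ x + b2))
```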

This concludes our discussion of the most common types of neurons and their activation
functions. As a last comment, it is very rare to mix and match different types of neurons in the
same network, even though there is no fundamental problem with doing so.

TLDR: “What neuron type should I use?” Use the ReLU non-linearity, be careful with your learning
rates and possibly monitor the fraction of “dead” units in a network. If this concerns you, give
Leaky ReLU or Maxout a try. Never use sigmoid. Try tanh, but expect it to work worse than
ReLU/Maxout.

Neural Network architectures

Layer-wise organization
Neural Networks as neurons in graphs. Neural Networks are modeled as collections of neurons
that are connected in an acyclic graph. In other words, the outputs of some neurons can become
inputs to other neurons. Cycles are not allowed since that would imply an infinite loop in the
forward pass of a network. Instead of amorphous blobs of connected neurons, Neural
Network models are often organized into distinct layers of neurons. For regular neural networks,
the most common layer type is the fully-connected layer, in which neurons between two adjacent
layers are fully pairwise connected, but neurons within a single layer share no connections. Below
are two example Neural Network topologies that use a stack of fully-connected layers:
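
As an illustrative sketch of what a stack of fully-connected layers computes (the layer sizes and weights below are arbitrary placeholders, not taken from the notes), each layer is a matrix multiply plus a bias, followed by a non-linearity:

```python
import numpy as np

def fc_forward(x, W, b):
    # Fully-connected layer: every output unit sees every input unit.
    return W @ x + b

# Hypothetical 2-layer network: 3 inputs -> 4 hidden ReLU units -> 1 output.
np.random.seed(1)
W1, b1 = np.random.randn(4, 3), np.zeros(4)
W2, b2 = np.random.randn(1, 4), np.zeros(1)

x = np.array([1.0, -2.0, 0.5])
h = np.maximum(0.0, fc_forward(x, W1, b1))  # hidden layer with ReLU
out = fc_forward(h, W2, b2)                 # output layer (e.g. a score)
print(out)
```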

