Mod 2.3 - Activation Function, Loss Functions


Why Do We Need Activation Functions?

• An activation function Φ(v) in the output layer can control the nature of the output (e.g., probability value in [0, 1]).

• In multilayer neural networks, activation functions bring non-linearity into hidden layers, which increases the complexity of the model.

A neural network with any number of layers but only linear activations can be shown to be equivalent to a single-layer network.
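
As a quick illustrative sketch (not part of the original notes), the NumPy snippet below checks this claim for two stacked linear layers with no activation in between; the layer shapes are arbitrary:

import numpy as np

rng = np.random.default_rng(0)

# Two linear layers with no activation between them (arbitrary example shapes).
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x = rng.normal(size=3)

# Layer-by-layer: h = W1 x + b1, then y = W2 h + b2
y_stacked = W2 @ (W1 @ x + b1) + b2

# Equivalent single layer: W = W2 W1, b = W2 b1 + b2
y_single = (W2 @ W1) @ x + (W2 @ b1 + b2)

print(np.allclose(y_stacked, y_single))  # True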

Binary Step Function


The binary step function depends on a threshold value that decides whether a neuron should be activated or not.

The input fed to the activation function is compared to a certain threshold; if the input is greater than it, then the neuron is activated, else it is deactivated, meaning that its output is not passed on to the next hidden layer.

Mathematically it can be represented as: f(x) = 0 for x < 0, and f(x) = 1 for x ≥ 0 (taking the threshold to be 0)

[Figure: graph of the binary step function]


Here are some of the limitations of the binary step function:
• It cannot provide multi-value outputs; for example, it cannot be used for multi-class classification problems.
• The gradient of the step function is zero, which causes a hindrance in the backpropagation process.
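
A minimal sketch of the binary step function described above, assuming the conventional threshold of 0:

import numpy as np

def binary_step(x):
    # Output 1 once the input reaches the threshold (0 here), otherwise 0.
    return np.where(x >= 0, 1, 0)

print(binary_step(np.array([-2.0, -0.5, 0.0, 3.0])))  # [0 0 1 1]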

Linear Activation Function


The linear activation function, also known as the "no activation" or "identity" function, is where the activation is proportional to the input.

The function doesn't do anything to the weighted sum of the input; it simply returns the value it was given.

[Figure: linear activation function]

Mathematically it can be represented as: f(x)=x

However, a linear activation function has two major problems:

• Backpropagation cannot be used effectively, as the derivative of the function is a constant and has no relation to the input x.
• All layers of the neural network will collapse into one if a linear activation function is used. No matter the number of layers in the neural network, the last layer will still be a linear function of the first layer. So, essentially, a linear activation function turns the neural network into just one layer.

Non-Linear Activation Functions


A network with only linear activation functions is simply a linear regression model.

Because of its limited power, this does not allow the model to create
complex mappings between the network’s inputs and outputs. 

Non-linear activation functions solve the following limitations of linear activation functions:

• They allow backpropagation because now the derivative function would be related to the input, and it's possible to go back and understand which weights in the input neurons can provide a better prediction.
• They allow the stacking of multiple layers of neurons, as the output would now be a non-linear combination of the input passed through multiple layers. Any output can be represented as a functional computation in a neural network.

Non-Linear Neural Network Activation Functions


Sigmoid / Logistic Activation Function 

This function takes any real value as input and outputs values in the range
of 0 to 1. 

The larger the input (more positive), the closer the output value will be to
1.0, whereas the smaller the input (more negative), the closer the output will
be to 0.0, as shown below.
[Figure: sigmoid/logistic activation function curve]

Mathematically it can be represented as: f(x) = 1 / (1 + e^(-x))

The sigmoid/logistic activation function is one of the most widely used functions, for the following reasons:

• It is commonly used for models where we have to predict a probability as the output. Since the probability of anything exists only between the range of 0 and 1, sigmoid is the right choice because of its range.
• The function is differentiable and provides a smooth gradient, i.e., it prevents jumps in output values. This is represented by the S-shape of the sigmoid activation function (its gradient is sketched below).
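
A short sketch (not from the original notes) of the sigmoid and its well-known derivative s(x) * (1 - s(x)), which is what gives the smooth, jump-free gradient mentioned above:

import numpy as np

def sigmoid(x):
    # Squashes any real input into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Smooth gradient: largest near x = 0, saturating for large |x|.
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-5.0, 0.0, 5.0])
print(sigmoid(x))       # approx [0.0067, 0.5, 0.9933]
print(sigmoid_grad(x))  # approx [0.0066, 0.25, 0.0066]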

Tanh Function (Hyperbolic Tangent)


The tanh function is very similar to the sigmoid/logistic activation function and even has the same S-shape, the difference being its output range of -1 to 1. In tanh, the larger the input (more positive), the closer the output value will be to 1.0, whereas the smaller the input (more negative), the closer the output will be to -1.0.

[Figure: tanh activation function curve]

Mathematically it can be represented as: f(x) = (e^x - e^(-x)) / (e^x + e^(-x))

Advantages of using this activation function are:

• The output of the tanh activation function is zero-centered; hence we can easily map the output values as strongly negative, neutral, or strongly positive (see the sketch after this list).
• It is usually used in the hidden layers of a neural network, as its values lie between -1 and 1; therefore, the mean of the hidden layer's outputs comes out to be 0 or very close to it. This helps in centering the data and makes learning for the next layer much easier.
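
A short sketch (assumed, using NumPy's built-in np.tanh) showing the zero-centered output range:

import numpy as np

x = np.linspace(-4.0, 4.0, 9)
y = np.tanh(x)           # all values lie in (-1, 1)

print(y.min(), y.max())  # close to -1 and 1
print(y.mean())          # close to 0 for inputs symmetric around 0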

ReLU Function
ReLU stands for Rectified Linear Unit. 

Although it gives an impression of a linear function, ReLU has a derivative function and allows for backpropagation while simultaneously being computationally efficient.

The main catch here is that the ReLU function does not activate all the neurons
at the same time. 

The neurons will only be deactivated if the output of the linear transformation is
less than 0.
[Figure: ReLU activation function]

Mathematically it can be represented as : f(x)=max(0,x)


The advantages of using ReLU as an activation function are as follows:
• Since only a certain number of neurons are activated, the ReLU function is far more computationally efficient when compared to the sigmoid and tanh functions.
• ReLU accelerates the convergence of gradient descent towards the global minimum of the loss function due to its linear, non-saturating property.

The limitations faced by this function are:

• All the negative input values become zero immediately, which decreases the model's ability to fit or train from the data properly.
• The dying ReLU problem (addressed by an improved version named Leaky ReLU; see the sketch after this list).
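
A minimal sketch (not part of the original notes) of ReLU together with the Leaky ReLU variant mentioned above; the slope alpha for negative inputs is a free parameter, commonly a small value such as 0.01:

import numpy as np

def relu(x):
    # f(x) = max(0, x): negative inputs are zeroed out, i.e. those neurons are deactivated.
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # A small slope for x < 0 keeps a non-zero gradient and avoids "dying" neurons.
    return np.where(x >= 0, x, alpha * x)

x = np.array([-3.0, -0.5, 0.0, 2.0])
print(relu(x))        # [0. 0. 0. 2.]
print(leaky_relu(x))  # [-0.03 -0.005 0. 2.]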
Neural networks are a set of algorithms that are designed to
recognize trends/relationships in a given set of training data. These
algorithms are based on the way human neurons process
information.

An equation of the form ŷ = Φ(wᵀx + b) represents how a neural network processes the input data at each layer and eventually produces a predicted output value.

To train (the process by which the model maps the relationship between the training data and the outputs), the neural network updates its parameters, the weights wᵀ and biases b, to satisfy the equation above.

Each training input is loaded into the neural network in a process called forward propagation. Once the model has produced an output, this predicted output is compared against the given target output; in a process called backpropagation, the parameters of the model are then adjusted so that it outputs a result closer to the target output.

This is where loss functions come in. Loss functions are one of the most important aspects of neural networks, as they (along with the optimization functions) are directly responsible for fitting the model to the given training data.

Loss Functions Overview

A loss function is a function that compares the target and predicted output values; it measures how well the neural network models the training data. When training, we aim to minimize this loss between the predicted and target outputs.

The parameters are adjusted to minimize the average loss: we find the weights, wᵀ, and biases, b, that minimize the value of J (the average loss).
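
As an illustrative sketch only (assuming a single linear neuron, MSE as the average loss J, and plain gradient descent; none of these choices are prescribed by the notes), the loop below adjusts the weights and bias to reduce J:

import numpy as np

rng = np.random.default_rng(0)

# Toy data: targets follow a known linear rule plus a little noise.
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.3 + 0.01 * rng.normal(size=100)

w, b = np.zeros(3), 0.0
learning_rate = 0.1

for _ in range(200):
    y_hat = X @ w + b                    # forward propagation
    error = y_hat - y
    J = np.mean(error ** 2)              # average loss (MSE here)
    grad_w = 2.0 * X.T @ error / len(y)  # dJ/dw
    grad_b = 2.0 * error.mean()          # dJ/db
    w -= learning_rate * grad_w          # update the parameters
    b -= learning_rate * grad_b

print(np.round(w, 2), round(b, 2))       # recovers roughly [2. -1. 0.5] and 0.3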

Types of Loss Functions

In supervised learning, there are two main types of loss functions, corresponding to the two major types of neural networks: regression loss functions and classification loss functions.

1. Regression Loss Functions: used in regression neural networks; given an input value, the model predicts a corresponding output value (rather than pre-selected labels).

Ex. Mean Squared Error, Mean Absolute Error

2. Classification Loss Functions: used in classification neural networks; given an input, the neural network produces a vector of probabilities of the input belonging to various pre-set categories, and the category with the highest probability can then be selected as the output.

Ex. Binary Cross-Entropy, Categorical Cross-Entropy

Mean Squared Error (MSE)

One of the most popular loss functions, MSE finds the average of the squared differences between the target and the predicted outputs: MSE = (1/n) Σ (y − ŷ)².

This function has numerous properties that make it especially suited for calculating loss. The difference is squared, which means it does not matter whether the predicted value is above or below the target value; however, values with a large error are penalized more heavily. MSE is also a convex function with a clearly defined global minimum, which allows us to more easily utilize gradient descent optimization to set the weight values.
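
A minimal sketch (assumed, using NumPy arrays of targets and predictions) of the MSE computation:

import numpy as np

def mse(y_true, y_pred):
    # Average of squared differences; large errors are penalized quadratically.
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])
print(mse(y_true, y_pred))  # 0.375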

Mean Absolute Error (MAE)

MAE finds the average of the absolute differences between the target and the predicted outputs: MAE = (1/n) Σ |y − ŷ|.

This loss function is used as an alternative to MSE in some cases. As mentioned previously, MSE is highly sensitive to outliers, which can dramatically affect the loss because the distance is squared. MAE is used in cases when the training data has a large number of outliers, to mitigate this.

It also has some disadvantages: as the average distance approaches 0, gradient descent optimization struggles, because the derivative of the absolute value is undefined at 0 and the magnitude of the gradient does not shrink as the error gets small, making it hard to settle precisely on the minimum.
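
The corresponding sketch (assumed) for MAE:

import numpy as np

def mae(y_true, y_pred):
    # Average of absolute differences; outliers affect the loss only linearly.
    return np.mean(np.abs(y_true - y_pred))

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])
print(mae(y_true, y_pred))  # 0.5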

Because of this, a loss function called the Huber loss was developed, which has the advantages of both MSE and MAE.

If the absolute difference between the actual and predicted value is less than or equal to a threshold value, 𝛿, then MSE is applied. Otherwise, if the error is sufficiently large, MAE is applied.
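
A sketch (assumed) of the Huber loss, with delta as the threshold that switches between the squared and linear branches:

import numpy as np

def huber(y_true, y_pred, delta=1.0):
    error = np.abs(y_true - y_pred)
    quadratic = 0.5 * error ** 2               # MSE-like branch for small errors
    linear = delta * error - 0.5 * delta ** 2  # MAE-like branch for large errors
    return np.mean(np.where(error <= delta, quadratic, linear))

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 12.0])      # last point is an outlier
print(huber(y_true, y_pred, delta=1.0))        # ~1.19: the outlier is penalized linearly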

Binary Cross-Entropy/Log Loss

This is the loss function used in binary classification models, where the model takes in an input and has to classify it into one of two pre-set categories.
Classification neural networks work by outputting a vector of
probabilities — the probability that the given input fits into each of
the pre-set categories; then selecting the category with the highest
probability as the final output.

In binary classification, there are only two possible actual values of y: 0 or 1. Thus, to accurately determine the loss between the actual and predicted values, the loss function compares the actual value (0 or 1) with the probability that the input aligns with that category (p(i) = the probability that the category is 1; 1 - p(i) = the probability that the category is 0).
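
A minimal sketch (assumed) of binary cross-entropy, BCE = -(1/N) * Σ [y * log(p) + (1 - y) * log(1 - p)], with clipping to keep the logarithm finite:

import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    p = np.clip(p_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1, 0, 1, 1])
p_pred = np.array([0.9, 0.1, 0.8, 0.6])      # predicted probability that y = 1
print(binary_cross_entropy(y_true, p_pred))  # ~0.24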

Categorical Cross-Entropy Loss

In cases where the number of classes is greater than two, we utilize categorical cross-entropy; this follows a very similar process to binary cross-entropy.

Binary cross-entropy is a special case of categorical cross-entropy, where M = 2, i.e., the number of categories is 2.
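
A short sketch (assumed) of categorical cross-entropy for one-hot targets over M classes:

import numpy as np

def categorical_cross_entropy(y_true, p_pred, eps=1e-12):
    # y_true: one-hot targets (N x M); p_pred: predicted probabilities (N x M), rows summing to 1.
    p = np.clip(p_pred, eps, 1.0)
    return -np.mean(np.sum(y_true * np.log(p), axis=1))

y_true = np.array([[1, 0, 0],
                   [0, 0, 1]])
p_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6]])
print(categorical_cross_entropy(y_true, p_pred))  # ~0.43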
