Activation Function


Activation Function.

Introduction.

An activation function in a neural network defines how the weighted sum of the inputs is transformed into an output from a node or nodes in a layer of the network. Sometimes the activation function is called a "transfer function."

Inputs are multiplied by the weights in a node and summed together. This value is referred to as the summed activation of the node. The summed activation is then transformed via an activation function, which defines the specific output or "activation" of the node.
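A minimal sketch of this computation for a single node, assuming NumPy is available; the weights, bias, and the choice of ReLU as the activation are illustrative, not taken from the slides:

import numpy as np

def relu(z):
    # ReLU activation: returns 0 for negative inputs, the input itself otherwise
    return np.maximum(0.0, z)

def node_output(inputs, weights, bias, activation=relu):
    # Summed activation: weighted sum of the inputs plus a bias term
    z = np.dot(inputs, weights) + bias
    # The activation function transforms the summed activation into the node's output
    return activation(z)

# Example: three inputs feeding one node (illustrative values)
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.4, 0.1, -0.6])
print(node_output(x, w, bias=0.2))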
Introduction.

• The choice of activation function has a large impact on the capability and performance of the neural network, and different activation functions may be used in different parts of the model.

• A network may have three types of layers: input layers that take raw
input from the domain, hidden layers that take input from another
layer and pass output to another layer, and output layers that make a
prediction.
Introduction.
The choice of activation function in the hidden layer will control how well the
network model learns the training dataset. The choice of activation function in the
output layer will define the type of predictions the model can make.

As such, a careful choice of activation function must be made for each deep learning
neural network project.

• Activation functions are a key part of neural network design.
• The modern default activation function for hidden layers is the ReLU function.
• The activation function for output layers depends on the type of prediction problem.
Activation functions

Activation functions are also typically differentiable, meaning the first-order derivative can be calculated for a given input value. This is required because neural networks are typically trained using the backpropagation of error algorithm, which requires the derivative of the prediction error in order to update the weights of the model.
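As an illustration of why differentiability matters, the sketch below (assuming NumPy; the sigmoid is just an example choice) shows an activation alongside the closed-form derivative that backpropagation would use:

import numpy as np

def sigmoid(z):
    # Logistic sigmoid: squashes any real input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    # First-order derivative of the sigmoid, used during backpropagation
    s = sigmoid(z)
    return s * (1.0 - s)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))             # activations
print(sigmoid_derivative(z))  # gradients fed back through the chain rule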
Activation functions

Activation functions are used to determine the output of a neural network, such as yes or no. They map the resulting values into a range such as 0 to 1 or -1 to 1 (depending upon the function).

• Activation functions can be broadly divided into two types:

1. Linear Activation Function
2. Non-linear Activation Functions
Activation functions

• Activation functions are mathematical equations that determine the output of a neural network. The function is attached to each neuron in the network and determines whether the neuron should be activated ("fired") or not, based on whether that neuron's input is relevant for the model's prediction. Activation functions also help normalize the output of each neuron to a range such as 0 to 1 or -1 to 1.
Activation for Hidden Layers

• A hidden layer in a neural network is a layer that receives input from another layer (such
as another hidden layer or an input layer) and provides output to another layer (such as
another hidden layer or an output layer).

• A hidden layer does not directly contact input data or produce outputs for a model, at
least in general.

• A neural network may have zero or more hidden layers.

• Typically, a differentiable nonlinear activation function is used in the hidden layers of a neural network. This allows the model to learn more complex functions than a network trained using a linear activation function.
Activation for Hidden Layers

• Rectified Linear Activation (ReLU)
• Logistic (Sigmoid)
• Hyperbolic Tangent (Tanh)
Activation for Hidden Layers
• A neural network will almost always have the same activation function
in all hidden layers.

• It is very unusual to vary the activation function through a network model.

• Traditionally, the sigmoid activation function was the default activation function in the 1990s. From the mid-to-late 1990s through the 2010s, the Tanh function was the default activation function for hidden layers.
Activation for Hidden Layers
• Both the sigmoid and Tanh functions can make the model more susceptible to problems during training, via the so-called vanishing gradients problem (see the sketch after this list).

• The activation function used in hidden layers is typically chosen based on the type of neural network architecture.

• Modern neural network models with common architectures, such as MLP and CNN, will make use of the ReLU activation function, or extensions.
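A minimal numerical sketch of the vanishing gradient effect, assuming NumPy; the input values are illustrative. The sigmoid derivative peaks at 0.25 and shrinks rapidly for large-magnitude inputs, so gradients multiplied through many sigmoid layers can become vanishingly small:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# Derivative is largest (0.25) at z = 0 and nearly zero for large |z|
for z in [0.0, 2.0, 5.0, 10.0]:
    print(f"z = {z:5.1f}  sigmoid'(z) = {sigmoid_grad(z):.6f}")

# Multiplying many such factors (one per layer) drives the gradient toward zero
print(0.25 ** 10)  # upper bound on the gradient after 10 sigmoid layers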
Activation for Hidden Layers
• Recurrent networks still commonly use Tanh or sigmoid activation functions,
or even both. For example, the LSTM commonly uses the Sigmoid activation
for recurrent connections and the Tanh activation for output.

• Multilayer Perceptron (MLP): ReLU activation function.
• Convolutional Neural Network (CNN): ReLU activation function.
• Recurrent Neural Network (RNN): Tanh and/or Sigmoid activation function.

• If you’re unsure which activation function to use for your network, try a few
and compare the results.
Activation for Output Layers

• The output layer is the layer in a neural network model that directly outputs a
prediction.
• All feed-forward neural network models have an output layer.
• There are perhaps three activation functions you may want to consider for use
in the output layer; they are:

• Linear
• Logistic (Sigmoid)
• Softmax
Linear Activation Function

The output of the function is not confined to any range.

Equation: f(x) = x

Range: (-infinity to infinity)

It doesn't help with the complexity or the various parameters of the usual data that is fed to neural networks.

It takes the inputs, multiplied by the weights for each neuron, and creates an output signal proportional to the input. In one sense, a linear function is better than a step function because it allows multiple outputs, not just yes and no.
Linear activation functions
• With linear activation functions, no matter how many layers in the
neural network, the last layer will be a linear function of the first layer
(because a linear combination of linear functions is still a linear
function). So a linear activation function turns the neural network into
just one layer.

• A neural network with a linear activation function is simply a linear regression model. It has limited power and limited ability to handle the complexity and varying parameters of the input data.
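A small sketch of this collapse, assuming NumPy; the weight matrices are arbitrary illustrative values. Two stacked linear layers are exactly equivalent to one linear layer whose weights are the product of the two:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)

# Two "layers" with the identity (linear) activation
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
two_layer = W2 @ (W1 @ x)

# One equivalent layer: a single linear map with weights W2 @ W1
one_layer = (W2 @ W1) @ x

print(np.allclose(two_layer, one_layer))  # True: stacking linear layers adds no power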
Non-linear Activation Functions

Non-linear activation functions are the most commonly used activation functions. They make it easy for the model to generalize or adapt to a variety of data and to differentiate between the outputs.

The main terminology needed to understand non-linear functions:

• Derivative or differential: the change along the y-axis with respect to the change along the x-axis; also known as the slope.

Non-linear activation functions are mainly divided on the basis of their range or curves.
Nonlinear Activation Functions
• Modern neural network models use non-linear activation functions. They allow the model to
create complex mappings between the network’s inputs and outputs, which are essential for
learning and modeling complex data, such as images, video, audio, and data sets which are
non-linear or have high dimensionality.

• Almost any process imaginable can be represented as a functional computation in a neural network, provided that the activation function is non-linear.

1. They allow backpropagation because they have a derivative function which is related to the inputs.
2. They allow "stacking" of multiple layers of neurons to create a deep neural network. Multiple hidden layers of neurons are needed to learn complex data sets with high levels of accuracy.
1. Sigmoid or Logistic Activation Function

The main reason why we use the sigmoid function is that its output exists between 0 and 1. Therefore, it is especially used for models where we have to predict a probability as an output.

Since the probability of anything exists only in the range of 0 to 1, sigmoid is the right choice.
Sigmoid or Logistic Activation Function

• It is the same function used in the logistic regression classification algorithm.

• The function takes any real value as input and outputs values in the range 0 to 1. The larger the input (more positive), the closer the output value will be to 1.0, whereas the smaller the input (more negative), the closer the output will be to 0.0.
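A minimal sketch of the sigmoid, assuming NumPy; the standard logistic form 1 / (1 + e^-x) is used, which is the usual definition although the slides do not spell it out:

import numpy as np

def sigmoid(x):
    # Logistic sigmoid: maps any real input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

# More positive inputs approach 1.0, more negative inputs approach 0.0
print(sigmoid(np.array([-10.0, -1.0, 0.0, 1.0, 10.0])))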
Key Points:

• The function is differentiable. That means we can find the slope of the sigmoid curve at any two points.

• The logistic sigmoid function can cause a neural network to get stuck during training.

• The softmax function is a more generalized logistic activation function which is used for multiclass classification.
2. Tanh or hyperbolic tangent Activation Function

• The hyperbolic tangent activation function is also referred to simply as the Tanh (also "tanh" and "TanH") function.

• It is very similar to the sigmoid activation function and even has the same S-shape.

• The function takes any real value as input and outputs values in the range -1 to 1. The larger the input (more positive), the closer the output value will be to 1.0, whereas the smaller the input (more negative), the closer the output will be to -1.0.

• The Tanh activation function is calculated as follows:

tanh(x) = (e^x - e^-x) / (e^x + e^-x)
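A minimal sketch of Tanh, assuming NumPy; np.tanh implements the formula above directly:

import numpy as np

def tanh(x):
    # Hyperbolic tangent: (e^x - e^-x) / (e^x + e^-x), output in the range (-1, 1)
    return np.tanh(x)

# More positive inputs approach 1.0, more negative inputs approach -1.0
print(tanh(np.array([-10.0, -1.0, 0.0, 1.0, 10.0])))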
ReLU (Rectified Linear Unit)

The ReLU function is calculated as follows:

max(0.0, x)

This means that if the input value (x) is negative, then a value of 0.0 is returned; otherwise, the value is returned unchanged.
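A minimal sketch of ReLU, assuming NumPy:

import numpy as np

def relu(x):
    # Rectified Linear Unit: max(0.0, x) applied element-wise
    return np.maximum(0.0, x)

# Negative inputs become 0.0; non-negative inputs pass through unchanged
print(relu(np.array([-3.0, -0.5, 0.0, 0.5, 3.0])))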
Softmax

Advantages:

• Able to handle multiple classes, where other activation functions handle only one class. Softmax normalizes the output for each class between 0 and 1 and divides by their sum, giving the probability of the input value belonging to a specific class.

• Useful for output neurons. Typically, Softmax is used only for the output layer, for neural networks that need to classify inputs into multiple categories.
Softmax
• The Softmax outputs a vector of values that sum to 1.0 that can be interpreted as
probabilities of class membership.

• It is related to the argmax function that outputs a 0 for all options and 1 for the chosen
option. Softmax is a “softer” version of argmax that allows a probability-like output of a
winner-take-all function.

• As such, the input to the function is a vector of real values and the output is a vector of the
same length with values that sum to 1.0 like probabilities.

• The softmax function is calculated as follows:

e^x / sum(e^x)
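A minimal sketch of softmax, assuming NumPy; subtracting the maximum before exponentiating is a common numerical-stability trick that is not mentioned in the slides:

import numpy as np

def softmax(x):
    # Shift by the max for numerical stability (does not change the result)
    e = np.exp(x - np.max(x))
    # Each output is e^x_i divided by the sum of all e^x_j, so the outputs sum to 1.0
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs, probs.sum())  # probability-like values that sum to 1.0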
How to Choose an Output Activation Function

• You must choose the activation function for your output layer based on the type
of prediction problem that you are solving.
• Specifically, the type of variable that is being predicted.
• For example, you may divide prediction problems into two main groups,
predicting a categorical variable (classification) and predicting a numerical
variable (regression).
• If your problem is a regression problem, you should use a linear activation
function.
• Regression: One node, linear activation.
• If your problem is a classification problem, then there are three main types of
classification problems and each may use a different activation function.
• Binary Classification: One node, sigmoid activation.
• Multiclass Classification: One node per class, softmax activation.
• Multilabel Classification: One node per class, sigmoid activation.
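As an illustration of these output-layer choices, a short Keras-style sketch, assuming TensorFlow/Keras is available; the layer sizes and the n_classes name are illustrative, not part of the slides:

from tensorflow import keras

n_classes = 3  # illustrative number of classes

# Regression: one node, linear activation
regression_output = keras.layers.Dense(1, activation="linear")

# Binary classification: one node, sigmoid activation
binary_output = keras.layers.Dense(1, activation="sigmoid")

# Multiclass classification: one node per class, softmax activation
multiclass_output = keras.layers.Dense(n_classes, activation="softmax")

# Multilabel classification: one node per class, sigmoid activation
multilabel_output = keras.layers.Dense(n_classes, activation="sigmoid")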
