Activation FN
Linearity refers to the property of a system or model where the output is directly proportional to the input, while nonlinearity implies that the relationship between input and output is more complex and cannot be expressed as a simple linear function.
Non-Linear Classification refers to categorizing those instances that are not linearly separable. Such data cannot be classified with a straight line.
The linear transfer function calculates the neuron's output by simply returning the value
passed to it.
This neuron can be trained to learn an affine function of its inputs, or to find a linear
approximation to a nonlinear function.
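A minimal sketch of such a neuron (NumPy; the weights are illustrative, not from the notes): the linear (identity) transfer function simply returns the affine combination of the inputs.

    import numpy as np

    def linear_neuron(x, w, b):
        # Affine combination of the inputs; the linear (identity)
        # transfer function returns this value unchanged.
        return np.dot(w, x) + b

    # Illustrative weights approximating y = 3*x1 - 2*x2 + 1
    print(linear_neuron(np.array([1.0, 2.0]), np.array([3.0, -2.0]), 1.0))  # 0.0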
Input Layer: This layer accepts the input features. It provides information from the outside world to the network; no computation is performed at this layer, the nodes here just pass the information (features) on to the hidden layer.
Hidden Layer: Nodes of this layer are not exposed to the outer world; they are part of the abstraction provided by any neural network. The hidden layer performs computation on the features coming in from the input layer and transfers the result to the output layer.
Output Layer: This layer brings the information learned by the network out to the outer world.
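A minimal forward-pass sketch of this three-layer structure (NumPy; the shapes and weights are arbitrary, chosen only for illustration):

    import numpy as np

    def forward(x, W1, b1, W2, b2):
        # Input layer: x is passed on unchanged (no computation).
        h = np.tanh(W1 @ x + b1)   # hidden layer: computation plus a non-linearity
        y = W2 @ h + b2            # output layer: exposes the learned result
        return y

    x = np.array([0.5, -1.0])                  # input features
    W1, b1 = np.ones((3, 2)), np.zeros(3)      # hidden-layer parameters
    W2, b2 = np.ones((1, 3)), np.zeros(1)      # output-layer parameters
    print(forward(x, W1, b1, W2, b2))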
What is an activation function and why use them?
We know that the neurons of a neural network operate with weights, biases, and their respective activation function.
In a neural network, we update the weights and biases of the neurons on the basis of the error at the output. This process is known as back-propagation.
Activation functions make back-propagation possible, since their gradients are supplied along with the error to update the weights and biases.
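A hedged sketch of one back-propagation step for a single sigmoid neuron with a squared-error loss (an assumption, not stated in the notes), showing how the activation's gradient enters the weight and bias updates:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def backprop_step(x, target, w, b, lr=0.1):
        z = np.dot(w, x) + b
        a = sigmoid(z)                      # forward pass
        error = a - target                  # error at the output
        grad_z = error * a * (1.0 - a)      # gradient through the activation (sigmoid')
        w = w - lr * grad_z * x             # weight update
        b = b - lr * grad_z                 # bias update
        return w, b

    w, b = np.array([0.5, -0.5]), 0.0
    w, b = backprop_step(np.array([1.0, 2.0]), 1.0, w, b)
    print(w, b)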
Why do we need a non-linear activation function?
A neural network without an activation function is essentially just a linear regression
model.
The activation function applies a non-linear transformation to the input, making the network capable of learning and performing more complex tasks.
Mathematical proof
A neuron cannot learn a non-linear mapping with only a linear function attached to it. Compose two linear layers, h = W1·x + b1 and y = W2·h + b2; substituting gives y = (W2·W1)·x + (W2·b1 + b2), which is again just a linear (affine) function of x, however many such layers are stacked. A non-linear activation function breaks this collapse and lets the network learn according to the gradient of the error. Hence we need an activation function.
Variants of Activation Function
Linear Function
Equation : A linear function has the equation of a straight line, i.e. y = x.
No matter how many layers we have, if all of them are linear, the final activation of the last layer is nothing but a linear function of the input to the first layer (see the sketch below).
Range : -inf to +inf
Uses : The linear activation function is used in just one place, i.e. the output layer.
For example : Predicting the price of a house is a regression problem. The house price may take any large or small value, so we can apply a linear activation at the output layer. Even in this case the neural net must have a non-linear activation function in its hidden layers.
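A small NumPy check of this claim, with illustrative random weights (not from the notes): stacking two linear layers gives the same output as a single linear layer with combined weights.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=4)                          # input to the first layer
    W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=3)
    W2, b2 = rng.normal(size=(2, 3)), rng.normal(size=2)

    # Two linear layers applied one after the other...
    y_stacked = W2 @ (W1 @ x + b1) + b2
    # ...equal one linear layer with combined weights and bias.
    y_single = (W2 @ W1) @ x + (W2 @ b1 + b2)
    print(np.allclose(y_stacked, y_single))         # True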
Sigmoid Function
It is a function which is plotted as ‘S’ shaped graph.
Equation : A = 1/(1 + e^(-x))
Nature : Non-linear. Notice that for x values between -2 and 2, the curve is very steep. This means that small changes in x bring about large changes in the value of Y.
Value Range : 0 to 1
Uses : Usually used in the output layer of a binary classification, where the result is either 0 or 1. Since the value of the sigmoid function lies between 0 and 1 only, the result can easily be predicted as 1 if the value is greater than 0.5 and as 0 otherwise.
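A minimal sketch (NumPy) of the sigmoid and the 0.5 decision rule described above:

    import numpy as np

    def sigmoid(x):
        # Squashes any real input into the range (0, 1).
        return 1.0 / (1.0 + np.exp(-x))

    scores = np.array([-2.0, -0.1, 0.3, 4.0])
    probs = sigmoid(scores)
    labels = (probs > 0.5).astype(int)   # predict 1 if the value exceeds 0.5, else 0
    print(probs, labels)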
Tanh Function
The activation that almost always works better than the sigmoid function is the tanh function, also known as the hyperbolic tangent function. It is actually a mathematically shifted and rescaled version of the sigmoid function; both are similar and can be derived from each other.
Value Range :- -1 to +1
Nature :- non-linear
Uses :- Usually used in the hidden layers of a neural network. Since its values lie between -1 and 1, the mean of the hidden-layer activations comes out to be 0 or very close to it, which helps centre the data by bringing the mean close to 0. This makes learning for the next layer much easier.
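A small NumPy check of the relation between tanh and the sigmoid, tanh(x) = 2*sigmoid(2x) - 1, and of the roughly zero-centred output mentioned above:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    x = np.linspace(-3, 3, 7)
    print(np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1))   # True: tanh is a rescaled, shifted sigmoid
    print(np.tanh(x).mean())                                 # approximately 0 for symmetric inputs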
RELU Function
It stands for Rectified Linear Unit. It is the most widely used activation function, mainly implemented in the hidden layers of a neural network.
Equation :- A(x) = max(0,x). It gives an output x if x is positive and 0 otherwise.
Value Range :- [0, inf)
Nature :- non-linear, which means we can easily backpropagate the errors and have
multiple layers of neurons being activated by the ReLU function.
Uses :- ReLU is less computationally expensive than tanh and sigmoid because it involves simpler mathematical operations. At any given time only some of the neurons are activated, making the network sparse, which makes it efficient and easy to compute.
In simple words, ReLU learns much faster than the sigmoid and tanh functions.
A rectified linear unit (ReLU) is an activation function that introduces the property of non-linearity to a deep learning model and helps avoid the vanishing gradients issue. It returns the positive part of its argument. It is one of the most popular activation functions in deep learning.
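A minimal sketch of ReLU (NumPy), illustrating the sparsity effect mentioned above:

    import numpy as np

    def relu(x):
        # Returns x where x is positive and 0 otherwise.
        return np.maximum(0, x)

    pre_activations = np.array([-1.5, -0.2, 0.0, 0.7, 3.0])
    out = relu(pre_activations)
    print(out)                       # [0.  0.  0.  0.7 3. ]
    print(np.count_nonzero(out))     # only the positive units remain active (sparse)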
Stochastic (random) function
A stochastic (random) function X(t) is a many-valued numerical function of an
independent argument t, whose value for any fixed value t ∈ T (where T is the domain of
the argument) is a random variable, called a cut set.
A neuron fires only if its input passes the threshold value.
A stochastic neural network is built by introducing random variation into the network, e.g. by giving it stochastic weights.
Stochastic modeling forecasts the probability of various outcomes under different
conditions, using random variables.
Stochastic modeling presents data and predicts outcomes that account for certain levels
of unpredictability or randomness.
With a fixed input, the output of a stochastic neural net is likely to differ (it is stochastic, i.e. random to a certain extent) across multiple evaluations.
This is in contrast to deterministic neural networks, where for a fixed input the output is always the same (deterministic).
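A hedged sketch of a stochastic binary neuron (NumPy; the setup is illustrative, not from the notes): the same input can yield different outputs on different evaluations, because the unit fires with a probability given by the sigmoid of its pre-activation.

    import numpy as np

    rng = np.random.default_rng()

    def stochastic_neuron(x, w, b):
        p = 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))  # firing probability
        return int(rng.random() < p)                   # fires (1) with probability p, else 0

    x, w, b = np.array([0.5, 1.0]), np.array([0.8, -0.3]), 0.1
    print([stochastic_neuron(x, w, b) for _ in range(10)])  # output varies from run to run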
What is the capacity of a perceptron?
From an information theory point of view, a single perceptron with K inputs has a
capacity of 2K bits of information.