Activation Functions


Linear & Non-Linear Units

Linearity refers to the property of a system or model where the output is directly proportional to the input, while non-linearity implies that the relationship between input and output is more complex and cannot be expressed as a simple linear function.

A Rectified Linear Unit (ReLU) is a form of activation function commonly used in deep learning models. The function returns 0 if it receives a negative input; if it receives a positive value, it returns that same value.

Linear Classification refers to categorizing a set of data points into discrete classes based on a linear combination of their explanatory variables.

Non-Linear Classification refers to categorizing instances that are not linearly separable: it is not possible to separate such data with a straight line. The XOR problem is the classic example of data that no single straight line can separate.

The linear transfer function calculates the neuron's output by simply returning the value
passed to it.

This neuron can be trained to learn an affine function of its inputs, or to find a linear
approximation to a nonlinear function.

A linear network cannot be made to perform a nonlinear computation.


Elements of a Neural Network

Input Layer: This layer accepts the input features. It provides information from the outside world to the network; no computation is performed at this layer, and the nodes simply pass the information (features) on to the hidden layer.

Hidden Layer: Nodes of this layer are not exposed to the outer world; they are part of the abstraction provided by any neural network. The hidden layer performs all sorts of computation on the features entered through the input layer and transfers the result to the output layer.

Output Layer: This layer brings the information learned by the network up to the outer world.
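To make these layer roles concrete, here is a minimal sketch of a single forward pass in NumPy (the layer sizes, the random weights, and the use of ReLU in the hidden layer are illustrative choices, not something fixed by the text above):

import numpy as np

# Hypothetical example: 2 input features, 3 hidden neurons, 1 output neuron
x = np.array([0.5, -1.2])                        # input layer just passes the features on

W_hidden = np.random.randn(3, 2)                 # hidden-layer weights
b_hidden = np.zeros(3)                           # hidden-layer biases
hidden = np.maximum(0, W_hidden @ x + b_hidden)  # hidden layer computes on the features (ReLU)

W_out = np.random.randn(1, 3)                    # output-layer weights
b_out = np.zeros(1)                              # output-layer bias
output = W_out @ hidden + b_out                  # output layer exposes the learned result
print(output)
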
What is an activation function and why use one?

The activation function decides whether a neuron should be activated or not by calculating the weighted sum of its inputs and adding a bias to it. The purpose of the activation function is to introduce non-linearity into the output of a neuron.
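For a single neuron this looks roughly as follows (a sketch; the weight, bias and input values are arbitrary, and the sigmoid is used only as an example activation):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.4, -0.7, 0.2])   # weights
x = np.array([1.0, 0.5, -1.5])   # inputs
b = 0.1                          # bias

z = np.dot(w, x) + b             # weighted sum plus bias
a = sigmoid(z)                   # activation applied to the result
print(z, a)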

We know that a neural network has neurons that work in correspondence with weights, biases, and their respective activation functions.

In a neural network, we update the weights and biases of the neurons on the basis of the error at the output. This process is known as backpropagation.

Activation functions make backpropagation possible, since the gradients are supplied along with the error to update the weights and biases.
Why do we need a non-linear activation function?
A neural network without an activation function is essentially just a linear regression
model.

The activation function performs a non-linear transformation of the input, making the network capable of learning and performing more complex tasks.
Mathematical proof

Elements of the diagram are as follows:


Hidden layer i.e. layer 1:
z(1) = W(1)X + b(1)
a(1) = z(1)
Here,
z(1) is the vectorized output of layer 1,
W(1) is the vectorized weights assigned to the neurons of the hidden layer, i.e. w1, w2, w3 and w4,
X is the vectorized input features, i.e. i1 and i2,
b(1) is the vectorized bias assigned to the neurons of the hidden layer, i.e. b1 and b2,
a(1) is the vectorized form of the linear output.
(Note: we are not considering an activation function here.)

Layer 2 i.e. output layer:
(Note: the input for layer 2 is the output from layer 1.)
z(2) = W(2)a(1) + b(2)
a(2) = z(2)
Calculation at Output layer
z(2) = (W(2) * [W(1)X + b(1)]) + b(2)
z(2) = [W(2) * W(1)] * X + [W(2)*b(1) + b(2)]
Let,
[W(2) * W(1)] = W
[W(2)*b(1) + b(2)] = b
Final output : z(2) = W*X + b
which is again a linear function
The output is thus again a linear function even after applying a hidden layer. Hence we can conclude that no matter how many hidden layers we attach to the neural net, all layers will behave the same way, because the composition of two linear functions is itself a linear function.
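A quick numerical check of this collapse (a sketch; the weight and bias values are arbitrary):

import numpy as np

W1 = np.array([[0.2, -0.5], [0.7, 0.3]])   # hidden-layer weights
b1 = np.array([0.1, -0.2])                 # hidden-layer biases
W2 = np.array([[1.5, -0.4]])               # output-layer weights
b2 = np.array([0.05])                      # output-layer bias
x = np.array([2.0, -1.0])

# Two stacked linear layers (no activation function)
z2 = W2 @ (W1 @ x + b1) + b2

# One equivalent single linear layer: W = W2*W1, b = W2*b1 + b2
W = W2 @ W1
b = W2 @ b1 + b2
z_single = W @ x + b

print(np.allclose(z2, z_single))   # True: the two layers collapse into one linear function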

A neuron cannot learn with just a linear function attached to it. A non-linear activation function lets it learn according to the difference with respect to the error. Hence we need a non-linear activation function.
Variants of Activation Function

Linear Function
Equation: a linear function has an equation similar to that of a straight line, i.e. y = x.
No matter how many layers we have, if all of them are linear in nature, the final activation function of the last layer is nothing but a linear function of the input of the first layer.
Range: -inf to +inf

Uses: the linear activation function is used in just one place, the output layer.

For example: calculating the price of a house is a regression problem. A house price may take any large or small value, so we can apply a linear activation at the output layer. Even in this case, the neural net must have a non-linear function in its hidden layers.
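A minimal sketch of the linear (identity) activation and its constant gradient:

def linear(x):
    return x            # identity: output equals input, range (-inf, +inf)

def linear_derivative(x):
    return 1.0          # the gradient is constant, regardless of x
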
Sigmoid Function
It is a function which is plotted as an 'S'-shaped graph.
Equation: A = 1/(1 + e^(-x))
Nature: non-linear. Notice that for x values between -2 and 2, the curve is very steep, which means small changes in x bring about large changes in the value of y.
Value Range: 0 to 1
Uses: usually used in the output layer of a binary classification, where the result is either 0 or 1. Since the value of the sigmoid function lies between 0 and 1 only, the result can easily be predicted to be 1 if the value is greater than 0.5 and 0 otherwise.
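In code, a sketch of the sigmoid and the usual 0.5 decision threshold:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Values are squashed into (0, 1)
print(sigmoid(np.array([-4.0, 0.0, 4.0])))   # approx [0.018, 0.5, 0.982]
print(sigmoid(2.0) > 0.5)                    # True, so this input is classified as 1
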
Tanh Function

The activation that almost always works better than the sigmoid function is the tanh function, also known as the hyperbolic tangent function. It is mathematically a shifted and rescaled version of the sigmoid function; the two are similar and can be derived from each other.

Value Range: -1 to +1
Nature: non-linear
Uses: usually used in the hidden layers of a neural network, as its values lie between -1 and 1, so the mean of the hidden layer's outputs comes out to be 0 or very close to it. This helps centre the data by bringing the mean close to 0, which makes learning for the next layer much easier.
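A sketch verifying that tanh is a rescaled, shifted sigmoid, via the identity tanh(x) = 2*sigmoid(2x) - 1, and that its outputs are roughly zero-centred:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-3, 3, 7)
tanh_direct = np.tanh(x)
tanh_from_sigmoid = 2 * sigmoid(2 * x) - 1   # shifted and rescaled sigmoid

print(np.allclose(tanh_direct, tanh_from_sigmoid))   # True
print(tanh_direct.mean())                            # close to 0: outputs are zero-centred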
RELU Function

It stands for Rectified Linear Unit. It is the most widely used activation function, mainly implemented in the hidden layers of a neural network.
Equation: A(x) = max(0, x). It gives an output of x if x is positive and 0 otherwise.
Value Range: [0, inf)

Nature: non-linear, which means we can easily backpropagate the errors and have multiple layers of neurons being activated by the ReLU function.

Uses: ReLU is less computationally expensive than tanh and sigmoid because it involves simpler mathematical operations. At any given time only a few neurons are activated, making the network sparse and therefore efficient and easy to compute.
In simple words, ReLU learns much faster than the sigmoid and tanh functions.

A rectified linear unit (ReLU) is an activation function that introduces the property of non-linearity to a deep learning model and helps mitigate the vanishing gradient issue. It returns the positive part of its argument and is one of the most popular activation functions in deep learning.
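In code, a sketch of ReLU and its gradient:

import numpy as np

def relu(x):
    return np.maximum(0, x)       # keeps the positive part, zeroes out the rest

def relu_derivative(x):
    return (x > 0).astype(float)  # gradient is 1 for positive inputs, 0 otherwise

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))              # [0.  0.  0.  0.5 2. ]
print(relu_derivative(x))   # [0. 0. 0. 1. 1.]
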
Stochastic (random) function
A stochastic (random) function X(t) is a many-valued numerical function of an independent argument t whose value, for any fixed value t ∈ T (where T is the domain of the argument), is a random variable, called a cut set.

In stochastic neural networks, instead of assigning deterministic values to each neuron, the algorithm assigns probabilities to each neuron.

A neuron fires only if it passes its threshold value.

Such a network is built by introducing random variation into the network, for example by giving it stochastic weights.
Stochastic modeling forecasts the probability of various outcomes under different
conditions, using random variables.

Stochastic modeling presents data and predicts outcomes that account for certain levels
of unpredictability or randomness.

With a fixed input, the output of a stochastic neural net is likely to be different (stochastic, or random to a certain extent) across multiple evaluations.

This is in contrast to deterministic neural networks, where for a fixed input the output is always the same (deterministic).
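A minimal sketch of such a stochastic neuron (mapping the weighted sum to a firing probability with a sigmoid is an illustrative choice): the same fixed input can produce different outputs across evaluations.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng()

def stochastic_neuron(x, w, b):
    p = sigmoid(np.dot(w, x) + b)   # firing probability instead of a deterministic value
    return int(rng.random() < p)    # fires (1) with probability p, stays silent (0) otherwise

w = np.array([0.8, -0.3])
b = 0.1
x = np.array([1.0, 2.0])

# Same fixed input, yet the outputs can differ from one evaluation to the next
print([stochastic_neuron(x, w, b) for _ in range(10)])
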
What is the capacity of a perceptron?
From an information theory point of view, a single perceptron with K inputs has a
capacity of 2K bits of information.

What is the capacity of a neural network?


Neural networks are defined at various levels of abstraction, modelling different aspects of neural systems. Accordingly, the network capacity refers to the level of abstraction, or the number of fundamental memories, or the number of patterns that can be stored in and recalled from the network.

What is the perceptron convergence procedure?


Perceptron Convergence Theorem: for any finite set of linearly separable labeled examples, the Perceptron Learning Algorithm will halt after a finite number of iterations. In other words, after a finite number of iterations the algorithm yields a weight vector w that classifies all the examples perfectly.
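A sketch of the perceptron learning rule the theorem refers to (the toy data set and the +1/-1 label convention are illustrative choices):

import numpy as np

def perceptron_train(X, y, epochs=100):
    # y holds labels +1 / -1; a bias term is folded in as an extra constant input of 1
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(Xb, y):
            if yi * np.dot(w, xi) <= 0:   # misclassified example
                w += yi * xi              # perceptron update rule
                errors += 1
        if errors == 0:                   # every example classified correctly: halt
            break
    return w

# Toy linearly separable data (the AND function with labels in +1/-1 form)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
print(perceptron_train(X, y))

If the examples were not linearly separable, the inner loop would never reach zero errors and the procedure would simply stop at the epoch limit.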
