UNIT1 Perceptron MLP
The classical perceptron was proposed by Frank Rosenblatt; Minsky and Papert later refined it into a
more general computational model than the McCulloch-Pitts neuron.
It overcomes some of the limitations of the M-P neuron by introducing numerical weights
(a measure of importance) for the inputs, and a mechanism for learning those weights.
Inputs are no longer limited to Boolean values as in the M-P neuron; real-valued inputs are
supported as well, which makes the perceptron more useful and more general.
Now, this is very similar to an M-P neuron, but we take a weighted sum of the inputs and set the output to 1
only when the sum is more than an arbitrary threshold (theta).
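As a rough sketch (the input, weight, and threshold values below are made up just for illustration), the perceptron's decision rule can be written directly in Python:

    import numpy as np

    def perceptron(x, w, theta):
        # Output 1 only when the weighted sum of the inputs reaches
        # the threshold theta; otherwise output 0.
        return 1 if np.dot(w, x) >= theta else 0

    x = np.array([0.5, 1.0, 0.2])    # real-valued inputs (illustrative)
    w = np.array([0.4, 0.9, -0.3])   # learned weights (illustrative)
    print(perceptron(x, w, theta=1.0))   # 1, since 0.2 + 0.9 - 0.06 = 1.04 >= 1.0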
EXAMPLE: Consider the task of predicting whether you would watch a random game of football on TV,
using the behavioral data available. Let's assume your decision depends solely on 3 binary inputs
(binary for simplicity).
Here, w_0 is called the bias because it represents the prior (prejudice).
A football freak may have a very low threshold and may watch any
football game irrespective of the league, club, or importance of the
game [theta = 0]. On the other hand, a selective viewer may only
watch a football game that is a Premier League game, features
Man United, and is not a friendly [theta = 2].
So, the weights and the bias will depend on the data.
Based on the data, the model may have to give a lot of
importance (a high weight) to the isManUnitedPlaying input and
penalize the weights of the other inputs.
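A hedged sketch of this example in Python (the input names other than isManUnitedPlaying, the weights, and the exact decisions are assumptions chosen only to illustrate how theta changes the behavior):

    def will_watch(is_premier_league, is_man_united_playing, is_friendly, theta):
        # Hypothetical weights: the league and Man United count towards watching,
        # a friendly game counts against it.
        score = 1 * is_premier_league + 1 * is_man_united_playing - 1 * is_friendly
        return 1 if score >= theta else 0

    # Football freak: theta = 0, watches more or less anything.
    print(will_watch(0, 0, 0, theta=0))   # 1
    # Selective viewer: theta = 2, needs a Premier League game with Man United
    # that is not a friendly.
    print(will_watch(1, 1, 0, theta=2))   # 1
    print(will_watch(1, 0, 0, theta=2))   # 0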
Perceptron vs McCulloch-Pitts Neuron
What kind of functions can be implemented using a perceptron? How different is it from McCulloch-Pitts
neurons?
Fig. Illustrating the need for an activation function in a complex problem. An activation function must be
efficient and should reduce computation time, because neural networks are sometimes trained on
millions of data points.
Types of AF:
Activation functions can be broadly divided into 3 types:
1. Binary step Activation Function
2. Linear Activation Function
3. Non-linear Activation Functions
1. Binary Step Function
A binary step function is a threshold-based activation
function.
If the input value is above the threshold, the neuron is activated
and sends exactly the same signal to the next layer; below the
threshold, it stays deactivated.
We choose a threshold value that decides whether the neuron
should be activated or deactivated.
It is very simple and useful for binary classification problems.
E.g. f(x) = 1 if x > 0, else 0 if x <= 0
2. Linear Activation Function
Equation: f(x) = x
Range: (-infinity to infinity)
It doesn't help with the complexity or the various parameters of
the usual data that is fed to neural networks.
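A short sketch of why this is a limitation (the weight matrices below are arbitrary): stacking layers that use the linear activation f(x) = x collapses into a single linear transformation, so extra layers add no expressive power.

    import numpy as np

    np.random.seed(0)
    W1 = np.random.randn(4, 3)   # layer 1 weights (arbitrary)
    W2 = np.random.randn(2, 4)   # layer 2 weights (arbitrary)
    x = np.random.randn(3)

    two_linear_layers = W2 @ (W1 @ x)   # two layers with linear activation
    one_linear_layer = (W2 @ W1) @ x    # exactly one equivalent linear layer
    print(np.allclose(two_linear_layers, one_linear_layer))   # True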
3. Non-linear Activation Functions
These are the most widely used activation functions.
Non-linearity is what allows the function's graph to curve rather
than be a straight line, so the network can model complex relationships.
For example, tanh is a scaled and shifted sigmoid:
tanh(x) = 2 * sigmoid(2x) - 1
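A minimal sketch checking this relationship numerically, where sigmoid is the standard logistic function 1 / (1 + e^(-x)):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    x = np.linspace(-5, 5, 11)
    # tanh is a rescaled and shifted sigmoid: tanh(x) = 2*sigmoid(2x) - 1
    print(np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1))   # True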
ReLU (Rectified Linear Unit) Activation Function
R(z) = max(0, z)
Advantages
Avoids vanishing gradient problem.
Computationally efficient—allows the network to converge
very quickly
Non-linear—although it looks like a linear function, ReLU
has a derivative function and allows for backpropagation
Disadvantages
Can only be used within the hidden layers of a network.
Networks using ReLU can be hard to train on small datasets and
need a lot of data to learn non-linear behavior.
The Dying ReLU problem: when inputs approach zero, or
are negative, the gradient of the function becomes zero,
so the network cannot perform backpropagation through those
units and cannot learn.
The function and its derivative are both monotonic.
All negative values are converted to zero immediately, so the
function cannot map or fit the negative part of the data properly,
which creates a problem.
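A brief sketch of ReLU and the dying-ReLU behavior described above, using the standard definition R(z) = max(0, z):

    import numpy as np

    def relu(z):
        return np.maximum(0.0, z)

    def relu_grad(z):
        # The gradient is 1 for positive inputs and 0 for negative inputs,
        # so units stuck in the negative region stop learning.
        return (z > 0).astype(float)

    z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    print(relu(z))        # [0.   0.   0.   0.5  2. ]
    print(relu_grad(z))   # [0. 0. 0. 1. 1.]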
Leaky ReLU Activation Function
The Leaky ReLU activation function was introduced to solve the
Dying ReLU problem.
With Leaky ReLU we do not force all negative inputs to zero;
instead we map them to a small value near zero, which addresses
the major issue of the ReLU activation function.
R(z) = max (0.1*z, z)
Advantages
Prevents dying ReLU problem—this variation of ReLU has a
small positive slope in the negative area, so it does enable
backpropagation, even for negative input values
Otherwise like ReLU
Disadvantages
Results not consistent—leaky ReLU does not provide
consistent predictions for negative input values.
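A matching sketch for Leaky ReLU, using the 0.1 negative-side slope given above:

    import numpy as np

    def leaky_relu(z, alpha=0.1):
        # Negative inputs are scaled by a small slope instead of being zeroed,
        # so their gradient no longer vanishes completely.
        return np.maximum(alpha * z, z)

    z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    print(leaky_relu(z))   # [-0.2  -0.05  0.    0.5   2.  ]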
Softmax:
sigma(z)_i = e^{z_i} / sum_{j=1..K} e^{z_j}
where:
sigma = the softmax function
z_i = the i-th element of the input vector z
e^{z_i} = standard exponential function applied to the input element
K = number of classes in the multi-class classifier
e^{z_j} = standard exponential function applied to each output element (summed in the denominator)
Advantages
Able to handle multiple classes, whereas other activation
functions handle only one class: it normalizes the output for each
class to between 0 and 1, with the sum of the probabilities
equal to 1, by dividing each exponential by their sum, giving the
probability of the input belonging to a specific class.
Useful for output neurons: typically softmax is used
only for the output layer, in neural networks that need
to classify inputs into multiple categories.
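A small sketch of softmax following the formula above (the logits are arbitrary example values):

    import numpy as np

    def softmax(z):
        # Subtracting the max is a standard numerical-stability trick;
        # it does not change the result.
        exp_z = np.exp(z - np.max(z))
        return exp_z / exp_z.sum()

    logits = np.array([2.0, 1.0, 0.1])   # arbitrary scores for K = 3 classes
    probs = softmax(logits)
    print(probs)         # approximately [0.659 0.242 0.099]
    print(probs.sum())   # 1.0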