[Figure: biological-to-artificial neuron analogy - Dendrites = Inputs, Synapses = Weights, Axon = Output]
Biological Neural Networks
● The biological neuron is a nerve cell that provides the fundamental functional unit for
the nervous systems of all animals.
● Neurons exist to communicate with one another, and pass electro-chemical impulses
across synapses, from one cell to the next, as long as the impulse is strong enough to
activate the release of chemicals across a synaptic cleft. The strength of the impulse must
surpass a minimum threshold or chemicals will not be released.
● The neuron consists of a soma (cell body) with many dendrites but only one axon. The single
axon can branch hundreds of times.
Biological Neural Networks
● Dendrites are thin structures that arise from the main cell body. Dendrites allow the cell to receive
signals from connected neighboring neurons and each dendrite is able to perform multiplication by
that dendrite’s weight value. Here multiplication means an increase or decrease in the ratio of
synaptic neurotransmitters to signal chemicals introduced into the dendrite.
● Axons are nerve fibers, a special cellular extension that comes from the cell body. They are the
single, long fibers extending from the main soma; they stretch out over longer distances than
dendrites and are generally about 1 centimeter in length.
● Synapses are the connecting junction between axon and dendrites. The majority of synapses send
signals from the axon of a neuron to the dendrite of another neuron.
Artificial Neural Networks
The term "Artificial neural network" refers to a biologically inspired sub-field of artificial
intelligence modeled after the brain. An Artificial neural network is usually a computational
network based on biological neural networks that construct the structure of the human brain.
Similar to a human brain has neurons interconnected to each other, artificial neural networks
also have neurons that are linked to each other in various layers of the networks. These neurons
are known as nodes.
The term "Artificial Neural Network" is derived from Biological neural networks that develop
the structure of a human brain. Similar to the human brain that has neurons interconnected to
one another, artificial neural networks also have neurons that are interconnected to one another
in various layers of the networks. These neurons are known as nodes
Perceptron
● The perceptron was introduced by Frank Rosenblatt in 1957. He proposed a perceptron learning rule based on the original
McCulloch-Pitts (MCP) neuron. A perceptron is an algorithm for supervised learning of binary classifiers; it enables the neuron
to learn by processing elements of the training set one at a time.
● The perceptron is a linear-model binary classifier with a simple input–output relationship: we sum the n inputs multiplied
by their associated weights and then send this "net input" to a step function with a defined threshold.
● Step 1: Multiply each input by its weight, sum the results, and add a term called bias 'b' to this weighted sum to improve the model's performance.
● Step 2: An activation function is applied to the above-mentioned weighted sum, giving us an output
either in binary form or as a continuous value, as follows: Y = f(∑ wi*xi + b)
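The two steps above can be sketched directly in code. Below is a minimal NumPy illustration; the inputs, weights, bias, and threshold are made-up values chosen only for the example, not part of the original material.

```python
# Minimal sketch of a single perceptron's forward pass (illustrative values).
import numpy as np

def step(z, threshold=0.0):
    """Step activation: fire (1) if the net input exceeds the threshold, else 0."""
    return 1 if z > threshold else 0

def perceptron_output(x, w, b):
    """Y = f(sum(w_i * x_i) + b) with a step activation f."""
    net_input = np.dot(w, x) + b      # weighted sum of inputs plus bias
    return step(net_input)

# Example: two inputs with hand-picked weights and bias
x = np.array([1.0, 0.5])
w = np.array([0.4, -0.2])
b = 0.1
print(perceptron_output(x, w, b))     # -> 1, since 0.4 - 0.1 + 0.1 = 0.4 > 0
```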
Types of Perceptron Models
● Single Layer Perceptron model:
■ One of the simplest ANN (Artificial Neural Network) types; it consists of a feed-forward network
and includes a threshold transfer function inside the model.
■ The main objective of the single-layer perceptron model is to analyze the linearly separable
objects with binary outcomes. A Single-layer perceptron can learn only linearly separable
patterns.
● Multi-Layered Perceptron model:
■ It is similar to a single-layer perceptron model but has one or more hidden layers.
Forward Stage: Activations flow from the input layer through the hidden layers and terminate at the output
layer.
Backward Stage: In the backward stage, weight and bias values are modified per the model's requirement.
The error between the actual output and the desired output is propagated backward, starting at the
output layer. A multilayer perceptron model has greater processing power and can process both linear and
non-linear patterns. Further, it can also implement logic gates such as AND, OR, XOR, XNOR, and NOR (as shown in the sketch below).
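As a rough illustration of the forward and backward stages, the sketch below trains a small multi-layer perceptron on the XOR gate mentioned above, which a single-layer perceptron cannot learn. It assumes scikit-learn is available; the hidden-layer size, solver, and random_state are illustrative choices, and convergence can vary with the seed.

```python
# Sketch: an MLP learning the non-linearly separable XOR function.
import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])  # XOR truth table

mlp = MLPClassifier(hidden_layer_sizes=(8,), activation='tanh',
                    solver='lbfgs', max_iter=2000, random_state=0)
mlp.fit(X, y)          # forward stage + backward stage (backpropagation) happen inside fit()
print(mlp.predict(X))  # expected: [0 1 1 0] when training converges
```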
Advantages and Disadvantages of Perceptron
Advantages:
Disadvantages:
● It can only be used to classify linearly separable sets of input vectors. If the input
vectors are not linearly separable, it cannot classify them correctly.
Perceptron Learning Rule
The perceptron learning rule states that the algorithm will automatically learn the optimal weight coefficients. The input
features are then multiplied by these weights to determine whether a neuron fires or not.
The Perceptron receives multiple input signals, and if the sum of the input signals exceeds a certain threshold, it either
outputs a signal or does not return an output. In the context of supervised learning and classification, this can then be used
to predict the class of a sample.
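A minimal sketch of the learning rule is shown below, applied to the linearly separable AND gate. The learning rate and epoch count are illustrative choices, not values from the original material.

```python
# Sketch of the perceptron learning rule on the AND gate (linearly separable).
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])          # AND gate targets

w = np.zeros(2)                     # weights start at zero
b = 0.0                             # bias term
lr = 0.1                            # learning rate

for epoch in range(10):
    for xi, target in zip(X, y):
        prediction = 1 if np.dot(w, xi) + b > 0 else 0
        error = target - prediction
        w += lr * error * xi        # update rule: w <- w + lr * (y - y_hat) * x
        b += lr * error

print(w, b)                                              # learned weights and bias
print([1 if np.dot(w, xi) + b > 0 else 0 for xi in X])   # expected: [0, 0, 0, 1]
```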
Perceptron Function
The perceptron is a function that maps its input "x", multiplied by the learned weight coefficients, to an output value
"f(x)".
The process of moving from right to left, i.e. backward from the output layer to the input layer, is called
backward propagation.
Activation Function
● We use activation functions to propagate the output of one layer's nodes forward to the
next layer (up to and including the output layer). An activation function is a
scalar-to-scalar function that yields the neuron's activation. We use activation functions for
hidden neurons in a neural network to introduce nonlinearity into the network's modeling
capabilities.
● Many activation functions belong to a logistic class of transforms that (when graphed)
resemble an S.
Activation Function
● We use activation functions for hidden neurons in a neural network to introduce
nonlinearity into the network’s modeling capabilities.
● Linear
● Step
● Sigmoid
● Hyperbolic
● Rectified Linear
● Softmax
a) Linear Activation Function
● A linear transform is basically the identity function, f(x) = x, where the dependent variable has a direct,
proportional relationship with the independent variable.
In practical terms, it means the function passes the signal
through unchanged.
● Range : -inf to +inf
● For example : calculating the price of a house is a regression
problem. A house price may take any large or small value, so we
can apply a linear activation at the output layer. Even in this case,
the neural network must have non-linear functions at the hidden
layers.
b) Non Linear Activation Function
● The nonlinear activation functions are the most widely used
activation functions. Nonlinearity lets the network model
relationships that a straight line cannot capture.
● It makes it easy for the model to generalize or adapt to a
variety of data and to differentiate between the outputs.
● The nonlinear activation functions are mainly divided
on the basis of their range or curves:
1. Sigmoid or Logistic Activation Function
2. Tanh or hyperbolic tangent Activation Function
3. ReLU (Rectified Linear Unit) Activation Function
4. Leaky ReLU
1.Sigmoid Activation Function
● It is a function which is plotted as an 'S'-shaped graph.
● Equation : A = 1 / (1 + e^(-x))
● Nature : Non-linear. For x values between -2 and 2, the curve is very
steep, so small changes in x bring about large changes in
the value of Y.
● Value Range : 0 to 1
● Uses : Usually used in the output layer of a binary classifier, where the result is
either 0 or 1. Since the value of the sigmoid function lies between 0 and 1 only, the result
can easily be predicted as 1 if the value is greater than 0.5 and as 0 otherwise.
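A small NumPy sketch of the sigmoid function and the 0.5 thresholding described above; the input values are made up.

```python
import numpy as np

def sigmoid(x):
    """Sigmoid: squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-5.0, -2.0, 0.0, 2.0, 5.0])
print(sigmoid(x))                       # roughly [0.007, 0.119, 0.5, 0.881, 0.993]
print((sigmoid(x) > 0.5).astype(int))   # thresholding at 0.5 gives a binary prediction
```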
2. Tanh Activation Function
The activation that almost always works better than the
sigmoid function is the tanh function, also known as the
hyperbolic tangent function. It is actually a mathematically
shifted version of the sigmoid function; the two are similar
and can be derived from each other.
● Value Range :- -1 to +1
● Nature :- non-linear
● Uses :- Usually used in the hidden layers of a neural
network, as its values lie between -1 and 1; the mean of the
hidden-layer outputs therefore comes out to be 0 or very close
to it, which helps center the data and makes learning
for the next layer much easier.
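A small NumPy sketch of tanh, which also checks the claim above that tanh can be derived from the sigmoid (tanh(x) = 2*sigmoid(2x) - 1).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-3, 3, 7)
print(np.tanh(x))                                        # values lie in (-1, 1), centered around 0
print(np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1))   # True: tanh is a shifted, rescaled sigmoid
```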
3. ReLU(Rectified linear unit) Activation Function
● It is the most widely used activation function, chiefly implemented in
the hidden layers of a neural network.
● Equation :- A(x) = max(0,x).
● Value Range :- [0, inf)
● Nature :- non-linear, which means we can easily backpropagate
the errors and have multiple layers of neurons being activated by
the ReLU function.
● In simple words, ReLU learns much faster than the sigmoid and tanh functions.
● Uses :- ReLU is less computationally expensive than tanh and
sigmoid because it involves simpler mathematical operations. At
a time only a few neurons are activated, making the network
sparse, efficient, and easy to compute.
● The formula is deceptively simple: max(0, z). Despite the name, the rectified linear unit is not
a linear function overall, and it provides the same benefits as sigmoid but with better performance.
(i) Leaky ReLU
Leaky Relu is a variant of ReLU. Instead of being 0 when z<0, a leaky ReLU allows a small,
non-zero, constant gradient α (normally, α=0.01). However, the consistency of the
benefit across tasks is presently unclear. Leaky ReLUs attempt to fix the “dying ReLU”
problem.
(ii) Parametric ReLU (PReLU)
PReLU gives the neurons the ability to choose what slope is best in the negative region.
They can become ReLU or leaky ReLU with certain values of α.
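The three variants can be sketched in a few lines of NumPy. The alpha values below are illustrative; in a real PReLU layer, alpha is a parameter learned during training.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)                 # max(0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)    # small non-zero slope for x < 0

def prelu(x, alpha):
    return np.where(x > 0, x, alpha * x)    # same form, but alpha is learned, not fixed

x = np.array([-3.0, -1.0, 0.0, 2.0])
print(relu(x))                # [0. 0. 0. 2.]
print(leaky_relu(x))          # [-0.03 -0.01  0.    2.  ]
print(prelu(x, alpha=0.25))   # [-0.75 -0.25  0.    2.  ]
```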
4) Softmax Activation Function
The softmax function is also a type of sigmoid function, but it is handy when we are
trying to handle multi-class classification problems.
● Nature :- non-linear
● Uses :- Usually used when handling multiple classes; the softmax function is commonly
found in the output layer of image-classification problems. The softmax function
squeezes the output for each class between 0 and 1 and divides by the sum of the outputs.
● Output :- The softmax function is ideally used in the output layer of the
classifier, where we are actually trying to obtain the probabilities that define the
class of each input.
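A small NumPy sketch of softmax; the logits below are made-up scores such as might come from the last layer of a classifier.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()          # divide by the sum so the outputs add up to 1

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs)             # roughly [0.659, 0.242, 0.099]
print(probs.sum())       # 1.0
print(np.argmax(probs))  # predicted class index: 0
```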
Note: Activation Function
● The basic rule of thumb is: if you really don't know which activation function to use, simply use ReLU,
as it is a general-purpose activation function for hidden layers and is used in most cases these days.
● If your output is for binary classification, then the sigmoid function is a very natural choice for the output layer.
● If your output is for multi-class classification, then softmax is very useful for predicting the probability of
each class.
Loss Function
Loss Functions:
● Loss Function Notation
● Loss Functions for Regression
● Loss Functions for Classification
● Loss Functions for Reconstruction
Loss Function
● Loss functions quantify how close a given neural network is to the ideal toward which it is
training. The idea is simple.
● We calculate a metric based on the error we observe in the network’s predictions.
● We then aggregate these errors over the entire dataset and average them
● Now we have a single number representative of how close the neural network is to its ideal.
● Looking for this ideal state is equivalent to finding the parameters (weights and biases) that
will minimize the “loss” incurred from the errors.
● In this way, loss functions help reframe training neural networks as an optimization problem.
In most cases, these parameters cannot be solved for analytically, but, more often than not,
they can be approximated well with iterative optimization algorithms like gradient descent
Loss Function
A loss function compares the target and predicted output values; it measures
how well the neural network models the training data. When training, we aim to minimize this loss:
the model's parameters are adjusted so that the average loss is minimized, i.e. we find the weights and
biases that minimize the average loss over the training set.
Loss Functions for Regression
● Used in regression neural networks: given an input value, the model predicts a corresponding continuous output value.
● Estimating the price of a house or predicting stock prices are examples of regression, because one works
with continuous outputs.
b) Mean Absolute Error (MAE)
This loss function is used as an alternative to MSE in some cases. As mentioned previously,
MSE is highly sensitive to outliers, which can dramatically affect the loss because the distance
is squared. MAE is used when the training data has a large number of outliers, to
mitigate this.
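A minimal NumPy sketch of MSE and MAE on made-up regression targets, showing how squaring amplifies larger errors.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

mse = np.mean((y_true - y_pred) ** 2)    # squared errors punish outliers heavily
mae = np.mean(np.abs(y_true - y_pred))   # absolute errors are more robust to outliers

print(mse)   # 0.875
print(mae)   # 0.75
```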
c) Mean Bias Error
Mean Bias Error is used to calculate the average bias in the model. Bias, in a nutshell, is
overestimating or underestimating a parameter. Corrective measures can be taken to reduce the
bias post-evaluating the model using MBE.
Mean Bias Error takes the actual difference between the target and the predicted value, and not
the absolute difference. One has to be cautious as the positive and the negative errors could
cancel each other out, which is why it is one of the lesser-used loss functions.
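A small sketch of Mean Bias Error on made-up values, showing how positive and negative errors can cancel out.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.0, 6.0, 1.5, 8.0])

mbe = np.mean(y_true - y_pred)   # signed errors: +1, -1, +1, -1
print(mbe)                       # 0.0 even though every prediction is off by 1
```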
Loss Functions for Classification
● Used in classification problems: given an input, the neural network produces a vector of probabilities of the input
belonging to various pre-set categories; we can then select the category with the highest probability.
● For example, a mail can be classified as spam or not spam, and a person's dietary preferences can be put in one of
several pre-set categories.
a) Binary Cross Entropy Loss
It is intended for use with binary classification where the target values are in the set {0, 1}.
Mathematically, it is the preferred loss function under the inference framework of maximum likelihood. It is the loss
function to be evaluated first and only changed if you have a good reason.
Cross-entropy will calculate a score that summarizes the average difference between the actual and predicted
probability distributions for predicting class 1. The score is minimized and a perfect cross-entropy value is 0
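A minimal NumPy sketch of binary cross-entropy on made-up labels and predicted probabilities; the clipping constant is only there to avoid log(0).

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    y_prob = np.clip(y_prob, eps, 1 - eps)   # avoid log(0)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

y_true = np.array([1, 0, 1, 1])
good   = np.array([0.9, 0.1, 0.8, 0.95])     # confident, correct predictions -> small loss
bad    = np.array([0.2, 0.8, 0.3, 0.4])      # wrong predictions -> large loss

print(binary_cross_entropy(y_true, good))    # ~0.12
print(binary_cross_entropy(y_true, bad))     # ~1.33
```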
b) Categorical Cross Entropy Loss
Categorical Cross Entropy loss is essentially Binary Cross Entropy Loss expanded to multiple
classes. One requirement when categorical cross entropy loss function is used is that the labels
should be one-hot encoded.
This way, only one element will be non-zero, as the other elements in the vector are multiplied by zero.
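A small NumPy sketch of categorical cross-entropy with one-hot encoded labels; the label and probability arrays are made-up examples.

```python
import numpy as np

def categorical_cross_entropy(y_true_onehot, y_prob, eps=1e-12):
    y_prob = np.clip(y_prob, eps, 1.0)
    # only the log-probability of the true class survives the multiplication by the one-hot labels
    return -np.mean(np.sum(y_true_onehot * np.log(y_prob), axis=1))

y_true = np.array([[0, 1, 0],        # sample 1 belongs to class 1
                   [1, 0, 0]])       # sample 2 belongs to class 0
y_prob = np.array([[0.1, 0.8, 0.1],  # predicted probability distributions (rows sum to 1)
                   [0.7, 0.2, 0.1]])

print(categorical_cross_entropy(y_true, y_prob))   # ~0.29
```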
c) Hinge Loss
It is intended for use with binary classification where the target values are in the set {-1, 1}.
The hinge loss function encourages examples to have the correct sign, assigning more error when
there is a difference in the sign between the actual and predicted class values.
Reports of performance with the hinge loss are mixed, sometimes resulting in better performance
than cross-entropy on binary classification problems.
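A small NumPy sketch of the hinge loss with targets in {-1, +1}; the predictions are made-up raw model outputs.

```python
import numpy as np

def hinge_loss(y_true, y_pred):
    # zero loss when the prediction has the correct sign and a margin of at least 1
    return np.mean(np.maximum(0.0, 1.0 - y_true * y_pred))

y_true = np.array([+1, -1, +1, -1])
y_pred = np.array([0.8, -1.2, -0.5, 0.3])   # raw (unsquashed) model outputs

print(hinge_loss(y_true, y_pred))           # (0.2 + 0 + 1.5 + 1.3) / 4 = 0.75
```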
Loss Functions for Reconstruction
The loss function is used to optimize model performance; it tells us the prediction
performance of the model. In the present context, reconstruction loss is a measure of
how well the reconstructed images produced by the proposed autoencoder model match the original inputs.
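As a rough sketch, reconstruction loss can be computed as the mean squared error between an input and its reconstruction; the arrays below are made-up stand-ins for a flattened image and a hypothetical autoencoder's output.

```python
import numpy as np

original      = np.array([0.2, 0.8, 0.5, 0.1])    # e.g. flattened pixel intensities
reconstructed = np.array([0.25, 0.7, 0.55, 0.1])  # output of the decoder

reconstruction_loss = np.mean((original - reconstructed) ** 2)
print(reconstruction_loss)    # small value -> good reconstruction
```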
Summary: loss functions for regression include Mean Squared Error, Mean Absolute Error, and Huber Loss.
Model Parameters vs. Model Hyperparameters
● Model parameters are features of the training data that the model learns on its own during training; model hyperparameters are the settings that determine the entire training process.
● Parameters are internal to the model and their values can be estimated from the data; hyperparameters are external to the model and their values cannot be estimated from the data.