Fundamentals of Neural Network
02
Forward propagation
• Forward propagation in deep learning refers to the process of passing input
data through the neural network to get the output or prediction.
• It involves a series of computations, where the input data is transformed as
it passes through the layers of the network.
• This process of passing the input forward through the network, involving
weighted sums, biases, and activation functions, is forward propagation.
• The network learns the optimal weights and biases during the training phase
to make accurate predictions.
Forward propagation
Simple Analogy
• Preparing a recipe (making predictions)
• Ingredients (Input)
• Ingredients importance or preference (Weights and Biases)
• Mixing Ingredients (Weighted Sums)
• Taste recipe (Activation Function)
• Final Dish as desired (Output; e.g., 0/1). A small code sketch of this forward pass follows below.
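Below is a minimal sketch of a forward pass for a small fully connected network using NumPy. The layer sizes, random weights, and the choice of sigmoid as the activation are illustrative assumptions, not values from the slides.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real value into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagation(x, W1, b1, W2, b2):
    # Hidden layer: weighted sum + bias, then activation
    z1 = W1 @ x + b1
    a1 = sigmoid(z1)
    # Output layer: weighted sum + bias, then activation
    z2 = W2 @ a1 + b2
    return sigmoid(z2)

# Illustrative shapes: 3 inputs -> 4 hidden units -> 1 output
rng = np.random.default_rng(0)
x  = rng.normal(size=3)
W1 = rng.normal(size=(4, 3)); b1 = np.zeros(4)
W2 = rng.normal(size=(1, 4)); b2 = np.zeros(1)
print(forward_propagation(x, W1, b1, W2, b2))  # prediction in (0, 1)
```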
Loss Function
• In deep learning, a loss function is a measure of how well a model's
predictions match the actual target values.
• The goal during the training of a model is to minimize this loss function.
• It quantifies the difference between predicted values and true values,
providing a way to assess how well the model is performing.
• Different types of problems (classification, regression, etc.) and algorithms
use different loss functions.
Loss Function
[Figure: common loss functions include Mean Squared Error and Mean Absolute Error loss]
Loss Function
Squared Error Loss (Mean Squared Error - MSE):
• Use Case: Typically used for regression problems, where the goal is to
predict a continuous variable.
• Calculation: It calculates the average of the squared differences between
the predicted and actual values.
• Formula: $\text{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$, where $y_i$ is the
actual value, $\hat{y}_i$ is the predicted value, and $N$ is the number of samples.
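A minimal NumPy sketch of this formula; the arrays below are made-up example values.

```python
import numpy as np

def mse(y_true, y_pred):
    # Average of the squared differences between actual and predicted values
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])
print(mse(y_true, y_pred))  # 0.375
```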
Cross Entropy Loss
(Binary Cross Entropy and Categorical Cross Entropy)
Binary Cross Entropy Loss:
• Use Case: Commonly used for classification problems.
• Calculation: For binary classification problems (two classes).
• Formula: $L = -\frac{1}{N}\sum_{i=1}^{N}\left[\,y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)\,\right]$,
where $y_i \in \{0, 1\}$ is the true label and $\hat{y}_i$ is the predicted probability.
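A minimal NumPy sketch of binary cross entropy; the labels and predicted probabilities are made-up example values, and clipping is a common (assumed) safeguard against log(0).

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Clip predictions away from 0 and 1 to avoid log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.1, 0.8, 0.6])
print(binary_cross_entropy(y_true, y_pred))  # ~0.236
```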
Cross Entropy Loss
(Binary Cross Entropy and Categorical Cross Entropy)
Categorical Cross Entropy Loss:
• Use Case: Commonly used for classification problems.
• Calculation: For multi-class classification problems (more than two
classes).
• Formula: $L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i,c}\,\log(\hat{y}_{i,c})$, where $y_{i,c}$ is 1 if
sample $i$ belongs to class $c$ (one-hot encoding) and $\hat{y}_{i,c}$ is the predicted probability of class $c$.
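A minimal NumPy sketch of categorical cross entropy with one-hot labels; the values are made-up examples.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    # y_true: one-hot labels, y_pred: predicted class probabilities (rows sum to 1)
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

y_true = np.array([[1, 0, 0],
                   [0, 0, 1]])
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6]])
print(categorical_cross_entropy(y_true, y_pred))  # ~0.434
```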
What is an activation function?
• The activation function decides whether a neuron should be activated or not
by calculating the weighted sum and further adding bias to it. The purpose
of the activation function is to introduce non-linearity into the output of a
neuron.
• In artificial neural networks, an activation function outputs a small value for
small inputs and a larger value once its inputs exceed a threshold. An activation
function "fires" if the inputs are big enough; otherwise, nothing happens.
• An activation function, then, acts as a gate that checks whether an incoming
value is higher than a threshold value.
• The activation function is a fundamental component of neural networks that
introduces non-linearity, enabling them to learn complex relationships,
adapt to various data patterns, and make sophisticated decisions.
Why is there a need for an activation function?
Introducing Non-linearity:
• Without activation functions, the entire neural network would behave like a linear
model.
• The stacking of multiple linear operations would result in just another linear
combination, limiting the network's ability to learn and represent complex,
non-linear patterns in the data (see the small sketch after this list).
Facilitating Backpropagation:
• Activation functions provide derivatives or gradients that are essential for the
backpropagation algorithm, which is used to update the weights of the network
during training.
• This enables the network to learn and improve its performance over time.
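A small sketch of the first point above: without an activation function in between, two stacked linear layers collapse into a single equivalent linear layer. The matrices and vectors are arbitrary example values.

```python
import numpy as np

rng = np.random.default_rng(1)
x  = rng.normal(size=3)
W1 = rng.normal(size=(4, 3)); b1 = rng.normal(size=4)
W2 = rng.normal(size=(2, 4)); b2 = rng.normal(size=2)

# Two linear layers with no activation in between...
two_layers = W2 @ (W1 @ x + b1) + b2

# ...are exactly equivalent to one linear layer
W = W2 @ W1
b = W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layers, one_layer))  # True
```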
Types of activation function
In a perceptron or a neural network, activation functions play a crucial role by
introducing non-linearity to the model.
Here are some common types of activation functions used in perceptrons:
1. Linear Activation Function:
Description: The linear (identity) activation function outputs its input unchanged,
so the output is directly proportional to the input.
Mathematical Form: $f(x) = x$
Advantages:
Simplicity and Ease of Interpretation:
• Direct proportionality between input and output.
• Straightforward interpretation.
Compatibility with Linear Models:
• Well-suited for tasks with linear relationships.
Types of activation function
Disadvantages:
Limited Expressiveness:
• Inability to model complex, non-linear relationships.
• Stacking linear layers results in a linear model.
Vanishing Gradient Problem:
• Prone to vanishing gradients, especially in deep networks.
• May lead to slow learning.
Not Suitable for Classification Problems:
• Challenging for binary classification tasks.
• Output not squashed into a specific range.
Not Used in Hidden Layers of Deep Networks:
• Rarely used in deep networks' hidden layers.
• Non-linear activations preferred.
Types of activation function
2. Logistic Activation Function:
It is also commonly referred to as the Sigmoid Activation Function.
Description: The sigmoid (logistic) function squashes input values to the range
(0, 1). It is commonly used in the output layer of binary classification models.
Mathematical Form: $\sigma(x) = \dfrac{1}{1 + e^{-x}}$
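A one-line NumPy sketch of the sigmoid function applied to a few example inputs.

```python
import numpy as np

def sigmoid(x):
    # Maps any real input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-2.0, 0.0, 2.0])))  # [0.119 0.5 0.881]
```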
Types of activation function
3. Tanh (Hyperbolic Tangent) Activation Function:
Description: The activation that almost always works better than the sigmoid
function is the Tanh function, also known as the hyperbolic tangent function. It is
actually a mathematically shifted version of the sigmoid function; both are similar
and can be derived from each other.
Mathematical Form: $\tanh(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
Value Range: -1 to +1
Nature: non-linear
Uses: Usually used in the hidden layers of a neural network. Because its values lie
between -1 and +1, the mean of a hidden layer's outputs comes out to be 0 or very
close to it, which helps center the data by bringing the mean close to 0. This makes
learning for the next layer much easier.
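A short NumPy sketch showing the (-1, 1) range of tanh and its relationship to the sigmoid, tanh(x) = 2*sigmoid(2x) - 1; the inputs are arbitrary examples.

```python
import numpy as np

x = np.array([-2.0, 0.0, 2.0])

# tanh squashes input into (-1, 1) and is zero-centered
print(np.tanh(x))  # [-0.964  0.     0.964]

# tanh is a shifted, scaled sigmoid: tanh(x) = 2*sigmoid(2x) - 1
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
print(np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1))  # True
```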
Types of activation function
4. Softmax Function:
Description: Often used in the output layer of a neural network for multi-class
classification problems. It transforms the raw output scores (logits) into a
probability distribution over multiple classes. The Softmax function is
particularly useful when dealing with problems where an input can belong to
one of several exclusive classes.
Mathematical Form: $\text{softmax}(z_i) = \dfrac{e^{z_i}}{\sum_{j=1}^{C} e^{z_j}}$ for each class $i$, where $C$ is the number of classes.
Types of activation function
The softmax function is also a type of sigmoid function, but it is handy when we
are trying to handle multi-class classification problems.
Nature: non-linear
Uses: Usually used when handling multiple classes. The softmax function is
commonly found in the output layer of image classification problems. It squeezes
the output for each class between 0 and 1 and divides by the sum of the outputs,
so the results form a probability distribution over the classes.
If your output is for binary classification, the sigmoid function is a very natural
choice for the output layer.
If your output is for multi-class classification, Softmax is very useful for
predicting the probability of each class.
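A minimal NumPy sketch of the softmax function; subtracting the maximum logit is a common (assumed) trick for numerical stability, and the scores are arbitrary examples.

```python
import numpy as np

def softmax(logits):
    # Subtract the max logit for numerical stability, then normalize to probabilities
    z = logits - np.max(logits)
    exp_z = np.exp(z)
    return exp_z / np.sum(exp_z)

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs)        # [0.659 0.242 0.099]
print(probs.sum())  # 1.0
```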
Types of activation function
5. Rectified Linear Unit (ReLU):
Description: ReLU is a popular activation function that outputs the input for
positive values and zero for negative values. It introduces non-linearity and is
computationally efficient.
Mathematical Form: $f(x) = \max(0, x)$
Types of activation function
It stands for Rectified Linear Unit. It is the most widely used activation function,
chiefly implemented in the hidden layers of a neural network.
Value Range: [0, inf)
Nature: non-linear, which means we can easily backpropagate the errors and
have multiple layers of neurons being activated by the ReLU function.
Uses: ReLU is less computationally expensive than tanh and sigmoid because it
involves simpler mathematical operations. At any given time only a few neurons
are activated, making the network sparse and therefore efficient and easy to
compute.
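A one-line NumPy sketch of ReLU applied to a few example inputs.

```python
import numpy as np

def relu(x):
    # Outputs the input for positive values and zero for negative values
    return np.maximum(0, x)

print(relu(np.array([-3.0, -0.5, 0.0, 2.0])))  # [0. 0. 0. 2.]
```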
Types of activation function
6. Leaky ReLU (Rectified Linear Unit) Function:
Description: Leaky ReLU is an activation function used in artificial neural
networks to introduce non-linearity between the layers of a neural network. It
was created to solve the dying ReLU problem, in which neurons using the
standard ReLU function stop learning (effectively "die") during training.
Mathematical Form: $f(x) = x$ for $x > 0$ and $f(x) = \alpha x$ for $x \le 0$, where $\alpha$ is a small constant (for example 0.01).
Types of activation function
Using this function, negative inputs are mapped to small negative values close to
0 (but not exactly 0), solving the dying ReLU issue that arises when the standard
ReLU function is used during neural network training.
The Leaky ReLU is a popular activation function used to address the limitations
of the standard ReLU function in deep neural networks: by introducing a small
negative slope for negative inputs, it helps neural networks maintain better
information flow both during training and after.
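A minimal NumPy sketch of Leaky ReLU; the slope alpha = 0.01 is a commonly used (assumed) default, not a value given in the slides.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Small negative slope keeps a non-zero gradient for negative inputs
    return np.where(x > 0, x, alpha * x)

print(leaky_relu(np.array([-3.0, -0.5, 0.0, 2.0])))  # [-0.03  -0.005  0.  2.]
```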
Gradient Descent
Simple Analogy
Imagine you are blindfolded and placed somewhere in a valley. Your goal is to
find the lowest point in the valley without being able to see the terrain (the area
of land around you). Here's how you might proceed:
Initial Position: You start at a random location in the valley.
Objective: Your objective is to descend to the lowest point in the valley.
Sense of Touch: You can feel the slope of the ground beneath your feet,
giving you an indication of the direction of descent.
Movement: You take a step in the direction of the steepest slope, relying
on your sense of touch to guide you downhill.
Repetition: You repeat this process, continuously adjusting your direction
based on the slope of the terrain.
Convergence: Eventually, you reach the lowest point in the valley,
indicating convergence to the optimal solution.
Gradient Descent
• In this analogy: your position corresponds to the model's parameters, the slope
you feel underfoot corresponds to the gradient of the loss function, each step is a
parameter update, and the lowest point of the valley is the minimum of the loss.
• Key Aspects:
• Uses the entire dataset in each iteration, so it is computationally expensive for
large datasets.
• Gives an accurate estimate of the gradient.
• Often converges to the global minimum in most problems.
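A minimal sketch of gradient descent on a one-variable function; the quadratic objective, learning rate, and step count are illustrative choices, not values from the slides.

```python
def gradient_descent(grad_fn, x0, learning_rate=0.1, steps=100):
    # Repeatedly step in the direction of steepest descent (negative gradient)
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad_fn(x)
    return x

# Example: minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3)
grad = lambda x: 2 * (x - 3)
print(gradient_descent(grad, x0=0.0))  # ~3.0, the minimum of f
```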
Stochastic Gradient Descent