BCSE-332L
Module 1:
Introduction to Neural and Deep Neural Networks
Dr. Saurabh Agrawal
Faculty Id: 20165
School of Computer Science and Engineering
VIT, Vellore-632014
Tamil Nadu, India
Outline
Neural Networks Basics
Functions in Neural Networks:
Activation Functions
Loss Functions
Function Approximation
Classification and Clustering Problems
Deep Networks Basics
Shallow Neural Networks
Gradient Descent
Back Propagation
Deep Neural Networks
Forward and Back Propagation
Parameters
Hyperparameters
Where:
n is the total number of data points in the dataset.
y_i represents the actual target value for the ith data point.
ŷ_i represents the predicted value for the ith data point generated by the regression model.
Functions in Neural Networks: Loss Functions
Regression Loss Function:
Mean Absolute Error (MAE) is a frequently used loss function in regression analysis and machine learning.
It measures the average absolute difference between the values predicted by a regression model and the actual observed (target) values in a dataset.
The formula for Mean Absolute Error (MAE) is as follows:
MAE = (1/n) Σ_{i=1}^{n} |y_i – ŷ_i|
Where:
n is the total number of data points in the dataset.
y_i represents the actual target value for the ith data point.
ŷ_i represents the predicted value for the ith data point generated by the regression model.
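As a quick illustration (not from the original slides), MAE can be computed directly in NumPy; the function name mae and the sample arrays below are assumptions chosen purely for demonstration:

import numpy as np

def mae(y_true, y_pred):
    # average absolute difference between actual and predicted values
    return np.mean(np.abs(y_true - y_pred))

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
print(mae(y_true, y_pred))   # 0.5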
Functions in Neural Networks: Loss Functions
Regression Loss Function:
Huber loss behaves like MSE for small errors and like MAE for large errors.
It introduces a hyperparameter δ that defines the point at which the loss function transitions from quadratic to linear, making it more robust to outliers.
The formula for Huber Loss is as follows:
Huber(y, ŷ) = 0.5 (y – ŷ)², if |y – ŷ| ≤ δ
Huber(y, ŷ) = δ |y – ŷ| – 0.5 δ², if |y – ŷ| > δ
The total loss is the average of these per-point values over all n data points.
Where:
n is the total number of data points in the dataset.
y represents the actual (true) value for each data point.
ŷ represents the predicted value generated by the model for each data point.
δ determines the threshold at which the Huber loss function shifts from being quadratic to linear.
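A minimal NumPy sketch of the same piecewise definition; the function name, the δ value, and the sample data are illustrative assumptions:

import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    # quadratic for residuals within delta, linear beyond it
    residual = y_true - y_pred
    quadratic = 0.5 * residual ** 2
    linear = delta * (np.abs(residual) - 0.5 * delta)
    return np.mean(np.where(np.abs(residual) <= delta, quadratic, linear))

y_true = np.array([1.0, 2.0, 3.0, 10.0])
y_pred = np.array([1.2, 1.8, 3.0, 4.0])
print(huber_loss(y_true, y_pred, delta=1.0))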
Functions in Neural Networks: Loss Functions
Regression Loss Function:
Quantile loss is used in quantile regression, a technique for estimating a particular quantile of the target variable's distribution, such as the median.
This makes it possible to model different quantiles of the same dataset.
The formula for Quantile Loss is as follows:
Quantile(y, ŷ) = (1/n) Σ_{i=1}^{n} |y_i – ŷ_i| · |α – I(y_i – ŷ_i < 0)|
Where:
α is the quantile level of interest. It’s a value between 0 and 1, where α = 0.5 corresponds to estimating the median, α < 0.5
corresponds to estimating lower quantiles, and α > 0.5 corresponds to estimating upper quantiles.
y_i represents the actual target value (observed value) for the ith data point.
ŷ_i represents the predicted value generated by the quantile regression model for the ith data point.
|y_i – ŷ_i| represents the absolute difference between the actual and predicted values.
I(y_i – ŷ_i < 0) is an indicator function that equals 1 if y_i – ŷ_i is less than 0 (i.e., the model overestimates the actual value) and 0 otherwise.
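The same loss can be written with a max() instead of the indicator, which is algebraically equivalent; this short NumPy sketch (names and sample data assumed for illustration) shows the idea:

import numpy as np

def quantile_loss(y_true, y_pred, alpha=0.5):
    # alpha * error when the model under-predicts, (1 - alpha) * |error| when it over-predicts
    residual = y_true - y_pred
    return np.mean(np.maximum(alpha * residual, (alpha - 1) * residual))

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 2.0])
print(quantile_loss(y_true, y_pred, alpha=0.9))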
Functions in Neural Networks: Loss Functions
Regression Loss Function:
Log-Cosh loss measures the logarithm of the hyperbolic cosine of the prediction error; it behaves roughly like MSE for small errors and like MAE for large errors.
The formula for Log-Cosh Loss is as follows:
Log-Cosh(y, ŷ) = (1/n) Σ_{i=1}^{n} log(cosh(ŷ_i – y_i))
Where:
n is the total number of data points in the dataset.
y_i represents the actual target value for the ith data point.
ŷ_i represents the predicted value for the ith data point generated by the regression model.
cosh(x) is the hyperbolic cosine function, defined as (e^x + e^(-x))/2.
log(x) is the natural logarithm function.
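A minimal NumPy sketch of Log-Cosh loss, assuming the averaged form given above; names and sample values are illustrative:

import numpy as np

def log_cosh_loss(y_true, y_pred):
    # log(cosh(e)) is close to e^2 / 2 for small errors and to |e| - log(2) for large ones
    return np.mean(np.log(np.cosh(y_pred - y_true)))

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.1, 1.5, 6.0])
print(log_cosh_loss(y_true, y_pred))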
Functions in Neural Networks: Loss Functions
Binary Classification Loss Functions
Binary Cross-Entropy (Log Loss) measures the difference between the predicted probability of the positive class and the actual binary label.
The formula for Binary Cross-Entropy Loss is as follows:
BCE(y, ŷ) = – [ y · log(ŷ) + (1 – y) · log(1 – ŷ) ]
Where:
y is the actual binary label, which can be either 0 or 1.
ŷ is the predicted probability that the given data point belongs to the positive class (class 1).
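A minimal NumPy sketch of Binary Cross-Entropy; the clipping constant eps is an assumption added only to keep the logarithms finite:

import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # clip predictions away from 0 and 1 so log() stays finite
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return np.mean(-(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)))

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.2, 0.7, 0.6])
print(binary_cross_entropy(y_true, y_pred))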
Hinge loss is commonly used with margin-based binary classifiers such as support vector machines.
The formula for Hinge Loss is as follows:
Hinge(y, ŷ) = max(0, 1 – y · ŷ)
Where:
Hinge(y, ŷ ) represents the Hinge Loss.
y is the actual binary label, which can be either -1 (negative class) or 1 (positive class).
ŷ is the raw decision value or score assigned by the classifier for a data point. It is not a probability.
The formula calculates the loss for each data point and returns the maximum of 0 and 1 – y · ŷ.
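A minimal NumPy sketch of Hinge Loss; the sample labels and scores are illustrative assumptions:

import numpy as np

def hinge_loss(y_true, y_score):
    # y_true must be -1 or +1; y_score is the raw decision value, not a probability
    return np.mean(np.maximum(0.0, 1.0 - y_true * y_score))

y_true = np.array([1, -1, 1, -1])
y_score = np.array([0.8, -2.0, -0.3, 0.5])
print(hinge_loss(y_true, y_score))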
Functions in Neural Networks: Loss Functions
Multi-class Classification Loss Functions
Multi-class classification involves classifying data into one of several distinct classes or categories.
Unlike binary classification, where there are only two classes, multi-class classification has more than
two possible outcomes.
Various loss functions are used in multi-class classification to measure the difference between predicted
class probabilities and the actual class labels.
Categorical Cross-Entropy (CCE) is the standard loss for multi-class classification with one-hot encoded labels and predicted class probabilities (typically produced by a softmax output layer).
The formula for Categorical Cross-Entropy Loss is as follows:
CCE(y, ŷ) = – Σ_i y_i · log(ŷ_i), where the sum runs over the classes
Where:
CCE(y, ŷ ) represents the Categorical Cross-Entropy Loss.
y is a one-hot encoded vector representing the true class labels. Each element of y is 0 except for the element
corresponding to the true class, which is 1.
ŷ is a vector representing the predicted class probabilities for a data point.
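A minimal NumPy sketch of Categorical Cross-Entropy over a small batch; the shapes and sample probabilities are assumptions for illustration:

import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    # y_true: one-hot labels, shape (n_samples, n_classes)
    # y_pred: predicted class probabilities, same shape
    y_pred = np.clip(y_pred, eps, 1.0)
    return np.mean(-np.sum(y_true * np.log(y_pred), axis=1))

y_true = np.array([[1, 0, 0], [0, 0, 1]])
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
print(categorical_cross_entropy(y_true, y_pred))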
Functions in Neural Networks: Loss Functions
Multi-class Classification Loss Functions
Kullback-Leibler Divergence (KL Divergence) Loss, also known as KL Loss, is a mathematical measure used
in machine learning and statistics to quantify the difference between two probability distributions.
In the context of loss functions, KL divergence loss is often utilized in tasks where you want to compare or
match two probability distributions, such as generative modeling or variational autoencoders.
The formula for Kullback-Leibler Divergence (KL Divergence) Loss is as follows:
KL(P || Q) = Σ_i P_i · log(P_i / Q_i)
Where:
KL(P || Q): This represents the KL divergence from distribution P to distribution Q. It measures how distribution Q differs from the
reference distribution P.
Σ: This symbol denotes a summation, typically taken over all possible events or outcomes.
P_i and Q_i: These are the probabilities of event i occurring in the distributions P and Q, respectively.
log(P_i / Q_i): This term computes the logarithm of the ratio of the probabilities of event i occurring in the two distributions. It
quantifies how much more or less likely event i is in distribution P compared to distribution Q.
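A minimal NumPy sketch of KL divergence between two discrete distributions; the small eps and the sample distributions are assumptions for illustration:

import numpy as np

def kl_divergence(p, q, eps=1e-12):
    # p and q are probability distributions over the same set of events
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sum(p * np.log((p + eps) / (q + eps)))

p = np.array([0.4, 0.4, 0.2])
q = np.array([0.3, 0.5, 0.2])
print(kl_divergence(p, q))   # non-negative; 0 only when p and q are identical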
Function Approximation
Function approximation with neural networks means training a neural network model to learn and approximate a target function from input-output pairs of data.
Here’s a structured approach to how this is typically done:
Steps for Function Approximation with Neural Networks:
1. Data Collection and Preparation: Gather a dataset that represents input-output pairs of the function
you want to approximate. Ensure the dataset covers a representative range of inputs and includes
corresponding outputs.
2. Model Selection: Choose a suitable neural network architecture. For basic function approximation
tasks, a feedforward neural network (Multi-layer Perceptron, MLP) is commonly used. The number of
input nodes should match the dimensionality of your input data, and the output layer should have one
node for scalar output or multiple nodes for multi-dimensional output.
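A minimal sketch of steps 1 and 2, assuming scikit-learn is available; the target function sin(x), the network size, and the other settings are illustrative choices, not part of the slides:

import numpy as np
from sklearn.neural_network import MLPRegressor

# Step 1: data collection - input/output pairs sampled from the target function
X = np.linspace(-np.pi, np.pi, 500).reshape(-1, 1)   # one input feature
y = np.sin(X).ravel()                                # scalar output

# Step 2: model selection - a feedforward MLP with one input node and one output node
model = MLPRegressor(hidden_layer_sizes=(32, 32), activation="relu",
                     max_iter=5000, random_state=0)
model.fit(X, y)

y_hat = model.predict(X)
print("training MSE:", np.mean((y - y_hat) ** 2))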
For a single neuron, the computation happens in two parts:
i. The first part computes the output Z from the inputs and the weights: Z = w · x + b.
ii. The second part applies the activation function to Z to give the final output A of the neuron: A = g(Z).
When the computation is vectorized over a whole layer:
i. The first equation computes all the intermediate outputs Z in a single matrix multiplication: Z = W X + b.
ii. The second equation computes all the activations A in a single matrix multiplication: A = g(Z).
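A small NumPy sketch of this two-step layer computation; the sigmoid activation, layer sizes, and random inputs are assumptions for illustration:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))      # 3 input features, 5 examples (one per column)
W = rng.normal(size=(4, 3))      # 4 neurons in the layer, 3 weights each
b = np.zeros((4, 1))             # one bias per neuron

Z = W @ X + b                    # all intermediate outputs in a single matrix multiplication
A = sigmoid(Z)                   # all activations at once
print(Z.shape, A.shape)          # (4, 5) (4, 5)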
The gradient descent update rule, θ_new = θ_old – α · ∇J(θ), where α is the learning rate, basically tells us the next position to move to, which lies in the direction of steepest descent. Let's look at another example to really drive the concept home.
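A tiny sketch of the idea on an assumed one-parameter loss J(θ) = (θ - 3)²; the learning rate and starting point are illustrative:

# gradient descent on J(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3)
def grad(theta):
    return 2.0 * (theta - 3.0)

theta = 10.0          # starting point
lr = 0.1              # learning rate (step size alpha)
for _ in range(100):
    theta -= lr * grad(theta)   # step in the direction of steepest descent
print(theta)          # approaches 3.0, the minimiser of J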