Unit 2 Introduction to Deep Learning
Dr. Arthi. A
Professor
Department of Artificial Intelligence and
Data Science
2.Loss Function:
• A loss function (also known as a cost function or
error function) is used to quantify the error between
the predicted outputs and the actual target values.
• Common loss functions include Mean Squared Error
(MSE) for regression tasks and cross-entropy for
classification tasks.
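For instance, a minimal NumPy sketch of these two losses (the function names and the one-hot label format are illustrative assumptions, not from the slides):

import numpy as np

def mse_loss(y_pred, y_true):
    # Mean Squared Error: average squared difference, typical for regression
    return np.mean((y_pred - y_true) ** 2)

def cross_entropy_loss(probs, one_hot_labels, eps=1e-12):
    # Cross-entropy: compares predicted class probabilities with one-hot targets
    return -np.mean(np.sum(one_hot_labels * np.log(probs + eps), axis=1))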
Back-Propagation
3.Backpropagation:
• The core of the backpropagation algorithm involves
calculating the gradients of the loss function with
respect to the network's parameters, primarily the
weights and biases.
• The gradients represent the sensitivity of the loss to
changes in the parameters. They indicate how much
the loss would change if the parameters were
adjusted.
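As a hedged illustration (a single linear unit with squared-error loss; the values and names are made up), the quantities below are exactly these sensitivities:

# Forward pass for one linear unit: y_hat = w * x + b
x, y = 2.0, 1.0                # one training example (illustrative values)
w, b = 0.5, 0.0                # current parameters
y_hat = w * x + b
loss = 0.5 * (y_hat - y) ** 2

# Gradients: how much the loss changes per unit change in each parameter
dloss_dyhat = y_hat - y
grad_w = dloss_dyhat * x       # dL/dw
grad_b = dloss_dyhat           # dL/db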
4.Gradient Descent:
• The computed gradients are used to update the
network's weights and biases.
• A common optimization algorithm used with
backpropagation is gradient descent.
• Gradient descent adjusts the weights and biases in
the direction that reduces the loss, allowing the
network to learn from its mistakes.
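A minimal sketch of one gradient-descent step (the learning rate and the helper name are illustrative):

def gradient_descent_step(params, grads, learning_rate=0.1):
    # Step each parameter against its gradient, the direction that lowers the loss
    return [p - learning_rate * g for p, g in zip(params, grads)]

# Example: new_w, new_b = gradient_descent_step([0.5, 0.0], [1.0, 0.5])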
5.Iterative Process:
• The forward propagation, loss calculation, gradient
computation, and weight updates are performed
iteratively for a specified number of epochs or until
convergence.
• During training, the network gradually improves its
ability to make accurate predictions and minimize
the loss.
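A hedged sketch of that iterative loop for a toy linear model (the data, epoch count, and learning rate are made up for illustration):

import numpy as np

X = np.array([0.0, 1.0, 2.0, 3.0])        # toy inputs
Y = np.array([1.0, 3.0, 5.0, 7.0])        # toy targets (y = 2x + 1)
w, b = 0.0, 0.0
learning_rate = 0.05

for epoch in range(200):                   # repeat for a fixed number of epochs
    y_hat = w * X + b                      # forward propagation
    loss = np.mean((y_hat - Y) ** 2)       # loss calculation (MSE)
    grad_w = 2 * np.mean((y_hat - Y) * X)  # gradient computation
    grad_b = 2 * np.mean(y_hat - Y)
    w -= learning_rate * grad_w            # weight update
    b -= learning_rate * grad_b            # after training, w and b end up close to 2 and 1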
6.Mini-Batches:
• To improve efficiency, training is often performed
using mini-batches of data rather than the entire
dataset. This approach reduces the computational
load and can lead to faster convergence.
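One common way to form such mini-batches (the batch size and per-epoch shuffling are illustrative choices):

import numpy as np

def iterate_minibatches(X, Y, batch_size=32):
    # Shuffle the examples, then yield consecutive slices of the shuffled data;
    # each slice drives one forward/backward pass and one parameter update
    indices = np.random.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = indices[start:start + batch_size]
        yield X[batch], Y[batch]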
7.Activation Functions:
• In deep learning, various activation functions are
used within neural network layers, such as ReLU
(Rectified Linear Unit), sigmoid, and tanh. These
functions introduce non-linearity, which is essential
for the network's ability to learn complex patterns.
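NumPy versions of the three activations named above (a minimal sketch):

import numpy as np

def relu(x):
    # Rectified Linear Unit: keeps positive values, zeroes out negatives
    return np.maximum(0.0, x)

def sigmoid(x):
    # Squashes inputs into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Squashes inputs into the range (-1, 1)
    return np.tanh(x)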
8.Backpropagation Through Layers:
• Backpropagation works by computing gradients
layer by layer, starting from the output layer and
moving backward through the hidden layers.
• The chain rule from calculus is used to efficiently
calculate the gradients for each layer.
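A hedged sketch of that layer-by-layer chain rule on a tiny two-layer network (the shapes, ReLU hidden layer, and squared-error loss are assumptions for illustration):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 3))               # one input example
y = np.array([[1.0]])                     # its target
W1, b1 = rng.normal(size=(3, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))

# Forward pass
h_pre = x @ W1 + b1
h = np.maximum(0.0, h_pre)                # ReLU hidden layer
y_hat = h @ W2 + b2                       # linear output layer
loss = 0.5 * np.sum((y_hat - y) ** 2)

# Backward pass: start at the output layer, then move back through the hidden layer
d_yhat = y_hat - y                        # dL/dy_hat
dW2 = h.T @ d_yhat                        # gradients for the output layer
db2 = d_yhat.sum(axis=0, keepdims=True)
d_h = d_yhat @ W2.T                       # chain rule: push the gradient into h
d_hpre = d_h * (h_pre > 0)                # chain rule through the ReLU
dW1 = x.T @ d_hpre                        # gradients for the hidden layer
db1 = d_hpre.sum(axis=0, keepdims=True)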
9.Regularization Techniques:
• To prevent overfitting, regularization techniques like
dropout and weight decay are often employed during
training.
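Hedged sketches of both techniques (the decay coefficient and dropout rate below are illustrative):

import numpy as np

def weight_decay_step(w, grad_w, learning_rate=0.1, decay=1e-4):
    # Weight decay (L2 regularization): also shrink the weights toward zero each update
    return w - learning_rate * (grad_w + decay * w)

def dropout(activations, drop_prob=0.5, training=True):
    # Dropout: randomly zero a fraction of activations during training and
    # rescale the survivors so the expected activation stays unchanged
    if not training:
        return activations
    mask = np.random.rand(*activations.shape) > drop_prob
    return activations * mask / (1.0 - drop_prob)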
Heuristics for Avoiding Bad Local Minima
10.Noise Injection
11.Escape with Perturbation Methods
12.Using Over-Parameterized Networks