Unit 2 Introduction to Deep Learning

Department of

Artificial Intelligence and Data Science


AL3502 - DEEP LEARNING FOR
VISION

Dr. Arthi. A
Professor
Department of Artificial Intelligence and
Data Science
Source:
1. Richard Szeliski, Computer Vision: Algorithms and Applications, 2010.
OBJECTIVES

2. To understand the methods and terminology involved in deep neural networks.
SYLLABUS
UNIT - II
INTRODUCTION TO DEEP LEARNING

Deep Feed-Forward Neural Networks – Gradient


Descent – Back-Propagation and Other Differentiation
Algorithms – Vanishing Gradient Problem – Mitigation
– Rectified Linear Unit (ReLU) – Heuristics for
Avoiding Bad Local Minima – Heuristics for Faster
Training – Nesterov's Accelerated Gradient Descent –
Regularization for Deep Learning – Dropout –
Adversarial Training – Optimization for Training Deep
Models.
Deep Feed Forward Networks
Activation Function
Deep Feed Forward Networks
1.Input Layer
• Image as input (grayscale or RGB)
• Pixels flattened into a 1D array (e.g., 28x28 = 784
pixels)
• Neurons in input layer correspond to each pixel
2.Hidden Layers
• Multiple hidden layers for feature extraction
• First layers detect basic features (e.g., edges)
• Deeper layers extract complex patterns (e.g.,
shapes, textures)
• Activation functions (ReLU, sigmoid) introduce non-
linearity
Deep Feed Forward Networks
3. Output Layer
• Neurons represent possible classes (e.g., "cat",
"dog")
• Softmax activation for classification
• Produces probabilities for each class
4. Training Process
• Forward pass: data flows through the network
• Loss calculation: error between predicted and true
labels
• Backpropagation: weights updated to minimize loss
• Optimization: uses gradient descent or Adam
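As a minimal sketch of such a network, the PyTorch model below follows the 28x28 grayscale example above (784 inputs, 10 classes); the hidden-layer sizes are illustrative choices, not fixed by the slides.

import torch.nn as nn

# Minimal deep feed-forward classifier (sizes are illustrative).
model = nn.Sequential(
    nn.Flatten(),          # flatten the 28x28 image into a 784-vector
    nn.Linear(784, 256),   # first hidden layer: basic features
    nn.ReLU(),             # non-linearity
    nn.Linear(256, 128),   # deeper hidden layer: complex patterns
    nn.ReLU(),
    nn.Linear(128, 10),    # output layer: one logit per class
)
# Softmax is usually folded into the loss (nn.CrossEntropyLoss),
# so the network itself outputs raw logits.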
Deep Feed Forward Networks
Training Phase
1. Initialize Weights and Biases
• Randomly initialize weights and biases
• Each neuron has associated weights and biases
2. Forward Pass
• Input image is flattened into a vector
• Pass through each hidden layer (linear transformation + activation)
• Hidden layers extract features (e.g., edges, shapes)
• Output layer produces probabilities for each class
(e.g., cat or dog)
Deep Feed Forward Networks
3. Loss Calculation
• Compare predicted output with true label
• Use cross-entropy loss for classification tasks
• Higher loss indicates a larger error
4. Backpropagation
• Compute the gradients of the loss w.r.t. weights
• Gradients propagate from the output layer back to
the input layer
• Determine how to adjust weights to reduce the loss
Deep Feed Forward Networks
5. New Weight Update
• Use optimization algorithms like Stochastic
Gradient Descent (SGD) or Adam
• Update weights based on gradients
• Learning rate controls the size of weight updates
6. Repeat for Multiple Epochs
• Process entire dataset through multiple epochs
• Use mini-batch gradient descent for faster updates
• Iterate until the model converges
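Steps 1–6 map directly onto a few lines of PyTorch; the sketch below assumes the model defined earlier and a train_loader that yields (image, label) mini-batches.

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()                         # loss calculation
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # learning rate

for epoch in range(10):                  # repeat for multiple epochs
    for images, labels in train_loader:  # mini-batch gradient descent
        logits = model(images)           # forward pass
        loss = criterion(logits, labels) # error vs. true labels
        optimizer.zero_grad()
        loss.backward()                  # backpropagation: compute gradients
        optimizer.step()                 # weight update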
Deep Feed Forward Networks
7. Monitoring Performance
• Track training and validation loss
• Use validation data to check generalization
performance
• Early stopping to prevent overfitting
8. Regularization
• Dropout: Randomly drop neurons to prevent
overfitting
• L2 Regularization: Add penalty for large weights to
simplify the model
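In PyTorch, both regularizers are one-liners; the sketch below is illustrative (the dropout rate and penalty strength shown are typical defaults, not prescribed values).

import torch
import torch.nn as nn

block = nn.Sequential(
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly drop half the activations during training
)
# weight_decay adds an L2 penalty on the weights to the update rule
optimizer = torch.optim.SGD(block.parameters(), lr=0.01, weight_decay=1e-4)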
Deep Feed Forward Networks
5. Inference (Prediction)
• New image input passed through the network
• Features extracted in hidden layers
• Output layer produces predicted label (highest
probability)
6. Challenges
• Overfitting: risk of fitting noise in data
• Regularization: techniques like dropout, L2
regularization
• Data Augmentation: enhances model robustness
with varied inputs
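Returning to the inference step (5) above, a minimal prediction sketch, assuming the trained model from earlier and a preprocessed input tensor named image:

import torch

model.eval()                 # switch off dropout etc.
with torch.no_grad():        # no gradients needed at inference
    logits = model(image)
    probs = torch.softmax(logits, dim=1)  # class probabilities
    predicted = probs.argmax(dim=1)       # label with highest probability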
Deep Feed Forward Networks
7. Real-World Applications
• Image classification (e.g., cat vs dog, handwritten
digit recognition)
• Object detection, facial recognition, and medical
image analysis
Gradient based Optimization
• Most deep learning algorithms involve optimization
of some sort.
• Optimization refers to the task of either
minimizing or maximizing some function f (x) by
altering x.
Objective Function
• The function we want to minimize or maximize is
called the objective function or criterion.
• It quantifies how well the model's predictions
match the actual outcomes.
Gradient based Optimization
• We often denote the value that minimizes or
maximizes a function with a superscript ∗. For
example, we might say x∗ = arg min f(x).
• Most optimization problems are framed as
minimization problems.
• If a problem is about maximization, we can
convert it to a minimization problem by
minimizing the negative of the objective
function.
• When we are minimizing it, we may also call it the
cost function, loss function, or error function.
Gradient based Optimization
• Suppose we have a function y = f (x), where both x
and y are real numbers.
• The derivative of this function is denoted as f’(x)
or as dy/dx.
• The derivative f’(x) gives the slope of f (x) at the
point x.
• It shows the rate of change of the function's value with respect to changes in x.
• In other words, it specifies how to scale a small change in the input in order to obtain the corresponding change in the output.
Gradient based Optimization
• This is an iterative optimization technique where
we update the variable x in the direction opposite
to the gradient of the objective function.
• This helps in reducing the value of the function. The update rule is
x ← x − α · f′(x)
where α is a small step size or learning rate.
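A tiny worked example of this update rule on f(x) = x², where f′(x) = 2x (the starting point and learning rate are illustrative):

# Gradient descent on f(x) = x^2, with derivative f'(x) = 2x.
x = 5.0
alpha = 0.1                 # learning rate
for step in range(100):
    grad = 2 * x            # f'(x)
    x = x - alpha * grad    # x <- x - alpha * f'(x)
print(x)                    # approaches the minimum at x = 0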
Gradient based Optimization
Figure: an illustration of how the derivatives of a function can be used to follow the function downhill to a minimum (gradient descent).
Gradient based Optimization
• The derivative is therefore useful for minimizing a
function because it tells us how to change x in order
to make a small improvement in y.
• For example, we know that f(x − ϵ · sign(f′(x))) is less than f(x) for small enough ϵ.
• We can thus reduce f(x) by moving x in small steps with the opposite sign of the derivative.
Gradient based Optimization
• When f′(x) = 0, the derivative provides no information about which direction to move.
• Points where f′(x) = 0 are known as critical points or stationary points.
• A local minimum is a point where f (x) is lower than
at all neighboring points, so it is no longer possible
to decrease f(x) by making infinitesimal steps.
• A local maximum is a point where f(x) is higher than at all neighboring points, so it is no longer possible to increase f(x) by making infinitesimal steps.
Gradient based Optimization
Local Minimum:
• A point where the function value is lower than at all
neighboring points.
• It's a point where we can't decrease the function value
by making infinitesimal changes.
Local Maximum:
• A point where the function value is higher than at all
neighboring points.
• It's a point where we can't increase the function value
by making infinitesimal changes.
Saddle Point:
• A critical point that is neither a local minimum nor a
local maximum.
• The function might have a higher value in one
direction and a lower value in another direction,
resembling a saddle.
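A standard worked example of a saddle point is f(x, y) = x² − y². Its gradient ∇f = (2x, −2y) vanishes at (0, 0), so the origin is a critical point; but along the x-axis f(x, 0) = x² ≥ 0 (it looks like a minimum), while along the y-axis f(0, y) = −y² ≤ 0 (it looks like a maximum), so the origin is neither a minimum nor a maximum.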
Gradient based Optimization
• A point that attains the absolute lowest value of f(x) is a global minimum. There may be only one global minimum or multiple global minima of the function.
• It is also possible for there to be local minima that are
not globally optimal.
• In the context of deep learning, we optimize functions
that may have many local minima that are not optimal,
and many saddle points surrounded by very flat
regions.
• All of this makes optimization very difficult, especially
when the input to the function is multidimensional.
We therefore usually settle for finding a value of f that
is very low, but not necessarily minimal in any formal
sense.
Gradient based Optimization

Figure: minimum, maximum, and saddle point of a function.
Back-Propagation
• After a neural network is defined with initial weights,
and a forward pass is performed to generate the
initial prediction,
• there is an error function which defines how far
away the model is from the true prediction.
• There are many possible algorithms that can
minimize the error function—for example, one could
do a brute force search to find the weights that
generate the smallest error.
• However, for large neural networks, a training
algorithm is needed that is very computationally
efficient.
• Backpropagation is that algorithm—it can discover
the optimal weights relatively quickly, even for a
network with millions of weights.
Back-Propagation
Training algorithm of BPNN:
1. Inputs X arrive through the preconnected path.
2. The input is modeled using real weights W. The weights are usually selected randomly.
3. Calculate the output for every neuron from the
input layer, to the hidden layers, to the output layer.
4. Calculate the error in the outputs
Error = Actual Output − Desired Output
5. Travel back from the output layer to the hidden layers to adjust the weights such that the error is decreased.
6. Keep repeating the process until the desired output is achieved.
Back-Propagation

Architecture of back propagation network:


As shown in the diagram, the architecture of a BPN has three interconnected layers with weights on their connections. The hidden layer and the output layer also have bias units, whose activation is always 1, with trainable bias weights. As the diagram makes clear, a BPN works in two phases: one phase sends the signal forward from the input layer to the output layer, and the other phase back-propagates the error from the output layer to the input layer.
Back-Propagation
1. Forward pass — weights are initialized and inputs from the training set are fed into the network. The forward pass is carried out and the model generates its initial prediction.
2. Error function — the error function is computed by
checking how far away the prediction is from the
known true value.
3. Backpropagation with gradient descent — the backpropagation algorithm calculates how much the output values are affected by each of the weights in the model. To do this, it calculates partial derivatives, going back from the error function to a specific neuron and its weight.
Back-Propagation
• This provides complete traceability from total errors,
back to a specific weight which contributed to that
error. The result of backpropagation is a set of
weights that minimize the error function.
4. Weight update — weights can be updated after every sample in the training set, but this is usually not practical. Typically, a batch of samples is run in one big forward pass, and backpropagation is then performed on the aggregate result.
• The batch size and number of batches used in
training, called iterations, are important
hyperparameters that are tuned to get the best
results. Running the entire training set through the
backpropagation process is called an epoch.
Vanishing Gradient
• Neural networks are trained using backpropagation and gradient-based learning methods.
• During training, we want to reach the optimum values of the weights, resulting in minimum loss.
• Each weight is constantly updated during the training of the algorithm.
• The update is proportional to the partial derivative of the error function with respect to the current weight in each training iteration.
• However, sometimes this update becomes too small, and hence the weight does not get updated.
• This results in very little or practically no training of the network, which is referred to as the vanishing gradient problem.
Vanishing Gradient
• As shown in the figure, the sigmoid function can face the vanishing gradient problem, while with ReLU or Leaky ReLU vanishing gradients are not an issue.
Back-Propagation
• The backpropagation algorithm is a fundamental
concept in training artificial neural networks,
including deep learning models.
• It is used to adjust the network's weights and
biases during the training process to minimize the
error between the predicted and actual outputs.
1.Forward Propagation:
• The process begins with forward propagation.
• Input data is passed through the neural network to
compute the predicted outputs.
• Each neuron in the network calculates a weighted
sum of its inputs and applies an activation function to
produce an output.
Back-Propagation

2.Loss Function:
• A loss function (also known as a cost function or
error function) is used to quantify the error between
the predicted outputs and the actual target values.
• Common loss functions include Mean Squared Error
(MSE) for regression tasks and cross-entropy for
classification tasks.
Back-Propagation

3.Backpropagation:
• The core of the backpropagation algorithm involves
calculating the gradients of the loss function with
respect to the network's parameters, primarily the
weights and biases.
• The gradients represent the sensitivity of the loss to
changes in the parameters. They indicate how much
the loss would change if the parameters were
adjusted.
Back-Propagation

4.Gradient Descent:
• The computed gradients are used to update the
network's weights and biases.
• A common optimization algorithm used with
backpropagation is gradient descent.
• Gradient descent adjusts the weights and biases in
the direction that reduces the loss, allowing the
network to learn from its mistakes.
Back-Propagation
5.Iterative Process:
• The forward propagation, loss calculation, gradient
computation, and weight updates are performed
iteratively for a specified number of epochs or until
convergence.
• During training, the network gradually improves its
ability to make accurate predictions and minimize
the loss.
6.Mini-Batches:
• To improve efficiency, training is often performed
using mini-batches of data rather than the entire
dataset. This approach reduces the computational
load and can lead to faster convergence.
Back-Propagation
7.Activation Functions:
• In deep learning, various activation functions are
used within neural network layers, such as ReLU
(Rectified Linear Unit), sigmoid, and tanh. These
functions introduce non-linearity, which is essential
for the network's ability to learn complex patterns.
8.Backpropagation Through Layers:
• Backpropagation works by computing gradients
layer by layer, starting from the output layer and
moving backward through the hidden layers.
• The chain rule from calculus is used to efficiently
calculate the gradients for each layer.
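In symbols (a standard textbook formulation, with z^(l) the pre-activations of layer l, a^(l) = f(z^(l)) its activations, and δ^(l) its error signal), the chain rule gives:

δ^(L) = ∇_a Loss ⊙ f′(z^(L))                 (output layer)
δ^(l) = ((W^(l+1))ᵀ δ^(l+1)) ⊙ f′(z^(l))     (moving backward through hidden layers)
∂Loss/∂W^(l) = δ^(l) (a^(l−1))ᵀ,  ∂Loss/∂b^(l) = δ^(l)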
Back-Propagation
9.Regularization Techniques:
• To prevent overfitting, regularization techniques like
dropout and weight decay are often employed during
training.

• The backpropagation algorithm is a key component of deep learning, enabling neural networks to learn from data, make predictions, and adapt their parameters to minimize errors. It has been instrumental in the success of various deep learning architectures, such as convolutional neural networks (CNNs) for image processing and recurrent neural networks (RNNs) for sequential data.
Back-Propagation
• Using the Back propagation network, find the new
weights for the net shown below. It is presented with
the input pattern [0,1] and the target output 1. Use a
learning rate of 0.25 and binary sigmoidal activation
function
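Since the figure with the initial weights is not reproduced here, the NumPy sketch below assumes illustrative starting values and carries out one full forward pass, error computation, and weight update for this 2-2-1 binary-sigmoid network:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.0, 1.0])        # input pattern [0, 1]
t = 1.0                         # target output
lr = 0.25                       # learning rate

W1 = np.array([[0.6, -0.1],     # input -> hidden weights (assumed)
               [-0.3, 0.4]])
b1 = np.array([0.3, 0.5])       # hidden bias weights (assumed)
W2 = np.array([0.4, 0.1])       # hidden -> output weights (assumed)
b2 = -0.2                       # output bias weight (assumed)

# Forward pass
h = sigmoid(W1 @ x + b1)        # hidden activations
y = sigmoid(W2 @ h + b2)        # network output

# Backward pass; the binary sigmoid has f'(z) = f(z) * (1 - f(z))
delta_out = (t - y) * y * (1 - y)           # output error term
delta_hid = delta_out * W2 * h * (1 - h)    # hidden error terms

# Weight updates: w_new = w_old + lr * delta * input
W2 = W2 + lr * delta_out * h
b2 = b2 + lr * delta_out
W1 = W1 + lr * np.outer(delta_hid, x)
b1 = b1 + lr * delta_hid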
Activation: ReLU

• ReLU (Rectified Linear Unit): takes a real-valued number and thresholds it at zero:
f(x) = max(0, x), applied elementwise, so it maps ℝⁿ → ℝ₊ⁿ.
• Most modern deep NNs use ReLU activations.
• ReLU is fast to compute
o Compared to sigmoid, tanh
o Simply threshold a matrix at zero
• Accelerates the convergence of gradient descent
o Due to its linear, non-saturating form
• Helps prevent the vanishing gradient problem
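A minimal NumPy sketch of ReLU and its derivative:

import numpy as np

def relu(x):
    return np.maximum(0.0, x)      # elementwise threshold at zero

def relu_grad(x):
    return (x > 0).astype(float)   # gradient: 1 for positive inputs, else 0

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z))        # [0.  0.  0.  1.5]
print(relu_grad(z))   # [0. 0. 0. 1.]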
Heuristics for Avoiding Bad Local Minima

• Local minima in optimization are points where a function's value is lower than at neighboring points but not necessarily the lowest across the entire domain.
• In machine learning, especially when training deep neural networks, models can sometimes get stuck in these local minima, leading to suboptimal performance.
Heuristics for Avoiding Bad Local Minima
• To avoid bad local minima, various heuristics and techniques are employed:
1. Random Initialization of Weights
2. Momentum-Based Gradient Descent
3. Stochastic Gradient Descent (SGD)
4. Regularization Techniques (L2, Dropout)
5. Ensemble Methods
6. Batch Normalization
7. Adaptive Optimization Algorithms (Adam, RMSprop)
8. Simulated Annealing
9. Learning Rate Annealing and Schedulers
Heuristics for Avoiding Bad Local Minima

10.Noise Injection
11.Escape with Perturbation Methods
12.Using Over-Parameterized Networks

Avoiding bad local minima in deep learning is crucial to achieving good performance. Techniques such as random initialization, momentum, adaptive optimizers, batch normalization, and noise injection provide powerful tools to escape or mitigate the effects of local minima during training. Each of these heuristics addresses a different aspect of the optimization process, helping the model reach a better solution.
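As a concrete instance of heuristic 2 (momentum-based gradient descent), the sketch below runs momentum on f(x) = x²; setting nesterov = True evaluates the gradient at the look-ahead point, which is the idea behind Nesterov's accelerated gradient from the syllabus. The starting point, learning rate, and momentum coefficient are illustrative.

def grad(x):
    return 2 * x            # derivative of f(x) = x^2

x, v = 5.0, 0.0             # position and velocity
lr, mu = 0.1, 0.9           # learning rate and momentum coefficient
nesterov = True
for step in range(200):
    g = grad(x + mu * v) if nesterov else grad(x)  # look-ahead gradient for NAG
    v = mu * v - lr * g     # accumulate velocity
    x = x + v               # take the step
print(x)                    # converges toward the minimum at 0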
Heuristics for Faster Training
Use Pretrained Models
• Fine-tuning a model that is already trained on a similar task saves time compared to training from scratch.
Early Stopping
• Stop training when the performance stops improving
on the validation set to save time.
Weight Initialization
• Properly initialize weights (like He or Xavier
initialization) to avoid slow convergence from poor
starting points.
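For example, PyTorch exposes both schemes directly; the layer sizes below are illustrative:

import torch.nn as nn

relu_layer = nn.Linear(784, 256)
nn.init.kaiming_normal_(relu_layer.weight, nonlinearity='relu')  # He initialization

tanh_layer = nn.Linear(256, 128)
nn.init.xavier_uniform_(tanh_layer.weight)                       # Xavier initialization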
Heuristics for Faster Training
Batch Normalization
• Normalize inputs of each layer to stabilize and speed
up training by allowing higher learning rates.
Adaptive Learning Rate Optimizers (Adam,
RMSprop)
• Automatically adjust the learning rate during training
for faster convergence without needing to fine-tune
manually.
Learning Rate Scheduling
• Start with a high learning rate and reduce it as
training progresses to speed up the early stages and
fine-tune later.
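A minimal sketch with PyTorch's StepLR scheduler (the decay factor and schedule are illustrative; model is assumed from the earlier sketches):

import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # start with a high rate
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(90):
    # ... one epoch of training with optimizer ...
    scheduler.step()   # decay the learning rate by 10x every 30 epochs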
Heuristics for Faster Training
Data Augmentation
• Generate more data by transforming existing
samples, improving model generalization and
reducing training time.
Parallel Computing (GPUs/TPUs)
• Use GPUs or TPUs to handle computations faster,
especially for large models and datasets.
Use Mini-Batch Gradient Descent
• Instead of using the entire dataset, update the model
with small batches of data to speed up training.
Heuristics for Faster Training
Reduced Precision (Mixed Precision Training)
• Use lower-precision arithmetic (like 16-bit floating-
point) to speed up computation while still
maintaining accuracy.
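A minimal mixed-precision sketch using torch.cuda.amp (requires a CUDA GPU; model, train_loader, criterion, and optimizer are assumed from the earlier sketches):

import torch

scaler = torch.cuda.amp.GradScaler()
for images, labels in train_loader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():   # run the forward pass in float16 where safe
        loss = criterion(model(images.cuda()), labels.cuda())
    scaler.scale(loss).backward()     # scale the loss to avoid gradient underflow
    scaler.step(optimizer)
    scaler.update()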

• These heuristics help optimize training time while maintaining or even improving model performance.
Regularization
• A central problem in machine learning is how to
make an algorithm that will perform well not just on
the training data, but also on new inputs. Many
strategies used in machine learning are explicitly
designed to reduce the test error, possibly at the
expense of increased training error. These strategies
are known collectively as regularization.
Regularization
Generalization Error
• Generalization error refers to the difference between
a machine learning model's performance on the
training data and its performance on new, unseen
data. It measures how well the model generalizes to
data it has not been trained on.
• A low generalization error indicates that the model is
not overfitting and performs well on both training
and test data.
• Regularization techniques are often used to reduce
generalization error by preventing overfitting and
improving the model’s ability to handle unseen data.
Regularization

• Regularization in machine learning refers to techniques that reduce a model's generalization error without necessarily reducing its training error, often by introducing constraints or penalties on the model's parameters.
• These methods aim to prevent overfitting, especially
in complex models like deep learning networks,
where the model might otherwise learn irrelevant
details in the training data.
Regularization

• Regularization strategies can involve adding terms to the objective function (like L1 or L2 penalties), imposing soft constraints, or using ensemble methods that combine multiple hypotheses.
• The goal is to strike a balance between bias and
variance, reducing the model's sensitivity to
fluctuations in the training data (variance) without
overly simplifying the model (increasing bias).
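For example, L2 (weight decay) regularization adds a penalty term to the objective:

J̃(w) = J(w) + (λ/2) · ||w||²

so each gradient step both follows the data gradient and shrinks the weights in proportion to λw, discouraging overly large weights without changing the model architecture.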