Week 4

The document covers various optimization techniques in deep learning, including Stochastic Gradient Descent, Batch Optimization, and Mini-Batch Optimization, along with their advantages and disadvantages. It discusses the importance of minimizing loss functions and the role of optimization in machine learning, particularly in relation to linear and logistic regression, and the softmax classifier. Additionally, it introduces concepts of nonlinearity and neural networks, including the implementation of logical functions such as AND, OR, and XOR.

Course Name: Deep Learning

Faculty Name: Prof. P. K. Biswas


Department: E & ECE, IIT Kharagpur

Topic
Lecture 16: Optimization
Concepts Covered:
• Multiclass SVM Loss Function
• Optimization
• Stochastic Gradient Descent
• Batch Optimization
• Mini-Batch Optimization
Optimizing Loss Function

Multiclass SVM loss with L2 regularization:

$$L = \frac{1}{N}\sum_{i}\sum_{j \neq y_i} \max\!\left(0,\; W_j^t X_i - W_{y_i}^t X_i + \Delta\right) + \lambda \sum_{k}\sum_{l} W_{kl}^2$$

Gradient of the data loss with respect to the correct-class weight row $W_{y_i}$ and any other row $W_j$:

$$\nabla_{W_{y_i}} L = -\frac{1}{N}\sum_{i}\sum_{j \neq y_i} \mathbb{1}\!\left(W_j^t X_i - W_{y_i}^t X_i + \Delta > 0\right) X_i$$

$$\nabla_{W_j} L = \frac{1}{N}\sum_{i} \mathbb{1}\!\left(W_j^t X_i - W_{y_i}^t X_i + \Delta > 0\right) X_i$$
Source - http://cs231n.github.io
Optimizing Loss Function

Gradient descent

With learning rate $\eta$ and regularization weight $\lambda$, the weights are updated at iteration $k$ as

$$W_{y_i}(k+1) = (1 - \eta\lambda)\, W_{y_i}(k) + \frac{\eta}{N}\sum_{i}\sum_{j \neq y_i} \mathbb{1}\!\left(W_j^t X_i - W_{y_i}^t X_i + \Delta > 0\right) X_i$$

$$W_j(k+1) = (1 - \eta\lambda)\, W_j(k) - \frac{\eta}{N}\sum_{i} \mathbb{1}\!\left(W_j^t X_i - W_{y_i}^t X_i + \Delta > 0\right) X_i$$
Source - http://cs231n.github.io
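The following is a minimal NumPy sketch of these formulas, not the lecture's own code; the margin Δ = 1 and the values of `lam` (λ) and `eta` (η) are hypothetical choices, with `X` holding one example per row and `W` one weight row per class.

```python
import numpy as np

def svm_loss_and_grad(W, X, y, delta=1.0, lam=1e-3):
    """Multiclass SVM loss L and its gradient dL/dW.

    W: (C, D) one weight row per class; X: (N, D); y: (N,) integer labels.
    """
    N = X.shape[0]
    scores = X @ W.T                              # scores W_j^t X_i, shape (N, C)
    correct = scores[np.arange(N), y][:, None]    # W_{y_i}^t X_i
    margins = np.maximum(0.0, scores - correct + delta)
    margins[np.arange(N), y] = 0.0                # only j != y_i contribute
    loss = margins.sum() / N + lam * np.sum(W * W)

    ind = (margins > 0).astype(float)             # indicator of positive margin
    ind[np.arange(N), y] = -ind.sum(axis=1)       # correct class gets minus the count
    dW = ind.T @ X / N + 2 * lam * W
    return loss, dW

def gd_step(W, X, y, eta=0.1):
    """One gradient-descent update W <- W - eta * dL/dW."""
    loss, dW = svm_loss_and_grad(W, X, y)
    return W - eta * dW, loss
```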
Local and Global Minima
Stochastic / Batch / Mini-Batch Optimization
Stochastic Gradient Descent
Upsides
• The frequent updates immediately give an insight into the performance of the model and the rate of improvement.
• This variant of gradient descent may be the simplest to understand and implement.
• The increased model update frequency can result in faster learning on some problems.
• The noisy update process can allow the model to avoid local minima (e.g. premature convergence).
Stochastic Gradient Descent
Downsides
• Updating the model so frequently is more computationally expensive than other configurations of gradient descent, taking significantly longer to train models on large datasets.
• The frequent updates can result in a noisy gradient signal, which may cause the model parameters, and in turn the model error, to jump around (have a higher variance over training epochs).
• The noisy learning process down the error gradient can also make it hard for the algorithm to settle on an error minimum for the model.
Batch Gradient Descent
Upsides
• Fewer updates to the model mean this variant of gradient descent is more computationally efficient than stochastic gradient descent.
• The decreased update frequency results in a more stable error gradient and may result in a more stable convergence on some problems.
• The separation of the calculation of prediction errors and the model update lends the algorithm to parallel-processing-based implementations.
Batch Gradient Descent
Downsides
• The more stable error gradient may result in premature convergence of the model to a less optimal set of parameters.
• The updates at the end of the training epoch require the additional complexity of accumulating prediction errors across all training examples.
• It requires the entire training dataset to be in memory and available to the algorithm.
• Model updates, and in turn training speed, may become very slow for large datasets.
Mini-Batch Gradient Descent
Upsides
• The model update frequency is higher than batch gradient descent, which allows for a more robust convergence, avoiding local minima.
• The batched updates provide a computationally more efficient process than stochastic gradient descent.
• Batching allows both the efficiency of not having all training data in memory and more efficient algorithm implementations.
Mini-Batch Gradient Descent
Downsides
• Mini-batch requires the configuration of an additional “mini-batch size” hyperparameter for the learning algorithm.
• Error information must be accumulated across mini-batches of training examples, as in batch gradient descent.
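A minimal sketch contrasting the three variants; `grad_fn`, `batch_size`, `eta`, and `epochs` are hypothetical placeholders rather than anything defined in the lecture. Setting `batch_size = 1` gives stochastic gradient descent, `batch_size = N` gives batch gradient descent, and anything in between gives mini-batch gradient descent.

```python
import numpy as np

def train(W, X, y, grad_fn, batch_size=32, eta=0.01, epochs=10):
    """grad_fn(W, X_batch, y_batch) must return the loss gradient on that batch."""
    N = X.shape[0]
    for _ in range(epochs):
        order = np.random.permutation(N)               # reshuffle the examples each epoch
        for start in range(0, N, batch_size):
            idx = order[start:start + batch_size]
            W = W - eta * grad_fn(W, X[idx], y[idx])   # one model update per (mini-)batch
    return W
```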
Error minimization with iterations
Course Name: Deep Learning
Faculty Name: Prof. P. K. Biswas
Department: E & ECE, IIT Kharagpur

Topic
Lecture 17: Optimization in ML
Concepts Covered:
• Optimization
• Stochastic Gradient Descent
• Batch Optimization
• Mini-Batch Optimization
• Optimization in ML
• Linear and Logistic Regression
• Softmax Classifier
• Nonlinearity
Optimization in Machine Learning
• The goal of optimization is to reduce a cost function J(W) in order to optimize some performance measure P.
• In pure optimization, minimizing J is the goal in and of itself.
• In Machine Learning, J(W) is minimized w.r.t. the parameters W on training data (training error), and we want the error to be low on unseen (test) data.
• The test error (generalization error) should be low.
Optimization in Machine Learning
Assumptions
• Test and training data are generated by a probability distribution: the data-generating process.
• Data samples in each data set are independent.
• Training set and test set are identically distributed.

The performance of an ML model depends on its ability to:
• Make the training error small.
• Reduce the gap between training and test error.
Underfitting and Overfitting
• Underfitting: the model is not able to obtain a sufficiently low training error.
• Overfitting: the gap between training and test error is too large.

We can control overfitting/underfitting by altering the model's capacity: the set of functions the learning algorithm can select as being the solution.
Linear and Logistic Regression

Linear & Logistic Regression: Binary Classification

Linear Regression
$$f : X \in \mathbb{R}^d \;\rightarrow\; y \in \mathbb{R}, \qquad \hat{y} = W^t X$$

Logistic Regression
$$p(y \mid X; W) = \sigma(W^t X)$$

Linear Regression
[Figure: data in the X1–X2 plane]

Logistic Regression
$$\sigma(W^t X) = \frac{1}{1 + e^{-W^t X}}$$
[Figure: the sigmoid σ(W^t X) plotted against W^t X]
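A small sketch of the two predictors; the weight vector `W` and the data matrix `X` are placeholders, with `X` holding one example per row.

```python
import numpy as np

def linear_predict(W, X):
    """Linear regression: y_hat = W^t X for each example (row of X)."""
    return X @ W

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_predict(W, X):
    """Logistic regression: p(y = 1 | X; W) = sigma(W^t X)."""
    return sigmoid(X @ W)
```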
Softmax Classifier
• Generalization of the binary logistic classifier to multiple classes:

$$s_{y_i} = f(X_i, W)_{y_i} = (W X_i)_{y_i} = W_{y_i}^t X_i$$

• Softmax classifier:

$$p(y_i \mid X_i; W) = \frac{e^{s_{y_i}}}{\sum_j e^{s_j}}$$
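A minimal sketch of the softmax probabilities; subtracting the row maximum before exponentiating is a standard numerical-stability step, not something stated on the slide.

```python
import numpy as np

def softmax_probs(W, X):
    """p(y | X; W) for each example; W: (C, D), X: (N, D)."""
    scores = X @ W.T                               # s_j = W_j^t X_i
    scores -= scores.max(axis=1, keepdims=True)    # stabilize the exponentials
    exp_s = np.exp(scores)
    return exp_s / exp_s.sum(axis=1, keepdims=True)
```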
Course Name: Deep Learning
Faculty Name: Prof. P. K. Biswas
Department: E & ECE, IIT Kharagpur

Topic
Lecture 18: Nonlinearity
Concepts Covered:
• Optimization in ML
• Linear and Logistic Regression
• Softmax Classifier
• Nonlinearity
• Neural Network
Nonlinearity

Linear Separability
[Figure: two linearly separable classes in the X1–X2 plane]

Nonlinearity
[Figure: a non-linearly-separable arrangement: a cluster of '+' samples surrounded by '−' samples in the X1–X2 plane]
Nonlinearity

Threshold
$$y = \begin{cases} 1 & x \geq 0 \\ 0 & x < 0 \end{cases}$$
[Plot: y versus x]

Logistic Regression
$$\sigma(W^t X) = \frac{1}{1 + e^{-W^t X}}$$
[Plot: σ(W^t X) versus W^t X]

Nonlinearity
ReLU: Rectified Linear Unit
$$y = \max(0, x)$$
[Plot: y versus x]
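The three activations as short NumPy functions (an illustrative sketch; whether the threshold fires at exactly x = 0 is a convention not fixed by the slide).

```python
import numpy as np

def threshold(x):
    """Hard threshold: 1 where x >= 0, else 0."""
    return (x >= 0).astype(float)

def sigmoid(x):
    """Logistic nonlinearity 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    """Rectified Linear Unit max(0, x)."""
    return np.maximum(0.0, x)
```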
Course Name: Deep Learning
Faculty Name: Prof. P. K. Biswas
Department: E & ECE, IIT Kharagpur

Topic
Lecture 19: Neural Network
Concepts Covered:
• Nonlinearity
• Neural Network
• AND Logic
• OR Logic
• XOR Logic
• Feed Forward NN
• Back Propagation Learning
Recap of the nonlinearities from Lecture 18:

$$\text{Threshold: } y = \begin{cases} 1 & x \geq 0 \\ 0 & x < 0 \end{cases} \qquad
\text{Logistic: } \sigma(W^t X) = \frac{1}{1 + e^{-W^t X}} \qquad
\text{ReLU: } y = \max(0, x)$$
Neuron
• Dendrite: receives signals from other neurons
• Synapse: point of connection to other neurons
• Soma: processes the information
• Axon: transmits the output of this neuron

Neuron (artificial model)
$$y = f(W^t X)$$
[Figure: inputs X weighted by W feeding a single unit with activation f]

Neural Network
$$y = f(W^t X)$$
[Figure: a network built from such units, each with its own weight vector W]
AND Function

X1  X2 | y
 0   0 | 0
 0   1 | 0
 1   0 | 0
 1   1 | 1

Decision boundary: $X_1 + X_2 - 1.5 = 0$; the output is 1 when $X_1 + X_2 - 1.5 > 0$.
[Figure: the four input points in the X1–X2 plane, with the line X1 + X2 − 1.5 = 0 separating (1, 1) from the other three points]

With a constant bias input of 1, take

$$W = \begin{bmatrix} -1.5 \\ 1 \\ 1 \end{bmatrix}, \qquad
X = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \end{bmatrix}$$

$$X^t W = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}
\begin{bmatrix} -1.5 \\ 1 \\ 1 \end{bmatrix}
= \begin{bmatrix} -1.5 \\ -0.5 \\ -0.5 \\ 0.5 \end{bmatrix}
\;\xrightarrow{\text{threshold}}\;
y = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}$$

[Figure: a single threshold neuron realizing the AND function]
OR Function

X1  X2 | y
 0   0 | 0
 0   1 | 1
 1   0 | 1
 1   1 | 1

Decision boundary: $X_1 + X_2 - 0.5 = 0$; the output is 1 when $X_1 + X_2 - 0.5 > 0$.
[Figure: the four input points in the X1–X2 plane, with the line X1 + X2 − 0.5 = 0 separating (0, 0) from the other three points]

With $W = \begin{bmatrix} -0.5 \\ 1 \\ 1 \end{bmatrix}$ and the same augmented inputs,

$$X^t W = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}
\begin{bmatrix} -0.5 \\ 1 \\ 1 \end{bmatrix}
= \begin{bmatrix} -0.5 \\ 0.5 \\ 0.5 \\ 1.5 \end{bmatrix}
\;\xrightarrow{\text{threshold}}\;
y = \begin{bmatrix} 0 \\ 1 \\ 1 \\ 1 \end{bmatrix}$$

[Figure: a single threshold neuron realizing the OR function]
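A small sketch of both gates as single threshold neurons, using the weight vectors from the slides (the bias is folded in as the first component of each augmented input).

```python
import numpy as np

def threshold_neuron(W, X):
    """y = 1 where W^t x > 0, else 0; X holds one augmented input [1, x1, x2] per row."""
    return (X @ W > 0).astype(int)

X = np.array([[1, 0, 0],
              [1, 0, 1],
              [1, 1, 0],
              [1, 1, 1]])           # bias input 1, then X1, X2

W_and = np.array([-1.5, 1.0, 1.0])  # fires when X1 + X2 - 1.5 > 0
W_or  = np.array([-0.5, 1.0, 1.0])  # fires when X1 + X2 - 0.5 > 0

print(threshold_neuron(W_and, X))   # [0 0 0 1]
print(threshold_neuron(W_or,  X))   # [0 1 1 1]
```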
Course Name: Deep Learning
Faculty Name: Prof. P. K. Biswas
Department: E & ECE, IIT Kharagpur

Topic
Lecture 20: Neural Network - II
Concepts Covered:
• Neural Network
• AND Logic
• OR Logic
• XOR Logic
• Feed Forward NN
• Back Propagation Learning

AND / OR Function
[Figure: recap of the AND and OR threshold neurons]
XOR Function

X1  X2 | y
 0   0 | 0
 0   1 | 1
 1   0 | 1
 1   1 | 0

[Figure: the four input points in the X1–X2 plane; no single line separates the two classes]

XOR as a composition of linearly separable functions:

$$X_1 \oplus X_2 = (X_1 + X_2) \cdot \overline{(X_1 \cdot X_2)}$$

X1  X2 | h1 = X1 + X2 | h2 = (X1 · X2)' | h1 · h2 = X1 ⊕ X2
 0   0 |      0       |        1        |        0
 0   1 |      1       |        1        |        1
 1   0 |      1       |        1        |        1
 1   1 |      1       |        0        |        0

First (hidden) layer, an OR neuron and a NAND neuron acting on the augmented inputs:

$$W_1 = \begin{bmatrix} -0.5 & 1.5 \\ 1 & -1 \\ 1 & -1 \end{bmatrix}, \qquad
X = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \end{bmatrix}$$

$$W_1^t X = \begin{bmatrix} -0.5 & 0.5 & 0.5 & 1.5 \\ 1.5 & 0.5 & 0.5 & -0.5 \end{bmatrix}
\;\xrightarrow{\text{threshold}}\;
h = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix}$$

Second (output) layer, an AND neuron on $(1, h_1, h_2)$ with $W_2 = \begin{bmatrix} -1.5 \\ 1 \\ 1 \end{bmatrix}$:

$$h^t W_2 = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 0 \end{bmatrix}
\begin{bmatrix} -1.5 \\ 1 \\ 1 \end{bmatrix}
= \begin{bmatrix} -0.5 \\ 0.5 \\ 0.5 \\ -0.5 \end{bmatrix}
\;\xrightarrow{\text{threshold}}\;
\begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \end{bmatrix} = X_1 \oplus X_2$$

[Figure: two-layer network of threshold neurons realizing XOR]
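A sketch of the same two-layer computation with the slide's weights, using hard-threshold neurons throughout.

```python
import numpy as np

def threshold(z):
    """1 where z > 0, else 0."""
    return (z > 0).astype(int)

# Augmented inputs [1, x1, x2], one column per example
X = np.array([[1, 1, 1, 1],
              [0, 0, 1, 1],
              [0, 1, 0, 1]])

W1 = np.array([[-0.5, 1.5],      # first column: OR neuron, second column: NAND neuron
               [ 1.0, -1.0],
               [ 1.0, -1.0]])
W2 = np.array([-1.5, 1.0, 1.0])  # AND neuron over (bias, h1, h2)

h = threshold(W1.T @ X)                        # hidden layer outputs, shape (2, 4)
h_aug = np.vstack([np.ones(4, dtype=int), h])  # prepend the bias row
y = threshold(W2 @ h_aug)                      # output layer
print(y)                                       # [0 1 1 0] = X1 XOR X2
```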
Neural Network Function

A K-layer network is the composition of the layer functions $f^{(1)}, f^{(2)}, \ldots, f^{(i)}, \ldots, f^{(K)}$:

$$y = f^{(K)}\!\left(f^{(K-1)}\!\left(\cdots f^{(i)}\!\left(\cdots f^{(2)}\!\left(f^{(1)}(X)\right)\cdots\right)\cdots\right)\right)$$

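A minimal sketch of this composition; the layer sizes, the random placeholder weights, and the choice of ReLU for every $f^{(i)}$ are assumptions for illustration only.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def forward(x, weights):
    """Compute f^(K)(... f^(2)(f^(1)(x)) ...), where each f^(i)(h) = relu(W_i @ h)."""
    h = x
    for W in weights:
        h = relu(W @ h)
    return h

# Usage with placeholder weights for a 3 -> 4 -> 2 network
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
x = np.array([1.0, 0.5, -0.2])
print(forward(x, weights))
```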