Lecture 4 - SGD, Back Propagation

This lecture covers stochastic gradient descent and backpropagation. Stochastic gradient descent is an optimization method for minimizing loss functions that updates the parameters one example (or one mini-batch) at a time. Backpropagation computes gradients in neural networks by applying the chain rule to propagate derivatives efficiently backward through the network.


Deep Learning

Vazgen Mikayelyan

October 27, 2020



Outline

1 Stochastic Gradient Descent

2 Back-Propagation



Stochastic Gradient Descent

Let L be a loss function that we know:

$$L(w) = \frac{1}{n}\sum_{i=1}^{n} \left(f_w(x_i) - y_i\right)^2,$$

$$L(w) = \frac{1}{n}\sum_{i=1}^{n} \left(-y_i \log f_w(x_i) - (1 - y_i)\log\left(1 - f_w(x_i)\right)\right),$$

$$L(w) = \frac{1}{n}\sum_{i=1}^{n} \left(-y_i^T \log f_w(x_i)\right).$$

Do you see problems in finding the minimum of these functions using GD?
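For concreteness, the three losses above (mean squared error, binary cross-entropy, categorical cross-entropy) can be written in a few lines of NumPy. This is an illustrative sketch only: the array preds stands in for the model outputs f_w(x_i), which the slides leave abstract, and the eps clipping is a numerical safeguard added here.

import numpy as np

def mse_loss(preds, y):
    # L(w) = (1/n) * sum_i (f_w(x_i) - y_i)^2
    return np.mean((preds - y) ** 2)

def binary_cross_entropy(preds, y, eps=1e-12):
    # L(w) = (1/n) * sum_i [ -y_i log f_w(x_i) - (1 - y_i) log(1 - f_w(x_i)) ]
    preds = np.clip(preds, eps, 1 - eps)   # keep log() away from 0
    return np.mean(-y * np.log(preds) - (1 - y) * np.log(1 - preds))

def categorical_cross_entropy(preds, y, eps=1e-12):
    # L(w) = (1/n) * sum_i ( -y_i^T log f_w(x_i) ), one-hot rows y_i, probability rows in preds
    preds = np.clip(preds, eps, 1.0)
    return np.mean(-np.sum(y * np.log(preds), axis=1))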


Stochastic Gradient Descent

Note that in each case we can represent the loss function in the following form:

$$L(w) = \frac{1}{n}\sum_{i=1}^{n} L_i(w).$$

The SGD algorithm is the following:

Choose an initial vector of parameters w and a learning rate α.
Repeat until an approximate minimum is obtained:
    Randomly shuffle the examples in the training set.
    For i = 1, 2, ..., n, do w ← w − α∇L_i(w).

Do you see problems in this optimization method?
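As a rough illustration of this loop, here is a minimal NumPy sketch. The per-example gradient grad_Li(w, x_i, y_i), the fixed epoch count, and the stopping rule are placeholders introduced here, not part of the slides.

import numpy as np

def sgd(w, X, y, grad_Li, alpha=0.01, epochs=10):
    # Plain SGD: one parameter update per training example.
    n = len(X)
    for _ in range(epochs):                         # "repeat until an approximate minimum"
        perm = np.random.permutation(n)             # randomly shuffle the training set
        for i in perm:
            w = w - alpha * grad_Li(w, X[i], y[i])  # w <- w - alpha * grad L_i(w)
    return w

Each step uses a single example, so the updates are cheap but noisy; that noise is the usual motivation for the mini-batch variant on the next slide.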


Mini-Batch Gradient Descent

The MBGD algorithm is the following:

Choose an initial vector of parameters w, a learning rate α and a batch size B.
Repeat until an approximate minimum is obtained:
    Randomly shuffle the examples in the training set.
    For i = 1, 2, ..., ⌈n/B⌉, do
    $$w \leftarrow w - \alpha\,\nabla\!\left(\frac{1}{B}\sum_{k=(i-1)B+1}^{iB} L_k(w)\right).$$
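Under the same assumptions as the SGD sketch above (an illustrative per-example gradient grad_Li and a fixed epoch count), the mini-batch update can be sketched as follows.

import numpy as np

def minibatch_gd(w, X, y, grad_Li, alpha=0.01, B=32, epochs=10):
    # Mini-batch GD: one update per batch of (at most) B examples, ceil(n/B) updates per pass.
    n = len(X)
    for _ in range(epochs):
        perm = np.random.permutation(n)              # randomly shuffle the training set
        for start in range(0, n, B):
            batch = perm[start:start + B]
            # average the per-example gradients over the batch, then take one step
            g = np.mean([grad_Li(w, X[k], y[k]) for k in batch], axis=0)
            w = w - alpha * g
    return w

With B = 1 this reduces to SGD, and with B = n it is ordinary gradient descent.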


Back-Propagation

Question: how do we calculate the derivative of the function sin(x²)?

Theorem 1
Given n functions f_1, ..., f_n with the composite function

$$f = f_1 \circ \left(f_2 \circ \cdots \left(f_{n-1} \circ f_n\right)\right),$$

if each function f_i is differentiable at its immediate input, then the composite function is also differentiable by repeated application of the chain rule, and the derivative is

$$\frac{df}{dx} = \frac{df_1}{df_2}\,\frac{df_2}{df_3}\cdots\frac{df_n}{dx}.$$
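Applied to the opening question with f_1 = sin and f_2(x) = x², the theorem gives

$$\frac{d}{dx}\,\sin\!\left(x^2\right) = \cos\!\left(x^2\right)\cdot\frac{d}{dx}\,x^2 = 2x\,\cos\!\left(x^2\right).$$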


Back-Propagation

In this case we have the following function:

$$L(w_1, w_2) = \left(f_2\left(w_2 f_1\left(w_1 x\right)\right) - y\right)^2.$$

We have to calculate the derivatives $\partial L/\partial w_1$ and $\partial L/\partial w_2$:

$$\frac{\partial L}{\partial w_2} = \frac{\partial L}{\partial f_2}\,\frac{\partial f_2}{\partial (w_2 f_1)}\,\frac{\partial (w_2 f_1)}{\partial w_2},$$

$$\frac{\partial L}{\partial w_1} = \frac{\partial L}{\partial f_2}\,\frac{\partial f_2}{\partial (w_2 f_1)}\,\frac{\partial (w_2 f_1)}{\partial f_1}\,\frac{\partial f_1}{\partial (w_1 x)}\,\frac{\partial (w_1 x)}{\partial w_1}.$$
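To make the chain concrete, here is a small NumPy sketch of the forward and backward pass for this two-parameter network. Taking f_1 and f_2 to be sigmoids is an assumption made here purely for illustration; the slides leave them generic.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_backward(w1, w2, x, y):
    # Forward pass: compute and cache every intermediate value.
    a1 = w1 * x                      # w1 x
    h1 = sigmoid(a1)                 # f1(w1 x)
    a2 = w2 * h1                     # w2 f1
    h2 = sigmoid(a2)                 # f2(w2 f1)
    L = (h2 - y) ** 2                # loss

    # Backward pass: multiply the local derivatives along the chain.
    dL_dh2 = 2.0 * (h2 - y)          # dL/df2
    dh2_da2 = h2 * (1.0 - h2)        # df2/d(w2 f1), derivative of the sigmoid (assumed f2)
    dL_da2 = dL_dh2 * dh2_da2
    dL_dw2 = dL_da2 * h1             # times d(w2 f1)/dw2 = f1
    dL_dh1 = dL_da2 * w2             # times d(w2 f1)/df1 = w2
    dh1_da1 = h1 * (1.0 - h1)        # df1/d(w1 x), derivative of the sigmoid (assumed f1)
    dL_dw1 = dL_dh1 * dh1_da1 * x    # times d(w1 x)/dw1 = x
    return L, dL_dw1, dL_dw2

Note that the product for ∂L/∂w_1 reuses the factors already computed for ∂L/∂w_2; back-propagation is precisely this reuse of intermediate products, applied layer by layer from the output back to the input.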
