DSE 2020-21 2nd Sem DL Problem Solving 2.0

This document works through optimization problems for machine learning models: computing the optimal learning rate for gradient descent on a quadratic error function, and calculating weight updates under ordinary gradient descent, the momentum method, Nesterov momentum, and RProp (resilient backpropagation). The optimal learning rate that leads to fastest convergence on the given multivariate quadratic error function is 0.125.


Deep Learning

Dr. Sugata Ghosal
BITS Pilani, Pilani Campus
[email protected]

Worked Out Problems


Optimization
These slides are assembled by the instructor with grateful acknowledgement of the many others who made their course materials freely available online.
Optimal Learning Rate: Multivariate Diagonal Quadratic Error Function

The error surface is given by E(x, y, z) = 3x² + 2y² + 4z² + 6. What is the optimal learning rate that leads to fastest convergence to the global minimum?

For a one-dimensional quadratic E(w) = aw², the optimal learning rate is η_opt = 1/(2a), giving per coordinate:

η_x,opt = 1/6
η_y,opt = 1/4
η_z,opt = 1/8

Optimal learning rate = min(η_x,opt, η_y,opt, η_z,opt) = 1/8 = 0.125

Largest learning rate for convergence = min(2η_x,opt, 2η_y,opt, 2η_z,opt) = 1/4 = 0.25
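A minimal Python sketch of this calculation (the starting point, step count, and variable names are illustrative assumptions, not from the slides):

import numpy as np

# Quadratic coefficients of E(x, y, z) = 3x^2 + 2y^2 + 4z^2 + 6.
a = np.array([3.0, 2.0, 4.0])

# For a 1-D quadratic a*w^2, a GD step gives w <- w * (1 - 2*a*eta),
# so eta_opt = 1/(2a) reaches that coordinate's minimum in one step.
eta_opt = 1.0 / (2.0 * a)          # [1/6, 1/4, 1/8]
eta = eta_opt.min()                # 0.125, the fastest rate safe for all coordinates

w = np.array([1.0, 1.0, 1.0])      # arbitrary starting point
for _ in range(50):
    w = w - eta * 2.0 * a * w      # gradient of sum_i a_i * w_i^2 is 2*a*w
print(w)                           # approaches [0, 0, 0], the global minimizer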
Dependence on Learning Rate – Error Minimization

For a quadratic error surface, each gradient-descent step scales the distance to the minimum by 1 − η/η_opt, so the behavior depends on where η falls relative to η_opt (see the sketch after this list):

• η < η_opt: monotone convergence toward the minimum
• η = η_opt: convergence in a single step
• η_opt < η < 2η_opt: oscillation around the minimum with decreasing error
• η = 2η_opt: oscillation without convergence
• η > 2η_opt: divergence
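These regimes can be checked numerically. A small sketch (illustrative; it uses the z-coordinate's coefficient a = 4, so η_opt = 0.125):

# Behavior of gradient descent on E(w) = a*w^2 for rates around eta_opt.
a = 4.0
eta_opt = 1.0 / (2.0 * a)

for eta in [0.5 * eta_opt, eta_opt, 1.5 * eta_opt, 2.0 * eta_opt, 2.5 * eta_opt]:
    w = 1.0                            # arbitrary starting point
    for _ in range(10):
        w -= eta * 2.0 * a * w         # one gradient-descent step
    print(f"eta/eta_opt = {eta / eta_opt:.1f} -> |w| after 10 steps: {abs(w):.4f}")

# 0.5 -> monotone decay, 1.0 -> zero in one step, 1.5 -> damped oscillation,
# 2.0 -> oscillates with constant |w|, 2.5 -> diverges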
Minimization of Quadratic Error Function
Weight Updates – Ordinary Gradient Descent
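A minimal sketch of one plain gradient-descent step, assuming the error surface whose gradients are used in the momentum, Nesterov, and RProp examples below, weights (1.5, 2.0) at time t, and η = 0.3 as used there:

# Gradient components as they appear throughout these worked examples.
def grad(w1, w2):
    dw1 = 0.5 * (w1 - 3.0) - (w2 - 4.0) / 6.0
    dw2 = (2.0 / 9.0) * (w2 - 4.0) - (w1 - 3.0) / 6.0
    return dw1, dw2

eta = 0.3
w1, w2 = 1.5, 2.0                       # weights at time t
g1, g2 = grad(w1, w2)                   # (-0.4167, -0.1944)
w1, w2 = w1 - eta * g1, w2 - eta * g2   # step against the gradient
print(w1, w2)                           # -> 1.625, 2.058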

Weight Updates – Momentum Method

With η = 0.3 and μ = 0.9, the update is w(t+1) = w(t) − η·dE/dw + μ·(w(t) − w(t−1)), where 1.625 and 2.058 below are the plain gradient-descent results:

w1(t+1) = 1.625 + 0.9×(1.5 − 1.0) = 2.075
w2(t+1) = 2.058 + 0.9×(2.0 − 1.0) = 2.958
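As a check, a sketch of the momentum update, reusing grad() from the gradient-descent sketch above (η = 0.3 and μ = 0.9 as on the slide):

eta, mu = 0.3, 0.9
w_prev = (1.0, 1.0)                     # weights at time t-1
w = (1.5, 2.0)                          # weights at time t
g1, g2 = grad(*w)

# w(t+1) = w(t) - eta * dE/dw + mu * (w(t) - w(t-1))
w1_next = w[0] - eta * g1 + mu * (w[0] - w_prev[0])   # 1.625 + 0.45 = 2.075
w2_next = w[1] - eta * g2 + mu * (w[1] - w_prev[1])   # 2.058 + 0.90 = 2.958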

Nesterov

Lookahead step: w1_int = 1.5 + 0.9×0.5 = 1.95, w2_int = 2.0 + 0.9×1 = 2.9

dE/dw1(w1_int, w2_int) = 0.5×(1.95 − 3) − (2.9 − 4)/6 = −0.342
dE/dw2(w1_int, w2_int) = (2/9)×(2.9 − 4) − (1.95 − 3)/6 = −0.0694

w1(t+1) = 1.95 + 0.3×0.342 = 2.0526, w2(t+1) = 2.9 + 0.3×0.0694 = 2.92
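The same step in code, again reusing grad(); the lookahead point is the distinguishing feature of Nesterov's method:

eta, mu = 0.3, 0.9
w_prev, w = (1.0, 1.0), (1.5, 2.0)

# Momentum lookahead first; the gradient is then evaluated at the lookahead point.
w1_int = w[0] + mu * (w[0] - w_prev[0])   # 1.95
w2_int = w[1] + mu * (w[1] - w_prev[1])   # 2.90
g1, g2 = grad(w1_int, w2_int)             # (-0.342, -0.0694)
w1_next = w1_int - eta * g1               # 2.0526
w2_next = w2_int - eta * g2               # 2.9208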
Weight Updates – RProp

Assume α = 1.5, β = 0.6.

What will (w1, w2) be at time t+1?

At time t−1, with (w1, w2) = (1, 1):
dE/dw1 = 0.5×(1 − 3) − (1 − 4)/6 = −0.5
dE/dw2 = (2/9)×(1 − 4) − (1 − 3)/6 = −0.333

At time t, with (w1, w2) = (1.5, 2.0):
dE/dw1 = 0.5×(1.5 − 3) − (2.0 − 4)/6 = −0.417
dE/dw2 = (2/9)×(2 − 4) − (1.5 − 3)/6 = −0.194

Previous step sizes: Δw1 = 1.5 − 1 = 0.5, Δw2 = 2 − 1 = 1

The sign of dE/dw1 stayed the same (negative), so its step size grows by α: w1(t+1) = 1.5 + 1.5×0.5 = 2.25

The sign of dE/dw2 also stayed the same, so: w2(t+1) = 2.0 + 1.5×1 = 3.5
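A sketch of the sign-based rule used here (the Rprop variant without weight backtracking is an assumption, as is taking the previous step magnitudes as the per-weight step sizes):

grad = lambda w1, w2: (0.5 * (w1 - 3.0) - (w2 - 4.0) / 6.0,
                       (2.0 / 9.0) * (w2 - 4.0) - (w1 - 3.0) / 6.0)

alpha, beta = 1.5, 0.6
w_prev, w = (1.0, 1.0), (1.5, 2.0)
g_prev, g = grad(*w_prev), grad(*w)
steps = (abs(w[0] - w_prev[0]), abs(w[1] - w_prev[1]))    # (0.5, 1.0)

w_next = []
for wi, gp, gt, d in zip(w, g_prev, g, steps):
    d *= alpha if gp * gt > 0 else beta            # same sign -> grow, flip -> shrink
    w_next.append(wi + d if gt < 0 else wi - d)    # move against the gradient sign
print(w_next)    # -> [2.25, 3.5]; both gradient signs stayed negative here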
