ME 780: Lecture 4
Ali Harakeh
University of Waterloo
WAVE Lab
[email protected]
Overview
1 Optimization: Introduction
2 Review Of First and Second Order Methods
3 The Difference Between Learning and Pure Optimization
4 Batch and Minibatch Algorithms
5 Challenges In Deep Model Training
6 Conclusion
Section 1
Optimization: Introduction

Introduction
Section 2
Review Of First and Second Order Methods
Gradient descent updates the parameters by stepping in the direction of the negative gradient:
x ← x − ε ∇x f(x),
where ε is the learning rate.
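A minimal NumPy sketch of this update rule; grad_f, lr, and n_steps are illustrative names rather than notation from the slides:

import numpy as np

def gradient_descent(grad_f, x0, lr=0.1, n_steps=100):
    # Repeatedly apply x <- x - lr * grad_f(x).
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x - lr * grad_f(x)
    return x

# Example: f(x) = ||x||^2 has gradient 2x; the minimum is at the origin.
x_min = gradient_descent(lambda x: 2 * x, x0=[3.0, -4.0])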
Newton's method jumps to the minimum of a local quadratic approximation of f built from the gradient and the Hessian:
x* = x⁽⁰⁾ − H⁻¹ ∇x f(x⁽⁰⁾),
where H is the Hessian of f evaluated at x⁽⁰⁾.
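A sketch of a single Newton update, assuming the Hessian is available and positive definite (the function names are illustrative):

import numpy as np

def newton_step(grad_f, hess_f, x0):
    # Solve H d = g instead of forming H^{-1} explicitly:
    # cheaper and numerically more stable.
    g = grad_f(x0)
    H = hess_f(x0)
    d = np.linalg.solve(H, g)
    return x0 - d

For a quadratic f this single step lands exactly on the minimum; for general f the step is iterated.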
Lipschitz Continuity
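For reference, the standard definition: a function f is Lipschitz continuous with constant L if
|f(x) − f(y)| ≤ L ‖x − y‖ for all x, y.
This guarantees that the small parameter steps taken by gradient descent change the output of f only by a proportionally small amount, a weak regularity condition often invoked to give such methods some guarantees.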
Section 3
The Difference Between Learning and Pure Optimization
Early Stopping
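Early stopping is one place where learning diverges from pure optimization: training halts when error on a held-out validation set stops improving, not at a minimum of the training objective. A minimal sketch, assuming hypothetical train_epoch and validation_loss callables supplied by the caller:

def train_with_early_stopping(model, train_epoch, validation_loss,
                              patience=5, max_epochs=200):
    # Halt once validation loss has failed to improve for `patience`
    # consecutive epochs. (A fuller version would also restore the
    # parameters saved at the best epoch.)
    best = float("inf")
    bad_epochs = 0
    for _ in range(max_epochs):
        train_epoch(model)
        loss = validation_loss(model)
        if loss < best:
            best, bad_epochs = loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return model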
Section 4
Batch and Minibatch Algorithms
Sampling Minibatches
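Minibatches should be selected randomly so that consecutive gradient estimates are unbiased; a common practical compromise is to shuffle the dataset once per epoch and then sweep it in consecutive blocks. A minimal NumPy sketch, with X, y, and batch_size as placeholder names:

import numpy as np

def minibatches(X, y, batch_size, rng=None):
    # Shuffle once, then sweep the permutation in consecutive blocks,
    # so each example is visited exactly once per epoch.
    rng = rng or np.random.default_rng()
    perm = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        yield X[idx], y[idx]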
Section 5
Challenges In Deep Model Training
Ill-Conditioning
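Ill-conditioning of the Hessian H is a classic difficulty even for convex problems. A standard way to see it, following the second-order Taylor expansion of the cost around the current point: a gradient step −εg is predicted to change the cost by approximately ½ε²gᵀHg − εgᵀg. Ill-conditioning becomes a problem when the curvature term ½ε²gᵀHg outgrows εgᵀg, so the learning rate must shrink and learning becomes very slow despite a strong gradient.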
Local Minima
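Neural network cost functions have very many local minima, in part because of weight-space symmetry: permuting the hidden units of a layer (together with their weights) produces a different parameter vector that implements exactly the same function, so every minimum comes with a large family of equivalent copies. The practical concern is therefore not whether local minima exist, but whether local minima with cost much higher than the global minimum are commonly encountered.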
Saddle Points
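A concrete instance: f(x, y) = x² − y² has a critical point at the origin, where the gradient vanishes but the Hessian has one positive and one negative eigenvalue, so the point is a minimum along x and a maximum along y. A quick numerical check, for illustration:

import numpy as np

# Hessian of f(x, y) = x^2 - y^2 at the origin.
H = np.array([[2.0, 0.0],
              [0.0, -2.0]])
print(np.linalg.eigvals(H))  # eigenvalues 2 and -2: a saddle point

In high-dimensional cost surfaces, critical points with mixed curvature of this kind far outnumber local minima, and the small gradients near them can stall first-order methods.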
Gradient Clipping
Gradients do not specify the optimal step size, only the optimal direction within an infinitesimal region. When traditional gradient descent proposes a very large step, the gradient clipping heuristic intervenes and reduces the step size, making it less likely that the update leaves the region in which the gradient indicates the direction of approximately steepest descent.
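A minimal sketch of the common clip-by-norm form of this heuristic; threshold is a tuning parameter chosen by the practitioner:

import numpy as np

def clip_gradient(grad, threshold):
    # Clip-by-norm: shrink the step length while preserving the
    # gradient's direction whenever its norm exceeds the threshold.
    norm = np.linalg.norm(grad)
    if norm > threshold:
        grad = grad * (threshold / norm)
    return grad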
Section 6
Conclusion

Next Lecture