Tut02 - Calculus Crash Course

The document discusses finding good parameter values for a model that minimize loss. It introduces the concept of calculus and derivatives for finding the local minimum of a loss function. The gradient of a multivariate function provides the rate of change in all directions and can be used with gradient descent to minimize loss by moving in the opposite direction of the gradient at each step.


Finding good parameters

"Is there a simple way to find parameter values that give me the least loss?"
"Oh, of course! You are already familiar with it. It is called calculus."
"Clearly, we don't agree on the definition of the word 'simple'."

We suspect that the following law holds (actually, just a linear model):
salary ≈ wᵀx + b, where x is the feature vector encoding age, highest education and gender.

Age   Highest education (HS, BT, MT, PhD)   Gender (M, F, O)   True Salary
45    0 0 1 0                               1 0 0              64475
22    0 1 0 0                               0 1 0              34179
28    1 0 0 0                               1 0 0              34573
34    0 0 1 0                               0 1 0              50882
47    1 0 0 0                               0 1 0              38660
55    0 0 1 0                               1 0 0              71487
49    0 0 0 1                               0 1 0              79430
27    0 1 0 0                               0 0 1              34355
25    0 1 0 0                               1 0 0              43837

Use the average loss to find good params:
L(w, b) = (1/n) Σᵢ (wᵀxᵢ + b − yᵢ)², where xᵢ is the feature vector for the i-th person in our training set and yᵢ is their true salary.
Calculus Crash Course
It is said that both Gottfried Leibniz and Isaac Newton felt that the other person's work was a bit … derivative.
Derivatives
For this simple case, we can exactly calculate the discrepancy between the true change in function value and the derivative-based estimate, i.e., the smaller the movement, the smaller the discrepancy.


[Figure: plot of the function over roughly x ∈ [−2, 2], with three cases of step size marked to show that smaller steps give smaller discrepancies.]
Derivatives – Behind the Scenes
f(x + δ) ≈ f(x) + δ·f′(x)
"Can be obtained as a corollary of Taylor's theorem."
"Rearranging this equation gives me f(x + δ) − f(x) ≈ δ·f′(x)."
This holds only if δ is "small"; otherwise the approximation may not hold.
If δ has the same sign as f′(x), then f(x + δ) > f(x).
If δ has the opposite sign as f′(x), then f(x + δ) < f(x).
The derivative tells us two things:
Its sign tells us in which direction the function value will increase.
Interpret +ve as "right" and −ve as "left".
E.g., f′(a) < 0 tells me that f will increase if I decrease x a bit from a.
Its magnitude gives an idea of how much the function value will change.
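To see this on a computer, here is a minimal Python sketch (not from the original slides; the function f below is just an arbitrary smooth example) comparing the true change f(x + δ) − f(x) with the estimate δ·f′(x):

```python
# A minimal sketch (not from the slides): numerically checking the
# first-order approximation f(x + delta) ≈ f(x) + delta * f'(x).
import math

def f(x):
    return math.sin(x) + 0.5 * x**2   # any smooth function will do

def f_prime(x):
    return math.cos(x) + x            # its exact derivative

x = 0.8
for delta in (0.5, 0.1, 0.01):
    true_change = f(x + delta) - f(x)
    estimate = delta * f_prime(x)
    print(delta, true_change, estimate, abs(true_change - estimate))
# The discrepancy shrinks as delta shrinks, as the slide claims.
```

The gap between the two quantities shrinks roughly quadratically in δ, which is exactly the "smaller the movement, smaller the discrepancy" behaviour from the earlier plot.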
Stationary points
These are places where the derivative vanishes, i.e., f′(x) is 0.
These could be a local minimum, a global minimum, a local maximum, a global maximum, or a saddle point.
"This point does not look flat to me."
"Yup! In fact, it looks like the function value will increase both to the left and to the right of it."
The derivative being zero at a point is the function's way of telling us that it looks flat around that point.
"Look! At this small scale, the function looks almost like a straight line! I bet the slope of this line is f′(x₀). This is why at small scales we have f(x₀ + δ) ≈ f(x₀) + δ·f′(x₀)."
[Figure: tangent line/plane. Zoomed-in views show that the function really does look flat at a stationary point. What about some other value of x where f′(x) ≠ 0?]
"Remember kids, such effects are visible only when we look at very small scales. Let me zoom in to show you."
Stationary points
We can find out whether a stationary point is a max. or a min. using the 2nd derivative f″(x), where we define f″(x) = d/dx f′(x).
The sign of f″(x) tells us in which direction f′(x) will increase; its magnitude gives us an idea of how much f′(x) will change.
If f′(x) = 0 and f″(x) < 0, then the derivative moves from +ve to −ve around this point, i.e., a maximum.
If f′(x) = 0 and f″(x) > 0, then the derivative moves from −ve to +ve around this point, i.e., a minimum.
If f′(x) = 0 and f″(x) = 0, then this may be a min or a max or a saddle – higher derivatives, e.g., f‴(x), are needed.
Also, there is no general way of telling whether a max/min is local or global.
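As a quick illustration (not from the original slides), here is the second-derivative test applied to the example f(x) = x³ − 3x, which has stationary points at x = ±1:

```python
# A minimal sketch (not from the slides): classifying the stationary points
# of f(x) = x**3 - 3*x using the second-derivative test.
def f_prime(x):
    return 3 * x**2 - 3      # f'(x)

def f_double_prime(x):
    return 6 * x             # f''(x)

for x0 in (-1.0, 1.0):       # both are stationary: f'(x0) = 0
    if f_double_prime(x0) > 0:
        kind = "local minimum"
    elif f_double_prime(x0) < 0:
        kind = "local maximum"
    else:
        kind = "inconclusive (need higher derivatives)"
    print(x0, kind)
# Prints: -1.0 local maximum, then 1.0 local minimum
```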
Rules of derivatives
In the following we have f, g: ℝ → ℝ.
Sum Rule: (f + g)′(x) = f′(x) + g′(x)
Scaling Rule: (c·f)′(x) = c·f′(x) if c is a constant
Product Rule: (f·g)′(x) = f′(x)·g(x) + f(x)·g′(x)
Quotient Rule: (f/g)′(x) = (f′(x)·g(x) − f(x)·g′(x)) / g(x)²
Chain Rule: (f∘g)′(x) = f′(g(x))·g′(x)
The chain rule is commonly used when f is a function of g and g is dependent on x.
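If you ever doubt one of these rules, a finite-difference check settles it quickly. A minimal sketch (not from the slides; the particular f and g are arbitrary examples) for the product and chain rules:

```python
# A minimal sketch (not from the slides): checking the product and chain
# rules against a central finite-difference derivative.
import math

h = 1e-6
def num_deriv(fn, x):
    return (fn(x + h) - fn(x - h)) / (2 * h)   # central difference

f = math.sin
g = lambda x: x**2 + 1
x = 1.3

# Product rule: (f*g)' = f'*g + f*g'
product = lambda t: f(t) * g(t)
print(num_deriv(product, x), math.cos(x) * g(x) + f(x) * 2 * x)

# Chain rule: (f o g)'(x) = f'(g(x)) * g'(x)
composed = lambda t: f(g(t))
print(num_deriv(composed, x), math.cos(g(x)) * 2 * x)
```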
Exercise
 Melbo claims that the following function is continuous and differentiable at the given point for a secret value of its two constants. Find them!
 Use the following identity and the derivative rules to show the claimed result.


Multivariate Functions

[Figure: three surface plots of functions of two variables, illustrating a saddle, a minimum, and a maximum.]
Multivariate derivatives aka Gradients
Consider a function f(x, y) from ℝ² to ℝ.
Trick: convert the problem into analyzing functions as we already know them.
∂f/∂x at a point is the derivative of f w.r.t. x, treating y as a constant.
The sign of ∂f/∂x tells us whether f will go up or down if we increase x slightly (keeping y constant).
The magnitude of ∂f/∂x tells us how sharply f changes upon changing x.
∂f/∂y at a point is the derivative of f w.r.t. y, treating x as a constant.
The sign of ∂f/∂y tells us whether f will go up or down if we increase y slightly (keeping x constant).
The magnitude of ∂f/∂y tells us how sharply f changes upon changing y.
The vector (∂f/∂x, ∂f/∂y) is called the gradient ∇f of f.
"Aha! So, a gradient is like a bunch of coordinate-wise derivatives arranged in the form of a vector!"
For a function whose input has d coordinates, we simply repeat the process for each coordinate to define the gradient as ∇f = (∂f/∂x₁, …, ∂f/∂x_d).
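Here is a minimal Python sketch (not from the slides; the example function is arbitrary) of exactly this recipe: differentiate along one coordinate at a time, holding the other fixed.

```python
# A minimal sketch (not from the slides): approximating the gradient of a
# two-variable function by coordinate-wise central differences.
def f(x, y):
    return x**2 + 3 * x * y + y**2     # an arbitrary example function

def gradient(f, x, y, h=1e-6):
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)   # y held constant
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)   # x held constant
    return (dfdx, dfdy)

print(gradient(f, 1.0, 2.0))
# Exact gradient is (2x + 3y, 3x + 2y) = (8, 7) at the point (1, 2)
```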
Gradients
"The gradient vector only tells me how the function value changes if I change one coordinate keeping all other coordinates fixed. What if I want to change multiple coordinates at the same time?"

[Figure: the surface of f near a point (x₀, y₀), with the two axis-aligned slices f(x, y₀) and f(x₀, y) highlighted.]

Stationary points are those points where the gradient vector is all zeros, i.e., the function looks flat in all directions.
As in the 1D case, maxima, minima, and saddle points are all stationary.
Steepest ascent
f(x + δ·u) ≈ f(x) + δ·⟨∇f(x), u⟩, where ‖u‖ = 1 and δ > 0, if δ is "small".
(A fancy way of saying that, for a small step, the change in f is roughly the dot product of the step with the gradient.)
Claim: the direction of the gradient vector offers the biggest increase in function value out of all directions.
Proof: Suppose we are only allowed to take a step v of length δ, i.e., ‖v‖ = δ. Recall that ⟨∇f(x), v⟩ = ‖∇f(x)‖·‖v‖·cos θ, where θ is the angle between the vectors ∇f(x) and v. To get the max increase in value, we must increase cos θ as much as possible, which happens when θ = 0, i.e., v and ∇f(x) point in the same direction.
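An easy empirical check (not from the slides; the function and the point are arbitrary choices): sample many random unit directions and see which one increases f the most for a small step.

```python
# A minimal sketch (not from the slides): for a small step, moving along
# the gradient direction should increase f the most.
import math, random

def f(x, y):
    return x**2 + 3 * x * y + y**2

def grad_f(x, y):
    return (2 * x + 3 * y, 3 * x + 2 * y)   # exact gradient

x0, y0, delta = 1.0, 2.0, 1e-3
gx, gy = grad_f(x0, y0)
norm = math.hypot(gx, gy)

best_dir, best_gain = None, -float("inf")
for _ in range(1000):
    theta = random.uniform(0, 2 * math.pi)
    ux, uy = math.cos(theta), math.sin(theta)          # random unit direction
    gain = f(x0 + delta * ux, y0 + delta * uy) - f(x0, y0)
    if gain > best_gain:
        best_dir, best_gain = (ux, uy), gain

print("best random direction:", best_dir)
print("gradient direction:   ", (gx / norm, gy / norm))
# The best random direction should be very close to the gradient direction.
```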
Steepest descent
f(x + δ·u) ≈ f(x) + δ·⟨∇f(x), u⟩, where ‖u‖ = 1 and δ > 0, if δ is "small".
(A fancy way of saying the same first-order approximation as before.)
"I have a feeling this result will be very useful when we wish to minimize loss functions."
"Indeed! This simple-looking 2-line result is the key to powerful ML algorithms such as gradient descent and backpropagation."
Claim: the direction opposite to the gradient vector offers the biggest decrease in function value out of all directions.
Proof: Suppose we are only allowed to take a step v of length δ, i.e., ‖v‖ = δ. Recall that ⟨∇f(x), v⟩ = ‖∇f(x)‖·‖v‖·cos θ, where θ is the angle between the vectors ∇f(x) and v. To get the max decrease in value, we must decrease cos θ as much as possible, which happens when θ = π, i.e., v and ∇f(x) point in opposite directions.
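This is all gradient descent does: repeatedly take a small step against the gradient. A minimal sketch (not from the slides; the convex example function and the step size are arbitrary choices):

```python
# A minimal sketch (not from the slides): plain gradient descent on the
# convex example f(x, y) = (x - 3)**2 + 2*(y + 1)**2, minimized at (3, -1).
def grad_f(x, y):
    return (2 * (x - 3), 4 * (y + 1))

x, y = 0.0, 0.0          # arbitrary starting point
step = 0.1               # step size ("learning rate")
for t in range(200):
    gx, gy = grad_f(x, y)
    x, y = x - step * gx, y - step * gy   # move opposite to the gradient

print(x, y)              # ≈ (3.0, -1.0)
```

In ML, f would be the average loss and the input would be the parameter vector w; the step size has to be small enough for the first-order approximation above to remain trustworthy.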
A Toy Example – Function output values
In this discrete toy example, we can calculate the gradient at any grid point using finite differences over its neighbouring cells.

Function values f(x, y) on the grid (x = 0…8 left to right, y = 0…6 bottom to top):

y = 6:  3 3 3 3 3 3 3 3 3
y = 5:  2 2 2 3 4 3 3 2 1
y = 4:  1 1 1 3 3 3 1 1 1
y = 3:  1 0 1 1 2 1 1 0 1
y = 2:  1 1 1 3 3 3 1 1 1
y = 1:  1 2 3 3 4 3 2 2 2
y = 0:  3 3 3 3 3 3 3 3 3
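A minimal sketch of one such finite-difference scheme (central differences are an assumption here, not necessarily the exact formula the slide used):

```python
# A minimal sketch (not from the slides): central-difference gradients on
# the toy grid above (grid[y][x], with y = 0 as the bottom row).
grid = [
    [3, 3, 3, 3, 3, 3, 3, 3, 3],   # y = 0
    [1, 2, 3, 3, 4, 3, 2, 2, 2],   # y = 1
    [1, 1, 1, 3, 3, 3, 1, 1, 1],   # y = 2
    [1, 0, 1, 1, 2, 1, 1, 0, 1],   # y = 3
    [1, 1, 1, 3, 3, 3, 1, 1, 1],   # y = 4
    [2, 2, 2, 3, 4, 3, 3, 2, 1],   # y = 5
    [3, 3, 3, 3, 3, 3, 3, 3, 3],   # y = 6
]

def grad(x, y):
    dfdx = (grid[y][x + 1] - grid[y][x - 1]) / 2   # change along x
    dfdy = (grid[y + 1][x] - grid[y - 1][x]) / 2   # change along y
    return (dfdx, dfdy)

print(grad(1, 3))   # (0.0, 0.0) -> stationary (a minimum here)
print(grad(4, 3))   # (0.0, 0.0) -> also stationary (a saddle)
print(grad(2, 2))   # a non-zero gradient pointing "uphill"
```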
A Toy Example – Gradients
Computing the gradient at every grid point in the same way, we can visualize these gradients using simple arrows as well.

[Figure: the grid with a gradient arrow at each point; the saddle, the minima, and the maxima are marked.]
A Toy Example – Gradients
Gradients converge toward a maximum from all directions.
Gradients diverge away from a minimum in all directions.
At saddle points, different things happen along different directions.

[Figure: the same arrow plot, annotated with these three behaviours.]
Rules of gradients
In the following we have f, g: ℝᵈ → ℝ and x ∈ ℝᵈ.
Sum Rule: ∇(f + g)(x) = ∇f(x) + ∇g(x)
Scaling Rule: ∇(c·f)(x) = c·∇f(x) if c is a constant that does not vary with x
Product Rule: ∇(f·g)(x) = g(x)·∇f(x) + f(x)·∇g(x)
Quotient Rule: ∇(f/g)(x) = (g(x)·∇f(x) − f(x)·∇g(x)) / g(x)²
Chain Rule: for h: ℝ → ℝ, ∇(h∘f)(x) = h′(f(x))·∇f(x)
A few useful identities
If c is a constant that does not vary with x: ∇ₓ c = 0.
If a is a constant vector that does not vary with x: ∇ₓ (aᵀx) = a.
If A is a constant square matrix that does not vary with x: ∇ₓ (xᵀAx) = (A + Aᵀ)x, which equals 2Ax if A is symmetric.

If a is a constant vector that does not vary with x: ∇ₓ ‖x − a‖² = 2(x − a).
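A minimal numerical check of the matrix identity (not from the slides; the random A and x are just a test case):

```python
# A minimal sketch (not from the slides): numerically verifying the identity
# grad_x (x^T A x) = (A + A^T) x on a small random example.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))       # a constant, not necessarily symmetric, matrix
x = rng.standard_normal(3)

def f(x):
    return x @ A @ x

h = 1e-6
numeric = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(3)])
analytic = (A + A.T) @ x

print(np.allclose(numeric, analytic, atol=1e-4))   # True
```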


Exercise
 Find the gradient of the multivariate function …
 Hint: try to derive this from first principles (easy if using one of the identities).
 Find the gradient of the affine multivariate function defined as …
 Find the gradient of ‖x − a‖², where a is a constant vector.
 Hint: use the fact that for any vector v we always have ‖v‖² = vᵀv, and the fact that the dot product distributes over addition, i.e., aᵀ(b + c) = aᵀb + aᵀc.
 Alternatively, use the chain rule with g(x) = x − a.
Applying Calculus to Find My Salary
Let's simplify by hiding the bias: take x ← (x, 1) and w ← (w, b), so that wᵀx + b becomes just wᵀx.
"Be careful though – the similar-looking expression xᵀx is equal to the squared Euclidean norm ‖x‖² and is just a scalar, whereas x·xᵀ is a square symmetric matrix whose (i, j)-th entry is xᵢxⱼ."
Let ℓ(w) = (y − wᵀx)² be the loss on a single data point; note that wᵀx is just a scalar. Expanding, ℓ(w) = y² − 2y·(xᵀw) + wᵀ(x·xᵀ)w.
∇_w y² = 0, since the y's are the "ground-truth" labels and don't depend on w.
∇_w (−2y·xᵀw) = −2y·x, using the scaling rule and an identity.
∇_w (wᵀ(x·xᵀ)w) = 2(x·xᵀ)w, using an identity and the fact that dot products are symmetric, i.e., aᵀb = bᵀa.
Applying Calculus to Find My Salary
By the sum rule, the gradient of the average loss is ∇L(w) = (1/n) Σᵢ (2xᵢxᵢᵀw − 2yᵢxᵢ).
As L is differentiable, the minimum must be one of the stationary points, i.e., a solution of (Σᵢ xᵢxᵢᵀ)w = Σᵢ yᵢxᵢ.
We arrange the feature vectors in a matrix X (one row per person) and the true outputs/ground-truth labels in a vector y, so this condition reads XᵀXw = Xᵀy.
If XᵀX is invertible, then the solution must be w = (XᵀX)⁻¹Xᵀy.
"But what if XᵀX is not invertible?"
"Then we need to apply other learning techniques such as regularization, which we will study later."
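To make this concrete, here is a minimal sketch (not from the slides) that builds X and y from the toy salary table at the start and solves the least-squares problem. For this particular table XᵀX is in fact not invertible (each one-hot group sums to the bias column), so the sketch uses a least-squares solver rather than a plain matrix inverse:

```python
# A minimal sketch (not from the slides): least-squares fit on the toy
# salary table, with the bias hidden by appending a constant 1 feature.
import numpy as np

# columns: age, education one-hot (HS, BT, MT, PhD), gender one-hot (M, F, O), bias
X = np.array([
    [45, 0, 0, 1, 0, 1, 0, 0, 1],
    [22, 0, 1, 0, 0, 0, 1, 0, 1],
    [28, 1, 0, 0, 0, 1, 0, 0, 1],
    [34, 0, 0, 1, 0, 0, 1, 0, 1],
    [47, 1, 0, 0, 0, 0, 1, 0, 1],
    [55, 0, 0, 1, 0, 1, 0, 0, 1],
    [49, 0, 0, 0, 1, 0, 1, 0, 1],
    [27, 0, 1, 0, 0, 0, 0, 1, 1],
    [25, 0, 1, 0, 0, 1, 0, 0, 1],
], dtype=float)
y = np.array([64475, 34179, 34573, 50882, 38660, 71487, 79430, 34355, 43837], dtype=float)

# X^T X is singular here, so solve the least-squares problem via SVD instead
# of forming (X^T X)^{-1} X^T y directly.
w, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print(w)
print(X @ w)   # predicted salaries for the training rows
```

np.linalg.lstsq returns the minimum-norm solution among the many w that make the gradient zero; regularization, mentioned above, is another way of breaking that tie.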
Summary
Calculus techniques play a major role in designing ML algorithms.
For a function f: ℝ → ℝ, its derivative tells us how the function output value will vary if we make a tiny change to its input value.
The sign tells us whether the output will go up or down if we increase the input a little bit.
The magnitude tells us how sensitive the output is to changes in the input.
Stationary points (max, min, saddle) are where the derivative is zero.
For a function f: ℝᵈ → ℝ, its gradient ∇f plays the same role, and its i-th coordinate tells us how the output value will change if the i-th coordinate of the input is changed keeping all other coordinates fixed.
The gradient gives us the direction of steepest ascent for the function output value, and the direction opposite to it is the direction of steepest descent.
Stay Amazing!
Hang out with you in the next one!
