Tut02 - Calculus Crash Course
(Recap: rows such as $[49, 0, 0, 0, 1, 0, 1, 0, 79430]$, $[27, 0, 1, 0, 0, 0, 0, 1, 34355]$, and $[25, 0, 1, 0, 0, 1, 0, 0, 43837]$ are feature vectors; the $i$-th row is the feature vector for the $i$-th person in our training set.)
Calculus Crash Course
It is said that both Gottfried Leibniz and Isaac Newton felt that the other person's work was a bit … derivative.
Derivatives
For this simple case, we can calculate the discrepancy exactly.
(Figure: Case 3, the function plotted over $x \in [-2, 2]$.)
Derivatives – Behind the Scenes
$f(x + \delta) \approx f(x) + \delta \cdot f'(x)$
Can be obtained as a corollary of Taylor's theorem.
Rearranging this equation gives me $f(x + \delta) - f(x) \approx \delta \cdot f'(x)$.
Holds only if $\delta$ is "small"; otherwise this may not hold.
If $\delta$ has the same sign as $f'(x)$, then $f(x + \delta) \geq f(x)$.
If $\delta$ has the opposite sign as $f'(x)$, then $f(x + \delta) \leq f(x)$.
The derivative tells us two things.
Its sign tells us in which direction the function value will increase. Interpret +ve as "right" and −ve as "left". E.g., $f'(x_0) < 0$ tells me that $f$ will increase if I decrease $x$ from $x_0$ a bit.
Its magnitude gives an idea of how much the function value will change.
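As a quick numerical illustration of the approximation above, here is a minimal Python sketch; the test function $f(x) = x^2$ and the step sizes are my own choices, not from the slides:

```python
# Check the first-order Taylor approximation f(x + delta) ≈ f(x) + delta * f'(x)
# on an example function, f(x) = x**2, whose derivative f'(x) = 2x we know exactly.

def f(x):
    return x ** 2

def f_prime(x):
    return 2 * x

x0 = 1.5
for delta in [0.5, 0.1, 0.01]:
    exact_change = f(x0 + delta) - f(x0)       # how much f really changes
    predicted_change = delta * f_prime(x0)     # what the approximation predicts
    print(f"delta={delta:5}: exact={exact_change:.6f}, "
          f"predicted={predicted_change:.6f}, "
          f"discrepancy={exact_change - predicted_change:.6f}")
```

The discrepancy shrinks (here, as $\delta^2$) as $\delta$ gets smaller, which is exactly the "holds only if $\delta$ is small" caveat.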
Stationary points
These are places where the derivative vanishes, i.e. $f'(x)$ is 0, telling us that the function looks flat around that point.
These could be a local minimum, a global minimum, a maximum, or a saddle.
The function does not look flat to me at this point!
Yup! In fact, it looks like the function value will increase both to the left and to the right.
The slope of the tangent line at a stationary point is 0. This is why, at small scales, we have the function looking flat around that point.
The function really does look flat at this point. What about some other value of $x$ where $f'(x) = 0$?
(Figure: plots marking a Saddle, a Minimum, and a Maximum, with the tangent line/plane drawn at each stationary point.)
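To make the classification concrete, here is a small Python sketch that tests stationary points numerically with finite differences; the example functions and tolerance thresholds are my own choices, not from the slides:

```python
# Classify stationary points using finite-difference first and second derivatives.
# Example functions (my own choices): f(x) = x**3 - 3x has a maximum at x = -1 and
# a minimum at x = +1, while g(x) = x**3 has a flat, saddle-like point at x = 0.

def classify(fn, x0, h=1e-4):
    """Classify a stationary point x0 of fn."""
    d1 = (fn(x0 + h) - fn(x0 - h)) / (2 * h)            # first derivative
    d2 = (fn(x0 + h) - 2 * fn(x0) + fn(x0 - h)) / h**2  # second derivative
    assert abs(d1) < 1e-6, "x0 is not a stationary point"
    if d2 > 1e-3:
        return "minimum"
    if d2 < -1e-3:
        return "maximum"
    return "saddle / flat"

f = lambda x: x ** 3 - 3 * x
g = lambda x: x ** 3
print(classify(f, -1.0))  # maximum
print(classify(f, +1.0))  # minimum
print(classify(g, 0.0))   # saddle / flat
```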
Multivariate derivatives aka Gradients
Consider a function $f$ from $\mathbb{R}^2$ to $\mathbb{R}$.
Trick: convert the problem into analyzing 1-D functions, as we know them.
$\frac{\partial f}{\partial x}$ at a point $(x_0, y_0)$ is the derivative of $f(x, y_0)$ w.r.t. $x$, treating $y$ as constant.
The sign of $\frac{\partial f}{\partial x}$ tells us whether $f$ will increase or decrease if we increase $x$ slightly (keeping $y$ constant).
The magnitude of $\frac{\partial f}{\partial x}$ tells us how sharply $f$ changes upon changing $x$.
$\frac{\partial f}{\partial y}$ at a point $(x_0, y_0)$ is the derivative of $f(x_0, y)$ w.r.t. $y$, treating $x$ as constant.
The sign of $\frac{\partial f}{\partial y}$ tells us whether $f$ will increase or decrease if we increase $y$ slightly (keeping $x$ constant).
The magnitude of $\frac{\partial f}{\partial y}$ tells us how sharply $f$ changes upon changing $y$.
The vector $\left[\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right]^\top$ is called the gradient of $f$, written $\nabla f$.
Aha! So, a gradient is like a bunch of coordinate-wise derivatives arranged in the form of a vector!
For a function where the input has $d$ coordinates, we simply repeat the process for each coordinate to define the gradient as $\nabla f(\mathbf{x}) = \left[\frac{\partial f}{\partial x_1}(\mathbf{x}), \ldots, \frac{\partial f}{\partial x_d}(\mathbf{x})\right]^\top$.
But the gradient vector only tells me how the function changes one coordinate at a time, keeping the other fixed…
(Figure: the 1-D slices $f(x, y_0)$ and $f(x_0, y)$ through the point $(x_0, y_0)$.)
$f(\mathbf{v} + \boldsymbol{\delta}) \approx f(\mathbf{v}) + \boldsymbol{\delta}^\top \nabla f(\mathbf{v})$,
where $\mathbf{v} = (x_0, y_0)$ and $\boldsymbol{\delta} = (\delta_x, \delta_y)$, if $\boldsymbol{\delta}$ is "small".
A fancy way of saying $\boldsymbol{\delta}^\top \nabla f(\mathbf{v}) = \delta_x \cdot \frac{\partial f}{\partial x}(\mathbf{v}) + \delta_y \cdot \frac{\partial f}{\partial y}(\mathbf{v})$.
I have a feeling this result will be very useful when we wish to minimize loss functions.
Indeed! This simple-looking 2-line result is the key to powerful ML algorithms such as gradient descent and backpropagation.
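Here is a minimal Python sketch of both ideas on this slide — building the gradient out of coordinate-wise derivatives and checking the 2-line result — for an example function $f(x, y) = x^2 + 3xy$ of my own choosing:

```python
# Build a gradient from coordinate-wise (partial) derivatives and verify
# f(v + delta) ≈ f(v) + delta^T grad f(v) for a small delta.
import numpy as np

def f(v):
    x, y = v
    return x ** 2 + 3 * x * y          # example function, grad = [2x + 3y, 3x]

def numerical_gradient(fn, v, h=1e-5):
    """One central-difference 1-D derivative per coordinate, stacked into a vector."""
    grad = np.zeros_like(v, dtype=float)
    for i in range(len(v)):
        e = np.zeros_like(v, dtype=float)
        e[i] = h
        grad[i] = (fn(v + e) - fn(v - e)) / (2 * h)
    return grad

v = np.array([1.0, 2.0])
delta = np.array([0.01, -0.02])
grad = numerical_gradient(f, v)
print("gradient at v:    ", grad)                 # ≈ [8.0, 3.0]
print("exact change:     ", f(v + delta) - f(v))
print("predicted change: ", delta @ grad)
```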
(Figure: a grid of integer function values over the lattice $x = 0, \ldots, 8$, $y = 0, \ldots, 6$, used as the discrete toy example below.)
A Toy Example – Gradients
In this discrete toy example, we can calculate the gradient at a point $(x, y)$ as $\nabla f(x, y) \approx \left[\frac{\partial f}{\partial x}(x, y), \frac{\partial f}{\partial y}(x, y)\right]^\top$, where $\frac{\partial f}{\partial x}(x, y) \approx f(x+1, y) - f(x, y)$ and $\frac{\partial f}{\partial y}(x, y) \approx f(x, y+1) - f(x, y)$, i.e. differences over a unit step on the grid.
(Figure: the grid of values with the Maximum, Minimum, and Saddle points marked; axes $x = 0, \ldots, 8$ and $y = 0, \ldots, 6$.)
A Toy Example – Gradients
Gradients converge toward maxima from all directions.
(Figure: the same grid as above, illustrating how the gradients at nearby points all point toward the Maximum.)
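The exact grid values are not reproduced in these notes, so the Python sketch below builds a small made-up grid of its own and applies the same unit-step difference formula; it then checks that the resulting gradients do point toward the grid's maximum:

```python
# Discrete gradients on an integer grid, using forward differences with unit step:
# grad f(x, y) ≈ [f(x+1, y) - f(x, y),  f(x, y+1) - f(x, y)].
# The grid below is a made-up "bump" whose maximum sits near (x, y) = (4, 3).
import numpy as np

xs, ys = np.meshgrid(np.arange(9), np.arange(7))   # x = 0..8, y = 0..6
F = 5 - ((xs - 4) ** 2 + (ys - 3) ** 2) / 5.0      # F[y, x] = f(x, y)

def discrete_grad(F, x, y):
    """Forward-difference gradient at grid point (x, y)."""
    return np.array([F[y, x + 1] - F[y, x],        # df/dx
                     F[y + 1, x] - F[y, x]])       # df/dy

for (x, y) in [(1, 1), (7, 2), (2, 5)]:
    print(f"gradient at ({x}, {y}) =", discrete_grad(F, x, y))
    # Each printed vector points roughly toward the maximum at (4, 3).
```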
Rules of gradients
In the following, we have $f, g: \mathbb{R}^d \to \mathbb{R}$ and $h: \mathbb{R} \to \mathbb{R}$.
Sum Rule: $\nabla (f + g)(\mathbf{x}) = \nabla f(\mathbf{x}) + \nabla g(\mathbf{x})$
Scaling Rule: $\nabla (c \cdot f)(\mathbf{x}) = c \cdot \nabla f(\mathbf{x})$, if $c$ is a constant that does not vary with $\mathbf{x}$
Product Rule: $\nabla (f \cdot g)(\mathbf{x}) = g(\mathbf{x}) \cdot \nabla f(\mathbf{x}) + f(\mathbf{x}) \cdot \nabla g(\mathbf{x})$
Quotient Rule: $\nabla \left(\frac{f}{g}\right)(\mathbf{x}) = \frac{g(\mathbf{x}) \cdot \nabla f(\mathbf{x}) - f(\mathbf{x}) \cdot \nabla g(\mathbf{x})}{g(\mathbf{x})^2}$
Chain Rule: $\nabla (h \circ f)(\mathbf{x}) = h'(f(\mathbf{x})) \cdot \nabla f(\mathbf{x})$
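As a quick sanity check, the Python sketch below verifies the Product Rule for gradients numerically; the example functions $f(x, y) = x^2 + y$ and $g(x, y) = xy$ are my own choices:

```python
# Numerically verify the product rule for gradients:
# grad(f * g) = g * grad(f) + f * grad(g).
import numpy as np

def num_grad(fn, v, h=1e-5):
    """Coordinate-wise central-difference gradient."""
    grad = np.zeros_like(v, dtype=float)
    for i in range(len(v)):
        e = np.zeros_like(v, dtype=float)
        e[i] = h
        grad[i] = (fn(v + e) - fn(v - e)) / (2 * h)
    return grad

f = lambda v: v[0] ** 2 + v[1]   # f(x, y) = x^2 + y
g = lambda v: v[0] * v[1]        # g(x, y) = x * y

v = np.array([1.5, -2.0])
lhs = num_grad(lambda u: f(u) * g(u), v)                 # grad of the product
rhs = g(v) * num_grad(f, v) + f(v) * num_grad(g, v)      # product-rule prediction
print(lhs, rhs)   # the two agree up to finite-difference error
```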
Rules of derivatives
In the following, we have $f, g: \mathbb{R} \to \mathbb{R}$.
Sum Rule: $(f + g)'(x) = f'(x) + g'(x)$
Scaling Rule: $(c \cdot f)'(x) = c \cdot f'(x)$, if $c$ is a constant that does not vary with $x$
Product Rule: $(f \cdot g)'(x) = f'(x) \cdot g(x) + f(x) \cdot g'(x)$
Quotient Rule: $\left(\frac{f}{g}\right)'(x) = \frac{f'(x) \cdot g(x) - f(x) \cdot g'(x)}{g(x)^2}$
Chain Rule: $(f \circ g)'(x) = f'(g(x)) \cdot g'(x)$
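And here is a short numerical check of the Chain Rule for 1-D derivatives, again with example functions of my own choosing ($f(u) = \sin u$, $g(x) = x^2 + 1$):

```python
# Numerically verify the chain rule: (f o g)'(x) = f'(g(x)) * g'(x).
import math

def num_deriv(fn, x, h=1e-6):
    """Central-difference approximation of a 1-D derivative."""
    return (fn(x + h) - fn(x - h)) / (2 * h)

f = math.sin
g = lambda x: x ** 2 + 1

x0 = 0.7
lhs = num_deriv(lambda x: f(g(x)), x0)          # derivative of the composition
rhs = num_deriv(f, g(x0)) * num_deriv(g, x0)    # chain-rule prediction
print(lhs, rhs)   # both ≈ cos(x0**2 + 1) * 2 * x0
```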
A few useful identities
If $c$ is a constant that does not vary with $x$, then $\frac{dc}{dx} = 0$.