Differentiation, Partial Differentiation & Gradients
Mathematics for
Machine Learning
José David Vega Sánchez
[email protected]
2025
Vector Calculus
Outline
1. Introduction to Gradient Descent
2. Differentiation of Univariate Functions
3. Partial Differentiation & Gradients
1. Introduction to Gradient Descent

Regression problem: predict a real-valued output.

[Figure: housing prices (Portland, OR) — price ($) in 1000's vs. size in feet²]

Training set of housing prices:

Size in feet² (x)   Price ($) in 1000's (y)
2104                460
1416                232
1534                315
 852                178
…                   …

Notation:
m = number of training examples
x's = "input" variable / features
y's = "output" variable / "target" variable

The learning algorithm takes the training set and produces a hypothesis h (shorthand: h(x)), which maps the size of a house x to an estimated price — the estimated value of y. This is linear regression with one variable, also called univariate linear regression.
1. Introduction to Gradient Descent

Training set:

Size in feet² (x)   Price ($) in 1000's (y)
2104                460
1416                232
1534                315
 852                178
…                   …

Hypothesis: h_\theta(x) = \theta_0 + \theta_1 x

\theta's: parameters

How do we choose the \theta's?
1. Introduction to Gradient Descent

With m the number of training examples, let's define the mean squared error cost function:

    J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

[Figure: training points (x, y) with the hypothesis line]

Simplified version (set \theta_0 = 0):

    Hypothesis: h_\theta(x) = \theta_1 x
    Parameter: \theta_1
    Cost function: J(\theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( \theta_1 x^{(i)} - y^{(i)} \right)^2
    Goal: minimize J(\theta_1) over \theta_1
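The cost function above can be written directly in code. A minimal sketch in Python with NumPy, using the training set from the slides (the function name `cost` is an illustrative choice):

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """Mean squared error cost J(theta0, theta1) for univariate linear regression."""
    m = len(x)                      # number of training examples
    h = theta0 + theta1 * x         # hypothesis h_theta(x) for every example
    return np.sum((h - y) ** 2) / (2 * m)

# Training set from the slides (size in feet^2, price in $1000's)
x = np.array([2104, 1416, 1534, 852], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)

print(cost(0.0, 0.0, x, y))  # J for theta0 = theta1 = 0
```

A better-fitting parameter choice gives a lower J, which is exactly what gradient descent will exploit.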
1. Introduction to Gradient Descent

[Figures: left, h_\theta(x) for fixed \theta (a function of x); right, the cost J as a function of the parameter. With m = 3 examples in the training set, the error is the vertical distance between each point (x, y) and the hypothesis line. Further panels show price ($) in 1000's vs. size. Animation: https://www.kaggle.com/code/trolukovich/animating-gradien-descent/input]
1. Introduction to Gradient Descent

Gradient descent update rule:

    \theta_j := \theta_j - \alpha \, \nabla_j

where \alpha is the learning rate and \nabla_j is the partial derivative of the cost:

    \nabla_j = \frac{\partial J(\theta_0, \theta_1)}{\partial \theta_j}, \qquad \text{for } j = 0, 1

The gradient collects these partial derivatives:

    \nabla J = \left( \frac{\partial J(\theta_0, \theta_1)}{\partial \theta_0}, \frac{\partial J(\theta_0, \theta_1)}{\partial \theta_1} \right)

[Figure: surface of J(\theta_0, \theta_1) over axes \theta_0 and \theta_1]
1. Introduction to Gradient Descent

[Figures: each panel pairs h_\theta(x) for a fixed choice of parameters (left, a function of x) with the contour plot of J(\theta_0, \theta_1) (right, a function of the parameters). Parameter choices far from the minimum of J give a bad fit to the data; different choices on the same contour give the same bad fit; a choice close to the minimum gives a good fit.]

[Figure: surface plot of J(\theta_0, \theta_1) over axes \theta_0 and \theta_1]
1. Introduction to Gradient Descent

[Figure: surface plot of J(\theta_0, \theta_1); gradient descent takes steps downhill, with step size set by the learning rate \alpha]
1. Introduction to Gradient Descent

Gradient descent algorithm: repeat until convergence

    \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1), \qquad j = 0, 1

updating \theta_0 and \theta_1 simultaneously.

To obtain the partial derivatives, substitute the hypothesis h_\theta(x) = \theta_0 + \theta_1 x into J(\theta_0, \theta_1), expand the square, factor, and simplify:

    \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)

    \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}
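The algorithm with simultaneous updates can be sketched in a few lines of NumPy. This is a minimal illustration, not the course's reference implementation; the toy data (generated from y = 2x + 1) and the hyperparameters are illustrative choices:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, iters=1000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x,
    updating both parameters simultaneously."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        h = theta0 + theta1 * x
        grad0 = np.sum(h - y) / m        # dJ/dtheta0
        grad1 = np.sum((h - y) * x) / m  # dJ/dtheta1
        # Simultaneous update: both gradients are computed before either step.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# Toy data: y = 2x + 1, so the minimum of J is at (theta0, theta1) = (1, 2)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1
t0, t1 = gradient_descent(x, y)
```

Writing both assignments in one tuple statement is what makes the update simultaneous; updating theta0 first and then using the new value inside grad1 would be a subtle bug.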
2. Differentiation of Univariate Functions
Derivative Rules
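The standard rules (sum, product, chain) can be spot-checked numerically with a central-difference approximation. A minimal sketch in Python; the test functions sin x and t² are illustrative choices, not from the slides:

```python
import math

def d(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

f = math.sin           # f(x) = sin x,  f'(x) = cos x
g = lambda t: t ** 2   # g(t) = t^2,    g'(t) = 2t

x = 0.7
# Sum rule: (f + g)' = f' + g'
assert abs(d(lambda t: f(t) + g(t), x) - (math.cos(x) + 2 * x)) < 1e-6
# Product rule: (f g)' = f' g + f g'
assert abs(d(lambda t: f(t) * g(t), x) - (math.cos(x) * g(x) + f(x) * 2 * x)) < 1e-6
# Chain rule: (f o g)' = f'(g(x)) * g'(x)
assert abs(d(lambda t: f(g(t)), x) - math.cos(g(x)) * 2 * x) < 1e-6
```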
Partial Differentiation & Gradients
3. Partial Differentiation & Gradients

Definition of partial differentiation

For a function of multiple variables, a partial derivative represents the rate of change of the function with respect to one variable while keeping all other variables constant.

Notation: if f(x, y) is a function of x and y, the partial derivative with respect to x is denoted \frac{\partial f}{\partial x}.

For f(x, y), the gradient is denoted

    \nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)

[Worked example and its gradient shown on the slide]
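As an illustration (the function below is a hypothetical choice, not the slide's example): for f(x, y) = x²y + y³ the partials are ∂f/∂x = 2xy and ∂f/∂y = x² + 3y². A sketch that checks the analytic gradient against a central-difference approximation, varying one variable while holding the other constant:

```python
def f(x, y):
    return x ** 2 * y + y ** 3

def grad_f(x, y):
    """Analytic gradient of f(x, y) = x^2*y + y^3."""
    return (2 * x * y, x ** 2 + 3 * y ** 2)

def num_grad(f, x, y, h=1e-6):
    """Central-difference gradient: vary one variable, hold the other constant."""
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return (dfdx, dfdy)

print(grad_f(1.0, 2.0))   # (4.0, 13.0)
```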
3. Partial Differentiation & Gradients

Chain rule of partial differentiation

This plays a pivotal role in backpropagation in neural networks.
3. Partial Differentiation & Gradients

Chain rule of partial differentiation

Objective: use the chain rule to understand how changes in the speed of one factor (like P, B, or C) propagate through the system.

Mathematical representation:

    \frac{dC}{dP} = \frac{dC}{dB} \cdot \frac{dB}{dP}
3. Partial Differentiation & Gradients

Application of the chain rule of partial differentiation

Using the chain rule, we calculate the rate of change of the car's speed (C) with respect to the person's speed (P):

    \frac{dC}{dP} = \frac{dC}{dB} \cdot \frac{dB}{dP}
3. Partial Differentiation & Gradients

Chain rule of partial differentiation: generalization

Suppose w depends on intermediate variables (say x and y) that each depend on t. What is \frac{\partial w}{\partial t}?

Summing the partial derivatives along each path from t to w, we finally obtain

    \frac{\partial w}{\partial t} = \frac{\partial w}{\partial x} \frac{\partial x}{\partial t} + \frac{\partial w}{\partial y} \frac{\partial y}{\partial t}
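The generalized chain rule can be spot-checked numerically. A sketch assuming illustrative functions x(t) = t², y(t) = sin t, and w = x·y (these are not the slides' example):

```python
import math

def x(t): return t ** 2
def y(t): return math.sin(t)
def w(x_val, y_val): return x_val * y_val   # w = x * y

def dw_dt(t):
    """Chain rule: dw/dt = (dw/dx)(dx/dt) + (dw/dy)(dy/dt)."""
    dw_dx = y(t)            # for w = x*y, dw/dx = y
    dw_dy = x(t)            # and dw/dy = x
    dx_dt = 2 * t
    dy_dt = math.cos(t)
    return dw_dx * dx_dt + dw_dy * dy_dt

# Numerical check at t = 1.0: differentiate the composition w(x(t), y(t)) directly
h = 1e-6
t = 1.0
numeric = (w(x(t + h), y(t + h)) - w(x(t - h), y(t - h))) / (2 * h)
```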
Thanks