Differentiation, Partial Differentiation & Gradients


Applied

Mathematics for
Machine Learning
José David Vega Sánchez
[email protected]
2025
Vector Calculus
Outline

1. Introduction to Gradient Descent

2. Differentiation of Univariate Functions

3. Partial Differentiation & Gradients


Introduction to Gradient Descent
1. Introduction to Gradient Descent

Housing Prices (Portland, OR)
[Scatter plot: Price (in 1000s of dollars) vs. Size (feet²); e.g. a house of about 1250 feet² priced near $220k]
Regression problem: predict a real-valued output.
1. Introduction to Gradient Descent

Training set of housing prices (Portland, OR):

Size in feet² (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …

Notation:
m = number of training examples
x's = "input" variable / features
y's = "output" variable / "target" variable
(x, y) → one training example
(x^(i), y^(i)) → the i-th training example
1. Introduction to Gradient Descent

How do we represent h?

Training Set → Learning Algorithm → h (hypothesis)
Size of house (x) → h → estimated price (the estimated value of y)

Shorthand: h(x)
Linear regression with one variable is also called univariate linear regression.
1. Introduction to Gradient Descent

Training set: size in feet² (x) against price ($) in 1000's (y), with m examples (2104 → 460, 1416 → 232, 1534 → 315, 852 → 178, …).

Hypothesis: h(x) = θ0 + θ1·x
θ's: the parameters of the model
How do we choose the θ's?
1. Introduction to Gradient Descent

With m the number of training examples, define the cost function

J(θ0, θ1) = (1/(2m)) · Σ_{i=1..m} ( h(x^(i)) − y^(i) )²

This is the mean squared error cost function: it averages the squared vertical distances between the hypothesis h(x) and the targets y over the training set.
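The cost function above can be sketched directly in code. A minimal version, using the housing numbers from the slides (the function name `cost` is my choice, not from the course):

```python
# Mean squared error cost for univariate linear regression:
# J(theta0, theta1) = 1/(2m) * sum((theta0 + theta1*x_i - y_i)^2)

def cost(theta0, theta1, xs, ys):
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# Training set from the slides (size in feet^2, price in $1000s)
xs = [2104, 1416, 1534, 852]
ys = [460, 232, 315, 178]

print(cost(0.0, 0.0, xs, ys))  # cost of the all-zero hypothesis
```

A hypothesis that passes exactly through every training point has cost 0; any miss makes the cost strictly positive.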
1. Introduction to Gradient Descent

Simplified setting (set θ0 = 0):

Hypothesis: h(x) = θ1·x
Parameter: θ1
Cost function: J(θ1)
Goal: minimize J(θ1) over θ1
1. Introduction to Gradient Descent

[Paired plots: on the left, h(x) for a fixed θ1 is a function of x, drawn through the m = 3 examples in the training set; the vertical gaps between the line and the data points are the errors. On the right, J(θ1) is plotted as a function of the parameter θ1.]
1. Introduction to Gradient Descent

[Left: for fixed θ0 and θ1, h(x) plots price ($) in 1000's against size in feet² (x). Right: J(θ0, θ1) is a function of both parameters.]

How do we plot J over two parameters? As a 3D surface or a contour plot over the (θ0, θ1) plane.
1. Introduction to Gradient Descent
Slope

https://www.kaggle.com/code/trolukovich/animating-gradien-descent/input
1. Introduction to Gradient Descent
Gradient descent update rule (α is the learning rate):

θj := θj − α·∇j,  where  ∇j = ∂J(θ0, θ1)/∂θj,  for j = 0, 1

The gradient collects both partial derivatives:

∇J(θ0, θ1) = [ ∂J(θ0, θ1)/∂θ0 , ∂J(θ0, θ1)/∂θ1 ]
1. Introduction to Gradient Descent
[Sequence of paired plots: on the left, h(x) for fixed θ0, θ1 as a function of x; on the right, the contour plot of J as a function of the parameters. Several parameter choices give a bad fit and sit far from the minimum of J; a good fit corresponds to a point close to the minimum.]

We need an algorithm to automatically choose the values of θ0 and θ1 that minimize the cost function J.
1. Introduction to Gradient Descent

[3D surface plot of J(θ0, θ1) over the (θ0, θ1) plane]
1. Introduction to Gradient Descent

Gradient descent algorithm:

repeat until convergence {
    θj := θj − α · ∂J(θ0, θ1)/∂θj        (for j = 0 and j = 1)
}

α is the learning rate. Update θ0 and θ1 simultaneously: compute both partial derivatives before overwriting either parameter.
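The algorithm above can be sketched in a few lines of plain Python. This is a minimal illustration (function name, toy data, and hyperparameter values are my choices, not the course's):

```python
# Batch gradient descent for h(x) = theta0 + theta1*x, minimizing
# J(theta0, theta1) = 1/(2m) * sum((h(x_i) - y_i)^2).

def gradient_descent(xs, ys, alpha=0.1, iters=5000):
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        errs = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errs) / m                              # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errs, xs)) / m   # dJ/dtheta1
        # Simultaneous update: both gradients are computed
        # before either parameter is overwritten.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

# Toy data generated from y = 1 + 2x, so the minimum of J
# is at theta0 = 1, theta1 = 2.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
theta0, theta1 = gradient_descent(xs, ys)
print(theta0, theta1)  # approaches (1.0, 2.0)
```

Note that the learning rate α must be small enough for the updates to converge; too large a value makes the parameters oscillate or diverge.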
1. Introduction to Gradient Descent

[3D surface of J(θ0, θ1) with several valleys: starting points in different basins descend to different minima]

Bad news: gradient descent is susceptible to local optima. (For the convex MSE cost of linear regression, though, the only local minimum is the global one.)
1. Introduction to Gradient Descent

[Animation, one slide per step: the left plot shows h(x) for the current fixed θ0, θ1 as a function of x; the right contour plot of J, a function of the parameters, traces the descent toward the minimum.]
Differentiation of Univariate Functions
2. Differentiation of Univariate Functions
Derivative
The derivative is a fundamental concept in calculus that measures the rate at which a function changes. It gives the slope of a function's graph at any given point:

f'(x) = lim_{h→0} [ f(x + h) − f(x) ] / h
2. Differentiation of Univariate Functions
Example: Derivative

Steps: substitute into the limit definition, expand, factor, simplify, and take the limit.
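The worked equations on this slide were images and did not survive extraction. As one concrete instance of those steps, assume f(x) = x² (my choice of function, not necessarily the slide's):

```latex
\begin{aligned}
f'(x) &= \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}
       = \lim_{h \to 0} \frac{(x+h)^2 - x^2}{h} && \text{substitute} \\
      &= \lim_{h \to 0} \frac{x^2 + 2xh + h^2 - x^2}{h} && \text{expand} \\
      &= \lim_{h \to 0} \frac{h\,(2x + h)}{h} && \text{factor} \\
      &= \lim_{h \to 0} \,(2x + h) && \text{simplify} \\
      &= 2x && \text{take the limit}
\end{aligned}
```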
2. Differentiation of Univariate Functions
Derivative Rules
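The rules themselves were images on the slide; the standard ones such a slide typically lists are (assumed, not read from the slide):

```latex
\begin{aligned}
\text{Sum:}      &\quad (f + g)' = f' + g' \\
\text{Product:}  &\quad (fg)' = f'g + fg' \\
\text{Quotient:} &\quad \left(\frac{f}{g}\right)' = \frac{f'g - fg'}{g^2} \\
\text{Chain:}    &\quad \big(g(f(x))\big)' = g'\!\big(f(x)\big)\, f'(x) \\
\text{Power:}    &\quad (x^n)' = n\,x^{\,n-1}
\end{aligned}
```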
Partial Differentiation & Gradients
3. Partial Differentiation & Gradients
Definition of Partial Differentiation
In a function of multiple variables, a partial derivative represents the rate of change of the function with respect to one variable while keeping all other variables constant.

Notation
If f(x, y) is a function of x and y, the partial derivative with respect to x is denoted ∂f/∂x.

Similarly, the partial derivative with respect to y is ∂f/∂y.
3. Partial Differentiation & Gradients
Example

The partial derivative with respect to x is

The partial derivative with respect to y is
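The slide's worked function was an image; as an assumed example, take f(x, y) = x²y + y³ (my choice, not necessarily the slide's):

```latex
\begin{aligned}
f(x, y) &= x^2 y + y^3 \\
\frac{\partial f}{\partial x} &= 2xy && \text{treat } y \text{ as a constant} \\
\frac{\partial f}{\partial y} &= x^2 + 3y^2 && \text{treat } x \text{ as a constant}
\end{aligned}
```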


3. Partial Differentiation & Gradients
Definition of Gradient
The gradient of a function is a vector that contains all the partial derivatives of the function. It
points in the direction of the steepest increase of the function.

Notation
For f(x, y) the gradient is denoted as:

∇f(x, y) = [ ∂f/∂x , ∂f/∂y ]
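A quick way to see the gradient concretely is to approximate both partial derivatives with central finite differences and compare against the analytic answer. The helper `grad` and the example function are my assumptions, not from the slides:

```python
# Gradient of f(x, y): the vector of partial derivatives, each
# approximated with a central finite difference.

def grad(f, x, y, h=1e-6):
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)  # partial w.r.t. x
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)  # partial w.r.t. y
    return (dfdx, dfdy)

# Assumed example function: f(x, y) = x^2 * y + y^3
f = lambda x, y: x ** 2 * y + y ** 3

# Analytic gradient is (2xy, x^2 + 3y^2); at (1, 2) that is (4, 13)
print(grad(f, 1.0, 2.0))
```

This kind of finite-difference check is the standard way to debug hand-derived gradients before trusting them inside gradient descent.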
3. Partial Differentiation & Gradients
Chain Rule of Partial Differentiation
The chain rule plays a pivotal role in backpropagation in neural networks.
3. Partial Differentiation & Gradients
Chain Rule of Partial Differentiation

Objective: use the chain rule to understand how changes in the speed of one factor (like P, B, or C) propagate through the system.

Mathematical Representation
3. Partial Differentiation & Gradients
Application of the Chain Rule of Partial Differentiation
Using the chain rule, we calculate the rate of change of the car's speed (C) with respect to the person's speed (P):

∂C/∂P = (∂C/∂B) · (∂B/∂P)
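The slide's worked numbers were images; as an assumed numeric illustration (the relationships below are my choices), if C depends on P only through B:

```latex
\frac{\partial C}{\partial P}
  = \frac{\partial C}{\partial B} \cdot \frac{\partial B}{\partial P},
\qquad \text{e.g. if } C = 3B \text{ and } B = 2P \text{, then }
\frac{\partial C}{\partial P} = 3 \cdot 2 = 6.
```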
3. Partial Differentiation & Gradients
Chain Rule of Partial Differentiation: Generalization

∂w/∂t = ?

Using the chain rule, with w = f(x, y), x = x(t), y = y(t):

Partial derivatives: ∂w/∂x and ∂w/∂y, together with ∂x/∂t and ∂y/∂t.

Finally:

∂w/∂t = (∂w/∂x)·(∂x/∂t) + (∂w/∂y)·(∂y/∂t)
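The generalized chain rule can be verified numerically. A sketch with assumed example functions w = x·y, x = t², y = sin t (my choices, not from the slides):

```python
import math

# Chain rule for w = f(x(t), y(t)):
#   dw/dt = (dw/dx)*(dx/dt) + (dw/dy)*(dy/dt)

t = 0.7
x, y = t ** 2, math.sin(t)

# With w = x*y: dw/dx = y, dw/dy = x, dx/dt = 2t, dy/dt = cos(t)
chain = y * (2 * t) + x * math.cos(t)

# Direct check: substituting gives w(t) = t^2 * sin(t),
# which we differentiate numerically with a central difference.
w = lambda t: t ** 2 * math.sin(t)
h = 1e-6
numeric = (w(t + h) - w(t - h)) / (2 * h)

print(chain, numeric)  # the two values agree
```

Agreement between the chain-rule value and the direct numerical derivative confirms the formula term by term.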
Thanks
