Math Lecture 4
Dr Salaheddin Alakkari
Recap
Geometric Property of Derivatives
• Since the derivative of a function at a given $x$ value corresponds to the slope of the tangent line at that point, one can identify whether the function is increasing or decreasing there.
• If the derivative is positive, then the function is increasing at that
point.
• If the derivative is negative, then the function is decreasing at that
point.
Geometric Property of Derivatives: Identifying
Maxima and Minima Points
• A very important property of differentiable functions is that we can identify local maxima and local minima using differentiation.
• This can be done by finding the values of 𝑥 where the derivative
equals zero.
• We can determine whether the point is a maximum or a minimum by studying the behaviour of the function around it.
Geometric Property of Derivatives: Convexity
and Inflection Points
• For twice differentiable functions, the second derivative at a given value $x_0$ tells us whether the function is concave or convex at that point.
• The $x$ values where the second derivative vanishes (equals zero) and changes sign are known as inflection points.
• An inflection point is a point where the function changes its behaviour from convex to concave or vice versa.
Activity: Sketching a Function Using
Differentiation Properties
• Sketch the function $f(x) = 2x^3 - 6x$.
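A worked outline of the activity, using the properties above:
$$f'(x) = 6x^2 - 6 = 0 \;\Rightarrow\; x = \pm 1$$
$$f''(x) = 12x: \quad f''(-1) = -12 < 0 \text{ (local maximum)}, \quad f''(1) = 12 > 0 \text{ (local minimum)}$$
$$f''(x) = 0 \;\Rightarrow\; x = 0 \text{ is an inflection point (concave for } x < 0\text{, convex for } x > 0\text{)}$$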
A very brief introduction to integration
• Integration is the inverse operation of differentiation.
• One can simply define integration using the following rules:
$$\int f'(x)\, dx = f(x) + C$$
$$\int x^n\, dx = \frac{x^{n+1}}{n+1} + C \quad (n \neq -1)$$
$$\int a\,x^n\, dx = a\,\frac{x^{n+1}}{n+1} + C \quad (n \neq -1)$$
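For instance, applying the third rule with $a = 3$ and $n = 2$:
$$\int 3x^2\, dx = 3 \cdot \frac{x^{3}}{3} + C = x^3 + C$$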
Multi-variate Real Functions
Multi-variate Functions
• In this section we will discuss functions of the form
$f(x_1, \dots, x_n): \mathbb{R}^n \to \mathbb{R}$.
• These functions are very common in many applications.
• An example of such a function is $f(x, y) = x^2 + 2xy$.
Partial Derivatives of a Multi-variate Function
• For multi-variate functions, we can find the partial derivative with respect to a particular variable.
• Example: for $f(x, y) = x^2 + 2xy$, find $\frac{\partial f}{\partial x}$.
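Treating $y$ as a constant while differentiating with respect to $x$ gives:
$$\frac{\partial f}{\partial x} = 2x + 2y$$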
Gradient of a Multi-variate Function
• Given a function $f(x_1, \dots, x_n): \mathbb{R}^n \to \mathbb{R}$, the gradient of this function is defined as follows:
$$\nabla f = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix} \in \mathbb{R}^n$$
• This vector is known as the gradient vector or simply the gradient.
• Example: find the gradient of the function $f(x, y) = x^2 + 2xy$.
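Collecting the two partial derivatives computed above (with $\partial f / \partial y = 2x$):
$$\nabla f = \begin{bmatrix} 2x + 2y \\ 2x \end{bmatrix}$$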
Example
• $f(x_1, x_2, x_3) = a_1 x_1 + a_2 x_2 + a_3 x_3 + b$, where $b \in \mathbb{R}$.
• The gradient is
$$\nabla f = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \frac{\partial f}{\partial x_3} \end{bmatrix} = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}$$
Examples
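The examples on this slide are not preserved in the extracted text. As an illustration, the following Python sketch (all names are my own) checks the analytic gradient of $f(x, y) = x^2 + 2xy$ against a central finite-difference approximation:

import numpy as np

def f(v):
    # f(x, y) = x^2 + 2xy
    x, y = v
    return x**2 + 2 * x * y

def grad_f(v):
    # Analytic gradient: [2x + 2y, 2x]
    x, y = v
    return np.array([2 * x + 2 * y, 2 * x])

def numeric_grad(func, v, h=1e-6):
    # Central finite differences: (f(v + h*e_i) - f(v - h*e_i)) / (2h)
    g = np.zeros_like(v)
    for i in range(len(v)):
        e = np.zeros_like(v)
        e[i] = h
        g[i] = (func(v + e) - func(v - e)) / (2 * h)
    return g

v = np.array([1.0, 2.0])
print(grad_f(v))            # [6. 2.]
print(numeric_grad(f, v))   # approximately [6. 2.]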
Unconstrained Convex Optimization Problem:
Definition
• The problem
$$\text{minimize } f(x)$$
where $f$ is convex and the domain of $f$ satisfies $\mathbf{dom}\, f \subseteq \mathbb{R}^n$.
• We refer to this type of problem as an unconstrained optimization problem.
• Note: The domain of a convex function is a convex set.
• The set $\mathbb{R}^n$ is always a convex set.
• This is the standard form for unconstrained convex optimization.
• We refer to the function $f$ as the objective or loss function.
• The optimality condition for this type of problem is that the gradient vector vanishes: $\nabla f(x) = \mathbf{0}$.
Solve the Following Optimization Problem
minimize $f(x) = x^2 + 6x + 3$
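Setting the derivative to zero:
$$f'(x) = 2x + 6 = 0 \;\Rightarrow\; x^* = -3, \qquad f(-3) = 9 - 18 + 3 = -6$$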
Another Example
minimize $f(x) = 4x^2 - 6x - 4$
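Again setting the derivative to zero:
$$f'(x) = 8x - 6 = 0 \;\Rightarrow\; x^* = \tfrac{3}{4}, \qquad f\!\left(\tfrac{3}{4}\right) = \tfrac{9}{4} - \tfrac{9}{2} - 4 = -\tfrac{25}{4}$$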
Unconstrained Convex Optimization Problem
• More straightforward to solve.
• In many cases, the problem may be solved analytically.
• But in practice we deal with high-dimensional problems that cannot be solved analytically.
• Hence, iterative gradient descent methods offer a very effective treatment for such problems.
Gradient Descent: History
• Gradient descent is attributed to the French mathematician Cauchy in his 1847 work “Méthode générale pour la résolution des systèmes d'équations simultanées”.
• It is one of the most practical methods to date for solving unconstrained optimization problems.
Gradient Descent: The Idea
• Consider a differentiable single-variable function $f: \mathbb{R} \to \mathbb{R}$.
• The first derivative $f'(x_k)$ at point $x_k$ gives the slope at this point.
• We always want to move towards the local minimum.
[Figure: the curve $f(x)$ with two sample points. Where $f'(x_k) > 0$ (positive slope), move backward: $x_{k+1} < x_k$. Where $f'(x_k) < 0$ (negative slope), move forward: $x_{k+1} > x_k$.]
Gradient Descent: The Equation
• For single-variable functions $f: \mathbb{R} \to \mathbb{R}$: $x_{k+1} = x_k - \eta f'(x_k)$.
• For multi-variate functions $f: \mathbb{R}^n \to \mathbb{R}$: $x_{k+1} = x_k - \eta \nabla f(x_k)$.
• $\eta$ is referred to as the learning rate or step size.
Gradient Descent: The Algorithm
• Choose an initial starting point $x_0$ in the domain
• Repeat
• Update $x_{k+1} = x_k - \eta_k \nabla f(x_k)$
• $k = k + 1$
• Until convergence: $\nabla f(x_k) \approx 0$
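A minimal Python sketch of this loop with a fixed step size (the tolerance, iteration cap, and function names are illustrative choices, not from the lecture):

import numpy as np

def gradient_descent(grad, x0, eta=0.1, tol=1e-8, max_iter=10_000):
    # Iterate x_{k+1} = x_k - eta * grad(x_k) until the gradient (nearly) vanishes
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:  # convergence test: grad f(x_k) ~ 0
            break
        x = x - eta * g
    return x

# Example: minimize f(x) = 5x^2 + 3x + 2, whose derivative is f'(x) = 10x + 3
print(gradient_descent(lambda x: 10 * x + 3, x0=[4.0]))  # ~[-0.3]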
Example
• minimize $f(x) = 5x^2 + 3x + 2$
• $x_0 = 4$ and $\eta = 0.1$
• Find $x_1$
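One step of gradient descent, using $f'(x) = 10x + 3$:
$$x_1 = x_0 - \eta\, f'(x_0) = 4 - 0.1 \times (10 \times 4 + 3) = 4 - 4.3 = -0.3$$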
Example in $\mathbb{R}^2$
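The content of this slide is not preserved in the extracted text. As an illustration only, here is gradient descent on a hypothetical convex quadratic in $\mathbb{R}^2$, $f(x_1, x_2) = x_1^2 + 2x_2^2$, reusing the gradient_descent routine sketched above:

# Gradient of f(x1, x2) = x1^2 + 2*x2^2 is [2*x1, 4*x2]
x_star = gradient_descent(lambda x: np.array([2 * x[0], 4 * x[1]]),
                          x0=[3.0, -2.0], eta=0.1)
print(x_star)  # converges to approximately [0. 0.], the unique minimizer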