
Math for Computer Science

Lecture 4
Dr Salaheddin Alakkari
Recap
Geometric Property of Derivatives
• Since the derivative of a function at a given x value corresponds to the slope of the tangent line at that point, we can determine whether the function is increasing or decreasing there.
• If the derivative is positive, then the function is increasing at that
point.
• If the derivative is negative, then the function is decreasing at that
point.
Geometric Property of Derivatives: Identifying Maxima and Minima
• A very important property of differentiable functions is that we can identify local maxima and local minima using differentiation.
• This can be done by finding the values of x where the derivative equals zero.
• We can determine whether such a point is a maximum or a minimum by studying the behaviour of the function around it.
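For a concrete illustration of this procedure (not from the original slides), consider f(x) = x^2 − 4x:
f'(x) = 2x − 4 = 0  ⇒  x = 2
For x < 2 the derivative is negative (f decreasing) and for x > 2 it is positive (f increasing), so x = 2 is a local minimum.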
Geometric Property of Derivatives: Convexity
and Inflection Points
• For twice differentiable functions, the second derivative at a given
value 𝑥0 will tell us if the function is concave or convex at the given
point.
• The x values where the second derivative vanishes (becomes equal to zero) and the concavity changes are known as inflection points.
• An inflection point is a point where the function changes from being convex to concave or vice versa.
Activity: Sketching a Function Using
Differentiation Properties
• Sketch the function f(x) = 2x^3 − 6x
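One possible way to work through the activity using the properties above:
f'(x) = 6x^2 − 6 = 0  ⇒  x = ±1  (critical points)
f''(x) = 12x;  f''(−1) = −12 < 0  ⇒  local maximum at (−1, 4);  f''(1) = 12 > 0  ⇒  local minimum at (1, −4)
f''(x) = 0 at x = 0  ⇒  inflection point at (0, 0): the curve is concave for x < 0 and convex for x > 0.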
A very brief introduction to integration
• Integration is the inverse operation of differentiation.
• One can simply define integration using the following rule:
∫ f'(x) dx = f(x) + C
where C is any constant.
• We can set upper and lower limits to integration as follows:
∫_a^b f'(x) dx = [f(x)]_a^b = f(b) − f(a)
where the interval [a, b] is in the domain of f'(x).
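A small worked example of the rule above:
∫_0^2 2x dx = [x^2]_0^2 = 2^2 − 0^2 = 4.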
A very brief introduction to integration
• Integration of common functions:
∫ a dx = ax + C
∫ x^n dx = x^(n+1)/(n+1) + C   (for n ≠ −1)
∫ a x^n dx = a · x^(n+1)/(n+1) + C   (for n ≠ −1)
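For instance, applying the power rule:
∫ 3x^2 dx = 3 · x^3/3 + C = x^3 + C.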
Multi-variate Real Function
Multi-variate Functions
• In this section we will discuss functions of the form f(x_1, …, x_n): ℝ^n ⟶ ℝ.
• These functions are very common in many applications.
• An example of such a function is f(x, y) = x^2 + 2xy.
Partial Derivatives of a Multi-variate Function
• For multi-variate functions, we can find the partial derivative with respect to a particular variable.
• Example: for f(x, y) = x^2 + 2xy, find ∂f/∂x.
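Treating y as a constant while differentiating with respect to x gives:
∂f/∂x = 2x + 2y   (and, similarly, ∂f/∂y = 2x).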
Gradient of a Multi-variate Function
• Given a function f(x_1, …, x_n): ℝ^n ⟶ ℝ, the gradient of this function is defined as follows:
∇f = (∂f/∂x_1, …, ∂f/∂x_n)^T ∈ ℝ^n
• This vector is known as the gradient vector or simply the gradient.
• Example: find the gradient of the function f(x, y) = x^2 + 2xy.
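Using the partial derivatives computed on the previous slide:
∇f = (∂f/∂x, ∂f/∂y)^T = (2x + 2y, 2x)^T.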
Example
• f(x_1, x_2, x_3) = a_1 x_1 + a_2 x_2 + a_3 x_3 + b, where b ∈ ℝ.
• The gradient is
∇f = (∂f/∂x_1, ∂f/∂x_2, ∂f/∂x_3)^T = (a_1, a_2, a_3)^T
Examples
Unconstrained Convex Optimization Problem:
Definition
• The problem
minimize f(x)
where f is convex and the domain of f, dom f ⊆ ℝ^n.
• We refer to this type of problem as an unconstrained optimization problem.
• Note: the domain of a convex function is a convex set.
• The set ℝ^n is always a convex set.
• This is the standard form for unconstrained convex optimization.
• We refer to the function f as the objective (or loss) function.
• The optimality condition for this type of problem is that the gradient vector vanishes: ∇f(x) = 0.
Solve the Following Optimization Problem
minimize f(x) = x^2 + 6x + 3
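For reference, applying the optimality condition ∇f(x) = 0:
f'(x) = 2x + 6 = 0  ⇒  x* = −3,  f(−3) = 9 − 18 + 3 = −6.
Since f''(x) = 2 > 0, the function is convex and x* = −3 is the global minimiser.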
Another example
Minimize f(x) = 4x^2 − 6x − 4
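Again setting the derivative to zero:
f'(x) = 8x − 6 = 0  ⇒  x* = 3/4,  f(3/4) = 9/4 − 9/2 − 4 = −25/4.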
Unconstrained Convex Optimization Problem
• More straightforward to solve.
• In many cases, the problem may be solved analytically.
• But in practice we deal with high-dimensional problems that are not possible to solve analytically.
• Hence, iterative gradient descent methods offer a very effective treatment for this type of problem.
Gradient Descent: History
• The gradient descent method is attributed to the French mathematician Cauchy in his 1847 work "Méthode générale pour la résolution des systèmes d'équations simultanées" (general method for solving systems of simultaneous equations).
• It is one of the most practical
methods to date for solving
unconstrained optimization
problems.
Gradient Descent: The Idea
• Consider a differentiable single-variable function 𝑓: ℝ ⟶ ℝ.
• The first derivative 𝑓′ 𝑥𝑘 at point 𝑥𝑘 gives the slope at this point.
• We always want to move towards the local minimum.
[Figure: graph of f(x). Where f'(x_k) > 0 (positive slope), move backward: x_{k+1} < x_k. Where f'(x_k) < 0 (negative slope), move forward: x_{k+1} > x_k.]
Gradient Descent: The Equation
• For single-variable functions f: ℝ ⟶ ℝ, x_{k+1} = x_k − η f'(x_k).
• For multi-variate functions f: ℝ^n ⟶ ℝ, x_{k+1} = x_k − η ∇f(x_k).
• η is referred to as the learning rate or step size.
[Figure: same illustration as on the previous slide.]
Gradient Descent: The Algorithm
• Choose an initial starting point x_0 in the domain.
• Repeat:
• Update x_{k+1} = x_k − η_k ∇f(x_k)
• k = k + 1
• Until convergence: ∇f(x_k) ≈ 0
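A minimal Python sketch of this loop (not part of the original slides; NumPy is assumed and the function and parameter names are illustrative):

import numpy as np

def gradient_descent(grad_f, x0, eta=0.1, tol=1e-6, max_iter=10000):
    # grad_f: callable returning the gradient of f at a point
    # x0: initial starting point in the domain of f
    # eta: fixed learning rate / step size
    # tol: stop once the gradient is approximately zero
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = np.asarray(grad_f(x), dtype=float)
        if np.linalg.norm(g) < tol:      # convergence check: grad f(x_k) ≈ 0
            break
        x = x - eta * g                  # update: x_{k+1} = x_k - eta * grad f(x_k)
    return x

# Example from the next slide: f(x) = 5x^2 + 3x + 2, so f'(x) = 10x + 3
print(gradient_descent(lambda x: 10 * x + 3, x0=4.0, eta=0.1))   # ≈ -0.3

# The R^2 example further below: f(x1, x2) = x1^2 + x2^2 + 4*x1 + 3, gradient (2*x1 + 4, 2*x2)
print(gradient_descent(lambda v: np.array([2 * v[0] + 4, 2 * v[1]]), x0=[3.0, 5.0], eta=0.1))  # ≈ [-2, 0]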
Example
• minimize f(x) = 5x^2 + 3x + 2
• x_0 = 4 and η = 0.1
• Find x_1
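One gradient-descent step works out as:
f'(x) = 10x + 3,  f'(x_0) = f'(4) = 43
x_1 = x_0 − η f'(x_0) = 4 − 0.1 · 43 = −0.3
(which here already equals the analytical minimiser x* = −3/10).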
Example in ℝ^2
• minimize f(x_1, x_2) = x_1^2 + x_2^2 + 4x_1 + 3
• x_0 = (3, 5)^T and η = 0.1
• Find the first two iterates x^(1) and x^(2)
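The two requested iterates can be computed as follows (the iterates are written x^(1), x^(2) to avoid confusion with the coordinates x_1, x_2):
∇f(x_1, x_2) = (2x_1 + 4, 2x_2)^T
∇f(x^(0)) = ∇f(3, 5) = (10, 10)^T,  so  x^(1) = (3, 5)^T − 0.1 · (10, 10)^T = (2, 4)^T
∇f(x^(1)) = ∇f(2, 4) = (8, 8)^T,   so  x^(2) = (2, 4)^T − 0.1 · (8, 8)^T = (1.2, 3.2)^T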
Choice of Step Size η_k for Gradient Descent
• Fixed step size.
• A high step size will result in a phenomenon known as zigzagging behaviour and may prevent the algorithm from converging. A small step size will significantly slow down convergence.
• Using scheduling strategies that gradually decay the step size (a small sketch follows this list).
• Other techniques include using the line search algorithm.
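As an illustration of the scheduling idea mentioned above (the decay rule η_k = η_0 / (1 + decay · k) and the parameter values are just one common choice, not prescribed by the slides):

def step_size_schedule(eta0=0.5, decay=0.1):
    # Returns eta_k = eta0 / (1 + decay * k): starts large, shrinks over iterations
    def eta(k):
        return eta0 / (1.0 + decay * k)
    return eta

eta = step_size_schedule()
print([round(eta(k), 3) for k in (0, 10, 100)])  # e.g. [0.5, 0.25, 0.045]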
Advantages of Gradient Descent
• Simple and very practical.
• Requires minimal computation per update.
• Widely used for optimizing AI and deep learning models.
Limitations of Gradient Descent
• Convergence is significantly affected by the choice of the learning
rate.
• May converge very slowly for some types of functions that are not
strongly convex.
