Applied Numerical Optimization: Prof. Alexander Mitsos, Ph.D. Basic Solution Methods For Unconstrained Problems
$$\nabla f(\boldsymbol{x}) = \boldsymbol{0} \quad\Leftrightarrow\quad \left.\frac{\partial f}{\partial x_i}\right|_{\boldsymbol{x}} = 0 = g_i(\boldsymbol{x}),\; i = 1,\dots,n \quad\Leftrightarrow\quad \boldsymbol{g}(\boldsymbol{x}) = \boldsymbol{0} \quad\text{(nonlinear system of equations)}$$
• The optimal solution is found by solving the system of equations analytically or numerically
(e.g., by Newton’s method).
• Differentiation and solution of the system of equations are challenging for complex problems!
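As a small illustration of the analytical route, the sketch below forms the stationarity conditions symbolically and solves them (a minimal sketch assuming SymPy is available; the objective $f$ is a made-up example, not from the lecture):

```python
import sympy as sp

# hypothetical example objective, chosen only for illustration
x1, x2 = sp.symbols("x1 x2", real=True)
f = (x1 - 1)**2 + 2*(x2 + 3)**2 + x1*x2

# stationarity conditions g(x) = grad f(x) = 0
g = [sp.diff(f, v) for v in (x1, x2)]

# solve the (here linear) system analytically
stationary_points = sp.solve(g, [x1, x2], dict=True)
print(stationary_points)            # [{x1: 20/7, x2: -26/7}]

# check the second-order condition via the Hessian
H = sp.hessian(f, (x1, x2))
print(H.is_positive_definite)       # True -> the stationary point is a minimizer
```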
Iterative methods generate a sequence of points $\boldsymbol{x}^{(k)}$ with
$$\exists\, \bar{k} \ge 0:\; f\bigl(\boldsymbol{x}^{(k+1)}\bigr) < f\bigl(\boldsymbol{x}^{(k)}\bigr)\;\; \forall k > \bar{k} \quad\text{and}\quad \lim_{k\to\infty} \boldsymbol{x}^{(k)} = \boldsymbol{x}^* \in \mathbb{R}^n.$$
[Figure: sequence of iterates $\boldsymbol{x}^{(0)}, \boldsymbol{x}^{(1)}, \dots, \boldsymbol{x}^{(5)}$ converging to $\boldsymbol{x}^*$.]
Rate of convergence:
• Linear: if there exists a constant $C \in (0,1)$ such that for sufficiently large $k$:
$$\bigl\|\boldsymbol{x}^{(k+1)} - \boldsymbol{x}^*\bigr\| \le C\,\bigl\|\boldsymbol{x}^{(k)} - \boldsymbol{x}^*\bigr\|$$
• Of order $p > 1$ (e.g., quadratic for $p = 2$): if there exists a constant $M > 0$ such that for sufficiently large $k$:
$$\bigl\|\boldsymbol{x}^{(k+1)} - \boldsymbol{x}^*\bigr\| \le M\,\bigl\|\boldsymbol{x}^{(k)} - \boldsymbol{x}^*\bigr\|^p$$
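A quick numerical sketch (hypothetical error sequences, not tied to any particular method) shows how differently the two rates behave:

```python
# Compare a linearly convergent error sequence (factor C) with a
# quadratically convergent one (constant M), starting from the same error.
C, M = 0.5, 1.0
e_lin, e_quad = 1e-1, 1e-1
for k in range(6):
    print(f"k={k}:  linear error {e_lin:.1e}   quadratic error {e_quad:.1e}")
    e_lin = C * e_lin          # ||e_{k+1}|| <= C ||e_k||
    e_quad = M * e_quad**2     # ||e_{k+1}|| <= M ||e_k||^2
```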
Open issues:
• How to determine the descent direction $\boldsymbol{p}^{(k)}$?
• How to calculate the step length $\alpha_k$? For this purpose, consider the one-dimensional function
$$\phi(\alpha) = f\bigl(\boldsymbol{x}^{(k)} + \alpha\,\boldsymbol{p}^{(k)}\bigr).$$
Remarks
1. Naively speaking, it would be ideal to globally minimize $\phi(\alpha)$. In general, this is very expensive, and it is not necessarily a good idea anyway, since the search is only one-dimensional.
2. One could also search for a local minimizer of $\phi$, but this is often still too expensive (it requires function and/or gradient evaluations at a number of points).
3. Practical strategies (so-called inexact line search): find $\alpha$ such that $f(\boldsymbol{x}^{(k+1)})$ becomes as small as possible with minimal effort.
Practical Line-Search Strategies
[Figure: contour plot of $f$ over $(x_1, x_2)$ with a search direction starting at $\boldsymbol{x}^{(k)}$, and the corresponding one-dimensional function $\phi(\alpha) = f(\boldsymbol{x}^{(k)} + \alpha\,\boldsymbol{p}^{(k)})$ along that direction.]
Geometrical interpretation (Armijo condition):
$$\phi'(0) = \nabla f\bigl(\boldsymbol{x}^{(k)}\bigr)^T \boldsymbol{p}^{(k)} < 0, \qquad \phi_l(\alpha) = \phi(0) + \alpha\, c_1\, \phi'(0)$$
A step length $\alpha$ is acceptable if $\phi(\alpha)$ lies below the line $\phi_l(\alpha)$.
[Figure: $\phi(\alpha)$ together with the lines $\phi_l(\alpha)$ for $c_1 = 0,\, 0.1,\, 0.25,\, 0.5,\, 1$; the feasible domain of step lengths is indicated for $c_1 = 0.25$.]
[1] Nocedal J., Wright S. J., Numerical Optimization, 2nd Edition, Springer, 2006.
Simple Line-Search Algorithm
Remarks:
1. Choosing a step length that fulfills the Armijo condition guarantees descent of $f$, since
$$\phi'(0) = \nabla f\bigl(\boldsymbol{x}^{(k)}\bigr)^T \boldsymbol{p}^{(k)} < 0 \quad (\boldsymbol{p}^{(k)} \text{ is a descent direction}).$$
2. If a trial step length $\alpha_0$ is rejected, a new trial value can be obtained by minimizing the quadratic interpolant of $\phi$ built from $\phi(0)$, $\phi'(0)$ and $\phi(\alpha_0)$:
$$\alpha_1 = -\frac{\phi'(0)\,\alpha_0^2}{2\,\bigl[\phi(\alpha_0) - \phi(0) - \phi'(0)\,\alpha_0\bigr]}$$
[Figure: $\phi(\alpha)$ with its value $\phi(0)$ and slope $\phi'(0)$ at $\alpha = 0$, and the interval of feasible step lengths.]
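A minimal sketch of such an inexact line search, assuming callables `f` and `grad_f` are available (the names are illustrative, not from the lecture), combining the Armijo test with the quadratic-interpolation step above:

```python
import numpy as np

def armijo_line_search(f, grad_f, x, p, alpha0=1.0, c1=1e-4, max_iter=50):
    """Return a step length satisfying f(x + a p) <= f(x) + c1 a grad_f(x)^T p."""
    phi0 = f(x)
    dphi0 = grad_f(x) @ p            # phi'(0), negative for a descent direction
    alpha = alpha0
    for _ in range(max_iter):
        phi_a = f(x + alpha * p)
        if phi_a <= phi0 + c1 * alpha * dphi0:
            return alpha             # Armijo condition satisfied
        # new trial step from the quadratic interpolant of phi
        alpha_new = -dphi0 * alpha**2 / (2.0 * (phi_a - phi0 - dphi0 * alpha))
        # safeguard against tiny or barely decreasing trial steps
        alpha = alpha_new if 0.1 * alpha <= alpha_new <= 0.9 * alpha else 0.5 * alpha
    return alpha
```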
Determination of a Descent Direction: A Toolbox
Line-search approaches differ from each other with respect to the determination of the descent direction and the step length.
Special cases of the line-search approach: steepest-descent direction, conjugate directions, Newton's-step direction, …
Many gradient methods use a symmetric positive definite matrix $\boldsymbol{D}^{(k)}$ and calculate
$$\boldsymbol{x}^{(k+1)} = \boldsymbol{x}^{(k)} - \alpha_k\, \boldsymbol{D}^{(k)}\, \nabla f\bigl(\boldsymbol{x}^{(k)}\bigr).$$
(Extra work: prove that this choice guarantees descent!)
The steepest-descent direction solves
$$\min_{\boldsymbol{p}^{(k)} \in \mathbb{R}^n} \nabla f\bigl(\boldsymbol{x}^{(k)}\bigr)^T \boldsymbol{p}^{(k)} \quad \text{s.t.} \quad \bigl\|\boldsymbol{p}^{(k)}\bigr\| = 1.$$
Note that $\nabla f(\boldsymbol{x}^{(k)})^T \boldsymbol{p}^{(k)} = \|\nabla f(\boldsymbol{x}^{(k)})\|\,\|\boldsymbol{p}^{(k)}\|\cos(\theta)$, which is minimal for $\cos(\theta) = -1$:
$$\Rightarrow\; \boldsymbol{p}^{(k)} = -\nabla f\bigl(\boldsymbol{x}^{(k)}\bigr) \big/ \bigl\|\nabla f\bigl(\boldsymbol{x}^{(k)}\bigr)\bigr\|.$$
This corresponds to choosing $\boldsymbol{D}^{(k)} = \boldsymbol{I}$ (the identity matrix).
A descent direction satisfies $\nabla f\bigl(\boldsymbol{x}^{(k)}\bigr)^T \boldsymbol{p}^{(k)} < 0$, where $\nabla f(\boldsymbol{x}^{(k)})^T \boldsymbol{p}^{(k)} = \|\nabla f(\boldsymbol{x}^{(k)})\|\,\|\boldsymbol{p}^{(k)}\|\cos(\theta)$.
[Figure: contour line $f(\boldsymbol{x}) = C$ separating the regions $f(\boldsymbol{x}) > C$ and $f(\boldsymbol{x}) < C$; at $\boldsymbol{x}^{(k)}$, a direction $\boldsymbol{p}^{(1)}$ with $\nabla f(\boldsymbol{x}^{(k)})^T \boldsymbol{p}^{(1)} > 0$ (not a descent direction), a direction $\boldsymbol{p}^{(2)}$ with $\nabla f(\boldsymbol{x}^{(k)})^T \boldsymbol{p}^{(2)} < 0$ (descent direction), and the negative gradient $-\nabla f(\boldsymbol{x}^{(k)})$.]
Algorithm (steepest descent):
choose $\boldsymbol{x}^{(0)}$
for $k = 0, 1, \dots$
  if $\|\nabla f(\boldsymbol{x}^{(k)})\| \le \varepsilon$ stop, else
  set $\boldsymbol{p}^{(k)} = -\nabla f(\boldsymbol{x}^{(k)})$
  determine the step length $\alpha_k$ (e.g., using the Armijo rule)
  set $\boldsymbol{x}^{(k+1)} = \boldsymbol{x}^{(k)} + \alpha_k\, \boldsymbol{p}^{(k)}$
end for
[Figure: steepest-descent iterates converging to $\boldsymbol{x}^*$ on a "well scaled" and on a "poorly scaled" problem; in the poorly scaled case, successive search directions become perpendicular to each other.]
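A compact sketch of this algorithm, reusing the `armijo_line_search` helper from above (the objective `f`, gradient `grad_f`, starting point and tolerance are illustrative assumptions):

```python
import numpy as np

def steepest_descent(f, grad_f, x0, eps=1e-6, max_iter=10_000):
    """Minimize f by steepest descent with an Armijo backtracking line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= eps:      # stopping criterion ||grad f|| <= eps
            break
        p = -g                            # steepest-descent direction
        alpha = armijo_line_search(f, grad_f, x, p)
        x = x + alpha * p
    return x

# usage on a poorly scaled quadratic (illustrative example)
f = lambda x: 0.5 * (x[0]**2 + 100.0 * x[1]**2)
grad_f = lambda x: np.array([x[0], 100.0 * x[1]])
print(steepest_descent(f, grad_f, [1.0, 1.0]))
```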
Newton's-step direction:
Approximate $f$ at $\boldsymbol{x}^{(k)}$ by the quadratic model
$$m\bigl(\boldsymbol{x}^{(k+1)}\bigr) = f\bigl(\boldsymbol{x}^{(k)}\bigr) + \nabla f\bigl(\boldsymbol{x}^{(k)}\bigr)^T \bigl(\boldsymbol{x}^{(k+1)} - \boldsymbol{x}^{(k)}\bigr) + \tfrac{1}{2}\bigl(\boldsymbol{x}^{(k+1)} - \boldsymbol{x}^{(k)}\bigr)^T \nabla^2 f\bigl(\boldsymbol{x}^{(k)}\bigr)\bigl(\boldsymbol{x}^{(k+1)} - \boldsymbol{x}^{(k)}\bigr),$$
where the first two terms are the linear approximation of $f$ at $\boldsymbol{x}^{(k)}$. Applying the first-order necessary optimality condition to $m$:
$$\boldsymbol{0} = \nabla m\bigl(\boldsymbol{x}^{(k+1)}\bigr) = \nabla f\bigl(\boldsymbol{x}^{(k)}\bigr) + \nabla^2 f\bigl(\boldsymbol{x}^{(k)}\bigr)\bigl(\boldsymbol{x}^{(k+1)} - \boldsymbol{x}^{(k)}\bigr)$$
$$\Rightarrow\; \boldsymbol{x}^{(k+1)} = \boldsymbol{x}^{(k)} - \bigl[\nabla^2 f\bigl(\boldsymbol{x}^{(k)}\bigr)\bigr]^{-1} \nabla f\bigl(\boldsymbol{x}^{(k)}\bigr).$$
[Figure: contour lines $f = \text{const}$ around $\boldsymbol{x}^*$; from $\boldsymbol{x}^{(k)}$, the Newton step $\boldsymbol{p}^{(k)}_{\text{Newton}}$ leads to $\boldsymbol{x}^{(k+1)}$, while the steepest-descent direction $\boldsymbol{p}^{(k)}_{\text{Steep.Desc.}}$ points along the negative gradient.]
Algorithm (Newton's method):
choose $\boldsymbol{x}^{(0)}$
for $k = 0, 1, \dots$
  if $\|\nabla f(\boldsymbol{x}^{(k)})\| \le \varepsilon$ stop, else
  set $\boldsymbol{p}^{(k)} = -\bigl[\nabla^2 f\bigl(\boldsymbol{x}^{(k)}\bigr)\bigr]^{-1} \nabla f\bigl(\boldsymbol{x}^{(k)}\bigr)$
  set $\boldsymbol{x}^{(k+1)} = \boldsymbol{x}^{(k)} + \alpha_k\, \boldsymbol{p}^{(k)}$ with $\alpha_k = 1$
end for
Remarks:
1. Line search? The pure Newton iteration uses the full step $\alpha_k = 1$ in $\boldsymbol{x}^{(k+1)} = \boldsymbol{x}^{(k)} + \alpha_k\, \boldsymbol{p}^{(k)}$ with $\boldsymbol{p}^{(k)} = -[\nabla^2 f(\boldsymbol{x}^{(k)})]^{-1}\nabla f(\boldsymbol{x}^{(k)})$; alternatively, $\alpha_k$ can be determined by a line search.
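A minimal sketch of this iteration, continuing the previous sketches (it reuses `f`, `grad_f` and `armijo_line_search`; the Hessian callable `hess_f` is an illustrative assumption, and the Armijo search tries the full step $\alpha = 1$ first):

```python
def newton_method(f, grad_f, hess_f, x0, eps=1e-8, max_iter=100):
    """Newton's method with an Armijo line search (full step alpha = 1 tried first).

    Assumes the Hessian is positive definite so that p is a descent direction."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= eps:
            break
        # Newton direction: solve hess_f(x) p = -g instead of forming the inverse
        p = np.linalg.solve(hess_f(x), -g)
        alpha = armijo_line_search(f, grad_f, x, p, alpha0=1.0)
        x = x + alpha * p
    return x

# usage on the same poorly scaled quadratic: a single Newton step suffices
hess_f = lambda x: np.array([[1.0, 0.0], [0.0, 100.0]])
print(newton_method(f, grad_f, hess_f, [1.0, 1.0]))
```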
Complexity of Optimization Methods
Let $F$ denote a class of problems, e.g., Lipschitz-continuous functions with Lipschitz constant $L$, i.e., $|f(\boldsymbol{x}) - f(\boldsymbol{y})| < L\,\|\boldsymbol{x} - \boldsymbol{y}\|$; $L$ is assumed to be fixed for all $P \in F$.
"Performance of a method $M$ on a problem $P \in F$ is the total amount of computational effort that is required by $M$ to solve $P$." *
"To solve the problem means to find an approximate solution to $P$ with an accuracy $\varepsilon > 0$." *
For unconstrained problems, the accuracy $\varepsilon > 0$ can be defined in terms of the norm of the objective's gradient.
* Yurii Nesterov, Introductory Lectures on Convex Optimization – A Basic Course, Kluwer Academic Publishers, 2004.
Analytical complexity: the smallest number of queries to an oracle required to solve problem $P$ to accuracy $\varepsilon$. [1]
Arithmetical complexity: the smallest number of arithmetic operations (including the work of the oracle and the work of the method) required to solve problem $P$ to accuracy $\varepsilon$. [1]
[1] Yurii Nesterov, Introductory Lectures on Convex Optimization – A Basic Course, Kluwer Academic Publishers, 2004.
Analytical Complexity of Steepest Descent Method
Algorithm:
choose $\boldsymbol{x}^{(0)}$
for $k = 0, 1, \dots$
  if $\|\nabla f(\boldsymbol{x}^{(k)})\| \le \varepsilon$ stop, else
  set $\boldsymbol{p}^{(k)} = -\nabla f(\boldsymbol{x}^{(k)})$
  determine the step length $\alpha_k$ (e.g., using the Armijo rule)
  set $\boldsymbol{x}^{(k+1)} = \boldsymbol{x}^{(k)} + \alpha_k\, \boldsymbol{p}^{(k)}$
end for
• Worst-case analytical complexity (queries to the oracle): $O\!\left(\dfrac{1}{\varepsilon^2}\right)$
[Figure: surface and contour plot of $f(\boldsymbol{x})$ over $(x_1, x_2)$. Source: https://commons.wikimedia.org/w/index.php?curid=9941741]
Illustration of Convergence (1)
[Figure: convergence of the iterates on the surface and contour plot of $f(\boldsymbol{x})$ over $(x_1, x_2)$. Source: https://commons.wikimedia.org/w/index.php?curid=9941741]
Illustration of Convergence (2)
[Figure: convergence of the iterates on the surface and contour plot of $f(\boldsymbol{x})$ over $(x_1, x_2)$. Source: https://commons.wikimedia.org/w/index.php?curid=9941741]
Analytical Complexity of Newton’s Method
• Problem class: $f$ is twice continuously differentiable and $\nabla^2 f(\boldsymbol{x})$ is Lipschitz-continuous with fixed Lipschitz constant $L$, i.e., $\|\nabla^2 f(\boldsymbol{x}) - \nabla^2 f(\boldsymbol{y})\| < L\,\|\boldsymbol{x} - \boldsymbol{y}\|$.
• Regularized model (cubic regularization):
$$m_{\text{regularized}}\bigl(\boldsymbol{x}^{(k+1)}\bigr) = m\bigl(\boldsymbol{x}^{(k+1)}\bigr) + \tfrac{1}{3}\,\sigma_k\,\bigl\|\boldsymbol{x}^{(k+1)} - \boldsymbol{x}^{(k)}\bigr\|^3$$
• Worst-case analytical complexity: $O\!\left(\dfrac{1}{\varepsilon^{3/2}}\right)$
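To get a feeling for the difference between the two bounds, consider a target accuracy of $\varepsilon = 10^{-4}$ (an arbitrary value chosen only for illustration):
$$\frac{1}{\varepsilon^2} = \frac{1}{(10^{-4})^2} = 10^{8} \;\;\text{(steepest descent)} \qquad\text{vs.}\qquad \frac{1}{\varepsilon^{3/2}} = \frac{1}{(10^{-4})^{3/2}} = 10^{6} \;\;\text{(regularized Newton)},$$
i.e., the worst-case number of oracle queries differs by two orders of magnitude, up to the constants hidden in the $O(\cdot)$ notation.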
[Figure: convergence of the Newton iterates on the surface and contour plot of $f(\boldsymbol{x})$ over $(x_1, x_2)$. Source: https://commons.wikimedia.org/w/index.php?curid=9941741]
Newton's-Step Direction: Variants
Idea:
• The linear equation system $\boldsymbol{H}^{(k)} \boldsymbol{p}^{(k)} = -\boldsymbol{g}^{(k)}$ is solved only approximately by an iterative method, e.g., by CG (conjugate gradients) if $\boldsymbol{H}^{(k)}$ is positive definite.
Comments:
• LU or Cholesky decomposition: very high computational effort!
• Large errors occur for ill-conditioned problems.
• The exact solution is not needed.
Algorithm (outer Newton iteration with an inner CG solve):
for $k = 0, 1, \dots$
  if $\|\nabla f(\boldsymbol{x}^{(k)})\| \le \varepsilon$ stop, else
  calculate $\boldsymbol{g}^{(k)} := \nabla f(\boldsymbol{x}^{(k)})$ and $\boldsymbol{H}^{(k)} := \nabla^2 f(\boldsymbol{x}^{(k)})$
  to determine $\boldsymbol{p}^{(k)}$, approximately solve $\boldsymbol{H}^{(k)} \boldsymbol{p}^{(k)} = -\boldsymbol{g}^{(k)}$ for $\boldsymbol{p}^{(k)}$ with the CG method
  set $\boldsymbol{x}^{(k+1)} = \boldsymbol{x}^{(k)} + \alpha_k\, \boldsymbol{p}^{(k)}$
end for
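A sketch of such an inexact Newton direction with a hand-rolled CG loop that is stopped early (the residual-based truncation rule and the reuse of `grad_f` and `hess_f` from the earlier sketches are illustrative assumptions):

```python
def truncated_cg(H, g, tol=0.1, max_iter=50):
    """Approximately solve H p = -g with conjugate gradients (H assumed positive definite)."""
    p = np.zeros_like(g)
    r = g.copy()                    # residual of H p + g = 0
    d = -r
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol * np.linalg.norm(g):
            break                   # inexact solve: stop once the residual is small enough
        Hd = H @ d
        alpha = (r @ r) / (d @ Hd)
        p = p + alpha * d
        r_new = r + alpha * Hd
        beta = (r_new @ r_new) / (r @ r)
        d = -r_new + beta * d
        r = r_new
    return p

# one inexact Newton direction at the current iterate x (illustrative)
x = np.array([1.0, 1.0])
print(truncated_cg(hess_f(x), grad_f(x)))
```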
Motivation: What if $\boldsymbol{H}^{(k)} := \nabla^2 f(\boldsymbol{x}^{(k)})$
• is singular or almost singular (poorly conditioned)?
• is not positive definite?
Idea: replace $\boldsymbol{H}^{(k)}$ by the approximation $\boldsymbol{B}^{(k)} \approx \boldsymbol{H}^{(k)}$ with
$$\boldsymbol{B}^{(k)} = \boldsymbol{H}^{(k)} + \boldsymbol{E}^{(k)}, \qquad \boldsymbol{E}^{(k)} = \tau_k \boldsymbol{I},\; \tau_k \ge 0 \text{ smartly chosen}$$
(the direction converges to the steepest-descent direction for $\tau_k \to \infty$), and compute
$$\boldsymbol{B}^{(k)} \boldsymbol{p}^{(k)} = -\boldsymbol{g}^{(k)}, \qquad \boldsymbol{x}^{(k+1)} = \boldsymbol{x}^{(k)} + \alpha_k\, \boldsymbol{p}^{(k)} \quad (\alpha_k \text{ from the line search}).$$
[1] Nocedal J., Wright S. J., Numerical Optimization, 2nd Edition, Springer, 2006, Chapter 3.
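A common way to choose $\tau_k$ "smartly" is to increase it until a Cholesky factorization of $\boldsymbol{B}^{(k)}$ succeeds; the sketch below follows that idea (the initial shift and growth factor are illustrative choices, not prescribed by the lecture):

```python
import numpy as np

def modified_newton_direction(H, g, tau0=1e-3, growth=10.0, max_tries=20):
    """Return p solving (H + tau*I) p = -g, with tau increased until H + tau*I is SPD."""
    n = H.shape[0]
    tau = 0.0
    for _ in range(max_tries):
        try:
            L = np.linalg.cholesky(H + tau * np.eye(n))   # fails if not positive definite
            # solve (L L^T) p = -g by forward/backward substitution
            y = np.linalg.solve(L, -g)
            return np.linalg.solve(L.T, y)
        except np.linalg.LinAlgError:
            tau = tau0 if tau == 0.0 else growth * tau    # enlarge the shift and retry
    raise RuntimeError("could not make H + tau*I positive definite")

# usage with an indefinite Hessian (illustrative)
H = np.array([[1.0, 0.0], [0.0, -2.0]])
g = np.array([1.0, 1.0])
print(modified_newton_direction(H, g))
```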
Quasi-Newton Methods (1)
Idea: reduce the complexity by a simplified calculation of $\boldsymbol{H}^{(k)} := \nabla^2 f(\boldsymbol{x}^{(k)})$ (Davidon):
• replace $\boldsymbol{H}^{(k)}$ by an approximation $\boldsymbol{B}^{(k)}$;
• instead of calculating $\boldsymbol{B}^{(k)}$ from scratch, look for a simple update using information from the last iterations.
Notation: $\boldsymbol{g}^{(k)} := \nabla f(\boldsymbol{x}^{(k)})$, $f^{(k)} := f(\boldsymbol{x}^{(k)})$.
Approach:
• Consider the quadratic approximation of $f$ at $\boldsymbol{x}^{(k)}$,
$$m^{(k)}(\boldsymbol{p}) = f^{(k)} + \boldsymbol{g}^{(k)T} \boldsymbol{p} + \tfrac{1}{2}\, \boldsymbol{p}^T \boldsymbol{B}^{(k)} \boldsymbol{p},$$
with $\boldsymbol{B}^{(k)}$ symmetric positive definite.
• First-order optimality condition: $\boldsymbol{p}^{(k)} = -\bigl(\boldsymbol{B}^{(k)}\bigr)^{-1} \boldsymbol{g}^{(k)}$.
• By convexity, this condition is necessary and sufficient for the minimization of $m^{(k)}(\boldsymbol{p})$.
• Construct the quadratic approximation at $\boldsymbol{x}^{(k+1)} = \boldsymbol{x}^{(k)} + \alpha_k \boldsymbol{p}^{(k)}$:
$$m^{(k+1)}(\boldsymbol{p}) = f^{(k+1)} + \boldsymbol{g}^{(k+1)T} \boldsymbol{p} + \tfrac{1}{2}\, \boldsymbol{p}^T \boldsymbol{B}^{(k+1)} \boldsymbol{p}.$$
• What conditions must $\boldsymbol{B}^{(k+1)}$ satisfy?
Quasi-Newton Methods (2)
Conditions on $\boldsymbol{B}^{(k+1)}$:
1. The gradient of $m^{(k+1)}$ at $\boldsymbol{x}^{(k)}$ and at $\boldsymbol{x}^{(k+1)}$ must be equal to the gradient of $f$. With
$$\nabla m^{(k+1)}(\boldsymbol{p}) = \boldsymbol{g}^{(k+1)} + \boldsymbol{B}^{(k+1)} \boldsymbol{p}:$$
At $\boldsymbol{x} = \boldsymbol{x}^{(k+1)}$, i.e., $\boldsymbol{p} = \boldsymbol{0}$: we want $\nabla m^{(k+1)}(\boldsymbol{0}) = \boldsymbol{g}^{(k+1)}$, which is automatically satisfied.
At $\boldsymbol{x} = \boldsymbol{x}^{(k)}$, i.e., $\boldsymbol{p} = -\alpha_k \boldsymbol{p}^{(k)}$: we want $\nabla m^{(k+1)}(-\alpha_k \boldsymbol{p}^{(k)}) = \boldsymbol{g}^{(k)}$
$$\Rightarrow\; \boldsymbol{g}^{(k+1)} - \alpha_k \boldsymbol{B}^{(k+1)} \boldsymbol{p}^{(k)} = \boldsymbol{g}^{(k)} \;\Rightarrow\; \boldsymbol{B}^{(k+1)} \underbrace{\alpha_k \boldsymbol{p}^{(k)}}_{=\,\boldsymbol{x}^{(k+1)} - \boldsymbol{x}^{(k)}\,=:\,\boldsymbol{s}^{(k)}} = \underbrace{\boldsymbol{g}^{(k+1)} - \boldsymbol{g}^{(k)}}_{=:\,\boldsymbol{y}^{(k)}},$$
the so-called secant equation $\boldsymbol{B}^{(k+1)} \boldsymbol{s}^{(k)} = \boldsymbol{y}^{(k)}$.
2. Since $\boldsymbol{B}^{(k+1)}$ is symmetric positive definite: $\boldsymbol{s}^{(k)T} \boldsymbol{B}^{(k+1)} \boldsymbol{s}^{(k)} > 0 \;\; \forall\, \boldsymbol{s}^{(k)} \ne \boldsymbol{0} \;\Rightarrow\; \boldsymbol{s}^{(k)T} \boldsymbol{y}^{(k)} > 0$ (curvature condition).
The Wolfe conditions (line search) guarantee this condition for all $f$, even when $f$ is non-convex.
Quasi-Newton Methods (3)
Conditions on $\boldsymbol{B}^{(k+1)}$: the secant equation $\boldsymbol{B}^{(k+1)} \boldsymbol{s}^{(k)} = \boldsymbol{y}^{(k)}$ alone admits many solutions for $\boldsymbol{B}^{(k+1)}$; requiring in addition symmetry and a minimal change (in a weighted norm) of the approximation from one iteration to the next leads to:
$$\boldsymbol{B}^{(k+1)} = \left(\boldsymbol{I} - \frac{1}{\boldsymbol{y}^{(k)T}\boldsymbol{s}^{(k)}}\, \boldsymbol{y}^{(k)} \boldsymbol{s}^{(k)T}\right) \boldsymbol{B}^{(k)} \left(\boldsymbol{I} - \frac{1}{\boldsymbol{y}^{(k)T}\boldsymbol{s}^{(k)}}\, \boldsymbol{s}^{(k)} \boldsymbol{y}^{(k)T}\right) + \frac{1}{\boldsymbol{y}^{(k)T}\boldsymbol{s}^{(k)}}\, \boldsymbol{y}^{(k)} \boldsymbol{y}^{(k)T} \quad\to\; \text{DFP formula}$$
$$\bigl(\boldsymbol{B}^{(k+1)}\bigr)^{-1} = \left(\boldsymbol{I} - \frac{1}{\boldsymbol{y}^{(k)T}\boldsymbol{s}^{(k)}}\, \boldsymbol{s}^{(k)} \boldsymbol{y}^{(k)T}\right) \bigl(\boldsymbol{B}^{(k)}\bigr)^{-1} \left(\boldsymbol{I} - \frac{1}{\boldsymbol{y}^{(k)T}\boldsymbol{s}^{(k)}}\, \boldsymbol{y}^{(k)} \boldsymbol{s}^{(k)T}\right) + \frac{1}{\boldsymbol{y}^{(k)T}\boldsymbol{s}^{(k)}}\, \boldsymbol{s}^{(k)} \boldsymbol{s}^{(k)T} \quad\to\; \text{BFGS formula}$$
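A sketch of one quasi-Newton step using the BFGS formula for the inverse approximation (here denoted `B_inv`; the variable names are illustrative, and the example reuses `f`, `grad_f` and `armijo_line_search` from the earlier sketches):

```python
def bfgs_update_inverse(B_inv, s, y):
    """Apply the BFGS update to the inverse Hessian approximation B_inv, given s and y."""
    rho = 1.0 / (y @ s)                      # requires the curvature condition y^T s > 0
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ B_inv @ V.T + rho * np.outer(s, s)

# one quasi-Newton step (illustrative)
x = np.array([1.0, 1.0])
B_inv = np.eye(2)                            # initial approximation
p = -B_inv @ grad_f(x)                       # p^(k) = -(B^(k))^-1 g^(k)
alpha = armijo_line_search(f, grad_f, x, p)
x_new = x + alpha * p
s, y = x_new - x, grad_f(x_new) - grad_f(x)
B_inv = bfgs_update_inverse(B_inv, s, y)
```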
Parameter Estimation
For least-squares parameter estimation, the objective is $f(\boldsymbol{x}) = \tfrac{1}{2} \sum_{j=1}^{m} \varepsilon_j(\boldsymbol{x})^2$, where $\boldsymbol{\varepsilon}(\boldsymbol{x})$ is the vector of residuals and $\boldsymbol{J}(\boldsymbol{x})$ its Jacobian:
$$\Rightarrow\; \nabla f(\boldsymbol{x}) = \boldsymbol{J}(\boldsymbol{x})^T \boldsymbol{\varepsilon}(\boldsymbol{x}), \qquad \nabla^2 f(\boldsymbol{x}) = \boldsymbol{J}(\boldsymbol{x})^T \boldsymbol{J}(\boldsymbol{x}) + \sum_{j=1}^{m} \varepsilon_j(\boldsymbol{x})\, \nabla^2 \varepsilon_j(\boldsymbol{x}).$$
• The Hessian can be approximated by the first term in the case of almost linear problems (i.e., $\nabla^2 \varepsilon_j(\boldsymbol{x}) \approx \boldsymbol{0}$) or good starting values (i.e., small $\varepsilon_j(\boldsymbol{x})$).
• With this Hessian approximation: $\boldsymbol{J}^{(k)T} \boldsymbol{J}^{(k)} \boldsymbol{p}^{(k)} = -\boldsymbol{J}^{(k)T} \boldsymbol{\varepsilon}^{(k)}$.
• If $\boldsymbol{J}^{(k)}$ has full rank, $\boldsymbol{p}^{(k)}$ is always a descent direction:
$$\boldsymbol{p}^{(k)T} \nabla f\bigl(\boldsymbol{x}^{(k)}\bigr) = \boldsymbol{p}^{(k)T} \boldsymbol{J}^{(k)T} \boldsymbol{\varepsilon}^{(k)} = -\boldsymbol{p}^{(k)T} \boldsymbol{J}^{(k)T} \boldsymbol{J}^{(k)} \boldsymbol{p}^{(k)} = -\bigl\|\boldsymbol{J}^{(k)} \boldsymbol{p}^{(k)}\bigr\|^2 \le 0.$$
The inequality is strict unless $\boldsymbol{J}^{(k)} \boldsymbol{p}^{(k)} = \boldsymbol{0} \Leftrightarrow \boldsymbol{J}^{(k)T} \boldsymbol{\varepsilon}^{(k)} = \nabla f\bigl(\boldsymbol{x}^{(k)}\bigr) = \boldsymbol{0}$, i.e., $\boldsymbol{x}^{(k)}$ is already optimal.
• In the descent direction $\boldsymbol{p}^{(k)}$, the step length is determined according to the Wolfe conditions.
• It is a local method.
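A sketch of one such (Gauss-Newton-type) step for an illustrative residual function `residuals(x)` with Jacobian `jac(x)`; solving the normal equations via a least-squares solve avoids forming $\boldsymbol{J}^T\boldsymbol{J}$ explicitly:

```python
def gauss_newton_step(residuals, jac, x):
    """Compute p from J^T J p = -J^T eps, here via a least-squares solve of J p = -eps."""
    J, eps = jac(x), residuals(x)
    p, *_ = np.linalg.lstsq(J, -eps, rcond=None)
    return p

# illustrative residuals: fit y = a * exp(b * t) to three data points
t = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 2.6, 7.5])
residuals = lambda x: x[0] * np.exp(x[1] * t) - y
jac = lambda x: np.column_stack([np.exp(x[1] * t), x[0] * t * np.exp(x[1] * t)])
x = np.array([1.0, 1.0])
x = x + gauss_newton_step(residuals, jac, x)   # one step; iterate with a line search in practice
```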
Trust-Region Methods
Idea:
• Approximate $f$ at $\boldsymbol{x}^{(k)}$ by the quadratic model function $m^{(k)}$:
$$m^{(k)}(\boldsymbol{p}) = f^{(k)} + \boldsymbol{g}^{(k)T} \boldsymbol{p} + \tfrac{1}{2}\, \boldsymbol{p}^T \boldsymbol{B}^{(k)} \boldsymbol{p}$$
• Solve the minimization problem $\min_{\boldsymbol{p}} m^{(k)}(\boldsymbol{p})$ s.t. $\|\boldsymbol{p}\| \le \Delta^{(k)}$ and set $\boldsymbol{p}^{(k)}$ to the solution found.
• Set $\boldsymbol{x}^{(k+1)} = \boldsymbol{x}^{(k)} + \boldsymbol{p}^{(k)}$.
• Compare the agreement between the model function $m^{(k)}$ and the objective function $f$ at each iteration. Define the contraction rate $\rho_k$ as the ratio of actual to predicted reduction,
$$\rho_k = \frac{f\bigl(\boldsymbol{x}^{(k)}\bigr) - f\bigl(\boldsymbol{x}^{(k)} + \boldsymbol{p}^{(k)}\bigr)}{m^{(k)}(\boldsymbol{0}) - m^{(k)}\bigl(\boldsymbol{p}^{(k)}\bigr)}.$$
Basic Algorithm:
choose $\Delta^{(\max)} > 0$, $\Delta^{(0)} \in (0, \Delta^{(\max)})$ and $\eta \in [0, \tfrac{1}{4})$
for $k = 0, 1, \dots$
  calculate the direction $\boldsymbol{p}^{(k)}$ and the contraction rate $\rho_k$
  if $\rho_k < \tfrac{1}{4}$: $\Delta^{(k+1)} = \dfrac{\|\boldsymbol{p}^{(k)}\|}{4}$
  else if $\rho_k > \tfrac{3}{4}$ and $\|\boldsymbol{p}^{(k)}\| = \Delta^{(k)}$: $\Delta^{(k+1)} = \min\bigl(2\Delta^{(k)}, \Delta^{(\max)}\bigr)$
  else: $\Delta^{(k+1)} = \Delta^{(k)}$
  if $\rho_k > \eta$: $\boldsymbol{x}^{(k+1)} = \boldsymbol{x}^{(k)} + \boldsymbol{p}^{(k)}$, else $\boldsymbol{x}^{(k+1)} = \boldsymbol{x}^{(k)}$
end for
Remarks:
• Strategies for the efficient solution of the minimization problem for $\boldsymbol{p}^{(k)}$:
  - The Cauchy point: minimum along the steepest-descent direction ($-\boldsymbol{g}^{(k)}$); slow.
  - The dogleg method: applicable when $\boldsymbol{B}^{(k)}$ is positive definite; fast (superlinear).
• If the unconstrained minimum of $m^{(k)}$, $\boldsymbol{p}^B$, satisfies $\|\boldsymbol{p}^B\| \le \Delta^{(k)}$, it is taken directly.
• For a small $\Delta^{(k)}$: search the solution along the direction $-\boldsymbol{g}^{(k)}$.
• For an intermediate $\Delta^{(k)}$: additionally calculate
$$\boldsymbol{p}^U = -\frac{\boldsymbol{g}^{(k)T}\boldsymbol{g}^{(k)}}{\boldsymbol{g}^{(k)T}\boldsymbol{B}^{(k)}\boldsymbol{g}^{(k)}}\, \boldsymbol{g}^{(k)},$$
where $\boldsymbol{p}^U$ is the unconstrained minimum of $m^{(k)}$ in the steepest-descent direction, and search along the dogleg path connecting $\boldsymbol{p}^U$ and $\boldsymbol{p}^B$:
$$\boldsymbol{p}^{(k)}(\tau) = \begin{cases} \tau\, \boldsymbol{p}^U & 0 \le \tau \le 1 \\ \boldsymbol{p}^U + (\tau - 1)\bigl(\boldsymbol{p}^B - \boldsymbol{p}^U\bigr) & 1 \le \tau \le 2 \end{cases}$$
with $\tau^*$ chosen such that $\bigl\|\boldsymbol{p}^{(k)}(\tau^*)\bigr\| = \Delta^{(k)}$.
[Figure: trust region around $\boldsymbol{x}^{(k)}$ with the dogleg path along $-\boldsymbol{g}^{(k)}$ to $\boldsymbol{p}^U$ and then towards $\boldsymbol{p}^B$; $\boldsymbol{p}^{(k)}(\tau^*)$ is the point where the path crosses the trust-region boundary.]
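A sketch of the dogleg step described above, as a hand-rolled routine under the assumption that `B` is symmetric positive definite:

```python
import numpy as np

def dogleg_step(g, B, delta):
    """Approximate solution of min m(p) = g^T p + 0.5 p^T B p  s.t. ||p|| <= delta."""
    pB = np.linalg.solve(B, -g)                    # unconstrained minimizer of the model
    if np.linalg.norm(pB) <= delta:
        return pB                                  # full step fits inside the trust region
    pU = -(g @ g) / (g @ B @ g) * g                # minimizer along the steepest-descent direction
    if np.linalg.norm(pU) >= delta:
        return delta * pU / np.linalg.norm(pU)     # small radius: scaled steepest-descent step
    # intermediate radius: find tau-1 in [0, 1] with ||pU + (tau-1)(pB - pU)|| = delta
    d = pB - pU
    a, b, c = d @ d, 2 * (pU @ d), pU @ pU - delta**2
    t = (-b + np.sqrt(b**2 - 4 * a * c)) / (2 * a)  # positive root of the quadratic
    return pU + t * d

# usage (illustrative)
print(dogleg_step(np.array([1.0, 100.0]), np.array([[1.0, 0.0], [0.0, 100.0]]), delta=0.5))
```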
Illustration of Convergence – Rosenbrock Function
[Figure: iterates on the surface and contour plot of the Rosenbrock function $f(\boldsymbol{x})$ over $(x_1, x_2)$. Source: https://commons.wikimedia.org/w/index.php?curid=9941741]
Illustration of Convergence – BFGS (Quasi-Newton Method)
[Figure: BFGS (quasi-Newton) iterates on the surface and contour plot of $f(\boldsymbol{x})$ over $(x_1, x_2)$. Source: https://commons.wikimedia.org/w/index.php?curid=9941741]