01 Nonlinear Optimization
Outline
1 Basic definitions
3 Optimality conditions
Minimize (or maximize) an objective function F(w) depending on decision variables w, subject to equality and/or inequality constraints.

An optimization problem

  min_{w ∈ R^n}  F(w)          (1a)
  s.t.  G(w) = 0               (1b)
        H(w) ≥ 0               (1c)

Terminology
▶ w ∈ R^n - decision variable
▶ F : R^n → R - objective
▶ G : R^n → R^{n_G} - equality constraints
▶ H : R^n → R^{n_H} - inequality constraints

Definition
The feasible set of the optimization problem (1) is defined as
Ω = {w ∈ R^n | G(w) = 0, H(w) ≥ 0}. A point w ∈ Ω is called a feasible point.

In the example, the feasible set is the intersection of the two grey areas (halfspace and circle).
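To make problem (1) concrete, the following is a minimal sketch of setting up and solving such an NLP with CasADi and IPOPT (tools mentioned later in these slides); the objective and constraints used here are illustrative assumptions, not the example from the figure.

```python
import casadi as ca

# decision variable w in R^2
w = ca.SX.sym('w', 2)

# illustrative (assumed) objective and constraints
F = (w[0] - 2)**2 + (w[1] - 1)**2        # objective F(w)
G = w[0] - w[1]                          # equality constraint G(w) = 0
H = 4 - w[0]**2 - w[1]**2                # inequality constraint H(w) >= 0

# CasADi stacks all constraints into g and distinguishes them by bounds
nlp = {'x': w, 'f': F, 'g': ca.vertcat(G, H)}
solver = ca.nlpsol('solver', 'ipopt', nlp)

# G(w) = 0  ->  bounds [0, 0];   H(w) >= 0  ->  bounds [0, inf)
sol = solver(x0=[0.0, 0.0], lbg=[0.0, 0.0], ubg=[0.0, ca.inf])
print('w* =', sol['x'])
```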
Basic definitions: local and global minimizer

Definition (Local minimizer)
A point w* ∈ Ω is called a local minimizer of the optimization problem (1) if there exists an open ball B around w* such that for all w ∈ Ω ∩ B it holds that F(w) ≥ F(w*).

[Figure: graph of F(w) with a local minimum, the global minimum, and a neighborhood of w* marked.]
Basic definitions: convexity

A set Ω is said to be convex if for any w1, w2 ∈ Ω and any θ ∈ [0, 1] it holds that θw1 + (1 − θ)w2 ∈ Ω.

A function F is convex if its domain is a convex set and for all w1, w2 in the domain and all θ ∈ [0, 1] it holds that F(θw1 + (1 − θ)w2) ≤ θF(w1) + (1 − θ)F(w2).

▶ F is concave if and only if −F is convex

[Figure: graph of a convex function F(w); the chord value θF(w1) + (1 − θ)F(w2) lies above F(θw1 + (1 − θ)w2).]
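As a quick numerical illustration of the convexity inequality, here is a small sketch; the function F(w) = w² + e^w is an assumed example (a sum of convex functions).

```python
import numpy as np

def F(w):
    # assumed example of a convex function
    return w**2 + np.exp(w)

rng = np.random.default_rng(0)
for _ in range(1000):
    w1, w2 = rng.uniform(-3, 3, size=2)
    theta = rng.uniform(0, 1)
    lhs = F(theta * w1 + (1 - theta) * w2)
    rhs = theta * F(w1) + (1 - theta) * F(w2)
    assert lhs <= rhs + 1e-12   # convexity inequality holds for every sample
print("convexity inequality verified on random samples")
```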
"...in fact, the great watershed in optimization isn't between linearity and nonlinearity, but convexity and nonconvexity." R. T. Rockafellar, SIAM Review, 1993
"... the main fact, which should be known to any person dealing with optimization models, is that in general, optimization problems are unsolvable."
Yurii Nesterov, Lectures on Convex Optimization, 2018.
(“solvable” refers to finding a global minimizer)
Linear program (LP)

  min_{w ∈ R^n}  g⊤w
  s.t.  Aw − b = 0
        Cw − d ≥ 0
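A minimal sketch of solving an LP of this form with scipy.optimize.linprog; the data g, A, b, C, d below are made-up illustrative values. Note that linprog expects inequalities as A_ub x ≤ b_ub, so Cw − d ≥ 0 is passed as −Cw ≤ −d, and the default bound w ≥ 0 is removed explicitly.

```python
import numpy as np
from scipy.optimize import linprog

# made-up problem data
g = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]]);              b = np.array([1.0])        # Aw - b = 0
C = np.array([[1.0, 0.0], [0.0, 1.0]]);  d = np.array([-1.0, -1.0]) # Cw - d >= 0

res = linprog(c=g,
              A_eq=A, b_eq=b,             # Aw = b
              A_ub=-C, b_ub=-d,           # Cw >= d  <=>  -Cw <= -d
              bounds=[(None, None)] * 2)  # free variables (default would be w >= 0)
print(res.x, res.fun)
```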
Quadratic program (QP)

  min_{w ∈ R^n}  (1/2) w⊤Qw + g⊤w
  s.t.  Aw − b = 0
        Cw − d ≥ 0
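For the equality-constrained part of such a QP (dropping Cw − d ≥ 0 for simplicity), the minimizer can be computed directly from the KKT linear system; a small numpy sketch with assumed data, which also previews the QP subproblems appearing later in SQP.

```python
import numpy as np

# assumed QP data: min 0.5 w'Qw + g'w  s.t.  Aw - b = 0
Q = np.array([[2.0, 0.0], [0.0, 2.0]])
g = np.array([-2.0, -5.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

n, m = Q.shape[0], A.shape[0]
# KKT system for the Lagrangian L = 0.5 w'Qw + g'w - lambda'(Aw - b):
#   [Q  -A'] [w     ]   [-g]
#   [A   0 ] [lambda] = [ b]
KKT = np.block([[Q, -A.T], [A, np.zeros((m, m))]])
rhs = np.concatenate([-g, b])
sol = np.linalg.solve(KKT, rhs)
w_opt, lam_opt = sol[:n], sol[n:]
print("w* =", w_opt, " lambda* =", lam_opt)
```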
Nonlinear program (NLP)

  min_{w ∈ R^n}  F(w)
  s.t.  G(w) = 0
        H(w) ≥ 0
Mathematical program with complementarity constraints (MPCC)

  min_{w ∈ R^n}  F(w)
  s.t.  G(w) = 0
        H(w) ≥ 0
        0 ≤ w1 ⊥ w2 ≥ 0
Mixed-integer nonlinear program (MINLP)

  min_{w0 ∈ R^p, w1 ∈ Z^q}  F(w)
  s.t.  G(w) = 0
        H(w) ≥ 0

with w = [w0⊤, w1⊤]⊤, n = p + q.
Discrete-time optimal control problem (OCP)

  s.t.  x0 = x̄0
        x_{k+1} = f(x_k, u_k)
        0 ≥ h(x_k, u_k),  k = 0, . . . , N−1
        0 ≥ r(x_N)
Optimality conditions
[Figure: three surface plots of example objective functions F(w) over (w1, w2).]
Necessary and sufficient optimality conditions

▶ Necessary conditions: find a candidate point (or exclude points)
▶ Sufficient conditions: verify optimality of a candidate point
▶ A minimizer must satisfy the second-order necessary conditions (SONC), but does not have to satisfy the second-order sufficient conditions (SOSC)

[Figure: plots of F(w), ∇F(w), and ∇²F(w) over w.]
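A small sketch of checking first- and second-order conditions numerically at a candidate point via finite differences; the example function and the candidate point (the known minimizer (1, 1) of a Rosenbrock-type function) are assumptions, not the function plotted in the slides.

```python
import numpy as np

def F(w):
    # assumed example objective (Rosenbrock-type), minimizer at (1, 1)
    return (1 - w[0])**2 + 10 * (w[1] - w[0]**2)**2

def grad(F, w, h=1e-6):
    # central finite-difference gradient
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w); e[i] = h
        g[i] = (F(w + e) - F(w - e)) / (2 * h)
    return g

def hess(F, w, h=1e-4):
    # finite-difference Hessian built from the gradient
    n = len(w)
    H = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n); e[i] = h
        H[:, i] = (grad(F, w + e) - grad(F, w - e)) / (2 * h)
    return 0.5 * (H + H.T)

w_cand = np.array([1.0, 1.0])  # candidate point
print("||grad F|| =", np.linalg.norm(grad(F, w_cand)))       # ~0: first-order necessary condition
print("eig(Hess F) =", np.linalg.eigvalsh(hess(F, w_cand)))  # all > 0: second-order sufficient condition
```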
Equality-constrained NLP

  min_{w ∈ R^n}  F(w)
  s.t.  G(w) = 0

NLP with inequality constraints

  min_{w ∈ R^n}  F(w)
  s.t.  G(w) = 0
        H(w) ≥ 0
▶ complementarity conditions
  0 ≤ µ* ⊥ H(w*) ≥ 0

Cases:
▶ H_i(w*) > 0, then µ*_i = 0, and H_i(w) is inactive
▶ µ*_i > 0 and H_i(w*) = 0, then H_i(w) is strictly active
▶ µ*_i = 0 and H_i(w*) = 0, then H_i(w) is weakly active

[Figure: the complementarity set 0 ≤ µ_i ⊥ H_i(w) ≥ 0 in the (H_i(w), µ_i) plane: the union of the two nonnegative axes.]
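A tiny sketch of this case distinction for given candidate values H(w*) and µ*; the tolerance and the example numbers are assumptions.

```python
import numpy as np

def classify_constraints(H_val, mu, tol=1e-8):
    """Classify each inequality constraint as inactive, strictly or weakly active."""
    labels = []
    for Hi, mui in zip(H_val, mu):
        if Hi > tol and abs(mui) <= tol:
            labels.append("inactive")         # H_i(w*) > 0, mu_i* = 0
        elif abs(Hi) <= tol and mui > tol:
            labels.append("strictly active")  # H_i(w*) = 0, mu_i* > 0
        elif abs(Hi) <= tol and abs(mui) <= tol:
            labels.append("weakly active")    # H_i(w*) = 0, mu_i* = 0
        else:
            labels.append("complementarity violated")
    return labels

print(classify_constraints(H_val=np.array([0.7, 0.0, 0.0]),
                           mu=np.array([0.0, 1.3, 0.0])))
# ['inactive', 'strictly active', 'weakly active']
```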
Physical interpretation: a ball and a fence

  min_{w ∈ R^n}  F(w)
  s.t.  H(w) ≥ 0

▶ −∇F is the gravity
▶ µ∇H is the force of the fence; the sign µ ≥ 0 means the fence can only "push" the ball
▶ the ball comes to rest where the forces balance: ∇F(w) − µ∇H(w) = 0 with µ ≥ 0 (and µ = 0 if the fence is not touched)

[Figure: sequence of plots over (w1, w2) showing −∇F(w) and µ∇H(w) along the fence H(w) = 0, with µ = 0.857, 0.704, 0.552, 0.404 in the successive frames.]
Linearization and first-order Taylor series

Linearization of F at the linearization point w̄ equals the first-order Taylor series of F at w̄:

  F_L(w; w̄) := F(w̄) + ∇_w F(w̄)⊤ (w − w̄)

[Figure: graph of y = F(w) and of its linearization y = F(w^k) + ∇F(w^k)(w − w^k) at iteration 0.]
Newton's method for root finding

To solve F(w) = 0 for a continuously differentiable F : R^n → R^n, set the linearization at the current iterate w^k to zero,

  F(w^k) + ∇F(w^k)⊤ ∆w = 0,

solve for ∆w, and update w^{k+1} = w^k + ∆w.

[Figure: graph of F(w) and of its linearization F_L(w; w^k) over successive Newton iterations.]
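A minimal sketch of this iteration for a scalar example; the function F(w) = w² − 2 and the starting point are assumptions, and the derivative is written out by hand.

```python
# assumed scalar root-finding problem: F(w) = w^2 - 2 = 0, root sqrt(2)
def F(w):
    return w**2 - 2.0

def dF(w):
    return 2.0 * w

w = 2.0                          # starting point w^0
for k in range(10):
    dw = -F(w) / dF(w)           # solve F(w^k) + dF(w^k) * dw = 0
    w = w + dw                   # full Newton step
    if abs(F(w)) < 1e-12:
        break
print(k, w)                      # converges quadratically to sqrt(2) ~ 1.414213562
```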
In direct methods, we have to solve the discretized optimal control problem, which is a
Nonlinear Program (NLP)
Lagrange function
  L(w, λ) = F(w) − λ⊤ G(w)

KKT conditions (equality constraints only)
  ∇_w L(w*, λ*) = 0
  G(w*) = 0

Linearizing these KKT conditions at (w^k, λ^k) and setting λ^+ = λ^k + ∆λ, the resulting conditions

  ∇_w F(w^k) + ∇²_w L(w^k, λ^k) ∆w − ∇_w G(w^k) λ^+ = 0
  G(w^k) + ∇_w G(w^k)⊤ ∆w = 0
are the KKT optimality conditions of a quadratic program (QP), namely:
Quadratic program

  min_{∆w ∈ R^n}  ∇F(w^k)⊤ ∆w + (1/2) ∆w⊤ A_k ∆w
  s.t.  G(w^k) + ∇G(w^k)⊤ ∆w = 0,

with A_k = ∇²_w L(w^k, λ^k) (the exact Hessian of the Lagrangian).

The full-step Newton's method iterates by solving this QP, the subproblem of Sequential Quadratic Programming (SQP), in each iteration. Its solution yields the step ∆w^k and the new multiplier λ^+_QP = λ^k + ∆λ^k.

New iterate
  w^{k+1} = w^k + ∆w^k
  λ^{k+1} = λ^k + ∆λ^k = λ^+_QP

This is the "full step, exact Hessian SQP method for equality constrained optimization".
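A compact numpy sketch of one such full-step, exact-Hessian SQP loop on a small assumed example (objective, constraint, and starting point are illustrative); each iteration solves the KKT system of the QP subproblem directly.

```python
import numpy as np

# assumed example: min (w1-2)^2 + (w2-1)^2   s.t.  G(w) = w1^2 + w2^2 - 1 = 0
def gradF(w):      return 2.0 * (w - np.array([2.0, 1.0]))
def G(w):          return np.array([w[0]**2 + w[1]**2 - 1.0])
def jacG(w):       return np.array([[2.0 * w[0], 2.0 * w[1]]])        # dG/dw, shape (1, 2)
def hessL(w, lam): return 2.0 * np.eye(2) - lam[0] * 2.0 * np.eye(2)  # Hessian of L = F - lam*G

w, lam = np.array([1.0, 1.0]), np.array([0.0])
for k in range(20):
    A, Jg = hessL(w, lam), jacG(w)
    # KKT system of the QP subproblem, unknowns (dw, lam_plus)
    KKT = np.block([[A, -Jg.T], [Jg, np.zeros((1, 1))]])
    rhs = np.concatenate([-gradF(w), -G(w)])
    sol = np.linalg.solve(KKT, rhs)
    dw, lam = sol[:2], sol[2:]            # full step; lam is the new multiplier
    w = w + dw
    if np.linalg.norm(np.concatenate([gradF(w) - jacG(w).T @ lam, G(w)])) < 1e-10:
        break
print(k, w, lam)   # w* = (2, 1)/sqrt(5) on the unit circle
```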
NLP with inequality constraints

KKT conditions
  ∇_w L(w*, µ*, λ*) = 0
  G(w*) = 0
  H(w*) ≥ 0
  µ* ≥ 0
  H(w*)⊤ µ* = 0
By linearizing all functions within the KKT conditions, and setting λ^+ = λ^k + ∆λ and µ^+ = µ^k + ∆µ, we obtain the KKT conditions of a Quadratic Program (QP).

QP with inequality constraints

  min_{∆w ∈ R^n}  ∇F(w^k)⊤ ∆w + (1/2) ∆w⊤ A_k ∆w
  s.t.  G(w^k) + ∇G(w^k)⊤ ∆w = 0
        H(w^k) + ∇H(w^k)⊤ ∆w ≥ 0

▶ QP solution: ∆w^k, λ^+_QP, µ^+_QP
▶ full step: w^{k+1} = w^k + ∆w^k, λ^{k+1} = λ^+_QP, µ^{k+1} = µ^+_QP
▶ nonsmooth complementarity conditions resolved at QP level
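For reference, SciPy's SLSQP solver is itself an SQP method of this flavor; a minimal sketch on an assumed example problem (the objective and constraints are illustrative, not from the slides).

```python
import numpy as np
from scipy.optimize import minimize

# assumed example: min (w1-2)^2 + (w2-1)^2
#                  s.t. G(w) = w1 - w2 = 0,   H(w) = 1 - w1^2 - w2^2 >= 0
F = lambda w: (w[0] - 2.0)**2 + (w[1] - 1.0)**2
constraints = [
    {"type": "eq",   "fun": lambda w: w[0] - w[1]},               # G(w) = 0
    {"type": "ineq", "fun": lambda w: 1.0 - w[0]**2 - w[1]**2},   # H(w) >= 0
]
res = minimize(F, x0=np.array([0.0, 0.0]), method="SLSQP", constraints=constraints)
print(res.x, res.fun)   # w1 = w2, pushed onto the unit circle: w* ~ (0.707, 0.707)
```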
  min_{w ∈ R^n}  F(w)
  s.t.  H(w) ≥ 0

KKT conditions
  ∇F(w) − ∇H(w)µ = 0
  0 ≤ µ ⊥ H(w) ≥ 0

▶ Main difficulty: nonsmoothness of the complementarity conditions
▶ The 4th lecture (Tuesday) will show why Newton's method does not work for nonsmooth problems

[Figure: the complementarity set 0 ≤ µ_i ⊥ H_i(w) ≥ 0 in the (H_i(w), µ_i) plane.]
Barrier problem in interior-point method

NLP with inequalities

  min_{w ∈ R^n}  F(w)
  s.t.  H(w) ≥ 0

Each inequality is replaced by a logarithmic barrier term φ(H_i(w)) = −τ log(H_i(w)) in the objective.

[Figure: the barrier function φ(H_i(w)) = −τ log(H_i(w)) for τ = 5, 1, 0.2, 0.04, 0.008, 0.002; as τ decreases, the barrier tends to 0 for H_i(w) > 0 while still growing unboundedly as H_i(w) → 0.]
Example NLP

  min_{w ∈ R}  0.5 w² − 2w
  s.t.  −1 ≤ w ≤ 1

Barrier problem

  min_{w ∈ R}  0.5 w² − 2w − τ log(w + 1) − τ log(1 − w)

[Figure: the objective F(w) and the barrier objective F_τ(w) for τ = 5, 1.5, 0.45, 0.135, 0.04, 0.012, 0.004; the barrier minimizer approaches the constrained minimizer w* = 1 as τ → 0.]
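A short sketch that solves this barrier problem for a decreasing sequence of τ values (the τ sequence is taken from the figure; using scipy.optimize.minimize_scalar for the inner minimization is an assumed implementation choice).

```python
import numpy as np
from scipy.optimize import minimize_scalar

def F_tau(w, tau):
    # barrier objective of the example: 0.5*w^2 - 2*w - tau*log(w+1) - tau*log(1-w)
    return 0.5 * w**2 - 2.0 * w - tau * np.log(w + 1.0) - tau * np.log(1.0 - w)

for tau in [5.0, 1.5, 0.45, 0.135, 0.04, 0.012, 0.004]:
    res = minimize_scalar(F_tau, args=(tau,),
                          bounds=(-1.0 + 1e-9, 1.0 - 1e-9), method="bounded")
    print(f"tau = {tau:7.3f}   w_tau = {res.x: .6f}")
# the barrier minimizers approach the constrained minimizer w* = 1 as tau -> 0
```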
Barrier problem

  min_{w ∈ R^n}  F(w) − τ Σ_{i=1}^{m} log(H_i(w)) =: F_τ(w)

KKT conditions (∇F_τ(w) = 0)

  ∇F(w) − Σ_{i=1}^{m} (τ / H_i(w)) ∇H_i(w) = 0

Introduce the variable µ_i = τ / H_i(w). The stationarity condition then reads ∇F(w) − ∇H(w)µ = 0, and the complementarity condition is replaced by its smoothed (perturbed) version H_i(w) µ_i = τ with H_i(w) > 0, µ_i > 0.

[Figure: the perturbed complementarity set H_i(w) µ_i = τ compared with 0 ≤ µ_i ⊥ H_i(w) ≥ 0, for τ = 1, 0.1, 0.01, 0.001; as τ → 0 the hyperbola approaches the nonsmooth L-shaped set.]
Smoothed KKT conditions

  R_τ(w, s, λ, µ) = [ ∇_w L(w, λ, µ) ;  G(w) ;  H(w) − s ;  diag(s)µ − τe ] = 0,   (s, µ > 0)

Newton-type iteration with step size α:

  w^{k+1} = w^k + α∆w
  s^{k+1} = s^k + α∆s
  λ^{k+1} = λ^k + α∆λ
  µ^{k+1} = µ^k + α∆µ

such that s^{k+1} > 0, µ^{k+1} > 0.
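The slides only state that s^{k+1} > 0 and µ^{k+1} > 0 must hold. One common way to choose α is the fraction-to-the-boundary rule, sketched below; the damping factor 0.995 is a typical but assumed choice.

```python
import numpy as np

def fraction_to_boundary(s, ds, mu, dmu, gamma=0.995):
    """Largest alpha in (0, 1] with s + alpha*ds >= (1 - gamma)*s, same for mu."""
    alpha = 1.0
    for v, dv in ((s, ds), (mu, dmu)):
        neg = dv < 0                              # only decreasing components limit the step
        if np.any(neg):
            alpha = min(alpha, np.min(-gamma * v[neg] / dv[neg]))
    return alpha

s,  ds  = np.array([1.0, 0.5]), np.array([-2.0, 0.3])
mu, dmu = np.array([0.2, 0.8]), np.array([0.1, -1.0])
alpha = fraction_to_boundary(s, ds, mu, dmu)
print(alpha)   # step size that keeps s + alpha*ds > 0 and mu + alpha*dmu > 0
```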
▶ Optimization problems come in many variants (LP, QP, NLP, MPCC, MINLP, OCP, ...)
▶ Each problem class can be addressed with suitable software.
▶ The Lagrangian function, duality, and the KKT conditions are important concepts.
▶ For convex problems, the KKT conditions are sufficient for global optimality.
▶ Newton-type optimization for NLPs solves the nonsmooth KKT conditions via Sequential Quadratic Programming (SQP) or via the interior-point method.
▶ NLP solvers need to evaluate first- and second-order derivatives (e.g., via CasADi).
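As an illustration of the last point, a minimal CasADi sketch of evaluating exact first- and second-order derivatives by algorithmic differentiation; the example function is an assumption.

```python
import casadi as ca

w = ca.SX.sym('w', 2)
F = (w[0] - 2)**2 + ca.exp(w[0] * w[1])   # assumed example objective

gradF = ca.Function('gradF', [w], [ca.gradient(F, w)])
H, _ = ca.hessian(F, w)                   # hessian() returns (Hessian, gradient)
hessF = ca.Function('hessF', [w], [H])

print(gradF([0.5, -1.0]))
print(hessF([0.5, -1.0]))
```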
Nonlinear optimization:
▶ Nocedal, Jorge, and Stephen J. Wright. Numerical Optimization. 2nd ed. New York, NY: Springer, 2006.
▶ Biegler, Lorenz T. Nonlinear programming: concepts, algorithms, and applications to
chemical processes. Society for Industrial and Applied Mathematics, 2010.
Convex optimization:
▶ Boyd, Stephen, and Lieven Vandenberghe. Convex Optimization. Cambridge University
Press, 2004. online: https://web.stanford.edu/~boyd/cvxbook/
▶ Rockafellar, R. T., Fundamentals of optimization. Lecture Notes 2007. online:
https://sites.math.washington.edu/~rtr/fundamentals.pdf
Optimization software:
▶ https://plato.asu.edu/guide.html
▶ https://www.syscop.de/research/software
▶ Moritz Diehl, Sébastien Gros. ”Numerical optimal control (Draft),” Lecture notes, 2024.
online: https://www.syscop.de/files/2024ws/NOC/book-NOCSE.pdf
▶ Karmarkar, Narendra. ”A new polynomial-time algorithm for linear programming.” In
Proceedings of the sixteenth annual ACM symposium on Theory of computing, pp.
302-311. 1984.
▶ Dantzig, George B. ”Origins of the simplex method.” In A history of scientific computing,
pp. 141-151. 1990.