
1. Theory and algorithms for nonlinear programming

Armin Nurkanović

Systems Control and Optimization Laboratory, University of Freiburg, Germany
(slides created jointly with Moritz Diehl)

Winter School on Numerical Methods for Optimal Control of Nonsmooth Systems
École des Mines de Paris
February 3-5, 2025, Paris, France

Outline of the lecture

1 Basic definitions

2 Some classifications of optimization problems

3 Optimality conditions

4 Nonlinear programming algorithms





What is an optimization problem?

Optimization is a powerful tool used in all quantitative sciences.

Minimize (or maximize) an objective function F(w) depending on decision variables w, subject to equality and/or inequality constraints.

An optimization problem

    min_{w∈R^n}  F(w)        (1a)
    s.t.  G(w) = 0           (1b)
          H(w) ≥ 0           (1c)

Terminology
▶ w ∈ R^n - decision variable
▶ F : R^n → R - objective
▶ G : R^n → R^{n_G} - equality constraints
▶ H : R^n → R^{n_H} - inequality constraints

▶ If F, G, H are nonlinear and smooth, we speak of a nonlinear programming problem (NLP).
▶ Only in a few special cases does a closed-form solution exist.
▶ Use an iterative algorithm to find an approximate solution.
▶ The problem may be parametric, i.e., some (or all) functions depend on a fixed parameter p ∈ R^{n_p}, e.g., in model predictive control.


Basic definitions: the feasible set

Definition
The feasible set of the optimization problem (1) is defined as

    Ω = {w ∈ R^n | G(w) = 0, H(w) ≥ 0}.

A point w ∈ Ω is called a feasible point.

In the example (figure omitted), the feasible set is the intersection of the two grey areas (a halfspace and a circle).
Basic definitions: local and global minimizer

Definition (Local minimizer)
A point w∗ ∈ Ω is called a local minimizer of the optimization problem (1) if there exists an open ball B_ϵ(w∗) with ϵ > 0, such that for all w ∈ B_ϵ(w∗) ∩ Ω it holds that F(w) ≥ F(w∗).

Definition (Global minimizer)
A point w∗ ∈ Ω is called a global minimizer of (1) if for all w ∈ Ω it holds that F(w) ≥ F(w∗).

▶ The value F(w∗) at a local/global minimizer w∗ is called the local/global minimum.

Figure: local and global minima of F(w) = (1/2)w^4 − 2w^3 − 3w^2 + 12w + 10, with a neighborhood of w∗ marked.


Convex sets

A key concept in optimization is convexity.

A set Ω is said to be convex if for any w1, w2 ∈ Ω and any θ ∈ [0, 1] it holds that θw1 + (1 − θ)w2 ∈ Ω.

Figure inspired by Figure 2.2 in S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.


Convex functions

▶ A function F is convex if for every w1, w2 ∈ R^n and θ ∈ [0, 1] it holds that

      F(θw1 + (1−θ)w2) ≤ θF(w1) + (1−θ)F(w2)

▶ F is concave if and only if −F is convex
▶ F is convex if and only if the epigraph

      epi F = {(w, t) ∈ R^{n+1} | F(w) ≤ t}

  is a convex set

Figure: the chord θF(w1) + (1−θ)F(w2) between (w1, F(w1)) and (w2, F(w2)) lies above the graph value F(θw1 + (1−θ)w2).




Convex optimization problems

A convex optimization problem

    min_{w∈R^n}  F(w)
    s.t.  G(w) = 0
          H(w) ≥ 0

An optimization problem is convex if the objective function F is convex and the feasible set Ω is convex.

▶ For convex problems, every locally optimal solution is globally optimal.
▶ First-order optimality conditions are necessary and sufficient.
▶ Many iterative algorithms for nonconvex optimization solve a sequence of convex optimization problems.

"...in fact, the great watershed in optimization isn't between linearity and nonlinearity, but convexity and nonconvexity." R. T. Rockafellar, SIAM Review, 1993


Outline of the lecture

1 Basic definitions

2 Some classifications of optimization problems

3 Optimality conditions

4 Nonlinear programming algorithms





Some classifications of optimization problems

Optimization problems can be:


▶ unconstrained (Ω = Rn ) or constrained (Ω ⊂ Rn )
▶ convex or nonconvex
▶ linear or nonlinear
▶ differentiable or nonsmooth
▶ continuous or (mixed-)integer
▶ finite or infinite dimensional

”... the main fact, which should be known to any person dealing with optimization models, is
that in general, optimization problems are unsolvable.”
Yurii Nesterov, Lectures on Convex Optimization, 2018.
(“solvable” refers to finding a global minimizer)





Class 1: Linear Programming (LP)

Linear program

    min_{w∈R^n}  g^⊤ w
    s.t.  Aw − b = 0
          Cw − d ≥ 0

▶ convex optimization problem
▶ 1947: simplex method by Dantzig; 1984: polynomial-time interior-point method by Karmarkar
▶ a solution is always at a vertex of the feasible set (possibly a whole facet if nonunique)
▶ very mature and reliable
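As a small, hedged illustration (not part of the original slides), an LP in the above form can be handed to scipy.optimize.linprog; since linprog expects inequalities as A_ub w ≤ b_ub, the constraint Cw − d ≥ 0 is flipped to −Cw ≤ −d. The problem data are made up.

    import numpy as np
    from scipy.optimize import linprog

    # illustrative LP: min g'w  s.t.  Aw - b = 0,  Cw - d >= 0
    g = np.array([1.0, 2.0])
    A, b = np.array([[1.0, 1.0]]), np.array([1.0])   # w1 + w2 = 1
    C, d = np.eye(2), np.zeros(2)                    # w >= 0

    # linprog form: A_ub w <= b_ub, so Cw >= d becomes -Cw <= -d
    res = linprog(c=g, A_ub=-C, b_ub=-d, A_eq=A, b_eq=b, bounds=(None, None))
    print(res.x)  # optimal vertex: [1., 0.]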


Class 2: Quadratic Programming (QP)

Quadratic program

    min_{w∈R^n}  (1/2) w^⊤ Q w + g^⊤ w
    s.t.  Aw − b = 0
          Cw − d ≥ 0

▶ depending on Q, can be convex or nonconvex
▶ many good solvers: OSQP, HPIPM, qpOASES, Gurobi, Clarabel, DAQP, OOQP, MOSEK, ...
▶ solved online in linear model predictive control
▶ subproblems in nonlinear optimization
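A hedged sketch (my addition) using OSQP, one of the solvers listed above. OSQP expects the form l ≤ Aw ≤ u, so the equality Aw = b is encoded with l = u = b and Cw ≥ d with l = d, u = ∞; the data are illustrative.

    import numpy as np
    import scipy.sparse as sp
    import osqp

    # illustrative convex QP: min 1/2 w'Qw + g'w  s.t.  w1 + w2 = 1,  w >= 0
    Q = sp.csc_matrix([[4.0, 1.0], [1.0, 2.0]])
    g = np.array([1.0, 1.0])

    # stack equality and inequality rows into l <= Aw <= u
    A = sp.csc_matrix([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
    l = np.array([1.0, 0.0, 0.0])
    u = np.array([1.0, np.inf, np.inf])

    prob = osqp.OSQP()
    prob.setup(P=Q, q=g, A=A, l=l, u=u, verbose=False)
    print(prob.solve().x)  # minimizer on the line w1 + w2 = 1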


Class 3: Nonlinear Program (NLP)

Nonlinear programming problem

    min_{w∈R^n}  F(w)
    s.t.  G(w) = 0
          H(w) ≥ 0

▶ F, G, H smooth functions, can be convex or nonconvex
▶ solved with iterative Newton-type algorithms
▶ solved in nonlinear model predictive control
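As a hedged sketch (my addition), a small NLP in exactly this form can be given to a Newton-type solver such as scipy's SLSQP; conveniently, scipy's "ineq" constraints use the same H(w) ≥ 0 sign convention as (1c). The example data are made up.

    import numpy as np
    from scipy.optimize import minimize

    F = lambda w: (w[0] - 1.0)**2 + (w[1] - 2.0)**2    # objective
    G = lambda w: np.array([w[0]**2 + w[1]**2 - 1.0])  # G(w) = 0: unit circle
    H = lambda w: np.array([w[0]])                     # H(w) >= 0

    res = minimize(F, x0=[0.5, 0.5], method="SLSQP",
                   constraints=[{"type": "eq", "fun": G},
                                {"type": "ineq", "fun": H}])
    print(res.x)  # closest feasible point to (1, 2): ~(0.447, 0.894)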


Class 4: Mathematical Programs with Complementarity Constraints (MPCC)

MPCC

    min_{w∈R^n}  F(w)
    s.t.  G(w) = 0
          H(w) ≥ 0
          0 ≤ w1 ⊥ w2 ≥ 0

with w = [w0^⊤, w1^⊤, w2^⊤]^⊤ ∈ R^n

▶ more difficult than standard nonlinear programming
▶ feasible set is inherently nonsmooth and nonconvex
▶ powerful modeling concept
▶ requires specialized theory and algorithms (Lectures 5 and 6 on Wednesday)


Class 5: Mixed-Integer Programming

Mixed-integer nonlinear program (MINLP)

    min_{w0∈R^p, w1∈Z^q}  F(w)
    s.t.  G(w) = 0
          H(w) ≥ 0

with w = [w0^⊤, w1^⊤]^⊤, n = p + q

▶ inherently nonconvex feasible set
▶ due to the combinatorial nature, NP-hard even for linear F, G, H
▶ branch-and-bound and branch-and-cut algorithms based on iterative solution of relaxed continuous problems


Class 6: Continuous-time Optimal Control Problems (OCP)

Continuous-time Optimal Control Problem

    min_{x(·), u(·)}  ∫_0^T L_c(x(t), u(t)) dt + E(x(T))
    s.t.  x(0) = x̄0
          ẋ(t) = f_c(x(t), u(t))
          0 ≥ h(x(t), u(t)),  t ∈ [0, T]
          0 ≥ r(x(T))

▶ decision variables x(·), u(·) live in an infinite-dimensional function space
▶ infinitely many constraints for t ∈ [0, T]
▶ smooth ordinary differential equations (ODE)
▶ more generally, the dynamic model can be based on
  ▶ differential algebraic equations (DAE)
  ▶ partial differential equations (PDE)
  ▶ stochastic ODE
  ▶ nonsmooth ODE (treated on Tuesday)
▶ OCP can be convex or nonconvex
▶ all or some components of u(t) may take integer values (mixed-integer OCP)


Direct optimal control methods solve Nonlinear Programs (NLP)
Treated in detail in the 2nd lecture.

Continuous-time OCP

    min_{x(·), u(·)}  ∫_0^T L_c(x(t), u(t)) dt + E(x(T))
    s.t.  x(0) = x̄0
          ẋ(t) = f_c(x(t), u(t))
          0 ≥ h(x(t), u(t)),  t ∈ [0, T]
          0 ≥ r(x(T))

Discrete-time OCP (an NLP)

    min_{x, u}  Σ_{k=0}^{N−1} ℓ(x_k, u_k) + E(x_N)
    s.t.  x_0 = x̄0
          x_{k+1} = f(x_k, u_k)
          0 ≥ h(x_k, u_k),  k = 0, ..., N−1
          0 ≥ r(x_N)

Direct methods like direct collocation and multiple shooting first discretize, then optimize.

The variables x = (x_0, ..., x_N) and u = (u_0, ..., u_{N−1}) can be summarized in a vector w = (x, u) ∈ R^n.
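A minimal sketch (my addition) of this "first discretize, then optimize" step, assuming CasADi with IPOPT is available (CasADi is mentioned on the summary slide); the dynamics, horizon, and costs are illustrative, and the path constraints h, r are omitted for brevity.

    import casadi as ca

    N, nx, nu = 20, 2, 1
    x = ca.SX.sym("x", nx, N + 1)               # states x_0, ..., x_N
    u = ca.SX.sym("u", nu, N)                   # controls u_0, ..., u_{N-1}

    f = lambda xk, uk: xk + 0.1 * ca.vertcat(xk[1], uk[0])  # explicit Euler step
    J, g = 0, [x[:, 0] - ca.DM([1.0, 0.0])]    # cost; constraint x_0 = xbar_0
    for k in range(N):
        J += ca.sumsqr(x[:, k]) + ca.sumsqr(u[:, k])        # stage cost l(x_k, u_k)
        g.append(x[:, k + 1] - f(x[:, k], u[:, k]))         # x_{k+1} = f(x_k, u_k)

    w = ca.veccat(x, u)                         # stack w = (x, u) in R^n
    solver = ca.nlpsol("ocp", "ipopt", {"x": w, "f": J, "g": ca.vertcat(*g)})
    sol = solver(x0=0, lbg=0, ubg=0)            # all constraints are equalities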






Nonlinear MPC solves Nonlinear Programs (NLP)

Discrete-time NMPC problem (an NLP)

    min_{x, u}  Σ_{k=0}^{N−1} ℓ(x_k, u_k) + E(x_N)
    s.t.  x_0 = x̄0
          x_{k+1} = f(x_k, u_k)
          0 ≥ h(x_k, u_k),  k = 0, ..., N−1
          0 ≥ r(x_N)

Nonlinear Program (NLP)

    min_{w∈R^n}  F(w)
    s.t.  G(w) = 0
          H(w) ≥ 0

The variables x = (x_0, ..., x_N) and u = (u_0, ..., u_{N−1}) can be summarized in a vector w = (x, u) ∈ R^n.


Outline of the lecture

1 Basic definitions

2 Some classifications of optimization problems

3 Optimality conditions

4 Nonlinear programming algorithms





Algebraic characterization of unconstrained local optima

Consider the unconstrained problem: min_{w∈R^n} F(w)

First-Order Necessary Condition of Optimality (FONC) (in the convex case also sufficient)

    w∗ local minimizer  ⇒  ∇F(w∗) = 0   (w∗ is a stationary point)

Second-Order Necessary Condition of Optimality (SONC)

    w∗ local minimizer  ⇒  ∇²F(w∗) ⪰ 0

Second-Order Sufficient Conditions of Optimality (SOSC)

    ∇F(w∗) = 0 and ∇²F(w∗) ≻ 0  ⇒  w∗ strict local minimizer
    ∇F(w∗) = 0 and ∇²F(w∗) ≺ 0  ⇒  w∗ strict local maximizer

No conclusion can be drawn in the case where ∇²F(w∗) is indefinite.
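To make this concrete, here is a short sketch (my addition) that computes the stationary points of the scalar example F(w) = (1/2)w^4 − 2w^3 − 3w^2 + 12w + 10 from the earlier figure and classifies them with F''.

    import numpy as np

    F   = lambda w: 0.5 * w**4 - 2 * w**3 - 3 * w**2 + 12 * w + 10
    d2F = lambda w: 6 * w**2 - 12 * w - 6                 # F''(w)

    # stationary points: F'(w) = 2w^3 - 6w^2 - 6w + 12 = 0
    for w in sorted(np.roots([2.0, -6.0, -6.0, 12.0]).real):
        kind = ("strict local minimizer" if d2F(w) > 0 else
                "strict local maximizer" if d2F(w) < 0 else "inconclusive")
        print(f"w* = {w:+.3f}, F(w*) = {F(w):7.3f} -> {kind}")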
Type of stationary points

Figure: three surfaces F(w) over (w1, w2), showing a minimum, a maximum, and a saddle point.

A stationary point w∗ with ∇F(w∗) = 0 can be a minimizer, a maximizer, or a saddle point.




Optimality conditions - unconstrained

▶ Necessary conditions: find a candidate point (or exclude points)
▶ Sufficient conditions: verify optimality of a candidate point
▶ A minimizer must satisfy SONC, but does not have to satisfy SOSC

Figure: plots of F(w), ∇F(w), and ∇²F(w) over w.






First-order necessary conditions for equality constrained optimization

Nonlinear Program (NLP)

    min_{w∈R^n}  F(w)
    s.t.  G(w) = 0

Lagrangian function: L(w, λ) = F(w) − λ^⊤ G(w)

Definition (LICQ)
A point w satisfies the Linear Independence Constraint Qualification (LICQ) if ∇G(w) := (∂G(w)/∂w)^⊤ has full column rank.

First-order necessary conditions (in the convex case also sufficient)

Let F, G be C¹. If w∗ is a (local) minimizer and w∗ satisfies LICQ, then there is a unique vector λ∗ such that:

    ∇_w L(w∗, λ∗) = ∇F(w∗) − ∇G(w∗)λ∗ = 0    (dual feasibility)
    ∇_λ L(w∗, λ∗) = G(w∗) = 0                 (primal feasibility)






The Karush-Kuhn-Tucker (KKT) conditions

Nonlinear Program (NLP)

    min_{w∈R^n}  F(w)
    s.t.  G(w) = 0
          H(w) ≥ 0

    L(w, λ, µ) = F(w) − λ^⊤ G(w) − µ^⊤ H(w)

Definition (LICQ)
A point w satisfies LICQ if [∇G(w), ∇H_A(w)] has full column rank, where the active set is A = {i | H_i(w) = 0}.

Theorem (KKT conditions - FONC for constrained optimization)

Let F, G, H be C¹. If w∗ is a (local) minimizer and satisfies LICQ, then there are unique vectors λ∗ and µ∗ such that (w∗, λ∗, µ∗) satisfies:

    ∇_w L(w∗, λ∗, µ∗) = 0,  µ∗ ≥ 0          (dual feasibility)
    G(w∗) = 0,  H(w∗) ≥ 0                    (primal feasibility)
    µ∗_i H_i(w∗) = 0,  ∀i                    (complementary slackness)
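A hedged numerical sketch (my addition) of checking these conditions at a candidate primal-dual point for the toy problem min w1² + w2² s.t. w1 + w2 − 1 = 0, w1 ≥ 0, whose KKT point is w∗ = (1/2, 1/2), λ∗ = 1, µ∗ = 0.

    import numpy as np

    dF = lambda w: 2 * w                          # gradient of F(w) = w1^2 + w2^2
    dG = np.array([[1.0], [1.0]])                 # grad G for G(w) = w1 + w2 - 1
    dH = np.array([[1.0], [0.0]])                 # grad H for H(w) = w1

    w, lam, mu = np.array([0.5, 0.5]), np.array([1.0]), np.array([0.0])

    print("stationarity:", dF(w) - dG @ lam - dH @ mu)   # grad_w L = 0
    print("primal:      ", w[0] + w[1] - 1.0, w[0])      # G(w*) = 0, H(w*) >= 0
    print("dual/compl.: ", mu >= 0, mu[0] * w[0])        # mu* >= 0, mu_i H_i(w*) = 0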




The complementary slackness condition

▶ complementarity conditions: 0 ≤ µ∗ ⊥ H(w∗) ≥ 0
▶ same as min(µ∗, H(w∗)) = 0
▶ the zero level set of min is an L-shaped set

Cases:
▶ H_i(w∗) > 0, then µ∗_i = 0 and H_i(w) is inactive
▶ µ∗_i > 0 and H_i(w∗) = 0, then H_i(w) is strictly active
▶ µ∗_i = 0 and H_i(w∗) = 0, then H_i(w) is weakly active

Figure: the L-shaped zero level set in the (H_i(w), µ_i) plane.




Some intuitions on the KKT conditions
Ball rolling down a valley blocked by a fence - test problem with two variables and one inequality constraint

    min_{w∈R^2}  F(w)
    s.t.  H(w) ≥ 0

▶ −∇F is the gravity
▶ µ∇H is the force of the fence. The sign condition µ ≥ 0 means the fence can only "push" the ball
▶ ∇H gives the direction of the force and µ adjusts the magnitude
▶ active constraint: H(w) = 0, µ > 0
▶ weakly active constraint: H(w) = 0, µ = 0; the ball touches the fence but no force is needed
▶ inactive constraint: H(w) > 0, µ = 0

Balance of the forces:

    ∇_w L(w, µ) = ∇F(w) − ∇H(w)µ = 0

Figure: animation frames showing the ball with the forces −∇F(w) and µ∇H(w), with µ decreasing from 0.857 to 0.

Animation inspired by Lecture 2 of the Winter School on Numerical Optimal Control with Differential Algebraic Equations by S. Gros and M. Diehl, Freiburg, 2016


Outline of the lecture

1 Basic definitions

2 Some classifications of optimization problems

3 Optimality conditions

4 Nonlinear programming algorithms



Newton’s method
To solve a nonlinear system, solve a sequence of linear systems

Iteration 0
6
y = F (w)
5 y = F (wk ) + rF (wk )(w ! wk )
Linearization of F at linearization point w̄
4
equals
3
First-order Taylor series at w̄

F (w)
2
equals
1

∂F
FL (w; w̄) := F (w̄) + (w̄) (w − w̄) 0
∂w
-1
-1 -0.5 0 0.5 1 1.5 2 2.5 3
w

1. Theory and algorithms for nonlinear programming A. Nurkanović 24/43


Newton’s method
To solve a nonlinear system, solve a sequence of linear systems

Iteration 0
6
y = F (w)
5 y = F (wk ) + rF (wk )(w ! wk )
Linearization of F at linearization point w̄
4
equals
3
First-order Taylor series at w̄

F (w)
2
equals
1


FL (w; w̄) := F (w̄) + ∇w F (w̄) (w − w̄) 0

-1
-1 -0.5 0 0.5 1 1.5 2 2.5 3
w

1. Theory and algorithms for nonlinear programming A. Nurkanović 24/43


Newton’s method
To solve a nonlinear system, solve a sequence of linear systems

Linearization of F at linearization point w̄


Iteration 0
6
equals y = F (w)
5 y = F (wk ) + rF (wk )(w ! wk )
First-order Taylor series at w̄
4
equals
3

F (w)
FL (w; w̄) := F (w̄) + ∇w F (w̄)⊤ (w − w̄) 2

Newton’s methods, solve sequence of: 1

0
F (wk ) + ∇F (wk )⊤ ∆w = 0,
-1
-1 -0.5 0 0.5 1 1.5 2 2.5 3
update wk+1 = wk + ∆w. w
(for continuously differentiable F : Rn → Rn )

1. Theory and algorithms for nonlinear programming A. Nurkanović 24/43


Newton’s method
To solve a nonlinear system, solve a sequence of linear systems

Linearization of F at linearization point w̄


Iteration 1
6
equals y = F (w)
5 y = F (wk ) + rF (wk )(w ! wk )
First-order Taylor series at w̄
4
equals
3

F (w)
FL (w; w̄) := F (w̄) + ∇w F (w̄)⊤ (w − w̄) 2

Newton’s methods, solve sequence of: 1

0
F (wk ) + ∇F (wk )⊤ ∆w = 0,
-1
-1 -0.5 0 0.5 1 1.5 2 2.5 3
update wk+1 = wk + ∆w. w
(for continuously differentiable F : Rn → Rn )

1. Theory and algorithms for nonlinear programming A. Nurkanović 24/43


Newton’s method
To solve a nonlinear system, solve a sequence of linear systems

Linearization of F at linearization point w̄


Iteration 2
6
equals y = F (w)
5 y = F (wk ) + rF (wk )(w ! wk )
First-order Taylor series at w̄
4
equals
3

F (w)
FL (w; w̄) := F (w̄) + ∇w F (w̄)⊤ (w − w̄) 2

Newton’s methods, solve sequence of: 1

0
F (wk ) + ∇F (wk )⊤ ∆w = 0,
-1
-1 -0.5 0 0.5 1 1.5 2 2.5 3
update wk+1 = wk + ∆w. w
(for continuously differentiable F : Rn → Rn )

1. Theory and algorithms for nonlinear programming A. Nurkanović 24/43


Newton’s method
To solve a nonlinear system, solve a sequence of linear systems

Linearization of F at linearization point w̄


Iteration 3
6
equals y = F (w)
5 y = F (wk ) + rF (wk )(w ! wk )
First-order Taylor series at w̄
4
equals
3

F (w)
FL (w; w̄) := F (w̄) + ∇w F (w̄)⊤ (w − w̄) 2

Newton’s methods, solve sequence of: 1

0
F (wk ) + ∇F (wk )⊤ ∆w = 0,
-1
-1 -0.5 0 0.5 1 1.5 2 2.5 3
update wk+1 = wk + ∆w. w
(for continuously differentiable F : Rn → Rn )

1. Theory and algorithms for nonlinear programming A. Nurkanović 24/43
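A minimal sketch (my addition) of this full-step iteration for a made-up two-dimensional system; the tolerance and iteration cap are arbitrary.

    import numpy as np

    def newton(F, dF, w, tol=1e-10, max_iter=50):
        """Full-step Newton's method for F(w) = 0 with Jacobian dF."""
        for _ in range(max_iter):
            dw = np.linalg.solve(dF(w), -F(w))   # linearized system F + dF*dw = 0
            w = w + dw                           # update w^{k+1} = w^k + dw
            if np.linalg.norm(F(w)) < tol:
                break
        return w

    # illustrative system: w1^2 + w2^2 = 2 and w1 = w2, so w* = (1, 1)
    F  = lambda w: np.array([w[0]**2 + w[1]**2 - 2.0, w[0] - w[1]])
    dF = lambda w: np.array([[2 * w[0], 2 * w[1]], [1.0, -1.0]])
    print(newton(F, dF, np.array([2.0, 0.5])))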


General Nonlinear Program (NLP)

In direct methods, we have to solve the discretized optimal control problem, which is a Nonlinear Program (NLP)

General Nonlinear Program (NLP)

    min_{w∈R^n}  F(w)  s.t.  G(w) = 0,  H(w) ≥ 0

We first treat the case without inequality constraints

NLP with only equality constraints

    min_{w∈R^n}  F(w)  s.t.  G(w) = 0


Lagrange function and optimality conditions

Lagrange function

    L(w, λ) = F(w) − λ^⊤ G(w)

For an optimal solution w∗ there exist multipliers λ∗ such that

Nonlinear root-finding problem

    ∇_w L(w∗, λ∗) = 0
    G(w∗) = 0


Newton’s method on optimality conditions

Use Newton’s method to solve:


∇w L(w∗ , λ∗ ) = 0
G(w∗ ) = 0 ?

Given an iterate (wk , λk ), the linearization reads as:

∇w L(wk , λk ) +∇2w L(wk , λk )∆w −∇w G(wk )∆λ = 0


G(wk ) +∇w G(wk )⊤ ∆w = 0

Due to ∇L(wk , λk ) = ∇F (wk ) − ∇G(wk )λk , this is equivalent to:

∇w F (wk ) +∇2w L(wk , λk )∆w −∇w G(wk )λ+ = 0


G(wk ) +∇w G(wk )⊤ ∆w = 0

with the shorthand λ+ = λk + ∆λ

1. Theory and algorithms for nonlinear programming A. Nurkanović 27/43


Newton Step = Quadratic Program

The conditions

    ∇F(w^k) + ∇²_w L(w^k, λ^k) ∆w − ∇G(w^k) λ^+ = 0
    G(w^k) + ∇G(w^k)^⊤ ∆w = 0

are the KKT optimality conditions of a quadratic program (QP), namely:

Quadratic program

    min_{∆w∈R^n}  ∇F(w^k)^⊤ ∆w + (1/2) ∆w^⊤ A^k ∆w
    s.t.  G(w^k) + ∇G(w^k)^⊤ ∆w = 0,

with A^k = ∇²_w L(w^k, λ^k)


Newton’s method

The full step Newton’s Method iterates by solving in each iteration the QP
Quadratic program in Sequential Quadratic Programming (SQP)

1
min ∇F (wk )⊤ ∆w + ∆w⊤ Ak ∆w
∆w∈Rn 2
k k ⊤
s.t. G(w ) + ∇G(w ) ∆w = 0,

with Ak = ∇2w L(wk , λk )

This obtains as solution the step ∆wk and the new multiplier λ+ k
QP = λ + ∆λ
k

New iterate

wk+1 = wk + ∆wk
λk+1 = λk + ∆λk = λ+
QP

This is the “full step, exact Hessian SQP method for equality constrained optimization”.
1. Theory and algorithms for nonlinear programming A. Nurkanović 29/43
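A minimal sketch (my addition) of this full-step, exact-Hessian SQP loop: each iteration solves the QP through its equivalent KKT linear system from the previous slide. The test problem and starting point are illustrative.

    import numpy as np

    def sqp_eq(dF, d2L, G, dG, w, lam, iters=15):
        """Full-step exact-Hessian SQP for min F(w) s.t. G(w) = 0."""
        for _ in range(iters):
            A, Jg = d2L(w, lam), dG(w)
            m = Jg.shape[1]
            # KKT system of the QP: [A, -Jg; Jg', 0] [dw; lam_plus] = [-dF; -G]
            KKT = np.block([[A, -Jg], [Jg.T, np.zeros((m, m))]])
            sol = np.linalg.solve(KKT, np.concatenate([-dF(w), -G(w)]))
            w, lam = w + sol[:w.size], sol[w.size:]   # full step, new multiplier
        return w, lam

    # illustrative NLP: min w1 + w2  s.t.  w1^2 + w2^2 - 2 = 0  ->  w* = (-1, -1)
    dF  = lambda w: np.array([1.0, 1.0])
    G   = lambda w: np.array([w[0]**2 + w[1]**2 - 2.0])
    dG  = lambda w: np.array([[2 * w[0]], [2 * w[1]]])
    d2L = lambda w, lam: -2.0 * lam[0] * np.eye(2)    # Hessian of L = F - lam*G
    print(sqp_eq(dF, d2L, G, dG, np.array([-2.0, -0.5]), np.array([-1.0])))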
NLP with inequality constraints

Regard again the NLP with both equality and inequality constraints:

NLP with equality and inequality constraints

    min_{w∈R^n}  F(w)  s.t.  G(w) = 0,  H(w) ≥ 0

Lagrangian function for the NLP with equality and inequality constraints

    L(w, λ, µ) = F(w) − λ^⊤ G(w) − µ^⊤ H(w)


Optimality conditions with inequalities

Theorem (Karush-Kuhn-Tucker (KKT) conditions)

Let F, G, H be C². If w∗ is a (local) minimizer and satisfies LICQ, then there are unique vectors λ∗ and µ∗ such that (w∗, λ∗, µ∗) satisfies:

    ∇_w L(w∗, λ∗, µ∗) = 0
    G(w∗) = 0
    H(w∗) ≥ 0
    µ∗ ≥ 0
    H(w∗)^⊤ µ∗ = 0

▶ The last three conditions (the complementarity conditions) make the KKT conditions nonsmooth
▶ This system cannot be solved by plain Newton's method. But we can use SQP...




Sequential Quadratic Programming (SQP)

By linearizing all functions within the KKT conditions, and setting λ^+ = λ^k + ∆λ and µ^+ = µ^k + ∆µ, we obtain the KKT conditions of a Quadratic Program (QP)

QP with inequality constraints

    min_{∆w∈R^n}  ∇F(w^k)^⊤ ∆w + (1/2) ∆w^⊤ A^k ∆w
    s.t.  G(w^k) + ∇G(w^k)^⊤ ∆w = 0
          H(w^k) + ∇H(w^k)^⊤ ∆w ≥ 0

with A^k = ∇²_w L(w^k, λ^k, µ^k)

▶ QP solution: ∆w^k, λ^+_QP, µ^+_QP
▶ full step: w^{k+1} = w^k + ∆w^k, λ^{k+1} = λ^+_QP, µ^{k+1} = µ^+_QP
▶ the nonsmooth complementarity conditions are resolved at the QP level


Interior-point methods
(without equality constraints for lighter notation)

NLP with inequalities

    min_{w∈R^n}  F(w)
    s.t.  H(w) ≥ 0

KKT conditions

    ∇F(w) − ∇H(w)µ = 0
    0 ≤ µ ⊥ H(w) ≥ 0

▶ Main difficulty: nonsmoothness of the complementarity conditions
▶ The 4th lecture (Tuesday) will show why Newton's method does not work for nonsmooth problems

Figure: the L-shaped set 0 ≤ µ_i ⊥ H_i(w) ≥ 0.
Barrier problem in interior-point method

NLP with inequalities

    min_{w∈R^n}  F(w)
    s.t.  H(w) ≥ 0

Idea: put the inequality constraints into the objective

Barrier problem

    min_{w∈R^n}  F(w) − τ Σ_{i=1}^{m} log(H_i(w)) =: F_τ(w)

As τ → 0, the barrier term −τ log(H_i(w)) approximates the indicator function

    χ(H_i(w)) = 0 if H_i(w) ≥ 0,  ∞ if H_i(w) < 0

Figure: −τ log(H_i(w)) for decreasing τ (5.0, 1.0, 0.2, 0.04, 0.008, 0.002), approaching the indicator function.




Example barrier problem

Example NLP

    min_{w∈R}  0.5w² − 2w
    s.t.  −1 ≤ w ≤ 1

Barrier problem

    min_{w∈R}  0.5w² − 2w − τ log(w + 1) − τ log(1 − w)

Figure: F(w) and F_τ(w) for decreasing τ (5.0, 1.5, 0.45, 0.135, 0.04, 0.012, 0.004); the barrier minimizer approaches the constrained solution w∗ = 1.


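A minimal sketch (my addition) that traces the barrier minimizer w(τ) of this example for decreasing τ. Since F_τ'(w) = w − 2 − τ/(w+1) + τ/(1−w) is strictly increasing on (−1, 1), a simple bisection finds its unique root; as τ → 0, w(τ) approaches the constrained minimizer w∗ = 1.

    import numpy as np

    # derivative of F_tau(w) = 0.5 w^2 - 2w - tau*log(w+1) - tau*log(1-w)
    dF_tau = lambda w, tau: w - 2.0 - tau / (w + 1.0) + tau / (1.0 - w)

    def barrier_min(tau, iters=60):
        lo, hi = -1.0 + 1e-9, 1.0 - 1e-9        # dF_tau < 0 at lo, > 0 at hi
        for _ in range(iters):                   # bisection on the monotone dF_tau
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if dF_tau(mid, tau) < 0 else (lo, mid)
        return 0.5 * (lo + hi)

    for tau in [5.0, 1.5, 0.45, 0.135, 0.04, 0.012, 0.004]:
        print(f"tau = {tau:5.3f}  ->  w(tau) = {barrier_min(tau):.6f}")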


Primal-dual interior-point method
Alternative interpretation

Barrier problem

    min_{w∈R^n}  F(w) − τ Σ_{i=1}^{m} log(H_i(w)) =: F_τ(w)

KKT conditions (∇F_τ(w) = 0)

    ∇F(w) − Σ_{i=1}^{m} (τ / H_i(w)) ∇H_i(w) = 0

Introduce the variable µ_i = τ / H_i(w)

Smoothed KKT conditions

    ∇F(w) − ∇H(w)µ = 0
    H_i(w)µ_i = τ    (H_i(w) > 0, µ_i > 0)

Figure: the smoothed complementarity set H_i(w)µ_i = τ approaches the L-shaped set 0 ≤ µ_i ⊥ H_i(w) ≥ 0 as τ → 0 (shown for τ = 1, 0.1, 0.01, 0.001).




Primal-dual interior-point method

Nonlinear programming problem (with slack variables s)

    min_{w∈R^n, s∈R^{n_H}}  F(w)
    s.t.  G(w) = 0
          H(w) − s = 0
          s ≥ 0

Smoothed KKT conditions

    R_τ(w, s, λ, µ) = [ ∇_w L(w, λ, µ) ; G(w) ; H(w) − s ; diag(s)µ − τe ] = 0,   (s, µ > 0)

with e = (1, ..., 1).

Solve approximately with Newton's method for fixed τ:

    R_τ(w, s, λ, µ) + ∇R_τ(w, s, λ, µ) ∆z = 0,  with z = (w, s, λ, µ)

Line search: find α ∈ (0, 1) with

    w^{k+1} = w^k + α∆w,  s^{k+1} = s^k + α∆s
    λ^{k+1} = λ^k + α∆λ,  µ^{k+1} = µ^k + α∆µ

such that s^{k+1} > 0, µ^{k+1} > 0, and reduce τ ...
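A minimal sketch (my addition) of this scheme for the inequality-only case with H(w) = w (so no slacks are needed and ∇H = I): Newton steps on the smoothed KKT residual, a fraction-to-the-boundary step size keeping the iterates strictly positive, and a shrinking τ. The problem data are illustrative.

    import numpy as np

    # illustrative NLP: min (w1+1)^2 + (w2-1)^2  s.t.  w >= 0  ->  w* = (0, 1), mu* = (2, 0)
    dF  = lambda w: 2.0 * (w - np.array([-1.0, 1.0]))
    d2F = lambda w: 2.0 * np.eye(2)

    def ip_solve(w, mu, tau=1.0, sigma=0.2, iters=30):
        n = w.size
        for _ in range(iters):
            R = np.concatenate([dF(w) - mu, w * mu - tau])   # smoothed KKT residual
            J = np.block([[d2F(w), -np.eye(n)],
                          [np.diag(mu), np.diag(w)]])        # Jacobian of R
            dz = np.linalg.solve(J, -R)
            dw, dmu = dz[:n], dz[n:]
            # fraction-to-the-boundary: keep w > 0 and mu > 0
            z, d = np.concatenate([w, mu]), np.concatenate([dw, dmu])
            alpha = min([1.0] + [0.995 * (-zi / di) for zi, di in zip(z, d) if di < 0])
            w, mu = w + alpha * dw, mu + alpha * dmu
            tau *= sigma                                     # ... and reduce tau
        return w, mu

    print(ip_solve(np.ones(2), np.ones(2)))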
Summary

▶ Optimization problems come in many variants (LP, QP, NLP, MPCC, MINLP, OCP, ...)
▶ Each problem class can be addressed with suitable software.
▶ The Lagrangian function, duality, and the KKT conditions are important concepts.
▶ For convex problems, the KKT conditions are sufficient for global optimality.
▶ Newton-type optimization for NLP solves the nonsmooth KKT conditions via Sequential Quadratic Programming (SQP) or via the interior-point method.
▶ NLP solvers need to evaluate first- and second-order derivatives (e.g. via CasADi).


Some interesting and important topics not covered today

▶ Duality for convex optimization problems.
▶ First-order methods (gradient descent, stochastic gradient descent, ...).
▶ Solution methods for linear and quadratic programs (active set, interior-point, simplex).
▶ Augmented Lagrangian methods for constrained optimization.
▶ Solution methods for mixed-integer problems (branch and bound, ...).
▶ Computing derivatives via automatic differentiation.
▶ Globalization strategies (line search vs. trust region, merit functions vs. filter).
▶ Regularization (convexification of the Hessian, LICQ violation).


Optimization textbooks

Nonlinear optimization:
▶ Nocedal, Jorge, and Stephen J. Wright. Numerical Optimization. New York, NY: Springer, 2006.
▶ Biegler, Lorenz T. Nonlinear Programming: Concepts, Algorithms, and Applications to Chemical Processes. Society for Industrial and Applied Mathematics, 2010.

Convex optimization:
▶ Boyd, Stephen, and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004. online: https://web.stanford.edu/~boyd/cvxbook/
▶ Rockafellar, R. T. Fundamentals of Optimization. Lecture notes, 2007. online: https://sites.math.washington.edu/~rtr/fundamentals.pdf


Free online video lectures

Numerical optimization video lectures by Moritz Diehl (highly recommended!):
▶ Videos: https://www.syscop.de/teaching/ws2020/numerical-optimization
▶ Lecture notes: https://publications.syscop.de/Diehl2016.pdf

Lecture notes/slides by Mario Zanon and Sébastien Gros:
▶ https://mariozanon.wordpress.com/teaching/numerical-methods-for-optimal-control/

Optimization software:
▶ https://plato.asu.edu/guide.html
▶ https://www.syscop.de/research/software


References for this lecture

▶ Moritz Diehl, Sébastien Gros. "Numerical Optimal Control (Draft)," Lecture notes, 2024. online: https://www.syscop.de/files/2024ws/NOC/book-NOCSE.pdf
▶ Karmarkar, Narendra. "A new polynomial-time algorithm for linear programming." In Proceedings of the Sixteenth Annual ACM Symposium on Theory of Computing, pp. 302-311, 1984.
▶ Dantzig, George B. "Origins of the simplex method." In A History of Scientific Computing, pp. 141-151, 1990.


Summary of optimality conditions

Optimality conditions for NLPs with equality and/or inequality constraints:

▶ First-Order Necessary Conditions: a local optimizer of a (differentiable) NLP is a KKT point
▶ Second-Order Sufficient Conditions require positivity of the Hessian in all critical feasible directions

Nonconvex problem ⇒ a minimizer is not necessarily a global minimizer.
Note: some nonconvex problems may have a unique minimum.

Some important practical consequences...

▶ A KKT point may not be a local (global) optimizer
  ... the lack of equivalence results from a lack of regularity and/or SOSC
▶ A local (global) optimizer may not be a KKT point
  ... due to violation of constraint qualifications, e.g. LICQ violated
