Optimization Slides

Part 1

Examples of optimization
problems

49 Wolfgang Bangerth
What is an optimization problem?

Mathematically speaking:

Let X be a Banach space; let


f : X → R ∪ {+∞}
g : X → R^{n_e}
h : X → R^{n_i}
be functions on X, find x ∈ X so that

f ( x) → min!
g ( x) = 0
h ( x) ≥ 0

Questions: Under what conditions on X, f, g, h can we
guarantee that (i) a solution exists; (ii) the solution is unique;
(iii) the solution is stable?
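The abstract statement above can be made concrete with a small sketch (a made-up toy problem, not one from the slides): minimize f(x) = x₁² + x₂² subject to g(x) = x₁ + x₂ − 1 = 0 and h(x) = x₁ ≥ 0, whose minimizer is x* = (1/2, 1/2). The brute-force search below is only meant to illustrate the roles of f, g, and h.

```python
# Toy instance of: f(x) -> min!, g(x) = 0, h(x) >= 0.
# f(x) = x1^2 + x2^2, g(x) = x1 + x2 - 1, h(x) = x1.
# The equality constraint is parameterized as x = (t, 1 - t) with t >= 0,
# which satisfies g and h by construction; the analytic minimizer is t = 1/2.

def f(x1, x2):
    return x1**2 + x2**2

def solve_by_search(n=10001):
    best_t, best_f = 0.0, float("inf")
    for i in range(n):
        t = 2.0 * i / (n - 1)          # sample t in [0, 2]
        val = f(t, 1.0 - t)
        if val < best_f:
            best_t, best_f = t, val
    return best_t, best_f

t_star, f_star = solve_by_search()     # t_star = 0.5, f_star = 0.5
```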

50 Wolfgang Bangerth
What is an optimization problem?
In practice:

x={u,y} is a set of design and auxiliary variables that
completely describe a physical, chemical, or
economic model;

f(x) is an objective function with which we measure how
good a design is;

g(x) describes relationships that have to be met exactly
(for example the relationship between y and u)

h(x) describes conditions that must not be exceeded

Then find me that x for which


f ( x) → min!
g ( x) = 0
h( x) ≥ 0
Question: How do I find this x?
51 Wolfgang Bangerth
What is an optimization problem?
Optimization problems are often subdivided into classes:

Linear vs. Nonlinear


Convex vs. Nonconvex
Unconstrained vs. Constrained
Smooth vs. Nonsmooth
With derivatives vs. Derivative-free
Continuous vs. Discrete
Algebraic vs. ODE/PDE

Depending on which class an actual problem falls into, there are
different classes of algorithms.

52 Wolfgang Bangerth
Examples

Linear and nonlinear functions f(x)


on a domain bounded by linear inequalities

53 Wolfgang Bangerth
Examples

Strictly convex, convex, and nonconvex functions f(x)


54 Wolfgang Bangerth
Examples

Another non-convex function with many (local) optima.


We may want to find the one global optimum.
55 Wolfgang Bangerth
Examples

Optima in the presence of (nonsmooth) constraints.

56 Wolfgang Bangerth
Examples

Smooth and non-smooth nonlinear functions.

57 Wolfgang Bangerth
Applications: The drag coefficient of a car

Mathematical description:
x={u,y}: u are the design parameters (e.g. the shape of the car)
y is the flow field around the car
f(x): the drag force that results from the flow field
g(x)=y-q(u)=0:
constraints that come from the fact that there is a flow
field y=q(u) for each design. y may, for example, satisfy
the Navier-Stokes equations
58 Wolfgang Bangerth
Applications: The drag coefficient of a car
Inequality constraints:
(expected sales price – profit margin) - cost(u) ≥ 0

volume(u) – volume(me, my wife, and her bags) ≥ 0

material stiffness * safety factor − max(forces exerted by y on the frame) ≥ 0

legal margins(u) ≥ 0
59 Wolfgang Bangerth
Applications: The drag coefficient of a car
Analysis:
linearity: f(x) may be linear
g(x) is certainly nonlinear (Navier-Stokes equations)
h(x) may be nonlinear

convexity: ??

constrained: yes

smooth: f(x) yes


g(x) yes
h(x) some yes, some no

derivatives: available, but probably hard to compute in practice

continuous: yes, not discrete

ODE/PDE: yes, not just algebraic


60 Wolfgang Bangerth
Applications: The drag coefficient of a car
Remark:

In the formulation as shown, the objective function was of the form

f(x) = c_d(y)

In practice, one is often willing to trade efficiency for cost, i.e. we are
willing to accept a slightly higher drag coefficient if the cost is smaller.
This leads to objective functions of the form

f(x) = c_d(y) + a·cost(u)

or

f(x) = c_d(y) + a·[cost(u)]²

61 Wolfgang Bangerth
Applications: Optimal oil production strategies
[Figures: permeability field and oil saturation]

Mathematical description:
x={u,y}: u are the pumping rates at injection/production wells
y is the flow field (pressures/velocities)
f(x): the cost of production and injection minus sales price of
oil integrated over lifetime of reservoir (or -NPV)
g(x)=y-q(u)=0:
constraints that come from the fact that there is a flow
field y=q(u) for each u. y may, for example, satisfy
the multiphase porous media flow equations
62 Wolfgang Bangerth
Applications: Optimal oil production strategies

Inequality constraints h(x)≥0:

u_i^max − u_i ≥ 0 (for all wells i):
Pumps have a maximal pumping rate/pressure

produced_oil(T)/available_oil(0) – c ≥ 0:
Legislative requirement to produce at least
a certain fraction

c - water_cut(t) ≥ 0 (for all times t):


It is inefficient to produce too much water

pressure – d ≥ 0 (for all times and locations):


Keeps the reservoir from collapsing

63 Wolfgang Bangerth
Applications: Optimal oil production strategies
Analysis:
linearity: f(x) is nonlinear
g(x) is certainly nonlinear
h(x) may be nonlinear

convexity: no

constrained: yes

smooth: f(x) yes


g(x) yes
h(x) yes

derivatives: available, but probably hard to compute in practice

continuous: yes, not discrete

ODE/PDE: yes, not just algebraic


64 Wolfgang Bangerth
Applications: Switching lights at an intersection

Mathematical description:
x={T, t_i^1, t_i^2}: round-trip time T for the stop light system,
switch-green and switch-red times for all lights i
f(x): number of cars that can pass the intersection per
hour;
Note: unknown as a function, but we can measure it
65 Wolfgang Bangerth
Applications: Switching lights at an intersection

Inequality constraints h(x)≥0:

300 – T ≥ 0:
No more than 5 minutes of round-trip time, so that people
don't have to wait for too long

t_i^2 − t_i^1 − 5 ≥ 0 (for all lights i):
At least 5 seconds of green for everyone

t_{i+1}^1 − t_i^2 − 5 ≥ 0:
At least 5 seconds of all-red between different greens

66 Wolfgang Bangerth
Applications: Switching lights at an intersection
Analysis:

linearity: f(x) ??
h(x) is linear

convexity: ??

constrained: yes

smooth: f(x) ??
h(x) yes

derivatives: not available

continuous: yes, not discrete

ODE/PDE: no

67 Wolfgang Bangerth
Applications: Trajectory planning

Mathematical description:
x={y(t),u(t)}: position of spacecraft and thrust vector at time t

f(x) = ∫₀ᵀ |u(t)| dt    minimize fuel consumption

m ÿ(t) − u(t) = 0    Newton's law

|y(t)| − d₀ ≥ 0    Do not get too close to the sun
u_max − |u(t)| ≥ 0    Only limited thrust available
68 Wolfgang Bangerth
Applications: Trajectory planning
Analysis:

linearity: f(x) is nonlinear


g(x) is linear
h(x) is nonlinear

convexity: no
constrained: yes
smooth: yes, here
derivatives: computable
continuous: yes, not discrete

ODE/PDE: yes

Note: Trajectory planning problems are often called optimal


control.

69 Wolfgang Bangerth
Applications: Data fitting 1

Mathematical description:
x={a,b}: parameters for the model y(t) = (1/a) log cosh(a·b·t)
f(x) = 1/N ∑_i |y_i − y(t_i)|²:
mean square difference between predicted value
and actual measurement

70 Wolfgang Bangerth
Applications: Data fitting 1
Analysis:
linearity: f(x) is nonlinear

convexity: ?? (probably yes)

constrained: no

smooth: yes

derivatives: available, and easy to compute in practice

continuous: yes, not discrete

ODE/PDE: no, algebraic

71 Wolfgang Bangerth
Applications: Data fitting 2

Mathematical description:
x={a,b}: parameters for the model y(t) = a·t + b
f(x) = 1/N ∑_i |y_i − y(t_i)|²:
mean square difference between
predicted value and actual measurement

72 Wolfgang Bangerth
Applications: Data fitting 2
Analysis:
linearity: f(x) is quadratic

Convexity: yes

constrained: no

smooth: yes

derivatives: available, and easy to compute in practice

continuous: yes, not discrete

ODE/PDE: no, algebraic

Note: Quadratic optimization problems (even with linear
constraints) are easy to solve!
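The note above can be illustrated directly: for the quadratic objective of this data fitting problem, setting the gradient to zero gives a 2×2 linear system (the normal equations). The data below are a sketch, assuming synthetic noise-free samples of y = 2t + 1:

```python
# Normal equations for the model y(t) = a·t + b: setting df/da = df/db = 0
# yields a 2x2 linear system that is solved in closed form.

def fit_line(ts, ys):
    n = len(ts)
    st, sy = sum(ts), sum(ys)
    stt = sum(t * t for t in ts)
    sty = sum(t * y for t, y in zip(ts, ys))
    det = stt * n - st * st
    a = (sty * n - st * sy) / det
    b = (stt * sy - st * sty) / det
    return a, b

ts = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * t + 1.0 for t in ts]
a, b = fit_line(ts, ys)        # recovers a = 2, b = 1
```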

73 Wolfgang Bangerth
Applications: Data fitting 3

Mathematical description:
x={a,b}: parameters for the model y(t) = a·t + b
f(x)=1/N ∑i |yi-y(ti)|:
mean absolute difference between predicted
value and actual measurement

74 Wolfgang Bangerth
Applications: Data fitting 3
Analysis:
linearity: f(x) is nonlinear

Convexity: yes

constrained: no

smooth: no!

derivatives: not differentiable

continuous: yes, not discrete

ODE/PDE: no, algebraic

Note: Non-smooth problems are really hard to solve!

75 Wolfgang Bangerth
Applications: Data fitting 3, revisited

Mathematical description:
x={a, b, s_i}: parameters for the model y(t) = a·t + b,
plus "slack" variables s_i
f(x) = 1/N ∑_i s_i → min!
s_i − |y_i − y(t_i)| ≥ 0

76 Wolfgang Bangerth
Applications: Data fitting 3, revisited
Analysis:
linearity: f(x) is linear, h(x) is not linear

Convexity: yes

constrained: yes

smooth: no!

derivatives: not differentiable

continuous: yes, not discrete

ODE/PDE: no, algebraic

Note: Non-smooth problems are really hard to solve!

77 Wolfgang Bangerth
Applications: Data fitting 3, re-revisited

Mathematical description:
x={a, b, s_i}: parameters for the model y(t) = a·t + b,
plus "slack" variables s_i
f(x) = 1/N ∑_i s_i → min!
s_i − (y_i − y(t_i)) ≥ 0
s_i + (y_i − y(t_i)) ≥ 0
78 Wolfgang Bangerth
Applications: Data fitting 3, re-revisited
Analysis:
linearity: f(x) is linear, h(x) is now also linear

Convexity: yes

constrained: yes

smooth: yes

derivatives: yes

continuous: yes, not discrete

ODE/PDE: no, algebraic

Note: Linear problems with linear constraints are simple to solve!
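A small sanity check of the slack reformulation (a toy verification with invented data, not an LP solver): for any fixed (a, b), the smallest feasible slacks are s_i = |y_i − y(t_i)|, so the reformulated linear problem has the same optimal value as the original nonsmooth one.

```python
# For fixed (a, b) the smallest feasible slack is s_i = |y_i - y(t_i)|,
# since s_i >= r_i and s_i >= -r_i imply s_i >= max(r_i, -r_i) = |r_i|.
# Hence both formulations agree; checked on a grid of parameters.

def l1_objective(a, b, ts, ys):
    return sum(abs(y - (a * t + b)) for t, y in zip(ts, ys)) / len(ts)

def slack_objective(a, b, ts, ys):
    r = [y - (a * t + b) for t, y in zip(ts, ys)]
    return sum(max(ri, -ri) for ri in r) / len(ts)

ts = [0.0, 1.0, 2.0]
ys = [1.0, 2.9, 5.2]
vals_match = all(
    abs(l1_objective(a / 2, b / 2, ts, ys) - slack_objective(a / 2, b / 2, ts, ys)) < 1e-12
    for a in range(-4, 5) for b in range(-4, 5)
)
```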

79 Wolfgang Bangerth
Applications: Traveling salesman

Task: Find the shortest tour through N cities with mutual
distances d_ij.

(Here: the 15 biggest cities of Germany; there are
43,589,145,600 possible tours through all these cities.)

Mathematical description:
x = {c_i}: the index of the i-th city on our trip, i=1...N
f(x) = ∑_i d_{c_i c_{i+1}}

c_i ≠ c_j for i ≠ j: no city is visited twice (alternatively: |c_i − c_j| ≥ 1)
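For a tiny instance, the objective above can be evaluated by brute force (made-up distances for N=4 cities; this exhaustive enumeration is exactly what becomes infeasible at N=15):

```python
# Brute-force evaluation of f(x) = sum_i d_{c_i c_{i+1}} over all closed
# tours of N=4 cities with invented symmetric distances.
from itertools import permutations

d = {(0, 1): 2.0, (0, 2): 9.0, (0, 3): 6.0,
     (1, 2): 4.0, (1, 3): 3.0, (2, 3): 8.0}

def dist(i, j):
    return d[(i, j)] if (i, j) in d else d[(j, i)]

def tour_length(tour):
    # Closed tour: return to the starting city at the end.
    return sum(dist(tour[k], tour[(k + 1) % len(tour)])
               for k in range(len(tour)))

best = min(permutations(range(4)), key=tour_length)
best_len = tour_length(best)       # shortest closed tour: length 20.0
```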

80 Wolfgang Bangerth
Applications: Traveling salesman
Analysis:
linearity: f(x) is linear, h(x) is nonlinear

Convexity: meaningless

constrained: yes

smooth: meaningless

derivatives: meaningless
continuous: no — the problem is discrete: x ∈ X ⊂ {1, 2, ..., N}^N

ODE/PDE: no, algebraic

Note: Integer problems (combinatorial problems) are often
exceedingly complicated to solve!

81 Wolfgang Bangerth
Part 2

Minima, minimizers,
sufficient and necessary
conditions

82 Wolfgang Bangerth
Part 3

Metrics of algorithmic
complexity

83 Wolfgang Bangerth
Outline of optimization algorithms
All algorithms to find minima of f(x) do so iteratively:

- start at a point x_0
- for k=1,2,...:
  . compute an update direction p_k
  . compute a step length α_k
  . set x_k ← x_{k−1} + α_k p_k
  . set k ← k+1

84 Wolfgang Bangerth
Outline of optimization algorithms
All algorithms to find minima of f(x) do so iteratively:

- start at a point x_0
- for k=1,2,...:
  . compute an update direction p_k
  . compute a step length α_k
  . set x_k ← x_{k−1} + α_k p_k
  . set k ← k+1

Questions:

- If x* is the minimizer that we are seeking, does x_k → x*?
- How many iterations does it take until ‖x_k − x*‖ ≤ ε?
- How expensive is every iteration?
85 Wolfgang Bangerth
How expensive is every iteration?
The cost of optimization algorithms is dominated by evaluating
f(x), g(x), h(x) and derivatives:


Traffic light example: Evaluating f(x) requires us to sit at an
intersection for an hour, counting cars

Designing air foils: Testing an improved wing design in a
wind tunnel costs millions of dollars.

86 Wolfgang Bangerth
How expensive is every iteration?
Example: Boeing wing design

Boeing 767 (1980s): 50+ wing designs tested in wind tunnel
Boeing 777 (1990s): 18 wing designs tested in wind tunnel
Boeing 787 (2000s): 10 wing designs tested in wind tunnel

Planes today are 30% more efficient than those developed in


the 1970s. Optimization in the wind tunnel and in silico made
that happen but is very expensive.
87 Wolfgang Bangerth
How expensive is every iteration?
Practical algorithms:

To determine the search direction p_k:

The gradient (steepest descent) method requires 1 evaluation
of ∇f(·) per iteration

Newton's method requires 1 evaluation of ∇f(·) and
1 evaluation of ∇²f(·) per iteration

If derivatives cannot be computed exactly, they can be
approximated by several evaluations of f(·) and ∇f(·)

To determine the step length α_k:

Both the gradient and Newton methods typically require several
evaluations of f(·) and potentially ∇f(·) per iteration.

88 Wolfgang Bangerth
How many iterations do we need?
Question: Given a sequence x_k → x* (for which we know
that ‖x_k − x*‖ → 0), can we determine exactly how fast the error
goes to zero?

[Plot: ‖x_k − x*‖ vs. k]

89 Wolfgang Bangerth
How many iterations do we need?
Definition: We say that a sequence x_k → x* is of order s if
‖x_k − x*‖ ≤ C ‖x_{k−1} − x*‖^s
A sequence of numbers a_k → 0 is called of order s if
|a_k| ≤ C |a_{k−1}|^s
C is called the asymptotic constant. We call C|a_{k−1}|^{s−1} the gain factor.

Specifically:
If s=1, the sequence is called linearly convergent.
Note: Convergence requires C<1. In a semi-logarithmic plot,
linearly convergent sequences are straight lines.
If s=2, we call the sequence quadratically convergent.
If 1<s<2, we call the sequence superlinearly convergent.
90 Wolfgang Bangerth
How many iterations do we need?
Example: The sequence of numbers
ak = 1, 0.9, 0.81, 0.729, 0.6561, ...
is linearly convergent because
|a_k| ≤ C |a_{k−1}|^s
with s=1, C=0.9.

Remark 1: Linearly convergent sequences can converge very


slowly if C is close to 1.

Remark 2: Linear convergence is considered slow. We will want


to avoid linearly convergent algorithms.

91 Wolfgang Bangerth
How many iterations do we need?
Example: The sequence of numbers
ak = 0.1, 0.03, 0.0027, 0.00002187, ...
is quadratically convergent because
|a_k| ≤ C |a_{k−1}|^s
with s=2, C=3.

Remark 1: Quadratically convergent sequences can converge
very slowly if C is large. For many algorithms we can show that
they converge quadratically if a_0 is small enough, since then
|a_1| ≤ C |a_0|² ≤ |a_0|
If a_0 is too large then the sequence may fail to converge since
|a_1| ≤ C |a_0|² ≥ |a_0|
Remark 2: Quadratic convergence is considered fast. We will
want to use quadratically convergent algorithms.
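The order s can also be estimated numerically from consecutive terms via s ≈ log|a_{k+1}/a_k| / log|a_k/a_{k−1}|; applied to the two example sequences above, this recovers s=1 and s=2 (a minimal sketch):

```python
# Estimating the convergence order s of a sequence a_k -> 0 from its
# last three terms: if |a_k| ~ C |a_{k-1}|^s, then
# s ~ log(a_{k+1}/a_k) / log(a_k/a_{k-1}).
import math

def estimated_order(seq):
    a0, a1, a2 = seq[-3], seq[-2], seq[-1]
    return math.log(a2 / a1) / math.log(a1 / a0)

linear = [0.9 ** k for k in range(10)]          # a_k = 0.9 a_{k-1}
quadratic = [0.1, 0.03, 0.0027, 2.187e-5]       # a_k = 3 a_{k-1}^2

s_lin = estimated_order(linear)                 # ~ 1
s_quad = estimated_order(quadratic)             # ~ 2
```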
92 Wolfgang Bangerth
How many iterations do we need?
Example: Compare linear and quadratic convergence

[Plot: ‖x_k − x*‖ vs. k]

Linear convergence: the gain factor C<1 is constant.

Quadratic convergence: the gain factor C|a_{k−1}| becomes better and better!

93 Wolfgang Bangerth
Metrics of algorithmic complexity
Summary:


Quadratic algorithms converge faster in the limit than
linear or superlinear algorithms

Algorithms that are better than linear will need to be
started close enough to the solution

Algorithms are best compared by counting the number of
function, gradient, or Hessian evaluations needed
to achieve a certain accuracy. This is generally a good
measure for the run-time of such algorithms.

94 Wolfgang Bangerth
Part 4

Smooth unconstrained
problems:
Line search algorithms
minimize f  x 

95 Wolfgang Bangerth
Smooth problems: Characterization of Optima

Problem: find the solution x* of

minimize_x f(x)

A strict local minimum x* must satisfy two conditions:

First order necessary condition: the gradient must vanish:
∇f(x*) = 0

Sufficient condition for a strict minimum:
spectrum(∇²f(x*)) > 0
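Both conditions can be checked numerically for a simple function, say f(x, y) = (x−1)² + 2y² with minimizer x* = (1, 0) (a hypothetical example chosen for illustration): the finite-difference gradient vanishes there and the constant Hessian diag(2, 4) has a positive spectrum.

```python
# Numerical check of the optimality conditions for f(x, y) = (x-1)^2 + 2y^2
# at its minimizer x* = (1, 0).

def f(x, y):
    return (x - 1.0) ** 2 + 2.0 * y ** 2

def grad_fd(x, y, h=1e-6):
    # Central finite differences for the gradient.
    return ((f(x + h, y) - f(x - h, y)) / (2.0 * h),
            (f(x, y + h) - f(x, y - h)) / (2.0 * h))

gx, gy = grad_fd(1.0, 0.0)          # first order condition: both ~ 0
hess_eigs = (2.0, 4.0)              # spectrum of the constant Hessian diag(2, 4)
```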
96 Wolfgang Bangerth
Basic Algorithm for Smooth Unconstrained Problems

Basic idea for the iterative solution x_k → x* of the problem

minimize f(x)

Generate a sequence x_k by
1. finding a search direction p_k
2. choosing a step length α_k

Then compute the update
x_{k+1} = x_k + α_k p_k
Iterate until we are satisfied.
97 Wolfgang Bangerth
Step 1: Choose search direction

Conditions for a useful search direction:

The objective function should be decreased in this direction:
p_k · ∇f(x_k) ≤ 0

The search direction should lead to the minimum as straight
as possible

[Figure: ∇f(x_k) and −∇f(x_k)]
98 Wolfgang Bangerth
Step 1: Choose search direction

Basic assumption: We can usually only expect to know the
objective function f locally at x_k.
That means that we can only evaluate

f(x_k), ∇f(x_k) = g_k, ∇²f(x_k) = H_k, ...

For a search direction, try to model f in the vicinity of x_k
by a Taylor series:

f(x_k + p_k) ≈ f(x_k) + g_kᵀ p_k + ½ p_kᵀ H_k p_k + ...
99 Wolfgang Bangerth
Step 1: Choose search direction

Goal: Approximate f(·) in the vicinity of x_k by a model

f(x_k + p) ≈ m_k(p) = f_k + g_kᵀ p + ½ pᵀ H_k p + ...

with f(x_k) = f_k, ∇f(x_k) = g_k, ∇²f(x_k) = H_k, ...

Then: Choose the direction p_k that minimizes the model m_k(p)

100 Wolfgang Bangerth


Step 1: Choose search direction

Method 1 (Gradient method, Method of Steepest Descent):
the search direction is the minimizing direction of the linear model

f(x_k + p) ≈ f_k + g_kᵀ p = m_k(p)

p_k = −g_k = −∇f(x_k)

101 Wolfgang Bangerth


Step 1: Choose search direction
Method 2 (Newton's method):
the search direction is to the minimum of the quadratic model

m_k(p) = f_k + g_kᵀ p + ½ pᵀ H_k p

The minimum is characterized by

∂m_k(p)/∂p = g_k + H_k p = 0  ⇒  p_k = −H_k⁻¹ g_k
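For a quadratic f(x) = ½xᵀHx − bᵀx, the model is exact, so the Newton direction reaches the minimizer H⁻¹b in a single full step from any starting point. A minimal 2×2 sketch (matrix and data are invented):

```python
# One full Newton step p = -H^{-1} g on a quadratic lands exactly on the
# minimizer (here (1, 1)), regardless of the starting point.

def newton_step_2x2(H, g):
    # Solve H p = -g for symmetric 2x2 H = ((a, b), (b, c)).
    (a, b2), (_, c) = H
    det = a * c - b2 * b2
    return (-(c * g[0] - b2 * g[1]) / det,
            -(a * g[1] - b2 * g[0]) / det)

H = ((2.0, 0.0), (0.0, 10.0))
b = (2.0, 10.0)                                  # minimizer: (1, 1)
x = (5.0, -3.0)                                  # arbitrary starting point
g = (H[0][0] * x[0] + H[0][1] * x[1] - b[0],     # gradient: Hx - b
     H[1][0] * x[0] + H[1][1] * x[1] - b[1])
p = newton_step_2x2(H, g)
x_new = (x[0] + p[0], x[1] + p[1])               # = (1.0, 1.0)
```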

102 Wolfgang Bangerth


Step 1: Choose search direction

Method 2 (Newton's method) -- alternative viewpoint:

The Newton step is also generated when applying Newton's method
for the root-finding problem F(x)=0 to the necessary optimality
condition.

Linearize the necessary condition around x_k:

0 = ∇f(x*) = ∇f(x_k) + ∇²f(x_k)(x* − x_k) + ...
           = g_k + H_k p_k

⇒  p_k = −H_k⁻¹ g_k

103 Wolfgang Bangerth


Step 1: Choose search direction
Method 3 (A third order method):
The search direction is to the minimum of the cubic model

m_k(p) = f_k + g_kᵀ p + ½ pᵀ H_k p + (1/6) [∂³f/(∂x_l ∂x_m ∂x_n)]_k p_l p_m p_n

The minimum is characterized by the quadratic equation

∂m_k(p)/∂p = g_k + H_k p + ½ [∂³f/(∂x_l ∂x_m ∂x_n)]_k p_l p_m = 0  ⇒  p_k = ???

There doesn't appear to be any practical way to compute the
solution of this equation for problems with more than one
variable.
104 Wolfgang Bangerth
Step 2: Determination of Step Length

Once the search direction is known, compute the update by
choosing a step length α_k and set
x_{k+1} = x_k + α_k p_k

Determine the step length by solving the
1-d minimization problem (line search):
α_k = arg min_α f(x_k + α p_k)

For Newton's method: If the quadratic
model is good, then the full step is good:
take the full step with α_k = 1

105 Wolfgang Bangerth


Convergence: Gradient method

Gradient method converges linearly, i.e.

‖x_k − x*‖ ≤ C ‖x_{k−1} − x*‖

The gain is a fixed factor C < 1

Convergence can be very slow if C is close to 1.

Example: If f(x) = xᵀHx, with H positive definite and for
optimal line search, then
C ≈ (λ_n − λ_1)/(λ_n + λ_1),  {λ_i} = spectrum(H)

[Figures: x² + y² → C = 0; x² + 5y² → C ≈ 0.6]
106 Wolfgang Bangerth


Convergence: Newton's method

Newton's method converges quadratically, i.e.

‖x_k − x*‖ ≤ C ‖x_{k−1} − x*‖²

Optimal convergence order only if the step length is 1; otherwise
slower convergence (the step length is 1 if the quadratic model is
valid!)

If quadratic convergence: accelerating progress as the iterations
proceed.

Size of C:
C ∼ sup_{x,y} ‖∇²f(x*)⁻¹ (∇²f(x) − ∇²f(y))‖ / ‖x − y‖

C measures the size of the nonlinearity beyond the quadratic part.
107 Wolfgang Bangerth
Example 1: Gradient method

f  x , y =−x 3 2x 2 y 2

Local minimum at x=y=0,


saddle point at x=4/3, y=0

108 Wolfgang Bangerth


Example 1: Gradient method

[Plot: ‖x_k − x*‖]

Convergence of the gradient method:

Converges quite fast, with linear rate
Mean value of convergence constant C: 0.28
At (x=0, y=0), there holds

∇²f(0,0) ~ {λ₁=4, λ₂=2},  C ≈ (4−2)/(4+2) ≈ 0.33
109 Wolfgang Bangerth
Example 1: Newton's method

f  x , y =−x 3 2x 2 y 2

Local minimum at x=y=0,


saddle point at x=4/3, y=0

110 Wolfgang Bangerth


Example 1: Newton's method

[Plot: ‖x_k − x*‖]

Convergence of Newton's method:

Converges very fast, with quadratic rate
Mean value of convergence constant C: 0.15
‖x_k − x*‖ ≤ C ‖x_{k−1} − x*‖²

Theoretical estimate yields C=0.5


111 Wolfgang Bangerth
Example 1: Comparison between methods

[Plot: ‖x_k − x*‖ vs. k]
Newton's method much faster than gradient method
Newton's method superior for high accuracy due to higher
order of convergence
Gradient method simple but converges in a reasonable
number of iterations as well
112 Wolfgang Bangerth
Example 2: Gradient method


f(x, y) = [quartic "banana valley" formula — garbled in extraction; terms scaled by 1/100]

(Banana valley function)

Global minimum at x=y=0

113 Wolfgang Bangerth


Example 2: Gradient method

[Plot: ‖x_k − x*‖]

Convergence of the gradient method:

Needs almost 35,000 iterations to come closer than 0.1 to
the solution!
Mean value of convergence constant C: 0.99995
At (x=4, y=2), there holds

∇²f(4,2) ~ {λ₁=0.1, λ₂=268},  C ≈ (268−0.1)/(268+0.1) ≈ 0.9993
114 Wolfgang Bangerth
Example 2: Newton's method


f(x, y) = [quartic "banana valley" formula — garbled in extraction; terms scaled by 1/100]

(Banana valley function)

Global minimum at x=y=0

115 Wolfgang Bangerth


Example 2: Newton's method

[Plot: ‖x_k − x*‖]

Convergence of Newton's method:

Less than 25 iterations for an accuracy of better than 10⁻⁷!

Convergence roughly linear for the first 15-20 iterations since
the step length α_k ≠ 1

Convergence roughly quadratic for the last iterations with step
length α_k ≈ 1
116 Wolfgang Bangerth
Example 2: Comparison between methods

[Plot: ‖x_k − x*‖ vs. k]

Newton's method much faster than gradient method


Newton's method superior for high accuracy (i.e. in the
vicinity of the solution) due to higher order of convergence
Gradient method converges too slowly for practical use

117 Wolfgang Bangerth


Practical line search strategies

Ideally: Use an exact step length determination (line search)
based on
α_k = arg min_α f(x_k + α p_k)

This is a 1d minimization problem for α, solvable via Newton's
method/bisection search/etc.

However: Expensive, may require many function/gradient
evaluations.

Instead: Find practical criteria that guarantee convergence but
need fewer function evaluations!

118 Wolfgang Bangerth


Practical line search strategies

Strategy: Find practical criteria that guarantee convergence
but need fewer evaluations.

Rationale:

Near the optimum, the quadratic approximation of f is valid
→ take full steps (step length 1) there

Line search is only necessary far away from the solution

If close to the solution, need to try α=1 first

Consequence:

Near the solution, the quadratic convergence of Newton's method
is retained

Far away, convergence is slower in any case.
119 Wolfgang Bangerth
Practical line search strategies

Practical strategy: Use an inexact line search that:

finds a reasonable approximation to the exact step length;

guarantees a sufficient decrease in f(x) with the chosen step length;

chooses the full step length 1 for Newton's method whenever
possible.

[Figure: f(x, y) = x⁴ − x² + y⁴ − y²]
120 Wolfgang Bangerth
Practical line search strategies

Wolfe condition 1 ("sufficient decrease" condition):
Require step lengths to produce a sufficient decrease

f(x_k + α p_k) ≤ f(x_k) + c₁ α [∂f(x_k + α p_k)/∂α]_{α=0}
             = f_k + c₁ α ∇f_k · p_k

[Plot: f(x_k + α p_k) vs. α]

Necessary: 0 < c₁ < 1
Typical values: c₁ = 10⁻⁴
i.e.: only a very small decrease is mandated


121 Wolfgang Bangerth
Practical line search strategies

Wolfe condition 2 ("curvature" condition):
Require step lengths where f has shown sufficient
curvature upwards

∇f(x_k + α p_k) · p_k = [∂f(x_k + α p_k)/∂α]_{α=α_k}
  ≥ c₂ [∂f(x_k + α p_k)/∂α]_{α=0} = c₂ ∇f_k · p_k

[Plot: f(x_k + α p_k) vs. α]

Necessary: 0 < c₁ < c₂ < 1
Typical: c₂ = 0.9
Rationale: Exclude too small step lengths


122 Wolfgang Bangerth
Practical line search strategies

Wolfe conditions
Conditions 1 and 2 usually yield reasonable ranges for the
step lengths, but do not guarantee optimal ones

[Plot: f(x_k + α p_k)]
123 Wolfgang Bangerth
Practical line search strategies - Alternatives

Strict Wolfe conditions:

|[∂f(x_k + α p_k)/∂α]_{α=α_k}| ≤ c₂ |[∂f(x_k + α p_k)/∂α]_{α=0}|

[Plots: f(x_k + α p_k)]

Goldstein conditions:

f(x_k + α p_k) ≥ f(x_k) + (1−c₁) α [∂f(x_k + α p_k)/∂α]_{α=0}


124 Wolfgang Bangerth
Practical line search strategies

Conditions like the ones above tell us whether a given step
length is acceptable or not.

In practice, don't try too many step lengths – checking the
conditions involves function evaluations of f(x).

Typical strategy ("Backtracking line search"):

1. Start with a trial step length α_t = ᾱ
   (for Newton's method: ᾱ = 1)
2. Verify the acceptance conditions for this α_t
3. If yes: α_k = α_t
4. If no: α_t = c·α_t, c < 1, and go to 2.

Note: A typical reduction factor is c = 1/2
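The four steps above can be sketched in a few lines; the acceptance test used here is the sufficient-decrease (Wolfe 1) condition with c₁ = 10⁻⁴ and reduction factor c = 1/2, demonstrated on the toy function f(x) = x⁴:

```python
# Backtracking line search: halve the trial step length until the
# sufficient-decrease condition f(x + a p) <= f(x) + c1 a (grad f . p) holds.

def backtracking(f, fx, gdotp, x, p, alpha=1.0, c1=1e-4, c=0.5):
    # gdotp = grad f(x) . p must be negative (descent direction).
    while f(x + alpha * p) > fx + c1 * alpha * gdotp:
        alpha *= c
    return alpha

f = lambda x: x**4
x = 1.0
g = 4.0 * x**3          # f'(x)
p = -g                  # steepest descent direction
alpha = backtracking(f, f(x), g * p, x, p)   # accepted at alpha = 0.25
x_new = x + alpha * p
```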
125 Wolfgang Bangerth
Practical line search strategies

An alternative strategy ("Interpolating line search"):

- Start with α_t^(0) = ᾱ = 1, set i=0
- Verify the acceptance conditions for α_t^(i)
- If yes: α_k = α_t^(i)
- If no:
  - let φ_k(α) = f(x_k + α p_k)
  - from evaluating the sufficient decrease condition
    f(x_k + α_t^(i) p_k) ≤ f_k + c₁ α_t^(i) ∇f_k · p_k
    we already know φ_k(0) = f(x_k), φ_k'(0) = ∇f_k · p_k = g_k · p_k,
    and φ_k(α_t^(i)) = f(x_k + α_t^(i) p_k)
  - if i=0, choose α_t^(i+1) as the minimizer of the quadratic
    function that interpolates φ_k(0), φ_k'(0), φ_k(α_t^(i))
  - if i>0, choose α_t^(i+1) as the minimizer of the cubic
    function that interpolates φ_k(0), φ_k'(0), φ_k(α_t^(i)), φ_k(α_t^(i−1))
126 Wolfgang Bangerth
Practical line search strategies

An alternative strategy (“Interpolating line search”):

Step 1: Quadratic interpolation

[Plot: quadratic interpolation at α_t^(0)]
127 Wolfgang Bangerth
Practical line search strategies

An alternative strategy (“Interpolating line search”):

Step 2 and following: Cubic interpolation

[Plot: cubic interpolation using α_t^(1) and α_t^(0)]
128 Wolfgang Bangerth
Part 5

Smooth unconstrained
problems:
Trust region algorithms
minimize f  x 

129 Wolfgang Bangerth


Line search vs. trust region algorithms

Line search algorithms:


Choose a relatively simple strategy to find a search direction
Put significant effort into finding an appropriate step length

130 Wolfgang Bangerth


Line search vs. trust region algorithms

Trust region algorithms:


Choose simple strategy to determine a step length.
Put effort into finding an appropriate search direction.

Background:
In line search methods, we choose a direction based on a local
approximation of the objective function.
I.e.: Try to predict f(x) far away from x_k by looking at f_k, g_k, H_k

This can't work when still far from the solution!
(Unless f(x) is almost quadratic everywhere.)
131 Wolfgang Bangerth


Trust region algorithms

Trust region algorithms:


Choose simple strategy to determine a step length.
Put effort into finding an appropriate search direction.

Alternative strategy:
Keep a number Δ_k that indicates up to which distance we trust
that our model m_k(p) is a good approximation of f(x_k + p_k).

Find the update as follows:

p_k = arg min_p m_k(p) = f_k + g_k·p + ½ pᵀ B p
such that ‖p‖ ≤ Δ_k

Then accept the update unconditionally, i.e. without line search:
x_{k+1} = x_k + p_k
132 Wolfgang Bangerth
Trust region algorithms
Example: [Figure: f(x), model m_k(p), and step p_k at x_k]

The line search Newton direction leads to the exact minimum of
the approximating model m_k(p).

However, m_k(p) does not approximate f(x) well at these
distances.
Consequently, we need line search as a safeguard.

133 Wolfgang Bangerth


Trust region algorithms
Example: [Figure: f(x) and model m_k(p) at x_k]

Rather, decide how far we trust the model and stay within this
radius!

134 Wolfgang Bangerth


Trust region algorithms

Basic trust region algorithm:

For k=1,2,...:

Compute the update by finding an approximation p_k to the solution of
p_k = arg min_p m_k(p) = f_k + g_k·p + ½ pᵀ B_k p
such that ‖p‖ ≤ Δ_k

Compute the predicted improvement PI = m_k(0) − m_k(p_k)
Compute the actual improvement AI = f(x_k) − f(x_k + p_k)

If AI/PI < 1/4 then Δ_{k+1} = ¼ ‖p_k‖
If AI/PI > 3/4 and ‖p_k‖ = Δ_k then Δ_{k+1} = 2 Δ_k

If AI/PI > η for some η ∈ [0, 1/4) then x_{k+1} = x_k + p_k
else x_{k+1} = x_k
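A minimal 1d version of this loop (a sketch with invented parameters, using the exact model minimizer clipped to the radius in place of an approximate subproblem solver):

```python
# 1d trust-region loop: the model minimizer is the Newton step clipped to
# the radius delta; rho = AI/PI drives the radius update, and a step is
# accepted only if rho > eta. Demonstrated on f(x) = x^4.

def trust_region_1d(f, df, d2f, x, delta=1.0, eta=0.1, iters=50):
    for _ in range(iters):
        g, h = df(x), d2f(x)
        # Model minimizer: Newton step if curvature is positive, else a
        # full step to the boundary in the downhill direction.
        p = -g / h if h > 0 else (-delta if g > 0 else delta)
        p = max(-delta, min(delta, p))          # enforce |p| <= delta
        pred = -(g * p + 0.5 * h * p * p)       # PI = m(0) - m(p)
        act = f(x) - f(x + p)                   # AI
        rho = act / pred if pred > 0 else -1.0
        if rho < 0.25:
            delta = 0.25 * max(abs(p), 1e-12)   # shrink the region
        elif rho > 0.75 and abs(p) == delta:
            delta = 2.0 * delta                 # grow the region
        if rho > eta:
            x = x + p                           # accept the step
    return x

x_star = trust_region_1d(lambda x: x**4,
                         lambda x: 4.0 * x**3,
                         lambda x: 12.0 * x**2,
                         x=2.0)
```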
135 Wolfgang Bangerth
Trust region algorithms

Fundamental difficulty of trust region algorithms:

p_k = arg min_p m_k(p) = f_k + g_k·p + ½ pᵀ B_k p
such that ‖p‖ ≤ Δ_k

Not a trivial problem to solve!

As with line search algorithms, don't spend too much time
finding the exact minimum of an approximate model.

Practical trust region methods are about finding cheap ways
to approximate the solution of the problem above!

136 Wolfgang Bangerth


Trust region algorithms: The dogleg method

Find an approximation to the solution of:

p_k = arg min_p m_k(p) = f_k + g_k·p + ½ pᵀ B_k p
such that ‖p‖ ≤ Δ_k

Note:
If the trust region radius is small, then we get the "Cauchy point" in
the steepest descent direction:
p_k ≈ p_k^C = τ p_k^SD,  τ ∈ [0,1],  p_k^SD = −Δ_k g_k/‖g_k‖
p_k^C is the minimizer of f(x) in direction p_k^SD

If the trust region radius is large, then we get the (quasi-)Newton
update:
p_k = p_k^B = −B_k⁻¹ g_k

137 Wolfgang Bangerth


Trust region algorithms: The dogleg method

Find an approximation to the solution of:

p_k = arg min_p m_k(p) = f_k + g_k·p + ½ pᵀ B_k p
such that ‖p‖ ≤ Δ_k

[Figures: trust regions with Δ_k < ‖p_k^B‖ and Δ_k > ‖p_k^B‖, showing p_k^C and p_k^B from x_k]

138 Wolfgang Bangerth


Trust region algorithms: The dogleg method

Find an approximation to the solution of:

p_k = arg min_p m_k(p) = f_k + g_k·p + ½ pᵀ B_k p
such that ‖p‖ ≤ Δ_k

[Figures: p_k^C and p_k^B from x_k]

Idea:
Find the approximate solution p_k along the "dogleg" line
x_k → x_k + p_k^C → x_k + p_k^B
139 Wolfgang Bangerth
Trust region algorithms: The dogleg method

Find an approximation to the solution of:

p_k = arg min_p m_k(p) = f_k + g_k·p + ½ pᵀ B_k p
such that ‖p‖ ≤ Δ_k

In practice, the Cauchy point is difficult to compute because it
requires a line search.
Thus, the dogleg method doesn't use the minimizer p_k^C of f along
p_k^SD but the minimizer

p_k^U = −(g_kᵀ g_k)/(g_kᵀ B_k g_k) g_k

of
m_k(p) = f_k + g_kᵀ p + ½ pᵀ B_k p

The dogleg then runs along x_k → x_k + p_k^U → x_k + p_k^B
140 Wolfgang Bangerth
Trust region algorithms: The dogleg method

Find an approximation to the solution of:

p_k = arg min_p m_k(p) = f_k + g_k·p + ½ pᵀ B_k p
such that ‖p‖ ≤ Δ_k

Dogleg algorithm:

If p_k^B = −B_k⁻¹ g_k satisfies ‖p_k^B‖ ≤ Δ_k, then set p_k = p_k^B

Otherwise, if p_k^U = −(g_kᵀ g_k)/(g_kᵀ B_k g_k) g_k satisfies
‖p_k^U‖ ≥ Δ_k, then set p_k = Δ_k p_k^U/‖p_k^U‖

Otherwise choose p_k as the intersection point of the line p_k^U → p_k^B
and the circle with radius Δ_k
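The three cases of the dogleg algorithm can be written out directly; the sketch below assumes a diagonal 2×2 matrix B so the Newton step is cheap, and invented values for g and Δ:

```python
# Dogleg step: return the Newton step p_B if it fits in the radius; else
# the scaled steepest-descent minimizer p_U if even that is too long; else
# the point where the segment p_U -> p_B crosses the boundary.
import math

def dogleg(g, B_diag, delta):
    pB = [-g[i] / B_diag[i] for i in range(2)]              # Newton step
    if math.hypot(pB[0], pB[1]) <= delta:
        return pB
    gg = g[0] * g[0] + g[1] * g[1]
    gBg = B_diag[0] * g[0] * g[0] + B_diag[1] * g[1] * g[1]
    pU = [-(gg / gBg) * g[i] for i in range(2)]             # model minimizer along -g
    nU = math.hypot(pU[0], pU[1])
    if nU >= delta:
        return [delta / nU * pU[i] for i in range(2)]
    # Solve |p_U + tau (p_B - p_U)| = delta for tau in [0, 1].
    dvec = [pB[i] - pU[i] for i in range(2)]
    a = dvec[0] ** 2 + dvec[1] ** 2
    b = 2.0 * (pU[0] * dvec[0] + pU[1] * dvec[1])
    c = nU * nU - delta * delta
    tau = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return [pU[i] + tau * dvec[i] for i in range(2)]

p = dogleg(g=[1.0, 1.0], B_diag=[1.0, 10.0], delta=0.9)     # segment case
```

The returned step lies exactly on the trust-region boundary in this case.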

141 Wolfgang Bangerth


Part 6

Practical aspects of
Newton methods

minimize f  x 

142 Wolfgang Bangerth


What if the Hessian is not positive definite

At the solution, Hessian ∇ 2 f  x * is positive definite. If f(x) is


smooth, Hessian is positive definite near the optimum.

However, this needs not be so far away from the optimum:


At initial point x 0
the Hessian is indefinite:
2
H 0=∇ f  x 0 = 
−0.022 0.134
0.134 −0.337 
1 =−0.386, 2=0.027

Quadratic model
T 1 T
mk  p= f k  g p p H k p
k
2
has saddle point instead of
minimum, Newton step is
143
invalid! Wolfgang Bangerth
What if the Hessian is not positive definite

Background: A search direction is only useful if it is a descent
direction:
∇f(x_k)ᵀ · p_k < 0

Trivially satisfied for the gradient method; for Newton's method
there holds:
p_k = −H_k⁻¹ g_k  ⇒  g_kᵀ · p_k = −g_kᵀ H_k⁻¹ g_k < 0

The search direction is only a
guaranteed descent direction
if H is positive definite!

Otherwise the search direction is the
direction to the saddle point of the
quadratic model and might be
a direction of ascent!
144 Wolfgang Bangerth
What if the Hessian is not positive definite
If the Hessian is not positive definite, then modify the quadratic
model:

- retain as much information as possible;
- the model should be convex, so that we can seek a minimum.

The general strategy then is to replace the quadratic model by
a positive definite one:
m_k(p) = f_k + g_kᵀ p + ½ pᵀ H̃_k p

Here, H̃_k is a suitable modification of the exact Hessian
H_k = ∇²f(x_k) so that H̃_k is positive definite.

Note: To retain ultimate quadratic convergence, we need that
H̃_k → H_k as x_k → x*

145 Wolfgang Bangerth


What if the Hessian is not positive definite
The Levenberg-Marquardt modification:

Choose
H̃_k = H_k + μ I,  μ ≥ −λ_i
so that the minimum of
m_k(p) = f_k + g_kᵀ p + ½ pᵀ H̃_k p
lies at
p_k = −H̃_k⁻¹ g_k = −(H_k + μ I)⁻¹ g_k

Note: The search direction is a mixture
between the Newton direction p_k^N and the gradient direction p_k^G.

Note: Close to the solution the Hessian
must become positive definite and we
can choose μ = 0
146 Wolfgang Bangerth
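The shift can be sketched in a few lines of Python. This is an illustrative sketch, not the lecture's code: the slides leave the choice of μ open, and picking it from the smallest eigenvalue, as well as the test matrix and gradient, are assumptions made for the example.

```python
import numpy as np

def lm_direction(H, g, eps=1e-3):
    """Levenberg-Marquardt-style step: shift H by mu*I so that
    H + mu*I is positive definite, then solve for the direction.
    mu = 0 if H is already sufficiently positive definite."""
    eigmin = np.linalg.eigvalsh(H).min()
    mu = 0.0 if eigmin > eps else eps - eigmin
    return np.linalg.solve(H + mu * np.eye(len(g)), -g), mu

# Indefinite Hessian from the slide; g is made-up test data:
H = np.array([[-0.022, 0.134],
              [ 0.134, -0.337]])
g = np.array([0.1, -0.2])
p, mu = lm_direction(H, g)
assert g @ p < 0   # shifted matrix is PD, so p is a descent direction
```

One common heuristic is shown here (shift by the negative of the smallest eigenvalue plus a margin); in practice the shift is often found by trial Cholesky factorizations instead.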
What if the Hessian is not positive definite

The eigenvalue modification strategy:

Since H_k is symmetric, it has a complete set of eigenvectors:

    H_k = ∇²f(x_k) = Σ_i λ_i v_i v_i^T

Therefore replace the quadratic model by a positive definite one:

    m̃_k(p) = f_k + g_k^T p + (1/2) p^T H̃_k p
with
    H̃_k = Σ_i max{λ_i, ε} v_i v_i^T

Note: This only modifies the Hessian in directions of negative
curvature.
Note: Close to the solution, all eigenvalues become positive
and we get again the original Newton matrix.
147 Wolfgang Bangerth
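The eigenvalue modification above can be sketched directly with a symmetric eigendecomposition. A hedged illustration; the indefinite matrix is the one from the earlier slide:

```python
import numpy as np

def modified_hessian(H, eps=1e-6):
    """Eigenvalue modification: replace each eigenvalue lambda_i of
    the symmetric matrix H by max(lambda_i, eps), keeping the
    eigenvectors, so the result is positive definite."""
    lam, V = np.linalg.eigh(H)
    return V @ np.diag(np.maximum(lam, eps)) @ V.T

H = np.array([[-0.022, 0.134],
              [ 0.134, -0.337]])   # indefinite, as in the slides
H_mod = modified_hessian(H, eps=1e-6)
assert np.all(np.linalg.eigvalsh(H_mod) >= 1e-6 - 1e-9)
```

Note that computing a full eigendecomposition costs O(n³), which is why cheaper modified-Cholesky variants are often preferred for large n.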
What if the Hessian is not positive definite

One problem with the modification

    H̃_k = Σ_i max{λ_i, ε} v_i v_i^T

is that the search direction is given by

    p_k = −H̃_k^{-1} g_k = −Σ_i (1 / max{λ_i, ε}) v_i (v_i^T g_k)

that is, the search direction has a large component (of size 1/ε) in
the directions of modified curvature!

An alternative that avoids this is to use

    H̃_k = Σ_i |λ_i| v_i v_i^T

148 Wolfgang Bangerth


What if the Hessian is not positive definite

Theorem: Using full step length and either of the Hessian
modifications

    H̃_k = H_k + μ I
    H̃_k = Σ_i max{λ_i, ε} v_i v_i^T

we have that if x_k → x* and if f ∈ C^{2,1}, then convergence
happens with quadratic rate.

Proof: Since f is twice continuously differentiable, there is a k
such that x_k is close enough to x* that H_k is positive definite.
When that is the case, then

    H̃_k = H_k

for all following iterations, providing the quadratic convergence
rate of the full-step Newton method.
149 Wolfgang Bangerth
What if the Hessian is not positive definite

Example:
    f(x, y) = x^4 − x^2 + y^4 − y^2

Blue regions indicate where the Hessian

    ∇²f(x, y) = [ 12x^2 − 2   0 ;  0   12y^2 − 2 ]

is not positive definite.

There are minima at x = ±√2/2, y = ±√2/2.

150 Wolfgang Bangerth


What if the Hessian is not positive definite

Starting point:
    x_0 = 0.1,  y_0 = 0.87,   H_0 = [ −1.88  0 ;  0  7.08 ]

1. Negative gradient
2. Unmodified Hessian search direction
3. Search direction with eigenvalue-modified Hessian (ε = 10^-6)
4. Search direction with shifted Hessian (μ = 2.5; search direction
   only good by lucky choice of μ)

151 Wolfgang Bangerth
Truncated Newton methods

In any Newton or trust region method, we have to solve an equation of
the sort
    H_k p_k = −g_k
or potentially with a modified Hessian:
    H̃_k p_k = −g_k

Oftentimes, computing the Hessian is more expensive than inverting
it, but not always.

Question: Could we possibly get away with only approximately solving
this problem, i.e. finding
    p_k ≈ −H_k^{-1} g_k
with suitable conditions on how accurate the approximation is?


152 Wolfgang Bangerth
Truncated Newton methods

Example: Since the Hessian (or a modified version) is a positive
definite matrix, we may want to solve
    H_k p_k = −g_k
using an iterative method such as the Conjugate Gradient method,
Gauss-Seidel, the Richardson iteration, SSOR, etc.

While all these methods eventually converge to the exact Newton
direction, we may want to truncate the iteration at some point.

Question: When can we terminate this iteration?

153 Wolfgang Bangerth


Truncated Newton methods

Theorem 1: Let pk be an approximation to the Newton


direction defined by
H k pk = −gk

and let there be a sequence of numbers { k },  k 1 so that

∥g k H k pk∥
≤ k 1
∥g k∥

Then if x k  x * then the full step Newton method converges


with linear order.

154 Wolfgang Bangerth


Truncated Newton methods

Theorem 2: Let p̂_k be an approximation to the Newton direction
defined by
    H_k p_k = −g_k
and let there be a sequence of numbers {η_k}, η_k < 1, η_k → 0, so
that

    ‖g_k + H_k p̂_k‖ / ‖g_k‖ ≤ η_k < 1

Then if x_k → x*, the full-step Newton method converges with
superlinear order.

155 Wolfgang Bangerth


Truncated Newton methods

Theorem 3: Let pk be an approximation to the Newton


direction defined by
H k pk = −gk

and let there be a sequence of numbers { k },  k 1, k =O ∥g k∥


so that
∥g k H k pk∥
≤ k 1
∥g k∥

Then if x k  x * then the full step Newton method converges


with quadratic order.

156 Wolfgang Bangerth
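The residual test of Theorems 1-3 combines naturally with the Conjugate Gradient method into a truncated ("Newton-CG") direction computation. The following Python sketch is illustrative only: the forcing term η_k = min(0.5, √‖g_k‖), the negative-curvature fallback, and the test data are assumptions, not taken from the slides.

```python
import numpy as np

def truncated_newton_direction(H, g, eta):
    """Run CG on H p = -g, stopping once the relative residual
    ||g + H p|| / ||g|| drops below the forcing term eta."""
    n = len(g)
    p = np.zeros(n)
    r = g.copy()            # residual of H p = -g at p = 0
    d = -r
    for _ in range(2 * n):
        if np.linalg.norm(r) <= eta * np.linalg.norm(g):
            break
        Hd = H @ d
        curv = d @ Hd
        if curv <= 0:       # negative curvature: keep current p
            break
        alpha = (r @ r) / curv
        p += alpha * d
        r_new = r + alpha * Hd
        d = -r_new + (r_new @ r_new) / (r @ r) * d
        r = r_new
    return p if np.any(p) else -g   # steepest descent fallback

H = np.array([[4.0, 1.0], [1.0, 3.0]])   # made-up SPD test matrix
g = np.array([1.0, 2.0])
eta = min(0.5, np.sqrt(np.linalg.norm(g)))  # a superlinear choice
p = truncated_newton_direction(H, g, eta)
assert g @ p < 0   # descent direction
```

With η_k → 0 this reproduces the superlinear regime of Theorem 2, and η_k = O(‖g_k‖) the quadratic regime of Theorem 3.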


Part 7

Quasi-Newton update formulas

Bk1=B k ...

157 Wolfgang Bangerth


Quasi-Newton update formulas

Observation 1:

Computing the exact Hessian to determine the Newton search direction
    H_k p_k = −g_k
is expensive, and sometimes impossible.

It at least doubles the effort per iteration because we need not only
the first but also the second derivatives of f(x).

It also requires us to solve a linear system for the search
direction.

158 Wolfgang Bangerth


Quasi-Newton update formulas

Observation 2:

We know that we can get superlinear convergence if we choose the
update p_k using
    B_k p_k = −g_k
instead of
    H_k p_k = −g_k
under certain conditions on the matrix B_k.

159 Wolfgang Bangerth


Quasi-Newton update formulas

Question: Maybe it is possible to find matrices B_k for which:
● computing B_k is cheap and requires no additional function
  evaluations;
● solving
      B_k p_k = −g_k
  for p_k is cheap;
● the resulting iteration still converges with superlinear order.

160 Wolfgang Bangerth


Motivation of ideas

Consider a function p(x).

The Fundamental Theorem of Calculus tells us that

    p(z) − p(x) = ∇p(ξ)^T (z − x)
    for some ξ = x + t(z − x),  t ∈ [0, 1]

Let's apply this to p(x) = ∇f(x), z = x_k, x = x_{k−1}:

    ∇f(x_k) − ∇f(x_{k−1}) = g_k − g_{k−1}
                          = ∇²f(x_k − t p_k) (x_k − x_{k−1})
                          = H̄ (x_k − x_{k−1})

Let us denote y_{k−1} = g_k − g_{k−1}, s_{k−1} = x_k − x_{k−1}; then
this reads

    H̄ s_{k−1} = y_{k−1}

with an "average" Hessian H̄.
161 Wolfgang Bangerth
Motivation of ideas

Requirements: We seek a matrix B_{k+1} so that
● the "secant condition" holds:
      B_{k+1} s_k = y_k
● B_{k+1} is symmetric;
● B_{k+1} is positive definite;
● B_{k+1} changes minimally from B_k;
● the update equation is easy to solve for
      p_{k+1} = −B_{k+1}^{-1} g_{k+1}

162 Wolfgang Bangerth


Davidon-Fletcher-Powell

The DFP update formula:

Given B_k, define B_{k+1} by

    B_{k+1} = (I − γ_k y_k s_k^T) B_k (I − γ_k s_k y_k^T) + γ_k y_k y_k^T,
    γ_k = 1 / (y_k^T s_k)

This satisfies the conditions:
● It is symmetric and positive definite.
● Among all possible matrices, it is the one that minimizes
      ‖H̄^{-1/2} (B_{k+1} − B_k) H̄^{-1/2}‖_F
● It satisfies the secant condition B_{k+1} s_k = y_k.
163 Wolfgang Bangerth
Broyden-Fletcher-Goldfarb-Shanno

The BFGS update formula:

Given B_k, define B_{k+1} by

    B_{k+1} = B_k − (B_k s_k s_k^T B_k)/(s_k^T B_k s_k) + (y_k y_k^T)/(y_k^T s_k)

This satisfies the conditions:
● It is symmetric and positive definite.
● Among all possible matrices, it is the one that minimizes
      ‖H̄^{1/2} (B_{k+1}^{-1} − B_k^{-1}) H̄^{1/2}‖_F
● It satisfies the secant condition B_{k+1} s_k = y_k.
164 Wolfgang Bangerth
Broyden-Fletcher-Goldfarb-Shanno

So far: We seek a matrix B_{k+1} so that
● the secant condition holds:
      B_{k+1} s_k = y_k
● B_{k+1} is symmetric;
● B_{k+1} is positive definite;
● B_{k+1} changes minimally from B_k in some sense;
● the update equation is easy to solve for
      p_k = −B_k^{-1} g_k

165 Wolfgang Bangerth


DFP and BFGS

Now a miracle happens:

For the DFP formula:

    B_{k+1} = (I − γ_k y_k s_k^T) B_k (I − γ_k s_k y_k^T) + γ_k y_k y_k^T,
    γ_k = 1 / (y_k^T s_k)

    B_{k+1}^{-1} = B_k^{-1} − (B_k^{-1} y_k y_k^T B_k^{-1})/(y_k^T B_k^{-1} y_k)
                 + (s_k s_k^T)/(y_k^T s_k)

For the BFGS formula:

    B_{k+1} = B_k − (B_k s_k s_k^T B_k)/(s_k^T B_k s_k) + (y_k y_k^T)/(y_k^T s_k)

    B_{k+1}^{-1} = (I − ρ_k s_k y_k^T) B_k^{-1} (I − ρ_k y_k s_k^T) + ρ_k s_k s_k^T,
    ρ_k = 1 / (y_k^T s_k)

This makes computing the next update very cheap!
166 Wolfgang Bangerth
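The inverse BFGS formula can be applied directly, without any linear solve. A hedged sketch; the pairs (s, y) below are made-up data, not from the lecture:

```python
import numpy as np

def bfgs_inverse_update(Binv, s, y):
    """BFGS update of the inverse matrix:
    B_{k+1}^{-1} = (I - rho s y^T) B_k^{-1} (I - rho y s^T) + rho s s^T
    with rho = 1 / (y^T s)."""
    rho = 1.0 / (y @ s)
    V = np.eye(len(s)) - rho * np.outer(y, s)
    return V.T @ Binv @ V + rho * np.outer(s, s)

# Check the secant condition B_{k+1} s = y, i.e. B_{k+1}^{-1} y = s:
Binv = np.eye(2)
s = np.array([1.0, 0.5])
y = np.array([0.3, 0.8])      # note y^T s > 0 (curvature condition)
Binv_new = bfgs_inverse_update(Binv, s, y)
assert np.allclose(Binv_new @ y, s)
```

Each update costs O(n²) matrix-vector work, compared with the O(n³) of factoring a fresh Hessian.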
DFP + BFGS = Broyden class

What if we mixed:

DFP:   B_{k+1}^{DFP}  = (I − γ_k y_k s_k^T) B_k (I − γ_k s_k y_k^T) + γ_k y_k y_k^T
BFGS:  B_{k+1}^{BFGS} = B_k − (B_k s_k s_k^T B_k)/(s_k^T B_k s_k) + (y_k y_k^T)/(y_k^T s_k)

    B_{k+1} = φ_k B_{k+1}^{DFP} + (1 − φ_k) B_{k+1}^{BFGS}

This is called the "Broyden class" of update formulas.

The class of Broyden methods with 0 ≤ φ_k ≤ 1 is called the
"restricted Broyden class".

167 Wolfgang Bangerth


DFP + BFGS = Broyden class

Theorem: Let f ∈ C², and let x_0 be a starting point so that the set
    Ω = {x : f(x) ≤ f(x_0)}
is convex. Let B_0 be any symmetric positive definite matrix. Then
    x_k → x*
for any sequence x_k generated by a quasi-Newton method that uses a
Hessian update formula from any member of the restricted Broyden
class, with the exception of the DFP method (φ_k = 1).

168 Wolfgang Bangerth


DFP + BFGS = Broyden class

Theorem: Let f ∈ C^{2,1}. Assume the BFGS updates converge; then
    x_k → x*
with superlinear order.

169 Wolfgang Bangerth


Practical BFGS: Starting matrix

Question: How do we choose the initial matrix B_0 or B_0^{-1}?

Observation 1: The theorem stated that we will eventually converge
for any symmetric, positive definite starting matrix.

In particular, we could choose a multiple of the identity matrix:
    B_0 = β I,   B_0^{-1} = (1/β) I

Observation 2: If β is too small, then
    p_0 = −B_0^{-1} g_0 = −(1/β) g_0
is too large, and we need many trials in the line search to find a
suitable step length.

Observation 3: The matrices B_k should approximate the Hessian
matrix, so they at least need to have the same physical units.

170 Wolfgang Bangerth
Practical BFGS: Starting matrix

Practical approaches:

Strategy 1: Compute the first gradient g_0, choose a "typical" step
length δ, then set
    B_0 = (‖g_0‖/δ) I,   B_0^{-1} = (δ/‖g_0‖) I
so that we get
    p_0 = −B_0^{-1} g_0 = −δ g_0/‖g_0‖

Strategy 2: Approximate the true Hessian somehow. For example, do one
step with the heuristic above, then choose
    B_0 = (y_1^T y_1)/(y_1^T s_1) I,   B_0^{-1} = (y_1^T s_1)/(y_1^T y_1) I
and start over again.
171 Wolfgang Bangerth
Practical BFGS: Limited Memory BFGS (LM-BFGS)

Observation: The matrices in

    B_{k+1} = B_k − (B_k s_k s_k^T B_k)/(s_k^T B_k s_k) + (y_k y_k^T)/(y_k^T s_k)

    B_{k+1}^{-1} = (I − ρ_k s_k y_k^T) B_k^{-1} (I − ρ_k y_k s_k^T) + ρ_k s_k s_k^T,
    ρ_k = 1 / (y_k^T s_k)

are full, even if the true Hessian is sparse.

Consequence: We need to compute all n² entries, and store them.

172 Wolfgang Bangerth


Practical BFGS: Limited Memory BFGS (LM-BFGS)

Solution: Note that in the kth iteration, we can write

    B_k^{-1} = V_{k−1}^T B_{k−1}^{-1} V_{k−1} + ρ_{k−1} s_{k−1} s_{k−1}^T
with
    ρ_{k−1} = 1/(y_{k−1}^T s_{k−1}),   V_{k−1} = I − ρ_{k−1} y_{k−1} s_{k−1}^T

We can expand this recursively:

    B_k^{-1} = V_{k−1}^T B_{k−1}^{-1} V_{k−1} + ρ_{k−1} s_{k−1} s_{k−1}^T
             = V_{k−1}^T V_{k−2}^T B_{k−2}^{-1} V_{k−2} V_{k−1}
               + ρ_{k−2} V_{k−1}^T s_{k−2} s_{k−2}^T V_{k−1}
               + ρ_{k−1} s_{k−1} s_{k−1}^T
             = ...
             = [V_{k−1}^T ··· V_1^T] B_0^{-1} [V_1 ··· V_{k−1}]
               + Σ_{j=1}^{k} ρ_{k−j} [V_{k−1}^T ··· V_{k−j+1}^T] s_{k−j} s_{k−j}^T [V_{k−j+1} ··· V_{k−1}]

Consequence: We need only store kn entries.
173 Wolfgang Bangerth
Practical BFGS: Limited Memory BFGS (LM-BFGS)

Problem: kn elements may still be quite a lot if we need many
iterations. Forming the product with this matrix will then also be
expensive.

Solution: Limit memory and CPU time by only storing the last m
updates:

    B_k^{-1} = [V_{k−1}^T ··· V_{k−m}^T] B_{0,k}^{-1} [V_{k−m} ··· V_{k−1}]
             + Σ_{j=1}^{m} ρ_{k−j} [V_{k−1}^T ··· V_{k−j+1}^T] s_{k−j} s_{k−j}^T [V_{k−j+1} ··· V_{k−1}]

Consequence: We need only store mn entries, and multiplication with
this matrix requires 2mn + O(m³) operations.

174 Wolfgang Bangerth


Practical BFGS: Limited Memory BFGS (LM-BFGS)

    B_k^{-1} = [V_{k−1}^T ··· V_{k−m}^T] B_{0,k}^{-1} [V_{k−m} ··· V_{k−1}]
             + Σ_{j=1}^{m} ρ_{k−j} [V_{k−1}^T ··· V_{k−j+1}^T] s_{k−j} s_{k−j}^T [V_{k−j+1} ··· V_{k−1}]

In practice:
● The initial matrix can be chosen independently in each iteration;
  a typical approach is again
      B_{0,k}^{-1} = (y_{k−1}^T s_{k−1})/(y_{k−1}^T y_{k−1}) I
● Typical values for m are between 3 and 30.
175 Wolfgang Bangerth
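In practice the product B_k^{-1} g_k is formed by the so-called two-loop recursion rather than by multiplying the V matrices out. A sketch under the scaled-identity initial matrix above; the stored pairs and gradient are made-up data:

```python
import numpy as np

def lbfgs_direction(g, s_list, y_list):
    """Two-loop recursion: apply the limited-memory BFGS inverse
    Hessian, built from the stored pairs (s_j, y_j), to g.
    Initial matrix B_{0,k}^{-1} = (y^T s / y^T y) I (newest pair)."""
    q = g.copy()
    alphas = []
    for s, y in zip(reversed(s_list), reversed(y_list)):  # newest first
        a = (s @ q) / (y @ s)
        alphas.append(a)
        q = q - a * y
    if s_list:
        s, y = s_list[-1], y_list[-1]
        q = (y @ s) / (y @ y) * q     # scaled-identity initial matrix
    r = q
    for (s, y), a in zip(zip(s_list, y_list), reversed(alphas)):
        b = (y @ r) / (y @ s)
        r = r + (a - b) * s
    return -r                         # search direction p = -B^{-1} g

g = np.array([1.0, -2.0, 0.5])
s_list = [np.array([0.1, 0.0, 0.2]), np.array([0.0, 0.3, 0.1])]
y_list = [np.array([0.2, 0.1, 0.3]), np.array([0.1, 0.4, 0.2])]
p = lbfgs_direction(g, s_list, y_list)
assert g @ p < 0   # inverse matrix is PD, so p is a descent direction
```

The work is roughly 4mn flops per direction, and only the 2m vectors are ever stored.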


Parts 1-7

Summary of methods for


smooth unconstrained
problems
minimize f  x 

176 Wolfgang Bangerth


Summary

Newton's method is unbeatable with regard to speed of
convergence

However: To converge, one needs
- a line search method + conditions like the Wolfe conditions
- Hessian matrix modification if it is not positive definite

Newton's method can be expensive or infeasible if
- computing Hessians is complicated
- the number of variables is large

Quasi-Newton methods, e.g. LM-BFGS, help:
- only need first derivatives
- need little memory and no explicit matrix inversions
- but converge slower (at best superlinear)

Trust region methods are an alternative to Newton's method
but share the same drawbacks
177 Wolfgang Bangerth
Part 8

Equality-constrained
Problems

minimize f  x
g i  x  = 0, i=1,... , ne

178 Wolfgang Bangerth


An example

Consider the example of the body suspended from a ceiling with
springs, but this time with an additional rod of fixed length
attached to a fixed point.

To find the position of the body we now need to solve the following
problem:

    minimize f(x) = E(x, z) = Σ_i E_spring,i(x, z) + E_pot(x, z)
    ‖x − x_0‖ − L_rod = 0
179 Wolfgang Bangerth
An example

We can gain some insight into the problem by plotting the


energy as a function of (x,z) along with the constraint:

180 Wolfgang Bangerth


Definitions

We call this the standard form of equality-constrained problems:

    minimize_{x ∈ D ⊂ ℝⁿ} f(x)
    g_i(x) = 0,   i = 1 ... n_e

We will also frequently write this as follows, implying equality
elementwise:

    minimize_{x ∈ D ⊂ ℝⁿ} f(x)
    g(x) = 0

181 Wolfgang Bangerth


Definitions

A trivial reformulation of the problem is obtained by defining the
feasible set:

    Ω = {x ∈ ℝⁿ : g(x) = 0}

Then the original problem is equivalently recast as

    minimize_{x ∈ D ∩ Ω ⊂ ℝⁿ} f(x)

Note 1: This reformulation is not of much practical interest.
Note 2: The feasible set can be continuous or discrete, or empty if
the constraints are mutually incompatible.
We will always assume that it is continuous and non-empty.
182 Wolfgang Bangerth
The quadratic penalty method

Observation: The solution of

    minimize_{x ∈ D ⊂ ℝⁿ} f(x)
    g(x) = 0

must lie within the feasible set where g(x) = 0.

Idea: Let's relax the constraint and also search close to the
feasible set, where g(x) is small but not zero. However, make sure
that the objective function becomes very large far away from the
feasible set:

    minimize_{x ∈ D ⊂ ℝⁿ} Q_μ(x) = f(x) + (1/(2μ)) ‖g(x)‖²

Q_μ(x) is called the quadratic relaxation of the constrained
minimization problem; μ is the penalty parameter.
183 Wolfgang Bangerth
The quadratic penalty method

Why is Q_μ(x) called a relaxation of the constrained minimization
problem with f(x), g(x)?

Consider the original problem

    minimize f(x) = E(x, z) = Σ_i E_spring,i(x, z) + E_pot(x, z)
    ‖x − x_0‖ − L_rod = 0

with relaxation

    Q_μ(x) = E(x, z) + (1/(2μ)) (‖x − x_0‖ − L_rod)²

Replacing the fixed rod by a spring with constant D̃ would yield an
unconstrained problem with objective function

    f̃(x) = E(x, z) + (1/2) D̃ (‖x − x_0‖ − L_rod)²
184 Wolfgang Bangerth
The quadratic penalty method

Example: Qμ(x) with μ=infinity

185 Wolfgang Bangerth


The quadratic penalty method

Example: Qμ(x) with μ=0.01

186 Wolfgang Bangerth


The quadratic penalty method

Example: Qμ(x) with μ=0.001

187 Wolfgang Bangerth


The quadratic penalty method

Example: Qμ(x) with μ=0.00001

188 Wolfgang Bangerth


The quadratic penalty method

Algorithm:
Given x_0^start, {μ_t} → 0, {τ_t} → 0.
For t = 0, 1, 2, ...:
●   Find an approximation x_t^* to the (unconstrained) minimizer
    of Q_{μ_t}(x) that satisfies
        ‖∇Q_{μ_t}(x_t^*)‖ ≤ τ_t
    using x_t^start as starting point.
●   Set t = t+1, x_t^start = x_{t−1}^*.

Typical values:
    μ_t = c μ_{t−1},   c = 0.1 to 0.5
    τ_t = c τ_{t−1}
189 Wolfgang Bangerth
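The algorithm above can be sketched on a toy problem. The problem itself is hypothetical, and since f is quadratic and g linear here, each inner minimization of Q_μ reduces to a single linear solve:

```python
import numpy as np

# Quadratic penalty method for the hypothetical problem
#   min 1/2 ||x||^2   s.t.  x1 + x2 - 1 = 0,   solution x* = (1/2, 1/2).
a = np.array([1.0, 1.0])

def minimize_Q(mu):
    """Q_mu(x) = 1/2 x^T x + (1/(2 mu)) (a^T x - 1)^2 is quadratic,
    so its exact minimizer solves (I + (1/mu) a a^T) x = (1/mu) a."""
    return np.linalg.solve(np.eye(2) + np.outer(a, a) / mu, a / mu)

mu = 1.0
for t in range(8):
    x = minimize_Q(mu)     # infeasible for every finite mu ...
    mu *= 0.1              # ... but approaches x* as mu -> 0
assert np.allclose(x, [0.5, 0.5], atol=1e-4)
```

Note how the matrix I + (1/μ) a aᵀ acquires a huge eigenvalue as μ → 0: this is exactly the ill-conditioning listed among the negative properties on the next slide.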
The quadratic penalty method

Positive properties of the quadratic penalty method:
● Algorithms for unconstrained problems are readily available;
● Q_μ is at least as smooth as f, g_i for equality-constrained
  problems;
● usually only few steps are needed for each penalty parameter,
  since a good starting point is known;
● it is not really necessary to solve each unconstrained
  minimization to high accuracy.

Negative properties of the quadratic penalty method:
● Minimizers for finite penalty parameters are usually infeasible;
● the problem becomes more and more ill-conditioned near the
  optimum as the penalty parameter is decreased; the Hessian
  becomes large.
190 Wolfgang Bangerth
The quadratic penalty method

Theorem (Convergence): Let x_t^* be the exact minimizer of
Q_{μ_t}(x), and let μ_t → 0. Let f, g be once differentiable.
Then every limit point of the sequence {x_t^*}_{t=1,2,...} is a
solution of the constrained minimization problem

    minimize_{x ∈ D ⊂ ℝⁿ} f(x)
    g(x) = 0

191 Wolfgang Bangerth


The quadratic penalty method

Theorem (Convergence): Let x_t^* be approximate minimizers of
Q_{μ_t}(x) with

    ‖∇Q_{μ_t}(x_t^*)‖ ≤ τ_t

for a sequence τ_t → 0, and let μ_t → 0. Let f ∈ C², g ∈ C¹.
Then every limit point of the sequence {x_t^*}_{t=1,2,...} satisfies
certain first-order necessary conditions for solutions of the
constrained minimization problem

    minimize_{x ∈ D ⊂ ℝⁿ} f(x)
    g(x) = 0

192 Wolfgang Bangerth


Lagrange multipliers

Consider a (single) constraint g(x) as a function of x:

    g(x) = ‖x − x_0‖ − L_rod

(contour lines g(x,z) = −0.1, g(x,z) = 0, g(x,z) = 0.1 shown)


193 Wolfgang Bangerth
Lagrange multipliers

Now look at the objective function f(x):

    f(x) = Σ_{i=1}^{3} (1/2) D (‖x − x_i‖ − L_0)²
194 Wolfgang Bangerth
Lagrange multipliers

Now both f(x), g(x):

(contour plot of f with the constraint curve g(x,z) = 0)

195 Wolfgang Bangerth


Lagrange multipliers

Now both f(x), g(x):

Conclusion:
● The solution is where the isocontours are tangential to each other;
● that is, where the gradients of f and g are parallel;
● and where g(x) = 0.
196 Wolfgang Bangerth
Lagrange multipliers

Conclusion:
● The solution is where the gradients of f and g are parallel.
● The solution is where g(x) = 0.

In mathematical terms: The (local) solutions of

    minimize f(x) = E(x, z) = Σ_i E_spring,i(x, z) + E_pot(x, z)
    g(x) = ‖x − x_0‖ − L_rod = 0

are where the following conditions hold for some value of λ:

    ∇f(x) − λ ∇g(x) = 0
    g(x) = 0
197 Wolfgang Bangerth
Lagrange multipliers

Consider the same situation for three variables and two constraints:

    f(x) = f(x, y, z)
198 Wolfgang Bangerth
Lagrange multipliers

Constraint 1: Contours of g_1(x)
(surfaces g_1(x) = 0, g_1(x) = 1, g_1(x) = 2)

199 Wolfgang Bangerth


Lagrange multipliers

Constraint 2: Contours of g_2(x)
(surfaces g_2(x) = −1, g_2(x) = 0, g_2(x) = 1)

200 Wolfgang Bangerth


Lagrange multipliers

Constraints 1+2 at the same time
(surfaces g_1(x) = 0 and g_2(x) = 0)

201 Wolfgang Bangerth


Lagrange multipliers

Constraints 1+2 and f(x):
(the intersection curve of g_1(x) = 0 and g_2(x) = 0, with the local
solutions marked)

202 Wolfgang Bangerth


Lagrange multipliers

Conclusion:
● The solution is where the gradient of f can be written as a linear
  combination of the gradients of g_1, g_2.
● The solution is where g_1(x) = 0, g_2(x) = 0.
203 Wolfgang Bangerth
Lagrange multipliers

Generally (under certain conditions): The (local) solutions of

    minimize f(x),   f(x): ℝⁿ → ℝ
    g(x) = 0,        g(x): ℝⁿ → ℝ^{n_e}

are where the conditions

    ∇f(x) − λ·∇g(x) = 0
    g(x) = 0

hold for some vector of Lagrange multipliers λ ∈ ℝ^{n_e}.

Note: There are enough equations to determine both x and λ.


204 Wolfgang Bangerth
Lagrange multipliers

By introducing the Lagrangian

    L(x, λ) = f(x) − λ·g(x),   L: ℝⁿ × ℝ^{n_e} → ℝ

the conditions

    ∇f(x) − λ·∇g(x) = 0
    g(x) = 0

can conveniently be written as

    ∇_{x,λ} L(x, λ) = 0

205 Wolfgang Bangerth
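The system ∇_{x,λ} L = 0 can be solved numerically with Newton's method. A sketch for a hypothetical toy problem (minimize x + y on the unit circle; the starting guess is an assumption):

```python
import numpy as np

# Newton's method on grad L = 0 for the toy problem
#   min x + y   s.t.  g(x, y) = x^2 + y^2 - 1 = 0
# with L(x, y, lam) = x + y - lam * (x^2 + y^2 - 1).
def F(z):
    x, y, lam = z
    return np.array([1 - 2 * lam * x,        # dL/dx
                     1 - 2 * lam * y,        # dL/dy
                     -(x**2 + y**2 - 1)])    # dL/dlam = -g

def J(z):                                    # Jacobian of F
    x, y, lam = z
    return np.array([[-2 * lam, 0.0, -2 * x],
                     [0.0, -2 * lam, -2 * y],
                     [-2 * x, -2 * y, 0.0]])

z = np.array([-0.5, -0.5, -0.5])   # starting guess near the minimizer
for _ in range(20):
    z = z - np.linalg.solve(J(z), F(z))
x, y, lam = z
assert abs(x**2 + y**2 - 1) < 1e-10                      # feasible
assert np.allclose([x, y], [-np.sqrt(0.5), -np.sqrt(0.5)])
```

Note that the iterate converges to a stationary (saddle) point of L, not to a minimum of L; this distinction reappears in the discussion of SQP line searches later on.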


Constraint Qualification: Example 1

When can we characterize solutions by Lagrange multipliers?

Consider the problem

    minimize f(x) = (x+1)² + (y+1)² + z²,
    g_1(x) = x = 0,
    g_2(x) = y = 0

with solution
    x* = (0, 0, 0)^T

At the solution, we have

    ∇f(x*) = (2, 2, 0)^T,  ∇g_1(x*) = (1, 0, 0)^T,  ∇g_2(x*) = (0, 1, 0)^T

and consequently
    λ = (2, 2)^T
206 Wolfgang Bangerth
Constraint Qualification: Example 1

When can we characterize solutions by Lagrange multipliers?

Compare this with the problem

    minimize f(x) = (x+1)² + (y+1)² + z²,
    g_1(x) = x² = 0,
    g_2(x) = y² = 0

with the same solution
    x* = (0, 0, 0)^T

At the solution, we now have

    ∇f(x*) = (2, 2, 0)^T,   ∇g_1(x*) = ∇g_2(x*) = (0, 0, 0)^T

and there are no Lagrange multipliers so that

    ∇f(x*) = λ·∇g(x*)
207 Wolfgang Bangerth
Constraint Qualification: Example 2

When can we characterize solutions by Lagrange multipliers?

Consider the problem

    minimize f(x) = y,
    g_1(x) = (x−1)² + y² − 1 = 0,
    g_2(x) = (x+1)² + y² − 1 = 0

There is only a single point at which both constraints are
satisfied:
    x* = (0, 0)^T
208 Wolfgang Bangerth
Constraint Qualification: Example 2

When can we characterize solutions by Lagrange multipliers?

Consider the problem

    minimize f(x) = y,
    g_1(x) = (x−1)² + y² − 1 = 0,
    g_2(x) = (x+1)² + y² − 1 = 0

At the solution x* = (0, 0)^T, we have

    ∇f(x*) = (0, 1)^T,   ∇g_1(x*) = −∇g_2(x*) = (−2, 0)^T

and again there are no Lagrange multipliers so that

    ∇f(x*) = λ·∇g(x*)
209 Wolfgang Bangerth
Constraint Qualification: LICQ

Definition:
We say that at a point x the linear independence constraint
qualification (LICQ) is satisfied if

    {∇g_i(x)}_{i=1...n_e}

is a set of n_e linearly independent vectors.

Note: This is equivalent to saying that the matrix

    A = [ [∇g_1(x)]^T ; ... ; [∇g_{n_e}(x)]^T ]

has full row rank n_e.

210 Wolfgang Bangerth


First-order necessary conditions

Theorem:
Suppose that x* is a local solution of

    minimize f(x),   f(x): ℝⁿ → ℝ
    g(x) = 0,        g(x): ℝⁿ → ℝ^{n_e}

and suppose that at this point the LICQ holds. Then there exists a
unique Lagrange multiplier vector λ so that the following conditions
are satisfied:

    ∇f(x*) − λ·∇g(x*) = 0
    g(x*) = 0

Note: - These conditions are often referred to as the
Karush-Kuhn-Tucker (KKT) conditions.
- If the LICQ does not hold, there may still be a solution, but it
may not satisfy the KKT conditions!

211 Wolfgang Bangerth
First-order necessary conditions

Theorem (alternative form):
Suppose that x* is a local solution of

    minimize f(x),   f(x): ℝⁿ → ℝ
    g(x) = 0,        g(x): ℝⁿ → ℝ^{n_e}

and suppose that at this point the LICQ holds. Then

    ∇f(x*)·w = 0

for every vector tangential to all constraints,

    w ∈ {v : v·∇g_i(x*) = 0, i = 1...n_e}

or equivalently
    w ∈ Null(A)
212 Wolfgang Bangerth
Second-order necessary conditions

Theorem:
Suppose that x* is a local solution of

    minimize f(x),   f(x): ℝⁿ → ℝ
    g(x) = 0,        g(x): ℝⁿ → ℝ^{n_e}

and suppose that at this point the first-order necessary conditions
and the LICQ hold. Then

    w^T ∇²f(x*) w ≥ 0

for every vector tangential to all constraints,

    w ∈ Null(A)
213 Wolfgang Bangerth
Second-order sufficient conditions

Theorem:
Suppose that at a feasible point x the first-order necessary (KKT)
conditions hold. Suppose also that

    w^T ∇²f(x) w > 0

for all tangential vectors

    w ∈ Null(A),  w ≠ 0

Then x is a strict local minimizer of

    minimize f(x),   f(x): ℝⁿ → ℝ
    g(x) = 0,        g(x): ℝⁿ → ℝ^{n_e}

214 Wolfgang Bangerth


Characterizing the null space of A

All necessary and sufficient conditions required us to test
conditions like

    w^T ∇²f(x) w > 0

for all tangential vectors

    w ∈ Null(A),  w ≠ 0

In practice, this can be done as follows:

If the LICQ holds, then dim(Null(A)) = n − n_e. Thus, there exist
n − n_e vectors z_l so that A z_l = 0, and every such vector w can be
written as

    w = Z ω,  w ∈ ℝⁿ,  Z = [z_1, ..., z_{n−n_e}] ∈ ℝ^{n×(n−n_e)},  ω ∈ ℝ^{n−n_e}

This matrix Z can be computed from A, for example by a QR
decomposition.
215 Wolfgang Bangerth
Characterizing the null space of A

With this matrix Z, the following statements are equivalent:

First-order necessary conditions:
    ∇f(x)·w = 0  ∀ w ∈ Null(A)
    ⇔  [∇f(x)]^T Z = 0

Second-order necessary conditions:
    w^T ∇²f(x) w ≥ 0  ∀ w ∈ Null(A)
    ⇔  Z^T [∇²f(x)] Z is positive semidefinite

Second-order sufficient conditions:
    w^T ∇²f(x) w > 0  ∀ w ∈ Null(A), w ≠ 0
    ⇔  Z^T [∇²f(x)] Z is positive definite
216 Wolfgang Bangerth
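The QR-based construction of Z and the reduced-Hessian tests can be sketched as follows (the constraint matrix A and Hessian H below are made-up data):

```python
import numpy as np

def nullspace_basis(A):
    """Columns of Z span Null(A): take the last n - n_e columns of
    the full Q factor of A^T (assumes A has full row rank n_e)."""
    n_e, n = A.shape
    Q, _ = np.linalg.qr(A.T, mode='complete')
    return Q[:, n_e:]

# One linear constraint in R^3, i.e. A = [grad g]^T:
A = np.array([[1.0, 1.0, 1.0]])
Z = nullspace_basis(A)
assert np.allclose(A @ Z, 0)                 # A Z = 0, as required

# Reduced-Hessian test: is Z^T H Z positive definite?
H = np.diag([3.0, 2.0, 1.0])
red = Z.T @ H @ Z
assert np.all(np.linalg.eigvalsh(red) > 0)   # sufficient condition
```

The first n_e columns of Q span the range of Aᵀ, so the remaining columns are orthogonal to every constraint gradient, which is exactly the tangential subspace Null(A).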


Part 9

Quadratic programming

minimize f(x) = (1/2) x^T G x + d^T x + e
g(x) = A x − b = 0

217 Wolfgang Bangerth


Solving equality constrained problems

Consider a general nonlinear program with general nonlinear equality
constraints:

    minimize f(x),   f(x): ℝⁿ → ℝ
    g(x) = 0,        g(x): ℝⁿ → ℝ^{n_e}

Maybe we can solve such problems with an iterative scheme like we did
for unconstrained ones?

Analogy: For unconstrained nonlinear programs, we approximate f(x) in
each iteration by a quadratic model. For quadratic functions, we can
find the minimum in one step:

    min_x f(x) = (1/2) x^T H x + d^T x + e

    [∇²f(x_0)] p_0 = −∇f(x_0)   ⇔   x_1 = x_0 − H^{-1}(H x_0 + d) = −H^{-1} d
218 Wolfgang Bangerth
Solving equality constrained problems

For the general nonlinear constrained problem:
Assuming a condition like the LICQ holds, we know that we need to
find points {x, λ} at which

    ∇f(x) − λ·∇g(x) = 0
    g(x) = 0

Alternatively, we can write this as

    ∇_{x,λ} L(x, λ) = 0

with
    L(x, λ) = f(x) − λ·g(x),   L: ℝⁿ × ℝ^{n_e} → ℝ
219 Wolfgang Bangerth
Solving equality constrained problems

If we combine z={x , } then this can also be written as

∇ z L z = 0
which looks like the first-order necessary condition for
minimizing L(z). We then may think of finding solutions as
follows:
T

[
Start at a point z 0= x0 ,  0 ]

Compute search directions using [∇ 2z L z k ] pk=−∇ z L z k 

Compute a step length  k

Update z k1 =zk  k p k

Note: This is misleading, since we will in fact not look for


2
minima of L(z), but for saddle points. Consequently, ∇ z L z k 
220 is indefinite. Wolfgang Bangerth
Solving equality constrained problems

The equations we have to solve in each Newton iteration have the form

    [∇²_z L(z_k)] p_k = −∇_z L(z_k)

Because
    L(x, λ) = f(x) − λ·g(x),   L: ℝⁿ × ℝ^{n_e} → ℝ

the equations we have to solve read in component form:

    [ ∇²f(x_k) − Σ_i λ_{i,k} ∇²g_i(x_k)   −∇g(x_k) ;
                −∇g(x_k)^T                    0     ] [ p_k^x ; p_k^λ ]
        = −[ ∇f(x_k) − Σ_i λ_{i,k} ∇g_i(x_k) ;  −g(x_k) ]

221 Wolfgang Bangerth
Linear quadratic programs

Consider first the linear-quadratic case with a symmetric matrix G:

    f(x) = (1/2) x^T G x + d^T x + e,   f: ℝⁿ → ℝ
    g(x) = A x − b,   A ∈ ℝ^{n_e×n},  b ∈ ℝ^{n_e}

with
    L(x, λ) = f(x) − λ^T g(x)
            = (1/2) x^T G x + d^T x + e − λ^T (A x − b)

Then the first search direction needs to satisfy the (linear) set of
equations

    [∇²_z L(z_0)] p_0 = −∇_z L(z_0)

or equivalently:

    [ G  −A^T ;  −A  0 ] [ p_0^x ; p_0^λ ]
        = −[ G x_0 + d − A^T λ_0 ;  −(A x_0 − b) ]
222 Wolfgang Bangerth
Linear quadratic programs

Theorem 1: Assume that G is positive definite in all feasible
directions, i.e. Z^T G Z is positive definite, and that the matrix A
has full row rank. Then the KKT matrix

    [ G  −A^T ;  −A  0 ]

is nonsingular and the system

    [ G  −A^T ;  −A  0 ] [ p_0^x ; p_0^λ ]
        = −[ G x_0 + d − A^T λ_0 ;  −(A x_0 − b) ]

has a unique solution.

223 Wolfgang Bangerth


Linear quadratic programs

Theorem 2: Assume that G is positive definite in all feasible
directions, i.e. Z^T G Z is positive definite. Then the solution of
the linear-quadratic program

    min_x f(x) = (1/2) x^T G x + d^T x + e
    g(x) = A x − b = 0

is equal to the first iterate

    x_1 = x_0 + p_0^x

that results from solving the linear system

    [ G  −A^T ;  −A  0 ] [ p_0^x ; p_0^λ ]
        = −[ G x_0 + d − A^T λ_0 ;  −(A x_0 − b) ]

irrespective of the starting point x_0.
224 Wolfgang Bangerth
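Assembling and solving one such KKT system takes only a few lines; a sketch with made-up data G, d, A, b (starting from x_0 = 0, so the right-hand side reduces to (−d, b) scaled as below):

```python
import numpy as np

def solve_eqp(G, d, A, b):
    """Solve  min 1/2 x^T G x + d^T x  s.t.  A x = b  via the
    KKT system  [G, -A^T; -A, 0] (x, lambda) = (-d, -b)."""
    n, ne = G.shape[0], A.shape[0]
    K = np.block([[G, -A.T],
                  [-A, np.zeros((ne, ne))]])
    sol = np.linalg.solve(K, np.concatenate([-d, -b]))
    return sol[:n], sol[n:]

# Closest point on the line x1 + x2 = 1 to the origin:
G = 2.0 * np.eye(2)
d = np.zeros(2)
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
x, lam = solve_eqp(G, d, A, b)
assert np.allclose(x, [0.5, 0.5])                 # feasible minimizer
assert np.allclose(G @ x + d - A.T @ lam, 0)      # stationarity
```

For larger problems the indefinite KKT matrix would be handled with a symmetric-indefinite or null-space factorization rather than a dense solve, but the block structure is the same.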
Linear quadratic programs

Theorem 3: Assume that G is positive definite in all feasible
directions, i.e. Z^T G Z is positive definite, and that the matrix A
has full row rank. Then the KKT matrix

    [ G  −A^T ;  −A  0 ]

has n positive and n_e negative eigenvalues, and no zero eigenvalues.
In other words, the KKT matrix is indefinite but nonsingular, and the
quadratic function

    L(x, λ) = (1/2) x^T G x + d^T x + e − λ^T (A x − b)

in {x, λ} has a single stationary point, which is a saddle point.

225 Wolfgang Bangerth


Part 10

Sequential Quadratic Programming


(SQP)

minimize f  x
g x = 0

226 Wolfgang Bangerth


The basic SQP algorithm

For z={x , }, the equality-constrained optimality conditions read

∇ z L z = 0
Like in the unconstrained Newton's method, sequential
quadratic programming uses the following basic iteration:

T

Start at a point z 0=[ x0 ,  0 ]
2

Compute search directions using [∇ z L z k] pk=−∇ z L z k 

Compute a step length  k


Update z k1 =zk  k p k

227 Wolfgang Bangerth


Computing the SQP search direction

The equations for the search direction are

    [ ∇²f(x_k) − Σ_i λ_{i,k} ∇²g_i(x_k)   −∇g(x_k) ;
                −∇g(x_k)^T                    0     ] [ p_k^x ; p_k^λ ]
        = −[ ∇f(x_k) − Σ_i λ_{i,k} ∇g_i(x_k) ;  −g(x_k) ]

which we will abbreviate as follows:

    [ W_k  −A_k^T ;  −A_k  0 ] [ p_k^x ; p_k^λ ]
        = −[ ∇f(x_k) − Σ_i λ_{i,k} ∇g_i(x_k) ;  −g(x_k) ]

with
    W_k = ∇²_x L(x_k, λ_k)
    A_k = ∇_x g(x_k) = −∇_x ∇_λ L(x_k, λ_k)
228 Wolfgang Bangerth
Computing the SQP search direction

Theorem 1: Assume that W_k is positive definite in all feasible
directions, i.e. Z_k^T W_k Z_k is positive definite, and that the
matrix A_k has full row rank. Then the KKT matrix of SQP step k

    [ W_k  −A_k^T ;  −A_k  0 ]

is nonsingular, and the system that determines the SQP search
direction

    [ W_k  −A_k^T ;  −A_k  0 ] [ p_k^x ; p_k^λ ]
        = −[ ∇_x L(x_k, λ_k) ;  −g(x_k) ]

has a unique solution.

Proof: Use Theorem 1 from Part 9.
Note: The columns of the matrix Z_k span the null space of A_k.
229 Wolfgang Bangerth
Computing the SQP search direction

Theorem 2: The solution of the SQP search direction system

    [ W_k  −A_k^T ;  −A_k  0 ] [ p_k^x ; p_k^λ ]
        = −[ ∇_x L(x_k, λ_k) ;  −g(x_k) ]

equals the minimizer of the problem

    min_{p_k^x} m_k(p_k^x) = L(x_k, λ_k) + ∇_x L(x_k, λ_k)^T p_k^x
                             + (1/2) (p_k^x)^T ∇²_x L(x_k, λ_k) p_k^x
    g(x_k) + ∇g(x_k)^T p_k^x = 0

that approximates the original nonlinear equality-constrained
minimization problem.

Proof: Essentially just use Theorem 2 from Part 9.

Note: This means that in each step SQP minimizes a quadratic model
of the Lagrangian, subject to linearized constraints.
230 Wolfgang Bangerth
Computing the SQP search direction

Theorem 3: The SQP iteration with full steps, i.e.

    [ W_k  −A_k^T ;  −A_k  0 ] [ p_k^x ; p_k^λ ]
        = −[ ∇_x L(x_k, λ_k) ;  −g(x_k) ]

    x_{k+1} = x_k + p_k^x,   λ_{k+1} = λ_k + p_k^λ

converges to the solution of the constrained nonlinear optimization
problem with quadratic order if (i) we start close enough to the
solution, (ii) the LICQ holds at the solution, and (iii) the matrix
Z*^T W* Z* is positive definite at the solution.

231 Wolfgang Bangerth
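One full SQP step can be sketched for the problem of Example 1 on the following slides, min (1/2)(x_1² + x_2²) subject to x_2 + 1 = 0. This is an illustrative sketch; the starting point is an assumption. Because the problem is linear-quadratic, a single step reaches the solution, as Theorem 2 of Part 9 predicts:

```python
import numpy as np

# Full-step SQP for  min 1/2 (x1^2 + x2^2)  s.t.  g(x) = x2 + 1 = 0,
# whose solution is x* = (0, -1), lambda* = -1.
def sqp_step(x, lam):
    gradL = x - lam * np.array([0.0, 1.0])  # grad f - lam * grad g
    g = np.array([x[1] + 1.0])
    A = np.array([[0.0, 1.0]])              # A_k = (grad g)^T
    W = np.eye(2)                           # hess_x L  (g is linear)
    K = np.block([[W, -A.T], [-A, np.zeros((1, 1))]])
    p = np.linalg.solve(K, np.concatenate([-gradL, g]))
    return x + p[:2], lam + p[2]

x, lam = np.array([2.0, 3.0]), 0.0
x, lam = sqp_step(x, lam)   # one step suffices here
assert np.allclose(x, [0.0, -1.0]) and np.isclose(lam, -1.0)
```

For genuinely nonlinear f and g the same step would be repeated, and W_k would be reassembled from the current (x_k, λ_k) in every iteration.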


How SQP works

Example 1:

    min f(x) = (1/2)(x_1² + x_2²)
    g(x) = x_2 + 1 = 0

The search direction is then computed using the step

    min m_k(p_k^x) = L(x_k, λ_k) + (x_{1,k}, x_{2,k} − λ_k)^T p_k^x
                     + (1/2) (p_k^x)^T p_k^x
    (x_{2,k} + 1) + (0, 1)^T p_k^x = 0

In other words, the linearized constraint enforces that

    p_{2,k}^x = −(x_{2,k} + 1)   →   x_{2,k+1} = x_{2,k} + p_{2,k}^x = −1

232 Wolfgang Bangerth
How SQP works

Example 2:

    min f(x)
    g(x) = x_2 − sin(x_1) = 0

The search direction is then computed by

    min m_k(p_k^x)
    (x_{2,k} − sin(x_{1,k})) + (−cos(x_{1,k}), 1)^T p_k^x = 0

In particular, if we are currently at (0, −2), this enforces

    −p_{1,k} + p_{2,k} = 2
233 Wolfgang Bangerth
How SQP works

Example 3:

    min f(x)
    g(x) = 0

If the constraint is already satisfied at a step, then the search
direction solves

    min m_k(p_k^x)
    g(x_k) + ∇g(x_k)^T p_k^x = ∇g(x_k)^T p_k^x = 0

In other words: The update step can only be tangential to the
constraint (along the linearized constraint)!

234 Wolfgang Bangerth


Hessian modifications for SQP

The SQP step

    [ W_k  −A_k^T ;  −A_k  0 ] [ p_k^x ; p_k^λ ]
        = −[ ∇_x L(x_k, λ_k) ;  −g(x_k) ]

is equivalent to the minimization problem

    min m_k(p_k^x) = L(x_k, λ_k) + ∇_x L(x_k, λ_k)^T p_k^x
                     + (1/2) (p_k^x)^T ∇²_x L(x_k, λ_k) p_k^x
    g(x_k) + ∇g(x_k)^T p_k^x = 0

or abbreviated:

    min m_k(p_k^x) = L_k + (∇_x f_k − A_k^T λ_k)^T p_k^x
                     + (1/2) (p_k^x)^T W_k p_k^x
    g(x_k) + A_k p_k^x = 0

From this, we may expect to get into trouble if the matrix
Z_k^T W_k Z_k is not positive definite.
235 Wolfgang Bangerth
Hessian modifications for SQP

If the matrix Z_k^T W_k Z_k in the SQP step

    [ W_k  −A_k^T ;  −A_k  0 ] [ p_k^x ; p_k^λ ]
        = −[ ∇_x L(x_k, λ_k) ;  −g(x_k) ]

is not positive definite, then there may not be a unique solution.

There exist a number of modifications to ensure that an alternative
step can be computed that satisfies

    [ W̃_k  −A_k^T ;  −A_k  0 ] [ p_k^x ; p_k^λ ]
        = −[ ∇_x L(x_k, λ_k) ;  −g(x_k) ]

instead.

236 Wolfgang Bangerth


Line search procedures for SQP

Motivation: For unconstrained problems, we used f(x) to measure
progress along a direction p_k computed from a quadratic model m_k
that approximates f(x).

Idea: For constrained problems, we could consider L(z) to measure
progress along a search direction p_k computed using the SQP step
based on the model m_k.

Problem 1: The Lagrangian L(z) is unbounded. E.g., for
linear-quadratic problems, L(z) is a quadratic of saddle-point form.
Indeed, we are now looking for this saddle point of L.

Consequence 1: We can't use L(z) to measure progress in line search
algorithms.
237 Wolfgang Bangerth
Line search procedures for SQP

Motivation: For unconstrained problems, we used f(x) to


measure progress along a direction pk computed from a
quadratic model mk that approximates f(x).

Idea: For constrained problems, we could consider L(z) to


measure progress along a search direction pk computed using
the SQP step based on the model mk.

Problem 2: Some step lengths may lead to a significant


reduction in f(x) but take us far away from constraints g(x)=0. Is
this better than a step that may increase f(x) but lands on the
constraint?

Consequence 2: We need a merit function that balances
decrease of f(x) with satisfying the constraint g(x).

238 Wolfgang Bangerth
Line search procedures for SQP

Solution: Drive the step length determination using a merit
function that contains both f(x) and g(x).

Examples: Commonly used choices are the l1 merit function

  φ_1(x) = f(x) + (1/μ) ∥g(x)∥_1

with

  1/μ = ∥λ_{k+1}∥_∞ + ρ,   ρ > 0

or Fletcher's merit function

  φ_F(x) = f(x) - λ(x)^T g(x) + (1/(2μ)) ∥g(x)∥²

with

  λ(x) = [A(x) A(x)^T]^{-1} A(x) ∇f(x)

239 Wolfgang Bangerth
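The l1 merit function is cheap to evaluate. A minimal sketch (the names merit_l1, f, g and the toy problem are illustrative, not from the slides) shows how the penalty term trades objective decrease against constraint violation:

```python
def merit_l1(f, g, x, mu):
    """l1 merit function  phi_1(x) = f(x) + (1/mu) * ||g(x)||_1."""
    return f(x) + sum(abs(gi) for gi in g(x)) / mu

# Toy problem: f(x) = x1^2 + x2^2 with the constraint g(x) = x1 + x2 - 2 = 0.
f = lambda x: x[0] ** 2 + x[1] ** 2
g = lambda x: [x[0] + x[1] - 2.0]

x_feasible   = (1.0, 1.0)   # on the constraint: merit equals f
x_infeasible = (0.0, 0.0)   # violation 2 is weighted by 1/mu = 10
print(merit_l1(f, g, x_feasible, mu=0.1))    # 2.0
print(merit_l1(f, g, x_infeasible, mu=0.1))  # 20.0
```

The infeasible point has the smaller f but the larger merit value, which is precisely the balancing act Consequence 2 asked for.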


Line search procedures for SQP

Definition: A merit function is called exact if the constrained
optimizer of the problem

  min_x f(x)
  g(x) = 0

is also a minimizer of the merit function.

Note: Both the l1 and Fletcher's merit function

  φ_1(x) = f(x) + (1/μ) ∥g(x)∥_1

  φ_F(x) = f(x) - λ(x)^T g(x) + (1/(2μ)) ∥g(x)∥²

are exact for appropriate choices of μ, ρ.

240 Wolfgang Bangerth


Line search procedures for SQP

Theorem 4: The SQP search direction that satisfies

  [ W_k   -A_k^T ] [ p_k^x ]       [ ∇_x L(x_k, λ_k) ]
  [ -A_k    0    ] [ p_k^λ ]  = -  [ -g(x_k)         ]

is a direction of descent for both the l1 as well as Fletcher's
merit function if (i) the current point x_k is not a stationary point
of the equality-constrained problem, and (ii) the matrix Z_k^T W_k Z_k
is positive definite.

241 Wolfgang Bangerth


A practical SQP algorithm

Algorithm: For k=0,1,2,...

● Find a search direction using the KKT system

    [ W_k   -A_k^T ] [ p_k^x ]       [ ∇_x L(x_k, λ_k) ]
    [ -A_k    0    ] [ p_k^λ ]  = -  [ -g(x_k)         ]

● Determine the step length α_k using a backtracking line search,
  a merit function and the Wolfe (or Goldstein) conditions:

    φ(x_k + α p_k^x) ≤ φ(x_k) + c_1 α ∇φ(x_k)·p_k^x
    ∇φ(x_k + α p_k^x)·p_k^x ≥ c_2 ∇φ(x_k)·p_k^x

● Update the iterate using either

    x_{k+1} = x_k + α_k p_k^x,   λ_{k+1} = λ_k + α_k p_k^λ

  or

    x_{k+1} = x_k + α_k p_k^x,   λ_{k+1} = [A_{k+1} A_{k+1}^T]^{-1} A_{k+1} ∇f(x_{k+1})

242 Wolfgang Bangerth
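The algorithm above can be sketched end to end for a two-variable toy problem, min x1+x2 s.t. x1²+x2²-2=0, whose solution is x*=(-1,-1) with λ*=-1/2. This is a minimal sketch under simplifying assumptions: a plain Armijo backtracking on the l1 merit replaces the full Wolfe test, and all names (sqp, jac_g, ...) are my own, not from the slides.

```python
import numpy as np

# Toy equality-constrained problem:  min x1 + x2  s.t.  x1^2 + x2^2 - 2 = 0.
f      = lambda x: x[0] + x[1]
grad_f = lambda x: np.array([1.0, 1.0])
g      = lambda x: np.array([x[0]**2 + x[1]**2 - 2.0])
jac_g  = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]]])

def sqp(x, lam, mu=0.1, c1=1e-4, tol=1e-8, max_iter=50):
    for _ in range(max_iter):
        Ak = jac_g(x)
        W = -2.0 * lam[0] * np.eye(2)            # Hessian of L = f - lam^T g
        grad_L = grad_f(x) - Ak.T @ lam
        if np.linalg.norm(grad_L) + abs(g(x)[0]) < tol:
            break
        # KKT system for the search direction (p_x, p_lambda)
        K = np.block([[W, -Ak.T], [-Ak, np.zeros((1, 1))]])
        p = np.linalg.solve(K, -np.concatenate([grad_L, -g(x)]))
        px, plam = p[:2], p[2:]
        # backtracking (Armijo) line search on the l1 merit function
        phi = lambda y: f(y) + abs(g(y)[0]) / mu
        D = grad_f(x) @ px - abs(g(x)[0]) / mu   # descent estimate for phi
        alpha = 1.0
        while alpha > 1e-12 and phi(x + alpha * px) > phi(x) + c1 * alpha * D:
            alpha *= 0.5
        x, lam = x + alpha * px, lam + alpha * plam
    return x, lam

x, lam = sqp(np.array([-1.2, -0.8]), np.array([-0.5]))
print(np.round(x, 4), np.round(lam, 4))   # approx [-1, -1] and [-0.5]
```

Starting close enough to the solution, the full Newton step α=1 is accepted in every iteration and convergence is quadratic.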
Parts 8-10

Summary of methods for
equality-constrained Problems

  minimize f(x)
  g_i(x) = 0,  i=1,...,n_e

243 Wolfgang Bangerth


Summary of methods

Two general methods for equality-constrained problems:

● Penalty methods (e.g. the quadratic penalty method)
  convert the constrained problem into an unconstrained one
  that can be solved with well-known techniques.
  However, they often lead to ill-conditioned problems.

● Lagrange multipliers reformulate the problem into one
  where we look for saddle points of a Lagrangian.
  Sequential quadratic programming (SQP) methods solve
  a sequence of quadratic programs with linear constraints,
  which are simple to solve.
  SQP methods are the most powerful methods.

244 Wolfgang Bangerth
Part 11

Inequality-constrained
Problems

  minimize f(x)
  g_i(x) = 0,  i=1,...,n_e
  h_i(x) ≥ 0,  i=1,...,n_i

385 Wolfgang Bangerth


An example

Consider the example of the body suspended from a ceiling
with springs, but with an element of fixed minimal length
attached to a fixed point:

To find the position of the body we now need to solve the
following problem:

  minimize f(x) = E(x,z) = ∑_i E_spring,i(x,z) + E_pot(x,z)
  ∥x - x_0∥ - L_rod ≥ 0

386 Wolfgang Bangerth
An example

We can gain some insight into the problem by plotting the


energy as a function of (x,z) along with the constraint:

387 Wolfgang Bangerth


Definitions

We call this the standard form of inequality constrained
problems:

  minimize_{x ∈ D ⊂ R^n} f(x)
  g_i(x) = 0,  i=1,...,n_e
  h_i(x) ≥ 0,  i=1,...,n_i

We will also frequently write this as follows, implying
(in)equality elementwise:

  minimize_{x ∈ D ⊂ R^n} f(x)
  g(x) = 0
  h(x) ≥ 0

388 Wolfgang Bangerth
Definitions

Let x* be the solution of

  minimize_{x ∈ D ⊂ R^n} f(x)
  g_i(x) = 0,  i=1,...,n_e
  h_i(x) ≥ 0,  i=1,...,n_i

We call a constraint active if it is zero at the solution x*:

● Obviously, all equality constraints are active, since a
  solution needs to satisfy g(x*)=0
● Some inequality constraints may not be active if it so
  happens that h_i(x*) > 0 for some index i
● Other inequality constraints may be active if h_i(x*) = 0

We call the set of all active (equality and inequality)
constraints the active set.

389 Wolfgang Bangerth
Definitions

Note: If x* is the solution of

  minimize_{x ∈ D ⊂ R^n} f(x)
  g_i(x) = 0,  i=1,...,n_e
  h_i(x) ≥ 0,  i=1,...,n_i

then it is also the solution of the problem

  minimize_{x ∈ D ⊂ R^n} f(x)
  g_i(x) = 0,  i=1,...,n_e
  h_i(x) = 0,  i=1,...,n_i, i is active at x*

where we have dropped all inactive constraints and made
equalities out of all active constraints.

390 Wolfgang Bangerth
Definitions

A trivial reformulation of the problem is obtained by defining the
feasible set:

  Ω = {x ∈ R^n : g(x)=0, h(x)≥0}

Then the original problem is equivalently recast as

  minimize_{x ∈ D∩Ω ⊂ R^n} f(x)

Note 1: This reformulation is not of much practical interest.

Note 2: The feasible set can be continuous or discrete. It can
also be empty if the constraints are mutually incompatible. In
the following we will always assume that it is continuous and
non-empty.

391 Wolfgang Bangerth
The quadratic penalty method

Observation: The solution of

  minimize_{x ∈ D ⊂ R^n} f(x)
  g(x) = 0
  h(x) ≥ 0

must lie within the feasible set.

Idea: Let's relax the constraint and allow to search also
where g(x) is small but not zero, or where h(x) is small and
negative. However, make sure that the objective function
becomes very large if far away from the feasible set:

  minimize_{x ∈ D ⊂ R^n} Q_μ(x) = f(x) + (1/2μ)∥g(x)∥² + (1/2μ)∥[h(x)]^-∥²

Q_μ(x) is called the quadratic relaxation of the minimization
problem. μ is the penalty parameter, and

  [h(x)]^- = min{0, h(x)}

392 Wolfgang Bangerth
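The penalty function is straightforward to evaluate directly. The sketch below (the names Q, f, h and the toy constraint x ≥ 1 are illustrative, not from the slides) shows how the [h]^- clipping penalizes only the violated inequalities:

```python
def Q(x, f, gs, hs, mu):
    """Quadratic penalty  Q_mu(x) = f(x) + (1/2mu)||g||^2 + (1/2mu)||[h]^-||^2,
    where [h]^- = min(0, h) keeps only the violated inequality parts."""
    pen_g = sum(gi(x) ** 2 for gi in gs)
    pen_h = sum(min(0.0, hi(x)) ** 2 for hi in hs)
    return f(x) + (pen_g + pen_h) / (2.0 * mu)

# Toy problem: f(x) = x^2 with the single inequality h(x) = x - 1 >= 0.
f = lambda x: x * x
h = lambda x: x - 1.0
print(Q(0.0, f, [], [h], mu=0.1))   # infeasible: 0 + (1/0.2) * 1 = 5.0
print(Q(2.0, f, [], [h], mu=0.1))   # feasible: the penalty vanishes, Q = 4.0
```

As μ shrinks, the penalty wall at the boundary of the feasible set gets steeper, which is the source of the ill-conditioning discussed next.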
The quadratic penalty method

Replace the original constrained minimization problem

  minimize f(x)
  g_i(x) = 0,  i=1,...,n_e
  h_i(x) ≥ 0,  i=1,...,n_i

by an unconstrained one with a quadratic penalty term:

  minimize_{x ∈ D ⊂ R^n} Q_μ(x) = f(x) + (1/2μ)∥g(x)∥² + (1/2μ)∥[h(x)]^-∥²

Example (plots of Q_μ for μ=0.1 and μ=0.01):

  minimize f(x) = sin(x)
  h_1(x) = x - 0 ≥ 0,
  h_2(x) = 1 - x ≥ 0.

393 Wolfgang Bangerth
The quadratic penalty method

Negative properties of the quadratic penalty method:

● minimizers for finite penalty parameters are usually
  infeasible;
● the problem becomes more and more ill-conditioned near the
  optimum as the penalty parameter is decreased; the Hessian
  becomes large;
● for inequality constrained problems, the penalty function is
  not twice differentiable at the constraints.

(Plots of Q_μ for μ=2, 0.2, 0.1, 0.02, 0.01 of the example
  minimize x_2²  s.t.  g(x) = x_2 + x_1² = 0.)

394 Wolfgang Bangerth
The logarithmic barrier method

Replace the original constrained minimization problem

  minimize f(x)
  h_i(x) ≥ 0,  i=1,...,n_i

by an unconstrained one with a logarithmic barrier term:

  minimize_{x ∈ D ⊂ R^n} Q_μ(x) = f(x) + μ ∑_{i=1}^{n_i} (-log h_i(x))

(Plots of Q_μ for μ=0.1 and μ=0.05 of the example
  minimize f(x) = sin(x)  s.t.  x ≥ 0, x ≤ 1.)

395 Wolfgang Bangerth
The logarithmic barrier method

Properties of successive minimization of

  minimize_x Q_μ(x) = f(x) - μ ∑_i log h_i(x)

● intermediate minimizers are feasible, since Q_μ(x)=∞ in the
  infeasible region; the method is an interior point method;
● Q_μ is smooth if the constraints are smooth;
● we need a feasible point as starting point;
● ill-conditioning and inadequacy of the Taylor expansion remain;
● Q_μ(x) may be unbounded from below if h(x) is unbounded;
● inclusion of equality constraints as before by the quadratic
  penalty method.

Summary:
This is an efficient method for the solution of constrained
problems.

396 Wolfgang Bangerth
Algorithms for penalty/barrier methods

Algorithm (exactly as for the equality constrained case):

Given x_0^start, {μ_t} ↓ 0, {τ_t} ↓ 0
For t=0,1,2,...:
● Find an approximation x_t* to the (unconstrained) minimizer
  of Q_{μ_t}(x) that satisfies

    ∥∇Q_{μ_t}(x_t*)∥ ≤ τ_t

  using x_t^start as starting point.
● Set t=t+1, x_t^start = x*_{t-1}

Typical values:  μ_t = c μ_{t-1},  c=0.1 to 0.5
                 τ_t = c τ_{t-1}

397 Wolfgang Bangerth
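The continuation loop above can be sketched for a one-dimensional toy problem, minimize x² subject to x ≥ 1 (whose solution is x*=1). Everything here is an illustrative assumption: the name penalty_continuation, and the inner solver, which is a fixed-step gradient descent that happens to suffice for this quadratic.

```python
def penalty_continuation(x, mu=1.0, tau=1e-2, c=0.5, rounds=20):
    """Successively minimize Q_mu for decreasing mu (and tolerance tau),
    warm-starting each round from the previous approximate minimizer.
    Toy problem: minimize x^2 subject to h(x) = x - 1 >= 0."""
    def dQ(x, mu):                       # gradient of the penalized objective
        viol = min(0.0, x - 1.0)         # [h(x)]^-
        return 2.0 * x + viol / mu
    for _ in range(rounds):
        step = 1.0 / (2.0 + 1.0 / mu)    # <= 1/L for this piecewise quadratic
        while abs(dQ(x, mu)) > tau:      # inner solve to tolerance tau
            x -= step * dQ(x, mu)
        mu, tau = c * mu, c * tau        # tighten penalty and tolerance
    return x

print(round(penalty_continuation(0.0), 4))   # close to the constrained optimum x* = 1
```

Note how each unconstrained minimizer x_μ = 1/(1+2μ) is infeasible but approaches x*=1 as μ → 0, exactly as the slides warn.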
The exact penalty method

Previous methods suffered from the fact that minimizers of
Q_μ(x) for finite μ are not optima of the original problem.

Solution: Use

  minimize_x φ_1(x) = f(x) + (1/μ) [ ∑_i |g_i(x)| + ∑_i |[h_i(x)]^-| ]

(Plots of φ_1 for 1/μ = 1, 4, 10 of the example
  minimize f(x) = sin(x)  s.t.  x ≥ 0, x ≤ 1.)

398 Wolfgang Bangerth
The exact penalty method

Properties of the exact penalty method:

● for a sufficiently small penalty parameter μ, the optimum of
  the modified problem is the optimum of the original one;
● possibly only one iteration in the penalty parameter is needed
  if the size of μ is known in advance;
● this is a non-smooth problem!

This is an efficient method
if (but only if!) a solver for nonsmooth problems is available!

399 Wolfgang Bangerth


Part 12

Theory of
Inequality-Constrained
Problems

  minimize f(x)
  g_i(x) = 0,  i=1,...,n_e
  h_i(x) ≥ 0,  i=1,...,n_i

400 Wolfgang Bangerth


Lagrange multipliers

Consider a (single) constraint h(x) as a function for all x:

(Contour plot showing the levels h(x,z)=-0.1, h(x,z)=0, h(x,z)=0.1 of)

  h(x) = ∥x - x_0∥ - L_rod ≥ 0

401 Wolfgang Bangerth
Lagrange multipliers

Now look at the objective function f(x):

  f(x) = ∑_{i=1}^{3} (1/2) D (∥x - x_i∥ - L_0)²

402 Wolfgang Bangerth
Lagrange multipliers

Both f(x), h(x) for the case of a rod of minimal length 20cm:

infeasible
region

h(x,z)=0 with Lrod=20cm

403 Wolfgang Bangerth


Lagrange multipliers

Could this be a solution x*?

(Figure: x* with ∇h(x*) and ∇f(x*) pointing in opposite directions.)

Answer: No – moving into the feasible direction would also
reduce f(x).
Rather, the solution will equal the unconstrained one, and the
inequality constraint will be inactive at the solution.

404 Wolfgang Bangerth
Lagrange multipliers

Both f(x), h(x) for the case of a rod of minimal length 35cm:

infeasible
region

h(x,z)=0 with Lrod=35cm

405 Wolfgang Bangerth


Lagrange multipliers

Could this be a solution x*?

(Figure: x* with ∇f(x*) and ∇h(x*) pointing in the same direction.)

Answer: Yes – moving into the feasible direction would increase f(x).

Note: The gradients of h and f are parallel and point in the same
direction.

406 Wolfgang Bangerth
Lagrange multipliers

Conclusion:
● The solution can be where the constraint is not active
● If the constraint is active at the solution: the gradients of f
  and h are parallel, but not antiparallel

In mathematical terms: The (local) solutions of

  minimize f(x) = E(x,z) = ∑_i E_spring,i(x,z) + E_pot(x,z)
  h(x) = ∥x - x_0∥ - L_rod ≥ 0

are where one of the following conditions holds for some μ:

  ∇f(x) - μ·∇h(x) = 0             ∇f(x) = 0
  h(x) = 0                  or    h(x) > 0
  μ ≥ 0

407 Wolfgang Bangerth
Lagrange multipliers

Conclusion, take 2: Solutions are where either

  ∇f(x) - μ·∇h(x) = 0             ∇f(x) = 0
  h(x) = 0                  or    h(x) > 0
  μ ≥ 0

which could also be written like so:

  ∇f(x) - μ·∇h(x) = 0             ∇f(x) - μ·∇h(x) = 0
  h(x) = 0                  or    h(x) > 0
  μ ≥ 0                           μ = 0
  (constraint is active)          (constraint is inactive)

408 Wolfgang Bangerth


Lagrange multipliers

Conclusion, take 3: Solutions are where

  ∇f(x) - μ·∇h(x) = 0             ∇f(x) - μ·∇h(x) = 0
  h(x) = 0                  or    h(x) > 0
  μ ≥ 0                           μ = 0

or, written differently:

  ∇f(x) - μ·∇h(x) = 0
  h(x) ≥ 0
  μ ≥ 0
  μ h(x) = 0

Note: The last condition is called complementarity.

409 Wolfgang Bangerth
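The four conditions of "take 3" can be checked mechanically at any candidate point. A minimal sketch for a single inequality constraint (the function name and the toy problem, minimize (x+1)² s.t. x ≥ 0, are illustrative assumptions):

```python
def kkt_single_inequality(x, mu, grad_f, h, grad_h, tol=1e-9):
    """Check  grad f - mu*grad h = 0,  h >= 0,  mu >= 0,  mu*h = 0
    (stationarity + feasibility + sign condition + complementarity)."""
    return (abs(grad_f(x) - mu * grad_h(x)) < tol
            and h(x) >= -tol
            and mu >= -tol
            and abs(mu * h(x)) < tol)

# minimize (x+1)^2  s.t.  h(x) = x >= 0: the optimum x* = 0 is on the constraint
grad_f = lambda x: 2.0 * (x + 1.0)
h      = lambda x: x
grad_h = lambda x: 1.0

print(kkt_single_inequality(0.0, 2.0, grad_f, h, grad_h))   # True: x*=0, mu*=2
print(kkt_single_inequality(1.0, 0.0, grad_f, h, grad_h))   # False: not stationary
```

Complementarity is what encodes the "either active, or multiplier zero" case distinction in a single equation.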
Lagrange multipliers

Same idea, but with two minimum length elements:

infeasible
region

h1(x,z)=0 h2(x,z)=0

410 Wolfgang Bangerth


Lagrange multipliers

Could this be a solution x*?

(Figure: x* with ∇h_1(x*) and ∇f(x*) antiparallel.)

Answer: No – moving into the feasible direction would decrease f(x).

Note: The gradient of f is antiparallel to the gradient of h_1. h_2 is
an inactive constraint, so it doesn't matter here.

411 Wolfgang Bangerth
Lagrange multipliers

Same idea, but with two different minimum length elements:

infeasible
region

h1(x,z)=0 h2(x,z)=0

412 Wolfgang Bangerth


Lagrange multipliers

Could this be a solution x*?

(Figure: x* with ∇f(x*) between ∇h_1(x*) and ∇h_2(x*).)

Answer: Yes – moving into the feasible direction would increase f(x).

Note: The gradient of f is a linear combination (with positive
multiples) of the gradients of h_1 and h_2.

413 Wolfgang Bangerth
Constraint Qualification: LICQ

Definition:
We say that at a point x the linear independence constraint
qualification (LICQ) is satisfied if

  {∇g_i(x)}_{i=1...n_e},  {∇h_i(x)}_{i=1...n_i, i active at x}

is a set of linearly independent vectors.

Note: This is equivalent to saying that the matrix of gradients of all
active constraints,

      [ [∇g_1(x)]^T                ]
      [ ⋮                          ]
  A = [ [∇g_{n_e}(x)]^T            ]
      [ [∇h_{first active i}(x)]^T ]
      [ ⋮                          ]
      [ [∇h_{last active i}(x)]^T  ]

has full row rank (i.e. its rank is n_e + # of active ineq. constraints).

414 Wolfgang Bangerth
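The full-row-rank formulation of the LICQ translates directly into a rank test. A minimal sketch (the function name licq_holds is an illustrative choice):

```python
import numpy as np

def licq_holds(grads_active):
    """LICQ: the gradients of all active constraints are linearly
    independent, i.e. the matrix A of gradients has full row rank."""
    A = np.asarray(grads_active, dtype=float)
    return np.linalg.matrix_rank(A) == A.shape[0]

# Two active constraints in R^2 with independent gradients: LICQ holds.
print(licq_holds([[1.0, 0.0], [0.0, 1.0]]))   # True
# A duplicated constraint gradient violates the LICQ.
print(licq_holds([[1.0, 0.0], [2.0, 0.0]]))   # False
```

In particular, more than n active constraints in R^n always violate the LICQ, since n+1 vectors in R^n cannot be linearly independent.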
First-order necessary conditions

Theorem:
Suppose that x* is a local solution of

  minimize f(x),   f(x): ℝ^n → ℝ
  g(x) = 0,        g(x): ℝ^n → ℝ^{n_e}
  h(x) ≥ 0,        h(x): ℝ^n → ℝ^{n_i}

and suppose that at this point the LICQ holds. Then there exist
unique Lagrange multipliers so that these conditions are satisfied:

  ∇f(x) - λ·∇g(x) - μ·∇h(x) = 0
  g(x) = 0
  h(x) ≥ 0
  μ ≥ 0
  μ_i h_i(x) = 0

Note: These are often called the Karush-Kuhn-Tucker (KKT)
conditions.

415 Wolfgang Bangerth
First-order necessary conditions

Note: By introducing a Lagrangian

  L(x,λ,μ) = f(x) - λ^T g(x) - μ^T h(x)

the first two of the necessary conditions

  ∇f(x) - λ·∇g(x) - μ·∇h(x) = 0
  g(x) = 0
  h(x) ≥ 0
  μ ≥ 0
  μ_i h_i(x) = 0

follow from requiring that ∇_z L(z) = 0 with z={x,λ,μ}, but not the
rest.

Consequence: We cannot hope to find simple Newton-based
methods like SQP to solve inequality-constrained problems.

416 Wolfgang Bangerth
First-order necessary conditions

Note: The necessary conditions

  ∇f(x) - λ·∇g(x) - μ·∇h(x) = 0
  g(x) = 0
  h(x) ≥ 0
  μ ≥ 0
  μ_i h_i(x) = 0

imply that at x* there is a unique set of (active) Lagrange
multipliers so that

  ∇f(x) = A^T [λ; μ]_active

where A is the matrix of gradients of active constraints. An
alternative way of saying this is

  ∇f(x) ∈ span(rows of A)

However, the opposite is not true: the multipliers must also
satisfy μ_i ≥ 0.

417 Wolfgang Bangerth
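Under the LICQ, A has full row rank and the multipliers can be recovered from ∇f(x*) = A^T m as m = (A A^T)^{-1} A ∇f(x*), mirroring the formula used for Fletcher's merit function earlier. A minimal sketch (the function name active_multipliers is an illustrative choice):

```python
import numpy as np

def active_multipliers(A, grad_f):
    """Recover the (unique, under LICQ) multipliers m from
    grad f(x*) = A^T m, i.e. m = (A A^T)^{-1} A grad f(x*)."""
    return np.linalg.solve(A @ A.T, A @ grad_f)

# Example in R^2: one active constraint with gradient (1, 1), and
# grad f(x*) = (2, 2) = 2 * (1, 1)^T, so the multiplier must be 2.
A = np.array([[1.0, 1.0]])
m = active_multipliers(A, np.array([2.0, 2.0]))
print(m)   # [2.]
```

The remaining work is then exactly the sign check: a candidate x* is only a KKT point if the components of m belonging to inequality constraints come out nonnegative.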
First-order necessary conditions

A more refined analysis: Consider the constraints

  h_1(x) = x_2 - a x_1 ≥ 0,   h_2(x) = x_2 + a x_1 ≥ 0

Intuitively (consider the isocontours), the vertex point x* is optimal
if the direction of steepest ascent ∇f(x) is a member of the family
of red vectors above. That is, let F_0 be the cone

  F_0(x*) = {w ∈ ℝ^n : w = μ_1 ∇h_1(x*) + μ_2 ∇h_2(x*), μ_1 ≥ 0, μ_2 ≥ 0}

Then x* is optimal if

  ∇f(x*) ∈ F_0(x*)

418 Wolfgang Bangerth
First-order necessary conditions

A more refined analysis: Consider the constraints

  h_1(x) = x_2 - a x_1 ≥ 0,   h_2(x) = x_2 + a x_1 ≥ 0

Note: We can write things slightly differently if we define

  F_1(x*) = {w ∈ ℝ^n : w^T a ≥ 0 ∀ a ∈ F_0(x*)}

i.e. the set of vectors that form angles of at most 90 degrees with
all vectors in F_0. This set can also be written as

  F_1(x*) = {w ∈ ℝ^n : w^T ∇h_1(x*) ≥ 0, w^T ∇h_2(x*) ≥ 0}

419 Wolfgang Bangerth
First-order necessary conditions

A more refined analysis: If the problem also has equality
constraints

  g(x) = 0,   h_1(x) ≥ 0,   h_2(x) ≥ 0

all of which are active at x*, then the cone F_1 is

  F_1(x*) = {w ∈ ℝ^n : w^T ∇g(x*) = 0, w^T ∇h_1(x*) ≥ 0, w^T ∇h_2(x*) ≥ 0}

In general:

  F_1(x*) = {w ∈ ℝ^n : w^T ∇g_i(x*) = 0, i=1,...,n_e
                       w^T ∇h_i(x*) ≥ 0, i=1,...,n_i, constraint i is active at x*}

Note: This is the cone of all feasible directions.

420 Wolfgang Bangerth


First-order necessary conditions

Theorem (a different version of the first order necessary
conditions): If x* is a local solution and if the LICQ holds at this
point, then

  ∇f(x*)^T w ≥ 0   ∀ w ∈ F_1(x*)

In other words: Whatever direction w in F_1 we go into from x*, the
objective function to first order stays constant or increases.

Note: This is a necessary condition, but not sufficient. If f(x) stays
constant to first order it may still decrease in higher order Taylor
terms to make x* a local maximum or saddle point. But, if x* is a
solution, then the condition above has to be satisfied.

421 Wolfgang Bangerth
Second-order necessary conditions

Definition:
Let x* be a local solution of an inequality constrained problem
satisfying

  ∇f(x) - λ·∇g(x) - μ·∇h(x) = 0
  g_i(x) = 0,  i=1,...,n_e
  h_i(x) ≥ 0,  i=1,...,n_i
  μ_i ≥ 0,     i=1,...,n_i
  μ_i h_i(x) = 0,  i=1,...,n_i

We say that strict complementarity holds if for each inequality
constraint i exactly one of the following conditions is true:

  μ_i = 0
  h_i(x*) = 0

In other words, we require that the Lagrange multiplier is nonzero
for all active inequality constraints.

422 Wolfgang Bangerth
Second-order necessary conditions

Definition:
Let x* be a local solution and assume that strict complementarity
holds. Then define as before

  F_1(x*) = {w ∈ ℝ^n : w^T ∇g_i(x*) = 0, i=1,...,n_e
                       w^T ∇h_i(x*) ≥ 0, i=1,...,n_i, constraint i is active at x*}

and the subspace of all tangential directions as

  F_2(x*) = {w ∈ ℝ^n : w^T ∇g_i(x*) = 0, i=1,...,n_e
                       w^T ∇h_i(x*) = 0, i=1,...,n_i, constraint i is active at x*}

(Figure: the cone F_1(x*) and the subspace F_2(x*).)

423 Wolfgang Bangerth
Second-order necessary conditions

Note:
The subspace of all tangential directions

  F_2(x*) = {w ∈ ℝ^n : w^T ∇g_i(x*) = 0, i=1,...,n_e
                       w^T ∇h_i(x*) = 0, i=1,...,n_i, constraint i is active at x*}

can be trivial (i.e. contain only the zero vector) if n or more
constraints are active at x*.

Example:
(Figure: two active constraints in 2d.)
Here, F_1 is a nonempty set, but
F_2 contains only the zero vector.

424 Wolfgang Bangerth


Second-order necessary conditions

Theorem (necessary conditions):
Let x* be a local solution that satisfies the first order necessary
conditions with unique Lagrange multipliers. Assume that strict
complementarity holds. Then

  w^T ∇_x² L(x*,λ*,μ*) w
    = w^T [ ∇_x² f(x*) - λ*^T ∇_x² g(x*) - μ*^T ∇_x² h(x*) ] w ≥ 0
                                                   ∀ w ∈ F_2(x*)

Note: This means that f(x) cannot "curve down" to second order
along tangential directions. The first order conditions imply that it
doesn't "slope" in these directions.

425 Wolfgang Bangerth
Second-order sufficient conditions

Theorem (sufficient conditions):
Let x* be a point that satisfies the first order necessary
conditions with unique Lagrange multipliers, and assume that
strict complementarity holds. Then x* is a strict local minimizer if

  w^T ∇_x² L(x*,λ*,μ*) w
    = w^T [ ∇_x² f(x*) - λ*^T ∇_x² g(x*) - μ*^T ∇_x² h(x*) ] w > 0
                                            ∀ w ∈ F_2(x*), w ≠ 0

Note: This means that f(x) actually "curves up" in a neighborhood
of x*, at least in tangential directions!
For all other directions, we know that f(x) slopes up from the first
order necessary conditions.

426 Wolfgang Bangerth
Second-order sufficient conditions

Remark:
If strict complementarity holds, then the definition

  F_2(x*) = {w ∈ ℝ^n : w^T ∇g_i(x*) = 0, i=1,...,n_e
                       w^T ∇h_i(x*) = 0, i=1,...,n_i, constraint i is active at x*}

is equivalent to

  F_2(x*) = null A(x*)

with the matrix of gradients of active constraints A. If A does have
a null space, then the second order necessary and sufficient
conditions can also be written as

  Z^T ∇_x² L(x*,λ*,μ*) Z  is positive semidefinite
  Z^T ∇_x² L(x*,λ*,μ*) Z  is positive definite

respectively, where the columns of Z are a basis of the null space
of A.

427 Wolfgang Bangerth
Second-order necessary conditions

Definition (if strict complementarity does not hold):
Let x* be a local solution at which the KKT conditions with unique
Lagrange multipliers hold. Then define

  F_2(x*,μ*) = {w ∈ ℝ^n : w^T ∇g_i(x*) = 0, i=1,...,n_e
                          w^T ∇h_i(x*) = 0, i=1,...,n_i, constraint i active and μ_i* > 0
                          w^T ∇h_i(x*) ≥ 0, i=1,...,n_i, constraint i active and μ_i* = 0}

(Figure: the cone F_1(x*) and the set F_2(x*,μ*).)

428 Wolfgang Bangerth


Second-order sufficient conditions

Theorem (sufficient conditions w/o strict complementarity):
Let x* be a point that satisfies the first order necessary
conditions with unique Lagrange multipliers, where strict
complementarity need not hold. Then x* is a strict local
minimizer if

  w^T ∇_x² L(x*,λ*,μ*) w
    = w^T [ ∇_x² f(x*) - λ*^T ∇_x² g(x*) - μ*^T ∇_x² h(x*) ] w > 0
                                       ∀ w ∈ F_2(x*,μ*), w ≠ 0

Note: This now means that f(x) actually "curves up" in a
neighborhood of x*, at least in tangential directions plus all
those directions for which we can't infer anything from the first
order conditions!

429 Wolfgang Bangerth


Part 13

Active Set Methods for
Convex Quadratic Programs

  minimize f(x) = (1/2) x^T G x + x^T d + e
  g_i(x) = a_i^T x - b_i = 0,  i=1,...,n_e
  h_i(x) = α_i^T x - β_i ≥ 0,  i=1,...,n_i

430 Wolfgang Bangerth


General idea

Note:
Recall that if W* is the set of active (equality and inequality)
constraints at the solution x*, then the solution of

  minimize f(x) = (1/2) x^T G x + x^T d + e
  g_i(x) = a_i^T x - b_i = 0,  i=1,...,n_e
  h_i(x) = α_i^T x - β_i ≥ 0,  i=1,...,n_i

equals the solution of the following QP:

  minimize f(x) = (1/2) x^T G x + x^T d + e
  g_i(x) = a_i^T x - b_i = 0,  i=1,...,n_e
  h_i(x) = α_i^T x - β_i = 0,  i=1,...,n_i, i ∈ W*

431 Wolfgang Bangerth
General idea

Definition: Let

      [ a_1^T     ]        [ b_1     ]
      [ ⋮         ]        [ ⋮       ]
  A = [ a_{n_e}^T ]    B = [ b_{n_e} ]
      [ α_1^T     ]        [ β_1     ]
      [ ⋮         ]        [ ⋮       ]
      [ α_{n_i}^T ]        [ β_{n_i} ]

and let A|_W, B|_W be the restrictions of A and B to the rows of the
equality constraints plus the inequality constraints in the set W.
Then the solution of the inequality-constrained QP equals the
solution of the following QP:

  minimize f(x) = (1/2) x^T G x + x^T d + e
  A|_{W*} x - B|_{W*} = 0

432 Wolfgang Bangerth


General idea

Consequence: If we knew the active set W* at the solution, we
could just solve the linearly constrained QP

  minimize f(x) = (1/2) x^T G x + x^T d + e
  A|_{W*} x - B|_{W*} = 0

and be done in one step.

Problem: Knowing the exact active set W* requires knowing the
solution x* because W* is the set of all equality constraints plus
those constraints for which

  h_i(x*) = 0

Solution: Solve a sequence of QPs using working sets W_k that we
iteratively refine until we have the exact active set W*.

433 Wolfgang Bangerth
The active set algorithm

Algorithm:
● Choose an initial working set W_0 and feasible point x_0
● For k=0,1,2,...:
  - Find the search direction p_k from x_k to the solution x_{k+1} of the QP

      minimize f(x) = (1/2) x^T G x + x^T d + e
      A|_{W_k} x - B|_{W_k} = 0

  - If p_k=0 and all μ_i ≥ 0 for constraints in W_k, then stop
  - Else if p_k=0 but there are μ_i < 0, then drop the inequality with
    the most negative μ_i from W_k to obtain W_{k+1}
  - Else if x_k + p_k is feasible, then set x_{k+1} = x_k + p_k
  - Otherwise, set x_{k+1} = x_k + α_k p_k with

      α_k = min( 1, min_{i ∉ W_k, α_i^T p_k < 0} (β_i - α_i^T x_k)/(α_i^T p_k) )

    and add the most blocking constraint to W_{k+1}

434 Wolfgang Bangerth
The active set algorithm

Example:

  minimize f(x) = (x_1 - 1)² + (x_2 - 2.5)²

       [  1 -2 ]       [ -2 ]
       [ -1 -2 ]       [ -6 ]
       [ -1  2 ] x  -  [ -2 ]  ≥ 0        (constraints h_1,...,h_5)
       [  1  0 ]       [  0 ]
       [  0  1 ]       [  0 ]

Choose as initial working set W_0={3,5} and as starting point
x_0=(2,0)^T.

435 Wolfgang Bangerth
The active set algorithm

Example: Step 0

W_0={3,5}, x_0=(2,0)^T.
Then: p_0=(0,0)^T because no other point is feasible for W_0.

  ∇f(x_0) - A|_{W_0}^T μ|_{W_0} = (2, -5)^T - μ_3 (-1, 2)^T - μ_5 (0, 1)^T = 0

implies

  μ_3 = -2,   μ_5 = -1

Consequently: W_1={5}, x_1=(2,0)^T.

436 Wolfgang Bangerth
The active set algorithm

Example: Step 1

W_1={5}, x_1=(2,0)^T.
Then: p_1=(-1,0)^T leads to the minimum along the only active
constraint. There are no blocking constraints on the way to the
point x_{k+1} = x_k + p_k.

Consequently: W_2={5}, x_2=(1,0)^T.

437 Wolfgang Bangerth
The active set algorithm

Example: Step 2

W_2={5}, x_2=(1,0)^T.
Then: p_2=(0,0)^T because we are at the minimum on the active
constraint.

  ∇f(x_2) - A|_{W_2}^T μ|_{W_2} = (0, -5)^T - μ_5 (0, 1)^T = 0

implies μ_5 = -5.

Consequently: W_3={}, x_3=(1,0)^T.

438 Wolfgang Bangerth
The active set algorithm

Example: Step 3

W_3={}, x_3=(1,0)^T.
Then: p_3=(0,2.5)^T, but this leads out of the feasible region. The
first blocking constraint is inequality 1, and the maximal step
length is α_3 = 0.6.

Consequently: W_4={1}, x_4=(1,1.5)^T.

439 Wolfgang Bangerth
The active set algorithm

Example: Step 4

W_4={1}, x_4=(1,1.5)^T.
Then: p_4=(0.4,0.2)^T is the minimizer along the sole constraint.
There are no blocking constraints to get there.

Consequently: W_5={1}, x_5=(1.4,1.7)^T.

440 Wolfgang Bangerth
The active set algorithm

Example: Step 5

W_5={1}, x_5=(1.4,1.7)^T.
Then: p_5=(0,0)^T because we are already at the minimizer on the
constraint. Furthermore,

  ∇f(x_5) - A|_{W_5}^T μ|_{W_5} = (0.8, -1.6)^T - μ_1 (1, -2)^T = 0

implies μ_1 = 0.8 ≥ 0.

Consequently: This is the solution.

441 Wolfgang Bangerth
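The six steps above can be reproduced by a small implementation of the algorithm. This is a sketch under simplifying assumptions (inequality constraints only, no Hessian modifications; the function name active_set_qp and the 0-based constraint numbering are my choices):

```python
import numpy as np

def active_set_qp(G, d, Aineq, b, x, W, max_iter=50, tol=1e-10):
    """Primal active-set method for  min 1/2 x^T G x + d^T x
    s.t. Aineq x >= b, started at a feasible x with working set W."""
    n = len(x)
    for _ in range(max_iter):
        idx = sorted(W)
        Aw, m = Aineq[idx], len(idx)
        # Equality-constrained subproblem via its KKT system:
        #   G x + d - Aw^T mu = 0,   Aw x = b_W
        K = np.block([[G, -Aw.T], [Aw, np.zeros((m, m))]]) if m else G
        rhs = np.concatenate([-d, b[idx]]) if m else -d
        sol = np.linalg.solve(K, rhs)
        p = sol[:n] - x
        if np.linalg.norm(p) < tol:
            mu = sol[n:]
            if m == 0 or mu.min() >= -tol:
                return x, dict(zip(idx, mu))       # KKT point found
            W.remove(idx[int(mu.argmin())])        # drop most negative mu
        else:
            alpha, blocking = 1.0, None            # step-length / ratio test
            for i in range(len(b)):
                if i not in W and Aineq[i] @ p < -tol:
                    a = (b[i] - Aineq[i] @ x) / (Aineq[i] @ p)
                    if a < alpha:
                        alpha, blocking = a, i
            x = x + alpha * p
            if blocking is not None:
                W.add(blocking)
    raise RuntimeError("no convergence")

# The slides' example: min (x1-1)^2 + (x2-2.5)^2 with five inequalities
G = 2.0 * np.eye(2)
d = np.array([-2.0, -5.0])
Aineq = np.array([[1., -2.], [-1., -2.], [-1., 2.], [1., 0.], [0., 1.]])
b = np.array([-2., -6., -2., 0., 0.])
# W0 = {3,5} in the slides' 1-based numbering is {2,4} here (0-based)
x, mu = active_set_qp(G, d, Aineq, b, np.array([2., 0.]), {2, 4})
print(x, mu)   # approx [1.4 1.7], multiplier 0.8 on constraint 1 (index 0)
```

Running it visits exactly the iterates of steps 0-5: the working set shrinks from {3,5} to {5} to {}, then grows to {1}, and the method stops at x*=(1.4,1.7) with μ_1=0.8.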
The active set algorithm
Theorem:
If G is strictly positive definite (i.e. the objective function is strictly
convex), then Wk≠Wl for k ≠ l.
Consequently (because there are only finitely many possible
working sets), the active set algorithm terminates in a finite
number of steps.

Note:
In practice it may be that G is indefinite, and that for some
iterations the matrix ZkTGZk is indefinite as well. We know that at
the solution, Z*TGZ* is positive semidefinite, however. In that case,
we can't guarantee termination or convergence.

There are, however, Hessian modification techniques to deal with


this situation.
442 Wolfgang Bangerth
The active set algorithm
Remark:
In the active set method, we only change the working set Wk by at
most one element in each iteration.

One may be tempted to remove all constraints with negative


Lagrange multipliers at once, or add several constraints at the
same time when they become active.

However, we can then no longer guarantee that Wk≠Wl for k ≠ l


and cycling may happen, i.e. we cycle between the same points
and sets xk, Wk.

443 Wolfgang Bangerth


Active set SQP methods for general nonlinear problems

For equality constrained problems of the form

  minimize f(x),   f(x): ℝ^n → ℝ
  g(x) = 0,        g(x): ℝ^n → ℝ^{n_e}

we used the SQP method. It repeatedly solves linear-quadratic
problems of the form

  min m_k(p_k^x) = L(x_k,λ_k) + ∇_x L(x_k,λ_k)^T p_k^x + (1/2) p_k^{xT} ∇_x² L(x_k,λ_k) p_k^x
  g(x_k) + ∇g(x_k)^T p_k^x = 0

Here, each subproblem (a single SQP step) could be solved in
one iteration by solving a saddle point linear system.

444 Wolfgang Bangerth


Part 14

Active Set SQP Methods

  minimize f(x)
  g_i(x) = 0,  i=1,...,n_e
  h_i(x) ≥ 0,  i=1,...,n_i

445 Wolfgang Bangerth


Active set SQP methods for general nonlinear problems

For inequality constrained problems of the form

  minimize f(x)
  g_i(x) = 0,  i=1,...,n_e
  h_i(x) ≥ 0,  i=1,...,n_i

we repeatedly solve linear-quadratic problems of the form

  min m_k(p_k^x) = L(x_k,λ_k) + ∇_x L(x_k,λ_k)^T p_k^x + (1/2) p_k^{xT} ∇_x² L(x_k,λ_k) p_k^x
  g(x_k) + ∇g(x_k)^T p_k^x = 0
  h(x_k) + ∇h(x_k)^T p_k^x ≥ 0

Each of these inequality constrained quadratic problems can be
solved using the active set method, and after we have the
exact solution of this approximate problem we can re-linearize
around this point for the next sub-problem.

446 Wolfgang Bangerth
Active set SQP methods for general nonlinear problems

Note: Each time we solve a problem like

  min m_k(p_k^x) = L(x_k,λ_k) + ∇_x L(x_k,λ_k)^T p_k^x + (1/2) p_k^{xT} ∇_x² L(x_k,λ_k) p_k^x
  g(x_k) + ∇g(x_k)^T p_k^x = 0
  h(x_k) + ∇h(x_k)^T p_k^x ≥ 0

we have to do several active set iterations, though we can start
with the previous step's final working set and solution point.

Nevertheless, this is not going to be cheap, though it is
comparable to iterating over penalty/barrier parameters.

447 Wolfgang Bangerth


Parts 11-14

Summary of methods for
inequality-constrained problems

  minimize f(x)
  g_i(x) = 0,  i=1,...,n_e
  h_i(x) ≥ 0,  i=1,...,n_i

448 Wolfgang Bangerth


Summary of methods

Two approaches to inequality-constrained problems:

● Penalty/barrier methods:
  - Convert the constrained problem into an unconstrained
    one that can be solved with known techniques.
  - Barrier methods ensure that intermediate iterates remain
    feasible with respect to the inequality constraints.

● Lagrange multiplier formulations lead to active set
  methods.

Both kinds of methods are expensive. Penalty/barrier
methods are simpler to implement, but can find minima
located at the boundary of the feasible set only at the
price of dealing with ill-conditioned problems.

449 Wolfgang Bangerth
Part 15

Global optimization

  minimize f(x)
  g_i(x) = 0,  i=1,...,n_e
  h_i(x) ≥ 0,  i=1,...,n_i

450 Wolfgang Bangerth


Motivation

What should we do when asked to find the (global) minimum
of functions like this:

  f(x) = (1/20)(x_1² + x_2²) + cos(x_1) + cos(x_2)

451 Wolfgang Bangerth
A naïve sampling approach

Naïve approach: Sample at M-by-M points and choose the


one with the smallest value.

Alternatively: Start Newton's method at each of these points to


get higher accuracy.

Problem: If we have n variables, then we would have to start
at M^n points. This becomes prohibitive for large n!

452 Wolfgang Bangerth
Monte Carlo sampling

A better strategy ("Monte Carlo" sampling):

● Start with a feasible point x_0
● For k=0,1,2,...:
  - Choose a trial point x_t
  - If f(x_t) ≤ f(x_k) then x_{k+1} = x_t          [accept the sample]
  - Else:
    . draw a random number s in [0,1]
    . if  exp( -(f(x_t) - f(x_k)) / T ) ≥ s  then
        x_{k+1} = x_t                              [accept the sample]
      else
        x_{k+1} = x_k                              [reject the sample]

453 Wolfgang Bangerth
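The acceptance rule above turns into a very small sampler. This sketch uses the lecture's example landscape as reconstructed above; the function name monte_carlo_minimize, the best-point tracking, and the Gaussian proposal with σ=0.25 are illustrative additions (the choice of proposal and σ is discussed a few slides later):

```python
import math
import random

# The lecture's example landscape (as reconstructed from the slides):
f = lambda x: (x[0]**2 + x[1]**2) / 20.0 + math.cos(x[0]) + math.cos(x[1])

def monte_carlo_minimize(f, x0, T=1.0, sigma=0.25, n_samples=20_000, seed=0):
    rng = random.Random(seed)
    x = list(x0)
    best_val, best_x = f(x), list(x)
    for _ in range(n_samples):
        xt = [xi + rng.gauss(0.0, sigma) for xi in x]   # trial point near x_k
        df = f(xt) - f(x)
        # accept downhill moves always, uphill ones with prob. exp(-df/T)
        if df <= 0.0 or math.exp(-df / T) >= rng.random():
            x = xt
            if f(x) < best_val:
                best_val, best_x = f(x), list(x)
    return best_val, best_x

val, xmin = monte_carlo_minimize(f, [4.0, 4.0])
print(round(val, 3))   # close to the global minimum value of about -1.103
```

Tracking the best point seen so far is what turns the sampler into an optimizer: the chain itself keeps wandering, as the sample plots on the following slides show.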
Monte Carlo sampling

Example: The first 200 sample points

454 Wolfgang Bangerth


Monte Carlo sampling

Example: The first 10,000 sample points

455 Wolfgang Bangerth


Monte Carlo sampling

Example: The first 100,000 sample points

456 Wolfgang Bangerth


Monte Carlo sampling

Example: Locations and values of the first 10^5 sample points

457 Wolfgang Bangerth


Monte Carlo sampling

Example: Values of the first 100,000 sample points

Note: The exact minimal value is -1.1032... . In the first


100,000 samples, we have 24 with values f(x)<-1.103.
458 Wolfgang Bangerth
Monte Carlo sampling

How to choose the constant T:

● If T is chosen too small, then the condition

    exp( -(f(x_t) - f(x_k)) / T ) ≥ s,   s ∈ U[0,1]

  will lead to frequent rejections of sample points for which
  f(x) increases.
  Consequently, we will get stuck in local minima for long
  periods of time before we accept a sequence of steps that
  gets us "over the hump".

● On the other hand, if T is chosen too large, then we will
  accept nearly every sample, irrespective of f(x_t).
  Consequently, we will perform a random walk that is no
  more efficient than uniform sampling.

459 Wolfgang Bangerth


Monte Carlo sampling

Example: First 100,000 samples, T=0.1

460 Wolfgang Bangerth


Monte Carlo sampling

Example: First 100,000 samples, T=1

461 Wolfgang Bangerth


Monte Carlo sampling

Example: First 100,000 samples, T=10

462 Wolfgang Bangerth


Monte Carlo sampling

Strategy: Choose T large enough that there is a reasonable
probability to get out of local minima, but small enough that this
doesn't happen too often.

Example: For

  f(x) = (1/20)(x_1² + x_2²) + cos(x_1) + cos(x_2)

the difference in function value between local minima and
saddle points is around Δf = 2. We want to choose T so that

  exp( -Δf / T ) ≥ s,   s ∈ U[0,1]

is true maybe 10% of the time.

This is the case for T=0.87.

463 Wolfgang Bangerth
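The value T=0.87 follows directly from solving exp(-Δf/T) = p for the desired acceptance probability p:

```python
import math

# Acceptance probability of an uphill move of height df is exp(-df/T).
# Choosing T so that a "hump" of height df = 2 is accepted 10% of the time:
df, p = 2.0, 0.1
T = -df / math.log(p)       # exp(-df/T) = p  =>  T = df / ln(1/p)
print(round(T, 2))          # 0.87, the value quoted on the slide
```

Since s is uniform on [0,1], the condition exp(-Δf/T) ≥ s holds with probability exactly exp(-Δf/T), which is what makes this calibration possible.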
Monte Carlo sampling

How to choose the next sample x_t:

● If x_t is chosen independently of x_k then we just sample the
  entire domain, without exploring areas where f(x) is small.
  Consequently, we should choose x_t "close" to x_k.
● If we choose x_t too close to x_k we will have a hard time
  exploring a significant part of the feasible region.
● If we choose x_t in an area around x_k that is too large, then
  we don't adequately explore areas where f(x) is small.

Common strategy: Choose

  x_t = x_k + σ y,   y ∈ N(0,I) or U([-1,1]^n)

where σ is a fraction of the diameter of the domain or of the
distance between local minima.

464 Wolfgang Bangerth
Monte Carlo sampling

Example: First 100,000 samples, T=1, σ=0.05

465 Wolfgang Bangerth


Monte Carlo sampling

Example: First 100,000 samples, T=1, σ=0.25

466 Wolfgang Bangerth


Monte Carlo sampling

Example: First 100,000 samples, T=1, σ=1

467 Wolfgang Bangerth


Monte Carlo sampling

Example: First 100,000 samples, T=1, σ=4

468 Wolfgang Bangerth


Monte Carlo sampling with constraints

Inequality constraints:

For simple inequality constraints, modify sample
generation strategy to never generate infeasible trial
samples

For complex inequality constraints, always reject samples
for which

hi  x t 0 for at least one i

469 Wolfgang Bangerth


Monte Carlo sampling with constraints

Inequality constraints:

For simple inequality constraints, modify the sample
generation strategy to never generate infeasible trial
samples

For complex inequality constraints, always reject infeasible
samples by replacing f with a modified objective Q:
- If Q(x_t) ≤ Q(x_k) then x_{k+1} = x_t
- Else:
  . draw a random number s ∈ U[0,1]
  . if exp[ −( Q(x_t) − Q(x_k) ) / T ] ≥ s
    then x_{k+1} = x_t
    else x_{k+1} = x_k
where
    Q(x) = ∞ if at least one h_i(x) < 0,   Q(x) = f(x) otherwise
470 Wolfgang Bangerth
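A sketch of this acceptance test, assuming the objective f and the constraints h_i are given as plain Python callables (the example f and h at the bottom are purely illustrative):

```python
import math

def make_Q(f, h):
    # Q(x) = +infinity if any h_i(x) < 0, else f(x): infeasible trial
    # samples then lose every comparison and are rejected.
    def Q(x):
        return math.inf if any(hi(x) < 0 for hi in h) else f(x)
    return Q

def accept(Q, x_k, x_t, T, s):
    # The modified Metropolis test from the slide.
    if Q(x_t) <= Q(x_k):
        return True
    return math.exp(-(Q(x_t) - Q(x_k)) / T) >= s

# Illustrative problem: minimize (x - 2)^2 subject to x >= 0.
f = lambda x: (x[0] - 2.0) ** 2
h = [lambda x: x[0]]
Q = make_Q(f, h)
assert accept(Q, [0.5], [1.0], 1.0, 0.5)       # feasible downhill: accept
assert not accept(Q, [0.5], [-1.0], 1.0, 0.5)  # infeasible trial: reject
```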
Monte Carlo sampling with constraints

Equality constraints:

Generate only samples that satisfy equality constraints


If we have only linear equality constraints of the form
    g(x) = Ax − b = 0
then one way to guarantee this is to generate samples
using
    x_t = x_k + σ Z y,   y ∈ ℝ^(n−n_e),   y ∈ N(0, I) or U[−1,1]^(n−n_e)
where Z is the null space matrix of A, i.e. AZ = 0.

471 Wolfgang Bangerth
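For a concrete (hypothetical) case with n = 2 and the single constraint x₁ + x₂ = 1, i.e. A = [1 1], b = 1, one null-space basis is Z = (1, −1)ᵀ, and every trial sample then stays exactly on the constraint:

```python
import random

# Hypothetical constraint: g(x) = x1 + x2 - 1 = 0, so A = [1, 1], b = 1.
# Z = (1, -1)^T spans the null space of A, since A @ Z = 1 - 1 = 0.
Z = (1.0, -1.0)

def trial(x_k, sigma, rng):
    # x_t = x_k + sigma * Z * y with y in R^(n - n_e); here n - n_e = 1.
    y = rng.gauss(0.0, 1.0)
    return [x_k[0] + sigma * Z[0] * y,
            x_k[1] + sigma * Z[1] * y]

rng = random.Random(1)
x = [0.3, 0.7]                             # feasible: x1 + x2 = 1
for _ in range(100):
    x = trial(x, 0.25, rng)
    assert abs(x[0] + x[1] - 1.0) < 1e-9   # still satisfies Ax = b
```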


Monte Carlo sampling

Theorem:
Let A be a subset of the feasible region. Under certain
conditions on the sample generation strategy, then as k ∞
we have
    number of samples x_k ∈ A  ∝  ∫_A e^(−f(x)/T) dx

That is: Every region A will be adequately sampled over time.
Areas around the global minimum will be better sampled than
other regions.

In particular,
    fraction of samples x_k ∈ A  =  (1/C) ∫_A e^(−f(x)/T) dx + O(1/√N)
472 Wolfgang Bangerth
Monte Carlo sampling

Remark:
Monte Carlo sampling appears to be a strategy that bounces
around randomly, only taking into account the values (not the
derivatives) of f(x).

However, that is not so if sample generation strategy and T


are chosen carefully: Then we choose a new sample
moderately close to the previous one, and we always accept it
if f(x) is reduced, whereas we only sometimes accept it if f(x)
is increased by this step.

In other words: On average we still move in the direction of


steepest descent!

473 Wolfgang Bangerth


Monte Carlo sampling

Remark:
Monte Carlo sampling appears to be a strategy that bounces
around randomly, only taking into account the values (not the
derivatives) of f(x).

However, that is not so – because it compares function values.

That said: One can accelerate the Monte Carlo method by


choosing samples from a distribution that is biased towards
the negative gradient direction if the gradient is cheap to
compute.

Such methods are sometimes called Langevin samplers.

474 Wolfgang Bangerth


Simulated Annealing

Motivation:
Particles in a gas, or atoms in a crystal have an energy that is
on average in equilibrium with the rest of the system. At any
given time, however, its energy may be higher or lower.

In particular, the probability that its energy is E is

    P(E) ∝ e^(−E/(k_B T))

where k_B is the Boltzmann constant. Likewise, the probability
that a particle can overcome an energy barrier of height ΔE is

    P(E → E+ΔE) ∝ min{ 1, e^(−ΔE/(k_B T)) }
                = 1 if ΔE ≤ 0,   e^(−ΔE/(k_B T)) if ΔE > 0

This is exactly the Monte Carlo transition probability if we
identify
    ΔE = Δf,   k_B T = T
475 Wolfgang Bangerth
Simulated Annealing

Motivation:
In other words, Monte Carlo sampling is analogous to
watching particles bounce around in a potential f(x) when
driven by a gas at constant temperature.

On the other hand, we know that if we slowly reduce the


temperature of a system, it will end up in the ground state with
very high probability. For example, slowly reducing the
temperature of a melt results in a perfect crystal. (On the other
hand, reducing the temperature too quickly results in a glass.)

The Simulated Annealing algorithm uses this analogy by using


the modified transition probability

    exp[ −( f(x_t) − f(x_k) ) / T_k ] ≥ s,   s ∈ U[0,1],   T_k → 0 as k → ∞

476 Wolfgang Bangerth
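A sketch of simulated annealing on the slides' test function, using the cooling schedule T_k = 1/(1 + 10⁻⁴ k) from the following example; the starting point and step count are illustrative:

```python
import math
import random

def f(x):
    # f(x) = sum_i ( x_i^2/20 + cos(x_i) )
    return sum(xi * xi / 20.0 + math.cos(xi) for xi in x)

def anneal(x0, n_steps, sigma=0.25, seed=0):
    rng = random.Random(seed)
    x = list(x0)
    best = list(x0)
    for k in range(n_steps):
        T_k = 1.0 / (1.0 + 1e-4 * k)     # temperature decreases over time
        x_t = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        df = f(x_t) - f(x)
        # Same Metropolis test as before, but with the shrinking T_k:
        if df <= 0.0 or math.exp(-df / T_k) >= rng.random():
            x = x_t
        if f(x) < f(best):
            best = list(x)
    return best
```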


Simulated Annealing

Example: First 100,000 samples, σ=0.25

T = 1                    T_k = 1 / (1 + 10⁻⁴ k)

477 Wolfgang Bangerth


Simulated Annealing

Example: First 100,000 samples, σ=0.25

T = 1                    T_k = 1 / (1 + 10⁻⁴ k)

24 samples with f(x)<-1.103 192 samples with f(x)<-1.103


478 Wolfgang Bangerth
Simulated Annealing
Convergence: First 1,500 samples,
    f(x) = Σ_{i=1}^{2} ( x_i²/20 + cos(x_i) )

T = 1                    T_k = 1 / (1 + 0.005 k)

(Green line indicates the lowest function value found so far)


479 Wolfgang Bangerth
Simulated Annealing
Convergence: First 10,000 samples,
    f(x) = Σ_{i=1}^{10} ( x_i²/20 + cos(x_i) )

T = 1                    T_k = 1 / (1 + 0.0005 k)

(Green line indicates the lowest function value found so far)


480 Wolfgang Bangerth
Simulated Annealing

Discussion:
Simulated Annealing is often more efficient in finding global
minima because it initially explores the energy landscape at
large, and later on explores the areas of low energy in greater
detail.

On the other hand, there is now another knob to play with


(namely how we reduce the temperature):

● If the temperature is reduced too fast, we may get stuck in
  local minima (the “glass” state)
● If the temperature is not reduced fast enough, the
  algorithm is no better than Monte Carlo sampling and may
  require very many samples.

481 Wolfgang Bangerth


Very Fast Simulated Annealing (VFSA)

A further refinement:
In Very Fast Simulated Annealing we not only reduce
temperature over time, but also reduce the search radius of
our sample generation strategy, i.e. we compute
    x_t = x_k + σ_k y,   y ∈ N(0, I) or U[−1,1]ⁿ
and let
    σ_k → 0 as k → ∞
Like reducing the temperature, this ensures that we sample
the vicinity of minima better and better over time.

Remark: To guarantee that the algorithm can reach any point


in the search domain, we need to choose σ_k so that

    Σ_{k=0}^{∞} σ_k = ∞
482 Wolfgang Bangerth
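This divergence condition rules out, for example, geometric schedules σ_k = σ₀ qᵏ (whose sum is finite, so the walker would eventually freeze in place), while harmonic-type schedules like σ_k = σ₀/(1 + c k) satisfy it; a quick numerical check (with illustrative σ₀ and c) shows the partial sums keep growing:

```python
def partial_sum(n, sigma0=1.0, c=0.005):
    # Partial sums of sigma_k = sigma0 / (1 + c*k); these grow like
    # (sigma0/c) * log(1 + c*n) and therefore diverge as n -> infinity.
    return sum(sigma0 / (1.0 + c * k) for k in range(n))

# 100x more terms more than doubles the sum: no convergence in sight.
assert partial_sum(100_000) > 2.0 * partial_sum(1_000)
```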
Genetic Algorithms (GA)

An entirely different idea:


● Choose a set (“population”) of N points (“individuals”)
  P_0 = {x_1, ..., x_N}
● For k = 0, 1, 2, ... (“generations”):
  ● Copy those N_f < N individuals in P_k with the smallest f(x)
    (i.e. the “fittest individuals”) into P_{k+1}
  ● While #P_{k+1} < N:
    - select two individuals (“parents”) x_a, x_b from among the
      first N_f individuals in P_{k+1} with probabilities
      proportional to e^(−f(x_i)/T)
    - create a new point x_new from x_a, x_b (“mating”)
    - perform some random changes on x_new (“mutation”)
    - add it to P_{k+1}

483 Wolfgang Bangerth
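A minimal sketch of this loop for real-valued x, with mean-value mating and Gaussian mutation as described on the following slides; the population sizes and all other parameter values are illustrative:

```python
import math
import random

def f(x):
    # The slides' test function, again.
    return sum(xi * xi / 20.0 + math.cos(xi) for xi in x)

def ga(n_pop=50, n_fit=20, n_gen=30, T=1.0, sigma=0.25, dim=2, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(n_pop)]
    for _ in range(n_gen):
        pop.sort(key=f)                  # fittest (smallest f) first
        survivors = pop[:n_fit]          # the N_f fittest go into P_{k+1}
        # Selection probabilities proportional to exp(-f(x_i)/T):
        weights = [math.exp(-f(x) / T) for x in survivors]
        children = []
        while n_fit + len(children) < n_pop:
            xa, xb = rng.choices(survivors, weights=weights, k=2)
            child = [(a + b) / 2.0 for a, b in zip(xa, xb)]           # mating
            child = [c + sigma * rng.gauss(0.0, 1.0) for c in child]  # mutation
            children.append(child)
        pop = survivors + children
    return min(pop, key=f)
```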


Genetic Algorithms (GA)

Example: Populations at k = 0, 1, 2, 5, 10, 20; N = 500, N_s = (2/3) N

484 Wolfgang Bangerth


Genetic Algorithms (GA)

Convergence: Values of the N samples for all generations k

f(x) = Σ_{i=1}^{2} ( x_i²/20 + cos(x_i) )        f(x) = Σ_{i=1}^{10} ( x_i²/20 + cos(x_i) )

485 Wolfgang Bangerth


Genetic Algorithms (GA)

Mating:

Mating is meant to produce new individuals that share the
traits of the two parents

If the variable x encodes real values, then mating could just
take the mean value of the parents:
x ax b
x new=
2

For more general properties (paths through cities, which of M
objects to put where in a suitcase, …) we have to encode x in
a binary string. Mating may then select bits (or bit sequences)
randomly from each of the parents


There is a huge variety of encoding and selection strategies
in the literature.

486 Wolfgang Bangerth
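For bit-encoded individuals, one simple (illustrative) mating rule of the kind mentioned above picks each bit of the child at random from one of the two parents:

```python
import random

def mate(parent_a, parent_b, rng):
    # Uniform crossover: each child bit comes from a randomly chosen parent.
    assert len(parent_a) == len(parent_b)
    return ''.join(rng.choice(pair) for pair in zip(parent_a, parent_b))

rng = random.Random(0)
child = mate('110100', '001101', rng)
assert len(child) == 6
# Every child bit equals the corresponding bit of one of the parents:
assert all(c in (a, b) for c, a, b in zip(child, '110100', '001101'))
```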


Genetic Algorithms (GA)

Mutation:

Mutations are meant to introduce an element of randomness
into the process, to explore search directions that aren't
represented yet in the population

If the variable x represents real values, we can just add a
small random value to x to simulate mutations
x ax b n
x new=  y , y ∈ℝ , y=N 0, I 
2

For more general properties, mutations can be introduced by
randomly flipping individual bits or bit sequences in the
encoded properties


There is a huge variety of mutation strategies in the literature.

487 Wolfgang Bangerth
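Correspondingly, for bit-encoded individuals an (illustrative) mutation rule flips each bit independently with a small probability:

```python
import random

def mutate(bits, p_flip, rng):
    # Flip each bit independently with probability p_flip.
    out = []
    for b in bits:
        if rng.random() < p_flip:
            out.append('1' if b == '0' else '0')  # flip this bit
        else:
            out.append(b)                         # keep it unchanged
    return ''.join(out)

rng = random.Random(0)
assert mutate('1010', 0.0, rng) == '1010'   # p_flip = 0: nothing changes
assert mutate('1010', 1.0, rng) == '0101'   # p_flip = 1: every bit flips
```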


Part 15

Summary of
global optimization methods

minimize  f(x)
subject to  g_i(x) = 0,   i = 1, ..., n_e
            h_i(x) ≥ 0,   i = 1, ..., n_i

488 Wolfgang Bangerth


Summary of methods

● Global optimization problems with many minima are
  difficult because of the curse of dimensionality: the
  number of places where a minimum could be becomes
  very large as the number of dimensions grows

● There is a large zoo of methods for these kinds of
  problems

● Most algorithms are stochastic in order to sample the
  feasible region

● The algorithms also work for non-smooth problems

● Most methods are not very efficient (if one counts the
  number of function evaluations), in return for the ability
  to get out of local minima

● Global optimization algorithms should not be used when
  we know that the problem has only a small number of
  minima and/or is smooth and convex
489 Wolfgang Bangerth
