Applied Numerical Optimization: Prof. Alexander Mitsos, Ph.D. Branch & Bound for NLP
B&B Illustration for Box-Constrained NLPs

[Figure: B&B tree. The root domain 𝑋 is branched into child nodes (a) and (b) over subdomains 𝑥𝑎 and 𝑥𝑏; each node carries a lower bound LBD from its relaxation and an upper bound UBD from a local solve.]

1. Construct a relaxation
2. Solve the relaxation → lower bound LBD
3. Solve the original problem locally → upper bound UBD
4. Branch into nodes (a) and (b)
5. Repeat steps 1-4 for each node
6. Fathom by value dominance: a node whose lower bound (e.g. LBD𝑎) is not below the incumbent UBD cannot contain a better solution and is discarded; in the illustration, node (a) is fathomed and only node (b) is explored further.

Range reduction of the variable bounds can additionally tighten each node.

Key questions: How to get lower bounds? How to get upper bounds? (And how to perform range reduction of the variable bounds?)
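To make the loop concrete, here is a minimal one-dimensional sketch of the scheme above. It is illustrative only: `lbd_fn` stands in for the relaxation-based lower bounding of step 2, the "local solve" of step 3 is replaced by a cheap feasible-point evaluation, and the Lipschitz-based example bound is an assumption of this sketch, not part of the lecture.

```python
import heapq
import math

def branch_and_bound(f, lbd_fn, xl, xu, tol=1e-3, max_nodes=100_000):
    """Minimal best-first B&B for min f(x) on [xl, xu] (1-D, box-constrained)."""
    ubd = min(f(xl), f(xu), f(0.5 * (xl + xu)))   # incumbent upper bound
    heap = [(lbd_fn(xl, xu), xl, xu)]             # nodes ordered by their LBD
    for _ in range(max_nodes):
        if not heap:
            break
        lbd, a, b = heapq.heappop(heap)
        if lbd >= ubd - tol:        # fathom by value dominance; this node has
            break                   # the lowest LBD, so the search has converged
        m = 0.5 * (a + b)           # branch at the midpoint
        ubd = min(ubd, f(m))        # "local solve": any feasible point gives a UBD
        for lo, hi in ((a, m), (m, b)):            # child nodes (a) and (b)
            heapq.heappush(heap, (lbd_fn(lo, hi), lo, hi))
    return ubd

# usage: f(x) = x sin(x) on [-1, 3]; |f'(x)| <= 1 + |x| <= 4 on this box,
# so f(m) - 4 (b - a)/2 is a valid (if crude) lower bound on [a, b]
f = lambda x: x * math.sin(x)
lbd_fn = lambda a, b: f(0.5 * (a + b)) - 2.0 * (b - a)
print(branch_and_bound(f, lbd_fn, -1.0, 3.0))     # global minimum is 0 at x = 0
```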
Check Yourself
• Many variants, e.g.:
  piecewise: first decompose the domain, here [𝑥ᴸ, 𝑥ᵁ] = [−1, 3] into [−1, 1] and [1, 3], and relax each piece separately
  exponential: underestimate 𝑓 by f(\mathbf{x}) - \sum_i \left(1 - \exp(\gamma_i (x_i - x_i^L))\right)\left(1 - \exp(\gamma_i (x_i^U - x_i))\right)
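A small sketch of the exponential variant under the formula above: the returned function coincides with 𝑓 at the corners of the box and lies below 𝑓 in its interior; the crucial step of choosing each 𝛾ᵢ large enough that the result is actually convex is not shown here.

```python
import numpy as np

def exp_underestimator(f, gamma, xL, xU):
    """Build g(x) = f(x) - sum_i (1 - exp(gamma_i (x_i - xL_i)))
                             * (1 - exp(gamma_i (xU_i - x_i))).

    Inside the box both factors are negative, so their product is
    positive and g(x) <= f(x); at a corner of the box every term
    vanishes and g coincides with f there."""
    gamma, xL, xU = map(np.asarray, (gamma, xL, xU))
    def g(x):
        x = np.asarray(x, dtype=float)
        mod = (1.0 - np.exp(gamma * (x - xL))) * (1.0 - np.exp(gamma * (xU - x)))
        return f(x) - mod.sum()
    return g
```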
\min_{x,\,z_1,\,z_2} \; z_1 + z_2
\text{s.t.} \;\; \mathrm{lin}(\mathrm{cv}(\exp x)) \le z_1 \le \mathrm{lin}(\mathrm{cc}(\exp x))
\;\;\;\; \mathrm{lin}(\mathrm{cv}(-x^3)) \le z_2 \le \mathrm{lin}(\mathrm{cc}(-x^3))
\;\;\;\; x \in [-1, 1.5]

[Figure: −𝑥³ on [−1, 1.5] with its convex underestimator cv(−𝑥³) and concave overestimator cc(−𝑥³).]
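One way to realize the lin(cv(·)) and lin(cc(·)) bounds for the convex factor exp 𝑥 (a sketch; the helper name and the midpoint tangent are choices of this illustration): since exp is convex, cv(exp 𝑥) is exp 𝑥 itself and any tangent is a valid linear underestimator, while cc(exp 𝑥) is the secant through the interval endpoints, which is already linear.

```python
import math

def lin_bounds_exp(xL, xU):
    """Linear under-/overestimators of exp on [xL, xU], as (slope, intercept).

    exp is convex, so a tangent (here: at the midpoint) underestimates it,
    and the secant through the endpoints is its concave envelope on the box."""
    xm = 0.5 * (xL + xU)
    under = (math.exp(xm), math.exp(xm) * (1.0 - xm))          # tangent line
    s = (math.exp(xU) - math.exp(xL)) / (xU - xL)
    over = (s, math.exp(xL) - s * xL)                          # secant line
    return under, over

# sanity check on the example domain x in [-1, 1.5]
(us, ub), (vs, vb) = lin_bounds_exp(-1.0, 1.5)
for x in (-1.0, 0.3, 1.5):
    assert us * x + ub <= math.exp(x) <= vs * x + vb + 1e-12
```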
• Describe methods to obtain underestimating functions. What are the underlying assumptions?
Cluster effect: Du & Kearfott, JOGO (1994); Wechsung, Schaber & Barton, JOGO (2014)
Convergence rate: Bompadre & Mitsos, JOGO (2012); Najman & Mitsos, JOGO (2016)
Convergence Rate of Relaxations: Properties
• What does convergence of relaxations mean? How do we measure the convergence? What convergence
properties are established for standard relaxations?
• NLP (full space):

\min_{k,\,T_i} \sum_{i=0}^{n+1} (T_i^m - T_i)^2
\text{s.t.} \;\; \frac{T_{i-1} - 2T_i + T_{i+1}}{\Delta x^2} = -\frac{q_i^0 + q_i^1 T_i}{k}, \quad i \in \{1, \dots, n\}
T_0 = 500, \quad T_{n+1} = 600, \quad k \in [0.1, 10], \quad T_i \in [0, 2000]

Dimensions: 1 = dim(𝑘) ≪ dim(𝐓) = 99
• NLP (reduced space), again with 1 = dim(𝑘) ≪ dim(𝐓) = 99:

\min_{k} \sum_{i=0}^{n+1} \left(T_i^m - f_i(k)\right)^2
\text{s.t.} \;\; T_0 = 500, \quad T_{n+1} = 600, \quad k \in [0.1, 10]

where 𝐓 = 𝒇(𝑘) is obtained by solving the tridiagonal linear system resulting from the discretized energy balance:

\begin{pmatrix}
1 & & & & \\
1 & -2 + \frac{\Delta x^2 q_1^1}{k} & 1 & & \\
& \ddots & \ddots & \ddots & \\
& & 1 & -2 + \frac{\Delta x^2 q_n^1}{k} & 1 \\
& & & & 1
\end{pmatrix}
\begin{pmatrix} T_0 \\ T_1 \\ \vdots \\ T_n \\ T_{n+1} \end{pmatrix}
=
\begin{pmatrix} 500 \\ -\frac{\Delta x^2 q_1^0}{k} \\ \vdots \\ -\frac{\Delta x^2 q_n^0}{k} \\ 600 \end{pmatrix}
[7] Epperly & Pistikopoulos, JOGO 11(3), 287-311 (1997)
[8] Byrne & Bogle, Ind. Eng. Chem. Res. 39(11), 4296-4301 (2000)
[9] Mitsos, Chachuat & Barton, SIOPT 20(2), 573-601 (2009)
[10] Bongartz & Mitsos, JOGO 69(4), 761-796 (2017)
[11] Bongartz & Mitsos, JOGO 69(4), 761-796 (2018)
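A minimal sketch of the reduced-space idea: for fixed 𝑘 the discretized energy balance is linear in 𝐓, so 𝐓 = 𝒇(𝑘) follows from one tridiagonal solve and the optimizer only sees the single variable 𝑘. The data (q_i^0, q_i^1, the measurements T^m) are placeholders invented for the sketch.

```python
import numpy as np

def temperatures(k, q0, q1, dx, T_left=500.0, T_right=600.0):
    """Solve the discretized energy balance for T at a given conductivity k.

    Rows 1..n encode T_{i-1} + (-2 + dx^2 q1_i / k) T_i + T_{i+1}
    = -dx^2 q0_i / k; the first and last rows fix the boundary values."""
    n = len(q0)
    A = np.zeros((n + 2, n + 2))
    b = np.zeros(n + 2)
    A[0, 0], b[0] = 1.0, T_left        # T_0 = 500
    A[-1, -1], b[-1] = 1.0, T_right    # T_{n+1} = 600
    for i in range(1, n + 1):
        A[i, i - 1] = 1.0
        A[i, i] = -2.0 + dx**2 * q1[i - 1] / k
        A[i, i + 1] = 1.0
        b[i] = -dx**2 * q0[i - 1] / k
    return np.linalg.solve(A, b)       # T = f(k)

def reduced_objective(k, Tm, q0, q1, dx):
    """Least-squares mismatch to measurements Tm; k is the only variable."""
    return float(np.sum((Tm - temperatures(k, q0, q1, dx)) ** 2))

# usage with made-up data: n = 99 free temperatures, synthetic measurements
n, dx = 99, 1.0 / 100
q0, q1 = np.full(n, 1e4), np.full(n, 50.0)    # hypothetical source coefficients
Tm = temperatures(2.0, q0, q1, dx)            # "measurements" generated at k = 2
print(reduced_objective(2.0, Tm, q0, q1, dx)) # 0 at the true k
```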
[1] Chachuat et al., IFAC-PapersOnLine 48(8) (2015)
[2] Bendtsen & Stauning, v2.1 (2012)
[3] COIN-OR CLP v1.17 (2019)
[4] IBM ILOG CPLEX v12.8 (2017)
[5] Wächter & Biegler, Mathematical Programming 106(1) (2006)
[6] Johnson, The NLopt nonlinear-optimization package
[7] Artelys Knitro v11.1.0 (2018)
• What is the benefit of using the reduced space instead of the full space?
• Typical formulation

\min_{\mathbf{x} \in \Omega} f(\mathbf{x}), \quad \Omega = \left\{ \mathbf{x} \in \mathbb{R}^n \,\middle|\, c_i(\mathbf{x}) \le 0,\ i \in I,\ \mathbf{x}^L \le \mathbf{x} \le \mathbf{x}^U \right\}

• Basic idea

[Figure: flowchart. After initialization, the optimization algorithm (configured by algorithmic parameters) sends inputs 𝒙 with 𝒙ᴸ ≤ 𝒙 ≤ 𝒙ᵁ to a black-box evaluation of the objective function, receives 𝑓(𝒙) and whether 𝒄(𝒙) ≤ 𝟎, and eventually outputs the optimal solution.]
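The black-box coupling in the figure can be pictured as a plain callback interface; a minimal sketch (the wrapper and names are invented for illustration):

```python
from typing import Callable, Sequence, Tuple

# the algorithm only ever sees this signature: x in, (f(x), feasible?) out
BlackBox = Callable[[Sequence[float]], Tuple[float, bool]]

def make_black_box(f, cons) -> BlackBox:
    """Wrap an objective f and constraints c_i into one oracle that
    returns f(x) and whether all c_i(x) <= 0 hold."""
    def oracle(x):
        return f(x), all(c(x) <= 0.0 for c in cons)
    return oracle

# usage: f(x) = x1^2 + x2^2 subject to x1 + x2 - 1 <= 0
oracle = make_black_box(lambda x: x[0]**2 + x[1]**2,
                        [lambda x: x[0] + x[1] - 1.0])
print(oracle([0.3, 0.4]))   # (0.25, True)
```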
Stochastic Global Optimization
• Advantages: robust; no derivatives required; easy to implement and to parallelize, and parallelization is efficient
• Drawbacks: slower than gradient-based local methods; no rigorous termination criteria; no guarantee of finitely finding the global optimum, or even a feasible point; no certificate of optimality
• Important consequences of the no-free-lunch theorems:
  Comparisons between algorithms are difficult.
  If possible, tune your algorithm to your problems.
  If you have no knowledge about your problem, try many algorithms.

Wolpert, D.H. and Macready, W.G., "No free lunch theorems for optimization." IEEE Transactions on Evolutionary Computation 1(1), 67-82 (1997)
Random Search
• Random search (see the sketch after these bullets):
  start from an initial point 𝒙⁽⁰⁾
  randomly choose a new iterate 𝒙⁽ᵏ⁺¹⁾
  compare 𝑓(𝒙⁽ᵏ⁺¹⁾) with the best value found so far, 𝑓*, and update the incumbent if it improves
• How many initial guesses?
  In theory we would like to cover the space, but this scales exponentially with the number of variables.
  In practice: determined by how long you are willing to wait.
• Parallelization
  No communication between instances is required → submit them as separate processes.
  Instances may run long without progress → limit the CPU time of each.
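A minimal random-search sketch matching the steps above (uniform sampling in the box is an assumed proposal; many variants exist):

```python
import random

def random_search(f, xl, xu, iters=10_000, seed=0):
    """Minimize f over the box [xl, xu] by uniform random sampling."""
    rng = random.Random(seed)   # per-instance seed: instances don't communicate
    n = len(xl)
    x_best = [rng.uniform(xl[i], xu[i]) for i in range(n)]    # x^(0)
    f_best = f(x_best)
    for _ in range(iters):
        x = [rng.uniform(xl[i], xu[i]) for i in range(n)]     # new iterate x^(k+1)
        fx = f(x)
        if fx < f_best:                                       # update incumbent f*
            x_best, f_best = x, fx
    return x_best, f_best

# usage: each parallel instance gets its own seed and CPU-time budget
x, fx = random_search(lambda x: (x[0] - 1.0)**2 + x[1]**2, [-5, -5], [5, 5])
```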
• Describe multistart.
• What is the basic idea of stochastic global algorithms? What are their properties, advantages and
disadvantages?
• Accept survivors based on a merit function and on their distance from previous members
  Merit function: tradeoff between objective value and constraint violation
• Generate new members by mutation: perturb entries randomly
  Moving around in the space provides the local-search component
• Hybridize with a deterministic local solver: run the local solver from promising points
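A compact sketch of these mechanics, using a penalty-based merit function and Gaussian mutation (both common but assumed choices; the distance criterion is omitted):

```python
import random

def evolve(f, cons, pop0, gens=100, sigma=0.1, penalty=1e3, seed=0):
    """Toy evolutionary loop: mutation plus merit-based survival."""
    rng = random.Random(seed)

    def merit(x):   # tradeoff of objective and constraint violation c_i(x) <= 0
        return f(x) + penalty * sum(max(0.0, c(x)) for c in cons)

    pop, mu = list(pop0), len(pop0)           # mu survivors per generation
    for _ in range(gens):
        # mutation: perturb every entry of every member randomly
        children = [[xi + rng.gauss(0.0, sigma) for xi in x] for x in pop]
        # survival by merit (a distance/diversity criterion would go here)
        pop = sorted(pop + children, key=merit)[:mu]
    return pop[0]

# usage: minimize x1^2 + x2^2 subject to x1 + x2 >= 1
best = evolve(lambda x: x[0]**2 + x[1]**2,
              [lambda x: 1.0 - x[0] - x[1]],
              [[2.0, 2.0], [0.0, 3.0], [3.0, 0.0]])
```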
Derivative-free Optimization
• Gradient evaluation by finite differences is prone to errors when function evaluations are inaccurate: with noise of size 𝜀 in 𝑓, a forward difference with step ℎ carries a noise error of order 𝜀/ℎ on top of the 𝑂(ℎ) truncation error
[Figure: contour plot (levels 9, 11.5, 13) illustrating coordinate-search iterates 𝒙⁽ᵏ⁾, 𝒙⁽ᵏ⁺²⁾ with step size 𝛿.]
Basic algorithm (coordinate search):
• choose component 𝑖 of 𝒙⁽ᵏ⁾ sequentially and descend in this direction
• after 𝑛 steps, start from the beginning or reverse the sequence
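A minimal sketch of this cyclic coordinate search with a fixed trial step 𝛿 that is shrunk when a full sweep brings no progress (the accept/reject rule and the shrinking factor are assumptions of the sketch):

```python
def coordinate_search(f, x0, delta=0.5, shrink=0.5, tol=1e-6, max_sweeps=1000):
    """Derivative-free cyclic coordinate search for min f(x)."""
    x, fx, n = list(x0), f(x0), len(x0)
    for _ in range(max_sweeps):
        improved = False
        for i in range(n):                  # choose component i sequentially
            for step in (delta, -delta):    # descend along this direction
                trial = x.copy()
                trial[i] += step
                ft = f(trial)
                if ft < fx:
                    x, fx, improved = trial, ft, True
                    break
        if not improved:                    # no progress in a full sweep:
            delta *= shrink                 # refine the step size
            if delta < tol:
                break
    return x, fx

x, fx = coordinate_search(lambda x: (x[0] - 1.0)**2 + 10.0 * (x[1] + 2.0)**2,
                          [0.0, 0.0])
```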