
Optimization Methods

in Engineering Design
Day-5
Course Materials
• Arora, Introduction to Optimum Design, 3e, Elsevier
(https://www.researchgate.net/publication/273120102_Introduction_to_Optimum_design)
• Parkinson, Optimization Methods for Engineering Design, Brigham Young University
(http://apmonitor.com/me575/index.php/Main/BookChapters)
• Iqbal, Fundamental Engineering Optimization Methods, BookBoon
(https://bookboon.com/en/fundamental-engineering-optimization-methods-ebook)
Numerical Optimization
• Consider an unconstrained NLP problem: min𝒙 𝑓(𝒙)
• Use an iterative method to solve the problem: 𝒙𝑘+1 = 𝒙𝑘 + 𝛼𝑘 𝒅𝑘 ,
where 𝒅𝑘 is a search direction and 𝛼𝑘 is the step size, such that the
function value decreases at each step, i.e., 𝑓 𝒙𝑘+1 < 𝑓 𝒙𝑘
• We expect lim𝑘→∞ 𝒙𝑘 = 𝒙∗
• The general iterative method is a two-step process:
– Finding a suitable search direction 𝒅𝑘 along which the function
value locally decreases and any constraints are obeyed.
– Performing line search along 𝒅𝑘 to find 𝒙𝑘+1 such that 𝑓 𝒙𝑘+1
attains its minimum value.
The Iterative Method
• Iterative algorithm:
1. Initialize: choose 𝒙0
2. Check termination: 𝛻𝑓 𝒙𝑘 ≅ 0
3. Find a suitable search direction 𝒅𝑘 ,
that obeys the descent condition:
𝛻𝑓(𝒙𝑘)𝑇𝒅𝑘 < 0
4. Search along 𝒅𝑘 to find where
𝑓 𝒙𝑘+1 attains minimum value
(line search problem)
5. Return to step 2
The Line Search Problem
• Assuming a suitable search direction 𝒅𝑘 has been determined, we
seek to determine a step length 𝛼𝑘 , that minimizes 𝑓 𝒙𝑘+1 .
• Assuming 𝒙𝑘 and 𝒅𝑘 are known, the projected function value along
𝒅𝑘 is expressed as:
𝑓(𝒙𝑘+1) = 𝑓(𝒙𝑘 + 𝛼𝒅𝑘) = 𝑓(𝛼)
• The line search problem, to choose 𝛼 to minimize 𝑓(𝒙𝑘+1) along 𝒅𝑘, is defined as:
min𝛼 𝑓(𝛼) = 𝑓(𝒙𝑘 + 𝛼𝒅𝑘)
• Assuming that a solution exists, it is found by setting 𝑓′ 𝛼 = 0.
Example: Quadratic Function
• Consider minimizing a quadratic function:
𝑓(𝒙) = ½𝒙𝑇𝑨𝒙 − 𝒃𝑇𝒙, 𝛻𝑓 = 𝑨𝒙 − 𝒃
• Given a descent direction 𝒅, the line search problem is defined as:
min𝛼 𝑓(𝛼) = ½(𝒙𝑘 + 𝛼𝒅)𝑇𝑨(𝒙𝑘 + 𝛼𝒅) − 𝒃𝑇(𝒙𝑘 + 𝛼𝒅)
• A solution is found by setting 𝑓 ′ 𝛼 = 0, where
𝑓′(𝛼) = 𝒅𝑇𝑨(𝒙𝑘 + 𝛼𝒅) − 𝒅𝑇𝒃 = 0
𝛼 = −𝒅𝑇(𝑨𝒙𝑘 − 𝒃) / 𝒅𝑇𝑨𝒅 = −𝛻𝑓(𝒙𝑘)𝑇𝒅 / 𝒅𝑇𝑨𝒅
• Finally, 𝒙𝑘+1 = 𝒙𝑘 + 𝛼𝒅.
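The closed-form step size above is easy to sanity-check numerically; a minimal Python sketch (the 2×2 matrix 𝑨 and vector 𝒃 below are illustrative choices, not from the slides):

```python
# Exact line search for f(x) = 1/2 x'Ax - b'x along a direction d:
# alpha = -grad(x)'d / (d'Ad), with grad(x) = Ax - b.

def matvec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def exact_step(A, b, x, d):
    g = [gi - bi for gi, bi in zip(matvec(A, x), b)]   # gradient A x - b
    return -dot(g, d) / dot(d, matvec(A, d))

A = [[2.0, 0.0], [0.0, 4.0]]   # illustrative SPD matrix
b = [1.0, 1.0]
x = [0.0, 0.0]
g = [gi - bi for gi, bi in zip(matvec(A, x), b)]
d = [-gi for gi in g]           # steepest-descent direction
alpha = exact_step(A, b, x, d)  # = 1/3 for this data
x1 = [xi + alpha * di for xi, di in zip(x, d)]
```

At the returned step, 𝑓′(𝛼) = 𝒅𝑇(𝑨𝒙𝑘+1 − 𝒃) = 0, which is the defining condition of the exact line search.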
Computer Methods for Line Search Problem
• Interval reduction methods
– Golden search
– Fibonacci search
• Approximate search methods
– Armijo's rule
– Quadratic curve fitting
Interval Reduction Methods
• The interval reduction methods find the minimum of a unimodal
function in two steps:
– Bracketing the minimum to an interval
– Reducing the interval to desired accuracy
• The bracketing step aims to find a three-point pattern, such that for
𝑥1 , 𝑥2 , 𝑥3 , 𝑓 𝑥1 ≥ 𝑓 𝑥2 < 𝑓 𝑥3 .
Fibonacci’s Method
• The Fibonacci’s method uses Fibonacci numbers to achieve
maximum interval reduction in a given number of steps.
• The Fibonacci number sequence is generated as:
𝐹0 = 𝐹1 = 1, 𝐹𝑖 = 𝐹𝑖−1 + 𝐹𝑖−2 , 𝑖 ≥ 2.
• The properties of Fibonacci numbers include:
– They achieve the golden ratio 𝜏 = lim𝑛→∞ 𝐹𝑛−1/𝐹𝑛 = (√5 − 1)/2 ≅ 0.618034
– The number of interval reductions 𝑛 required to achieve a desired
accuracy 𝜀 (where 1/𝐹𝑛 < 𝜀) is specified in advance.
– For given 𝐼1 and 𝑛: 𝐼2 = (𝐹𝑛−1/𝐹𝑛)𝐼1, 𝐼3 = 𝐼1 − 𝐼2, 𝐼4 = 𝐼2 − 𝐼3, etc.
The Golden Section Method
• The golden section method uses the golden ratio: 𝜏 = 0.618034.
• The golden section algorithm is given as:
1. Initialize: specify 𝑥1, 𝑥4 (𝐼1 = 𝑥4 − 𝑥1), 𝜀, 𝑛: 𝜏ⁿ < 𝜀/𝐼1
2. Compute 𝑥2 = 𝜏𝑥1 + (1 − 𝜏)𝑥4; evaluate 𝑓2
3. For 𝑖 = 1, …, 𝑛 − 1: compute 𝑥3 = (1 − 𝜏)𝑥1 + 𝜏𝑥4 and evaluate 𝑓3; if 𝑓2 < 𝑓3, set 𝑥4 ← 𝑥1, 𝑥1 ← 𝑥3; else set 𝑥1 ← 𝑥2, 𝑥2 ← 𝑥3, 𝑓2 ← 𝑓3
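The interval-reduction idea can be sketched in Python; this is a standard golden-section variant (it shrinks [a, b] directly rather than using the slide's endpoint-swapping bookkeeping), tried on the function 𝑒^(−𝛼) + 𝛼² used as an example later in these slides:

```python
import math

def golden_section(f, a, b, n=30):
    """Golden-section search: each iteration shrinks [a, b] by tau."""
    tau = (math.sqrt(5) - 1) / 2          # 0.618034
    x2 = b - tau * (b - a)                # interior points
    x3 = a + tau * (b - a)
    f2, f3 = f(x2), f(x3)
    for _ in range(n):
        if f2 < f3:                        # minimum lies in [a, x3]
            b, x3, f3 = x3, x2, f2
            x2 = b - tau * (b - a)
            f2 = f(x2)
        else:                              # minimum lies in [x2, b]
            a, x2, f2 = x2, x3, f3
            x3 = a + tau * (b - a)
            f3 = f(x3)
    return (a + b) / 2

f = lambda a: math.exp(-a) + a * a        # example used later in the slides
alpha_min = golden_section(f, 0.0, 1.0)   # ≈ 0.3517
```

After 𝑛 iterations the bracket width is 𝜏ⁿ(𝑏 − 𝑎), matching the termination rule 𝜏ⁿ < 𝜀/𝐼1 in step 1 above.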
Approximate Search Methods
• Consider the line search problem: min𝛼 𝑓(𝛼) = 𝑓(𝒙𝑘 + 𝛼𝒅𝑘)
• Sufficient Descent Condition. The sufficient descent condition guards against 𝒅𝑘 becoming nearly orthogonal to 𝛻𝑓(𝒙𝑘). The condition is stated as:
𝛻𝑓(𝒙𝑘)𝑇𝒅𝑘 < −𝑐‖𝛻𝑓(𝒙𝑘)‖², 𝑐 > 0
• Sufficient Decrease Condition. The sufficient decrease condition ensures
a nontrivial reduction in the function value. The condition is stated as:
𝑓(𝒙𝑘 + 𝛼𝒅𝑘) − 𝑓(𝒙𝑘) ≤ 𝜇𝛼𝛻𝑓(𝒙𝑘)𝑇𝒅𝑘, 0 < 𝜇 < 1
• Curvature Condition. The curvature condition guards against 𝛼 becoming
too small. The condition is stated as:
𝛻𝑓(𝒙𝑘 + 𝛼𝒅𝑘)𝑇𝒅𝑘 ≥ 𝜂𝛻𝑓(𝒙𝑘)𝑇𝒅𝑘, 0 < 𝜇 < 𝜂 < 1
Approximate Line Search
• Strong Wolfe Conditions. The strong Wolfe conditions commonly used in line search algorithms include:
1. The sufficient decrease condition (Armijo's rule):
𝑓(𝛼) ≤ 𝑓(0) + 𝜇𝛼𝑓′(0), 0 < 𝜇 < 1
2. The strong curvature condition:
|𝑓′(𝛼)| ≤ 𝜂|𝑓′(0)|, 0 < 𝜇 ≤ 𝜂 < 1
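Armijo's rule lends itself to a simple backtracking sketch; the helper below halves a trial step until the sufficient decrease condition holds, using the example 𝑓(𝛼) = 𝑒^(−𝛼) + 𝛼² that appears later in these slides (for which 𝑓(0) = 1, 𝑓′(0) = −1):

```python
import math

def backtrack(f, f0, df0, mu=0.2, alpha=1.0):
    """Halve alpha until Armijo's rule f(a) <= f(0) + mu*a*f'(0) holds."""
    while f(alpha) > f0 + mu * alpha * df0:
        alpha *= 0.5
    return alpha

f = lambda a: math.exp(-a) + a * a   # f(0) = 1, f'(0) = -1
alpha = backtrack(f, 1.0, -1.0)      # accepts alpha = 0.5
```

For this function, 𝛼 = 1 is rejected (𝑓(1) ≈ 1.368 > 0.8) and 𝛼 = 0.5 is accepted (𝑓(0.5) ≈ 0.857 ≤ 0.9).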
Approximate Line Search
• The approximate line search includes two steps:
– Bracketing the minimum
– Estimating the minimum
• Bracketing the Minimum. In the bracketing step we seek an interval [𝛼𝑙, 𝛼𝑢] such that 𝑓′(𝛼𝑙) < 0 and 𝑓′(𝛼𝑢) > 0.
– Since 𝑓′(0) < 0 for any descent direction, 𝛼𝑙 = 0 serves as a lower bound on 𝛼. To find an upper bound, gradually increase 𝛼, e.g., 𝛼 = 1, 2, …
– Assume that for some 𝛼𝑖 > 0, we get 𝑓′(𝛼𝑖) < 0 and 𝑓′(𝛼𝑖+1) > 0; then, 𝛼𝑖+1 serves as an upper bound.
Approximate Line Search
• Estimating the Minimum. Once the minimum has been bracketed
to a small interval, a quadratic or cubic polynomial approximation is
used to find the minimizer.
• If the polynomial minimizer 𝛼 satisfies the strong Wolfe conditions for the desired 𝜇 and 𝜂 values (say 𝜇 = 0.2, 𝜂 = 0.5), it is taken as the function minimizer.
• Otherwise, 𝛼 replaces 𝛼𝑙 or 𝛼𝑢, and the polynomial approximation step is repeated.
Quadratic Curve Fitting
• Assuming that the interval 𝛼𝑙 , 𝛼𝑢 contains the minimum of a
unimodal function, 𝑓 𝛼 , its quadratic approximation, given as:
𝑞 𝛼 = 𝑎0 + 𝑎1 𝛼 + 𝑎2 𝛼 2 , is obtained using three points
𝛼𝑙 , 𝛼𝑚 , 𝛼𝑢 , where the mid-point may be used for 𝛼𝑚
The quadratic coefficients {𝑎0, 𝑎1, 𝑎2} are solved as:
𝑎2 = [ (𝑓(𝛼𝑢) − 𝑓(𝛼𝑙))/(𝛼𝑢 − 𝛼𝑙) − (𝑓(𝛼𝑚) − 𝑓(𝛼𝑙))/(𝛼𝑚 − 𝛼𝑙) ] / (𝛼𝑢 − 𝛼𝑚)
𝑎1 = (𝑓(𝛼𝑚) − 𝑓(𝛼𝑙))/(𝛼𝑚 − 𝛼𝑙) − 𝑎2(𝛼𝑙 + 𝛼𝑚)
𝑎0 = 𝑓(𝛼𝑙) − 𝑎1𝛼𝑙 − 𝑎2𝛼𝑙²
Then, the minimum is given as: 𝛼𝑚𝑖𝑛 = −𝑎1/2𝑎2
Example: Approximate Search
• Let 𝑓 𝛼 = 𝑒 −𝛼 + 𝛼 2 , 𝑓 ′ 𝛼 = 2𝛼 − 𝑒 −𝛼 , 𝑓 0 = 1, 𝑓 ′ 0 = −1.
Let 𝜇 = 0.2, and try 𝛼 = 0.1, 0.2, …, to bracket the minimum.
• From the sufficient decrease condition, the minimum is bracketed
in the interval: [0, 0.5]
• Using quadratic approximation, the minimum is found as: 𝛼∗ = 0.3531
The exact solution is given as: 𝛼𝑚𝑖𝑛 = 0.3517
• The Matlab commands are:
Define the function:
f=@(x) x.*x+exp(-x);
mu=0.2; al=0:.1:1;
Example: Approximate Search
• Bracketing the minimum:
f1=feval(f,al)
1.0000 0.9148 0.8587 0.8308 0.8303 0.8565 0.9088 0.9866
1.0893 1.2166 1.3679
>> f2=f(0)-mu*al
1.0000 0.9800 0.9600 0.9400 0.9200 0.9000 0.8800 0.8600
0.8400 0.8200 0.8000
>> idx=find(f1<=f2)
• Quadratic approximation to find the minimum:
al=0; am=0.25; au=0.5;
a2 = ((f(au)-f(al))/(au-al)-(f(am)-f(al))/(am-al))/(au-am);
a1 = (f(am)-f(al))/(am-al)-a2*(al+am);
xmin = -a1/a2/2 % 0.3531
Computer Methods for Finding the Search Direction
• Gradient based methods
– Steepest descent method
– Conjugate gradient method
– Quasi Newton methods
• Hessian based methods
– Newton’s method
– Trust region methods
Steepest Descent Method
• The steepest descent method determines the search direction as:
𝒅𝑘 = −𝛻𝑓(𝒙𝑘 ),
• The update rule is given as: 𝒙𝑘+1 = 𝒙𝑘 − 𝛼𝑘 ∙ 𝛻𝑓(𝒙𝑘 )
where 𝛼𝑘 is determined by minimizing 𝑓(𝒙𝑘+1 ) along 𝒅𝑘
• Example: quadratic function
𝑓(𝒙) = ½𝒙𝑇𝑨𝒙 − 𝒃𝑇𝒙, 𝛻𝑓 = 𝑨𝒙 − 𝒃
Then, 𝒙𝑘+1 = 𝒙𝑘 − 𝛼 ∙ 𝛻𝑓(𝒙𝑘); 𝛼 = 𝛻𝑓(𝒙𝑘)𝑇𝛻𝑓(𝒙𝑘) / 𝛻𝑓(𝒙𝑘)𝑇𝑨𝛻𝑓(𝒙𝑘)
Define 𝒓𝑘 = 𝒃 − 𝑨𝒙𝑘; then, 𝒙𝑘+1 = 𝒙𝑘 + 𝛼𝑘𝒓𝑘; 𝛼𝑘 = 𝒓𝑘𝑇𝒓𝑘 / 𝒓𝑘𝑇𝑨𝒓𝑘
Steepest Descent Algorithm
• Initialize: choose 𝒙0
• For 𝑘 = 0,1,2, …
– Compute 𝛻𝑓(𝒙𝑘 )
– Check convergence: if 𝛻𝑓(𝒙𝑘 ) < 𝜖, stop.
– Set 𝒅𝑘 = −𝛻𝑓(𝒙𝑘 )
– Line search problem: find min𝛼≥0 𝑓(𝒙𝑘 + 𝛼𝒅𝑘)
– Set 𝒙𝑘+1 = 𝒙𝑘 + 𝛼𝒅𝑘 .
Example: Steepest Descent
• Consider min𝒙 𝑓(𝒙) = 0.1𝑥1² + 𝑥2²,
𝛻𝑓(𝒙) = [0.2𝑥1, 2𝑥2]𝑇, 𝛻²𝑓(𝒙) = diag(0.2, 2); let 𝒙0 = [5, 1]𝑇, then 𝑓(𝒙0) = 3.5,
𝒅0 = −𝛻𝑓(𝒙0) = [−1, −2]𝑇, 𝛼 = 0.61
𝒙1 = [4.39, −0.22]𝑇, 𝑓(𝒙1) = 1.98
Continuing..
Example: Steepest Descent
• MATLAB code:
H=[.2 0;0 2];
f=@(x) x'*H*x/2; df=@(x) H*x; ddf=H;
x=[5;1];
xall=x';
for i=1:10
d=-df(x);
a=d'*d/(d'*H*d);
x=x+a*d;
xall=[xall;x'];
end
plot(xall(:,1),xall(:,2)), grid
axis([-1 5 -1 5]), axis equal
Steepest Descent Method
• The steepest descent method becomes slow close to the optimum
• The method progresses in a zigzag fashion, since
𝑑/𝑑𝛼 𝑓(𝒙𝑘 + 𝛼𝒅𝑘) = 𝛻𝑓(𝒙𝑘+1)𝑇𝒅𝑘 = −𝛻𝑓(𝒙𝑘+1)𝑇𝛻𝑓(𝒙𝑘) = 0
• The method has linear convergence with rate constant
𝐶 = (𝑓(𝒙𝑘+1) − 𝑓(𝒙∗)) / (𝑓(𝒙𝑘) − 𝑓(𝒙∗)) ≤ [ (𝑐𝑜𝑛𝑑(𝑨) − 1) / (𝑐𝑜𝑛𝑑(𝑨) + 1) ]²
Preconditioning
• Preconditioning (scaling) can be used to reduce the condition
number of the Hessian matrix and hence aid convergence
• Consider 𝑓 𝒙 = 0.1𝑥12 + 𝑥22 = 𝒙𝑇 𝑨𝒙, where 𝑨 = 𝑑𝑖𝑎𝑔(0.1, 1)
• Define a linear transformation: 𝒙 = 𝑷𝒚, where 𝑷 = 𝑑𝑖𝑎𝑔(√10, 1);
then, 𝑓 𝒙 = 𝒚𝑇 𝑷𝑇 𝑨𝑷𝒚 = 𝒚𝑇 𝒚
• Since 𝑐𝑜𝑛𝑑 𝑰 = 1, the steepest descent method in the case of a
quadratic function converges in a single iteration
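The effect of the scaling can be checked numerically; a minimal Python sketch of exact-step steepest descent for diagonal quadratics 𝑓 = Σ 𝑎𝑖𝑥𝑖² (plain lists; one iteration on the scaled problem lands on the minimum, while the unscaled problem is still far off):

```python
import math

def steepest_descent_diag(a, x, iters):
    """Exact-step steepest descent for f(x) = sum(a_i * x_i^2), A diagonal."""
    for _ in range(iters):
        g = [2 * ai * xi for ai, xi in zip(a, x)]          # gradient
        gg = sum(gi * gi for gi in g)
        if gg == 0:
            break
        gAg = sum(ai * gi * gi for ai, gi in zip(a, g))    # g'Ag
        alpha = gg / (2 * gAg)                             # exact step size
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x

x0 = [5.0, 1.0]
y0 = [x0[0] / math.sqrt(10), x0[1]]                 # transformed start, x = P y
x_slow = steepest_descent_diag([0.1, 1.0], x0, 1)   # original: still far off
y_fast = steepest_descent_diag([1.0, 1.0], y0, 1)   # scaled: exact minimum
```

For 𝑓 = 𝒚𝑇𝒚 the exact step is 𝛼 = ½, so 𝒚1 = 𝒚0 − ½ ∙ 2𝒚0 = 𝟎 in a single iteration, as claimed above.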
Conjugate Gradient Method
• For any square matrix 𝑨, the set of 𝑨-conjugate vectors is defined
by: 𝒅𝑖𝑇𝑨𝒅𝑗 = 0, 𝑖 ≠ 𝑗
• Let 𝒈𝑘 = 𝛻 𝑓 𝒙𝑘 denote the gradient; then, starting from
𝒅0 = −𝒈0 , a set of 𝑨-conjugate directions is generated as:
𝒅0 = −𝒈0; 𝒅𝑘+1 = −𝒈𝑘+1 + 𝛽𝑘𝒅𝑘, 𝑘 ≥ 0,
where 𝛽𝑘 = 𝒈𝑘+1𝑇𝑨𝒅𝑘 / 𝒅𝑘𝑇𝑨𝒅𝑘
There are multiple ways to generate conjugate directions
• Using {𝒅0, 𝒅1, …, 𝒅𝑛−1} as search directions, a quadratic function is minimized in 𝑛 steps.
Conjugate Directions Method
• The parameter 𝛽𝑘 can be computed in different ways:
– By substituting 𝑨𝒅𝑘 = (1/𝛼𝑘)(𝒈𝑘+1 − 𝒈𝑘), we obtain:
𝛽𝑘 = 𝒈𝑘+1𝑇(𝒈𝑘+1 − 𝒈𝑘) / 𝒅𝑘𝑇(𝒈𝑘+1 − 𝒈𝑘) (the Hestenes–Stiefel formula)
– In the case of exact line search, 𝒈𝑘+1𝑇𝒅𝑘 = 0; then
𝛽𝑘 = 𝒈𝑘+1𝑇(𝒈𝑘+1 − 𝒈𝑘) / 𝒈𝑘𝑇𝒈𝑘 (the Polak–Ribière formula)
– Also, for exact line search, 𝒈𝑘+1𝑇𝒈𝑘 = 𝛽𝑘−1(𝒈𝑘 + 𝛼𝑘𝑨𝒅𝑘)𝑇𝒅𝑘−1 = 0,
resulting in 𝛽𝑘 = 𝒈𝑘+1𝑇𝒈𝑘+1 / 𝒈𝑘𝑇𝒈𝑘 (the Fletcher–Reeves formula)
Other versions of 𝛽𝑘 have also been proposed.
Example: Conjugate Gradient Method
• Consider min𝒙 𝑓(𝒙) = 0.1𝑥1² + 𝑥2²,
𝛻𝑓(𝒙) = [0.2𝑥1, 2𝑥2]𝑇, 𝛻²𝑓(𝒙) = diag(0.2, 2); let 𝒙0 = [5, 1]𝑇, then 𝑓(𝒙0) = 3.5,
𝒅0 = −𝛻𝑓(𝒙0) = [−1, −2]𝑇, 𝛼 = 0.61
𝒙1 = [4.39, −0.22]𝑇, 𝑓(𝒙1) = 1.98
𝛽0 = 0.19
𝒅1 = [−0.535, 0.027]𝑇, 𝛼 = 8.2
𝒙2 = [0, 0]𝑇
Example: Conjugate Gradient Method
• MATLAB code
H=[.2 0;0 2];
f=@(x) x'*H*x/2; df=@(x) H*x; ddf=H;
x=[5;1]; n=2;
xall=zeros(n+1,n); xall(1,:)=x';
d=-df(x); a=d'*d/(d'*H*d);
x=x+a*d; xall(2,:)=x';
for i=1:size(x,1)-1
b=df(x)'*H*d/(d'*H*d);
d=-df(x)+b*d;
r=-df(x);
a=r'*r/(d'*H*d);
x=x+a*d;
xall(i+2,:)=x';
end
plot(xall(:,1),xall(:,2)), grid
axis([-1 5 -1 5]), axis equal
Conjugate Gradient Algorithm
• Conjugate-Gradient Algorithm (Griva, Nash & Sofer, p454):
• Initialize: Choose 𝒙0 = 𝟎, 𝒓0 = 𝒃, 𝒅(−1) = 0, 𝛽0 = 0.
• For 𝑖 = 0,1, …
– Check convergence: if ‖𝒓𝑖‖ < 𝜖, stop.
– If 𝑖 > 0, set 𝛽𝑖 = 𝒓𝑖𝑇𝒓𝑖 / 𝒓𝑖−1𝑇𝒓𝑖−1
– Set 𝒅𝑖 = 𝒓𝑖 + 𝛽𝑖𝒅𝑖−1; 𝛼𝑖 = 𝒓𝑖𝑇𝒓𝑖 / 𝒅𝑖𝑇𝑨𝒅𝑖; 𝒙𝑖+1 = 𝒙𝑖 + 𝛼𝑖𝒅𝑖;
𝒓𝑖+1 = 𝒓𝑖 − 𝛼𝑖𝑨𝒅𝑖.
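The algorithm above transcribes almost line for line; a Python sketch for a small SPD system (the 2×2 data is an illustrative choice, not from the slides):

```python
# Linear conjugate-gradient iteration: x0 = 0, r0 = b, d(-1) = 0, beta0 = 0.

def matvec(A, x):
    return [sum(Aij * xj for Aij, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def conjugate_gradient(A, b, tol=1e-10):
    n = len(b)
    x = [0.0] * n              # x0 = 0, so r0 = b
    r = list(b)
    d = [0.0] * n
    rr_old = 0.0
    for _ in range(n + 1):
        rr = dot(r, r)
        if rr < tol * tol:     # convergence check on ||r||
            break
        beta = 0.0 if rr_old == 0.0 else rr / rr_old
        d = [ri + beta * di for ri, di in zip(r, d)]
        Ad = matvec(A, d)
        alpha = rr / dot(d, Ad)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * Adi for ri, Adi in zip(r, Ad)]
        rr_old = rr
    return x

A = [[4.0, 1.0], [1.0, 3.0]]   # illustrative SPD matrix
b = [1.0, 2.0]
x = conjugate_gradient(A, b)    # converges in n = 2 steps
```

For this 2×2 system the exact solution is 𝒙 = [1/11, 7/11], reached in 𝑛 = 2 iterations as the theory predicts.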
Conjugate Gradient Method
• Assume that an update that includes steps 𝛼𝑖 along 𝑛 conjugate vectors 𝒅𝑖 is assembled as: 𝒚 = Σ𝑖=1..𝑛 𝛼𝑖𝒅𝑖.
• Then, for a quadratic function, the minimization problem is decomposed into a set of one-dimensional problems, i.e.,
min𝒚 𝑓(𝒚) ≡ Σ𝑖=1..𝑛 min𝛼𝑖 [ ½𝛼𝑖²𝒅𝑖𝑇𝑨𝒅𝑖 − 𝛼𝑖𝒃𝑇𝒅𝑖 ]
• By setting the derivative with respect to 𝛼𝑖 equal to zero, i.e., 𝛼𝑖𝒅𝑖𝑇𝑨𝒅𝑖 − 𝒃𝑇𝒅𝑖 = 0, we obtain: 𝛼𝑖 = 𝒃𝑇𝒅𝑖 / 𝒅𝑖𝑇𝑨𝒅𝑖.
• This shows that the CG algorithm iteratively determines the
conjugate directions 𝒅𝑖 and their coefficients 𝛼𝑖 .
CG Rate of Convergence
• Conjugate gradient methods achieve superlinear convergence:
– In the case of quadratic functions, the minimum is reached exactly
in 𝑛 iterations.
– For general nonlinear functions, convergence in 2𝑛 iterations is to
be expected.
• Nonlinear CG methods typically have the lowest per iteration
computational costs of all gradient methods.
Newton’s Method
• Consider minimizing the second order approximation of 𝑓(𝒙):
minΔ𝒙 𝑓(𝒙𝑘 + Δ𝒙) = 𝑓(𝒙𝑘) + 𝛻𝑓(𝒙𝑘)𝑇Δ𝒙 + ½Δ𝒙𝑇𝑯𝑘Δ𝒙
• Apply FONC: 𝑯𝑘𝒅 + 𝒈𝑘 = 𝟎, where 𝒈𝑘 = 𝛻𝑓(𝒙𝑘)
Then, assuming that 𝑯𝑘 = 𝛻²𝑓(𝒙𝑘) stays positive definite, the Newton's update rule is derived as: 𝒙𝑘+1 = 𝒙𝑘 − 𝑯𝑘⁻¹𝒈𝑘
• Note:
– The convergence of the Newton’s method is dependent on 𝑯𝑘
staying positive definite.
– A step size may be included in the Newton's method, i.e.,
𝒙𝑘+1 = 𝒙𝑘 − 𝛼𝑘𝑯𝑘⁻¹𝒈𝑘
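In one dimension the update reduces to 𝑥 ← 𝑥 − 𝑓′(𝑥)/𝑓′′(𝑥); a minimal sketch applied to the earlier line-search example 𝑓(𝛼) = 𝑒^(−𝛼) + 𝛼², whose 𝑓′′ = 2 + 𝑒^(−𝛼) > 0 everywhere, so the positive-definiteness condition holds:

```python
import math

def newton_1d(df, ddf, x0, tol=1e-10, iters=50):
    """Newton's iteration x <- x - f'(x)/f''(x) for a 1-D function."""
    x = x0
    for _ in range(iters):
        step = df(x) / ddf(x)
        x -= step
        if abs(step) < tol:
            break
    return x

df = lambda a: 2 * a - math.exp(-a)      # f'(a)
ddf = lambda a: 2 + math.exp(-a)         # f''(a) > 0 for all a
a_min = newton_1d(df, ddf, 1.0)          # ≈ 0.3517, matching the earlier slide
```

The quadratic local convergence is visible here: a handful of iterations from 𝑥0 = 1 already drive |𝑓′| to machine precision.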
Marquardt Modification to Newton’s Method
• To ensure the positive definite condition on 𝑯𝑘 , Marquardt
proposed the following modification to Newton’s method:
𝑯𝑘 + 𝜆𝑰 𝒅 = −𝒈𝑘
where 𝜆 is selected to ensure that the Hessian is positive definite.
• Since 𝑯𝑘 + 𝜆𝑰 is also symmetric, the resulting system of linear
equations can be solved for 𝒅 as:
𝑳𝑫𝑳𝑇 𝒅 = −𝛻𝑓 𝒙𝑘
Newton’s Algorithm
Newton’s Method (Griva, Nash, & Sofer, p. 373):
1. Initialize: Choose 𝒙0 , specify 𝜖
2. For 𝑘 = 0,1, …
3. Check convergence: If 𝛻𝑓 𝒙𝑘 < 𝜖, stop
4. Factorize modified Hessian as 𝛻 2 𝑓 𝒙𝑘 + 𝑬 = 𝑳𝑫𝑳𝑇 and solve
𝑳𝑫𝑳𝑇 𝒅 = −𝛻𝑓 𝒙𝑘 for 𝒅
5. Perform line search to determine 𝛼𝑘 and update the solution
estimate as 𝒙𝑘+1 = 𝒙𝑘 + 𝛼𝑘 𝒅𝑘
Rate of Convergence
• Newton’s method achieves quadratic rate of convergence in the
close neighborhood of the optimal point, and superlinear
convergence otherwise.
• The main drawback of the Newton’s method is its computational
cost: the Hessian matrix needs to be computed at every step, and a
linear system of equations needs to be solved to obtain the update.
• Due to the high computational and storage costs, classic Newton’s
method is rarely used in practice.
Quasi Newton’s Methods
• The quasi-Newton methods derive from a generalization of the secant method, which approximates the second derivative as:
𝑓′′(𝑥𝑘) ≅ (𝑓′(𝑥𝑘) − 𝑓′(𝑥𝑘−1)) / (𝑥𝑘 − 𝑥𝑘−1)
• In the multi-dimensional case, the secant condition is generalized
as: 𝑯𝑘 𝒙𝑘 − 𝒙𝑘−1 = 𝛻𝑓 𝒙𝑘 − 𝛻𝑓 𝒙𝑘−1
• Define the inverse 𝑭𝑘 = 𝑯𝑘⁻¹; then
𝒙𝑘 − 𝒙𝑘−1 = 𝑭𝑘 𝛻𝑓 𝒙𝑘 − 𝛻𝑓 𝒙𝑘−1
• The quasi-Newton methods iteratively update 𝑯𝑘 or 𝑭𝑘 as:
– Direct update: 𝑯𝑘+1 = 𝑯𝑘 + ∆𝑯𝑘, 𝑯0 = 𝑰
– Inverse update: 𝑭𝑘+1 = 𝑭𝑘 + ∆𝑭𝑘, 𝑭 = 𝑯⁻¹, 𝑭0 = 𝑰
Quasi-Newton Methods
• Quasi-Newton update:
Let 𝒔𝑘 = 𝒙𝑘+1 − 𝒙𝑘, 𝒚𝑘 = 𝛻𝑓(𝒙𝑘+1) − 𝛻𝑓(𝒙𝑘); then,
– The DFP (Davidon–Fletcher–Powell) formula for inverse Hessian update is given as:
𝑭𝑘+1 = 𝑭𝑘 − (𝑭𝑘𝒚𝑘)(𝑭𝑘𝒚𝑘)𝑇 / (𝒚𝑘𝑇𝑭𝑘𝒚𝑘) + 𝒔𝑘𝒔𝑘𝑇 / (𝒚𝑘𝑇𝒔𝑘)
– The BFGS (Broyden–Fletcher–Goldfarb–Shanno) formula for direct Hessian update is given as:
𝑯𝑘+1 = 𝑯𝑘 − (𝑯𝑘𝒔𝑘)(𝑯𝑘𝒔𝑘)𝑇 / (𝒔𝑘𝑇𝑯𝑘𝒔𝑘) + 𝒚𝑘𝒚𝑘𝑇 / (𝒚𝑘𝑇𝒔𝑘)
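A defining property of the BFGS update is that 𝑯𝑘+1 satisfies the secant condition 𝑯𝑘+1𝒔𝑘 = 𝒚𝑘; the sketch below checks this for one update, with the (𝒔, 𝒚) pair borrowed from the worked quasi-Newton example that follows:

```python
# One BFGS direct-Hessian update (2x2, plain lists), verifying H1 s = y.

def matvec(H, v):
    return [sum(Hij * vj for Hij, vj in zip(row, v)) for row in H]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def bfgs_update(H, s, y):
    """H1 = H - (Hs)(Hs)'/(s'Hs) + yy'/(y's)."""
    Hs = matvec(H, s)
    sHs = dot(s, Hs)
    ys = dot(y, s)                        # must be > 0 (curvature condition)
    n = len(s)
    return [[H[i][j] - Hs[i] * Hs[j] / sHs + y[i] * y[j] / ys
             for j in range(n)] for i in range(n)]

H0 = [[1.0, 0.0], [0.0, 1.0]]             # H0 = I
s = [-0.9375, -0.3125]                    # step x1 - x0 (example below)
y = [-3.4375, 0.3125]                     # gradient change (example below)
H1 = bfgs_update(H0, s, y)
```

Algebraically, 𝑯1𝒔 = 𝑯0𝒔 − 𝑯0𝒔 + 𝒚 = 𝒚, and the same cancellation holds at every iteration, which is what makes the update a valid secant approximation.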
Quasi-Newton Algorithm
The Quasi-Newton Algorithm (Griva, Nash & Sofer, p.415):
• Initialize: Choose 𝒙0 , 𝑯0 (e.g., 𝑯0 = 𝑰), specify 𝜀
• For 𝑘 = 0,1, …
– Check convergence: If 𝛻𝑓 𝒙𝑘 < 𝜀, stop
– Solve 𝑯𝑘 𝒅 = −𝛻𝑓 𝒙𝑘 for 𝒅𝑘 (alternatively, 𝒅 = −𝑭𝑘 𝛻𝑓 𝒙𝑘 )
– Solve min𝛼 𝑓(𝒙𝑘 + 𝛼𝒅𝑘) for 𝛼𝑘, and update the current estimate:
𝒙𝑘+1 = 𝒙𝑘 + 𝛼𝑘𝒅𝑘
– Compute 𝒔𝑘 , 𝒚𝑘 , and update 𝑯𝑘 (or 𝑭𝑘 as applicable)
Example: Quasi-Newton Method
• Consider the problem: min𝑥1,𝑥2 𝑓(𝑥1, 𝑥2) = 2𝑥1² − 𝑥1𝑥2 + 𝑥2², where
𝑯 = [4 −1; −1 2], 𝛻𝑓 = 𝑯𝒙. Let 𝒙0 = [1, 1]𝑇, 𝑓0 = 2, 𝑯0 = 𝑰, 𝑭0 = 𝑰;
Choose 𝒅0 = −𝛻𝑓(𝒙0) = [−3, −1]𝑇;
then 𝑓(𝛼) = 2(1 − 3𝛼)² + (1 − 𝛼)² − (1 − 3𝛼)(1 − 𝛼).
Using 𝑓′(𝛼) = 0 → 𝛼 = 5/16 → 𝒙1 = [0.0625, 0.6875]𝑇, 𝑓1 = 0.4375;
then 𝒚1 = [−3.44, 0.313]𝑇, 𝑭1 = [1.193 0.065; 0.065 1.022], 𝑯1 = [0.381 −0.206; −0.206 0.9313],
and using either update formula 𝒅1 = [0.4375, −1.313]𝑇; for the next step,
𝑓(𝛼) = 2.68𝛼² − 1.91𝛼 + 0.4375 → 𝛼 = 0.3572, 𝒙2 = [0.2188, 0.2188]𝑇.
Example: Quasi-Newton Method
• For a quadratic function, convergence is achieved in two iterations.
Trust-Region Methods
• The trust-region methods locally employ a quadratic approximation
𝑞𝑘 𝒙𝑘 to the nonlinear objective function.
• The approximation is valid in the neighborhood of 𝒙𝑘 defined by
Ω𝑘 = 𝒙: 𝚪(𝒙 − 𝒙𝑘 ) ≤ ∆𝑘 , where 𝚪 is a scaling parameter.
• The method aims to find a 𝒙𝑘+1 ∈ Ω𝑘 , that satisfies the sufficient
decrease condition in 𝑓(𝒙).
• The quality of the quadratic approximation is estimated by the reliability index: 𝛾𝑘 = (𝑓(𝒙𝑘) − 𝑓(𝒙𝑘+1)) / (𝑞𝑘(𝒙𝑘) − 𝑞𝑘(𝒙𝑘+1)). If this ratio is close to unity, the trust region may be expanded in the next iteration.
Trust-Region Methods
• At each iteration 𝑘, trust-region algorithm solves a constrained
optimization sub-problem involving quadratic approximation:
min𝒅 𝑞𝑘(𝒅) = 𝑓(𝒙𝑘) + 𝛻𝑓(𝒙𝑘)𝑇𝒅 + ½𝒅𝑇𝛻²𝑓(𝒙𝑘)𝒅
Subject to: ‖𝒅‖ ≤ ∆𝑘
Lagrangian function: ℒ(𝒅, 𝜆) = 𝑞𝑘(𝒅) + 𝜆(‖𝒅‖ − ∆𝑘)
FONC: (𝛻²𝑓(𝒙𝑘) + 𝜆𝑰)𝒅𝑘 = −𝛻𝑓(𝒙𝑘), 𝜆(‖𝒅‖ − ∆𝑘) = 0
• The resulting search direction 𝒅𝑘 is given as: 𝒅𝑘 = 𝒅𝑘 (𝜆).
– For large ∆𝑘 and a positive-definite 𝛻 2 𝑓 𝒙𝑘 , the Lagrange
multiplier 𝜆 → 0, and 𝒅𝑘 (𝜆) reduces to the Newton’s direction.
– For ∆𝑘 → 0, 𝜆 → ∞, and 𝒅𝑘 (𝜆) aligns with the steepest-descent
direction.
Trust-Region Algorithm
• Trust-Region Algorithm (Griva, Nash & Sofer, p.392):
• Initialize: choose 𝒙0, ∆0; specify 𝜀, 0 < 𝜇 < 𝜂 < 1 (e.g., 𝜇 = ¼; 𝜂 = ¾)
• For 𝑘 = 0,1, …
– Check convergence: If ‖𝛻𝑓(𝒙𝑘)‖ < 𝜀, stop
– Solve the subproblem: min𝒅 𝑞𝑘(𝒅) subject to ‖𝒅‖ ≤ ∆𝑘
– Compute 𝛾𝑘,
• if 𝛾𝑘 < 𝜇, set 𝒙𝑘+1 = 𝒙𝑘, ∆𝑘+1 = ½∆𝑘
• else if 𝛾𝑘 < 𝜂, set 𝒙𝑘+1 = 𝒙𝑘 + 𝒅𝑘, ∆𝑘+1 = ∆𝑘
• else set 𝒙𝑘+1 = 𝒙𝑘 + 𝒅𝑘, ∆𝑘+1 = 2∆𝑘
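The 𝛾-based accept/shrink/expand logic can be isolated as a small helper; a sketch with the thresholds 𝜇 = ¼, 𝜂 = ¾ used in the algorithm above (the subproblem solve itself is not shown):

```python
# Trust-region bookkeeping: gamma is the reliability index, mu/eta thresholds.

def trust_region_update(x, d, gamma, delta, mu=0.25, eta=0.75):
    """Return (next point, next radius) per the accept/expand rules."""
    if gamma < mu:                        # poor model: reject step, shrink
        return x, 0.5 * delta
    x_new = [xi + di for xi, di in zip(x, d)]
    if gamma < eta:                       # acceptable: accept, keep radius
        return x_new, delta
    return x_new, 2.0 * delta             # very good model: accept, expand

x_good, delta_good = trust_region_update([1.0, 1.0], [0.5, 0.0], 0.9, 1.0)
x_bad, delta_bad = trust_region_update([1.0, 1.0], [0.5, 0.0], 0.1, 1.0)
```

A rejected step keeps the current iterate and halves ∆; a step with 𝛾 near unity doubles ∆ for the next iteration.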
Computer Methods for Constrained Problems
• Penalty and Barrier methods
• Augmented Lagrangian method (AL)
• Sequential linear programming (SLP)
• Sequential quadratic programming (SQP)
Penalty and Barrier Methods
• Consider the general optimization problem: min𝒙 𝑓(𝒙)
Subject to: ℎ𝑖(𝒙) = 0, 𝑖 = 1, …, 𝑝;
𝑔𝑗(𝒙) ≤ 0, 𝑗 = 1, …, 𝑚;
𝑥𝑖𝐿 ≤ 𝑥𝑖 ≤ 𝑥𝑖𝑈, 𝑖 = 1, …, 𝑛.
• Define a composite function to be used for constraint compliance:
Φ 𝒙, 𝑟 = 𝑓 𝒙 + 𝑃 𝑔 𝒙 , ℎ 𝒙 , 𝒓
where 𝑃 defines a loss function, and 𝒓 is a vector of weights (penalty
parameters)
Penalty and Barrier Methods
• Penalty Function Method. A penalty function method employs a
quadratic loss function and iterates through the infeasible region
𝑃(𝑔(𝒙), ℎ(𝒙), 𝒓) = 𝑟[ Σ𝑖 (𝑔𝑖⁺(𝒙))² + Σ𝑖 (ℎ𝑖(𝒙))² ],
𝑔𝑖⁺(𝒙) = max(0, 𝑔𝑖(𝒙)), 𝑟 > 0
• Barrier Function Method. A barrier method employs a log barrier function and iterates through the feasible region
𝑃(𝑔(𝒙), ℎ(𝒙), 𝒓) = −(1/𝑟) Σ𝑖 log(−𝑔𝑖(𝒙))
• For both penalty and barrier methods, as 𝑟 → ∞, 𝒙(𝑟) → 𝒙∗
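A minimal sketch of the penalty iteration on a toy problem of my own (not from the slides): minimize 𝑓(𝑥) = 𝑥² subject to ℎ(𝑥) = 𝑥 − 1 = 0. The penalized function is Φ(𝑥, 𝑟) = 𝑥² + 𝑟(𝑥 − 1)², whose unconstrained minimizers 𝑥(𝑟) = 𝑟/(1 + 𝑟) approach 𝒙∗ = 1 from the infeasible side as 𝑟 grows:

```python
# Quadratic-penalty method on: min x^2  s.t.  x - 1 = 0.

def minimize_1d(phi, lo, hi, iters=100):
    """Ternary search for the minimum of a unimodal 1-D function."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if phi(m1) < phi(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

xs = []
for r in (1.0, 10.0, 100.0, 1000.0):
    phi = lambda x, r=r: x * x + r * (x - 1.0) ** 2   # penalized objective
    xs.append(minimize_1d(phi, 0.0, 2.0))
# xs increases monotonically toward the constrained minimizer x* = 1
```

This also illustrates the practical drawback: the exact solution is only reached in the limit 𝑟 → ∞, where Φ becomes ill-conditioned.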
The Augmented Lagrangian Method
• Consider an equality-constrained problem: min 𝑓 𝒙
𝒙
Subject to: ℎ𝑖 𝒙 = 0, 𝑖 = 1, … , 𝑙
• Define the augmented Lagrangian (AL) as:
𝒫(𝒙, 𝒗, 𝑟) = 𝑓(𝒙) + Σ𝑗 [ 𝑣𝑗ℎ𝑗(𝒙) + ½𝑟ℎ𝑗²(𝒙) ]
where the additional term defines an exterior penalty function with
𝑟 as the penalty parameter.
• For inequality constrained problems, the AL may be defined as:
𝒫(𝒙, 𝒖, 𝑟) = 𝑓(𝒙) + Σ𝑖 { 𝑢𝑖𝑔𝑖(𝒙) + ½𝑟𝑔𝑖²(𝒙), if 𝑔𝑖 + 𝑢𝑖/𝑟 ≥ 0; −𝑢𝑖²/2𝑟, if 𝑔𝑖 + 𝑢𝑖/𝑟 < 0 }
where a large 𝑟 makes the Hessian of AL positive definite at 𝒙.
The Augmented Lagrangian Method
• The dual function for the AL is defined as:
𝜓(𝒗) = min𝒙 𝒫(𝒙, 𝒗, 𝑟) = min𝒙 [ 𝑓(𝒙) + Σ𝑗 ( 𝑣𝑗ℎ𝑗(𝒙) + ½𝑟ℎ𝑗²(𝒙) ) ]
• The resulting dual optimization problem is: max𝒗 𝜓(𝒗)
• The dual problem may be solved via Newton's method as:
𝒗𝑘+1 = 𝒗𝑘 − [𝑑²𝜓/𝑑𝑣𝑖𝑑𝑣𝑗]⁻¹𝒉
where 𝑑²𝜓/𝑑𝑣𝑖𝑑𝑣𝑗 = −𝛻ℎ𝑖𝑇(𝛻²𝒫)⁻¹𝛻ℎ𝑗
• For large 𝒓, the Newton's update may be approximated as:
𝑣𝑗𝑘+1 = 𝑣𝑗𝑘 + 𝑟𝑗ℎ𝑗, 𝑗 = 1, …, 𝑙
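The approximate multiplier update 𝑣 ← 𝑣 + 𝑟ℎ(𝒙) can be demonstrated on a toy problem of my own (not from the slides): minimize 𝑥² subject to 𝑥 − 1 = 0, for which 𝒫(𝑥, 𝑣, 𝑟) = 𝑥² + 𝑣(𝑥 − 1) + ½𝑟(𝑥 − 1)² has the closed-form inner minimizer 𝑥 = (𝑟 − 𝑣)/(2 + 𝑟) (set 𝑑𝒫/𝑑𝑥 = 2𝑥 + 𝑣 + 𝑟(𝑥 − 1) = 0):

```python
# Augmented-Lagrangian iteration on: min x^2  s.t.  h(x) = x - 1 = 0.

r = 10.0        # fixed penalty parameter
v = 0.0         # multiplier estimate
for _ in range(20):
    x = (r - v) / (2.0 + r)    # exact inner minimization of P(x, v, r)
    h = x - 1.0                # constraint violation
    v = v + r * h              # multiplier update v <- v + r*h
# converges to x* = 1, v* = -2 (where grad f + v*grad h = 2x + v = 0)
```

Unlike the plain penalty method, the multiplier update reaches the exact constrained solution with a finite 𝑟: here the multiplier error contracts by a factor 2/(2 + 𝑟) per iteration.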
Example: Augmented Lagrangian
• Maximize the volume of a cylindrical tank subject to surface area
constraint:
max𝑑,𝑙 𝑓(𝑑, 𝑙) = 𝜋𝑑²𝑙/4, subject to ℎ: 𝜋𝑑²/4 + 𝜋𝑑𝑙 − 𝐴0 = 0
• We can normalize the problem as:
min𝑑,𝑙 𝑓(𝑑, 𝑙) = −𝑑²𝑙, subject to ℎ: 𝑑² + 4𝑑𝑙 − 1 = 0
• The solution to the primal problem is obtained as:
Lagrangian function: ℒ(𝑑, 𝑙, 𝜆) = −𝑑²𝑙 + 𝜆(𝑑² + 4𝑑𝑙 − 1)
FONC: 𝜆(𝑑 + 2𝑙) − 𝑑𝑙 = 0, 4𝜆𝑑 − 𝑑² = 0, 𝑑² + 4𝑑𝑙 − 1 = 0
Optimal solution: 𝑑∗ = 2𝑙∗ = 4𝜆∗ = 1/√3.
Example: Augmented Lagrangian
• Alternatively, define the Augmented Lagrangian function as:
𝒫(𝑑, 𝑙, 𝜆, 𝑟) = −𝑑²𝑙 + 𝜆(𝑑² + 4𝑑𝑙 − 1) + ½𝑟(𝑑² + 4𝑑𝑙 − 1)²
• Define the dual function: 𝜓(𝜆) = min𝑑,𝑙 𝒫(𝑑, 𝑙, 𝜆, 𝑟)
• Define the dual optimization problem: max𝜆 𝜓(𝜆)
• Solution to the dual problem: 𝜆∗ = 𝜆𝑚𝑎𝑥 = 0.144
• Solution to the design variables: 𝑑∗ = 2𝑙∗ = 0.577
Sequential Linear Programming
• Consider the general optimization problem: min𝒙 𝑓(𝒙)
Subject to: ℎ𝑖(𝒙) = 0, 𝑖 = 1, …, 𝑝;
𝑔𝑗(𝒙) ≤ 0, 𝑗 = 1, …, 𝑚;
𝑥𝑖𝐿 ≤ 𝑥𝑖 ≤ 𝑥𝑖𝑈, 𝑖 = 1, …, 𝑛.
• Let 𝒙𝑘 denote the current estimate of the design variables, and let
𝒅 denote the change in variables; define the first order expansion
of the objective and constraint functions in the neighborhood of 𝒙𝑘
𝑓(𝒙𝑘 + 𝒅) = 𝑓(𝒙𝑘) + 𝛻𝑓(𝒙𝑘)𝑇𝒅
𝑔𝑖(𝒙𝑘 + 𝒅) = 𝑔𝑖(𝒙𝑘) + 𝛻𝑔𝑖(𝒙𝑘)𝑇𝒅, 𝑖 = 1, …, 𝑚
ℎ𝑗(𝒙𝑘 + 𝒅) = ℎ𝑗(𝒙𝑘) + 𝛻ℎ𝑗(𝒙𝑘)𝑇𝒅, 𝑗 = 1, …, 𝑙
Sequential Linear Programming
• Let 𝑓𝑘 = 𝑓(𝒙𝑘), 𝑔𝑖𝑘 = 𝑔𝑖(𝒙𝑘), ℎ𝑗𝑘 = ℎ𝑗(𝒙𝑘); 𝑏𝑖 = −𝑔𝑖𝑘, 𝑒𝑗 = −ℎ𝑗𝑘,
𝒄 = 𝛻𝑓(𝒙𝑘), 𝒂𝑖 = 𝛻𝑔𝑖(𝒙𝑘), 𝒏𝑗 = 𝛻ℎ𝑗(𝒙𝑘),
𝑨 = [𝒂1, 𝒂2, …, 𝒂𝑚], 𝑵 = [𝒏1, 𝒏2, …, 𝒏𝑙].
• Using first order expansion, define an LP subprogram for the
current iteration of the NLP problem:
min𝒅 𝑓 = 𝒄𝑇𝒅
Subject to: 𝑨𝑇 𝒅 ≤ 𝒃,
𝑵𝑇 𝒅 = 𝒆
where 𝑓 represents first-order change in the cost function, and the
columns of 𝑨 and 𝑵 matrices represent, respectively, the gradients
of inequality and equality constraints.
• The resulting LP problem can be solved via the Simplex method.
Sequential Linear Programming
• We may note that:
– Since both positive and negative changes to design variables 𝒙𝑘 are
allowed, the variables 𝑑𝑖 are unrestricted in sign
– The SLP method requires additional constraints of the form:
− ∆𝑘𝑖𝑙 ≤ 𝑑𝑖𝑘 ≤ ∆𝑘𝑖𝑢 (termed move limits) to bind the LP solution.
These limits represent maximum allowable change in 𝑑𝑖 in the
current iteration and are selected as percentage of current value.
– Move limits serve dual purpose of binding the solution and
obviating the need for line search.
– Overly restrictive move limits tend to make the SLP problem
infeasible.
SLP Example
• Consider the convex NLP problem:
min 𝑓(𝑥1 , 𝑥2 ) = 𝑥12 − 𝑥1 𝑥2 + 𝑥22
𝑥1 ,𝑥2
Subject to: 1 − 𝑥12 − 𝑥22 ≤ 0; −𝑥1 ≤ 0, −𝑥2 ≤ 0
The problem has a single minimum at: 𝒙∗ = [1/√2, 1/√2]𝑇
• The objective and constraint gradients are:
𝛻𝑓 𝑇 = 2𝑥1 − 𝑥2 , 2𝑥2 − 𝑥1 ,
𝛻𝑔1𝑇 = −2𝑥1 , −2𝑥2 , 𝛻𝑔2𝑇 = −1,0 , 𝛻𝑔3𝑇 = [0, −1].
• Let 𝒙0 = (1, 1), then 𝑓0 = 1, 𝒄𝑇 = [1, 1], 𝑏1 = 𝑏2 = 𝑏3 = 1;
𝒂1𝑇 = [−2, −2], 𝒂2𝑇 = [−1, 0], 𝒂3𝑇 = [0, −1]
SLP Example
• Define the LP subproblem at the current step as:
min𝑑1,𝑑2 𝑓 = 𝑑1 + 𝑑2
Subject to: [−2 −2; −1 0; 0 −1][𝑑1; 𝑑2] ≤ [1; 1; 1]
• In the absence of move limits, the LP problem is unbounded; using 50% move limits, the SLP update is given as: 𝒅∗ = [−½, −½]𝑇,
𝒙1 = [½, ½]𝑇, with resulting constraint violation: 𝑔𝑖 = {½, 0, 0};
smaller move limits may be used to reduce the constraint violation.
Sequential Linear Programming
SLP Algorithm (Arora, p. 508):
• Initialize: choose 𝒙0 , 𝜀1 > 0, 𝜀2 > 0.
• For 𝑘 = 0,1,2, …
– Choose move limits ∆𝑘𝑖𝑙 , ∆𝑘𝑖𝑢 as some fraction of current design 𝒙𝑘
– Compute 𝑓 𝑘 , 𝒄, 𝑔𝑖𝑘 , ℎ𝑗𝑘 , 𝑏𝑖 , 𝑒𝑗
– Formulate and solve the LP subproblem for 𝒅𝑘
– If 𝑔𝑖 ≤ 𝜀1, 𝑖 = 1, …, 𝑚; |ℎ𝑗| ≤ 𝜀1, 𝑗 = 1, …, 𝑝; and ‖𝒅𝑘‖ ≤ 𝜀2, stop
– Substitute 𝒙𝑘+1 ← 𝒙𝑘 + 𝛼𝒅𝑘 , 𝑘 ← 𝑘 + 1.
Sequential Quadratic Programming
• Sequential quadratic programming (SQP) uses a quadratic
approximation to the objective function at every step of iteration.
• The SQP problem is defined as:
min𝒅 𝑓 = 𝒄𝑇𝒅 + ½𝒅𝑇𝒅
Subject to: 𝑨𝑇𝒅 ≤ 𝒃, 𝑵𝑇𝒅 = 𝒆
• SQP does not require move limits, alleviating the shortcomings of
the SLP method.
• The SQP problem is convex; hence, it has a single global minimum.
• SQP can be solved via Simplex based linear complementarity problem
(LCP) framework.
Sequential Quadratic Programming
• The Lagrangian function for the SQP problem is defined as:
ℒ(𝒅, 𝒖, 𝒗) = 𝒄𝑇𝒅 + ½𝒅𝑇𝒅 + 𝒖𝑇(𝑨𝑇𝒅 − 𝒃 + 𝒔) + 𝒗𝑇(𝑵𝑇𝒅 − 𝒆)
• Then the KKT conditions are:
Optimality: 𝛁ℒ = 𝒄 + 𝒅 + 𝑨𝒖 + 𝑵𝒗 = 𝟎,
Feasibility: 𝑨𝑇 𝒅 + 𝒔 = 𝒃, 𝑵𝑇 𝒅 = 𝒆 ,
Complementarity: 𝒖𝑇 𝒔 = 𝟎,
Non-negativity: 𝒖 ≥ 𝟎, 𝒔 ≥ 𝟎
Sequential Quadratic Programming
• Since 𝒗 is unrestricted in sign, let 𝒗 = 𝒚 − 𝒛, 𝒚 ≥ 𝟎, 𝒛 ≥ 𝟎, and
the KKT conditions are compactly written as:
[𝑰 𝑨 𝟎 𝑵 −𝑵; 𝑨𝑇 𝟎 𝑰 𝟎 𝟎; 𝑵𝑇 𝟎 𝟎 𝟎 𝟎][𝒅; 𝒖; 𝒔; 𝒚; 𝒛] = [−𝒄; 𝒃; 𝒆],
or 𝑷𝑿 = 𝑸
• The complementary slackness conditions, 𝒖𝑇 𝒔 = 𝟎, translate as:
𝑿𝑖 𝑿𝑖+𝑚 = 0, 𝑖 = 𝑛 + 1, ⋯ , 𝑛 + 𝑚.
• The resulting problem can be solved via Simplex method using LCP
framework.
Descent Function Approach
• In SQP methods, the line search step is based on minimization of a
descent function that penalizes constraint violations, i.e.,
Φ 𝒙 = 𝑓 𝒙 + 𝑅𝑉 𝒙
where 𝑓 𝒙 is the cost function, 𝑉 𝒙 represents current
maximum constraint violation, and 𝑅 > 0 is a penalty parameter.
• The descent function value at the current iteration is computed as:
Φ𝑘 = 𝑓𝑘 + 𝑅𝑉𝑘,
𝑅 = max{𝑅𝑘, 𝑟𝑘}, where 𝑟𝑘 = Σ𝑖=1..𝑚 𝑢𝑖𝑘 + Σ𝑗=1..𝑝 |𝑣𝑗𝑘|
𝑉𝑘 = max{0; 𝑔𝑖, 𝑖 = 1, …, 𝑚; |ℎ𝑗|, 𝑗 = 1, …, 𝑝}
• The line search subproblem is defined as:
min𝛼 Φ(𝛼) = Φ(𝒙𝑘 + 𝛼𝒅𝑘)
SQP Algorithm
SQP Algorithm (Arora, p. 526):
• Initialize: choose 𝒙0 , 𝑅0 = 1, 𝜀1 > 0, 𝜀2 > 0.
• For 𝑘 = 0,1,2, …
– Compute 𝑓 𝑘 , 𝑔𝑖𝑘 , ℎ𝑗𝑘 , 𝒄, 𝑏𝑖 , 𝑒𝑗 ; compute 𝑉𝑘 .
– Formulate and solve the QP subproblem to obtain 𝒅𝑘 and the
Lagrange multipliers 𝒖𝑘 and 𝒗𝑘 .
– If 𝑉𝑘 ≤ 𝜀1 and 𝒅𝑘 ≤ 𝜀2 , stop.
– Compute 𝑅; formulate and solve line search subproblem for 𝛼
– Set 𝒙𝑘+1 ← 𝒙𝑘 + 𝛼𝒅𝑘 , 𝑅𝑘+1 ← 𝑅, 𝑘 ← 𝑘 + 1
• The above algorithm is convergent, i.e., Φ 𝒙𝑘 ≤ Φ 𝒙0 ; 𝒙𝑘
converges to the KKT point 𝒙∗
SQP with Approximate Line Search
• The SQP algorithm can use with approximate line search as follows:
Let 𝑡𝑗 , 𝑗 = 0,1, … denote a trial step size,
𝒙𝑘+1,𝑗 denote the trial design point,
𝑓 𝑘+1,𝑗 = 𝑓( 𝒙𝑘+1,𝑗 ) denote the function value at the trial solution, and
Φ𝑘+1,𝑗 = 𝑓 𝑘+1,𝑗 + 𝑅𝑉𝑘+1,𝑗 is the penalty function at the trial solution.
• The trial solution is required to satisfy the descent condition:
Φ𝑘+1,𝑗 + 𝑡𝑗𝛾‖𝒅𝑘‖² ≤ Φ𝑘, 0 < 𝛾 < 1
where a common choice is: 𝛾 = ½, 𝜇 = ½, 𝑡𝑗 = 𝜇ʲ, 𝑗 = 0, 1, 2, …
• The above descent condition ensures a sufficient decrease in the descent function, and hence in the constraint violation, at each step of the method.
SQP Example
• Consider the NLP problem: min𝑥1,𝑥2 𝑓(𝑥1, 𝑥2) = 𝑥1² − 𝑥1𝑥2 + 𝑥2²
subject to 𝑔1: 1 − 𝑥1² − 𝑥2² ≤ 0, 𝑔2: −𝑥1 ≤ 0, 𝑔3: −𝑥2 ≤ 0
Then 𝛻𝑓𝑇 = [2𝑥1 − 𝑥2, 2𝑥2 − 𝑥1], 𝛻𝑔1𝑇 = [−2𝑥1, −2𝑥2], 𝛻𝑔2𝑇 = [−1, 0], 𝛻𝑔3𝑇 = [0, −1]. Let 𝒙0 = (1, 1); then, 𝑓0 = 1, 𝒄 = [1, 1]𝑇, 𝑔1(1,1) = 𝑔2(1,1) = 𝑔3(1,1) = −1.
• Since all constraints are initially inactive, 𝑉0 = 0, and 𝒅 = −𝒄 = [−1, −1]𝑇; the line search problem is: min𝛼 Φ(𝛼) = (1 − 𝛼)²;
• By setting Φ′(𝛼) = 0, we get the analytical solution: 𝛼 = 1; thus 𝒙1 = (0, 0), which results in a large constraint violation
SQP Example
• Alternatively, we may use approximate line search as follows:
– Let 𝑅0 = 10, 𝛾 = 𝜇 = ½; let 𝑡0 = 1, then 𝒙1,0 = (0, 0), 𝑓1,0 = 0, 𝑉1,0 = 1, Φ1,0 = 10; ‖𝒅0‖² = 2, and the descent condition Φ1,0 + ½‖𝒅0‖² ≤ Φ0 = 1 is not met at the trial point.
– Next, for 𝑡1 = ½, we get: 𝒙1,1 = (½, ½), 𝑓1,1 = ¼, 𝑉1,1 = ½, Φ1,1 = 5¼, and the descent condition fails again;
– Next, for 𝑡2 = ¼, we get: 𝒙1,2 = (¾, ¾), 𝑉1,2 = 0, 𝑓1,2 = Φ1,2 = 9/16, and the descent condition checks as: Φ1,2 + ⅛‖𝒅0‖² ≤ Φ0 = 1.
– Therefore, we set 𝛼 = 𝑡2 = ¼, 𝒙1 = (¾, ¾), with no constraint violation.
The Active Set Strategy
• To reduce the computational cost of solving the QP subproblem, we
may only include the active constraints in the problem.
• For 𝒙𝑘 ∈ Ω, the set of potentially active constraints is defined as:
ℐ𝑘 = 𝑖: 𝑔𝑖𝑘 > −𝜀; 𝑖 = 1, … , 𝑚 ⋃ 𝑗: 𝑗 = 1, … , 𝑝 for some 𝜀.
• For 𝒙𝑘 ∉ Ω, let 𝑉𝑘 = max {0; 𝑔𝑖𝑘 , 𝑖 = 1, . . . , 𝑚; ℎ𝑗𝑘 , 𝑗 = 1, … , 𝑝};
then, the active constraint set is defined as:
ℐ𝑘 = 𝑖: 𝑔𝑖𝑘 > 𝑉𝑘 − 𝜀; 𝑖 = 1, … , 𝑚 ⋃ 𝑗: ℎ𝑗𝑘 > 𝑉𝑘 − 𝜀; 𝑗 = 1, … , 𝑝
• The gradients of inactive constraints, i.e., those not in ℐ𝑘 , do not
need to be computed
SQP via Newton’s Method
• Consider the following equality constrained problem:
min 𝑓(𝒙), subject to ℎ𝑖 𝒙 = 0, 𝑖 = 1, … , 𝑙
𝒙
• The Lagrangian function is given as: ℒ 𝒙, 𝒗 = 𝑓 𝒙 + 𝒗𝑇 𝒉(𝒙)
• The KKT conditions are: 𝛻ℒ 𝒙, 𝒗 = 𝛻𝑓 𝒙 + 𝑵𝒗 = 𝟎, 𝒉 𝒙 = 𝟎
where 𝑵 = 𝛁𝒉(𝒙) is a Jacobian matrix whose 𝑖th column is 𝛻ℎ𝑖 𝒙
• Using first order Taylor series expansion (with shorthand notation):
𝛻ℒ 𝑘+1 = 𝛻ℒ 𝑘 + 𝛻 2 ℒ 𝑘 Δ𝒙 + 𝑁Δ𝒗
𝒉𝑘+1 = 𝒉𝑘 + 𝑵𝑇 Δ𝒙
• By expanding Δ𝒗 = 𝒗𝑘+1 − 𝒗𝑘, 𝛻ℒ𝑘 = 𝛻𝑓𝑘 + 𝑵𝒗𝑘, and assuming 𝒗𝑘 ≅ 𝒗𝑘+1, we obtain:
[𝛻²ℒ𝑘 𝑵; 𝑵𝑇 𝟎][Δ𝒙; 𝒗𝑘+1] = −[𝛻𝑓𝑘; 𝒉𝑘]
which is similar to the N-R update, but uses the Hessian of the Lagrangian
SQP via Newton’s Method
• Alternately, we consider minimizing the quadratic approximation:
minΔ𝒙 ½Δ𝒙𝑇𝛻²ℒΔ𝒙 + 𝛻𝑓𝑇Δ𝒙
Subject to: ℎ𝑖(𝒙) + 𝒏𝑖𝑇Δ𝒙 = 0, 𝑖 = 1, …, 𝑙
• The KKT conditions are: 𝛻𝑓 + 𝛻 2 ℒΔ𝒙 + 𝑵𝒗 = 𝟎, 𝒉 + 𝑵Δ𝒙 = 𝟎
• Thus the QP subproblem can be solved via Newton’s method!
[𝛻²ℒ𝑘 𝑵; 𝑵𝑇 𝟎][Δ𝒙𝑘; 𝒗𝑘+1] = −[𝛻𝑓𝑘; 𝒉𝑘]
• The Hessian of the Lagrangian can be updated via BFGS method as:
𝑯𝑘+1 = 𝑯𝑘 + 𝑫𝑘 − 𝑬𝑘
where 𝑫𝑘 = 𝒚𝑘𝒚𝑘𝑇 / (𝒚𝑘𝑇Δ𝒙𝑘), 𝑬𝑘 = 𝒄𝑘𝒄𝑘𝑇 / (𝒄𝑘𝑇Δ𝒙𝑘), 𝒄𝑘 = 𝑯𝑘Δ𝒙𝑘, 𝒚𝑘 = 𝛻ℒ𝑘+1 − 𝛻ℒ𝑘
Example: SQP with Hessian Update
• Consider the NLP problem: min𝑥1,𝑥2 𝑓(𝑥1, 𝑥2) = 𝑥1² − 𝑥1𝑥2 + 𝑥2²
subject to 𝑔1: 1 − 𝑥1² − 𝑥2² ≤ 0, 𝑔2: −𝑥1 ≤ 0, 𝑔3: −𝑥2 ≤ 0
Let 𝒙0 = (1, 1); then, 𝑓0 = 1, 𝒄 = [1, 1]𝑇, 𝑔1(1,1) = 𝑔2(1,1) = 𝑔3(1,1) = −1; 𝛻𝑔1𝑇 = [−2, −2], 𝛻𝑔2𝑇 = [−1, 0], 𝛻𝑔3𝑇 = [0, −1].
• Using approximate line search, 𝛼 = ¼, 𝒙1 = (¾, ¾).
• For the Hessian update, we have:
𝑓1 = 0.5625, 𝑔1 = −0.125, 𝑔2 = 𝑔3 = −0.75; 𝒄1 = [0.75, 0.75];
𝛻𝑔1𝑇 = [−3/2, −3/2], 𝛻𝑔2𝑇 = [−1, 0], 𝛻𝑔3𝑇 = [0, −1]; Δ𝒙0 = [−0.25, −0.25];
then, 𝑫0 = 𝑬0 = ½[1 1; 1 1], so 𝑯1 = 𝑯0
SQP with Hessian Update
• For the next step, the QP problem is defined as:
min𝑑1,𝑑2 𝑓 = ¾(𝑑1 + 𝑑2) + ½(𝑑1² + 𝑑2²)
Subject to: −(3/2)(𝑑1 + 𝑑2) ≤ 1/8, −𝑑1 ≤ ¾, −𝑑2 ≤ ¾
• The application of KKT conditions results in a linear system of equations, which is solved to obtain:
𝒙𝑇 = (𝑑1, 𝑑2, 𝑢1, 𝑢2, 𝑢3, 𝑠1, 𝑠2, 𝑠3) = (0.188, 0.188, 0, 0, 0, 0.125, 0.75, 0.75)
Modified SQP Algorithm
Modified SQP Algorithm (Arora, p. 558):
• Initialize: choose 𝒙0 , 𝑅0 = 1, 𝑯0 = 𝐼; 𝜀1 , 𝜀2 > 0.
• For 𝑘 = 0,1,2, …
– Compute 𝑓 𝑘 , 𝑔𝑖𝑘 , ℎ𝑗𝑘 , 𝒄, 𝑏𝑖 , 𝑒𝑗 , and 𝑉𝑘 . If 𝑘 > 0, compute 𝑯𝑘
– Formulate and solve the modified QP subproblem for search
direction 𝒅𝑘 and the Lagrange multipliers 𝒖𝑘 and 𝒗𝑘 .
– If 𝑉𝑘 ≤ 𝜀1 and 𝒅𝑘 ≤ 𝜀2, stop.
– Compute 𝑅; formulate and solve line search subproblem for 𝛼
– Set 𝒙𝑘+1 ← 𝒙𝑘 + 𝛼𝒅𝑘 , 𝑅𝑘+1 ← 𝑅, 𝑘 ← 𝑘 + 1.
SQP Algorithm
%SQP subproblem via Hessian update
% input: xk (current design); Lk (Hessian of Lagrangian
estimate)
%initialize
n=size(xk,1);
if ~exist('Lk','var'), Lk=diag(xk+(~xk)); end
tol=1e-7;
%function and constraint values
fk=f(xk);
dfk=df(xk);
gk=g(xk);
dgk=dg(xk);
%N-R update
A=[Lk dgk; dgk' 0*dgk'*dgk];
b=[-dfk;-gk];
dx=A\b;
dxk=dx(1:n);
lam=dx(n+1:end);
SQP Algorithm
%inactive constraints
idx1=find(lam<0);
if ~isempty(idx1)
  [dxk,lam]=inactive(lam,A,b,n);
end
%check termination
if norm(dxk)<tol, return, end
%adjust increment for constraint compliance
P=@(xk) f(xk)+lam'*abs(g(xk));
while P(xk+dxk)>P(xk)
  dxk=dxk/2;
  if norm(dxk)<tol, break, end
end
%Hessian update
dL=@(x) df(x)+dg(x)*lam;
Lk=update(Lk, xk, dxk, dL);
xk=xk+dxk;
disp([xk' f(xk) P(xk)])
SQP Algorithm
%function definitions
function [dxk,lam]=inactive(lam,A,b,n)
idx1=find(lam<0);
lam(idx1)=0;
idx2=find(lam);
v=[1:n,n+idx2];
A=A(v,v); b=b(v);
dx=A\b;
dxk=dx(1:n);
lam(idx2)=dx(n+1:end);
end

function Lk=update(Lk, xk, dxk, dL)
%BFGS update of the Lagrangian Hessian estimate
ga=dL(xk+dxk)-dL(xk);  %gradient difference
Hx=Lk*dxk;
Dk=ga*ga'/(ga'*dxk);
Ek=Hx*Hx'/(Hx'*dxk);
Lk=Lk+Dk-Ek;
end
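The MATLAB routine above can be mirrored in Python/NumPy. The sketch below applies the same Newton-Raphson KKT step, l1-penalty step halving, and BFGS Hessian update to a hypothetical equality-constrained test problem (not from the slides); the inactive-constraint bookkeeping is omitted for brevity:

```python
import numpy as np

# Hypothetical test problem: min x1^2 + x2^2  s.t.  x1 + x2 - 2 = 0,
# whose optimum is at (1, 1).
f  = lambda x: x[0]**2 + x[1]**2
df = lambda x: np.array([2*x[0], 2*x[1]])
g  = lambda x: np.array([x[0] + x[1] - 2.0])
dg = lambda x: np.array([[1.0], [1.0]])      # columns are constraint gradients

x, Lk, tol = np.array([3.0, -1.0]), np.eye(2), 1e-7
for _ in range(50):
    n, m = x.size, g(x).size
    # Newton-Raphson step: solve the KKT system for (dx, lam)
    A = np.block([[Lk, dg(x)], [dg(x).T, np.zeros((m, m))]])
    rhs = np.concatenate([-df(x), -g(x)])
    sol = np.linalg.solve(A, rhs)
    dx, lam = sol[:n], sol[n:]
    if np.linalg.norm(dx) < tol:
        break
    # halve the step until the l1 penalty function stops increasing
    P = lambda xx: f(xx) + lam @ np.abs(g(xx))
    while P(x + dx) > P(x) and np.linalg.norm(dx) >= tol:
        dx = dx / 2
    # BFGS update of the Lagrangian Hessian estimate
    dL = lambda xx: df(xx) + dg(xx) @ lam
    ga, Hx = dL(x + dx) - dL(x), Lk @ dx
    if abs(ga @ dx) > 1e-12 and abs(Hx @ dx) > 1e-12:
        Lk += np.outer(ga, ga)/(ga @ dx) - np.outer(Hx, Hx)/(Hx @ dx)
    x = x + dx
print(x)   # converges to [1. 1.]
```

On this convex problem the iteration terminates in a few steps; the guards before the Hessian update skip it when the curvature denominators are numerically zero.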
Generalized Reduced Gradient
• The GRG method finds the search direction by projecting the
objective function gradient onto the constraint hyperplane.
• The GRG search direction is tangent to the constraint hyperplane, so that
iterative steps tend to conform to the constraints.
• The constraints are effectively used to implicitly eliminate variables
and reduce problem dimensions.
Implicit Elimination
• Consider an equality constrained problem in two variables:
Objective: min 𝑓 𝒙 , 𝒙𝑇 = 𝑥1 , 𝑥2
Subject to: 𝑔 𝒙 = 0
• The variations in the objective and constraint functions are:
df = ∇fᵀdx = (∂f/∂x1)dx1 + (∂f/∂x2)dx2
dg = ∇gᵀdx = (∂g/∂x1)dx1 + (∂g/∂x2)dx2 = 0
• Solve for dx2 = −[(∂g/∂x1)/(∂g/∂x2)]dx1 and substitute in the objective function:
df = [∂f/∂x1 − (∂f/∂x2)(∂g/∂x1)/(∂g/∂x2)]dx1
• Then the reduced gradient of f along x1 is given as:
∇f_R = ∂f/∂x1 − (∂f/∂x2)(∂g/∂x1)/(∂g/∂x2)
Implicit Elimination
• Consider a problem in 𝑛 variable with 𝑚 equality constraints:
Objective: min 𝑓 𝒙 , 𝒙𝑇 = 𝑥1 , 𝑥2 , … , 𝑥𝑛
Subject to: 𝑔𝑖 𝒙 = 0, 𝑖 = 1, … , 𝑚
• We define 𝑚 basic variables in terms of 𝑛 − 𝑚 nonbasic variables;
let 𝒙𝑇 = 𝒚𝑇 , 𝒛𝑇 , where 𝒚 are basic and 𝒛 are nonbasic.
• The gradient vector is partitioned as: 𝛻𝑓 𝑇 = 𝛻𝑓 𝒚 𝑇 , 𝛻𝑓 𝒛 𝑇 .
• The variations in the objective and constraint functions are:
df = ∇f(y)ᵀdy + ∇f(z)ᵀdz
dg = (∂ψ/∂y)dy + (∂ψ/∂z)dz = 0
where the matrices of partial derivatives are defined as:
[∂ψ/∂y]_ij = ∂g_i/∂y_j, [∂ψ/∂z]_ij = ∂g_i/∂z_j
Generalized Reduced Gradient
• Since ∂ψ/∂y is a square m × m matrix, we may solve for dy as:
dy = −(∂ψ/∂y)⁻¹(∂ψ/∂z)dz, and substitute in df to obtain:
df = [∇f(z)ᵀ − ∇f(y)ᵀ(∂ψ/∂y)⁻¹(∂ψ/∂z)]dz
• Then the reduced gradient ∇f_R is defined as:
∇f_Rᵀ = ∇f(z)ᵀ − ∇f(y)ᵀ(∂ψ/∂y)⁻¹(∂ψ/∂z)
• Next, we choose the negative of ∇f_R as the search direction and
perform a line search to determine the step size; then Δz = −α∇f_R,
Δy = −(∂ψ/∂y)⁻¹(∂ψ/∂z)Δz
GRG Algorithm
• Initialize: choose x^0; evaluate the objective function and constraints;
convert binding inequality constraints to equality constraints.
• Partition the variables into m basic and n − m nonbasic ones, e.g.,
choose the first m values, or the m largest values, as basic variables.
• Compute ∇f_R along the nonbasic variables. If ∇f_R = 0, exit.
• Set Δz = −∇f_R/‖∇f_R‖, Δy = −(∂ψ/∂y)⁻¹(∂ψ/∂z)Δz.
• Do a line search along Δx to obtain α.
• Check feasibility at x^k + αΔx. If necessary, use Newton-Raphson
iterations to adjust Δy as: Δy^{k+1} = Δy^k − (∂ψ/∂y)⁻¹g^k
• Update: x^{k+1} = x^k + αΔx
Generalized Reduced Gradient
• Consider an equality constrained problem
Objective: min 𝑓 𝒙 = 3𝑥1 + 2𝑥2 + 2𝑥12 − 𝑥1 𝑥2 + 1.5𝑥22
Subject to: 𝑔 𝒙 = 𝑥12 − 𝑥2 − 1 = 0
• Let x^0 = (−1, 0); then f^0 = −1, ∇f^0 = (−1, 3), g^0 = 0, ∇g^0 = (−2, −1).
• Let y = x2 on the first iteration; then ∇f_Rᵀ = −1 − 3(−1)⁻¹(−2) = −7.
• Let Δz = 1; then Δy = −(−1)⁻¹(−2)(1) = −2. By doing a line search along
Δx = (0.333, −0.667), we obtain x^1 = (−0.650, −0.577), f^1 = −2.13.
• The optimum is reached in three iterations: x* = (−0.634, −0.598),
f(x*) = −2.137.
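The reported optimum can be cross-checked by implicit elimination: substituting x2 = x1² − 1 into f leaves a single-variable function φ(x1), minimized below by a simple golden-section search (a sketch for verification; not the GRG iteration itself):

```python
# Substitute the constraint x2 = x1^2 - 1 into f and minimize phi(x1)
f = lambda x1, x2: 3*x1 + 2*x2 + 2*x1**2 - x1*x2 + 1.5*x2**2
phi = lambda t: f(t, t**2 - 1)

a, b = -2.0, 0.0                      # bracket containing the minimizer
r = (5**0.5 - 1) / 2                  # golden-section ratio
c, d = b - r*(b - a), a + r*(b - a)
for _ in range(100):
    if phi(c) < phi(d):               # minimum lies in [a, d]
        b, d = d, c
        c = b - r*(b - a)
    else:                             # minimum lies in [c, b]
        a, c = c, d
        d = a + r*(b - a)
x1 = (a + b) / 2
x2 = x1**2 - 1
print(x1, x2, phi(x1))                # ≈ -0.634, -0.598, -2.137
```

Since φ is convex on this bracket, the golden-section search converges to the same optimum the slides report for GRG.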
Generalized Reduced Gradient
• Consider an inequality-constrained problem:
Objective: min f(x) = x1² + x2
Subject to: g1(x) = x1² + x2² − 9 ≤ 0, g2(x) = x1 + x2 − 1 ≤ 0
• Add slack variables to the inequality constraints:
g1(x) = x1² + x2² − 9 + s1 = 0, g2(x) = x1 + x2 − 1 + s2 = 0
Then ∇f(x) = (2x1, 1); ∇g1(x) = (2x1, 2x2); ∇g2(x) = (1, 1)
• Let x^0 = (2.56, −1.56); then f^0 = 4.99, ∇f^0 = (5.12, 1), g^0 = (−0.013, 0)
• Since g2 is binding, add s2 to the variables: ∇f^0 = (5.12, 1, 0)ᵀ, ∇g2^0 = (1, 1, 1)ᵀ
Generalized Reduced Gradient
• Let y = x1, z = (x2, s2); then ∇f(y) = 5.12, ∇f(z) = (1, 0)ᵀ, ∇g2(y) = 1,
∇g2(z) = (1, 1)ᵀ; therefore ∇f_R(z) = (1, 0)ᵀ − 5.12(1)⁻¹(1, 1)ᵀ = (−4.12, −5.12)ᵀ
• Let Δz = −∇f_R(z), Δy = −[1 1]Δz = −9.24; then Δx = (−9.24, 4.12)ᵀ and
s^0 = Δx/‖Δx‖. Suppose we limit the maximum step size to α ≤ 0.5;
then x^1 = x^0 + 0.5s^0 = (2.103, −1.356) with f(x^1) = f^1 = 3.068. There are
no constraint violations, hence the first iteration is complete.
• After seven iterations: x^7 = (0.003, −3.0) with f^7 = −3.0
• The optimum is at: x* = (0.0, −3.0) with f* = −3.0
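As a sanity check (a standalone sketch, not from the slides), the feasible region can be gridded by brute force; with a 0.01 step (an arbitrary resolution) the smallest objective value lands at the reported optimum near (0, −3):

```python
import numpy as np

# Grid the box [-3, 3]^2, mask out infeasible points, and take the minimum
X1, X2 = np.meshgrid(np.linspace(-3, 3, 601), np.linspace(-3, 3, 601))
feasible = (X1**2 + X2**2 <= 9) & (X1 + X2 <= 1)
F = np.where(feasible, X1**2 + X2, np.inf)
k = np.unravel_index(np.argmin(F), F.shape)
print(X1[k], X2[k], F[k])   # ≈ 0.0, -3.0, -3.0
```

Since f = x1² + x2 ≥ x2 ≥ −3 on this box, the bound is attained only where x1 = 0 and x2 = −3, confirming the optimum is unique.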
GRG for LP Problems
• Consider an LP problem: min 𝑓(𝒙) = 𝒄𝑇 𝒙
Subject to: 𝑨𝒙 = 𝒃, 𝒙 ≥ 𝟎
• Let 𝒙 be partitioned into 𝑚 basic variables and 𝑛 − 𝑚 nonbasic
variables: 𝒙𝑇 = [𝒚𝑇 , 𝒛𝑇 ].
• The objective function is partitioned as: 𝑓 𝒙 = 𝒄𝑇𝑦 𝒚 + 𝒄𝑇𝑧 𝒛
• The constraints are partitioned as: 𝑩𝒚 + 𝑵𝒛 = 𝒃, 𝒚 ≥ 𝟎, 𝒛 ≥ 𝟎.
Then 𝒚 = 𝑩−1 𝒃 − 𝑩−1 𝑵𝒛
• The objective function in terms of the independent variables is:
f(z) = c_yᵀB⁻¹b + (c_zᵀ − c_yᵀB⁻¹N)z
• The reduced costs for nonbasic variables are given as:
𝒓𝑇𝑐 = 𝒄𝑇𝑧 − 𝒄𝑇𝑦 𝑩−1 𝑵, or 𝒓𝑇𝑐 = 𝒄𝑇𝑧 − 𝝀𝑇 𝑵
GRG for LP Problems
• Using tableau notation, the reduced costs are computed as:
[B N b; c_yᵀ c_zᵀ 0] → [I B⁻¹N B⁻¹b; 0 r_cᵀ −c_yᵀB⁻¹b]
• The objective function variation is given as:
𝑑𝑓 = 𝛻𝑓𝒚𝑇 𝑑𝒚 + 𝛻𝑓𝒛𝑇 𝑑𝒛
• The reduced gradient along the constraint surface is given as:
𝛻𝑓𝑅𝑇 = 𝛻𝒛 𝑓 𝑇 − 𝛻𝒚 𝑓 𝑇 𝑩−1 𝑵 = 𝒓𝑇𝑐
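The reduced-cost computation can be illustrated on a small assumed LP (not from the slides), using the formula r_cᵀ = c_zᵀ − λᵀN with λᵀ = c_yᵀB⁻¹:

```python
import numpy as np

# Assumed LP in standard form: min c^T x, Ax = b, x >= 0, where
#   x1 + x2 + s1 = 4,  x1 + s2 = 2,  variables ordered (x1, x2, s1, s2)
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0, 1.0]])
c = np.array([-1.0, -2.0, 0.0, 0.0])
b = np.array([4.0, 2.0])

basic, nonbasic = [0, 2], [1, 3]    # y = (x1, s1), z = (x2, s2)
B, N = A[:, basic], A[:, nonbasic]
cy, cz = c[basic], c[nonbasic]

lam = np.linalg.solve(B.T, cy)      # simplex multipliers: B^T lam = c_y
rc = cz - N.T @ lam                 # reduced costs: r_c = c_z - lam^T N
y = np.linalg.solve(B, b)           # basic solution: y = B^{-1} b >= 0
print(rc, y)                        # rc = [-2, 1]: x2 has a negative
                                    # reduced cost and should enter the basis
```

The negative component of r_c identifies the nonbasic variable along which the reduced gradient points downhill, which is exactly the Δz rule in the GRG algorithm for LP on the next slide.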
GRG Algorithm for LP Problems
1. Choose the largest 𝑚 components of 𝒙 as basic variables
2. Compute the reduced gradient 𝛻𝑓𝑅𝑇 = 𝒓𝑇𝑐
3. Let Δz_i = −r_i if r_i ≤ 0, and Δz_i = −x_i r_i if r_i > 0
4. If Δ𝒛 = 0, stop; otherwise set Δ𝒚 = 𝑩−1 𝑵Δ𝒛
5. Compute the step size: let α1 = max{α : y + αΔy ≥ 0, z + αΔz ≥ 0},
α2 = arg min_α f(x + αΔx); set α = min{α1, α2}
6. Update: 𝒙𝑘+1 = 𝒙𝑘 + 𝛼Δ𝒙
7. If 𝛼2 ≥ 𝛼1 , update 𝑩, 𝑵 (use pivoting)
8. Return to 1