MITESD 77S10 Lec07

The lecture covers key concepts in Multidisciplinary System Design Optimization, including the existence and uniqueness of optimal solutions, Karush-Kuhn-Tucker conditions, and the distinction between convex and unconstrained problems. It emphasizes understanding various optimization techniques rather than mastering algorithms, and outlines the importance of selecting appropriate methods based on problem characteristics. The session also discusses iterative optimization procedures and the role of gradients and Hessians in determining optimality.


Multidisciplinary System Design Optimization (MSDO)

Numerical Optimization I
Lecture 7

Karen Willcox

© Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox
Engineering Systems Division and Dept. of Aeronautics and Astronautics
Today’s Topics

• Existence & Uniqueness of an Optimum Solution
• Karush-Kuhn-Tucker Conditions
• Convex Spaces
• Unconstrained Problems

Disclaimer!

• This is not a classic optimization class ...

• The aim is not to teach you the details of optimization algorithms, but rather to expose you to different methods.

• We will utilize optimization techniques – the goal is to understand enough to be able to utilize them wisely.

• If you plan to use optimization extensively in your research, you should take an optimization class, e.g. 15.093

Learning Objectives

After the next two lectures, you should:

• be familiar with what gradient-based (and some gradient-free) optimization techniques are available
• understand the basics of how each technique works
• be able to choose which optimization technique is appropriate for your problem
• understand what to look for when the algorithm terminates
• understand why the algorithm might fail

How to Choose an Algorithm?

• Number of design variables


• Type of design variables (real/integer, continuous/discrete)
• Linear/nonlinear
• Continuous/discontinuous objective behavior
• Equality/inequality constraints
• Discontinuous feasible spaces
• Initial solution feasible/infeasible
• Availability of gradient information
• Simulation code (forward problem) runtime
Gradient-Based Optimization Applications
• Gradient-based methods can be used to solve large-scale, highly
complex engineering problems
• Many recent advances in nonlinear optimization, e.g. in PDE-constrained optimization, exploiting structure of the problem, adjoints, preconditioning, specialized solvers, parallel computing, etc.
• We will discuss some basic methods – not state-of-the-art

Earthquake inverse modeling (CMU Quake Project, 2003): Inversion of surface observations for 17 million elastic parameters (right: target; left: inversion result). Optimization problem solved in 24 hours on 2048 processors of an HP AlphaServer system.

Accelerator shape optimization (SLAC): Next generation accelerators have complex cavities that require shape optimization for improved performance and reduced cost.

Examples from O. Ghattas (UT Austin) and collaborators.
Courtesy of Omar Ghattas. Used with permission.
Standard Problem Definition

    min  J(x)
    s.t. gj(x) ≤ 0       j = 1,..,m1
         hk(x) = 0       k = 1,..,m2
         xi^L ≤ xi ≤ xi^U    i = 1,..,n
• For now, we consider a single objective
function, J(x).
• There are n design variables, and a total of
m constraints (m=m1+m2).
• The bounds are known as side constraints.
Linear vs. Nonlinear

The objective function is a linear function of the design variables if each design variable appears only to the first power with constant coefficients multiplying it.

    J(x) = x1 + 2x2 + 3.4x3        is linear in x = [x1 x2 x3]^T

    J(x) = x1x2 + 2x2 + 3.4x3      is nonlinear in x

    J(x) = cos(x1) + 2x2 + 3.4x3   is nonlinear in x

Linear vs. Nonlinear

A constraint is a linear function of the design variables if each design variable appears only to the first power with constant coefficients multiplying it.

    6x1 + x2 + 2x3 ≤ 10            is linear in x

    6x1 + x2 + 2x3² ≤ 10           is nonlinear in x

    6x1 + sin(x2) + 2x3 ≤ 10       is nonlinear in x

Iterative Optimization Procedures
Many optimization algorithms are iterative:
    x^q = x^(q-1) + α^q S^q

where
    q = iteration number
    S = vector search direction
    α = scalar distance (step length)
and the initial solution x0 is given
The algorithm determines the search direction S
according to some criteria.
Gradient-based algorithms use gradient information to
decide where to move. Gradient-free algorithms use
sampling and/or heuristics.
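A minimal sketch of this generic iteration in Python (the helpers search_direction and step_length are placeholders to be supplied by a particular algorithm; the names are illustrative, not from the lecture):

import numpy as np

def iterate(J, x0, search_direction, step_length, max_iter=100, tol=1e-8):
    # Generic scheme: x^q = x^(q-1) + alpha^q * S^q
    x = np.asarray(x0, dtype=float)
    for q in range(1, max_iter + 1):
        S = search_direction(J, x)      # e.g. -grad J(x) for steepest descent
        alpha = step_length(J, x, S)    # e.g. from a 1-D line search
        x_new = x + alpha * S
        if abs(J(x_new) - J(x)) < tol:  # simple convergence check on J
            return x_new
        x = x_new
    return x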
Iterative Optimization Procedures

MATLAB® demo

Gradient Vector
Consider a function J(x), x=[x1,x2,...,xn]

The gradient of J(x) at a point x0 is a vector of length n:


    ∇J(x0) = [ ∂J/∂x1(x0),  ∂J/∂x2(x0),  ... ,  ∂J/∂xn(x0) ]^T

Each element in the vector is evaluated at the point x0.


Hessian Matrix
Consider a function J(x), x=[x1,x2,...,xn]

The second derivative of J(x) at a point x0 is an n × n matrix:

    H(x0) = ∇²J(x0),  with entries  H_ij = ∂²J/∂xi∂xj

Each element in the matrix is evaluated at the point x0.


Gradients & Hessian Example

Example (worked in lecture): compute the gradient vector ∇J(x) and the Hessian matrix H(x) for a given polynomial objective J(x).
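As a sketch of what such a calculation looks like, here is an illustrative quadratic objective (not necessarily the slide's example function) with its analytic gradient and Hessian:

import numpy as np

# Illustrative objective: J(x) = 3*x1^2 + x1*x2 + 2*x2^2
def J(x):
    x1, x2 = x
    return 3*x1**2 + x1*x2 + 2*x2**2

def grad_J(x):
    x1, x2 = x
    # dJ/dx1 = 6*x1 + x2,  dJ/dx2 = x1 + 4*x2
    return np.array([6*x1 + x2, x1 + 4*x2])

def hess_J(x):
    # Constant matrix, since J is quadratic
    return np.array([[6.0, 1.0],
                     [1.0, 4.0]])

x0 = np.array([1.0, -1.0])
print(grad_J(x0))    # [ 5. -3.]
print(hess_J(x0))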

Taylor Series
Consider scalar case:

    f(z) ≈ f(z0) + df/dz|z0 (z − z0) + (1/2) d²f/dz²|z0 (z − z0)²

When the function depends on a vector:

    J(x) ≈ J(x0) + ∇J(x0)^T (x − x0) + (1/2)(x − x0)^T H(x0)(x − x0)

    (∇J(x0)^T is 1 × n, (x − x0) is n × 1, and H(x0) is n × n.)
The gradient vector and Hessian matrix can be approximated using finite
differences if they are not available analytically or using adjoints (L8).
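A minimal sketch of forward finite-difference approximations to the gradient and Hessian (the step size h is a tunable assumption; central differences or adjoints, see Lecture 8, are often preferable):

import numpy as np

def fd_gradient(J, x, h=1e-6):
    # Forward-difference approximation to the gradient of J at x
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    J0 = J(x)
    for i in range(x.size):
        xp = x.copy()
        xp[i] += h
        g[i] = (J(xp) - J0) / h
    return g

def fd_hessian(J, x, h=1e-4):
    # Finite-difference Hessian: n+1 gradient evaluations,
    # i.e. O(n^2) objective evaluations
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.zeros((n, n))
    g0 = fd_gradient(J, x, h)
    for i in range(n):
        xp = x.copy()
        xp[i] += h
        H[:, i] = (fd_gradient(J, xp, h) - g0) / h
    return 0.5 * (H + H.T)   # symmetrize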
Existence & Uniqueness of an
Optimum Solution
• Usually cannot guarantee that global optimum is found
– multiple solutions may exist
– numerical ill-conditioning
→ start from several initial solutions
• Can determine mathematically whether we have a relative minimum
• Under certain conditions can guarantee global optimum
(special class of optimization problems or with global
optimization methods)
• It is very important to interrogate the “optimum” solution

Existence & Uniqueness:
Unconstrained Problems
• Unconstrained problems: at a minimum, the gradient must vanish:
      ||∇J(x*)|| = 0
  – x* is a stationary point of J
  – necessary but not sufficient
  – in the sketch of J(x), points A, B, C, D all have ∇J = 0

• Calculus (scalar case): at a minimum, the second derivative > 0

• Vector case: at a minimum, H(x*) > 0 (positive definite)

SPD Matrices
• Positive definiteness:  y^T H y > 0 for all y ≠ 0

• Consider the ith eigenmode of H:  H vi = λi vi

• If H is a symmetric matrix, then the eigenvectors of H form an orthogonal set:  vi^T vj = δij

• Any vector y can be written as a linear combination of eigenvectors:  y = Σi ai vi

• Then:  y^T H y = (Σi ai vi)^T H (Σj aj vj) = Σi,j ai vi^T (λj aj vj) = Σi ai² λi

• Therefore, if the eigenvalues of H are all positive, H is SPD.
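A minimal numerical check of this eigenvalue criterion (np.linalg.eigvalsh assumes a symmetric matrix, so the input is symmetrized first):

import numpy as np

def is_spd(H, tol=1e-12):
    # True if the symmetric matrix H is positive definite,
    # i.e. all eigenvalues exceed a small tolerance
    H = 0.5 * (H + H.T)              # enforce symmetry
    eigvals = np.linalg.eigvalsh(H)  # eigenvalues of a symmetric matrix
    return bool(np.all(eigvals > tol))

H = np.array([[6.0, 1.0],
              [1.0, 4.0]])
print(is_spd(H))   # True: both eigenvalues are positive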

Existence & Uniqueness:
Unconstrained Problems
Necessary and sufficient conditions for a minimum
(unconstrained problem):
1. Gradient must vanish
2. Hessian must be positive definite

    ∇J(x*) = 0  and  H(x*) > 0   ⇒   local minimum at x*

The minimum is only guaranteed to be a global optimum if H(x) > 0 for all values of x (e.g. a simple parabola).
Existence & Uniqueness:
Constrained Problems
At the optimum:
  – at least one constraint on the design is active
  – ∇J does not have to be zero

In order to improve the design:
  – move in a direction that decreases the objective
  – move in a direction that does not violate the constraints

Usable direction = any direction that reduces the objective:
    S^T ∇J(x) ≤ 0
Feasible direction = a direction in which a small move will not violate the constraints:
    S^T ∇gi(x) ≤ 0   (for all active constraints i)

Note that these conditions may be relaxed for certain algorithms.
Lagrangian Functions
• A constrained minimization can be written as an
unconstrained minimization by defining the Lagrangian
function:
    L(x, λ) = J(x) + Σ_{j=1..m1} λj gj(x) + Σ_{k=1..m2} λ_{m1+k} hk(x)

• L(x, λ) is called the Lagrangian function.
• λj is the jth Lagrange multiplier.
• It can be shown that a stationary point x* of L(x, λ) is a stationary point of the original constrained minimization problem.

Lagrangian Example

    min  J(x) = x1² + 3x2²
    s.t. x1 + x2 = 1
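A quick worked solution, taking the constraint as the equality h(x) = x1 + x2 − 1 = 0:

    L(x, λ) = x1² + 3x2² + λ(x1 + x2 − 1)

    ∂L/∂x1 = 2x1 + λ = 0,   ∂L/∂x2 = 6x2 + λ = 0,   ∂L/∂λ = x1 + x2 − 1 = 0

Eliminating λ gives x1 = 3x2, so x* = (3/4, 1/4), λ = −3/2, and J(x*) = 3/4. Note λ ≠ 0, consistent with the equality constraint being active.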

Karush-Kuhn-Tucker (KKT) Conditions

If x* is an optimum, these conditions are satisfied:

1. x* is feasible

2. λj gj(x*) = 0,   j = 1,..,m1   and   λj ≥ 0

3. ∇J(x*) + Σ_{j=1..m1} λj ∇gj(x*) + Σ_{k=1..m2} λ_{m1+k} ∇hk(x*) = 0
       λj ≥ 0
       λ_{m1+k} unrestricted in sign

The Kuhn-Tucker conditions are necessary and sufficient if the design space is convex.
KKT Conditions: Interpretation

Condition 1: the optimal design satisfies the constraints

Condition 2: if a constraint is not precisely satisfied (i.e. not active, gj(x*) < 0), then the corresponding Lagrange multiplier is zero
  – the jth Lagrange multiplier represents the sensitivity of the objective function to the jth constraint
  – can be thought of as representing the "tightness" of the constraint
  – if λj is large, then constraint j is important for this solution

Condition 3: the gradient of the Lagrangian vanishes at the optimum
Convex Sets
Consider a set, and imagine drawing a line connecting
any two points in the set.
If every point along that line is inside the set, then the
set is convex.
If any point along that line is outside the set, then the
set is non-convex.

The line connecting points x1 and x2 is given by

    w = θ x1 + (1 − θ) x2,   0 ≤ θ ≤ 1
Convex Functions

Informal definition: a convex function is one that will hold water, while a concave function will not hold water...

[Sketches: convex, concave, neither]

A function f(x) bounding a convex set is convex if:

    f[θ x1 + (1 − θ) x2]  ≤  θ f(x1) + (1 − θ) f(x2)
Convex Spaces
Pick any two points in the feasible region. If all points on the line connecting these points lie in the feasible region, then the constraint surfaces are convex.

If the objective function is convex, then it has only one optimum (the global one) and the Hessian matrix is positive definite for all possible designs.

If the objective function and all constraint surfaces are convex, then the design space is convex, and the KKT conditions are sufficient to guarantee that x* is a global optimum.

In general, for engineering problems, the design space is not convex ...

Convergence Criteria

Algorithm has converged when ...

the change in the objective function falls below a tolerance (no further progress)

OR the maximum number of iterations is reached

Once the "optimal" solution has been obtained, the KKT conditions should be checked.

Types of Optimization Algorithms

• Useful resource: Prof. Steven Johnson's open-source library for nonlinear optimization:
  http://ab-initio.mit.edu/wiki/index.php/NLopt_Algorithms

• Global optimization
• Local derivative-free optimization
• Local gradient-based optimization
• Heuristic methods

Most methods have some convergence analysis and/or proofs.

Local Derivative-Free Optimization:
Nelder-Mead Simplex
• A simplex is a special polytope of N + 1 vertices in N
dimensions
– e.g., line segment on a line, triangle in 2D, tetrahedron in 3D

• Form an initial simplex around the initial guess x0


• Repeat the following general steps:
  – Compute the function value at each vertex of the simplex
  – Order the vertices according to function value, and discard the worst one
  – Generate a new point by "reflection"
  – If the new point is acceptable, generate a new simplex. Expand or contract the simplex size according to the quality of the new point.

• Converges to a local optimum when the objective function varies smoothly and is unimodal, but can converge to a non-stationary point in some cases

• "fminsearch" in MATLAB

Figures from http://www.scholarpedia.org/article/Nelder-Mead_algorithm
Courtesy of Saša Singer. Used with permission.
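In Python, a comparable implementation is available through SciPy; a minimal usage sketch (the Rosenbrock function here is just an illustrative test objective):

import numpy as np
from scipy.optimize import minimize

def J(x):
    # Rosenbrock test function (illustrative, not from the slides)
    return 100.0*(x[1] - x[0]**2)**2 + (1.0 - x[0])**2

x0 = np.array([-1.2, 1.0])
res = minimize(J, x0, method="Nelder-Mead",
               options={"xatol": 1e-8, "fatol": 1e-8, "maxiter": 2000})
print(res.x, res.fun)   # approaches [1, 1], with J near 0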
Global Derivative-Free Optimization:
DIRECT
• DIRECT: DIviding RECTangles algorithm for global optimization
(Jones et al., 1993)

• Initialize by dividing domain into hyper-rectangles

• Repeat:
  – Identify potentially optimal hyper-rectangles
  – Divide potentially optimal hyper-rectangles
  – Sample at centers of new hyper-rectangles

• Balances local and global search
  – Global convergence to the optimum
  – May take a large, exhaustive search

Image by MIT OpenCourseWare.
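Recent SciPy releases (1.8 and later) ship an implementation of this algorithm; a minimal, hedged usage sketch with an illustrative bounded test function:

import numpy as np
from scipy.optimize import direct, Bounds   # scipy.optimize.direct requires SciPy >= 1.8

def J(x):
    # Illustrative multimodal test function (not from the slides)
    return np.sin(3.0*x[0])*np.cos(2.0*x[1]) + 0.1*(x[0]**2 + x[1]**2)

bounds = Bounds([-3.0, -3.0], [3.0, 3.0])
res = direct(J, bounds, maxiter=1000)
print(res.x, res.fun)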
Gradient-Based Optimization Process

x0, q = 0
   → Calculate ∇J(x^q)
   → Calculate S^q  (set q = q + 1)
   → Perform 1-D search:  x^q = x^(q-1) + α^q S^q
   → Converged?   no: return to the gradient calculation;   yes: done

Unconstrained Problems:
Gradient-Based Solution Methods
• First-Order Methods
– use gradient information to calculate S
– steepest descent method
– conjugate gradient method
– quasi-Newton methods
• Second-Order Methods
– use gradients and Hessian to calculate S
– Newton method
• Methods to calculate gradients in Lecture 8.
• Often, a constrained problem can be cast as an unconstrained problem and these techniques used.

Steepest Descent

    S^q = −∇J(x^(q-1))        −∇J(x) is the direction of maximum decrease of J at x

Algorithm:
    choose x0, set x = x0
    repeat until converged:
        S = −∇J(x)
        choose α to minimize J(x + αS)
        x = x + αS

• doesn't use any information from previous iterations
• converges slowly
• α is chosen with a 1-D search (interpolation or Golden section)
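A minimal sketch in Python; a simple backtracking (Armijo) rule stands in here for the 1-D search, and the gradient is supplied by the caller:

import numpy as np

def steepest_descent(J, grad_J, x0, tol=1e-6, max_iter=500):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_J(x)
        if np.linalg.norm(g) < tol:        # stationary point reached
            break
        S = -g
        alpha = 1.0
        # Backtracking line search: shrink alpha until J decreases sufficiently
        while J(x + alpha*S) > J(x) + 1e-4*alpha*(g @ S) and alpha > 1e-12:
            alpha *= 0.5
        x = x + alpha*S
    return x

# Example on a quadratic bowl: converges (slowly) to [0, 0]
J = lambda x: 3*x[0]**2 + x[0]*x[1] + 2*x[1]**2
grad_J = lambda x: np.array([6*x[0] + x[1], x[0] + 4*x[1]])
print(steepest_descent(J, grad_J, [1.0, -1.0]))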
Conjugate Gradient

    S^1 = −∇J(x^0)
    S^q = −∇J(x^(q-1)) + β^q S^(q-1)

    β^q = ||∇J(x^(q-1))||² / ||∇J(x^(q-2))||²

• search directions are now conjugate
• directions Sj and Sk are conjugate if Sj^T H Sk = 0 (also called H-orthogonal)
• makes use of information from previous iterations without having to store a matrix
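A minimal sketch of the nonlinear conjugate gradient iteration with the Fletcher-Reeves β above (the backtracking line search and the descent-direction restart are simplifications):

import numpy as np

def conjugate_gradient(J, grad_J, x0, tol=1e-6, max_iter=500):
    x = np.asarray(x0, dtype=float)
    g = grad_J(x)
    S = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        alpha = 1.0
        # Backtracking line search along S
        while J(x + alpha*S) > J(x) + 1e-4*alpha*(g @ S) and alpha > 1e-12:
            alpha *= 0.5
        x = x + alpha*S
        g_new = grad_J(x)
        beta = (g_new @ g_new) / (g @ g)   # ||grad new||^2 / ||grad old||^2
        S = -g_new + beta*S
        g = g_new
        if g @ S >= 0:                     # restart with steepest descent if
            S = -g                         # S is no longer a descent direction
    return x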

Geometric Interpretation

[Figure: contour plots of the same objective in (X1, X2), comparing the iterates of steepest descent (left) with conjugate gradient (right).]

Figures adapted from "Optimal Design in Multidisciplinary Systems," AIAA Professional Development Short Course Notes, September 2002.

Newton’s Method
Taylor series:
    J(x) ≈ J(x0) + ∇J(x0)^T Δx + (1/2) Δx^T H(x0) Δx
    where Δx = x − x0

Differentiate:
    ∇J(x) ≈ ∇J(x0) + H(x0) Δx

At the optimum, ∇J(x*) = 0:
    ∇J(x0) + H(x0) Δx = 0
    Δx = −H(x0)^(-1) ∇J(x0)
Newton’s Method

    S = −H(x0)^(-1) ∇J(x0)

• if J(x) is quadratic, method gives exact solution in one iteration


• if J(x) not quadratic, perform Taylor series about new point and
repeat until converged
• a very efficient technique if started near the solution
• H is not usually available analytically, and finite difference is too expensive (n × n matrix)
• H can be singular if J is linear in a design variable
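A minimal sketch of the resulting iteration, solving H Δx = −∇J rather than forming the inverse explicitly (analytic gradient and Hessian are assumed available):

import numpy as np

def newton(grad_J, hess_J, x0, tol=1e-8, max_iter=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_J(x)
        if np.linalg.norm(g) < tol:
            break
        dx = np.linalg.solve(hess_J(x), -g)   # Newton step: H dx = -grad J
        x = x + dx                            # full step, no line search
    return x

# For a quadratic objective this converges in one iteration:
grad_J = lambda x: np.array([6*x[0] + x[1], x[0] + 4*x[1]])
hess_J = lambda x: np.array([[6.0, 1.0], [1.0, 4.0]])
print(newton(grad_J, hess_J, [1.0, -1.0]))   # [0, 0]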

Quasi Newton
    S^q = −A^q ∇J(x^(q-1))

• Also known as variable metric methods
• Objective and gradient information is used to create an approximation to the inverse of the Hessian
• A approaches H^(-1) during optimization of quadratic functions
• Convergence is similar to second-order methods (strictly 1st order)

• Initially A = I, so S^1 is the steepest descent direction
  then: A^(q+1) = A^q + D^q
  where D is a symmetric update matrix:
      D^q = fn( x^q − x^(q-1),  ∇J(x^q) − ∇J(x^(q-1)),  A^q )

• Various methods to determine D, e.g.:
  Davidon-Fletcher-Powell (DFP)
  Broyden-Fletcher-Goldfarb-Shanno (BFGS)
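In practice these updates are rarely coded by hand; a minimal usage sketch of SciPy's BFGS implementation on an illustrative quadratic:

import numpy as np
from scipy.optimize import minimize

J = lambda x: 3*x[0]**2 + x[0]*x[1] + 2*x[1]**2
grad_J = lambda x: np.array([6*x[0] + x[1], x[0] + 4*x[1]])

res = minimize(J, x0=np.array([1.0, -1.0]), jac=grad_J, method="BFGS")
print(res.x, res.nit)   # near [0, 0] after a handful of iterations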
One-Dimensional Search
(Choosing α)

• Polynomial interpolation
  – pick several values for α
  – fit polynomials to J(α)
– efficient, but need to be careful with implementation
• Golden section search
– easy to implement, but inefficient
• The one-dimensional search is one of the more
challenging aspects of implementing a gradient-based
optimization algorithm

Polynomial Interpolation Example
    J(x^(q-1) + α S^q) = J(α) ≈ c1 + c2 α + c3 α² + c4 α³

    dJ/dα = (∂J/∂x1)(dx1/dα) + (∂J/∂x2)(dx2/dα) = ∇J^T S

      α      J      dJ/dα
      0     10       −5
      1      6       −5
      2      8       −5
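A sketch of how such a fit can be used to pick α, fitting a cubic to the three sampled J values and the slope at α = 0 (which subset of the tabulated derivative entries the slide actually uses in the fit is an assumption here):

import numpy as np

def cubic_step(alphas, J_vals, dJ0):
    # Fit J(a) ~ c1 + c2*a + c3*a^2 + c4*a^3 to three samples of J
    # plus the slope dJ/dalpha at alpha = 0, then return the fitted minimizer
    a0, a1, a2 = alphas
    A = np.array([[1.0, a0, a0**2, a0**3],
                  [1.0, a1, a1**2, a1**3],
                  [1.0, a2, a2**2, a2**3],
                  [0.0, 1.0, 0.0, 0.0]])      # slope equation at alpha = 0
    c = np.linalg.solve(A, np.array([*J_vals, dJ0]))
    grid = np.linspace(min(alphas), max(alphas), 1001)
    fit = c[0] + c[1]*grid + c[2]*grid**2 + c[3]*grid**3
    return grid[np.argmin(fit)]               # minimizer of the fitted cubic

# Using the values tabulated above: J(0)=10, J(1)=6, J(2)=8, dJ/dalpha(0)=-5
print(cubic_step([0.0, 1.0, 2.0], [10.0, 6.0, 8.0], -5.0))   # about 1.29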

Lecture Summary

• Gradient vector and Hessian matrix


• Existence and uniqueness
• Optimality conditions
• Convex spaces
• Unconstrained Methods

The next lecture will focus on gradient-based techniques for nonlinear constrained optimization. We will consider SQP and penalty methods. These are the methods most commonly used for engineering applications.

MIT OpenCourseWare
http://ocw.mit.edu

ESD.77 / 16.888 Multidisciplinary System Design Optimization


Spring 2010

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
