Chapter 4. Optimization
Outline
General Ideas of Optimization
Interpreting the First Derivative
Interpreting the Second Derivative
Unconstrained Optimization
Constrained Optimization
Optimization
There are two ways of looking at an optimization problem.
Minimization
In this case you are looking for the lowest point on the
function.
Maximization
In this case you are looking for the highest point on the
function.
[Figure: graph of y = f(x) = x² − 8x + 20, a parabola with y-intercept 20 and its minimum at x = 4.]
Questions Regarding the Minimum
What is the sign of the slope when you are to the
left of the minimum point?
Another way of saying this: what is f’(x) when x < x*?
Note: x* denotes the point where the function is at a
minimum.
Questions Regarding the Minimum Cont.
What is the sign of the slope when you are to the
right of the minimum point?
Another way of saying this: what is f’(x) when x > x*?
What is the sign of the slope when you are at the
minimum point?
Another way of saying this: what is f’(x) when x = x*?
[Figure: graph of y = f(x) = −x² + 8x, a parabola with maximum value 16 at x = 4 and x-intercepts at 0 and 8.]
Questions Regarding the Maximum
What is the sign of the slope when you are to the
left of the maximum point?
Another way of saying this: what is f’(x) when x < x*?
Note: x* denotes the point where the function is at a
maximum.
Questions Regarding the Maximum Cont.
What is the sign of the slope when you are to the
right of the maximum point?
Another way of saying this: what is f’(x) when x > x*?
What is the sign of the slope when you are at the
maximum point?
Another way of saying this: what is f’(x) when x = x*?
Interpreting the First Derivative
The first derivative of a function, as was shown
previously, is the slope of the curve evaluated at a
particular point.
In essence, it tells you the instantaneous rate of change of
the function at that particular point.
Knowing the slope of the function can tell you where a
maximum or a minimum exists on a curve.
Why?
Defining Critical Point
A point x* on a function is said to be a critical
point if the derivative of the function evaluated
at x* is zero, i.e., f’(x*) = 0.
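A minimal SymPy sketch of this definition, applied to the parabola y = x² − 8x + 20 from the earlier figure:

```python
# Find the critical point x* where f'(x*) = 0.
import sympy as sp

x = sp.symbols('x')
f = x**2 - 8*x + 20

f_prime = sp.diff(f, x)            # f'(x) = 2x - 8
critical = sp.solve(f_prime, x)    # [4], so x* = 4
print(critical, f.subs(x, critical[0]))   # x* = 4, f(4) = 4 (the minimum)
```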
Question
Can the derivative tell you whether you are at a
maximum or a minimum?
The answer is yes if you examine the slope of the function
around the critical point, i.e., the point where the
derivative is zero.
An easier way to determine whether you have a
maximum or a minimum is to examine the second
derivative of the function.
The Second Derivative
The second derivative of a function f(x) is the
derivative of the function f’(x), where f’(x) is the
derivative of f(x).
The second derivative can tell you whether the function is
concave or convex at the critical point.
The second derivative can be denoted by f’’(x).
Concavity and the Second
Derivative
A maximum of a function f(x) occurs when a
critical point x* lies on a concave portion of the
function.
This is equivalent to saying that f’’(x*) < 0.
If f’’(x) < 0 for all x, then the function is said to be
concave.
Convexity and the Second
Derivative
A minimum of a function f(x) occurs when a
critical point x* lies on a convex portion of the
function.
This is equivalent to saying that f’’(x*) > 0.
If f’’(x) > 0 for all x, then the function is said to be
convex.
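A quick sketch of this test, reusing the parabola from the earlier figure:

```python
# f(x) = x^2 - 8x + 20 has a constant positive second derivative,
# so it is convex everywhere and its critical point is a minimum.
import sympy as sp

x = sp.symbols('x')
f = x**2 - 8*x + 20

f2 = sp.diff(f, x, 2)
print(f2)    # 2 > 0 for all x -> convex, so x* = 4 is a minimum
```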
Special Case of the Second
Derivative
Suppose you have a function f(x) with a critical
point at x*.
What does it mean when the second derivative is equal to
zero, i.e., f’’(x*) = 0?
This is a point where the second derivative may not be
able to tell you whether you have a maximum or a
minimum.
Usually in this case you will get an inflection (saddle)
point, where the point is neither a maximum nor a minimum.
Example of Special Case of the
Second Derivative
Suppose y = f(x) = x³; then f’(x) = 3x² and f’’(x) = 6x.
This implies that x* = 0 and f’’(x* = 0) = 0.
[Figure: graph of y = f(x) = x³, which flattens at the origin but keeps increasing, so x* = 0 is neither a maximum nor a minimum.]
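A short sketch of this special case: the second derivative test is silent at x* = 0, but the sign of f’(x) on either side settles the question.

```python
# f(x) = x^3: the second-derivative test is inconclusive at x* = 0.
import sympy as sp

x = sp.symbols('x')
f = x**3

f1 = sp.diff(f, x)        # f'(x) = 3x^2
f2 = sp.diff(f, x, 2)     # f''(x) = 6x

print(sp.solve(f1, x))    # [0] -> the only critical point is x* = 0
print(f2.subs(x, 0))      # 0  -> the test gives no verdict

# f'(x) = 3x^2 >= 0 on both sides of 0, so f keeps increasing
# through x* = 0: neither a maximum nor a minimum.
print(f1.subs(x, -1), f1.subs(x, 1))   # 3 3
```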
Unconstrained Optimization
An unconstrained optimization problem is one
where you only have to be concerned with the
objective function you are trying to optimize.
An objective function is a function that you are trying to
optimize.
None of the variables in the objective function are
constrained.
First and Second Order Condition
For a Maximum
The first order condition for a maximum at a point
x* on the function f(x) is that f’(x*) = 0.
The second order condition for a maximum at a
point x* on the function f(x) is that f’’(x*) < 0.
First and Second Order Condition
For a Minimum
The first order condition for a minimum at a point
x* on the function f(x) is that f’(x*) = 0.
The second order condition for a minimum at a
point x* on the function f(x) is that f’’(x*) > 0.
Example of Using First and Second
Order Conditions
Suppose you have the following function:
f(x) = x³ − 6x² + 9x
Then the first order condition to find the critical
points is:
f’(x) = 3x² − 12x + 9 = 0
This implies that the critical points are at x = 1 and x = 3.
Example of Using First and Second
Order Conditions Cont.
The next step is to determine whether the critical
points are maxima or minima.
These can be found by using the second order condition:
f’’(x) = 6x − 12 = 6(x − 2)
Example of Using First and Second Order Conditions Cont.
Testing x = 1 implies:
f’’(1) = 6(1 − 2) = −6 < 0.
Hence at x = 1, we have a maximum.
Testing x = 3 implies:
f’’(3) = 6(3 − 2) = 6 > 0.
Hence at x = 3, we have a minimum.
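The whole procedure can be checked with a short SymPy sketch of the first and second order conditions:

```python
# Classify the critical points of f(x) = x^3 - 6x^2 + 9x.
import sympy as sp

x = sp.symbols('x')
f = x**3 - 6*x**2 + 9*x

critical = sp.solve(sp.diff(f, x), x)   # first order condition: [1, 3]
f2 = sp.diff(f, x, 2)                   # f''(x) = 6x - 12

for c in critical:
    sign = f2.subs(x, c)                # second order condition
    print(c, sign, 'maximum' if sign < 0 else 'minimum')
# 1 -6 maximum
# 3  6 minimum
```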
Are these the ultimate maximum and minimum of
the function f(x)?
Relative Vs. Absolute Extremum
A relative extremum is a point that is locally
greater or lesser than all points around it.
A relative extremum can be found by using the first order
condition.
An absolute extremum is a point that is either
absolutely greater than or less than all other
points, i.e., f(x*) > f(x) for all x not equal to x* for a
maximum and f(x*) < f(x) for all x not equal to x*
for a minimum.
Finding the Absolute Extremum
To find the absolute extremum, you need to
compare all the critical points on the function, as
well as any potential end points of the function,
like ∞ and −∞.
When evaluating a polynomial function at ∞, the value of
the function is determined by the highest-order term.
Finding the Absolute Extremum Cont.
Some properties of ∞:
∞ + ∞ = ∞
∞ − ∞ is undefined
c·∞ = ∞, where c is any value greater than zero
∞ · ∞ = ∞
∞ · (−∞) = −∞
From the previous example, the candidate extremum
points occur at x = −∞, 1, 3, and ∞.
The absolute maximum occurs at x = ∞ and the
absolute minimum occurs at x = −∞.
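A sketch of the end-point comparison, using SymPy limits on the same example:

```python
# End behavior of f(x) = x^3 - 6x^2 + 9x is set by the x^3 term.
import sympy as sp

x = sp.symbols('x')
f = x**3 - 6*x**2 + 9*x

print(sp.limit(f, x, sp.oo))    # oo  -> absolute maximum at x = +infinity
print(sp.limit(f, x, -sp.oo))   # -oo -> absolute minimum at x = -infinity
```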
Unconstrained Optimization: Two
Variables
Suppose you have a function y = f(x1, x2); then to
find the critical points, you can use the
following first order conditions:
f_x1 = ∂f(x1*, x2*)/∂x1 = 0
f_x2 = ∂f(x1*, x2*)/∂x2 = 0
Unconstrained Optimization: Two
Variables Cont.
The second order conditions are more complex:
you have to examine the second derivative of
each of the variables, as well as the cross
derivative; these are collected in the Hessian matrix, as sketched below.
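A sketch of both conditions on a hypothetical two-variable function f(x1, x2) = −(x1 − 1)² − (x2 − 2)², chosen here only for illustration:

```python
# Two-variable first and second order conditions.
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = -(x1 - 1)**2 - (x2 - 2)**2        # hypothetical objective

grad = [sp.diff(f, v) for v in (x1, x2)]
print(sp.solve(grad, (x1, x2), dict=True))   # [{x1: 1, x2: 2}]

# The Hessian holds f_x1x1, f_x2x2, and the cross derivative f_x1x2;
# negative definite at the critical point -> a maximum.
H = sp.hessian(f, (x1, x2))
print(H, H.is_negative_definite)      # Matrix([[-2, 0], [0, -2]]) True
```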
Constrained Optimization
Constrained Optimization is said to occur when
one or more of the variables in the objective
function is constrained by some function.
Hence a constrained optimization problem will have an
objective function and a set of constraints.
Constrained optimization
In the presence of constraints, a (local)
optimum does not need to be a stationary point
of the objective function!
Consider a one-dimensional example with a
feasible region of the form a ≤ x ≤ b
Local optima are either
Stationary and feasible
Boundary points
Constrained optimization
We will study how to characterize local optima
for
multi-dimensional optimization problems
with more complex constraints
We will start by considering problems with only
equality constraints
We will also assume that the objective and constraint
functions are continuous and differentiable
Constrained optimization: equality
constraints
A general equality constrained multi-
dimensional NLP is:
max f(x) = f(x1, …, xn)
subject to
g1(x1, …, xn) = b1
g2(x1, …, xn) = b2
⋮
gm(x1, …, xn) = bm
Constrained optimization: equality
constraints
The Lagrangian approach is to associate a
Lagrange multiplier λi with the ith constraint
We then form the Lagrangian by adding
weighted constraint violations to the objective
function:
L(x1, …, xn, λ1, …, λm) = f(x1, …, xn) + Σi λi (bi − gi(x1, …, xn))
or L(x, λ) = f(x) + λ′(b − g(x))
Constrained optimization: equality
constraints
Now consider the stationary points of the
Lagrangian:
∂L(x, λ)/∂xj = ∂f(x)/∂xj − Σi λi ∂gi(x)/∂xj = 0,   j = 1, …, n
∂L(x, λ)/∂λi = bi − gi(x) = 0,   i = 1, …, m
The 2nd set of conditions says that x needs to satisfy
the equality constraints!
The 1st set of conditions generalizes the unconstrained
stationary point condition!
Constrained optimization: equality
constraints
Let (x*, λ*) maximize the Lagrangian.
Then it should be a stationary point of L, so
g(x*) = b, i.e., x* is a feasible solution to the original
optimization problem
Furthermore, for all feasible x and all λ
L(x*, λ*) ≥ L(x, λ)
f(x*) + λ*′(b − g(x*)) ≥ f(x) + λ′(b − g(x))
f(x*) ≥ f(x)
So x* is optimal for the original problem!!
Constrained optimization: equality
constraints
Conclusion: we can find the optimal solution to
the constrained problem by considering all
stationary points of the unconstrained
Lagrangian problem
i.e., by finding all solutions to
∂L(x, λ)/∂xj = ∂f(x)/∂xj − Σi λi ∂gi(x)/∂xj = 0,   j = 1, …, n
∂L(x, λ)/∂λi = bi − gi(x) = 0,   i = 1, …, m
Constrained optimization: equality
constraints
As a byproduct, we get the interesting
observation that
L(x*, λ*) = f(x*) + λ*′(b − g(x*)) = f(x*)
We will use this later when interpreting the
values of the multipliers λ*
Constrained optimization: equality
constraints
Note: if
the objective function f is concave
all constraint functions gi are linear
Then any stationary point of L is an optimal
solution to the constrained optimization
problem!!
this result also holds for minimization problems when f is convex
Constrained optimization: equality
constraints
An example:
max −x1² − x2²
s.t. x1 + x2 = 1
Constrained optimization: equality
constraints
Then
L(x1, x2, λ) = −x1² − x2² + λ(1 − x1 − x2)
= −x1² − x2² + λ − λx1 − λx2
First order conditions:
−2x1 − λ = 0   or   x1 = −λ/2   or   x1 = 1/2
−2x2 − λ = 0   or   x2 = −λ/2   or   x2 = 1/2
1 − x1 − x2 = 0   or   x1 + x2 = 1   or   λ = −1
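The same stationary point can be obtained by handing the three first order conditions to SymPy; a minimal sketch:

```python
# Stationary point of L(x1, x2, lambda) for
# max -x1^2 - x2^2  s.t.  x1 + x2 = 1.
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lambda')
L = -x1**2 - x2**2 + lam*(1 - x1 - x2)   # L = f + lambda*(b - g)

eqs = [sp.diff(L, v) for v in (x1, x2, lam)]
print(sp.solve(eqs, (x1, x2, lam)))
# {x1: 1/2, x2: 1/2, lambda: -1}
```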
Constrained optimization:
Inequality Constraints
We will still assume that the objective and constraint
functions are continuous and differentiable
We will assume all constraints are “≤”
constraints
We will also look at problems with both equality and
inequality constraints
Constrained optimization:
inequality constraints
A general inequality constrained multi-
dimensional NLP is:
max f(x) = f(x1, …, xn)
subject to
g1(x1, …, xn) ≤ b1
g2(x1, …, xn) ≤ b2
⋮
gm(x1, …, xn) ≤ bm
Constrained optimization:
inequality constraints
In the case of inequality constraints, we also
associate a multiplier λi with the i th constraint
As in the case of equality constraints, these
multipliers can be interpreted as shadow prices
Constrained optimization:
inequality constraints
Without derivation or proof, we will look at a
set of necessary conditions, called Karush-
Kuhn-Tucker (KKT) conditions, for a given
point x to be an optimal solution to the NLP
These are valid when a certain condition
(“constraint qualification”) is satisfied.
The latter will be assumed for now.
Constrained optimization:
inequality constraints
By necessity, an optimal point should satisfy
the KKT-conditions.
However, not all points that satisfy the KKT-
conditions are optimal!
∂f(x)/∂xj − Σi λi ∂gi(x)/∂xj = 0,   j = 1, …, n
λi (bi − gi(x)) = 0,   i = 1, …, m
gi(x) ≤ bi,   i = 1, …, m
λi ≥ 0,   i = 1, …, m
Constrained optimization:
KKT conditions
The second set of KKT conditions is
λi (bi − gi(x)) = 0,   i = 1, …, m
This is comparable to the complementary
slackness conditions from LP!
if λi > 0, then gi(x) = bi
if gi(x) < bi, then λi = 0
Constrained optimization:
KKT conditions
This can be interpreted as follows:
Additional units of the resource bi only have value
if the available units are used fully in the optimal
solution
An example:
max −x1² − x2²
s.t. −x1 − x2 ≤ −1
Constrained optimization:
inequality constraints
The KKT conditions are:
−2x1 + λ = 0
−2x2 + λ = 0
λ(−1 + x1 + x2) = 0
−x1 − x2 ≤ −1
λ ≥ 0
with solution x1 = 1/2, x2 = 1/2, λ = 1.
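A numerical cross-check of this solution, using SciPy's SLSQP solver on the equivalent minimization of the negated objective:

```python
# Verify the KKT solution of max -x1^2 - x2^2 s.t. -x1 - x2 <= -1.
import numpy as np
from scipy.optimize import minimize

res = minimize(
    lambda x: x[0]**2 + x[1]**2,        # minimize -f(x)
    x0=np.array([0.0, 0.0]),
    method='SLSQP',
    # SciPy's 'ineq' means fun(x) >= 0, i.e. x1 + x2 - 1 >= 0.
    constraints=[{'type': 'ineq', 'fun': lambda x: x[0] + x[1] - 1}],
)
print(res.x)   # approximately [0.5, 0.5], matching x1 = x2 = 1/2
```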
Constrained optimization:
inequality constraints
With multiple inequality constraints:
max −x1² − x2²
s.t. −2x1 − x2 ≤ −1
−x1 − 2x2 ≤ −1
Constrained optimization:
inequality constraints
The KKT conditions are:
−2x1 + 2λ1 + λ2 = 0
−2x2 + λ1 + 2λ2 = 0
λ1(−1 + 2x1 + x2) = 0
λ2(−1 + x1 + 2x2) = 0
−2x1 − x2 ≤ −1
−x1 − 2x2 ≤ −1
λ1, λ2 ≥ 0
with solution x1 = 1/3, x2 = 1/3, λ1 = 2/9, λ2 = 2/9.
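Since both constraints are active at the optimum, the KKT system reduces to four linear equations; a SymPy sketch confirming the solution above:

```python
# Solve the KKT system with both constraints assumed active.
import sympy as sp

x1, x2, l1, l2 = sp.symbols('x1 x2 lambda1 lambda2')
eqs = [
    -2*x1 + 2*l1 + l2,    # stationarity w.r.t. x1
    -2*x2 + l1 + 2*l2,    # stationarity w.r.t. x2
    -1 + 2*x1 + x2,       # constraint 1 holds with equality
    -1 + x1 + 2*x2,       # constraint 2 holds with equality
]
print(sp.solve(eqs, (x1, x2, l1, l2)))
# {x1: 1/3, x2: 1/3, lambda1: 2/9, lambda2: 2/9}
```

Both multipliers come out non-negative, so all the KKT conditions are satisfied at (1/3, 1/3).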
Constrained optimization:
inequality constraints
Another example:
max −x1² − x2²
s.t. −5x1 − x2 ≤ −5
−3x1 − 2x2 ≤ −6
Constrained optimization:
inequality constraints
The KKT conditions are:
−2x1 + 5λ1 + 3λ2 = 0
−2x2 + λ1 + 2λ2 = 0
λ1(−5 + 5x1 + x2) = 0
λ2(−6 + 3x1 + 2x2) = 0
−5x1 − x2 ≤ −5
−3x1 − 2x2 ≤ −6
λ1, λ2 ≥ 0
with solution x1 = 18/13, x2 = 12/13, λ1 = 0, λ2 = 12/13.
A Word on Constraint Qualification:
It has to be satisfied before we can apply the KKT
theorem
It comes in several flavors
We only focus on the following:
The gradients of the constraint functions, including
those corresponding to non-negativity, have to be
linearly independent
When the constraints are all linear, the constraint
qualification is satisfied.
A Word on Constraint Qualification:
An example:
max f(x1, x2) = x1
s.t. x2 − (1 − x1)³ ≤ 0   (1)
x1 ≥ 0
x2 ≥ 0
A Word on Constraint Qualification:
The KKT conditions require µ1 ≥ 0 (2).
When the KKT conditions are evaluated at the optimal
point (1, 0), they yield µ1 = −1, contradicting (2).
A Word on Constraint Qualification:
In other words, we have an optimal solution that
does not satisfy the KKT conditions.
The reason is that the constraint qualification fails at
(1, 0): writing the non-negativity constraints as
g2(x1, x2) = −x1 and g3(x1, x2) = −x2, we get
1·∇g1(1, 0) + 0·∇g2(1, 0) + 1·∇g3(1, 0) = (0, 0)′,
showing that the gradients at (1, 0) are not linearly independent.
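A small SymPy sketch of this gradient computation:

```python
# Gradients of the three constraints, evaluated at the optimum (1, 0).
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
g1 = x2 - (1 - x1)**3
g2 = -x1
g3 = -x2

grads = [sp.Matrix([sp.diff(g, x1), sp.diff(g, x2)]).subs({x1: 1, x2: 0})
         for g in (g1, g2, g3)]
print(grads)                # [(0, 1), (-1, 0), (0, -1)] as column vectors

# The gradients of the active constraints g1 and g3 cancel:
print(grads[0] + grads[2])  # Matrix([[0], [0]]) -> linearly dependent
```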