Optimization Theory 2
min_x f(x)   or   max_x f(x)
where x is the decision variable and f is the objective function.
In this problem, we are interested in finding the value(s) of x that either minimize or maximize f(x), depending on the problem at hand.
2. First-Order Condition (Optimality Condition): To find the optimal point (minimum or max-
imum), the first-order condition is that the gradient of the objective function should be zero
at the solution point:
∇f (x) = 0
This means that the derivative of the objective function with respect to x should vanish at
the optimum, i.e., the slope of the function is zero at this point.
3. Second-Order Condition: After finding the critical points using the first-order condition, the
second-order condition (Hessian matrix) helps determine whether the point is a minimum,
maximum, or saddle point.
If the Hessian matrix is positive definite, the point is a local minimum.
If the Hessian is negative definite, the point is a local maximum.
If the Hessian is indefinite, the point is a saddle point.
2. Multivariate Optimization:
1. Analytical Methods: If the objective function is smooth and differentiable, you can often find the solution by setting the gradient to zero and solving for the critical points. Example:
min_x f(x) = x² + 3x + 2
The solution is obtained by finding the derivative and setting it equal to zero:
d/dx (x² + 3x + 2) = 2x + 3 = 0  ⇒  x = −3/2
Gradient Descent: Iteratively updates the variables by moving in the direction of the
negative gradient (for minimization).
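For instance, a minimal MATLAB sketch of gradient descent on the quadratic example above (the step size, starting point, and iteration count are illustrative choices, not prescribed by the text):
% Gradient descent on f(x) = x^2 + 3*x + 2 (minimizer x = -1.5)
fprime = @(x) 2*x + 3;        % gradient of the objective
x = 0;                        % starting point (arbitrary)
alpha = 0.1;                  % step size (illustrative)
for k = 1:100
    x = x - alpha*fprime(x);  % step along the negative gradient
end
disp(x)                       % approximately -1.5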
1.4 Applications of Unconstrained Optimization
1. Machine Learning: Training models like linear regression and logistic regression involves
unconstrained optimization of loss functions.
2. Economics and Finance: Many problems, such as maximizing utility or minimizing costs, can
be modeled as unconstrained optimization problems in certain cases.
3. Engineering Design: Finding the optimal design parameters to minimize cost or maximize
efficiency often involves unconstrained optimization.
Since the Hessian is positive definite (its eigenvalues are both positive), the point (2, −3) is a
local minimum.
1.6 Exercise
Example 1: Quadratic Function
Optimize f(x, y) = x² + y².
Solution: 1. First-order partial derivatives:
∂f/∂x = 2x,  ∂f/∂y = 2y
2. Set ∂f/∂x = 0 and ∂f/∂y = 0:
x = 0,  y = 0
Critical point: (0, 0). 3. Hessian matrix:
Hf(x, y) = [2 0; 0 2]
4. Eigenvalues of the Hessian: 2, 2 (both positive). 5. Since both eigenvalues are positive, the
function has a local minimum at (0, 0).
Example 2: Saddle Point Problem: Optimize f(x, y) = x² − y².
Solution: 1. First-order partial derivatives:
∂f/∂x = 2x,  ∂f/∂y = −2y
2. Set ∂f/∂x = 0 and ∂f/∂y = 0:
x = 0,  y = 0
Critical point: (0, 0). 3. Hessian matrix:
Hf(x, y) = [2 0; 0 −2]
4. Eigenvalues of the Hessian: 2, −2 (one positive, one negative). 5. Since the eigenvalues have
different signs, (0, 0) is a saddle point.
Example 3: Elliptic Paraboloid Problem: Optimize f(x, y) = 3x² + 2y².
Solution: 1. First-order partial derivatives:
∂f/∂x = 6x,  ∂f/∂y = 4y
2. Set ∂f/∂x = 0 and ∂f/∂y = 0:
x = 0,  y = 0
Critical point: (0, 0). 3. Hessian matrix:
Hf(x, y) = [6 0; 0 4]
4. Eigenvalues of the Hessian: 6, 4 (both positive). 5. The function has a local minimum at (0, 0).
Example 4: Mixed Function Problem: Optimize f(x, y) = 4x² − 4xy + y².
Solution: 1. First-order partial derivatives:
∂f/∂x = 8x − 4y,  ∂f/∂y = 2y − 4x
2. Set ∂f/∂x = 0 and ∂f/∂y = 0:
x = y = 0
Critical point: (0, 0). 3. Hessian matrix:
Hf(x, y) = [8 −4; −4 2]
4. Eigenvalues of the Hessian: Solve det(H − λI) = 0, yielding λ1 = 10, λ2 = 0. 5. Because one eigenvalue is zero, the Hessian is only positive semidefinite and the second-order test is inconclusive; however, writing f(x, y) = (2x − y)² ≥ 0 = f(0, 0) shows that (0, 0) is in fact a (non-strict) global minimum, attained along the entire line y = 2x. It is not a saddle point.
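A quick MATLAB check of this degenerate Hessian:
H = [8 -4; -4 2];
eig(H)    % returns 0 and 10: positive semidefinite, so the second-order test is inconclusive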
Example 5: Cross-Term Function Problem: Optimize f(x, y) = x² + xy + y².
Solution: 1. First-order partial derivatives:
∂f/∂x = 2x + y,  ∂f/∂y = 2y + x
2. Set ∂f/∂x = 0 and ∂f/∂y = 0:
x = 0,  y = 0
Critical point: (0, 0). 3. Hessian matrix:
Hf(x, y) = [2 1; 1 2]
4. Eigenvalues: λ1 = 3, λ2 = 1 (both positive). 5. The function has a local minimum at (0, 0).
Example 6: Exponential Function Problem: Optimize f(x, y) = e^(x+y).
Solution: 1. First-order partial derivatives:
∂f/∂x = e^(x+y),  ∂f/∂y = e^(x+y)
2. Set ∂f/∂x = 0 and ∂f/∂y = 0: There are no critical points, since e^(x+y) ≠ 0 for any x and y.
Example 7: Rational Function Problem: Optimize f(x, y) = (x² + y²)/(1 + x² + y²).
Solution: 1. First-order partial derivatives:
∂f/∂x = 2x/(1 + x² + y²)²,  ∂f/∂y = 2y/(1 + x² + y²)²
2. Setting both partial derivatives to zero gives the single critical point (0, 0), which is the global minimum with f(0, 0) = 0; since f(x, y) < 1 everywhere, the function attains no maximum.
Exercise: 1 Fitting the Orbit of a Planetoid
(a) We model the orbit by the general conic equation:
x² = ay² + bxy + cx + dy + e
Here, the goal is to determine the coefficients a, b, c, d, and e based on the 10 observations of
(xj , yj ), where j = 1, . . . , 10.
Steps to Solve:
1. Rewrite the equation in terms of a linear system: The given equation x² = ay² + bxy + cx + dy + e is linear in the unknowns a, b, c, d, e.
2. Set up the matrix equation: Let X be the 10 × 5 matrix whose j-th row is [yj², xj yj, xj, yj, 1], and let Xvec be the vector of length 10 with elements xj². We can then write the system as:
X · β = Xvec
where β = (a, b, c, d, e)ᵀ.
3. Use the least squares method: Since we have more equations (10) than unknowns (5), this is an overdetermined system. We can solve it using the method of least squares, which minimizes the residuals:
min_β ‖Xβ − Xvec‖²
The least squares solution is given by:
β = (XᵀX)⁻¹XᵀXvec
In MATLAB:
% Each row of X holds [y_j^2, x_j*y_j, x_j, y_j, 1] (x, y: 10-by-1 observation vectors)
X = [y.^2, x.*y, x, y, ones(10, 1)];
Xvec = x.^2;   % right-hand side: the values x_j^2
% Solve the least squares problem to find the coefficients [a, b, c, d, e]
coefficients = (X'*X) \ (X'*Xvec);
% Display the results
a = coefficients(1); b = coefficients(2); c = coefficients(3);
d = coefficients(4); e = coefficients(5);
fprintf('The coefficients are:\n');
fprintf('a = %f\n', a);
fprintf('b = %f\n', b);
fprintf('c = %f\n', c);
fprintf('d = %f\n', d);
fprintf('e = %f\n', e);
Explanation:
1. Matrix X: Each row of the matrix corresponds to the terms yj², xj yj, xj, yj, and 1, which are derived from the equation xj² = ayj² + bxj yj + cxj + dyj + e. 2. Vector Xvec: This vector contains the values xj² for each observation. 3. Least squares solution: The MATLAB command (X'*X) \ (X'*Xvec) solves the normal equations, giving the least squares estimates of the coefficients a, b, c, d, and e.
Output: The code will output the coefficients a, b, c, d, and e, which define the orbit of the
planetoid based on the observed positions.
Predicting the Orbit: Once you have the coefficients, you can use the equation:
x2 = ay 2 + bxy + cx + dy + e
to predict the planetoid’s future positions in its orbit, or find its next position when it becomes
visible again.
(b) For the parabolic approach, we assume the orbit equation takes the simpler form:
x2 = dy + e
Here, we have only two unknown coefficients, d and e, and we aim to determine these based on
the observed data.
Steps to Solve:
1. Rewrite the equation: For each observation (xj, yj), the equation becomes:
xj² = dyj + e
This can be rewritten as a linear system:
xj² = [yj 1] [d; e]
2. Set up the matrix equation: Let us define: X, a matrix of size 10 × 2, where each row corresponds to the terms yj and 1 for each observation j; and Xvec, a vector of length 10 with elements xj².
Therefore, we need to solve the linear system:
X · β = Xvec
where β = (d, e)ᵀ.
3. Use the least squares method: Since we have more equations (10) than unknowns (2), this is an overdetermined system. We can solve it using the least squares method, which minimizes the residuals:
min_β ‖Xβ − Xvec‖²
The least squares solution is given by:
β = (XᵀX)⁻¹XᵀXvec
Once d and e are found, we can use the equation:
x² = dy + e
to predict the planetoid's future positions in its orbit based on the y-coordinate, or to find its next position when it becomes visible again.
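A minimal MATLAB sketch for part (b), mirroring the code from part (a) (again assuming x and y are 10-by-1 column vectors of observed positions):
X = [y, ones(10, 1)];          % rows [y_j, 1]
Xvec = x.^2;                   % right-hand side: the values x_j^2
beta = (X'*X) \ (X'*Xvec);     % beta = [d; e]
d = beta(1); e = beta(2);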
Exercise: 2 Gauss Normal Equation (in the context of linear regression)
The Gauss normal equation is a key result used to solve the least squares problem in linear
regression. In linear regression, the objective is to find the best-fit line or plane (in higher dimen-
sions) to a set of data points by minimizing the sum of the squared differences between the observed
and predicted values.
Proof of Gauss Normal Equation
The objective function is the sum of squared residuals, i.e.,
min_x ‖Ax − b‖² = min_x (Ax − b)ᵀ(Ax − b)
Expanding this expression:
min_x bᵀb − 2bᵀAx + xᵀAᵀAx
3. Differentiate with respect to x: To minimize the objective function, we take the derivative with respect to x and set it equal to zero:
∂/∂x (bᵀb − 2bᵀAx + xᵀAᵀAx) = 0
The derivative of each term is as follows: - The derivative of bᵀb with respect to x is zero (since it does not depend on x). - The derivative of −2bᵀAx with respect to x is −2Aᵀb. - The derivative of xᵀAᵀAx with respect to x is 2AᵀAx (using matrix calculus rules for quadratic forms).
Therefore, the derivative of the objective function is:
−2Aᵀb + 2AᵀAx = 0
4. Simplify the equation: Dividing through by 2, we get:
AᵀAx = Aᵀb
This is the Gauss normal equation.
5. Solving for x: If AᵀA is invertible, the solution for x is:
x = (AᵀA)⁻¹Aᵀb
This is the least squares solution for the regression coefficients.
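As a quick numerical illustration (the data below are made up for the sketch), the normal-equation solution matches MATLAB's built-in least squares solver whenever A has full column rank:
A = [1 1; 1 2; 1 3; 1 4];      % hypothetical design matrix
b = [2; 3; 5; 6];              % hypothetical observations
x_ne = (A'*A) \ (A'*b);        % solution via the Gauss normal equation
x_ls = A \ b;                  % MATLAB's built-in least squares (QR-based)
disp([x_ne, x_ls])             % the two columns agree up to rounding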
Exercise: 3 First we understand the L1-Approximation Problem
The L1-approximation problem is about finding a solution x that minimizes the L1-norm of the residuals ‖Ax − b‖₁, where A is a matrix and b is a vector. The geometric interpretation of the L1-norm minimization can be visualized as minimizing the sum of the absolute values of the residuals, which corresponds to minimizing the distance between the point x and some data points in a piecewise linear manner.
The convex hull of a set of points is the smallest convex set that contains all the points.
Geometrically, the convex hull of four points in R2 will be a polygon whose vertices are those four
points if they do not lie on a straight line.
The points (3, 2), (2, 1), (3/2, 3/2), and (1, 3) form a polygon, and the convex hull is the smallest convex polygon containing these points. The shape of the polygon is a quadrilateral in this case.
We want to prove that the convex hull of the given points provides solutions to the L1 -
approximation problem. To do this, we need to check whether the optimal solution for the L1 -norm
minimization corresponds to one of the points in the convex hull, or a convex combination of these
points.
In the L1 -norm, the residuals are given by the absolute differences between the predicted values
and the true values. Minimizing the L1 -norm leads to a solution that is often a vertex of the convex
hull of the data points, because the L1 -norm geometry tends to place the optimal solution on a
vertex or along the edges of the convex polygon defined by the data points.
Verifying the Convex Hull as the Solution Set
The set of points (3, 2), (2, 1), (3/2, 3/2), and (1, 3) forms a convex polygon in R². We can express any point inside the convex hull as a convex combination of the four vertices:
x = λ1(3, 2) + λ2(2, 1) + λ3(3/2, 3/2) + λ4(1, 3)
where λ1 + λ2 + λ3 + λ4 = 1 and λi ≥ 0 for all i.
For the L1-norm minimization problem, solutions often lie at the vertices of the convex hull, which correspond to the points where the objective function (sum of absolute deviations) is minimized. In this case, the points (3, 2), (2, 1), (3/2, 3/2), and (1, 3) are candidates for solutions, as they are the vertices of the convex hull.
Are There Any Other Solutions?
Because the objective of an L1-norm minimization problem is piecewise linear, its set of minimizers is convex: if two vertices of the hull both attain the optimum, so does every point on the edge between them, and more generally any convex combination of optimal points. In the typical case, however, the optimum is attained at a single vertex, so the vertices of the convex hull are the primary candidates for the optimal solution.
Conclusion
The convex hull of the points (3, 2), (2, 1), (3/2, 3/2), and (1, 3) provides solutions to the L1-approximation problem because the vertices of the convex hull are likely to be optimal solutions to the problem. Other solutions might include points inside the convex hull, which are convex combinations of these points, but the primary solutions are at the vertices of the convex polygon.
Example: 4
Let’s break the problem into two parts:
Part 1: Minimization using the method of alternate directions
We are tasked with minimizing the function:
f(x1, x2) = x1² − x1x2 − 3x1 + x2²
by the method of alternate directions (coordinate descent).
Step 1: Minimize with respect to x1 (fixing x2)
Taking the derivative with respect to x1:
∂f/∂x1 = 2x1 − x2 − 3
Set the derivative equal to zero to find the minimum:
2x1 − x2 − 3 = 0  ⇒  x1 = (x2 + 3)/2
This gives us the value of x1 that minimizes f, given x2.
Step 2: Minimize with respect to x2 (fixing x1 )
Now fix x1 and minimize f (x1 , x2 ) as a function of x2 .
f(x1, x2) = x1² − x1x2 − 3x1 + x2²
Taking the derivative with respect to x2:
∂f/∂x2 = 2x2 − x1
Set the derivative equal to zero to find the minimum:
2x2 − x1 = 0  ⇒  x2 = x1/2
This gives us the value of x2 that minimizes f , given x1 .
Step 3: Iterative process
Start with the initial point x^(0) = (0, 0)ᵀ.
1. Iteration 1: Fix x2 = 0 and minimize with respect to x1: x1^(1) = (0 + 3)/2 = 1.5. Now fix x1 = 1.5 and minimize with respect to x2: x2^(1) = 1.5/2 = 0.75.
2. Iteration 2: Fix x2 = 0.75 and minimize with respect to x1: x1^(2) = (0.75 + 3)/2 = 1.875. Now fix x1 = 1.875 and minimize with respect to x2: x2^(2) = 1.875/2 = 0.9375.
3. Iteration 3: Fix x2 = 0.9375 and minimize with respect to x1: x1^(3) = (0.9375 + 3)/2 = 1.96875. Now fix x1 = 1.96875 and minimize with respect to x2: x2^(3) = 1.96875/2 = 0.984375.
Through repeated iterations, this process converges to the solution x1 = 2, x2 = 1, which is the minimizer x∗ = (2, 1)ᵀ, as claimed.
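A compact MATLAB sketch of this alternate-directions iteration (the sweep count is an arbitrary illustrative choice):
x = [0; 0];                    % initial point x^(0)
for k = 1:20
    x(1) = (x(2) + 3) / 2;     % exact minimization over x1 with x2 fixed
    x(2) = x(1) / 2;           % exact minimization over x2 with x1 fixed
end
disp(x')                       % converges to [2 1]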
—
Part 2: Minimizing f (x) = max{|x1 + 2x2 − 7|, |2x1 + x2 − 5|}
The new function to minimize is:
f(x1, x2) = max{|x1 + 2x2 − 7|, |2x1 + x2 − 5|}
Step 1: Fix x2 = 0 and minimize with respect to x1
The function becomes:
f(x1, 0) = max{|x1 − 7|, |2x1 − 5|}
Set the two expressions equal to each other to minimize:
|x1 − 7| = |2x1 − 5|
Solving this gives two cases: 1. x1 − 7 = 2x1 − 5 ⇒ x1 = −2. 2. x1 − 7 = −(2x1 − 5) ⇒ x1 = 4.
The two expressions are equal at both points, but the common value is smaller at x1 = 4 (both equal 3) than at x1 = −2 (both equal 9), so the minimum occurs at x1 = 4.
Step 2: Fix x1 = 4 and minimize with respect to x2
The function becomes:
f(4, x2) = max{|2x2 − 3|, |x2 + 3|}
Set the two expressions equal to each other to minimize:
|2x2 − 3| = |x2 + 3|
Solving this gives two cases: 1. 2x2 − 3 = x2 + 3 ⇒ x2 = 6. 2. 2x2 − 3 = −(x2 + 3) ⇒ x2 = 0.
The two expressions are equal at both points, but the common value is smaller at x2 = 0 (both equal 3) than at x2 = 6 (both equal 9), so the minimum occurs at x2 = 0.
—
Why don't we get the minimizer x∗ = (1, 3)ᵀ with f(x∗) = 0?
The reason we do not reach the minimizer x∗ = (1, 3)ᵀ with f(x∗) = 0 using this method is that the method of alternate directions may not always work well with non-differentiable or nonsmooth functions like this one.
The function f (x1 , x2 ) = max{|x1 + 2x2 − 7|, |2x1 + x2 − 5|} involves taking the maximum of
absolute value terms, which introduces non-smoothness and kinks in the function. The method
of alternate directions works well for smooth, differentiable functions, but it can struggle with
functions that have sharp changes or discontinuities in their gradient, like this one. Thus, the
iterative process does not converge to the true minimizer x∗ = (1, 3)ᵀ.
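A quick MATLAB check comparing the point where the iteration stalls with the true minimizer:
f = @(x) max(abs(x(1) + 2*x(2) - 7), abs(2*x(1) + x(2) - 5));
f([4; 0])    % = 3, where the alternate-directions iteration stalls
f([1; 3])    % = 0, the true minimizer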
2 Penalty Methods
The penalty method is an optimization technique used to solve constrained optimization problems
by converting them into unconstrained optimization problems. The general idea is to add a penalty
term to the objective function, which discourages violations of the constraints by penalizing the
objective function for constraint violations.
Here are a few examples of how the penalty method is applied in optimization:
2.2 Quadratic Penalty Method (Inequality Constraints)
Consider the optimization problem:
As µ approaches zero, any violation of the constraint causes the penalty term to grow very
large, driving the solution towards the feasible region.
Here, gi (x) < 0 ensures that the argument of the logarithm is positive. As x approaches the
boundary gi (x) = 0, the penalty term becomes very large, discouraging solutions close to the
boundary.
2.6 Logarithmic Penalty for Optimization Problems with Inequality Constraints
Given an optimization problem with inequality constraints:
Here, the penalty function grows large when the solution x violates the bounds l ≤ x ≤ u.
P(x, µ) = f(x) + µ ( Σj max(0, gj(x))² + Σi hi(x)² )
This penalizes violations of both inequality and equality constraints, driving the solution towards feasibility.
Conclusion:
The penalty method is a powerful tool in optimization for handling constraints. By adding
penalty terms to the objective function, constraint violations can be incorporated into the op-
timization process. Different types of penalty methods (quadratic, barrier, and augmented La-
grangian) allow for flexibility in handling various types of constraints (equality, inequality, and box
constraints).
Example 1: Quadratic Penalty Method for an Equality Constraint
Problem: Minimize the objective function:
f(x) = (x1 − 2)² + (x2 − 3)²
subject to the constraint:
g(x) = x1 + x2 − 5 = 0
Solution:
1. Set up the penalty function: The penalty method transforms the constrained problem into
an unconstrained problem by adding a penalty term proportional to the square of the constraint
violation. The penalty function is:
P(x, µ) = f(x) + (µ/2) g(x)²
So the penalty function becomes:
P(x, µ) = (x1 − 2)² + (x2 − 3)² + (µ/2)(x1 + x2 − 5)²
2. Choose an initial value for µ: Let µ = 10 (a typical starting value for the penalty parameter).
3. Minimize the penalty function: Now, we minimize P (x, µ) with respect to x1 and x2 . Taking
the partial derivatives and solving:
∂P/∂x1 = 2(x1 − 2) + 10(x1 + x2 − 5) = 0
∂P/∂x2 = 2(x2 − 3) + 10(x1 + x2 − 5) = 0
Subtracting the second equation from the first gives 2(x1 − 2) − 2(x2 − 3) = 0, i.e. x2 = x1 + 1; substituting back, we get:
x1 = 2,  x2 = 3
4. Check constraint satisfaction: g(x) = x1 + x2 − 5 = 2 + 3 − 5 = 0, so the constraint is satisfied exactly (here the unconstrained minimizer of f happens to lie on the constraint line, so the penalty term vanishes at the solution).
Final solution: The optimal solution is x1 = 2, x2 = 3, and the minimum value of the objective function is f(2, 3) = 0.
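A minimal MATLAB check of this penalty minimization (fminsearch is used here as one convenient unconstrained solver; the starting point is arbitrary):
mu = 10;
P = @(x) (x(1) - 2)^2 + (x(2) - 3)^2 + mu/2*(x(1) + x(2) - 5)^2;
x = fminsearch(P, [0 0])    % returns approximately [2 3]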
Example 2: Quadratic Penalty Method for Inequality Constraints
Problem: Minimize the objective function:
x1 = 1.196, x2 = 1.961
4. Check constraint satisfaction: g(x) = x1² + x2² − 4 ≈ 0 at the reported point, so the inequality constraint is (approximately) active and satisfied.
Final solution: The optimal solution is x1 = 1.196, x2 = 1.961, and the minimum value of the
objective function is approximately f (1.196, 1.961) ≈ 1.005.
Example 3: Logarithmic Barrier Method for Inequality Constraints
Problem: Minimize the objective function:
g(x) = x1 + x2 − 1 ≤ 0
Solution:
1. Set up the barrier function: In the logarithmic barrier method, the penalty term grows without bound as the boundary of the inequality constraint is approached from inside (and is undefined when the constraint is violated). The barrier function is:
P(x, µ) = f(x) − (1/µ) log(−g(x))
For this problem, the penalty function becomes:
P(x, µ) = x1² + x2² − (1/µ) log(1 − (x1 + x2))
2. Choose an initial value for µ: Let µ = 0.1.
3. Minimize the penalty function: We minimize P(x, µ) using numerical methods, starting from a strictly feasible point. Note that the unconstrained minimizer of f, namely (0, 0), already satisfies g(0, 0) = −1 < 0, so the constraint is inactive at the optimum; as the barrier weight 1/µ is driven towards zero, the minimizers of P(x, µ) converge to:
x1 = 0,  x2 = 0
4. Check constraint satisfaction: g(x) = x1 + x2 − 1 = 0 + 0 − 1 = −1 < 0, so the inequality constraint is satisfied strictly (the barrier keeps all iterates in the interior of the feasible region).
Final solution: The optimal solution is x1 = 0, x2 = 0, and the minimum value of the objective function is f(0, 0) = 0.
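A MATLAB sketch of one barrier minimization; the max(..., realmin) guard is an implementation device (not part of the method itself) that keeps the logarithm's argument positive if the solver probes an infeasible point, and the small weight stands in for a late stage of the barrier schedule:
w = 0.01;   % barrier weight, playing the role of 1/mu near convergence
P = @(x) x(1)^2 + x(2)^2 - w*log(max(1 - x(1) - x(2), realmin));
x = fminsearch(P, [-0.5 -0.5])   % close to [0 0], strictly inside the region g < 0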
Example 4: Augmented Lagrangian Method
Problem: Minimize the objective function:
f(x) = x1² + x2²
subject to the constraint:
g(x) = x1 + x2 − 2 = 0
Solution:
1. Set up the augmented Lagrangian: The augmented Lagrangian function is:
L(x, λ, µ) = f(x) + λ g(x) + (µ/2) g(x)²
For this problem, the augmented Lagrangian becomes:
L(x, λ, µ) = x1² + x2² + λ(x1 + x2 − 2) + (µ/2)(x1 + x2 − 2)²
2. Choose initial values for λ and µ: Let λ = 1 and µ = 10.
3. Minimize the augmented Lagrangian and update the multiplier: We minimize L(x, λ, µ) with respect to x1 and x2, update λ ← λ + µ g(x), and repeat. The iterates converge to:
x1 = 1,  x2 = 1
4. Check constraint satisfaction: g(x) = x1 + x2 − 2 = 1 + 1 − 2 = 0, so the constraint is satisfied.
Final solution: The optimal solution is x1 = 1, x2 = 1, and the minimum value of the objective function is f(1, 1) = 2.
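A MATLAB sketch of the augmented Lagrangian loop, including the standard multiplier update (the iteration count is illustrative):
mu = 10; lambda = 1;
g = @(x) x(1) + x(2) - 2;
x = [0 0];
for k = 1:10
    L = @(x) x(1)^2 + x(2)^2 + lambda*g(x) + mu/2*g(x)^2;
    x = fminsearch(L, x);          % minimize L for the current multiplier
    lambda = lambda + mu*g(x);     % multiplier update
end
disp(x)                            % approaches [1 1]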
Summary: - Quadratic Penalty Method: Adds a squared penalty term for constraint violations.
- Logarithmic Barrier Method: Adds a logarithmic penalty term to discourage constraint violations,
particularly near the boundary. - Augmented Lagrangian Method: Combines Lagrange multipliers
with a quadratic penalty term, improving stability and convergence.
These methods convert constrained problems into easier-to-solve unconstrained problems by
penalizing constraint violations, ultimately leading to the optimal solution.
2.12 Lagrangian Multipliers for Optimality Conditions
Let’s go through a couple of numerical examples to see how to apply the Lagrange Multiplier Rule
to find the necessary optimality conditions for constrained optimization problems.
Example 1: Maximizing a function subject to a constraint
Problem Maximize the function:
f (x, y) = x2 + y 2
subject to the constraint:
g(x, y) = x + y − 1 = 0
Step 1: Define the Lagrangian We introduce the Lagrange multiplier λ for the constraint and
form the Lagrangian:
L(x, y, λ) = x2 + y 2 + λ(x + y − 1)
Step 2: Compute partial derivatives We now compute the partial derivatives of the Lagrangian
with respect to x, y, and λ:
∂L/∂x = 2x + λ = 0,  ∂L/∂y = 2y + λ = 0,  ∂L/∂λ = x + y − 1 = 0
Step 3: Solve the system of equations We now solve the following system of equations: 1.
2x + λ = 0 2. 2y + λ = 0 3. x + y = 1
From the first two equations, we have 2x = −λ = 2y. Thus, x = y.
Substitute x = y into the third equation:
x + x = 1  ⇒  2x = 1  ⇒  x = 1/2
Since x = y, we also have y = 1/2.
Step 4: Solution The stationary point of f(x, y) = x² + y² subject to the constraint x + y = 1 is (x, y) = (1/2, 1/2). Note that, although the problem asks for a maximum, this point is in fact the minimum of f on the constraint line; f is unbounded above along the line, so no maximum exists.
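A one-line MATLAB check obtained by eliminating the constraint (substituting y = 1 − x):
phi = @(x) x.^2 + (1 - x).^2;     % f restricted to the line x + y = 1
xstar = fminbnd(phi, -10, 10)     % returns 0.5, i.e. (x, y) = (1/2, 1/2)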
Example 2: Minimizing a function subject to two constraints
Problem Minimize the function:
f(x, y, z) = x² + y² + z²
subject to the constraints:
g1(x, y, z) = x + y + z − 1 = 0 and g2(x, y, z) = x − y = 0
Step 1: Define the Lagrangian We introduce two Lagrange multipliers, λ1 and λ2 , for the
constraints and form the Lagrangian:
L(x, y, z, λ1, λ2) = x² + y² + z² + λ1(x + y + z − 1) + λ2(x − y)
Step 2: Compute partial derivatives We now compute the partial derivatives of the Lagrangian
with respect to x, y, z, λ1 , and λ2 :
∂L/∂x = 2x + λ1 + λ2 = 0,  ∂L/∂y = 2y + λ1 − λ2 = 0,  ∂L/∂z = 2z + λ1 = 0,
∂L/∂λ1 = x + y + z − 1 = 0,  ∂L/∂λ2 = x − y = 0
Step 3: Solve the system of equations We now solve the following system of equations: 1.
2x + λ1 + λ2 = 0 2. 2y + λ1 − λ2 = 0 3. 2z + λ1 = 0 4. x + y + z = 1 5. x − y = 0
From equation 5, we know x = y. Substituting x = y into equations 1 and 2 gives:
2x + λ1 + λ2 = 0 and 2x + λ1 − λ2 = 0
Adding these equations:
4x + 2λ1 = 0  ⇒  λ1 = −2x  (and subtracting them gives λ2 = 0)
Substituting λ1 = −2x into equation 3:
2z − 2x = 0  ⇒  z = x
With x = y = z, equation 4 gives 3x = 1, so x = y = z = 1/3, and the minimum value is f(1/3, 1/3, 1/3) = 1/3.
Example 3: Optimizing a linear function on the unit circle
Problem Optimize f(x, y) = x + y subject to the constraint x² + y² = 1.
Forming the Lagrangian L(x, y, λ) = x + y + λ(x² + y² − 1) and setting its partial derivatives to zero gives 1 + 2λx = 0 and 1 + 2λy = 0, so x = y, and the constraint then yields x = ±1/√2.
Thus, y = ±1/√2.
Step 4: Solution The two critical points are:
(x, y) = (1/√2, 1/√2) and (x, y) = (−1/√2, −1/√2)
Evaluating f(x, y) = x + y at these points: at (1/√2, 1/√2), f(x, y) = √2; at (−1/√2, −1/√2), f(x, y) = −√2.
Thus, the maximum value is √2, and the minimum value is −√2.
Example 4: Minimizing a quadratic function with two constraints
Problem Minimize the function:
f(x, y) = x² + y²
subject to the constraints:
g1 (x, y) = x + y − 2 = 0
and
g2 (x, y) = x − y − 1 = 0
Step 1: Define the Lagrangian We introduce two Lagrange multipliers, λ1 and λ2 , for the
constraints and form the Lagrangian:
L(x, y, λ1, λ2) = x² + y² + λ1(x + y − 2) + λ2(x − y − 1)
Step 2: Compute partial derivatives We compute the partial derivatives of the Lagrangian with
respect to x, y, λ1 , and λ2 :
∂L/∂x = 2x + λ1 + λ2 = 0,  ∂L/∂y = 2y + λ1 − λ2 = 0,  ∂L/∂λ1 = x + y − 2 = 0,  ∂L/∂λ2 = x − y − 1 = 0
Step 3: Solve the system of equations We now solve the following system of equations: 1.
2x + λ1 + λ2 = 0 2. 2y + λ1 − λ2 = 0 3. x + y = 2 4. x − y = 1
From equations 3 and 4, we can solve for x and y: Adding the two equations: (x + y) + (x − y) = 2 + 1 ⇒ 2x = 3 ⇒ x = 3/2. Substituting x = 3/2 into x + y = 2: 3/2 + y = 2 ⇒ y = 1/2.
Now substitute x = 3/2 and y = 1/2 into the system:
1. 2(3/2) + λ1 + λ2 = 0  ⇒  3 + λ1 + λ2 = 0
2. 2(1/2) + λ1 − λ2 = 0  ⇒  1 + λ1 − λ2 = 0
From equation 2, solve for λ1:
λ1 = λ2 − 1
Substitute into equation 1:
3 + (λ2 − 1) + λ2 = 0  ⇒  2λ2 = −2  ⇒  λ2 = −1
Substitute λ2 = −1 into λ1 = λ2 − 1:
λ1 = −1 − 1 = −2
Step 4: Solution The critical point is (3/2, 1/2). The minimum value of f(x, y) = x² + y² at this point is:
f(3/2, 1/2) = (3/2)² + (1/2)² = 9/4 + 1/4 = 10/4 = 2.5
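Since the two linear constraints alone pin down the point, MATLAB confirms it directly:
xy = [1 1; 1 -1] \ [2; 1]      % solves x + y = 2, x - y = 1: returns [1.5; 0.5]
f = xy(1)^2 + xy(2)^2          % 2.5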
Problem Minimize the function:
f(x, y) = x² + 4y²
3 Optimality Conditions
Let us find all three cones (cone of descent directions, cone of linearization, and cone of feasible
directions) for some problems at a specific point (x0 , y0 ).
Example: 1
Objective:
f(x, y) = (x − 1)² + (y − 2)²
Constraint:
g(x, y) = x + 2y − 4 ≤ 0
We analyze the cones at (x0 , y0 ) = (0, 0).
1. Cone of Descent Direction: The cone of descent direction is formed by all directions d =
(dx , dy ) that satisfy:
∇f (x0 , y0 ) · d < 0
Gradient of f(x, y):
∇f(x, y) = [2(x − 1), 2(y − 2)]
At (x0, y0) = (0, 0):
∇f(0, 0) = [−2, −4]
Direction d = (dx, dy) must satisfy:
∇f(0, 0) · d = −2dx − 4dy < 0
or equivalently:
dx + 2dy > 0
Cone of Descent Directions: The cone of descent directions is all d = (dx , dy ) such that dx +2dy >
0. This is a half-space where the direction vector points away from the line dx + 2dy = 0.
—
2. Cone of Linearization: The cone of linearization is determined by the linearized form of the
constraint:
∇g(x0 , y0 ) · d ≤ −g(x0 , y0 )
Gradient of g(x, y):
∇g(x, y) = [1, 2]
At (x0, y0) = (0, 0):
∇g(0, 0) = [1, 2]
At (0, 0):
g(0, 0) = 0 + 0 − 4 = −4,
Linearized constraint:
1 · dx + 2 · dy ≤ 4
Cone of Linearization: The cone of linearization is all d = (dx , dy ) such that dx + 2dy ≤ 4. This
is another half-space, bounded by the line dx + 2dy = 4, and includes the region below or on the
line.
3. Cone of Feasible Directions: A feasible direction satisfies both the linearized constraint and lies in the feasible region. At a boundary point, where g(x0, y0) = 0, a feasible direction must also satisfy:
∇g(x0, y0) · d ≤ 0
At (x0, y0) = (0, 0): The point (0, 0) is not on the boundary (g(0, 0) = −4 < 0, strictly feasible), so the constraint imposes no local restriction: every direction d ∈ R² is feasible for sufficiently small step lengths.
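A small MATLAB check of the cone conditions at (0, 0) for one sample direction:
d = [1; 1];                       % candidate direction
gradf = [-2; -4]; gradg = [1; 2]; g0 = -4;
isDescent = gradf'*d < 0          % true: dx + 2*dy = 3 > 0
inLinCone = gradg'*d <= -g0       % true: dx + 2*dy = 3 <= 4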
Example: 2
Objective:
f(x, y) = x² + y²
Constraint:
g(x, y) = 2x + y − 5 ≤ 0
Point: (x0 , y0 ) = (1, 2)
1. Cone of Feasible Directions:
The feasible direction satisfies:
∇g(x0 , y0 ) · d ≤ 0
Gradient of g:
∇g(x, y) = [2, 1]
At (1, 2), the condition is:
2dx + dy ≤ 0
2. Linearizing Cone:
The linearizing cone includes the feasibility condition and adjusts for the constraint value:
∇g(x0 , y0 ) · d ≤ −g(x0 , y0 )
At (1, 2):
g(1, 2) = 2 + 2 − 5 = −1,
2dx + dy ≤ 1
3. Cone of Descent Directions:
Descent direction satisfies:
∇f (x0 , y0 ) · d < 0
Gradient of f :
∇f (x, y) = [2x, 2y]
At (1, 2):
∇f (1, 2) = [2, 4]
Condition for descent direction:
2dx + 4dy < 0
Example: 3
Objective:
f(x, y) = (x − 3)² + (y − 4)²
Constraints:
g1(x, y) = x − y − 1 ≤ 0,  g2(x, y) = 2x + y − 6 ≤ 0
Point: (x0, y0) = (2, 3)
1. Cone of Feasible Directions:
Feasible directions satisfy:
∇g1(x0, y0) · d ≤ 0 and ∇g2(x0, y0) · d ≤ 0
Gradients:
∇g1 (x, y) = [1, −1], ∇g2 (x, y) = [2, 1]
Conditions:
dx − dy ≤ 0, 2dx + dy ≤ 0
2. Linearizing Cone: Add the constraint values:
At (2, 3):
g1 (2, 3) = 2 − 3 − 1 = −2, g2 (2, 3) = 2(2) + 3 − 6 = 1
Conditions:
dx − dy ≤ 2, 2dx + dy ≤ −1
3. Cone of Descent Directions: Descent direction satisfies:
∇f (x0 , y0 ) · d < 0
Gradient of f :
∇f (x, y) = [2(x − 3), 2(y − 4)]
At (2, 3):
∇f (2, 3) = [−2, −2]
Condition:
−2dx − 2dy < 0
Example: 4
Objective:
f(x, y) = sin(x) + y²
Constraint:
g(x, y) = x² + y² − 4 ≤ 0
Point: (x0, y0) = (1, 1)
1. Cone of Feasible Directions:
Feasible directions satisfy ∇g(x0, y0) · d ≤ 0. With ∇g(x, y) = [2x, 2y], at (1, 1) we have ∇g(1, 1) = [2, 2], so the condition is:
2dx + 2dy ≤ 0
2. Linearizing Cone:
Add the constraint value:
∇g(x0 , y0 ) · d ≤ −g(x0 , y0 )
At (1, 1):
g(1, 1) = 1² + 1² − 4 = −2
Condition:
2dx + 2dy ≤ 2
3. Cone of Descent Directions:
Descent direction satisfies:
∇f (x0 , y0 ) · d < 0
Gradient of f :
∇f (x, y) = [cos(x), 2y]
At (1, 1):
∇f (1, 1) = [cos(1), 2]
Condition:
cos(1)dx + 2dy < 0
In convex optimization problems, the Karush-Kuhn-Tucker (KKT) conditions are typically both necessary and sufficient for optimality, given certain regularity conditions. However, there are cases where these regularity conditions fail, and the KKT conditions may not be necessary, even for convex problems. Below are examples illustrating such situations:
Example 1: Non-Smooth Constraint Consider the convex optimization problem of minimizing a convex, non-differentiable f(x) subject to x² ≤ 0.
Analysis:
The constraint x² ≤ 0 implies x = 0, making x = 0 the only feasible point and hence the global minimum.
However, f(x) is not differentiable at x = 0, and this violates the regularity conditions required for the KKT conditions to apply. Therefore, the KKT conditions cannot characterize the solution, even though the problem is convex.
Example 2: Failure of a Constraint Qualification
Analysis:
The problem is convex because both f(x) and g(x) are convex.
The gradient of g(x), ∇g(x) = −1, does not satisfy the regularity condition known as the Mangasarian-Fromovitz Constraint Qualification (MFCQ). Specifically, there is no direction d such that ∇g(x)d < 0 for x = 0.
Since MFCQ fails, the KKT conditions are not guaranteed to hold, even though the problem
is convex.
Example 3: Semi-Infinite Constraints Consider:
Analysis:
The constraint g(x) = max(0, −x) = 0 essentially forces x ≥ 0.
However, the non-differentiable nature of g(x) (due to the "max" function) makes it challenging to define a valid Lagrange multiplier. The KKT conditions do not apply because the regularity conditions (like differentiability or constraint qualifications) are violated.
Example 4: Non-Interior Feasible Point (Slater's Condition Violation) Consider the problem:
Analysis:
The objective function f(x, y) = x² + y² is convex, and the constraint g(x, y) = y − |x| defines a feasible region bounded by y ≤ |x|.
The optimal solution is at (x, y) = (0, 0), which lies on the boundary of the feasible region.
Slater’s condition fails because there is no strictly interior point satisfying g(x, y) < 0 (the
feasible region touches the origin at the boundary). Consequently, KKT conditions may not
apply.
Example 5: Non-Differentiable Objective Function
Analysis:
The function f (x) = |x| is convex but not differentiable at x = 0.
The lack of differentiability at x = 0 prevents the application of KKT conditions, even though
the problem is convex.
Example 6: Inconsistent Constraints
Analysis:
The objective f(x, y) = x² + y² is convex.
The constraints x ≥ 0, y ≥ 0, and x + y ≤ −1 are convex, but the feasible set is empty (the
constraints are inconsistent).
Since there are no feasible points, the KKT conditions cannot apply.
Analysis:
The feasible region is degenerate: it is a single point at (x, y) = (0, 0), which is trivially the
optimal solution.
The constraint qualifications (e.g., MFCQ or Slater’s condition) fail because the gradient of
the constraints cannot span the full space of variables.
The KKT conditions are not applicable, even though the problem is convex.
Analysis:
The objective f(x) = ∫₀¹ x(t)² dt is convex, and the constraint g(x) = ∫₀¹ x(t) dt − 1 ≤ 0 is also convex.
Analysis:
The problem is convex, but the two equality constraints x1 + x2 = 1 and x1 − x2 = 1 are
linearly dependent, leading to a violation of the regularity conditions.
These examples demonstrate that while convexity simplifies optimization problems, regularity
conditions such as Slater’s condition, differentiability, and linear independence of constraints play
crucial roles in ensuring the applicability of KKT conditions. When these conditions are violated,
even convex problems may lack necessary KKT characterizations.