
Optimization Theory

Muhammad Abid Dar

December 16, 2024

1 Unconstrained Optimization Problems


An “unconstrained optimization problem” is a type of optimization problem in which the objective
is to find the maximum or minimum of a function without any restrictions or constraints on the
variables. In other words, there are no specific conditions (like inequalities or equalities) imposed
on the variables, meaning they can take any values within their domain.
General Form
An unconstrained optimization problem can be formulated as:

   min_x f(x)    or    max_x f(x)

where:

• f(x) is the objective function to be minimized (or maximized).

• x is the vector of decision variables.

In this problem, we are interested in finding the value(s) of x that either minimize or maximize
f (x), depending on the problem at hand.

1.1 Key Features of Unconstrained Optimization


1. No Constraints: There are no additional conditions or limits on the values of x. Unlike
constrained problems, where variables must satisfy certain conditions (such as x ≥ 0 or
g(x) = 0), unconstrained problems allow the variables to vary freely.

2. First-Order Condition (Optimality Condition): To find the optimal point (minimum or max-
imum), the first-order condition is that the gradient of the objective function should be zero
at the solution point:
∇f (x) = 0
This means that the derivative of the objective function with respect to x should vanish at
the optimum, i.e., the slope of the function is zero at this point.

3. Second-Order Condition: After finding the critical points using the first-order condition, the
second-order condition (Hessian matrix) helps determine whether the point is a minimum,
maximum, or saddle point.

• If the Hessian matrix is positive definite, the point is a local minimum.
• If the Hessian is negative definite, the point is a local maximum.
• If the Hessian is indefinite, the point is a saddle point.
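A minimal MATLAB sketch of this classification, assuming a hypothetical Hessian H already evaluated at the critical point:

% Classify a critical point from the eigenvalues of its Hessian.
H = [2 0; 0 -2];       % example Hessian (illustrative values)
ev = eig(H);           % eigenvalues of the symmetric matrix H
if all(ev > 0)
    disp('local minimum');
elseif all(ev < 0)
    disp('local maximum');
elseif any(ev > 0) && any(ev < 0)
    disp('saddle point');
else
    disp('inconclusive (some eigenvalue is zero)');
end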

1.2 Types of Unconstrained Optimization Problems


1. Univariate Optimization:

• Involves optimizing a function of a single variable.

• Example:

   min_x f(x) = x² + 4x + 5

2. Multivariate Optimization:

• Involves optimizing a function of multiple variables.

• Example:

   min_{x,y} f(x, y) = x² + y² + 4x − 6y

1.3 Solving Unconstrained Optimization Problems


There are several methods to solve unconstrained optimization problems depending on the
complexity of the function:

1. Analytical Methods: If the objective function is smooth and differentiable, you can often
find the solution by setting the gradient to zero and solving for the critical points. Example:

   min_x f(x) = x² + 3x + 2

The solution is obtained by finding the derivative and setting it equal to zero:

   d/dx (x² + 3x + 2) = 2x + 3 = 0  ⇒  x = −3/2

2. Numerical Methods: For more complex or non-analytical functions, numerical methods
such as gradient descent, Newton's method, or quasi-Newton methods (e.g., BFGS) are used.
These methods iteratively find the minimum or maximum by following the gradient of the
function.

• Gradient Descent: Iteratively updates the variables by moving in the direction of the
  negative gradient (for minimization):

     x_{k+1} = x_k − α ∇f(x_k)

  where α is the step size.

• Newton's Method: Uses both the gradient and the Hessian matrix to update the variables:

     x_{k+1} = x_k − [Hf(x_k)]⁻¹ ∇f(x_k)

  where Hf(x_k) is the Hessian matrix. A short sketch of both updates follows below.
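A minimal MATLAB sketch of both updates, applied to the multivariate example f(x, y) = x² + y² + 4x − 6y from Section 1.2 (the step size α = 0.1 and the iteration count are illustrative choices):

% Gradient descent and Newton's method on a simple quadratic.
grad = @(v) [2*v(1) + 4; 2*v(2) - 6];   % gradient of f
H = [2 0; 0 2];                         % Hessian (constant for this quadratic)

x = [0; 0]; alpha = 0.1;                % gradient descent
for k = 1:100
    x = x - alpha * grad(x);            % x_{k+1} = x_k - alpha * grad f(x_k)
end
fprintf('gradient descent: x = (%.4f, %.4f)\n', x(1), x(2));

x = [0; 0];                             % Newton's method: one step suffices here
x = x - H \ grad(x);                    % x_{k+1} = x_k - Hf(x_k)^{-1} grad f(x_k)
fprintf('Newton: x = (%.4f, %.4f)\n', x(1), x(2));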

1.4 Applications of Unconstrained Optimization
1. Machine Learning: Training models like linear regression and logistic regression involves
unconstrained optimization of loss functions.

2. Economics and Finance: Many problems, such as maximizing utility or minimizing costs, can
be modeled as unconstrained optimization problems in certain cases.

3. Engineering Design: Finding the optimal design parameters to minimize cost or maximize
efficiency often involves unconstrained optimization.

1.5 Example Problem


Problem: Minimize the function f(x, y) = x² + y² − 4x + 6y.
Solution:

1. Compute the gradient:

   ∇f(x, y) = (∂f/∂x, ∂f/∂y) = (2x − 4, 2y + 6)

2. Set the gradient to zero:


2x − 4 = 0 ⇒ x=2
2y + 6 = 0 ⇒ y = −3

3. The critical point is (x, y) = (2, −3).

4. To confirm it is a minimum, check the Hessian matrix:

   Hf(x, y) = [2 0; 0 2]

Since the Hessian is positive definite (its eigenvalues are both positive), the point (2, −3) is a
local minimum.
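A quick numerical cross-check using base MATLAB's fminsearch (the starting guess (0, 0) is an arbitrary choice):

% Verify the critical point of f(x, y) = x^2 + y^2 - 4x + 6y numerically.
f = @(v) v(1)^2 + v(2)^2 - 4*v(1) + 6*v(2);
[xmin, fmin] = fminsearch(f, [0, 0]);
% Expected: xmin close to (2, -3), fmin close to -13
fprintf('x = (%.4f, %.4f), f = %.4f\n', xmin(1), xmin(2), fmin);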

Conclusion: Unconstrained optimization problems focus solely on optimizing a function without
restrictions on the variables, making them simpler than constrained optimization problems. Despite
their simplicity, they have broad applications across various fields, and methods to solve them can
range from basic analytical techniques to more sophisticated numerical methods.

1.6 Exercise
Example 1: Quadratic Function
Problem: Optimize f(x, y) = x² + y².
Solution: 1. First-order partial derivatives:

   ∂f/∂x = 2x,   ∂f/∂y = 2y

2. Set ∂f/∂x = 0 and ∂f/∂y = 0:

   x = 0,   y = 0

Critical point: (0, 0). 3. Hessian matrix:

   Hf(x, y) = [2 0; 0 2]

4. Eigenvalues of the Hessian: 2, 2 (both positive). 5. Since both eigenvalues are positive, the
function has a local minimum at (0, 0).
Example 2: Saddle Point
Problem: Optimize f(x, y) = x² − y².
Solution: 1. First-order partial derivatives:

   ∂f/∂x = 2x,   ∂f/∂y = −2y

2. Set ∂f/∂x = 0 and ∂f/∂y = 0:

   x = 0,   y = 0

Critical point: (0, 0). 3. Hessian matrix:

   Hf(x, y) = [2 0; 0 −2]

4. Eigenvalues of the Hessian: 2, −2 (one positive, one negative). 5. Since the eigenvalues have
different signs, (0, 0) is a saddle point.
Example 3: Elliptic Paraboloid
Problem: Optimize f(x, y) = 3x² + 2y².
Solution: 1. First-order partial derivatives:

   ∂f/∂x = 6x,   ∂f/∂y = 4y

2. Set ∂f/∂x = 0 and ∂f/∂y = 0:

   x = 0,   y = 0

Critical point: (0, 0). 3. Hessian matrix:

   Hf(x, y) = [6 0; 0 4]

4. Eigenvalues of the Hessian: 6, 4 (both positive). 5. The function has a local minimum at (0, 0).
Example 4: Mixed Function
Problem: Optimize f(x, y) = 4x² − 4xy + y².
Solution: 1. First-order partial derivatives:

   ∂f/∂x = 8x − 4y,   ∂f/∂y = 2y − 4x

2. Set ∂f/∂x = 0 and ∂f/∂y = 0: both equations reduce to y = 2x, so every point on this line is
critical; take (0, 0) as a representative critical point. 3. Hessian matrix:

   Hf(x, y) = [8 −4; −4 2]

4. Eigenvalues of the Hessian: solve det(H − λI) = 0, yielding λ₁ = 10, λ₂ = 0. 5. Since one
eigenvalue is zero, the Hessian is only positive semidefinite and the second-order test is
inconclusive; however, writing f(x, y) = (2x − y)² ≥ 0 shows that (0, 0) is in fact a global
minimum (attained along the whole line y = 2x), not a saddle point.
Example 5: Cross-Term Function
Problem: Optimize f(x, y) = x² + xy + y².
Solution: 1. First-order partial derivatives:

   ∂f/∂x = 2x + y,   ∂f/∂y = 2y + x

2. Set ∂f/∂x = 0 and ∂f/∂y = 0:

   x = 0,   y = 0

Critical point: (0, 0). 3. Hessian matrix:

   Hf(x, y) = [2 1; 1 2]

4. Eigenvalues: λ₁ = 3, λ₂ = 1 (both positive). 5. The function has a local minimum at (0, 0).
Example 6: Exponential Function
Problem: Optimize f(x, y) = e^(x+y).
Solution: 1. First-order partial derivatives:

   ∂f/∂x = e^(x+y),   ∂f/∂y = e^(x+y)

2. Set ∂f/∂x = 0 and ∂f/∂y = 0: there are no critical points, since e^(x+y) ≠ 0 for any x and y.
Example 7: Rational Function
Problem: Optimize f(x, y) = (x² + y²)/(1 + x² + y²).
Solution: 1. First-order partial derivatives (by the quotient rule):

   ∂f/∂x = [2x(1 + x² + y²) − 2x(x² + y²)]/(1 + x² + y²)² = 2x/(1 + x² + y²)²
   ∂f/∂y = [2y(1 + x² + y²) − 2y(x² + y²)]/(1 + x² + y²)² = 2y/(1 + x² + y²)²

2. Setting ∂f/∂x = 0 and ∂f/∂y = 0 gives the critical point x = 0, y = 0, where f attains its
global minimum f(0, 0) = 0.
Example 8: Polynomial Function
Problem: Optimize f(x, y) = x⁴ + y⁴.
Solution: 1. First-order partial derivatives:

   ∂f/∂x = 4x³,   ∂f/∂y = 4y³

2. Set ∂f/∂x = 0, ∂f/∂y = 0: critical point at (0, 0). The Hessian vanishes there, so the
second-order test is inconclusive, but f(x, y) ≥ 0 with equality only at (0, 0), so (0, 0) is a
global minimum.
Example 9: Logarithmic Function
Problem: Optimize f(x, y) = log(1 + x² + y²).
Example 10: Sum of Sine Functions
Problem: Optimize f(x, y) = sin(x) + sin(y).

1.7 Solutions to the Exercises of Chapter 1


Exercise 1: To determine the orbit of the planetoid, we can use the given observations and fit an
elliptical orbit to the data using the method of least squares. The equation for the orbit is assumed
to be of the form:

   x² = ay² + bxy + cx + dy + e

Here, the goal is to determine the coefficients a, b, c, d, and e based on the 10 observations
(xⱼ, yⱼ), where j = 1, . . . , 10.
Steps to Solve:
1. Rewrite the equation in terms of a linear system: The given equation x² = ay² + bxy + cx +
dy + e is linear in the unknowns a, b, c, d, e. We can write it for each observation as:

   xⱼ² = a yⱼ² + b xⱼyⱼ + c xⱼ + d yⱼ + e,   for j = 1, 2, . . . , 10


2. Set up the matrix equation: For each observation (xⱼ, yⱼ), the equation above can be written
in the form:

   xⱼ² = [yⱼ²  xⱼyⱼ  xⱼ  yⱼ  1] (a, b, c, d, e)ᵀ

Let us define: X, a 10 × 5 matrix whose j-th row contains the terms yⱼ², xⱼyⱼ, xⱼ, yⱼ, and 1;
and Xvec, a vector of length 10 with elements xⱼ².
Therefore, we need to solve the following linear system:

   X · β = Xvec

where β = (a, b, c, d, e)ᵀ.
3. Use the least squares method: Since we have more equations (10) than unknowns (5), this is an
overdetermined system. We can solve it using the method of least squares, which minimizes the
residuals:

   min_β ‖Xβ − Xvec‖²

The least squares solution is given by:

   β = (XᵀX)⁻¹ Xᵀ Xvec

MATLAB Code to Solve the Problem

% Given data points (xj, yj)
x = [-1.024940, -0.949898, -0.866114, -0.773392, -0.671372, ...
     -0.559524, -0.437067, -0.302909, -0.155493, -0.007464];
y = [-0.389269, -0.322894, -0.265256, -0.216557, -0.177152, ...
     -0.147582, -0.128618, -0.121353, -0.127348, -0.148885];
% Number of observations
n = length(x);
% Form the matrix X (10x5 matrix)
X = [(y.^2)', (x.*y)', x', y', ones(n, 1)];
% Vector Xvec containing xj^2
Xvec = (x.^2)';
% Solve the least squares problem to find the coefficients [a, b, c, d, e]
coefficients = (X'*X) \ (X'*Xvec);
% Display the results
a = coefficients(1); b = coefficients(2); c = coefficients(3);
d = coefficients(4); e = coefficients(5);
fprintf('The coefficients are:\n');
fprintf('a = %f\n', a);
fprintf('b = %f\n', b);
fprintf('c = %f\n', c);
fprintf('d = %f\n', d);
fprintf('e = %f\n', e);
Explanation:
1. Matrix X: Each row of the matrix contains the terms yⱼ², xⱼyⱼ, xⱼ, yⱼ, and 1, which
are derived from the equation xⱼ² = a yⱼ² + b xⱼyⱼ + c xⱼ + d yⱼ + e. 2. Vector Xvec: This vector
contains the values xⱼ² for each observation. 3. Least squares solution: The MATLAB command
(X'*X) \ (X'*Xvec) solves the normal equations of the least squares problem to find the unknown
coefficients a, b, c, d, and e.
Output: The code will output the coefficients a, b, c, d, and e, which define the orbit of the
planetoid based on the observed positions.
Predicting the Orbit: Once you have the coefficients, you can use the equation

   x² = ay² + bxy + cx + dy + e

to predict the planetoid's future positions in its orbit, or find its next position when it becomes
visible again.
(b) For the parabolic approach, we assume the orbit equation takes the simpler form:

   x² = dy + e

Here, we have only two unknown coefficients, d and e, and we aim to determine these based on
the observed data.
Steps to Solve:
1. Rewrite the equation: For each observation (xⱼ, yⱼ), the equation becomes:

   xⱼ² = d yⱼ + e

This can be rewritten as a linear system:

   xⱼ² = [yⱼ  1] (d, e)ᵀ

2. Set up the matrix equation: Let us define: X, a 10 × 2 matrix whose j-th row contains the
terms yⱼ and 1; and Xvec, a vector of length 10 with elements xⱼ².
Therefore, we need to solve the linear system:

   X · β = Xvec

where β = (d, e)ᵀ.
3. Use the least squares method: Since we have more equations (10) than unknowns (2), this is
an overdetermined system. We can solve it using the least squares method, which minimizes the
residuals:

   min_β ‖Xβ − Xvec‖²

The least squares solution is given by:

   β = (XᵀX)⁻¹ Xᵀ Xvec

MATLAB Code to Solve the Problem

% Given data points (xj, yj)
x = [-1.024940, -0.949898, -0.866114, -0.773392, -0.671372, ...
     -0.559524, -0.437067, -0.302909, -0.155493, -0.007464];
y = [-0.389269, -0.322894, -0.265256, -0.216557, -0.177152, ...
     -0.147582, -0.128618, -0.121353, -0.127348, -0.148885];
% Number of observations
n = length(x);
% Form the matrix X (10x2 matrix)
X = [y', ones(n, 1)];
% Vector Xvec containing xj^2
Xvec = (x.^2)';
% Solve the least squares problem to find the coefficients [d, e]
coefficients = (X'*X) \ (X'*Xvec);
% Display the results
d = coefficients(1); e = coefficients(2);
fprintf('The coefficients are:\n');
fprintf('d = %f\n', d);
fprintf('e = %f\n', e);
Explanation:
1. Matrix X: Each row of the matrix contains the terms yⱼ and 1, which are derived from
the equation xⱼ² = d yⱼ + e. 2. Vector Xvec: This vector contains the values xⱼ² for each observation.
3. Least squares solution: The MATLAB command (X'*X) \ (X'*Xvec) solves the least squares
problem to find the unknown coefficients d and e.
Output: The code will output the coefficients d and e, which define the parabolic orbit of the
planetoid based on the observed positions.
Predicting the Orbit: Once you have the coefficients d and e, you can use the equation

   x² = dy + e

to predict the planetoid's future positions in its orbit based on the y-coordinate, or to find its
next position when it becomes visible again.
Exercise 2: Gauss Normal Equation (in the context of linear regression)
The Gauss normal equation is a key result used to solve the least squares problem in linear
regression. In linear regression, the objective is to find the best-fit line or plane (in higher
dimensions) to a set of data points by minimizing the sum of the squared differences between the
observed and predicted values.
Proof of the Gauss Normal Equation
1. Set up the objective function: With data matrix A, observation vector b, and residual
vector r = b − Ax, the objective is the sum of squared residuals:

   min_x ‖r‖² = min_x ‖b − Ax‖²
2. Expand the objective function: The squared residual norm can be expanded as:

   ‖b − Ax‖² = (b − Ax)ᵀ(b − Ax)

Expanding this expression:

   (b − Ax)ᵀ(b − Ax) = bᵀb − 2bᵀAx + xᵀAᵀAx

So the objective function becomes:

   min_x  bᵀb − 2bᵀAx + xᵀAᵀAx

3. Differentiate with respect to x: To minimize the objective function, we take the derivative
with respect to x and set it equal to zero:

   ∂/∂x (bᵀb − 2bᵀAx + xᵀAᵀAx) = 0

The derivative of each term is as follows: the derivative of bᵀb with respect to x is zero (since
it does not depend on x); the derivative of −2bᵀAx with respect to x is −2Aᵀb; and the derivative
of xᵀAᵀAx with respect to x is 2AᵀAx (using matrix calculus rules for quadratic forms).
Therefore, the derivative of the objective function is:

   −2Aᵀb + 2AᵀAx = 0
4. Simplify the equation: Dividing through by 2, we get:

   AᵀAx = Aᵀb

This is the Gauss normal equation.
5. Solving for x: If AᵀA is invertible, the solution for x is:

   x = (AᵀA)⁻¹Aᵀb

This is the least squares solution for the regression coefficients.
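A minimal MATLAB sketch contrasting the normal-equation solve with MATLAB's backslash operator on a small synthetic problem (the matrix A and vector b below are hypothetical):

% Fit a line to four points via the normal equations and via backslash.
A = [1 1; 1 2; 1 3; 1 4];        % design matrix: intercept and slope columns
b = [1.1; 1.9; 3.2; 3.9];        % observations
x_normal = (A'*A) \ (A'*b);      % x = (A'A)^{-1} A'b, the normal equations
x_qr = A \ b;                    % QR-based least squares solve
disp([x_normal, x_qr]);          % the two solutions agree up to rounding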
Exercise 3: First we understand the L₁-Approximation Problem.
The L₁-approximation problem is about finding a solution x that minimizes the L₁-norm of
the residuals, ‖Ax − b‖₁, where A is a matrix and b is a vector. The geometric interpretation of
the L₁-norm minimization can be visualized as minimizing the sum of the absolute values of the
residuals, which corresponds to minimizing the distance between the point x and some data points
in a piecewise linear manner.
The convex hull of a set of points is the smallest convex set that contains all the points.
Geometrically, the convex hull of four points in R2 will be a polygon whose vertices are those four
points if they do not lie on a straight line.
The points (3, 2), (2, 1), (3/2, 3/2), and (1, 3) form a polygon, and the convex hull is the smallest
convex polygon containing these points. The shape of the polygon is a quadrilateral in this case.
We want to prove that the convex hull of the given points provides solutions to the L1 -
approximation problem. To do this, we need to check whether the optimal solution for the L1 -norm
minimization corresponds to one of the points in the convex hull, or a convex combination of these
points.
In the L1 -norm, the residuals are given by the absolute differences between the predicted values
and the true values. Minimizing the L1 -norm leads to a solution that is often a vertex of the convex
hull of the data points, because the L1 -norm geometry tends to place the optimal solution on a
vertex or along the edges of the convex polygon defined by the data points.
Verifying the Convex Hull as the Solution Set

The set of points (3, 2), (2, 1), (3/2, 3/2), and (1, 3) forms a convex polygon in R². We can express
any point inside the convex hull as a convex combination of the four vertices:

   x = λ₁(3, 2) + λ₂(2, 1) + λ₃(3/2, 3/2) + λ₄(1, 3)

where λ₁ + λ₂ + λ₃ + λ₄ = 1 and λᵢ ≥ 0 for all i.
For the L₁-norm minimization problem, solutions often lie at the vertices of the convex hull,
which correspond to the points where the objective function (sum of absolute deviations) is
minimized. In this case, the points (3, 2), (2, 1), (3/2, 3/2), and (1, 3) are candidates for
solutions, as they are the vertices of the convex hull.
Are There Any Other Solutions?
Since the optimal solution to an L1 -norm minimization problem often lies at a vertex of the
convex hull, the points on the edges or inside the convex polygon formed by the convex hull may
also be solutions in certain cases. However, the most common case is that the solution is one of
the vertices of the convex hull. Thus, solutions can include the points inside the convex hull (i.e.,
convex combinations of the given points), but the vertices are the most likely candidates for the
optimal solution.
Conclusion
The convex hull of the points (3, 2), (2, 1), (3/2, 3/2), and (1, 3) provides solutions to the
L₁-approximation problem because the vertices of the convex hull are likely to be optimal solutions
to the problem. Other solutions might include points inside the convex hull, which are convex
combinations of these points, but the primary solutions are at the vertices of the convex polygon.
Example 4:
Let's break the problem into two parts:
Part 1: Minimization using the method of alternate directions
We are tasked with minimizing the function:

   f(x₁, x₂) = x₁² + x₂² − x₁x₂ − 3x₁

using the method of alternate directions. This means we iteratively minimize f(x₁, x₂) by
first fixing x₂ and minimizing with respect to x₁, then fixing x₁ and minimizing with respect to
x₂, and repeating.
Step 1: Minimize with respect to x₁ (fixing x₂)
Fix x₂ and minimize f(x₁, x₂) as a function of x₁:

   f(x₁, x₂) = x₁² − x₁x₂ − 3x₁ + x₂²

Taking the derivative with respect to x₁:

   ∂f/∂x₁ = 2x₁ − x₂ − 3

Set the derivative equal to zero to find the minimum:

   2x₁ − x₂ − 3 = 0  ⇒  x₁ = (x₂ + 3)/2

This gives us the value of x₁ that minimizes f, given x₂.
Step 2: Minimize with respect to x₂ (fixing x₁)
Now fix x₁ and minimize f(x₁, x₂) as a function of x₂:

   f(x₁, x₂) = x₁² − x₁x₂ − 3x₁ + x₂²

Taking the derivative with respect to x₂:

   ∂f/∂x₂ = 2x₂ − x₁

Set the derivative equal to zero to find the minimum:

   2x₂ − x₁ = 0  ⇒  x₂ = x₁/2

This gives us the value of x₂ that minimizes f, given x₁.
Step 3: Iterative process
Start with the initial point x⁽⁰⁾ = (0, 0)ᵀ.
1. Iteration 1: Fix x₂ = 0 and minimize with respect to x₁: x₁⁽¹⁾ = (0 + 3)/2 = 1.5. Now fix
x₁ = 1.5 and minimize with respect to x₂: x₂⁽¹⁾ = 1.5/2 = 0.75.
2. Iteration 2: Fix x₂ = 0.75 and minimize with respect to x₁: x₁⁽²⁾ = (0.75 + 3)/2 = 1.875.
Now fix x₁ = 1.875 and minimize with respect to x₂: x₂⁽²⁾ = 1.875/2 = 0.9375.
3. Iteration 3: Fix x₂ = 0.9375 and minimize with respect to x₁: x₁⁽³⁾ = (0.9375 + 3)/2 = 1.96875.
Now fix x₁ = 1.96875 and minimize with respect to x₂: x₂⁽³⁾ = 1.96875/2 = 0.984375.
Through repeated iterations, this process converges to the solution x₁ = 2, x₂ = 1, which is the
minimizer x* = (2, 1)ᵀ, as claimed. A short sketch of this loop follows below.
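A minimal MATLAB sketch of this loop (the iteration count is an illustrative choice):

% Alternate directions (coordinate descent) on f(x1, x2) = x1^2 + x2^2 - x1*x2 - 3*x1.
x1 = 0; x2 = 0;                  % initial point x^(0) = (0, 0)
for k = 1:50
    x1 = (x2 + 3) / 2;           % exact minimization over x1 with x2 fixed
    x2 = x1 / 2;                 % exact minimization over x2 with x1 fixed
end
fprintf('x = (%.6f, %.6f)\n', x1, x2);   % converges to (2, 1)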

Part 2: Minimizing f(x) = max{|x₁ + 2x₂ − 7|, |2x₁ + x₂ − 5|}
The new function to minimize is:

   f(x₁, x₂) = max{|x₁ + 2x₂ − 7|, |2x₁ + x₂ − 5|}

We are asked to use the same method of alternate directions, starting with x⁽⁰⁾ = (0, 0)ᵀ.
Step 1: Fix x₂ = 0 and minimize with respect to x₁
The function becomes:

   f(x₁, 0) = max{|x₁ − 7|, |2x₁ − 5|}

To minimize this, consider the two expressions |x₁ − 7| and |2x₁ − 5|.
We need to find where these two expressions are equal, as this is likely to minimize f(x₁, 0).
Setting them equal to each other:

   |x₁ − 7| = |2x₁ − 5|

Solving this gives two cases: 1. x₁ − 7 = 2x₁ − 5 ⇒ x₁ = −2; 2. x₁ − 7 = −(2x₁ − 5) ⇒ x₁ = 4.
At x₁ = −2 the common value is 9, while at x₁ = 4 both expressions equal 3, so the minimum
occurs at x₁ = 4.
Step 2: Fix x₁ = 4 and minimize with respect to x₂
The function becomes:

   f(4, x₂) = max{|4 + 2x₂ − 7|, |8 + x₂ − 5|}

Simplifying:

   f(4, x₂) = max{|2x₂ − 3|, |x₂ + 3|}

Set the two expressions equal to each other to minimize:

   |2x₂ − 3| = |x₂ + 3|

Solving this gives two cases: 1. 2x₂ − 3 = x₂ + 3 ⇒ x₂ = 6; 2. 2x₂ − 3 = −(x₂ + 3) ⇒ x₂ = 0.
At x₂ = 6 the common value is 9, while at x₂ = 0 both expressions equal 3, so the minimum
occurs at x₂ = 0.

Why don't we get the minimizer x* = (1, 3)ᵀ with f(x*) = 0?
The reason we do not reach the minimizer x* = (1, 3)ᵀ with f(x*) = 0 using this method is
that the method of alternate directions may not always work well with non-differentiable or
nonsmooth functions like this one.
The function f(x₁, x₂) = max{|x₁ + 2x₂ − 7|, |2x₁ + x₂ − 5|} involves taking the maximum of
absolute value terms, which introduces non-smoothness and kinks in the function. The method
of alternate directions works well for smooth, differentiable functions, but it can struggle with
functions that have sharp changes or discontinuities in their gradient, like this one: the iterates
stall at the kink point (4, 0)ᵀ with f(4, 0) = 3, so the process does not converge to the true
minimizer x* = (1, 3)ᵀ.
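A two-line MATLAB check comparing the stalled iterate with the true minimizer:

% f(4, 0) = 3 is where the alternating scheme stalls; f(1, 3) = 0 is the true minimum.
fmax = @(v) max(abs(v(1) + 2*v(2) - 7), abs(2*v(1) + v(2) - 5));
fprintf('f(4, 0) = %g, f(1, 3) = %g\n', fmax([4, 0]), fmax([1, 3]));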

2 Penalty Methods
The penalty method is an optimization technique used to solve constrained optimization problems
by converting them into unconstrained optimization problems. The general idea is to add a penalty
term to the objective function, which discourages violations of the constraints by penalizing the
objective function for constraint violations.
Here are a few examples of how the penalty method is applied in optimization:

2.1 Quadratic Penalty Method (Equality Constraints)


Consider the optimization problem:

   min f(x)   subject to   g(x) = 0

Here, g(x) is the equality constraint.
In the quadratic penalty method, the constrained problem is transformed into an unconstrained
problem by adding a penalty term to the objective function. The modified objective function is:

   P(x, µ) = f(x) + (µ/2) ‖g(x)‖²

where µ is the penalty parameter, which controls the weight of the penalty term, and ‖g(x)‖²
is the squared norm of the constraint violation.
As µ increases, the penalty for constraint violations becomes more severe, encouraging the
solution to approach the feasible region where g(x) = 0. A small sketch of the resulting loop
follows below.
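A minimal MATLAB sketch of the loop, using as a concrete instance the problem min x₁² + x₂² subject to x₁ + x₂ − 2 = 0 (whose solution is (1, 1)); the µ schedule is an illustrative choice:

% Quadratic penalty with an increasing penalty parameter, warm-starting each solve.
f = @(v) v(1)^2 + v(2)^2;
g = @(v) v(1) + v(2) - 2;
x = [0, 0];
for mu = [1, 10, 100, 1000]
    P = @(v) f(v) + (mu/2) * g(v)^2;   % penalized objective for current mu
    x = fminsearch(P, x);              % warm start from the previous solution
    fprintf('mu = %g: x = (%.4f, %.4f), g = %.5f\n', mu, x(1), x(2), g(x));
end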

2.2 Quadratic Penalty Method (Inequality Constraints)
Consider the optimization problem:

   min f(x)   subject to   g(x) ≤ 0

In this case, the quadratic penalty method can be modified for inequality constraints. The
penalty function becomes:

   P(x, µ) = f(x) + (µ/2) max(0, g(x))²

Here, the penalty is applied only when the constraint is violated (i.e., when g(x) > 0).

2.3 Exterior Penalty Method


In the exterior penalty method, the penalty term grows without bound as the penalty parameter
approaches its limit whenever the solution violates the constraints, pushing the solution toward
the feasible region.
The objective function is modified as follows. For a constrained problem with inequality
constraints g(x) ≤ 0:

   P(x, µ) = f(x) + (1/µ) Σᵢ max(0, gᵢ(x))²

As µ approaches zero, any violation of the constraint causes the penalty term to grow very
large, driving the solution towards the feasible region.

2.4 Interior Penalty Method (Logarithmic Barrier)


In the interior penalty method (or barrier method), the penalty is applied as the solution
approaches the boundary of the feasible region. This is commonly used for inequality constraints.
For a problem with constraints g(x) ≤ 0, the modified objective function becomes:

   P(x, µ) = f(x) − (1/µ) Σᵢ log(−gᵢ(x))

Here, gᵢ(x) < 0 ensures that the argument of the logarithm is positive. As x approaches the
boundary gᵢ(x) = 0, the penalty term becomes very large, discouraging solutions close to the
boundary.

2.5 Augmented Lagrangian Method


The augmented Lagrangian method combines both a penalty term and a Lagrange multiplier. For
a problem with equality constraints g(x) = 0, the objective function becomes:

   P(x, λ, µ) = f(x) + λᵀg(x) + (µ/2) ‖g(x)‖²

Here, λ is the vector of Lagrange multipliers, and the penalty term (µ/2)‖g(x)‖² helps enforce
the constraint.
The augmented Lagrangian method is more stable than pure penalty methods because it
incorporates both the Lagrange multipliers and a penalty term; a sketch of the multiplier update
loop follows below.
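A minimal MATLAB sketch of the method on the same instance, min x₁² + x₂² subject to x₁ + x₂ − 2 = 0, using the standard first-order multiplier update λ ← λ + µ g(x):

% Augmented Lagrangian loop: minimize in x, then update the multiplier.
f = @(v) v(1)^2 + v(2)^2;
g = @(v) v(1) + v(2) - 2;
x = [0, 0]; lambda = 0; mu = 10;
for k = 1:10
    L = @(v) f(v) + lambda * g(v) + (mu/2) * g(v)^2;
    x = fminsearch(L, x);           % inner minimization in x
    lambda = lambda + mu * g(x);    % first-order multiplier update
end
fprintf('x = (%.4f, %.4f), lambda = %.4f\n', x(1), x(2), lambda);  % -> (1, 1), -2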

2.6 Logarithmic Penalty for Optimization Problems with Inequality Constraints
Given an optimization problem with inequality constraints:

   min f(x)   subject to   g(x) ≤ 0

The logarithmic barrier penalty method transforms the objective function by adding a barrier
term:

   P(x, µ) = f(x) − (1/µ) log(−g(x))

As x approaches the boundary where g(x) = 0, the logarithmic term approaches infinity,
preventing the solution from violating the inequality constraint.

2.7 Penalty Method for Box Constraints


Consider the optimization problem with box constraints l ≤ x ≤ u, where l and u are lower and
upper bounds on x.
The penalty method adds a term to penalize solutions outside the feasible range:

   P(x, µ) = f(x) + µ Σᵢ [max(0, xᵢ − uᵢ)² + max(0, lᵢ − xᵢ)²]

Here, the penalty function grows large when the solution x violates the bounds l ≤ x ≤ u.

2.8 Quadratic Penalty for Multiple Constraints


For a problem with multiple constraints g₁(x) = 0, g₂(x) ≤ 0, . . ., the penalty method can combine
terms for both equality and inequality constraints:

   P(x, µ) = f(x) + µ [ Σᵢ gᵢ(x)² + Σⱼ max(0, gⱼ(x))² ]

2.9 Penalty Method in Machine Learning (Lasso Regression)


In Lasso regression, the penalty method is used to enforce sparsity in the model coefficients. The
Lasso objective function includes a penalty term on the 1-norm of the coefficient vector β:

   min_β  Σᵢ (yᵢ − Xᵢᵀβ)² + λ‖β‖₁

Here, λ controls the strength of the penalty on the coefficients.

2.10 Penalty Method for Quadratically Constrained Optimization


Consider the problem:

   min f(x)   subject to   g(x) ≤ 0,  h(x) = 0

The penalty method modifies the objective function as follows:

   P(x, µ) = f(x) + µ [ Σⱼ max(0, gⱼ(x))² + Σᵢ hᵢ(x)² ]

This penalizes violations of both inequality and equality constraints, driving the solution towards
feasibility.
Conclusion:
The penalty method is a powerful tool in optimization for handling constraints. By adding
penalty terms to the objective function, constraint violations can be incorporated into the
optimization process. Different types of penalty methods (quadratic, barrier, and augmented
Lagrangian) allow for flexibility in handling various types of constraints (equality, inequality,
and box constraints).

2.11 Examples of Penalty Methods


Here are some examples of optimization problems that use penalty methods along with their
solutions:
Example 1: Quadratic Penalty Method for Equality Constraints
Problem: Minimize the objective function:

   f(x) = (x₁ − 2)² + (x₂ − 3)²

subject to the equality constraint:

   g(x) = x₁ + x₂ − 5 = 0

Solution:
1. Set up the penalty function: The penalty method transforms the constrained problem into
an unconstrained problem by adding a penalty term proportional to the square of the constraint
violation. The penalty function is:

   P(x, µ) = f(x) + (µ/2) g(x)²

So the penalty function becomes:

   P(x, µ) = (x₁ − 2)² + (x₂ − 3)² + (µ/2)(x₁ + x₂ − 5)²

2. Choose an initial value for µ: Let µ = 10 (a typical starting point for the penalty parameter).
3. Minimize the penalty function: Now, we minimize P(x, µ) with respect to x₁ and x₂. Taking
the partial derivatives and setting them to zero:

   ∂P(x, µ)/∂x₁ = 2(x₁ − 2) + 10(x₁ + x₂ − 5) = 0
   ∂P(x, µ)/∂x₂ = 2(x₂ − 3) + 10(x₁ + x₂ − 5) = 0

Solving these equations simultaneously, we get:

   x₁ = 2,   x₂ = 3

4. Check constraint satisfaction: g(x) = x₁ + x₂ − 5 = 2 + 3 − 5 = 0, so the constraint is
satisfied exactly (for this particular problem, the unconstrained minimizer of f already happens
to lie on the constraint).
Final solution: The optimal solution is x₁ = 2, x₂ = 3, and the minimum value of the objective
function is f(2, 3) = 0.
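Since the stationarity conditions above are linear in (x₁, x₂), they can be checked with a direct linear solve in MATLAB:

% Stationarity with mu = 10: 12*x1 + 10*x2 = 54 and 10*x1 + 12*x2 = 56.
A = [12 10; 10 12];
rhs = [54; 56];
x = A \ rhs;        % yields x = (2, 3)
disp(x');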
Example 2: Quadratic Penalty Method for Inequality Constraints
Problem: Minimize the objective function:

   f(x) = (x₁ − 1)² + (x₂ − 2)²

subject to the inequality constraint:

   g(x) = x₁² + x₂² − 4 ≤ 0

Solution:
1. Set up the penalty function: In the case of inequality constraints, we penalize the violation
when g(x) > 0. The penalty function is:

   P(x, µ) = f(x) + (µ/2) max(0, g(x))²

For this problem, the penalty function becomes:

   P(x, µ) = (x₁ − 1)² + (x₂ − 2)² + (µ/2) max(0, x₁² + x₂² − 4)²

2. Choose an initial value for µ: Let µ = 100 (chosen to heavily penalize constraint violations).
3. Minimize the penalty function: Since we can't differentiate directly due to the max function,
we use numerical methods to minimize P(x, µ) over x₁ and x₂.
After performing the minimization, we find approximately:

   x₁ ≈ 0.8946,   x₂ ≈ 1.7891

4. Check constraint satisfaction: g(x) = x₁² + x₂² − 4 ≈ 0.0012, a small positive violation that
shrinks to zero as µ is increased; the exact constrained optimum is (2/√5, 4/√5) ≈ (0.8944, 1.7889),
the point of the disk closest to (1, 2).
Final solution: The optimal solution is approximately x₁ ≈ 0.894, x₂ ≈ 1.789, and the minimum
value of the objective function is approximately f ≈ 0.056.
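A numerical cross-check of this example with fminsearch, starting from the unconstrained minimizer (1, 2):

% Exterior quadratic penalty for the inequality-constrained example, mu = 100.
f = @(v) (v(1) - 1)^2 + (v(2) - 2)^2;
g = @(v) v(1)^2 + v(2)^2 - 4;
mu = 100;
P = @(v) f(v) + (mu/2) * max(0, g(v))^2;
x = fminsearch(P, [1, 2]);
fprintf('x = (%.4f, %.4f), f = %.4f, g = %.5f\n', x(1), x(2), f(x), g(x));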
Example 3: Logarithmic Barrier Method for Inequality Constraints
Problem: Minimize the objective function:

   f(x) = x₁² + x₂²

subject to the inequality constraint:

   g(x) = x₁ + x₂ − 1 ≤ 0

Solution:
1. Set up the barrier function: In the logarithmic barrier method, the penalty grows without
bound as the iterates approach the boundary of the feasible region from the inside. The penalty
function is:

   P(x, µ) = f(x) − (1/µ) log(−g(x))

For this problem, the penalty function becomes:

   P(x, µ) = x₁² + x₂² − (1/µ) log(1 − (x₁ + x₂))

2. Choose values for µ: Solve the barrier problem for a sequence of increasing µ, so that the
barrier weight 1/µ shrinks toward zero.
3. Minimize the penalty function: We minimize P(x, µ) using numerical methods. As µ → ∞,
the barrier minimizers converge to:

   x₁ = 0,   x₂ = 0

4. Check constraint satisfaction: g(x) = x₁ + x₂ − 1 = 0 + 0 − 1 = −1 < 0, so the inequality
constraint is satisfied strictly (it is inactive at the optimum).
Final solution: The optimal solution is x₁ = 0, x₂ = 0, and the minimum value of the objective
function is f(0, 0) = 0.
Example 4: Augmented Lagrangian Method
Problem: Minimize the objective function:

   f(x) = x₁² + x₂²

subject to the equality constraint:

   g(x) = x₁ + x₂ − 2 = 0

Solution:
1. Set up the augmented Lagrangian: The augmented Lagrangian function is:

   L(x, λ, µ) = f(x) + λg(x) + (µ/2) g(x)²

For this problem, the augmented Lagrangian becomes:

   L(x, λ, µ) = x₁² + x₂² + λ(x₁ + x₂ − 2) + (µ/2)(x₁ + x₂ − 2)²

2. Choose initial values for λ and µ: Let λ = 1 and µ = 10.
3. Minimize the augmented Lagrangian: We minimize L(x, λ, µ) with respect to x₁ and x₂, then
update the multiplier via λ ← λ + µ g(x) and repeat. The iterates converge to:

   x₁ = 1,   x₂ = 1   (with λ → −2)

4. Check constraint satisfaction: g(x) = x₁ + x₂ − 2 = 1 + 1 − 2 = 0, so the constraint is satisfied.
Final solution: The optimal solution is x₁ = 1, x₂ = 1, and the minimum value of the objective
function is f(1, 1) = 2.
Summary: The Quadratic Penalty Method adds a squared penalty term for constraint violations;
the Logarithmic Barrier Method adds a logarithmic penalty term to discourage constraint
violations, particularly near the boundary; and the Augmented Lagrangian Method combines
Lagrange multipliers with a quadratic penalty term, improving stability and convergence.
These methods convert constrained problems into easier-to-solve unconstrained problems by
penalizing constraint violations, ultimately leading to the optimal solution.

2.12 Lagrange Multipliers for Optimality Conditions
Let’s go through a couple of numerical examples to see how to apply the Lagrange Multiplier Rule
to find the necessary optimality conditions for constrained optimization problems.
Example 1: Optimizing a function subject to a constraint
Problem: Optimize the function:

   f(x, y) = x² + y²

subject to the constraint:

   g(x, y) = x + y − 1 = 0

Step 1: Define the Lagrangian. We introduce the Lagrange multiplier λ for the constraint and
form the Lagrangian:

   L(x, y, λ) = x² + y² + λ(x + y − 1)

Step 2: Compute partial derivatives. We now compute the partial derivatives of the Lagrangian
with respect to x, y, and λ:

   ∂L/∂x = 2x + λ = 0,   ∂L/∂y = 2y + λ = 0,   ∂L/∂λ = x + y − 1 = 0

Step 3: Solve the system of equations: 1. 2x + λ = 0; 2. 2y + λ = 0; 3. x + y = 1.
From the first two equations, we have:

   λ = −2x   and   λ = −2y

Thus, x = y.
Substitute x = y into the third equation:

   x + x = 1  ⇒  2x = 1  ⇒  x = 1/2

Since x = y, we also have y = 1/2.
Step 4: Solution. The critical point of f(x, y) = x² + y² subject to the constraint x + y = 1 is
(x, y) = (1/2, 1/2). It is in fact the constrained minimum, with f(1/2, 1/2) = 1/2; f has no
maximum on this line, since it is unbounded above along it.
Example 2: Minimizing a function subject to two constraints
Problem: Minimize the function:

   f(x, y, z) = x² + y² + z²

subject to the constraints:

   g₁(x, y, z) = x + y + z − 1 = 0   and   g₂(x, y, z) = x − y = 0

Step 1: Define the Lagrangian. We introduce two Lagrange multipliers, λ₁ and λ₂, for the
constraints and form the Lagrangian:

   L(x, y, z, λ₁, λ₂) = x² + y² + z² + λ₁(x + y + z − 1) + λ₂(x − y)

Step 2: Compute partial derivatives. We now compute the partial derivatives of the Lagrangian
with respect to x, y, z, λ₁, and λ₂:

   ∂L/∂x = 2x + λ₁ + λ₂ = 0,   ∂L/∂y = 2y + λ₁ − λ₂ = 0,   ∂L/∂z = 2z + λ₁ = 0,
   ∂L/∂λ₁ = x + y + z − 1 = 0,   ∂L/∂λ₂ = x − y = 0

Step 3: Solve the system of equations: 1. 2x + λ₁ + λ₂ = 0; 2. 2y + λ₁ − λ₂ = 0;
3. 2z + λ₁ = 0; 4. x + y + z = 1; 5. x − y = 0.
From equation 5, we know x = y. Substituting x = y into equations 1 and 2 gives:

   2x + λ₁ + λ₂ = 0   and   2x + λ₁ − λ₂ = 0

Adding these two equations results in:

   4x + 2λ₁ = 0  ⇒  λ₁ = −2x

Substituting λ₁ = −2x into equation 3:

   2z − 2x = 0  ⇒  z = x

Now, using x = y = z in equation 4:

   x + x + x = 1  ⇒  3x = 1  ⇒  x = 1/3

Thus, y = z = 1/3.
Step 4: Solution. The minimum value of f(x, y, z) = x² + y² + z² subject to the constraints
occurs at (x, y, z) = (1/3, 1/3, 1/3).
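Because the optimality system above is linear, it can be verified with a single MATLAB solve, ordering the unknowns as (x, y, z, λ₁, λ₂):

% Rows: the three stationarity equations, then the two constraints.
A = [2  0  0  1  1;
     0  2  0  1 -1;
     0  0  2  1  0;
     1  1  1  0  0;
     1 -1  0  0  0];
rhs = [0; 0; 0; 1; 0];
sol = A \ rhs;      % yields x = y = z = 1/3, lambda1 = -2/3, lambda2 = 0
disp(sol');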
Let's look at a few more examples of writing the necessary optimality conditions using the
Lagrange Multiplier Rule in constrained optimization problems.
Example 3: Maximizing a function with a circular constraint
Problem: Maximize the function:

   f(x, y) = x + y

subject to the constraint:

   g(x, y) = x² + y² − 1 = 0

(This is the equation of a circle with radius 1.)
Step 1: Define the Lagrangian. We introduce a Lagrange multiplier λ for the constraint and
form the Lagrangian:

   L(x, y, λ) = x + y + λ(x² + y² − 1)

Step 2: Compute partial derivatives. We compute the partial derivatives of the Lagrangian with
respect to x, y, and λ:

   ∂L/∂x = 1 + 2λx = 0,   ∂L/∂y = 1 + 2λy = 0,   ∂L/∂λ = x² + y² − 1 = 0

Step 3: Solve the system of equations: 1. 1 + 2λx = 0; 2. 1 + 2λy = 0; 3. x² + y² = 1.
From equations 1 and 2, we can solve for λ:

   λ = −1/(2x)   and   λ = −1/(2y)

Equating the two expressions for λ:

   −1/(2x) = −1/(2y)  ⇒  x = y

Substituting x = y into the constraint equation x² + y² = 1:

   x² + x² = 1  ⇒  2x² = 1  ⇒  x = ±1/√2

Thus, y = ±1/√2.
Step 4: Solution. The two critical points are:

   (x, y) = (1/√2, 1/√2)   and   (x, y) = (−1/√2, −1/√2)

Evaluating f(x, y) = x + y at these points: at (1/√2, 1/√2), f(x, y) = √2; at (−1/√2, −1/√2),
f(x, y) = −√2.
Thus, the maximum value is √2, and the minimum value is −√2.
Example 4: Minimizing a quadratic function with two constraints
Problem: Minimize the function:

   f(x, y) = x² + y²

subject to the constraints:

   g₁(x, y) = x + y − 2 = 0   and   g₂(x, y) = x − y − 1 = 0

Step 1: Define the Lagrangian. We introduce two Lagrange multipliers, λ₁ and λ₂, for the
constraints and form the Lagrangian:

   L(x, y, λ₁, λ₂) = x² + y² + λ₁(x + y − 2) + λ₂(x − y − 1)

Step 2: Compute partial derivatives. We compute the partial derivatives of the Lagrangian with
respect to x, y, λ₁, and λ₂:

   ∂L/∂x = 2x + λ₁ + λ₂ = 0,   ∂L/∂y = 2y + λ₁ − λ₂ = 0,
   ∂L/∂λ₁ = x + y − 2 = 0,   ∂L/∂λ₂ = x − y − 1 = 0

Step 3: Solve the system of equations: 1. 2x + λ₁ + λ₂ = 0; 2. 2y + λ₁ − λ₂ = 0; 3. x + y = 2;
4. x − y = 1.
From equations 3 and 4, we can solve for x and y: adding the two equations gives
(x + y) + (x − y) = 2 + 1 ⇒ 2x = 3 ⇒ x = 3/2; substituting x = 3/2 into x + y = 2 gives y = 1/2.
Now substitute x = 3/2 and y = 1/2 into the system: 1. 2(3/2) + λ₁ + λ₂ = 0 ⇒ 3 + λ₁ + λ₂ = 0;
2. 2(1/2) + λ₁ − λ₂ = 0 ⇒ 1 + λ₁ − λ₂ = 0.
From equation 2, solve for λ₁:

   λ₁ = λ₂ − 1

Substitute into equation 1:

   3 + (λ₂ − 1) + λ₂ = 0  ⇒  3 + 2λ₂ − 1 = 0  ⇒  2λ₂ = −2  ⇒  λ₂ = −1

Substitute λ₂ = −1 into λ₁ = λ₂ − 1:

   λ₁ = −1 − 1 = −2

Step 4: Solution. The critical point is (3/2, 1/2). The minimum value of f(x, y) = x² + y² at this
point is:

   f(3/2, 1/2) = (3/2)² + (1/2)² = 9/4 + 1/4 = 10/4 = 2.5

Example 5: Minimizing a function subject to a linear constraint
Problem: Minimize the function:

   f(x, y) = x² + 4y²

subject to the constraint:

   g(x, y) = x + y − 3 = 0

Step 1: Define the Lagrangian. We introduce a Lagrange multiplier λ for the constraint and
form the Lagrangian:

   L(x, y, λ) = x² + 4y² + λ(x + y − 3)

Step 2: Compute partial derivatives. We compute the partial derivatives of the Lagrangian with
respect to x, y, and λ:

   ∂L/∂x = 2x + λ = 0,   ∂L/∂y = 8y + λ = 0,   ∂L/∂λ = x + y − 3 = 0

Step 3: Solve the system. From the first two equations, 2x = 8y, so x = 4y. Substituting into
x + y = 3 gives 5y = 3, so y = 3/5 and x = 12/5.
Step 4: Solution. The minimum occurs at (x, y) = (12/5, 3/5), with
f(12/5, 3/5) = (12/5)² + 4(3/5)² = 144/25 + 36/25 = 180/25 = 7.2.

3 Optimality Conditions
Let us find all three cones (cone of descent directions, cone of linearization, and cone of feasible
directions) for some problems at a specific point (x0 , y0 ).

Example 1
Objective:

   f(x, y) = (x − 1)² + (y − 2)²

Constraint:

   g(x, y) = x + 2y − 4 ≤ 0

We analyze the cones at (x₀, y₀) = (0, 0).
1. Cone of Descent Directions: The cone of descent directions is formed by all directions
d = (dx, dy) that satisfy:

   ∇f(x₀, y₀) · d < 0

Gradient of f(x, y):

   ∇f(x, y) = (2(x − 1), 2(y − 2))

At (x₀, y₀) = (0, 0):

   ∇f(0, 0) = (−2, −4)

A direction d = (dx, dy) must satisfy:

   −2dx − 4dy < 0

or equivalently:

   dx + 2dy > 0

Cone of Descent Directions: the set of all d = (dx, dy) such that dx + 2dy > 0. This is the open
half-space on one side of the line dx + 2dy = 0.
2. Cone of Linearization: The cone of linearization is determined by the linearized form of the
constraint:

   ∇g(x₀, y₀) · d ≤ −g(x₀, y₀)

Gradient of g(x, y):

   ∇g(x, y) = (1, 2)

At (x₀, y₀) = (0, 0):

   ∇g(0, 0) = (1, 2),   g(0, 0) = 0 + 0 − 4 = −4

Linearized constraint:

   1 · dx + 2 · dy ≤ 4

Cone of Linearization: the set of all d = (dx, dy) such that dx + 2dy ≤ 4. This is another
half-space, bounded by the line dx + 2dy = 4, including the region on or below the line.
3. Cone of Feasible Directions: A feasible direction is one along which a sufficiently small step
stays in the feasible region. At a boundary point with g(x₀, y₀) = 0, a feasible direction must
satisfy:

   ∇g(x₀, y₀) · d ≤ 0

At (x₀, y₀) = (0, 0): The point (0, 0) is not on the boundary (g(0, 0) = −4 < 0, strictly
feasible), so every direction is feasible and the cone of feasible directions is all of R².
Summary of Cones at (x₀, y₀) = (0, 0):

1. Cone of Descent Directions: dx + 2dy > 0
2. Cone of Linearization: dx + 2dy ≤ 4
3. Cone of Feasible Directions: all of R² (the constraint is inactive at this strictly feasible point)

Visualization:
The Cone of Descent Directions is the open half-space above the line dx + 2dy = 0. The Cone of
Linearization is the half-space on and below the line dx + 2dy = 4. The Cone of Feasible
Directions is the whole plane, since the point is strictly feasible.
Below are detailed solutions for the cone of feasible directions, the linearizing cone, and the
cone of descent directions for the following examples.

Example 2
Objective:

   f(x, y) = x² + y²

Constraint:

   g(x, y) = 2x + y − 5 ≤ 0

Point: (x₀, y₀) = (1, 2)
1. Cone of Feasible Directions:
At an active constraint, a feasible direction satisfies:

   ∇g(x₀, y₀) · d ≤ 0

Gradient of g:

   ∇g(x, y) = [2, 1]

At (1, 2), this condition reads:

   2dx + dy ≤ 0

(note that g(1, 2) = −1 < 0 here, so the constraint is inactive and every direction is in fact feasible).
2. Linearizing Cone:
The linearizing cone includes the feasibility condition and adjusts for the constraint value:

∇g(x0 , y0 ) · d ≤ −g(x0 , y0 )
At (1, 2):
g(1, 2) = 2 + 2 − 5 = −1,

2dx + dy ≤ 1
3. Cone of Descent Directions:
Descent direction satisfies:
∇f (x0 , y0 ) · d < 0
Gradient of f :
∇f (x, y) = [2x, 2y]
At (1, 2):
∇f (1, 2) = [2, 4]
Condition for descent direction:
2dx + 4dy < 0

Example 3: Multiple linear constraints

Objective:

   f(x, y) = (x − 3)² + (y − 4)²

Constraints:

   g₁(x, y) = x − y − 1 ≤ 0,   g₂(x, y) = 2x + y − 6 ≤ 0

Point: (x₀, y₀) = (2, 3)
1. Cone of Feasible Directions:
1. Cone of Feasible Directions:

Feasible directions satisfy:

∇g1 (x0 , y0 ) · d ≤ 0, ∇g2 (x0 , y0 ) · d ≤ 0

Gradients:
∇g1 (x, y) = [1, −1], ∇g2 (x, y) = [2, 1]
Conditions:
dx − dy ≤ 0, 2dx + dy ≤ 0
2. Linearizing Cone: Add the constraint values:

∇g1 (x0 , y0 ) · d ≤ −g1 (x0 , y0 ), ∇g2 (x0 , y0 ) · d ≤ −g2 (x0 , y0 )

At (2, 3):

   g₁(2, 3) = 2 − 3 − 1 = −2,   g₂(2, 3) = 2(2) + 3 − 6 = 1

(note that g₂(2, 3) = 1 > 0, so the point (2, 3) is actually infeasible for the second constraint,
and the linearizing condition below pushes back toward feasibility)
Conditions:

   dx − dy ≤ 2,   2dx + dy ≤ −1
3. Cone of Descent Directions: Descent direction satisfies:

∇f (x0 , y0 ) · d < 0

Gradient of f :
∇f (x, y) = [2(x − 3), 2(y − 4)]
At (2, 3):
∇f (2, 3) = [−2, −2]
Condition:
−2dx − 2dy < 0

Example 4: Nonlinear constraint

Objective:

   f(x, y) = sin(x) + y²

Constraint:

   g(x, y) = x² + y² − 4 ≤ 0

Point: (x₀, y₀) = (1, 1)
1. Cone of Feasible Directions:
Feasible directions satisfy:
∇g(x0 , y0 ) · d ≤ 0
Gradient of g:
∇g(x, y) = [2x, 2y]
At (1, 1):
∇g(1, 1) = [2, 2]
Condition:
2dx + 2dy ≤ 0

2. Linearizing Cone:
Add the constraint value:
∇g(x0 , y0 ) · d ≤ −g(x0 , y0 )
At (1, 1):

   g(1, 1) = 1² + 1² − 4 = −2
Condition:
2dx + 2dy ≤ 2
3. Cone of Descent Directions:
Descent direction satisfies:
∇f (x0 , y0 ) · d < 0
Gradient of f :
∇f (x, y) = [cos(x), 2y]
At (1, 1):
∇f (1, 1) = [cos(1), 2]
Condition:
cos(1)dx + 2dy < 0

In convex optimization problems, the Karush-Kuhn-Tucker (KKT) conditions are typically both
necessary and sufficient for optimality, given certain regularity conditions. However, there are
cases where these regularity conditions fail, and the KKT conditions may not be necessary, even
for convex problems. Below are examples illustrating such situations:
Example 1: Non-Smooth Constraint Consider the convex optimization problem:

   Minimize f(x) = |x|,   subject to x² ≤ 0.

Analysis:

• The constraint x² ≤ 0 implies x = 0, making x = 0 the only feasible point and hence the
  global minimum.

• The objective function f(x) = |x| is convex.

• However, f(x) is not differentiable at x = 0, and this violates the regularity conditions
  required for the KKT conditions to apply. Therefore, the KKT conditions cannot characterize
  the solution, even though the problem is convex.

Example 2: Missing Constraint Qualification (MFCQ Violation) Consider:

   Minimize f(x) = x,   subject to g(x) = x² ≤ 0.

Analysis:

• The problem is convex because both f(x) and g(x) are convex.

• The only feasible point is x = 0, which is therefore the optimal solution.

• The gradient of the constraint at the solution is ∇g(0) = 0, so there is no direction d such
  that ∇g(0)d < 0; the regularity condition known as the Mangasarian-Fromovitz Constraint
  Qualification (MFCQ) fails.

• Since MFCQ fails, the KKT conditions need not hold, and indeed they do not: stationarity
  would require 1 + λ · 0 = 0, which has no solution, even though the problem is convex.
Example 3: Semi-Infinite Constraints Consider:

   Minimize f(x) = x²,   subject to g(x) = max(0, −x) = 0.

Analysis:

• The constraint g(x) = max(0, −x) = 0 essentially forces x ≥ 0.

• The problem is convex, and the solution is x = 0.

• However, the non-differentiable nature of g(x) (due to the "max" function) makes it
  challenging to define a valid Lagrange multiplier. The KKT conditions do not apply because
  the regularity conditions (like differentiability or constraint qualifications) are violated.
Example 4: Non-Interior Feasible Point (Slater's Condition Violation) Consider the problem:

   Minimize f(x, y) = x + y,   subject to g(x, y) = x² + y² ≤ 0.

Analysis:

• The objective function f(x, y) = x + y is convex, and the constraint g(x, y) = x² + y² ≤ 0
  admits only the single feasible point (x, y) = (0, 0).

• The optimal solution is therefore (x, y) = (0, 0), which lies on the boundary of the feasible
  region.

• Slater's condition fails because there is no strictly interior point satisfying g(x, y) < 0.
  Consequently, the KKT conditions need not apply; indeed, ∇g(0, 0) = (0, 0), so the stationarity
  condition (1, 1) + λ(0, 0) = (0, 0) has no solution.
Example 5: Non-Differentiable Objective Function

Minimize f (x) = |x|, subject to x ≥ 0.

Analysis:

• The function f(x) = |x| is convex but not differentiable at x = 0.

• The feasible region x ≥ 0 makes x = 0 the optimal solution.

• The lack of differentiability at x = 0 prevents the application of the (smooth) KKT conditions,
  even though the problem is convex.
Example 6: Inconsistent Constraints

   Minimize f(x, y) = x² + y²,   subject to x ≥ 0, y ≥ 0, x + y ≤ −1.

Analysis:

• The objective f(x, y) = x² + y² is convex.

• The constraints x ≥ 0, y ≥ 0, and x + y ≤ −1 are convex, but the feasible set is empty (the
  constraints are inconsistent).

• Since there are no feasible points, the KKT conditions cannot apply.

Example 7: Degenerate Feasible Region

   Minimize f(x, y) = x + y,   subject to x² = 0, y² = 0.

Analysis:

• The feasible region is degenerate: it is the single point (x, y) = (0, 0), which is trivially the
  optimal solution.

• The constraint qualifications (e.g., LICQ or MFCQ) fail because the constraint gradients at
  (0, 0) are both (0, 0) and cannot span the space of variables.

• Stationarity would require (1, 1) + λ₁(0, 0) + λ₂(0, 0) = (0, 0), which has no solution, so the
  KKT conditions are not applicable, even though the problem is convex.

Example 8: Infinite-Dimensional Constraints

   Minimize f(x) = ∫₀¹ x(t)² dt,   subject to g(x) = ∫₀¹ x(t) dt ≤ 1.

Analysis:

• The objective f(x) = ∫₀¹ x(t)² dt is convex, and the constraint g(x) = ∫₀¹ x(t) dt − 1 ≤ 0 is
  also convex.

• Infinite-dimensional problems often fail standard constraint qualifications, making it
  challenging to apply KKT conditions directly.

Example 9: Equality Constraints with Linearly Dependent Gradients

   Minimize f(x₁, x₂) = x₁² + x₂²,   subject to x₁ + x₂ = 1, 2x₁ + 2x₂ = 2.

Analysis:

• The problem is convex, but the two equality constraints x₁ + x₂ = 1 and 2x₁ + 2x₂ = 2 are
  linearly dependent (the second is twice the first), so the linear independence constraint
  qualification (LICQ) is violated.

• At the minimizer (1/2, 1/2), the Lagrange multipliers are consequently not unique (any λ₁, λ₂
  with λ₁ + 2λ₂ = −1 satisfy stationarity), so the KKT conditions cannot be used to characterize
  the solution by a unique multiplier vector.

These examples demonstrate that while convexity simplifies optimization problems, regularity
conditions such as Slater’s condition, differentiability, and linear independence of constraints play
crucial roles in ensuring the applicability of KKT conditions. When these conditions are violated,
even convex problems may lack necessary KKT characterizations.
