
Optimization Theory

Muhammad Abid Dar

December 16, 2024

1 Unconstrained Optimization Problems


An “unconstrained optimization problem” is a type of optimization problem in which the objective
is to find the maximum or minimum of a function without any restrictions or constraints on the
variables. In other words, there are no specific conditions (like inequalities or equalities) imposed
on the variables, meaning they can take any values within their domain.
General Form
An unconstrained optimization problem can be formulated as:

   min_x f(x)    or    max_x f(x)

where:

• f(x) is the objective function to be minimized (or maximized).

• x is the vector of decision variables.

In this problem, we are interested in finding the value(s) of x that either minimize or maximize
f (x), depending on the problem at hand.

1.1 Key Features of Unconstrained Optimization


1. No Constraints: There are no additional conditions or limits on the values of x. Unlike
constrained problems, where variables must satisfy certain conditions (such as x ≥ 0 or
g(x) = 0), unconstrained problems allow the variables to vary freely.

2. First-Order Condition (Optimality Condition): To find the optimal point (minimum or max-
imum), the first-order condition is that the gradient of the objective function should be zero
at the solution point:
∇f (x) = 0
This means that the derivative of the objective function with respect to x should vanish at
the optimum, i.e., the slope of the function is zero at this point.

3. Second-Order Condition: After finding the critical points using the first-order condition, the
second-order condition (Hessian matrix) helps determine whether the point is a minimum,
maximum, or saddle point.

• If the Hessian matrix is positive definite, the point is a local minimum.
• If the Hessian is negative definite, the point is a local maximum.
• If the Hessian is indefinite, the point is a saddle point.
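A minimal MATLAB sketch of this classification, assuming a hypothetical Hessian H already evaluated at the critical point:

% Classify a critical point from the eigenvalues of its Hessian.
H = [2 0; 0 -2];       % example Hessian (illustrative values)
ev = eig(H);           % eigenvalues of the symmetric matrix H
if all(ev > 0)
    disp('local minimum');
elseif all(ev < 0)
    disp('local maximum');
elseif any(ev > 0) && any(ev < 0)
    disp('saddle point');
else
    disp('inconclusive (some eigenvalue is zero)');
end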

1.2 Types of Unconstrained Optimization Problems


1. Univariate Optimization:

• Involves optimizing a function of a single variable.

• Example:

   min_x f(x) = x² + 4x + 5

2. Multivariate Optimization:

• Involves optimizing a function of multiple variables.

• Example:

   min_{x,y} f(x, y) = x² + y² + 4x − 6y

1.3 Solving Unconstrained Optimization Problems


There are several methods to solve unconstrained optimization problems depending on the
complexity of the function:

1. Analytical Methods: If the objective function is smooth and differentiable, you can often
find the solution by setting the gradient to zero and solving for the critical points. Example:

   min_x f(x) = x² + 3x + 2

The solution is obtained by finding the derivative and setting it equal to zero:

   d/dx (x² + 3x + 2) = 2x + 3 = 0  ⇒  x = −3/2

2. Numerical Methods: For more complex or non-analytical functions, numerical methods
such as gradient descent, Newton's method, or quasi-Newton methods (e.g., BFGS) are used.
These methods iteratively find the minimum or maximum by following the gradient of the
function.

• Gradient Descent: Iteratively updates the variables by moving in the direction of the
  negative gradient (for minimization):

     x_{k+1} = x_k − α ∇f(x_k)

  where α is the step size.

• Newton's Method: Uses both the gradient and the Hessian matrix to update the variables:

     x_{k+1} = x_k − [Hf(x_k)]⁻¹ ∇f(x_k)

  where Hf(x_k) is the Hessian matrix. A short sketch of both updates follows below.
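A minimal MATLAB sketch of both updates, applied to the multivariate example f(x, y) = x² + y² + 4x − 6y from Section 1.2 (the step size α = 0.1 and the iteration count are illustrative choices):

% Gradient descent and Newton's method on a simple quadratic.
grad = @(v) [2*v(1) + 4; 2*v(2) - 6];   % gradient of f
H = [2 0; 0 2];                         % Hessian (constant for this quadratic)

x = [0; 0]; alpha = 0.1;                % gradient descent
for k = 1:100
    x = x - alpha * grad(x);            % x_{k+1} = x_k - alpha * grad f(x_k)
end
fprintf('gradient descent: x = (%.4f, %.4f)\n', x(1), x(2));

x = [0; 0];                             % Newton's method: one step suffices here
x = x - H \ grad(x);                    % x_{k+1} = x_k - Hf(x_k)^{-1} grad f(x_k)
fprintf('Newton: x = (%.4f, %.4f)\n', x(1), x(2));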

1.4 Applications of Unconstrained Optimization
1. Machine Learning: Training models like linear regression and logistic regression involves
unconstrained optimization of loss functions.

2. Economics and Finance: Many problems, such as maximizing utility or minimizing costs, can
be modeled as unconstrained optimization problems in certain cases.

3. Engineering Design: Finding the optimal design parameters to minimize cost or maximize
efficiency often involves unconstrained optimization.

1.5 Example Problem


Problem: Minimize the function f(x, y) = x² + y² − 4x + 6y.
Solution:

1. Compute the gradient:

   ∇f(x, y) = (∂f/∂x, ∂f/∂y) = (2x − 4, 2y + 6)

2. Set the gradient to zero:


2x − 4 = 0 ⇒ x=2
2y + 6 = 0 ⇒ y = −3

3. The critical point is (x, y) = (2, −3).

4. To confirm it is a minimum, check the Hessian matrix:

   Hf(x, y) = [2 0; 0 2]

Since the Hessian is positive definite (its eigenvalues are both positive), the point (2, −3) is a
local minimum.
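A quick numerical cross-check using base MATLAB's fminsearch (the starting guess (0, 0) is an arbitrary choice):

% Verify the critical point of f(x, y) = x^2 + y^2 - 4x + 6y numerically.
f = @(v) v(1)^2 + v(2)^2 - 4*v(1) + 6*v(2);
[xmin, fmin] = fminsearch(f, [0, 0]);
% Expected: xmin close to (2, -3), fmin close to -13
fprintf('x = (%.4f, %.4f), f = %.4f\n', xmin(1), xmin(2), fmin);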

Conclusion: Unconstrained optimization problems focus solely on optimizing a function without
restrictions on the variables, making them simpler than constrained optimization problems. Despite
their simplicity, they have broad applications across various fields, and methods to solve them can
range from basic analytical techniques to more sophisticated numerical methods.

1.6 Exercise
Example 1: Quadratic Function
Problem: Optimize f(x, y) = x² + y².
Solution: 1. First-order partial derivatives:

   ∂f/∂x = 2x,   ∂f/∂y = 2y

2. Set ∂f/∂x = 0 and ∂f/∂y = 0:

   x = 0,   y = 0

Critical point: (0, 0). 3. Hessian matrix:

   Hf(x, y) = [2 0; 0 2]

4. Eigenvalues of the Hessian: 2, 2 (both positive). 5. Since both eigenvalues are positive, the
function has a local minimum at (0, 0).
Example 2: Saddle Point
Problem: Optimize f(x, y) = x² − y².
Solution: 1. First-order partial derivatives:

   ∂f/∂x = 2x,   ∂f/∂y = −2y

2. Set ∂f/∂x = 0 and ∂f/∂y = 0:

   x = 0,   y = 0

Critical point: (0, 0). 3. Hessian matrix:

   Hf(x, y) = [2 0; 0 −2]

4. Eigenvalues of the Hessian: 2, −2 (one positive, one negative). 5. Since the eigenvalues have
different signs, (0, 0) is a saddle point.
Example 3: Elliptic Paraboloid
Problem: Optimize f(x, y) = 3x² + 2y².
Solution: 1. First-order partial derivatives:

   ∂f/∂x = 6x,   ∂f/∂y = 4y

2. Set ∂f/∂x = 0 and ∂f/∂y = 0:

   x = 0,   y = 0

Critical point: (0, 0). 3. Hessian matrix:

   Hf(x, y) = [6 0; 0 4]

4. Eigenvalues of the Hessian: 6, 4 (both positive). 5. The function has a local minimum at (0, 0).
Example 4: Mixed Function
Problem: Optimize f(x, y) = 4x² − 4xy + y².
Solution: 1. First-order partial derivatives:

   ∂f/∂x = 8x − 4y,   ∂f/∂y = 2y − 4x

2. Set ∂f/∂x = 0 and ∂f/∂y = 0: both equations reduce to y = 2x, so every point on this line is
critical; take (0, 0) as a representative critical point. 3. Hessian matrix:

   Hf(x, y) = [8 −4; −4 2]

4. Eigenvalues of the Hessian: solve det(H − λI) = 0, yielding λ₁ = 10, λ₂ = 0. 5. Since one
eigenvalue is zero, the Hessian is only positive semidefinite and the second-order test is
inconclusive; however, writing f(x, y) = (2x − y)² ≥ 0 shows that (0, 0) is in fact a global
minimum (attained along the whole line y = 2x), not a saddle point.
Example 5: Cross-Term Function
Problem: Optimize f(x, y) = x² + xy + y².
Solution: 1. First-order partial derivatives:

   ∂f/∂x = 2x + y,   ∂f/∂y = 2y + x

2. Set ∂f/∂x = 0 and ∂f/∂y = 0:

   x = 0,   y = 0

Critical point: (0, 0). 3. Hessian matrix:

   Hf(x, y) = [2 1; 1 2]

4. Eigenvalues: λ₁ = 3, λ₂ = 1 (both positive). 5. The function has a local minimum at (0, 0).
Example 6: Exponential Function
Problem: Optimize f(x, y) = e^(x+y).
Solution: 1. First-order partial derivatives:

   ∂f/∂x = e^(x+y),   ∂f/∂y = e^(x+y)

2. Set ∂f/∂x = 0 and ∂f/∂y = 0: there are no critical points, since e^(x+y) ≠ 0 for any x and y.
Example 7: Rational Function
Problem: Optimize f(x, y) = (x² + y²)/(1 + x² + y²).
Solution: 1. First-order partial derivatives (by the quotient rule):

   ∂f/∂x = [2x(1 + x² + y²) − 2x(x² + y²)]/(1 + x² + y²)² = 2x/(1 + x² + y²)²
   ∂f/∂y = [2y(1 + x² + y²) − 2y(x² + y²)]/(1 + x² + y²)² = 2y/(1 + x² + y²)²

2. Setting ∂f/∂x = 0 and ∂f/∂y = 0 gives the critical point x = 0, y = 0, where f attains its
global minimum f(0, 0) = 0.
Example 8: Polynomial Function
Problem: Optimize f(x, y) = x⁴ + y⁴.
Solution: 1. First-order partial derivatives:

   ∂f/∂x = 4x³,   ∂f/∂y = 4y³

2. Set ∂f/∂x = 0, ∂f/∂y = 0: critical point at (0, 0). The Hessian vanishes there, so the
second-order test is inconclusive, but f(x, y) ≥ 0 with equality only at (0, 0), so (0, 0) is a
global minimum.
Example 9: Logarithmic Function
Problem: Optimize f(x, y) = log(1 + x² + y²).
Example 10: Sum of Sine Functions
Problem: Optimize f(x, y) = sin(x) + sin(y).

1.7 Solutions to the Exercises of Chapter 1


Exercise 1: To determine the orbit of the planetoid, we can use the given observations and fit an
elliptical orbit to the data using the method of least squares. The equation for the orbit is assumed
to be of the form:

   x² = ay² + bxy + cx + dy + e

Here, the goal is to determine the coefficients a, b, c, d, and e based on the 10 observations
(xⱼ, yⱼ), where j = 1, . . . , 10.
Steps to Solve:
1. Rewrite the equation in terms of a linear system: The given equation x² = ay² + bxy + cx +
dy + e is linear in the unknowns a, b, c, d, e. We can write it for each observation as:

   xⱼ² = a yⱼ² + b xⱼyⱼ + c xⱼ + d yⱼ + e,   for j = 1, 2, . . . , 10


2. Set up the matrix equation: For each observation (xⱼ, yⱼ), the equation above can be written
in the form:

   xⱼ² = [yⱼ²  xⱼyⱼ  xⱼ  yⱼ  1] (a, b, c, d, e)ᵀ

Let us define: X, a 10 × 5 matrix whose j-th row contains the terms yⱼ², xⱼyⱼ, xⱼ, yⱼ, and 1;
and Xvec, a vector of length 10 with elements xⱼ².
Therefore, we need to solve the following linear system:

   X · β = Xvec

where β = (a, b, c, d, e)ᵀ.
3. Use the least squares method: Since we have more equations (10) than unknowns (5), this is an
overdetermined system. We can solve it using the method of least squares, which minimizes the
residuals:

   min_β ‖Xβ − Xvec‖²

The least squares solution is given by:

   β = (XᵀX)⁻¹ Xᵀ Xvec

MATLAB Code to Solve the Problem

% Given data points (xj, yj)
x = [-1.024940, -0.949898, -0.866114, -0.773392, -0.671372, ...
     -0.559524, -0.437067, -0.302909, -0.155493, -0.007464];
y = [-0.389269, -0.322894, -0.265256, -0.216557, -0.177152, ...
     -0.147582, -0.128618, -0.121353, -0.127348, -0.148885];
% Number of observations
n = length(x);
% Form the matrix X (10x5 matrix)
X = [(y.^2)', (x.*y)', x', y', ones(n, 1)];
% Vector Xvec containing xj^2
Xvec = (x.^2)';
% Solve the least squares problem to find the coefficients [a, b, c, d, e]
coefficients = (X'*X) \ (X'*Xvec);
% Display the results
a = coefficients(1); b = coefficients(2); c = coefficients(3);
d = coefficients(4); e = coefficients(5);
fprintf('The coefficients are:\n');
fprintf('a = %f\n', a);
fprintf('b = %f\n', b);
fprintf('c = %f\n', c);
fprintf('d = %f\n', d);
fprintf('e = %f\n', e);
Explanation:
1. Matrix X: Each row of the matrix contains the terms yⱼ², xⱼyⱼ, xⱼ, yⱼ, and 1, which
are derived from the equation xⱼ² = a yⱼ² + b xⱼyⱼ + c xⱼ + d yⱼ + e. 2. Vector Xvec: This vector
contains the values xⱼ² for each observation. 3. Least squares solution: The MATLAB command
(X'*X) \ (X'*Xvec) solves the normal equations of the least squares problem to find the unknown
coefficients a, b, c, d, and e.
Output: The code will output the coefficients a, b, c, d, and e, which define the orbit of the
planetoid based on the observed positions.
Predicting the Orbit: Once you have the coefficients, you can use the equation

   x² = ay² + bxy + cx + dy + e

to predict the planetoid's future positions in its orbit, or find its next position when it becomes
visible again.
(b) For the parabolic approach, we assume the orbit equation takes the simpler form:

   x² = dy + e

Here, we have only two unknown coefficients, d and e, and we aim to determine these based on
the observed data.
Steps to Solve:
1. Rewrite the equation: For each observation (xⱼ, yⱼ), the equation becomes:

   xⱼ² = d yⱼ + e

This can be rewritten as a linear system:

   xⱼ² = [yⱼ  1] (d, e)ᵀ

2. Set up the matrix equation: Let us define: X, a 10 × 2 matrix whose j-th row contains the
terms yⱼ and 1; and Xvec, a vector of length 10 with elements xⱼ².
Therefore, we need to solve the linear system:

   X · β = Xvec

where β = (d, e)ᵀ.
3. Use the least squares method: Since we have more equations (10) than unknowns (2), this is
an overdetermined system. We can solve it using the least squares method, which minimizes the
residuals:

   min_β ‖Xβ − Xvec‖²

The least squares solution is given by:

   β = (XᵀX)⁻¹ Xᵀ Xvec

MATLAB Code to Solve the Problem

% Given data points (xj, yj)
x = [-1.024940, -0.949898, -0.866114, -0.773392, -0.671372, ...
     -0.559524, -0.437067, -0.302909, -0.155493, -0.007464];
y = [-0.389269, -0.322894, -0.265256, -0.216557, -0.177152, ...
     -0.147582, -0.128618, -0.121353, -0.127348, -0.148885];
% Number of observations
n = length(x);
% Form the matrix X (10x2 matrix)
X = [y', ones(n, 1)];
% Vector Xvec containing xj^2
Xvec = (x.^2)';
% Solve the least squares problem to find the coefficients [d, e]
coefficients = (X'*X) \ (X'*Xvec);
% Display the results
d = coefficients(1); e = coefficients(2);
fprintf('The coefficients are:\n');
fprintf('d = %f\n', d);
fprintf('e = %f\n', e);
Explanation:
1. Matrix X: Each row of the matrix contains the terms yⱼ and 1, which are derived from
the equation xⱼ² = d yⱼ + e. 2. Vector Xvec: This vector contains the values xⱼ² for each observation.
3. Least squares solution: The MATLAB command (X'*X) \ (X'*Xvec) solves the least squares
problem to find the unknown coefficients d and e.
Output: The code will output the coefficients d and e, which define the parabolic orbit of the
planetoid based on the observed positions.
Predicting the Orbit: Once you have the coefficients d and e, you can use the equation

   x² = dy + e

to predict the planetoid's future positions in its orbit based on the y-coordinate, or to find its
next position when it becomes visible again.
Exercise 2: Gauss Normal Equation (in the context of linear regression)
The Gauss normal equation is a key result used to solve the least squares problem in linear
regression. In linear regression, the objective is to find the best-fit line or plane (in higher
dimensions) to a set of data points by minimizing the sum of the squared differences between the
observed and predicted values.
Proof of the Gauss Normal Equation
1. Set up the objective function: With data matrix A, observation vector b, and residual
vector r = b − Ax, the objective is the sum of squared residuals:

   min_x ‖r‖² = min_x ‖b − Ax‖²
2. Expand the objective function: The squared residual norm can be expanded as:

   ‖b − Ax‖² = (b − Ax)ᵀ(b − Ax)

Expanding this expression:

   (b − Ax)ᵀ(b − Ax) = bᵀb − 2bᵀAx + xᵀAᵀAx

So the objective function becomes:

   min_x  bᵀb − 2bᵀAx + xᵀAᵀAx

3. Differentiate with respect to x: To minimize the objective function, we take the derivative
with respect to x and set it equal to zero:

   ∂/∂x (bᵀb − 2bᵀAx + xᵀAᵀAx) = 0

The derivative of each term is as follows: the derivative of bᵀb with respect to x is zero (since
it does not depend on x); the derivative of −2bᵀAx with respect to x is −2Aᵀb; and the derivative
of xᵀAᵀAx with respect to x is 2AᵀAx (using matrix calculus rules for quadratic forms).
Therefore, the derivative of the objective function is:

   −2Aᵀb + 2AᵀAx = 0
4. Simplify the equation: Dividing through by 2, we get:

   AᵀAx = Aᵀb

This is the Gauss normal equation.
5. Solving for x: If AᵀA is invertible, the solution for x is:

   x = (AᵀA)⁻¹Aᵀb

This is the least squares solution for the regression coefficients.
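A minimal MATLAB sketch contrasting the normal-equation solve with MATLAB's backslash operator on a small synthetic problem (the matrix A and vector b below are hypothetical):

% Fit a line to four points via the normal equations and via backslash.
A = [1 1; 1 2; 1 3; 1 4];        % design matrix: intercept and slope columns
b = [1.1; 1.9; 3.2; 3.9];        % observations
x_normal = (A'*A) \ (A'*b);      % x = (A'A)^{-1} A'b, the normal equations
x_qr = A \ b;                    % QR-based least squares solve
disp([x_normal, x_qr]);          % the two solutions agree up to rounding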
Exercise 3: First we understand the L₁-Approximation Problem.
The L₁-approximation problem is about finding a solution x that minimizes the L₁-norm of
the residuals, ‖Ax − b‖₁, where A is a matrix and b is a vector. The geometric interpretation of
the L₁-norm minimization can be visualized as minimizing the sum of the absolute values of the
residuals, which corresponds to minimizing the distance between the point x and some data points
in a piecewise linear manner.
The convex hull of a set of points is the smallest convex set that contains all the points.
Geometrically, the convex hull of four points in R2 will be a polygon whose vertices are those four
points if they do not lie on a straight line.
The points (3, 2), (2, 1), (3/2, 3/2), and (1, 3) form a polygon, and the convex hull is the smallest
convex polygon containing these points. The shape of the polygon is a quadrilateral in this case.
We want to prove that the convex hull of the given points provides solutions to the L1 -
approximation problem. To do this, we need to check whether the optimal solution for the L1 -norm
minimization corresponds to one of the points in the convex hull, or a convex combination of these
points.
In the L1 -norm, the residuals are given by the absolute differences between the predicted values
and the true values. Minimizing the L1 -norm leads to a solution that is often a vertex of the convex
hull of the data points, because the L1 -norm geometry tends to place the optimal solution on a
vertex or along the edges of the convex polygon defined by the data points.
Verifying the Convex Hull as the Solution Set

The set of points (3, 2), (2, 1), (3/2, 3/2), and (1, 3) forms a convex polygon in R². We can express
any point inside the convex hull as a convex combination of the four vertices:

   x = λ₁(3, 2) + λ₂(2, 1) + λ₃(3/2, 3/2) + λ₄(1, 3)

where λ₁ + λ₂ + λ₃ + λ₄ = 1 and λᵢ ≥ 0 for all i.
For the L₁-norm minimization problem, solutions often lie at the vertices of the convex hull,
which correspond to the points where the objective function (sum of absolute deviations) is
minimized. In this case, the points (3, 2), (2, 1), (3/2, 3/2), and (1, 3) are candidates for
solutions, as they are the vertices of the convex hull.
Are There Any Other Solutions?
Since the optimal solution to an L1 -norm minimization problem often lies at a vertex of the
convex hull, the points on the edges or inside the convex polygon formed by the convex hull may
also be solutions in certain cases. However, the most common case is that the solution is one of
the vertices of the convex hull. Thus, solutions can include the points inside the convex hull (i.e.,
convex combinations of the given points), but the vertices are the most likely candidates for the
optimal solution.
Conclusion
The convex hull of the points (3, 2), (2, 1), (3/2, 3/2), and (1, 3) provides solutions to the
L₁-approximation problem because the vertices of the convex hull are likely to be optimal solutions
to the problem. Other solutions might include points inside the convex hull, which are convex
combinations of these points, but the primary solutions are at the vertices of the convex polygon.
Example 4:
Let's break the problem into two parts:
Part 1: Minimization using the method of alternate directions
We are tasked with minimizing the function:

   f(x₁, x₂) = x₁² + x₂² − x₁x₂ − 3x₁

using the method of alternate directions. This means we iteratively minimize f(x₁, x₂) by
first fixing x₂ and minimizing with respect to x₁, then fixing x₁ and minimizing with respect to
x₂, and repeating.
Step 1: Minimize with respect to x₁ (fixing x₂)
Fix x₂ and minimize f(x₁, x₂) as a function of x₁:

   f(x₁, x₂) = x₁² − x₁x₂ − 3x₁ + x₂²

Taking the derivative with respect to x₁:

   ∂f/∂x₁ = 2x₁ − x₂ − 3

Set the derivative equal to zero to find the minimum:

   2x₁ − x₂ − 3 = 0  ⇒  x₁ = (x₂ + 3)/2

This gives us the value of x₁ that minimizes f, given x₂.
Step 2: Minimize with respect to x₂ (fixing x₁)
Now fix x₁ and minimize f(x₁, x₂) as a function of x₂:

   f(x₁, x₂) = x₁² − x₁x₂ − 3x₁ + x₂²

Taking the derivative with respect to x₂:

   ∂f/∂x₂ = 2x₂ − x₁

Set the derivative equal to zero to find the minimum:

   2x₂ − x₁ = 0  ⇒  x₂ = x₁/2

This gives us the value of x₂ that minimizes f, given x₁.
Step 3: Iterative process
Start with the initial point x⁽⁰⁾ = (0, 0)ᵀ.
1. Iteration 1: Fix x₂ = 0 and minimize with respect to x₁: x₁⁽¹⁾ = (0 + 3)/2 = 1.5. Now fix
x₁ = 1.5 and minimize with respect to x₂: x₂⁽¹⁾ = 1.5/2 = 0.75.
2. Iteration 2: Fix x₂ = 0.75 and minimize with respect to x₁: x₁⁽²⁾ = (0.75 + 3)/2 = 1.875.
Now fix x₁ = 1.875 and minimize with respect to x₂: x₂⁽²⁾ = 1.875/2 = 0.9375.
3. Iteration 3: Fix x₂ = 0.9375 and minimize with respect to x₁: x₁⁽³⁾ = (0.9375 + 3)/2 = 1.96875.
Now fix x₁ = 1.96875 and minimize with respect to x₂: x₂⁽³⁾ = 1.96875/2 = 0.984375.
Through repeated iterations, this process converges to the solution x₁ = 2, x₂ = 1, which is the
minimizer x* = (2, 1)ᵀ, as claimed. A short sketch of this loop follows below.
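A minimal MATLAB sketch of this loop (the iteration count is an illustrative choice):

% Alternate directions (coordinate descent) on f(x1, x2) = x1^2 + x2^2 - x1*x2 - 3*x1.
x1 = 0; x2 = 0;                  % initial point x^(0) = (0, 0)
for k = 1:50
    x1 = (x2 + 3) / 2;           % exact minimization over x1 with x2 fixed
    x2 = x1 / 2;                 % exact minimization over x2 with x1 fixed
end
fprintf('x = (%.6f, %.6f)\n', x1, x2);   % converges to (2, 1)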

Part 2: Minimizing f(x) = max{|x₁ + 2x₂ − 7|, |2x₁ + x₂ − 5|}
The new function to minimize is:

   f(x₁, x₂) = max{|x₁ + 2x₂ − 7|, |2x₁ + x₂ − 5|}

We are asked to use the same method of alternate directions, starting with x⁽⁰⁾ = (0, 0)ᵀ.
Step 1: Fix x₂ = 0 and minimize with respect to x₁
The function becomes:

   f(x₁, 0) = max{|x₁ − 7|, |2x₁ − 5|}

To minimize this, consider the two expressions |x₁ − 7| and |2x₁ − 5|.
We need to find where these two expressions are equal, as this is likely to minimize f(x₁, 0).
Setting them equal to each other:

   |x₁ − 7| = |2x₁ − 5|

Solving this gives two cases: 1. x₁ − 7 = 2x₁ − 5 ⇒ x₁ = −2; 2. x₁ − 7 = −(2x₁ − 5) ⇒ x₁ = 4.
At x₁ = −2 the common value is 9, while at x₁ = 4 both expressions equal 3, so the minimum
occurs at x₁ = 4.
Step 2: Fix x₁ = 4 and minimize with respect to x₂
The function becomes:

   f(4, x₂) = max{|4 + 2x₂ − 7|, |8 + x₂ − 5|}

Simplifying:

   f(4, x₂) = max{|2x₂ − 3|, |x₂ + 3|}

Set the two expressions equal to each other to minimize:

   |2x₂ − 3| = |x₂ + 3|

Solving this gives two cases: 1. 2x₂ − 3 = x₂ + 3 ⇒ x₂ = 6; 2. 2x₂ − 3 = −(x₂ + 3) ⇒ x₂ = 0.
At x₂ = 6 the common value is 9, while at x₂ = 0 both expressions equal 3, so the minimum
occurs at x₂ = 0.

Why don't we get the minimizer x* = (1, 3)ᵀ with f(x*) = 0?
The reason we do not reach the minimizer x* = (1, 3)ᵀ with f(x*) = 0 using this method is
that the method of alternate directions may not always work well with non-differentiable or
nonsmooth functions like this one.
The function f(x₁, x₂) = max{|x₁ + 2x₂ − 7|, |2x₁ + x₂ − 5|} involves taking the maximum of
absolute value terms, which introduces non-smoothness and kinks in the function. The method
of alternate directions works well for smooth, differentiable functions, but it can struggle with
functions that have sharp changes or discontinuities in their gradient, like this one: the iterates
stall at the kink point (4, 0)ᵀ with f(4, 0) = 3, so the process does not converge to the true
minimizer x* = (1, 3)ᵀ.
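A two-line MATLAB check comparing the stalled iterate with the true minimizer:

% f(4, 0) = 3 is where the alternating scheme stalls; f(1, 3) = 0 is the true minimum.
fmax = @(v) max(abs(v(1) + 2*v(2) - 7), abs(2*v(1) + v(2) - 5));
fprintf('f(4, 0) = %g, f(1, 3) = %g\n', fmax([4, 0]), fmax([1, 3]));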

2 Penalty Methods
The penalty method is an optimization technique used to solve constrained optimization problems
by converting them into unconstrained optimization problems. The general idea is to add a penalty
term to the objective function, which discourages violations of the constraints by penalizing the
objective function for constraint violations.
Here are a few examples of how the penalty method is applied in optimization:

2.1 Quadratic Penalty Method (Equality Constraints)


Consider the optimization problem:

   min f(x)   subject to   g(x) = 0

Here, g(x) is the equality constraint.
In the quadratic penalty method, the constrained problem is transformed into an unconstrained
problem by adding a penalty term to the objective function. The modified objective function is:

   P(x, µ) = f(x) + (µ/2) ‖g(x)‖²

where µ is the penalty parameter, which controls the weight of the penalty term, and ‖g(x)‖²
is the squared norm of the constraint violation.
As µ increases, the penalty for constraint violations becomes more severe, encouraging the
solution to approach the feasible region where g(x) = 0. A small sketch of the resulting loop
follows below.
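A minimal MATLAB sketch of the loop, using as a concrete instance the problem min x₁² + x₂² subject to x₁ + x₂ − 2 = 0 (whose solution is (1, 1)); the µ schedule is an illustrative choice:

% Quadratic penalty with an increasing penalty parameter, warm-starting each solve.
f = @(v) v(1)^2 + v(2)^2;
g = @(v) v(1) + v(2) - 2;
x = [0, 0];
for mu = [1, 10, 100, 1000]
    P = @(v) f(v) + (mu/2) * g(v)^2;   % penalized objective for current mu
    x = fminsearch(P, x);              % warm start from the previous solution
    fprintf('mu = %g: x = (%.4f, %.4f), g = %.5f\n', mu, x(1), x(2), g(x));
end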

2.2 Quadratic Penalty Method (Inequality Constraints)
Consider the optimization problem:

   min f(x)   subject to   g(x) ≤ 0

In this case, the quadratic penalty method can be modified for inequality constraints. The
penalty function becomes:

   P(x, µ) = f(x) + (µ/2) max(0, g(x))²

Here, the penalty is applied only when the constraint is violated (i.e., when g(x) > 0).

2.3 Exterior Penalty Method


In the exterior penalty method, the penalty term grows without bound as the penalty parameter
approaches its limit whenever the solution violates the constraints, pushing the solution toward
the feasible region.
The objective function is modified as follows. For a constrained problem with inequality
constraints g(x) ≤ 0:

   P(x, µ) = f(x) + (1/µ) Σᵢ max(0, gᵢ(x))²

As µ approaches zero, any violation of the constraint causes the penalty term to grow very
large, driving the solution towards the feasible region.

2.4 Interior Penalty Method (Logarithmic Barrier)


In the interior penalty method (or barrier method), the penalty is applied as the solution
approaches the boundary of the feasible region. This is commonly used for inequality constraints.
For a problem with constraints g(x) ≤ 0, the modified objective function becomes:

   P(x, µ) = f(x) − (1/µ) Σᵢ log(−gᵢ(x))

Here, gᵢ(x) < 0 ensures that the argument of the logarithm is positive. As x approaches the
boundary gᵢ(x) = 0, the penalty term becomes very large, discouraging solutions close to the
boundary.

2.5 Augmented Lagrangian Method


The augmented Lagrangian method combines both a penalty term and a Lagrange multiplier. For
a problem with equality constraints g(x) = 0, the objective function becomes:

   P(x, λ, µ) = f(x) + λᵀg(x) + (µ/2) ‖g(x)‖²

Here, λ is the vector of Lagrange multipliers, and the penalty term (µ/2)‖g(x)‖² helps enforce
the constraint.
The augmented Lagrangian method is more stable than pure penalty methods because it
incorporates both the Lagrange multipliers and a penalty term; a sketch of the multiplier update
loop follows below.
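A minimal MATLAB sketch of the method on the same instance, min x₁² + x₂² subject to x₁ + x₂ − 2 = 0, using the standard first-order multiplier update λ ← λ + µ g(x):

% Augmented Lagrangian loop: minimize in x, then update the multiplier.
f = @(v) v(1)^2 + v(2)^2;
g = @(v) v(1) + v(2) - 2;
x = [0, 0]; lambda = 0; mu = 10;
for k = 1:10
    L = @(v) f(v) + lambda * g(v) + (mu/2) * g(v)^2;
    x = fminsearch(L, x);           % inner minimization in x
    lambda = lambda + mu * g(x);    % first-order multiplier update
end
fprintf('x = (%.4f, %.4f), lambda = %.4f\n', x(1), x(2), lambda);  % -> (1, 1), -2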

2.6 Logarithmic Penalty for Optimization Problems with Inequality Constraints
Given an optimization problem with inequality constraints:

   min f(x)   subject to   g(x) ≤ 0

The logarithmic barrier penalty method transforms the objective function by adding a barrier
term:

   P(x, µ) = f(x) − (1/µ) log(−g(x))

As x approaches the boundary where g(x) = 0, the logarithmic term approaches infinity,
preventing the solution from violating the inequality constraint.

2.7 Penalty Method for Box Constraints


Consider the optimization problem with box constraints l ≤ x ≤ u, where l and u are lower and
upper bounds on x.
The penalty method adds a term to penalize solutions outside the feasible range:

   P(x, µ) = f(x) + µ Σᵢ [max(0, xᵢ − uᵢ)² + max(0, lᵢ − xᵢ)²]

Here, the penalty function grows large when the solution x violates the bounds l ≤ x ≤ u.

2.8 Quadratic Penalty for Multiple Constraints


For a problem with multiple constraints g₁(x) = 0, g₂(x) ≤ 0, . . ., the penalty method can combine
terms for both equality and inequality constraints:

   P(x, µ) = f(x) + µ [ Σᵢ gᵢ(x)² + Σⱼ max(0, gⱼ(x))² ]

2.9 Penalty Method in Machine Learning (Lasso Regression)


In Lasso regression, the penalty method is used to enforce sparsity in the model coefficients. The
Lasso objective function includes a penalty term on the 1-norm of the coefficient vector β:

   min_β  Σᵢ (yᵢ − Xᵢᵀβ)² + λ‖β‖₁

Here, λ controls the strength of the penalty on the coefficients.

2.10 Penalty Method for Quadratically Constrained Optimization


Consider the problem:

   min f(x)   subject to   g(x) ≤ 0,  h(x) = 0

The penalty method modifies the objective function as follows:

   P(x, µ) = f(x) + µ [ Σⱼ max(0, gⱼ(x))² + Σᵢ hᵢ(x)² ]

This penalizes violations of both inequality and equality constraints, driving the solution towards
feasibility.
Conclusion:
The penalty method is a powerful tool in optimization for handling constraints. By adding
penalty terms to the objective function, constraint violations can be incorporated into the
optimization process. Different types of penalty methods (quadratic, barrier, and augmented
Lagrangian) allow for flexibility in handling various types of constraints (equality, inequality,
and box constraints).

2.11 Examples of Penalty Methods


Here are some examples of optimization problems that use penalty methods along with their
solutions:
Example 1: Quadratic Penalty Method for Equality Constraints
Problem: Minimize the objective function:

   f(x) = (x₁ − 2)² + (x₂ − 3)²

subject to the equality constraint:

   g(x) = x₁ + x₂ − 5 = 0

Solution:
1. Set up the penalty function: The penalty method transforms the constrained problem into
an unconstrained problem by adding a penalty term proportional to the square of the constraint
violation. The penalty function is:

   P(x, µ) = f(x) + (µ/2) g(x)²

So the penalty function becomes:

   P(x, µ) = (x₁ − 2)² + (x₂ − 3)² + (µ/2)(x₁ + x₂ − 5)²

2. Choose an initial value for µ: Let µ = 10 (a typical starting point for the penalty parameter).
3. Minimize the penalty function: Now, we minimize P(x, µ) with respect to x₁ and x₂. Taking
the partial derivatives and setting them to zero:

   ∂P(x, µ)/∂x₁ = 2(x₁ − 2) + 10(x₁ + x₂ − 5) = 0
   ∂P(x, µ)/∂x₂ = 2(x₂ − 3) + 10(x₁ + x₂ − 5) = 0

Solving these equations simultaneously, we get:

   x₁ = 2,   x₂ = 3

4. Check constraint satisfaction: g(x) = x₁ + x₂ − 5 = 2 + 3 − 5 = 0, so the constraint is
satisfied exactly (for this particular problem, the unconstrained minimizer of f already happens
to lie on the constraint).
Final solution: The optimal solution is x₁ = 2, x₂ = 3, and the minimum value of the objective
function is f(2, 3) = 0.
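Since the stationarity conditions above are linear in (x₁, x₂), they can be checked with a direct linear solve in MATLAB:

% Stationarity with mu = 10: 12*x1 + 10*x2 = 54 and 10*x1 + 12*x2 = 56.
A = [12 10; 10 12];
rhs = [54; 56];
x = A \ rhs;        % yields x = (2, 3)
disp(x');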
Example 2: Quadratic Penalty Method for Inequality Constraints
Problem: Minimize the objective function:

   f(x) = (x₁ − 1)² + (x₂ − 2)²

subject to the inequality constraint:

   g(x) = x₁² + x₂² − 4 ≤ 0

Solution:
1. Set up the penalty function: In the case of inequality constraints, we penalize the violation
when g(x) > 0. The penalty function is:

   P(x, µ) = f(x) + (µ/2) max(0, g(x))²

For this problem, the penalty function becomes:

   P(x, µ) = (x₁ − 1)² + (x₂ − 2)² + (µ/2) max(0, x₁² + x₂² − 4)²

2. Choose an initial value for µ: Let µ = 100 (chosen to heavily penalize constraint violations).
3. Minimize the penalty function: Since we can't differentiate directly due to the max function,
we use numerical methods to minimize P(x, µ) over x₁ and x₂.
After performing the minimization, we find approximately:

   x₁ ≈ 0.8946,   x₂ ≈ 1.7891

4. Check constraint satisfaction: g(x) = x₁² + x₂² − 4 ≈ 0.0012, a small positive violation that
shrinks to zero as µ is increased; the exact constrained optimum is (2/√5, 4/√5) ≈ (0.8944, 1.7889),
the point of the disk closest to (1, 2).
Final solution: The optimal solution is approximately x₁ ≈ 0.894, x₂ ≈ 1.789, and the minimum
value of the objective function is approximately f ≈ 0.056.
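A numerical cross-check of this example with fminsearch, starting from the unconstrained minimizer (1, 2):

% Exterior quadratic penalty for the inequality-constrained example, mu = 100.
f = @(v) (v(1) - 1)^2 + (v(2) - 2)^2;
g = @(v) v(1)^2 + v(2)^2 - 4;
mu = 100;
P = @(v) f(v) + (mu/2) * max(0, g(v))^2;
x = fminsearch(P, [1, 2]);
fprintf('x = (%.4f, %.4f), f = %.4f, g = %.5f\n', x(1), x(2), f(x), g(x));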
Example 3: Logarithmic Barrier Method for Inequality Constraints
Problem: Minimize the objective function:

   f(x) = x₁² + x₂²

subject to the inequality constraint:

   g(x) = x₁ + x₂ − 1 ≤ 0

Solution:
1. Set up the barrier function: In the logarithmic barrier method, the penalty grows without
bound as the iterates approach the boundary of the feasible region from the inside. The penalty
function is:

   P(x, µ) = f(x) − (1/µ) log(−g(x))

For this problem, the penalty function becomes:

   P(x, µ) = x₁² + x₂² − (1/µ) log(1 − (x₁ + x₂))

2. Choose values for µ: Solve the barrier problem for a sequence of increasing µ, so that the
barrier weight 1/µ shrinks toward zero.
3. Minimize the penalty function: We minimize P(x, µ) using numerical methods. As µ → ∞,
the barrier minimizers converge to:

   x₁ = 0,   x₂ = 0

4. Check constraint satisfaction: g(x) = x₁ + x₂ − 1 = 0 + 0 − 1 = −1 < 0, so the inequality
constraint is satisfied strictly (it is inactive at the optimum).
Final solution: The optimal solution is x₁ = 0, x₂ = 0, and the minimum value of the objective
function is f(0, 0) = 0.
Example 4: Augmented Lagrangian Method
Problem: Minimize the objective function:

   f(x) = x₁² + x₂²

subject to the equality constraint:

   g(x) = x₁ + x₂ − 2 = 0

Solution:
1. Set up the augmented Lagrangian: The augmented Lagrangian function is:

   L(x, λ, µ) = f(x) + λg(x) + (µ/2) g(x)²

For this problem, the augmented Lagrangian becomes:

   L(x, λ, µ) = x₁² + x₂² + λ(x₁ + x₂ − 2) + (µ/2)(x₁ + x₂ − 2)²

2. Choose initial values for λ and µ: Let λ = 1 and µ = 10.
3. Minimize the augmented Lagrangian: We minimize L(x, λ, µ) with respect to x₁ and x₂, then
update the multiplier via λ ← λ + µ g(x) and repeat. The iterates converge to:

   x₁ = 1,   x₂ = 1   (with λ → −2)

4. Check constraint satisfaction: g(x) = x₁ + x₂ − 2 = 1 + 1 − 2 = 0, so the constraint is satisfied.
Final solution: The optimal solution is x₁ = 1, x₂ = 1, and the minimum value of the objective
function is f(1, 1) = 2.
Summary: The Quadratic Penalty Method adds a squared penalty term for constraint violations;
the Logarithmic Barrier Method adds a logarithmic penalty term to discourage constraint
violations, particularly near the boundary; and the Augmented Lagrangian Method combines
Lagrange multipliers with a quadratic penalty term, improving stability and convergence.
These methods convert constrained problems into easier-to-solve unconstrained problems by
penalizing constraint violations, ultimately leading to the optimal solution.

2.12 Lagrange Multipliers for Optimality Conditions
Let’s go through a couple of numerical examples to see how to apply the Lagrange Multiplier Rule
to find the necessary optimality conditions for constrained optimization problems.
Example 1: Optimizing a function subject to a constraint
Problem: Optimize the function:

   f(x, y) = x² + y²

subject to the constraint:

   g(x, y) = x + y − 1 = 0

Step 1: Define the Lagrangian. We introduce the Lagrange multiplier λ for the constraint and
form the Lagrangian:

   L(x, y, λ) = x² + y² + λ(x + y − 1)

Step 2: Compute partial derivatives. We now compute the partial derivatives of the Lagrangian
with respect to x, y, and λ:

   ∂L/∂x = 2x + λ = 0,   ∂L/∂y = 2y + λ = 0,   ∂L/∂λ = x + y − 1 = 0

Step 3: Solve the system of equations: 1. 2x + λ = 0; 2. 2y + λ = 0; 3. x + y = 1.
From the first two equations, we have:

   λ = −2x   and   λ = −2y

Thus, x = y.
Substitute x = y into the third equation:

   x + x = 1  ⇒  2x = 1  ⇒  x = 1/2

Since x = y, we also have y = 1/2.
Step 4: Solution. The critical point of f(x, y) = x² + y² subject to the constraint x + y = 1 is
(x, y) = (1/2, 1/2). It is in fact the constrained minimum, with f(1/2, 1/2) = 1/2; f has no
maximum on this line, since it is unbounded above along it.
Example 2: Minimizing a function subject to two constraints
Problem: Minimize the function:

   f(x, y, z) = x² + y² + z²

subject to the constraints:

   g₁(x, y, z) = x + y + z − 1 = 0   and   g₂(x, y, z) = x − y = 0

Step 1: Define the Lagrangian. We introduce two Lagrange multipliers, λ₁ and λ₂, for the
constraints and form the Lagrangian:

   L(x, y, z, λ₁, λ₂) = x² + y² + z² + λ₁(x + y + z − 1) + λ₂(x − y)

Step 2: Compute partial derivatives. We now compute the partial derivatives of the Lagrangian
with respect to x, y, z, λ₁, and λ₂:

   ∂L/∂x = 2x + λ₁ + λ₂ = 0,   ∂L/∂y = 2y + λ₁ − λ₂ = 0,   ∂L/∂z = 2z + λ₁ = 0,
   ∂L/∂λ₁ = x + y + z − 1 = 0,   ∂L/∂λ₂ = x − y = 0

Step 3: Solve the system of equations: 1. 2x + λ₁ + λ₂ = 0; 2. 2y + λ₁ − λ₂ = 0;
3. 2z + λ₁ = 0; 4. x + y + z = 1; 5. x − y = 0.
From equation 5, we know x = y. Substituting x = y into equations 1 and 2 gives:

   2x + λ₁ + λ₂ = 0   and   2x + λ₁ − λ₂ = 0

Adding these two equations results in:

   4x + 2λ₁ = 0  ⇒  λ₁ = −2x

Substituting λ₁ = −2x into equation 3:

   2z − 2x = 0  ⇒  z = x

Now, using x = y = z in equation 4:

   x + x + x = 1  ⇒  3x = 1  ⇒  x = 1/3

Thus, y = z = 1/3.
Step 4: Solution. The minimum value of f(x, y, z) = x² + y² + z² subject to the constraints
occurs at (x, y, z) = (1/3, 1/3, 1/3).
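Because the optimality system above is linear, it can be verified with a single MATLAB solve, ordering the unknowns as (x, y, z, λ₁, λ₂):

% Rows: the three stationarity equations, then the two constraints.
A = [2  0  0  1  1;
     0  2  0  1 -1;
     0  0  2  1  0;
     1  1  1  0  0;
     1 -1  0  0  0];
rhs = [0; 0; 0; 1; 0];
sol = A \ rhs;      % yields x = y = z = 1/3, lambda1 = -2/3, lambda2 = 0
disp(sol');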
Let's look at a few more examples of writing the necessary optimality conditions using the
Lagrange Multiplier Rule in constrained optimization problems.
Example 3: Maximizing a function with a circular constraint
Problem: Maximize the function:

   f(x, y) = x + y

subject to the constraint:

   g(x, y) = x² + y² − 1 = 0

(This is the equation of a circle with radius 1.)
Step 1: Define the Lagrangian. We introduce a Lagrange multiplier λ for the constraint and
form the Lagrangian:

   L(x, y, λ) = x + y + λ(x² + y² − 1)

Step 2: Compute partial derivatives. We compute the partial derivatives of the Lagrangian with
respect to x, y, and λ:

   ∂L/∂x = 1 + 2λx = 0,   ∂L/∂y = 1 + 2λy = 0,   ∂L/∂λ = x² + y² − 1 = 0

Step 3: Solve the system of equations: 1. 1 + 2λx = 0; 2. 1 + 2λy = 0; 3. x² + y² = 1.
From equations 1 and 2, we can solve for λ:

   λ = −1/(2x)   and   λ = −1/(2y)

Equating the two expressions for λ:

   −1/(2x) = −1/(2y)  ⇒  x = y

Substituting x = y into the constraint equation x² + y² = 1:

   x² + x² = 1  ⇒  2x² = 1  ⇒  x = ±1/√2

Thus, y = ±1/√2.
Step 4: Solution. The two critical points are:

   (x, y) = (1/√2, 1/√2)   and   (x, y) = (−1/√2, −1/√2)

Evaluating f(x, y) = x + y at these points: at (1/√2, 1/√2), f(x, y) = √2; at (−1/√2, −1/√2),
f(x, y) = −√2.
Thus, the maximum value is √2, and the minimum value is −√2.
Example 4: Minimizing a quadratic function with two constraints
Problem: Minimize the function:

   f(x, y) = x² + y²

subject to the constraints:

   g₁(x, y) = x + y − 2 = 0   and   g₂(x, y) = x − y − 1 = 0

Step 1: Define the Lagrangian. We introduce two Lagrange multipliers, λ₁ and λ₂, for the
constraints and form the Lagrangian:

   L(x, y, λ₁, λ₂) = x² + y² + λ₁(x + y − 2) + λ₂(x − y − 1)

Step 2: Compute partial derivatives. We compute the partial derivatives of the Lagrangian with
respect to x, y, λ₁, and λ₂:

   ∂L/∂x = 2x + λ₁ + λ₂ = 0,   ∂L/∂y = 2y + λ₁ − λ₂ = 0,
   ∂L/∂λ₁ = x + y − 2 = 0,   ∂L/∂λ₂ = x − y − 1 = 0

Step 3: Solve the system of equations: 1. 2x + λ₁ + λ₂ = 0; 2. 2y + λ₁ − λ₂ = 0; 3. x + y = 2;
4. x − y = 1.
From equations 3 and 4, we can solve for x and y: adding the two equations gives
(x + y) + (x − y) = 2 + 1 ⇒ 2x = 3 ⇒ x = 3/2; substituting x = 3/2 into x + y = 2 gives y = 1/2.
Now substitute x = 3/2 and y = 1/2 into the system: 1. 2(3/2) + λ₁ + λ₂ = 0 ⇒ 3 + λ₁ + λ₂ = 0;
2. 2(1/2) + λ₁ − λ₂ = 0 ⇒ 1 + λ₁ − λ₂ = 0.
From equation 2, solve for λ₁:

   λ₁ = λ₂ − 1

Substitute into equation 1:

   3 + (λ₂ − 1) + λ₂ = 0  ⇒  3 + 2λ₂ − 1 = 0  ⇒  2λ₂ = −2  ⇒  λ₂ = −1

Substitute λ₂ = −1 into λ₁ = λ₂ − 1:

   λ₁ = −1 − 1 = −2

Step 4: Solution. The critical point is (3/2, 1/2). The minimum value of f(x, y) = x² + y² at this
point is:

   f(3/2, 1/2) = (3/2)² + (1/2)² = 9/4 + 1/4 = 10/4 = 2.5

Example 5: Minimizing a function subject to a linear constraint
Problem: Minimize the function:

   f(x, y) = x² + 4y²

subject to the constraint:

   g(x, y) = x + y − 3 = 0

Step 1: Define the Lagrangian. We introduce a Lagrange multiplier λ for the constraint and
form the Lagrangian:

   L(x, y, λ) = x² + 4y² + λ(x + y − 3)

Step 2: Compute partial derivatives. We compute the partial derivatives of the Lagrangian with
respect to x, y, and λ:

   ∂L/∂x = 2x + λ = 0,   ∂L/∂y = 8y + λ = 0,   ∂L/∂λ = x + y − 3 = 0

Step 3: Solve the system. From the first two equations, 2x = 8y, so x = 4y. Substituting into
x + y = 3 gives 5y = 3, so y = 3/5 and x = 12/5.
Step 4: Solution. The minimum occurs at (x, y) = (12/5, 3/5), with
f(12/5, 3/5) = (12/5)² + 4(3/5)² = 144/25 + 36/25 = 180/25 = 7.2.

3 Optimality Conditions
Let us find all three cones (cone of descent directions, cone of linearization, and cone of feasible
directions) for some problems at a specific point (x0 , y0 ).

Example 1
Objective:

   f(x, y) = (x − 1)² + (y − 2)²

Constraint:

   g(x, y) = x + 2y − 4 ≤ 0

We analyze the cones at (x₀, y₀) = (0, 0).
1. Cone of Descent Directions: The cone of descent directions is formed by all directions
d = (dx, dy) that satisfy:

   ∇f(x₀, y₀) · d < 0

Gradient of f(x, y):

   ∇f(x, y) = (2(x − 1), 2(y − 2))

At (x₀, y₀) = (0, 0):

   ∇f(0, 0) = (−2, −4)

A direction d = (dx, dy) must satisfy:

   −2dx − 4dy < 0

or equivalently:

   dx + 2dy > 0

Cone of Descent Directions: the set of all d = (dx, dy) such that dx + 2dy > 0. This is the open
half-space on one side of the line dx + 2dy = 0.
2. Cone of Linearization: The cone of linearization is determined by the linearized form of the
constraint:

   ∇g(x₀, y₀) · d ≤ −g(x₀, y₀)

Gradient of g(x, y):

   ∇g(x, y) = (1, 2)

At (x₀, y₀) = (0, 0):

   ∇g(0, 0) = (1, 2),   g(0, 0) = 0 + 0 − 4 = −4

Linearized constraint:

   1 · dx + 2 · dy ≤ 4

Cone of Linearization: the set of all d = (dx, dy) such that dx + 2dy ≤ 4. This is another
half-space, bounded by the line dx + 2dy = 4, including the region on or below the line.
3. Cone of Feasible Directions: A feasible direction is one along which a sufficiently small step
stays in the feasible region. At a boundary point with g(x₀, y₀) = 0, a feasible direction must
satisfy:

   ∇g(x₀, y₀) · d ≤ 0

At (x₀, y₀) = (0, 0): The point (0, 0) is not on the boundary (g(0, 0) = −4 < 0, strictly
feasible), so every direction is feasible and the cone of feasible directions is all of R².
Summary of Cones at (x₀, y₀) = (0, 0):

1. Cone of Descent Directions: dx + 2dy > 0
2. Cone of Linearization: dx + 2dy ≤ 4
3. Cone of Feasible Directions: all of R² (the constraint is inactive at this strictly feasible point)

Visualization:
The Cone of Descent Directions is the open half-space above the line dx + 2dy = 0. The Cone of
Linearization is the half-space on and below the line dx + 2dy = 4. The Cone of Feasible
Directions is the whole plane, since the point is strictly feasible.
Below are detailed solutions for the cone of feasible directions, the linearizing cone, and the
cone of descent directions for the following examples.

Example 2
Objective:

   f(x, y) = x² + y²

Constraint:

   g(x, y) = 2x + y − 5 ≤ 0

Point: (x₀, y₀) = (1, 2)
1. Cone of Feasible Directions:
At an active constraint, a feasible direction satisfies:

   ∇g(x₀, y₀) · d ≤ 0

Gradient of g:

   ∇g(x, y) = [2, 1]

At (1, 2), this condition reads:

   2dx + dy ≤ 0

(note that g(1, 2) = −1 < 0 here, so the constraint is inactive and every direction is in fact feasible).
2. Linearizing Cone:
The linearizing cone includes the feasibility condition and adjusts for the constraint value:

∇g(x0 , y0 ) · d ≤ −g(x0 , y0 )
At (1, 2):
g(1, 2) = 2 + 2 − 5 = −1,

2dx + dy ≤ 1
3. Cone of Descent Directions:
Descent direction satisfies:
∇f (x0 , y0 ) · d < 0
Gradient of f :
∇f (x, y) = [2x, 2y]
At (1, 2):
∇f (1, 2) = [2, 4]
Condition for descent direction:
2dx + 4dy < 0

Example 3: Multiple linear constraints

Objective:

   f(x, y) = (x − 3)² + (y − 4)²

Constraints:

   g₁(x, y) = x − y − 1 ≤ 0,   g₂(x, y) = 2x + y − 6 ≤ 0

Point: (x₀, y₀) = (2, 3)
1. Cone of Feasible Directions:
1. Cone of Feasible Directions:

Feasible directions satisfy:

∇g1 (x0 , y0 ) · d ≤ 0, ∇g2 (x0 , y0 ) · d ≤ 0

Gradients:
∇g1 (x, y) = [1, −1], ∇g2 (x, y) = [2, 1]
Conditions:
dx − dy ≤ 0, 2dx + dy ≤ 0
2. Linearizing Cone: Add the constraint values:

∇g1 (x0 , y0 ) · d ≤ −g1 (x0 , y0 ), ∇g2 (x0 , y0 ) · d ≤ −g2 (x0 , y0 )

At (2, 3):

   g₁(2, 3) = 2 − 3 − 1 = −2,   g₂(2, 3) = 2(2) + 3 − 6 = 1

(note that g₂(2, 3) = 1 > 0, so the point (2, 3) is actually infeasible for the second constraint,
and the linearizing condition below pushes back toward feasibility)
Conditions:

   dx − dy ≤ 2,   2dx + dy ≤ −1
3. Cone of Descent Directions: Descent direction satisfies:

∇f (x0 , y0 ) · d < 0

Gradient of f :
∇f (x, y) = [2(x − 3), 2(y − 4)]
At (2, 3):
∇f (2, 3) = [−2, −2]
Condition:
−2dx − 2dy < 0

Example 4: Nonlinear constraint

Objective:

   f(x, y) = sin(x) + y²

Constraint:

   g(x, y) = x² + y² − 4 ≤ 0

Point: (x₀, y₀) = (1, 1)
1. Cone of Feasible Directions:
Feasible directions satisfy:
∇g(x0 , y0 ) · d ≤ 0
Gradient of g:
∇g(x, y) = [2x, 2y]
At (1, 1):
∇g(1, 1) = [2, 2]
Condition:
2dx + 2dy ≤ 0

2. Linearizing Cone:
Add the constraint value:
∇g(x0 , y0 ) · d ≤ −g(x0 , y0 )
At (1, 1):

   g(1, 1) = 1² + 1² − 4 = −2
Condition:
2dx + 2dy ≤ 2
3. Cone of Descent Directions:
Descent direction satisfies:
∇f (x0 , y0 ) · d < 0
Gradient of f :
∇f (x, y) = [cos(x), 2y]
At (1, 1):
∇f (1, 1) = [cos(1), 2]
Condition:
cos(1)dx + 2dy < 0

In convex optimization problems, the Karush-Kuhn-Tucker (KKT) conditions are typically both
necessary and sufficient for optimality, given certain regularity conditions. However, there are
cases where these regularity conditions fail, and the KKT conditions may not be necessary, even
for convex problems. Below are examples illustrating such situations:
Example 1: Non-Smooth Constraint Consider the convex optimization problem:

   Minimize f(x) = |x|,   subject to x² ≤ 0.

Analysis:

• The constraint x² ≤ 0 implies x = 0, making x = 0 the only feasible point and hence the
  global minimum.

• The objective function f(x) = |x| is convex.

• However, f(x) is not differentiable at x = 0, and this violates the regularity conditions
  required for the KKT conditions to apply. Therefore, the KKT conditions cannot characterize
  the solution, even though the problem is convex.

Example 2: Missing Constraint Qualification (MFCQ Violation) Consider:

   Minimize f(x) = x,   subject to g(x) = x² ≤ 0.

Analysis:

• The problem is convex because both f(x) and g(x) are convex.

• The only feasible point is x = 0, which is therefore the optimal solution.

• The gradient of the constraint at the solution is ∇g(0) = 0, so there is no direction d such
  that ∇g(0)d < 0; the regularity condition known as the Mangasarian-Fromovitz Constraint
  Qualification (MFCQ) fails.

• Since MFCQ fails, the KKT conditions need not hold, and indeed they do not: stationarity
  would require 1 + λ · 0 = 0, which has no solution, even though the problem is convex.
Example 3: Semi-Infinite Constraints Consider:

   Minimize f(x) = x²,   subject to g(x) = max(0, −x) = 0.

Analysis:

• The constraint g(x) = max(0, −x) = 0 essentially forces x ≥ 0.

• The problem is convex, and the solution is x = 0.

• However, the non-differentiable nature of g(x) (due to the "max" function) makes it
  challenging to define a valid Lagrange multiplier. The KKT conditions do not apply because
  the regularity conditions (like differentiability or constraint qualifications) are violated.
Example 4: Non-Interior Feasible Point (Slater's Condition Violation) Consider the problem:

   Minimize f(x, y) = x + y,   subject to g(x, y) = x² + y² ≤ 0.

Analysis:

• The objective function f(x, y) = x + y is convex, and the constraint g(x, y) = x² + y² ≤ 0
  admits only the single feasible point (x, y) = (0, 0).

• The optimal solution is therefore (x, y) = (0, 0), which lies on the boundary of the feasible
  region.

• Slater's condition fails because there is no strictly interior point satisfying g(x, y) < 0.
  Consequently, the KKT conditions need not apply; indeed, ∇g(0, 0) = (0, 0), so the stationarity
  condition (1, 1) + λ(0, 0) = (0, 0) has no solution.
Example 5: Non-Differentiable Objective Function

Minimize f (x) = |x|, subject to x ≥ 0.

Analysis:

• The function f(x) = |x| is convex but not differentiable at x = 0.

• The feasible region x ≥ 0 makes x = 0 the optimal solution.

• The lack of differentiability at x = 0 prevents the application of the (smooth) KKT conditions,
  even though the problem is convex.
Example 6: Inconsistent Constraints

   Minimize f(x, y) = x² + y²,   subject to x ≥ 0, y ≥ 0, x + y ≤ −1.

Analysis:

• The objective f(x, y) = x² + y² is convex.

• The constraints x ≥ 0, y ≥ 0, and x + y ≤ −1 are convex, but the feasible set is empty (the
  constraints are inconsistent).

• Since there are no feasible points, the KKT conditions cannot apply.

Example 7: Degenerate Feasible Region

   Minimize f(x, y) = x + y,   subject to x² = 0, y² = 0.

Analysis:

• The feasible region is degenerate: it is the single point (x, y) = (0, 0), which is trivially the
  optimal solution.

• The constraint qualifications (e.g., LICQ or MFCQ) fail because the constraint gradients at
  (0, 0) are both (0, 0) and cannot span the space of variables.

• Stationarity would require (1, 1) + λ₁(0, 0) + λ₂(0, 0) = (0, 0), which has no solution, so the
  KKT conditions are not applicable, even though the problem is convex.

Example 8: Infinite-Dimensional Constraints

   Minimize f(x) = ∫₀¹ x(t)² dt,   subject to g(x) = ∫₀¹ x(t) dt ≤ 1.

Analysis:

• The objective f(x) = ∫₀¹ x(t)² dt is convex, and the constraint g(x) = ∫₀¹ x(t) dt − 1 ≤ 0 is
  also convex.

• Infinite-dimensional problems often fail standard constraint qualifications, making it
  challenging to apply KKT conditions directly.

Example 9: Equality Constraints with Linearly Dependent Gradients

   Minimize f(x₁, x₂) = x₁² + x₂²,   subject to x₁ + x₂ = 1, 2x₁ + 2x₂ = 2.

Analysis:

• The problem is convex, but the two equality constraints x₁ + x₂ = 1 and 2x₁ + 2x₂ = 2 are
  linearly dependent (the second is twice the first), so the linear independence constraint
  qualification (LICQ) is violated.

• At the minimizer (1/2, 1/2), the Lagrange multipliers are consequently not unique (any λ₁, λ₂
  with λ₁ + 2λ₂ = −1 satisfy stationarity), so the KKT conditions cannot be used to characterize
  the solution by a unique multiplier vector.

These examples demonstrate that while convexity simplifies optimization problems, regularity
conditions such as Slater’s condition, differentiability, and linear independence of constraints play
crucial roles in ensuring the applicability of KKT conditions. When these conditions are violated,
even convex problems may lack necessary KKT characterizations.
