
Karush-Kuhn-Tucker conditions

MFDS Team
Introduction

▶ We will look at constrained optimization and Lagrange
multipliers.
▶ We will look at primal and dual problems and how their
solutions are related.
▶ We will set up the Karush-Kuhn-Tucker conditions.
Constrained optimization and Lagrange multipliers

▶ Consider the following problem: minx f (x), f : Rd → R,
subject to additional constraints - so we are looking at a
minimization problem except that the set of all x over which
minimization is performed is not all of Rd.
▶ The constrained problem becomes minx f (x) subject to
gi (x) ≤ 0 ∀i ∈ {1, 2, . . . , m}.
▶ Since we have a method of finding a solution to the
unconstrained optimization problem, one way to proceed now
is to convert the given constrained optimization problem into
an unconstrained one.
▶ We construct J(x) = f (x) + Σ_{i=1}^{m} 1(gi (x)), where 1(z) = 0
for z ≤ 0 and 1(z) = ∞ for z > 0.
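To make the construction concrete, here is a minimal Python sketch of J(x) for a toy one-dimensional problem (the names J, f, and gs are illustrative, not from the lecture):

import numpy as np

def J(x, f, gs):
    # J(x) = f(x) + sum_i 1(g_i(x)), with 1(z) = 0 for z <= 0 and inf for z > 0
    penalty = sum(0.0 if g(x) <= 0 else np.inf for g in gs)
    return f(x) + penalty

# Toy problem: minimize x^2 subject to x >= 1, i.e. g(x) = 1 - x <= 0.
f = lambda x: x ** 2
gs = [lambda x: 1 - x]
print(J(0.5, f, gs))  # inf (constraint violated)
print(J(2.0, f, gs))  # 4.0 (feasible, so J equals f)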
Constrained optimization and Lagrange multipliers

▶ The formulation of J(x) in the previous slide ensures that its
value is infinity if any one of the constraints gi (x) ≤ 0 is not
satisfied. This ensures that the optimal solution of the
unconstrained problem is the same as that of the constrained problem.
▶ However, the step function 1(z) is difficult to optimize, and our
solution is to replace it by a linear function using Lagrange
multipliers.
▶ We create the Lagrangian of the given constrained
optimization problem as follows:
L(x, λ) = f (x) + Σ_{i=1}^{m} λi gi (x) = f (x) + λT g (x), where
λi ≥ 0 for all i.
Primal and dual problems

▶ The primal problem is min f (x ) subject to


gi (x ) ≤ 0, 1 ≤ i ≤ m. Optimization is performed over the
primal variables x .
▶ The associated Lagrangian dual problem is maxλ∈Rm D(λ)
subject to λ ≥ 0 where λ are dual variables.
▶ D(λ) = minx ∈Rd L(x , λ).
▶ The following minimax inequality holds for any function ϕ of two
arguments x, y: maxy minx ϕ(x, y) ≤ minx maxy ϕ(x, y).
Minimax inequality

▶ Why is this inequality true?


▶ Assume that maxy minx ϕ(x, y) = ϕ(xA , yA ) and
minx maxy ϕ(x, y) = ϕ(xB , yB ).
▶ Fixing y at yA , the inner operation on the left-hand side of the
minimax inequality is a minimization over x, attained at xA .
Thus ϕ(xA , yA ) = minx ϕ(x, yA ) ≤ ϕ(xB , yA ).
▶ Fixing x at xB , the inner operation on the right-hand side of the
minimax inequality is a maximization over y, attained at yB .
Thus ϕ(xB , yB ) = maxy ϕ(xB , y) ≥ ϕ(xB , yA ).
▶ Chaining the two, ϕ(xA , yA ) ≤ ϕ(xB , yA ) ≤ ϕ(xB , yB ), which is
the claimed inequality.
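The inequality is easy to verify numerically. The Python sketch below evaluates a function ϕ on a grid and checks both sides; the particular ϕ is an arbitrary illustrative choice:

import numpy as np

xs = np.linspace(-2, 2, 101)
ys = np.linspace(-2, 2, 101)
X, Y = np.meshgrid(xs, ys, indexing="ij")   # phi[i, j] = phi(x_i, y_j)
phi = np.sin(3 * X * Y) + X**2 - Y**2

lhs = np.max(np.min(phi, axis=0))  # max over y of (min over x)
rhs = np.min(np.max(phi, axis=1))  # min over x of (max over y)
print(lhs, rhs, lhs <= rhs)        # lhs <= rhs holds for any phi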
Minimax inequality

▶ The difference between J(x ) and the Lagrangian L(x , λ) is


that the indicator function is relaxed to a linear function.
▶ When λ ≥ 0, the Lagrangian L(x , λ) is a lower bound on
J(x ).
▶ The maximum of L(x, λ) over λ ≥ 0 is J(x): if the
point x satisfies all the constraints gi (x) ≤ 0, then the
maximum of the Lagrangian is attained at λ = 0 and equals
f (x) = J(x). If one or more constraints is violated, so that
gi (x) > 0, the associated multiplier λi can be taken
arbitrarily large, making the supremum ∞, which again equals J(x).
Minimax inequality

▶ From the previous slide, we have J(x ) = maxλ≥0 L(x , λ).


▶ Our original constrained optimization problem boils down to
minimizing J(x); in other words, we are looking at
minx ∈Rd maxλ≥0 L(x, λ).
▶ Using the minimax inequality we see that
minx ∈Rd maxλ≥0 L(x , λ) ≥ maxλ≥0 minx ∈Rd L(x , λ).
▶ This is known as weak duality. The inner part of the right
hand side of the inequality is D(λ), and the inequality above
is the reason for setting up the associated Lagrangian dual
problem for the original constrained optimization problem.
Lagrangian formulation

▶ In contrast to the original formulation,
D(λ) = minx ∈Rd L(x, λ) is an unconstrained optimization
problem for a given value of λ.
▶ We observe that D(λ) is a point-wise minimum of functions
affine in λ, and hence D(λ) is concave even
though f () and g () may be nonconvex (a numeric check follows this list).
▶ We have obtained a Lagrangian formulation for a constrained
optimization problem where the constraints are inequalities.
What happens when some constraints are equalities?
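Here is the promised numeric check of the concavity of D(λ), a sketch that uses an illustrative nonconvex f and a grid minimum over x in place of the exact inner minimization:

import numpy as np

f = lambda x: x**4 - 3 * x**2   # nonconvex objective
g = lambda x: x - 1             # single constraint g(x) <= 0
xs = np.linspace(-3, 3, 2001)

def D(lam):
    # pointwise minimum over x of f(x) + lam * g(x): affine in lam
    return np.min(f(xs) + lam * g(xs))

lams = np.linspace(0, 5, 50)
for l1 in lams:
    for l2 in lams:
        # midpoint concavity: D((l1+l2)/2) >= (D(l1) + D(l2)) / 2
        assert D((l1 + l2) / 2) >= (D(l1) + D(l2)) / 2 - 1e-9
print("midpoint concavity of D verified on the grid")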
Modeling equality constraints

▶ Suppose the problem is minx f (x ) subject to gi (x ) ≤ 0 for all


1 ≤ i ≤ m and hj (x ) = 0 for 1 ≤ j ≤ n.
▶ We model the equality constraint hj (x ) = 0 with two
inequality constraints hj (x ) ≥ 0 and hj (x ) ≤ 0.
▶ The resulting pair of non-negative Lagrange multipliers combines
into a single unconstrained multiplier, as the derivation below shows.
▶ Thus the Lagrange multipliers for the original inequality constraints
are non-negative, while those corresponding to the equality
constraints are unconstrained.
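The following short derivation (a standard argument, not spelled out in the slides) shows why the two non-negative multipliers attached to the split constraints collapse into a single unconstrained multiplier νj:

\begin{align*}
L(x, \lambda, \lambda^{+}, \lambda^{-})
  &= f(x) + \sum_{i=1}^{m} \lambda_i g_i(x)
   + \sum_{j=1}^{n} \lambda_j^{+} h_j(x)
   + \sum_{j=1}^{n} \lambda_j^{-} \bigl(-h_j(x)\bigr) \\
  &= f(x) + \sum_{i=1}^{m} \lambda_i g_i(x)
   + \sum_{j=1}^{n} \nu_j h_j(x),
\qquad \nu_j = \lambda_j^{+} - \lambda_j^{-},
\end{align*}

and since λj+ , λj− ≥ 0 can each be arbitrarily large, their difference νj can take any real value.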
Convex optimization

▶ We are interested in a class of optimization problems where


we can guarantee global optimality.
▶ When f (), the objective function, and the constraint functions g ()
are convex, and the constraint functions h() are affine, we have a
convex optimization problem.
▶ In this setting, under a mild condition introduced later (Slater's
condition), we have strong duality - the optimal value of
the primal problem equals the optimal value of the dual problem.
▶ What is a convex function?
Convex function

▶ First we need to know what a convex set is. A set C is a
convex set if for any x, y ∈ C , θx + (1 − θ)y ∈ C where
0 ≤ θ ≤ 1.
▶ For any two points lying in the convex set, the line segment joining
them lies entirely in the convex set.
▶ Let f : Rd → R be a function whose domain is a
convex set C .
▶ The function is a convex function if for any x, y ∈ C and 0 ≤ θ ≤ 1,
f (θx + (1 − θ)y ) ≤ θf (x) + (1 − θ)f (y ).
▶ Another way of characterizing a differentiable convex function uses
the gradient: for any two points x and y , we have
f (y ) ≥ f (x) + ∇x f (x)T (y − x).
Example

▶ The negative entropy, a useful function in Machine Learning,
is convex: f (x) = x log2 x for x > 0.
▶ First let us check if f (θx + (1 − θ)y ) ≤ θf (x) + (1 − θ)f (y ).
Take x = 2, y = 4, and θ = 0.5 to get
f (0.5 · 2 + 0.5 · 4) = f (3) = 3 log2 3 ≈ 4.75. Then
θf (2) + (1 − θ)f (4) = 0.5 · 2 log2 2 + 0.5 · 4 log2 4 = 1 + 4 = 5.
Therefore the convexity criterion is satisfied for these two
points.
▶ Let us now use the gradient criterion. We have
∇x f (x) = log2 x + 1/ ln 2 = log2 x + log2 e. Calculating
f (2) + ∇x f (2) · (4 − 2) gives 2 log2 2 + (log2 2 + log2 e) · 2 ≈ 6.9.
We see that f (4) = 4 log2 4 = 8 ≥ 6.9, which shows that the
gradient criterion is also satisfied for this pair of points.
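The same arithmetic can be reproduced in a few lines of Python (a sketch; the numbers match the slide up to rounding):

import numpy as np

f = lambda x: x * np.log2(x)
grad = lambda x: np.log2(x) + 1 / np.log(2)   # d/dx [x log2 x] = log2 x + log2 e

x, y, theta = 2.0, 4.0, 0.5
print(f(theta * x + (1 - theta) * y))     # f(3) ~ 4.755
print(theta * f(x) + (1 - theta) * f(y))  # 5.0: chord lies above the function

print(f(x) + grad(x) * (y - x))           # ~ 6.885: tangent-line lower bound
print(f(y))                               # 8.0: function lies above the tangent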
Linear programming

▶ Let us look at a convex optimization problem where the


objective function and constraints are all linear.
▶ Such a convex optimization problem is called a linear
programming problem.
▶ We can express a linear programming problem as minx c T x
subject to Ax ≤ b where A ∈ Rm×d and b ∈ Rm×1 .
▶ The Lagrangian L(x, λ) is given by
L(x, λ) = c T x + λT (Ax − b), where λ ∈ Rm is the vector of
non-negative Lagrange multipliers.
▶ We can rewrite the Lagrangian as
L(x, λ) = (c + AT λ)T x − λT b.
Linear programming

▶ The Lagrangian is affine in x, so minx L(x, λ) is −∞ unless the
coefficient of x vanishes; taking the derivative with respect to x
and setting it to zero we get c + AT λ = 0.
▶ Since D(λ) = minx ∈Rd L(x, λ), plugging in the above
equation gives D(λ) = −λT b.
▶ We would like to maximize D(λ), subject to the constraint
λ ≥ 0.
▶ Thus we end up with the following problem:

maxλ∈Rm − λT b
subject to c + AT λ = 0
λ≥0
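As a sanity check, one can solve a small instance of both problems with scipy.optimize.linprog and observe that the optimal values coincide; the specific LP below is an illustrative choice:

import numpy as np
from scipy.optimize import linprog

# Primal: min c^T x subject to Ax <= b (encodes x1, x2 >= 0 and x1 + x2 >= 1).
c = np.array([1.0, 1.0])
A = np.array([[-1.0, 0.0], [0.0, -1.0], [-1.0, -1.0]])
b = np.array([0.0, 0.0, -1.0])
# x is free in our formulation, so override linprog's default x >= 0 bounds.
primal = linprog(c, A_ub=A, b_ub=b, bounds=[(None, None)] * 2)

# Dual: max -b^T lam subject to A^T lam = -c, lam >= 0.
# linprog minimizes, so minimize b^T lam and negate the optimal value.
dual = linprog(b, A_eq=A.T, b_eq=-c, bounds=[(0, None)] * 3)

print(primal.fun, -dual.fun)  # both 1.0: the two optimal values agree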
Linear programming

▶ We can solve the original primal linear program or the dual


one - the optimum in each case is the same.
▶ The primal linear program is in d variables but the dual is in
m variables, where m is the number of constraints in the
original primal program.
▶ We choose to solve the primal or dual based on which of m or
d is smaller.
Quadratic programming

▶ We now consider the case of a quadratic objective function


subject to affine constraints:

minx ∈Rd (1/2) x T Qx + c T x subject to Ax ≤ b

▶ Here A ∈ Rm×d , b ∈ Rm , c ∈ Rd , and Q ∈ Rd×d .
Quadratic programming

▶ The Lagrangian L(x, λ) is given by
L(x, λ) = (1/2) x T Qx + c T x + λT (Ax − b).
▶ Rearranging the above we have
L(x, λ) = (1/2) x T Qx + (c + AT λ)T x − λT b.
▶ Taking the derivative of L(x, λ) with respect to x and setting it
equal to zero gives Qx + (c + AT λ) = 0.
▶ If we take Q to be invertible, we have x = −Q −1 (c + AT λ).
▶ Plugging this value of x into L(x, λ) gives us
D(λ) = −(1/2)(c + AT λ)T Q −1 (c + AT λ) − λT b.
▶ This gives us the dual optimization problem:
maxλ∈Rm −(1/2)(c + AT λ)T Q −1 (c + AT λ) − λT b subject to
λ ≥ 0.
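A minimal numeric sketch of this derivation, on an illustrative QP with Q = I and a single constraint x1 + x2 ≥ 1; the dual is maximized with a generic bound-constrained solver rather than a dedicated QP method:

import numpy as np
from scipy.optimize import minimize

Q = np.eye(2)
c = np.zeros(2)
A = np.array([[-1.0, -1.0]])   # encodes x1 + x2 >= 1 as Ax <= b
b = np.array([-1.0])
Qinv = np.linalg.inv(Q)

def neg_D(lam):
    # negative of D(lam) = -1/2 (c + A^T lam)^T Q^{-1} (c + A^T lam) - lam^T b
    w = c + A.T @ lam
    return 0.5 * w @ Qinv @ w + lam @ b

res = minimize(neg_D, x0=np.zeros(1), bounds=[(0, None)])
lam_star = res.x
x_star = -Qinv @ (c + A.T @ lam_star)   # stationarity: x = -Q^{-1}(c + A^T lam)
print(lam_star, x_star, -res.fun)        # ~[0.5], ~[0.5, 0.5], dual optimum 0.25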
Strong duality

▶ The minimax inequality establishes weak duality, which states
that the optimal value of the primal problem is greater than
or equal to that of the dual problem.
▶ When equality holds, this becomes strong duality.
▶ Strong duality is useful in that one can solve the dual problem
to get the same solution as solving the primal problem.
▶ Solving the dual problem may be easier.
▶ When does strong duality hold?
Slater’s condition

We shall work with the following optimization problem:

minimize f (x ) subject to
gi (x ) ≤ 0 ∀i ∈ [m]
hj (x ) = 0 ∀j ∈ [p]

The Lagrangian associated with this optimization problem is

L(x, λ, ν) = f (x) + Σ_{i=1}^{m} λi gi (x) + Σ_{j=1}^{p} νj hj (x)

The λi 's and νj 's are called Lagrange multipliers.


Slater’s condition

Given a Lagrangian L(x , λ, ν) over some optimization domain D,


the Lagrangian dual is the function F (λ, ν) = inf x ∈D L(x , λ, ν).
The dual optimization problem is

maxλ,ν F (λ, ν)
subject to λ ≥ 0
Slater’s condition

▶ For a primal optimization problem, we say that it obeys Slater's
condition if the objective function f is convex, the constraints
gi are all convex, the constraint functions hj are all affine,
and there exists a strictly feasible point x̄, i.e.
gi (x̄) < 0 for all i ∈ [m] and hj (x̄) = 0 for all j ∈ [p].
▶ Theorem: Suppose Slater's condition holds. Then we have
strong duality.
Proof of Slater’s condition

Let us define two sets:

A = {(u, v , t) | ∃x ∈ D such that gi (x) ≤ ui for i = 1, . . . , m,
hj (x) = vj for j = 1, . . . , p, and f (x) ≤ t}
B = {(0, 0, s) ∈ Rm × Rp × R | s < p ∗ },

where p ∗ is the optimal value of the primal problem.
We can show that the sets A and B are convex and
disjoint. By the separating hyperplane theorem, there therefore
exists a hyperplane separating the two sets.
Proof of Slater’s condition

We can define the separating hyperplane as follows:

(u, v , t) ∈ A =⇒ λ̃T u + ν̃ T v + µt ≥ α
(u, v , t) ∈ B =⇒ λ̃T u + ν̃ T v + µt ≤ α

We can see from the above that λ̃ ≥ 0 and µ ≥ 0: points of A remain
in A when any coordinate of u or t is increased, so a negative
component of λ̃ or a negative µ would make the left-hand side of the
first inequality arbitrarily small, and it could not be lower-bounded by α.
Since points of B have u = 0 and v = 0, the second condition means
that µt ≤ α for all t < p ∗ , which gives µp ∗ ≤ α.
Proof of Slater’s condition

For any x ∈ D,

Σ_{i=1}^{m} λ̃i gi (x) + ν̃ T (Ax − b) + µf (x) ≥ α ≥ µp ∗

There are now two cases: µ > 0 and µ = 0. When µ > 0, we can
divide both sides by µ to get

L(x, λ̃/µ, ν̃/µ) ≥ p ∗

for all x ∈ D. Defining λ = λ̃/µ and ν = ν̃/µ, we can set
F (λ, ν) = inf x ∈D L(x, λ, ν). We can see that F (λ, ν) ≥ p ∗ .
Proof of Slater’s condition

▶ By weak duality we know that p ∗ ≥ F (λ, ν). Combined with the
previous slide, this gives F (λ, ν) = p ∗ , i.e. strong duality.
▶ Let us now consider the case µ = 0.
▶ Then, for all x ∈ D, we have Σ_{i=1}^{m} λ̃i gi (x) + ν̃ T (Ax − b) ≥ 0.
▶ For the point x̃ that satisfies Slater's condition (which gives
gi (x̃) < 0 for all i and Ax̃ = b), we have Σ_{i=1}^{m} λ̃i gi (x̃) ≥ 0.
Proof of Slater’s condition

▶ From gi (x̃) < 0 and λ̃i ≥ 0, we conclude that λ̃i = 0 for all i.
▶ From (λ̃, ν̃, µ) ̸= 0 and (λ̃, µ) = 0, we conclude that ν̃ ̸= 0.
▶ Then from Σ_{i=1}^{m} λ̃i gi (x) + ν̃ T (Ax − b) ≥ 0, we have
ν̃ T (Ax − b) ≥ 0 for all x ∈ D.
▶ We already know that x̃ satisfies Ax̃ − b = 0. Since x̃ lies in the
interior of D, we can choose a small perturbation ϵ with
x̃ + ϵ ∈ D such that ν̃ T (A(x̃ + ϵ) − b) < 0, unless ν̃ T A = 0.
▶ But if there exists a non-zero ν̃ such that ν̃ T A = 0, then A does
not have rank p, which is a contradiction. Thus ν̃ = 0, but this
contradicts (λ̃, ν̃, µ) ̸= 0. Therefore µ cannot be zero.
Duality gap

▶ In some cases computing the optimal solution of the dual
problem is easier than computing the optimal solution of the
primal problem.
▶ Let α∗ denote the optimal value of the primal problem and
β ∗ denote the optimal value of the dual problem. From
weak duality we know that α∗ ≥ β ∗ .
▶ Any feasible solution to the dual problem gives a lower bound on
the optimal value of the primal problem.
▶ For a primal feasible x and dual feasible (λ, ν), we have
f (x) − α∗ ≤ f (x) − F (λ, ν). If we know that
f (x) − F (λ, ν) < ϵ, then we know that f (x) is at most ϵ away
from the true optimal value. The quantity f (x) − F (λ, ν) is called
the duality gap, and it can serve as a stopping criterion, as the
sketch below illustrates.
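A tiny illustration of the gap as a suboptimality certificate, on the one-dimensional problem min x^2 subject to x ≥ 1, whose dual function is worked out by hand:

# L(x, lam) = x^2 + lam * (1 - x); minimizing over x gives x = lam / 2,
# so F(lam) = lam - lam^2 / 4 in closed form.
f = lambda x: x ** 2
F = lambda lam: lam - lam ** 2 / 4

x_feas, lam_feas = 1.1, 1.8      # any primal feasible / dual feasible pair
gap = f(x_feas) - F(lam_feas)
print(gap)                        # 0.22: certifies f(x_feas) - f(x*) <= 0.22
# The true optimum is x* = 1 with f(x*) = 1, and indeed 1.21 - 1 = 0.21 <= 0.22.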
Complementary slackness

We make the following claim. Claim 1: Let x ∗ ∈ Rd be primal
optimal and (λ∗ , ν ∗ ) ∈ Rm × Rp be dual optimal, and suppose
strong duality holds. Then
▶ x ∗ ∈ argminx L(x, λ∗ , ν ∗ )
▶ λ∗i gi (x ∗ ) = 0 ∀i ∈ [m]
Proof of complementary slackness

We have f (x ∗ ) = F (λ∗ , ν ∗ ) because of strong duality. Then we
can write

f (x ∗ ) = F (λ∗ , ν ∗ )
        = inf x ( f (x) + Σ_{i∈[m]} λ∗i gi (x) + Σ_{i∈[p]} νi∗ hi (x) )
        ≤ f (x ∗ ) + Σ_{i∈[m]} λ∗i gi (x ∗ ) + Σ_{i∈[p]} νi∗ hi (x ∗ )
        ≤ f (x ∗ )
Proof of complementary slackness

▶ The first line of the preceding chain is due to strong
duality.
▶ The second line is the definition of the Lagrangian dual function.
▶ The third line holds because the infimum over all x is at most
the value at the particular point x ∗ .
▶ The fourth and final line comes about because primal
feasibility of x ∗ gives gi (x ∗ ) ≤ 0 and hi (x ∗ ) = 0, and
dual feasibility of (λ∗ , ν ∗ ) gives λ∗i ≥ 0.
Proof of complementary slackness

▶ Our chain of inequalities started with f (x ∗ ) and ended with
f (x ∗ ). Thus the inequalities are actually equalities. In
particular, there is an equality between the third and fourth
lines, which means Σ_{i∈[m]} λ∗i gi (x ∗ ) = 0 (the hi terms vanish
since hi (x ∗ ) = 0). Each term in this summation is non-positive,
since λ∗i ≥ 0 and gi (x ∗ ) ≤ 0, so the sum can be zero only when
each term is zero. Thus we have λ∗i gi (x ∗ ) = 0 ∀i ∈ [m]. This is
known as complementary slackness.
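Complementary slackness is easy to observe on a small example (illustrative numbers; the multipliers come from the stationarity condition 2x − λ1 + λ2 = 0 worked out by hand):

# min x^2 subject to 1 - x <= 0 (active) and x - 3 <= 0 (inactive).
# Optimum: x* = 1 with multipliers lam* = (2, 0).
x_star = 1.0
lam_star = [2.0, 0.0]
g = [lambda x: 1 - x, lambda x: x - 3]

for lam_i, g_i in zip(lam_star, g):
    # active constraint has g_i = 0; inactive constraint has lam_i = 0
    print(lam_i * g_i(x_star))   # 0.0 for both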
KKT conditions for strong duality

Given a primal optimization problem, we say that x ∗ and
(λ∗ , ν ∗ ) ∈ Rm × Rp respect the Karush-Kuhn-Tucker conditions if:
▶ gi (x ∗ ) ≤ 0 ∀i ∈ [m].
▶ hi (x ∗ ) = 0 ∀i ∈ [p].
▶ λ∗i ≥ 0 ∀i ∈ [m].
▶ λ∗i gi (x ∗ ) = 0 ∀i ∈ [m].
▶ ∇f (x ∗ ) + Σ_{i=1}^{m} λ∗i ∇gi (x ∗ ) + Σ_{i=1}^{p} νi∗ ∇hi (x ∗ ) = 0.
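These five conditions translate directly into a numeric checker. The sketch below assumes the functions and gradients are supplied as Python callables (all names are illustrative):

import numpy as np

def check_kkt(x, lam, nu, grad_f, gs, grad_gs, hs, grad_hs, tol=1e-6):
    # Check the five KKT conditions at (x, lam, nu) up to tolerance tol.
    ok = all(g(x) <= tol for g in gs)                 # g_i(x) <= 0
    ok &= all(abs(h(x)) <= tol for h in hs)           # h_i(x) = 0
    ok &= all(l >= -tol for l in lam)                 # lam_i >= 0
    ok &= all(abs(l * g(x)) <= tol
              for l, g in zip(lam, gs))               # lam_i g_i(x) = 0
    station = grad_f(x) \
        + sum(l * dg(x) for l, dg in zip(lam, grad_gs)) \
        + sum(n * dh(x) for n, dh in zip(nu, grad_hs))
    ok &= np.linalg.norm(station) <= tol              # stationarity
    return bool(ok)

# The QP solved earlier: min (1/2)||x||^2 s.t. x1 + x2 >= 1; x* = (0.5, 0.5), lam* = 0.5.
x_s = np.array([0.5, 0.5])
print(check_kkt(x_s, [0.5], [], lambda x: x,
                [lambda x: 1 - x[0] - x[1]],
                [lambda x: np.array([-1.0, -1.0])], [], []))  # True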
KKT conditions for strong duality

Theorem: For any optimization problem, if strong duality holds
then any primal optimal solution x ∗ and dual optimal solution
(λ∗ , ν ∗ ) ∈ Rm × Rp respect the KKT conditions. Conversely, if f
and gi are convex for all i ∈ [m] and hi are affine for all i ∈ [p],
then the KKT conditions are sufficient: any pair satisfying them is
primal and dual optimal with strong duality. Therefore, for convex
problems, the KKT conditions are both necessary and sufficient for
strong duality. We will show the proof in the next slides.
KKT conditions for strong duality

Assume that strong duality holds and x ∗ and (λ∗ , ν ∗ ) ∈ Rm × Rp
are primal and dual optimal solutions. Since x ∗ is feasible, we see
that the first two KKT conditions are true: gi (x ∗ ) ≤ 0 ∀i ∈ [m]
and hi (x ∗ ) = 0 ∀i ∈ [p]. Since (λ∗ , ν ∗ ) is dual feasible, we see that
the third KKT condition is true: λ∗i ≥ 0.

The previous claim we proved establishes that for the primal and
dual optimal solutions, the fourth KKT condition must hold, i.e.
λ∗i gi (x ∗ ) = 0 ∀i ∈ [m]. The previous claim also establishes that
x ∗ ∈ argminx L(x, λ∗ , ν ∗ ), which means that the gradient of L must
vanish at x ∗ . Thus the last KKT condition must hold true.
Sufficiency - KKT conditions for strong duality

Now we will show that if we assume the KKT conditions and the
problem is convex, we have strong duality. The first two conditions,
gi (x ∗ ) ≤ 0 ∀i ∈ [m] and hi (x ∗ ) = 0 ∀i ∈ [p], establish that x ∗ is
primal feasible.
The condition λ∗i ≥ 0 ∀i ∈ [m], together with the information that
f and the constraints gi are convex and the constraints hi are
affine, enables us to establish that

L(x, λ∗ , ν ∗ ) = f (x) + Σ_{i=1}^{m} λ∗i gi (x) + Σ_{i=1}^{p} νi∗ hi (x)

is a convex function of x.
By the last condition we see that the gradient of this convex
function vanishes at x ∗ , which means x ∗ is a local and hence global
minimum.
Sufficiency - KKT conditions for strong duality

Thus we have

F (λ∗ , ν ∗ ) = L(x ∗ , λ∗ , ν ∗ )
             = f (x ∗ ) + Σ_{i=1}^{m} λ∗i gi (x ∗ ) + Σ_{i=1}^{p} νi∗ hi (x ∗ )
             = f (x ∗ )

where the last equality uses complementary slackness, λ∗i gi (x ∗ ) = 0,
and primal feasibility, hi (x ∗ ) = 0. The dual value equals the primal
value, so strong duality holds.
