
Model Predictive Control for Linear and Hybrid Systems
Basics on Optimization

Francesco Borrelli
Department of Mechanical Engineering,
University of California at Berkeley, USA
[email protected]
February 3, 2011
References
From my book:
- Main Concepts (Chapter 2)
- Optimality Conditions: Lagrange duality theory and KKT conditions (Chapter 3)
- Polyhedra, Polytopes and Simplices (Chapter 4)
- Linear and Quadratic Programming (Chapter 5)
Note: The following notes have been extracted from Stephen Boyd's lecture notes for his course on convex optimization. The full notes can be downloaded from http://www.stanford.edu/~boyd
- Nonlinear Programming: Theory and Algorithms by Bazaraa, Sherali, Shetty
- LMIs in Control by Scherer and Weiland
- Lectures on Polytopes by Ziegler
Outline
1. Main concepts
   - Optimization problems
   - Continuous problems
   - Integer and Mixed-Integer Problems
   - Convexity
2. Optimality conditions: Lagrange duality theory and KKT conditions
3. Polyhedra, polytopes and simplices
4. Linear and quadratic programming
Abstract optimization problems
Optimization problems are abundant in everyday economic life (wherever there is a decision problem).
General abstract features of problem formulation:
- Decision set $Z$
- Constraints on the decision and a subset $S \subseteq Z$ of feasible decisions
- Assign to each decision a cost $f(z) \in \mathbb{R}$
- Goal: minimize the cost by choosing a suitable decision
Concrete features of problem formulation:
- $Z$ is a real vector space: continuous problem
- $Z$ is a discrete set: discrete or combinatorial problem
Concrete Optimization Problems
Generally formulated as
$$\inf_{z} \; f(z) \quad \text{subj. to } z \in S \subseteq Z \tag{1}$$
- The vector $z$ collects the decision variables
- $Z$ is the domain of the decision variables
- $S \subseteq Z$ is the set of feasible or admissible decisions
- The function $f : Z \to \mathbb{R}$ assigns to each decision $z$ a cost $f(z) \in \mathbb{R}$
Shorter form of problem (1):
$$\inf_{z \in S \subseteq Z} f(z) \tag{2}$$
Problem (2) is called a nonlinear mathematical program, or simply a nonlinear program.
Concrete Optimization Problems
Solving problem (2) means:
Compute the least possible cost $J^*$ (called the optimal value)
$$J^* \triangleq \inf_{z \in S} f(z) \tag{3}$$
$J^*$ is the greatest lower bound of $f(z)$ over the set $S$:
$$f(z) \ge J^*, \;\; \forall z \in S$$
AND
1. $\exists\, \bar z \in S : f(\bar z) = J^*$
OR
2. $\forall \varepsilon > 0 \;\; \exists\, z \in S$ such that $f(z) \le J^* + \varepsilon$
Compute the optimizer $z^* \in S$ with $f(z^*) = J^*$. If $z^*$ exists, then rewrite (3) as
$$J^* = \min_{z \in S} f(z) \tag{4}$$
Concrete Optimization Problems
Consider the Nonlinear Program (NLP)
$$J^* = \min_{z \in S} f(z)$$
Notation:
- If $J^* = -\infty$ the problem is unbounded below.
- If the set $S$ is empty the problem is said to be infeasible (we set $J^* = +\infty$).
- If $S = Z$ the problem is said to be unconstrained.
- The set of all optimal solutions is denoted by
$$\operatorname*{argmin}_{z \in S} f(z) \triangleq \{z \in S : f(z) = J^*\}$$
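The notation above can be exercised numerically. A minimal sketch (not part of the slides) using `scipy.optimize.minimize`; the toy cost, constraint, starting point, and choice of the SLSQP solver are all illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

# Toy NLP: min (z1-2)^2 + (z2-1)^2  subj. to  z1 + z2 <= 2
f = lambda z: (z[0] - 2.0)**2 + (z[1] - 1.0)**2
# SLSQP convention: 'ineq' constraints mean fun(z) >= 0, so g(z) <= 0 becomes -g(z) >= 0
cons = [{'type': 'ineq', 'fun': lambda z: 2.0 - z[0] - z[1]}]

res = minimize(f, x0=np.zeros(2), method='SLSQP', constraints=cons)
J_star = res.fun   # optimal value J*  (here 0.5)
z_star = res.x     # optimizer z*      (here (1.5, 0.5))
```

Here the unconstrained minimizer $(2,1)$ is infeasible, so the optimizer lands on the constraint boundary.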
Continuous problems
Consider the problem
$$\inf_{z} \; f(z) \quad \text{subj. to } g_i(z) \le 0 \text{ for } i = 1,\dots,m, \;\; h_i(z) = 0 \text{ for } i = 1,\dots,p, \;\; z \in Z \tag{5}$$
The domain $Z$ of problem (5) is a subset of $\mathbb{R}^s$ (the finite-dimensional Euclidean vector space), defined as
$$Z = \{z \in \mathbb{R}^s : z \in \operatorname{dom} f,\; z \in \operatorname{dom} g_i,\; i = 1,\dots,m,\; z \in \operatorname{dom} h_i,\; i = 1,\dots,p\}$$
A point $\bar z \in \mathbb{R}^s$ is feasible for problem (5) if $\bar z \in Z$ and $g_i(\bar z) \le 0$ for $i = 1,\dots,m$, $h_i(\bar z) = 0$ for $i = 1,\dots,p$.
The set of feasible vectors is
$$S = \{z \in \mathbb{R}^s : z \in Z,\; g_i(z) \le 0,\; i = 1,\dots,m,\; h_i(z) = 0,\; i = 1,\dots,p\}.$$
Local and Global Optimizer
Let $J^*$ be the optimal value of problem (5). A global optimizer, if it exists, is a feasible vector $z^*$ with $f(z^*) = J^*$.
A feasible point $\bar z$ is a local optimizer for problem (5) if there exists an $R > 0$ such that
$$f(\bar z) = \inf_{z} \; f(z) \quad \text{subj. to } g_i(z) \le 0 \text{ for } i = 1,\dots,m, \;\; h_i(z) = 0 \text{ for } i = 1,\dots,p, \;\; \|z - \bar z\| \le R, \;\; z \in Z \tag{6}$$
Active, Inactive and Redundant Constraints
Consider the problem
$$\inf_{z} \; f(z) \quad \text{subj. to } g_i(z) \le 0 \text{ for } i = 1,\dots,m, \;\; h_i(z) = 0 \text{ for } i = 1,\dots,p, \;\; z \in Z$$
The $i$-th inequality constraint $g_i(z) \le 0$ is active at $\bar z$ if $g_i(\bar z) = 0$; otherwise it is inactive.
Equality constraints are always active at all feasible points.
Removing a redundant constraint does not change the feasible set $S$; this implies that removing a redundant constraint from the optimization problem does not change its solution.
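A small numeric sketch of the active/inactive classification (not from the slides; the point and constraints are illustrative assumptions):

```python
import numpy as np

# Feasible point and inequality constraints g_i(z) <= 0 of a toy problem
z_bar = np.array([1.5, 0.5])
g = [lambda z: z[0] + z[1] - 2.0,   # g_1: holds with equality at z_bar -> active
     lambda z: -z[0]]               # g_2: strictly satisfied at z_bar -> inactive

tol = 1e-8
active   = [i for i, gi in enumerate(g) if abs(gi(z_bar)) <= tol]
inactive = [i for i, gi in enumerate(g) if gi(z_bar) < -tol]
```

In floating point the equality $g_i(\bar z) = 0$ must be tested up to a tolerance, as done with `tol` above.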
Problem Description
The functions $f$, $g_i$ and $h_i$ can be available in analytical form or can be described through an oracle model (also called a black-box or subroutine model).
In an oracle model $f$, $g_i$ and $h_i$ are not known explicitly but can be evaluated by querying the oracle. Often the oracle consists of subroutines which, called with the argument $z$, return $f(z)$, $g_i(z)$ and $h_i(z)$ and their gradients $\nabla f(z)$, $\nabla g_i(z)$, $\nabla h_i(z)$.
Integer and Mixed-Integer Problems
If the decision set $Z$ in the optimization problem is finite, then the optimization problem is called combinatorial or discrete.
If $Z \subseteq \{0,1\}^s$, then the problem is said to be integer.
If $Z$ is a subset of the Cartesian product of an integer set and a real Euclidean space, i.e., $Z \subseteq \{[z_c, z_b] : z_c \in \mathbb{R}^{s_c},\; z_b \in \{0,1\}^{s_b}\}$, then the problem is said to be mixed-integer.
The standard formulation of a mixed-integer nonlinear program is
$$\inf_{[z_c, z_b]} \; f(z_c, z_b) \quad \text{subj. to } g_i(z_c, z_b) \le 0 \text{ for } i = 1,\dots,m, \;\; h_i(z_c, z_b) = 0 \text{ for } i = 1,\dots,p, \;\; z_c \in \mathbb{R}^{s_c},\; z_b \in \{0,1\}^{s_b},\; [z_c, z_b] \in Z \tag{7}$$
Convexity
A set $S \subseteq \mathbb{R}^s$ is convex if
$$\lambda z_1 + (1-\lambda) z_2 \in S \quad \text{for all } z_1, z_2 \in S,\; \lambda \in [0,1].$$
A function $f : S \to \mathbb{R}$ is convex if $S$ is convex and
$$f(\lambda z_1 + (1-\lambda) z_2) \le \lambda f(z_1) + (1-\lambda) f(z_2)$$
for all $z_1, z_2 \in S$, $\lambda \in [0,1]$.
A function $f : S \to \mathbb{R}$ is strictly convex if $S$ is convex and
$$f(\lambda z_1 + (1-\lambda) z_2) < \lambda f(z_1) + (1-\lambda) f(z_2)$$
for all $z_1, z_2 \in S$ with $z_1 \ne z_2$, $\lambda \in (0,1)$.
A function $f : S \to \mathbb{R}$ is concave if $S$ is convex and $-f$ is convex.
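The defining inequality can be probed numerically. A grid search over sample points and $\lambda$ values (an illustrative sketch, not from the slides) can only disprove convexity, never prove it, but it catches clear counterexamples:

```python
import numpy as np

def find_convexity_violation(f, grid):
    """Search a grid for a counterexample to
    f(l*z1 + (1-l)*z2) <= l*f(z1) + (1-l)*f(z2)."""
    lams = np.linspace(0.0, 1.0, 11)
    for z1 in grid:
        for z2 in grid:
            for lam in lams:
                lhs = f(lam * z1 + (1 - lam) * z2)
                rhs = lam * f(z1) + (1 - lam) * f(z2)
                if lhs > rhs + 1e-9:
                    return (z1, z2, lam)   # counterexample: f is not convex
    return None                            # no violation found on this grid

grid = np.linspace(-5.0, 5.0, 21)
sq_violation  = find_convexity_violation(lambda z: z**2, grid)  # None: z^2 is convex
sin_violation = find_convexity_violation(np.sin, grid)          # found: sin is not convex
```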
Operations preserving convexity
1. The intersection of an arbitrary number of convex sets is a convex set: if $S_n$ is convex for all $n \in \mathbb{N}^+$, then $\bigcap_{n \in \mathbb{N}^+} S_n$ is convex. The empty set is convex.
2. The sub-level sets of a convex function $f$ on $S$ are convex: if $f(z)$ is convex, then $S_\alpha \triangleq \{z \in S : f(z) \le \alpha\}$ is convex.
3. $f_1, \dots, f_N$ convex $\Rightarrow$ $\sum_{i=1}^N \alpha_i f_i$ is a convex function for all $\alpha_i \ge 0$, $i = 1,\dots,N$.
4. The composition of a convex function $f(z)$ with an affine map $z = Ax + b$ generates a convex function $f(Ax + b)$ of $x$.
5. A linear function $f(z) = c'z + d$ is both convex and concave.
6. A quadratic function $f(z) = z'Qz + 2s'z + r$ is convex if and only if $Q \in \mathbb{R}^{s \times s}$ is positive semidefinite, and strictly convex if $Q \in \mathbb{R}^{s \times s}$ is positive definite.
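Point 6 suggests a direct test for quadratic functions: classify $f(z) = z'Qz + 2s'z + r$ from the eigenvalues of (the symmetric part of) $Q$. A small sketch, with the tolerance an illustrative assumption:

```python
import numpy as np

def quadratic_convexity(Q, tol=1e-10):
    """Classify f(z) = z'Qz + 2 s'z + r from the eigenvalues of symmetric Q."""
    eig = np.linalg.eigvalsh((Q + Q.T) / 2.0)  # symmetrize, then real eigenvalues
    if np.all(eig > tol):
        return 'strictly convex'               # Q positive definite
    if np.all(eig >= -tol):
        return 'convex'                        # Q positive semidefinite
    return 'not convex'                        # Q indefinite or negative

a = quadratic_convexity(np.array([[2.0, 0.0], [0.0, 1.0]]))   # positive definite
b = quadratic_convexity(np.array([[1.0, 0.0], [0.0, 0.0]]))   # positive semidefinite
c = quadratic_convexity(np.array([[1.0, 0.0], [0.0, -1.0]]))  # indefinite
```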
Operations preserving convexity
Suppose $f(x) = h(g(x)) = h(g_1(x), \dots, g_k(x))$ with $h : \mathbb{R}^k \to \mathbb{R}$, $g_i : \mathbb{R}^s \to \mathbb{R}$. Then:
1. $f$ is convex if $h$ is convex, $h$ is nondecreasing in each argument, and the $g_i$ are convex,
2. $f$ is convex if $h$ is convex, $h$ is nonincreasing in each argument, and the $g_i$ are concave,
3. $f$ is concave if $h$ is concave, $h$ is nondecreasing in each argument, and the $g_i$ are concave.
The pointwise maximum of a set of convex functions is a convex function: if $f_1(z), \dots, f_k(z)$ are convex functions, then $f(z) = \max\{f_1(z), \dots, f_k(z)\}$ is a convex function.
Convex optimization problems
Problem (5) is convex if the cost $f$ is convex on $Z$ and $S$ is convex.
- In convex optimization problems, local optimizers are also global optimizers.
- Non-convex optimization problems are often solved by iterating between the solutions of convex sub-problems.
- Convexity of the feasible set $S$ is difficult to prove except in special cases (see the operations preserving convexity).
- There exist non-convex problems which can be transformed into convex problems through a change of variables and manipulations of cost and constraints.
Outline
1. Main concepts
2. Optimality conditions: Lagrange duality theory and KKT conditions
   - Optimality Conditions
   - Duality theory
   - Certificate of optimality
   - Complementary slackness
   - KKT conditions
3. Polyhedra, polytopes and simplices
4. Linear and quadratic programming
Optimality Conditions
Consider the problem
$$\inf_{z} \; f(z) \quad \text{subj. to } g_i(z) \le 0 \text{ for } i = 1,\dots,m, \;\; h_i(z) = 0 \text{ for } i = 1,\dots,p, \;\; z \in Z$$
In general, an analytical solution does not exist.
Solutions are usually computed by recursive algorithms which start from an initial guess $z^0$ and at step $k$ generate a point $z^k$ such that $\{f(z^k)\}_{k=0,1,2,\dots}$ converges to $J^*$.
These algorithms recursively use and/or solve analytical conditions for optimality.
Optimality Conditions for Unconstrained Optimization Problems
Theorem (Necessary condition*)
Let $f : \mathbb{R}^s \to \mathbb{R}$ be differentiable at $\bar z$. If there exists a vector $d$ such that $\nabla f(\bar z)' d < 0$, then there exists a $\delta > 0$ such that $f(\bar z + \lambda d) < f(\bar z)$ for all $\lambda \in (0, \delta)$.
The vector $d$ in the theorem above is called a descent direction.
The direction of steepest descent $d_s$ at $\bar z$ is defined as the normalized direction that minimizes $\nabla f(\bar z)' d_s$.
The direction $d_s$ of steepest descent is $d_s = -\dfrac{\nabla f(\bar z)}{\|\nabla f(\bar z)\|}$.
Corollary
Let $f : \mathbb{R}^s \to \mathbb{R}$ be differentiable at $\bar z$. If $\bar z$ is a local minimizer, then $\nabla f(\bar z) = 0$.
Optimality Conditions for Unconstrained Optimization Problems
Theorem (Sufficient condition*)
Suppose that $f : \mathbb{R}^s \to \mathbb{R}$ is twice differentiable at $\bar z$. If $\nabla f(\bar z) = 0$ and the Hessian of $f(z)$ at $\bar z$ is positive definite, then $\bar z$ is a local minimizer.
Theorem (Necessary and sufficient condition*)
Suppose that $f : \mathbb{R}^s \to \mathbb{R}$ is differentiable at $\bar z$. If $f$ is convex, then $\bar z$ is a global minimizer if and only if $\nabla f(\bar z) = 0$.
* Proofs available in Chapter 4 of M.S. Bazaraa, H.D. Sherali, and C.M. Shetty, Nonlinear Programming: Theory and Algorithms, John Wiley & Sons, Inc., New York, second edition, 1993.
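Both conditions can be verified numerically at a candidate point: the gradient should vanish and the Hessian should be positive definite. A sketch on an assumed toy quadratic (the function, point, and finite-difference step are illustrative):

```python
import numpy as np

def grad(f, z, h=1e-6):
    """Central-difference approximation of the gradient of f at z."""
    z = np.asarray(z, dtype=float)
    g = np.zeros_like(z)
    for i in range(z.size):
        e = np.zeros_like(z)
        e[i] = h
        g[i] = (f(z + e) - f(z - e)) / (2 * h)
    return g

f = lambda z: (z[0] - 1.0)**2 + 2.0 * (z[1] + 3.0)**2
z_bar = np.array([1.0, -3.0])                   # candidate minimizer

grad_norm = np.linalg.norm(grad(f, z_bar))      # ~0: first-order condition holds
H = np.array([[2.0, 0.0], [0.0, 4.0]])          # exact Hessian of f (constant here)
hess_pd = bool(np.all(np.linalg.eigvalsh(H) > 0))  # positive definite: local minimizer
```

Since this $f$ is convex, the zero gradient alone already certifies a global minimizer by the last theorem.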
Duality Theory. The Lagrange Function
Consider the primal optimization problem
$$\inf_{z} \; f(z) \quad \text{subj. to } g_i(z) \le 0 \text{ for } i = 1,\dots,m, \;\; h_i(z) = 0 \text{ for } i = 1,\dots,p, \;\; z \in Z$$
- Any feasible point gives an upper bound on the optimal value
- The Lagrange dual problem gives a lower bound on the optimal value
Construct the Lagrange function
$$L(z, u, v) = f(z) + u_1 g_1(z) + \dots + u_m g_m(z) + v_1 h_1(z) + \dots + v_p h_p(z)$$
More compactly,
$$L(z, u, v) \triangleq f(z) + u' g(z) + v' h(z)$$
- $u_i$ and $v_i$ are called Lagrange multipliers or dual variables
- the objective is augmented with a weighted sum of the constraint functions
Duality Theory. The Lagrange Function
Consider the Lagrange function
$$L(z, u, v) \triangleq f(z) + u' g(z) + v' h(z)$$
Let $z \in S$ be feasible. For arbitrary vectors $u \ge 0$ and $v$ we trivially have
$$L(z, u, v) \le f(z)$$
After infimization we infer
$$\inf_{z \in Z} L(z, u, v) \le \inf_{z \in Z,\, g(z) \le 0,\, h(z) = 0} f(z)$$
Best lower bound: since $u \ge 0$ and $v$ are arbitrary,
$$\sup_{(u,v),\, u \ge 0} \; \inf_{z \in Z} L(z, u, v) \le \inf_{z \in Z,\, g(z) \le 0,\, h(z) = 0} f(z)$$
Duality Theory. The Dual Problem
Let
$$\Phi(u, v) \triangleq \inf_{z \in Z} L(z, u, v) \in [-\infty, +\infty] \tag{8}$$
Lagrangian dual problem:
$$\sup_{(u,v),\, u \ge 0} \Phi(u, v) \tag{9}$$
Remarks
- Problem (8) (the Lagrangian dual subproblem) is an unconstrained optimization problem. Only points $(u, v)$ with $\Phi(u, v) > -\infty$ are interesting
- $\Phi(u, v)$ is always concave, so (9) is a concave maximization, much easier to solve than the primal (non-convex in general)
- Weak duality always holds:
$$\sup_{(u,v),\, u \ge 0} \Phi(u, v) \le \inf_{z \in Z,\, g(z) \le 0,\, h(z) = 0} f(z)$$
Duality Theory. Duality Gap and Certificate of Optimality
Let
$$d^* = \max_{(u,v),\, u \ge 0} \Phi(u, v), \qquad J^* = \min_{z \in Z,\, g(z) \le 0,\, h(z) = 0} f(z)$$
Then:
- we always have $d^* \le J^*$; the difference $J^* - d^*$ is called the optimal duality gap
- strong duality holds if $d^* = J^*$
- in case of strong duality, $u^*$ and $v^*$ serve as a certificate of optimality
Duality Theory. Constraint Qualifications: Slater's Condition
When do we have
Dual optimal value = Primal optimal value?
In general, strong duality does not hold, even for convex primal problems.
Constraint qualifications: conditions on the constraint functions implying strong duality for convex problems.
Slater's condition (a well-known constraint qualification)
Consider the primal problem. There exists $\bar z \in \mathbb{R}^s$ which belongs to the relative interior of the problem domain $Z$, which is feasible ($g(\bar z) \le 0$, $h(\bar z) = 0$) and for which $g_j(\bar z) < 0$ for all $j$ for which $g_j$ is not an affine function.
Other constraint qualifications exist: Linear Independence, Cottle's, Zangwill's, and Kuhn-Tucker's constraint qualifications.
Duality Theory. Slater's Theorem
When do we have
Dual optimal value = Primal optimal value?
Theorem (Slater's theorem)
Consider the primal problem and its dual problem. If the primal problem is convex and Slater's condition holds, then $d^* > -\infty$ and $d^* = J^*$.
Then
$$\max_{(u,v),\, u \ge 0} \Phi(u, v) = \min_{z \in Z,\, g(z) \le 0,\, h(z) = 0} f(z)$$
Slater's condition reduces to feasibility when all inequality constraints are linear. Strong duality holds for convex QPs and for feasible LPs.
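A worked one-dimensional example (not from the slides) where the dual function is available in closed form. For the primal $\min_z z^2$ subject to $1 - z \le 0$, the Lagrangian is $L(z,u) = z^2 + u(1-z)$; minimizing over $z$ at $z = u/2$ gives the concave dual function $\Phi(u) = u - u^2/4$. Slater's condition holds (e.g. $z = 2$ is strictly feasible), so the gap is zero:

```python
import numpy as np

# Dual function of  min z^2  s.t. 1 - z <= 0:  Phi(u) = u - u^2/4 (concave in u)
u = np.linspace(0.0, 4.0, 4001)
phi = u - u**2 / 4.0

d_star = phi.max()            # dual optimal value  (1.0, attained at u* = 2)
u_star = u[phi.argmax()]      # dual optimizer
J_star = 1.0                  # primal optimal value, attained at z* = 1
gap = J_star - d_star         # zero: strong duality, as Slater's theorem predicts
```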
Certificate of Optimality
- Let $z$ be a feasible point. Then $f(z)$ is an upper bound on the optimal cost, i.e., $J^* \le f(z)$.
- Let $(u, v)$ be a dual feasible point. Then $\Phi(u, v)$ is a lower bound on the optimal cost, i.e., $\Phi(u, v) \le J^*$.
- If $z$ and $(u, v)$ are primal and dual feasible, respectively, then $\Phi(u, v) \le J^* \le f(z)$. Therefore $z$ is $\varepsilon$-suboptimal, with $\varepsilon$ equal to the primal-dual gap, i.e., $\varepsilon = f(z) - \Phi(u, v)$.
- The optimal values of the primal (and dual) problems lie in the same interval: $J^*, d^* \in [\Phi(u, v), f(z)]$.
- $(u, v)$ is a certificate that proves the (sub)optimality of $z$.
- At iteration $k$, algorithms produce a primal feasible $z^k$ and a dual feasible $(u^k, v^k)$ with $f(z^k) - \Phi(u^k, v^k) \to 0$ as $k \to \infty$; hence at iteration $k$ we know $J^* \in [\Phi(u^k, v^k), f(z^k)]$ (a useful stopping criterion).
Complementary slackness
Suppose that $z^*$, $u^*$, $v^*$ are primal and dual feasible with zero duality gap (hence, they are primal and dual optimal). Then
$$f(z^*) = \Phi(u^*, v^*) = \inf_{z} \left[ f(z) + u^{*\prime} g(z) + v^{*\prime} h(z) \right] \le f(z^*) + u^{*\prime} g(z^*) + v^{*\prime} h(z^*) \le f(z^*)$$
hence $\sum_{i=1}^m u_i^* g_i(z^*) = 0$, and so
$$u_i^* g_i(z^*) = 0, \quad i = 1, \dots, m,$$
called the complementary slackness condition:
- $i$-th constraint inactive at the optimum $\Rightarrow$ $u_i^* = 0$
- $u_i^* > 0$ at the optimum $\Rightarrow$ $i$-th constraint active at the optimum
KKT optimality conditions
Suppose
- $f$, $g_i$, $h_i$ are differentiable
- $z^*$, $u^*$, $v^*$ are (primal, dual) optimal, with zero duality gap
By complementary slackness we have
$$f(z^*) + \sum_i u_i^* g_i(z^*) + \sum_j v_j^* h_j(z^*) = \min_{z} \left[ f(z) + \sum_i u_i^* g_i(z) + \sum_j v_j^* h_j(z) \right] \tag{10}$$
i.e., $z^*$ minimizes $L(z, u^*, v^*)$, therefore
$$\nabla f(z^*) + \sum_i u_i^* \nabla g_i(z^*) + \sum_j v_j^* \nabla h_j(z^*) = 0$$
KKT optimality conditions
$z^*$, $(u^*, v^*)$ of an optimization problem with differentiable cost and constraints and zero duality gap have to satisfy the following conditions:
$$0 = \nabla f(z^*) + \sum_{i=1}^{m} u_i^* \nabla g_i(z^*) + \sum_{j=1}^{p} v_j^* \nabla h_j(z^*) \tag{11a}$$
$$0 = u_i^* g_i(z^*), \quad i = 1, \dots, m \tag{11b}$$
$$0 \le u_i^*, \quad i = 1, \dots, m \tag{11c}$$
$$0 \ge g_i(z^*), \quad i = 1, \dots, m \tag{11d}$$
$$0 = h_j(z^*), \quad j = 1, \dots, p \tag{11e}$$
Conditions (11a)-(11e) are called the Karush-Kuhn-Tucker (KKT) conditions.
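The four conditions can be checked mechanically at a candidate primal-dual pair. A sketch on an assumed toy problem, $\min (z_1-2)^2 + (z_2-1)^2$ with $g_1 = z_1+z_2-2 \le 0$ and $g_2 = -z_1 \le 0$ (no equality constraints), whose optimum is $z^* = (1.5, 0.5)$ with $u^* = (1, 0)$:

```python
import numpy as np

z = np.array([1.5, 0.5])                        # candidate optimizer z*
u = np.array([1.0, 0.0])                        # candidate multipliers u*

grad_f = np.array([2*(z[0]-2), 2*(z[1]-1)])     # = (-1, -1)
grad_g = np.array([[1.0, 1.0], [-1.0, 0.0]])    # rows: grad g_1, grad g_2
g_vals = np.array([z[0]+z[1]-2.0, -z[0]])       # = (0, -1.5)

stationarity = grad_f + grad_g.T @ u            # (11a): should be the zero vector
comp_slack   = u * g_vals                       # (11b): elementwise zero
dual_feas    = bool(np.all(u >= 0))             # (11c)
primal_feas  = bool(np.all(g_vals <= 0))        # (11d); no h_j, so (11e) is vacuous
kkt_hold = (np.allclose(stationarity, 0) and np.allclose(comp_slack, 0)
            and dual_feas and primal_feas)
```

Note how complementary slackness (11b) forces $u_2^* = 0$ for the inactive constraint $g_2$.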
KKT optimality conditions
Consider the primal problem (5).
Theorem (Necessary and sufficient condition)
Suppose that problem (5) is convex and that the cost and constraints $f$, $g_i$ and $h_i$ are differentiable at a feasible $z^*$. If problem (5) satisfies Slater's condition, then $z^*$ is optimal if and only if there are $(u^*, v^*)$ that, together with $z^*$, satisfy the KKT conditions.
Theorem (Necessary and sufficient condition)
Let $z^*$ be a feasible solution and $A = \{i : g_i(z^*) = 0\}$ be the set of active constraints at $z^*$. Suppose that $f$ and the $g_i$ are differentiable at $z^*$ and that the $h_i$ are continuously differentiable at $z^*$. Further, suppose that the gradients $\nabla g_i(z^*)$ for $i \in A$ and $\nabla h_i(z^*)$ for $i = 1, \dots, p$ are linearly independent. If $z^*$, $(u^*, v^*)$ are optimal, then they satisfy the KKT conditions. In addition, if problem (5) is convex, then $z^*$ is optimal if and only if there are $(u^*, v^*)$ that, together with $z^*$, satisfy the KKT conditions.
KKT geometric interpretation
[Figure: the cost gradient $-\nabla f(z)$ and the active-constraint gradients $\nabla g_i(z)$ at two boundary points $z_1$, $z_2$ of the feasible set $\{z : g(z) \le 0\}$]
Rewrite (11a) as
$$-\nabla f(z^*) = \sum_{i \in A} u_i^* \nabla g_i(z^*), \quad u_i^* \ge 0,$$
i.e., the direction of steepest cost descent belongs to the convex cone spanned by the gradients $\nabla g_i$ of the active constraints.
KKT conditions. Example
KKT conditions under only the convexity assumptions are not necessary. Consider the problem:
$$\min \; z_1 \quad \text{subj. to } (z_1 - 1)^2 + (z_2 - 1)^2 \le 1, \;\; (z_1 - 1)^2 + (z_2 + 1)^2 \le 1 \tag{12}$$
[Figure: the two unit disks centered at $(1,1)$ and $(1,-1)$ intersect only at the point $(1,0)$; the gradients $\nabla f(1,0)$, $\nabla g_1(1,0)$, $\nabla g_2(1,0)$ are shown]
KKT conditions are necessary if a constraint qualification is satisfied:
$$\exists\, \bar z \in Z \text{ such that } g(\bar z) \le 0,\; h(\bar z) = 0,\; g_j(\bar z) < 0 \text{ if } g_j \text{ is not affine, and } \bar z \in \operatorname{int} Z$$
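The failure in example (12) can be made concrete (a sketch, not from the slides). The feasible set is the single point $z^* = (1,0)$; there is no strictly feasible point, so Slater's condition fails. At $z^*$ both constraint gradients have a zero first component, while $\nabla f = (1,0)'$, so the stationarity equation (11a) reads $1 = 0$ in its first component for any multipliers:

```python
import numpy as np

z = np.array([1.0, 0.0])                        # the only feasible point of (12)

grad_f  = np.array([1.0, 0.0])
grad_g1 = np.array([2*(z[0]-1), 2*(z[1]-1)])    # = (0, -2)
grad_g2 = np.array([2*(z[0]-1), 2*(z[1]+1)])    # = (0,  2)

# grad_f + u1*grad_g1 + u2*grad_g2 = 0 would need 1 + u1*0 + u2*0 = 0 in the
# first component: impossible, so no KKT multipliers exist at the optimizer.
first_components = (float(grad_g1[0]), float(grad_g2[0]))
kkt_possible = not (grad_f[0] != 0 and first_components == (0.0, 0.0))
```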
Outline
1. Main concepts
2. Optimality conditions: Lagrange duality theory and KKT conditions
3. Polyhedra, polytopes and simplices
   - General Set Definitions and Operations
   - Polyhedra Definitions and Representations
   - Basic Operations on Polytopes
4. Linear and quadratic programming
General Set Definitions and Operations
An $n$-dimensional ball $B(x_0, \rho)$ is the set $B(x_0, \rho) = \{x \in \mathbb{R}^n : \|x - x_0\|_2 \le \rho\}$. $x_0$ and $\rho$ are the center and the radius of the ball, respectively.
Affine sets are sets described by the solutions of a system of linear equations:
$$F = \{x \in \mathbb{R}^n : Ax = b\}, \quad \text{with } A \in \mathbb{R}^{m \times n},\; b \in \mathbb{R}^m.$$
The affine combination of $x_1, \dots, x_k$ is defined as the point $\lambda_1 x_1 + \dots + \lambda_k x_k$ where $\sum_{i=1}^k \lambda_i = 1$.
The affine hull of $K \subseteq \mathbb{R}^n$ is the set of all affine combinations of points in $K$:
$$\operatorname{aff}(K) = \{\lambda_1 x_1 + \dots + \lambda_k x_k \;:\; x_i \in K,\; i = 1,\dots,k,\; \textstyle\sum_{i=1}^k \lambda_i = 1\}$$
General Set Definitions and Operations
The dimension of an affine set is the dimension of the largest ball of radius $\rho > 0$ included in the set.
The convex combination of $x_1, \dots, x_k$ is defined as the point $\lambda_1 x_1 + \dots + \lambda_k x_k$ where $\sum_{i=1}^k \lambda_i = 1$ and $\lambda_i \ge 0$, $i = 1,\dots,k$.
The convex hull of a set $K \subseteq \mathbb{R}^n$ is the set of all convex combinations of points in $K$ and is denoted $\operatorname{conv}(K)$:
$$\operatorname{conv}(K) \triangleq \{\lambda_1 x_1 + \dots + \lambda_k x_k \;:\; x_i \in K,\; \lambda_i \ge 0,\; i = 1,\dots,k,\; \textstyle\sum_{i=1}^k \lambda_i = 1\}.$$
A cone spanned by a finite set of points $K = \{x_1, \dots, x_k\}$ is defined as
$$\operatorname{cone}(K) = \{\textstyle\sum_{i=1}^k \lambda_i x_i \;:\; \lambda_i \ge 0,\; i = 1,\dots,k\}.$$
The Minkowski sum of two sets $P, Q \subseteq \mathbb{R}^n$ is defined as
$$P \oplus Q \triangleq \{x + y \;:\; x \in P,\; y \in Q\}.$$
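The convex hull of a finite point set is readily computed. A small sketch (the points are an illustrative assumption) using `scipy.spatial.ConvexHull`, which wraps Qhull:

```python
import numpy as np
from scipy.spatial import ConvexHull

# Four corners of the unit square plus one interior point
K = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 0.5]])
hull = ConvexHull(K)

n_vertices = len(hull.vertices)   # interior point is not a hull vertex -> 4
area = hull.volume                # in 2-D, ConvexHull.volume is the enclosed area
```

The interior point $(0.5, 0.5)$ is a convex combination of the corners, so it does not appear among the hull vertices.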
Polyhedra Definitions and Representations
An H-polyhedron $P$ in $\mathbb{R}^n$ denotes an intersection of a finite set of closed halfspaces in $\mathbb{R}^n$:
$$P = \{x \in \mathbb{R}^n : Ax \le b\}$$
[Figure: a two-dimensional H-polyhedron bounded by the halfspaces $a_i' x \le b_i$, $i = 1, \dots, 5$]
Inequalities which can be removed without changing the polyhedron are called redundant. The representation of an H-polyhedron is minimal if it does not contain redundant inequalities.
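Redundancy of a single inequality can be tested with one LP: row $i$ of $Ax \le b$ is redundant iff the maximum of $a_i'x$ over the remaining constraints does not exceed $b_i$. A sketch (the example polyhedron and tolerance are illustrative assumptions) using `scipy.optimize.linprog`:

```python
import numpy as np
from scipy.optimize import linprog

def is_redundant(A, b, i):
    """Row i of Ax <= b is redundant iff max a_i'x over the other rows is <= b_i."""
    mask = np.arange(len(b)) != i
    res = linprog(-A[i], A_ub=A[mask], b_ub=b[mask],
                  bounds=[(None, None)] * A.shape[1], method='highs')
    return bool(res.status == 0 and -res.fun <= b[i] + 1e-9)

# Unit box 0 <= x1, x2 <= 1 plus a redundant halfspace x1 + x2 <= 3
A = np.array([[1.0, 0], [-1.0, 0], [0, 1.0], [0, -1.0], [1.0, 1.0]])
b = np.array([1.0, 0.0, 1.0, 0.0, 3.0])

redundant_last  = is_redundant(A, b, 4)  # True: max x1+x2 over the box is 2 <= 3
redundant_first = is_redundant(A, b, 0)  # False: dropping x1 <= 1 enlarges the box
```

Repeating the test for every row and dropping the redundant ones yields a minimal representation.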
Polyhedra Definitions and Representations
A V-polyhedron $P$ in $\mathbb{R}^n$ denotes the Minkowski sum
$$P = \operatorname{conv}(V) \oplus \operatorname{cone}(Y)$$
for some $V = [V_1, \dots, V_k] \in \mathbb{R}^{n \times k}$, $Y = [y_1, \dots, y_{k'}] \in \mathbb{R}^{n \times k'}$.
- Any H-polyhedron is a V-polyhedron.
- An H-polytope (V-polytope) is a bounded H-polyhedron (V-polyhedron). Any H-polytope is a V-polytope.
- The dimension of a polytope (polyhedron) $P$ is the dimension of its affine hull and is denoted by $\dim(P)$.
- A polytope $P \subseteq \mathbb{R}^n$ is full-dimensional if $\dim(P) = n$ or, equivalently, if it is possible to fit a non-empty $n$-dimensional ball in $P$. Otherwise, we say that the polytope $P$ is lower-dimensional.
- If $\|P^x_i\|_2 = 1$, where $P^x_i$ denotes the $i$-th row of a matrix $P^x$, we say that the polytope $P$ is normalized.
Polyhedra Definitions and Representations
A linear inequality $c'z \le c_0$ is said to be valid for $P$ if it is satisfied for all points $z \in P$.
A face of $P$ is any nonempty set of the form
$$F = P \cap \{z \in \mathbb{R}^s : c'z = c_0\}$$
where $c'z \le c_0$ is a valid inequality for $P$.
The faces of dimension 0, 1, $\dim(P)-2$ and $\dim(P)-1$ are called vertices, edges, ridges, and facets, respectively.
A $d$-simplex is a polytope of $\mathbb{R}^d$ with $d + 1$ vertices.
[Figure: the same polytope in (a) V-representation and (b) H-representation]
Polytopal Complexes
A set $\mathcal{C} \subseteq \mathbb{R}^n$ is called a P-collection (in $\mathbb{R}^n$) if it is a collection of a finite number of $n$-dimensional polytopes, i.e.,
$$\mathcal{C} = \{C_i\}_{i=1}^{N_C},$$
where $C_i := \{x \in \mathbb{R}^n : C_i^x x \le C_i^c\}$, $\dim(C_i) = n$, $i = 1, \dots, N_C$, with $N_C < \infty$.
The underlying set of a P-collection $\mathcal{C} = \{C_i\}_{i=1}^{N_C}$ is the point set
$$\bar{\mathcal{C}} := \bigcup_{P \in \mathcal{C}} P = \bigcup_{i=1}^{N_C} C_i.$$
Special Classes
- A collection of sets $\{C_i\}_{i=1}^{N_C}$ is a strict partition of a set $C$ if (i) $\bigcup_{i=1}^{N_C} C_i = C$ and (ii) $C_i \cap C_j = \emptyset$, $i \ne j$.
- $\{C_i\}_{i=1}^{N_C}$ is a strict polyhedral partition of a polyhedral set $C$ if $\{C_i\}_{i=1}^{N_C}$ is a strict partition of $C$ and $\bar{C}_i$ is a polyhedron for all $i$, where $\bar{C}_i$ denotes the closure of the set $C_i$.
- A collection of sets $\{C_i\}_{i=1}^{N_C}$ is a partition of a set $C$ if (i) $\bigcup_{i=1}^{N_C} C_i = C$ and (ii) $(C_i \setminus \partial C_i) \cap (C_j \setminus \partial C_j) = \emptyset$, $i \ne j$, where $\partial C_i$ denotes the boundary of $C_i$.
Functions on Polytopal Complexes
- A function $h(\theta) : \Theta \to \mathbb{R}^k$, where $\Theta \subseteq \mathbb{R}^s$, is piecewise affine (PWA) if there exists a strict partition $R_1, \dots, R_N$ of $\Theta$ and $h(\theta) = H^i \theta + k^i$, $\forall \theta \in R_i$, $i = 1, \dots, N$.
- A function $h(\theta) : \Theta \to \mathbb{R}^k$, where $\Theta \subseteq \mathbb{R}^s$, is piecewise affine on polyhedra (PPWA) if there exists a strict polyhedral partition $R_1, \dots, R_N$ of $\Theta$ and $h(\theta) = H^i \theta + k^i$, $\forall \theta \in R_i$, $i = 1, \dots, N$.
- A function $h(\theta) : \Theta \to \mathbb{R}$, where $\Theta \subseteq \mathbb{R}^s$, is piecewise quadratic (PWQ) if there exists a strict partition $R_1, \dots, R_N$ of $\Theta$ and $h(\theta) = \theta' H^i \theta + k^i \theta + l^i$, $\forall \theta \in R_i$, $i = 1, \dots, N$.
- A function $h(\theta) : \Theta \to \mathbb{R}$, where $\Theta \subseteq \mathbb{R}^s$, is piecewise quadratic on polyhedra (PPWQ) if there exists a strict polyhedral partition $R_1, \dots, R_N$ of $\Theta$ and $h(\theta) = \theta' H^i \theta + k^i \theta + l^i$, $\forall \theta \in R_i$, $i = 1, \dots, N$.
Basic Operations on Polytopes
Convex hull of a set of points $V = \{V_i\}_{i=1}^{N_V}$, with $V_i \in \mathbb{R}^n$:
$$\operatorname{conv}(V) = \{x \in \mathbb{R}^n : x = \textstyle\sum_{i=1}^{N_V} \alpha_i V_i,\; 0 \le \alpha_i \le 1,\; \textstyle\sum_{i=1}^{N_V} \alpha_i = 1\}. \tag{13}$$
Used to switch from a V-representation of a polytope to an H-representation.
Vertex enumeration of a polytope $P$ given in H-representation (the dual of the convex hull operation).
Basic Operations on Polytopes
Polytope reduction is the computation of the minimal representation of a polytope. A polytope $P \subseteq \mathbb{R}^n$, $P = \{x \in \mathbb{R}^n : P^x x \le P^c\}$, is in a minimal representation if the removal of any row in $P^x x \le P^c$ would change it (i.e., if there are no redundant constraints).
The Chebychev ball of a polytope $P = \{x \in \mathbb{R}^n : P^x x \le P^c\}$, with $P^x \in \mathbb{R}^{n_P \times n}$, $P^c \in \mathbb{R}^{n_P}$, corresponds to the largest-radius ball $B(x_c, R)$ with center $x_c$ such that $B(x_c, R) \subset P$.
[Figure: a polytope and its Chebychev ball]
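The Chebychev ball is itself the solution of an LP: maximize $R$ subject to $P^x_i x_c + \|P^x_i\|_2 R \le P^c_i$ for every row $i$. A sketch (the example box and the use of `scipy.optimize.linprog` are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import linprog

# P = {x : Px x <= Pc}: the box [-1, 1]^2
Px = np.array([[1.0, 0], [-1.0, 0], [0, 1.0], [0, -1.0]])
Pc = np.array([1.0, 1.0, 1.0, 1.0])

# LP variables (xc, R):  Px_i' xc + ||Px_i||_2 R <= Pc_i,  maximize R
norms = np.linalg.norm(Px, axis=1)
A = np.hstack([Px, norms[:, None]])
c = np.array([0.0, 0.0, -1.0])        # minimize -R  ==  maximize R

res = linprog(c, A_ub=A, b_ub=Pc,
              bounds=[(None, None), (None, None), (0, None)], method='highs')
xc, R = res.x[:2], res.x[2]           # Chebychev center and radius
```

For the symmetric box, the LP returns the center $(0,0)$ with radius $1$.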
Basic Operations on Polytopes
Projection. Given a polytope $P = \{[x' \; y']' \in \mathbb{R}^{n+m} : P^x x + P^y y \le P^c\} \subset \mathbb{R}^{n+m}$, the projection onto the $x$-space $\mathbb{R}^n$ is defined as
$$\operatorname{proj}_x(P) := \{x \in \mathbb{R}^n : \exists\, y \in \mathbb{R}^m \text{ such that } P^x x + P^y y \le P^c\}.$$
Basic Operations on Polytopes
Set-difference. The set-difference of two polytopes $\mathcal{Y}$ and $R_0$,
$$\mathcal{R} = \mathcal{Y} \setminus R_0 := \{x \in \mathbb{R}^n : x \in \mathcal{Y},\; x \notin R_0\},$$
in general can be a nonconvex and disconnected set, and can be described as a P-collection $\mathcal{R} = \bigcup_{i=1}^m \mathcal{R}_i$, where $\mathcal{Y} = \left(\bigcup_{i=1}^m \mathcal{R}_i\right) \cup (R_0 \cap \mathcal{Y})$.
The P-collection $\mathcal{R} = \bigcup_{i=1}^m \mathcal{R}_i$ can be computed by consecutively inverting the half-spaces defining $R_0$, as described in the following
Theorem
Let $\mathcal{Y} \subseteq \mathbb{R}^n$ be a polyhedron, $R_0 \triangleq \{x \in \mathbb{R}^n : Ax \le b\}$, and $\bar{R}_0 \triangleq \{x \in \mathcal{Y} : Ax \le b\} = R_0 \cap \mathcal{Y}$, where $b \in \mathbb{R}^m$, $\bar{R}_0 \ne \emptyset$, and $Ax \le b$ is a minimal representation of $R_0$. Also let
$$\mathcal{R}_i = \left\{x \in \mathcal{Y} : A_i x > b_i,\;\; A_j x \le b_j \text{ for } j < i \right\}, \quad i = 1, \dots, m.$$
Let $\mathcal{R} \triangleq \bigcup_{i=1}^m \mathcal{R}_i$. Then $\mathcal{R}$ is a P-collection and $\{\bar{R}_0, \mathcal{R}_1, \dots, \mathcal{R}_m\}$ is a strict polyhedral partition of $\mathcal{Y}$.
Basic Operations on Polytopes
[Figure: two-dimensional example: the half-spaces $g_1 \le 0$, $g_2 \le 0$, … defining $R_0$ are consecutively inverted, generating the regions $\mathcal{R}_1, \dots, \mathcal{R}_5$ that form a partition of the rest of the space $\mathcal{X} \setminus R_0$.]
Basic Operations on Polytopes
Pontryagin difference. The Pontryagin difference (also known as the Minkowski difference) of two polytopes $P$ and $Q$ is a polytope
$$P \ominus Q := \{x \in \mathbb{R}^n : x + q \in P,\; \forall q \in Q\}.$$
The Minkowski sum of two polytopes $P$ and $Q$ is a polytope
$$P \oplus Q := \{x \in \mathbb{R}^n : \exists\, y \in P,\; z \in Q,\; x = y + z\}.$$
[Figure: (a) Pontryagin difference $P \ominus Q$; (b) Minkowski sum $P \oplus Q$]
Minkowski Sum of Polytopes
The Minkowski sum is computationally expensive.
Consider
$$P = \{y \in \mathbb{R}^n : P^y y \le P^c\}, \quad Q = \{z \in \mathbb{R}^n : Q^z z \le Q^c\};$$
it holds that
$$\begin{aligned}
W = P \oplus Q &= \{x \in \mathbb{R}^n : \exists\, y, z \in \mathbb{R}^n \text{ s.t. } P^y y \le P^c,\; Q^z z \le Q^c,\; x = y + z\} \\
&= \{x \in \mathbb{R}^n : \exists\, y \in \mathbb{R}^n \text{ s.t. } P^y y \le P^c,\; Q^z (x - y) \le Q^c\} \\
&= \left\{x \in \mathbb{R}^n : \exists\, y \in \mathbb{R}^n \text{ s.t. } \begin{bmatrix} 0 & P^y \\ Q^z & -Q^z \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \le \begin{bmatrix} P^c \\ Q^c \end{bmatrix} \right\} \\
&= \operatorname{proj}_x\left( \left\{ [x' \; y']' \in \mathbb{R}^{n+n} : \begin{bmatrix} 0 & P^y \\ Q^z & -Q^z \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \le \begin{bmatrix} P^c \\ Q^c \end{bmatrix} \right\} \right).
\end{aligned}$$
Pontryagin Difference of Polytopes
The Pontryagin difference is not computationally expensive.
Consider
$$P = \{y \in \mathbb{R}^n : P^y y \le P^b\}, \quad Q = \{z \in \mathbb{R}^n : Q^z z \le Q^b\}.$$
Then:
$$P \ominus Q = \{x \in \mathbb{R}^n : P^y x \le P^b - H(P^y, Q)\}$$
where the $i$-th element of $H(P^y, Q)$ is
$$H_i(P^y, Q) \triangleq \max_{x \in Q} P^y_i x$$
and $P^y_i$ is the $i$-th row of the matrix $P^y$.
For special cases (e.g., when $Q$ is a hypercube), more efficient computational methods exist.
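When $Q$ is given in V-representation, each support value $H_i = \max_{x \in Q} P^y_i x$ is attained at a vertex of $Q$, so no LP is needed. A sketch with illustrative boxes as $P$ and $Q$:

```python
import numpy as np

# P = {x : Px x <= Pb}: the box [-2, 2]^2;  Q: the box [-1, 1]^2 via its vertices
Px = np.array([[1.0, 0], [-1.0, 0], [0, 1.0], [0, -1.0]])
Pb = np.array([2.0, 2.0, 2.0, 2.0])
Q_vertices = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])

# H_i = max over the vertices of Q of Px_i' q  (support of Q per facet direction)
H = (Px @ Q_vertices.T).max(axis=1)
Pb_minus = Pb - H          # P (-) Q = {x : Px x <= Pb_minus}, here the box [-1, 1]^2
```

Shrinking each facet by the support of $Q$ in its normal direction reproduces the definition above.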
Basic Operations on Polytopes
Note that $(P \ominus Q) \oplus Q \subseteq P$.
[Figure: (c) two polytopes $P$ and $Q$; (d) polytope $P$ and the Pontryagin difference $P \ominus Q$; (e) polytope $P \ominus Q$ and the set $(P \ominus Q) \oplus Q$. Illustration that $(P \ominus Q) \oplus Q \subseteq P$.]
Affine Mappings and Polyhedra
Consider a polyhedron $P = \{x \in \mathbb{R}^n : P^x x \le P^c\}$, with $P^x \in \mathbb{R}^{n_P \times n}$, and an affine mapping $f(z)$
$$f : z \in \mathbb{R}^m \mapsto Az + b, \quad A \in \mathbb{R}^{n \times m},\; b \in \mathbb{R}^n.$$
Define the composition of $P$ and $f$ as the following polyhedron:
$$P \circ f \triangleq \{z \in \mathbb{R}^m : P^x f(z) \le P^c\} = \{z \in \mathbb{R}^m : P^x A z \le P^c - P^x b\}$$
Useful for backward reachability.
Affine Mappings and Polyhedra
Consider a polyhedron $P = \{x \in \mathbb{R}^n : P^x x \le P^c\}$, with $P^x \in \mathbb{R}^{n_P \times n}$, and an affine mapping $f(x)$
$$f : x \in \mathbb{R}^n \mapsto Ax + b, \quad A \in \mathbb{R}^{m_A \times n},\; b \in \mathbb{R}^{m_A}.$$
Define the composition of $f$ and $P$ as the following polyhedron:
$$f \circ P \triangleq \{y \in \mathbb{R}^{m_A} : y = Ax + b \text{ for some } x \in \mathbb{R}^n,\; P^x x \le P^c\}$$
The polyhedron $f \circ P$ can be computed as follows. Write $P$ in V-representation, $P = \operatorname{conv}(V)$, and map the vertices $V = \{V_1, \dots, V_k\}$ through the transformation $f$. Because the transformation is affine, the set $f \circ P$ is the convex hull of the transformed vertices:
$$f \circ P = \operatorname{conv}(F), \quad F = \{AV_1 + b, \dots, AV_k + b\}.$$
If $f$ is invertible, $x = A^{-1} y - A^{-1} b$ and therefore
$$f \circ P = \{y \in \mathbb{R}^{m_A} : P^x A^{-1} y \le P^c + P^x A^{-1} b\}$$
Useful for forward reachability.
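The vertex-mapping construction of $f \circ P$ is a one-liner once the vertices are known. A sketch with an assumed unit box and an invertible scaling map:

```python
import numpy as np

# Vertices of the unit box [0,1]^2 (its V-representation)
V = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# Affine map f(x) = Ax + b with invertible A (a scaling) and a translation b
A = np.array([[2.0, 0.0], [0.0, 0.5]])
b = np.array([1.0, -1.0])

# f o P = conv(F): map every vertex; rows of F are the transformed vertices
F = V @ A.T + b
```

For instance the corner $(1,1)$ maps to $A(1,1)' + b = (3, -0.5)'$, and $f \circ P$ is the convex hull of the four rows of `F`.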
Outline
1. Main concepts
2. Optimality conditions: Lagrange duality theory and KKT conditions
3. Polyhedra, polytopes and simplices
4. Linear and quadratic programming
Linear Programming
$$\inf_{z} \; c'z \quad \text{subj. to } Gz \le W$$
where $z \in \mathbb{R}^s$.
Linear programs are convex optimization problems.
Other common forms:
$$\inf_{z} \; c'z \quad \text{subj. to } Gz \le W,\; G_{eq} z = W_{eq}$$
or
$$\inf_{z} \; c'z \quad \text{subj. to } G_{eq} z = W_{eq},\; z \ge 0$$
It is always possible to convert any one of the three forms into the others.
Graphical Interpretation and Solution Properties
Let $P$ be the feasible set; $P$ is a polyhedron.
- If $P$ is empty, then the problem is infeasible.
- Denote by $J^*$ the optimal value and by $Z^*$ the set of optimizers,
$$Z^* = \operatorname*{argmin}_{z \in P} c'z$$
Case 1. The LP solution is unbounded, i.e., $J^* = -\infty$.
Case 2. The LP solution is bounded, i.e., $J^* > -\infty$, and the optimizer is unique: $Z^*$ is a singleton.
Case 3. The LP solution is bounded and there are multiple optima: $Z^*$ is an uncountable subset of $\mathbb{R}^s$, which can be bounded or unbounded.
[Figure: graphical interpretation of the linear program solution for Cases 1-3, with cost level sets $c'z = k_i$, $k_i < k_{i-1}$]
Dual of LP

Consider the LP

    inf_z  c'z
    subj. to  Gz <= W          (14)

with z in R^s and G in R^{m x s}. The Lagrange function is

    L(z, u) = c'z + u'(Gz - W).

The dual cost is

    Theta(u) = inf_z L(z, u) = inf_z (c' + u'G)z - u'W
             = -u'W   if G'u = -c,
               -inf   if G'u != -c.

Since we are interested only in cases where Theta is finite, the dual problem is
    sup_u  -u'W
    subj. to  G'u = -c
              u >= 0

or, equivalently,

    inf_u  W'u
    subj. to  G'u = -c
              u >= 0.
KKT condition for LP

The KKT conditions for the LP (14) become

    G'u = -c,                              (15a)
    (G_j z - W_j) u_j = 0,  j = 1, ..., m, (15b)
    u >= 0,                                (15c)
    Gz <= W,                               (15d)

which are: primal feasibility (15d), dual feasibility (15a), (15c), and complementary slackness conditions (15b).
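Conditions (15a)-(15d) are easy to check numerically. The sketch below (Python/NumPy; the LP and the candidate primal-dual pair are hand-constructed examples, not solver output) verifies all four conditions for maximizing z1 + z2 over the unit box:

```python
import numpy as np

# LP data: min c'z  s.t.  G z <= W  (example: max z1 + z2 over the unit box)
c = np.array([-1.0, -1.0])
G = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, 0.0],
              [0.0, -1.0]])
W = np.array([1.0, 1.0, 0.0, 0.0])

# Candidate primal/dual optimizers: the vertex where both upper bounds are active
z = np.array([1.0, 1.0])
u = np.array([1.0, 1.0, 0.0, 0.0])

stat = np.allclose(G.T @ u, -c)              # (15a) stationarity
comp = np.allclose((G @ z - W) * u, 0.0)     # (15b) complementary slackness
dual_feas = np.all(u >= 0)                   # (15c) dual feasibility
primal_feas = np.all(G @ z <= W + 1e-12)     # (15d) primal feasibility
print(stat and comp and dual_feas and primal_feas)  # prints: True
```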
Active Constraints and Degeneracies

Let J = {1, ..., m} be the set of constraint indices and consider the sets

    A(z) = {j in J : G_j z = W_j},   the active constraints at a feasible z,
    NA(z) = {j in J : G_j z < W_j},  the inactive constraints at a feasible z.

Case 1. A(z*) is undefined, since z* is undefined.
Case 2. The cardinality of A(z*) can be between s and m; it can be higher than s regardless of whether P is in a minimal representation or not (see figure below).
Case 3. If P is in a minimal representation, then the cardinality of A(z*) is s - p, with p the dimension of the optimal face, at all points z* contained in the relative interior of the optimal face; otherwise it can be any number p with 1 < p <= m.

[Figure: Primal degeneracy in a linear program — the polyhedron P with constraints labeled 1-6 and more constraints than variables active at z*.]
Active Constraints and Degeneracies

Definition (Primal degenerate)
The LP is said to be primal degenerate if there exists a z* such that the number of active constraints at z* is greater than the number of variables s.

Definition (Dual degenerate)
The LP is said to be dual degenerate if its dual problem is primal degenerate.

Homework. Show that if the primal problem has multiple optima, then the dual problem is primal degenerate (i.e., the primal problem is dual degenerate). The converse is not always true. Details in the book.
Convex Piecewise Linear Optimization

Consider

    J* = min_z  J(z)
    subj. to  Gz <= W          (16)

where the cost function has the form

    J(z) = max_{i=1,...,k} { c_i'z + d_i }          (17)

with c_i in R^s and d_i in R.

[Figure: A convex piecewise affine function f(x) = max{f_1(x), ..., f_4(x)}, with regions P_1, ..., P_4 over which each affine piece attains the maximum.]
Convex Piecewise Linear Optimization

The cost function J(z) in (17) is a convex PWA function. The optimization problem (16)-(17) can be solved by the following linear program:

    J* = min_{z, eps}  eps
    subj. to  Gz <= W
              c_i'z + d_i <= eps,  i = 1, ..., k
Convex Piecewise Linear Optimization

Consider

    J* = min_z  J_1(z_1) + J_2(z_2)
    subj. to  G_1 z_1 + G_2 z_2 <= W          (18)

where the cost functions have the form

    J_1(z_1) = max_{i=1,...,k} { c_i'z_1 + d_i }
    J_2(z_2) = max_{i=1,...,j} { m_i'z_2 + n_i }          (19)

The optimization problem (18)-(19) can be solved by the following linear program:

    J* = min_{z, eps1, eps2}  eps1 + eps2
    subj. to  G_1 z_1 + G_2 z_2 <= W
              c_i'z_1 + d_i <= eps1,  i = 1, ..., k
              m_i'z_2 + n_i <= eps2,  i = 1, ..., j
Example

The optimization problem

    min_{z1, z2}  |z1 + 5| + |z2 - 3|
    subj. to  -2.5 <= z1 <= 5
              -1 <= z2 <= 1

can be solved in Matlab by using v=linprog(f,A,b), where v = [eps1, eps2, z1, z2]',

    f = [1 1 0 0]',

    A = [  0  0  1  0 ]        b = [  5   ]
        [  0  0 -1  0 ]            [  2.5 ]
        [  0  0  0  1 ]            [  1   ]
        [  0  0  0 -1 ]            [  1   ]
        [ -1  0  1  0 ]            [ -5   ]
        [ -1  0 -1  0 ]            [  5   ]
        [  0 -1  0  1 ]            [  3   ]
        [  0 -1  0 -1 ]            [ -3   ]

The solution is v* = [2.5, 2, -2.5, 1]', i.e., z1* = -2.5, z2* = 1, with optimal cost 4.5.
Quadratic Programming

    min_z  (1/2) z'Hz + q'z + r
    subj. to  Gz <= W          (20)

where z in R^s and H = H' > 0, H in R^{s x s}.

Other QP forms often include equality and inequality constraints.

Let P be the feasible set. Two cases can occur if P is not empty:

Case 1. The optimizer lies strictly inside the feasible polyhedron.
Case 2. The optimizer lies on the boundary of the feasible polyhedron.

[Figure: Level sets z'Hz + q'z + r = k_i over the polyhedron P: (a) Case 1, z* in the interior of P; (b) Case 2, z* on the boundary of P.]
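A hedged sketch of solving a small QP of form (20) with scipy.optimize.minimize (SLSQP); the problem data are invented, and SciPy has no dedicated quadprog, so the quadratic cost and its gradient are passed explicitly:

```python
import numpy as np
from scipy.optimize import minimize

# QP data (example): min 1/2 z'Hz + q'z  s.t.  G z <= W
H = np.array([[2.0, 0.0],
              [0.0, 2.0]])
q = np.array([-2.0, -4.0])
G = np.array([[1.0, 1.0]])
W = np.array([1.0])

obj = lambda z: 0.5 * z @ H @ z + q @ z
# SLSQP expects inequality constraints as fun(z) >= 0, i.e. W - Gz >= 0
cons = {"type": "ineq", "fun": lambda z: W - G @ z}
res = minimize(obj, x0=np.zeros(2), jac=lambda z: H @ z + q,
               constraints=[cons], method="SLSQP")
```

Here the unconstrained minimizer -H^{-1}q = [1, 2] violates z1 + z2 <= 1, so the optimizer z* = [0, 1] lies on the boundary (Case 2 above).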
Dual of QP

    min_z  (1/2) z'Hz + q'z
    subj. to  Gz <= W

The Lagrange function is

    L(z, u) = (1/2) z'Hz + q'z + u'(Gz - W).

The dual cost is

    Theta(u) = min_z { (1/2) z'Hz + q'z + u'(Gz - W) }     (21)

and the dual problem is

    max_{u >= 0} min_z { (1/2) z'Hz + q'z + u'(Gz - W) }.

For a given u the Lagrange function (1/2) z'Hz + q'z + u'(Gz - W) is convex in z. Therefore it is necessary and sufficient for optimality that the gradient be zero:

    Hz + q + G'u = 0.
Dual of QP

From the previous equation we can derive z = -H^{-1}(q + G'u) and, substituting this into equation (21), we obtain:

    Theta(u) = -(1/2) u'(G H^{-1} G')u - u'(W + G H^{-1} q) - (1/2) q'H^{-1}q     (22)

By using (22) the dual problem can be rewritten as:

    min_u  (1/2) u'(G H^{-1} G')u + u'(W + G H^{-1} q) + (1/2) q'H^{-1}q
    subj. to  u >= 0
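The closed form (22) can be sanity-checked numerically: for a feasible QP and the optimal multiplier, Theta(u*) should equal the primal optimal cost (strong duality). A sketch with invented data, where z* and u* were worked out by hand from the KKT conditions:

```python
import numpy as np

# Small QP: min 1/2 z'Hz + q'z  s.t.  Gz <= W  (example data)
H = np.array([[2.0, 0.0], [0.0, 2.0]])
q = np.array([-2.0, -4.0])
G = np.array([[1.0, 1.0]])
W = np.array([1.0])

Hinv = np.linalg.inv(H)

def dual_cost(u):
    # Theta(u) from equation (22)
    return (-0.5 * u @ (G @ Hinv @ G.T) @ u
            - u @ (W + G @ Hinv @ q)
            - 0.5 * q @ Hinv @ q)

# Hand-computed primal optimizer z* = [0, 1] and dual optimizer u* = 2
z_star = np.array([0.0, 1.0])
u_star = np.array([2.0])

primal = 0.5 * z_star @ H @ z_star + q @ z_star
print(np.isclose(primal, dual_cost(u_star)))  # strong duality: prints True
```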
KKT condition for QP

Consider the QP (20). Here grad f(z) = Hz + q, g_i(z) = G_i z - W_i, and grad g_i(z) = G_i'. The KKT conditions become

    Hz + q + G'u = 0,                        (23a)
    u_i (G_i z - W_i) = 0,  i = 1, ..., m,   (23b)
    u >= 0,                                  (23c)
    Gz - W <= 0.                             (23d)
Active Constraints and Degeneracies

Let J = {1, ..., m} be the set of constraint indices. Consider the sets of active and inactive constraints at a feasible z:

    A(z) = {j in J : G_j z = W_j}
    NA(z) = {j in J : G_j z < W_j}.

We have two cases:

Case 1. A(z*) is empty.
Case 2. A(z*) is a nonempty subset of {1, ..., m}.

The QP is said to be primal degenerate if there exists a z* such that the number of active constraints at z* is greater than the number of variables s.
The QP is said to be dual degenerate if its dual problem is primal degenerate.
Constrained Least-Squares Problems

The problem of minimizing the convex quadratic function

    ||Az - b||_2^2 = z'A'Az - 2b'Az + b'b

arises in many fields and has many names, e.g., linear regression or least-squares approximation. The minimizer is

    z* = (A'A)^{-1} A'b = A^+ b,

where A^+ denotes the pseudoinverse of A. When linear inequality constraints are added, the problem is called constrained linear regression or constrained least-squares, and there is no longer a simple analytical solution. As an example we can consider regression with lower and upper bounds on the variables, i.e.,

    min_z  ||Az - b||_2^2
    subj. to  l_i <= z_i <= u_i,  i = 1, ..., n,

which is a QP.
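A sketch of the box-constrained case with scipy.optimize.lsq_linear, which solves exactly this bounded-variable least-squares QP; the data below are invented:

```python
import numpy as np
from scipy.optimize import lsq_linear

# Box-constrained least squares: min ||Az - b||_2^2, l <= z <= u (example data)
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([2.0, -1.0, 0.0])

# Unconstrained minimizer z* = (A'A)^{-1} A'b, via lstsq
z_unc = np.linalg.lstsq(A, b, rcond=None)[0]

# Now require 0 <= z_i <= 1; z_unc violates the bounds, so a QP solve is needed
res = lsq_linear(A, b, bounds=(0.0, 1.0))
```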
Example

Consider

    min_z  ||Az - b||_2^2

where

    A = [ 0.7513  0.5472  0.8143 ]       b = [ 0.6160 ]
        [ 0.2551  0.1386  0.2435 ]           [ 0.4733 ]
        [ 0.5060  0.1493  0.9293 ]           [ 0.3517 ]
        [ 0.6991  0.2575  0.3500 ]           [ 0.8308 ]
        [ 0.8909  0.8407  0.1966 ]           [ 0.5853 ]
        [ 0.9593  0.2543  0.2511 ]           [ 0.5497 ]

Unconstrained least-squares: in Matlab, z = A\b or z = quadprog(A'*A, -A'*b). This gives z* = [0.7166, -0.0205, 0.1180]'.

Now assume the constraint z2 >= 0 is added. Constrained least-squares in Matlab: z = quadprog(A'*A, -A'*b, [0 -1 0], 0). This gives z* = [0.7045, 0, 0.1194]'.
Nonlinear Programming

Consider

    min_z  f(z)
    subj. to  g_i(z) <= 0,  i = 1, ..., m
              h_i(z) = 0,   i = 1, ..., p
              z in Z

A variety of software packages exists.
In general, global optimality is not guaranteed.
Solutions are usually computed by recursive algorithms which start from an initial guess z_0 and at step k generate a point z_k such that the sequence {f(z_k)}_{k=0,1,2,...} converges to J*.
These algorithms recursively use and/or solve analytical conditions for optimality.
In this class we will use NPSOL.
Nonlinear Programming - NPSOL

Possible syntax:

    [INFORM,ITER,ISTATE,C,CJAC,CLAMDA,OBJF,OBJGRAD,R,X] =
        npsol(X0,A,L,U,funobj,funcon,OPTION);

This solves

    min_z  funobj(z)
    subj. to  L <= [ z ; A z ; funcon(z) ] <= U

NPSOL Manual and Example on bSpace.
Note that it is a mex-function.
Nonlinear Programming - Matlab Optimization Toolbox

Possible syntax:

    [X,FVAL,EXITFLAG] = fmincon(funobj,X0,A,B,Aeq,Beq,LB,UB,NONLCON);

This solves

    min_z  funobj(z)
    subj. to  A z <= B
              Aeq z = Beq
              LB <= z <= UB
              Ceq(z) = 0
              C(z) <= 0

The function NONLCON accepts z and returns the vectors C(z) and Ceq(z).
Type help fmincon in Matlab.
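The same call pattern exists in SciPy; a sketch of an fmincon-like solve with scipy.optimize.minimize on an invented example problem. Note that SciPy's "ineq" convention is fun(z) >= 0, the opposite sign of fmincon's C(z) <= 0:

```python
import numpy as np
from scipy.optimize import minimize

# Example problem: min (z1 - 1)^2 + (z2 - 2)^2
# subject to z1 + z2 <= 2 (linear) and z1^2 + z2^2 <= 2 (nonlinear)
obj = lambda z: (z[0] - 1.0) ** 2 + (z[1] - 2.0) ** 2

cons = [
    {"type": "ineq", "fun": lambda z: 2.0 - z[0] - z[1]},        # A z <= B
    {"type": "ineq", "fun": lambda z: 2.0 - z[0]**2 - z[1]**2},  # C(z) <= 0
]
res = minimize(obj, x0=np.zeros(2), constraints=cons, method="SLSQP")
# res.x is the local optimizer found from x0; global optimality is not guaranteed
```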