Hybrid Systems.
Basics on Optimization
Francesco Borrelli
Department of Mechanical Engineering,
University of California at Berkeley,
USA
[email protected]
February 3, 2011
References
From my book:
- Main Concepts (Chapter 2)
- Optimality Conditions: Lagrange duality theory and KKT conditions (Chapter 3)
- Polyhedra, Polytopes and Simplices (Chapter 4)
- Linear and Quadratic Programming (Chapter 5)

Note: the following notes have been extracted from:
- Stephen Boyd's lecture notes for his course on convex optimization. The full notes can be downloaded from http://www.stanford.edu/~boyd
- Nonlinear Programming: Theory and Algorithms by Bazaraa, Sherali, Shetty
- LMIs in Control by Scherer and Weiland
- Lectures on Polytopes by Ziegler
Francesco Borrelli (UC Berkeley) Basics on Optimization February 3, 2011 2 / 73
Outline
1. Main concepts
   - Optimization problems
   - Continuous problems
   - Integer and Mixed-Integer Problems
   - Convexity
2. Optimality conditions: Lagrange duality theory and KKT conditions
3. Polyhedra, polytopes and simplices
4. Linear and quadratic programming
Abstract optimization problems
Optimization problems are abundant in everyday economic life (wherever a decision problem arises).
General abstract features of the problem formulation:
- Decision set Z
- Feasible set S ⊆ Z
- Objective function f : S → R

J* ≜ inf_{z∈S} f(z)   (3)

This means: J* ≤ f(z) for all z ∈ S, AND
1. ∃ z̄ ∈ S with f(z̄) = J*, OR
2. ∀ ε > 0 ∃ z ∈ S such that f(z) ≤ J* + ε.

Compute the optimizer z* ∈ S with f(z*) = J*. If z* exists, then rewrite (3) as

J* = min_{z∈S} f(z)   (4)
Concrete Optimization Problems
Consider the Nonlinear Program (NLP)
J* = min_{z∈S} f(z)

Notation:
- If S = ∅ the problem is infeasible and, by convention, J* = +∞.
- If S = Z the problem is said to be unconstrained.
- The set of all optimal solutions is denoted by

argmin_{z∈S} f(z) ≜ {z ∈ S : f(z) = J*}
Continuous problems
Consider the problem
inf_z f(z)
subj. to g_i(z) ≤ 0 for i = 1, ..., m
         h_i(z) = 0 for i = 1, ..., p
         z ∈ Z          (5)

The domain Z of problem (5) is a subset of R^s (the finite-dimensional Euclidean vector space), defined as:

Z = {z ∈ R^s : z ∈ dom f, z ∈ dom g_i, i = 1, ..., m, z ∈ dom h_i, i = 1, ..., p}

A point z̄ ∈ R^s is feasible for problem (5) if z̄ ∈ Z, g_i(z̄) ≤ 0 for i = 1, ..., m, and h_i(z̄) = 0 for i = 1, ..., p.

The set of feasible vectors is

S = {z ∈ R^s : z ∈ Z, g_i(z) ≤ 0, i = 1, ..., m, h_i(z) = 0, i = 1, ..., p}.
Local and Global Optimizer
Let J* be the optimal value of problem (5). A feasible point z* is a global optimizer if f(z*) = J*.

A feasible point z̄ is a local optimizer for problem (5) if there exists an R > 0 such that

f(z̄) = inf_z f(z)
        subj. to g_i(z) ≤ 0 for i = 1, ..., m
                 h_i(z) = 0 for i = 1, ..., p
                 ‖z − z̄‖ ≤ R
                 z ∈ Z          (6)
Active, Inactive and Redundant Constraints
Consider the problem
inf_z f(z)
subj. to g_i(z) ≤ 0 for i = 1, ..., m
         h_i(z) = 0 for i = 1, ..., p
         z ∈ Z

- The i-th inequality constraint g_i(z) ≤ 0 is active at z̄ if g_i(z̄) = 0; otherwise it is inactive.
- Equality constraints are active at all feasible points.
- A constraint is redundant if removing it does not change the feasible set S; this implies that removing a redundant constraint from the optimization problem does not change its solution.
Problem Description
The functions f, g_i and h_i can be available in analytical form or can be described through an oracle model (also called a black-box or subroutine model).

In an oracle model f, g_i and h_i are not known explicitly but can be evaluated by querying the oracle. Often the oracle consists of subroutines which, called with the argument z, return f(z), g_i(z) and h_i(z) and their gradients ∇f(z), ∇g_i(z), ∇h_i(z).
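A minimal Python sketch of the oracle model (the helper name `make_oracle` and the sample objective are illustrative assumptions; the gradient is approximated by central finite differences, as one would do when only function values are available):

```python
def make_oracle(f, eps=1e-6):
    """Wrap a black-box function f into an oracle that returns
    (f(z), approximate gradient of f at z) via central differences."""
    def oracle(z):
        val = f(z)
        grad = []
        for i in range(len(z)):
            zp, zm = list(z), list(z)
            zp[i] += eps
            zm[i] -= eps
            grad.append((f(zp) - f(zm)) / (2 * eps))
        return val, grad
    return oracle

# Example black-box objective f(z) = z1^2 + 2*z2^2
oracle = make_oracle(lambda z: z[0] ** 2 + 2 * z[1] ** 2)
val, grad = oracle([1.0, 1.0])   # val = 3.0, grad ~ [2.0, 4.0]
```

A solver built on this interface never sees the formula for f, only the values the oracle returns.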
Integer and Mixed-Integer Problems
- If the decision set Z in the optimization problem is finite, then the optimization problem is called combinatorial or discrete.
- If Z ⊆ {0, 1}^s, then the problem is said to be integer.
- If Z is a subset of the Cartesian product of an integer set and a real Euclidean space, i.e., Z ⊆ {[z_c, z_b] : z_c ∈ R^{s_c}, z_b ∈ {0, 1}^{s_b}}, then the problem is said to be mixed-integer.

The standard formulation of a mixed-integer nonlinear program is

inf_{[z_c, z_b]} f(z_c, z_b)
subj. to g_i(z_c, z_b) ≤ 0 for i = 1, ..., m
         h_i(z_c, z_b) = 0 for i = 1, ..., p
         z_c ∈ R^{s_c}, z_b ∈ {0, 1}^{s_b}
         [z_c, z_b] ∈ Z          (7)
Convexity
A set S ⊆ R^s is convex if

λ z_1 + (1 − λ) z_2 ∈ S for all z_1, z_2 ∈ S, λ ∈ [0, 1].

A function f : S → R is convex if S is convex and

f(λ z_1 + (1 − λ) z_2) ≤ λ f(z_1) + (1 − λ) f(z_2)

for all z_1, z_2 ∈ S, λ ∈ [0, 1].

A function f : S → R is strictly convex if S is convex and

f(λ z_1 + (1 − λ) z_2) < λ f(z_1) + (1 − λ) f(z_2)

for all z_1, z_2 ∈ S, z_1 ≠ z_2, λ ∈ (0, 1).

A function f : S → R is concave if S is convex and −f is convex.
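The defining inequality can be checked numerically along a segment; a small pure-Python sketch (sampling λ on a grid gives only a necessary-condition test, not a proof of convexity):

```python
def convexity_gap(f, z1, z2, lams):
    """Return the largest violation of the convexity inequality along
    the segment [z1, z2]: max over lam of
    f(lam*z1 + (1-lam)*z2) - (lam*f(z1) + (1-lam)*f(z2)).
    A convex f gives a gap <= 0 on every segment."""
    worst = float("-inf")
    for lam in lams:
        zmid = [lam * a + (1 - lam) * b for a, b in zip(z1, z2)]
        gap = f(zmid) - (lam * f(z1) + (1 - lam) * f(z2))
        worst = max(worst, gap)
    return worst

lams = [k / 10 for k in range(11)]
f_convex = lambda z: z[0] ** 2 + z[1] ** 2
f_nonconvex = lambda z: -(z[0] ** 2)
gap_convex = convexity_gap(f_convex, [0.0, 0.0], [2.0, 2.0], lams)
gap_nonconvex = convexity_gap(f_nonconvex, [0.0], [2.0], lams)
```

Here `gap_convex` is nonpositive while `gap_nonconvex` is strictly positive, witnessing nonconvexity.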
Operations preserving convexity
1. The intersection of an arbitrary number of convex sets is a convex set: if S_n is convex for all n ∈ N+, then ⋂_{n∈N+} S_n is convex. The empty set is convex.
2. The sub-level sets of a convex function f on S are convex: if f(z) is convex, then S_α ≜ {z ∈ S : f(z) ≤ α} is convex.
3. If f_1, ..., f_N are convex, then Σ_{i=1}^N α_i f_i is a convex function for all α_i ≥ 0, i = 1, ..., N.
4. The composition of a convex function f(z) with an affine map z = Ax + b generates a convex function f(Ax + b) of x.
5. A linear function f(z) = c'z is both convex and concave. A quadratic function f(z) = z'Qz + 2s'z + r is convex if and only if Q ⪰ 0.
Optimality Conditions for Unconstrained
Optimization Problems
Theorem (Necessary condition*)
Let f : R^s → R be differentiable at z̄. If there exists a vector d such that ∇f(z̄)'d < 0, then there exists a δ > 0 such that f(z̄ + λd) < f(z̄) for all λ ∈ (0, δ).

The vector d in the theorem above is called a descent direction.

The direction of steepest descent d_s at z̄ is defined as the normalized direction that minimizes ∇f(z̄)'d.

The direction d_s of steepest descent is d_s = −∇f(z̄)/‖∇f(z̄)‖.

Corollary
Let f : R^s → R be differentiable at z̄. If z̄ is a local minimizer, then ∇f(z̄) = 0.
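A minimal steepest-descent iteration in Python illustrating the corollary: the iteration can only stall where ∇f(z̄) = 0 (the quadratic objective and the fixed step size are illustrative assumptions, not part of the slides):

```python
def steepest_descent(grad, z0, step=0.1, iters=200):
    """Iterate z <- z - step * grad(z): repeatedly move along the
    (unnormalized) steepest-descent direction -grad f(z)."""
    z = list(z0)
    for _ in range(iters):
        g = grad(z)
        z = [zi - step * gi for zi, gi in zip(z, g)]
    return z

# f(z) = (z1 - 1)^2 + (z2 + 2)^2, so grad f = (2(z1 - 1), 2(z2 + 2))
grad = lambda z: [2 * (z[0] - 1), 2 * (z[1] + 2)]
z = steepest_descent(grad, [0.0, 0.0])   # converges to (1, -2)
```

For this strongly convex quadratic the iteration contracts toward the unique stationary point (1, −2), where the gradient vanishes.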
Optimality Conditions for Unconstrained
Optimization Problems
Theorem (Sufficient condition*)
Suppose that f : R^s → R is twice differentiable at z̄. If ∇f(z̄) = 0 and the Hessian of f(z) at z̄ is positive definite, then z̄ is a local minimizer.

Theorem (Necessary and sufficient condition*)
Suppose that f : R^s → R is differentiable at z̄. If f is convex, then z̄ is a global minimizer if and only if ∇f(z̄) = 0.

For the constrained problem (5), collect the inequality constraints in g(z) and the equality constraints in h(z), and augment the objective with a weighted sum of the constraint functions, f(z) + u'g(z) + v'h(z). The weights u_i and v_i are called Lagrange multipliers or dual variables.
Duality Theory. The Lagrange Function
Consider the Lagrange function

L(z, u, v) ≜ f(z) + u'g(z) + v'h(z)

Let z ∈ S be feasible. For arbitrary vectors u ≥ 0 and v we trivially have

L(z, u, v) ≤ f(z)

After infimization we infer

inf_{z∈Z} L(z, u, v) ≤ inf_{z∈Z, g(z)≤0, h(z)=0} f(z)

Best lower bound: since u ≥ 0 and v are arbitrary,

sup_{(u,v), u≥0} inf_{z∈Z} L(z, u, v) ≤ inf_{z∈Z, g(z)≤0, h(z)=0} f(z)
Duality Theory. The Dual Problem
Let

Θ(u, v) ≜ inf_{z∈Z} L(z, u, v) ∈ [−∞, +∞]   (8)

Lagrangian dual problem:

sup_{(u,v), u≥0} Θ(u, v)   (9)

Remarks
- Problem (8) (the Lagrangian dual subproblem) is an unconstrained optimization problem. Only points (u, v) with Θ(u, v) > −∞ are interesting.
- Θ(u, v) is always concave, hence (9) is a concave maximization problem, much easier to solve than the primal (which is nonconvex in general).
- Weak duality always holds:

sup_{(u,v), u≥0} Θ(u, v) ≤ inf_{z∈Z, g(z)≤0, h(z)=0} f(z)
Duality Theory. Duality Gap and Certicate of
Optimality
Let:

d* = max_{(u,v), u≥0} Θ(u, v)
J* = min_{z∈Z, g(z)≤0, h(z)=0} f(z)

Then we always have d* ≤ J* (weak duality); the difference J* − d* is called the duality gap. If the problem is convex and a constraint qualification such as Slater's condition holds, then strong duality holds: there exist ū ≥ 0 and v̄ with Θ(ū, v̄) > −∞ and d* = J*. Then

max_{(u,v), u≥0} Θ(u, v) = min_{z∈Z, g(z)≤0, h(z)=0} f(z)

Slater's condition reduces to feasibility when all inequality constraints are linear. Strong duality holds for convex QPs and for feasible LPs.
Certicate of Optimality
- Let z be a feasible point; then f(z) is an upper bound on the optimal cost, i.e., J* ≤ f(z).
- Let (u, v) be a dual feasible point; then Θ(u, v) is a lower bound on the optimal cost, i.e., Θ(u, v) ≤ J*.
- If z and (u, v) are primal and dual feasible, respectively, then Θ(u, v) ≤ J* ≤ f(z). In an iterative algorithm this gives J*, d* ∈ [Θ(u^k, v^k), f(z^k)], a useful stopping criterion.
Complementary slackness
Suppose that z* and (u*, v*) are primal and dual optimal points with zero duality gap. Then

f(z*) = Θ(u*, v*)
      = inf_z ( f(z) + u*'g(z) + v*'h(z) )
      ≤ f(z*) + u*'g(z*) + v*'h(z*)
      ≤ f(z*)

hence we have Σ_{i=1}^m u_i* g_i(z*) = 0 and so

u_i* g_i(z*) = 0, i = 1, ..., m

called the complementary slackness condition:
- i-th constraint inactive at the optimum ⟹ u_i* = 0
- u_i* > 0 at the optimum ⟹ i-th constraint active at the optimum
KKT optimality conditions
Suppose that f, g_i, h_i are differentiable and that z*, (u*, v*) are primal and dual optimal points with zero duality gap. Then

f(z*) + Σ_i u_i* g_i(z*) + Σ_j v_j* h_j(z*) = min_z ( f(z) + Σ_i u_i* g_i(z) + Σ_j v_j* h_j(z) )   (10)

i.e., z* minimizes L(z, u*, v*); therefore

∇f(z*) + Σ_i u_i* ∇g_i(z*) + Σ_j v_j* ∇h_j(z*) = 0
KKT optimality conditions
0 = ∇f(z*) + Σ_{i=1}^m u_i* ∇g_i(z*) + Σ_{j=1}^p v_j* ∇h_j(z*)   (11a)
0 = u_i* g_i(z*), i = 1, ..., m   (11b)
0 ≤ u_i*, i = 1, ..., m   (11c)
0 ≥ g_i(z*), i = 1, ..., m   (11d)
0 = h_j(z*), j = 1, ..., p   (11e)

Conditions (11a)-(11e) are called the Karush-Kuhn-Tucker (KKT) conditions.
KKT optimality conditions
Consider the primal problem (5).

Theorem (Necessary and sufficient condition)
Suppose that problem (5) is convex and that cost and constraints f, g_i and h_i are differentiable at a feasible z*. Then z* is optimal if and only if there exist (u*, v*) that, together with z*, satisfy the KKT conditions.

Theorem (Necessary and sufficient condition)
Let z* be an optimizer of problem (5) and let A* be the set of inequality constraints active at z*. Suppose that f and the g_i are differentiable at z* and that the h_i are continuously differentiable at z*, and that the gradients ∇g_i(z*) for i ∈ A* and ∇h_i(z*) for i = 1, ..., p are linearly independent. If z*, (u*, v*) are optimal, then they satisfy the KKT conditions. In addition, if problem (5) is convex, then z* is optimal if and only if there exist (u*, v*) that, together with z*, satisfy the KKT conditions.
[Figure: geometric interpretation at a boundary point z̄ of the feasible set g(z) ≤ 0 — the gradients ∇g_1(z̄), ∇g_2(z̄), ∇g_3(z̄) of the constraints and the direction −∇f(z̄).]
Rewrite (11a) as

−∇f(z*) = Σ_i u_i* ∇g_i(z*), u_i* ≥ 0,

where by (11b) only the active constraints contribute; i.e., the direction of steepest cost descent belongs to the convex cone spanned by the gradients ∇g_i of the active constraints.
KKT conditions. Example
Under the convexity assumption alone, the KKT conditions are not necessary for optimality. Consider the problem:

min z_1
subj. to (z_1 − 1)^2 + (z_2 − 1)^2 ≤ 1
         (z_1 − 1)^2 + (z_2 + 1)^2 ≤ 1   (12)

[Figure: the two unit circles centered at (1, 1) and (1, −1); the feasible set is the single point (1, 0), where ∇f(1, 0), ∇g_1(1, 0) and ∇g_2(1, 0) are drawn.]

The KKT conditions are necessary if a constraint qualification is satisfied:

∃ z̄ ∈ Z such that g(z̄) ≤ 0, h(z̄) = 0, g_j(z̄) < 0 if g_j is not affine, and z̄ ∈ int Z
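The failure of the KKT conditions for (12) can be verified directly in Python: both constraints are active at the only feasible point (1, 0), yet stationarity (11a) cannot hold because both constraint gradients have a zero first component while ∇f has a nonzero one:

```python
# The feasible set of (12) is the single point (1, 0):
# both constraints hold with equality there.
z = (1.0, 0.0)
g1 = (z[0] - 1) ** 2 + (z[1] - 1) ** 2 - 1   # = 0 (active)
g2 = (z[0] - 1) ** 2 + (z[1] + 1) ** 2 - 1   # = 0 (active)

# Gradients at (1, 0)
grad_f = (1.0, 0.0)                            # f(z) = z1
grad_g1 = (2 * (z[0] - 1), 2 * (z[1] - 1))     # (0, -2)
grad_g2 = (2 * (z[0] - 1), 2 * (z[1] + 1))     # (0,  2)

# Stationarity grad_f + u1*grad_g1 + u2*grad_g2 = 0 requires
# 1 + u1*0 + u2*0 = 0 in the first coordinate: impossible for any u.
first_coord = grad_f[0]   # no multipliers can cancel this term
```

So z* = (1, 0) is optimal (it is the only feasible point), but no multipliers satisfy the KKT conditions: the constraint qualification fails here.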
Outline
1. Main concepts
2. Optimality conditions: Lagrange duality theory and KKT conditions
3. Polyhedra, polytopes and simplices
   - General Set Definitions and Operations
   - Polyhedra Definitions and Representations
   - Basic Operations on Polytopes
4. Linear and quadratic programming
General Set Denitions and Operations
- An n-dimensional ball B(x_0, ρ) is the set B(x_0, ρ) = {x ∈ R^n : ‖x − x_0‖_2 ≤ ρ}. x_0 and ρ are the center and the radius of the ball, respectively.
- Affine sets are sets described by the solutions of a system of linear equations:

  F = {x ∈ R^n : Ax = b, with A ∈ R^{m×n}, b ∈ R^m}.

- The affine combination of x_1, ..., x_k is defined as the point λ_1 x_1 + ... + λ_k x_k where Σ_{i=1}^k λ_i = 1.
- The affine hull of K ⊆ R^n is the set of all affine combinations of points in K and it is

  aff(K) = {λ_1 x_1 + ... + λ_k x_k : x_i ∈ K, i = 1, ..., k, Σ_{i=1}^k λ_i = 1}
General Set Denitions and Operations
- The dimension of an affine set is the dimension of the largest ball of radius ε > 0 included in the set.
- The convex combination of x_1, ..., x_k is defined as the point λ_1 x_1 + ... + λ_k x_k where Σ_{i=1}^k λ_i = 1 and λ_i ≥ 0, i = 1, ..., k.
- The convex hull of a set K ⊆ R^n is the set of all convex combinations of points in K and it is denoted as conv(K):

  conv(K) ≜ {λ_1 x_1 + ... + λ_k x_k : x_i ∈ K, λ_i ≥ 0, i = 1, ..., k, Σ_{i=1}^k λ_i = 1}.

- A cone spanned by a finite set of points K = {x_1, ..., x_k} is defined as

  cone(K) = {Σ_{i=1}^k λ_i x_i : λ_i ≥ 0, i = 1, ..., k}.

- The Minkowski sum of two sets P, Q ⊆ R^n is defined as

  P ⊕ Q ≜ {x + y : x ∈ P, y ∈ Q}.
Polyhedra Denitions and Representations
An H-polyhedron P in R^n denotes an intersection of a finite set of closed halfspaces in R^n:

P = {x ∈ R^n : Ax ≤ b}

[Figure: a two-dimensional H-polyhedron bounded by the halfspaces a_i'x ≤ b_i, i = 1, ..., 5.]

Inequalities which can be removed without changing the polyhedron are called redundant. The representation of an H-polyhedron is minimal if it does not contain redundant inequalities.
Polyhedra Denitions and Representations
A V-polyhedron P in R^n denotes the Minkowski sum:

P = conv(V) ⊕ cone(Y)

for some V = [V_1, ..., V_k] ∈ R^{n×k}, Y = [y_1, ..., y_{k'}] ∈ R^{n×k'}.

- Any H-polyhedron is a V-polyhedron.
- An H-polytope (V-polytope) is a bounded H-polyhedron (V-polyhedron). Any H-polytope is a V-polytope.
- The dimension of a polytope (polyhedron) P is the dimension of its affine hull and is denoted by dim(P).
- A polytope P ⊆ R^n is full-dimensional if dim(P) = n or, equivalently, if it is possible to fit a non-empty n-dimensional ball in P. Otherwise, we say that polytope P is lower-dimensional.
- If ‖P_i^x‖_2 = 1, where P_i^x denotes the i-th row of a matrix P^x, we say that the polytope P = {x : P^x x ≤ P^c} is normalized.
Polyhedra Denitions and Representations
- A linear inequality c'z ≤ c_0 is said to be valid for P if it is satisfied for all points z ∈ P.
- A face of P is any nonempty set of the form

  F = P ∩ {z ∈ R^s : c'z = c_0}

  where c'z ≤ c_0 is a valid inequality for P.
- The faces of dimension 0, 1, dim(P)−2 and dim(P)−1 are called vertices, edges, ridges, and facets, respectively.
- A d-simplex is a polytope of R^d with d + 1 vertices.

[Figure: the same polytope in (a) V-representation and (b) H-representation.]
Polytopal Complexes
A set C ⊆ R^n is called a P-collection (in R^n) if it is a collection of a finite number of n-dimensional polytopes, i.e.,

C = {C_i}_{i=1}^{N_C},

where C_i ≜ {x ∈ R^n : C_i^x x ≤ C_i^c}, dim(C_i) = n, i = 1, ..., N_C, with N_C < ∞.

The underlying set of a P-collection C = {C_i}_{i=1}^{N_C} is the point set

C̄ ≜ ⋃_{P∈C} P = ⋃_{i=1}^{N_C} C_i.

Special Classes
- A collection of sets {C_i}_{i=1}^{N_C} is a strict partition of a set C if (i) ⋃_{i=1}^{N_C} C_i = C and (ii) C_i ∩ C_j = ∅, i ≠ j.
- {C_i}_{i=1}^{N_C} is a strict polyhedral partition of a polyhedral set C if {C_i}_{i=1}^{N_C} is a strict partition of C and the closure C̄_i of each C_i is a polyhedron for all i.
- A collection of sets {C_i}_{i=1}^{N_C} is a partition of a set C if (i) ⋃_{i=1}^{N_C} C_i = C and (ii) (C_i \ ∂C_i) ∩ (C_j \ ∂C_j) = ∅, i ≠ j.
Functions on Polytopal Complexes
- A function h(θ) : Θ → R^k, where Θ ⊆ R^s, is piecewise affine (PWA) if there exists a strict partition R_1, ..., R_N of Θ and h(θ) = H_i θ + k_i, ∀ θ ∈ R_i, i = 1, ..., N.
- A function h(θ) : Θ → R^k, where Θ ⊆ R^s, is piecewise affine on polyhedra (PPWA) if there exists a strict polyhedral partition R_1, ..., R_N of Θ and h(θ) = H_i θ + k_i, ∀ θ ∈ R_i, i = 1, ..., N.
- A function h(θ) : Θ → R, where Θ ⊆ R^s, is piecewise quadratic (PWQ) if there exists a strict partition R_1, ..., R_N of Θ and h(θ) = θ'H_i θ + k_i θ + l_i, ∀ θ ∈ R_i, i = 1, ..., N.
- A function h(θ) : Θ → R, where Θ ⊆ R^s, is piecewise quadratic on polyhedra (PPWQ) if there exists a strict polyhedral partition R_1, ..., R_N of Θ and h(θ) = θ'H_i θ + k_i θ + l_i, ∀ θ ∈ R_i, i = 1, ..., N.
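A PPWA function can be represented in Python as a list of (region, affine law) pairs; a minimal sketch (the numerical tolerance makes the region boundaries overlap slightly, so this is an evaluation sketch rather than a strict partition):

```python
def eval_ppwa(theta, regions):
    """Evaluate a PPWA function given as a list of tuples
    (A, b, H, k): region {theta : A theta <= b}, value H theta + k."""
    for A, b, H, k in regions:
        if all(sum(a * t for a, t in zip(row, theta)) <= bi + 1e-9
               for row, bi in zip(A, b)):
            return [sum(h * t for h, t in zip(Hrow, theta)) + ki
                    for Hrow, ki in zip(H, k)]
    raise ValueError("theta outside the partition")

# h(theta) = |theta| on R as two affine pieces
regions = [
    ([[1.0]], [0.0], [[-1.0]], [0.0]),   # theta <= 0: h = -theta
    ([[-1.0]], [0.0], [[1.0]], [0.0]),   # theta >= 0: h =  theta
]
```

For example, `eval_ppwa([-0.5], regions)` and `eval_ppwa([0.5], regions)` both return `[0.5]`, matching |θ|.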
Basic Operations on Polytopes
Convex hull of a set of points V = {V_i}_{i=1}^{N_V}, with V_i ∈ R^n:

conv(V) = {x ∈ R^n : x = Σ_{i=1}^{N_V} λ_i V_i, 0 ≤ λ_i ≤ 1, Σ_{i=1}^{N_V} λ_i = 1}.   (13)

Used to switch from a V-representation of a polytope to an H-representation.

Vertex enumeration of a polytope P given in H-representation (the dual of the convex hull operation).
Basic Operations on Polytopes
Polytope reduction is the computation of the minimal representation of a polytope. A polytope P ⊆ R^n, P = {x ∈ R^n : P^x x ≤ P^c} is in a minimal representation if the removal of any row in P^x x ≤ P^c would change it (i.e., if there are no redundant constraints).

The Chebychev ball of a polytope P = {x ∈ R^n : P^x x ≤ P^c}, with P^x ∈ R^{n_P×n}, P^c ∈ R^{n_P}, corresponds to the largest-radius ball B(x_c, R) with center x_c such that B(x_c, R) ⊆ P.

[Figure: a polytope and its Chebychev ball.]
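The Chebychev ball is the solution of the LP max R subject to P_i^x x_c + ‖P_i^x‖₂ R ≤ P_i^c. A pure-Python sketch for 2-D polytopes that solves this 3-variable LP by brute-force vertex enumeration (a real implementation would call an LP solver such as scipy.optimize.linprog instead):

```python
from itertools import combinations

def solve3(M, rhs):
    """Solve a 3x3 linear system by Gauss-Jordan elimination;
    return None if the system is (near) singular."""
    A = [row[:] + [r] for row, r in zip(M, rhs)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        if abs(A[piv][col]) < 1e-12:
            return None
        A[col], A[piv] = A[piv], A[col]
        for r in range(3):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][3] / A[i][i] for i in range(3)]

def chebyshev_ball(Px, Pc):
    """Chebychev center/radius of {x in R^2 : Px x <= Pc} via the LP
    max R s.t. Px_i . xc + ||Px_i|| R <= Pc_i, solved here by
    enumerating candidate vertices (triples of active constraints)."""
    rows = [[a, b, (a * a + b * b) ** 0.5] for a, b in Px]
    best = None
    for idx in combinations(range(len(rows)), 3):
        sol = solve3([rows[i] for i in idx], [Pc[i] for i in idx])
        if sol is None:
            continue
        x1, x2, R = sol
        if all(r[0] * x1 + r[1] * x2 + r[2] * R <= c + 1e-9
               for r, c in zip(rows, Pc)):
            if best is None or R > best[2]:
                best = (x1, x2, R)
    return best

# Unit square [-1, 1]^2: center (0, 0), radius 1
Px = [(1, 0), (-1, 0), (0, 1), (0, -1)]
Pc = [1, 1, 1, 1]
xc1, xc2, R = chebyshev_ball(Px, Pc)
```

Enumeration is exponential in the number of constraints, which is exactly why one uses an LP solver in practice; the sketch only illustrates the LP formulation.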
Basic Operations on Polytopes
Projection. Given a polytope

P = {[x' y']' ∈ R^{n+m} : P^x x + P^y y ≤ P^c} ⊆ R^{n+m},

the projection onto the x-space R^n is defined as

proj_x(P) ≜ {x ∈ R^n : ∃ y ∈ R^m such that P^x x + P^y y ≤ P^c}.
Basic Operations on Polytopes
Set-Difference. The set-difference of two polytopes Y and R_0,

R = Y \ R_0 ≜ {x ∈ R^n : x ∈ Y, x ∉ R_0},

in general can be a nonconvex and disconnected set and can be described as a P-collection R = ⋃_{i=1}^m R_i, where Y = (⋃_{i=1}^m R_i) ∪ (R_0 ∩ Y).

The P-collection R = ⋃_{i=1}^m R_i can be computed by consecutively inverting the halfspaces defining R_0, as described in the following theorem.

Theorem
Let Y ⊆ R^n be a polyhedron, R_0 ≜ {x ∈ R^n : Ax ≤ b}, and R̄_0 ≜ {x ∈ Y : Ax ≤ b} = R_0 ∩ Y, where b ∈ R^m, R̄_0 ≠ ∅ and Ax ≤ b is a minimal representation of R_0. Also let

R_i = {x ∈ Y : A_i x > b_i, A_j x ≤ b_j ∀ j < i}, i = 1, ..., m.

Let R ≜ ⋃_{i=1}^m R_i. Then R is a P-collection and {R̄_0, R_1, ..., R_m} is a strict polyhedral partition of Y.
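The halfspace-inversion construction of the theorem translates directly into code; a pure-Python sketch that represents each region R_i by a membership predicate (membership in Y is assumed handled by the caller, and the hypotheses R̄_0 ≠ ∅, minimal representation, etc. are not checked):

```python
def set_difference_regions(A, b):
    """Regions R_i = {x : A_i x > b_i, A_j x <= b_j for j < i}
    covering the complement of R0 = {x : A x <= b}, as predicates."""
    def make(i):
        def in_Ri(x):
            dots = [sum(a * xi for a, xi in zip(row, x)) for row in A]
            return dots[i] > b[i] and all(dots[j] <= b[j]
                                          for j in range(i))
        return in_Ri
    return [make(i) for i in range(len(A))]

# R0 = unit box [0, 1]^2 (inside some larger Y handled elsewhere)
A = [[1, 0], [-1, 0], [0, 1], [0, -1]]
b = [1, 0, 1, 0]
regions = set_difference_regions(A, b)

x = (1.5, 0.5)                    # a point outside R0
hits = [r(x) for r in regions]    # x lies in exactly one region
```

A point outside R_0 lands in exactly one R_i, and a point inside R_0 in none, mirroring the strict-partition property.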
Basic Operations on Polytopes
[Figure: a box R_0 inside a square X; the halfspaces g_1 ≤ 0, g_2 ≤ 0, ... defining R_0 are inverted one at a time, generating the regions R_1, R_2, ..., R_5 that cover X \ R_0.]

Figure: Two-dimensional example: partition of the rest of the space X \ R_0.
Basic Operations on Polytopes
Pontryagin Difference. The Pontryagin difference (also known as Minkowski difference) of two polytopes P and Q is a polytope

P ⊖ Q ≜ {x ∈ R^n : x + q ∈ P, ∀ q ∈ Q}.

The Minkowski sum of two polytopes P and Q is a polytope

P ⊕ Q ≜ {x ∈ R^n : ∃ y ∈ P, ∃ z ∈ Q, x = y + z}.

[Figure: (a) Pontryagin difference P ⊖ Q; (b) Minkowski sum P ⊕ Q.]
Minkowski Sum of Polytopes
The Minkowski sum is computationally expensive. Consider

P = {y ∈ R^n : P^y y ≤ P^c}, Q = {z ∈ R^n : Q^z z ≤ Q^c};

it holds that

W = P ⊕ Q
  = {x ∈ R^n : ∃ y, z ∈ R^n such that x = y + z, P^y y ≤ P^c, Q^z z ≤ Q^c}
  = {x ∈ R^n : ∃ y ∈ R^n such that P^y y ≤ P^c, Q^z (x − y) ≤ Q^c}
  = {x ∈ R^n : ∃ y ∈ R^n such that [0 P^y; Q^z −Q^z] [x; y] ≤ [P^c; Q^c]}
  = proj_x({[x' y']' ∈ R^{n+n} : [0 P^y; Q^z −Q^z] [x; y] ≤ [P^c; Q^c]}).
Pontryagin Dierence of Polytopes
The Pontryagin difference is not computationally expensive. Consider

P = {y ∈ R^n : P^y y ≤ P^b}, Q = {z ∈ R^n : Q^z z ≤ Q^b}.

Then:

P ⊖ Q = {x ∈ R^n : P^y x ≤ P^b − H(P^y, Q)}

where the i-th element of H(P^y, Q) is

H_i(P^y, Q) ≜ max_{x∈Q} P_i^y x

and P_i^y is the i-th row of the matrix P^y.

For special cases (e.g., when Q is a hypercube), more efficient computational methods exist.
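When Q is a box, each H_i = max_{x∈Q} P_i^y x is attained at a corner and can be computed coordinate-wise, without solving an LP per row; a minimal Python sketch of this special case:

```python
def pontryagin_diff_box(Py, Pb, lo, hi):
    """P - Q for P = {x : Py x <= Pb} and a box Q = [lo, hi]:
    subtract H_i = max_{x in Q} Py_i . x from each right-hand side."""
    def row_max(row):
        # max of a linear function over a box: pick, per coordinate,
        # the box corner that maximizes the product
        return sum(a * (h if a > 0 else l)
                   for a, l, h in zip(row, lo, hi))
    return Py, [c - row_max(row) for row, c in zip(Py, Pb)]

# P = [-2, 2]^2, Q = [-0.5, 0.5]^2  ->  P - Q = [-1.5, 1.5]^2
Py = [[1, 0], [-1, 0], [0, 1], [0, -1]]
Pb = [2, 2, 2, 2]
_, newPb = pontryagin_diff_box(Py, Pb, [-0.5, -0.5], [0.5, 0.5])
```

Shrinking each right-hand side by 0.5 yields the box [−1.5, 1.5]², as expected.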
Basic Operations on Polytopes
Note that (P ⊖ Q) ⊕ Q ⊆ P.

[Figure: (c) two polytopes P and Q; (d) polytope P and the Pontryagin difference P ⊖ Q; (e) polytope P ⊖ Q and the set (P ⊖ Q) ⊕ Q.]

Figure: Illustration that (P ⊖ Q) ⊕ Q ⊆ P.
Ane Mappings and Polyhedra
Consider a polyhedron P = {x ∈ R^n : P^x x ≤ P^c}, with P^x ∈ R^{n_P×n}, and an affine mapping f(z),

f : z ∈ R^m ↦ Az + b, A ∈ R^{n×m}, b ∈ R^n

Define the composition of P and f as the following polyhedron:

P ∘ f ≜ {z ∈ R^m : P^x f(z) ≤ P^c} = {z ∈ R^m : P^x Az ≤ P^c − P^x b}

Useful for backward reachability.
Ane Mappings and Polyhedra
Consider a polyhedron P = {x ∈ R^n : P^x x ≤ P^c}, with P^x ∈ R^{n_P×n}, and an affine mapping

f : x ∈ R^n ↦ Ax + b, A ∈ R^{m_A×n}, b ∈ R^{m_A}

Define the composition of f and P as the following polyhedron:

f ∘ P ≜ {y ∈ R^{m_A} : y = Ax + b for some x ∈ R^n with P^x x ≤ P^c}

The polyhedron f ∘ P can be computed as follows. Write P in V-representation, P = conv(V), and map the vertices V = {V_1, ..., V_k} through the transformation f. Because the transformation is affine, the set f ∘ P is the convex hull of the transformed vertices:

f ∘ P = conv(F), F = {AV_1 + b, ..., AV_k + b}.

If f is invertible, then x = A^{-1}y − A^{-1}b and therefore

f ∘ P = {y ∈ R^{m_A} : P^x A^{-1} y ≤ P^c + P^x A^{-1} b}

Useful for forward reachability.
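The vertex-mapping computation of f ∘ P is a one-liner in Python (the rotation-plus-translation example is illustrative):

```python
def map_polytope_vertices(A, b, vertices):
    """Image f(P) of a polytope under f(x) = Ax + b, computed by
    mapping the V-representation: f(P) = conv({A V_i + b})."""
    return [[sum(a * v for a, v in zip(row, V)) + bi
             for row, bi in zip(A, b)]
            for V in vertices]

# Rotate the unit square by 90 degrees and translate by (1, 0)
A = [[0, -1], [1, 0]]
b = [1, 0]
V = [[0, 0], [1, 0], [1, 1], [0, 1]]
F = map_polytope_vertices(A, b, V)
```

The mapped vertex list F is the V-representation of f ∘ P; a convex-hull routine would then recover a minimal H-representation if needed.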
Outline
1. Main concepts
2. Optimality conditions: Lagrange duality theory and KKT conditions
3. Polyhedra, polytopes and simplices
4. Linear and quadratic programming
Linear Programming
inf_z c'z
subj. to Gz ≤ W

where z ∈ R^s.

- LPs are convex optimization problems.
- Other common forms:

inf_z c'z                         inf_z c'z
subj. to Gz ≤ W          or       subj. to G_eq z = W_eq
         G_eq z = W_eq                     z ≥ 0

- It is always possible to convert one of the three forms into the others.
Graphical Interpretation and Solutions Properties
Let P be the feasible set. P is a polyhedron.

If P is empty, then the problem is infeasible. Denote by Z* = argmin_{z∈P} c'z the set of optimizers.

Case 1. The LP solution is unbounded, i.e., J* = −∞.
Case 2. The LP solution is bounded and unique, i.e., Z* is a singleton.
Case 3. The LP solution is bounded and there are multiple optima: Z* is an uncountable subset of R^s which can be bounded or unbounded.

[Figure: level sets c'z = k_i of the cost over the polyhedron P in the three cases.]

Figure: Graphical interpretation of the linear program solution, k_i < k_{i−1}.
Dual of LP
Consider the LP

inf_z c'z
subj. to Gz ≤ W   (14)

with z ∈ R^s and G ∈ R^{m×s}. The Lagrange function is

L(z, u) = c'z + u'(Gz − W).

The dual cost is

Θ(u) = inf_z L(z, u) = inf_z (c' + u'G)z − u'W = { −u'W   if G'u = −c
                                                 { −∞     if G'u ≠ −c

Since we are interested only in cases where Θ is finite,
the dual problem is

sup_u −u'W
subj. to G'u = −c
         u ≥ 0

which is equivalent to

inf_u W'u
subj. to G'u = −c
         u ≥ 0
KKT condition for LP
The KKT conditions for the LP (14) become

G'u = −c,   (15a)
(G_j z − W_j) u_j = 0,   (15b)
u ≥ 0,   (15c)
Gz ≤ W   (15d)

which are: primal feasibility (15d), dual feasibility (15a), (15c) and complementary slackness conditions (15b).
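The conditions (15a)-(15d) can be checked numerically on a tiny LP solvable by inspection (here with the Lagrangian L(z, u) = c'z + u'(Gz − W), so stationarity reads G'u + c = 0; the problem data are illustrative):

```python
# Tiny LP: min -z  s.t.  z <= 1,  i.e.  c = [-1], G = [[1]], W = [1].
# By inspection z* = 1; the dual variable u* = 1 satisfies the KKT
# conditions.
c, G, W = [-1.0], [[1.0]], [1.0]
z, u = [1.0], [1.0]
m, s = 1, 1

stationarity = [sum(G[j][i] * u[j] for j in range(m)) + c[i]
                for i in range(s)]                      # G'u + c = 0
slackness = [(sum(G[j][i] * z[i] for i in range(s)) - W[j]) * u[j]
             for j in range(m)]                         # (G_j z - W_j) u_j
dual_feasible = all(uj >= 0 for uj in u)                # u >= 0
primal_feasible = all(sum(G[j][i] * z[i] for i in range(s)) <= W[j]
                      for j in range(m))                # Gz <= W
```

All four residuals vanish or hold, certifying optimality of z* = 1 with multiplier u* = 1.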
Active Constraints and Degeneracies
Let J ≜ {1, ..., m} be the set of constraint indices and consider the sets

A(z) ≜ {j ∈ J : G_j z = W_j}, of active constraints at a feasible z,
NA(z) ≜ {j ∈ J : G_j z < W_j}, of inactive constraints at a feasible z.

Case 1. A(z*) is undefined, since z* is undefined.
Case 2. The cardinality of A(z*) characterizes degeneracy: more than s active constraints at z* correspond to primal degeneracy, while multiple optima (Case 3) correspond to dual degeneracy.

Convex Piecewise Linear Optimization
Consider the problem

J* = min_z J(z)
subj. to Gz ≤ W   (16)

where the cost function has the form

J(z) = max_{i=1,...,k} {c_i'z + d_i}   (17)

where c_i ∈ R^s and d_i ∈ R.
[Figure: a convex piecewise affine cost J(x) = max_i f_i(x) built from the affine pieces f_1(x), ..., f_4(x) over the regions P_1, ..., P_4.]
Convex Piecewise Linear Optimization
The cost function J(z) in (17) is a convex PWA function. The optimization
problem (16)-(17) can be solved by the following linear program:
J* = min_{z,ε} ε
subj. to Gz ≤ W
         c_i'z + d_i ≤ ε, i = 1, ..., k
Convex Piecewise Linear Optimization
Consider

J* = min_z J_1(z_1) + J_2(z_2)
subj. to G_1 z_1 + G_2 z_2 ≤ W   (18)

where the cost functions have the form

J_1(z_1) = max_{i=1,...,k} {c_i'z_1 + d_i}
J_2(z_2) = max_{i=1,...,j} {m_i'z_2 + n_i}   (19)

The optimization problem (18)-(19) can be solved by the following linear program:

J* = min_{z, ε_1, ε_2} ε_1 + ε_2
subj. to G_1 z_1 + G_2 z_2 ≤ W
         c_i'z_1 + d_i ≤ ε_1, i = 1, ..., k
         m_i'z_2 + n_i ≤ ε_2, i = 1, ..., j
Example
The optimization problem:

min_{z_1, z_2} |z_1 + 5| + |z_2 − 3|
subject to −2.5 ≤ z_1 ≤ 5
           −1 ≤ z_2 ≤ 1

can be solved in Matlab by using v = linprog(f, A, b), where v = [ε_1, ε_2, z_1, z_2]',

f = [1 1 0 0]',

A = [ 0  0  1  0          b = [  5
      0  0 -1  0                2.5
      0  0  0  1                  1
      0  0  0 -1                  1
     -1  0  1  0                 -5
     -1  0 -1  0                  5
      0 -1  0  1                  3
      0 -1  0 -1 ],              -3 ]

The solution is v* = [2.5, 2, −2.5, 1]'.
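The same epigraph encoding can be written down and sanity-checked in pure Python (the sign pattern in A and b is the point to verify; scipy.optimize.linprog would accept this data directly):

```python
# Epigraph encoding of min |z1+5| + |z2-3| over the box
# -2.5 <= z1 <= 5, -1 <= z2 <= 1, with variables v = [e1, e2, z1, z2]
f = [1, 1, 0, 0]
A = [[0, 0, 1, 0],     # z1 <= 5
     [0, 0, -1, 0],    # -z1 <= 2.5
     [0, 0, 0, 1],     # z2 <= 1
     [0, 0, 0, -1],    # -z2 <= 1
     [-1, 0, 1, 0],    # z1 + 5 <= e1
     [-1, 0, -1, 0],   # -z1 - 5 <= e1
     [0, -1, 0, 1],    # z2 - 3 <= e2
     [0, -1, 0, -1]]   # -z2 + 3 <= e2
b = [5, 2.5, 1, 1, -5, 5, 3, -3]

# Candidate optimizer: z1 = -2.5, z2 = 1, e1 = 2.5, e2 = 2
v_star = [2.5, 2.0, -2.5, 1.0]
feasible = all(sum(a * x for a, x in zip(row, v_star)) <= bi + 1e-9
               for row, bi in zip(A, b))
cost = sum(fi * vi for fi, vi in zip(f, v_star))   # = 4.5
```

The candidate is feasible and its cost 4.5 equals |−2.5 + 5| + |1 − 3|, confirming the encoding.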
Quadratic Programming
min_z ½ z'Hz + q'z + r
subj. to Gz ≤ W   (20)

where z ∈ R^s, H = H' ≻ 0, H ∈ R^{s×s}.

Other QP forms often include equality and inequality constraints.

Let P be the feasible set. Two cases can occur if P is not empty:

Case 1. The optimizer lies strictly inside the feasible polyhedron.
Case 2. The optimizer lies on the boundary of the feasible polyhedron.

[Figure: level sets z'Hz + q'z + r = k_i of the cost over the polyhedron P in the two cases.]
Dual of QP
min_z ½ z'Hz + q'z
subj. to Gz ≤ W

The Lagrange function is

L(z, u) = ½ z'Hz + q'z + u'(Gz − W)

The dual cost is

Θ(u) = min_z { ½ z'Hz + q'z + u'(Gz − W) }.   (21)

For a given u the Lagrange function ½ z'Hz + q'z + u'(Gz − W) is convex. Therefore it is necessary and sufficient for optimality that the gradient be zero:

Hz + q + G'u = 0.
Dual of QP
From the previous equation we can derive z = −H^{-1}(q + G'u) and, substituting this in equation (21), we obtain:

Θ(u) = −½ u'(G H^{-1} G')u − u'(W + G H^{-1} q) − ½ q'H^{-1}q   (22)

By using (22) the dual problem can be rewritten as:

min_u ½ u'(G H^{-1} G')u + u'(W + G H^{-1} q) + ½ q'H^{-1}q
subj. to u ≥ 0
KKT condition for QP
Consider the QP (20). With ∇f(z) = Hz + q, g_i(z) = G_i z − W_i and ∇g_i(z) = G_i', the KKT conditions become

Hz + q + G'u = 0   (23a)
u_i (G_i z − W_i) = 0   (23b)
u ≥ 0   (23c)
Gz − W ≤ 0   (23d)
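For a scalar QP with one constraint, the KKT system (23a)-(23d) can be solved by hand and checked in Python (the problem data are illustrative):

```python
# QP: min 0.5*z^2  s.t.  z <= -1  (H = 1, q = 0, G = 1, W = -1).
# The unconstrained minimum z = 0 is infeasible, so the constraint is
# active: z* = -1 and, from (23a), u* = -(H z* + q)/G = 1.
H, q, G, W = 1.0, 0.0, 1.0, -1.0

z_unc = -q / H
if G * z_unc <= W:
    z, u = z_unc, 0.0              # constraint inactive, u = 0 by (23b)
else:
    z = W / G                      # constraint active
    u = -(H * z + q) / G           # multiplier from (23a)

kkt_a = H * z + q + G * u          # should be 0  (23a)
kkt_b = u * (G * z - W)            # should be 0  (23b)
```

Here z* = −1 and u* = 1 satisfy all four conditions: stationarity and complementary slackness vanish, u* ≥ 0, and Gz* − W = 0.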
Active Constraints and Degeneracies
Let J ≜ {1, ..., m} be the set of constraint indices. Consider the set of active and inactive constraints at a feasible z:

A(z) ≜ {j ∈ J : G_j z = W_j}
NA(z) ≜ {j ∈ J : G_j z < W_j}.

We have two cases:

Case 1. A(z*) = ∅: no constraint is active at the optimum, which therefore coincides with the unconstrained minimizer z* = −H^{-1}q.
Case 2. A(z*) ≠ ∅: the optimizer lies on the boundary, and the number of active constraints at z* determines whether the QP is degenerate.

Unconstrained QPs arise, for example, in least squares:

min_z ‖Az − b‖₂² = min_z z'A'Az − 2b'Az + b'b

The minimizer is

z* = (A'A)^{-1} A'b
When linear inequality constraints are added, the problem is called
constrained linear regression or constrained least-squares, and there is no
longer a simple analytical solution. As an example we can consider regression
with lower and upper bounds on the variables, i.e.,
min_z ‖Az − b‖₂²
subj. to l_i ≤ z_i ≤ u_i, i = 1, ..., n,

which is a QP.
Example
Consider

min_z ‖Az − b‖₂²

where

A = [0.7513 0.5472 0.8143        b = [0.6160
     0.2551 0.1386 0.2435             0.4733
     0.5060 0.1493 0.9293             0.3517
     0.6991 0.2575 0.3500             0.8308
     0.8909 0.8407 0.1966             0.5853
     0.9593 0.2543 0.2511]            0.5497]

Unconstrained least-squares: in Matlab, z = A\b or z = quadprog(A'*A, -A'*b). Adding the constraint [0 -1 0] z ≤ 0 (i.e., z_2 ≥ 0): z = quadprog(A'*A, -A'*b, [0 -1 0], 0). The solution is z* = [0.7045, 0, 0.1194]'.
Nonlinear Programming
Consider
min_z f(z)
subj. to g_i(z) ≤ 0 for i = 1, ..., m
         h_i(z) = 0 for i = 1, ..., p
         z ∈ Z

- A variety of software packages exists.
- In general, global optimality is not guaranteed.
- Solutions are usually computed by recursive algorithms which start from an initial guess z_0 and at step k generate a point z_k such that {f(z_k)}_{k=0,1,2,...} converges to J*.
- These algorithms recursively use and/or solve analytical conditions for optimality.
- In this class we will use NPSOL.
Nonlinear Programming - NPSOL
Possible syntax:

[INFORM,ITER,ISTATE,C,CJAC,CLAMDA,OBJF,OBJGRAD,R,X] =
npsol(X0,A,L,U,funobj,funcon,OPTION);

Solving

min_z funobj(z)
subj. to L ≤ [z; Az; funcon(z)] ≤ U

- NPSOL Manual and Example on bSpace.
- Note it is a mex-function.
Nonlinear Programming - Matlab Optimization toolbox
Possible syntax:

[X,FVAL,EXITFLAG] =
fmincon(funobj,X0,A,B,Aeq,Beq,LB,UB,NONLCON);

Solving

min_z funobj(z)
subj. to A·z ≤ B
         Aeq·z = Beq
         LB ≤ z ≤ UB
         C(z) ≤ 0
         Ceq(z) = 0

- The function NONLCON accepts z and returns the vectors C(z) and Ceq(z).
- Type help fmincon in Matlab.