15-780 – Numerical Optimization

J. Zico Kolter

January 29, 2014

Overview
• Introduction to mathematical programming problems

• Applications

• Classification of optimization problems

• (Linear algebra review)

• Convex optimization problems

• Nonconvex optimization problems

• Solving optimization problems


Introduction to mathematical optimization

• Casting AI problems as optimization / mathematical programming problems has been one of the primary trends of the last 15 years

• A topic not highlighted in the textbook (see website for additional readings)

• A seemingly remarkable fact:

                              Search problems    Mathematical programs
  Variable type               Discrete           Continuous
  # of possible solutions     Finite             Infinite
  “Difficulty” of solving     Exponential        Polynomial (often)
Formal definition
• Mathematical programs are problems of the form

      minimize_x   f(x)
      subject to   g_i(x) ≤ 0,   i = 1, …, m

  – x ∈ R^n is the optimization variable

  – f : R^n → R is the objective function

  – g_i : R^n → R are the (inequality) constraint functions

• Feasible region: C = {x : g_i(x) ≤ 0, i = 1, …, m}

• x⋆ ∈ R^n is an optimal solution if x⋆ ∈ C and

      f(x⋆) ≤ f(x) for all x ∈ C
Note on naming

• Mathematical program (or mathematical optimization problem, and sometimes just optimization problem) refers to a problem of the form

      minimize_x   f(x)
      subject to   g_i(x) ≤ 0,   i = 1, …, m

• Numerical optimization (or mathematical optimization, or just optimization) refers to the general study of these problems, as well as methods for solving them on a computer
Least-squares fitting

[Figure: data points (a_i, b_i) with fitted line h(a); the vertical residual at each point is h(a_i) − b_i]

• Given a_i, b_i, i = 1, …, m, find h(a) = x_1 a + x_2 that optimizes

      minimize_x   Σ_{i=1}^m (x_1 a_i + x_2 − b_i)²

  (x_1 is the slope, x_2 is the intercept)


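As a concrete sketch of this fit with numpy's least-squares solver (the data below is synthetic, invented purely for illustration):

import numpy as np

# synthetic data: b ≈ 2a + 1 plus noise
a = np.linspace(0, 10, 50)
b = 2 * a + 1 + 0.5 * np.random.randn(50)

# design matrix with columns [a_i, 1], so A @ [x1, x2] = x1*a_i + x2
A = np.column_stack([a, np.ones_like(a)])

# lstsq minimizes sum_i (x1*a_i + x2 - b_i)^2
x, *_ = np.linalg.lstsq(A, b, rcond=None)
print("slope x1 =", x[0], "intercept x2 =", x[1])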
Weber point
[Figure: points (a_i, b_i) in the plane and a candidate point (x_1, x_2)]

• Given m points in 2D space (a_i, b_i), i = 1, …, m, find the point that minimizes the sum of the Euclidean distances

      minimize_x   Σ_{i=1}^m √((x_1 − a_i)² + (x_2 − b_i)²)

• Many modifications, e.g. keep x within the range (a_l, b_l), (a_u, b_u):

      minimize_x   Σ_{i=1}^m √((x_1 − a_i)² + (x_2 − b_i)²)
      subject to   a_l ≤ x_1 ≤ a_u,   b_l ≤ x_2 ≤ b_u
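There is no closed-form solution, but a generic solver works well; a minimal sketch with scipy on synthetic points (for the box-constrained variant, one could additionally pass bounds= with method='L-BFGS-B'):

import numpy as np
from scipy.optimize import minimize

pts = np.random.randn(10, 2)  # m = 10 synthetic points (a_i, b_i)

def f(x):
    # sum of Euclidean distances from x to every point
    return np.sum(np.linalg.norm(pts - x, axis=1))

res = minimize(f, x0=pts.mean(axis=0))
print(res.x)  # approximate Weber point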
Optimization problems

[Figure: Venn diagram of problem classes. Within the convex problems, nested classes: linear programming ⊂ quadratic programming ⊂ semidefinite programming; integer programming lies in the nonconvex region.]
Notation
• A (column) vector with n real-valued entries:

      x ∈ R^n

  x_i denotes the ith entry

• A matrix with real-valued entries, m rows, and n columns:

      A ∈ R^{m×n}

  A_{ij} denotes the entry in the ith row and jth column

• The transpose operator A^T switches the rows and columns of a matrix:

      A_{ij} = (A^T)_{ji}
Operations

• Addition: For A, B ∈ R^{m×n},

      (A + B)_{ij} = A_{ij} + B_{ij}

• Multiplication: For A ∈ R^{m×n}, B ∈ R^{n×p},

      (AB)_{ij} = Σ_{k=1}^n A_{ik} B_{kj},   AB ∈ R^{m×p}

  – associative: A(BC) = (AB)C = ABC
  – distributive: A(B + C) = AB + AC
  – not commutative: AB ≠ BA (in general)
  – transpose of product: (AB)^T = B^T A^T
• Inner product: For x, y ∈ R^n,

      x^T y = ⟨x, y⟩ = Σ_{i=1}^n x_i y_i

• Identity matrix: I ∈ R^{n×n} has ones on the diagonal and zeros elsewhere; it has the property that IA = A

• Matrix inverse: For A ∈ R^{n×n}, A^{-1} is the unique matrix such that

      AA^{-1} = I = A^{-1}A

  (may not exist for all square matrices)

  – inverse of product: (AB)^{-1} = B^{-1} A^{-1} when A and B are both invertible
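These identities are easy to sanity-check numerically; a quick sketch with random matrices (random square matrices are invertible with probability 1, which this check relies on):

import numpy as np

A = np.random.randn(3, 4)
B = np.random.randn(4, 2)
# transpose of a product reverses the order
print(np.allclose((A @ B).T, B.T @ A.T))                 # True

C = np.random.randn(3, 3)
D = np.random.randn(3, 3)
# inverse of a product reverses the order too
print(np.allclose(np.linalg.inv(C @ D),
                  np.linalg.inv(D) @ np.linalg.inv(C)))  # True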
Convex optimization problems

• An extremely powerful subset of all optimization problems

• Roughly speaking, they allow for efficient (polynomial time) global solutions

• Beautiful theory

• Lots of applications (but be careful: lots of problems we'd like to solve are definitely not convex)

• At this point, a fairly mature technology (off-the-shelf libraries for medium-sized problems)
Formal definition

• A convex optimization problem is a specialization of a mathematical programming problem:

      minimize_x   f(x)
      subject to   x ∈ C

  where x ∈ R^n is the optimization variable, f : R^n → R is a convex function, and the feasible region C is a convex set
Convex sets
• A set C is convex if, for any x, y ∈ C and θ ∈ [0, 1],

      θx + (1 − θ)y ∈ C

[Figure: a convex set, where every segment between two member points stays inside, and a nonconvex set, where some segment leaves the set]

• All of R^n: x, y ∈ R^n =⇒ θx + (1 − θ)y ∈ R^n

• Intervals: C = {x ∈ R^n : l ≤ x ≤ u} (elementwise inequality)
• Linear inequalities: C = {x ∈ R^n : Ax ≤ b}, A ∈ R^{m×n}, b ∈ R^m

  Proof: for x, y ∈ C and θ ∈ [0, 1],

      x, y ∈ C =⇒ Ax ≤ b, Ay ≤ b
               =⇒ θAx ≤ θb, (1 − θ)Ay ≤ (1 − θ)b
               =⇒ θAx + (1 − θ)Ay ≤ b
               =⇒ A(θx + (1 − θ)y) ≤ b
               =⇒ θx + (1 − θ)y ∈ C

• Linear equalities: C = {x ∈ R^n : Ax = b}, A ∈ R^{m×n}, b ∈ R^m

• Intersection of convex sets: C = ∩_{i=1}^m C_i for convex sets C_i, i = 1, …, m
Convex functions
• A function f : R^n → R is convex if, for any x, y ∈ R^n and θ ∈ [0, 1],

      f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)

[Figure: graph of a convex function; the chord between (x, f(x)) and (y, f(y)) lies above the graph]

• f is concave if −f is convex

• f is affine if f is both convex and concave; it must have the form

      f(x) = a^T x + b

  for a ∈ R^n, b ∈ R
Testing for convexity
• A convex function must “curve upwards” everywhere

• For functions with scalar input f : R → R, this is equivalent to the condition that f″(x) ≥ 0 for all x

• For vector-input functions, the corresponding condition is that

      ∇²_x f(x) ⪰ 0

  where ∇²_x f(x) is the Hessian of f,

      (∇²_x f(x))_{ij} = ∂²f(x) / (∂x_i ∂x_j)

  and ⪰ 0 denotes positive semidefiniteness: for all z ∈ R^n,

      z^T ∇²_x f(x) z ≥ 0
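A numerical spot-check of this condition, as a sketch (checking the smallest Hessian eigenvalue; for a non-quadratic f the condition would have to hold at every x, not just one point):

import numpy as np

# Hessian of f(x) = x^T x is 2I; positive semidefinite iff its
# smallest eigenvalue is >= 0
H = 2 * np.eye(4)
print(np.linalg.eigvalsh(H).min() >= 0)      # True: f is convex

# counterexample: the Hessian of f(x) = x1^2 - x2^2 is indefinite
H_bad = np.diag([2.0, -2.0])
print(np.linalg.eigvalsh(H_bad).min() >= 0)  # False: not convex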
Examples
• Exponential: f(x) = exp(ax)  [f″(x) = a² exp(ax) ≥ 0 for all x]

• Negative logarithm: f(x) = −log(x)  [f″(x) = 1/x² ≥ 0 for all x > 0]

• Squared Euclidean distance: f(x) = x^T x  [after some derivation, you'll find that ∇²_x f(x) = 2I for all x, which is positive semidefinite since z^T (2I) z = 2 z^T z ≥ 0]

• Euclidean distance: f(x) = ‖x‖₂ = √(x^T x)

• Non-negative weighted sum of convex functions:

      f(x) = Σ_{i=1}^m w_i f_i(x)

  where w_i ≥ 0 and f_i convex for all i = 1, …, m

• Negative square root: f(x) = −√x, convex for x ≥ 0

• Log-sum-exp: f(x) = log(e^{x_1} + e^{x_2} + ⋯ + e^{x_n})

• Maximum of convex functions:

      f(x) = max_{i=1,…,m} f_i(x)

  where f_i convex for all i = 1, …, m

• Composition of a convex and an affine function: if f(y) is convex in y, then f(Ax − b) is convex in x

• Sublevel sets: for f convex, C = {x : f(x) ≤ c} is a convex set
Convex optimization problems (again)

• Using the sublevel-set property, it's more common to write generic convex optimization problems as

      minimize_x   f(x)
      subject to   g_i(x) ≤ 0,   i = 1, …, m
                   h_i(x) = 0,   i = 1, …, p

  where f (the objective) is convex, the g_i's (inequality constraints) are convex, and the h_i's (equality constraints) are affine

• Key property of convex optimization problems: all local solutions are global solutions
• Definition: a point x is globally optimal if x is feasible and there is no feasible y such that f(y) < f(x)

• Definition: a point x is locally optimal if x is feasible and there exists some R > 0 such that for all feasible y with ‖x − y‖₂ ≤ R, f(x) ≤ f(y)

• Theorem: for a convex optimization problem, all locally optimal points are globally optimal

Proof (by contradiction): Suppose x is locally optimal but there exists a feasible y such that f(y) < f(x). Pick the point

      z = θx + (1 − θ)y,   θ = 1 − R / (2‖x − y‖₂)

We will show ‖x − z‖₂ < R, z is feasible, and f(z) < f(x), contradicting local optimality.
Proof (cont):

      ‖x − z‖₂ = ‖x − (1 − R/(2‖x − y‖₂)) x − (R/(2‖x − y‖₂)) y‖₂
               = ‖(R/(2‖x − y‖₂)) (x − y)‖₂
               = R/2 < R

Furthermore, by convexity of the feasible set, z is feasible. Finally, by convexity of f, and since f(y) < f(x) and 1 − θ > 0,

      f(z) ≤ θf(x) + (1 − θ)f(y) < θf(x) + (1 − θ)f(x) = f(x)

This shows all the points above, contradicting the supposition that x was locally optimal while some feasible y had f(y) < f(x).
Example: least-squares

• The general (potentially multi-variable) least-squares problem is given by

      minimize_x   ‖Ax − b‖₂²

  for optimization variable x ∈ R^n and data matrices A ∈ R^{m×n}, b ∈ R^m, where ‖y‖₂² is the squared ℓ₂ norm of a vector y ∈ R^m:

      ‖y‖₂² = y^T y = Σ_{i=1}^m y_i²

• A convex optimization problem (why?)
Example: Weber point

• The Weber point in n dimensions:

      minimize_x   Σ_{i=1}^m ‖x − a_i‖₂

  where x ∈ R^n is the optimization variable and a_1, …, a_m are problem data

• A convex optimization problem (why?)
Example: linear programming
• General form of a linear program:

      minimize_x   c^T x
      subject to   Ax = b
                   Fx ≤ g

  where x ∈ R^n is the optimization variable and A ∈ R^{m×n}, b ∈ R^m, F ∈ R^{p×n}, g ∈ R^p are problem data

• You may have also seen this with just the inequality constraint x ≥ 0 (the two forms are equivalent, why?)

• A convex problem (affine objective, convex constraints)
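A sketch of this form in cvxpy (the library introduced at the end of the lecture) with random data; the point z is constructed so the constraints can be met, though a random LP can still be unbounded, hence the status check:

import cvxpy as cp
import numpy as np

n, m, p = 5, 2, 4
z = np.random.randn(n)                  # a point forced to be feasible
c = np.random.randn(n)
A = np.random.randn(m, n); b = A @ z
F = np.random.randn(p, n); g = F @ z + 1

x = cp.Variable(n)
prob = cp.Problem(cp.Minimize(c @ x), [A @ x == b, F @ x <= g])
prob.solve()
print(prob.status, x.value)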
Example: quadratic programming

• General form of a quadratic program:

      minimize_x   x^T Qx + r^T x
      subject to   Ax = b
                   Fx ≤ g

  where x ∈ R^n is the optimization variable and A ∈ R^{m×n}, b ∈ R^m, F ∈ R^{p×n}, g ∈ R^p are problem data

• A convex problem if Q is positive semidefinite
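The quadratic case looks much the same in code; in this sketch Q is taken to be M^T M (always positive semidefinite), written as sum_squares so the solver sees the convexity directly, and the constraints are simple illustrative ones rather than a general Ax = b, Fx ≤ g:

import cvxpy as cp
import numpy as np

n = 5
M = np.random.randn(n, n)
r = np.random.randn(n)

x = cp.Variable(n)
# x^T (M^T M) x = ||M x||_2^2, so this objective is x^T Q x + r^T x
objective = cp.Minimize(cp.sum_squares(M @ x) + r @ x)
prob = cp.Problem(objective, [cp.sum(x) == 1, x >= 0])
prob.solve()
print(x.value)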
Nonconvex problems

• Any optimization problem

      minimize_x   f(x)
      subject to   g_i(x) ≤ 0,   i = 1, …, m
                   h_i(x) = 0,   i = 1, …, p

  where f or any g_i is not convex, or any h_i is not affine, is a nonconvex problem

• Two classes of approaches for solving nonconvex problems: local and global
• Local methods: Given some initial point x_0, repeatedly search “nearby” points until finding a (feasible) solution x̂ that is better than all its nearby points

  – Typically the same approximate computational complexity as convex methods (often just use convex methods directly)

  – Can fail to find any feasible solution at all

• Global methods: Find an actual optimal solution x⋆ over the entire domain of feasible solutions

  – Typically exponential time complexity

• Both are used extremely frequently in practice, for different types of problems
Integer programming

• A class of nonconvex problems that we will look at in more detail later:

      minimize_x   c^T x
      subject to   Ax = b
                   Fx ≤ g
                   x ∈ Z^n

  with optimization variable x and problem data A, b, F, g as in linear programming

• A nonconvex problem (why?)
Solving optimization problems
• Starting with the unconstrained, one-dimensional case

[Figure: a one-dimensional function f(x) with a flat point at its minimum]

• To find the minimum point x⋆, we can look at the derivative of the function, f′(x): any location where f′(x) = 0 will be a “flat” point of the function

• For convex problems, this is guaranteed to be a minimum (instead of a maximum)
• Generalization for a multivariate function f : R^n → R: the gradient of f must be zero,

      ∇_x f(x) = 0

• For f defined as above, the gradient is an n-dimensional vector containing the partial derivatives with respect to each dimension:

      (∇_x f(x))_i = ∂f(x) / ∂x_i

• Important: in general this is only a necessary condition for unconstrained optimality; for convex f it is also sufficient
How do we find ∇_x f(x) = 0?

• Direct solution: Analytically compute ∇_x f(x) and set the resulting equation to zero

  – Example: quadratic function

        f(x) = x^T Qx + r^T x  =⇒  ∇_x f(x) = 2Qx + r  =⇒  x⋆ = −(1/2) Q^{-1} r

    for Q ∈ R^{n×n}, r ∈ R^n

  – Will be a minimum assuming Q is positive definite
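In code one would solve the linear system rather than form Q^{-1} explicitly; a small sketch with a random positive definite Q:

import numpy as np

n = 4
M = np.random.randn(n, n)
Q = M.T @ M + np.eye(n)        # positive definite by construction
r = np.random.randn(n)

# minimizer satisfies 2 Q x + r = 0; solve the system directly
x_star = np.linalg.solve(2 * Q, -r)
print(np.allclose(2 * Q @ x_star + r, 0))  # gradient vanishes at x_star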
• Gradient descent: Take small steps in the direction of the current (negative) gradient

  – Intuition: the negative gradient of a function always points “downhill” (in fact, downhill in the steepest possible direction, for a multivariate function)

[Figure: a one-dimensional f(x) with the slope f′(x_0) drawn at a point x_0]

  – Repeat: x ← x − α ∇_x f(x), where α > 0 is a step size
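A minimal gradient descent loop for the quadratic example above; the fixed step size is an illustrative choice, small enough to converge for this particular Q:

import numpy as np

Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])     # positive definite
r = np.array([1.0, -1.0])

def grad(x):
    return 2 * Q @ x + r       # gradient of x^T Q x + r^T x

x = np.zeros(2)
alpha = 0.1                    # fixed step size
for _ in range(200):
    x = x - alpha * grad(x)

print(x)                           # gradient descent answer
print(np.linalg.solve(2 * Q, -r))  # closed-form minimizer, for comparison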
• Newton's method: Use a root-finding algorithm to find a solution to the (nonlinear) equation ∇_x f(x) = 0

  – Newton's method: given g : R → R, find g(x) = 0 by the iteration

        x ← x − g(x) / g′(x)

  – Multivariate generalization for finding ∇_x f(x) = 0:

        x ← x − (∇²_x f(x))^{-1} ∇_x f(x)

  – Each iteration is more expensive than gradient descent, but it can converge to numerical precision much faster
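A sketch of the multivariate iteration on an invented smooth, strictly convex test function f(x) = Σ_i exp(x_i) + x^T x:

import numpy as np

def grad(x):
    return np.exp(x) + 2 * x                        # gradient of f

def hess(x):
    return np.diag(np.exp(x)) + 2 * np.eye(len(x))  # Hessian of f

x = np.zeros(3)
for _ in range(10):
    # Newton step via a linear solve, not an explicit inverse
    x = x - np.linalg.solve(hess(x), grad(x))

print(x, np.linalg.norm(grad(x)))  # gradient norm near machine precision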
Constrained optimization

• Now consider a problem with constraints:

      minimize_x   f(x)
      subject to   g_i(x) ≤ 0,   i = 1, …, m

• Projected gradient: applies if we can come up with an “easy” projection operator P(x) such that for any x, P(x) returns the “closest” feasible point; then iterate

      x ← P(x − α ∇_x f(x))

  – Example: C = {x : x ≥ 0},

        P(x) = max{x, 0} (elementwise)
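A sketch of this iteration for nonnegative least squares, minimizing ‖Ax − b‖₂² subject to x ≥ 0, with P(x) = max{x, 0}; the data is random and the step size is the standard 1/L choice for this objective:

import numpy as np

A = np.random.randn(20, 5)
b = np.random.randn(20)

def grad(x):
    return 2 * A.T @ (A @ x - b)               # gradient of ||Ax - b||_2^2

x = np.zeros(5)
alpha = 1.0 / (2 * np.linalg.norm(A, 2) ** 2)  # 1/L for this objective
for _ in range(500):
    # gradient step, then project onto {x : x >= 0}
    x = np.maximum(x - alpha * grad(x), 0)

print(x)   # approximate nonnegative least-squares solution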
• Barrier method: Approximate the problem via unconstrained optimization:

      minimize_x   f(x) − t Σ_{i=1}^m log(−g_i(x))

  As t → 0, this approaches the original problem

  – Can quickly lead to numerical instability; requires a lot of care to get right

  – Equality constraints need to be handled separately
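A 1-D sketch of the idea on an invented problem, minimize (x − 2)² subject to g(x) = x − 1 ≤ 0, re-solving the barrier problem as t shrinks:

import numpy as np
from scipy.optimize import minimize_scalar

def barrier_obj(x, t):
    if x >= 1:
        return np.inf                   # outside the barrier's domain
    return (x - 2) ** 2 - t * np.log(1 - x)

for t in [1.0, 0.1, 0.01, 0.001]:
    res = minimize_scalar(lambda x: barrier_obj(x, t),
                          bounds=(-5.0, 1.0), method="bounded")
    print(t, res.x)  # approaches the constrained optimum x = 1 as t -> 0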
Practically solving optimization problems

• The good news: for many classes of optimization problems, people have already done all the “hard work” of developing numerical algorithms

• A wide range of tools can take optimization problems in “natural” forms and compute a solution

• Some well-known libraries: CVX (MATLAB), YALMIP (MATLAB), AMPL (custom language), GAMS (custom language)
cvxpy

• We'll be using a relative newcomer to the game: cvxpy (http://github.com/cvxgrp/cvxpy)

• A Python library for specifying and solving convex optimization problems

• Very much “alpha” software, under active development
• Constrained Weber point, given a_i ∈ R², i = 1, …, m:

      minimize_x   Σ_{i=1}^m ‖x − a_i‖₂
      subject to   x_1 + x_2 = 0

• cvxpy code

import cvxpy as cp
import numpy as np

n = 2   # dimension of x
m = 10  # number of points

A = np.random.randn(n, m)   # column i is the point a_i

x = cp.Variable(n)
f = sum(cp.norm(x - A[:, i], 2) for i in range(m))
constraints = [cp.sum(x) == 0]
result = cp.Problem(cp.Minimize(f), constraints).solve()
print(x.value)

Take home points

• Optimization (and formulating AI tasks as optimization problems) has been one of the primary themes in AI of the past 15 years

• Convex optimization problems are a useful (though restricted) subset that can be solved efficiently and that still finds a huge number of applications

• There are many algorithms for solving optimization problems, but a lot of the “hard work” has been done by freely available libraries
