
Lecture 4

B1 Optimization

Michaelmas 2013

Convexity

Robust cost functions


Optimizing non-convex functions
grid search
branch and bound
multiple coverings
simulated annealing

A. Zisserman

The Optimization Tree

Unconstrained optimization
function of one variable f(x): min_x f(x)

[Figure: a 1D cost curve with a local minimum and a global minimum marked.]

down-hill search (gradient descent) algorithms can find local minima


which of the minima is found depends on the starting point
such minima often occur in real applications

How can you tell if an optimization has a single optimum?

The answer is: see if the optimization problem is convex.


If it is, then a local optimum is the global optimum.

First, we need to introduce


Convex Sets, and
Convex Functions

Note: sketch introduction only

[Figure: examples of a convex set and a non-convex set.]

Convex functions

Convex function examples

[Figure: example functions, labelled convex and not convex.]

A non-negative sum of convex functions is convex

Convex Optimization Problem


Minimize:
a convex function
over a convex set
Then locally optimal points are globally optimal

Also, such problems can be solved both in theory and practice

Why do we need the domain to be convex?

[Figure: a function f(x) over a domain that is not a convex set (two disjoint intervals). A downhill search started in one interval gets stuck at a local optimum; it can't reach the better optimum in the other interval.]

Examples of convex optimization problems


1. Linear programming
2. Least squares
   f(x) = ||Ax - b||^2, for any A
3. Quadratic functions
   f(x) = x^T P x + q^T x + r, provided that P is positive definite

Many more useful examples, see Boyd & Vandenberghe
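As a small illustration of the least-squares case (a sketch, not from the lecture; the matrix A and vector b below are made-up data), the global minimizer can be computed directly:

import numpy as np

# Least squares: minimize f(x) = ||Ax - b||^2, convex for any A.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 3))      # illustrative measurement matrix
b = rng.standard_normal(20)           # illustrative measurements

# The global minimizer solves the normal equations A^T A x = A^T b.
x_opt, residual, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print("least-squares solution:", x_opt)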

Second order condition


The Hessian of a function f(x1, x2, ..., xn) is the n x n matrix of second partial derivatives, with entries

H_ij = ∂²f / ∂x_i ∂x_j,   i, j = 1, ..., n

Diagonalize the Hessian by an orthogonal change of coordinates.


Diagonals are the eigenvalues.
If the eigenvalues are all positive, then the Hessian is positive definite, and
f is convex.
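For instance (a minimal sketch for the quadratic example above; P is an arbitrary illustrative matrix), the second-order condition can be checked numerically:

import numpy as np

# For f(x) = x^T P x + q^T x + r with symmetric P, the Hessian is the constant matrix 2P.
P = np.array([[2.0, 0.5],
              [0.5, 1.0]])             # example matrix, chosen for illustration
H = 2 * P                              # Hessian of the quadratic

eigenvalues = np.linalg.eigvalsh(H)    # eigenvalues of the symmetric Hessian
print("Hessian eigenvalues:", eigenvalues)
print("f is convex:", bool(np.all(eigenvalues > 0)))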

Strictly convex
A function f(x) is strictly convex if
f((1 - θ) x0 + θ x1) < (1 - θ) f(x0) + θ f(x1)   for all x0 ≠ x1 and 0 < θ < 1.

strictly convex: one global optimum

not strictly convex: multiple local optima (but all are global)

Robust Cost Functions


In formulating an optimization problem there is often some
room for design and choice
The cost function can be chosen to be:
convex
robust to noise (outliers) in the data/measurements

Consider minimizing the cost function

f(x) = Σ_i (x - a_i)^2

for data { a_i }

the data { a_i } may be thought of as repeated measurements of a fixed value (at 0), subject to Gaussian noise, plus some outliers
here 10% of the data are outliers, biased towards the right of the true value
consequently the minimum of f(x) does not correspond to the true value

Examine the behaviour of various cost functions

f(x) = Σ_i C(|x - a_i|)

for C chosen as: quadratic, L1, truncated quadratic, Huber

Quadratic cost function


squared error: the usual default cost function
arises in Maximum Likelihood Estimation for Gaussian noise
convex

C(δ) = δ^2

Truncated Quadratic cost function


for inliers behaves as a quadratic
truncated so that outliers only incur a fixed cost
non-convex

C(δ) = min(δ^2, α^2)
     = δ^2 if |δ| < α,  α^2 otherwise

L1 cost function
absolute error
also called total variation
convex
non-differentiable at the origin
finds the median of { a_i }

C(δ) = |δ|

Huber cost function


hybrid between quadratic and L1
continuous first derivative
for small values is quadratic
for larger values becomes linear, so it has the outlier stability of L1
convex

C(δ) = δ^2             if |δ| < α
     = 2α|δ| - α^2     otherwise
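A compact sketch of these four per-residual costs (the threshold α is written as alpha; its value is a free choice, not given in the lecture):

import numpy as np

def quadratic(delta):
    return delta**2

def truncated_quadratic(delta, alpha=1.0):
    # quadratic for inliers, fixed cost alpha^2 for outliers
    return np.minimum(delta**2, alpha**2)

def l1(delta):
    return np.abs(delta)

def huber(delta, alpha=1.0):
    # quadratic near the origin, linear (with matched value and slope) beyond alpha
    d = np.abs(delta)
    return np.where(d < alpha, d**2, 2 * alpha * d - alpha**2)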


Example 1: measurements with outliers

f(x) = Σ_i C(|x - a_i|)

[Figure: f(x) for the quadratic, Huber and truncated quadratic costs on data { a_i } containing outliers, with a zoomed view near the minima.]

Example 2: bimodal measurements


70% in principal mode
30% in outlier mode

f(x) = Σ_i C(|x - a_i|)

[Figure: f(x) for the quadratic, truncated quadratic and Huber costs on the bimodal data { a_i }.]

Summary
Squared cost function: very susceptible to outliers
Truncated quadratic: has a stable minimum, but is non-convex and has other local minima; the basin of attraction of the global minimum is also limited
Huber: has a stable minimum and is convex
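To make the summary concrete, here is a sketch (synthetic data and assumed parameters, not from the lecture) comparing the minimizers of f(x) = Σ_i C(|x - a_i|) under each cost; the true value is 0 and 10% of the data are outliers to the right:

import numpy as np

rng = np.random.default_rng(1)
a = np.concatenate([rng.normal(0.0, 0.1, 90),   # inliers around the true value 0
                    rng.normal(2.0, 0.1, 10)])  # 10% outliers to the right

def huber(d, alpha=0.2):
    return np.where(d < alpha, d**2, 2 * alpha * d - alpha**2)

costs = {
    "quadratic": lambda d: d**2,
    "truncated quadratic": lambda d: np.minimum(d**2, 0.2**2),
    "L1": lambda d: d,
    "Huber": huber,
}

xs = np.linspace(-1.0, 3.0, 4001)               # brute-force search on a fine grid
for name, C in costs.items():
    f = np.array([C(np.abs(x - a)).sum() for x in xs])
    print(f"{name:20s} minimizer ~ {xs[np.argmin(f)]:+.3f}")

The quadratic minimizer is pulled towards the outliers (it is the mean), while the L1, Huber and truncated quadratic minimizers stay near the true value.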

Optimizing non-convex functions


function of one variable f(x): min_x f(x)

[Figure: the 1D cost curve again, with a local minimum and a global minimum marked.]

Sketch of four methods:

1. grid search: uniform grid covering of the space (a sketch follows this list)
2. branch and bound
3. multiple coverings: Newton-like methods within regions
4. simulated annealing: stochastic optimization
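A minimal sketch of method 1, grid search, on a made-up 1D function (the function and interval are illustrative, not from the lecture):

import numpy as np

def f(x):
    return 0.1 * x**2 + np.sin(3 * x)     # non-convex example with several local minima

xs = np.linspace(-5.0, 5.0, 1001)          # uniform grid covering the interval
fs = f(xs)
x_best = xs[np.argmin(fs)]
print("grid-search minimum near x =", x_best, ", f =", fs.min())

The best grid point can then be refined with a local downhill method.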

Branch and bound

min_x f(x)

[Figure: f(x) split into regions, with an upper bound (UB) and a lower bound (LB) marked for each region.]

Key idea:

Split the region into sub-regions and compute upper and lower bounds on f within each

Consider two regions A and C: if the lower bound of A is greater than the upper bound of C, then A can be discarded

Divide (branch) the remaining regions and repeat
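A sketch of the idea in 1D, under the added assumption that f has a known Lipschitz constant L (which gives a simple lower bound over each region; the function and numbers are illustrative):

import numpy as np

def f(x):
    return 0.1 * x**2 + np.sin(3 * x)

L = 4.0                                    # assumed Lipschitz constant of f
regions = [(-5.0, 5.0)]                    # start from the whole interval
best_ub = np.inf

for _ in range(20):                        # fixed number of branching rounds
    surviving = []
    for lo, hi in regions:
        mid = 0.5 * (lo + hi)
        ub = f(mid)                        # value at the midpoint is an upper bound on min f over the region
        lb = ub - L * (hi - lo) / 2        # Lipschitz lower bound on min f over the region
        best_ub = min(best_ub, ub)
        if lb <= best_ub:                  # discard regions whose lower bound exceeds the best upper bound
            surviving.append((lo, mid))    # branch: split the surviving region in two
            surviving.append((mid, hi))
    regions = surviving

print("best value found:", best_ub)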

Multiple coverings
Key idea is to cover the parameter space with overlapping
regions to deal with local optima, and then take advantage
of efficient continuous optimization for each region.
Example from the MATLAB Global Optimization Toolbox:

[Figure: "Pattern Contours with Constraint Boundaries" - contours of the objective with constraint boundaries plotted over a 2D parameter space.]
Multiple coverings ctd

Use multiple starting points

Continuous optimization method for each

Record the optimum for each starting point

Sort the values to find the global optimum
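A sketch of this multi-start strategy (scipy's general-purpose local minimizer stands in for the continuous method; the function and starting points are illustrative):

import numpy as np
from scipy.optimize import minimize

def f(x):
    return 0.1 * x[0]**2 + np.sin(3 * x[0])        # 1D non-convex example

starts = np.linspace(-5.0, 5.0, 11)                 # starting points covering the space
results = [minimize(f, [x0]) for x0 in starts]      # local (downhill) optimization from each
best = min(results, key=lambda r: r.fun)            # pick the best of the recorded optima
print("estimated global optimum: x =", best.x, ", f =", best.fun)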

Simulated Annealing

[Figure: a 1D cost function f(x) with a local minimum and a global minimum.]

The algorithm has a mechanism to jump out of local minima


It is a stochastic search method, i.e. it uses randomness in the search

Simulated annealing algorithm

At each iteration, propose a move in the parameter space

If the move decreases the cost, accept it

If the move increases the cost by ΔE, accept it with probability exp(-ΔE/T); otherwise, don't move

Note that the acceptance probability depends on the temperature T

Decrease the temperature according to a schedule, so that at the start cost increases are likely to be accepted, and at the end they are not
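A sketch of the algorithm on a 1D function (the proposal step size, initial temperature and cooling factor are illustrative choices, not values from the lecture):

import numpy as np

def f(x):
    return 0.1 * x**2 + np.sin(3 * x)

rng = np.random.default_rng(0)
x = 4.0                                   # starting point
T, alpha = 1.0, 0.995                     # initial temperature and cooling factor

for k in range(5000):
    x_new = x + rng.normal(0.0, 0.5)      # propose a random move
    dE = f(x_new) - f(x)
    if dE < 0 or rng.random() < np.exp(-dE / T):
        x = x_new                         # accept downhill moves; uphill moves with probability exp(-dE/T)
    T *= alpha                            # geometric cooling: T_{k+1} = alpha * T_k

print("simulated annealing estimate: x =", x, ", f(x) =", f(x))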

Boltzmann distribution and the cooling schedule

start with T high; then exp(-ΔE/T) is approximately 1, and all moves are accepted

many cooling schedules are possible, but the simplest is

T_{k+1} = α T_k,  0 < α < 1

where k is the iteration number

The algorithm can be very slow to converge

[Figure: the Boltzmann factor exp(-ΔE/T), plotted as exp(-x), exp(-x/10) and exp(-x/100) for x from 0 to 100.]

Simulated annealing
The name and inspiration come from annealing in metallurgy, a technique
involving heating and controlled cooling of a material to increase the size
of its crystals and reduce their defects.
The heat causes the atoms to become unstuck from their initial positions
(a local minimum of the internal energy) and wander randomly through
states of higher energy; the slow cooling gives them more chances of
finding configurations with lower internal energy than the initial one.
Algorithms due to: Kirkpatrick et al. 1982; Metropolis et al. 1953.

Example: Convergence of simulated annealing

[Figure: cost function C versus number of iterations. At INIT_TEMP moves are accepted almost unconditionally; as the temperature falls, uphill ("hill climbing") moves are accepted with probability exp(-ΔE/T); the run ends at FINAL_TEMP.]

Steepest descent on a graph

Slide from Adam Kalai and Santosh Vempala

Random Search on a graph

Simulated Annealing on a graph

Phase 1: Hot (Random)
Phase 2: Warm (Bias down)
Phase 3: Cold (Descent)

There is more
There are many other classes of optimization problem, and also many
efficient optimization algorithms developed for problems with special
structure. Examples include:
Combinatorial and discrete optimization
Dynamic programming
Max-flow/Min-cut graph cuts

See the links on the web page


http://www.robots.ox.ac.uk/~az/lectures/b1/index.html
and come to the C Optimization lectures next year
