10 Convex Optimisation
MSc: https://lms.uzh.ch/url/RepositoryEntry/17589469505
PhD: https://lms.uzh.ch/url/RepositoryEntry/17589469506
Solving Machine Learning Problems
1
A Crash Course in Optimisation
Today:
• Convex optimisation
Next time:
• Gradient Descent
• Constrained optimisation
2
Convex Sets
A set C is convex if x1 , x2 ∈ C, 0 ≤ θ ≤ 1 =⇒ θx1 + (1 − θ)x2 ∈ C
3
Examples of Convex Sets
• The set RD
λ x + (1 − λ) y ∈ RD for all x, y ∈ RD and λ ∈ [0, 1]
• Norm balls
For any norm || · || (e.g. any ℓp -norm with p ≥ 1), the set B = {x ∈ RD : ||x|| ≤ 1} is convex
• Polyhedra
Given A ∈ Rm×n and b ∈ Rm , the polyhedron {x ∈ Rn : A x ≤ b} is convex
4
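These examples can also be checked numerically. The sketch below (toy data and names are made up, NumPy only) verifies that convex combinations of two points stay inside the Euclidean unit ball and inside a polyhedron:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two points inside the Euclidean unit ball (norms 0.5 and 0.25).
x = rng.normal(size=5); x *= 0.5 / np.linalg.norm(x)
y = rng.normal(size=5); y *= 0.25 / np.linalg.norm(y)

# Every convex combination stays in the ball (triangle inequality).
ball_ok = all(
    np.linalg.norm(t * x + (1 - t) * y) <= 1.0 + 1e-12
    for t in np.linspace(0, 1, 11)
)

# A polyhedron {z : A z <= b}, with b chosen so both x and y are feasible.
A = rng.normal(size=(4, 5))
b = np.maximum(A @ x, A @ y) + 1.0
poly_ok = all(
    np.all(A @ (t * x + (1 - t) * y) <= b + 1e-12)
    for t in np.linspace(0, 1, 11)
)
```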
Showing the Set of PSD Matrices is Convex
5
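One way to fill in the argument: a convex combination of PSD matrices is PSD because quadratic forms are linear in the matrix. For symmetric PSD A, B and θ ∈ [0, 1]:

```latex
z^{T}\bigl(\theta A + (1-\theta)B\bigr)z
  = \theta\, z^{T}A z + (1-\theta)\, z^{T}B z \;\ge\; 0
  \qquad \text{for all } z \in \mathbb{R}^{D},
```

since both quadratic forms are non-negative by assumption, so θA + (1 − θ)B is again PSD.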
Showing the Norm Balls Form Convex Sets
6
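A sketch of the argument, using only the triangle inequality and absolute homogeneity of the norm, for x, y ∈ B and λ ∈ [0, 1]:

```latex
\|\lambda x + (1-\lambda) y\|
  \le \|\lambda x\| + \|(1-\lambda) y\|
  = \lambda \|x\| + (1-\lambda)\|y\|
  \le \lambda + (1-\lambda) = 1,
```

so the convex combination stays in B.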
Showing the Polyhedron is Convex + Example
7
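The proof is one line because x ↦ Ax is linear; as a concrete example, the non-negative quadrant {x : −x ≤ 0} is a polyhedron and hence convex. For A x ≤ b, A y ≤ b and θ ∈ [0, 1]:

```latex
A\bigl(\theta x + (1-\theta)y\bigr)
  = \theta A x + (1-\theta) A y
  \le \theta b + (1-\theta) b
  = b .
```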
Convex Functions
A function f is convex if, for all x, y in its (convex) domain and all λ ∈ [0, 1]:
f (λ · x + (1 − λ) · y) ≤ λ · f (x) + (1 − λ) · f (y)
8
Examples of Convex Functions
• Norms: ∥ · ∥p for p ≥ 1 (for p < 1, ∥ · ∥p is not a norm and not convex)
9
Convex Optimisation
10
Convex Optimisation
Consider minimising a convex function f over a convex feasible set. Then:
x is a local optimum if:
• x is feasible and
• there is B > 0 s.t. f (x) ≤ f (y) for all feasible y with ||x − y||2 ≤ B
x is a global optimum if:
• x is feasible and
• f (x) ≤ f (y) for all feasible y
11
Local Optima are Global Optima for Convex Optimisation Problems
Theorem: For any convex optimisation problem, all locally optimal points are
globally optimal.
11
Local Optima are Global Optima for Convex Optimisation Problems: Proof
12
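A sketch of the proof (by contradiction):

```latex
\text{Let } x \text{ be locally optimal with radius } B,
\text{ and suppose some feasible } y \text{ has } f(y) < f(x).\\
\text{For } \theta \in (0,1] \text{ set } z = (1-\theta)x + \theta y;
\text{ then } z \text{ is feasible, since the feasible set is convex.}\\
\text{Choosing } \theta \le B / \|x - y\|_2 \text{ gives }
\|x - z\|_2 = \theta \|x - y\|_2 \le B,
\text{ and convexity of } f \text{ yields}\\
f(z) \le (1-\theta) f(x) + \theta f(y) < f(x),
\text{ contradicting local optimality of } x. \qquad \square
```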
Local Optima are Global Optima for Convex Optimisation Problems: Figure
[Figure: graph of a convex f showing points x, z, y and the ball of radius B around x, with the values f (x), f (z), f (y) marked.]
13
Classes of Convex Optimisation Problems
Linear Programming:
minimize cT x + d
subject to A x ≤ e
Bx = f
14
Classes of Convex Optimisation Problems
Linear Programming:
minimize cT x + d
subject to A x ≤ e
Bx = f
Semidefinite Programming:
minimize tr(C X)
subject to tr(Ai X) = bi , i ∈ [m ]
X positive semidefinite
• No closed-form solution
• Efficient algorithms exist, both in theory and practice (for tens of thousands of variables)
15
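As a sketch of how such solvers are used (the toy instance below is made up, not from the lecture), SciPy's `linprog` solves a small LP of exactly the shape above:

```python
import numpy as np
from scipy.optimize import linprog

# Minimise c^T x subject to A x <= e, B x = f, x >= 0.
# Toy instance: minimise -x0 - 2*x1 with x0 + x1 <= 4 and x0 = x1.
c = np.array([-1.0, -2.0])
A = np.array([[1.0, 1.0]]); e = np.array([4.0])
B = np.array([[1.0, -1.0]]); f = np.array([0.0])

res = linprog(c, A_ub=A, b_ub=e, A_eq=B, b_eq=f, bounds=[(0, None)] * 2)
# Optimum at x = (2, 2) with objective value -6.
```

The constant d from the slide is omitted since it does not affect the minimiser.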
Linear Model with Absolute Loss
Suppose we have data (X, y) and that we want to minimise the objective:
L(w) = ∑_{i=1}^N |wT xi − yi |
16
Linear Model with Absolute Loss via Linear Programming (1/2)
L(w) = ∑_{i=1}^N |wT xi − yi |
minimize ∑_{i=1}^N ζi
subject to:
wT xi − yi ≤ ζi , i ∈ [N ]
yi − wT xi ≤ ζi , i ∈ [N ]
Claim: The solution to this linear program gives w that minimises the objective L.
17
Linear Model with Absolute Loss via Linear Programming (2/2)
18
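The key step in the proof is that at the optimum each slack satisfies ζi = |wT xi − yi | (otherwise ζi could be decreased), so minimising ∑ ζi minimises L(w). The sketch below (toy noiseless data with made-up sizes) sets up exactly this LP with SciPy:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
N, D = 40, 3
X = rng.normal(size=(N, D))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true   # noiseless, so the optimal absolute loss is 0

# Variables are (w, zeta) in R^{D+N}; minimise sum_i zeta_i subject to
#   w^T x_i - y_i <= zeta_i  and  y_i - w^T x_i <= zeta_i.
c = np.concatenate([np.zeros(D), np.ones(N)])
A_ub = np.block([[X, -np.eye(N)],
                 [-X, -np.eye(N)]])
b_ub = np.concatenate([y, -y])
res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * D + [(0, None)] * N)
w_hat = res.x[:D]   # recovers w_true on noiseless data
```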
Recall: Likelihood of Linear Regression (Gaussian Noise Model)
Likelihood
p(y | X, w, σ) = (1/(2πσ2 ))N/2 exp( −(1/(2σ2 )) (Xw − y)T (Xw − y) )
19
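Taking the logarithm of the likelihood above (a standard step, sketched here) shows why maximising it over w is exactly least squares:

```latex
\log p(y \mid X, w, \sigma)
  = -\frac{N}{2}\log\bigl(2\pi\sigma^{2}\bigr)
    - \frac{1}{2\sigma^{2}} (Xw - y)^{T}(Xw - y),
```

so for fixed σ, maximising over w is the same as minimising ‖Xw − y‖², a convex quadratic.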
Minimising the Lasso Objective
For the Lasso objective, i.e., linear model with ℓ1 -regularisation, we have
Llasso (w) = ∑_{i=1}^N (wT xi − yi )2 + λ ∑_{j=1}^D |wj | = wT XT Xw − 2yT Xw + yT y + λ ∑_{j=1}^D |wj |
20
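The Lasso objective is convex but not differentiable at wj = 0, so plain gradient descent does not directly apply. One standard approach (not necessarily the one used in the course) is proximal gradient descent (ISTA), sketched here on made-up toy data: take a gradient step on the smooth quadratic part, then apply the soft-thresholding operator for the ℓ1 part.

```python
import numpy as np

rng = np.random.default_rng(2)
N, D = 50, 5
X = rng.normal(size=(N, D))
w_true = np.array([2.0, 0.0, -1.5, 0.0, 0.0])
y = X @ w_true + 0.01 * rng.normal(size=N)
lam = 1.0

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1: shrink each entry towards 0 by t.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_obj(w):
    return np.sum((X @ w - y) ** 2) + lam * np.sum(np.abs(w))

# Step size 1/L, where L is the Lipschitz constant of the smooth gradient.
L = np.linalg.eigvalsh(2 * X.T @ X).max()
w = np.zeros(D)
for _ in range(2000):
    grad = 2 * X.T @ (X @ w - y)          # gradient of the squared loss
    w = soft_threshold(w - grad / L, lam / L)
```

The ℓ1 penalty drives some coordinates exactly to zero, which is the sparsity-inducing behaviour the Lasso is used for.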