Lect 6
• Lasso
• L1 regularization
• other regularizers
• SVM regression
• epsilon-insensitive loss
Regression
f(x, w) = \sum_{j=0}^{M} w_j x^j = w^\top \Phi(x)

• w is an (M + 1)-dimensional vector
• N = 9, M = 7
• Regularizer: \lambda \|w\|^2 for ridge regression, \lambda \sum_j |w_j| for the lasso

[Figure: the regularized polynomial fit, y against x on [0, 1]]
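As a concrete illustration (not from the lecture), here is a minimal sketch of this setup with scikit-learn: N = 9 noisy samples, the degree M = 7 polynomial basis, and the two regularizers side by side. The sine target and the value of λ are assumptions for the example.

```python
# Minimal sketch of the setup above: N = 9 samples, degree M = 7 polynomial
# basis, ridge (L2) vs lasso (L1) penalties. Data and lambda are assumed.
import numpy as np
from sklearn.linear_model import Ridge, Lasso
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
N, M = 9, 7
x = np.linspace(0, 1, N)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(N)  # assumed target

Phi = PolynomialFeatures(degree=M).fit_transform(x[:, None])  # rows are Phi(x)

lam = 1e-3
ridge = Ridge(alpha=lam, fit_intercept=False).fit(Phi, y)
lasso = Lasso(alpha=lam, fit_intercept=False, max_iter=100_000).fit(Phi, y)

print("ridge w:", np.round(ridge.coef_, 2))  # shrunk, but generally all non-zero
print("lasso w:", np.round(lasso.coef_, 2))  # several weights exactly zero
```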
[Figures: variation of the weights w_j with log \lambda for ridge regression (left) and lasso (right), \lambda from 10^{-8} to 10^{-2}; lower panels show the same curves in detail]
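The weight-path figures above can be reproduced with a sketch like the following (same assumed data as the previous snippet; the λ grid matches the 10^{-8} to 10^{-2} range on the axes).

```python
# Sweep lambda on a log grid and record every weight w_j, giving the
# "variation of weights with log lambda" curves. Data setup is assumed.
import numpy as np
from sklearn.linear_model import Ridge, Lasso
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 9)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(9)
Phi = PolynomialFeatures(degree=7).fit_transform(x[:, None])

lambdas = np.logspace(-8, -2, 60)
ridge_path = [Ridge(alpha=l, fit_intercept=False).fit(Phi, y).coef_
              for l in lambdas]
lasso_path = [Lasso(alpha=l, fit_intercept=False, max_iter=200_000).fit(Phi, y).coef_
              for l in lambdas]
# Plotting each weight against log10(lambdas) gives the figures: ridge weights
# shrink smoothly, while lasso weights are driven exactly to zero one by one.
```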
Second example – lasso in action
[Figure: the lasso weights plotted against the regularization parameter \lambda]
• For q < 1, the problem is not convex, and obtaining the global minimum is more difficult
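Here q is the exponent in the general family of regularizers (the "other regularizers" of the outline); a sketch of the family, following the standard presentation:

```latex
% Regularized least squares with a general q-norm penalty:
%   q = 2 -> ridge regression,  q = 1 -> lasso,  q < 1 -> non-convex penalty
\min_{\mathbf{w}} \; \sum_{i=1}^{N} \bigl( y_i - f(x_i, \mathbf{w}) \bigr)^{2}
  \;+\; \lambda \sum_{j} \lvert w_j \rvert^{q}
```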
SVMs for Regression
Use the \varepsilon-insensitive error measure

V_\varepsilon(r) =
  \begin{cases}
    0 & \text{if } |r| \le \varepsilon \\
    |r| - \varepsilon & \text{otherwise.}
  \end{cases}

This can also be written as V_\varepsilon(r) = \max(0, |r| - \varepsilon).

[Figure: V_\varepsilon(r) compared with the square loss, plotted against r = y - f(x)]

f(x, w) = \sum_{i=1}^{N} w_i \, e^{-(x - x_i)^2 / \sigma^2} = w^\top \Phi(x)

• Gaussian basis functions centred on the data points
• \Phi : x \mapsto \Phi(x), \; \mathbb{R} \to \mathbb{R}^N; w is an N-vector
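A minimal sketch of this model with scikit-learn's SVR, which minimizes the ε-insensitive loss with an RBF kernel; the data, σ, and the regularization constant C are assumptions for the example.

```python
# epsilon-insensitive SVM regression with Gaussian (RBF) basis functions.
import numpy as np
from sklearn.svm import SVR

def V_eps(r, eps):
    """The epsilon-insensitive loss: max(0, |r| - eps)."""
    return np.maximum(0.0, np.abs(r) - eps)

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 40)[:, None]
y = np.sin(2 * np.pi * x).ravel() + 0.1 * rng.standard_normal(40)  # assumed data

# gamma plays the role of 1/sigma^2 in e^{-(x - x_i)^2 / sigma^2}
svr = SVR(kernel="rbf", gamma=25.0, C=10.0, epsilon=0.01).fit(x, y)
print("support vectors:", len(svr.support_), "of", len(x))
print("mean loss:", V_eps(y - svr.predict(x), 0.01).mean())
```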
[Figures: SVM regression fits (sample points, ideal fit, validation-set fit, support vectors) for increasing values of \varepsilon, starting at \varepsilon = 0.01; y against x on [0, 1]]
As epsilon increases:
• fit becomes looser
• fewer data points are support vectors
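This behaviour is easy to check numerically; a sketch, with the same assumed data as before:

```python
# As epsilon grows the insensitive tube widens: the fit gets looser and
# fewer training points end up as support vectors.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 40)[:, None]
y = np.sin(2 * np.pi * x).ravel() + 0.1 * rng.standard_normal(40)

for eps in (0.01, 0.05, 0.1, 0.3):
    svr = SVR(kernel="rbf", gamma=25.0, C=10.0, epsilon=eps).fit(x, y)
    print(f"epsilon = {eps:<4}  support vectors: {len(svr.support_)} / {len(x)}")
```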
Final notes on cost functions
• L1-SVM:

  \min_{w \in \mathbb{R}^d} \; \sum_{i=1}^{N} \max(0, 1 - y_i f(x_i)) + \lambda \|w\|_1
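The slide only states the objective; as an illustration, here is a plain subgradient-descent sketch on synthetic data with a linear f(x) = w·x (the optimizer, data, and step size are all assumptions).

```python
# L1-SVM: hinge loss + L1 penalty, minimized by subgradient descent.
import numpy as np

rng = np.random.default_rng(0)
N, d = 200, 5
X = rng.standard_normal((N, d))
w_true = np.array([2.0, -1.0, 0.0, 0.0, 0.0])           # sparse ground truth
y = np.sign(X @ w_true + 0.1 * rng.standard_normal(N))

lam, step = 0.01, 0.01
w = np.zeros(d)
for _ in range(2000):
    margins = y * (X @ w)
    # subgradient of sum_i max(0, 1 - y_i w.x_i) + lam * ||w||_1
    g = -(y[:, None] * X)[margins < 1].sum(axis=0) + lam * np.sign(w)
    w -= step * g
print("learned w:", np.round(w, 2))  # irrelevant weights pushed towards zero
```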
Background reading