
Lecture 5: Regression

C4B Machine Learning Hilary 2011 A. Zisserman

• Linear regression
• Loss function
• Ridge regression

• Basis functions

• Dual representation and kernels

Regression

• Suppose we are given a training set of N observations

    (x_1, y_1), \ldots, (x_N, y_N)   with   x_i ∈ R^d,  y_i ∈ R

• The regression problem is to estimate f(x) from this data such that

    y_i = f(x_i)
Learning by optimization

• As in the case of classification, learning a regressor can be formulated as an optimization:

Minimize with respect to f ∈ F

    \sum_{i=1}^{N} \ell\big( f(x_i), y_i \big) \;+\; \lambda R(f)

  (first term: loss function; second term: regularization)

• There is a choice of both loss functions and regularization


• e.g. squared loss, SVM “hinge-like” loss
• squared regularizer, lasso regularizer

• Algorithms can be kernelized

Choice of regression function – non-linear basis functions

• The regression function f(x, w) is a non-linear function of x, but linear in w:

    f(x, w) = w_0 + w_1 \phi_1(x) + w_2 \phi_2(x) + \ldots + w_M \phi_M(x) = w^\top \Phi(x)

• For example, for x ∈ R, polynomial regression with \phi_j(x) = x^j:

    f(x, w) = w_0 + w_1 \phi_1(x) + w_2 \phi_2(x) + \ldots + w_M \phi_M(x) = \sum_{j=0}^{M} w_j x^j

  e.g. for M = 3,

    f(x, w) = (w_0, w_1, w_2, w_3) \begin{pmatrix} 1 \\ x \\ x^2 \\ x^3 \end{pmatrix} = w^\top \Phi(x)

  so \Phi : x \to \Phi(x) maps R^1 \to R^4.
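As a concrete aside (not from the lecture), here is a minimal NumPy sketch of the polynomial feature map Φ for 1-D inputs; the helper name `poly_design_matrix` and the example degree are my own choices:

```python
import numpy as np

def poly_design_matrix(x, M):
    """Map 1-D inputs x (shape (N,)) to polynomial features [1, x, x^2, ..., x^M]."""
    x = np.asarray(x, dtype=float)
    return np.vander(x, N=M + 1, increasing=True)   # columns: x^0, x^1, ..., x^M

x = np.array([0.1, 0.4, 0.7])
Phi = poly_design_matrix(x, M=3)    # shape (3, 4): row i is Phi(x_i)^T
```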
• or the basis functions can be Gaussians centred on the training data:

    \phi_j(x) = \exp\big( -(x - x_j)^2 / 2\sigma^2 \big)

  e.g. for 3 points,

    f(x, w) = (w_1, w_2, w_3) \begin{pmatrix} e^{-(x-x_1)^2/2\sigma^2} \\ e^{-(x-x_2)^2/2\sigma^2} \\ e^{-(x-x_3)^2/2\sigma^2} \end{pmatrix} = w^\top \Phi(x)

  so \Phi : x \to \Phi(x) maps R^1 \to R^3.
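Similarly, a small sketch (again my own illustration) of the Gaussian basis-function map, with the basis centres placed on the training inputs; `gauss_design_matrix` and the value of sigma are assumptions for the example:

```python
import numpy as np

def gauss_design_matrix(x, centres, sigma):
    """Features phi_j(x) = exp(-(x - x_j)^2 / (2 sigma^2)), one column per centre x_j."""
    x = np.asarray(x, dtype=float)[:, None]               # shape (N, 1)
    centres = np.asarray(centres, dtype=float)[None, :]   # shape (1, J)
    return np.exp(-(x - centres) ** 2 / (2.0 * sigma ** 2))

x_train = np.array([0.1, 0.4, 0.7])
Phi = gauss_design_matrix(x_train, centres=x_train, sigma=0.334)  # shape (3, 3)
```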

Least squares “ridge regression”

• Cost function – squared loss plus a squared regularizer:

    \tilde{E}(w) = \frac{1}{2} \sum_{i=1}^{N} \big\{ f(x_i, w) - y_i \big\}^2 + \frac{\lambda}{2} \|w\|^2

  (first term: loss function, measuring the error against each target value y_i; second term: regularization)

• Regression function for x (1D):

    f(x, w) = w_0 + w_1 \phi_1(x) + w_2 \phi_2(x) + \ldots + w_M \phi_M(x) = w^\top \Phi(x)

• NB the squared loss arises in Maximum Likelihood estimation for an error model

    y_i = \tilde{y}_i + n_i,   n_i \sim \mathcal{N}(0, \sigma^2)

  where y_i is the measured value and \tilde{y}_i the true value.
Solving for the weights w

Notation: write the target and regressed values as N-vectors

    y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix},
    \qquad
    f = \begin{pmatrix} \Phi(x_1)^\top w \\ \Phi(x_2)^\top w \\ \vdots \\ \Phi(x_N)^\top w \end{pmatrix}
      = \Phi w
      = \begin{bmatrix}
          1 & \phi_1(x_1) & \ldots & \phi_M(x_1) \\
          1 & \phi_1(x_2) & \ldots & \phi_M(x_2) \\
          \vdots & & & \vdots \\
          1 & \phi_1(x_N) & \ldots & \phi_M(x_N)
        \end{bmatrix}
        \begin{pmatrix} w_0 \\ w_1 \\ \vdots \\ w_M \end{pmatrix}

Φ is the N × M design matrix.

e.g. for polynomial regression with basis functions up to x^2:

    \Phi w = \begin{bmatrix}
               1 & x_1 & x_1^2 \\
               1 & x_2 & x_2^2 \\
               \vdots & \vdots & \vdots \\
               1 & x_N & x_N^2
             \end{bmatrix}
             \begin{pmatrix} w_0 \\ w_1 \\ w_2 \end{pmatrix}

In this notation the cost function becomes

    \tilde{E}(w) = \frac{1}{2} \sum_{i=1}^{N} \big\{ f(x_i, w) - y_i \big\}^2 + \frac{\lambda}{2} \|w\|^2
                 = \frac{1}{2} \sum_{i=1}^{N} \big( y_i - w^\top \Phi(x_i) \big)^2 + \frac{\lambda}{2} \|w\|^2
                 = \frac{1}{2} \| y - \Phi w \|^2 + \frac{\lambda}{2} \|w\|^2

Now, compute where the derivative w.r.t. w is zero for the minimum:

    \frac{d\tilde{E}(w)}{dw} = -\Phi^\top (y - \Phi w) + \lambda w = 0

Hence

    \big( \Phi^\top \Phi + \lambda I \big) w = \Phi^\top y

    w = \big( \Phi^\top \Phi + \lambda I \big)^{-1} \Phi^\top y

With M basis functions and N data points (assume N > M), the dimensions are: w is M×1, \big( \Phi^\top \Phi + \lambda I \big)^{-1} is M×M, \Phi^\top is M×N, and y is N×1.


• This shows that there is a unique solution.

• If λ = 0 (no regularization), then

    w = (\Phi^\top \Phi)^{-1} \Phi^\top y = \Phi^+ y

  where \Phi^+ is the pseudo-inverse of Φ (pinv in Matlab); a small NumPy sketch follows this list.

• Adding the term λI improves the conditioning of the inverse, since even if Φ is not full rank, (\Phi^\top \Phi + \lambda I) will be full rank (for sufficiently large λ).

• As λ → ∞, w → \frac{1}{\lambda} \Phi^\top y → 0.

• Often the regularization is applied only to the inhomogeneous part of w, i.e. to \tilde{w}, where w = (w_0, \tilde{w}).
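A minimal NumPy sketch of this closed-form primal solution (my own illustration; the synthetic data and the `ridge_weights` helper are assumptions, and `np.linalg.solve` is used instead of an explicit inverse for numerical stability):

```python
import numpy as np

def ridge_weights(Phi, y, lam):
    """Solve (Phi^T Phi + lam*I) w = Phi^T y for the primal weights w."""
    M = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(M), Phi.T @ y)

# Example: fit a degree-3 polynomial to a few noisy points.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 9)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)  # illustrative data only
Phi = np.vander(x, N=4, increasing=True)                       # columns 1, x, x^2, x^3
w = ridge_weights(Phi, y, lam=1e-3)

# With lam = 0 this reduces to the pseudo-inverse solution:
w_pinv = np.linalg.pinv(Phi) @ y
```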

    w = \big( \Phi^\top \Phi + \lambda I \big)^{-1} \Phi^\top y

    f(x, w) = w^\top \Phi(x) = \Phi(x)^\top w
            = \Phi(x)^\top \big( \Phi^\top \Phi + \lambda I \big)^{-1} \Phi^\top y
            = b(x)^\top y

The output is a linear blend, b(x), of the training values {y_i}.
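To make the “linear blend” reading concrete, a short sketch (my own, reusing the setup of the previous sketch) that forms b(x)^T for a batch of query points:

```python
import numpy as np

def blend_matrix(Phi_query, Phi, lam):
    """Rows are b(x)^T = Phi(x)^T (Phi^T Phi + lam*I)^{-1} Phi^T for each query x."""
    M = Phi.shape[1]
    A = np.linalg.solve(Phi.T @ Phi + lam * np.eye(M), Phi.T)  # shape (M, N)
    return Phi_query @ A                                       # shape (Q, N)

# Prediction as a blend of the training targets:
#   B = blend_matrix(Phi_query, Phi, lam);  f_query = B @ y
# which matches Phi_query @ w from the closed-form weights above.
```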
Example 1: polynomial basis functions

[Figure: “Ideal fit” – the true curve shown with the noisy sample points over x ∈ [0, 1].]

• The red curve is the true function (which is not a polynomial).

• The data points are samples from the curve with added noise in y.

• There is a choice in both the degree, M, of the basis functions used, and in the strength of the regularization.

    f(x, w) = \sum_{j=0}^{M} w_j x^j = w^\top \Phi(x), \qquad \Phi : x \to \Phi(x), \; R \to R^{M+1}

  w is an (M+1)-dimensional vector.

N = 9 samples, M = 7

[Figure: four panels with lambda = 100, 0.001, 1e-010 and 1e-015, each showing the sample points, the ideal fit and the regularized polynomial fit over x ∈ [0, 1].]
M = 3 and M = 5: least-squares fits

[Figure: two panels showing the sample points, the ideal fit and the least-squares solution for M = 3 and M = 5, followed by two “Polynomial basis functions” panels plotted on much larger y-scales.]
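The slides do not state the underlying true function, so any sketch of this experiment has to assume one; below sin(2πx) is used purely as an illustrative stand-in for the red curve, and the noise level is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
true_f = lambda x: np.sin(2 * np.pi * x)    # assumed stand-in for the true curve

N, M = 9, 7
x_train = np.linspace(0, 1, N)
y_train = true_f(x_train) + 0.15 * rng.standard_normal(N)

x_test = np.linspace(0, 1, 200)
Phi_train = np.vander(x_train, N=M + 1, increasing=True)
Phi_test = np.vander(x_test, N=M + 1, increasing=True)

for lam in [100, 1e-3, 1e-10, 1e-15]:
    w = np.linalg.solve(Phi_train.T @ Phi_train + lam * np.eye(M + 1),
                        Phi_train.T @ y_train)
    f_test = Phi_test @ w
    print(lam, float(np.max(np.abs(f_test))))   # smaller lam -> less smoothing, more oscillatory fits
```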

Example 2: Gaussian basis functions

[Figure: “Ideal fit” – the true curve shown with the noisy sample points over x ∈ [0, 1].]

• The red curve is the true function (which is not a polynomial).

• The data points are samples from the curve with added noise in y.

• Basis functions are centred on the training data (N points).

• There is a choice in both the scale, sigma, of the basis functions used, and in the strength of the regularization.

    f(x, w) = \sum_{i=1}^{N} w_i \, e^{-(x - x_i)^2/\sigma^2} = w^\top \Phi(x), \qquad \Phi : x \to \Phi(x), \; R \to R^{N}

  w is an N-vector.
N = 9 samples, sigma = 0.334

[Figure: four panels with lambda = 100, 0.001, 1e-010 and 1e-015, each showing the sample points, the ideal fit and the Gaussian-basis fit over x ∈ [0, 1].]

Choosing lambda using a validation set

[Figure: left – training and validation error norms plotted against log λ, with the minimum of the validation error marked; right – the fit obtained with the λ selected on the validation set, shown with the sample points and the ideal fit.]
[Figure: validation-set fits for sigma = 0.334 and sigma = 0.1, each shown with the sample points and the ideal fit, followed by two “Gaussian basis functions” panels plotted on very different y-scales.]
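A sketch of this model-selection loop (my own illustration, reusing the assumed sin(2πx) data and Gaussian basis from the earlier sketches; the λ grid and the train/validation split are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
true_f = lambda x: np.sin(2 * np.pi * x)    # assumed true curve, as before
sigma = 0.334

x_train = np.linspace(0, 1, 9)
y_train = true_f(x_train) + 0.15 * rng.standard_normal(9)
x_val = rng.uniform(0, 1, 20)
y_val = true_f(x_val) + 0.15 * rng.standard_normal(20)

def gauss_phi(x, centres, sigma):
    return np.exp(-(x[:, None] - centres[None, :]) ** 2 / sigma ** 2)

Phi_tr = gauss_phi(x_train, x_train, sigma)
Phi_va = gauss_phi(x_val, x_train, sigma)

best = None
for lam in 10.0 ** np.arange(-10.0, 2.0):   # sweep lambda on a log grid
    w = np.linalg.solve(Phi_tr.T @ Phi_tr + lam * np.eye(len(x_train)),
                        Phi_tr.T @ y_train)
    err = np.linalg.norm(Phi_va @ w - y_val)   # validation error norm
    if best is None or err < best[0]:
        best = (err, lam, w)

print("selected lambda:", best[1])
```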

Summary and preview


So far we have considered the primal problem, where

    f(x, w) = \sum_{i=1}^{M} w_i \phi_i(x) = w^\top \Phi(x)

and we wanted a solution for w ∈ R^M.

Now we will consider the dual problem, where

    w = \sum_{i=1}^{N} a_i \Phi(x_i)

and we want a solution for a ∈ R^N.

We will see that

• there is a closed form solution for a,

• the solution involves the N × N Gram matrix k(x_i, x_j) = \Phi(x_i)^\top \Phi(x_j),

• so we can use the kernel trick again to replace scalar products


Dual Representation

    \tilde{E}(w) = \frac{1}{2} \sum_{i=1}^{N} \big( y_i - w^\top \Phi(x_i) \big)^2 + \frac{\lambda}{2} \|w\|^2,
    \qquad \Phi : x \to \Phi(x), \; R \to R^M

and from the derivative w.r.t. w:

    \frac{d\tilde{E}(w)}{dw} = -\sum_{i=1}^{N} \big( y_i - w^\top \Phi(x_i) \big) \Phi(x_i) + \lambda w = 0

Hence

    w = \sum_{i=1}^{N} \frac{y_i - w^\top \Phi(x_i)}{\lambda} \, \Phi(x_i) = \sum_{i=1}^{N} a_i \Phi(x_i)

Again the vector w can be written as a linear combination of the training data (the Representer Theorem).
    w = \sum_{i=1}^{N} a_i \Phi(x_i)
      = \big[ \Phi(x_1) \;\; \Phi(x_2) \;\; \ldots \;\; \Phi(x_N) \big]
        \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_N \end{pmatrix}
      = \Phi^\top a        (assume N > M)

where Φ is the N × M design matrix; dimensions: w is M×1, \Phi^\top is M×N, a is N×1.

Substitute w = \Phi^\top a into

    \tilde{E}(w) = \frac{1}{2} \| y - \Phi w \|^2 + \frac{\lambda}{2} \|w\|^2
                 = \frac{1}{2} \| y - \Phi \Phi^\top a \|^2 + \frac{\lambda}{2} a^\top \Phi \Phi^\top a
                 = \frac{1}{2} \| y - K a \|^2 + \frac{\lambda}{2} a^\top K a

where K = \Phi \Phi^\top is the N × N kernel Gram matrix with entries k(x_i, x_j) = \Phi(x_i)^\top \Phi(x_j). Minimize \tilde{E}(a) w.r.t. a to show that

    a = (K + \lambda I)^{-1} y        (Exercise)

Dimensions: a is N×1, (K + \lambda I)^{-1} is N×N, y is N×1.

• The dual version involves inverting an N × N matrix, cf. inverting an M × M matrix in the primal.


    f(x, w) = w^\top \Phi(x) = \Phi(x)^\top w
            = \Phi(x)^\top \Phi^\top a
            = \Phi(x)^\top \big[ \Phi(x_1) \;\; \Phi(x_2) \;\; \ldots \;\; \Phi(x_N) \big] a
            = \big( \Phi(x)^\top \Phi(x_1) \;\; \Phi(x)^\top \Phi(x_2) \;\; \ldots \;\; \Phi(x)^\top \Phi(x_N) \big) a
            = \big( k(x, x_1) \;\; k(x, x_2) \;\; \ldots \;\; k(x, x_N) \big) a

Write k(x) = \big( k(x, x_1) \; k(x, x_2) \; \ldots \; k(x, x_N) \big)^\top, then

    f(x) = k(x)^\top a = k(x)^\top (K + \lambda I)^{-1} y

• Again, see that the output is a linear blend, b(x)^\top y, of the training values {y_i}, where

    b(x)^\top = k(x)^\top (K + \lambda I)^{-1}

• All the advantages of kernels: it is only necessary to provide k(x_i, x_j) rather than compute Φ(x) explicitly.
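A compact kernel ridge regression sketch of this dual solution (my own illustration; the Gaussian kernel, its bandwidth and the synthetic data are assumptions):

```python
import numpy as np

def gauss_kernel(X1, X2, sigma=0.334):
    """k(x, x') = exp(-(x - x')^2 / (2 sigma^2)) for 1-D inputs."""
    return np.exp(-(X1[:, None] - X2[None, :]) ** 2 / (2 * sigma ** 2))

rng = np.random.default_rng(3)
x_train = np.linspace(0, 1, 9)
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.standard_normal(9)  # illustrative data
lam = 1e-3

K = gauss_kernel(x_train, x_train)                            # N x N Gram matrix
a = np.linalg.solve(K + lam * np.eye(len(x_train)), y_train)  # dual coefficients a = (K + lam I)^{-1} y

x_test = np.linspace(0, 1, 200)
k_test = gauss_kernel(x_test, x_train)    # row q holds k(x_q)^T = (k(x_q, x_1), ..., k(x_q, x_N))
f_test = k_test @ a                       # f(x) = k(x)^T (K + lam I)^{-1} y
```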

Example: 3D Human Pose from Silhouettes

Objective: Recover 3D human body pose from image silhouettes


• 3D pose = joint angles

Applications:
• motion capture, resynthesis
• human-computer interaction
• action recognition
• visual surveillance

Ankur Agarwal and Bill Triggs, ICML 2004


Silhouette descriptors to represent input

Use Shape Context Histograms – distributions of local shape context responses

Regression for pose vector y

Predict the 3D pose y given the shape context histogram x as

    y = A \, k(x)

where

• k(x) = \big( k(x, x_1), k(x, x_2), \ldots, k(x, x_N) \big)^\top is a vector of scalar basis functions,

• A = (a_1, a_2, \ldots, a_N) is a matrix of dual vectors a_j.

(Compare with the scalar version y = \sum_{i=1}^{N} a_i \, k(x_i, x).)

Learn A from the training data {x_i, y_i} by optimizing the cost function

    \min_A \; \sum_{i=1}^{N} \| y_i - A \, k(x_i) \|^2 + \lambda \, \mathrm{trace}(A^\top A)
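For concreteness, a hedged sketch of a closed-form solution to this cost (my own least-squares reading of the objective, not necessarily the training procedure actually used by Agarwal and Triggs; `fit_pose_regressor` is a hypothetical helper):

```python
import numpy as np

def fit_pose_regressor(K, Y, lam):
    """Minimise sum_i ||y_i - A k(x_i)||^2 + lam * trace(A^T A) in closed form.

    K : (N, N) Gram matrix with K[i, j] = k(x_i, x_j)
    Y : (N, m) matrix whose rows are the target pose vectors y_i
    Returns A of shape (m, N), so a new pose is predicted as A @ k_x.
    """
    N = K.shape[0]
    # Normal equations: A (K K^T + lam I) = Y^T K^T, with K symmetric here.
    return np.linalg.solve(K @ K.T + lam * np.eye(N), K @ Y).T

# Prediction for a new descriptor x: y_hat = A @ k_x, where k_x[i] = k(x, x_i).
```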
Training and test data

[Figure: training-data pipeline – video recordings, motion capture data (providing the pose targets y), re-rendered poses, re-rendered silhouettes.]

Results: Synthetic Spiral Walk Test Sequence

• Mean angular error per d.o.f. = 6.0°

• Instances of error due to ambiguities: 15% of instances, when the complete sequence is estimated from individual images.

Ambiguities in pose reconstruction

The silhouette-to-pose problem is inherently multi-valued; add tracking to disambiguate.

Tracking results

Tracking a real motion sequence

1. Preprocessing
   a) Background subtraction
   b) Shadow removal for silhouette extraction

2. Regression
   - obtain 3D body joint angles as output
   - these can be used to render synthetic models …

Background reading

• Bishop, chapters 3 & 6.1

• More on the web page: http://www.robots.ox.ac.uk/~az/lectures/ml

• e.g. Gaussian process regression
