Lecture 15: Regression

Regression

George Lan

A. Russell Chandler III Chair Professor


H. Milton Stewart School of Industrial & Systems Engineering
Machine learning for apartment hunting
Suppose you are moving to Atlanta and want to find the most reasonably priced apartment satisfying your needs:
square footage, # of bedrooms, distance to campus, …

Living area (ft²)   # bedrooms   Rent ($)
230                 1            600
506                 2            1000
433                 2            1100
109                 1            500
150                 1            ?
270                 1.5          ?
The learning problem
Features:
  Living area, distance to campus, # of bedrooms, …
  Denoted as $x = (x_1, x_2, \ldots, x_n)^T$

Target:
  Rent, denoted as $y$

Training set:
  $X = \left(x^{(1)}, x^{(2)}, \ldots, x^{(m)}\right)$
  $y = \left(y^{(1)}, y^{(2)}, \ldots, y^{(m)}\right)^T$

[Figures: scatter plots of rent vs. living area, and rent vs. living area and location]
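As a concrete illustration (a sketch of my own, not from the slides), the training set from the apartment table can be stored as NumPy arrays, with one column of X per feature vector x^(i):

```python
import numpy as np

# Feature vectors x^(i) = (living area in ft^2, # bedrooms)^T, one per column: shape (n, m) = (2, 4)
X = np.array([[230.0, 506.0, 433.0, 109.0],
              [  1.0,   2.0,   2.0,   1.0]])

# Targets y^(i): monthly rent in dollars, shape (m,) = (4,)
y = np.array([600.0, 1000.0, 1100.0, 500.0])

print(X.shape, y.shape)   # (2, 4) (4,)
```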
Linear Regression Model
Assume $y$ is a linear function of $x$ (the features) plus noise $\epsilon$:

  $y = \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n + \epsilon$

where $\epsilon$ is an error term capturing unmodeled effects or random noise.

Let $\theta = (\theta_0, \theta_1, \ldots, \theta_n)^T$, and augment the data by one dimension:

  $x \leftarrow (1, x^T)^T$

Then $y = \theta^T x + \epsilon$.
Least mean square method
Given $m$ data points, find $\theta$ that minimizes the mean square error:

  $\hat{\theta} = \arg\min_{\theta} L(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left(y^{(i)} - \theta^T x^{(i)}\right)^2$

Our usual trick: set the gradient to 0 and solve for the parameter:

  $\frac{\partial L(\theta)}{\partial \theta} = -\frac{2}{m} \sum_{i=1}^{m} \left(y^{(i)} - \theta^T x^{(i)}\right) x^{(i)} = 0$

  $\Leftrightarrow\; -\frac{2}{m} \sum_{i=1}^{m} y^{(i)} x^{(i)} + \frac{2}{m} \sum_{i=1}^{m} x^{(i)} x^{(i)T} \theta = 0$

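As a quick sanity check on the derivation (my own sketch, not from the lecture), the analytic gradient above can be compared with a finite-difference approximation of L(θ) on random toy data:

```python
import numpy as np

def loss(theta, X, y):
    # L(theta) = (1/m) * sum_i (y^(i) - theta^T x^(i))^2, with the columns of X as the x^(i)
    return ((y - X.T @ theta) ** 2).mean()

def grad(theta, X, y):
    # Analytic gradient: -(2/m) * sum_i (y^(i) - theta^T x^(i)) x^(i)
    return -(2.0 / X.shape[1]) * X @ (y - X.T @ theta)

rng = np.random.default_rng(0)
X, y, theta = rng.normal(size=(3, 20)), rng.normal(size=20), rng.normal(size=3)

# Central finite differences along each coordinate of theta
eps = 1e-6
fd = np.array([(loss(theta + eps * e, X, y) - loss(theta - eps * e, X, y)) / (2 * eps)
               for e in np.eye(3)])
print(np.allclose(fd, grad(theta, X, y), atol=1e-6))   # True (up to floating-point error)
```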
Matrix version of the gradient
  $\frac{\partial L(\theta)}{\partial \theta} = -\frac{2}{m} \sum_{i=1}^{m} y^{(i)} x^{(i)} + \frac{2}{m} \sum_{i=1}^{m} x^{(i)} x^{(i)T} \theta = 0$

Equivalent to

  $\frac{\partial L(\theta)}{\partial \theta} = -\frac{2}{m} \left(x^{(1)}, \ldots, x^{(m)}\right) \left(y^{(1)}, \ldots, y^{(m)}\right)^T + \frac{2}{m} \left(x^{(1)}, \ldots, x^{(m)}\right) \left(x^{(1)}, \ldots, x^{(m)}\right)^T \theta = 0$

Define $X = \left(x^{(1)}, x^{(2)}, \ldots, x^{(m)}\right)$ and $y = \left(y^{(1)}, y^{(2)}, \ldots, y^{(m)}\right)^T$; the gradient becomes

  $\frac{\partial L(\theta)}{\partial \theta} = -\frac{2}{m} X y + \frac{2}{m} X X^T \theta = 0$

  $\Rightarrow\; \hat{\theta} = \left(X X^T\right)^{-1} X y$
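A minimal NumPy sketch of this closed-form solution, applied to the apartment data with a row of ones prepended for the intercept (my own illustration; the predicted rents are whatever the fitted model produces, not values from the slides):

```python
import numpy as np

# Augmented feature vectors x^(i) = (1, living area, # bedrooms)^T as columns of X
X = np.array([[  1.0,    1.0,    1.0,   1.0],
              [230.0,  506.0,  433.0, 109.0],
              [  1.0,    2.0,    2.0,   1.0]])
y = np.array([600.0, 1000.0, 1100.0, 500.0])

# Closed-form least-squares solution: theta_hat = (X X^T)^{-1} X y
theta_hat = np.linalg.inv(X @ X.T) @ X @ y

# Predict the two unlabeled apartments from the table: (150 ft^2, 1 br) and (270 ft^2, 1.5 br)
X_new = np.array([[1.0, 150.0, 1.0],
                  [1.0, 270.0, 1.5]]).T
print(theta_hat)
print(X_new.T @ theta_hat)
```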
Alternative way of obtaining $\hat{\theta}$
The matrix inversion in $\hat{\theta} = (X X^T)^{-1} X y$ can be very expensive to compute.

  $\frac{\partial L(\theta)}{\partial \theta} = -\frac{2}{m} \sum_{i=1}^{m} \left(y^{(i)} - \theta^T x^{(i)}\right) x^{(i)}$

Gradient descent:

  $\hat{\theta}^{t+1} \leftarrow \hat{\theta}^{t} + \frac{\alpha}{m} \sum_{i=1}^{m} \left(y^{(i)} - (\hat{\theta}^{t})^T x^{(i)}\right) x^{(i)}$

Stochastic gradient descent (use one data point at a time):

  $\hat{\theta}^{t+1} \leftarrow \hat{\theta}^{t} + \beta_t \left(y^{(i)} - (\hat{\theta}^{t})^T x^{(i)}\right) x^{(i)}$
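A rough sketch of both update rules (my own code; the helper names, the step sizes α and β_t, and the decay schedule are arbitrary choices and would need tuning; with unscaled features such as raw square footage they must be much smaller, or the features standardized, for the iterates to converge):

```python
import numpy as np

def fit_gd(X, y, alpha=0.1, iters=1000):
    # Batch gradient descent: theta <- theta + (alpha/m) * sum_i (y^(i) - theta^T x^(i)) x^(i)
    n, m = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        theta += (alpha / m) * X @ (y - X.T @ theta)
    return theta

def fit_sgd(X, y, beta0=0.1, epochs=100, seed=0):
    # Stochastic gradient descent: one data point per update, with a decaying step size beta_t
    n, m = X.shape
    theta = np.zeros(n)
    rng = np.random.default_rng(seed)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(m):
            t += 1
            beta_t = beta0 / (1.0 + 0.01 * t)   # simple decay schedule (an arbitrary choice)
            theta += beta_t * (y[i] - theta @ X[:, i]) * X[:, i]
    return theta
```

Both functions take X with one data point per column, consistent with the notation above.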
A recap:
Stochastic gradient update rule

  $\hat{\theta}^{t+1} \leftarrow \hat{\theta}^{t} + \beta \left(y^{(i)} - (\hat{\theta}^{t})^T x^{(i)}\right) x^{(i)}$

  Pros: on-line, low per-step cost
  Cons: noisy updates, may converge slowly

Gradient descent

  $\hat{\theta}^{t+1} \leftarrow \hat{\theta}^{t} + \frac{\alpha}{m} \sum_{i=1}^{m} \left(y^{(i)} - (\hat{\theta}^{t})^T x^{(i)}\right) x^{(i)}$

  Pros: fast-converging, easy to implement
  Cons: needs to read all the data at every step

Solve the normal equations

  $(X X^T)\,\hat{\theta} = X y$

  Pros: a single-shot algorithm; easiest to implement
  Cons: needs the inverse $(X X^T)^{-1}$, which is expensive to compute and can cause numerical issues (e.g., when the matrix is singular)
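One practical mitigation of the last con (my own sketch, with helper names of my own): instead of forming the inverse explicitly, solve the normal equations (X Xᵀ)θ = Xy as a linear system, or use a least-squares routine that also copes with a (near-)singular X Xᵀ:

```python
import numpy as np

def fit_normal_equations(X, y):
    # Solve (X X^T) theta = X y without explicitly inverting X X^T
    return np.linalg.solve(X @ X.T, X @ y)

def fit_least_squares(X, y):
    # Minimize ||X^T theta - y||_2 directly; handles rank-deficient X via the pseudoinverse
    theta, *_ = np.linalg.lstsq(X.T, y, rcond=None)
    return theta
```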
Geometric Interpretation of LMS
The predictions on the training data are:

  $\hat{y} = X^T \hat{\theta} = X^T (X X^T)^{-1} X y$

Look at the residual $\hat{y} - y$:

  $\hat{y} - y = \left(X^T (X X^T)^{-1} X - I\right) y$

  $X (\hat{y} - y) = X \left(X^T (X X^T)^{-1} X - I\right) y = 0$

$\hat{y}$ is the orthogonal projection of $y$ onto the space spanned by the columns of $X$.
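A small numerical check of this claim (my own sketch, on random data): X annihilates the residual ŷ − y, up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 10))                 # columns are the data points x^(i)
y = rng.normal(size=10)

theta_hat = np.linalg.solve(X @ X.T, X @ y)  # least-squares solution
y_hat = X.T @ theta_hat                      # predictions on the training data

print(np.allclose(X @ (y_hat - y), 0.0))     # True: the residual is annihilated by X
```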
Probabilistic Interpretation of LMS
Assume $y$ is linear in $x$ plus noise $\epsilon$:

  $y = \theta^T x + \epsilon$

Assume $\epsilon$ follows a Gaussian distribution $N(0, \sigma^2)$. Then

  $p\left(y^{(i)} \mid x^{(i)}; \theta\right) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{\left(y^{(i)} - \theta^T x^{(i)}\right)^2}{2\sigma^2}\right)$

By the independence assumption, the likelihood is

  $L(\theta) = \prod_{i=1}^{m} p\left(y^{(i)} \mid x^{(i)}; \theta\right) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^m \exp\left(-\frac{\sum_{i=1}^{m}\left(y^{(i)} - \theta^T x^{(i)}\right)^2}{2\sigma^2}\right)$
Probabilistic Interpretation of LMS, cont.
Hence the log-likelihood is:

  $\log L(\theta) = m \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{2\sigma^2} \sum_{i=1}^{m} \left(y^{(i)} - \theta^T x^{(i)}\right)^2$

Do you recognize the last term?

  $\text{LMS:}\quad \frac{1}{m} \sum_{i=1}^{m} \left(y^{(i)} - \theta^T x^{(i)}\right)^2$

Thus, under the independence assumption and the Gaussian noise assumption, LMS is equivalent to MLE of $\theta$!
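To make the equivalence concrete (my own sketch): for a fixed σ, the log-likelihood is a constant minus (m/2σ²) times the LMS objective, so whichever θ has the larger log-likelihood also has the smaller mean square error.

```python
import numpy as np

def log_likelihood(theta, X, y, sigma=1.0):
    # log L(theta) = m*log(1/(sqrt(2*pi)*sigma)) - (1/(2*sigma^2)) * sum_i (y^(i) - theta^T x^(i))^2
    m = X.shape[1]
    sq_err = ((y - X.T @ theta) ** 2).sum()
    return m * np.log(1.0 / (np.sqrt(2.0 * np.pi) * sigma)) - sq_err / (2.0 * sigma ** 2)

def mse(theta, X, y):
    # LMS objective: (1/m) * sum_i (y^(i) - theta^T x^(i))^2
    return ((y - X.T @ theta) ** 2).mean()

rng = np.random.default_rng(2)
X, y = rng.normal(size=(3, 50)), rng.normal(size=50)
theta_a, theta_b = rng.normal(size=3), rng.normal(size=3)

# The two comparisons always agree: maximizing the likelihood minimizes the mean square error
print(log_likelihood(theta_a, X, y) > log_likelihood(theta_b, X, y),
      mse(theta_a, X, y) < mse(theta_b, X, y))
```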
Nonlinear regression

Want to fit a polynomial regression model:

  $y = \theta_0 + \theta_1 x + \theta_2 x^2 + \cdots + \theta_n x^n + \epsilon$

Let $\tilde{x} = \left(1, x, x^2, \ldots, x^n\right)^T$ and $\theta = \left(\theta_0, \theta_1, \theta_2, \ldots, \theta_n\right)^T$. Then

  $y = \theta^T \tilde{x}$
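A brief sketch of the feature map x̃ = (1, x, x², …, xⁿ)ᵀ (the helper name poly_features is my own):

```python
import numpy as np

def poly_features(x, degree):
    # Map a scalar x to the augmented feature vector (1, x, x^2, ..., x^degree)^T
    return np.array([x ** k for k in range(degree + 1)], dtype=float)

print(poly_features(2.0, 3))   # [1. 2. 4. 8.]
```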
Least mean square method
Given $m$ data points, find $\theta$ that minimizes the mean square error:

  $\hat{\theta} = \arg\min_{\theta} L(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left(y^{(i)} - \theta^T \tilde{x}^{(i)}\right)^2$

Our usual trick: set the gradient to 0 and solve for the parameter:

  $\frac{\partial L(\theta)}{\partial \theta} = -\frac{2}{m} \sum_{i=1}^{m} \left(y^{(i)} - \theta^T \tilde{x}^{(i)}\right) \tilde{x}^{(i)} = 0$

  $\Leftrightarrow\; -\frac{2}{m} \sum_{i=1}^{m} y^{(i)} \tilde{x}^{(i)} + \frac{2}{m} \sum_{i=1}^{m} \tilde{x}^{(i)} \tilde{x}^{(i)T} \theta = 0$

Matrix version of the gradient
Define $\tilde{X} = \left(\tilde{x}^{(1)}, \tilde{x}^{(2)}, \ldots, \tilde{x}^{(m)}\right)$ and $y = \left(y^{(1)}, y^{(2)}, \ldots, y^{(m)}\right)^T$; the gradient becomes

  $\frac{\partial L(\theta)}{\partial \theta} = -\frac{2}{m} \tilde{X} y + \frac{2}{m} \tilde{X} \tilde{X}^T \theta = 0$

  $\Rightarrow\; \hat{\theta} = \left(\tilde{X} \tilde{X}^T\right)^{-1} \tilde{X} y$

Note that $\tilde{x} = \left(1, x, x^2, \ldots, x^n\right)^T$.

If we choose a different maximal degree $n$ for the polynomial, the solution will be different.

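Putting the pieces together (my own sketch, with a helper name of my own): stack the mapped feature vectors as columns of X̃ and reuse the normal-equation solve; changing the maximal degree n changes θ̂ and therefore the fitted curve.

```python
import numpy as np

def fit_polynomial(x, y, degree):
    # Columns of X_tilde are the feature vectors x_tilde^(i) = (1, x_i, x_i^2, ..., x_i^degree)^T
    X_tilde = np.vstack([x ** k for k in range(degree + 1)])
    return np.linalg.solve(X_tilde @ X_tilde.T, X_tilde @ y)

# Toy data: noisy samples of a cubic; each degree yields a different solution theta_hat
rng = np.random.default_rng(3)
x = np.linspace(-1.0, 1.0, 30)
y = x ** 3 - 0.5 * x + 0.1 * rng.normal(size=x.size)

for degree in (1, 3, 9):
    print(degree, fit_polynomial(x, y, degree))
```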
Example: head acceleration in an accident

