Lecture 15: Regression
George Lan
Example: predicting apartment rent

    Living area   Distance to campus   Rent
    230           1                    600
    506           2                    1000
    433           2                    1100
    109           1                    500
    …             …                    …
    150           1                    ?
    270           1.5                  ?
The learning problem

Features:
    Living area, distance to campus, # bedrooms, …
    Denoted as $x = (x_1, x_2, \dots, x_d)^T$
Target:
    Rent, denoted as $y$
Training set:
    $X = (x^{(1)}, x^{(2)}, \dots, x^{(m)})$
    $y = (y^{(1)}, y^{(2)}, \dots, y^{(m)})^T$
[Figures: rent vs. living area; rent vs. location and living area]
Linear Regression Model

Assume $y$ is a linear function of $x$ (features) plus noise $\epsilon$:
    $y = \theta_0 + \theta_1 x_1 + \dots + \theta_d x_d + \epsilon$
Absorb the intercept by prepending a constant feature, $x \leftarrow (1, x^T)^T$.
Then $y = \theta^T x + \epsilon$.
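As a small numeric illustration of this intercept trick (the numbers here are made up):

    import numpy as np

    theta = np.array([2.0, 0.5, -1.0])  # (theta_0, theta_1, theta_2)
    x = np.array([3.0, 4.0])            # original features (x_1, x_2)
    x_aug = np.concatenate(([1.0], x))  # x <- (1, x^T)^T
    # theta_0 + theta_1 x_1 + theta_2 x_2 equals theta^T x for the augmented x:
    assert np.isclose(theta[0] + theta[1] * x[0] + theta[2] * x[1], theta @ x_aug)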
Least mean square method
Given $m$ data points, find $\theta$ that minimizes the mean square error:
    $\hat{\theta} = \arg\min_{\theta} L(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} - \theta^T x^{(i)} \right)^2$
Setting the gradient to zero:
    $\frac{\partial L(\theta)}{\partial \theta} = -\frac{2}{m} \sum_{i=1}^{m} \left( y^{(i)} - \theta^T x^{(i)} \right) x^{(i)} = 0$
    $\Leftrightarrow\; -\frac{2}{m} \sum_{i=1}^{m} y^{(i)} x^{(i)} + \frac{2}{m} \sum_{i=1}^{m} x^{(i)} (x^{(i)})^T \theta = 0$
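For concreteness, a direct NumPy transcription of this objective and its gradient (function names are mine; X is a d x m array with one example per column, following the slides' convention):

    import numpy as np

    def lms_loss(theta, X, y):
        # L(theta) = (1/m) * sum_i (y^(i) - theta^T x^(i))^2
        m = X.shape[1]
        r = y - theta @ X
        return (r @ r) / m

    def lms_gradient(theta, X, y):
        # dL/dtheta = -(2/m) * sum_i (y^(i) - theta^T x^(i)) x^(i)
        m = X.shape[1]
        r = y - theta @ X
        return -(2.0 / m) * (X @ r)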
Matrix version of the gradient
    $\frac{\partial L(\theta)}{\partial \theta} = -\frac{2}{m} \sum_{i=1}^{m} y^{(i)} x^{(i)} + \frac{2}{m} \sum_{i=1}^{m} x^{(i)} (x^{(i)})^T \theta = 0$
Equivalent to
    $\frac{\partial L(\theta)}{\partial \theta} = -\frac{2}{m} \left( x^{(1)}, \dots, x^{(m)} \right) \left( y^{(1)}, \dots, y^{(m)} \right)^T + \frac{2}{m} \left( x^{(1)}, \dots, x^{(m)} \right) \left( x^{(1)}, \dots, x^{(m)} \right)^T \theta = 0$
With $X = (x^{(1)}, \dots, x^{(m)})$ and $y = (y^{(1)}, \dots, y^{(m)})^T$:
    $\frac{\partial L(\theta)}{\partial \theta} = -\frac{2}{m} X y + \frac{2}{m} X X^T \theta = 0$
    $\Rightarrow \hat{\theta} = (X X^T)^{-1} X y$
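A minimal sketch of this closed-form solution on synthetic data of my own (examples stacked as columns of X, per the slides' convention); np.linalg.solve is used rather than forming the inverse explicitly, which is cheaper and numerically more stable:

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic data: m examples, d features (the first feature is the constant 1).
    m, d = 100, 3
    X = np.vstack([np.ones(m), rng.normal(size=(d - 1, m))])  # d x m
    theta_true = np.array([2.0, -1.0, 0.5])
    y = theta_true @ X + 0.1 * rng.normal(size=m)             # y = theta^T x + noise

    # Solve (X X^T) theta = X y, i.e. theta_hat = (X X^T)^{-1} X y.
    theta_hat = np.linalg.solve(X @ X.T, X @ y)
    print(theta_hat)  # close to theta_true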
Alternative way of obtaining $\hat{\theta}$

The matrix inversion in $\hat{\theta} = (X X^T)^{-1} X y$ can be very expensive to compute. Instead, work directly with the gradient
    $\frac{\partial L(\theta)}{\partial \theta} = -\frac{2}{m} \sum_{i=1}^{m} \left( y^{(i)} - \theta^T x^{(i)} \right) x^{(i)}$
Gradient descent:
    $\hat{\theta}^{t+1} \leftarrow \hat{\theta}^{t} + \frac{\alpha}{m} \sum_{i=1}^{m} \left( y^{(i)} - (\hat{\theta}^{t})^T x^{(i)} \right) x^{(i)}$
Stochastic gradient descent, using a single randomly chosen example $i_t$ per step:
    $\hat{\theta}^{t+1} \leftarrow \hat{\theta}^{t} + \beta_t \left( y^{(i_t)} - (\hat{\theta}^{t})^T x^{(i_t)} \right) x^{(i_t)}$
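A minimal sketch of the batch gradient-descent update above (the step size, iteration count, and names are my choices; X is d x m as before):

    import numpy as np

    def gradient_descent(X, y, alpha=0.1, iters=1000):
        # theta^{t+1} <- theta^t + (alpha/m) * sum_i (y^(i) - theta^T x^(i)) x^(i)
        d, m = X.shape
        theta = np.zeros(d)
        for _ in range(iters):
            r = y - theta @ X                    # residuals y^(i) - theta^T x^(i)
            theta = theta + (alpha / m) * (X @ r)
        return theta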
A recap

Stochastic gradient update rule:
    $\hat{\theta}^{t+1} \leftarrow \hat{\theta}^{t} + \beta \left( y^{(i)} - (\hat{\theta}^{t})^T x^{(i)} \right) x^{(i)}$
Geometric interpretation: the residual $\hat{y} - y$ is orthogonal to each row of $X$,
    $\hat{y} - y = \left( X^T (X X^T)^{-1} X - I \right) y$
    $X (\hat{y} - y) = X \left( X^T (X X^T)^{-1} X - I \right) y = 0$
Probabilistic interpretation: assuming Gaussian noise $\epsilon \sim \mathcal{N}(0, \sigma^2)$, the log-likelihood of the data is
    $\log L(\theta) = m \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{2\sigma^2} \sum_{i=1}^{m} \left( y^{(i)} - \theta^T x^{(i)} \right)^2$
so maximizing the likelihood in $\theta$ is equivalent to minimizing the LMS objective
    $\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} - \theta^T x^{(i)} \right)^2$
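And a matching sketch of the stochastic update, touching one randomly drawn example per step (the constant step size beta is an assumption of mine; the slides also allow a decaying step size beta_t):

    import numpy as np

    def sgd(X, y, beta=0.01, steps=10_000, seed=0):
        # theta^{t+1} <- theta^t + beta * (y^(i) - theta^T x^(i)) x^(i), i random
        rng = np.random.default_rng(seed)
        d, m = X.shape
        theta = np.zeros(d)
        for _ in range(steps):
            i = rng.integers(m)                  # pick one example uniformly
            r = y[i] - theta @ X[:, i]
            theta = theta + beta * r * X[:, i]
        return theta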
Polynomial regression

The same machinery handles polynomial fits. Assume
    $y = \theta_0 + \theta_1 x + \theta_2 x^2 + \dots + \theta_d x^d + \epsilon$
With the feature map $\tilde{x} = (1, x, x^2, \dots, x^d)^T$, this is again linear in $\theta$:
    $y = \theta^T \tilde{x} + \epsilon$
Least mean square method
Given $m$ data points, find $\theta$ that minimizes the mean square error:
    $\hat{\theta} = \arg\min_{\theta} L(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} - \theta^T \tilde{x}^{(i)} \right)^2$
Setting the gradient to zero:
    $\frac{\partial L(\theta)}{\partial \theta} = -\frac{2}{m} \sum_{i=1}^{m} \left( y^{(i)} - \theta^T \tilde{x}^{(i)} \right) \tilde{x}^{(i)} = 0$
    $\Leftrightarrow\; -\frac{2}{m} \sum_{i=1}^{m} y^{(i)} \tilde{x}^{(i)} + \frac{2}{m} \sum_{i=1}^{m} \tilde{x}^{(i)} (\tilde{x}^{(i)})^T \theta = 0$
Matrix version of the gradient
Define $\tilde{X} = (\tilde{x}^{(1)}, \tilde{x}^{(2)}, \dots, \tilde{x}^{(m)})$ and $y = (y^{(1)}, y^{(2)}, \dots, y^{(m)})^T$; the gradient becomes
    $\frac{\partial L(\theta)}{\partial \theta} = -\frac{2}{m} \tilde{X} y + \frac{2}{m} \tilde{X} \tilde{X}^T \theta = 0$
    $\Rightarrow \hat{\theta} = (\tilde{X} \tilde{X}^T)^{-1} \tilde{X} y$
Note that $\tilde{x} = (1, x, x^2, \dots, x^d)^T$.
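A minimal sketch of the polynomial fit via these normal equations (the cubic test data and the degree are my own choices; np.vander with increasing=True builds rows (1, x, x^2, ..., x^d), so it is transposed to match the column convention above):

    import numpy as np

    rng = np.random.default_rng(0)

    # Noisy samples from y = 1 - 2x + 0.5x^3 (illustrative data).
    m, d = 50, 3
    x = rng.uniform(-1, 1, size=m)
    y = 1.0 - 2.0 * x + 0.5 * x**3 + 0.05 * rng.normal(size=m)

    # Columns of X_tilde are the feature vectors (1, x, x^2, x^3)^T.
    X_tilde = np.vander(x, d + 1, increasing=True).T      # (d+1) x m

    # theta_hat = (X_tilde X_tilde^T)^{-1} X_tilde y
    theta_hat = np.linalg.solve(X_tilde @ X_tilde.T, X_tilde @ y)
    print(theta_hat)  # approximately (1.0, -2.0, 0.0, 0.5)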
Example: head acceleration in an accident