Lecture 6 - Ridge Regression, Polynomial Regression
EE2211, Semester 2, 2021/2022
Acknowledgement: EE2211 development team (Thomas, Kar-Ann, Chen Khong, Helen, Robby and Haizhou)
Ridge Regression & Polynomial Regression
Module II Contents
• Notations, Vectors, Matrices
• Operations on Vectors and Matrices
• Systems of Linear Equations
• Functions, Derivative and Gradient
• Least Squares, Linear Regression
• Linear Regression with Multiple Outputs
• Linear Regression for Classification
• Ridge Regression
• Polynomial Regression
Review: Linear Regression
Learning of Scalar Function (Single Output)
For one sample: a linear model $\hat{y}(\mathbf{x}) = \mathbf{x}^\top \mathbf{w}$ (scalar function)
For m samples: $\hat{\mathbf{y}}(\mathbf{X}) = \mathbf{X}\mathbf{w} = \mathbf{y}$

$\mathbf{X} = \begin{bmatrix} \mathbf{x}_1^\top \\ \vdots \\ \mathbf{x}_m^\top \end{bmatrix} = \begin{bmatrix} 1 & x_{1,1} & \cdots & x_{1,d} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{m,1} & \cdots & x_{m,d} \end{bmatrix}$, where $\mathbf{x}_i^\top = [1, x_{i,1}, \ldots, x_{i,d}]$

$\mathbf{w} = [w_0, w_1, \ldots, w_d]^\top$, $\quad \mathbf{y} = [y_1, \ldots, y_m]^\top$

Objective: $\sum_{i=1}^{m} \big(\hat{y}(\mathbf{x}_i) - y_i\big)^2 = \mathbf{e}^\top \mathbf{e} = (\mathbf{X}\mathbf{w} - \mathbf{y})^\top (\mathbf{X}\mathbf{w} - \mathbf{y})$

Learning/training (when $\mathbf{X}^\top \mathbf{X}$ is invertible):
Least squares solution: $\hat{\mathbf{w}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}$ (left inverse)

Prediction/testing: $\hat{\mathbf{y}}_{\text{new}} = \hat{\mathbf{y}}(\mathbf{X}_{\text{new}}) = \mathbf{X}_{\text{new}} \hat{\mathbf{w}}$
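As a quick concrete illustration of the review above, here is a minimal NumPy sketch of least-squares learning and prediction; the data values are made up for illustration and are not from the slides.

```python
import numpy as np

# Toy training data (hypothetical values): d = 2 features, m = 5 samples
X_raw = np.array([[0.5, 1.0], [1.5, 2.0], [2.0, 0.5], [3.0, 1.5], [4.0, 2.5]])
y = np.array([1.0, 2.1, 1.8, 3.2, 4.1])

# Append the bias column of ones, so X is m x (d+1)
X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])

# Least-squares solution w_hat = (X^T X)^{-1} X^T y (left inverse), assuming X^T X is invertible
w_hat = np.linalg.inv(X.T @ X) @ X.T @ y

# Prediction on a new sample
X_new = np.hstack([np.ones((1, 1)), np.array([[2.5, 1.0]])])
print(w_hat, X_new @ w_hat)
```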
Review: Linear Regression
Learning of Vectored Function (Multiple Outputs)
$\hat{\mathbf{Y}}(\mathbf{X})$: $\mathbf{X}\mathbf{W} = \mathbf{Y}$

$\mathbf{X} = \begin{bmatrix} \mathbf{x}_1^\top \\ \vdots \\ \mathbf{x}_m^\top \end{bmatrix} = \begin{bmatrix} 1 & x_{1,1} & \cdots & x_{1,d} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{m,1} & \cdots & x_{m,d} \end{bmatrix}$ (rows: sample 1 to sample m)

$\mathbf{W} = \begin{bmatrix} w_{0,1} & \cdots & w_{0,h} \\ w_{1,1} & \cdots & w_{1,h} \\ \vdots & \ddots & \vdots \\ w_{d,1} & \cdots & w_{d,h} \end{bmatrix}$, $\quad \mathbf{Y} = \begin{bmatrix} y_{1,1} & \cdots & y_{1,h} \\ \vdots & \ddots & \vdots \\ y_{m,1} & \cdots & y_{m,h} \end{bmatrix}$ (rows: sample 1's output to sample m's output)

Least Squares Regression
• Note:
  • when $y_i$ is continuous valued → a regression problem
  • when $y_i$ is discrete valued → a classification problem
• Linear model: $\hat{y}_{\mathbf{w},b}(\mathbf{x}) = \mathbf{x}^\top \mathbf{w} + b$, or in compact form $\hat{y}(\mathbf{x}) = \mathbf{x}^\top \mathbf{w}$
  (having the offset term absorbed into the inner product)
Ref: [Book4] Stephen Boyd and Lieven Vandenberghe, “Introduction to Applied Linear Algebra”, Cambridge University Press, 2018 (chp.14)
Linear Regression (for classification)
Linear Methods for Classification
Binary Classification:
If $\mathbf{X}^\top \mathbf{X}$ is invertible, then

Learning: $\hat{\mathbf{w}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}$, with $y_i \in \{-1, +1\}$, $i = 1, \ldots, m$

Prediction: $\hat{y}(\mathbf{x}_{\text{new}}) = \mathrm{sign}(\mathbf{x}_{\text{new}}^\top \hat{\mathbf{w}})$ for each row $\mathbf{x}_{\text{new}}^\top$ of $\mathbf{X}_{\text{new}}$

[Figure: the sign function, $\mathrm{sign}(a) = +1$ for $a > 0$ and $-1$ for $a < 0$]
Ref: [Book4] Stephen Boyd and Lieven Vandenberghe, “Introduction to Applied Linear Algebra”, Cambridge University Press, 2018 (chp.14)
Linear Regression (for classification)
Example 1
Training set $\{x_i, y_i\}_{i=1}^{6}$:
$\{x = -9\} \to \{y = -1\}$
$\{x = -7\} \to \{y = -1\}$
$\{x = -5\} \to \{y = -1\}$
$\{x = 1\} \to \{y = +1\}$
$\{x = 5\} \to \{y = +1\}$
$\{x = 9\} \to \{y = +1\}$

With the bias column included, $\mathbf{X}\mathbf{w} = \mathbf{y}$ is the over-determined system

$\begin{bmatrix} 1 & -9 \\ 1 & -7 \\ 1 & -5 \\ 1 & 1 \\ 1 & 5 \\ 1 & 9 \end{bmatrix} \begin{bmatrix} w_0 \\ w_1 \end{bmatrix} = \begin{bmatrix} -1 \\ -1 \\ -1 \\ 1 \\ 1 \\ 1 \end{bmatrix}$

whose least squares solution is $\hat{\mathbf{w}} = [0.1406, \; 0.1406]^\top$.

Prediction:
Test set $\{x = -2\} \to \{y = ?\}$

$\hat{y}_{\text{new}} = \hat{y}(\mathbf{x}_{\text{new}}) = \mathrm{sign}(\mathbf{x}_{\text{new}}^\top \hat{\mathbf{w}}) = \mathrm{sign}\!\left([1 \;\; -2] \begin{bmatrix} 0.1406 \\ 0.1406 \end{bmatrix}\right) = \mathrm{sign}(-0.1406) = -1$
Python demo 1: Linear regression for one-dimensional classification
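The demo code itself is not reproduced in these notes; a minimal sketch that reproduces the numbers in Example 1 could look like this:

```python
import numpy as np

# Training data from Example 1
x = np.array([-9, -7, -5, 1, 5, 9], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1], dtype=float)

# Design matrix with a bias column of ones
X = np.column_stack([np.ones_like(x), x])

# Least squares solution (X^T X is invertible here)
w_hat = np.linalg.inv(X.T @ X) @ X.T @ y
print(w_hat)                      # approximately [0.1406, 0.1406]

# Predict the class of the test point x = -2
x_test = np.array([[1.0, -2.0]])
print(np.sign(x_test @ w_hat))    # [-1.]
```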
Linear Regression (for classification)
Linear Methods for Classification
Multi-Category Classification:
Linear Regression (for classification)
Example 2: Three-class classification
Training set $\{\mathbf{x}_i, \mathbf{y}_i\}_{i=1}^{4}$ (targets one-hot encoded):
$\{x_1 = 1, x_2 = 1\} \to \{y_1 = 1, y_2 = 0, y_3 = 0\}$  Class 1
$\{x_1 = -1, x_2 = 1\} \to \{y_1 = 0, y_2 = 1, y_3 = 0\}$  Class 2
$\{x_1 = 1, x_2 = 3\} \to \{y_1 = 1, y_2 = 0, y_3 = 0\}$  Class 1
$\{x_1 = 1, x_2 = 0\} \to \{y_1 = 0, y_2 = 0, y_3 = 1\}$  Class 3

With the bias column, $\mathbf{X}\mathbf{W} = \mathbf{Y}$:

$\begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & 1 \\ 1 & 1 & 3 \\ 1 & 1 & 0 \end{bmatrix} \begin{bmatrix} w_{1,1} & w_{1,2} & w_{1,3} \\ w_{2,1} & w_{2,2} & w_{2,3} \\ w_{3,1} & w_{3,2} & w_{3,3} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

This set of linear equations has NO exact solution, so the least squares approximation is used ($\mathbf{X}^\top \mathbf{X}$ is invertible):

$\hat{\mathbf{W}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{Y} = \begin{bmatrix} 4 & 2 & 5 \\ 2 & 4 & 3 \\ 5 & 3 & 11 \end{bmatrix}^{-1} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & 1 \\ 1 & 1 & 3 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0.5 & 0.5 \\ 0.2857 & -0.5 & 0.2143 \\ 0.2857 & 0 & -0.2857 \end{bmatrix}$
Linear Regression (for classification)
Example 2: Prediction

Test set:
$\{x_1 = 6, x_2 = 8\} \to$ {Class 1, 2, or 3?}
$\{x_1 = 0, x_2 = -1\} \to$ {Class 1, 2, or 3?}

$\hat{\mathbf{Y}} = \mathbf{X}_{\text{new}} \hat{\mathbf{W}} = \begin{bmatrix} 1 & 6 & 8 \\ 1 & 0 & -1 \end{bmatrix} \begin{bmatrix} 0 & 0.5 & 0.5 \\ 0.2857 & -0.5 & 0.2143 \\ 0.2857 & 0 & -0.2857 \end{bmatrix} = \begin{bmatrix} 4 & -2.50 & -0.50 \\ -0.2857 & 0.50 & 0.7857 \end{bmatrix}$

Category prediction (row-wise):
$\mathrm{class}(\mathbf{X}_{\text{new}}) = \arg\max_{k=1,\ldots,K} \hat{\mathbf{Y}}(:, k) = \begin{bmatrix} 1 \\ 3 \end{bmatrix}$, i.e. Class 1 and Class 3.

For each row of $\hat{\mathbf{Y}}$, the column position of the largest number (across all columns for that row) determines the class label. E.g. in the first row, the maximum number is 4, which is in column 1; therefore the resulting predicted class is 1.

Python demo 2
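The demo code is likewise not reproduced here; a minimal sketch of the full Example 2 pipeline, using scikit-learn's OneHotEncoder (hinted at in the handwritten notes) for the target encoding, might be:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Training inputs with a bias column, and class labels 1/2/3
X = np.array([[1, 1, 1], [1, -1, 1], [1, 1, 3], [1, 1, 0]], dtype=float)
labels = np.array([[1], [2], [1], [3]])

# One-hot encode the class labels into the target matrix Y
Y = OneHotEncoder().fit_transform(labels).toarray()

# Least squares solution W_hat = (X^T X)^{-1} X^T Y
W_hat = np.linalg.inv(X.T @ X) @ X.T @ Y
print(np.round(W_hat, 4))

# Prediction for the two test points
X_test = np.array([[1, 6, 8], [1, 0, -1]], dtype=float)
Y_test = X_test @ W_hat

# Predicted class = column index of the row-wise maximum (+1 for 1-based labels)
print(np.argmax(Y_test, axis=1) + 1)   # [1 3]
```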
Ridge Regression (reduces overfitting)
Solution:

$\dfrac{\partial}{\partial \mathbf{w}} \left[ (\mathbf{X}\mathbf{w} - \mathbf{y})^\top (\mathbf{X}\mathbf{w} - \mathbf{y}) + \lambda \mathbf{w}^\top \mathbf{w} \right] = \mathbf{0}$

$\Rightarrow 2\mathbf{X}^\top \mathbf{X}\mathbf{w} - 2\mathbf{X}^\top \mathbf{y} + 2\lambda \mathbf{w} = \mathbf{0}$
$\Rightarrow \mathbf{X}^\top \mathbf{X}\mathbf{w} + \lambda \mathbf{w} = \mathbf{X}^\top \mathbf{y}$
$\Rightarrow (\mathbf{X}^\top \mathbf{X} + \lambda \mathbf{I})\mathbf{w} = \mathbf{X}^\top \mathbf{y}$

where $\mathbf{I}$ is the d×d identity matrix.

From here on, we focus on a single output column $\mathbf{y}$ in the derivations.

Learning: $\hat{\mathbf{w}} = (\mathbf{X}^\top \mathbf{X} + \lambda \mathbf{I})^{-1} \mathbf{X}^\top \mathbf{y}$
Ref: Hastie, Tibshirani, Friedman, “The Elements of Statistical Learning”, (2nd ed., 12th printing) 2017 (chp.3)
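A minimal NumPy sketch of this ridge solution; the data are reused from Example 1 and the λ value is illustrative only, not from the slides:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge regression: w_hat = (X^T X + lam*I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.inv(X.T @ X + lam * np.eye(d)) @ X.T @ y

# Toy data from Example 1, with an illustrative regularization value
x = np.array([-9, -7, -5, 1, 5, 9], dtype=float)
X = np.column_stack([np.ones_like(x), x])
y = np.array([-1, -1, -1, 1, 1, 1], dtype=float)
print(ridge_fit(X, y, lam=0.1))
```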
Ridge Regression
Ridge Regression in Primal Form (when m > d)
Ref: Hastie, Tibshirani, Friedman, “The Elements of Statistical Learning”, (2nd ed., 12th printing) 2017 (chp.3)
Ridge Regression
Ridge Regression in Dual Form (when m < d, i.e., an under-determined system)
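The detailed primal/dual slide contents are not reproduced in this extract, but the standard closed forms are the primal solution $\hat{\mathbf{w}} = (\mathbf{X}^\top\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^\top\mathbf{y}$ and the dual solution $\hat{\mathbf{w}} = \mathbf{X}^\top(\mathbf{X}\mathbf{X}^\top + \lambda\mathbf{I})^{-1}\mathbf{y}$; for $\lambda > 0$ they give the same weights. A small sketch checking this equivalence on random data:

```python
import numpy as np

def ridge_primal(X, y, lam):
    # Primal form: (X^T X + lam*I_d)^{-1} X^T y -- convenient when m > d
    d = X.shape[1]
    return np.linalg.inv(X.T @ X + lam * np.eye(d)) @ X.T @ y

def ridge_dual(X, y, lam):
    # Dual form: X^T (X X^T + lam*I_m)^{-1} y -- convenient when m < d (under-determined)
    m = X.shape[0]
    return X.T @ np.linalg.inv(X @ X.T + lam * np.eye(m)) @ y

# Random under-determined toy problem: 4 samples, 6 features
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 6))
y = rng.normal(size=4)

# For lam > 0 both forms give the same weight vector
print(np.allclose(ridge_primal(X, y, 0.5), ridge_dual(X, y, 0.5)))   # True
```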
Polynomial Regression
Motivation: nonlinear decision surface
• Based on the sum of products of the variables
• E.g. when the input dimension is d = 2, a polynomial function of degree 2 is:

$\hat{y}(\mathbf{x}) = w_0 + w_1 x_1 + w_2 x_2 + w_{12} x_1 x_2 + w_{11} x_1^2 + w_{22} x_2^2$

[Figure: the XOR problem in the $(x_1, x_2)$ plane. No line (hyperplane) can separate the red and blue points, but the nonlinear feature $\hat{y}(\mathbf{x}) = x_1 x_2$ does separate them.]
Polynomial Regression
Polynomial Expansion
• The linear model $\hat{y}(\mathbf{x}) = \mathbf{x}^\top \mathbf{w}$ can be written as

$\hat{y}(\mathbf{x}) = \mathbf{x}^\top \mathbf{w} = \sum_{i=0}^{d} x_i w_i$ (with $x_0 = 1$) $= w_0 + \sum_{i=1}^{d} x_i w_i$
Notes:
• For high dimensional input features (large d value) and high polynomial order, the number of polynomial terms becomes explosive (i.e., grows exponentially).
• For high dimensional problems, polynomials of order larger than 3 are seldom used.
Ref: Duda, Hart, and Stork, “Pattern Classification”, 2001 (Chp.5)
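To see how quickly the expansion grows, here is a small sketch using scikit-learn's PolynomialFeatures (also used in the Example 3 notes below); the dimensions and orders chosen are illustrative:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Count the number of polynomial terms for a few input dimensions and orders
for d in (2, 5, 10):
    for order in (2, 3, 4):
        poly = PolynomialFeatures(degree=order)
        n_terms = poly.fit_transform(np.zeros((1, d))).shape[1]
        print(f"d={d}, order={order}: {n_terms} terms")
```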
Polynomial Regression
Generalized Linear Discriminant Function
$g(\mathbf{x}) = w_0 + \sum_{i=1}^{d} w_i x_i + \sum_{i=1}^{d} \sum_{j=1}^{d} w_{ij} x_i x_j + \sum_{i=1}^{d} \sum_{j=1}^{d} \sum_{l=1}^{d} w_{ijl} x_i x_j x_l + \cdots$

$g(\mathbf{x})$ is a polynomial expansion of $\mathbf{x}$: it is a nonlinear function of $\mathbf{x}$ but a linear function of the expanded feature vector $\mathbf{p}(\mathbf{x})$.

Stacking the m samples:

$\mathbf{P} = \begin{bmatrix} \mathbf{p}_1^\top(\mathbf{x}) \\ \vdots \\ \mathbf{p}_m^\top(\mathbf{x}) \end{bmatrix}$, $\quad \mathbf{w} = [w_0, w_1, \ldots, w_d, \ldots, w_{ij}, \ldots, w_{ijl}, \ldots]^\top$

where $\mathbf{p}_k^\top(\mathbf{x}) = [1, x_{k,1}, \ldots, x_{k,d}, \ldots, x_{k,i} x_{k,j}, \ldots, x_{k,i} x_{k,j} x_{k,l}, \ldots]$,
$k = 1, \ldots, m$; d denotes the dimension of input features; m denotes the number of samples.
Ref: Duda, Hart, and Stork, “Pattern Classification”, 2001 (Chp.5)
Example 3
Training set $\{\mathbf{x}_i, y_i\}_{i=1}^{4}$ (the XOR problem):
$\{x_1 = 0, x_2 = 0\} \to \{y = -1\}$
$\{x_1 = 1, x_2 = 1\} \to \{y = -1\}$
$\{x_1 = 1, x_2 = 0\} \to \{y = +1\}$
$\{x_1 = 0, x_2 = 1\} \to \{y = +1\}$

2nd order polynomial model:

$\hat{y}(\mathbf{x}) = w_0 + w_1 x_1 + w_2 x_2 + w_{12} x_1 x_2 + w_{11} x_1^2 + w_{22} x_2^2 = [1 \;\; x_1 \;\; x_2 \;\; x_1 x_2 \;\; x_1^2 \;\; x_2^2] \, [w_0 \;\, w_1 \;\, w_2 \;\, w_{12} \;\, w_{11} \;\, w_{22}]^\top$

Stack the 4 training samples as a matrix:

$\mathbf{P} = \begin{bmatrix} 1 & x_{1,1} & x_{1,2} & x_{1,1}x_{1,2} & x_{1,1}^2 & x_{1,2}^2 \\ 1 & x_{2,1} & x_{2,2} & x_{2,1}x_{2,2} & x_{2,1}^2 & x_{2,2}^2 \\ 1 & x_{3,1} & x_{3,2} & x_{3,1}x_{3,2} & x_{3,1}^2 & x_{3,2}^2 \\ 1 & x_{4,1} & x_{4,2} & x_{4,1}x_{4,2} & x_{4,1}^2 & x_{4,2}^2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 0 & 1 \end{bmatrix}$

With 4 equations and 6 unknowns, $\mathbf{P}\mathbf{w} = \mathbf{y}$ is an under-determined system (a unique constrained/ridge solution is used).
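The handwritten notes sketch this construction in Python with scikit-learn; a runnable version of that sketch (note that PolynomialFeatures orders the columns as $[1, x_1, x_2, x_1^2, x_1 x_2, x_2^2]$, slightly differently from the slide) is:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# XOR training set
X = np.array([[0, 0], [1, 1], [1, 0], [0, 1]], dtype=float)
y = np.array([-1, -1, 1, 1], dtype=float)

# 2nd order polynomial expansion: columns [1, x1, x2, x1^2, x1*x2, x2^2]
poly = PolynomialFeatures(degree=2)
P = poly.fit_transform(X)
print(P)
print(np.linalg.matrix_rank(P))   # rank 4 < 6 columns: under-determined system
```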
Polynomial Regression
Summary
Ridge Regression in Dual Form (m < d): after polynomial expansion, the number of columns of the $\mathbf{P}$ matrix (the number of polynomial features) can exceed the number of samples, so the dual form applies.
Example 3 (cont’d)
Training set (the XOR problem):
$\{x_1 = 0, x_2 = 0\} \to \{y = -1\}$
$\{x_1 = 1, x_2 = 1\} \to \{y = -1\}$
$\{x_1 = 1, x_2 = 0\} \to \{y = +1\}$
$\{x_1 = 0, x_2 = 1\} \to \{y = +1\}$

2nd order polynomial model, with the 4 training samples stacked as

$\mathbf{P} = \begin{bmatrix} 1 & x_{1,1} & x_{1,2} & x_{1,1}x_{1,2} & x_{1,1}^2 & x_{1,2}^2 \\ 1 & x_{2,1} & x_{2,2} & x_{2,1}x_{2,2} & x_{2,1}^2 & x_{2,2}^2 \\ 1 & x_{3,1} & x_{3,2} & x_{3,1}x_{3,2} & x_{3,1}^2 & x_{3,2}^2 \\ 1 & x_{4,1} & x_{4,2} & x_{4,1}x_{4,2} & x_{4,1}^2 & x_{4,2}^2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 0 & 1 \end{bmatrix}$

Since the number of samples (4) is smaller than the number of polynomial features (6), solve with Ridge Regression in the dual form.
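A minimal sketch of solving this under-determined system with the dual-form ridge solution; the regularization value is illustrative, not from the slides:

```python
import numpy as np

# Polynomial design matrix P (columns: 1, x1, x2, x1*x2, x1^2, x2^2) and targets
P = np.array([[1, 0, 0, 0, 0, 0],
              [1, 1, 1, 1, 1, 1],
              [1, 1, 0, 0, 1, 0],
              [1, 0, 1, 0, 0, 1]], dtype=float)
y = np.array([-1, -1, 1, 1], dtype=float)

# Dual-form ridge: w_hat = P^T (P P^T + lam*I_m)^{-1} y, suitable since m = 4 < 6 columns
lam = 1e-4
w_hat = P.T @ np.linalg.inv(P @ P.T + lam * np.eye(P.shape[0])) @ y

# Check that the fitted model separates the XOR points
print(np.sign(P @ w_hat))   # [-1. -1.  1.  1.]
```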
Summary
• Notations, Vectors, Matrices
• Operations on Vectors and Matrices
• Systems of Linear Equations: $\hat{\mathbf{y}}(\mathbf{X}) = \mathbf{X}\mathbf{w} = \mathbf{y}$
• Functions, Derivative and Gradient
• Least Squares, Linear Regression with Single and Multiple Outputs
• Learning of vectored function, binary and multi-category classification
• Ridge Regression: penalty term, primal and dual forms
• Polynomial Regression: nonlinear decision boundary