
Aprendizagem 2023

Lab 5: Linear and kernel regression

Practical exercises

1. Consider the following training data:

        y1   y2   output
   x1    1    1    1.4
   x2    2    1    0.5
   x3    1    3    2
   x4    3    3    2.5

a) Find the closed form solution for a linear regression minimizing the sum of squared errors

   w = (XᵀX)⁻¹ Xᵀz = (0.275, 0.02, 0.645)ᵀ

b) Predict the target value for x_new = [2, 3]ᵀ

   output(x_new) = w₀ + w₁·2 + w₂·3 = 0.275 + 0.04 + 1.935 = 2.25

c) Sketch the predicted three-dimensional hyperplane

   ẑ = 0.275 + 0.02 y₁ + 0.645 y₂

d) Compute the MSE and MAE produced by the linear regression

   ẑ = (0.94, 0.96, 2.23, 2.27)

   MSE(z, ẑ) = 0.13225,   MAE(z, ẑ) = 0.345
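
The closed-form computations in a), b) and d) can be reproduced with a short NumPy sketch (the leading column of ones in X is the intercept term added to the data above):

    import numpy as np

    # Training data of exercise 1; the leading column of ones is the intercept term
    X = np.array([[1, 1, 1],
                  [1, 2, 1],
                  [1, 1, 3],
                  [1, 3, 3]], dtype=float)
    z = np.array([1.4, 0.5, 2.0, 2.5])

    # a) closed-form ordinary least squares solution
    w = np.linalg.inv(X.T @ X) @ X.T @ z
    print(w)                              # approx [0.275, 0.02, 0.645]

    # b) prediction for x_new = [2, 3] (prepend 1 for the intercept)
    print(np.array([1, 2, 3]) @ w)        # approx 2.25

    # d) training MSE and MAE
    z_hat = X @ w
    print(np.mean((z - z_hat) ** 2))      # approx 0.13225
    print(np.mean(np.abs(z - z_hat)))     # approx 0.345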

e) Are there biases on the residuals against y1? And against y2?

   residuals = z − ẑ = (0.46, −0.46, −0.23, 0.23)


   [Two scatter plots: residuals against y1 (left) and residuals against y2 (right), with residuals ranging roughly between −0.6 and 0.6.]

There is no evidence of bias: the residuals appear to be randomly distributed against both y1 and y2.
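
One possible way to obtain the two residual plots above, as a self-contained matplotlib sketch (the plot styling is an arbitrary choice):

    import numpy as np
    import matplotlib.pyplot as plt

    X = np.array([[1, 1, 1], [1, 2, 1], [1, 1, 3], [1, 3, 3]], dtype=float)
    z = np.array([1.4, 0.5, 2.0, 2.5])
    w = np.linalg.inv(X.T @ X) @ X.T @ z
    residuals = z - X @ w                 # approx [0.46, -0.46, -0.23, 0.23]

    # Scatter the residuals against each feature to check for systematic patterns
    fig, axes = plt.subplots(1, 2, figsize=(8, 3))
    for ax, col, name in zip(axes, (1, 2), ("y1", "y2")):
        ax.scatter(X[:, col], residuals)
        ax.axhline(0, color="gray", linewidth=0.5)
        ax.set_xlabel(name)
        ax.set_ylabel("residual")
    plt.show()
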
f) Compute the closed form solution considering a Ridge regularization term with λ = 0.2.

   w = (XᵀX + λI)⁻¹ Xᵀz = (0.24, 0.05, 0.63)ᵀ
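
The Ridge solution can be checked the same way; note that this closed form regularizes the intercept weight together with the others (scikit-learn's Ridge, by contrast, leaves the intercept unpenalized, so its coefficients would differ slightly):

    import numpy as np

    X = np.array([[1, 1, 1], [1, 2, 1], [1, 1, 3], [1, 3, 3]], dtype=float)
    z = np.array([1.4, 0.5, 2.0, 2.5])

    lam = 0.2
    w_ridge = np.linalg.inv(X.T @ X + lam * np.eye(X.shape[1])) @ X.T @ z
    print(w_ridge)                        # approx [0.24, 0.05, 0.63]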

g) Compare the hyperplanes obtained using ordinary least squares and Ridge regression.

As expected, the weight vector describing the Ridge hyperplane has a smaller norm than the one obtained with ordinary least squares.

h) Why is Lasso regression suggested for data spaces of higher dimensionality?


Lasso provides an elegant way of regularizing predictors by producing a sparse weight vector w: zero entries correspond to variables that do not affect the regression. In high-dimensional spaces it can therefore be seen as an alternative to feature selection, supporting the learning convergence and generalization ability of the regression model.
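
As an illustration of this sparsity effect, fitting scikit-learn's Lasso on the data of exercise 1 already zeroes out the weakly informative y1 coefficient (alpha = 0.1 is an arbitrary choice for the sketch):

    import numpy as np
    from sklearn.linear_model import Lasso

    X = np.array([[1, 1], [2, 1], [1, 3], [3, 3]], dtype=float)
    z = np.array([1.4, 0.5, 2.0, 2.5])

    lasso = Lasso(alpha=0.1).fit(X, z)
    print(lasso.coef_)        # approx [0.0, 0.55]: the y1 coefficient is exactly zero
    print(lasso.intercept_)   # approx 0.5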

2. Consider the following training data, where output is an ordinal variable:

        y1   y2   output
   x1    1    1     1
   x2    2    1     1
   x3    1    3     0
   x4    3    3     0

a) Find a linear regression using the closed form solution

   w = (XᵀX)⁻¹ Xᵀz = (1.5, 0, −0.5)ᵀ

b) Assuming the output threshold θ=0.5, use the regression to classify 𝐱 new = [2 2.5]𝑇

The input is classified as 0, since output(x_new) = 1.5 + 0 × 2 − 0.5 × 2.5 = 0.25 < 0.5.
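
A sketch of both steps, reusing the closed form and then thresholding the regression output at θ = 0.5:

    import numpy as np

    # Training data of exercise 2 (intercept column prepended)
    X = np.array([[1, 1, 1], [1, 2, 1], [1, 1, 3], [1, 3, 3]], dtype=float)
    z = np.array([1, 1, 0, 0], dtype=float)

    # a) closed-form linear regression
    w = np.linalg.inv(X.T @ X) @ X.T @ z
    print(w)                              # approx [1.5, 0.0, -0.5]

    # b) classify x_new = [2, 2.5] with threshold 0.5
    score = np.array([1, 2, 2.5]) @ w     # 0.25
    print(1 if score >= 0.5 else 0)       # 0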

3. Consider the following data to learn a model z = w₁x₁ + w₂x₂ + ε, where ε ∼ N(0, 0.1):

        y1   y2   output
   x1    3   -1     2
   x2    4    2     1
   x3    2    2     1

Compare:

a) w = [w₁ w₂]ᵀ using the maximum likelihood approach

   The maximum likelihood estimate is given by (proof on the slides): w = (XᵀX)⁻¹ Xᵀz.
   Solve similarly to the previous exercises; note that the model has no intercept term.

b) w using the Bayesian approach, assuming p(w) = N(w | μ = [0 0]ᵀ, Σ = [0.2 0; 0 0.2])

   The maximum a posteriori estimate is given by (proof on the slides): w = (XᵀX + λI)⁻¹ Xᵀz,
   with λ = σ²_noise / σ²_prior = 0.1² / 0.2² = 0.25. Solve similarly to 1.f).
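
A sketch for both estimates, assuming (as above) λ = 0.1²/0.2² and no intercept column, since the model has no bias term:

    import numpy as np

    # Training data of exercise 3; the model z = w1*x1 + w2*x2 + eps has no intercept
    X = np.array([[3, -1], [4, 2], [2, 2]], dtype=float)
    z = np.array([2, 1, 1], dtype=float)

    # a) maximum likelihood estimate (equals ordinary least squares)
    w_ml = np.linalg.inv(X.T @ X) @ X.T @ z
    print(w_ml)                           # approx [0.5, -0.278]

    # b) maximum a posteriori estimate with the zero-mean Gaussian prior
    lam = 0.1 ** 2 / 0.2 ** 2             # 0.25, the ratio used in these notes
    w_map = np.linalg.inv(X.T @ X + lam * np.eye(2)) @ X.T @ z
    print(w_map)                          # approx [0.491, -0.261]
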
4. Identify a transformation to aid the linear modelling of the following data points. Sketch the predicted surface.

        y1      y2     output
   x1  -0.95   0.62      0
   x2   0.63   0.31      0
   x3  -0.12  -0.21      1
   x4  -0.24  -0.5       0
   x5   0.07  -0.42      1
   x6   0.03   0.91      0
   x7   0.05   0.09      1
   x8  -0.83   0.22      0

Plotting the data points, we see that the labels seem to change with the distance from the origin. One way to capture this is to apply a quadratic feature transform

   φ(y₁, y₂) = (y₁², y₂²)

   Φ = ( 1   (−0.95)²   0.62²   )        ( 0 )
       ( 1    0.63²     0.31²   )        ( 0 )
       ( 1   (−0.12)²  (−0.21)² )        ( 1 )
       ( 1   (−0.24)²  (−0.5)²  )    z = ( 0 )
       ( 1    0.07²    (−0.42)² )        ( 1 )
       ( 1    0.03²     0.91²   )        ( 0 )
       ( 1    0.05²     0.09²   )        ( 1 )
       ( 1   (−0.83)²   0.22²   )        ( 0 )

   w = (ΦᵀΦ)⁻¹ Φᵀz = (0.817, −0.865, −0.95)ᵀ

   The predicted surface is ẑ = 0.817 − 0.865 y₁² − 0.95 y₂².
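
A sketch of the transform and the closed-form fit on the transformed features:

    import numpy as np

    Y = np.array([[-0.95, 0.62], [0.63, 0.31], [-0.12, -0.21], [-0.24, -0.5],
                  [0.07, -0.42], [0.03, 0.91], [0.05, 0.09], [-0.83, 0.22]])
    z = np.array([0, 0, 1, 0, 1, 0, 1, 0], dtype=float)

    # Quadratic feature transform with an intercept column: phi(y1, y2) = (1, y1^2, y2^2)
    Phi = np.column_stack([np.ones(len(Y)), Y[:, 0] ** 2, Y[:, 1] ** 2])

    w = np.linalg.inv(Phi.T @ Phi) @ Phi.T @ z
    print(w)                              # approx [0.817, -0.865, -0.95]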

5. Consider the logarithmic and quadratic transformations φ₁(x) = log(x), φ₂(x) = x²:

        input   output
   x1     3      1.5
   x2     4      9.3
   x3     6     23.4
   x4    10     45.8
   x5    12     60.1

a) Plot both of the closed form regressions.
   Φ₁ = ( 1  1.0986 )        (  1.5 )
        ( 1  1.3863 )        (  9.3 )
        ( 1  1.7918 )    z = ( 23.4 )
        ( 1  2.3026 )        ( 45.8 )
        ( 1  2.4849 )        ( 60.1 )

   w₁ = (Φ₁ᵀΦ₁)⁻¹ Φ₁ᵀz = (−47.02, 41.395)ᵀ

   ẑ = −47.02 + 41.395 log(x)

   Φ₂ = ( 1    9 )        (  1.5 )
        ( 1   16 )        (  9.3 )
        ( 1   36 )    z = ( 23.4 )
        ( 1  100 )        ( 45.8 )
        ( 1  144 )        ( 60.1 )

   w₂ = (Φ₂ᵀΦ₂)⁻¹ Φ₂ᵀz = (2.7895, 0.4136)ᵀ

   ẑ = 2.7895 + 0.4136 x²

b) Which one minimizes the sum of squared errors on the original training data?

   MSE_log = 9.7618, MSE_quadratic = 13.1273, so the logarithmic transformation achieves the lower squared error on the training data.
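
Both fits and their training MSEs can be reproduced with a short sketch:

    import numpy as np

    x = np.array([3, 4, 6, 10, 12], dtype=float)
    z = np.array([1.5, 9.3, 23.4, 45.8, 60.1])

    def fit_and_mse(transform):
        # Closed-form fit on the transformed input (with intercept) and training MSE
        Phi = np.column_stack([np.ones(len(x)), transform(x)])
        w = np.linalg.inv(Phi.T @ Phi) @ Phi.T @ z
        return w, np.mean((z - Phi @ w) ** 2)

    print(fit_and_mse(np.log))            # approx [-47.02, 41.395], MSE approx 9.76
    print(fit_and_mse(np.square))         # approx [2.79, 0.414],   MSE approx 13.13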

6. Select the criteria promoting a smoother regression model:


• Applying Lasso and Ridge regularization to linear regression models True
• Increasing the depth of a decision tree regressor False
• Increasing the k of a kNN regressor True
• Parameterizing a kNN regressor with uniform weights instead of distance-based weights False

Programming quests
7. Consider the housing dataset available at https://web.ist.utl.pt/~rmch/dscience/data/housing.arff and the Regression notebook available at the course's webpage:

a) Compare the determination coefficient of the non-regularized, Lasso and Ridge linear regression

b) Compare the MAE and RMSE of linear, kNN and decision tree regressors on housing
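
A possible starting point with scikit-learn, assuming housing.arff has been downloaded locally, that its last attribute is the numeric target and all remaining attributes are numeric; the train/test split, neighbour count and regularization strengths are arbitrary choices for the sketch (the course's Regression notebook may set these up differently):

    import numpy as np
    import pandas as pd
    from scipy.io import arff
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression, Lasso, Ridge
    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

    # Load the ARFF file and split features/target (assumes the last column is the target)
    data, _ = arff.loadarff("housing.arff")
    df = pd.DataFrame(data)
    X, y = df.iloc[:, :-1].values, df.iloc[:, -1].values
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    # a) determination coefficient (R^2) of non-regularized, Lasso and Ridge regression
    for model in (LinearRegression(), Lasso(alpha=1.0), Ridge(alpha=1.0)):
        r2 = r2_score(y_test, model.fit(X_train, y_train).predict(X_test))
        print(type(model).__name__, "R^2 =", round(r2, 3))

    # b) MAE and RMSE of linear, kNN and decision tree regressors
    for model in (LinearRegression(), KNeighborsRegressor(n_neighbors=5),
                  DecisionTreeRegressor(random_state=0)):
        pred = model.fit(X_train, y_train).predict(X_test)
        print(type(model).__name__,
              "MAE =", round(mean_absolute_error(y_test, pred), 3),
              "RMSE =", round(np.sqrt(mean_squared_error(y_test, pred)), 3))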
