
Practical Session - Regularized Regression Models

September 22nd, 2024

Goal of the practical session


• Model selection for linear models in R: Ridge and Lasso regression.

Remarks
• The work has to be carried out by teams of 2 students, and RStudio is used to perform the practical sessions.
• A report should be written only for exercise IV, automatically generated from an R Markdown file in RStudio.
• The R Markdown file and the corresponding pdf file have to be uploaded before the next practical session
on the ENSIIE project web site, in the folder MRR2024TP2.

I. Tests of significance and model selection


a) Analyze and study the following instructions. Specify the underlying theoretical model.
n=100; X=cbind(((1:n)/n)^3,((1:n)/n)^4); Y=X%*%c(1,1)+rnorm(n)/4;
res=summary(lm(Y~X)); print(res); print(res$coef[2,4]);

Compare the results provided by the multiple regression model with the results computed independently using two
simple models. Conclusion.
reg1=lm(Y~X[,1]);print(summary(reg1));
reg2=lm(Y~X[,2]);print(summary(reg2));

b) Execute the previous instructions several times (2 or 3 times) and describe the behaviour of the coefficient
estimators. Compute the empirical correlation matrix with the cor() instruction.
cor(X[,1],X[,2])

II Model selection in a linear regression framework


The following table details several criteria used in model selection. We denote RSS = Σ_i (y_i − ŷ_i)².

Notation   Definition                             Criterion                               Objective      R Instruction
R²         R² = Σ(ŷ_i − ȳ)² / Σ(y_i − ȳ)²         R-squared                               -              lm()
R²adj      R²adj = 1 − (n−1)/(n−p) (1 − R²)       Adjusted R-squared                      Max. R²adj     lm()
σ̂²_p       σ̂²_p = RSS/(n − p)                     Unbiased residual variance estimate     Min. σ̂²_p      lm()
AIC        AIC ≈ n log(RSS/n) + 2p                Akaike Information Criterion (1971)     Min. AIC       extractAIC()
BIC        BIC ≈ n log(RSS/n) + log(n) p          Bayesian Information Criterion (1978)   Min. BIC       extractAIC(,k=log(n))
Cp         Cp = RSS(p)/σ̂² − (n − 2p)              Mallows' Cp (1973)                      Min. Cp        regsubsets()
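As a minimal illustration (assuming a model reg already fitted with lm(), as in exercise I), these quantities can be obtained in R as follows:
res=summary(reg);                                  # reg is any fitted lm() model
print(res$r.squared); print(res$adj.r.squared);    # R2 and adjusted R2
print(res$sigma^2);                                # unbiased residual variance RSS/(n-p)
print(extractAIC(reg));                            # returns (edf, AIC)
print(extractAIC(reg,k=log(nobs(reg))));           # BIC: penalty of log(n) per parameter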

The step() function is used to compare and select parsimonious models (models based on few variables). The
function starts from the global model and removes one variable at each step. The procedure stops when the
coefficient of the variable that would be removed next is significant (threshold α = 0.1).
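For instance, a minimal backward-selection sketch (the data frame mydata and its columns are hypothetical placeholders):
regfull=lm(Y~.,data=mydata);                 # full model with all covariates (hypothetical data)
s0=step(regfull,direction='backward');       # removes one variable at each step
print(formula(s0)); summary(s0);             # selected sparse model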

Applications
Analyze the files “USCrimeinfo.txt” and “UsCrime.txt”. The target variable, Y, is stored in the first column.
• Load the file into the R environment using tab=read.table(). What is the number of available observations?
Provide a scatterplot of all joint distributions. Conclusion.
• Compute the empirical correlation matrix. Conclusion. Use the corrplot() function of the corrplot library to
highlight potential linear relations between variables. (A minimal loading sketch is given after this list.)
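A minimal loading sketch for these two bullets, assuming the file is whitespace-separated with a header line (check the file format):
tab=read.table("UsCrime.txt",header=TRUE);   # adapt header/sep to the actual file
print(dim(tab));                             # number of observations and variables
pairs(tab);                                  # scatterplots of all joint distributions
C=cor(tab); print(round(C,2));               # empirical correlation matrix
library(corrplot); corrplot(C);              # graphical display of the correlations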

A. Multiple regression model.


The goal is now to assess whether a linear model can be used to explain the target variable Y. Specify the model.
a) What can you briefly say about the results provided by a linear model on the USCrime data set, using the function
reg=lm("R~.",data=tab), where Y denotes the target variable and X the explanatory variables (p = 14)?
b) Is the linear model globally relevant? Justify your answer.
c) What can you say about the significance of the coefficients? Justify your answer.
d) Compute the Residual Sum of Squares (RSS) in this case, with p = 14 variables.
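A possible sketch for questions a) to d), reusing the data frame tab loaded above (the target column is assumed to be named R, as in the lm() call of question a)):
reg=lm(R~.,data=tab);                        # R explained by the 14 other columns
summary(reg);                                # global F-test and individual t-tests
RSS=sum(residuals(reg)^2); print(RSS);       # residual sum of squares with p = 14 variables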

B. Model selection.
The goal of this section is to find a sparse model, based on a small subset of variables of size p0, to explain the target
variable Y. Before writing any R instruction, read carefully the help of the R step() function.
a) Backward regression. Study and implement the following instruction
regbackward=step(reg,direction='backward'); summary(regbackward)

Comment on the successively removed variables. What is the final model? How many variables are selected?
b) Forward regression.
regforward=step(lm(R~1,data=tab),list(upper=reg),direction='forward');
summary(regforward);

Comment on the successively added variables. Compute the AIC criterion using the instruction AIC(). What is the final
model? How many variables are selected? Compare this model with the model computed with the backward
regression method.
c) Stepwise regression:
regboth=step(reg,direction='both')
summary(regboth)

Comment on the added and removed variables for the stepwise regression. Compare the selected models obtained with
all the previous selection methods.
d) Remarks. Use the formula(s0) function, where s0 denotes the R output object computed with the step()
function. Note that the instruction reg0=lm(formula(s0),data=tab); lets you reuse the selected model for
further applications, and that summary(reg0) provides detailed information on this model.
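For instance, reusing the objects reg and tab defined above:
s0=step(reg,direction='both');               # any of the selection procedures above
reg0=lm(formula(s0),data=tab);               # refit the selected model
summary(reg0); extractAIC(reg0);             # detailed information and AIC of the selected model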

III RIDGE and LASSO penalized regression.
A. Simulated data. Illustration.
a) Execute the following instructions and comment on the results.
rm(list=ls()); n=10000; p=5;
X=matrix(rnorm(n*(p)),nrow=n,ncol=p); X=scale(X)*sqrt(n/(n-1));
beta=matrix(10*rev(1:p),nrow=p,ncol=1); print(beta)
epsi=rnorm(n,1/n^2); Y=X%*%beta +epsi;
Z=cbind(Y,data.frame(X)); Z=data.frame(Z);

b) Considering a linear model, provide an estimation of the coefficients using the X and Y data with the help of the
lm() function. Conclusion.
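A minimal sketch for question b), reusing the X and Y objects simulated in a):
reglm=lm(Y~X);                               # ordinary least squares on the simulated data
summary(reglm);                              # compare the estimates with beta = (50,40,30,20,10)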
Execute t(X)%*%Y/n and comment on the result. The lars() function of the R lars library can be used to implement
a LASSO regression, as can the glmnet() function of the R glmnet library. Load the library into your R environment
and read carefully the help of the function.
In this section, the goal is now to implement and study a linear model with an ℓ1 penalization on the coefficients,
using the X and Y data.
c) Execute:
library(lars);
modlasso=lars(X,Y,type="lasso"); attributes(modlasso);

What do the fields modlasso$meanx and modlasso$normx store? What are they for?
d) Comment the following graphs:
par(mfrow=c(1,2));
plot(modlasso); plot(c(modlasso$lambda,0),pch=16,type="b",col="blue"); grid()

e) Execute the following instructions and comment on the results. Why is it possible to guess, before any computation,
the results computed with the LASSO in this situation? Justify carefully.
print(coef(modlasso));
coef=predict.lars(modlasso,X,type="coefficients",mode="lambda",s=2500);
coeflasso=coef$coefficients;
par(mfrow=c(1,1)); barplot(coeflasso,main='lasso, lambda=2500',col='cyan');

B. Applications
The data studied in this section are indicators of development used in economy, demography and sociology in the
United States, observed over a period of 15 years. Our goal in this application is to identify the indicators which best
explain the CO2 emissions observed in the atmosphere. For this purpose, the RIDGE and the LASSO regression
are both used and studied.
a) Describe the content of the files “usa_indicators_info.txt” and “usa_indicators.txt”.
b) Load the data into the R environment using tab=read.table(). What are the number of observations and
the number of variables? Can you use a multiple linear model in this situation? Justify your answer.
c) What is the variable used for the CO2 emissions? Plot the temporal evolution of this indicator on a graph.
d) As the data correspond to various indicators, the units may also be very different. Explain why this can be a
difficulty both for regular linear models and for penalized linear models. Scale the variables of the data set using
the function scale(tab, center=FALSE).
e) Use the lm() function to estimate the parameters of a linear model. Conclusion.
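A minimal loading and scaling sketch for questions b) to d) (the separator and header options are assumptions to be checked against the file; the CO2 column asked for in question c) is deliberately not named here):
tab=read.table("usa_indicators.txt",header=TRUE,sep=';');   # adapt sep/header to the actual file
print(dim(tab));                                            # compare n (15 years) with the number of variables
tabs=data.frame(scale(tab,center=FALSE));                   # put all indicators on comparable scales
# the CO2 column identified in c) can then be plotted against the Year variable
# and used as the target of the lm() call of question e)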

RIDGE. Regression with ℓ2 penalization.
a) Recall the definition of the Ridge regression.
The function lm.ridge() of the MASS library is used to compute a Ridge regression. Load the MASS library into
your R environment, and read carefully the help of the function lm.ridge().
b) Compute a Ridge regression for values of the penalization parameter equal to λ = 0 and λ = 100, without using
the Year variable in your model. Print the computed coefficients using the instruction coef(). Plot the five
largest coefficients. What do they represent? What are the differences between the
coef(resridge) and resridge$coef instructions?
c) Compute Ridge regression models for different values of λ, from 0 to 10 with an increment of 0.01
(λ = seq(0, 10, 0.01)). Plot the performances computed by cross-validation against the values of λ (field $GCV of
the ridge R object, GCV for Generalized Cross-Validation). Plot the evolution of the values of the coefficients
as a function of λ using the instruction plot(resridge). Conclusion. Which model would you advise? Print the
corresponding value of the regularization parameter λ. Print and store automatically the parameters of the
best model with the help of the functions which.min() and coef() #coefridge=...
d) Compute the mean quadratic error between the observed target and the estimated target Ŷridge using matrix
computation, where X denotes the input matrix: Yridge=as.matrix(X)%*%as.vector(coefridge).
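A possible sketch for questions c) and d); the target name YCO2 is a hypothetical placeholder for the CO2 column of the scaled data frame tabs, and the Year column is excluded as in question b):
library(MASS);
lambdaseq=seq(0,10,0.01);
resridge=lm.ridge(YCO2~.-Year,data=tabs,lambda=lambdaseq);       # Ridge path over the grid of lambda values
plot(lambdaseq,resridge$GCV,type='l',xlab='lambda',ylab='GCV');  # cross-validation performance
kbest=which.min(resridge$GCV); print(lambdaseq[kbest]);          # selected regularization parameter
coefridge=coef(resridge)[kbest,];                                # coefficients of the best model (intercept first)
Yridge=coefridge[1]+as.matrix(tabs[,names(coefridge)[-1]])%*%coefridge[-1];  # fitted values
print(mean((tabs$YCO2-Yridge)^2));                               # mean quadratic error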

LASSO. Regression with ℓ1 penalization


a) Compute a LASSO regression using the instruction reslasso=lars(X,Y,type="lasso"), where X denotes the
input matrix and Y the target vector.
Execute both of the following instructions: plot(reslasso) and plot(reslasso$lambda). Comment.
b) Plot the values of the model coefficients for λ = 0 with the help of the instruction:
coef=predict.lars(reslasso,X,type="coefficients",mode="lambda",s=0). Conclusion.
c) Plot the values of the coefficients for λ = 100. Conclusion.
Compare these results with the results already obtained with the Ridge regression. Conclusion.
d) Compute the mean quadratic error between the observed target and the estimated target (Ŷlasso):
pY=predict.lars(reslasso,X,type="fit",mode="lambda",s=0.06).
e) How can you choose λ?
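A possible sketch for this part; as above, YCO2 is a hypothetical placeholder for the CO2 column and Year is excluded from the input matrix:
library(lars);
Xmat=as.matrix(tabs[,setdiff(names(tabs),c("YCO2","Year"))]);    # scaled covariates
Yvec=tabs$YCO2;
reslasso=lars(Xmat,Yvec,type="lasso");
par(mfrow=c(1,2)); plot(reslasso); plot(reslasso$lambda,type='b');   # regularization path and lambda values
coef0=predict.lars(reslasso,Xmat,type="coefficients",mode="lambda",s=0)$coefficients;     # lambda = 0
coef100=predict.lars(reslasso,Xmat,type="coefficients",mode="lambda",s=100)$coefficients; # lambda = 100
pY=predict.lars(reslasso,Xmat,type="fit",mode="lambda",s=0.06)$fit;
print(mean((Yvec-pY)^2));                                        # mean quadratic error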

IV. Wind Turbine Modelling
As a data scientist, you are now asked to study the ProjWindTurbine.txt dataset. The aim of this study is to
propose a sparse linear model able to explain the power produced by some wind turbines (the target variable,
Y) given other variables such as (1) the free stream velocity of some components (m/s) (FSV 1-2-3-4), (2) the
rotational speed of some components (RPM 1-2-3-4), (3) the current intensity of some components (mA) (CIN
1-2-3-4), and (4) the power (mW).

A Preliminary
Study the following empirical joint distributions between the variables: (POW, RPM1), (POW, CIN1), (RPM1,
CIN1). What can you observe?
Based on your previous observations, split the initial n = 3000 observations into 3 equal parts called D1, D2, D3 by
choosing smart frontiers using only the 2 covariables RPM1 and CIN1. Each subset contains 1000 observations.
Propose a linear regression model to explain the power given the explanatory variables for each data set D1, D2, D3.
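A minimal sketch for this preliminary study; the split thresholds t1 and t2 are hypothetical placeholders to be read off the scatterplot:
turbinedata=read.table(file="ProjWindTurbine.txt",header=TRUE,sep=',');
pairs(turbinedata[,c("POW","RPM1","CIN1")]);     # empirical joint distributions
# choose two frontiers from the plot, for instance thresholds t1 < t2 on RPM1 (hypothetical):
# D1=subset(turbinedata,RPM1<t1); D2=subset(turbinedata,RPM1>=t1 & RPM1<t2); D3=subset(turbinedata,RPM1>=t2);
# reg1=lm(POW~.,data=D1); summary(reg1);         # one regression model per subset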

B Model selection
Study the possibility of providing a sparse model using forward, backward or stepwise regression, or Ridge and LASSO
regression, for each dataset (D1, D2 or D3).
Conclusion.

C Regression model with a categorical explanatory variable


Run and study the following instructions. Conclusion.
rm(list=ls());
turbinedata=read.table(file="ProjWindTurbine.txt",header=TRUE,sep=',')
numturbine=as.factor(c(rep(1,1000),rep(2,1000),rep(3,1000)));
mydata=cbind(turbinedata,numturbine)
modlm=lm("POW~.",data=mydata);
summary(modlm)
plot(modlm$fit,mydata$POW,type="p"); abline(a=0,b=1,col="red");
