
PCR and PLS Regression

YIK LUN, KEI


[email protected]
This paper is a lab from the book An Introduction to Statistical Learning
with Applications in R. All R code and comments below belong to the book and
its authors.

Principal Components Regression


library(ISLR)
library(pls)
## 
## Attaching package: 'pls'
## 
## The following object is masked from 'package:stats':
## 
##     loadings
set.seed(2)
Hitters<-na.omit(Hitters)                      # drop players with missing Salary
x=model.matrix(Salary~., Hitters)[,-1]         # predictor matrix without the intercept column
y=Hitters$Salary
pcr.fit=pcr(Salary~., data=Hitters, scale=TRUE, validation="CV")
summary(pcr.fit)
## Data:   X dimension: 263 19
##         Y dimension: 263 1
## Fit method: svdpc
## Number of components considered: 19
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV             452    348.9    352.2    353.5    352.8    350.1    349.1
## adjCV          452    348.7    351.8    352.9    352.1    349.3    348.0
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV       349.6    350.9    352.9     353.8     355.0     356.2     363.5
## adjCV    348.5    349.8    351.6     352.3     353.4     354.5     361.6
##        14 comps  15 comps  16 comps  17 comps  18 comps  19 comps
## CV        355.2     357.4     347.6     350.1     349.2     352.6
## adjCV     352.8     355.2     345.5     347.6     346.7     349.8
## 
## TRAINING: % variance explained
##         1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps
## X         38.31    60.16    70.84    79.03    84.29    88.63    92.26
## Salary    40.63    41.58    42.17    43.22    44.90    46.48    46.69
##         8 comps  9 comps  10 comps  11 comps  12 comps  13 comps  14 comps
## X         94.96    96.28     97.26     97.98     98.65     99.15     99.47
## Salary    46.75    46.86     47.76     47.82     47.85     48.10     50.40
##         15 comps  16 comps  17 comps  18 comps  19 comps
## X          99.75     99.89     99.97     99.99    100.00
## Salary     50.55     53.01     53.85     54.61     54.61
validationplot(pcr.fit, val.type="MSEP")

[Validation plot for pcr.fit: cross-validated MSEP for Salary versus number of components.]
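
The same information shown in the plot can also be read directly from the fitted object. As a sketch (not part of the original lab, and assuming the pls package's MSEP() accessor for cross-validated fits), one could locate the number of components with the smallest cross-validated MSEP:

# Sketch: find the number of components with the lowest cross-validated MSEP.
# The first entry of the returned array is the intercept-only model, so
# subtract 1 to convert the index into a component count.
cv.msep = MSEP(pcr.fit, estimate = "CV")$val
which.min(cv.msep) - 1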

set.seed(1)
train=sample(1:nrow(Hitters), nrow(Hitters)/2)     # random half of the rows for training
test=(-train)
y.test=y[test]
pcr.fit=pcr(Salary~., data=Hitters, subset=train, scale=TRUE, validation="CV")
summary(pcr.fit)
## Data:   X dimension: 131 19
##         Y dimension: 131 1
## Fit method: svdpc
## Number of components considered: 19
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV           464.6    396.2    395.5    394.0    393.8    393.0    384.4
## adjCV        464.6    395.8    394.8    393.3    392.9    392.5    381.5
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV       381.3    385.5    387.4     405.6     401.2     403.5     409.6
## adjCV    380.0    383.9    385.6     402.4     398.7     400.8     406.6
##        14 comps  15 comps  16 comps  17 comps  18 comps  19 comps
## CV        406.7     409.3     407.8     402.5     398.6     403.8
## adjCV     403.4     405.8     404.2     398.4     394.5     399.4
## 
## TRAINING: % variance explained
##         1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps
## X         38.89    60.25    70.85    79.06    84.01    88.51    92.61
## Salary    28.44    31.33    32.53    33.69    36.64    40.28    40.41
##         8 comps  9 comps  10 comps  11 comps  12 comps  13 comps  14 comps
## X         95.20    96.78     97.63     98.27     98.89     99.27     99.56
## Salary    41.07    41.25     41.27     41.41     41.44     43.20     44.24
##         15 comps  16 comps  17 comps  18 comps  19 comps
## X          99.78     99.91     99.97    100.00    100.00
## Salary     44.30     45.50     49.66     51.13     51.18

validationplot(pcr.fit, val.type="MSEP")

[Validation plot for the training-set pcr.fit: cross-validated MSEP for Salary versus number of components.]

pcr.pred=predict(pcr.fit, x[test,], ncomp=7)       # predict held-out salaries with M = 7 components
mean((pcr.pred-y.test)^2)
## [1] 96556.22
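
For context, a baseline helps judge this test MSE. As a small sketch that is not part of the original lab, the test error of a null model that predicts every held-out salary with the training-set mean salary can be computed as:

# Hypothetical baseline: test MSE when every prediction is the mean
# training salary; PCR should improve substantially on this.
mean((mean(y[train]) - y.test)^2)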
pcr.fit=pcr(y~x, scale=TRUE, ncomp=7)              # refit on the full data set with M = 7 components


summary(pcr.fit)
## Data:   X dimension: 263 19
##         Y dimension: 263 1
## Fit method: svdpc
## Number of components considered: 7
## TRAINING: % variance explained
##    1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps
## X    38.31    60.16    70.84    79.03    84.29    88.63    92.26
## y    40.63    41.58    42.17    43.22    44.90    46.48    46.69

Partial Least Squares Regression


set.seed(1)
pls.fit=plsr(Salary~., data=Hitters, subset=train, scale=TRUE, validation="CV")
summary(pls.fit)
## Data:   X dimension: 131 19
##         Y dimension: 131 1
## Fit method: kernelpls
## Number of components considered: 19
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV           464.6    394.2    391.5    393.1    395.0    415.0    424.0
## adjCV        464.6    393.4    390.2    391.1    392.9    411.5    418.8
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV       424.5    415.8    404.6     407.1     412.0     414.4     410.3
## adjCV    418.9    411.4    400.7     402.2     407.2     409.3     405.6
##        14 comps  15 comps  16 comps  17 comps  18 comps  19 comps
## CV        406.2     408.6     410.5     408.8     407.8     410.2
## adjCV     401.8     403.9     405.6     404.1     403.2     405.5
## 
## TRAINING: % variance explained
##         1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps
## X         38.12    53.46    66.05    74.49    79.33    84.56    87.09
## Salary    33.58    38.96    41.57    42.43    44.04    45.59    47.05
##         8 comps  9 comps  10 comps  11 comps  12 comps  13 comps  14 comps
## X         90.74    92.55    93.94     97.23     97.88     98.35     98.85
## Salary    47.53    48.42    49.68     50.04     50.54     50.78     50.92
##         15 comps  16 comps  17 comps  18 comps  19 comps
## X          99.11     99.43     99.78     99.99    100.00
## Salary     51.04     51.11     51.15     51.16     51.18

validationplot(pls.fit, val.type="MSEP")

[Validation plot for pls.fit: cross-validated MSEP for Salary versus number of components.]

pls.pred=predict(pls.fit, x[test,], ncomp=2)       # predict held-out salaries with M = 2 components
mean((pls.pred-y.test)^2)
## [1] 101417.5
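
For a side-by-side look at the two chosen models (numbers copied from the test-set results above), the held-out errors can be collected in one place; on this particular split PLS ends up with a slightly higher test MSE than PCR:

# Test MSEs reported above: PCR with M = 7 components vs. PLS with M = 2.
c(PCR = 96556.22, PLS = 101417.5)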
pls.fit=plsr(Salary~., data=Hitters, scale=TRUE, ncomp=2)   # refit on the full data set with M = 2 components
summary(pls.fit)
## Data:   X dimension: 263 19
##         Y dimension: 263 1
## Fit method: kernelpls
## Number of components considered: 2
## TRAINING: % variance explained
##         1 comps  2 comps
## X         38.08    51.03
## Salary    43.05    46.40
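
To see how the original predictors enter the two PLS directions of this final fit, one could (as an optional sketch, not part of the original lab) inspect the predictor loadings through the pls package's loadings() accessor, the same function noted above as masking stats::loadings:

# Sketch: predictor loadings on the two PLS components of the final fit.
loadings(pls.fit)[, 1:2]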

Reference:
James, Gareth, et al. An Introduction to Statistical Learning: With Applications in R. New York: Springer, 2013.