
PCR and PLS Regression

YIK LUN, KEI


[email protected]
This paper is a lab from the book An Introduction to Statistical Learning
with Applications in R. All R code and comments below belong to the book and
its authors.

Principal Components Regression


library(ISLR)
library(pls)
## 
## Attaching package: 'pls'
## 
## The following object is masked from 'package:stats':
## 
##     loadings
set.seed(2)
Hitters<-na.omit(Hitters)                      # drop players with missing Salary
x=model.matrix(Salary~., Hitters)[,-1]         # predictor matrix without the intercept column
y=Hitters$Salary
pcr.fit=pcr(Salary~., data=Hitters, scale=TRUE, validation="CV")
summary(pcr.fit)
## Data:   X dimension: 263 19
##         Y dimension: 263 1
## Fit method: svdpc
## Number of components considered: 19
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV             452    348.9    352.2    353.5    352.8    350.1    349.1
## adjCV          452    348.7    351.8    352.9    352.1    349.3    348.0
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV       349.6    350.9    352.9     353.8     355.0     356.2     363.5
## adjCV    348.5    349.8    351.6     352.3     353.4     354.5     361.6
##        14 comps  15 comps  16 comps  17 comps  18 comps  19 comps
## CV        355.2     357.4     347.6     350.1     349.2     352.6
## adjCV     352.8     355.2     345.5     347.6     346.7     349.8
## 
## TRAINING: % variance explained
##         1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps
## X         38.31    60.16    70.84    79.03    84.29    88.63    92.26
## Salary    40.63    41.58    42.17    43.22    44.90    46.48    46.69
##         8 comps  9 comps  10 comps  11 comps  12 comps  13 comps  14 comps
## X         94.96    96.28     97.26     97.98     98.65     99.15     99.47
## Salary    46.75    46.86     47.76     47.82     47.85     48.10     50.40
##         15 comps  16 comps  17 comps  18 comps  19 comps
## X          99.75     99.89     99.97     99.99    100.00
## Salary     50.55     53.01     53.85     54.61     54.61
validationplot(pcr.fit, val.type="MSEP")

[Validation plot for pcr.fit: cross-validated MSEP for Salary versus number of components.]
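
The same information shown in the plot can also be read directly from the fitted object. As a sketch (not part of the original lab, and assuming the pls package's MSEP() accessor for cross-validated fits), one could locate the number of components with the smallest cross-validated MSEP:

# Sketch: find the number of components with the lowest cross-validated MSEP.
# The first entry of the returned array is the intercept-only model, so
# subtract 1 to convert the index into a component count.
cv.msep = MSEP(pcr.fit, estimate = "CV")$val
which.min(cv.msep) - 1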

set.seed(1)
train=sample(1:nrow(Hitters), nrow(Hitters)/2)     # random half of the rows for training
test=(-train)
y.test=y[test]
pcr.fit=pcr(Salary~., data=Hitters, subset=train, scale=TRUE, validation="CV")
summary(pcr.fit)
## Data:   X dimension: 131 19
##         Y dimension: 131 1
## Fit method: svdpc
## Number of components considered: 19
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV           464.6    396.2    395.5    394.0    393.8    393.0    384.4
## adjCV        464.6    395.8    394.8    393.3    392.9    392.5    381.5
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV       381.3    385.5    387.4     405.6     401.2     403.5     409.6
## adjCV    380.0    383.9    385.6     402.4     398.7     400.8     406.6
##        14 comps  15 comps  16 comps  17 comps  18 comps  19 comps
## CV        406.7     409.3     407.8     402.5     398.6     403.8
## adjCV     403.4     405.8     404.2     398.4     394.5     399.4
## 
## TRAINING: % variance explained
##         1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps
## X         38.89    60.25    70.85    79.06    84.01    88.51    92.61
## Salary    28.44    31.33    32.53    33.69    36.64    40.28    40.41
##         8 comps  9 comps  10 comps  11 comps  12 comps  13 comps  14 comps
## X         95.20    96.78     97.63     98.27     98.89     99.27     99.56
## Salary    41.07    41.25     41.27     41.41     41.44     43.20     44.24
##         15 comps  16 comps  17 comps  18 comps  19 comps
## X          99.78     99.91     99.97    100.00    100.00
## Salary     44.30     45.50     49.66     51.13     51.18

validationplot(pcr.fit, val.type="MSEP")

[Validation plot for the training-set pcr.fit: cross-validated MSEP for Salary versus number of components.]

pcr.pred=predict(pcr.fit, x[test,], ncomp=7)       # predict held-out salaries with M = 7 components
mean((pcr.pred-y.test)^2)
## [1] 96556.22
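
For context, a baseline helps judge this test MSE. As a small sketch that is not part of the original lab, the test error of a null model that predicts every held-out salary with the training-set mean salary can be computed as:

# Hypothetical baseline: test MSE when every prediction is the mean
# training salary; PCR should improve substantially on this.
mean((mean(y[train]) - y.test)^2)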
pcr.fit=pcr(y~x, scale=TRUE, ncomp=7)              # refit on the full data set with M = 7 components


summary(pcr.fit)
## Data:   X dimension: 263 19
##         Y dimension: 263 1
## Fit method: svdpc
## Number of components considered: 7
## TRAINING: % variance explained
##    1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps
## X    38.31    60.16    70.84    79.03    84.29    88.63    92.26
## y    40.63    41.58    42.17    43.22    44.90    46.48    46.69

Partial Least Squares Regression


set.seed(1)
pls.fit=plsr(Salary~., data=Hitters, subset=train, scale=TRUE, validation="CV")
summary(pls.fit)
## Data:   X dimension: 131 19
##         Y dimension: 131 1
## Fit method: kernelpls
## Number of components considered: 19
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV           464.6    394.2    391.5    393.1    395.0    415.0    424.0
## adjCV        464.6    393.4    390.2    391.1    392.9    411.5    418.8
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV       424.5    415.8    404.6     407.1     412.0     414.4     410.3
## adjCV    418.9    411.4    400.7     402.2     407.2     409.3     405.6
##        14 comps  15 comps  16 comps  17 comps  18 comps  19 comps
## CV        406.2     408.6     410.5     408.8     407.8     410.2
## adjCV     401.8     403.9     405.6     404.1     403.2     405.5
## 
## TRAINING: % variance explained
##         1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps
## X         38.12    53.46    66.05    74.49    79.33    84.56    87.09
## Salary    33.58    38.96    41.57    42.43    44.04    45.59    47.05
##         8 comps  9 comps  10 comps  11 comps  12 comps  13 comps  14 comps
## X         90.74    92.55    93.94     97.23     97.88     98.35     98.85
## Salary    47.53    48.42    49.68     50.04     50.54     50.78     50.92
##         15 comps  16 comps  17 comps  18 comps  19 comps
## X          99.11     99.43     99.78     99.99    100.00
## Salary     51.04     51.11     51.15     51.16     51.18

validationplot(pls.fit, val.type="MSEP")

[Validation plot for pls.fit: cross-validated MSEP for Salary versus number of components.]

pls.pred=predict(pls.fit, x[test,], ncomp=2)       # predict held-out salaries with M = 2 components
mean((pls.pred-y.test)^2)
## [1] 101417.5
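
For a side-by-side look at the two chosen models (numbers copied from the test-set results above), the held-out errors can be collected in one place; on this particular split PLS ends up with a slightly higher test MSE than PCR:

# Test MSEs reported above: PCR with M = 7 components vs. PLS with M = 2.
c(PCR = 96556.22, PLS = 101417.5)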
pls.fit=plsr(Salary~., data=Hitters, scale=TRUE, ncomp=2)   # refit on the full data set with M = 2 components
summary(pls.fit)
## Data:   X dimension: 263 19
##         Y dimension: 263 1
## Fit method: kernelpls
## Number of components considered: 2
## TRAINING: % variance explained
##         1 comps  2 comps
## X         38.08    51.03
## Salary    43.05    46.40
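
To see how the original predictors enter the two PLS directions of this final fit, one could (as an optional sketch, not part of the original lab) inspect the predictor loadings through the pls package's loadings() accessor, the same function noted above as masking stats::loadings:

# Sketch: predictor loadings on the two PLS components of the final fit.
loadings(pls.fit)[, 1:2]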

Reference:
James, Gareth, et al. An Introduction to Statistical Learning: With Applications in R. New York: Springer, 2013.