
Polynomial Regression and Step Function

YIK LUN, KEI


[email protected]
This paper is a lab from the book An Introduction to Statistical Learning
with Applications in R. All R code and comments below belong to the book
and its authors.

Polynomial Regression
library(ISLR)
attach(Wage)
fit=lm(wage~poly(age,4),data=Wage)
summary(fit)
## 
## Call:
## lm(formula = wage ~ poly(age, 4), data = Wage)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -98.707 -24.626  -4.993  15.217 203.693 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    111.7036     0.7287 153.283  < 2e-16 ***
## poly(age, 4)1  447.0679    39.9148  11.201  < 2e-16 ***
## poly(age, 4)2 -478.3158    39.9148 -11.983  < 2e-16 ***
## poly(age, 4)3  125.5217    39.9148   3.145  0.00168 ** 
## poly(age, 4)4  -77.9112    39.9148  -1.952  0.05104 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 39.91 on 2995 degrees of freedom
## Multiple R-squared:  0.08626, Adjusted R-squared:  0.08504 
## F-statistic: 70.69 on 4 and 2995 DF,  p-value: < 2.2e-16
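By default, poly(age, 4) builds an orthogonal polynomial basis rather than the raw powers age, age^2, age^3, age^4, which is why the coefficients above do not look like ordinary polynomial coefficients. A minimal sketch of the idea, written in Python/NumPy with hypothetical ages standing in for the Wage data's age column (poly()'s basis matches this QR construction up to sign and scaling):

```python
import numpy as np

# Hypothetical ages standing in for the Wage data's age column.
age = np.linspace(18, 80, 200)

# Raw power basis age, age^2, age^3, age^4: columns are highly correlated.
raw = np.column_stack([age**d for d in range(1, 5)])

# Orthogonalize the powers against the intercept and each other via QR,
# then drop the intercept column -- the same idea as R's poly().
X = np.column_stack([np.ones_like(age), raw])
Q, _ = np.linalg.qr(X)
ortho = Q[:, 1:]

# The resulting columns are pairwise orthogonal (off-diagonal Gram ~ 0).
gram = ortho.T @ ortho
print(np.abs(gram - np.diag(np.diag(gram))).max() < 1e-10)  # True
```

With an orthogonal basis each coefficient can be assessed separately; note that the squared t value of the degree-4 term above, (-1.952)^2 ≈ 3.81, reappears (up to rounding) as the F statistic when anova() compares the degree-3 and degree-4 fits later in the lab.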

fit2=lm(wage~poly(age,4,raw=T),data=Wage)
summary(fit2)
## 
## Call:
## lm(formula = wage ~ poly(age, 4, raw = T), data = Wage)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -98.707 -24.626  -4.993  15.217 203.693 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            -1.842e+02  6.004e+01  -3.067 0.002180 ** 
## poly(age, 4, raw = T)1  2.125e+01  5.887e+00   3.609 0.000312 ***
## poly(age, 4, raw = T)2 -5.639e-01  2.061e-01  -2.736 0.006261 ** 
## poly(age, 4, raw = T)3  6.811e-03  3.066e-03   2.221 0.026398 *  
## poly(age, 4, raw = T)4 -3.204e-05  1.641e-05  -1.952 0.051039 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 39.91 on 2995 degrees of freedom
## Multiple R-squared:  0.08626, Adjusted R-squared:  0.08504 
## F-statistic: 70.69 on 4 and 2995 DF,  p-value: < 2.2e-16
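Using raw=T changes the coefficient estimates but not the fitted values, because the raw and orthogonal bases span the same column space. A sketch of that fact in Python/NumPy, on synthetic data standing in for Wage (ages are standardized only to keep the raw power basis well conditioned):

```python
import numpy as np

# Synthetic stand-in for the Wage data: a noisy polynomial trend in age.
rng = np.random.default_rng(0)
age = rng.uniform(18, 80, 300)
wage = 50 + 2.0 * age - 0.02 * age**2 + rng.normal(0, 5, age.size)

# Standardize age so the raw power basis stays well conditioned.
z = (age - age.mean()) / age.std()

# Raw power basis with an intercept column (as in poly(age, 4, raw = T)) ...
raw = np.column_stack([np.ones_like(z)] + [z**d for d in range(1, 5)])
# ... and an orthogonalized basis (as in poly(age, 4)).
ortho, _ = np.linalg.qr(raw)

beta_raw, *_ = np.linalg.lstsq(raw, wage, rcond=None)
beta_orth, *_ = np.linalg.lstsq(ortho, wage, rcond=None)

# Different coefficients, identical fitted values: the two bases span
# the same column space, so raw=T changes the parametrization only.
print(np.allclose(raw @ beta_raw, ortho @ beta_orth))  # True
```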

agelims<-range(age)
age.grid=seq(min(age),max(age))
preds=predict(fit,newdata=list(age=age.grid),se=TRUE) # want standard errors
se.bands=cbind(preds$fit+2*preds$se.fit, preds$fit-2*preds$se.fit)
plot(age,wage,xlim=agelims,cex=.5,col="darkgrey",main="Degree-4 Polynomial")
lines(age.grid,preds$fit,lwd=2,col="red")
matlines(age.grid,se.bands,lwd=2,col="blue",lty=3)

[Figure: wage (y-axis, roughly 50-300) plotted against age (x-axis, 20-80) as grey points, with the degree-4 polynomial fit in red and the +/-2 SE bands as dashed blue lines. Title: "Degree-4 Polynomial".]
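The se.fit values returned by predict(..., se=TRUE) are pointwise standard errors of the fitted curve: for a linear model with design matrix X, the standard error at a new row x0 is sigma_hat * sqrt(x0'(X'X)^{-1}x0). A hedged sketch of that computation on synthetic data, in Python/NumPy rather than R:

```python
import numpy as np

# Synthetic linear-model data standing in for Wage.
rng = np.random.default_rng(2)
x = rng.uniform(18, 80, 100)
y = 30 + 1.5 * x + rng.normal(0, 8, x.size)

X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma2 = np.sum((y - X @ beta)**2) / (x.size - 2)   # residual variance
XtX_inv = np.linalg.inv(X.T @ X)

# Evaluate the fit and its pointwise standard error on a grid (age.grid).
grid = np.linspace(18, 80, 63)
G = np.column_stack([np.ones_like(grid), grid])
fit = G @ beta
se_fit = np.sqrt(np.einsum('ij,jk,ik->i', G, XtX_inv, G) * sigma2)

# The two dashed curves drawn by matlines() above.
upper, lower = fit + 2 * se_fit, fit - 2 * se_fit
print(bool(np.all(upper > lower)))  # True
```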

anova() sequentially compares each model to the next, more complex one.
Hence either a cubic or a quartic polynomial appears to provide a reasonable
fit to the data, but lower- or higher-order models are not justified.
fit.1= lm(wage~age,data=Wage)
fit.2= lm(wage~poly(age,2) ,data=Wage)
fit.3= lm(wage~poly(age,3) ,data=Wage)
fit.4= lm(wage~poly(age,4) ,data=Wage)
fit.5= lm(wage~poly(age,5) ,data=Wage)
anova(fit.1, fit.2, fit.3, fit.4, fit.5)
## Analysis of Variance Table
## 
## Model 1: wage ~ age
## Model 2: wage ~ poly(age, 2)
## Model 3: wage ~ poly(age, 3)
## Model 4: wage ~ poly(age, 4)
## Model 5: wage ~ poly(age, 5)
##   Res.Df     RSS Df Sum of Sq        F    Pr(>F)    
## 1   2998 5022216                                    
## 2   2997 4793430  1    228786 143.5931 < 2.2e-16 ***
## 3   2996 4777674  1     15756   9.8888  0.001679 ** 
## 4   2995 4771604  1      6070   3.8098  0.051046 .  
## 5   2994 4770322  1      1283   0.8050  0.369682    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
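The F column above can be recovered by hand from the RSS column: for nested lm fits, anova() divides each one-degree-of-freedom drop in RSS by the residual mean square of the largest model (here model 5). A quick check in Python using the printed values:

```python
# RSS and residual df, copied from the anova() table above.
rss   = [5022216, 4793430, 4777674, 4771604, 4770322]
resdf = [2998, 2997, 2996, 2995, 2994]

# Denominator: residual mean square of the largest model (model 5).
mse_full = rss[-1] / resdf[-1]

# Each comparison drops one df, so F = (RSS_{k-1} - RSS_k) / mse_full.
f_stats = [(rss[k - 1] - rss[k]) / mse_full for k in range(1, 5)]
print([round(f, 1) for f in f_stats])  # [143.6, 9.9, 3.8, 0.8]
```

The 3.81 in the fourth row is, up to rounding, the square of the t statistic (-1.952) for poly(age, 4)4 in summary(fit) above, an identity that holds because the polynomial basis is orthogonal.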

Polynomial logistic regression model


fit=glm(I(wage>250)~poly(age,4),data=Wage,family=binomial)
preds=predict(fit,list(age=age.grid),se=T) # predictions are on the logit scale, not probabilities
pfit=exp(preds$fit)/(1+exp(preds$fit))
se.bands.logit=cbind(preds$fit+2*preds$se.fit,preds$fit-2*preds$se.fit)
se.bands=exp(se.bands.logit)/(1+exp(se.bands.logit)) # type="response" could give negative se.bands
plot(age,I(wage>250),xlim=agelims,type="n",ylim=c(0,.2))
points(jitter(age),I((wage>250)/5),cex=.5,pch="|",col="darkgrey")
lines(age.grid,pfit,lwd=2,col="blue")
matlines(age.grid,se.bands,lwd=1,col="blue",lty=3)

[Figure: estimated Pr(wage > 250) from the degree-4 polynomial logistic regression, plotted against age (20-80) in blue with dashed +/-2 SE bands; y-axis 0 to 0.2, with jittered rug marks showing the observations.]
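Transforming the +/-2 SE interval through the inverse logit, as the code above does, guarantees the band stays inside (0, 1); adding +/-2 SE directly on the probability scale (type="response") can produce negative lower bounds. A small sketch of the transform in Python, using a hypothetical fitted value (the Wage numbers are not reproduced here):

```python
import math

def inv_logit(x):
    # Inverse logit maps any real-valued logit to a probability in (0, 1).
    return math.exp(x) / (1 + math.exp(x))

# Hypothetical fitted logit and its standard error at one grid age.
logit_hat, se = -4.2, 1.1

p_hat = inv_logit(logit_hat)
lower = inv_logit(logit_hat - 2 * se)
upper = inv_logit(logit_hat + 2 * se)

# The transformed band is ordered and strictly inside (0, 1).
print(0 < lower < p_hat < upper < 1)  # True
```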

Step Function
fit=lm(wage~cut(age,4),data=Wage)
preds<-predict(fit,newdata=list(age=age.grid),se=T)
se.bands=cbind(preds$fit+2*preds$se.fit,preds$fit-2*preds$se.fit)
plot(age,wage,xlim=agelims,cex=.5,col="darkgrey",main="Step Function")
lines(age.grid,preds$fit,lwd=2,col="red")
matlines(age.grid,se.bands,lwd=2,col="blue",lty=3)


[Figure: step-function fit of wage (roughly 50-300) on age (20-80): grey points with the piecewise-constant fit in red and dashed blue +/-2 SE bands.]
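cut(age, 4) splits the range of age into four equal-width intervals, and lm() then fits one constant per interval: the step-function fit is just a bin-wise mean. A hedged sketch of the same computation in Python/NumPy on synthetic data:

```python
import numpy as np

# Synthetic stand-in for the Wage data.
rng = np.random.default_rng(1)
age = rng.uniform(18, 80, 500)
wage = 60 + 0.5 * age + rng.normal(0, 10, age.size)

# Four equal-width bins over the range of age, as cut(age, 4) makes.
edges = np.linspace(age.min(), age.max(), 5)
bins = np.clip(np.digitize(age, edges, right=True), 1, 4) - 1

# With an intercept plus bin dummies, the least-squares fit reduces to
# the mean of wage inside each bin.
step_fit = np.array([wage[bins == b].mean() for b in range(4)])
print(step_fit.shape)  # (4,)
```

Predicting for a new age just looks up its bin's mean, which is why the red curve in the plot above is piecewise constant.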

Logistic regression under step function


fit=glm(I(wage>250)~cut(age,4),data=Wage,family=binomial)
preds<-predict(fit,list(age=age.grid),se=T)
pfit<-exp(preds$fit)/(1+exp(preds$fit))
se.bands.logit<-cbind(preds$fit+2*preds$se.fit, preds$fit-2*preds$se.fit)
se.bands<-exp(se.bands.logit)/(1+exp(se.bands.logit))
plot(age,I(wage>250),xlim=agelims,type="n",ylim=c(0,.2))
points(jitter(age),I((wage>250)/5),cex=.5,pch="|",col="darkgrey")
lines(age.grid,pfit,lwd=2,col="blue")
matlines(age.grid,se.bands,lwd=1,col="blue",lty=3)

[Figure: estimated Pr(wage > 250) from the step-function logistic regression, plotted against age (20-80) in blue with dashed +/-2 SE bands; y-axis 0 to 0.2, with jittered rug marks for the observations.]

Reference:
James, Gareth, et al. An Introduction to Statistical Learning: with
Applications in R. New York: Springer, 2013.
