Cu 3008 Assignment 2 Solutions

STAT 3008: Applied Linear Regression is the course title. The document provides solutions to problems from Assignment #2. Problem 1 provides mathematical derivations and expectations of estimators. Problem 2 uses R code to analyze weight and height data and conduct hypothesis tests. Problem 3 analyzes a dataset with 3 observations and derives expectations. Problem 4 fits multiple linear regression models to another dataset and calculates various statistics.

Uploaded by

Jim Hack

STAT 3008: Applied Linear Regression

2014-15 Term 2
Assignment #2 Solutions
Problem 1:

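(a)

$$
\begin{aligned}
E(\hat{Y}'\hat{Y}) &= \operatorname{tr} E\big[(X(X'X)^{-1}X'Y)'\,X(X'X)^{-1}X'Y\big] \\
&= \operatorname{tr}\big(E(Y'X(X'X)^{-1}X'X(X'X)^{-1}X'Y)\big) \\
&= \operatorname{tr}\big(X(X'X)^{-1}X'\,E(YY')\big) \\
&= \operatorname{tr}\big(X(X'X)^{-1}X'\,E(X\beta\beta'X' + ee' + X\beta e' + e\beta'X')\big) \\
&= \operatorname{tr}\big(X(X'X)^{-1}X'\,(X\beta\beta'X' + \sigma^2 I_n + 0)\big) \\
&= \operatorname{tr}\big(X(X'X)^{-1}X'X\beta\beta'X'\big) + \operatorname{tr}\big(X(X'X)^{-1}X'\sigma^2 I_n\big) + 0 \\
&= \beta'X'X\beta + \operatorname{tr}\big(\sigma^2 (X'X)^{-1}X'X\big) = \beta'X'X\beta + (p+1)\sigma^2 .
\end{aligned}
$$

(b)

$$
\begin{aligned}
E(Y'Y) &= E(\beta'X'X\beta + e'e + 2\beta'X'e) \\
&= \beta'X'X\beta + \operatorname{tr}\big(E(ee')\big) + 0 \\
&= \beta'X'X\beta + \operatorname{tr}(\sigma^2 I_n) = \beta'X'X\beta + n\sigma^2 .
\end{aligned}
$$

Hence, $E(Y'Y) = \beta'X'X\beta + (p+1)\sigma^2 + (n-p-1)\sigma^2 = E(\hat{Y}'\hat{Y}) + E(\hat{e}'\hat{e})$.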
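Both expectations reduce to traces of fixed matrices, so the identities can be checked numerically without any simulation. The sketch below is only an illustration: the design matrix `X`, the coefficient vector `beta`, and `sigma2` are made-up values (assumptions, not assignment data), chosen so that n = 6 and p + 1 = 3.

```r
# Hypothetical inputs (assumptions for illustration, not assignment data)
X <- cbind(1, c(1, 2, 3, 4, 5, 6), c(2, 1, 4, 3, 6, 8))   # n = 6, p + 1 = 3
beta <- c(1, -2, 0.5)
sigma2 <- 4
n <- nrow(X)
p1 <- ncol(X)

H <- X %*% solve(t(X) %*% X) %*% t(X)                     # hat matrix
Xb <- X %*% beta                                          # mean vector X beta
EYY <- Xb %*% t(Xb) + sigma2 * diag(n)                    # E(YY') = X b b' X' + sigma^2 I_n
quad <- as.numeric(t(Xb) %*% Xb)                          # beta' X'X beta

E_fit   <- sum(diag(H %*% EYY))                           # E(Yhat'Yhat) = tr(H E(YY'))
E_total <- sum(diag(EYY))                                 # E(Y'Y) = tr(E(YY'))
E_resid <- sum(diag((diag(n) - H) %*% EYY))               # E(ehat'ehat) = tr((I - H) E(YY'))

# Each comparison should return TRUE
all.equal(E_fit, quad + p1 * sigma2)
all.equal(E_total, quad + n * sigma2)
all.equal(E_total, E_fit + E_resid)
```

Any full-rank `X` would do here; the three comparisons mirror parts (a), (b), and the final decomposition.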
Problem 2: (Please refer to the R codes at the end)

(a)
(b) R outputs: $\hat{\beta}_0 = -36.8759$, $\hat{\beta}_1 = 0.5821$, $\hat{\sigma}^2 = 8.456^2 = 71.5039$, $SE(\hat{\beta}_0) = 64.4728$, $SE(\hat{\beta}_1) = 0.3892$
(c) R outputs:

          Df  Sum Sq  Mean Sq  F value  Pr(>F)
x          1  159.95  159.947    2.237  0.1731
Residuals  8  572.01   71.502

Hypotheses: H0: $y = \beta_0$ vs H1: $y = \beta_0 + \beta_1 x$
Decision: Since p-value $= \Pr(F_{1,8} > 2.237) = 0.1731 > 0.05$, we do not reject H0 at $\alpha = 0.05$.
Conclusion: We do not have sufficient evidence that the response and the predictor are dependent.
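The F statistic and its p-value can be recomputed from the sums of squares alone. This is a minimal sketch that assumes only the numbers printed in the ANOVA table above; the variable names are illustrative.

```r
# Values copied from the ANOVA table in part (c)
ss_reg <- 159.947; df_reg <- 1     # regression sum of squares and its df
rss    <- 572.01;  df_res <- 8     # residual sum of squares and its df

f0 <- (ss_reg / df_reg) / (rss / df_res)                 # F statistic = MSreg / MSres
p_value <- pf(f0, df_reg, df_res, lower.tail = FALSE)    # Pr(F_{1,8} > f0)

round(c(F0 = f0, p_value = p_value), 4)
```

Using `lower.tail = FALSE` gives the upper-tail probability directly, which is the quantity compared with 0.05 in the decision step.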

(d) R outputs:

   Estimate  Std. Error  t value  Pr(>|t|)
x    0.5821      0.3892    1.496     0.173

Hypotheses: H0: $\beta_1 = 0$ vs H1: $\beta_1 \neq 0$
Decision: Since p-value $= 2\Pr(t_8 > |t_0|) = 2\Pr(t_8 > 1.496) = 0.1731 > 0.05$, we do not reject H0 at $\alpha = 0.05$.
Conclusion: We do not have sufficient evidence that the response and the predictor are dependent.

(e) Hypotheses: H0: $\beta_1 = 2$ vs H1: $\beta_1 \neq 2$. Under H0, $t_0 = (0.5821 - 2.0)/0.3892 = -3.64$.
Decision: Since p-value $= 2\Pr(t_8 > |t_0|) = 2\Pr(t_8 > 3.64) = 0.0066 < 0.05$, we reject H0 at $\alpha = 0.05$.
Conclusion: We have sufficient evidence that $\beta_1$ is not equal to 2.0.
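Parts (d) and (e) are the same two-sided t test with different null values for the slope. The sketch below wraps that calculation in a small helper; `slope_t_test` is a hypothetical name introduced here, and the inputs are the estimate, standard error, and degrees of freedom reported above.

```r
# Two-sided t test of H0: beta1 = null_value against H1: beta1 != null_value.
# slope_t_test is a hypothetical helper, not part of the assignment code.
slope_t_test <- function(estimate, se, null_value, df) {
  t0 <- (estimate - null_value) / se               # test statistic under H0
  p  <- 2 * pt(abs(t0), df, lower.tail = FALSE)    # 2 * Pr(t_df > |t0|)
  c(t0 = t0, p_value = p)
}

part_d <- slope_t_test(0.5821, 0.3892, null_value = 0, df = 8)   # H0: beta1 = 0
part_e <- slope_t_test(0.5821, 0.3892, null_value = 2, df = 8)   # H0: beta1 = 2

round(rbind(part_d, part_e), 4)
```

Only `null_value` changes between the two calls, which makes it easy to see why the same data fail to reject a slope of 0 but do reject a slope of 2.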
(f) A 95% CI for Wt is given by

$$
\hat{\beta}_0 + \hat{\beta}_1 x_* \pm t_{8,0.025}\,\hat{\sigma}\left(\frac{1}{n} + \frac{(x_* - \bar{x})^2}{SXX}\right)^{1/2} = (53.94598,\ 66.49078).
$$
(g) A 99% PI for Wt is given by

$$
\hat{\beta}_0 + \hat{\beta}_1 x_* \pm t_{8,0.005}\,\hat{\sigma}\left(1 + \frac{1}{n} + \frac{(x_* - \bar{x})^2}{SXX}\right)^{1/2} = (30.41346,\ 90.02330).
$$
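The confidence and prediction intervals differ only by the extra 1 under the square root. The sketch below puts both in one hypothetical helper, `slr_interval`. The coefficients, residual standard error, n = 10, and x* = 166.8 come from the R codes in this document; `xbar` and `sxx` are not printed anywhere in the solutions, so the values used here are assumptions about the `htwt` heights and should be replaced by `mean(x)` and `sum((x - mean(x))^2)` when the data are available.

```r
# Interval for a simple linear regression at x_star.
# extra = 0: CI for the mean response; extra = 1: prediction interval.
# slr_interval is a hypothetical helper, not part of the assignment code.
slr_interval <- function(b0, b1, sigma, n, xbar, sxx, x_star, level, extra = 0) {
  fit <- b0 + b1 * x_star
  se  <- sigma * sqrt(extra + 1 / n + (x_star - xbar)^2 / sxx)
  tq  <- qt(1 - (1 - level) / 2, df = n - 2)       # t quantile with n - 2 df
  c(lower = fit - tq * se, upper = fit + tq * se)
}

# ASSUMED summary statistics for the heights (not stated in the solutions)
xbar <- 165.52
sxx  <- 472.076

ci95 <- slr_interval(-36.8759, 0.5821, 8.456, 10, xbar, sxx, 166.8, level = 0.95)
pi99 <- slr_interval(-36.8759, 0.5821, 8.456, 10, xbar, sxx, 166.8, level = 0.99, extra = 1)
rbind(ci95, pi99)
```

If the assumed `xbar` and `sxx` are right, both rows should land close to the intervals reported in (f) and (g).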

Problem 3:
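(a) $(X'X)^{-1} = \frac{1}{3}$, $\hat{\beta} = (X'X)^{-1}X'Y = \frac{y_1 + y_2 + y_3}{3} = \bar{y}$.

(b)

$$
H = X(X'X)^{-1}X' = \frac{1}{3}\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}, \qquad
Y'(I-H)Y = \frac{1}{3}\left(2Y_1^2 + 2Y_2^2 + 2Y_3^2 - 2Y_1Y_2 - 2Y_1Y_3 - 2Y_2Y_3\right)
$$

$$
\begin{aligned}
E[Y'(I-H)Y] &= \frac{1}{3}E\left(2Y_1^2 + 2Y_2^2 + 2Y_3^2 - 2Y_1Y_2 - 2Y_1Y_3 - 2Y_2Y_3\right) \\
&= \frac{1}{3}\left[2(3 + 2^2) + 2(1 + (-1)^2) + 2(1 + 1^2) - 2(-1 + 2(-1)) - 2(0 + 2(1)) - 2(0 + (-1)1)\right] \\
&= (14 + 4 + 4 + 6 - 4 + 2)/3 = 26/3 .
\end{aligned}
$$

(c)

$$
E(YY') = \begin{pmatrix} E(Y_1^2) & E(Y_1Y_2) & E(Y_1Y_3) \\ E(Y_1Y_2) & E(Y_2^2) & E(Y_2Y_3) \\ E(Y_1Y_3) & E(Y_2Y_3) & E(Y_3^2) \end{pmatrix}
= \begin{pmatrix} 3 + 2^2 & -1 + (-1)(2) & 0 + (2)(1) \\ -1 + (-1)(2) & 1 + (-1)^2 & 0 + (-1)(1) \\ 0 + (2)(1) & 0 + (-1)(1) & 1 + 1^2 \end{pmatrix}
= \begin{pmatrix} 7 & -3 & 2 \\ -3 & 2 & -1 \\ 2 & -1 & 2 \end{pmatrix}
$$

$$
(I-H)\,E(YY') = \frac{1}{3}\begin{pmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{pmatrix}\begin{pmatrix} 7 & -3 & 2 \\ -3 & 2 & -1 \\ 2 & -1 & 2 \end{pmatrix}
= \frac{1}{3}\begin{pmatrix} 15 & -7 & 3 \\ -15 & 8 & -6 \\ 0 & -1 & 3 \end{pmatrix},
$$

$$
E[\operatorname{tr}(Y'(I-H)Y)] = \operatorname{tr}[(I-H)\,E(YY')] = \frac{15 + 8 + 3}{3} = \frac{26}{3}.
$$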
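The trace calculation in part (c) is easy to reproduce numerically. In this sketch the mean vector `mu` and covariance matrix `Sigma` are read off the terms of the solution above (for example 3 + 2^2 for E(Y1^2)); they are an inference from the worked numbers, not a quotation of the problem statement.

```r
# Mean vector and covariance matrix inferred from the terms in the solution
mu    <- c(2, -1, 1)
Sigma <- matrix(c( 3, -1, 0,
                  -1,  1, 0,
                   0,  0, 1), nrow = 3, byrow = TRUE)

X <- matrix(1, nrow = 3, ncol = 1)                  # intercept-only design
H <- X %*% solve(t(X) %*% X) %*% t(X)               # every entry equals 1/3
EYY <- Sigma + outer(mu, mu)                        # E(YY') = Var(Y) + E(Y)E(Y)'

expected_rss <- sum(diag((diag(3) - H) %*% EYY))    # tr[(I - H) E(YY')]
all.equal(expected_rss, 26 / 3)
```

Printing `EYY` and `(diag(3) - H) %*% EYY` gives the two intermediate matrices, which is a convenient way to check each entry of the hand calculation.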

Problem 4:
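(a) From the R Codes, SYY = 517.875, RSS = 145.0752, SSreg = 372.7998, $\hat{\sigma}^2 = 29.01503$, $R^2 = 0.7198$,

$$
\hat{\beta} = \begin{pmatrix} 16.5784 \\ 10.1144 \\ -8.3464 \end{pmatrix}, \quad
\hat{Y} = \begin{pmatrix} 21.88235 \\ 32.49020 \\ 23.65033 \\ 21.88235 \\ 28.95425 \\ 32.49020 \\ 23.65033 \\ 10.00000 \end{pmatrix}, \quad
\hat{e} = \begin{pmatrix} -0.88235 \\ -7.49020 \\ -4.65033 \\ 2.11765 \\ 7.04575 \\ 3.50980 \\ 0.34967 \\ 0.00000 \end{pmatrix},
$$

$$
\widehat{\operatorname{Var}}(\hat{\beta}) = \begin{pmatrix} 24.74812 & 17.35213 & -21.05012 \\ 17.35213 & 41.62614 & -43.99665 \\ -21.05012 & -43.99665 & 47.03090 \end{pmatrix}
$$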

(b) Hypotheses H0: $E(Y|X) = \beta_0 + \beta_1 x_1$ vs H1: $E(Y|X) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$

Source       df      SS      MS      F0  p-value
Regression    1   42.98   42.98  1.4814   0.2779
Residual      5  145.07  29.014
Total         6  188.05

Decision: Since p-value $= 0.2779 > \alpha = 0.05$, we do not reject H0 at $\alpha = 0.05$.
Conclusion: We do not have sufficient evidence that Model 2 is the appropriate model vs Model 1.
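The partial F test needs only the two residual sums of squares and their degrees of freedom. This is a small sketch using the rounded values from the table above; the variable names are illustrative.

```r
# Residual sums of squares and residual df of the two nested models
rss0 <- 188.05; df0 <- 6    # Model 1: y ~ x1
rss1 <- 145.07; df1 <- 5    # Model 2: y ~ x1 + x2

# Partial F statistic: drop in RSS per added parameter over the full-model MSE
f0 <- ((rss0 - rss1) / (df0 - df1)) / (rss1 / df1)
p_value <- pf(f0, df0 - df1, df1, lower.tail = FALSE)   # Pr(F_{1,5} > f0)

round(c(F0 = f0, p_value = p_value), 4)
```

Because the inputs are rounded to two decimals, the last digit of F0 can differ slightly from what `anova(fit0, fit1)` prints on the unrounded fits.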

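(c) $x_* = (1, -2.5, -3)'$, $\tilde{y} = x_*'\hat{\beta} = 16.3317$, $t_{5,0.025} = 2.5706$, $\text{sepred}(\tilde{y} \mid x_*) = \hat{\sigma}\sqrt{1 + x_*'(X'X)^{-1}x_*} = 10.80717$.

A 95% PI for the response is $\tilde{y} \pm t_{5,0.025}\,\text{sepred}(\tilde{y} \mid x_*) = (-11.44902,\ 44.11242)$.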
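As a cross-check on the matrix formula, the same prediction interval can be obtained from `lm()` and `predict()`. The data vectors are the ones listed in the R codes for Problem 4, and the new point is x1 = -2.5, x2 = -3 as in `xstar` there; `new_point` and `pred` are names introduced for this sketch.

```r
# Data as listed in the R codes for Problem 4
y  <- c(21, 25, 19, 24, 36, 36, 24, 10)
x1 <- c(3, 9, 4, 3, 7, 9, 4, 1)
x2 <- c(3, 9, 4, 3, 7, 9, 4, 2)

fit <- lm(y ~ x1 + x2)

# New observation at x1 = -2.5, x2 = -3 (the intercept is added by lm)
new_point <- data.frame(x1 = -2.5, x2 = -3)

# Returns a matrix with columns fit, lwr, upr
pred <- predict(fit, newdata = new_point, interval = "prediction", level = 0.95)
pred
```

Switching to `interval = "confidence"` would give the narrower interval for the mean response at the same point.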

R Codes for Problem #2


### Problem 2(b) ###
library(car); library(alr3); y <- htwt$Wt; x <- htwt$Ht
fit <- lm(y ~ x); fit
# Coefficients:
# (Intercept)            x
#    -36.8759       0.5821
summary(fit)   # the coefficient table is also used in 2(d)
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept) -36.8759    64.4728  -0.572    0.583
# x             0.5821     0.3892   1.496    0.173
# Residual standard error: 8.456 on 8 degrees of freedom
# Multiple R-squared: 0.2185, Adjusted R-squared: 0.1208
# F-statistic: 2.237 on 1 and 8 DF, p-value: 0.1731
### Problem 2(c) ###
anova(fit)
# Analysis of Variance Table
# Response: y
#           Df Sum Sq Mean Sq F value Pr(>F)
# x          1 159.95 159.947   2.237 0.1731
# Residuals  8 572.01  71.502
### Problem 2(e) ###
2 * (1 - pt(abs((0.5821 - 2) / 0.3892), 8))   # two-sided p-value for H0: beta1 = 2
# [1] 0.006559349
### Problem 2(f) ###
wthat <- -36.8759 + 0.5821 * 166.8            # fitted weight at Ht = 166.8
sefit <- 8.456 * sqrt(1/10 + (166.8 - mean(x))^2 / sum((x - mean(x))^2))
c(wthat - qt(0.975, 8) * sefit, wthat + qt(0.975, 8) * sefit)     # 95% CI
# [1] 53.94598 66.49078
### Problem 2(g) ###
wthat <- -36.8759 + 0.5821 * 166.8
sepred <- 8.456 * sqrt(1 + 1/10 + (166.8 - mean(x))^2 / sum((x - mean(x))^2))
c(wthat - qt(0.995, 8) * sepred, wthat + qt(0.995, 8) * sepred)   # 99% PI
# [1] 30.41346 90.02330


R Codes for Problem #4


### Problem 4(a) ###
y <- c(21,25,19,24,36,36,24,10); x1 <- c(3,9,4,3,7,9,4,1); x2 <- c(3,9,4,3,7,9,4,2)
X <- cbind(rep(1, length(x1)), x1, x2)                  # design matrix with intercept column
betahat <- solve(t(X) %*% X) %*% t(X) %*% y             # least squares estimate
yhat <- X %*% betahat; ehat <- y - yhat                 # fitted values and residuals
SYY <- as.numeric(t(y - mean(y)) %*% (y - mean(y)))
RSS <- as.numeric(t(ehat) %*% ehat); SSreg <- SYY - RSS; R2 <- SSreg / SYY
sigmahat2 <- RSS / (length(y) - 2 - 1)                  # n - p - 1 = 5 degrees of freedom
varbetahat <- sigmahat2 * solve(t(X) %*% X)
### Problem 4(b) ### (NOT required, but the anova function will provide the answers right away)
fit0 <- lm(y ~ x1)
anova(fit0)
# Analysis of Variance Table
# Response: y
#           Df Sum Sq Mean Sq F value Pr(>F)
# x1         1 329.82  329.82  10.523 0.0176 *
# Residuals  6 188.05   31.34
fit1 <- lm(y ~ x1 + x2)
anova(fit0, fit1)
# Analysis of Variance Table
# Model 1: y ~ x1
# Model 2: y ~ x1 + x2
#   Res.Df    RSS Df Sum of Sq      F Pr(>F)
# 1      6 188.05
# 2      5 145.07  1    42.977 1.4812 0.2779
### Problem 4(c) ###
xstar <- c(1, -2.5, -3)
xstar %*% betahat
#         [,1]
# [1,] 16.3317
# as.numeric() drops the 1 x 1 matrices so that c(-1, 1) recycles against plain scalars
ypred <- as.numeric(xstar %*% betahat)
sepred <- sqrt(sigmahat2) * sqrt(1 + as.numeric(t(xstar) %*% solve(t(X) %*% X) %*% xstar))
ypred + c(-1, 1) * qt(0.975, length(y) - 2 - 1) * sepred   # 95% PI
# [1] -11.44902 44.11242
