
36-401 Modern Regression HW #5 Solutions

DUE: 10/20/2017 at 3PM

Problem 1 [20 points]


(a)

pairs(stackloss, font.labels = 3, font.axis = 5, pch = 19)

[Scatterplot matrix of Air.Flow, Water.Temp, Acid.Conc., and stack.loss]

Figure 1: Pairwise associations of variables from the stackloss data set

(b)

model <- lm(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc., data = stackloss)


summary(model)

##
## Call:
## lm(formula = stack.loss ~ Air.Flow + Water.Temp + Acid.Conc.,
## data = stackloss)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.2377 -1.7117 -0.4551 2.3614 5.6978
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -39.9197 11.8960 -3.356 0.00375 **
## Air.Flow 0.7156 0.1349 5.307 5.8e-05 ***
## Water.Temp 1.2953 0.3680 3.520 0.00263 **
## Acid.Conc. -0.1521 0.1563 -0.973 0.34405
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.243 on 17 degrees of freedom
## Multiple R-squared: 0.9136, Adjusted R-squared: 0.8983
## F-statistic: 59.9 on 3 and 17 DF, p-value: 3.016e-09
The F-test yields a p-value of 3.016 × 10⁻⁹, which strongly suggests that at least one of the predictors has a significant association with Stack Loss. The individual t-tests suggest that both Air Flow and Water Temperature have significant associations with Stack Loss, even after accounting for all the other predictors.
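
As a sanity check, the reported p-value can be recomputed from the F statistic and its degrees of freedom, both stored in the summary object:

fstat <- summary(model)$fstatistic
# upper tail of the F(3, 17) distribution at the observed statistic
pf(fstat["value"], fstat["numdf"], fstat["dendf"], lower.tail = FALSE)
## 3.016e-09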

(c)

library(knitr)
kable(confint(model, level = 0.9), digits = 2,
caption = "90% Confidence Intervals for Regression Coefficients")

Table 1: 90% Confidence Intervals for Regression Coefficients

5% 95%
(Intercept) -60.61 -19.23
Air.Flow 0.48 0.95
Water.Temp 0.66 1.94
Acid.Conc. -0.42 0.12
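
Each row of Table 1 is just the point estimate plus or minus a t quantile times the standard error; for instance, the Air.Flow interval can be reproduced by hand from the summary output above:

# 90% CI: estimate +/- t_{0.95, 17} * SE
0.7156 + c(-1, 1) * qt(0.95, df = 17) * 0.1349
## 0.4809 0.9503, matching the Air.Flow row of Table 1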

(d)

kable(predict(model, newdata = data.frame(Air.Flow = 58, Water.Temp = 20, Acid.Conc. = 86),
              interval = "prediction", level = 0.99), digits = 3,
      caption = "99% Prediction Interval for Stack Loss given Airflow = 58,
      Water temperature = 20 and Acid = 86", col.names = c("Prediction", "Lower bound",
      "Upper bound"))

Table 2: 99% Prediction Interval for Stack Loss given Airflow = 58,
Water temperature = 20 and Acid = 86

Prediction Lower bound Upper bound
14.411 4.76 24.061
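
For reference, here is a sketch of what predict() computes internally: the prediction standard error adds the residual variance to the variance of the fitted mean.

new <- data.frame(Air.Flow = 58, Water.Temp = 20, Acid.Conc. = 86)
fit <- predict(model, newdata = new, se.fit = TRUE)
se.pred <- sqrt(fit$se.fit^2 + fit$residual.scale^2)  # prediction SE
fit$fit + c(-1, 1) * qt(0.995, df = fit$df) * se.pred
## approximately 4.76 and 24.06, matching Table 2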

(e)

In part (b) we saw that the p-value for the t-test of this hypothesis is 0.3440, so we fail to reject H0.
This hypothesis can equivalently be tested with a partial F-test. Table 3 shows the sequential ANOVA table for the regression; because Acid.Conc. is the last term entered, its row is exactly the partial F-test of H0, and it again yields a p-value of 0.3440 (see also the model comparison after the table).
kable(anova(model), digits = 3, caption = "ANOVA Table for Regression")

Table 3: ANOVA Table for Regression

Df Sum Sq Mean Sq F value Pr(>F)
Air.Flow 1 1750.122 1750.122 166.371 0.000
Water.Temp 1 130.321 130.321 12.389 0.003
Acid.Conc. 1 9.965 9.965 0.947 0.344
Residuals 17 178.830 10.519 NA NA
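
Equivalently, the partial F-test can be carried out by explicitly comparing the reduced model (Acid.Conc. dropped) against the full model:

reduced <- lm(stack.loss ~ Air.Flow + Water.Temp, data = stackloss)
anova(reduced, model)  # partial F-test of H0: no Acid.Conc. effect; p = 0.344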

Problem 2 [20 points]
$$
\begin{aligned}
\hat{\beta} &= \begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{pmatrix}
  = (X^T X)^{-1} X^T Y \\
&= \left[ \begin{pmatrix} X_{11} & \cdots & X_{1n} \\ X_{21} & \cdots & X_{2n} \end{pmatrix}
   \begin{pmatrix} X_{11} & X_{21} \\ \vdots & \vdots \\ X_{1n} & X_{2n} \end{pmatrix} \right]^{-1}
   \begin{pmatrix} X_{11} & \cdots & X_{1n} \\ X_{21} & \cdots & X_{2n} \end{pmatrix}
   \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix} \\
&= \begin{pmatrix} \sum_{i=1}^{n} X_{1i}^2 & \sum_{i=1}^{n} X_{1i} X_{2i} \\
     \sum_{i=1}^{n} X_{1i} X_{2i} & \sum_{i=1}^{n} X_{2i}^2 \end{pmatrix}^{-1}
   \begin{pmatrix} \sum_{i=1}^{n} X_{1i} Y_i \\ \sum_{i=1}^{n} X_{2i} Y_i \end{pmatrix} \\
&= \begin{pmatrix} \sum_{i=1}^{n} X_{1i}^2 & 0 \\ 0 & \sum_{i=1}^{n} X_{2i}^2 \end{pmatrix}^{-1}
   \begin{pmatrix} \sum_{i=1}^{n} X_{1i} Y_i \\ \sum_{i=1}^{n} X_{2i} Y_i \end{pmatrix}
   \quad \text{(the predictors are orthogonal: } \textstyle\sum_{i=1}^{n} X_{1i} X_{2i} = 0 \text{)} \\
&= \begin{pmatrix} \frac{1}{\sum_{i=1}^{n} X_{1i}^2} & 0 \\ 0 & \frac{1}{\sum_{i=1}^{n} X_{2i}^2} \end{pmatrix}
   \begin{pmatrix} \sum_{i=1}^{n} X_{1i} Y_i \\ \sum_{i=1}^{n} X_{2i} Y_i \end{pmatrix} \\
&= \begin{pmatrix} \dfrac{\sum_{i=1}^{n} X_{1i} Y_i}{\sum_{i=1}^{n} X_{1i}^2} \\[12pt]
     \dfrac{\sum_{i=1}^{n} X_{2i} Y_i}{\sum_{i=1}^{n} X_{2i}^2} \end{pmatrix}
\end{aligned}
$$

As we saw in Homework 2, these are the least squares estimators for the two separate univariate regressions through the origin.
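
A quick numerical check of this result, using two hand-picked orthogonal columns (illustrative data, not from the assignment):

x1 <- c(1, -1, 1, -1)
x2 <- c(1, 1, -1, -1)  # sum(x1 * x2) == 0, as the problem assumes
y <- c(2.0, 0.5, 1.5, -1.0)
coef(lm(y ~ x1 + x2 - 1))  # joint regression through the origin
c(sum(x1 * y) / sum(x1^2), sum(x2 * y) / sum(x2^2))  # the two univariate estimators
# both give (1.0, 0.5): the estimates coincide because x1 and x2 are orthogonal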

Problem 3 [20 points]
(a)

X <- matrix(c(1, 1, 1, 1, 4, 3, 10, 7, 5, 4, 9, 10), ncol = 3)
Y <- matrix(c(25, 20, 57, 51), ncol = 1)
model3 <- lm(Y ~ X - 1)
summary(model3)

##
## Call:
## lm(formula = Y ~ X - 1)
##
## Residuals:
## 1 2 3 4
## -0.70098 0.57353 0.06373 0.06373
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## X1 -2.6029 1.3382 -1.945 0.3023
## X2 3.0686 0.3249 9.444 0.0672 .
## X3 3.2059 0.3490 9.185 0.0690 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9102 on 1 degrees of freedom
## Multiple R-squared: 0.9999, Adjusted R-squared: 0.9995
## F-statistic: 2766 on 3 and 1 DF, p-value: 0.01398
mytable <- summary(model3)$coefficients
row.names(mytable) <- c("Intercept","X1","X2")
kable(mytable)

Estimate Std. Error t value Pr(>|t|)
Intercept -2.602941 1.3382353 -1.945055 0.3023200
X1 3.068627 0.3249375 9.443746 0.0671615
X2 3.205882 0.3490389 9.184886 0.0690397

The F-test yields a p-value of 0.0139, which suggests that at least one of X1 and X2 has a significant association with Y. However, as indicated by the t-tests, there is not enough evidence to conclude individually that X1 has a significant association with Y (after accounting for X2), or that X2 has a significant association with Y (after accounting for X1).

(b)

library(xtable)
# you can use this package to convert R objects into nice LaTeX tables/matrices
print(xtable(t(X) %*% X), tabular.environment = "pmatrix",
include.rownames = FALSE, include.colnames = FALSE, hline.after = NULL)
print(xtable(solve(t(X) %*% X)), tabular.environment = "pmatrix",
include.rownames = FALSE, include.colnames = FALSE, hline.after = NULL)

 
$$
X^T X = \begin{pmatrix} 4.00 & 24.00 & 28.00 \\ 24.00 & 174.00 & 192.00 \\ 28.00 & 192.00 & 222.00 \end{pmatrix}
$$

$$
(X^T X)^{-1} = \begin{pmatrix} 2.16 & 0.06 & -0.32 \\ 0.06 & 0.13 & -0.12 \\ -0.32 & -0.12 & 0.15 \end{pmatrix}
$$

(c)

print(xtable(solve(t(X) %*% X) %*% t(X) %*% Y), tabular.environment = "pmatrix",
      include.rownames = FALSE, include.colnames = FALSE, hline.after = NULL)

$$
\hat{\beta} = (X^T X)^{-1} X^T Y = \begin{pmatrix} -2.60 \\ 3.07 \\ 3.21 \end{pmatrix}
$$
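
The same estimate can be computed without forming the explicit inverse, which is generally the numerically safer route:

solve(crossprod(X), crossprod(X, Y))  # solves (X^T X) b = X^T Y directly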

(d)

print(xtable(X %*% solve(t(X) %*% X) %*% t(X)), tabular.environment = "pmatrix",
      include.rownames = FALSE, include.colnames = FALSE, hline.after = NULL)

$$
H = X (X^T X)^{-1} X^T = \begin{pmatrix} 0.41 & 0.49 & 0.05 & 0.05 \\ 0.49 & 0.60 & -0.04 & -0.04 \\ 0.05 & -0.04 & 1.00 & -0.00 \\ 0.05 & -0.04 & -0.00 & 1.00 \end{pmatrix}
$$
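
Two defining properties of the hat matrix, symmetry and idempotence, are easy to verify numerically:

H <- X %*% solve(t(X) %*% X) %*% t(X)
all.equal(H, t(H))     # symmetric: H = H^T
all.equal(H %*% H, H)  # idempotent: H^2 = H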

(e)

$$
\mathrm{Var}(\hat{\beta}) = \sigma^2 (X^T X)^{-1} = \sigma^2 \begin{pmatrix} 2.16 & 0.06 & -0.32 \\ 0.06 & 0.13 & -0.12 \\ -0.32 & -0.12 & 0.15 \end{pmatrix}
$$

Note: This is the variance-covariance matrix of $\hat{\beta}$. Notice it depends on the true (unknown) error variance $\sigma^2$. The standard errors reported by R's summary command are plug-in estimates of the square roots of the diagonal elements of $\mathrm{Var}(\hat{\beta})$. That is, R gives you

$$
\begin{aligned}
\left\{ \widehat{\mathrm{se}}(\hat{\beta}_k) \right\}_{k=0}^{p}
&= \left\{ \hat{\sigma} \sqrt{(X^T X)^{-1}_{jj}} \right\}_{j=1}^{p+1} \\
&= \sqrt{\frac{1}{n - (p+1)} \sum_{i=1}^{n} e_i^2} \; \left\{ \sqrt{(X^T X)^{-1}_{jj}} \right\}_{j=1}^{p+1} \\
&= 0.9101821 \cdot \begin{pmatrix} 1.4702941 \\ 0.3570028 \\ 0.3834825 \end{pmatrix}
 = \begin{pmatrix} 1.3382353 \\ 0.3249375 \\ 0.3490389 \end{pmatrix}
\end{aligned}
$$
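
These plug-in standard errors can be reproduced directly from the residuals and the diagonal of $(X^T X)^{-1}$:

sigma.hat <- sqrt(sum(residuals(model3)^2) / (4 - 3))  # n = 4, p + 1 = 3
sigma.hat * sqrt(diag(solve(t(X) %*% X)))
## 1.3382353 0.3249375 0.3490389, matching the summary output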

Problem 4 [20 points]

$$
\begin{aligned}
\mathrm{tr}(H) &= \mathrm{tr}\!\left( X (X^T X)^{-1} X^T \right) \\
&= \mathrm{tr}\!\left( (X^T X)^{-1} X^T X \right) \\
&= \mathrm{tr}(I_{p+1}) \\
&= p + 1,
\end{aligned}
$$

where we have used the cyclic property of the trace operator, and $I_{p+1}$ denotes the $(p+1) \times (p+1)$ identity matrix.
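
This can be confirmed with the hat matrix from Problem 3(d), where p + 1 = 3:

sum(diag(X %*% solve(t(X) %*% X) %*% t(X)))  # tr(H) = 3 = p + 1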

Problem 5 [20 points]


$$
\begin{aligned}
\hat{Y}^T e &= (HY)^T (Y - HY) \\
&= Y^T H^T (I_n - H) Y \\
&= Y^T H (I_n - H) Y \quad \text{(} H \text{ is symmetric)} \\
&= Y^T (H - H^2) Y \\
&= Y^T (H - H) Y \quad \text{(} H \text{ is idempotent)} \\
&= Y^T \, 0_{n \times n} \, Y \\
&= 0
\end{aligned}
$$
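
A numerical confirmation using the Problem 1 fit, which is zero up to floating-point error:

sum(fitted(model) * residuals(model))  # essentially 0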
