
36-401 Modern Regression HW #9 Solutions

DUE: 12/1/2017 at 3PM

Problem 1 [44 points]


(a) (7 pts.)

Let
$$\mathrm{SSE} = \sum_{i=1}^{n} (Y_i - \beta X_i)^2.$$
Then
$$\frac{\partial}{\partial \beta}\,\mathrm{SSE} = -2 \sum_{i=1}^{n} (Y_i - \beta X_i) X_i.$$
Set $\frac{\partial}{\partial \beta}\,\mathrm{SSE} = 0$. Then
$$-2 \sum_{i=1}^{n} (Y_i - \beta X_i) X_i = 0
\quad\Longrightarrow\quad
\sum_{i=1}^{n} (Y_i X_i - \beta X_i^2) = 0
\quad\Longrightarrow\quad
\beta = \frac{\sum_{i=1}^{n} Y_i X_i}{\sum_{i=1}^{n} X_i^2}.$$
And
$$\frac{\partial^2}{\partial \beta^2}\,\mathrm{SSE} = 2 \sum_{i=1}^{n} X_i^2 > 0,$$
so
$$\frac{\sum_{i=1}^{n} Y_i X_i}{\sum_{i=1}^{n} X_i^2}$$
is indeed the unique least squares estimator, denoted $\hat{\beta}$.
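A quick numerical check of this closed form (a minimal sketch on simulated data; the variable names are illustrative and not part of the assignment):

set.seed(1)
x <- runif(50)
y <- 3 * x + rnorm(50)

beta.hat <- sum(y * x) / sum(x^2)   # closed-form estimator from part (a)
coef(lm(y ~ x - 1))                 # no-intercept OLS fit gives the same value
beta.hat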

(b) (7 pts.)

Let
$$\mathrm{WSSE} = \sum_{i=1}^{n} \frac{(Y_i - \beta X_i)^2}{\sigma_i^2}
= \sum_{i=1}^{n} \left(\frac{Y_i - \beta X_i}{\sigma_i}\right)^2.$$
Then
$$\frac{\partial}{\partial \beta}\,\mathrm{WSSE} = -2 \sum_{i=1}^{n} \frac{Y_i X_i - \beta X_i^2}{\sigma_i^2}.$$
Set $\frac{\partial}{\partial \beta}\,\mathrm{WSSE} = 0$. Then
$$\beta = \frac{\sum_{i=1}^{n} Y_i X_i / \sigma_i^2}{\sum_{i=1}^{n} X_i^2 / \sigma_i^2}.$$
And
$$\frac{\partial^2}{\partial \beta^2}\,\mathrm{WSSE} = 2 \sum_{i=1}^{n} \frac{X_i^2}{\sigma_i^2} > 0,$$
so
$$\frac{\sum_{i=1}^{n} Y_i X_i / \sigma_i^2}{\sum_{i=1}^{n} X_i^2 / \sigma_i^2}$$
is indeed the unique weighted least squares estimator, denoted $\tilde{\beta}$.
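As a sanity check, the same closed form can be compared against lm() with weights $1/\sigma_i^2$ (again a sketch on simulated data with illustrative names):

set.seed(2)
x  <- runif(50)
s2 <- (0.5 + x)^2                               # known noise variances sigma_i^2
y  <- 3 * x + rnorm(50, sd = sqrt(s2))

beta.tilde <- sum(y * x / s2) / sum(x^2 / s2)   # closed-form WLS estimator
coef(lm(y ~ x - 1, weights = 1 / s2))           # weighted lm fit gives the same value
beta.tilde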

(c) (7 pts.)

$$E[\hat{\beta}] = E\!\left[\frac{\sum_{i=1}^{n} Y_i X_i}{\sum_{i=1}^{n} X_i^2}\right]
= \frac{\sum_{i=1}^{n} X_i E[Y_i]}{\sum_{i=1}^{n} X_i^2}
= \frac{\beta \sum_{i=1}^{n} X_i^2}{\sum_{i=1}^{n} X_i^2}
= \beta.$$

$$\mathrm{Var}(\hat{\beta}) = \mathrm{Var}\!\left(\frac{\sum_{i=1}^{n} Y_i X_i}{\sum_{i=1}^{n} X_i^2}\right)
= \frac{\sum_{i=1}^{n} X_i^2 \sigma_i^2}{\left(\sum_{i=1}^{n} X_i^2\right)^2}.$$

$$E[\tilde{\beta}] = E\!\left[\frac{\sum_{i=1}^{n} Y_i X_i / \sigma_i^2}{\sum_{i=1}^{n} X_i^2 / \sigma_i^2}\right]
= \frac{1}{\sum_{i=1}^{n} X_i^2 / \sigma_i^2} \sum_{i=1}^{n} \frac{X_i E[Y_i]}{\sigma_i^2}
= \frac{\beta \sum_{i=1}^{n} X_i^2 / \sigma_i^2}{\sum_{i=1}^{n} X_i^2 / \sigma_i^2}
= \beta.$$

$$\mathrm{Var}(\tilde{\beta}) = \mathrm{Var}\!\left(\frac{\sum_{i=1}^{n} Y_i X_i / \sigma_i^2}{\sum_{i=1}^{n} X_i^2 / \sigma_i^2}\right)
= \frac{1}{\left(\sum_{i=1}^{n} X_i^2 / \sigma_i^2\right)^2} \sum_{i=1}^{n} \frac{X_i^2 \,\mathrm{Var}(Y_i)}{\sigma_i^4}
= \frac{\sum_{i=1}^{n} X_i^2 / \sigma_i^2}{\left(\sum_{i=1}^{n} X_i^2 / \sigma_i^2\right)^2}
= \frac{1}{\sum_{i=1}^{n} X_i^2 / \sigma_i^2}.$$

(d) (8 pts.)

$$\mathrm{Var}(\tilde{\beta}) = \frac{1}{\sum_{i=1}^{n} X_i^2 / \sigma_i^2}
= \frac{\sum_{i=1}^{n} X_i^2 \sigma_i^2}{\left(\sum_{i=1}^{n} X_i^2 / \sigma_i^2\right)\left(\sum_{i=1}^{n} X_i^2 \sigma_i^2\right)}
\le \frac{\sum_{i=1}^{n} X_i^2 \sigma_i^2}{\left(\sum_{i=1}^{n} X_i^2\right)^2}
= \mathrm{Var}(\hat{\beta}),$$
where the inequality comes from Cauchy-Schwarz applied to the vectors with entries $X_i/\sigma_i$ and $X_i \sigma_i$:
$$\left(\sum_{i=1}^{n} X_i^2\right)^2
= \left(\sum_{i=1}^{n} \frac{X_i}{\sigma_i} \cdot X_i \sigma_i\right)^2
\le \left(\sum_{i=1}^{n} \frac{X_i^2}{\sigma_i^2}\right)\left(\sum_{i=1}^{n} X_i^2 \sigma_i^2\right).$$
Hence the weighted least squares estimator is never less efficient than the ordinary least squares estimator under this model.
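A quick numeric illustration of the inequality using the closed-form variances from part (c) (a sketch; the values of x and s2 are arbitrary):

set.seed(4)
x  <- runif(20)
s2 <- runif(20, 0.5, 2)                  # arbitrary noise variances sigma_i^2

var.ols <- sum(x^2 * s2) / sum(x^2)^2    # Var(beta-hat) from part (c)
var.wls <- 1 / sum(x^2 / s2)             # Var(beta-tilde) from part (c)
var.wls <= var.ols                       # TRUE, as the Cauchy-Schwarz bound guarantees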

(e) (7 pts.)

Above we showed
$$\tilde{\beta} = \frac{1}{\sum_{i=1}^{n} X_i^2 / \sigma_i^2} \sum_{i=1}^{n} \frac{X_i}{\sigma_i^2}\, Y_i.$$
So $\tilde{\beta}$ is a linear combination of the Normal random variables $Y_i$, and thus is also Normally distributed. We have already found its mean and variance in part (c). Therefore,
$$\tilde{\beta} \sim N\!\left(\beta,\; \frac{1}{\sum_{i=1}^{n} X_i^2 / \sigma_i^2}\right),$$
and a $1-\alpha$ confidence interval for $\beta$ is
$$\tilde{\beta} \pm z_{\alpha/2} \sqrt{\frac{1}{\sum_{i=1}^{n} X_i^2 / \sigma_i^2}}.$$
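A self-contained sketch of computing this interval when the $\sigma_i^2$ are known (simulated data; all object names are illustrative):

set.seed(3)
x  <- runif(50)
s2 <- (0.5 + x)^2                                  # known noise variances
y  <- 3 * x + rnorm(50, sd = sqrt(s2))

beta.tilde <- sum(y * x / s2) / sum(x^2 / s2)      # WLS estimate from part (b)
se.tilde   <- sqrt(1 / sum(x^2 / s2))              # sqrt of Var(beta-tilde) from part (c)
beta.tilde + c(-1, 1) * qnorm(0.975) * se.tilde    # 95% confidence interval for beta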
(f) (8 pts.)

set.seed(100)
n <- 100
b.OLS             <- rep(NA, 1000)
b.WLS.known.var   <- rep(NA, 1000)
b.WLS.unknown.var <- rep(NA, 1000)
for (itr in 1:1000) {
  x <- runif(n)
  s <- x^2                                # known noise SD, so Var(Y_i) = s^2
  y <- 3 * x + rnorm(n, mean = 0, sd = s)

  # OLS through the origin
  out <- lm(y ~ x - 1)
  b.OLS[itr] <- out$coefficients[1]

  # WLS with the true (known) variances: weights = 1 / Var(Y_i) = 1 / s^2
  out2 <- lm(y ~ x - 1, weights = 1/s^2)
  b.WLS.known.var[itr] <- out2$coefficients[1]

  # WLS with variances estimated by smoothing the log squared OLS residuals
  u   <- log((resid(out))^2)
  tmp <- loess(u ~ x)
  s2  <- exp(tmp$fitted)
  w   <- 1/s2
  out3 <- lm(y ~ x - 1, weights = w)
  b.WLS.unknown.var[itr] <- out3$coefficients[1]
}
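The histograms below and Table 1 summarize these 1000 estimates; a minimal sketch of how they could be produced from the vectors above (plotting arguments are illustrative):

# Sampling variance of each estimator (Table 1)
sapply(list(OLS = b.OLS,
            WLS.known.s = b.WLS.known.var,
            WLS.unknown.s = b.WLS.unknown.var), var)

# Histograms of the simulated estimates
par(mfrow = c(1, 3))
hist(b.OLS, main = "OLS beta", xlab = "estimate")
hist(b.WLS.known.var, main = "WLS beta, known s", xlab = "estimate")
hist(b.WLS.unknown.var, main = "WLS beta, unknown s", xlab = "estimate")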

[Figure: histograms of the 1000 simulated estimates, one panel each for "OLS beta", "WLS beta, known s", and "WLS beta, unknown s"; all three are centered near the true value 3.]

Table 1: Variances of Estimators

        OLS   WLS.known.s   WLS.unknown.s
  0.0131942      6.93e-05       0.0008159

If we are dealing with a highly heteroskedastic data set such as this one and do not know the noise variances, weighted least squares based on estimated variances is a better strategy than ordinary least squares: its sampling variance here (about 0.0008) is far smaller than that of OLS (about 0.0132), though still larger than WLS with the true variances (about 0.00007).

Problem 2 [28 points]
(a) (7 pts.)

[Figure: scatterplot matrices of Day, n, ybar, and SD for the Type 0 and Type 1 shoots, and a plot of ybar against Day with the two types distinguished.]
A common intercept looks feasible; however, ybar appears to increase at a faster rate for Type 1.

(b) (7 pts.)

##
## Call:
## lm(formula = ybar ~ Day * Type, data = allshoots)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.74747 -0.21000 0.08631 0.35212 0.89507
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.475879 0.230981 41.025 < 2e-16 ***
## Day 0.187238 0.003696 50.655 < 2e-16 ***
## Type 0.339406 0.329997 1.029 0.309
## Day:Type 0.031217 0.005625 5.550 1.21e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5917 on 48 degrees of freedom
## Multiple R-squared: 0.9909, Adjusted R-squared: 0.9903
## F-statistic: 1741 on 3 and 48 DF, p-value: < 2.2e-16

Table 2: 90% confidence intervals for regression coefficients

                     5 %        95 %
(Intercept)    9.0884732   9.8632855
Day            0.1810386   0.1934377
Type          -0.2140739   0.8928853
Day:Type       0.0217825   0.0406507
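A sketch of the calls that would produce this output and Table 2 (assuming a data frame allshoots with columns ybar, Day, Type, and n, as shown in the Call line above; the object name fit.ols is illustrative):

fit.ols <- lm(ybar ~ Day * Type, data = allshoots)
summary(fit.ols)                 # coefficient table shown above
confint(fit.ols, level = 0.90)   # 90% confidence intervals (Table 2)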

[Figure: studentized residuals plotted against Day and against Type, plus the standard lm diagnostic plots (Residuals vs Fitted, Normal Q-Q, Cook's distance, Residuals vs Leverage); observations 3, 15, 35, and 52 are flagged.]
(c) (7 pts.)

##
## Call:
## lm(formula = ybar ~ Day * Type, data = allshoots, weights = n)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -4.2166 -0.8300 0.1597 0.9882 3.3196
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.488374 0.238615 39.764 < 2e-16 ***
## Day 0.187258 0.003486 53.722 < 2e-16 ***
## Type 0.485380 0.362496 1.339 0.187
## Day:Type 0.030072 0.005800 5.185 4.28e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.675 on 48 degrees of freedom
## Multiple R-squared: 0.9906, Adjusted R-squared: 0.9901
## F-statistic: 1695 on 3 and 48 DF, p-value: < 2.2e-16

Table 3: 90% confidence intervals for weighted regression coefficients

                     5 %        95 %
(Intercept)    9.0881641   9.8885842
Day            0.1814118   0.1931043
Type          -0.1226072   1.0933663
Day:Type       0.0203446   0.0397999
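The weighted fit differs from part (b) only in the weights argument, matching the Call line above (a sketch, again assuming the allshoots data frame; fit.wls is an illustrative name):

fit.wls <- lm(ybar ~ Day * Type, data = allshoots, weights = n)
summary(fit.wls)                 # weighted coefficient table shown above
confint(fit.wls, level = 0.90)   # 90% confidence intervals (Table 3)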

(d) (7 pts.)

[Figure: ybar plotted against Day for Type 0 and Type 1.]
Problem 3 [28 points]
(a) (7 pts.)

##
## Call:
## lm(formula = FoodIndex ~ ., data = BigMac2003)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.0642 -6.3965 -0.0262 5.6928 26.3002
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.09968 11.19872 -0.098 0.9221
## BigMac -0.20569 0.07798 -2.638 0.0107 *
## Bread 0.44383 0.10564 4.201 9.11e-05 ***
## Rice 0.26881 0.13597 1.977 0.0527 .
## Bus 3.59014 2.83317 1.267 0.2101
## Apt 0.01825 0.00434 4.204 9.02e-05 ***
## TeachGI -0.97768 0.86750 -1.127 0.2643
## TeachNI 2.22275 1.13819 1.953 0.0556 .
## TaxRate 0.26530 0.25724 1.031 0.3066
## TeachHours 0.48015 0.20478 2.345 0.0224 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.86 on 59 degrees of freedom
## Multiple R-squared: 0.7981, Adjusted R-squared: 0.7673
## F-statistic: 25.91 on 9 and 59 DF, p-value: < 2.2e-16

[Figure: studentized residuals plotted against each predictor (BigMac, Bread, Rice, Bus, Apt, TeachGI, TeachNI, TaxRate, TeachHours), plus the standard lm diagnostic plots (Residuals vs Fitted, Normal Q-Q, Cook's distance, Residuals vs Leverage); Tokyo, London, Miami, Mumbai, Nairobi, and Shanghai stand out.]
(b) (7 pts.)

                0.5 %      99.5 %
BigMac     -0.4132679   0.0018835

The confidence interval includes 0, so, given all the other variables, we cannot conclude that the price of a BigMac has a significant association with FoodIndex at level α = 0.01.
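A sketch of the corresponding call (assuming the full fit from part (a) is stored as fit.full, an illustrative name):

fit.full <- lm(FoodIndex ~ ., data = BigMac2003)
confint(fit.full, "BigMac", level = 0.99)   # 99% CI, i.e. the 0.5% and 99.5% bounds above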

(c) (7 pts.)

Res.Df         RSS   Df   Sum of Sq          F   Pr(>F)
    67   27532.922   NA          NA         NA       NA
    59    8299.912    8    19233.01   17.08975        0

We are testing the null hypothesis that, given the price of a BigMac, all other variables have zero coefficients (i.e., they add nothing to the regression for FoodIndex). The ANOVA table shows very strong evidence in favor of the alternative (the p-value is essentially 0), so we reject the null.
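This is the partial F-test comparing the reduced and full models; a sketch of how the table could be produced (the model object names are illustrative):

fit.reduced <- lm(FoodIndex ~ BigMac, data = BigMac2003)
fit.full    <- lm(FoodIndex ~ ., data = BigMac2003)
anova(fit.reduced, fit.full)   # partial F-test for the 8 additional predictors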

(d) (7 pts.)

library(DAAG)
# 10-fold cross-validation for the full model and the BigMac-only model
out1 <- cv.lm(df = BigMac2003, form.lm = formula(FoodIndex ~ .), m = 10, plotit = F)
out2 <- cv.lm(df = BigMac2003, form.lm = formula(FoodIndex ~ BigMac), m = 10, plotit = F)

The predictive MSE of each model is estimated by 10-fold cross-validation:
$$\widehat{\mathrm{Err}}_{\text{full}} = 1764, \qquad \widehat{\mathrm{Err}}_{\text{BigMac}} = 472.$$
Since the model using only BigMac has the smaller estimated prediction error, we conclude it has better predictive accuracy.
