3008 Assignment 1 - Due Oct 9th Revised

This document provides instructions for Assignment #1 (with Problem 2b revised) of the Applied Regression Analysis course. Problem 2 asks students to: 1. Show that SXX, the sum of squared deviations of the x-values, equals n(n-1)δ^2 for a simple linear regression in which the first n-1 x-values equal a and the last equals a+nδ. 2. Show that the ordinary least squares estimate of the slope β1 equals the difference between yn and the average of the first n-1 y-values, divided by nδ (the difference between the maximum and minimum x-values). 3. Determine whether the fitted regression line passes through two specified points.

Uploaded by

Oliver Lockwood

STAT 3008: Applied Regression Analysis

2020-21 Term 1
Assignment #1 (Problem 2b revised)

Due: October 9th, 2020 (Friday) at 11:30pm


This assignment covers material up to Section 2.4 of the lecture notes.
You need to show your calculations in detail in order to obtain full marks.
Please submit a hard copy of the R code and results for Problems 5(a) and 5(b).

Problem 1 [30 points]: Suppose the following regression model is fitted to a data set with
observations {(xi, yi), i = 1, 2, …, n}:
$$y_i = \beta x_i + e_i, \qquad e_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$$

(a) [9 points] Based on the least squares method and the fact that $\mathrm{RSS}/\sigma^2 \sim \chi^2_{n-1}$ (df = n-1 since df = n from the data and df = 1 from estimating $\beta$), compute the least squares estimates for $\beta$ and $\sigma^2$.

(b) [5 points] Is $\hat{\beta}$ an unbiased estimator for β? Verify.


(c) [3 points] Show that the fitted regression line passes through the point
$\left(\overline{x^2}, \overline{xy}\right) = \left(\frac{1}{n}\sum_{i=1}^{n} x_i^2,\ \frac{1}{n}\sum_{i=1}^{n} x_i y_i\right)$, but not the average point $(\bar{x}, \bar{y})$.

(d) [7 points] Derive the maximum likelihood estimates (MLE) $\tilde{\beta}$ and $\tilde{\sigma}^2$.
(e) [6 points] Suppose (x1, x2, x3, x4, x5) = (1,2,3,4,5) and (y1, y2, y3, y4, y5) = (3, 8, 11, 17, 20).

What are the values of the least squares estimates $\hat{\beta}$ and $\hat{\sigma}^2$? Does the sum of the residuals equal zero?
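
As a numerical cross-check for part (e), here is a minimal Python sketch (the course itself uses R). It assumes the no-intercept model of Problem 1, for which minimising $\sum (y_i - \beta x_i)^2$ gives $\hat{\beta} = \sum x_i y_i / \sum x_i^2$, and it takes $\hat{\sigma}^2 = \mathrm{RSS}/(n-1)$, as suggested by the degrees of freedom quoted in part (a). The variable names are illustrative only.

```python
# Data from part (e)
x = [1, 2, 3, 4, 5]
y = [3, 8, 11, 17, 20]
n = len(x)

# Least squares slope for the no-intercept model y_i = beta * x_i + e_i
beta_hat = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)

# Residuals, residual sum of squares, and variance estimate with n - 1 df
residuals = [yi - beta_hat * xi for xi, yi in zip(x, y)]
rss = sum(e ** 2 for e in residuals)
sigma2_hat = rss / (n - 1)

print(beta_hat, sigma2_hat, sum(residuals))  # prints: 4.0 0.75 -1.0
```

Because the model has no intercept term, nothing in the least squares conditions forces the residuals to sum to zero, which is what the last printed value illustrates.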

Problem 2 [18 points]: Suppose a simple linear regression is fitted to the data {(xi, yi), i = 1,
2, …, n} with x1 = x2 = … = xn-1 = a and xn = a+nδ.

(a) [5 points] Show that $SXX = n(n-1)\delta^2$.

(b) [7 points] Show that the OLS estimate for β1 is $\hat{\beta}_1 = \frac{1}{n\delta}\left(y_n - \bar{y}_{n-1}\right)$, where $\bar{y}_{n-1} = \frac{1}{n-1}\sum_{i=1}^{n-1} y_i$, i.e. the average of the first (n-1) yi. (Revised: the upper limit of the sum should be (n-1).)

(c) [6 points] Do you think the regression line obtained from the OLS estimates would pass
through Points A and B below? Verify.
Point A: $(x, y) = (a, \bar{y}_{n-1})$    Point B: $(x, y) = (x_n, y_n)$
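
Before proving the identities in Problem 2, it can help to sanity-check them numerically. The Python sketch below uses hypothetical values of n, a, δ and y (none of them come from the assignment). It compares the definitions $SXX = \sum (x_i - \bar{x})^2$ and $\hat{\beta}_1 = SXY/SXX$ with the closed forms $n(n-1)\delta^2$ and $(y_n - \bar{y}_{n-1})/(n\delta)$, and it evaluates the fitted line at x = a and x = xn so the values can be compared with Points A and B.

```python
import math

# Hypothetical values chosen only for illustration
n, a, delta = 6, 2.0, 0.5
y = [1.2, 0.7, 1.9, 1.4, 0.8, 4.0]  # y_1, ..., y_n

# Design from Problem 2: the first n - 1 x-values equal a, the last is a + n*delta
x = [a] * (n - 1) + [a + n * delta]

# OLS quantities computed from their definitions
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
beta1_hat = sxy / sxx
beta0_hat = y_bar - beta1_hat * x_bar

# Closed forms from parts (a) and (b)
y_bar_first = sum(y[:-1]) / (n - 1)  # average of the first n - 1 responses
sxx_closed = n * (n - 1) * delta ** 2
beta1_closed = (y[-1] - y_bar_first) / (n * delta)

# Fitted values at the two x-coordinates that appear in part (c)
fit_at_a = beta0_hat + beta1_hat * a
fit_at_xn = beta0_hat + beta1_hat * x[-1]

print(math.isclose(sxx, sxx_closed), math.isclose(beta1_hat, beta1_closed))
print(fit_at_a, y_bar_first)
print(fit_at_xn, y[-1])
```

A numerical agreement for one data set is not a proof, but a mismatch would signal an algebra slip early.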

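Problem 3 [10 points]: Consider the residuals $\{\hat{e}_i\}$ from the simple linear regression:

$$\hat{e}_i = y_i - \hat{y}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i, \qquad i = 1, 2, \ldots, n$$

where $\hat{\beta}_1 = SXY/SXX$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$ are the OLS estimates for β1 and β0.

Show that $\{\hat{e}_i,\ i = 1, 2, \ldots, n\}$ are uncorrelated with the explanatory variables {xi, i = 1, 2, …, n}.

That is, the sample covariance $\hat{\sigma}(x, \hat{e}) = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(\hat{e}_i - \bar{\hat{e}}) = 0$.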
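
The claim in Problem 3 can be checked numerically before it is proved. The following Python sketch fits the simple linear regression to hypothetical data (not taken from the assignment) using $\hat{\beta}_1 = SXY/SXX$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$, and then evaluates the sample covariance between the x-values and the residuals.

```python
# Hypothetical data, for illustration only
x = [1.0, 2.0, 4.0, 5.0, 7.0, 8.0]
y = [2.1, 2.9, 5.2, 5.8, 8.1, 8.7]
n = len(x)

# OLS estimates from the formulas in the problem statement
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
beta1_hat = sxy / sxx
beta0_hat = y_bar - beta1_hat * x_bar

# Residuals and their sample covariance with x (divisor n - 1)
residuals = [yi - beta0_hat - beta1_hat * xi for xi, yi in zip(x, y)]
e_bar = sum(residuals) / n
cov_x_e = sum((xi - x_bar) * (ei - e_bar) for xi, ei in zip(x, residuals)) / (n - 1)

# Zero up to floating-point rounding
print(abs(cov_x_e) < 1e-12)
```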

Problem 4 [22 points]: Suppose a simple linear regression is fitted to the data {(x1, y1), …, (x19, y19)},
with $E(Y \mid X = x) = \beta_0 + \beta_1 x$ and $\mathrm{Var}(Y \mid X = x) = \sigma^2$.

The coefficient table and ANOVA table below show some of the estimated values:

(a) [11 points] Replicate the two tables above, and fill in ALL the missing values (to 5 significant
figures) in the two tables.
(The p-values can be obtained from R commands like “1-pf(F0, df1, df2)” for the
right-tail probability of $F_{df1,\,df2}$, or “pt(t0, d)” for the cdf of $t_d$.)
(b) [3 points] Based on the results in part (a), what is the sample correlation coefficient between
x and y? That is, $r_{xy} = \widehat{\mathrm{Corr}}(x, y) = \sum (x_i - \bar{x})(y_i - \bar{y}) \Big/ \sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}$.
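
One possible route for part (b), offered as an assumption about the intended approach rather than something the assignment states, is the identity $r_{xy}^2 = SS_{reg}/SYY$ for simple linear regression, with the sign of $r_{xy}$ matching the sign of $\hat{\beta}_1$. The Python sketch below checks this identity on hypothetical data; the names `ssreg` and `syy` are illustrative labels for the regression and total sums of squares.

```python
import math

# Hypothetical data with a negative slope, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [5.1, 4.8, 4.1, 3.9, 3.2, 3.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Sample correlation straight from the definition
r_xy = sxy / math.sqrt(sxx * syy)

# The same quantity recovered from ANOVA-style sums of squares
beta1_hat = sxy / sxx
rss = syy - sxy ** 2 / sxx   # residual sum of squares
ssreg = syy - rss            # regression sum of squares
r_from_anova = math.copysign(math.sqrt(ssreg / syy), beta1_hat)

print(math.isclose(r_xy, r_from_anova))
```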

(c) [8 points] Based on the results in part (a), test the hypothesis that β1 = -0.2 at α = 0.05.
You should set up the 4 steps of hypothesis testing as on Ch2 page 65.

Problem 5 (R problem) [20 points]: The R library ‘alr3’ contains the “segreg” data, which
record the electricity consumption (in KWH) and mean temperature (in °F) for a building on
the University of Minnesota Twin Cities campus for 39 months in 1988-1992.
(https://www.rdocumentation.org/packages/alr3/versions/2.0.5/topics/segreg)
Suppose that we are interested in how the electricity consumption (y=segreg$C), which is
primarily driven by the use of air conditioning, is affected by the monthly mean
temperature (x=segreg$Temp).
(a) [10 points] Using R code similar to that on Ch2 page 23, obtain the OLS
estimates $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\sigma}^2$.


(b) [6 points] Using the plot and abline functions as in Ch1 page 26, generate the
scatterplot of the data, and add the regression line obtained in part (a) to the plot.
(c) [4 points] Suppose an outlier is defined as an observation (xi, yi) with $|\hat{e}_i| > 2\hat{\sigma}$. Do you
think there is an outlier in the data set? Verify.
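
The outlier rule in part (c) can be sketched in Python as follows. This is only an illustration on hypothetical data, since the segreg data are not reproduced here and the assignment itself asks for R. It assumes the rule reads $|\hat{e}_i| > 2\hat{\sigma}$ and that $\hat{\sigma}^2 = \mathrm{RSS}/(n-2)$, the usual estimate for a simple linear regression with an intercept.

```python
import math

# Hypothetical data: points on the line y = 2x + 1, with one response shifted by 5
x = [float(i) for i in range(1, 11)]
y = [2.0 * xi + 1.0 for xi in x]
y[4] += 5.0
n = len(x)

# OLS fit with an intercept
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
beta1_hat = sxy / sxx
beta0_hat = y_bar - beta1_hat * x_bar

# Residuals and sigma-hat with n - 2 degrees of freedom (assumed convention)
residuals = [yi - beta0_hat - beta1_hat * xi for xi, yi in zip(x, y)]
sigma_hat = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Zero-based indices of the observations flagged by the rule |e_i| > 2 * sigma_hat
flagged = [i for i, e in enumerate(residuals) if abs(e) > 2 * sigma_hat]
print(flagged)
```

In this constructed example only the shifted observation is flagged; with real data the same comparison would be applied to the residuals of the fitted model from part (a).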


(Note: A more precise definition of an outlier will be introduced in Chapter 7, which
removes the impact of the outlier (xi, yi) itself when estimating $\hat{\sigma}$.)

- End of the Assignment-
