Problems
1. Consider a set of data $(x_i, y_i)$, $i = 1, 2, \cdots, n$, and the following two regression models:
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad (i = 1, 2, \cdots, n), \qquad \text{Model A}$$
$$y_i = \gamma_0 + \gamma_1 x_i + \gamma_2 x_i^2 + \varepsilon_i, \quad (i = 1, 2, \cdots, n). \qquad \text{Model B}$$
Suppose both models are fitted to the same data. Show that
$$SS_{Res,\,A} \;\geq\; SS_{Res,\,B}.$$
If higher-order terms are added to Model B, i.e.,
$$y_i = \gamma_0 + \gamma_1 x_i + \gamma_2 x_i^2 + \gamma_3 x_i^3 + \cdots + \gamma_k x_i^k + \varepsilon_i, \quad (i = 1, 2, \cdots, n),$$
show that the inequality $SS_{Res,\,A} \geq SS_{Res,\,B}$ still holds.
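A quick numerical check of this nesting property, fitting both models by ordinary least squares with numpy (a minimal sketch; the simulated data and the choice of quadratic term are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
x = rng.uniform(0, 10, n)
y = 2.0 + 1.5 * x + rng.normal(0, 1, n)  # data generated from a linear model

def ss_res(X, y):
    """Residual sum of squares from the least squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

# Model A: intercept + x; Model B: adds x^2
X_A = np.column_stack([np.ones(n), x])
X_B = np.column_stack([np.ones(n), x, x**2])

print(ss_res(X_A, y) >= ss_res(X_B, y))  # True: A's column space is a subspace of B's
```

The same comparison holds after appending any further powers of x to `X_B`, since each added column can only enlarge the space being projected onto.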
2. Consider the zero intercept model given by
$$y_i = \beta_1 x_i + \varepsilon_i, \quad (i = 1, 2, \cdots, n)$$
where the $\varepsilon_i$'s are independent normal variables with constant variance $\sigma^2$. Show that the $100(1-\alpha)\%$ confidence interval on $E(y|x_0)$ is given by
$$b_1 x_0 \;\pm\; t_{\alpha/2,\,n-1}\, s\, \sqrt{\frac{x_0^2}{\sum_{i=1}^n x_i^2}}$$
where $s = \sqrt{\sum_{i=1}^n (y_i - b_1 x_i)^2/(n-1)}$ and $b_1 = \dfrac{\sum_{i=1}^n y_i x_i}{\sum_{i=1}^n x_i^2}$.
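A short numerical illustration of this interval (a sketch only; the data are simulated and the evaluation point x0 is chosen arbitrarily):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 20
x = rng.uniform(1, 5, n)
y = 3.0 * x + rng.normal(0, 0.5, n)              # zero-intercept model, true slope 3

b1 = np.sum(y * x) / np.sum(x**2)                # least squares slope (no intercept)
s = np.sqrt(np.sum((y - b1 * x)**2) / (n - 1))   # residual standard error, n-1 df

x0 = 2.5
half = stats.t.ppf(0.975, n - 1) * s * np.sqrt(x0**2 / np.sum(x**2))
print(f"95% CI on E(y|x0={x0}): ({b1*x0 - half:.3f}, {b1*x0 + half:.3f})")
```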
3. Derive and discuss the (1 − α)100% confidence interval on the slope β1
for the simple linear model with zero intercept.
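For orientation, a sketch of where the derivation should land, with $b_1$ and $s$ as defined in Problem 2 (this outlines the standard pivot argument, not the book's own solution):

```latex
% Sketch: b_1 = \sum y_i x_i / \sum x_i^2 is linear in the y_i, so under the model
%   b_1 \sim N\!\left(\beta_1,\; \sigma^2 \big/ \textstyle\sum_{i=1}^n x_i^2\right),
% and (n-1)s^2/\sigma^2 \sim \chi^2_{n-1}, independent of b_1. Hence
%   T = (b_1 - \beta_1)\sqrt{\textstyle\sum x_i^2}\,\big/\, s \;\sim\; t_{n-1},
% which pivots into the interval
\[
  b_1 \;\pm\; t_{\alpha/2,\,n-1}\, \frac{s}{\sqrt{\sum_{i=1}^{n} x_i^2}} .
\]
```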
4. Consider the fixed zero intercept regression model
$$y_i = \beta_1 x_i + \varepsilon_i, \quad (i = 1, 2, \cdots, n)$$
The appropriate estimator of $\sigma^2$ is given by
$$s^2 = \frac{\sum_{i=1}^n (y_i - \hat{y}_i)^2}{n-1}$$
Show that $s^2$ is an unbiased estimator of $\sigma^2$.
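A Monte Carlo check of the unbiasedness claim (a sketch under assumed parameter values; it illustrates, but does not replace, the algebraic proof the problem asks for):

```python
import numpy as np

rng = np.random.default_rng(2)
n, beta1, sigma = 15, 2.0, 1.5
x = np.linspace(1, 10, n)

s2_draws = []
for _ in range(20_000):
    y = beta1 * x + rng.normal(0, sigma, n)
    b1 = np.sum(y * x) / np.sum(x**2)          # zero-intercept LS estimate
    s2 = np.sum((y - b1 * x)**2) / (n - 1)     # candidate estimator of sigma^2
    s2_draws.append(s2)

print(np.mean(s2_draws), sigma**2)  # sample mean of s^2 should be close to 2.25
```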
Table 2.10 Data for Two Parallel Regression Lines

    x                y
    x_1              y_1
    ...              ...
    x_{n_1}          y_{n_1}
    x_{n_1+1}        y_{n_1+1}
    ...              ...
    x_{n_1+n_2}      y_{n_1+n_2}
5. Consider a situation in which the regression data set is divided into two
parts as shown in Table 2.10.
The regression model is given by
$$y_i = \begin{cases} \beta_0^{(1)} + \beta_1 x_i + \varepsilon_i, & i = 1, 2, \cdots, n_1; \\[4pt] \beta_0^{(2)} + \beta_1 x_i + \varepsilon_i, & i = n_1+1, \cdots, n_1+n_2. \end{cases}$$
In other words, there are two regression lines with a common slope. Using the centered regression model
$$y_i = \begin{cases} \beta_0^{(1*)} + \beta_1 (x_i - \bar{x}_1) + \varepsilon_i, & i = 1, 2, \cdots, n_1; \\[4pt] \beta_0^{(2*)} + \beta_1 (x_i - \bar{x}_2) + \varepsilon_i, & i = n_1+1, \cdots, n_1+n_2, \end{cases}$$
where $\bar{x}_1 = \sum_{i=1}^{n_1} x_i / n_1$ and $\bar{x}_2 = \sum_{i=n_1+1}^{n_1+n_2} x_i / n_2$, show that the least squares estimate of $\beta_1$ is given by
$$b_1 = \frac{\sum_{i=1}^{n_1} (x_i - \bar{x}_1)\, y_i + \sum_{i=n_1+1}^{n_1+n_2} (x_i - \bar{x}_2)\, y_i}{\sum_{i=1}^{n_1} (x_i - \bar{x}_1)^2 + \sum_{i=n_1+1}^{n_1+n_2} (x_i - \bar{x}_2)^2}$$
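A numerical sanity check of this pooled-slope formula against an equivalent dummy-variable fit (a sketch; the group sizes, intercepts, and slope below are made up):

```python
import numpy as np

rng = np.random.default_rng(3)
n1, n2, beta1 = 12, 15, 0.8
x1, x2 = rng.uniform(0, 10, n1), rng.uniform(0, 10, n2)
y1 = 1.0 + beta1 * x1 + rng.normal(0, 0.3, n1)   # intercept beta0^(1) = 1.0
y2 = 4.0 + beta1 * x2 + rng.normal(0, 0.3, n2)   # intercept beta0^(2) = 4.0

# Pooled within-group estimate of the common slope
num = np.sum((x1 - x1.mean()) * y1) + np.sum((x2 - x2.mean()) * y2)
den = np.sum((x1 - x1.mean())**2) + np.sum((x2 - x2.mean())**2)
b1 = num / den

# Equivalent fit: separate intercepts via group indicators, one shared slope column
X = np.block([[np.ones(n1)[:, None], np.zeros(n1)[:, None], x1[:, None]],
              [np.zeros(n2)[:, None], np.ones(n2)[:, None], x2[:, None]]])
coef, *_ = np.linalg.lstsq(X, np.concatenate([y1, y2]), rcond=None)
print(b1, coef[2])  # the two slope estimates agree
```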
6. Consider two simple linear models
$$Y_{1j} = \alpha_1 + \beta_1 x_{1j} + \varepsilon_{1j}, \quad j = 1, 2, \cdots, n_1$$
and
$$Y_{2j} = \alpha_2 + \beta_2 x_{2j} + \varepsilon_{2j}, \quad j = 1, 2, \cdots, n_2$$
Assume that $\beta_1 \neq \beta_2$, so that the two regression lines intersect. Let $x_0$ be the point on the $x$-axis at which the two lines intersect. Also assume that the $\varepsilon_{ij}$ are independent normal variables with common variance $\sigma^2$. Show that
(a). $x_0 = \dfrac{\alpha_1 - \alpha_2}{\beta_2 - \beta_1}$
(b). Find the maximum likelihood estimate (MLE) of $x_0$ using the least squares estimators $\hat{\alpha}_1$, $\hat{\alpha}_2$, $\hat{\beta}_1$, and $\hat{\beta}_2$.
(c). Show that the distribution of $Z$, where
$$Z = (\hat{\alpha}_1 - \hat{\alpha}_2) + x_0 (\hat{\beta}_1 - \hat{\beta}_2),$$
is normal with mean 0 and variance $A^2 \sigma^2$, where
$$A^2 = \frac{\sum_j x_{1j}^2 - 2x_0 \sum_j x_{1j} + n_1 x_0^2}{n_1 \sum_j (x_{1j} - \bar{x}_1)^2} + \frac{\sum_j x_{2j}^2 - 2x_0 \sum_j x_{2j} + n_2 x_0^2}{n_2 \sum_j (x_{2j} - \bar{x}_2)^2}.$$
(d). Show that $U = N\hat{\sigma}^2/\sigma^2$ is distributed as $\chi^2(N)$, where $N = n_1 + n_2 - 4$.
(e). Show that $U$ and $Z$ are independent.
(f). Show that $W = Z^2/(A^2\hat{\sigma}^2)$ has the $F$ distribution with 1 and $N$ degrees of freedom.
(g). Let $S_1^2 = \sum_j (x_{1j} - \bar{x}_1)^2$ and $S_2^2 = \sum_j (x_{2j} - \bar{x}_2)^2$, and consider the quadratic equation $q(x_0) = a x_0^2 + 2b x_0 + c = 0$ in $x_0$:
$$\begin{aligned}
&\Big[(\hat{\beta}_1 - \hat{\beta}_2)^2 - \Big(\frac{1}{S_1^2} + \frac{1}{S_2^2}\Big)\hat{\sigma}^2 F_{\alpha,1,N}\Big]\, x_0^2 \\
&\quad + 2\Big[(\hat{\alpha}_1 - \hat{\alpha}_2)(\hat{\beta}_1 - \hat{\beta}_2) + \Big(\frac{\bar{x}_1}{S_1^2} + \frac{\bar{x}_2}{S_2^2}\Big)\hat{\sigma}^2 F_{\alpha,1,N}\Big]\, x_0 \\
&\quad + \Big[(\hat{\alpha}_1 - \hat{\alpha}_2)^2 - \Big(\frac{\sum_j x_{1j}^2}{n_1 S_1^2} + \frac{\sum_j x_{2j}^2}{n_2 S_2^2}\Big)\hat{\sigma}^2 F_{\alpha,1,N}\Big] = 0.
\end{aligned}$$
Show that if $a > 0$ and $b^2 - ac \geq 0$, then a $(1-\alpha)$ confidence interval on $x_0$ is
$$\frac{-b - \sqrt{b^2 - ac}}{a} \;\leq\; x_0 \;\leq\; \frac{-b + \sqrt{b^2 - ac}}{a}.$$
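The interval in (g) is a Fieller-type construction: it collects exactly the values of $x_0$ for which $W$ from part (f) does not exceed $F_{\alpha,1,N}$. A numerical sketch (the data are simulated and all parameter values are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n1 = n2 = 20
x1, x2 = rng.uniform(0, 10, n1), rng.uniform(0, 10, n2)
y1 = 1.0 + 2.0 * x1 + rng.normal(0, 1, n1)   # line 1: alpha1 = 1, beta1 = 2
y2 = 7.0 + 0.5 * x2 + rng.normal(0, 1, n2)   # line 2: alpha2 = 7, beta2 = 0.5; true x0 = 4

def ls_fit(x, y):
    """Simple linear regression; returns (intercept, slope)."""
    b = np.sum((x - x.mean()) * y) / np.sum((x - x.mean())**2)
    return y.mean() - b * x.mean(), b

(a1, b1), (a2, b2) = ls_fit(x1, y1), ls_fit(x2, y2)
S1sq = np.sum((x1 - x1.mean())**2)           # S_1^2 in the problem's notation
S2sq = np.sum((x2 - x2.mean())**2)
N = n1 + n2 - 4
sig2 = (np.sum((y1 - a1 - b1*x1)**2) + np.sum((y2 - a2 - b2*x2)**2)) / N
F = stats.f.ppf(0.95, 1, N)

# Coefficients of q(x0) = a*x0^2 + 2*b*x0 + c from part (g)
a = (b1 - b2)**2 - (1/S1sq + 1/S2sq) * sig2 * F
b = (a1 - a2)*(b1 - b2) + (x1.mean()/S1sq + x2.mean()/S2sq) * sig2 * F
c = (a1 - a2)**2 - (np.sum(x1**2)/(n1*S1sq) + np.sum(x2**2)/(n2*S2sq)) * sig2 * F

x0_hat = (a1 - a2) / (b2 - b1)               # point estimate from part (a)
lo, hi = (-b - np.sqrt(b*b - a*c)) / a, (-b + np.sqrt(b*b - a*c)) / a
print(f"x0_hat = {x0_hat:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```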
7. Observations on the yield of a chemical reaction taken at various tem-
peratures were recorded in Table 2.11:
(a). Fit a simple linear regression and estimate β0 and β1 using the
least squares method.
(b). Compute 95% confidence intervals on $E(y|x)$ at the four temperature levels in the data. Plot the upper and lower confidence limits around the regression line.
Table 2.11 Chemical Reaction Data

    Temperature (°C)    Yield of chemical reaction (%)
    150                 77.4
    150                 77.4
    150                 77.4
    150                 77.4
    150                 77.4
    150                 77.4
    150                 77.4
    150                 77.4
    150                 77.4
    150                 77.4
    150                 77.4
    150                 77.4

Data Source: Raymond H. Myers, Classical and Modern Regression Analysis with Applications, p. 77.
(c). Plot a 95% confidence band around the regression line on the same graph as part (b), and comment on it.
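A sketch of the computations in (a)-(c). Because the printed table is not reproduced reliably here, the arrays below are synthetic placeholders with four temperature levels and three runs each; substitute the actual columns of Table 2.11:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
# Placeholder data standing in for Table 2.11 (illustrative only)
x = np.repeat([150.0, 200.0, 250.0, 300.0], 3)
y = 55.0 + 0.13 * x + rng.normal(0, 1.0, x.size)

n = x.size
Sxx = np.sum((x - x.mean())**2)
b1 = np.sum((x - x.mean()) * y) / Sxx                # (a) least squares estimates
b0 = y.mean() - b1 * x.mean()
s = np.sqrt(np.sum((y - b0 - b1 * x)**2) / (n - 2))

# (b) 95% CI on E(y|x) at each distinct temperature level
t = stats.t.ppf(0.975, n - 2)
for x0 in np.unique(x):
    se = s * np.sqrt(1/n + (x0 - x.mean())**2 / Sxx)
    yhat = b0 + b1 * x0
    print(f"x0={x0:5.0f}: fit={yhat:6.2f}, 95% CI=({yhat - t*se:6.2f}, {yhat + t*se:6.2f})")
# (c) evaluating se over a fine grid of x0 and plotting yhat +/- t*se traces the band
```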
8. The study “Development of LIFETEST, a Dynamic Technique to Assess Individual Capability to Lift Material” was conducted at Virginia Polytechnic Institute and State University in 1982 to determine whether certain static arm strength measures influence the “dynamic lift” characteristics of an individual. Twenty-five individuals were subjected to strength tests and then were asked to perform a weight-lifting test in which weight was dynamically lifted overhead. The data are given in Table 2.12.
(a). Find the linear regression line using the least squares method.
(b). Define the joint hypothesis H0: β0 = 0, β1 = 2.2. Test this hypothesis using a 95% joint confidence region for β0 and β1, and draw your conclusion.
(c). Calculate the studentized residuals for the regression model. Plot the studentized residuals against x and comment on the plot (a computational sketch follows Table 2.12).
Table 2.12 Weight-lifting Test Data
Individual Arm Strength (x) Dynamic Lift (y)
1 17.3 71.4
2 19.5 48.3
3 19.5 88.3
4 19.7 75.0
5 22.9 91.7
6 23.1 100.0
7 26.4 73.3
8 26.8 65.0
9 27.6 75.0
10 28.1 88.3
11 28.1 68.3
12 28.7 96.7
13 29.0 76.7
14 29.6 78.3
15 29.9 60.0
16 29.9 71.7
17 30.3 85.0
18 31.3 85.0
19 36.0 88.3
20 39.5 100.0
21 40.4 100.0
22 44.3 100.0
23 44.6 91.7
24 50.4 100.0
25 55.9 71.7
Data Source: Raymond H. Myers, Classical and Modern Regression Analysis with Applications, p. 76.
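A sketch of the computation in (c), reading the x and y columns straight from Table 2.12 and forming internally studentized residuals (numpy only; swap the print loop for a scatter plot against x to produce the requested figure):

```python
import numpy as np

# Arm strength (x) and dynamic lift (y) from Table 2.12
x = np.array([17.3, 19.5, 19.5, 19.7, 22.9, 23.1, 26.4, 26.8, 27.6, 28.1,
              28.1, 28.7, 29.0, 29.6, 29.9, 29.9, 30.3, 31.3, 36.0, 39.5,
              40.4, 44.3, 44.6, 50.4, 55.9])
y = np.array([71.4, 48.3, 88.3, 75.0, 91.7, 100.0, 73.3, 65.0, 75.0, 88.3,
              68.3, 96.7, 76.7, 78.3, 60.0, 71.7, 85.0, 85.0, 88.3, 100.0,
              100.0, 100.0, 91.7, 100.0, 71.7])

n = x.size
Sxx = np.sum((x - x.mean())**2)
b1 = np.sum((x - x.mean()) * y) / Sxx
b0 = y.mean() - b1 * x.mean()
resid = y - b0 - b1 * x
s = np.sqrt(np.sum(resid**2) / (n - 2))

# Leverage of each point; studentized residual = e_i / (s * sqrt(1 - h_i))
h = 1/n + (x - x.mean())**2 / Sxx
r_stud = resid / (s * np.sqrt(1 - h))
for xi, ri in zip(x, r_stud):
    print(f"x = {xi:5.1f}, studentized residual = {ri:6.2f}")
```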