Institute of Technology of Cambodia Statistics

I3–TD6
(Linear Regression Models)

1. For a random sample of size n:

(a) Show that the error sum of squares can be expressed by

SSE = Syy − β̂1 Sxy .

(b) Show that E[SSE] = (n − 2)σ².
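
One possible route to part (a) is sketched below. It assumes the usual definitions, which the sheet does not state: Sxx, Sxy and Syy are the corrected sums of squares and cross-products, ŷi = β̂0 + β̂1 xi, β̂0 = ȳ − β̂1 x̄, and β̂1 = Sxy/Sxx.

```latex
\begin{aligned}
SSE &= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
     = \sum_{i=1}^{n} \bigl[(y_i - \bar{y}) - \hat\beta_1 (x_i - \bar{x})\bigr]^2 \\
    &= S_{yy} - 2\hat\beta_1 S_{xy} + \hat\beta_1^{2} S_{xx} \\
    &= S_{yy} - \hat\beta_1 S_{xy},
    \qquad \text{since } \hat\beta_1^{2} S_{xx} = \hat\beta_1 S_{xy}.
\end{aligned}
```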

2. The following are midterm and final examination test scores for 10 students from a
calculus class, where x denotes the midterm score and y denotes the final score for
each student.

x 68 87 75 91 82 77 86 82 75 79
y 74 79 80 93 88 79 97 95 89 92

(a) Calculate the least-squares regression line for these data.


(b) Plot the points and the least-squares regression line on the same graph.
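
The calculation in part (a) can be checked numerically. The following is a minimal sketch in plain Python, assuming the usual formulas β̂1 = Sxy/Sxx and β̂0 = ȳ − β̂1 x̄; the function name `least_squares_line` and the variable names are illustrative and not part of the sheet.

```python
def least_squares_line(x, y):
    """Return (b0, b1) for the fitted line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sum of squares of x and corrected sum of cross-products.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Data from Problem 2 (x = midterm score, y = final score).
midterm = [68, 87, 75, 91, 82, 77, 86, 82, 75, 79]
final = [74, 79, 80, 93, 88, 79, 97, 95, 89, 92]

b0, b1 = least_squares_line(midterm, final)
print(f"fitted line: y = {b0:.3f} + {b1:.3f} x")
```

A quick sanity check on any least-squares fit with an intercept is that the residuals sum to zero, up to rounding.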

3. (a) Show that the least-squares estimates of β0 and β1 of a line can be expressed as

$$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}, \qquad \hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.$$

(b) Using part (a), show that the line fitted by the method of least squares passes
through the point (x̄, ȳ).

4. Show that the MLEs of β0 and β1 are indeed the least-squares estimates. [Hint: The
pdf of Yi is normal with mean µi = β0 + β1 xi and variance σ²; the likelihood is the
product of the n pdfs.]
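
As a starting point for the hint, the log-likelihood can be written out explicitly. This is a sketch that assumes the Yi are independent, which the hint implies by taking the product of the pdfs.

```latex
\ln L(\beta_0, \beta_1, \sigma^2)
  = -\frac{n}{2} \ln\!\bigl(2\pi\sigma^2\bigr)
    - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \bigl(y_i - \beta_0 - \beta_1 x_i\bigr)^2 .
```

For any fixed σ², maximizing this expression over β0 and β1 is the same as minimizing the sum of squares in the last term.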

5. A farmer collected the following data, which show crop yields for various amounts of
fertilizer used.

Fertilizer (pounds/100 sq. ft)   0   4   8  10  15  18  20  25
Yield (bushels)                  6   7  10  13  17  18  22  23

(a) Calculate the least-squares regression line for these data.


(b) Plot the points and the least-squares regression line on the same graph.

6. The accompanying data table gives observations on total acidity of coal samples of
three different types, with determinations made using three different concentrations of
ethanolic NaOH (“Chemistry of Brown Coals,” Australian J. Applied Science, 1958:
375-379).

x 38 26 48 22 40 15 30 33
y 10 11 16 8 12 5 10 11

(a) Find the least-squares line appropriate for these data.

Mr. Phok Ponna 1/4 2023–2024


(b) Plot the points and graph the line as a check on your calculations.
(c) Calculate the 95% confidence intervals for β0 and β1 , respectively.
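
The intervals in part (c) can be sketched in plain Python as follows. The sketch assumes the standard normal-error formulas: s² = SSE/(n − 2), se(β̂1) = s/√Sxx, and se(β̂0) = s·√(1/n + x̄²/Sxx). The critical value 2.447 is the tabulated t value for an upper-tail area of 0.025 with n − 2 = 6 degrees of freedom, and the function name `coefficient_intervals` is illustrative.

```python
import math


def coefficient_intervals(x, y, t_crit):
    """Return ((lo0, hi0), (lo1, hi1)): intervals for beta0 and beta1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    # Residual standard deviation, using SSE = Syy - b1 * Sxy.
    s = math.sqrt((s_yy - b1 * s_xy) / (n - 2))
    se_b1 = s / math.sqrt(s_xx)
    se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / s_xx)
    return ((b0 - t_crit * se_b0, b0 + t_crit * se_b0),
            (b1 - t_crit * se_b1, b1 + t_crit * se_b1))


# Data from Problem 6.
x_coal = [38, 26, 48, 22, 40, 15, 30, 33]
y_coal = [10, 11, 16, 8, 12, 5, 10, 11]
T_CRIT_6DF = 2.447  # assumed table value: t with 6 df, upper-tail area 0.025

ci_b0, ci_b1 = coefficient_intervals(x_coal, y_coal, T_CRIT_6DF)
print("95% CI for beta0:", ci_b0)
print("95% CI for beta1:", ci_b1)
```

Passing the critical value as an argument keeps the sketch free of any statistics library; a different confidence level or sample size only requires a different table lookup.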

7. Show that Ȳ and β̂1 are independent under the usual assumptions of a simple linear
regression model.

8. The following data represent survival time in days after a heart transplant and patient
age in years at the time of transplant for 10 randomly selected patients.

Age at transplant 28 41 46 53 39 36 47 29 48 44
Survival time, in days 7 278 44 48 406 382 1995 176 323 1846

(a) Find the least-squares line appropriate for these data.


(b) Plot the points and graph the line.
(c) Calculate the 95% confidence intervals for β0 and β1 , respectively.

9. The following are midterm and final examination test scores for 10 calculus students,
where x denotes the midterm score and y denotes the final score for each student.

x 68 87 75 91 82 77 86 82 75 79
y 74 89 80 93 88 79 97 95 89 92

Obtain a 95% prediction interval for the final score y at x = 92 and interpret its meaning.
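
A prediction interval of this kind can be sketched in plain Python. The sketch assumes the standard formula ŷ0 ± t·s·√(1 + 1/n + (x0 − x̄)²/Sxx) for a new response at x0, with s² = SSE/(n − 2). The critical value 2.306 is the tabulated t value for an upper-tail area of 0.025 with 8 degrees of freedom, and the function name `prediction_interval` is illustrative.

```python
import math


def prediction_interval(x, y, x0, t_crit):
    """Return (lower, upper) for a single new response observed at x0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    # Residual standard deviation, using SSE = Syy - b1 * Sxy.
    s = math.sqrt((s_yy - b1 * s_xy) / (n - 2))
    y0_hat = b0 + b1 * x0
    # The leading "1 +" accounts for the variability of the new observation.
    half_width = t_crit * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / s_xx)
    return y0_hat - half_width, y0_hat + half_width


# Data from Problem 9 (x = midterm score, y = final score).
midterm = [68, 87, 75, 91, 82, 77, 86, 82, 75, 79]
final = [74, 89, 80, 93, 88, 79, 97, 95, 89, 92]
T_CRIT_8DF = 2.306  # assumed table value: t with 8 df, upper-tail area 0.025

lower, upper = prediction_interval(midterm, final, 92, T_CRIT_8DF)
print(f"95% prediction interval at x = 92: ({lower:.2f}, {upper:.2f})")
```

The interval is narrowest at x0 = x̄ and widens as x0 moves away from the sample mean of the midterm scores.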

10. The following data give the annual incomes (in thousands of dollars) and amounts (in
thousands of dollars) of life insurance policies for eight persons.

Annual income 42 58 27 36 70 24 53 37
Life insurance 150 175 25 75 250 50 250 100

Obtain a 90% prediction interval for the amount of life insurance when the annual income is x = 59 and interpret its meaning.

11. The Turbine Oil Oxidation Test (TOST) and the Rotating Bomb Oxidation Test
(RBOT) are two different procedures for evaluating the oxidation stability of steam
turbine oils. The article “Dependence of Oxidation Stability of Steam Turbine Oil
on Base Oil Composition” (J. Soc. Tribologists Lubricat. Engrs., Oct. 1997: 19-24)
reported the accompanying observations on x = TOST time (hr) and y = RBOT time
(min) for 12 oil specimens.

TOST  4200  3600  3750  3675  4050  2770
RBOT   370   340   375   310   350   200
TOST  4870  4500  3450  2700  3750  3300
RBOT   400   375   285   225   345   285

a. Calculate and interpret the value of the sample correlation coefficient.


b. How would the value of r be affected if we had let x = RBOT time and y =
TOST time?
c. How would the value of r be affected if RBOT time were expressed in hours?
d. Construct a scatter plot and normal probability plots and comment.
e. Carry out a test of hypotheses to decide whether RBOT time and TOST time are
linearly related.
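
Parts a to c can be explored numerically. The following is a minimal sketch in plain Python, assuming the usual definition r = Sxy/√(Sxx·Syy); the function name `sample_correlation` is illustrative.

```python
import math


def sample_correlation(x, y):
    """Return the sample correlation coefficient r of paired data."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / math.sqrt(s_xx * s_yy)


# Data from Problem 11: TOST time (hr) and RBOT time (min).
tost = [4200, 3600, 3750, 3675, 4050, 2770, 4870, 4500, 3450, 2700, 3750, 3300]
rbot = [370, 340, 375, 310, 350, 200, 400, 375, 285, 225, 345, 285]

r = sample_correlation(tost, rbot)
r_swapped = sample_correlation(rbot, tost)                  # roles of x and y exchanged
r_hours = sample_correlation(tost, [m / 60 for m in rbot])  # RBOT in hours

print(f"r = {r:.4f}, swapped = {r_swapped:.4f}, RBOT in hours = {r_hours:.4f}")
```

Comparing the three printed values is a direct way to check the answers to parts b and c against the definition of r.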

12. Verify that the t ratio for testing H0 : β1 = 0 is identical to t for testing H0 : ρ = 0.
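
One way to organize the verification is sketched below. It assumes the usual definitions, which the sheet does not state: s² = SSE/(n − 2), r = Sxy/√(Sxx·Syy), so that β̂1 = r·√(Syy/Sxx) and SSE = Syy(1 − r²).

```latex
t = \frac{\hat\beta_1}{s / \sqrt{S_{xx}}}
  = \frac{r \sqrt{S_{yy}/S_{xx}} \, \sqrt{S_{xx}}}{\sqrt{S_{yy}(1 - r^2)/(n-2)}}
  = \frac{r \sqrt{n-2}}{\sqrt{1 - r^2}} .
```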

13. The following are midterm and final examination test scores for 10 calculus students,
where x denotes the midterm score and y denotes the final score for each student.

x 68 87 75 91 82 77 86 82 75 79
y 74 89 80 93 88 79 97 95 89 92

(a) At the 95% confidence level, test whether X and Y are independent.
(b) Find the P-value.
(c) State any assumptions you have made in solving the problem.

14. The following data give the annual incomes (in thousands of dollars) and amounts (in
thousands of dollars) of life insurance policies for eight persons.

Annual income 42 58 27 36 70 24 53 37
Life insurance 150 175 25 75 250 50 250 100

(a) At the 98% confidence level, test whether annual income and the amount of life
insurance policies are independent.
(b) Find the attained significance level.
(c) State any assumptions you have made in solving the problem.

15. A new drug is tested for serum cholesterol-lowering properties on six randomly selected
volunteers. The serum cholesterol values are given in the following table.

Before treatment:  232  254  220  200  213  222
After treatment:   212  240  225  205  204  218

(a) At the 95% confidence level, test whether X and Y are independent, where X is the
value before treatment and Y is the value after treatment.
(b) Find the p-value.
(c) Calculate the least-squares regression line for these data.
(d) Interpret the usefulness of the model.
(e) State any assumptions you have made in solving the problem.

16. Given the data

X1 X2 y

3 1 4
2 5 3
3 3 6
1 2 5

(a) Write the multiple regression model in matrix form.
(b) Find XᵀX, (XᵀX)⁻¹, and XᵀY.


(c) Estimate β.
(d) Estimate the error variance.

17. The following is a random sample of height (in inches) and weight (in pounds) of seven
basketball players.

Height 73 83 77 80 85 71 80
Weight 186 234 208 237 265 190 220

Calculate the least-squares regression line for these data using matrix operations.
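
The matrix calculation can be sketched in plain Python, assuming the design matrix X has a column of ones and a column of heights, so that β̂ = (XᵀX)⁻¹XᵀY. The sketch uses nested lists rather than a linear-algebra library, and the function name `fit_line_by_normal_equations` is illustrative.

```python
def fit_line_by_normal_equations(x, y):
    """Return (b0, b1) from beta = (X^T X)^{-1} X^T y with X = [1, x]."""
    n = len(x)
    # Entries of X^T X for a design matrix with columns (1, x).
    xtx = [[n, sum(x)],
           [sum(x), sum(xi * xi for xi in x)]]
    # Entries of X^T y.
    xty = [sum(y), sum(xi * yi for xi, yi in zip(x, y))]
    # Inverse of the 2x2 matrix via the adjugate formula.
    det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
    inv = [[xtx[1][1] / det, -xtx[0][1] / det],
           [-xtx[1][0] / det, xtx[0][0] / det]]
    b0 = inv[0][0] * xty[0] + inv[0][1] * xty[1]
    b1 = inv[1][0] * xty[0] + inv[1][1] * xty[1]
    return b0, b1


# Data from Problem 17.
height = [73, 83, 77, 80, 85, 71, 80]
weight = [186, 234, 208, 237, 265, 190, 220]

b0, b1 = fit_line_by_normal_equations(height, weight)
print(f"fitted line: weight = {b0:.3f} + {b1:.3f} * height")
```

The result should agree with the scalar formulas β̂1 = Sxy/Sxx and β̂0 = ȳ − β̂1 x̄, which gives a convenient cross-check.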

18. Fit the model Y = β0 + β1 x1 + β2 x2 + ε to the data

x1 x2 y
-1 -1 1
-1 1 1
1 -1 0
1 1 4

a. Determine X and Y and express the normal equations in terms of matrices.


b. Determine the β̂ vector, which contains the estimates for the three coefficients in
the model.
c. Determine Ŷ , the predictions for the four observations, and also the four residuals.
Find SSE by summing the four squared residuals. Use this to get the estimated
variance MSE.
d. Use the MSE and c11, the diagonal element of (XᵀX)⁻¹ corresponding to β1, to get a 95% confidence interval for β1.
e. Carry out a t test for the hypothesis H0 : β1 = 0 against a two-tailed alternative,
and interpret the result.
f. Form the analysis of variance table and carry out the F test for the hypothesis
H0 : β1 = β2 = 0. Find R² and interpret.
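
Parts a to c can be checked with a short computation. The following is a minimal sketch in plain Python, assuming the design matrix X has columns (1, x1, x2) and that MSE = SSE/(n − 3). It solves the normal equations XᵀXβ = XᵀY by Gauss-Jordan elimination with exact fractions; the helper names `transpose`, `matmul` and `solve` are illustrative.

```python
from fractions import Fraction


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def solve(a, b):
    """Solve the square system a * beta = b by Gauss-Jordan elimination."""
    n = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(b[i])]
           for i, row in enumerate(a)]
    for col in range(n):
        # Swap in a row with a nonzero pivot, then scale it to 1.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


# Data from Problem 18.
x1 = [-1, -1, 1, 1]
x2 = [-1, 1, -1, 1]
y = [1, 1, 0, 4]

X = [[1, a, b] for a, b in zip(x1, x2)]
Xt = transpose(X)
XtX = matmul(Xt, X)
Xty = [sum(row[i] * y[i] for i in range(len(y))) for row in Xt]

beta_hat = solve(XtX, Xty)
fitted = [sum(b * v for b, v in zip(beta_hat, row)) for row in X]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
sse = sum(e * e for e in residuals)
mse = sse / (len(y) - len(beta_hat))  # n - 3 degrees of freedom

print("XtX =", XtX)
print("beta_hat =", [str(b) for b in beta_hat])
print("SSE =", sse, "MSE =", mse)
```

Exact fractions are used only to keep the small hand-checkable example free of rounding; for larger data sets a numerical linear-algebra routine would be the more usual choice.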
