Chapter 13 of What This Is
To find the regression equation, we need to calculate the regression coefficients b0 and b1.
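\[ r^2 = \frac{b_1 SS_{xy}}{SS_{yy}} \quad \text{or} \quad r^2 = \frac{(SS_{xy})^2}{SS_{xx} SS_{yy}}, \qquad r = \frac{SS_{xy}}{\sqrt{SS_{xx} SS_{yy}}} \]
\[ 0 \le r^2 \le 1, \qquad -1 \le r \le 1 \]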
Notes:
The value of SSxx is never negative, and that of SSxy can be positive or negative, so the sign of
b1 (and of r) depends on the sign of SSxy.
If 𝒃𝟏 > 𝟎, then the relation between x and y is called a positive linear relationship.
If 𝒃𝟏 < 𝟎, then the relation between x and y is called a negative linear relationship.
If 𝒓 is close to 1, then there is strong positive linear correlation.
If 𝒓 is close to -1, then there is strong negative linear correlation.
If 𝒓 is positive but close to 0, then there is weak positive linear correlation.
If 𝒓 is negative but close to 0, then there is weak negative linear correlation.
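The formulas and notes above can be checked numerically. The following is a minimal Python sketch and is not part of the original notes; the function name `correlation_from_ss` and the input values are illustrative assumptions, chosen so that the arithmetic is easy to follow by hand.

```python
import math

def correlation_from_ss(ss_xx, ss_yy, ss_xy):
    """Return (b1, r, r_squared) computed from the three sums of squares."""
    if ss_xx <= 0 or ss_yy <= 0:
        raise ValueError("SSxx and SSyy must be positive for r to be defined")
    b1 = ss_xy / ss_xx                      # slope of the least squares line
    r = ss_xy / math.sqrt(ss_xx * ss_yy)    # linear correlation coefficient
    r_squared = b1 * ss_xy / ss_yy          # coefficient of determination
    return b1, r, r_squared

# Illustrative (made-up) values, not taken from the text.
b1, r, r_squared = correlation_from_ss(ss_xx=10.0, ss_yy=40.0, ss_xy=-16.0)
print(b1, r, r_squared)
```

With these inputs b1 = -1.6 and r = -0.8 are both negative, because both take their sign from SSxy, and r_squared agrees with r**2 = 0.64 up to floating-point rounding.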
Question:
A random sample of 8 drivers, all insured with the same company and holding similar minimum
required auto insurance policies, was selected from a small town. The following table lists their
driving experiences (in years) and monthly auto insurance premiums (in dollars):
Driving Experience (years)    Monthly Auto Insurance Premium ($)
 5                            64
 2                            87
12                            50
 9                            71
15                            44
 6                            56
25                            42
16                            60
(a) Does the insurance premium depend on the driving experience, or does the driving experience
depend on the insurance premium? Do you expect a positive or negative relationship between
these two variables?
Sol.
We expect the insurance premium to depend on the driving experience. Consequently, the insurance
premium is the dependent variable (y) and driving experience is the independent variable (x) in
the regression model. We expect a negative relationship between these two variables, so both the
correlation coefficient and the population regression slope are expected to be negative.
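\[ SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 1396 - \frac{(90)^2}{8} = 383.500 \]
\[ SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 29{,}642 - \frac{(474)^2}{8} = 1557.500 \]
\[ SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 4739 - \frac{(90)(474)}{8} = -593.500 \]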
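The three sums of squares can be reproduced directly from the table of drivers. This is a small Python sketch added for checking the arithmetic; the variable names (`experience`, `premium`, `ss_xx`, and so on) are our own and do not come from the original notes.

```python
# Data from the table: driving experience (years) and monthly premium ($).
experience = [5, 2, 12, 9, 15, 6, 25, 16]
premium = [64, 87, 50, 71, 44, 56, 42, 60]

n = len(experience)
sum_x = sum(experience)                                    # 90
sum_y = sum(premium)                                       # 474
sum_x2 = sum(x * x for x in experience)                    # 1396
sum_y2 = sum(y * y for y in premium)                       # 29642
sum_xy = sum(x * y for x, y in zip(experience, premium))   # 4739

# Shortcut formulas for the sums of squares.
ss_xx = sum_x2 - sum_x ** 2 / n
ss_yy = sum_y2 - sum_y ** 2 / n
ss_xy = sum_xy - sum_x * sum_y / n

print(ss_xx, ss_yy, ss_xy)  # prints 383.5 1557.5 -593.5
```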
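(c) Find the least squares regression line by choosing appropriate dependent and independent
variables based on your answer in part (a).
Sol.
\[ \bar{x} = \frac{\sum x}{n} = \frac{90}{8} = 11.25, \qquad \bar{y} = \frac{\sum y}{n} = \frac{474}{8} = 59.25 \]
\[ b_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{-593.500}{383.500} = -1.5476 \]
\[ b_0 = \bar{y} - b_1 \bar{x} = 59.25 - (-1.5476)(11.25) = 76.6605 \]
Thus, the estimated regression line is \( \hat{y} = 76.6605 - 1.5476x \).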
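As a check on part (c), the slope and intercept can be computed from the summary values already obtained. The sketch below assumes the usual least squares formulas b1 = SSxy / SSxx and b0 = ȳ − b1·x̄; the variable names are illustrative.

```python
# Summary values taken from the worked example.
n = 8
sum_x, sum_y = 90, 474
ss_xx, ss_xy = 383.5, -593.5

x_bar = sum_x / n          # mean driving experience
y_bar = sum_y / n          # mean monthly premium

b1 = ss_xy / ss_xx         # slope
b0 = y_bar - b1 * x_bar    # intercept

print(f"y_hat = {b0:.4f} + ({b1:.4f}) x")
```

Carrying full precision gives an intercept of about 76.6604. The value 76.6605 used in the notes results from rounding b1 to -1.5476 before computing b0; the difference is immaterial for the predictions that follow.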
(f) Predict the monthly auto insurance premium for a driver with 10 years of driving experience.
Sol.
Using the estimated regression line, we find the predicted value of y for x = 10 as:
\[ \hat{y} = 76.6605 - 1.5476x = 76.6605 - 1.5476(10) = 61.18 \]
Thus, we expect the monthly insurance premium for a driver with 10 years of experience to be $61.18.
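The same prediction can be written as a one-line function. This is only a sketch: the function name `predict_premium` is hypothetical, and the default coefficients are the rounded values quoted in the notes.

```python
def predict_premium(years, b0=76.6605, b1=-1.5476):
    """Predicted monthly premium in dollars for a given number of years of experience."""
    return b0 + b1 * years

prediction = predict_premium(10)
print(f"${prediction:.2f}")  # prints $61.18
```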
Questions and Correct Answers
1) One use of a regression line is:
A) to determine if any x-values are outliers.
B) to determine if any y-values are outliers.
C) to determine if a change in x causes a change in y.
D) to estimate the change in y for a one-unit change in x.
Answer: D
2) The Y-intercept (b0) represents the
A) predicted value of Y when X = 0.
B) predicted value of Y.
C) change in estimated Y per unit change in X.
D) variation around the sample regression line.
Answer: A
3) The slope (b1) represents
A) predicted value of Y when X = 0.
B) the predicted value of Y.
C) the estimated average change in Y per unit change in X.
D) variation around the line of regression.
Answer: C
4) The least squares method minimizes which of the following?
A) SSR B) SSE (sum of squares error) C) SST D) All of the above
Answer: B
5) A simple regression model contains:
A) two independent variables
B) two dependent variables
C) one independent and one dependent variable
D) more than one independent variable
Answer: C
6) A linear regression:
A) gives a relationship between two variables that can be described by a line
B) gives a relationship between two variables that cannot be described by a line
C) gives a relationship between three variables that can be described by a line
D) contains only two variables
Answer: A
7) The slope, b1, of the regression line:
A) is a point estimator of the slope of the population regression line
B) possesses no sampling distribution
C) will have the same value for all samples taken from the population
D) is not affected by the elements taken from the population
Answer: A
8) In the equation y = 12 + 6x, (y) is the:
A) independent variable
B) slope of the line
C) dependent variable
D) y-intercept
Answer: C
9) In the equation y = 12 + 6x, (x) is the:
A) independent variable
B) slope of the line
C) dependent variable
D) y-intercept
Answer: A
10) In the equation y = 12 + 6x, (12) is the:
A) independent variable
B) slope of the line
C) dependent variable
D) y-intercept
Answer: D
11) In the equation y = 12 + 6x, (6) is the:
A) independent variable
B) slope of the line
C) dependent variable
D) y-intercept
Answer: B
12) Past data has shown that the regression line relating the final exam score and the midterm exam score
for students who take statistics from a certain professor is: Final exam = 50 + 0.5 × midterm
One interpretation of the slope is:
A) a student who scored 0 on the midterm would be predicted to score 50 on the final exam.
B) a student who scored 0 on the final exam would be predicted to score 50 on the midterm exam.
C) a student who scored 10 points higher than another student on the midterm would be predicted to
score 5 points higher than the other student on the final exam.
D) none of the above.
Answer: C
13) In the regression model ŷ = b0 + b1x, (ŷ) is the:
A) actual value of y
B) predicted value of y
C) value of y when b0 and b1 are zero
D) missing value of y
Answer: B
14) The value of y obtained for an element from a survey is the:
A) predicted value of y B) estimated value of y C) actual value of y D) residual
Answer: C
15) The residuals (errors) represent
A) the difference between the actual Y values and the mean of Y.
B) the difference between the actual Y values and the predicted Y values.
C) the square root of the slope.
D) the predicted value of Y for the average X value.
Answer: B
16) The least squares method minimizes the:
A) difference between the y and x values
B) sum of the errors
C) sum of the squares of errors (SSE)
D) sum of the y values
Answer: C
17) The value of SSxx:
A) is always negative
B) is always non-negative
C) is always positive
D) can be negative, zero, or positive
Answer: B
18) The value of SSyy:
A) is always negative
B) is always non-negative
C) is always positive
D) can be negative, zero, or positive
Answer: B
19) The value of SSxy:
A) is always negative
B) is always non-negative
C) is always positive
D) can be negative, zero, or positive
Answer: D
20) The value of the coefficient of determination is always:
A) less than 1 B) greater than 1 C) in the range zero to 1 D) between -1 and 1
Answer: C
21) The value of the correlation coefficient is always:
A) in the range 0 to 1 B) less than 0 C) in the range -1 to 1 D) greater than 0
Answer: C
22) The coefficient of determination (r^2) tells you
A) Whether r has any significance.
B) The coefficient of correlation is larger than one.
C) Whether total variation should be partitioned.
D) The proportion of total variation in Y that is explained by X.
Answer: D
23) One way to measure the quality of the regression model is to inspect the value of the:
A) constant term B) coefficient on x C) coefficient of determination D) mean of x
Answer: C
24) A perfect positive correlation means:
A) the points in a scatter diagram lie on an upward sloping line
B) the points in a scatter diagram lie on a downward sloping line
C) r is equal to -1
D) r is equal to 0
Answer: A
25) A strong positive correlation means:
A) the points in a scatter diagram lie on an upward sloping line
B) the points in a scatter diagram lie on a downward sloping line
C) r is close to -1
D) r is close to 1
Answer: D
27) The strength of the linear relationship between two numerical variables may be measured by the
A) scatter plot. B) coefficient of correlation. C) slope. D) Y-intercept.
Answer: B
28) If the correlation coefficient (r) = 1.00, then
A) the Y-intercept (b0) must equal 0.
B) the explained variation equals the unexplained variation.
C) there is no unexplained variation.
D) there is no explained variation.
Answer: C
29) Assuming a linear relationship between X and Y, if the coefficient of correlation (r) equals -0.30,
A) there is no correlation.
B) the slope (b1) is negative.
C) variable X is larger than variable Y.
D) the variance of X is negative.
Answer: B
30) In a simple linear regression problem, r and b1
A) may have opposite signs.
B) must have the same sign.
C) must have opposite signs.
D) are equal.
Answer: B
31) Given that SSxx = 875 and SSxy = 275, the value of the slope b1 in the regression of y on x,
rounded to two decimal places, is:
A) 0.31 B) 3.18 C) 0.13 D) None of the above
Answer: A
32) For a data set on x and y, the value of SSxx is 979, SSxy is -1,538, the mean of the x values is 88,
and the mean of the y values is 55. The value of the intercept b0 in the regression of y on x, rounded
to two decimal places, is:
A) 193.25 B) 106.02 C) 174.40 D) None of the above
Answer: A
33) For a sample of 19 values of x and y, SSyy = 294, SSxy = 502, and b1 = 0.27. The coefficient of
determination for the regression of y on x, rounded to three decimal places, is:
A) 0.158 B) 0.461 C) 0.632 D) None of the above
Answer: B
34) Given that SSxx = 276, SSyy = 183, and SSxy = 153, what is the value of the correlation coefficient,
rounded to three decimal places?
A) 1.649 B) 0.681 C) 0.003 D) None of the above
Answer: B
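Questions 31 to 34 work from summary statistics only, so each answer is a single formula. The following Python sketch recomputes them as a check; the variable names are our own, and the formulas are the ones used earlier in these notes (b1 = SSxy / SSxx, b0 = ȳ − b1·x̄, r² = b1·SSxy / SSyy, r = SSxy / √(SSxx·SSyy)).

```python
import math

# Question 31: slope from SSxx = 875 and SSxy = 275.
b1_q31 = 275 / 875

# Question 32: intercept from SSxx = 979, SSxy = -1538, x-mean 88, y-mean 55.
b1_q32 = -1538 / 979
b0_q32 = 55 - b1_q32 * 88

# Question 33: coefficient of determination from b1 = 0.27, SSxy = 502, SSyy = 294.
r2_q33 = 0.27 * 502 / 294

# Question 34: correlation coefficient from SSxx = 276, SSyy = 183, SSxy = 153.
r_q34 = 153 / math.sqrt(276 * 183)

print(round(b1_q31, 2), round(b0_q32, 2), round(r2_q33, 3), round(r_q34, 3))
# prints 0.31 193.25 0.461 0.681
```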
Questions (35) to (42) refer to the following information:
The following table lists the data for incomes and food expenditures of seven households. Use income as
the independent variable and food expenditures as the dependent variable.
Income (in hundreds), x    Food expenditures, y
55                         14
83                         24
38                         13
61                         16
33                          9
49                         15
67                         17
35) The value of SSxx, rounded to two decimal places, is:
A) 447.57 B) 1772.86 C) 747.57 D) None of the above
Answer: B
36) The value of SSxy, rounded to two decimal places, is:
A) 447.57 B) 1772.86 C) 747.57 D) None of the above
Answer: A
37) The value of SSyy, rounded to two decimal places, is:
A) 447.57 B) 1772.86 C) 125.71 D) None of the above
Answer: C
38) For the regression of y on x, the value of b0, rounded to two decimal places, is:
A) -2.03 B) 0.25 C) 1.51 D) None of the above
Answer: C
39) For the regression of y on x, the value of b1, rounded to two decimal places, is:
A) -2.03 B) 0.25 C) 1.51 D) None of the above
Answer: B
40) Using the estimated regression model, the predicted food expenditure for a household with a monthly
income of $6100, rounded to the nearest dollar, is:
A) $1690.75 B) $9206.1 C) -$1390.5 D) None of the above
Answer: A
41) The coefficient of determination for the regression of y on x, rounded to three decimal places, is:
A) 0.948 B) 0.899 C) 0.894 D) None of the above
Answer: B
42) The coefficient of correlation for the regression of y on x, rounded to three decimal places, is:
A) 0.948 B) 0.899 C) 0.894 D) None of the above
Answer: A
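The answers to questions 35 to 42 can be reproduced in one pass over the household data. The sketch below is an illustration, not the notes' own method of presentation: the function name `simple_regression` and the dictionary keys are hypothetical, and it assumes that food expenditures are recorded in hundreds of dollars like income, which is what the dollar amounts in question 40 imply.

```python
import math

def simple_regression(x, y):
    """Return the sums of squares, coefficients, r and r^2 for y regressed on x."""
    n = len(x)
    ss_xx = sum(v * v for v in x) - sum(x) ** 2 / n
    ss_yy = sum(v * v for v in y) - sum(y) ** 2 / n
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    b1 = ss_xy / ss_xx
    b0 = sum(y) / n - b1 * sum(x) / n
    r = ss_xy / math.sqrt(ss_xx * ss_yy)
    return {"ss_xx": ss_xx, "ss_yy": ss_yy, "ss_xy": ss_xy,
            "b0": b0, "b1": b1, "r": r, "r2": r * r}

income = [55, 83, 38, 61, 33, 49, 67]   # x, in hundreds of dollars
food = [14, 24, 13, 16, 9, 15, 17]      # y, assumed to be in hundreds of dollars

fit = simple_regression(income, food)

# A monthly income of $6100 corresponds to x = 61 (hundreds of dollars).
predicted_hundreds = fit["b0"] + fit["b1"] * 61

for name, value in fit.items():
    print(name, round(value, 3))
print("prediction (hundreds):", round(predicted_hundreds, 2))
```

At full precision the prediction is about 16.91 hundreds, or roughly $1,691, which is closest to option A in question 40; small differences in the cents depend on how b0 and b1 are rounded before use.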
Questions (43) to (47) refer to the following exercise:
The following data is used to construct a regression model:
X     3    2    5    1    7
Y    10    8   12    4   19
43) Which of the following values indicates the slope?
A) 3.25 B) 2.75 C) 2.25 D) None of the above
Answer: C
44) Which of the following values indicates the intercept?
A) 3.00 B) 2.25 C) 2.50 D) None of the above
Answer: C
45) Which of the following values indicates the coefficient of correlation?
A) 0.976 B) 0.796 C) -0.435 D) -0.976
Answer: A
46) Which of the following values indicates the coefficient of determination?
A) 90% B) 80% C) 85% D) 95%
Answer: D
47) Which of the following values indicates the covariance?
A) 10.44 B) 9.46 C) 11.85 D) None of the above
Answer: D
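Question 47 depends on which divisor is used for the covariance. The sketch below assumes the sample covariance, SSxy / (n − 1), and computes SSxy in its deviation form, the sum of (x − x̄)(y − ȳ), which is algebraically equivalent to the shortcut formula used earlier. The variable names are illustrative.

```python
x = [3, 2, 5, 1, 7]
y = [10, 8, 12, 4, 19]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Deviation form of SSxy: sum of products of deviations from the means.
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

sample_cov = ss_xy / (n - 1)   # assumed definition: divide by n - 1
population_cov = ss_xy / n     # alternative definition: divide by n

print(round(sample_cov, 2), round(population_cov, 2))
```

The sample covariance is about 13.05, which matches none of the listed values and is consistent with the answer D. Dividing by n instead gives about 10.44, which is option A, so the answer key appears to rely on the n − 1 divisor.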