STAT 2601 Final Exam Extra Practice Questions
QUESTION 1. A Canadian manufacturing company operates two facilities that specialize in producing
electronic equipment. The facility in Toronto has enhanced production capability while the facility in Ottawa
does not have modern production equipment. The CEO of the manufacturing company is keen to learn the
difference in mean time of production in these two facilities. He has taken a random sample of 15 parts from
each facility and tracked the production time. The following sample data are recorded.
                    Toronto             Ottawa
Sample mean         x̄1 = 56.7 hours     x̄2 = 70.4 hours
Sample std. dev.    s1 = 7.1 hours      s2 = 8.3 hours
Assume that production times are normally distributed for both facilities.
(a) Do you think the population variances can be assumed equal for this study? Why?
(b) Use the critical value method with α = 0.05 to test whether there is a significant difference between the
average production time of electronic parts produced at Toronto and Ottawa.
(c) What is the margin of error for the 98% confidence interval for the difference between the average
production times of electronic parts at Toronto and Ottawa?
QUESTION 2. Wayne’s Widgets is a popular manufacturer of widgets. Unfortunately, the company has had
10% of all its widgets returned under warranty due to defects. Several quality control improvements were
then made to the manufacturing process to help decrease the defect rate. A random sample of 250 widgets
contained 19 widgets with defects. At the 5% significance level, is there sufficient evidence to indicate that
the quality control improvements were successful in reducing the defect rate? Conduct an appropriate test
using the p-value method. Comment on the validity of the testing procedure.
QUESTION 3. A personal trainer at a gym wants to be able to tell her clients how many calories, y, they
will burn based on the number of minutes, x, they spend on an elliptical exercise machine. She takes a
random sample of clients who have used the machine and records the amount of time they spend on the
machine and the total number of calories burned. The data are presented in the table below.
Minutes, x 5 8 10 12 15 17 19 22 25
Calories, y 71 112 96 156 122 202 189 239 218
$\sum_{i=1}^{9} x_i = 133$, $\sum_{i=1}^{9} x_i^2 = 2317$, $\sum_{i=1}^{9} y_i = 1405$, $\sum_{i=1}^{9} y_i^2 = 247191$, $\sum_{i=1}^{9} x_i y_i = 23646$
(a) Compute and state the least squares regression line for predicting number of calories burned based on
number of minutes spent using the elliptical exercise machine.
(b) Estimate the mean number of calories burned by clients who ride the machine for 20 minutes.
(c) The trainer is asked to predict the number of calories burned if a client uses the machine for one hour.
What answer should the trainer give to the client?
QUESTION 4. This question is a continuation of Question 3. You may use any of the information given or
solutions derived in Question 3 to answer the following questions. Consider the following ANOVA table
provided by Excel.
ANOVA
              df    SS            MS            F          Significance F
Regression    1     23646.2495    23646.2495    39.3295    0.0004
Residual      7     4208.6394     601.2342
Total         8     27854.8889
(a) Which value in the ANOVA table estimates the variance of the error term ε?
(b) Compute and interpret the coefficient of determination.
(c) Compute and interpret a 95% confidence interval for the population slope parameter.
(d) A client plans to use the machine for 15 minutes. Predict the number of calories that this client will burn
with 95% confidence. Interpret your result.
QUESTION 5.
(a) At α = 0.05, does the distribution of answers to this survey question in February of this year differ
from the distribution of answers last year? Conduct an appropriate test using the critical value method.
(b) Based on the result of part (a), do you think the retraining efforts were effective? Answer this question by
describing the nature of the differences using the two comparisons of observed and expected frequencies that
contributed the most to the value of the test statistic.
(c) Comment on the validity of the test result in (a).
QUESTION 6. Health scientists believe that personal health can be assessed by analyzing body fat. Body fat
(%) may depend on multiple factors including but not limited to age (years), height (cm), weight (lb), chest
(cm), abdomen (cm), hip (cm), thigh (cm). A health analyst would like to identify the significant predictors of
body fat on a sample of 50 health records and starts with the following subset of regression models.
(a) Consider the Best Subsets output given below. Based on the metrics given below, which model would
you recommend they choose?
(b) Suppose the analyst is considering the model based on the following Excel regression analysis output to
predict body fat. Write the estimated multiple linear regression model and interpret the regression
coefficients based on the following regression output.
(c) Test the overall significance of the model at the 5% significance level using the p-value method.
(d) Use the critical value method to test whether weight is a significant predictor of body fat at the 5%
significance level.
QUESTION 8. Here is the cost of production ($ million) in 2020 for a sample of 10 manufacturing companies
that produce carbonated drinks.
QUESTION 10. The manufacturer of the soft drink Crazy Caffeine Cola advertises that their bottles contain
500 mL of the product. For the purposes of quality control, the bottle filling process is periodically tested to
ensure that the average volume per bottle does not differ significantly from 500 mL. A random sample of 22
bottles yielded a mean volume of 495.6 mL. Assume that the population standard deviation is known to be
4.1 mL.
(a) Compute and interpret a 99% confidence interval for the average volume of all bottles. Should the
manufacturer be concerned about this result?
(b) In addition to the fact that a random sample was taken, what other condition is needed for the result of part (a)
to be valid?
(c) Suppose the manufacturer wants their estimated mean to be within 1.1 mL of the true mean with 99%
confidence. How many bottles should they sample?
(d) All else remaining the same, suppose the value 4.1 mL is the sample standard deviation from our sample
of 22 bottles and not the population standard deviation as previously stated. Give the value of the one
quantity that would change when computing the margin of error in part (a). DO NOT compute the margin of
error or another confidence interval.
QUESTION 11. A local coffee shop records the amount of coffee beans, in kilograms, used each week.
Assume the weekly coffee bean usage can be modelled using a normal probability distribution with a mean
of 184 kg and a standard deviation of 47 kg. Let X represent the usage for a randomly chosen week.
(a) In what proportion of all weeks will the coffee shop use between 165 kg and 240 kg of coffee beans?
(b) How many kilograms of coffee beans should the coffee shop have on hand to have only a 1% chance of
running out of coffee beans?
(c) The coffee shop manager plans to select a random sample of 36 weeks and measures the demand in each
week. What is the sampling distribution of the sample mean X̄? Explain.
(d) What is the probability that in a sample of 36 randomly chosen weeks, the average weekly usage will not
exceed 200 kg?
(e) Suppose that the weekly coffee bean usage does not follow a normal distribution. Would your answer in
part (d) still be correct? Explain.
QUESTION 1.
(a) Since $\frac{s_1^2}{s_2^2} = \frac{(7.1)^2}{(8.3)^2} = 0.73$ is between 1/3 and 3, assume the population variances are equal.
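(b) $H_0: \mu_1 = \mu_2$, $H_A: \mu_1 \neq \mu_2$, where $\mu_1$ represents the population average production time at the Toronto facility
and $\mu_2$ represents the population average production time at the Ottawa facility.
Pooled variance: $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{14(7.1)^2 + 14(8.3)^2}{28} = 59.65$
Test statistic: $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(56.7 - 70.4) - 0}{\sqrt{59.65\left(\frac{1}{15} + \frac{1}{15}\right)}} = -4.86$
df = $n_1 + n_2 - 2 = 15 + 15 - 2 = 28$
Critical values at $\alpha = 0.05$ (two-tailed): $\pm t_{0.025,28} = \pm 2.048$
Decision: Since $t = -4.86$ is less than the critical value $-2.048$, it falls in the lower rejection region,
so $H_0$ may be rejected and $H_A$ accepted.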
Note: For full marks, some students may just take the absolute value of the test statistic and compare it to
+2.048, and therefore only look up +2.048 above.
The average production times of Toronto and Ottawa facilities are significantly different at 5% level of
significance.
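Note: The arithmetic in part (b) can be checked with a short Python sketch. It uses only the summary statistics given in the question; the variable names are illustrative, and the critical value 2.048 is copied from the t-table lookup above rather than computed.

```python
import math

# Summary statistics from Question 1 (sample 1 = Toronto, sample 2 = Ottawa)
n1, xbar1, s1 = 15, 56.7, 7.1
n2, xbar2, s2 = 15, 70.4, 8.3

# Pooled variance, assuming equal population variances as argued in part (a)
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

# Standard error of the difference between the two sample means
se = math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Test statistic under H0: mu1 - mu2 = 0
t_stat = (xbar1 - xbar2) / se
df = n1 + n2 - 2

# Two-tailed critical value at alpha = 0.05 with 28 df, copied from a t-table
t_crit = 2.048
reject_h0 = abs(t_stat) > t_crit

print(f"sp^2 = {sp2:.2f}, t = {t_stat:.2f}, df = {df}, reject H0: {reject_h0}")
```

Comparing |t| with the positive critical value is equivalent to checking both rejection regions of the two-tailed test.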
QUESTION 2.
$H_0: p = 0.10$    $H_a: p < 0.10$
$\hat{p} = \frac{19}{250} = 0.076$, $z_0 = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \frac{0.076 - 0.1}{\sqrt{0.1 \times 0.9 / 250}} \approx -1.26$
p-value = P(Z < –1.26) = 0.1038
Since the p-value = 0.1038 > α = 0.05, we fail to reject H0.
At α = 0.05, there is insufficient evidence to indicate that less than 10% of widgets are defective.
Since a random sample was used and np0 = 25 ≥ 5 and nq0 = 225 ≥ 5, the result is approximately valid.
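Note: The test can be reproduced with the Python standard library, as in the sketch below. Variable names are illustrative. Because the code uses the unrounded z value rather than the table lookup at −1.26, its p-value comes out slightly below 0.1038; the conclusion is unchanged.

```python
import math
from statistics import NormalDist

# Data from Question 2
n, defects, p0 = 250, 19, 0.10
alpha = 0.05

p_hat = defects / n

# Standard error computed under H0 (uses p0, not p_hat)
se0 = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se0

# Lower-tail p-value for Ha: p < p0
p_value = NormalDist().cdf(z)
reject_h0 = p_value < alpha

# Large-sample conditions for the normal approximation
conditions_ok = n * p0 >= 5 and n * (1 - p0) >= 5

print(f"p_hat = {p_hat:.3f}, z = {z:.2f}, p-value = {p_value:.4f}")
print(f"reject H0: {reject_h0}, conditions met: {conditions_ok}")
```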
QUESTION 3.
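(a) $SS_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} = 23646 - \frac{133(1405)}{9} \approx 2883.22$
$SS_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} = 2317 - \frac{133^2}{9} \approx 351.56$
$b_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{2883.22}{351.56} \approx 8.201327$ (computed from the unrounded sums)
$b_0 = \bar{y} - b_1 \bar{x} \approx \frac{1405}{9} - 8.201327 \cdot \frac{133}{9} \approx 34.9137$
$\hat{y} \approx 34.9137 + 8.20133x$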
(c) Such a prediction should not be attempted because predicting for x = 60 would require us to extrapolate
too far outside of the experimental region.
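Note: The least squares calculation in part (a) and the extrapolation concern in part (c) can be checked from the raw data with a short Python sketch. The helper `predict` is a hypothetical name introduced here for illustration.

```python
# Raw data from Question 3
x = [5, 8, 10, 12, 15, 17, 19, 22, 25]             # minutes
y = [71, 112, 96, 156, 122, 202, 189, 239, 218]    # calories
n = len(x)

# Summary sums, matching those given in the question
sum_x, sum_y = sum(x), sum(y)
sum_xx = sum(xi * xi for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Corrected sums of squares and cross-products
ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_xx - sum_x**2 / n

# Least squares slope and intercept
b1 = ss_xy / ss_xx
b0 = sum_y / n - b1 * sum_x / n


def predict(minutes):
    """Fitted mean calories; only trustworthy inside the observed range of x."""
    return b0 + b1 * minutes


x_min, x_max = min(x), max(x)
in_range_60 = x_min <= 60 <= x_max

print(f"y-hat = {b0:.4f} + {b1:.5f} x")
print(f"x = 60 lies inside [{x_min}, {x_max}]: {in_range_60}")
```

Since 60 minutes is far outside the observed range of 5 to 25 minutes, the fitted line gives no reliable basis for that prediction.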
QUESTION 4.
(b) $R^2 = \frac{SSR}{SSTO} = \frac{23646.2}{27854.9} \approx 0.8489$
84.89% of the observed variation in the number of calories burned can be explained by a linear relationship
with the number of minutes spent on the machine.
Note: Students could also say 84.89% of the observed variation in the number of calories burned can be
explained by the simple linear regression model which uses the number of minutes as the
explanatory/independent/predictor variable.
(c) The interval is given by $b_1 \pm t_{\alpha/2,\,n-2} \frac{s}{\sqrt{SS_{xx}}} = 8.20 \pm 2.365 \frac{\sqrt{601.2}}{\sqrt{351.56}} = 8.20 \pm 3.09 = (5.11, 11.29)$
We are 95% confident that, for each additional 1 minute spent on the machine, the number of calories burned
increases on average by between 5.11 and 11.29 calories.
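(d) $\hat{y} \pm t_{\alpha/2,\,n-2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_{xx}}} \Rightarrow 157.93 \pm 2.365 \sqrt{601.2} \sqrt{1 + \frac{1}{9} + \frac{(15 - 133/9)^2}{351.56}}$
$\Rightarrow 157.93 \pm 61.14 \Rightarrow (96.79, 219.07)$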
If one client spends 15 minutes on the machine, then we are 95% confident they will burn between 96.79
and 219.07 calories.
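Note: The coefficient of determination, the slope interval and the prediction interval can be checked with the Python sketch below. It takes MS Residual and the sums of squares from the ANOVA table, the coefficients from Question 3(a), and the t value 2.365 from the solution above (a table lookup, not computed here). Variable names are illustrative, and results may differ from the hand calculation in the last decimal place because of rounding.

```python
import math

# Inputs taken from Question 3 and the ANOVA table
n = 9
x_bar = 133 / 9
ss_xx = 2317 - 133**2 / 9
b0, b1 = 34.9137, 8.201327                 # fitted coefficients, Question 3(a)
mse = 601.2342                             # MS Residual
ss_reg, ss_total = 23646.2495, 27854.8889  # SS Regression, SS Total
t_crit = 2.365                             # t(0.025, 7), copied from a t-table

s = math.sqrt(mse)          # standard error of the estimate
r2 = ss_reg / ss_total      # coefficient of determination

# 95% confidence interval for the population slope
me_slope = t_crit * s / math.sqrt(ss_xx)
slope_ci = (b1 - me_slope, b1 + me_slope)

# 95% prediction interval for one client at x0 = 15 minutes
x0 = 15
y_hat = b0 + b1 * x0
me_pred = t_crit * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / ss_xx)
pred_int = (y_hat - me_pred, y_hat + me_pred)

print(f"R^2 = {r2:.4f}")
print(f"slope CI = ({slope_ci[0]:.2f}, {slope_ci[1]:.2f})")
print(f"prediction interval = ({pred_int[0]:.2f}, {pred_int[1]:.2f})")
```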
QUESTION 5.
The table below contains the expected values for each category and the computations for the test statistic.
Rating       fi     pi      Ei = 420 × pi    (fi – Ei)²/Ei
Excellent    24     0.05    21               0.428571
Very Good    58     0.12    50.4             1.146032
Good         157    0.25    105              25.752381
Fair         142    0.37    155.4            1.155470
Poor         39     0.21    88.2             27.444898
Test statistic: $\chi_0^2 = 55.9274$
(b) More specifically, compared with what would be expected if the distribution of answers had not changed
since 2020, we observed significantly more “Good” ratings and fewer “Poor” ratings. The significant increase in
“Good” and decrease in “Poor” ratings indicates that the retraining efforts were at least somewhat effective.
(c) Since we took a random sample and all expected values are at least 5, it is appropriate to conduct a chi-
square test for this situation.
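Note: The expected counts, the test statistic and the two largest contributions can be reproduced with the Python sketch below, using the observed counts and hypothesized proportions from the table above. Variable names are illustrative.

```python
# Observed counts and hypothesized proportions from the Question 5 table
ratings = ["Excellent", "Very Good", "Good", "Fair", "Poor"]
observed = [24, 58, 157, 142, 39]
p_null = [0.05, 0.12, 0.25, 0.37, 0.21]

n = sum(observed)

# Expected counts if the distribution of answers had not changed
expected = [n * p for p in p_null]

# Contribution of each category to the chi-square statistic
contributions = [(f - e) ** 2 / e for f, e in zip(observed, expected)]
chi_sq = sum(contributions)
df = len(observed) - 1

# Validity condition used in part (c): every expected count is at least 5
all_expected_ok = all(e >= 5 for e in expected)

# The two categories that contribute most to the statistic (part b)
top_two = sorted(zip(contributions, ratings), reverse=True)[:2]

print(f"n = {n}, chi-square = {chi_sq:.4f}, df = {df}")
print("largest contributions:", [name for _, name in top_two])
```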
QUESTION 6.
(a) They should choose the model predicting body fat % from age, weight, abdomen, and thigh.
It has the highest adjusted R² (82.0) and the lowest s (3.8530).
Note: Students could also argue that they should choose the model predicting body fat % from weight,
abdomen, and thigh. Although it has a slightly lower adjusted R² (81.7) and a slightly higher s (3.8892), it
has the additional benefit of having one fewer independent variable to measure.
(b) Interpretation of $b_1 = 0.1151$: Average body fat is estimated to increase by 0.1151 percent for each
additional 1 pound (lb) of weight, while holding age constant.
Interpretation of $b_2 = 0.4129$: Average body fat is estimated to increase by 0.4129 percent for
each additional 1 year of age, while holding weight constant.
(c) From the Excel output, we see the p-value is $5.27922 \times 10^{-8}$, or approximately 0.
Since p-value < 0.05, we reject H0 and accept Ha.
At the 5% significance level, the model is significant in predicting body fat percentage.
(d) $H_0: \beta_1 = 0$    $H_A: \beta_1 \neq 0$
From the Excel output, we see the test statistic is $t = 4.819$.
The critical value of t at df = 47 and $\alpha = 0.05$ is $t_{0.025,47} \approx z_{0.025} \approx 1.96$.
Since $t = 4.819 > 1.96$, $H_0$ may be rejected and $H_A$ accepted.
Based on sample evidence, weight is a significant predictor of body fat at the 5% level of significance.
QUESTION 7.
(a) Let X represent the number of people who express concern over their inability to pay rent or mortgage due
to COVID. X is binomial with n = 20 and p = 0.3
(b) $P(X = 8) = \binom{20}{8}(0.30)^8(0.70)^{20-8} = 0.11$
(c) $P(X \geq 3) = 1 - P(X \leq 2) = 1 - [P(X = 0) + P(X = 1) + P(X = 2)]$
$= 1 - \left[\binom{20}{0}(0.30)^0(0.70)^{20-0} + \binom{20}{1}(0.30)^1(0.70)^{20-1} + \binom{20}{2}(0.30)^2(0.70)^{20-2}\right]$
$= 1 - [0.0008 + 0.0068 + 0.0278] = 0.9646$
(d) $E(X) = np = 20 \times 0.30 = 6$
(e) $\sigma^2 = npq = 20 \times 0.30 \times 0.70 = 4.2$, $\sigma = \sqrt{npq} = \sqrt{4.2} = 2.0494$
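Note: The binomial probabilities and moments can be checked with the Python standard library, as in the sketch below. The helper `binom_pmf` is a hypothetical name introduced here. The unrounded value of P(X ≥ 3) differs from 0.9646 in the fourth decimal place because the hand calculation rounds each term first.

```python
import math

# Binomial parameters from Question 7
n, p = 20, 0.30


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# (b) Probability of exactly 8
p_eq_8 = binom_pmf(8, n, p)

# (c) Probability of at least 3, via the complement of X <= 2
p_ge_3 = 1 - sum(binom_pmf(k, n, p) for k in range(3))

# (d), (e) Mean, variance and standard deviation
mean = n * p
variance = n * p * (1 - p)
sd = math.sqrt(variance)

print(f"P(X = 8) = {p_eq_8:.4f}, P(X >= 3) = {p_ge_3:.4f}")
print(f"mean = {mean:.1f}, variance = {variance:.1f}, sd = {sd:.4f}")
```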
QUESTION 8.
(a) (i) $\bar{x} = \frac{\sum x}{n} = \frac{313}{10} = \$31.3\text{m}$
(ii) Median: (1) Sorted data: 12, 15, 17, 21, 23, 26, 31, 43, 57, 68 (2) Position: $0.5(n + 1) = 0.5(10 + 1) = 5.5$
(3) Median: $\frac{23 + 26}{2} = \$24.5\text{m}$
(iii) Standard deviation: $s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} = \sqrt{\frac{12{,}987 - \frac{(313)^2}{10}}{10 - 1}} = \$18.83\text{m}$
(iv) Selection of Measure of Central Tendency: Since the mean ($31.3m) exceeds the median ($24.5m), the
distribution of production costs is right-skewed. The median is therefore recommended, as it is not affected
by extreme values.
(b) Position: $\frac{85}{100}(n + 1) = 0.85(10 + 1) = 9.35$
$P_{85}$: 9th + 0.35(10th − 9th) $= 57 + 0.35(68 - 57) = \$60.85\text{m}$
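Note: The summary statistics can be checked with Python's `statistics` module, as in the sketch below. The `percentile` helper is a hypothetical function written here to follow the (n + 1)p position rule used in this solution; library percentile functions may use other conventions and give slightly different values.

```python
import statistics

# Sorted cost data ($ million) from the Question 8 solution
costs = [12, 15, 17, 21, 23, 26, 31, 43, 57, 68]

mean = statistics.mean(costs)
median = statistics.median(costs)
sd = statistics.stdev(costs)  # sample standard deviation (n - 1 divisor)


def percentile(data, pct):
    """Percentile using the (n + 1) * pct / 100 position rule with interpolation."""
    ordered = sorted(data)
    pos = (len(ordered) + 1) * pct / 100  # 1-based position, may be fractional
    lower = int(pos)
    frac = pos - lower
    if lower < 1:
        return ordered[0]
    if lower >= len(ordered):
        return ordered[-1]
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])


p85 = percentile(costs, 85)
right_skewed = mean > median

print(f"mean = {mean:.1f}, median = {median:.1f}, s = {sd:.2f}, P85 = {p85:.2f}")
```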
QUESTION 9.
(a) $P(S) = P(M \cap S) + P(M^c \cap S) = P(S|M)P(M) + P(S|M^c)P(M^c) = 0.12 + 0.24 = 0.36$
(b) $P(M|S) = \frac{P(M \cap S)}{P(S)} = \frac{0.12}{0.12 + 0.24} = 0.33$
(c) The events $S$ and $M$ are NOT independent because (any one of the following is sufficient)
(i) $P(M \cap S) \neq P(M) \times P(S)$ (i.e., $0.12 \neq 0.20 \times 0.36$) OR
(ii) $P(M|S) \neq P(M)$ (i.e., $0.33 \neq 0.20$) OR
(iii) $P(S|M) \neq P(S)$ (i.e., $0.60 \neq 0.36$)
(d) The events $M$ and $S$ are NOT mutually exclusive because $P(M \cap S) \neq 0$.
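Note: The probability calculations can be checked with the short Python sketch below. It starts from the values that appear in the solution, P(M) = 0.20, P(M ∩ S) = 0.12 and P(Mᶜ ∩ S) = 0.24; variable names are illustrative.

```python
import math

# Values appearing in the Question 9 solution
p_m = 0.20         # P(M)
p_m_and_s = 0.12   # P(M and S)
p_mc_and_s = 0.24  # P(M-complement and S)

# (a) Law of total probability
p_s = p_m_and_s + p_mc_and_s

# (b) Conditional probabilities
p_m_given_s = p_m_and_s / p_s
p_s_given_m = p_m_and_s / p_m

# (c) Independence would require P(M and S) = P(M) * P(S)
independent = math.isclose(p_m_and_s, p_m * p_s)

# (d) Mutual exclusivity would require P(M and S) = 0
mutually_exclusive = math.isclose(p_m_and_s, 0.0, abs_tol=1e-12)

print(f"P(S) = {p_s:.2f}, P(M|S) = {p_m_given_s:.2f}, P(S|M) = {p_s_given_m:.2f}")
print(f"independent: {independent}, mutually exclusive: {mutually_exclusive}")
```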
QUESTION 10.
(a) Because the population standard deviation is known, we use the formula $\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$.
For 99% confidence, we use $z_{0.005} = 2.576$: $495.6 \pm 2.576 \frac{4.1}{\sqrt{22}} \Rightarrow 495.6 \pm 2.3 \Rightarrow (493.3, 497.9)$
We are 99% confident that the mean volume of all bottles is between 493.3 mL and 497.9 mL.
Yes, the manufacturer should be concerned because the confidence interval does not include the value 500. In
other words, we are extremely confident the target mean is not being met.
(b) The volume per bottle must follow a normal probability distribution.
(c) $n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{2.576 \times 4.1}{1.1}\right)^2 \approx 92.2$. We should round up and take a sample of size n = 93 bottles.
(d) For 99% confidence, we use t0.005, 21 = 2.831.
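Note: Parts (a) and (c) can be checked with the Python standard library, as in the sketch below. The z value is computed with `statistics.NormalDist` rather than read from a table, so it carries more decimals than 2.576; variable names are illustrative. The t value in part (d) is not computed here, since the standard library has no t distribution.

```python
import math
from statistics import NormalDist

# Data from Question 10
n, x_bar, sigma = 22, 495.6, 4.1
conf = 0.99

# Two-sided critical value z_{alpha/2}
z = NormalDist().inv_cdf(1 - (1 - conf) / 2)

# (a) Margin of error and confidence interval with sigma known
me = z * sigma / math.sqrt(n)
ci = (x_bar - me, x_bar + me)
contains_500 = ci[0] <= 500 <= ci[1]

# (c) Sample size for a margin of error of 1.1 mL, always rounded up
target_me = 1.1
n_required = math.ceil((z * sigma / target_me) ** 2)

print(f"z = {z:.3f}, CI = ({ci[0]:.1f}, {ci[1]:.1f}), contains 500: {contains_500}")
print(f"required sample size: {n_required}")
```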