PracticeforTest3 s24
1. A Type I error occurs if you reject the null hypothesis but the null was false. TRUE / FALSE
2. When making a decision, you compare the test statistic to the p-value. TRUE / FALSE
3. If p is less than α, your decision is that H0 is true. TRUE / FALSE
4. Conclusions are written in terms of H0 – stating whether H0 is true or not true. TRUE / FALSE
5. Describe the strength & direction of each scatterplot:
6. When describing a scatterplot, we would never describe the relationship as both strong and negative. TRUE/FALSE
7. A new diet is being tested. Six individuals have agreed to participate on the diet for a pre-determined number of
weeks. At the conclusion of their participation, the number of pounds lost for each individual is determined. Let x
denote the number of weeks that a subject has been on a special diet, and y denote the number of pounds lost
during that period.
8. Professor Moore is an avid swimmer. For 23 days, he records the time (in minutes) it takes him to swim 2000 yards
and his pulse rate (beats per minute) after swimming. These data are in StatCrunch in a dataset named Swim.
Time (x) 34.12 35.72 34.72 34.05 34.13 35.72 36.17 35.57
Pulse (y) 152 124 140 152 146 128 136 144
Time (x) 35.37 35.57 35.43 36.05 34.85 34.70 34.75 33.93
Pulse (y) 148 144 136 124 148 144 140 156
Time (x) 34.60 34.00 34.35 35.62 35.68 35.28 35.97
Pulse (y) 136 148 148 132 124 132 139
a) Create a scatter plot of Pulse vs Time. Describe what you see (direction, form, strength, outliers).
b) Using time (x) to predict Professor Moore’s pulse rate (y), find the equation of the least-squares regression line.
c) If Professor Moore completes his 2000-yard swim in 35.00 minutes, what would you predict his pulse rate
would be?
d) What is the value of r for this linear relationship? Interpret this value.
e) Interpret the value of the slope in terms of the question.
f) A test of the null hypothesis H0: Slope= 0 (versus Ha: Slope ≠ 0) results in a test statistic t= –5.133 and a p-value
of 0.0000438. What decision and conclusion can you draw from the results of this hypothesis test?
g) Examine the residuals. Create a plot of Residuals vs Time. What do you observe?
h) For the time = 34.12, what would you predict the pulse to be? The actual pulse = 152. What is the residual?
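For anyone working outside StatCrunch, the quantities asked for in parts b) through h) can be computed directly from the data listed above. This is only a minimal Python sketch, not the course's method. It assumes the 23 pairs are transcribed correctly and uses the standard least-squares formulas; all variable names are illustrative.

```python
from math import sqrt

# Swim data as listed in question 8 (23 days).
time = [34.12, 35.72, 34.72, 34.05, 34.13, 35.72, 36.17, 35.57,
        35.37, 35.57, 35.43, 36.05, 34.85, 34.70, 34.75, 33.93,
        34.60, 34.00, 34.35, 35.62, 35.68, 35.28, 35.97]
pulse = [152, 124, 140, 152, 146, 128, 136, 144,
         148, 144, 136, 124, 148, 144, 140, 156,
         136, 148, 148, 132, 124, 132, 139]

n = len(time)
x_bar = sum(time) / n
y_bar = sum(pulse) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - x_bar) ** 2 for x in time)
syy = sum((y - y_bar) ** 2 for y in pulse)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(time, pulse))

slope = sxy / sxx                   # part b) and e)
intercept = y_bar - slope * x_bar   # part b)
r = sxy / sqrt(sxx * syy)           # part d)

predicted = [intercept + slope * x for x in time]
residuals = [y - p for y, p in zip(pulse, predicted)]  # parts g) and h)

# t statistic for H0: slope = 0, with n - 2 degrees of freedom (part f).
sse = sum(e ** 2 for e in residuals)
se_slope = sqrt(sse / (n - 2) / sxx)
t_stat = slope / se_slope

print(f"Pulse = {intercept:.3f} + ({slope:.3f}) * Time")
print(f"r = {r:.4f}, t = {t_stat:.3f}")
print(f"Predicted pulse at 35.00 minutes: {intercept + slope * 35.00:.1f}")
```

The printed t can be compared with the value quoted in part f); a mismatch would suggest a transcription difference between this listing and the StatCrunch dataset.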
9. Earlier, our class participated in a survey providing responses to a variety of questions. These data are in StatCrunch
in a dataset named MAT175 – Survey Fall 2023 (available in our group).
a) Create a scatter plot of Salary vs Minutes_SocialMedia. Describe what you see (direction, form, strength, outliers).
b) Using Minutes_SocialMedia (x) to predict Salary (y), find the equation of the least-squares regression line.
c) If a student spent 240 minutes on social media, what would you predict her future salary would be?
d) What is the value of r for this linear relationship? Interpret this value.
e) Interpret the value of the slope in terms of the question.
f) A test of the null hypothesis H0: Slope= 0 (versus Ha: Slope ≠ 0) results in a test statistic t= -1.6164683 and a
p-value of 0.1234. What decision and conclusion can you draw from the results of this hypothesis test?
g) Examine the residuals. Create a plot of Residuals vs Minutes_SocialMedia. What do you observe?
h) For Minutes_SocialMedia = 120, what would you predict the Salary to be? The actual Salary = $67153. What
is the residual?
10. When we sample a population, we want to randomize if at all possible. TRUE/FALSE
11. A good way to select a sample is to just ask those that you know. TRUE/FALSE
12. We want a sample with a lot of bias. TRUE/FALSE
13. A surveyor asks: “Many people think this playground is too small and in need of repair. Would you agree?” This is
an example of a leading question. TRUE/FALSE
14. Stopping students leaving BDH is a good way to collect a sample if we want to know the quality of the food.
TRUE/FALSE
15. The true proportion of students who enjoy statistics is called the ‘population statistic’. TRUE/FALSE
16. You must always take a census and never sample. TRUE/FALSE
17. A scatter plot is created for the population vs storks. The linear regression equation is:
Population = 35652.888 + 149.76712 * Storks (or y = 35652.888 + 149.76712 x)
Add the line to the scatter plot.
18. Which of these has the higher variability? Which of these has the higher bias?
19. A statistics professor has been collecting student pulse data (resting) and their pulse after jogging. A sample of 82 students is available in a
StatCrunch dataset called Pulse. The professor is interested in Pulse_afterJog VS Pulse_Resting. Complete the following using the Pulse
dataset.
a. Generate a scatter plot. Describe the direction, form, strength, and any outliers.
b. Find r (correlation coefficient). What is the value of r? Describe r.
c. Find the least squares regression equation that would predict Pulse_afterJog from Pulse_Resting. Write the full equation (round
the slope and y-intercept to 2 decimal places).
d. What is the value of the slope? Interpret this value in terms of this equation.
e. Find r². What is the value of r²? Describe r².
f. A student’s resting pulse is 60. What would you predict her Pulse after jogging to be?
g. If the actual value of her pulse after jogging is 75, what is her residual (use your prediction above)?
h. Test for a non-zero slope. What is your decision? What is your conclusion?
i. Create a scatter plot of residuals vs Pulse_Resting. Describe what you see.
ANSWERS:
1. FALSE – a Type I error occurs if you reject the null hypothesis, but the null was true.
2. FALSE -- When making a decision, you compare the p-value to the level of significance.
3. FALSE -- If p is less than α, your decision is to reject H0 (meaning you think H0 is not true).
4. FALSE -- Conclusions are written in terms of Ha: whether or not there is evidence to support the alternative hypothesis.
5. Strong positive Weak negative
7.
H0: slope = 0
Ha: slope ≠ 0
α=0.05
t= 1.9932318
p= 0.117
Do not reject H0
At α=0.05, there is not evidence that the slope is different from 0 (so we cannot say the slope is non-zero).
Notice the pattern with the residuals …. An upside-down V …. This indicates that a straight line may not be the best model for the
relationship between pounds lost and weeks
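The decision rule from answers 2 and 3 (compare the p-value to α) can be written as a short Python sketch. The function name `decide` is hypothetical, and α = 0.05 is assumed for questions 8 and 9, since only the diet test above states it explicitly.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare the p-value to the significance level and return the decision."""
    return "Reject H0" if p_value < alpha else "Do not reject H0"

print(decide(0.117))      # diet test above, alpha = 0.05
print(decide(0.0000438))  # question 8(f); alpha = 0.05 is an assumption
print(decide(0.1234))     # question 9(f); alpha = 0.05 is an assumption
```

The decision is only the first step: the conclusion still has to be written in terms of Ha, in the context of the question.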
9.
Create a scatter plot of Salary vs Minutes_SocialMedia. Describe what you see (direction, form,
strength, outliers).
Direction is negative
Form is possibly linear
Strength is weak
Outliers – maybe the value near (180,24500)
least-squares regression line
Salary = 70299.131 - 49.017492 Minutes_SocialMedia
If a student spent 240 minutes on social media, predicted Salary would be $58534.933
Predicted Salary = 70299.131 - 49.017492 (240)
Predicted Salary = 58534.933
r is -0.35603846. This is a weak negative linear relationship.
The slope is -49.017492. This means that for each additional minute on social media, the predicted salary decreases by about
$49.02.
A test of the null hypothesis H0: Slope= 0 (versus Ha: Slope ≠ 0) results in a test statistic t= -1.6164683 and a p-value of 0.1234. The
decision is to not reject H0; the conclusion is that there is not evidence that the slope is different from 0 (meaning we do not have evidence
that the slope is non-zero).
Examine the residuals. Create a plot of Residuals vs Minutes_SocialMedia. What do you observe?
There is not a distinct pattern, but there is more variability/spread for the lower values of
Minutes_SocialMedia than there is for the larger values (however, this may be due to the lack of data
for the larger values of Minutes_SocialMedia).
For Minutes_SocialMedia = 120, what would you predict the Salary to be?
Predicted Salary = 70299.131 - 49.017492 (120)
Predicted Salary = $64417.032
The actual Salary = 67153. What is the residual?
Residual = actual – predicted
Residual = 67153 - 64417.032
Residual = $2,735.968
(remember a positive residual implies an under prediction)
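The prediction and residual arithmetic for question 9 can be checked with a few lines of Python. This is a sketch that takes the intercept and slope from the fitted line above; the function names are hypothetical.

```python
# Coefficients copied from the least-squares line in answer 9.
INTERCEPT = 70299.131
SLOPE = -49.017492

def predict_salary(minutes: float) -> float:
    """Predicted Salary for a given Minutes_SocialMedia value."""
    return INTERCEPT + SLOPE * minutes

def residual(actual: float, predicted: float) -> float:
    """Residual = actual - predicted; positive means the line under-predicted."""
    return actual - predicted

pred_240 = predict_salary(240)
pred_120 = predict_salary(120)
res_120 = residual(67153, pred_120)

print(round(pred_240, 3))  # 58534.933
print(round(pred_120, 3))  # 64417.032
print(round(res_120, 3))   # 2735.968
```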
18. High variability -- The one on the right has high variability
High bias -- The one on the left has high bias.
19.
a. Direction: positive
Form: linear
Strength: moderate
Outliers: None that appear obvious
Residual plot – fairly random scatter (which is what we want to see when looking at the residuals)