Final 21
Instructions:
• You may use your one page of notes, R, StatKey, and a calculator. You may NOT use a book or
the rest of the internet.
• Show your work. Organize your work in a reasonably neat and coherent way.
• You are not permitted to talk about the test with anybody who has not yet taken it.
Applied Statistics Final Exam
iii. You gather data on 100 trees from Portland and 80 trees from Seattle. What test
statistic (or method) would you use? [3]
(b) You want to test to see if the average height of trees is the same in Portland, Seattle, and
Eugene. What test statistic would you use? [3]
(c) It is estimated that 8% of men (and 0.5% of women) have some degree of ‘color vision
deficiency’ or CVD (colloquially called being color blind). Your friend invents a new type of
surgery that they claim can cure CVD. Unfortunately, like all surgeries, it has potential
side effects. In this case, there is a very low chance of major and permanent vision damage. You
are hired to test whether this new surgery is more effective than standard treatments. You
gather data on 100 individuals, 50 of whom are given the surgery while the other 50 are
given a fake surgery (placebo). [10]
i. What type of study do you perform (the study, not the statistical technique)?
iv. Which type of error do you consider to be worse? Explain. (there are different possible
interpretations here)
(b) You want to test if Systolic Blood Pressure has a negative correlation with Heart Rate.
You randomly sample 200 patients in the intensive care unit. You then perform linear
regression and find that the sample correlation is r = −0.057 with a
standard error of 0.071. (In StatKey this data set is labelled ‘ICU Admissions’.)
i. Compute the p-value for this statistical test. [3]
iii. Does this method of data collection allow us to make any conclusions about the
relationship between blood pressure and heart rate in the general population? [2]
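For reference, the p-value computation in part (b) i. can be sketched as follows. This is a minimal sketch and not an official solution: it assumes the standardized statistic r / SE is compared to a standard normal distribution (a t-distribution with 198 degrees of freedom would give nearly the same answer), and it treats the alternative as one-sided (negative correlation). The variable names are illustrative.

```python
from statistics import NormalDist

r = -0.057   # sample correlation stated in the problem
se = 0.071   # standard error stated in the problem

# Standardized test statistic under the null hypothesis of zero correlation
z = r / se

# Left-tail p-value, since the alternative is a negative correlation.
# Assumption: a standard normal reference distribution is adequate here.
p_value = NormalDist().cdf(z)

print(f"z = {z:.3f}, one-sided p-value = {p_value:.3f}")
```

The same left-tail area can be read from StatKey's theoretical normal distribution by entering the standardized statistic.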
3. At his previous college, your professor gave grades with the following percentage of students
getting each grade:
Grade           A     B     C     D     F
% of students   28%   36%   20%   8%    8%
You suspect that your professor will have a different grade distribution here at UP.
(a) State the null hypothesis [3]
(b) You gather data from the grades he gave out last semester and see that he gave out 75
grades. What test statistic should you use? [1]
(c) Calculate the p-value (it may be helpful to use R or StatKey) given that he gave the following
grades last semester: [5]
Grade    A     B     C     D     F
Number   15    22    23    10    5
(e) Calculate the contribution from the “C” grade to the test statistic [3]
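For reference, the arithmetic behind parts (c) and (e) can be sketched as follows, assuming a chi-square goodness-of-fit test is the intended method. This is a sketch, not an official solution. The helper `chi2_upper_tail_even_df` is a hypothetical name introduced here; it uses a closed-form tail formula that holds only when the degrees of freedom are even, which is the case for five grade categories (df = 4).

```python
import math

# Null-hypothesis proportions (grade distribution at the previous college)
null_props = {"A": 0.28, "B": 0.36, "C": 0.20, "D": 0.08, "F": 0.08}
# Observed grade counts from last semester
observed = {"A": 15, "B": 22, "C": 23, "D": 10, "F": 5}

n = sum(observed.values())                       # total number of grades
expected = {g: n * p for g, p in null_props.items()}

# Each category contributes (observed - expected)^2 / expected
contributions = {
    g: (observed[g] - expected[g]) ** 2 / expected[g] for g in observed
}
chi2 = sum(contributions.values())
df = len(observed) - 1


def chi2_upper_tail_even_df(x, df):
    """Upper-tail chi-square probability; valid only for even df."""
    if df % 2 != 0:
        raise ValueError("closed form requires even degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(
        half ** k / math.factorial(k) for k in range(df // 2)
    )


p_value = chi2_upper_tail_even_df(chi2, df)

print(f"chi-square = {chi2:.3f}, df = {df}, p-value = {p_value:.3f}")
print(f"contribution from C = {contributions['C']:.3f}")
```

In R, `chisq.test()` with the observed counts and the null proportions should report the same statistic and p-value.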
4. You are interested in determining if different companies have different amounts of sugar in their
cereals. Consider the following ANOVA table generated using R. The label ‘sugar’ measures
the amount of sugar in each cereal and the label ‘company’ refers to the company that makes
that particular brand of cereal (Kellogs, Quakers, General Mills).
(a) State the null hypothesis and alternative hypothesis in English. [4]
(b) Notice that there is usually one more column in ANOVA tables. Use the table and a
computational tool (R/StatKey) to compute the p-value. [3]
(d) Notice that the Sum of Squares and Mean Square values are much higher for ‘Residuals’
than they are for ‘company’. Interpret what these mean in context. [3]
5. More details on the cereal data can be found in the table below which shows the number of
cereals randomly chosen from each of the three companies, the mean amount of sugar, and
standard deviation.
(a) Using just the Kellogs data, find a 90% confidence interval for the mean amount of sugar
in Kellogs cereal. [5]
(b) Using this table and the ANOVA table from the previous problem, find a narrower 90%
confidence interval for the mean amount of sugar in Kellogs cereal. [5]
(c) Compare these two intervals. Explain why the interval in part (b) is narrower than the one in part (a). [2]
6. We want to perform linear regression to predict ‘GDP’ from ‘BirthRate’, ‘Density’, and
‘Population’. We download the ‘All Countries’ data set from StatKey and type the following
command into R, and we get the following output. Note that GDP is measured in $ per person
and Birth Rate is measured in births per 1000 people.
(b) What percentage of the variation in ‘GDP’ scores is explained by our linear model (round
to 4 digits)? [2]
(d) If the Birth Rate increased by 1 unit (one birth per 1000 people), and everything else
remained constant, how much would we expect GDP to increase? [3]
7. Continuing with the ‘All Countries’ multiple regression from the previous problem
(a) Population was measured in millions of people. Consider the population row on the
previous page. How would each of the 4 numbers in that row change if we instead measured
the population in thousands of people? Be specific. If something didn’t change, write
‘unchanged’. [6]
i. How would the ‘Estimate’ change?
(b) If we wanted to refine our linear model and get rid of one predictor, which predictor would
we remove? Why?
(c) Your friend says removing that predictor was good and that this new model is better
than the original. Give one reason (based on the above table) that supports your friend’s
statement.
(d) A different friend says removing that predictor was bad and that the original model was
better. Give one reason (based on the above table) that supports this friend’s statement.
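For reference, the unit-change idea in part (a) can be explored numerically with the sketch below. It is only an illustration under loud assumptions: the data are made up (hypothetical, not the ‘All Countries’ data set), and a one-predictor regression stands in for the multiple regression in the problem. The helper `simple_ols` is a hypothetical name introduced here.

```python
import math


def simple_ols(x, y):
    """Return (slope, standard error of slope, t value) for y ~ x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares, then the usual slope standard error
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return slope, se_slope, slope / se_slope


# Hypothetical values, for illustration only
pop_millions = [1.2, 5.0, 10.5, 33.0, 60.2, 127.0]
gdp = [41000, 8000, 22000, 3000, 35000, 12000]

# The same populations expressed in thousands of people
pop_thousands = [p * 1000 for p in pop_millions]

slope_m, se_m, t_m = simple_ols(pop_millions, gdp)
slope_k, se_k, t_k = simple_ols(pop_thousands, gdp)

print("millions :", slope_m, se_m, t_m)
print("thousands:", slope_k, se_k, t_k)
```

Comparing the two printed rows shows which quantities rescale with the predictor's units and which do not. The p-value depends only on the t value and the degrees of freedom, so it follows the t value.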
(b) We didn’t get to the section on probability (Appendix P) or Bayes Theorem. Would you
have preferred to skip some of the material we covered so that we could have gotten to
that section?
(c) Give one (or more) example of how you have used statistics or probability outside of this class.