3080project4 StatisticalIntervals
Amy Nguyen
February 2025
Contents
Problem 1
Problem 2
Problem 3
Problem 4
Problem 5
Problem 6
Problem 1
The cats data set (MASS) contains the heart and body weight of a sample of male and female cats. Use
the data set to estimate a 95% prediction interval for the body weight of a male cat. Assume that the body
weight of cats is Normally distributed.
mean(malecats$Bwt)
## [1] 2.9
sd(malecats$Bwt)
## [1] 0.4674844
# Compute the 95% prediction interval
n <- length(malecats$Bwt)
t_value <- qt(0.975, df = n - 1)
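The code above stops after computing the t quantile. A minimal sketch of the remaining step follows, using the usual Normal-theory prediction interval, mean ± t * sd * sqrt(1 + 1/n). It assumes that malecats is the subset of cats with Sex == "M"; that definition is not shown above, so treat it as a guess. The names x_bar, s, margin, and pi_male are illustrative.

```r
library(MASS)  # provides the cats data set

# Assumption: malecats is the male subset of cats (its definition is not shown above)
malecats <- subset(cats, Sex == "M")

n <- length(malecats$Bwt)
x_bar <- mean(malecats$Bwt)
s <- sd(malecats$Bwt)
t_value <- qt(0.975, df = n - 1)

# Half-width of the 95% prediction interval for one new observation;
# the sqrt(1 + 1/n) factor accounts for uncertainty in the sample mean
margin <- t_value * s * sqrt(1 + 1 / n)
pi_male <- c(lower = x_bar - margin, upper = x_bar + margin)
pi_male
```

The interval is wider than mean ± t * sd because it covers both the spread of a new observation and the estimation error in the mean.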
Problem 2
The data set SP500 (MASS) contains the returns of the S&P 500 stock index for the 1990s; that is, the
change in the index’s price divided by the preceding day’s price. In principle, when predicting the
direction of the stock market with the intention of buying stock, we are willing to be wrong in one direction but
not the other; we would rather predict that the market grows too little and be pleasantly surprised than predict
that the market grows more than it actually does. So compute a 99% lower prediction bound, assuming that stock
returns are Normally distributed. (You should not trust this number. First, the Normality assumption, despite
being assumed a lot in finance, is not true. Second, stock returns are not an independent and identically
distributed sample.)
mean(SP500)
## [1] 0.04575267
sd(SP500)
## [1] 0.9477464
length(SP500)
## [1] 2780
## [1] -2.160308
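The command that produced the bound is not shown. One way to compute a one-sided 99% lower prediction bound under the Normality assumption is sketched below. It puts the entire 1% error in the lower tail and uses the standard factor sqrt(1 + 1/n). The names x_bar, s, and lower_bound are illustrative, and this is not necessarily the exact computation behind the value printed above.

```r
library(MASS)  # provides the SP500 vector of returns

n <- length(SP500)
x_bar <- mean(SP500)
s <- sd(SP500)

# One-sided bound: use the 0.99 quantile rather than 0.995
t_value <- qt(0.99, df = n - 1)

# Lower prediction bound for a single future return
lower_bound <- x_bar - t_value * s * sqrt(1 + 1 / n)
lower_bound
```

With n = 2780 the factor sqrt(1 + 1/n) is very close to 1, so the result differs only slightly from mean - t * sd.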
Problem 3
The data set abbey (MASS) contains determinations of nickel content (ppm) in a Canadian syenite rock.
The assumption of a Normal distribution clearly is inappropriate for this data set. Construct a 90% prediction
interval for the next measurement from the data set. Use a nonparametric procedure.
## 5% 95%
## 11.10323 23.07758
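The code behind the interval above is not shown. As one possible nonparametric procedure, the sketch below uses order statistics: for an i.i.d. sample of size n from a continuous distribution, the interval from the j-th to the k-th smallest observation contains the next observation with probability (k - j)/(n + 1). The choice of j and k is an assumption made here to guarantee at least 90% coverage, and the result need not match the numbers printed above.

```r
library(MASS)  # provides the abbey vector

x <- sort(abbey)
n <- length(x)
alpha <- 0.10

# Pick order-statistic indices so that (k - j) / (n + 1) >= 1 - alpha
j <- max(1, floor((n + 1) * alpha / 2))
k <- min(n, ceiling((n + 1) * (1 - alpha / 2)))
coverage <- (k - j) / (n + 1)

# The sample must be large enough for the requested coverage
stopifnot(coverage >= 1 - alpha)

pi_abbey <- c(lower = x[j], upper = x[k])
pi_abbey
coverage
```

Because the indices must be integers, the achieved coverage is usually a little above the nominal 90%.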
Problem 4
Use the data from Problem 1 to construct a 95% tolerance interval for 99% of cats’ body weight.
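No solution code is shown for this problem. A minimal sketch in base R follows, using Howe's approximation to the two-sided Normal tolerance factor, k = z * sqrt((n - 1)(1 + 1/n) / chi2), where z is the (1 + p)/2 Normal quantile and chi2 is the (1 - confidence) quantile of a chi-squared distribution with n - 1 degrees of freedom. It assumes, as in Problem 1, that malecats is the subset of cats with Sex == "M". The names p, conf, k_factor, and tol_male are illustrative.

```r
library(MASS)  # provides the cats data set

# Assumption: malecats is the male subset of cats, as in Problem 1
malecats <- subset(cats, Sex == "M")

n <- length(malecats$Bwt)
x_bar <- mean(malecats$Bwt)
s <- sd(malecats$Bwt)

p <- 0.99     # proportion of the population to cover
conf <- 0.95  # confidence level

# Howe's approximation to the two-sided tolerance factor
z <- qnorm((1 + p) / 2)
chi2 <- qchisq(1 - conf, df = n - 1)
k_factor <- z * sqrt((n - 1) * (1 + 1 / n) / chi2)

tol_male <- c(lower = x_bar - k_factor * s, upper = x_bar + k_factor * s)
tol_male
```

The factor k_factor exceeds the plain Normal quantile z because the mean and standard deviation are estimated from the sample.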
Problem 5
The data set geyser (MASS) contains both the waiting time between eruptions and the duration of eruptions
of the Old Faithful geyser in Yellowstone National Park. Use the data set to construct a nonparametric
tolerance interval containing 90% of geyser eruptions with 99% confidence.
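# Bootstrap the 5th and 95th sample percentiles over 10,000 resamples;
# `data` is the numeric vector of observations being analyzed.
# Each column of bootstrap_intervals holds one resample's (5%, 95%) pair.
bootstrap_intervals <- replicate(10000,
  quantile(sample(data, replace = TRUE), c(0.05, 0.95)))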
## 1% 99%
## 48 93
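As an alternative to the bootstrap, a distribution-free tolerance interval can be built from order statistics (Wilks' method). For an i.i.d. sample of size n from a continuous distribution, the coverage of the interval between the r-th and s-th order statistics follows a Beta(s - r, n - s + r + 1) distribution. The sketch below finds the smallest gap m = s - r that gives at least 99% confidence of covering 90% of the population. It assumes the variable of interest is geyser$waiting, which the problem statement does not specify, and centers the interval roughly symmetrically. It is a different technique from the code above and need not reproduce its output.

```r
library(MASS)  # provides the geyser data set

# Assumption: the interval is for waiting times
x <- sort(geyser$waiting)
n <- length(x)
p <- 0.90     # proportion of the population to cover
conf <- 0.99  # confidence level

# Confidence that an interval spanning m order-statistic gaps covers >= p
m_candidates <- 1:(n - 1)
achieved <- 1 - pbeta(p, m_candidates, n - m_candidates + 1)
stopifnot(any(achieved >= conf))  # otherwise the sample is too small

# Smallest gap meeting the confidence requirement
m <- min(m_candidates[achieved >= conf])

# Split the leftover observations roughly evenly between the two tails
r <- max(1, floor((n - m) / 2))
s <- r + m

tol_geyser <- c(lower = x[r], upper = x[s])
tol_geyser
achieved[m]
```

Choosing the smallest valid m gives the narrowest interval of this form; any larger m also satisfies the requirement but is more conservative.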
Problem 6
The data set accdeaths (MASS) contains monthly counts of accidental deaths in the United States between 1973
and 1978. What was the mean count of accidental deaths per month? Use this data set to construct a
statistical interval for the mean number of accidental deaths over the next five years. (Bonus points if you
can compare your interval to the observed mean over those years and assess how well it did.)
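No solution code is shown for this problem. One way to approach it is a prediction interval for the mean of m future observations, mean ± t * sd * sqrt(1/n + 1/m), with m = 60 months for five years. The sketch below makes several assumptions: a 95% level (the problem does not state one), and monthly counts treated as independent, identically distributed Normal values, which ignores the seasonality and trend a time series like this is likely to have. The names deaths, m, and pi_mean are illustrative.

```r
library(MASS)  # provides the accdeaths monthly time series

deaths <- as.numeric(accdeaths)
n <- length(deaths)
m <- 5 * 12  # number of future months over five years

x_bar <- mean(deaths)  # mean count of accidental deaths per month
s <- sd(deaths)

# Assumption: 95% level; the problem does not specify one
t_value <- qt(0.975, df = n - 1)

# Prediction interval for the mean of the next m monthly counts
margin <- t_value * s * sqrt(1 / n + 1 / m)
pi_mean <- c(lower = x_bar - margin, upper = x_bar + margin)

x_bar
pi_mean
```

The term 1/n reflects uncertainty in the estimated mean, and 1/m reflects the variability of the future five-year average itself.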