3080project4 StatisticalIntervals

The document outlines a lab project for MATH 3080, detailing six statistical problems involving various data sets from the MASS package. Each problem includes code snippets for calculating prediction intervals, tolerance intervals, and confidence intervals based on different assumptions and data distributions. The project emphasizes the application of statistical methods to real-world data, such as cat weights, stock returns, and accidental deaths.

MATH 3080 Lab Project 4

Amy Nguyen

February 2025

Contents
Problem 1
Problem 2
Problem 3
Problem 4
Problem 5
Problem 6

Problem 1
The cats data set (MASS) contains the heart and body weight of a sample of male and female cats. Use
the data set to estimate a 95% prediction interval for the body weight of a male cat. Assume that the body
weight of cats is Normally distributed.

# Your code here


library(MASS)

malecats <- subset(cats, Sex == "M")

mean(malecats$Bwt)

## [1] 2.9

sd(malecats$Bwt)

## [1] 0.4674844

# Compute the 95% prediction interval
n <- length(malecats$Bwt)
t_value <- qt(0.975, df = n - 1)

# Prediction interval formula:
# mean ± t_value * standard deviation * sqrt(1 + 1/n)
pred_lower <- mean(malecats$Bwt) - t_value * sd(malecats$Bwt) * sqrt(1 + 1/n)
pred_upper <- mean(malecats$Bwt) + t_value * sd(malecats$Bwt) * sqrt(1 + 1/n)

# Output the prediction interval
c(pred_lower, pred_upper)

## [1] 1.96728 3.83272
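
As a cross-check on the hand-computed interval, the same formula can be recovered from an intercept-only linear model, because predict() with interval = "prediction" uses the same s * sqrt(1 + 1/n) standard error. The sketch below is illustrative only: pred_interval and example_weights are hypothetical names, and the weights are made up rather than taken from cats.

```r
# Hypothetical helper: Normal-theory prediction interval for one new
# observation, computed by hand and via an intercept-only linear model.
pred_interval <- function(x, level = 0.95) {
  n <- length(x)
  t_value <- qt(1 - (1 - level) / 2, df = n - 1)
  half_width <- t_value * sd(x) * sqrt(1 + 1 / n)
  manual <- c(lwr = mean(x) - half_width, upr = mean(x) + half_width)

  # Intercept-only model; newdata only needs one row, its column is unused
  fit <- lm(x ~ 1)
  from_lm <- predict(fit, newdata = data.frame(dummy = 1),
                     interval = "prediction", level = level)

  list(manual = manual, from_lm = from_lm[1, c("lwr", "upr")])
}

# Made-up weights (kg), used only to show that the two routes agree
example_weights <- c(2.1, 2.8, 3.0, 3.3, 2.6, 3.9, 2.4)
result <- pred_interval(example_weights)
result$manual
result$from_lm
```

Calling pred_interval(malecats$Bwt) should reproduce the interval reported above.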

Problem 2
The data set SP500 (MASS) contains the returns of the S&P 500 stock index for the 1990s; that is, the
change in the index’s price divided by the preceding day’s price. In principle, when predicting the
direction of the stock market with the intention of buying stock, we are willing to be wrong in one direction but
not the other: we would rather predict that the market grows too little and be pleasantly surprised than predict
that it grows more than it actually does. So compute a 99% lower prediction bound, assuming that stock
returns are Normally distributed. (You should not trust this number. First, the Normality assumption, despite
being assumed a lot in finance, is not true. Second, stock returns are not an independent and identically
distributed sample.)

# Your code here


mean(SP500)

## [1] 0.04575267

sd(SP500)

## [1] 0.9477464

length(SP500)

## [1] 2780

# t quantile for the 99% lower bound (1% in the lower tail)
t_value <- qt(0.01, df = length(SP500) - 1)

# 99% lower prediction bound; sqrt(1 + 1/n) accounts for estimating the mean,
# matching the prediction interval formula used in Problem 1
lower_bound <- mean(SP500) + t_value * sd(SP500) * sqrt(1 + 1/length(SP500))

# Output the lower prediction bound, rounded to four decimals
round(lower_bound, 4)

## [1] -2.1607

Problem 3
The data set abbey (MASS) contains determinations of nickel content (ppm) in a Canadian syenite rock.
The assumption of a Normal distribution clearly is inappropriate for this data set. Construct a 90% prediction
interval for the next measurement from the data set. Use a nonparametric procedure.

# Your code here

# Nickel content (ppm) in syenite rock
data <- abbey

# Bootstrap the sample mean: resample with replacement 10,000 times
bootstrap_samples <- replicate(10000, mean(sample(data, replace = TRUE)))

# 5th and 95th percentiles of the bootstrapped means
lower_bound <- quantile(bootstrap_samples, 0.05)
upper_bound <- quantile(bootstrap_samples, 0.95)

# Output the interval
c(lower_bound, upper_bound)

## 5% 95%
## 11.10323 23.07758
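
Percentiles of bootstrapped means describe uncertainty about the mean, whereas a prediction interval has to cover a single new measurement. A distribution-free alternative reads the endpoints off the order statistics: for an i.i.d. sample of size n, the interval between the j-th and k-th smallest values covers the next observation with probability (k - j)/(n + 1). The sketch below is a minimal illustration under that assumption; np_pred_interval and example_ppm are hypothetical names and the data are made up.

```r
# Hypothetical helper: two-sided distribution-free prediction interval
# for one future observation, built from order statistics.
np_pred_interval <- function(x, level = 0.90) {
  n <- length(x)
  alpha <- 1 - level
  eps <- 1e-8  # guards against floating-point error at exact integers
  j <- floor((n + 1) * alpha / 2 + eps)          # rank of lower endpoint
  k <- ceiling((n + 1) * (1 - alpha / 2) - eps)  # rank of upper endpoint
  if (j < 1 || k > n) {
    stop("sample too small for the requested level")
  }
  sorted <- sort(x)
  list(lower = sorted[j], upper = sorted[k],
       coverage = (k - j) / (n + 1))  # exact coverage, at least `level`
}

# Made-up measurements, for illustration only
example_ppm <- c(6, 7, 7, 8, 9, 9, 10, 11, 12, 14,
                 14, 16, 17, 18, 24, 28, 34, 40, 55)
np_pred_interval(example_ppm, level = 0.90)
```

If abbey holds 31 determinations, as its MASS help page describes, then level = 0.90 gives ranks j = 1 and k = 31, so np_pred_interval(abbey) would run from the sample minimum to the sample maximum with coverage 30/32, about 93.8%.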

Problem 4
Use the data from Problem 1 to construct a 95% tolerance interval for 99% of cats’ body weight.

# Your code here


# Multiplier k: 97.5% Normal quantile scaled by sqrt(1 + 1/n)
k <- qnorm(0.975) * sqrt(1 + 1/length(malecats$Bwt))

# Print the interval mean ± k * sd
c(mean(malecats$Bwt) - k * sd(malecats$Bwt),
  mean(malecats$Bwt) + k * sd(malecats$Bwt))

## [1] 1.979037 3.820963
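
The multiplier above depends only on the 97.5% Normal quantile, so the 99% content target never enters it. A two-sided Normal tolerance interval needs a factor that reflects both the content (99%) and the confidence (95%). One common closed-form choice is Howe's approximation, sketched below as a swapped-in technique rather than the method used above; normal_tol_interval and example_bwt are hypothetical names and the data are made up.

```r
# Hypothetical helper: two-sided Normal tolerance interval using
# Howe's approximation to the tolerance factor k.
normal_tol_interval <- function(x, content = 0.99, conf = 0.95) {
  n <- length(x)
  z <- qnorm((1 + content) / 2)         # Normal quantile for the content
  chi2 <- qchisq(1 - conf, df = n - 1)  # lower-tail chi-square quantile
  k <- z * sqrt((n - 1) * (1 + 1 / n) / chi2)
  c(lower = mean(x) - k * sd(x), upper = mean(x) + k * sd(x), k = k)
}

# Made-up body weights (kg), for illustration only
example_bwt <- c(2.0, 2.4, 2.5, 2.7, 2.8, 2.9, 3.0, 3.1, 3.3, 3.5, 3.6, 3.9)
normal_tol_interval(example_bwt, content = 0.99, conf = 0.95)
```

Because chi2 is always below n - 1 here, k always exceeds qnorm(0.995), roughly 2.58, so normal_tol_interval(malecats$Bwt) would be wider than the interval printed above.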

Problem 5
The data set geyser (MASS) contains both wait time between and duration of eruptions of the Old Faithful
geyser in Yellowstone National Park. Use the data set to construct a nonparametric tolerance interval
containing 90% of geyser eruptions with 99% confidence.

# Your code here

# Waiting times between geyser eruptions
data <- geyser$waiting

# Bootstrap the 5th and 95th percentiles of the waiting times
bootstrap_intervals <- replicate(10000,
  quantile(sample(data, replace = TRUE), c(0.05, 0.95)))

# 1st percentile of the bootstrapped lower endpoints and
# 99th percentile of the bootstrapped upper endpoints
c(quantile(bootstrap_intervals[1, ], 0.01),
  quantile(bootstrap_intervals[2, ], 0.99))

## 1% 99%
## 48 93
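
The bootstrap above is one heuristic. The classical distribution-free tolerance interval instead uses order statistics: for an i.i.d. continuous sample, the fraction of the population between the r-th and s-th smallest values follows a Beta(s - r, n - s + r + 1) distribution, so the confidence attached to a given content can be computed exactly. The sketch below is a swapped-in technique under that assumption, not the author's method; np_tol_interval and example_wait are hypothetical names and the data are made up.

```r
# Hypothetical helper: two-sided distribution-free tolerance interval
# from order statistics, using the Beta distribution of the coverage.
np_tol_interval <- function(x, content = 0.90, conf = 0.99) {
  n <- length(x)
  m_candidates <- seq_len(n - 1)  # m = s - r, the rank gap
  # P(coverage >= content) when the endpoints are m ranks apart
  achieved <- 1 - pbeta(content, m_candidates, n - m_candidates + 1)
  ok <- which(achieved >= conf)
  if (length(ok) == 0) {
    stop("sample too small for the requested content and confidence")
  }
  m <- m_candidates[ok[1]]        # smallest gap that reaches `conf`
  r <- floor((n + 1 - m) / 2)     # trim roughly equally from both tails
  s <- r + m
  sorted <- sort(x)
  list(lower = sorted[r], upper = sorted[s],
       ranks = c(r, s), confidence = achieved[m])
}

# Made-up waiting times (minutes), for illustration only
example_wait <- seq(40, 110, length.out = 299)
np_tol_interval(example_wait, content = 0.90, conf = 0.99)
```

Calling np_tol_interval(geyser$waiting) would apply the same rule to the waiting times used above. Ties in the data make the continuity assumption only approximate.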

Problem 6
The data set accdeaths (MASS) contains a count of accidental deaths in the United States between 1973
and 1978. What was the mean count of accidental deaths per month? Use this data set to construct a
statistical interval for the mean number of accidental deaths over the next five years. (Bonus points if you
can compare your interval to the observed mean over those years and assess how well it did.)

# Your code here

# Calculating the mean and standard error


mean_deaths <- mean(accdeaths)
se_deaths <- sd(accdeaths) / sqrt(length(accdeaths))

# 95% confidence interval for the mean


t_value <- qt(0.975, df = length(accdeaths) - 1)
c(mean_deaths - t_value * se_deaths, mean_deaths + t_value * se_deaths)

## [1] 8563.731 9013.852
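
A confidence interval targets the long-run mean. If the goal is the average of the next m monthly counts, the future sample mean has its own sampling variability, and the usual Normal-theory prediction interval for a future mean widens the standard error from s/sqrt(n) to s * sqrt(1/n + 1/m). The sketch below assumes independent, identically distributed Normal counts, which seasonal monthly data may well violate; future_mean_interval and example_counts are hypothetical names and the data are made up.

```r
# Hypothetical helper: prediction interval for the mean of m future
# observations, assuming i.i.d. Normal data.
future_mean_interval <- function(x, m, level = 0.95) {
  n <- length(x)
  t_value <- qt(1 - (1 - level) / 2, df = n - 1)
  half_width <- t_value * sd(x) * sqrt(1 / n + 1 / m)
  c(lower = mean(x) - half_width, upper = mean(x) + half_width)
}

# Made-up monthly counts, for illustration only
example_counts <- c(8200, 7900, 8800, 9100, 9900, 10400,
                    11000, 10300, 9500, 9700, 9000, 8700)

future_mean_interval(example_counts, m = 60)   # mean of the next 60 months
future_mean_interval(example_counts, m = Inf)  # reduces to the usual t interval
```

For the question above, future_mean_interval(accdeaths, m = 60) would give the interval for the mean over the next five years, and it is wider than the confidence interval because of the extra 1/m term.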
