Data Sci HW1
Question 1
# Estimate the bias, variance, and RMSE for the uniform estimator
import numpy as np

theta = 20
n = 200

# Generate n samples from Uniform(0, theta)
def generate_samples(n):
    y_s = []
    for i in range(n):
        y_s.append(np.random.uniform(0, theta))
    return y_s

# Run many trials of this generation to obtain many theta hats
# (each theta hat is the maximum of one sample)
results = []
for i in range(10000):
    results.append(np.max(generate_samples(n)))

# Set the mean of these 10000 trials to be the expected value of theta hat
exp_theta_hat = np.mean(results)

# Calculate bias
bias = exp_theta_hat - theta

# Calculate variance
variance = np.var(results)

# Calculate RMSE
rmse = np.sqrt(bias ** 2 + variance)
The estimated variance, RMSE, and magnitude of the bias all decrease significantly when n
increases from 200 to 1000. This is expected because the sample maximum is a consistent
estimator of theta: increasing the sample size moves the largest observation closer to the
true theta value, so the estimate becomes more accurate.
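As a check on the trend described above, the simulated values can be compared with the known closed-form results for the maximum of n draws from Uniform(0, theta): bias = -theta / (n + 1) and variance = n * theta^2 / ((n + 1)^2 * (n + 2)). The sketch below is only illustrative. The function names, the trial count of 2000, and the fixed seed are my own assumptions and are not part of the original notebook.

```python
import numpy as np

def simulate_max_estimator(theta, n, trials=2000, seed=0):
    """Monte Carlo bias, variance, and RMSE of max(Y_1..Y_n), Y_i ~ Uniform(0, theta).

    Hypothetical helper; trials and seed are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    # One row per trial, one column per observation
    samples = rng.uniform(0, theta, size=(trials, n))
    theta_hats = samples.max(axis=1)
    bias = theta_hats.mean() - theta
    variance = theta_hats.var()
    rmse = np.sqrt(bias ** 2 + variance)
    return bias, variance, rmse

def analytic_max_estimator(theta, n):
    """Closed-form bias, variance, and RMSE of the sample maximum."""
    bias = -theta / (n + 1)
    variance = n * theta ** 2 / ((n + 1) ** 2 * (n + 2))
    rmse = np.sqrt(bias ** 2 + variance)
    return bias, variance, rmse

theta = 20
comparison = {}
for n in (200, 1000):
    comparison[n] = {
        "simulated": simulate_max_estimator(theta, n),
        "analytic": analytic_max_estimator(theta, n),
    }
    print(n, comparison[n])
```

Both the bias magnitude and the RMSE shrink roughly in proportion to 1/n, which matches the drop seen when going from n = 200 to n = 1000.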
Question 2
# Perform the bootstrap
orig_data = [3.0, 1.9, 6.4, 5.9, 4.2, 6.2, 1.4, 2.9, 2.3, 4.8, 7.8,
             4.5, 0.7, 4.4, 4.4, 6.5, 7.6, 6.1, 2.7, 1.6]
x_boot_list = []
for i in range(1000):
    new_sample = np.random.choice(orig_data, len(orig_data), replace=True)
    x_boot = t_stat(new_sample)
    x_boot_list.append(x_boot)
# Bootstrap the two statistics by resampling y with replacement
t_1_boots = []
t_2_boots = []
for i in range(1000):
    sample = np.random.choice(y, len(y), replace=True)
    x_boot = t_stat1(sample)
    t_1_boots.append(x_boot)
for i in range(1000):
    sample1 = np.random.choice(y, len(y), replace=True)
    x_boot1 = t_stat2(sample1)
    t_2_boots.append(x_boot1)

# Recompute the median and maximum on the same y (no resampling)
median_list = []
max_list = []
for i in range(10000):
    median_list.append(np.median(y))
for i in range(10000):
    max_list.append(np.max(y))
As the above results show, the actual standard error for both the sample median and the
sample maximum is 0. This makes analytical sense because we treat our sample as the true
distribution, and each of the 10000 iterations recomputes the statistic on the same y, so the
values never vary. In the bootstrap scenarios, we have standard errors of about 0.02, which
is close to 0 and indicates that the bootstrap does a very good job of approximating the
true population statistic.
Question 3
# Use observed data to compute the estimated mean for this distribution
theta_hat = np.mean(orig_data)
theta_hat
4.265
0.4511328551853174
The following is a link to the original Google Colab I used to code this assignment:
https://colab.research.google.com/drive/193oUh1wg7hmCsUNRP0dyQGbaVSu-hNfO?usp=sharing.
The formatting here is a bit odd because I used an online file converter to produce the PDF.