Data Sci HW1

Data science homework

De Zheng Zhao

import numpy as np

Question 1
# Estimate the bias, variance, and RMSE for the uniform estimator
theta = 20
n = 200

# Generate n samples from Uniform(0, theta)
def generate_samples(n):
    y_s = []
    for i in range(n):
        y_s.append(np.random.uniform(0, 20))
    return y_s

# Run many trials of this generation to obtain many theta hats
results = []

for i in range(10000):
    results.append(np.max(generate_samples(n)))

# Set the mean of these 10000 trials to be the expected value of theta hat
exp_theta_hat = np.mean(results)

# Calculate bias
bias = exp_theta_hat - theta

# Calculate variance
variance = np.var(results)

# Calculate RMSE
rmse = np.sqrt(bias ** 2 + variance)

print("Bias: " + str(bias) + ' , Variance: ' + str(variance) + ' ,


RMSE: ' + str(rmse))

Bias: -0.09908465984527837 , Variance: 0.009774149390922771 , RMSE: 0.13997113705181252

The estimated bias, variance, and RMSE all decrease significantly when n is increased from 200 to 1000. This follows from the behavior of the sample maximum as an estimator for Uniform(0, theta): its expected value is n * theta / (n + 1), so both the bias and the variance shrink as the sample size grows, giving a more accurate estimate of the true theta.
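
To make this concrete, here is a minimal sketch that reruns the experiment at both sample sizes, reusing theta and generate_samples from above:

# Sketch: repeat the bias/variance/RMSE estimate at n = 200 and n = 1000
for n_val in (200, 1000):
    maxima = [np.max(generate_samples(n_val)) for _ in range(10000)]
    bias_n = np.mean(maxima) - theta
    var_n = np.var(maxima)
    rmse_n = np.sqrt(bias_n ** 2 + var_n)
    # Analytic bias is -theta / (n + 1): about -0.0995 at n = 200
    # and -0.0200 at n = 1000, so all three quantities shrink with n.
    print(n_val, bias_n, var_n, rmse_n)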

Question 2
# Perform the bootstrap
orig_data = [3.0, 1.9, 6.4, 5.9, 4.2, 6.2, 1.4, 2.9, 2.3, 4.8, 7.8,
4.5, 0.7, 4.4, 4.4, 6.5, 7.6, 6.1, 2.7, 1.6]
x_boot_list = []

# Define the test statistic
def t_stat(x):
    return np.median(x)

for i in range(1000):
    new_sample = np.random.choice(orig_data, len(orig_data), replace=True)
    x_boot = t_stat(new_sample)
    x_boot_list.append(x_boot)

std_err = np.std(x_boot_list) / np.sqrt(len(x_boot_list))

# Find the 95% confidence interval
def conf_int(data_list):
    p_hat = np.median(data_list)
    a = p_hat - 1.96 * std_err
    b = p_hat + 1.96 * std_err
    return a, b

print('Standard error: ' + str(std_err))
print(conf_int(x_boot_list))

Standard error: 0.02405101406386018
(4.352860012434834, 4.4471399875651665)
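
One caveat: dividing np.std(x_boot_list) by np.sqrt(len(x_boot_list)) measures the Monte Carlo error of the mean of the bootstrap replicates; the conventional bootstrap standard error is simply the standard deviation of the replicates themselves. A percentile interval is a common alternative that avoids the normal approximation; a minimal sketch reusing x_boot_list:

# Sketch: conventional bootstrap SE and a 95% percentile interval
boot_se = np.std(x_boot_list)  # spread of the bootstrap medians
ci_low, ci_high = np.percentile(x_boot_list, [2.5, 97.5])
print('Conventional bootstrap SE:', boot_se)
print('95% percentile interval:', (ci_low, ci_high))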

# Generate 100 data points from a normal distribution
y = np.random.normal(0, 5, 100)

t_1_boots = []
t_2_boots = []

# Find standard error for sample median using bootstrap
def t_stat1(x):
    return np.median(x)

for i in range(1000):
    sample = np.random.choice(y, len(y), replace=True)
    x_boot = t_stat1(sample)
    t_1_boots.append(x_boot)

std_err1 = np.std(t_1_boots) / np.sqrt(len(t_1_boots))

# Find standard error for sample maximum using bootstrap
def t_stat2(x):
    return np.max(x)

for i in range(1000):
    sample1 = np.random.choice(y, len(y), replace=True)
    x_boot1 = t_stat2(sample1)
    t_2_boots.append(x_boot1)

std_err2 = np.std(t_2_boots) / np.sqrt(len(t_2_boots))

# Compute actual standard error for sample median through simulations
median_list = []

for i in range(10000):
    median_list.append(np.median(y))

median_stderr = np.std(median_list) / np.sqrt(len(median_list))

# Compute actual standard error for sample maximum through simulations
max_list = []

for i in range(10000):
    max_list.append(np.max(y))

max_stderr = np.std(max_list) / np.sqrt(len(max_list))

print('Actual Standard Error for Sample Median: ' + str(median_stderr)
      + ', Bootstrap Estimate: ' + str(std_err1))
print('Actual Standard Error for Sample Maximum: ' + str(max_stderr)
      + ', Bootstrap Estimate: ' + str(std_err2))

Actual Standard Error for Sample Median: 1.1102230246251566e-18, Bootstrap Estimate: 0.022243242162067033
Actual Standard Error for Sample Maximum: 1.7763568394002505e-17, Bootstrap Estimate: 0.040556826403483785

As the results show, the "actual" standard error for both the sample median and the sample maximum is effectively 0 (the printed values are floating-point noise). This is because each simulation iteration recomputes the statistic on the same fixed sample y, so there is no variability across the 10000 repetitions; measuring the true standard error would require drawing a fresh sample from N(0, 5) in every iteration. The bootstrap estimates of roughly 0.02 and 0.04, by contrast, do capture sampling variability, because each bootstrap replicate resamples y with replacement.
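
A minimal sketch of that fresh-sample variant, using the same distribution parameters and sample size as above:

# Sketch: true standard errors via fresh samples from N(0, 5) each trial
fresh_medians = []
fresh_maxima = []
for i in range(10000):
    fresh = np.random.normal(0, 5, 100)
    fresh_medians.append(np.median(fresh))
    fresh_maxima.append(np.max(fresh))
print('True SE of sample median:', np.std(fresh_medians))
print('True SE of sample maximum:', np.std(fresh_maxima))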

Question 3
# Use observed data to compute the estimated mean for this distribution
theta_hat = np.mean(orig_data)
theta_hat

4.265

# Generate many samples from N(theta, 2) with 20 data points per sample
sample_results = []
for i in range(10000):
    sample_norm = np.random.normal(theta_hat, 2, 20)
    sample_results.append(sample_norm)

# Estimate the value of theta in each sample
thetas_list = []
for i in sample_results:
    thetas_list.append(np.mean(i))

# Compute the standard deviation among the samples
print(np.std(thetas_list))

0.4511328551853174
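
As a sanity check, the mean of 20 draws from a normal distribution with standard deviation 2 has analytic standard error sigma / sqrt(n) = 2 / sqrt(20) ≈ 0.447, which agrees closely with the simulated value above:

# Analytic standard error of the mean for sigma = 2, n = 20
print(2 / np.sqrt(20))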

The following is a link to the original Google Colab notebook I used for this assignment: https://colab.research.google.com/drive/193oUh1wg7hmCsUNRP0dyQGbaVSu-hNfO?usp=sharing. The formatting here is a bit unusual because I used an online converter to produce the PDF.
