cs447 - Tool Using Simulation To Test A Hypothesis

The document outlines a simulation-based approach to hypothesis testing using R, emphasizing the importance of quantifying uncertainty in data-driven discoveries. It details the steps to test a hypothesis, including specifying null and alternative hypotheses, constructing a null distribution, calculating p-values, and making decisions based on those values. An example involving a Randomized Controlled Trial is provided to illustrate the process of determining if a new treatment is more effective than an old one.

Uploaded by

hasiba

TOOL

Using Simulation to Test a Hypothesis

Data-driven discoveries usually occur when you observe a sample statistic that is larger than you
would expect due to random chance alone. For example, you might find that 70% of patients who take
a new experimental drug are cured of a disease, compared with only 30% of patients who take
the current drug treatment. When you find a sample statistic that suggests a
new discovery, you should always quantify the uncertainty associated with your results before you
generalize your conclusion to a larger population. This will help you assess whether the effect is real
or just the result of the randomness inherent in taking a small sample from a large population. Data
scientists often quantify uncertainty with simulation-based hypothesis tests.

Use this tool as a general guide to testing hypotheses with a simulation-based approach in R.

Step 1: Specify null (H0) and alternative (HA) hypotheses. The H0 is the status quo, or what you would
expect based on your current understanding. The HA is a new discovery or finding.

Step 2: Construct a null distribution of the sample statistic. Do this by simulating many samples
under the assumption that the H0 is correct, and visualizing the distribution of sample statistics with
a histogram.

Step 3: Plot the observed sample statistic on the histogram and calculate the p-value. Both the
histogram and the p-value indicate the chance of seeing a sample statistic at least as extreme as the
one you observed in your sample if the H0 is correct.

Step 4: Use the cut-off value to make a decision. If the p-value is less than the cut-off value, reject
the H0 in favor of the HA. If the p-value is greater than or equal to the cut-off value, do not reject the H0.
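
The four steps can be sketched end to end on a toy problem. The example below is only an illustration under stated assumptions: the coin-flip scenario, the sample size of 100, the observed count of 60 heads, the seed, and all variable names are hypothetical and are not part of this tool's worked example.

```r
set.seed(42)                       # assumed seed, for reproducibility only

# Step 1: H0: the coin is fair (p = 0.5). HA: the coin favors heads (p > 0.5).
n_flips        = 100               # hypothetical sample size
observed_heads = 60                # hypothetical observed number of heads

# Step 2: simulate many samples under H0 and record the sample statistic.
nsim       = 10000
null_heads = rbinom(nsim, size = n_flips, prob = 0.5)
hist(null_heads / n_flips, breaks = 40, freq = FALSE,
     main = "Null Distribution (hypothetical coin example)",
     xlab = "Proportion of heads", col = "lightgrey")

# Step 3: mark the observed statistic and compute the one-sided p-value,
# the share of simulated samples at least as extreme as the observed one.
abline(v = observed_heads / n_flips, lwd = 2, col = "blue")
p_value = mean(null_heads >= observed_heads)

# Step 4: compare the p-value with the cut-off value.
cutoff   = 0.05
decision = if (p_value < cutoff) "reject H0" else "fail to reject H0"
```

Working with counts rather than proportions in Step 3 avoids comparing floating-point values for equality.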

Using R With This Tool

The portions of this tool with a grey background are code text you can use to do the examples
included in this tool. You can also modify them to use with your own data. In these examples:
• Commands are the lines of code that don’t begin with a pound sign (#). Type these lines into R
to carry out the command.
• Commented text begins with one pound sign and explains what the code does.
• Code output begins with two pound signs.
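
As a small illustration of these three conventions (the numbers are arbitrary and not taken from this tool's examples), a command, its comment, and its output might look like this:

```r
# Compute the mean of three numbers (this line is a comment)
mean(c(1, 2, 3))
## [1] 2
```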

Measuring Relationships and Uncertainty

© 2021 Cornell University
Cornell Bowers College of Computing and Information Science

Using R to Test a Hypothesis

The steps and code below demonstrate using simulations to test a hypothesis about a Randomized
Controlled Trial. Suppose you are interested in determining whether a new treatment for a disease
works better than an old treatment for the same disease.

Here is the table of the results of this Randomized Controlled Trial:

Treatment   Patients   Worked   Didn’t Work   % Success
New               52       27            25        51.9
Old               51       22            29        43.1
Total            103       49            54        47.6

The observed sample statistic from these data is the percent success for the new treatment minus
the percent success for the old treatment: 51.9% − 43.1% = 8.8 percentage points (0.088 as a
proportion). This sample statistic could indicate that the new treatment is better than the old
treatment, but we should check this by quantifying the uncertainty around this result.
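
The arithmetic behind the observed sample statistic can be checked directly from the counts in the table. This is a minimal sketch; the variable names `worked_new`, `n_new`, `worked_old`, `n_old`, and `obs_diff` are chosen here for illustration.

```r
# Counts taken from the table of results
worked_new = 27; n_new = 52
worked_old = 22; n_old = 51

# Difference in success proportions (new minus old)
obs_diff = worked_new / n_new - worked_old / n_old

# Express the difference in percentage points, rounded to one decimal
round(100 * obs_diff, 1)
## [1] 8.8
```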

Step 1:
H0: The new treatment and the old treatment work equally well in the population.
HA: The new treatment works better than the old treatment in the population.

Step 2: Construct the null distribution that assumes both new and old treatments have the same
chance of success. Based on the table of results above, the total % success is 47.6%. Simulate many
samples and record the difference between the success rates of new and old treatments (sample
statistic) on each sample. Visualize this null distribution in a histogram using the code below:

set.seed(1) # set seed for reproducibility

# Set up this scenario:

outcome = c("Worked", "Did not Work") # Possible outcomes
nsim = 100000 # Number of iterations
store_p_diff = rep(0, nsim) # Vector to store results

p_new = 27/52 # Proportion of success with new treatment

p_old = 22/51 # Proportion of success with old treatment
p_all = (22+27)/(51+52) # Total proportion of success

# Run simulation:
for (i in 1:nsim) { # Create a for loop

  # Simulate results of the NEW treatment, assuming the probability of
  # success is p_all:
  result_new = sample(outcome, 52, replace = TRUE, prob = c(p_all, 1 - p_all))
  p_new_sim = mean(result_new == "Worked")

  # Simulate results of the OLD treatment, assuming the probability of
  # success is p_all:
  result_old = sample(outcome, 51, replace = TRUE, prob = c(p_all, 1 - p_all))
  p_old_sim = mean(result_old == "Worked")

  # Calculate and store the sample statistic for this iteration:
  p_diff = p_new_sim - p_old_sim
  store_p_diff[i] = p_diff
}

# Draw the histogram:

hist(store_p_diff, breaks = 40, freq = FALSE,
     main = "Null Distribution of the Sample Statistic",
     xlab = "Sample Statistic", ylab = "Density", col = "lightgrey")
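
The same null distribution can also be simulated without an explicit loop. The sketch below is an alternative, not the approach used in this tool: it assumes that drawing the number of successes with `rbinom()` is an acceptable stand-in for sampling individual outcomes with `sample()`. The names `new_successes`, `old_successes`, and `store_p_diff_vec` are hypothetical, and the simulated values will not match the loop draw for draw even with the same seed.

```r
set.seed(1)                          # assumed seed, for reproducibility only

nsim  = 100000                       # Number of iterations
p_all = (22 + 27) / (51 + 52)        # Total proportion of success

# Number of successes in each simulated group, assuming both treatments
# share the success probability p_all (the null hypothesis):
new_successes = rbinom(nsim, size = 52, prob = p_all)
old_successes = rbinom(nsim, size = 51, prob = p_all)

# Sample statistic for every simulated trial, computed in one step:
store_p_diff_vec = new_successes / 52 - old_successes / 51
```

The vectorized form typically runs much faster than the loop, at the cost of hiding the individual patient outcomes.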

[Figure: histogram titled "Null Distribution of the Sample Statistic". Horizontal axis: simulated difference in proportions (store_p_diff), from −0.4 to 0.4. Vertical axis: density, from 0 to 4.]

Step 3: Plot the sample statistic on the null distribution and calculate the p-value.

# Draw the histogram:

hist(store_p_diff, breaks = 40, freq = FALSE,
     main = "Null Distribution of the Sample Statistic",
     xlab = "Sample Statistic", ylab = "Density", col = "lightgrey")

# Plot the Observed Statistic:

abline(v = 0.088, lwd = 2, col = "blue")

# Calculate the p-value:

mean(store_p_diff > 0.088)
## [1] 0.18146
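
The p-value above compares the simulated differences with the rounded value 0.088 using a strict inequality. A possible variant, sketched below under the assumption that you want to count simulated differences at least as large as the unrounded observed difference, uses `>=` with a small tolerance for floating-point comparison. The names `obs_diff` and `sim_diff` and the use of `rbinom()` are choices made for this sketch, and the result may differ slightly from the value reported above because simulated samples that exactly match the observed counts are now included.

```r
set.seed(1)                          # assumed seed, for reproducibility only

nsim     = 100000
p_all    = (22 + 27) / (51 + 52)     # Total proportion of success
obs_diff = 27/52 - 22/51             # Unrounded observed sample statistic

# Simulated differences under the null hypothesis:
sim_diff = rbinom(nsim, 52, p_all) / 52 - rbinom(nsim, 51, p_all) / 51

# Share of simulated differences at least as large as the observed one;
# the tolerance guards against floating-point rounding in the comparison:
p_value = mean(sim_diff >= obs_diff - 1e-12)
```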

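[Figure: histogram titled "Null Distribution of the Sample Statistic" with a blue vertical line marking the observed statistic at 0.088. Horizontal axis: simulated difference in proportions (store_p_diff), from −0.4 to 0.4. Vertical axis: density, from 0 to 4.]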

Step 4: Make a decision. Here, we’ll use the standard cut-off value of 0.05, which keeps the
false positive rate at 5%. Based on our simulation, the p-value is 0.18 (18%), which is higher than the
cut-off value, so we cannot reject the null hypothesis. Our data therefore do not provide sufficient
evidence that the new treatment works better than the old one.
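
The decision rule in Step 4 can be written as a short comparison. This is a minimal sketch: the variable names `p_value`, `cutoff`, and `decision` are chosen here for illustration, and the p-value is copied from the output in Step 3 rather than recomputed.

```r
p_value = 0.18146   # p-value reported in Step 3
cutoff  = 0.05      # cut-off value used in Step 4

# Reject H0 only when the p-value falls below the cut-off value:
if (p_value < cutoff) {
  decision = "Reject H0"
} else {
  decision = "Fail to reject H0"
}
decision
## [1] "Fail to reject H0"
```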
