Hypothesis Testing in Python
z-scores
James Chapman
Curriculum Manager, DataCamp
A/B testing
In 2013, Electronic Arts (EA) released SimCity 5.
mean_comp_samp = stack_overflow['converted_comp'].mean()
# 119574.71738168952

mean_comp_hyp = 110000

std_error = 5607.997577378606   # standard error from the bootstrap distribution

z_score = (mean_comp_samp - mean_comp_hyp) / std_error
# 1.7073326529796957
Determine whether sample statistics are close to or far away from expected (or "hypothesized") values
Criminal trials
Two possible true states:
1. Defendant committed the crime
2. Defendant did not commit the crime
Prosecution must present evidence "beyond reasonable doubt" for a guilty verdict
The null hypothesis (H0 ) is the existing idea
The alternative hypothesis (HA ) is the new "challenger" idea of the researcher
1"Naught" is British English for "zero". For historical reasons, "H-naught" is the international convention for
pronouncing the null hypothesis.
If the evidence from the sample is "significant" that HA is true, reject H0 ; otherwise, choose H0
Test                             Tails
alternative different from null  two-tailed
alternative greater than null    right-tailed
alternative less than null       left-tailed
prop_child_samp = 0.39141972578505085   # sample proportion who coded first as a child

prop_child_hyp = 0.35

std_error = 0.010351057228878566        # standard error from the bootstrap distribution

z_score = (prop_child_samp - prop_child_hyp) / std_error
# 4.001497129152506

p_value
# 3.1471479512323874e-05
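Using the values printed above, the right-tailed p-value can be reproduced with only the standard library's `NormalDist` (no SciPy needed); this is a sketch that takes the bootstrap standard error as given:

```python
from statistics import NormalDist

# Values taken from the slide above
prop_child_samp = 0.39141972578505085  # sample proportion
prop_child_hyp = 0.35                  # hypothesized proportion
std_error = 0.010351057228878566       # bootstrap standard error

z_score = (prop_child_samp - prop_child_hyp) / std_error
p_value = 1 - NormalDist().cdf(z_score)  # right-tailed: P(Z >= z)
print(z_score, p_value)
```

The result matches the slide's p-value of about 3.15e-05.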
p-value recap
p-values quantify the strength of evidence against the null hypothesis
Small p-value → reject the null hypothesis
Large p-value → fail to reject the null hypothesis
p_value = 3.1471479512323874e-05

p_value <= alpha
# True
Reject H0 in favor of HA
import numpy as np
lower = np.quantile(first_code_boot_distn, 0.025)
upper = np.quantile(first_code_boot_distn, 0.975)
print((lower, upper))
(0.37063246351172047, 0.41132242370632466)
Possible errors, depending on whether H0 or HA is actually true:
False positives are Type I errors; false negatives are Type II errors.
A false positive (Type I) error: concluding that data scientists started coding as children at a higher rate, when they didn't
A false negative (Type II) error: failing to conclude that data scientists started coding as children at a higher rate, when they did
Two-sample problems
Compare sample statistics across groups of a variable
converted_comp is a numerical variable
Are users who first programmed as a child compensated higher than those that started as adults?
H0 : μchild = μadult
H0 : μchild − μadult = 0
HA : The mean compensation (in USD) is greater for those that coded first as a child compared to those that coded first as an adult.
xbar = stack_overflow.groupby('age_first_code_cut')['converted_comp'].mean()

age_first_code_cut
adult    111313.311047
child    132419.570621
Name: converted_comp, dtype: float64
x̄ - a sample mean
x̄child - sample mean compensation for coding first as a child
x̄adult - sample mean compensation for coding first as an adult
x̄child − x̄adult - a test statistic
z-score - a (standardized) test statistic
xbar = stack_overflow.groupby('age_first_code_cut')['converted_comp'].mean()

age_first_code_cut
adult    111313.311047
child    132419.570621
Name: converted_comp, dtype: float64

s = stack_overflow.groupby('age_first_code_cut')['converted_comp'].std()

age_first_code_cut
adult    271546.521729
child    255585.240115
Name: converted_comp, dtype: float64

n = stack_overflow.groupby('age_first_code_cut')['converted_comp'].count()

age_first_code_cut
adult    1376
child     885
Name: converted_comp, dtype: int64
import numpy as np

# xbar, s, and n are the grouped mean, std, and count series computed above
numerator = xbar['child'] - xbar['adult']
denominator = np.sqrt(s['child'] ** 2 / n['child'] + s['adult'] ** 2 / n['adult'])
t_stat = numerator / denominator
# 1.8699313316221844
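As a sanity check, the same t-statistic can be reproduced from the printed group statistics alone, using only the standard library (the numbers below are copied from the outputs above):

```python
import math

# Group summary statistics, copied from the printed pandas output
xbar_child, xbar_adult = 132419.570621, 111313.311047
s_child, s_adult = 255585.240115, 271546.521729
n_child, n_adult = 885, 1376

t_stat = (xbar_child - xbar_adult) / math.sqrt(
    s_child ** 2 / n_child + s_adult ** 2 / n_adult)
print(t_stat)  # ≈ 1.87
```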
t-distributions
The t statistic follows a t-distribution
t-distributions have a parameter named degrees of freedom, or df
They look like normal distributions, but with fatter tails
df = nchild + nadult − 2
HA : The mean compensation (in USD) is greater for those that coded first as a child compared to those that coded first as an adult
If p ≤ α then reject H0 .
SE(x̄child − x̄adult ) ≈ √(s²child / nchild + s²adult / nadult)
z-statistic: needed when using one sample statistic to estimate a population parameter
t-statistic: needed when using multiple sample statistics to estimate a population parameter
t_stat
# 1.8699313316221844

degrees_of_freedom = n_child + n_adult - 2
# 2259

p_value
# 0.030811302165157595
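A sketch of how that p-value follows from the t statistic and the degrees of freedom, assuming SciPy is available (the right tail matches the "greater than" alternative above):

```python
from scipy.stats import t

t_stat = 1.8699313316221844
degrees_of_freedom = 885 + 1376 - 2  # n_child + n_adult - 2 = 2259

# Right-tailed test: P(T >= t_stat) under the t-distribution
p_value = 1 - t.cdf(t_stat, df=degrees_of_freedom)
print(p_value)  # ≈ 0.0308
```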
Evidence that Stack Overflow data scientists who started coding as a child earn more.
US Republican presidents dataset
state county repub_percent_08 repub_percent_12
0 Alabama Hale 38.957877 37.139882
1 Arkansas Nevada 56.726272 58.983452
2 California Lake 38.896719 39.331367
3 California Ventura 42.923190 45.250693
.. ... ... ... ...
96 Wisconsin La Crosse 37.490904 40.577038
97 Wisconsin Lafayette 38.104967 41.675050
98 Wyoming Weston 76.684241 83.983328
99 Alaska District 34 77.063259 40.789626
1 https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/VOQCHQ
H0 : μ2008 − μ2012 = 0

x̄diff = -2.877109041242944
df = ndiff − 1

New hypotheses:
H0 : μdiff = 0
HA : μdiff < 0

degrees_of_freedom = n_diff - 1
# 99

p_value
# 9.572537285272411e-08
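A runnable sketch of the same kind of paired, left-tailed test using scipy.stats.ttest_rel; the arrays below are synthetic stand-ins for the county percentages, not the real dataset:

```python
import numpy as np
from scipy.stats import ttest_rel

# Synthetic paired data (hypothetical numbers, built so that
# percent_08 tends to be a few points below percent_12)
rng = np.random.default_rng(42)
percent_12 = rng.uniform(30, 80, size=100)
percent_08 = percent_12 - rng.normal(2.9, 5.0, size=100)

# H0: mu_diff = 0 vs HA: mu_diff < 0, where diff = percent_08 - percent_12
res = ttest_rel(percent_08, percent_12, alternative="less")
print(res.statistic, res.pvalue)
```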
Selected columns from the pingouin.ttest() output:

            BF10  power
T-test  1.323e+05    1.0

1 Details on the returns from pingouin.ttest() are available in the pingouin API docs at https://pingouin-stats.org/generated/pingouin.ttest.html#pingouin.ttest.

            BF10     power
T-test  1.323e+05  0.696338

           power
T-test  0.454972

Running an unpaired t-test on paired data increases the chance of false negative errors
Job satisfaction: 5 categories
stack_overflow['job_sat'].value_counts()
alpha = 0.2
pingouin.anova(data=stack_overflow,
dv="converted_comp",
between="job_sat")
p-value = 0.001315 < α
At least two categories have significantly different compensation
Chapter 1 recap
Is a claim about an unknown population proportion feasible?
Calculate a p-value to decide
Now, calculate the test statistic without using the bootstrap distribution
z = (p̂ − p0 ) / √(p0 × (1 − p0 ) / n)

Only uses sample information (p̂ and n) and the hypothesized parameter (p0 )
x̄ estimates the population mean
s estimates the population standard deviation
Since s is calculated from x̄, there is extra uncertainty in our estimate of the parameter
t-distribution - fatter tails than a normal distribution
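The fatter tails can be seen numerically, assuming SciPy: beyond the same cutoff, a t-distribution with few degrees of freedom carries more probability than the standard normal:

```python
from scipy.stats import norm, t

cutoff = 2.0
normal_tail = norm.sf(cutoff)   # P(Z > 2) for the standard normal
t_tail = t.sf(cutoff, df=5)     # P(T > 2) for a t-distribution with 5 df

print(normal_tail, t_tail)  # the t tail is larger
```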
alpha = 0.01
stack_overflow['age_cat'].value_counts(normalize=True)

Under 30       0.535604
At least 30    0.464396
Name: age_cat, dtype: float64

p_hat = 0.5356037151702786   # sample proportion under thirty

p_0 = 0.50

n = len(stack_overflow)
# 2261
import numpy as np
numerator = p_hat - p_0
denominator = np.sqrt(p_0 * (1 - p_0) / n)
z_score = numerator / denominator
# 3.385911440783663
from scipy.stats import norm

Two-tailed ("not equal"):
p_value = norm.cdf(-z_score) + 1 - norm.cdf(z_score)
p_value = 2 * (1 - norm.cdf(z_score))
# 0.0007094227368100725

Left-tailed ("less than"):
p_value = norm.cdf(z_score)

Right-tailed ("greater than"):
p_value = 1 - norm.cdf(z_score)

p_value <= alpha
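The three tail calculations can be checked without SciPy using the standard library's NormalDist (the z value is copied from above):

```python
from statistics import NormalDist

z_score = 3.385911440783663
cdf = NormalDist().cdf(z_score)

two_tailed = 2 * (1 - cdf)   # "not equal"
left_tailed = cdf            # "less than"
right_tailed = 1 - cdf       # "greater than"

print(two_tailed)  # ≈ 0.000709
```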
Comparing two proportions
H0 : Proportion of hobbyist users is the same for those under thirty as those at least thirty
H0 : p≥30 − p<30 = 0
HA : Proportion of hobbyist users is different for those under thirty compared to those at least thirty
HA : p≥30 − p<30 ≠ 0
alpha = 0.05
p_hats = stack_overflow.groupby("age_cat")['hobbyist'].value_counts(normalize=True)

age_cat      hobbyist
At least 30  Yes         0.773333
             No          0.226667
Under 30     Yes         0.843105
             No          0.156895
Name: hobbyist, dtype: float64

n = stack_overflow.groupby("age_cat")['hobbyist'].count()

age_cat
At least 30    1050
Under 30       1211
Name: hobbyist, dtype: int64

The test statistic uses the "Yes" proportions (0.773333 and 0.843105) and the group sizes (1050 and 1211).
z_score
# -4.223718652693034
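That z-score comes from the pooled-proportion standard error; a sketch using the rounded proportions and counts printed above (so the result agrees to a few decimal places):

```python
import math

p_1, n_1 = 0.773333, 1050  # At least 30
p_2, n_2 = 0.843105, 1211  # Under 30

# Pooled proportion under H0: p_1 = p_2
p_pooled = (n_1 * p_1 + n_2 * p_2) / (n_1 + n_2)
std_error = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_1 + 1 / n_2))
z_score = (p_1 - p_2) / std_error
print(z_score)  # ≈ -4.22
```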
age_by_hobbyist = stack_overflow.groupby("age_cat")['hobbyist'].value_counts()

age_cat      hobbyist
At least 30  Yes          812
             No           238
Under 30     Yes         1021
             No           190
Name: hobbyist, dtype: int64

from statsmodels.stats.proportion import proportions_ztest
z_score, p_value = proportions_ztest(count=[812, 1021], nobs=[1050, 1211],
                                     alternative="two-sided")
# (-4.223691463320559, 2.403330142685068e-05)
Revisiting the proportion test
age_by_hobbyist = stack_overflow.groupby("age_cat")['hobbyist'].value_counts()
age_cat      hobbyist
At least 30  Yes          812
             No           238
Under 30     Yes         1021
             No           190
Name: hobbyist, dtype: int64
(-4.223691463320559, 2.403330142685068e-05)
alpha = 0.1
Assuming independence, how far away are the observed results from the expected values?
Degrees of freedom:
(2 − 1) ∗ (5 − 1) = 4
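Assuming SciPy, chi2_contingency on any 2×5 table reports exactly those degrees of freedom; the counts below are made up purely to show the shape:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x5 contingency table (age_cat rows, job_sat columns)
table = np.array([[60, 80, 90, 70, 50],
                  [55, 95, 100, 65, 45]])

stat, p_value, dof, expected = chi2_contingency(table)
print(dof)  # (2 - 1) * (5 - 1) = 4
```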
1Left-tailed chi-square tests are used in statistical forensics to detect if a fit is suspiciously good because the
data was fabricated. Chi-square tests of variance can be two-tailed. These are niche uses, though.
Purple links
How do you feel when you discover that you've already visited the top resource?
purple_link_counts = stack_overflow['purple_link'].value_counts()
purple_link_counts = purple_link_counts.rename_axis('purple_link')\
.reset_index(name='n')\
.sort_values('purple_link')
   purple_link            n
2  Amused               368
3  Annoyed              263
0  Hello, old friend   1225
1  Indifferent          405
H0 : The sample matches the hypothesized distribution
χ² measures how far the observed results are from the expectations in each group
purple_link prop n
0 Amused 0.166667 376.833333
1 Annoyed 0.166667 376.833333
2 Hello, old friend 0.500000 1130.500000
3 Indifferent 0.166667 376.833333
plt.bar(purple_link_counts['purple_link'], purple_link_counts['n'],
color='red', label='Observed')
plt.bar(hypothesized['purple_link'], hypothesized['n'], alpha=0.5,
color='blue', label='Hypothesized')
plt.legend()
plt.show()
Power_divergenceResult(statistic=44.59840778416629, pvalue=1.1261810719413759e-09)
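That result reproduces with scipy.stats.chisquare, feeding the observed counts and the expected counts implied by the hypothesized proportions (1/6, 1/6, 1/2, 1/6):

```python
from scipy.stats import chisquare

# Observed counts and hypothesized proportions, in matching order
observed = [368, 263, 1225, 405]   # Amused, Annoyed, Hello old friend, Indifferent
n_total = sum(observed)            # 2261
expected = [n_total / 6, n_total / 6, n_total / 2, n_total / 6]

result = chisquare(f_obs=observed, f_exp=expected)
print(result)  # statistic ≈ 44.598
```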
Randomness
Assumption: the samples are random subsets of larger populations
Consequence if violated: sample is not representative of the population

Independence of observations
Consequence if violated: increased chance of false negative/positive error

Large sample size
Consequence if violated: wider confidence intervals
Large sample size checks:
t-tests: n ≥ 30 (one sample); n1 ≥ 30, n2 ≥ 30 (two samples)
proportion tests: n × p̂ ≥ 10 (one sample); n1 × p̂1 ≥ 10 (two samples)
Revisit data collection to check for randomness, independence, and sample size
Parametric tests
z-test, t-test, and ANOVA are all parametric tests
Assume a normal distribution
alpha = 0.01
import pingouin
pingouin.ttest(x=repub_votes_potus_08_12_small['repub_percent_08'],
y=repub_votes_potus_08_12_small['repub_percent_12'],
paired=True,
alternative="less")
repub_votes_small['diff'] = (repub_votes_small['repub_percent_08'] -
                             repub_votes_small['repub_percent_12'])
print(repub_votes_small)
repub_votes_small['abs_diff'] = repub_votes_small['diff'].abs()
print(repub_votes_small)
Incorporate the sum of the ranks for negative and positive differences
T_minus = 1 + 4 + 5 + 2 + 3
T_plus = 0
W = np.min([T_minus, T_plus])
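Assuming SciPy, wilcoxon automates the ranking; with five made-up pairs whose differences are all negative (mirroring T_plus = 0 in the worked example), the reported statistic is 0:

```python
from scipy.stats import wilcoxon

# Hypothetical paired samples; every x - y difference is negative
x = [35.2, 41.0, 38.5, 44.1, 37.9]
y = [37.1, 43.2, 39.3, 45.3, 40.6]

# Left-tailed test: the statistic is T_plus, the rank sum of positive differences
res = wilcoxon(x, y, alternative="less")
print(res.statistic, res.pvalue)
```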
Wilcoxon-Mann-Whitney test
Also known as the Mann-Whitney U test
Roughly equivalent to a t-test performed on the ranks of the numeric input
age_vs_comp_wide = age_vs_comp.pivot(columns='age_first_code_cut',
values='converted_comp')
import pingouin
pingouin.mwu(x=age_vs_comp_wide['child'],
y=age_vs_comp_wide['adult'],
alternative='greater')
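SciPy's mannwhitneyu is an equivalent call if pingouin isn't available; a sketch on hypothetical compensation values (made-up numbers, not the survey data):

```python
from scipy.stats import mannwhitneyu

# Hypothetical compensation samples (made-up numbers)
child = [120000, 95000, 140000, 110000, 150000]
adult = [90000, 85000, 100000, 105000, 92000]

# Right-tailed: HA is that child-coders' compensation tends to be higher
res = mannwhitneyu(child, adult, alternative="greater")
print(res.statistic, res.pvalue)
```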
alpha = 0.01
pingouin.kruskal(data=stack_overflow,
dv='converted_comp',
between='job_sat')
Course recap
Chapter 1, Chapter 2, Chapter 3, Chapter 4
Bayesian statistics
Bayesian Data Analysis in Python
Applications
Customer Analytics and A/B Testing in Python