Hypothesis Testing in ML
Machine Learning
Hypothesis
Null and Alternative Hypothesis
One-Tailed and Two-Tailed Tests
Z-Test
T-Test
Chi-Square Test
ANOVA Test
SciPy Library Functions for Tests
Smitesh Tamboli
Hypothesis Testing
A hypothesis is a claim or belief. Hypothesis testing is a statistical process for deciding, from
sample data, whether to reject or retain a claim, belief, or assumed association in a business
context (a product, a service, etc.). It plays an important role in providing evidence of an
association between an outcome variable and predictor variables.
Examples:
The new version of the eComm website has a better conversion rate.
The Cash-on-Delivery payment method increases sales.
The average annual salary of machine learning experts differs for males and females.
For the salary claim above, the two hypotheses are:
Null Hypothesis (H0): The average annual salary of male machine learning experts is
equal to the average annual salary of female machine learning experts.
Alternative Hypothesis (HA): The average annual salary of male machine learning
experts is different from the average annual salary of female machine learning experts.
Test Statistic, P-Value and Significance Level
Test Statistic
A test statistic is a standardized value that measures the distance between the observed
sample statistic and the parameter specified in the null hypothesis. It is used to determine
how far the sample data is from what we would expect if the null hypothesis were true.
The p-value (probability value) is calculated from this standardized value.
P-Value
The P-value is the conditional probability of observing a test statistic at least as extreme as
the one computed from the sample, given that the null hypothesis is true. A small P-value is
evidence against the null hypothesis; a large P-value means the data are consistent with it.
Significance Level
The primary task in hypothesis testing is to decide whether to reject or fail to reject (retain)
the Null hypothesis. The significance level ($\alpha$) is the criterion for that decision: it is
compared with the calculated P-value.
The significance level is the maximum P-value at which the Null hypothesis is still rejected;
equivalently, it is the probability of rejecting the Null hypothesis when it is actually true.
The usual choice is $\alpha$ = 0.05. The value is kept low because we start the process of
hypothesis testing with the assumption that the Null hypothesis is true. Unless there is strong
evidence against this assumption, we will not reject the Null hypothesis.
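The relationship between test statistic, P-value, and significance level can be sketched in a few lines. This is a minimal illustration, assuming a test statistic that follows the standard normal distribution under the Null hypothesis and a two-sided alternative; the value `z_stat = 2.10` is made up for the example and does not come from the slides.

```python
from scipy.stats import norm

alpha = 0.05    # significance level
z_stat = 2.10   # hypothetical test statistic (illustrative value only)

# Two-sided P-value: probability, under the Null hypothesis, of a statistic
# at least as extreme as z_stat in either tail of the standard normal.
p_value = 2 * norm.sf(abs(z_stat))

# Decision rule: reject the Null hypothesis only when the P-value is below alpha.
reject_null = p_value < alpha
print(f"p-value = {p_value:.4f}, reject H0: {reject_null}")
```

`norm.sf` is the survival function (1 minus the CDF), which is numerically safer than `1 - norm.cdf(...)` for large statistics.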
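Criteria -> Decision
P-value < $\alpha$ -> Reject the Null hypothesis
P-value >= $\alpha$ -> Fail to reject (retain) the Null hypothesis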
Right-Tailed Test
In a right-tailed test, the critical region for rejecting the Null hypothesis is located in the
right tail of the probability distribution. This test is used when the alternative hypothesis HA
specifies that the parameter of interest is greater than a certain value.
Examples:
A pharmaceutical company claims that their new drug increases patient recovery rate more
than the standard treatment.
Null Hypothesis H0: The new drug does not increase the recovery rate more than the
standard treatment.
Alternative Hypothesis HA: The new drug increases the recovery rate more than the
standard treatment.
Examples:
A new marketing campaign increases the average number of customers visiting their website
daily compared to the old campaign.
Null Hypothesis H0: The new marketing campaign does not increase the average number
of daily website visitors compared to the old campaign.
Alternative Hypothesis HA: The new marketing campaign increases the average number of
daily website visitors compared to the old campaign.
Left-Tailed Test
In a left-tailed test, the critical region for rejecting the Null hypothesis is located in the left
tail of the probability distribution. This test is used when the alternative hypothesis HA
specifies that the parameter of interest is less than a certain value.
Examples:
An educational researcher wants to determine if a new teaching method decreases the
average number of student failures compared to the traditional teaching method.
Null Hypothesis H0: The new teaching method does not decrease the average number of
student failures compared to the traditional method.
Alternative Hypothesis HA: The new teaching method decreases the average number of
student failures compared to the traditional method.
Two-Tailed Test
A two-tailed test is a type of statistical test where the critical area of the distribution falls
in both tails (left and right) of the distribution. This test is used when the alternative
hypothesis does not specify the direction of the effect but states that there is a difference.
If the test statistic falls into either of these regions, the Null hypothesis is rejected.
The significance level is split equally between the two tails, so each tail has an area of
$\alpha/2$.
Examples:
A researcher wants to test whether a new diet program has a different effect on weight loss
compared to the standard diet program, without specifying if it is more or less effective.
Null Hypothesis H0: The new diet program has the same effect on weight loss as the
standard diet program.
Alternative Hypothesis HA: The new diet program has a different effect on weight loss
compared to the standard diet program.
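The three kinds of test differ only in which tail area counts as "extreme". The sketch below assumes a z-distributed test statistic; the helper name `p_value_from_z` and the value `z = 1.5` are hypothetical and used only for illustration.

```python
from scipy.stats import norm

def p_value_from_z(z, tail):
    """P-value of a z statistic for a 'right', 'left', or 'two' tailed test."""
    if tail == "right":
        return norm.sf(z)            # area to the right of z
    if tail == "left":
        return norm.cdf(z)           # area to the left of z
    if tail == "two":
        return 2 * norm.sf(abs(z))   # area in both tails beyond |z|
    raise ValueError("tail must be 'right', 'left', or 'two'")

alpha = 0.05
z = 1.5  # hypothetical test statistic

p_right = p_value_from_z(z, "right")
p_left = p_value_from_z(z, "left")
p_two = p_value_from_z(z, "two")

# Critical values that bound the rejection regions at significance level alpha.
crit_right = norm.ppf(1 - alpha)      # reject if z > crit_right
crit_left = norm.ppf(alpha)           # reject if z < crit_left
crit_two = norm.ppf(1 - alpha / 2)    # reject if |z| > crit_two

print(p_right, p_left, p_two)
print(crit_right, crit_left, crit_two)
```

For the two-tailed case the critical value uses `alpha / 2` because the significance level is shared between the two tails.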
[Figure: rejection regions for the Right-Tailed, Left-Tailed, and Two-Tailed tests]
Steps To Perform Hypothesis Test
5. Calculate p-value
The p-value measures how compatible the data are with the Null hypothesis; a small p-value is
evidence against it.
p-value < $\alpha$ -> Reject Null hypothesis
p-value >= $\alpha$ -> Retain Null hypothesis
Type-I and Type-II Errors
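In hypothesis testing we end up with one of two decisions:
Reject Null hypothesis
Fail to reject (or retain) Null hypothesis
Either decision can be wrong. A Type-I error is rejecting a Null hypothesis that is actually
true; its probability is the significance level $\alpha$. A Type-II error is retaining a Null
hypothesis that is actually false.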
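A Type-I error means rejecting a Null hypothesis that is actually true, and its long-run rate should be close to the significance level. The simulation below is a rough sketch of that idea: it repeatedly draws samples for which the Null hypothesis holds and counts how often a two-sided z-test rejects it. All numbers (`mu0`, `sigma`, `n`, `n_trials`, the seed) are assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)      # fixed seed so the run is repeatable
alpha = 0.05
mu0, sigma, n = 100.0, 15.0, 40     # hypothetical population and sample size
n_trials = 10_000

# Every sample is drawn with true mean mu0, so the Null hypothesis is true.
samples = rng.normal(mu0, sigma, size=(n_trials, n))
z = (samples.mean(axis=1) - mu0) / (sigma / np.sqrt(n))
p_values = 2 * norm.sf(np.abs(z))

# Fraction of trials in which a true Null hypothesis was (wrongly) rejected.
type_i_rate = np.mean(p_values < alpha)
print(f"Estimated Type-I error rate: {type_i_rate:.3f}")
```

The estimated rate should land near 0.05, which is one way to see why the significance level is described as the probability of a Type-I error.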
Z-Test
A z-test is a statistical test used to determine whether there is a significant difference
between the means of two groups, or whether a sample mean differs significantly from a
known population mean. It is called a z-test because its test statistic follows the standard
normal distribution (z-distribution) under the Null hypothesis.
When to use z-test?
We need to test the value of the population mean, given that population variance is
known.
The population is a normal distribution and the population variance is known.
The sample size is large and the population variance is known. i.e. sample size n > 30.
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
$\bar{x}$ = Sample Mean
$\mu$ = Population Mean
$\sigma$ = Population Standard Deviation
n = Sample size
Example
Suppose an e-commerce platform receives an average of 100 visitors per day (known
population mean). We want to test if the average number of visitors for a recent sample of
40 days is significantly different from this known average. Verify the claim at significance
level alpha = 0.05.
Null Hypothesis (H0): The average number of visitors for the sample period is not
significantly different from the known population mean, i.e. $\mu = 100$.
Alternative Hypothesis (HA): The average number of visitors for the sample period is
significantly different from the known population mean, i.e. $\mu \neq 100$.
One-Sample Z-Test
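The visitor example can be sketched as a one-sample z-test computed by hand with NumPy and `scipy.stats.norm`. The slides give neither the sample data nor the population standard deviation, so both are assumptions here: the 40 daily counts are simulated, `pop_std=15` is an invented "known" value, and the helper name `one_sample_z_test` is hypothetical.

```python
import numpy as np
from scipy.stats import norm

def one_sample_z_test(sample, pop_mean, pop_std):
    """Two-sided one-sample z-test with a known population standard deviation."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    z = (sample.mean() - pop_mean) / (pop_std / np.sqrt(n))
    p_value = 2 * norm.sf(abs(z))
    return z, p_value

# Simulated visitor counts for 40 days (illustrative data, not from the slides).
rng = np.random.default_rng(42)
visitors = rng.normal(loc=105, scale=15, size=40)

alpha = 0.05
z_stat, p_value = one_sample_z_test(visitors, pop_mean=100, pop_std=15)
print(f"z = {z_stat:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Retain H0")
```

The statistic is computed directly from the formula rather than through a library z-test function, which keeps the dependence on the known population standard deviation explicit.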
Two-Sample Z-Test
Example
Suppose an e-commerce platform runs two different campaigns to drive traffic to their
website. We want to test if there is a significant difference in the average number of daily
visitors between the two campaigns.
Null Hypothesis (H0): There is no significant difference in the average number of daily
visitors between the two campaigns, i.e. $\mu_1 = \mu_2$.
Alternative Hypothesis (HA): There is a significant difference in the average number of
daily visitors between the two campaigns, i.e. $\mu_1 \neq \mu_2$.
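A two-sample z-test for the campaign example might look like the sketch below. It assumes the population standard deviations of both campaigns are known and uses the statistic (mean A minus mean B) divided by sqrt(std_a^2/n_a + std_b^2/n_b). The visitor counts, the standard deviations, and the helper name `two_sample_z_test` are all made up for illustration.

```python
import numpy as np
from scipy.stats import norm

def two_sample_z_test(sample_a, sample_b, std_a, std_b):
    """Two-sided two-sample z-test with known population standard deviations."""
    a = np.asarray(sample_a, dtype=float)
    b = np.asarray(sample_b, dtype=float)
    # Standard error of the difference between the two sample means.
    std_err = np.sqrt(std_a**2 / a.size + std_b**2 / b.size)
    z = (a.mean() - b.mean()) / std_err
    p_value = 2 * norm.sf(abs(z))
    return z, p_value

# Simulated daily visitors for each campaign (illustrative data only).
rng = np.random.default_rng(1)
campaign_a = rng.normal(loc=120, scale=20, size=50)
campaign_b = rng.normal(loc=110, scale=25, size=50)

alpha = 0.05
z_stat, p_value = two_sample_z_test(campaign_a, campaign_b, std_a=20, std_b=25)
print(f"z = {z_stat:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Retain H0")
```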
T-Test
A T-test is a statistical test used to compare a sample mean with a hypothesized value, or the
means of two groups, and determine whether the difference is statistically significant.
When to use T-test?
The T-test is used when the population follows a normal distribution and the population
standard deviation is unknown and is estimated from the sample.
The sample size is small (n < 30).
One-Sample T-Test
Used to compare the mean of a single sample to a known (hypothesized) population mean when
the population variance is unknown.
Example: An eComm platform believes that the average number of daily visitors is 150. Test
if the average number of visitors for a sample of 20 days is significantly different from this
value.
H0: The mean number of daily visitors is equal to 150 ($\mu = 150$).
HA: The mean number of daily visitors is different from 150 ($\mu \neq 150$).
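SciPy provides `scipy.stats.ttest_1samp` for this case. The sketch below applies it to the 20-day visitor example; the daily counts are simulated because the slides do not include the data, so the resulting numbers are illustrative only.

```python
import numpy as np
from scipy import stats

# Simulated visitor counts for 20 days (illustrative data, not from the slides).
rng = np.random.default_rng(7)
daily_visitors = rng.normal(loc=160, scale=20, size=20)

alpha = 0.05
# Two-sided one-sample t-test against the claimed mean of 150.
t_stat, p_value = stats.ttest_1samp(daily_visitors, popmean=150)

print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Retain H0")
```

Unlike the z-test, the standard deviation here is estimated from the sample, and the statistic is compared against a t-distribution with n - 1 degrees of freedom.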
Two-Sample T-Test
Two-sample T-test is used to determine if there is a significant difference between the
means of two independent groups.
Example: A dataset contains the marks of Male and Female students. We need to test whether
the mean marks of Male students differ from the mean marks of Female students.
H0: The means of Marks for Male and Female students are equal
HA: The means of Marks for Male and Female students are not equal
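For two independent groups, `scipy.stats.ttest_ind` can be used. The marks below are invented for the example. The call uses SciPy's default of assuming equal variances in both groups; passing `equal_var=False` would switch to Welch's t-test, which does not make that assumption.

```python
import numpy as np
from scipy import stats

# Hypothetical marks for the two groups (illustrative data only).
male_marks = np.array([72, 65, 80, 58, 90, 77, 69, 84], dtype=float)
female_marks = np.array([75, 82, 68, 91, 88, 79, 73, 86], dtype=float)

alpha = 0.05
# Two-sided independent two-sample t-test, equal variances assumed.
t_stat, p_value = stats.ttest_ind(male_marks, female_marks, equal_var=True)

print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Retain H0")
```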
Chi-Square Tests
The chi-square test is a statistical test used to determine whether there is a significant
association between two categorical variables.
Example: Consider a dataset that contains preferred payment methods (Credit Card, Cash,
PayPal) and the satisfaction level (Satisfied, Not Satisfied) of customers on an eComm
Platform. We need to test whether there is any association between payment methods and
satisfaction levels.
H0: There is no association between the mode of payment and the satisfaction level of
customers.
HA: There is an association between the mode of payment and the satisfaction level of
customers.
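The payment-method example can be sketched with `scipy.stats.chi2_contingency`, which takes a table of observed counts. The counts below are invented for illustration; rows are payment methods and columns are satisfaction levels.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical observed counts (illustrative data only).
# Rows: Credit Card, Cash, PayPal. Columns: Satisfied, Not Satisfied.
contingency_table = np.array([
    [60, 20],
    [40, 40],
    [50, 30],
])

alpha = 0.05
chi2, p, dof, expected = chi2_contingency(contingency_table)

print(f"chi2 = {chi2:.3f}, p-value = {p:.4f}, dof = {dof}")
print("Expected counts under H0:")
print(expected)
print("Reject H0" if p < alpha else "Retain H0")
```

`expected` holds the counts that would be expected if payment method and satisfaction were independent, and `dof` is (rows - 1) * (columns - 1).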
ANOVA Tests
ANOVA or Analysis of Variance is a statistical test used to compare the means of three or
more groups to determine if there are statistically significant differences between them. It
assesses whether the means of several groups are equal or not by examining the variation
between and within groups.
Example: An eComm Platform wants to analyze the effect of different shipping options
(Standard, Express, Same-Day) on customers' purchase amounts.
H0: The mean purchase amount is the same for all shipping options.
HA: At least one shipping option has a different mean purchase amount.
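A one-way ANOVA for the shipping example can be run with `scipy.stats.f_oneway`, which takes one array of observations per group. The purchase amounts below are invented for illustration.

```python
from scipy.stats import f_oneway

# Hypothetical purchase amounts per shipping option (illustrative data only).
standard = [20, 22, 19, 24, 25]
express = [28, 30, 27, 26, 29]
same_day = [35, 33, 36, 34, 32]

alpha = 0.05
# One-way ANOVA: compares variation between groups to variation within groups.
f_stat, p_value = f_oneway(standard, express, same_day)

print(f"F = {f_stat:.3f}, p-value = {p_value:.6f}")
print("Reject H0" if p_value < alpha else "Retain H0")
```

A significant result only says that at least one group mean differs; it does not identify which one.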
Summary
Z-Test
We need to test the value of the population mean, given that population variance is
known.
The population is a normal distribution and the population variance is known.
The sample size is large and the population variance is known. i.e. sample size n > 30.
T-Test
The T-test is used when the population follows a normal distribution and the
population standard deviation is unknown and is estimated from the sample.
The sample size is small (n < 30).
Chi-Square Test
The chi-square test is a statistical test used to determine whether there is a significant
association between two categorical variables.
from scipy.stats import chi2_contingency
chi2, p, dof, expected = chi2_contingency(contingency_table)
ANOVA Test
ANOVA is used to compare the means of three or more groups to determine if there are
statistically significant differences between them.