Hypothesis Testing

What is hypothesis testing?

Hypothesis testing is a statistical method for making decisions using experimental
data. A hypothesis is an assumption that we make about a population parameter.

Ex: The average height of the citizens of a city is more than 5.8 ft.
Ex: The politician will receive more than 60% of the votes.

Assumptions like these need a statistical way to be checked: we need a mathematical
basis for concluding whether the data support what we are assuming.

Example

✔ It is claimed that a process has been improved in yield by bringing a change in an
important factor X. Yield data are collected from the old and new processes.

✔ Random samples are drawn from the yield data of old process A and improved
process B.

Process A   Process B
89.7        84.7
81.4        86.1
84.5        83.2
84.8        91.9
87.3        86.3
79.7        79.3
85.1        82.6
81.7        89.1
83.7        83.7
84.5        88.5

“Is there real difference between Process A and Process B?”


Example: Hypothesis Testing
⮚ Real Question:
Can we say that the yield of improved Process B is greater than old Process
A?

⮚ Descriptive Statistics
Variable   Process   N    Mean    Std. Dev.
Yield      A         10   84.24   2.90
Yield      B         10   85.54   3.65

⮚ Statistical Question:
Is there a statistically significant difference between mean of Process B (85.54)
and mean of Process A (84.24)? Or, is this difference in mean just due to
chance?
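
The statistical question above can be explored with a short sketch. It assumes a two-tailed, equal-variance two-sample t-test and a significance level of 0.05; the slides do not say which test the author applied to these data, so treat the choice of `scipy.stats.ttest_ind` and the variable names as illustrative.

```python
import numpy as np
from scipy import stats

# Yield data from the Process A / Process B example
process_a = np.array([89.7, 81.4, 84.5, 84.8, 87.3, 79.7, 85.1, 81.7, 83.7, 84.5])
process_b = np.array([84.7, 86.1, 83.2, 91.9, 86.3, 79.3, 82.6, 89.1, 83.7, 88.5])

# Descriptive statistics (ddof=1 gives the sample standard deviation)
mean_a, mean_b = process_a.mean(), process_b.mean()
sd_a, sd_b = process_a.std(ddof=1), process_b.std(ddof=1)

# Two-sample t-test of H0: mean of A == mean of B
# (two-tailed, equal variances assumed: both are assumptions of this sketch)
t_stat, p_value = stats.ttest_ind(process_b, process_a)

alpha = 0.05  # assumed significance level
print(f"A: mean={mean_a:.2f}, sd={sd_a:.2f}")
print(f"B: mean={mean_b:.2f}, sd={sd_b:.2f}")
print(f"t={t_stat:.3f}, p={p_value:.3f}, reject H0: {p_value < alpha}")
```

Under these assumptions the p-value comes out well above 0.05, so the observed difference of 1.30 in mean yield is not enough, on its own, to rule out chance.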

Hypothesis Testing

Develop the hypothesis for the population and make a statistical decision by
determining the acceptance of the hypothesis using sample data.

• Null Hypothesis (H0): The argument made so far, or the hypothesis saying that
there is no change or difference.
• Alternative Hypothesis (H1): The new argument, that is, the hypothesis that
you want to prove with solid ground obtained from the sample.

Example: Medicine B for treating headache, newly developed by a pharmaceutical
company, has a 30 minutes longer effect than the existing Medicine A.

• H0: Medicine A and B have the same effect
• H1: Medicine B has a 30 minutes longer effect than Medicine A

Procedure of Hypothesis Testing

The steps for hypothesis tests are as follows:

1. Define the null and alternative hypotheses.
2. Identify the test statistic to be used for testing the validity of the null hypothesis,
for example, a Z-test or t-test.
3. Decide the significance level (alpha, α). A typical value used for α is 0.05.
4. Calculate the p-value (probability value), which is the conditional probability of
observing a test statistic at least as extreme as the one obtained, given that the null
hypothesis is true. We will use the functions provided in the scipy.stats module for
calculating the p-value.
5. Decide whether to reject or retain the null hypothesis based on the p-value and the
significance level α: reject the null hypothesis when the p-value is below α.
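
Steps 3 to 5 can be sketched as a small helper. The function name `two_tailed_t_decision` and the input values are hypothetical, chosen only for illustration, and the sketch assumes a two-tailed t-test.

```python
from scipy import stats

def two_tailed_t_decision(t_statistic, df, alpha=0.05):
    """Turn a t statistic into a two-tailed p-value and a decision (steps 3-5)."""
    # Step 4: probability of a statistic at least this extreme when H0 is true
    p_value = 2 * stats.t.sf(abs(t_statistic), df)
    # Step 5: reject H0 only when the p-value falls below alpha
    decision = "reject H0" if p_value < alpha else "retain H0"
    return p_value, decision

# Hypothetical inputs for illustration only
p_value, decision = two_tailed_t_decision(t_statistic=2.5, df=20)
print(round(p_value, 4), decision)
```

Here `stats.t.sf` is the survival function, 1 - cdf, so doubling it for |t| gives the two-tailed probability.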

1-tail, 2-Tail
1-Sample, 2-Sample

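from scipy import stats

z = 1.96   # example z statistic (placeholder value)
t = 2.23   # example t statistic (placeholder value)
stats.norm.cdf(z)        # P(Z <= z) under the standard normal distribution
stats.t.cdf(t, df=10)    # P(T <= t) under Student's t with 10 degrees of freedom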
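
A minimal sketch of how the cdf calls relate to 1-tail and 2-tail p-values. The statistic values below are hypothetical; they were picked because they sit near the usual 5% two-tailed cutoffs.

```python
from scipy import stats

z = 1.96  # hypothetical z statistic

left_tail = stats.norm.cdf(z)            # P(Z <= z): 1-tail test, "less than"
right_tail = stats.norm.sf(z)            # P(Z >= z) = 1 - cdf(z): 1-tail test, "greater than"
two_tailed = 2 * stats.norm.sf(abs(z))   # P(|Z| >= |z|): 2-tail test

# Same idea for a t statistic, which also needs the degrees of freedom
t, df = 2.228, 10  # hypothetical values
two_tailed_t = 2 * stats.t.sf(abs(t), df)

print(left_tail, right_tail, two_tailed, two_tailed_t)
```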
Normal Distribution codes

Student’s t-distribution codes

Simple Exercise

Conducting a hypothesis test is a bit like putting an accused person on trial in front
of a jury. The jury assumes that the accused person is innocent unless there is strong
evidence against him, but even after considering the evidence, it’s still possible for
the jury to make the wrong decision.

Simple Exercise

An accused person is on trial for a crime, and you’re on the jury. The jury’s
task is to assume the prisoner is innocent, but if there’s enough
evidence against him, they need to convict him.

1. In the trial, what’s the null hypothesis?


2. What’s the alternate hypothesis?
3. In what ways can the jury make a verdict that’s correct?
4. In what ways can the jury make a verdict that’s incorrect?

Simple Exercise

In the trial, what’s the null hypothesis?


The null hypothesis is that the prisoner is innocent, as that is what we
have to assume until there’s proof otherwise.

What’s the alternate hypothesis?


The alternate hypothesis is that the prisoner is guilty. In other words, if
there’s sufficient proof that the prisoner is not innocent, then we’ll
accept that he’s guilty and convict him.

Simple Exercise

In what ways can the jury make a verdict that’s correct?


We can make a correct verdict if:
a) The prisoner is innocent, and we find him innocent.
b) The prisoner is guilty, and we find him guilty.

In what ways can the jury make a verdict that’s incorrect?


We can make an incorrect verdict if:
a) The prisoner is innocent, and we find him guilty.
b) The prisoner is guilty, and we find him innocent.

The errors we can make when conducting a hypothesis test are the same sort
of errors we could make when putting a prisoner on trial.

Hypothesis tests are basically tests where you take a claim and put it on trial
by assessing the evidence against it. If there’s sufficient evidence against it,
you reject it, but if there’s insufficient evidence against it, you retain it (you
fail to reject it, which is not the same as proving it true).

You may correctly retain or reject the null hypothesis, but even after considering
the evidence, it’s also possible to make an error. You may reject a valid null
hypothesis, or you might retain it when it’s actually false.

Statisticians have special names for these types of errors.

A Type I error is when you wrongly reject a true null hypothesis (punishing an
innocent person).

A Type II error is when you wrongly retain a false null hypothesis (letting a
guilty person go free).
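
The two error types can be illustrated with a small simulation. This is a sketch under stated assumptions, not part of the original slides: the normal populations, the sample size of 30, the true mean of 0.3 in the second scenario, and the seed are all hypothetical choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
alpha = 0.05
n_trials, n = 2000, 30

# Scenario 1: H0 is TRUE (the population mean really is 0).
# Every rejection here is a Type I error.
samples = rng.normal(loc=0.0, scale=1.0, size=(n_trials, n))
_, p_values = stats.ttest_1samp(samples, popmean=0.0, axis=1)
type1_rate = np.mean(p_values < alpha)

# Scenario 2: H0 is FALSE (the true mean is 0.3, but we still test against 0).
# Every failure to reject here is a Type II error.
samples_alt = rng.normal(loc=0.3, scale=1.0, size=(n_trials, n))
_, p_values_alt = stats.ttest_1samp(samples_alt, popmean=0.0, axis=1)
type2_rate = np.mean(p_values_alt >= alpha)

print(f"Type I error rate:  {type1_rate:.3f}")
print(f"Type II error rate: {type2_rate:.3f}")
```

The Type I error rate should land close to alpha, because alpha is exactly the probability of rejecting a true null hypothesis. The Type II error rate depends on the sample size and on how far the true mean is from the hypothesized one.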

ERRORS

(Table comparing the test decision with the actual situation.)

Let us solve the problem

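Mean (hypothesized) = 4.0
Standard deviation = 3
Sample size = 50
Sample mean = 4.6

import numpy as np
from scipy import stats

# t statistic: (hypothesized mean - sample mean) / standard error
t_statistic = (4.0 - 4.6) / (3 / np.sqrt(50))   # about -1.41

# two-tailed p-value with n - 1 = 49 degrees of freedom
p_value = 2 * stats.t.cdf(t_statistic, df=49)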

Let us do it in Python

scipy.stats.ttest_1samp(array, mu)

One-sample and one tail t-tests

Ex. An outbreak of Salmonella-related illness was attributed to ice cream
produced at a certain factory. Scientists measured the level of Salmonella in 9
randomly sampled batches of ice cream. The levels (in MPN/g) were

0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418

Is there evidence that the mean level of Salmonella in the ice cream is
greater than 0.3 MPN/g?

Let us try in Python

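Let μ be the mean level of Salmonella in all batches of ice cream. Here
the hypothesis of interest can be expressed as:

H0: μ <= 0.3
Ha: μ > 0.3

import pandas as pd
from scipy import stats

data = pd.Series([0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418])
# one-tailed test: alternative='greater' matches Ha: μ > 0.3 (needs SciPy 1.6 or later)
stats.ttest_1samp(data, 0.3, alternative='greater')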
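
To see what the library call computes, the same one-tailed test can be worked out by hand. This is a sketch; the variable names are illustrative and the right-tail p-value is used because the alternative is "greater than 0.3".

```python
import numpy as np
from scipy import stats

levels = np.array([0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418])
mu0 = 0.3  # hypothesized mean under H0

n = levels.size
standard_error = levels.std(ddof=1) / np.sqrt(n)   # sample sd / sqrt(n)
t_stat = (levels.mean() - mu0) / standard_error    # one-sample t statistic
p_one_tailed = stats.t.sf(t_stat, df=n - 1)        # P(T >= t_stat), right tail only

print(f"t = {t_stat:.4f}, one-tailed p = {p_one_tailed:.4f}")
```

The statistic is about 2.21 with 8 degrees of freedom and the one-tailed p-value is about 0.029, so at a significance level of 0.05 the null hypothesis is rejected.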

Two-sample t-tests
Ex. 6 subjects were given a drug (treatment group) and an additional 6 subjects a
placebo (control group). Their reaction time to a stimulus was measured (in ms).
We want to perform a two-sample t-test for comparing the means of the
treatment and control groups.

Control: 91, 87, 99, 77, 88, 91
Treat: 101, 110, 103, 93, 99, 104

Let Mu1 be the mean of the population taking the drug and Mu2 the mean of
the untreated population. Here the hypothesis of interest can be expressed as:

H0: Mu1-Mu2 = 0
Ha: Mu1-Mu2 != 0

import pandas as pd
from scipy import stats

control = pd.Series([91, 87, 99, 77, 88, 91])
treat = pd.Series([101, 110, 103, 93, 99, 104])
# two-tailed, equal-variance t-test; control is passed first, so the statistic is negative
stats.ttest_ind(control, treat)

Ttest_indResult(statistic=-3.445612673536487, pvalue=0.006272124350809803)
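
The reported statistic can be reproduced by hand with the pooled-variance formula, which is what `ttest_ind` uses by default. The sketch below also runs Welch's test (`equal_var=False`) as a comparison; the slides do not use Welch's test, it is added here only to show the effect of dropping the equal-variance assumption.

```python
import numpy as np
from scipy import stats

control = np.array([91, 87, 99, 77, 88, 91])
treat = np.array([101, 110, 103, 93, 99, 104])

n1, n2 = control.size, treat.size
# Pooled variance: weighted average of the two sample variances
pooled_var = ((n1 - 1) * control.var(ddof=1) + (n2 - 1) * treat.var(ddof=1)) / (n1 + n2 - 2)
standard_error = np.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_pooled = (control.mean() - treat.mean()) / standard_error
p_pooled = 2 * stats.t.sf(abs(t_pooled), df=n1 + n2 - 2)   # two-tailed

# Welch's t-test: does not assume equal variances (comparison only)
t_welch, p_welch = stats.ttest_ind(control, treat, equal_var=False)

print(f"pooled: t = {t_pooled:.4f}, p = {p_pooled:.6f}")
print(f"Welch:  t = {t_welch:.4f}, p = {p_welch:.6f}")
```

Because the two groups have the same size, the pooled and Welch statistics coincide; only the degrees of freedom, and therefore the p-value, differ slightly.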

Two-Proportion t-test
Use case: Is there a significant difference between the proportions of students in
state 1 and state 2 who report that they were placed immediately after their education?

Populations: All students who have completed graduation and post graduation in
both states
Parameter of interest: p1 - p2, where p1 = proportion for state 1 and p2 = proportion
for state 2

Data: 247 students from state 1. 36.8% of students report that they have got a job.
308 students from state 2. 38.9% of students report that they have got a job.

Hypothesis Definition:
Null Hypothesis: p1 - p2 = 0
Alternative Hypothesis: p1 - p2 ≠ 0

Here the difference in population proportions is tested with a t-test. Each student’s
answer is a 0/1 outcome, so each sample follows a binomial distribution. We can
generate the two samples with the appropriate binomial distribution parameters and
pass them to the t-test function.

We can use the ttest_ind() function from Statsmodels (statsmodels.stats.weightstats).

The function returns three values: (a) the test statistic, (b) the p-value of the t-test,
and (c) the degrees of freedom used in the t-test.

Data Given:
n1 = 247
p1 = .37

n2 = 308
p2 = .39
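
A minimal sketch of this test with the data given. Two things here are assumptions and not from the slides: the success counts are reconstructed by rounding n * p to whole students, and `scipy.stats.ttest_ind` is swapped in for the Statsmodels function so that the example needs only NumPy and SciPy. A pooled two-proportion z-test is computed by hand as a cross-check.

```python
import numpy as np
from scipy import stats

n1, p1 = 247, 0.37
n2, p2 = 308, 0.39

# ASSUMPTION: number of placed students reconstructed by rounding n * p
x1, x2 = round(n1 * p1), round(n2 * p2)

# Build 0/1 samples: 1 = got a job, 0 = did not
state1 = np.repeat([1, 0], [x1, n1 - x1])
state2 = np.repeat([1, 0], [x2, n2 - x2])

# Two-tailed, equal-variance t-test on the 0/1 outcomes
t_stat, p_t = stats.ttest_ind(state1, state2)
df = n1 + n2 - 2  # degrees of freedom of the pooled t-test

# Cross-check: pooled two-proportion z-test computed by hand
p_pool = (x1 + x2) / (n1 + n2)
standard_error = np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z_stat = (x1 / n1 - x2 / n2) / standard_error
p_z = 2 * stats.norm.sf(abs(z_stat))

print(f"t = {t_stat:.3f}, p = {p_t:.3f}, df = {df}")
print(f"z = {z_stat:.3f}, p = {p_z:.3f}")
```

With samples of this size the t and z results are nearly identical, and both p-values are far above 0.05, so the data give no evidence of a difference between the two states.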
Thank you
