Hypothesis Testing

The document discusses the central limit theorem and how sampling distributions become normal as sample size increases. It also covers hypothesis testing, types of hypotheses, errors, significance levels, and some common statistical tests.
Central Limit Theorem
⬥ As the sample size gets large enough, the sampling distribution becomes almost normal, regardless of the shape of the population.
Central Limit Theorem
For almost all populations, the sample mean is normally or approximately normally distributed; the mean of this distribution equals the mean of the population, and its standard deviation is obtained by dividing the population standard deviation by the square root of the sample size:

X̄ ~ N(μ, σ/√n)

That is, the CLT states that

μ_X̄ = μ  and  σ_X̄ = σ/√n
Central Limit Theorem
⬥ If the original population is normal, a sample of only 1 case is normally distributed
⬥ The farther the original population is from normal, the larger the sample required to approach normality
⬥ Even for populations that are far from normal, a modest number of cases yields a sampling distribution that is approximately normal
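A quick simulation makes this concrete (a sketch in Python, assuming NumPy is available; the exponential population and the sample size are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Skewed population: exponential with mean 2 (far from normal).
# For this distribution the population standard deviation is also 2.
n = 30

# Draw many samples of size n and record each sample mean
sample_means = rng.exponential(scale=2.0, size=(10_000, n)).mean(axis=1)

# CLT: the mean of the sampling distribution is close to the population
# mean, and its spread is close to sigma / sqrt(n)
print(sample_means.mean())   # ≈ 2.0
print(sample_means.std())    # ≈ 2 / sqrt(30) ≈ 0.365
```

A histogram of `sample_means` would look bell-shaped even though the underlying population is strongly skewed.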
When the Population is Normal
[Figure: a normal population distribution with μ = 50 and σ = 10, and the sampling distributions of the mean.]
⬥ Central tendency: μ_X̄ = μ = 50
⬥ Variation: σ_X̄ = σ/√n (σ_X̄ = 5 for n = 4, σ_X̄ = 2.5 for n = 16)

When the Population is Not Normal
[Figure: a non-normal population distribution with μ = 50 and σ = 10, and the sampling distributions of the mean.]
⬥ Central tendency: μ_X̄ = μ = 50
⬥ Variation: σ_X̄ = σ/√n (σ_X̄ = 5 for n = 4, σ_X̄ ≈ 1.8 for n = 30)
The Normal Distribution
⬥ Along the X axis you see Z scores, i.e. standardized deviations from the mean:

Z = (x − μ) / σ

⬥ Just think of Z scores as std.-dev.-denominated units
⬥ A Z score tells us how many std. deviations a case lies above or below the mean
The Normal Distribution
⬥ Note a property of the Normal distribution:
⬥ 68% of cases in a Normal distribution fall within 1 std. deviation of the mean
⬥ 95% within 2 std. dev. (actually 1.96)
⬥ 99.7% within 3 std. dev.

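These percentages can be checked directly against the normal CDF (a sketch assuming SciPy is available):

```python
from scipy.stats import norm

# Probability mass within k standard deviations of the mean
def within(k):
    return norm.cdf(k) - norm.cdf(-k)

print(round(within(1), 4))     # ≈ 0.6827
print(round(within(1.96), 4))  # ≈ 0.9500
print(round(within(3), 4))     # ≈ 0.9973
```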
Confidence Intervals
⬥ We can use the Central Limit Theorem and the properties of the normal distribution to construct confidence intervals of the form:
⬩ The average salary is $40,000 plus or minus $1,000 with 95% confidence
⬩ Presidential support is 45% plus or minus 4% with 95% confidence
⬥ In other words, we can make our best estimate using a sample and indicate a range of likely values for what we wish to estimate
Confidence Intervals
⬥ Notice that our estimates of the population parameter are probabilistic
⬥ So we report our sample statistic together with a measure of our (un)certainty
⬥ Most often, this takes the form of a 95 percent confidence interval establishing a boundary around the sample mean (x̄) which will contain the true population mean (μ) 95 out of 100 times
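As a sketch of how such an interval is computed (Python with SciPy; the salary figures below are hypothetical, chosen to land near the $40,000 ± $1,000 example):

```python
import math
from scipy.stats import norm

def mean_ci(xbar, s, n, conf=0.95):
    """Normal-approximation confidence interval for a mean."""
    z = norm.ppf(1 - (1 - conf) / 2)   # 1.96 for 95% confidence
    half_width = z * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical salary sample: mean $40,000, s = $10,200, n = 400
lo, hi = mean_ci(40_000, 10_200, 400)
print(round(lo), round(hi))   # ≈ 39000 41000
```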
Introduction to Hypothesis Testing
⬥ Hypothesis testing is one of the most important concepts in Statistics
⬥ Heavily used by Statisticians, Machine Learning Engineers, and Data Scientists
⬥ Statistical tests are used to check whether the null hypothesis is rejected or not rejected
⬥ Statistical tests assume a null hypothesis of no relationship or no difference between groups
⬥ Definition: a hypothesis is a formal statement that explains the relationship between two or more variables of a specified population
⬥ Example: based on sample data, we may wish to decide whether a serum is really effective in curing Corona
Types of Hypothesis
⬥ Simple
⬥ Complex
⬥ Null
⬥ Alternative
⬥ Empirical
⬥ Statistical
What is a Test of Hypothesis?
⬥ If, on the supposition that a particular hypothesis is true, we find that the results observed in a random sample differ markedly from those expected, we say that the observed differences are significant and we reject the hypothesis
⬥ Procedures that enable us to decide whether to accept or reject a hypothesis are called tests of hypothesis, tests of significance, or decision rules
Type I and Type II Errors
⬥ Type I error: rejecting the null hypothesis when it happens to be true
⬥ Type II error: accepting (failing to reject) the null hypothesis when it is false
⬥ Both errors should be minimized, but decreasing one increases the other
⬥ The best solution is to increase the sample size
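The meaning of α can be demonstrated by simulation: when the null hypothesis is true, a test run at α = 0.05 rejects it about 5% of the time (a Python sketch assuming NumPy and SciPy; the N(0, 1) population and sample size are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05
n_experiments = 5_000

# H0 is true in every experiment: samples come from N(0, 1), mu = 0
rejections = 0
for _ in range(n_experiments):
    sample = rng.normal(loc=0.0, scale=1.0, size=30)
    _, p = stats.ttest_1samp(sample, popmean=0.0)
    if p < alpha:
        rejections += 1

# The observed Type I error rate is close to alpha by construction
print(rejections / n_experiments)   # ≈ 0.05
```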
Type I, Type II Errors

                 H0 is true           H0 is false
Reject H0        Type I error (α)     Correct decision
Accept H0        Correct decision     Type II error (β)
Characteristics of Hypothesis
The important characteristics of a hypothesis are:
⬥ It should be short and precise
⬥ It should be specific
⬥ It must be related to the existing body of knowledge
⬥ It should be capable of verification
Statistical Hypothesis
⬥ It is a guess or assumption about the parameters of the population distribution
⬥ It is established beforehand and may or may not be true
⬥ A statistical hypothesis can be either:
⬦ Null Hypothesis
⬦ Alternative Hypothesis
Null Hypothesis (H0)
⬥ It is the statistical hypothesis that is actually tested for acceptance or rejection
⬥ It is the hypothesis which is tested for possible rejection under the assumption that it is true
⬥ It is expressed in the form of an equality
⬥ Example: the independent variables have no effect on the dependent variable
Examples of Null Hypothesis
⬥ The null hypothesis is always a simple hypothesis stated as an equality specifying an exact value of the parameter
⬥ Examples:
⬦ The population mean equals a specified constant μ0
⬦ The difference between two sample means equals a constant
Alternate Hypothesis (H1)
⬥ It is any hypothesis other than the null hypothesis
⬥ It is expressed in the form of >, <, or ≠
⬥ We accept the alternative hypothesis only if there is sufficient evidence
⬥ The concept was originated by Neyman
⬥ Example: the independent variables have an effect on the dependent variable
⬥ H1: μ > μ0
Critical Region
⬥ In any test of hypothesis, a test statistic S*, calculated from the sample data, is used to accept or reject the null hypothesis
⬥ The test statistic S* follows some known distribution, and the area under the probability curve of its sampling distribution is divided into two regions: the region of rejection, where the null hypothesis is rejected, and the region of acceptance
⬥ The critical region is the region of rejection of the null hypothesis
⬥ The area of the critical region equals the level of significance α
⬥ The critical region lies in the tail(s) of the distribution
⬥ Depending upon the nature of the alternate hypothesis, the critical region may lie in one tail or in both tails
Test of Significance
⬥ This is the procedure for deciding whether to accept or reject the null hypothesis
⬥ The test is used to determine whether observed samples differ significantly from expected results
⬥ Acceptance of the hypothesis merely indicates that the data do not give sufficient evidence to reject it
⬥ Rejection, however, is a firm conclusion that the sample evidence contradicts the hypothesis
⬥ When the null hypothesis is accepted, the result is said to be non-significant: the observed differences are due to chance caused by the process of sampling
⬥ When the null hypothesis is rejected (i.e., the alternate hypothesis is accepted), the result is said to be significant
⬥ Since the test is based on sample observations, the decision to accept or reject the null hypothesis is subject to some error or risk
Two-tailed test
[Figure: critical region split between both tails of the distribution.]
One-sided (right tailed) test
[Figure: critical region in the right tail.]
One-sided (left tailed) test
[Figure: critical region in the left tail.]
Level of Significance
⬥ Represented by α
⬥ It is the probability of committing a Type I error
⬥ It measures the amount of risk involved in taking decisions
⬥ It has to be chosen before sample information is collected
⬥ It is typically 0.01 or 0.05
How to Compute the Level of Significance?
⬥ To measure the statistical significance of a result, the investigator first needs to calculate the p-value
⬥ The p-value is the probability of observing an effect at least as extreme as the one in the sample, given that the null hypothesis is true
⬥ When the p-value is less than the level of significance (α), the null hypothesis is rejected
Interpretation of p-value based on level of significance (10%)
⬥ If p > 0.1, there is no evidence against the null hypothesis
⬥ If 0.05 < p ≤ 0.1, there is weak evidence against the null hypothesis
⬥ If 0.01 < p ≤ 0.05, there is strong evidence against the null hypothesis
⬥ If p ≤ 0.01, there is very strong evidence against the null hypothesis
Rejection rule of null hypothesis
⬥ If p < α, then one must reject the null
hypothesis
⬥ If p > α, then one should not reject the null
hypothesis
Common Tests

Z Test: a statistical way of testing a hypothesis when either:
1. We know the population variance, or
2. We do not know the population variance but our sample size is large, n ≥ 30

T Test: a statistical way of testing a hypothesis when:
1. We do not know the population variance, and
2. Our sample size is small, n < 30
T-test
⬥ The t-test is a basic test that is limited to two groups
⬥ For multiple groups, you would have to compare each pair of groups; for example, with three groups there would be three tests (AB, AC, BC)
⬥ The basic principle is to test the null hypothesis that the means of the two groups are equal
T-test
⬥ The t-test assumes:
⬦ A normal distribution (parametric data)
⬦ Underlying variances are equal (if not, use Welch's test)
⬥ It is used when there is random assignment and only two sets of measurements to compare
⬥ There are two main types of t-test:
⬦ Independent-measures t-test: when samples are not matched
⬦ Matched-pair t-test: when samples appear in pairs (e.g. before-and-after)
⬥ A single-sample t-test compares a sample against a known figure, for example where measures of a manufactured item are compared against the required standard
T-test Applications
⬥ To compare the mean of a sample with the population mean
⬥ To compare the mean of one sample with the mean of another independent sample
⬥ To compare the values (readings) of one sample on two occasions (paired)
One Sample Test
(Sample mean and population mean)
⬥ Ho: sample mean = population mean
⬥ Test statistic: t = (x̄ − μ) / (s/√n)
⬥ Degrees of freedom = n − 1
Sample mean and population mean
Example: The following data represent hemoglobin values in gm/dl for 10 patients:
10.5 9 6.5 8 11 7 8.5 9.5 12 7.5
Does the mean value for these patients differ significantly from the mean value of the general population (12 gm/dl)?

df = 9
The tabulated value for 9 df at the 0.05 level of significance is 2.262.
|Calculated value| > tabulated value, so reject Ho.
There is a statistically significant difference between the sample mean and the population mean, and this difference is unlikely to be due to chance.
T Table
[Figure: t table, used for calculating the p-value from the t-value.]
Example in R
t.test(data$V1, mu = 12)

        One Sample t-test

data:  data$V1
t = -5.5678, df = 9, p-value = 0.0003484
alternative hypothesis: true mean is not equal to 12
95 percent confidence interval:
  7.640484 10.159516
sample estimates:
mean of x
      8.9

Since the p-value is less than 0.05, reject Ho.
Confidence Interval
[Formula: x̄ ± t(α/2, n−1) · s/√n]
Example
⬥ We measure the grams of protein for a sample of energy bars. The label claims that the bars have 20 grams of protein. We want to know whether the labels are correct.

Energy Bar - Grams of Protein
20.70 27.46 22.15 19.85 21.29 24.75
20.75 22.91 25.34 20.33 21.54 21.08
22.14 19.56 21.10 18.04 24.12 19.95
19.72 18.28 16.26 17.46 20.53 22.12
25.06 22.44 19.08 19.88 21.39 22.33 25.79
⬥ n = 31
⬥ Ho: μ = 20; Ha: μ ≠ 20
⬥ t = Difference / Standard Error = 1.40/0.456 = 3.07
⬥ The critical value of t with α = 0.05 and 30 degrees of freedom is ±2.043
⬥ Since 3.07 > 2.043, we reject the null hypothesis that the mean grams of protein is equal to 20
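The same test can be reproduced with SciPy, using the energy-bar data above (a sketch assuming SciPy is available):

```python
from scipy import stats

# Grams of protein per energy bar (data from the example above)
protein = [20.70, 27.46, 22.15, 19.85, 21.29, 24.75,
           20.75, 22.91, 25.34, 20.33, 21.54, 21.08,
           22.14, 19.56, 21.10, 18.04, 24.12, 19.95,
           19.72, 18.28, 16.26, 17.46, 20.53, 22.12,
           25.06, 22.44, 19.08, 19.88, 21.39, 22.33, 25.79]

# One-sample t-test against the labeled value of 20 grams
t_stat, p_value = stats.ttest_1samp(protein, popmean=20)
print(round(t_stat, 2))   # ≈ 3.07
print(p_value < 0.05)     # True → reject Ho
```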
Two Sample Tests
(Mean of two samples)

⬥ Ho: mean of sample 1 = mean of sample 2
⬥ Test statistic (pooled): t = (x̄1 − x̄2) / √(sp²(1/n1 + 1/n2)), where sp² is the pooled variance
⬥ Degrees of freedom = n1 + n2 − 2
Two Sample Tests
(Mean of two samples)
The following data represent weight in kg for 10 males and 12 females.

Males: 80 75 95 55 60 70 75 72 80 65
Females: 60 70 50 85 45 60 80 65 70 62 77 82

Is there a statistically significant difference between the mean weight of males and females at alpha = 0.01?
Two Sample Tests
(Mean of two samples)

Mean1 = 72.7, Mean2 = 67.17
Variance1 = 128.46, Variance2 = 157.787
df = n1 + n2 − 2 = 20
t = 1.074

The tabulated t, two-sided, for alpha = 0.01 is 2.845.
Then accept Ho and conclude that there is no significant difference between the two means.
This difference may be due to chance (p > 0.01).
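The same pooled t-test can be reproduced in Python (a sketch assuming SciPy is available):

```python
from scipy import stats

# Weights in kg (data from the example above)
males   = [80, 75, 95, 55, 60, 70, 75, 72, 80, 65]
females = [60, 70, 50, 85, 45, 60, 80, 65, 70, 62, 77, 82]

# Pooled (equal-variance) two-sample t-test, as on the slide
t_stat, p_value = stats.ttest_ind(males, females, equal_var=True)
print(round(t_stat, 3))   # ≈ 1.074
print(p_value > 0.01)     # True → accept Ho at alpha = 0.01
```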
Pisa Score - Two Sample Tests in R
t.test(df$Maths.F, df$Maths.M)

        Welch Two Sample t-test

data:  df$Maths.F and df$Maths.M
t = -0.45289, df = 133.84, p-value = 0.6514
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -22.31807  14.00156
sample estimates:
mean of x mean of y
 458.1345  462.2928
Paired Two Sample Test
One sample on two occasions
Blood pressure of 8 patients, before & after treatment

BP before   BP after   d     d²
180         140        40    1600
200         145        55    3025
230         150        80    6400
240         155        85    7225
170         120        50    2500
190         130        60    3600
200         140        60    3600
165         130        35    1225
Total                  465   29175

Mean d = 465/8 = 58.125
Paired Two Sample Test
One sample on two occasions

⬥ The df here = n − 1 = 7
⬥ t = d̄ / (s_d/√n) = 9.38

The tabulated t (df = 7) at the 0.05 level of significance, two-tailed, is 2.36.
We reject Ho and conclude that there is a significant difference between the BP readings before and after treatment (p < 0.05).
Paired Two Sample Test
One sample on two occasions

t.test(data$V1, data$V2, paired = TRUE)

        Paired t-test

data:  data$V1 and data$V2
t = 9.3876, df = 7, p-value = 3.24e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 43.48397 72.76603
sample estimates:
mean of the differences
                 58.125
Z test
Suppose we randomly sampled subjects from an honors program. We want to determine whether their mean IQ score differs from the general population. The general population's IQ scores are defined as having a mean of 100 and a standard deviation of 15.

Null (H0): μ = 100; Alternative (HA): μ ≠ 100
IQ score sample mean (x̄): 107
Sample size (n): 25
Hypothesized population mean (μ0): 100
Population standard deviation (σ): 15

z = (x̄ − μ0) / (σ/√n) = (107 − 100) / (15/√25) = 2.333

⬥ 2.333 is greater than the critical value of 1.960
⬥ We can reject the null and conclude that the mean IQ score for the population of honors students does not equal 100. Based on the sample mean of 107, we know their mean IQ score is higher.
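SciPy has no one-sample z-test function, but the statistic is easy to compute directly (a sketch; `z_test` is a helper defined here, not a library function):

```python
import math
from scipy.stats import norm

def z_test(xbar, mu0, sigma, n):
    """One-sample z-test; returns the z statistic and two-sided p-value."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    p = 2 * (1 - norm.cdf(abs(z)))
    return z, p

# IQ example from the slide: x̄ = 107, μ0 = 100, σ = 15, n = 25
z, p = z_test(xbar=107, mu0=100, sigma=15, n=25)
print(round(z, 3))   # ≈ 2.333
print(p < 0.05)      # True → reject H0
```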
Z test

Significance Level   Type of Test   Critical Value(s)
0.01                 Two-Tailed     ±2.576
0.01                 Left Tail      −2.326
0.01                 Right Tail     +2.326
0.05                 Two-Tailed     ±1.960
0.05                 Left Tail      −1.645
0.05                 Right Tail     +1.645
Suppose a teacher claims that his section's students will score higher than his colleague's section. The mean score is 22.1 for 60 students belonging to his section, with a standard deviation of 4.8. For his colleague's section, the mean score is 18.8 for 40 students, and the standard deviation is 8.1. Test his claim at α = 0.05.

This is a right-tailed two-sample z test.
H0: μ1 = μ2; H1: μ1 > μ2

z = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2) = (22.1 − 18.8) / √(4.8²/60 + 8.1²/40) = 2.32

As 2.32 > 1.645, the null hypothesis can be rejected.
There is enough evidence to support the teacher's claim that the scores of students are better in his class.
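The computation can be sketched in Python (the helper `two_sample_z` is defined here for illustration):

```python
import math
from scipy.stats import norm

def two_sample_z(x1, s1, n1, x2, s2, n2):
    """Two-sample z statistic for large samples (unpooled variances)."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return (x1 - x2) / se

# Teacher's-claim example: (22.1, 4.8, 60) vs (18.8, 8.1, 40)
z = two_sample_z(22.1, 4.8, 60, 18.8, 8.1, 40)
print(round(z, 2))          # ≈ 2.32

# Right-tailed test at alpha = 0.05
critical = norm.ppf(0.95)   # ≈ 1.645
print(z > critical)         # True → reject H0
```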
Power of Test
⬥ α = probability of committing a Type I error = P(reject H0 | H0 is true)
⬥ β = probability of committing a Type II error = P(accept H0 | H1 is true)
⬥ Power of test = 1 − β = P(reject H0 | H1 is true)
The Neyman-Pearson Lemma
⬥ The Neyman-Pearson Lemma holds immense importance when it comes to solving problems that demand decision making or conclusions to a higher accuracy
⬥ It offers a powerful framework for making informed decisions based on statistical evidence
⬥ At the heart of the Neyman-Pearson Lemma lies the concept of statistical power. Statistical power represents the ability of a hypothesis test to detect a true effect or difference when it exists in the population
⬥ The lemma emphasizes the importance of optimizing this power while controlling the risk of both Type I and Type II errors
The Neyman-Pearson Lemma
⬥ The Neyman-Pearson Lemma allows us to
strike a balance between these errors by
maximizing power while setting a
predetermined significance level (the
probability of Type I error).
⬥ It states that the likelihood ratio test is the
most powerful test for a given significance
level in binary hypothesis testing.

The Neyman-Pearson Lemma
⬥ The likelihood ratio test compares the likelihoods of
the observed data under the null and alternative
hypotheses and accepts the alternative hypothesis if
the likelihood ratio exceeds a certain threshold.
Mathematically, the likelihood ratio test is given by:
⬥ Reject H0 if L(x) = f1(x) / f0(x) > k
⬥ where k is a threshold determined based on the
desired significance level α. The threshold k is
chosen such that the probability of Type I error (false
positive) is equal to α.

NP Lemma Example
⬥ Null hypothesis (H0): the patient does not have the disease
⬥ Alternative hypothesis (H1): the patient has the disease
⬥ We want to design a test that can accurately determine whether a patient has a specific disease or not. We need to balance the risks of two types of errors:
⬥ Type I error (false positive): rejecting the null hypothesis (saying the patient has the disease) when the patient is actually healthy
⬥ Type II error (false negative): failing to reject the null hypothesis (saying the patient is healthy) when the patient actually has the disease
⬥ H0: the biomarker levels follow a normal distribution with parameters μ0 (mean under H0) and σ0 (standard deviation under H0)
⬥ H1: the biomarker levels follow a normal distribution with parameters μ1 (mean under H1) and σ1 (standard deviation under H1), where μ1 > μ0
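For this normal-means setup (taking σ0 = σ1 = σ), the likelihood ratio L(x) = f1(x)/f0(x) is increasing in x, so the Neyman-Pearson rule "reject when L(x) > k" reduces to "reject when x exceeds a cutoff" chosen so that the Type I error equals α. A Python sketch with illustrative parameter values (μ0, μ1, σ are assumptions, not from the slides):

```python
from scipy.stats import norm

# Hypothetical biomarker model: H0: N(mu0, sigma), H1: N(mu1, sigma), mu1 > mu0
mu0, mu1, sigma = 50.0, 60.0, 10.0

def likelihood_ratio(x):
    """L(x) = f1(x) / f0(x) for a single observation."""
    return norm.pdf(x, mu1, sigma) / norm.pdf(x, mu0, sigma)

# Since L(x) is increasing in x, "L(x) > k" is equivalent to "x > c".
# Choose c so that P(Type I error) = alpha under H0:
alpha = 0.05
c = norm.ppf(1 - alpha, mu0, sigma)   # ≈ 66.4
print(round(c, 1))

# The equivalent likelihood-ratio threshold k:
k = likelihood_ratio(c)
print(likelihood_ratio(70) > k)   # True  → reject H0 (high biomarker)
print(likelihood_ratio(55) > k)   # False → do not reject
```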